IdeaHeap: Percolating Insight from Omaha, NE (https://ideaheap.com)

Writing functional code in Java
https://ideaheap.com/2016/04/writing-functional-code-in-java/ (Sat, 30 Apr 2016)

So, with the onslaught of Java 8, I, like many, was happy to believe Java was finally a functional language. This is only somewhat true. Unfortunately, old habits die hard in a language that makes them easy.

Why do we use mutable datatypes by default?

Mutability is a performance optimization, and it should be treated as such. There are already some really great immutable data structure libraries out there for Java, Guava in particular, but I really can't get behind the clunky interfaces they offer for creating a changed version of an object. Because of this, I've created a new library to start making this process easier. I'm calling it barely-functional.

Lists

I love immutable lists. Whenever I don't use immutable lists, that's when I get into trouble. So here it is: I wrapped the calls to Guava's immutable list builder to do the lifting for me. I'm sick of typing ImmutableList.of() and ImmutableMap.of(); that's just no good.

Here it is, I’m drawing a line in the Java sand.

List<Integer> ints = list(1, 2, 3, 4); // [1, 2, 3, 4]
List<Integer> moreInts = push(ints, 5); // [1, 2, 3, 4, 5]
List<Integer> evenMore = unshift(ints, -1, 0); // [-1, 0, 1, 2, 3, 4]
List<Integer> firstNumberIsTwo = assoc(ints, 0, 2); // [2, 2, 3, 4]
List<Integer> oneRemoved = remove(ints, 2); // [1, 2, 4]
List<Integer> oneInserted = insert(ints, 1, 6); // [1, 6, 2, 3, 4]

Let’s also actually make it easy to get new lists with changed elements! Why do we make this hard?
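If you'd rather not pull in Guava at all, helpers with the same shape can be written against the JDK alone. This is a minimal sketch, not the library's actual implementation; the method names mirror barely-functional's API, but the bodies are my own illustration:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public final class Immutables {
    private Immutables() {}

    // push: a new unmodifiable list with extra elements appended; the input is untouched.
    @SafeVarargs
    public static <T> List<T> push(List<T> list, T... elements) {
        List<T> copy = new ArrayList<>(list);
        Collections.addAll(copy, elements);
        return Collections.unmodifiableList(copy);
    }

    // assoc: a new unmodifiable list with the element at index replaced.
    public static <T> List<T> assoc(List<T> list, int index, T value) {
        List<T> copy = new ArrayList<>(list);
        copy.set(index, value);
        return Collections.unmodifiableList(copy);
    }

    public static void main(String[] args) {
        List<Integer> ints = Arrays.asList(1, 2, 3, 4);
        System.out.println(push(ints, 5));     // [1, 2, 3, 4, 5]
        System.out.println(assoc(ints, 0, 2)); // [2, 2, 3, 4]
        System.out.println(ints);              // [1, 2, 3, 4] -- unchanged
    }
}
```

Every helper copies, mutates the copy, and wraps it, so callers can share the results freely across threads.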

Maps

I love Guava maps! But once again, I am sick of typing ImmutableMap.of()! And I want arbitrary numbers of entries! And I want those entries to be succinct! So here's the plan: we're taking the letter e.

Map<String, Integer> vals = map(e("a", 1), e("b", 2)); // { "a" : 1, "b" : 2 }
Map<String, Integer> moreVals = assoc(vals, e("c", 3)); // { "a" : 1, "b" : 2, "c" : 3 }
Map<String, Integer> changeVals = assoc(vals, e("a", 3)); // { "a" : 3, "b" : 2 }
Map<String, Integer> changeVals2 = assoc(vals, "b", 3); // { "a" : 1, "b" : 3 }
Map<String, Integer> changeVals3 = dissoc(vals, "b"); // { "a" : 1 }
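The map operations follow the same copy-then-wrap pattern as the lists. Again, a JDK-only sketch whose names mirror the library's API rather than its real source:

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

public final class Maps2 {
    private Maps2() {}

    // assoc: a new unmodifiable map with one entry added or replaced; the input is untouched.
    public static <K, V> Map<K, V> assoc(Map<K, V> map, K key, V value) {
        Map<K, V> copy = new HashMap<>(map);
        copy.put(key, value);
        return Collections.unmodifiableMap(copy);
    }

    // dissoc: a new unmodifiable map with one key removed.
    public static <K, V> Map<K, V> dissoc(Map<K, V> map, K key) {
        Map<K, V> copy = new HashMap<>(map);
        copy.remove(key);
        return Collections.unmodifiableMap(copy);
    }
}
```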

Sets

This is just like everything else.

Set<Integer> ints = set(1, 2, 3); // (1, 2, 3)
Set<Integer> ints2 = assoc(ints, 4, 5); // (1, 2, 3, 4, 5)
Set<Integer> ints3 = dissoc(ints, 2, 3); // (1)
Set<Integer> union = union(ints, ints2); // (1, 2, 3, 4, 5)
Set<Integer> inter = intersection(ints, ints2); // (1, 2, 3)
Set<Integer> xor = xor(ints, ints2); // (4, 5)
Set<Integer> not1 = not(ints2, ints); // (4, 5)
Set<Integer> not2 = not(ints, ints2); // ()
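The set operations reduce to a handful of java.util.Set calls. A JDK-only sketch of union, intersection, difference (not), and symmetric difference (xor); these bodies are illustrative, not the library's source:

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

public final class Sets2 {
    private Sets2() {}

    public static <T> Set<T> union(Set<T> a, Set<T> b) {
        Set<T> result = new HashSet<>(a);
        result.addAll(b);
        return Collections.unmodifiableSet(result);
    }

    public static <T> Set<T> intersection(Set<T> a, Set<T> b) {
        Set<T> result = new HashSet<>(a);
        result.retainAll(b);
        return Collections.unmodifiableSet(result);
    }

    // not: elements of a that are not in b (set difference).
    public static <T> Set<T> not(Set<T> a, Set<T> b) {
        Set<T> result = new HashSet<>(a);
        result.removeAll(b);
        return Collections.unmodifiableSet(result);
    }

    // xor: elements in exactly one of the two sets (symmetric difference).
    public static <T> Set<T> xor(Set<T> a, Set<T> b) {
        return not(union(a, b), intersection(a, b));
    }
}
```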

Find out more on GitHub.

Logging context with Mapped Diagnostic Contexts
https://ideaheap.com/2016/04/logging-context-with-mapped-diagnostic-contexts/ (Thu, 14 Apr 2016)

Since upgrading my log acquisition platform to be metadata-aware (it's effectively syslog messages + log flume + "other stuff"), I've really started to take advantage of the mapped diagnostic context (MDC) available in the major logging frameworks for tracking a specific complex interaction. Below is an example of what I'm talking about:

public void myThingIDo(String targetUser, UserPreferences morePreferences) {
    MDC.put(TARGET_USER, targetUser);
    callSomeFunction(targetUser, morePreferences);
}

// Imagine a lot of other stuff going on. callSomeFunction eventually leads to the functions below.

private void checkTheThing() {
    // Imagine this being somewhere very different from here
    logger.info("trying something else");
}

private void checkAccess(String targetUser, String targetResource) throws AccessNotGrantedException {
    // And even different still
    logger.error("User {} cannot access {}", targetUser, targetResource);
}

Imagine getting the call from this user saying they can’t access something that they totally could before your last production push. Now imagine there are thousands of users on this app. Now imagine finding this error message pretty quickly, but then trying to sift through those logs to find all related messages for this particular user in your system.

With proper usage of the MDC, you can actually go find those logs which created this error in the first place without having to manually determine which messages are relevant and which are not.

Dealing with multiple threads

The one problem with the MDC, especially if you're using parallel streams in Java's streams API, is that it depends on thread-local data. Because of this, a helper class for snapshotting and restoring context is sometimes required. Here's one below:

import org.slf4j.MDC;

import java.util.Map;

public class MdcSnapshot {
    private final Map<String, String> mdc;

    private MdcSnapshot() {
        this.mdc = MDC.getCopyOfContextMap();
    }

    public static MdcSnapshot getCurrentMdc() {
        return new MdcSnapshot();
    }

    public void populateMdc() {
        MDC.clear();
        if (mdc != null) {
            MDC.setContextMap(mdc);
        }
    }
}

This can be used as follows:

private <T> void doThingsReallyFast(List<T> things) {
    MdcSnapshot context = MdcSnapshot.getCurrentMdc();
    things.parallelStream().forEach((thing) -> {
        context.populateMdc();
        doTheThing(thing);
    });
}

It’s really handy!

After I’ve started using the MDC in my code, I’ve stopped having to add all that metadata to the log message itself just so I could find that line. I’m now just worried about writing a log message that makes sense.

Depending on your logging framework, to get the MDC data to show up in the log message itself, you will need to use the %X pattern (doc available for log4j2).
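For example, with log4j2's PatternLayout, %X with no key dumps the whole MDC map, while %X{key} pulls a single entry. A sketch (the targetUser key matches the earlier example; the rest of the pattern is illustrative):

```xml
<PatternLayout pattern="%d{ISO8601} [%t] %-5level %logger{36} %X{targetUser} - %msg%n"/>
```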

JADE Setup for Beginners
https://ideaheap.com/2015/05/jade-setup-for-beginners/ (Wed, 06 May 2015)

Introduction

Prerequisites

This howto assumes that you have a basic understanding of Java and that you are capable of downloading Maven and getting it onto your command line. You may also want to choose an IDE (Eclipse and IntelliJ are both good choices).

For more information about what JADE is, visit their main website. It's a messaging framework and a collection of classes that allow the rapid creation of agent-based applications.

What’s Agent-based programming?

Agent-based programming is a paradigm where modules of state and code work together to create an independent unit that can "observe" the world around it and "output" actions into that world. A great example of a simple agent is an event-based service that listens on a port and reacts to input once it arrives.

The model feels very natural to anyone familiar with object-oriented programming. JADE does a good job of enforcing this paradigm, and if you write a well-designed JADE application, your code will be relatively performant, as JADE is non-blocking and uses only one thread per agent.

Environment Setup

Maven configuration

Maven is a build system that takes care of bringing in dependencies and lets you describe how to build your application declaratively. It has a learning curve, but for the sake of this tutorial you can use the code below with the commands given, and you'll probably only have to worry about adding dependencies as you need them.

Your new project can start with this maven configuration (pom.xml in the example source):



<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">

    <modelVersion>4.0.0</modelVersion>

    <groupId>com.ideaheap.tutorials</groupId>
    <artifactId>jade-tutorial-agent</artifactId>
    <version>1.0-SNAPSHOT</version><!-- placeholder; the version was not preserved in the original -->

    <repositories>
        <repository>
            <id>tilab</id>
            <url>http://jade.tilab.com/maven/</url>
        </repository>
    </repositories>

    <build>
        <plugins>
            <plugin>
                <artifactId>maven-compiler-plugin</artifactId>
                <configuration>
                    <source>1.8</source>
                    <target>1.8</target>
                </configuration>
            </plugin>
        </plugins>
    </build>

    <profiles>
        <profile>
            <id>jade-main</id>
            <build>
                <plugins>
                    <plugin>
                        <groupId>org.codehaus.mojo</groupId>
                        <artifactId>exec-maven-plugin</artifactId>
                        <version>1.3.2</version>
                        <configuration>
                            <mainClass>jade.Boot</mainClass>
                            <arguments>
                                <argument>-conf</argument>
                                <argument>src/main/resources/jade-main-container.properties</argument>
                            </arguments>
                        </configuration>
                    </plugin>
                </plugins>
            </build>
        </profile>
        <profile>
            <id>jade-agent</id>
            <build>
                <plugins>
                    <plugin>
                        <groupId>org.codehaus.mojo</groupId>
                        <artifactId>exec-maven-plugin</artifactId>
                        <version>1.3.2</version>
                        <configuration>
                            <mainClass>jade.Boot</mainClass>
                            <arguments>
                                <argument>-conf</argument>
                                <argument>src/main/resources/jade-agent-container.properties</argument>
                            </arguments>
                        </configuration>
                    </plugin>
                </plugins>
            </build>
        </profile>
    </profiles>

    <dependencies>
        <dependency>
            <groupId>com.tilab.jade</groupId>
            <artifactId>jade</artifactId>
            <version>4.3.3</version>
        </dependency>
        <dependency>
            <groupId>com.tilab.jade</groupId>
            <artifactId>jade-test-suite</artifactId>
            <version>1.12.0</version>
        </dependency>
        <dependency>
            <groupId>org.slf4j</groupId>
            <artifactId>slf4j-api</artifactId>
            <version>1.7.10</version>
        </dependency>
        <dependency>
            <groupId>org.slf4j</groupId>
            <artifactId>slf4j-simple</artifactId>
            <version>1.7.10</version>
        </dependency>
    </dependencies>
</project>

In this POM, we have included a logging framework (I don’t want to bind everything to JADE), and the tools needed to include JADE.

The build profiles specify two different ways to start jade using two configuration files, which will be explained in the next section.

Maven Calls

Once this is set up, you will be able to start your application with the following two commands:

mvn -Pjade-main exec:java
mvn -Pjade-agent exec:java

The first command starts our profile with a configuration that creates a default main container with no custom agents. The second command starts our profile, which starts a headless container running any custom agents specified in that configuration file.

Container Configurations

The two configuration files mentioned in the exec plugin configurations define how to start the jade container.

Jade Main Container Configuration

JADE requires a main container to be running in order to be up, but it also supports the creation of additional containers to run agents in. A container can run many agents, and containers can be on many computers, allowing a distributed architecture. There can be only one main JADE container in a platform, but a backup container can be running that will take over in the event the original main container fails.

The first configuration file is located in src/main/resources/jade-main-container.properties. It contains the following:

gui=true
host=localhost
port=10099
local-port=10099
jade_domain_df_autocleanup=true

This sets up a main container with a GUI available for debugging and controlling your JADE installation, configured to run on port 10099.

The autocleanup line is a setting for JADE's DF (Directory Facilitator) agent, which is responsible for knowing which agents are capable of communicating about which message types. By default it doesn't clean up listings for killed agents, but with this setting it does.

Jade Agent Container Configuration

The agent container is designed to be unobtrusive and easy to kill. It is headless, and connects to the main container. Below is our agent container configuration.

agents=\
    sample-agent-1:com.ideaheap.tutorial.jade.agents.sample.SampleAgent(sample-agent-2);\
    sample-agent-2:com.ideaheap.tutorial.jade.agents.sample.SampleAgent(sample-agent-1)
port=10099
host=localhost
main=false
no-display=true

Because Maven has a little trouble cleaning up processes that aren't stopped, it is fine to create a separate project just for starting jade-main. Running it as a separate executable lets you keep its GUI active while you start and kill the other JVM launched with the jade-agent profile.

Creating a sample agent

This tutorial doesn't just give an example agent; it is also a recommendation on how to architect your agent-based program. Agents naturally create modular code by clearly defining how modules should communicate: through messages between agents. I have opted for the following directory structure:

base.package.name.agents.agentName.AgentNameAgent
                                  .behaviors.Behavior

Each agent follows this packaging setup. We’ll now go through each of these things.

The Agent Class

In the directory structure above, this class would be base.package.name.agents.agentName.AgentNameAgent. I come from a background of very long names, and memory is cheap these days, so this does not bother me. This package scheme creates a natural encapsulation for all the helper classes that show up as you build out an agent.

The AgentNameAgent is responsible for defining an agent configuration, and the code written inside an agent follows the design of “Inversion of Control” or “Dependency Injection” without the use of a framework (like Spring).

In the example project, this was named the “SampleAgent”. Below is the code for this agent:

public class SampleAgent extends Agent {
    private static final Logger logger = LoggerFactory.getLogger(SampleAgent.class);

    @Override
    public void setup() {
        final String otherAgentName = (String) this.getArguments()[0];
        addBehaviour(new IncrementBaseNumber(this, otherAgentName));
    }

    @Override
    public void takeDown() {
    }
}

This is all that should be in an agent class, save registry code (which will be covered in the next tutorial). The Agent is responsible for wiring together the classes that make up its behaviors.

The Behaviour Class

The behaviour class is where all the action is. There are some very important notes about this class, the most important being that everything done in it must be non-blocking (i.e. don't make long-running calls to databases or external services on the main thread here). Each agent has one thread that multiplexes EVERY behaviour.

This sort of requirement basically guarantees that you'll be writing a state machine, because complicated interactions naturally lead to one. The sample behaviour is a three-part behaviour that includes sending, receiving, and replying to a message.

package com.ideaheap.tutorial.jade.agents.sample.behaviours;

import com.ideaheap.tutorial.jade.tools.ContainerKiller;
import jade.core.Agent;
import jade.core.behaviours.Behaviour;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import static com.ideaheap.tutorial.jade.messages.MessageBuilder.inform;
import static com.ideaheap.tutorial.jade.messages.MessageReceiver.listen;

/**
 * Created by nwertzberger on 4/23/15.
 */
public class IncrementBaseNumber extends Behaviour {
    private static final Logger logger = LoggerFactory.getLogger(IncrementBaseNumber.class);
    private static final int MAX_INCREMENT = 10;

    private enum State {
        START_INCREMENTING, CONTINUE_INCREMENTING, STOP_INCREMENTING
    }

    private final Agent agent;
    private final String otherAgentName;
    private State state;

    public IncrementBaseNumber(Agent agent, String otherAgentName) {
        this.agent = agent;
        this.otherAgentName = otherAgentName;
        this.state = State.START_INCREMENTING;
    }

    @Override
    public void action() {
        switch (state) {
            case START_INCREMENTING:
                startIncrementing();
                break;
            case CONTINUE_INCREMENTING:
                continueIncrementing();
                break;
            case STOP_INCREMENTING:
                stopIncrementing();
                break;
            default:
                block();
        }
    }

    private void startIncrementing() {
        agent.send(inform().toLocal(otherAgentName).withContent(1).build());
        state = State.CONTINUE_INCREMENTING;
    }

    private void continueIncrementing() {
        listen(agent, this).forInteger((toIncrement) -> {
            logger.info("Received " + toIncrement);
            toIncrement++;
            agent.send(inform().toLocal(otherAgentName).withContent(toIncrement).build());
            if (toIncrement > MAX_INCREMENT) {
                state = State.STOP_INCREMENTING;
            }
        });
    }

    private void stopIncrementing() {
        listen(agent, this).forInteger((toIgnore) -> {
            logger.info("I'm just going to ignore this: " + toIgnore);
            ContainerKiller.killContainerOf(agent);
        });
    }

    @Override
    public boolean done() {
        return false;
    }
}

Class Breakdown

action()

This class is run by a small state machine defined here:

    @Override
    public void action() {
        switch (state) {
            case START_INCREMENTING:
                startIncrementing();
                break;
            case CONTINUE_INCREMENTING:
                continueIncrementing();
                break;
            case STOP_INCREMENTING:
                stopIncrementing();
                break;
            default:
                block();
        }
    }

This method, which is a required part of any class implementing Behaviour, is called over and over again. You may be wondering what that block() method is doing there after I had emphasized how important it is for things to be non-blocking. In this case, block() is actually a signal to the containing agent that it should not call this action again until after it has a reason to think things have changed. This is most often the receipt of a message. In this way, block() is actually non-blocking! This also means that any code put after a block() is still going to run immediately.
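The contract is easier to see in a toy model. This is not JADE's actual scheduler, just a self-contained sketch of the same idea: block() merely flags the behaviour as waiting and returns immediately, and a message delivery clears the flag:

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Toy model of an agent's single-threaded behaviour loop (not JADE's real implementation).
public class ToyAgentLoop {
    private final Queue<String> mailbox = new ArrayDeque<>();
    private boolean blocked = false;
    int actions = 0;  // how many times action() has run

    void block() { blocked = true; }          // a signal, not a wait

    void action() {
        actions++;
        String msg = mailbox.poll();
        if (msg == null) {
            block();                          // nothing to do until a message arrives
            System.out.println("after block() - this line still runs immediately");
            return;
        }
        System.out.println("handled " + msg);
    }

    void deliver(String msg) { mailbox.add(msg); blocked = false; }  // wakes the behaviour

    void runOnce() { if (!blocked) action(); }

    public static void main(String[] args) {
        ToyAgentLoop agent = new ToyAgentLoop();
        agent.runOnce();          // runs, finds no message, blocks
        agent.runOnce();          // skipped: still blocked
        agent.deliver("ping");    // a message arrival unblocks it
        agent.runOnce();          // handles ping
    }
}
```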

startIncrementing()

Our behaviour begins with a state that creates and sends a message to a designated receiving agent. The message is built to send just the number 1 to the agent whose name was passed into our constructor. The builder for this will also be discussed.

    private void startIncrementing() {
        agent.send(inform().toLocal(otherAgentName).withContent(1).build());
        state = State.CONTINUE_INCREMENTING;
    }
continueIncrementing()

The continueIncrementing state utilizes a wrapper that checks for a pending message, and if it exists, parses the content of that message into an integer and calls the defined callback:

    private void continueIncrementing() {
        listen(agent, this).forInteger((toIncrement) -> {
            logger.info("Received " + toIncrement);
            toIncrement++;
            agent.send(inform().toLocal(otherAgentName).withContent(toIncrement).build());
            if (toIncrement > MAX_INCREMENT) {
                state = State.STOP_INCREMENTING;
            }
        });
    }

The callback is set to send the integer it was given as a message to the agent defined in its constructor.

stopIncrementing()

The stopIncrementing method showcases the options available to you once you are sure your experiment is done. You can kill the agent that was doing the work, you can have your behaviour's done() method return true, or you can do what this example shows: kill the entire container.

    private void stopIncrementing() {
        listen(agent, this).forInteger((toIgnore) -> {
            logger.info("I'm just going to ignore this: " + toIgnore);
            ContainerKiller.killContainerOf(agent);
        });
    }

Conclusion

With this basic setup, you should be able to create a simple group of agents that can communicate with each other inside of a maven managed project. The *listen* and *inform* functions you have seen are part of a small set of classes in this project which encourage a more fluent-style use of JADE. They are just the beginning, and will be built out considerably.

All code is available for free on GitHub.

Challenge: this example only has two agents communicating with each other for incrementing. Try making a ring of three. What happens if it’s an agent “love triangle”, and two agents are set to send messages to the same agent?

log4j2 + rsyslog for the client side of centralized logging
https://ideaheap.com/2015/03/log4j2-rsyslog-logging/ (Tue, 17 Mar 2015)

Introduction

So right now, the absolute coolest setup is the ELK (Elasticsearch, Logstash, Kibana) stack, and as far as I can tell, that community is growing and doing very cool things. However, you may not feel like installing Logstash on all your servers, especially if they are older or constrained for memory. Luckily, rsyslog is pretty common, and, so long as you are running at least version 5, you can use a relatively sane setup shown by the wonderful men and women of Loggly. They know a ton about logging!

This will get logs off the box. I prefer to buffer logs on the server so that I can minimize the pain of losing logs if my receiver goes down. For this reason, I also really think using TCP all the way is worth the overhead. I also like to use a localhost relay instead of firing them directly from applications.

I also write a lot of Java. As a Java guy in my day job, I am wildly impressed at how well rsyslog handles log load.

Logging Formats

A logging format that is both widely supported and structural in nature is the one specified by RFC 5424. Using this format in version 5 rsyslog ends up looking like the template below (version 6+ is described here). There is a whole spec available here as well.

$template Rfc5424Format,"<%PRI%>1 %TIMESTAMP:::date-rfc3339% %HOSTNAME% %APP-NAME% %PROCID% %MSGID% %STRUCTURED-DATA% %msg%"

rsyslog

We need an endpoint that can handle huge messages, is TCP all the way, buffers messages if it can't write out, and can fail over.

My friends at Loggly have started a great base configuration with this actionQueue (via Loggly). It will help buffer any intermittent network issues that may occur, though it will not completely mitigate them.

We want a TCP receiver as well, so we can use this: imtcp.

Now that we have TCP, we want to fail over: failover. I don't, however, want to dump messages to a local file if every receiver is down.

Combining this together with our rfc5424 format gives the following:

# Thanks, Loggly, this is excellent
$WorkDirectory /var/spool/rsyslog
$ActionQueueFileName fwdRule1 
$ActionQueueMaxDiskSpace 1g
$ActionQueueSaveOnShutdown on
$ActionQueueType LinkedList
$ActionResumeRetryCount -1

# Start up the tcp relay
$MaxMessageSize 64k # BEFORE imtcp
$ModLoad imtcp
$InputTCPMaxSession 200
$InputTCPServerRun 514

# RFC5424! Well-structured!  If I had more guts, I'd drop %msg%
$template Rfc5424Format,"<%PRI%>1 %TIMESTAMP:::date-rfc3339% %HOSTNAME% %APP-NAME% %PROCID% %MSGID% %STRUCTURED-DATA% %msg%"

*.* @@primary-syslog.example.com
$ActionExecOnlyWhenPreviousIsSuspended on
& @@secondary-1-syslog.example.com
& @@secondary-2-syslog.example.com
$ActionExecOnlyWhenPreviousIsSuspended off

Log4j2

After digging into log4j configurations, it became apparent that log4j2 made a lot more sense. I have ripped off the following configuration by combining some of this doc with StackOverflow.

Also, reading up on what RFC5424 requires will encourage you to go make a company id with IANA here.

If you aren't already, PLEASE make sure you're coding against the slf4j API in your application. It makes migrating between logging implementations a breeze. Today, log4j2; tomorrow, logback.

<?xml version="1.0" encoding="UTF-8"?>
<!-- Reconstructed example configuration: the original markup was lost in extraction.
     appName, enterpriseNumber, facility, and port are placeholders; adjust them to your setup. -->
<Configuration>
    <Appenders>
        <Syslog name="RFC5424"
                format="RFC5424"
                host="localhost"
                port="514"
                protocol="TCP"
                appName="myapp"
                facility="LOCAL0"
                enterpriseNumber="18060"
                newLine="true"
                mdcId="mdc"/>
    </Appenders>
    <Loggers>
        <Root level="info">
            <AppenderRef ref="RFC5424"/>
        </Root>
    </Loggers>
</Configuration>

With that, you have log4j2-enabled syslog all firing into localhost, which then can forward to any endpoint you like (like LogStash, Apache Flume, or Loggly).

The science of top-down debugging
https://ideaheap.com/2015/01/the-science-of-top-down-debugging/ (Tue, 20 Jan 2015)

I've found that the biggest difference between effective and ineffective debugging is the process used to root out a problem. Inexperienced developers too often hit their heads against a wall without knowing how to take the next step toward solving a problem. Their problem is not following the age-old technique responsible for lifting mankind out of the dark ages: the scientific method!

In any situation, I (and literally centuries of human progress) claim the following steps will eventually lead to an answer, or at least a reason why you can’t figure out the answer. The best part? It’s recursive! Just ask physicists.

Steps for top-down debugging

The trick to debugging a problem is to use the power of deduction: start with research, create a hypothesis, find a way to test that hypothesis, execute the "experiment", and then refine the hypothesis. For debugging, I am looking for the action that disproves my hypothesis, since I start with the assumption that "if this system were working, it would {insert action}".

Find an interface

An interface takes many forms. It may be the User Interface, like a webpage, it may be an api, it may be a single line of code. It is something with a definable surface area on or near the thing you’re trying to debug. Hopefully it’s a function, because you might be able to wrap a unit test around it and ensure that nobody ever checks there again.

Common interfaces include:

  • user interfaces
  • function calls – the best kind
  • api endpoints
  • system calls (try “strace -ff -t -s 1000 -p {process id}” to experience the matrix)
  • poorly structured blocks of code
  • state variables?
  • Writing out to a database
  • Writing out to files?
  • Just wires. lots of wires.

Define expected behavior

For this part, you may need to ask around, use intuition, or check documentation, but you need to understand what “working” looks like. If you don’t know what SHOULD happen, you don’t stand a chance of fixing the problem.

Find a way to verify expected behavior at that interface

This can be anything from a unit test (preferred, and it even gets easier the deeper you go) to manually looking at the output. This is sometimes very tricky and will lead you to learn many new tools. If you don't know a unit testing framework, code katas like this one are a great way to learn by example. Google for "prime factors kata in {language}" to find one in your language. Learn to do this by yourself.

In order to find your interface to test, a debugger is often one of the most precious tools you can use. All major browsers have one (hit f12), and every “real language” has one as well. Learn one. It’s going to help you. They are often built into IDE’s. If you can’t use a debugger (maybe the issue really only happens in production!), you may be relegated to using loggers, or worse, printf. Find something that can get the job done.

Here is a set of tool classes that you should have some familiarity with:

  • A unit testing framework
  • A mocking framework
  • A logging framework
  • A step debugger
  • netcat / telnet / some sort of tcp-based communication tool
  • A profiling tool
  • curl / wget / some http-based network tool (for web)
  • the built-in browser debugger (for web)

If possible, make that work an “investment” that can continue to improve the codebase; make it an automated test.

If there’s a difference, dig into that interface.

At the level you are testing, you will find something that breaks your hypothesis. This is the chunk of code to investigate next. Open it up, and examine it. Generate hypotheses, and repeat.

If everything looked like it worked, think outside the box

Something obviously didn’t work, otherwise you would not be investigating this!

Think about the resources being consumed. Are any of these being constrained?

Disk is finite. So is memory, cpu, and even the number of threads you can use. Are any of these limits being reached?

  • Compare disk usage to mount availability with “df -h”
  • Check disk utilization, file I/O, and long-term cpu usage with “sar”
  • Compare cpu usage with number of cpu’s with “w” and “cat /proc/cpuinfo”. Sometimes you’ll be surprised to find out that a cpu is not running as fast as you would think.
  • Check memory usage with “free -m” (also, please read linuxatemyram.com)
  • Check file handle counts vs available file handles using “lsof | wc -l” and “ulimit -a”
  • Check thread counts with “ps uxH | wc -l” and “ulimit -a” (again)

If any one of these things is constrained, you are in real trouble and, if nothing else, you should probably fix that first.

Is this a timing-based issue? What happens when I add a “gate” around the code in question?

Timing-related issues are notoriously hard to solve because they stem from a wrong assumption about the order in which things happen. A common symptom of a timing-related issue is that adding log messages makes the problem go away. A great way to debug a threading issue is to see if it still happens when you remove the concurrency.

A technique I employ when I run into these issues in java is the addition of a synchronized block:

synchronized void questionablyThreadsafeFunction() {
   // begin code that is blatantly not threadsafe
   this.thingsThatArentThreadSafe++; // totally atomic ;P
}

This sort of investigation is usually done after I have a test that reliably re-creates the issue, like by calling that function on the same instance of a class thousands of times on hundreds of threads.
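As a self-contained illustration of why the synchronized "gate" matters (plain Java, not from any particular codebase): hammering an unsynchronized int counter from many threads usually loses increments, while the synchronized one never does:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class RaceDemo {
    int unsafeCount = 0;
    int safeCount = 0;

    void unsafeIncrement() { unsafeCount++; }            // read-modify-write, not atomic
    synchronized void safeIncrement() { safeCount++; }   // the "gate"

    public static void main(String[] args) throws InterruptedException {
        RaceDemo demo = new RaceDemo();
        ExecutorService pool = Executors.newFixedThreadPool(100);
        for (int i = 0; i < 100_000; i++) {
            pool.execute(demo::unsafeIncrement);
            pool.execute(demo::safeIncrement);
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
        // safeCount is always 100000; unsafeCount is usually less under contention
        System.out.println("unsafe=" + demo.unsafeCount + " safe=" + demo.safeCount);
    }
}
```

If the "unsafe" number comes back correct on your machine, raise the thread or iteration counts; races only show up under enough contention.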

Additional Tweaks

Use the path of least surprise

Just because you “know” your code is flawless does not mean the first place you should check is for bugs in your language’s implementation of “string”. That doesn’t mean the bug isn’t there, but you should focus on the areas that are most likely to be the source of your error. Protip: It’s usually your fault.

Reducing Cycle Time

I am also looking for the test that has the highest “effectiveness / cycle time” quotient. I try to find ways to test on my desktop, then in dev, then in test, and then (if all else fails) in production. A lower cycle time is the key to productive debugging. In my experience, a totally out there bug will take you to a search depth of about six to find a root cause. If your cycle time is greater than an hour, there is likely enough time for you to spend an hour learning a new skill to reduce that cycle time that will pay for itself on this bug alone.

The complexity of this technique

From a computational complexity standpoint, this technique has a worst-case cost of about K · log_K(N) checks, with K being the maximum number of "chunks" you divide your problems and subproblems into at each layer or API, and N representing the total number of things you might have to check (i.e. lines of code). Even terrible code can at least be split into parts with very large log statements covering the current state of things. It's logarithmic.

Wrapping it up

Following these steps will inevitably lead to the solution. In your career, you will often be surprised at which interface fails you! Sometimes the filesystem will be full, causing a logger to hang. Sometimes /etc/host.conf will no longer be resolving a domain name. Very occasionally, the compiler itself will have failed you. Anything and everything can and will break once in a while. Regardless of the problem, if you are following these steps, you will find a root cause, or at least the reason why you can’t find a root cause (e.g. you don’t have proper permissions to test a network interface).

]]>
https://ideaheap.com/2015/01/the-science-of-top-down-debugging/feed/ 2
The Bellman Equation https://ideaheap.com/2014/07/bellman-equation/ https://ideaheap.com/2014/07/bellman-equation/#respond Thu, 10 Jul 2014 06:03:04 +0000 http://www.ideaheap.com/?p=682 Back in college, I learned about a tool called the “Bellman Equation”. It’s very nice because it turns into a local calculation for each node, and you only need to know about your neighbors’ previous values. It’s parallelizable.
(Do every node in parallel, sync, repeat, until convergence).

The only gotcha with a Bellman equation is that you have to know what state you are in, so it’s only appropriate for problems that are fully observable.

Anyway, it’s easy, and it’s a great way to determine expected utility. If you want to know the utility out to a specific horizon, just run it that many times.

Code example

So here we want to see the expected utility given:

  • An action/node policy (this is just a map of “do action x when in node y”)
  • A graph that describes states and contains transition probabilities (represented as “p” below) based on the action chosen in a given state
  • Two optional parameters of the previous calculated utility and the decay for future utility
def bellman_values(policy, graph, util={}, decay=0.9):
  # decay should be below 1.0; a default of 0.0 would zero out
  # all future utility and return only immediate rewards
  new_util = {}
  for node in graph.nodes:
    # Immediate reward for taking the policy's action in this node...
    new_util[node] = node.reward(policy[node])
    # ...plus the decayed, probability-weighted utility of each neighbor
    for neighbor, p in graph.transitions(node, policy[node]):
      new_util[node] += p * decay * util.get(neighbor, 0.0)
  return new_util

If you use this update with a decay below 1, it will converge eventually, so you can check for convergence based on the biggest change between iterations, or you can run it T times and get an expected utility for horizon T.
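Here is a self-contained version of that loop on a made-up two-state chain (the dict-based graph encoding and the numbers are my own stand-ins for the post’s graph and policy objects), iterating until the biggest change is negligible:

```python
def bellman_values(policy, rewards, transitions, util, decay=0.9):
    new_util = {}
    for node in rewards:
        action = policy[node]
        # Immediate reward, plus decayed expected utility of neighbors
        new_util[node] = rewards[node][action]
        for neighbor, p in transitions[(node, action)]:
            new_util[node] += p * decay * util.get(neighbor, 0.0)
    return new_util

# Two states: acting in "a" pays 1, in "b" pays 0; "go" bounces a <-> b.
rewards = {"a": {"go": 1.0}, "b": {"go": 0.0}}
transitions = {("a", "go"): [("b", 1.0)], ("b", "go"): [("a", 1.0)]}
policy = {"a": "go", "b": "go"}

util = {}
for _ in range(1000):
    new_util = bellman_values(policy, rewards, transitions, util)
    delta = max(abs(new_util[n] - util.get(n, 0.0)) for n in new_util)
    util = new_util
    if delta < 1e-9:  # converged: biggest change is negligible
        break
```

At the fixed point, util["a"] = 1 + 0.9 · util["b"] and util["b"] = 0.9 · util["a"], i.e. util["a"] = 1 / (1 − 0.81) ≈ 5.26.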

]]>
https://ideaheap.com/2014/07/bellman-equation/feed/ 0
Tour de Heuristics: Joint Equilibrium Search (JESP) https://ideaheap.com/2014/05/tour-de-heuristics-joint-equilibrium-search-jesp/ https://ideaheap.com/2014/05/tour-de-heuristics-joint-equilibrium-search-jesp/#respond Thu, 22 May 2014 20:05:04 +0000 http://www.ideaheap.com/?p=645 The final stop on this heuristics tour, and the last stop in our overview of Cooperative Decision Making, is Joint Equilibrium Search. This technique starts with some pre-set horizon-T policies for each agent, then cycles through the agents, letting each tweak its policy to a best response while all other policies are held fixed. It continues this cycle until all policies have stabilized.

The Algorithm

def joint_equilibrium_search(policy):
    curr_policy = policy
    while True:
        # Copy so the convergence check compares against the old policy
        prev_policy = dict(curr_policy)
        for agent in get_all_agents():
            calculate_expected_values(curr_policy)
            curr_policy[agent] = get_best_response(curr_policy, agent)
        if prev_policy == curr_policy:
            return curr_policy

Well, That’s Nifty

It seems like a slam dunk, but there are a few gotchas. From an optimality standpoint, it is only locally optimal: it can get stuck in a joint policy where no single agent can improve alone. To mitigate this, you need some way of selecting reasonably good starting policies before the search begins.
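To see both the alternation and the local-optimality gotcha concretely, here is a toy sketch (the payoff numbers are invented): two agents share the payoff of a one-shot game, so a “policy” is just an action, and each agent in turn best-responds to the other.

```python
# Shared payoff for each joint action; both agents receive this value.
payoff = {
    ("a", "a"): 2.0, ("a", "b"): 0.0,
    ("b", "a"): 0.0, ("b", "b"): 1.0,
}
actions = ["a", "b"]

def best_response(policy, agent):
    """Best action for `agent` with the other agent's action held fixed."""
    other = policy[1 - agent]

    def value(act):
        joint = (act, other) if agent == 0 else (other, act)
        return payoff[joint]

    return max(actions, key=value)

policy = ["b", "b"]  # start at the inferior equilibrium
while True:
    prev = list(policy)
    for agent in (0, 1):
        policy[agent] = best_response(policy, agent)
    if policy == prev:  # no agent wants to deviate: an equilibrium
        break
```

Starting from ("b", "b") the search stabilizes immediately, even though ("a", "a") pays more: neither agent can improve alone, which is exactly the local optimum the text warns about.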

]]>
https://ideaheap.com/2014/05/tour-de-heuristics-joint-equilibrium-search-jesp/feed/ 0
Tour de Heuristics: Memory-Bounded Dynamic Programming https://ideaheap.com/2014/05/tour-de-heuristics-memory-bounded-dynamic-programming/ https://ideaheap.com/2014/05/tour-de-heuristics-memory-bounded-dynamic-programming/#respond Thu, 22 May 2014 19:43:27 +0000 http://www.ideaheap.com/?p=643 Memory-bounded dynamic programming is another technique offered in Cooperative Decision Making, and the first sub-optimal heuristic brought up. It uses the same exhaustive-backup technique seen before, but at each stage only a fixed number of trees is kept. Because of this, the algorithm’s memory use is bounded by the selected limit. It runs efficiently, but not optimally.

It uses exhaustive backup for the actual tree generation but, like A*, relies on a heuristic to estimate potential tree value. It then trims the trees until only the top N remain.

The Algorithm

def memory_bound_dynamic_programming(max_depth, heuristic, max_trees=100):
    depth = 0
    policy = {0: []}
    tweaked_policy = {}
    while depth < max_depth:
        policy[depth + 1] = exhaustive_backup(policy[depth])
        compute_expected_values(policy[depth + 1])
        tweaked_policy[depth + 1] = []
        for k in range(max_trees):
            # Sample a belief the heuristic expects at the remaining horizon
            belief = generate_belief(heuristic, max_depth - depth - 1)
            tweaked_policy[depth + 1].append(
                get_best_tree(policy[depth + 1], belief))
        depth += 1
        policy[depth] = tweaked_policy[depth]
    return policy[depth]

How bad is a heuristic, really?

If your sensors and location data get you "almost there" (i.e. MDP vs Dec-POMDP), an MDP-like heuristic is probably going to be pretty close to what you could expect. If your sensor information really is quite terrible and your transitions are so noisy that you need to know precisely where you are to have any confidence, then you probably won't get a particularly optimal answer. If, however, you can generally determine where you are in whatever graph you are traversing, an MDP-based estimator is going to be fine.
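The memory bound itself reduces to one small operation: rank the candidate trees under the heuristic’s value estimate and keep only the top N. A sketch, with invented trees and an invented heuristic:

```python
import heapq

def trim_trees(trees, heuristic_value, max_trees):
    """Keep only the max_trees highest-valued trees, so memory stays
    bounded no matter how many trees the exhaustive backup produced."""
    return heapq.nlargest(max_trees, trees, key=heuristic_value)

trees = list(range(1000))             # stand-ins for policy trees
heuristic = lambda t: -abs(t - 500)   # toy estimate: prefer trees near 500
survivors = trim_trees(trees, heuristic, max_trees=3)
```

heapq.nlargest does the ranking in one pass without sorting all thousand candidates, which matters when the backup step is the thing producing exponentially many trees.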

]]>
https://ideaheap.com/2014/05/tour-de-heuristics-memory-bounded-dynamic-programming/feed/ 0
Tour de Heuristics: Policy Iteration https://ideaheap.com/2014/05/tour-de-heuristics-policy-iteration/ https://ideaheap.com/2014/05/tour-de-heuristics-policy-iteration/#respond Thu, 22 May 2014 19:17:54 +0000 http://www.ideaheap.com/?p=641 Policy Iteration is the most practical option for dealing with infinite-horizon DEC-POMDPs. In this space it is sub-optimal, but it can be epsilon-optimal: given a starting point and a decay factor, we can plan a controller out far enough that the expected discounted reward of any further steps is negligible.

Dec-POMDPs have to track the entire decision tree they have used so far in order to optimally account for the belief states of other agents (even if you are back at belief X, your partners may hold a variety of belief states).

The Algorithm

This is a pseudo-representation of the algorithm presented in Algorithm 7.5 of Cooperative Decision Making.

# In algorithms, epsilon is used to represent "the smallest
# difference worth caring about".
def dec_policy_iteration(initial_policy, decay_factor, epsilon):
    # In this world, depth more means
    # "what horizon is the controller designed for"
    depth = 0
    # Instead of being a decision tree, this
    # is a state machine
    controller = None

    policy = {0: initial_policy}
    tweaked_policy = {}

    while decay_factor ** (depth + 1) * get_max_reward() \
            / (1 - decay_factor) > epsilon:
        # Continue updating the controller until we decay
        # enough that even if the best possible move is
        # available, we don't care.

        policy[depth + 1] = create_all_possible_children(policy[depth])

        expected_value = calculate_expected_value(
            policy[depth + 1],
            depth)
        tweaked_policy[depth + 1] = policy[depth + 1]
        while True:
            for agent in get_all_agents():
                prune_policy(tweaked_policy, agent)
                update_controller(tweaked_policy, agent)
                calculate_expected_values(tweaked_policy)
            if tweaked_policy[depth + 1] == policy[depth + 1]:
                # Pruning has stabilized for every agent
                break
            policy[depth + 1] = tweaked_policy[depth + 1]
        depth += 1
    return policy[depth]

The exit test for the outer loop is deterministic and, instead of being based on the controller itself, is based on the absolute highest-value single-step state-action in the given graph. This bounds the biggest change that adding one more step to a horizon-T controller could possibly make.
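That exit test can also be evaluated up front: given the decay factor, the single best state-action reward, and epsilon, you can compute the horizon at which the loop will stop before ever running it. A sketch with arbitrary numbers:

```python
def epsilon_horizon(decay, max_reward, epsilon):
    """First depth where even the best possible move, discounted,
    contributes less than epsilon to the total expected reward."""
    depth = 0
    while decay ** (depth + 1) * max_reward / (1 - decay) > epsilon:
        depth += 1
    return depth

epsilon_horizon(0.9, 10.0, 0.01)
```

With decay 0.9 and a best single-step reward of 10, the controller stops growing after 87 steps for epsilon = 0.01.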

How are controllers made?

A controller is made by taking the initial controller and adding every possible node to it for one step. This is, as the name suggests, a very exhaustive activity. These nodes tie into the nodes of the original controller at each step. A backup implies we are making the “top” of the tree, or starting from an initial state.

]]>
https://ideaheap.com/2014/05/tour-de-heuristics-policy-iteration/feed/ 0
Tour de Heuristics: MAA* https://ideaheap.com/2014/05/tour-de-heuristics-maa/ https://ideaheap.com/2014/05/tour-de-heuristics-maa/#respond Thu, 22 May 2014 04:12:25 +0000 http://www.ideaheap.com/?p=637 Multiagent A* is a heuristic that takes the commonly used A* algorithm and applies it to Dec-POMDPs. Let’s investigate how it works.

The Algorithm

def estimated_state_value(belief, action):
    """
    The cornerstone of the A* algorithm
    is to have an optimistic estimator. Because
    an MDP assumes more information, it will
    always have at least as much value as a
    POMDP solution or a Dec-POMDP solution.

    For this, I have chosen an MDP, as it is
    solvable in polynomial time.
    """
    estimated_value = 0
    for state, probability in belief:
        estimated_value += probability \
            * calculate_mdp(state, action)
    return estimated_value

def select_top_policy(policies):
    """
    Select the policy currently valued highest
    This will be a mix of the actual policy value
    based on the current policy tree, and the
    estimated value of the policy based on an
    optimistic estimator.
    """
    # Policies is most likely implemented as a tree
    # ...
    return top_policy

def top_policy(
    candidate,
    best_policy,
    best_value):
    candidate_value = calculate_value(candidate)
    if candidate_value > best_value:
        return (candidate, candidate_value) 
    else:
        return (best_policy, best_value)

    
def multi_agent_astar(initial_belief, max_layers=10):
    best_policy_value = float("-inf")
    best_policy = None

    open_policies = actions_at_belief(initial_belief)
    while len(open_policies) > 0:
        candidate = select_top_policy(open_policies)
        expanded_policies = expand_child_nodes(candidate)
        (complete_policies, remaining_policies) = split_on_depth_of(
            expanded_policies, max_layers)
        for policy in complete_policies:
            (best_policy, best_policy_value) = top_policy(
                policy,
                best_policy,
                best_policy_value)

        open_policies.extend(remaining_policies)
        clean_out_policies_worse_than(
            open_policies,
            best_policy_value)
    return best_policy

Analysis

The algorithm described above follows the traditional approach seen in any A*-inspired algorithm: it is optimal given an optimistic estimator, and it requires a defined start state. It is a top-down algorithm.

The part of this algorithm that makes it multi-agent is really just the implementation of the selector and the node expansion; the rest leverages a well-known and well-studied algorithmic tool.
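For reference, here is the plain single-agent A* that MAA* builds on, run on a toy 5×5 grid with a Manhattan-distance heuristic (the grid and costs are invented). MAA* swaps paths for joint policy trees and the straight-line heuristic for the optimistic MDP estimate, but the frontier loop is the same.

```python
import heapq

def astar(start, goal, neighbors, heuristic):
    """Best-first search: always expand the frontier node with the
    lowest cost-so-far plus optimistic estimate of the cost to go."""
    frontier = [(heuristic(start), 0, start, [start])]
    seen = set()
    while frontier:
        _, cost, node, path = heapq.heappop(frontier)
        if node == goal:
            return path  # with an admissible heuristic, first pop is optimal
        if node in seen:
            continue
        seen.add(node)
        for nxt, step_cost in neighbors(node):
            heapq.heappush(frontier, (cost + step_cost + heuristic(nxt),
                                      cost + step_cost, nxt, path + [nxt]))
    return None

def grid_neighbors(p):  # 4-connected 5x5 grid, unit step costs
    x, y = p
    return [((x + dx, y + dy), 1)
            for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1))
            if 0 <= x + dx < 5 and 0 <= y + dy < 5]

path = astar((0, 0), (4, 4), grid_neighbors,
             lambda p: abs(4 - p[0]) + abs(4 - p[1]))
```

The returned path makes 8 unit moves, which is optimal on this grid.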

]]>
https://ideaheap.com/2014/05/tour-de-heuristics-maa/feed/ 0