Sunday, November 2, 2008

Quotes from Randy Pausch

I've watched Randy Pausch's last lecture a few times, and I always find I can learn something from it. It is a one-hour session packed with amazing life lessons. If you don't know who Randy Pausch is (what planet are you from??), take the time to read about his story and watch his presentation.

I chose a few quotes that, even though they are not programming related, can still be applied to working on a team or within a company.

"When you're screwing up, and nobody is saying anything to you anymore, that means they gave up."


Nobody likes to hear criticism, but it is the best way to improve ourselves. The self-image we have is often very different from the way other people see us, which is why it is so important to hear what others have to say about us: that is how we keep getting better.

"When you see yourself doing something badly and nobody's bothering to tell you anymore, that's a very bad place to be. Your critics are the ones telling you that they still love you and care."


Embrace criticism with an open mind. When you are failing at something, your enemies will not tell you anything. If somebody is taking the time to go through the embarrassment of talking to you about a flaw in your ways, it is because that person really cares about you and wants to see you get better.

"Experience is what you get when you didn't get what you wanted"


Love this one. When someone on my team is frustrated at a stand-up meeting because they didn't make any progress on a problem, I tell them that they were able to eliminate all the dead ends they found, and that is already progress.

"Wait long enough and people will surprise and impress you. When you're pissed off at somebody and you're angry at them, you just haven't given them enough time."


I've seen this pattern many times at many places. I've learned that good people want to do things right. It is especially frustrating when you know that someone has potential but does not deliver. But if you give that person enough time, you will be pleasantly surprised.

Thursday, October 30, 2008

Programming Styles

Over the years I've met many developers and got to know many different programming styles. Here are some of the odd ones. Do you know someone like that?

Shotgun Programming

It is a programming style where the developer tries random shots at the code. "Well, this method call is failing.... I'll try changing this parameter from false to true!" Then of course it doesn't work and the developer goes: "Well, maybe I could just comment out the whole method call!" and so on. It can go on forever until it works by pure chance or the developer is rescued by a peer who points out the correct solution.

A regular developer can go crazy in a few hours if he finds himself pairing with a shotgun programmer. It can drive you NUTS. Two shotgun programmers should never do pair programming together, because their destructive results are magnified when they work together.

Programming by accident

It is a mild form of Shotgun Programming, and it is surprisingly common. I think this category encompasses the majority of developers worldwide by a wide margin. It happens when the developer doesn't really understand what he is doing, but things work. The developer codes some more, and the program still works. Since this is happening by accident, at some point something will break and the developer will have no idea how to fix it. At this point, he usually has two courses of action: stop and understand what he did in order to find the cause of the error, or, more likely, engage in Shotgun Programming to try to fix the problem.

Test Driven Development came to the rescue of the millions of programmers by accident. Now you have an excuse to program by accident: as long as the tests pass, you are good. Don't get me wrong, Test Driven Development is a Good Thing, and it limits the damage that Programming by Accident can cause.

Cargo-cult programming

The term comes from the Cargo Cults that appeared on many Pacific islands during World War II. During the war, the US used the islands as bases and built airstrips for its cargo planes. The natives were amazed by the planes that brought all those goods and food. When the war was over, the planes disappeared, and the natives built their own airstrips, with bamboo control towers, in the hope that if they did exactly what the white men had done, the planes would return and bring back the beloved cargo.

Cargo cult programming is the practice of applying a popular solution just because everybody else is doing it and it seems to work for them, without understanding why it is done that way. Lots of people did this during the first years of J2EE, for example, by overusing EJBs and Entity Beans.

Least effort programming

This style is very common, especially among junior developers. One day you are assigned a task to fix a NullPointerException, so you just go to the line of code where the exception is thrown and surround it with an if (reference != null).

It may very well work but you didn't solve the cause of the bug, you just hid it until it comes back to haunt you again. What you should have done is to go back and fix the problem that caused the reference to be null in the first place.
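To make the difference concrete, here is a small hypothetical Java sketch; the price table, its upper-case key convention, and all class and method names are invented for illustration. The least-effort version makes the NullPointerException go away, while the fixed version removes the reason the reference was null in the first place.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

class LeastEffortExample {
    // Hypothetical price table. Keys are stored in upper case, which is the
    // detail the original lookup code forgot about.
    static Map<String, Integer> prices() {
        Map<String, Integer> prices = new HashMap<String, Integer>();
        prices.put("APPLE", 3);
        return prices;
    }

    // Original bug: prices.get("apple") returns null, and unboxing it to int
    // throws a NullPointerException.
    static int priceOriginal(Map<String, Integer> prices, String item) {
        return prices.get(item);
    }

    // Least-effort "fix": wrap the failing line in a null check. The exception
    // is gone, but "apple" now silently costs 0 and the real bug is still there.
    static int priceLeastEffort(Map<String, Integer> prices, String item) {
        Integer price = prices.get(item);
        if (price != null) {
            return price;
        }
        return 0;
    }

    // Root-cause fix: look the item up using the same key convention the table
    // uses, so the reference is no longer null for a known item. An unknown
    // item still fails loudly instead of quietly returning a wrong price.
    static int priceFixed(Map<String, Integer> prices, String item) {
        return prices.get(item.toUpperCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        Map<String, Integer> prices = prices();
        System.out.println(priceLeastEffort(prices, "apple")); // prints 0 (wrong, but no exception)
        System.out.println(priceFixed(prices, "apple"));       // prints 3
    }
}
```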

Design pattern driven programming

As the name says, it is the programming style where you use design patterns for EVERYTHING. Your code is full of Facade this, Observer that, Strategy whatever, Adapter, blah blah blah. It reaches a point where you have to dig real deep to find the code that does the actual job in the middle of the Design Pattern Tangle.

Surgical programmer

When working on a bug, the Surgical Programmer investigates the cause. And then the cause's cause. Then, he investigates the consequences of changing the code that is causing the other code to cause the other code to fail. Then he does a text search to find all usages of that class in the code, just in case. And for each match, he does another text search to find what uses the usage's usage. Then he writes unit tests for 30 different possible scenarios, even those that don't have anything to do with the bug he is fixing. In the end, full of confidence and with surgical precision, he fixes a typo.

In the meantime, the regular programmer fixed five other bugs.

Butcher programmer

This is the programmer who has an extreme itch to refactor everything he touches. It is the kind of programmer who, the night before shipping, while fixing a typo in an error message, changes 10 classes, refactors another 20, and also changes the build script and 5 deployment descriptors.

Tuesday, September 23, 2008

Weak Object Pools with WeakHashMap

In a previous article, I wrote about what the WeakHashMap is not good for. Now, to do justice to this very useful but misunderstood class, I am going to devote an article to one of its good applications. I will explain how to use a WeakHashMap to implement an object pool that promotes the reuse of instances of immutable objects while avoiding memory leaks.

Reviewing the WeakHashMap

The WeakHashMap is a Java collection that uses the WeakReference class to hold its keys. As we've already seen, weak references to an object are cleared by the garbage collector once the object is no longer reachable through strong or soft references. The result is that entries only remain in the WeakHashMap as long as the map keys are still strongly (or softly) reachable from somewhere else in your JVM.

Here is an example:
import java.util.Map;
import java.util.WeakHashMap;

public class WeakHashMapSample1 {
    public static void main(String[] args) {
        Map<String, String> weakHashMap = new WeakHashMap<String, String>();
        // Create a key for the map, but keep the strong reference.
        // new String(...) is deliberate: a string literal is interned and
        // would stay strongly reachable for the life of the class.
        String keyStrongReference = new String("key");
        weakHashMap.put(keyStrongReference, "value");
        // Run the GC and check if the key is still there.
        // Note: System.gc() is only a request; the JVM may ignore it.
        System.gc();
        System.out.println(weakHashMap.get("key"));
        // Now, null-out the strong reference and try again.
        keyStrongReference = null;
        System.gc();
        System.out.println(weakHashMap.get("key"));
    }
}

On a typical JVM, the code above prints:
value
null

What happened here? The first time we called System.gc(), there was still a strong reference to the key in the keyStrongReference variable. Because of this, the map key was not cleared by the garbage collector. In the following line, we got rid of the strong reference to the key and tried again. This time, the call to weakHashMap.get("key") returned null, because the garbage collector had cleared the weakly reachable key and the entry was dropped from the map.

Implementing a Weak Object Pool with a WeakHashMap

If you make heavy use of small immutable classes, like String and the primitive wrapper classes such as java.lang.Integer, you can take advantage of the WeakHashMap behavior to share instances and reduce memory usage, without having to worry about objects lingering in memory after they are no longer needed. The real benefit depends on how many instances you create and how the values are distributed, but you can potentially reduce the number of instances kept in memory by several orders of magnitude.

Here is an example of a weak object pool:

import java.lang.ref.WeakReference;
import java.util.Map;
import java.util.WeakHashMap;

/**
 * Oversimplified implementation of an object pool.
 * Not thread-safe: WeakHashMap is unsynchronized.
 */
public class WeakObjectPool {
    // Map where the key is an object, and the value is a weak reference
    // to the same object. We use the key to do the lookup, and the value
    // to actually return the object when it is found. The value must not be
    // a strong reference to the key, or the entry would never be cleared.
    private Map<Object, WeakReference<Object>> map =
            new WeakHashMap<Object, WeakReference<Object>>();

    public Object replace(Object object) {
        WeakReference<Object> reference = map.get(object);
        if (reference != null) {
            Object result = reference.get();
            // Another null check, since the GC may have kicked in between the
            // two lines above.
            if (result != null) {
                return result;
            }
        }
        // If we got here, the map has no live instance equal to this one,
        // so this object becomes the pooled instance.
        map.put(object, new WeakReference<Object>(object));
        return object;
    }
}

Now, this class can be used like this:
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

class ObjectPoolClient {
    private static WeakObjectPool objectPool = new WeakObjectPool();

    public static void main(String[] args) throws IOException {
        BufferedReader reader = new BufferedReader(new FileReader("input.csv"));
        List<String[]> parsedLines = new ArrayList<String[]>();
        try {
            String line;
            while ((line = reader.readLine()) != null) {
                String[] elements = line.split(",");
                for (int i = 0; i < elements.length; i++) {
                    // Replace the string read from the file with the pool instance
                    elements[i] = (String) objectPool.replace(elements[i]);
                }
                parsedLines.add(elements);
            }
        } finally {
            reader.close();
        }

        // Cool, we saved a lot of memory by reusing the repeated strings!
        doSomethingInteresting(parsedLines);
        // Now, we get rid of the references and soon the garbage collector
        // will reclaim the memory
        parsedLines = null;
        doMoreInterestingStuff();
    }

    // Placeholders for the application's own processing.
    private static void doSomethingInteresting(List<String[]> lines) {
    }

    private static void doMoreInterestingStuff() {
    }
}

Assuming the input file contains lots of repeated values, we've been able to save a lot of heap space by not having the same string repeated over and over again in memory. Also, we get the added benefit of releasing the memory when the strings are no longer needed.

Weak Object Pool in conjunction with the Flyweight pattern

The Flyweight pattern is a perfect match for the Weak Object Pool. The assumption behind this pattern is that the flyweight instances are shared to reduce memory consumption. The flyweight factory could use a Weak Object Pool to store the flyweights. This way, once a given flyweight is no longer in use, it will be released from the factory's storage and its memory reclaimed.
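A minimal sketch of a flyweight factory backed by a Weak Object Pool might look like the following. RgbColor and its of() factory method are hypothetical names chosen for illustration; the essential parts are an immutable class with value-based equals()/hashCode() and a WeakHashMap whose values are weak references to the pooled instances.

```java
import java.lang.ref.WeakReference;
import java.util.Map;
import java.util.WeakHashMap;

// Hypothetical immutable flyweight: all fields are final and there are no setters.
final class RgbColor {
    // Weak Object Pool: keys are held weakly by the WeakHashMap, and values are
    // weak references to the same instances, so the pool never keeps a flyweight alive.
    private static final Map<RgbColor, WeakReference<RgbColor>> POOL =
            new WeakHashMap<RgbColor, WeakReference<RgbColor>>();

    private final int red;
    private final int green;
    private final int blue;

    private RgbColor(int red, int green, int blue) {
        this.red = red;
        this.green = green;
        this.blue = blue;
    }

    // Flyweight factory: returns the pooled instance if one is still alive,
    // otherwise registers the new instance. Synchronized because WeakHashMap
    // is not thread-safe.
    static synchronized RgbColor of(int red, int green, int blue) {
        RgbColor candidate = new RgbColor(red, green, blue);
        WeakReference<RgbColor> ref = POOL.get(candidate);
        RgbColor pooled = (ref == null) ? null : ref.get();
        if (pooled != null) {
            return pooled;
        }
        POOL.put(candidate, new WeakReference<RgbColor>(candidate));
        return candidate;
    }

    // Value-based equality is what lets the pool find an equal instance.
    @Override
    public boolean equals(Object other) {
        if (this == other) {
            return true;
        }
        if (!(other instanceof RgbColor)) {
            return false;
        }
        RgbColor that = (RgbColor) other;
        return red == that.red && green == that.green && blue == that.blue;
    }

    @Override
    public int hashCode() {
        return 31 * (31 * red + green) + blue;
    }
}
```

Two calls to RgbColor.of(10, 20, 30) return the same instance for as long as some client still holds it; once nobody does, the garbage collector can clear the pool entry.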

Words of caution

Don't start using Weak Object Pools everywhere you instantiate immutable classes. The pool only pays off when there is a lot of repetition in the values. If the values are more randomly distributed, you will be better off not using this pattern, because it incurs a small memory overhead for the internal map structures. If you apply the Weak Object Pool pattern in the wrong situation, you may end up with worse performance!

Also, it is very important to use this pattern only for immutable classes. If you use it for a mutable class, you can find yourself with bugs that are very difficult to reproduce and fix. Those bugs happen when an instance stored in the Weak Object Pool is shared by two completely unrelated clients and one client modifies it. The other client will then see the modified value, and it will be very difficult to trace the original modification of the object.


Sunday, September 21, 2008

Martin Fowler got it wrong?

Martin Fowler recently posted to his blog a comment about software requirements. He starts by quoting the opening paragraph of the "Mastering the Requirements Process" book:

Requirements are the things that you should discover before starting to build your product. Discovering the requirements during construction, or worse, when your client starts using your product, is so expensive and so inefficient, that we will assume that no right-thinking person would do it, and will not mention it again.


Then he goes on to criticize the statement above by contrasting it with agile processes, where requirements are discovered continuously. Here is what he says:

Agile methods violate this underlying assumption by intending to discover the 'requirements' during construction and after delivery.


I don't quite agree with Martin Fowler here. Agile processes are inherently iterative, but even in an iterative process the requirements for a given iteration are supposed to be known before that iteration starts. At the beginning of a Scrum iteration, for instance, the team is supposed to know exactly what it has to build, as defined in the Sprint Backlog. An iteration doesn't start until the team has the scope well defined. The opposite would be to sit down in front of a computer one morning to begin writing the revenue recognition module without any idea of how the revenue recognition rules are supposed to work, and then start asking around to find out how revenue recognition works.

The even worse scenario mentioned in the quote above is discovering the requirements when the system is already in production. It is equivalent to writing the revenue recognition module based on your own assumptions about the business rules and then finding out in production that the rules you implemented are completely wrong.

What Agile methods do have is the ability to adapt to change. Requirements change because the market or the business changes, and they also change because the people involved gain a better understanding of what the requirements are. The assumption is that you will get the requirements wrong, and this is very often the case, but you are still hoping to get them right the first time. If you got them wrong, though, Agile processes give you the means to adapt to this change.

Saturday, September 20, 2008

WeakHashMap is not a cache! Understanding WeakReference and SoftReference

If you have ever needed to implement simple caching functionality in your Java programs, chances are that you at least considered using the WeakHashMap class as the cache.

It turns out that the WeakHashMap makes for a terrible cache, for two reasons. The first reason is that it uses weak references as the underlying memory management mechanism. The second reason is that the weak references are applied to the keys, while a cache would need them on the values.

Reference classes and reachability

To understand what the WeakHashMap is good for, we need to understand the WeakReference and SoftReference classes and how they differ. Both extend the Reference class, which resides, along with its subclasses, in the java.lang.ref package. The Reference classes represent object references that are weaker than regular Java references, which are called strong references. Objects that can be reached through a chain of strong references never get garbage collected. The weaker the references to an object, the more likely the object is to be reclaimed by the garbage collector.

The strongest type of reference is the strong reference, as when you declare String name = "John Doe";. The name variable is a strong reference to the "John Doe" String object. SoftReferences are weaker than strong references, and WeakReferences are weaker than SoftReferences. There is also an even weaker type of reference, the PhantomReference, which I'm not going to discuss here.

The types of references in the chain that starts at a local or static variable and ends at an object define that object's reachability. The Java API explains the different categories of object reachability this way:

  • An object is strongly reachable if it can be reached by some thread without traversing any reference objects. A newly-created object is strongly reachable by the thread that created it.

  • An object is softly reachable if it is not strongly reachable but can be reached by traversing a soft reference.

  • An object is weakly reachable if it is neither strongly nor softly reachable but can be reached by traversing a weak reference. When the weak references to a weakly-reachable object are cleared, the object becomes eligible for finalization.

  • An object is phantom reachable if it is neither strongly, softly, nor weakly reachable, it has been finalized, and some phantom reference refers to it.

  • Finally, an object is unreachable, and therefore eligible for reclamation, when it is not reachable in any of the above ways.


If the garbage collector determines that an object is strongly reachable, it will not reclaim the object. This is what we would expect: nobody wants an object garbage collected while it can still be reached through a chain of strong references. Now, here is the important point that is not written in the explanation above: if the garbage collector determines that an object is softly reachable, it may atomically clear all soft references to the object, when it finds that memory is running low or at its own discretion (the only guarantee is that they are all cleared before the JVM throws an OutOfMemoryError). But if the garbage collector determines that an object is weakly reachable, it will atomically clear all weak references to the object. This is the major difference between weak and soft references, and the reason the WeakReference is ill-suited for caching.
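The difference can be observed with a small experiment. This is a sketch, not a reliable test: System.gc() is only a request, and the specification lets the JVM clear soft references whenever it sees fit, so the printed results may vary between JVMs and heap settings. ReachabilityDemo is a hypothetical class name.

```java
import java.lang.ref.SoftReference;
import java.lang.ref.WeakReference;

class ReachabilityDemo {
    // Returns {weakCleared, softCleared} after dropping the strong references
    // and requesting a garbage collection.
    static boolean[] run() {
        Object forWeak = new Object();
        Object forSoft = new Object();
        WeakReference<Object> weak = new WeakReference<Object>(forWeak);
        SoftReference<Object> soft = new SoftReference<Object>(forSoft);

        forWeak = null; // the first object is now only weakly reachable
        forSoft = null; // the second object is now only softly reachable
        System.gc();    // a request, not a command

        return new boolean[] { weak.get() == null, soft.get() == null };
    }

    public static void main(String[] args) {
        boolean[] cleared = run();
        // With plenty of free heap, the weak reference is usually cleared and the
        // soft one usually survives, but neither outcome is guaranteed.
        System.out.println("weak cleared: " + cleared[0]);
        System.out.println("soft cleared: " + cleared[1]);
    }
}
```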

What is the WeakHashMap good for?

Now it is easy to understand why the WeakHashMap doesn't work for caching. First of all, it wouldn't work anyway, because it uses weak references for the keys and not for the map values. In addition, the garbage collector aggressively reclaims memory that is referenced only by weak references. This means that once you lose the last strong reference to an object that is working as a key in a WeakHashMap, the garbage collector will soon reclaim that map entry.

If the WeakHashMap is no good for caching, then what is it good for? It is good for implementing canonical maps. Let's say you want to associate some extra information with an object that you hold a strong reference to. You put an entry in a WeakHashMap with the object as the key and the extra information as the map value. Then, as long as you keep a strong reference to the object, you will be able to check the map to retrieve the extra information. Once you release the object, the map entry will be cleared and the memory used by the extra information will be released.
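As an illustration, here is a small sketch of that pattern; WeakMetadataExample, annotate() and noteFor() are hypothetical names. Plain Object keys compare by identity, which is usually what you want here, since WeakHashMap looks keys up with equals().

```java
import java.util.Map;
import java.util.WeakHashMap;

class WeakMetadataExample {
    // Extra information attached to objects; an entry goes away once its key
    // object is no longer strongly reachable elsewhere and the GC clears it.
    private static final Map<Object, String> NOTES = new WeakHashMap<Object, String>();

    static void annotate(Object target, String note) {
        NOTES.put(target, note);
    }

    static String noteFor(Object target) {
        return NOTES.get(target);
    }

    public static void main(String[] args) {
        Object session = new Object(); // hypothetical object we want to annotate
        annotate(session, "created by the login page");
        // While we hold the strong reference, the note can be looked up.
        System.out.println(noteFor(session)); // prints "created by the login page"
        // Dropping the last strong reference makes the key weakly reachable;
        // the entry is removed some time after the garbage collector clears it.
        session = null;
    }
}
```

One caveat from the WeakHashMap documentation: values are held strongly, so if the extra information refers back to its key, the key stays strongly reachable and the entry is never cleared.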

Can I just copy and paste the WeakHashMap class to write my cache?

No. Please, don't copy and paste the WeakHashMap source code replacing WeakReference with SoftReference. This won't be effective. To understand why, look at this example:
SoftHashMap cache = new SoftHashMap(); // A copy and paste from WeakHashMap
SomeExpensiveClass myReference1 = .... // get expensive class instance
cache.put(new Long(10), myReference1); // put the expensive object in the cache
... // do some stuff, but keep the myReference1 variable around!
SomeExpensiveClass myReference2 = cache.get(new Long(10)); // query the cache
if (myReference2 == null) {
    // Uh-oh, the cache got rid of the object, even though I
    // still had a reference to it in the myReference1 variable!
}

You would expect the cache to keep the object, since the myReference1 variable was still around. That may happen, but the entry may also have been cleared. Why? Because during the "do some stuff" block, the garbage collector may have kicked in, noticed that the map's key was only softly reachable, and chosen to clear it. Remember, the WeakHashMap uses WeakReference for the keys, so your copy-and-paste implementation would use SoftReference for the keys. This reduces the effectiveness of the cache, because you would expect the object to remain in the cache while a strong reference to it is still around. The problem is that the strong reference is to the value, and there is no strong reference to the key in the example above.

So how the hell do I implement a cache in Java?

My suggestion is to use one of the freely available Cache implementations, like JCS, OSCache and others. Those libraries provide better memory management with LRU and FIFO policies for instance, disk overflow, data expiration and many other optional advanced features.

If you still want to implement a cache class that takes advantage of SoftReferences yourself, implement your own Map class that extends AbstractMap and internally wrap the map values in SoftReferences. Then you need to implement an internal mechanism to remove the stale entries from the map. For this, you may want to use a ReferenceQueue, and poll the ReferenceQueue for collected entries. Then, for each collected entry, you will need to remove it from the map. Make sure you do this in an efficient way! Or you may end up with some nested loops and O(n²) operations and your cache performance will suck.
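Here is a minimal sketch of that approach, assuming a single-threaded cache and no null values; SoftValueMap and its inner SoftValue class are hypothetical names, not part of the JDK. Each SoftReference remembers its key, so every stale entry is removed with one hash lookup instead of a scan of the whole map.

```java
import java.lang.ref.ReferenceQueue;
import java.lang.ref.SoftReference;
import java.util.AbstractMap;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical cache map: strong keys, softly referenced values.
// Not thread-safe, and null values are not supported.
class SoftValueMap<K, V> extends AbstractMap<K, V> {

    // A SoftReference to the value that also remembers its key, so a cleared
    // reference can be removed from the map with a single lookup.
    private static final class SoftValue<K, V> extends SoftReference<V> {
        final K key;

        SoftValue(K key, V value, ReferenceQueue<? super V> queue) {
            super(value, queue);
            this.key = key;
        }
    }

    private final Map<K, SoftValue<K, V>> map = new HashMap<K, SoftValue<K, V>>();
    private final ReferenceQueue<V> queue = new ReferenceQueue<V>();

    // The garbage collector enqueues each SoftValue after clearing it.
    // Draining the queue costs O(1) per stale entry, with no nested loops.
    @SuppressWarnings("unchecked")
    private void expungeStaleEntries() {
        for (Object stale; (stale = queue.poll()) != null; ) {
            SoftValue<K, V> ref = (SoftValue<K, V>) stale;
            // Remove only if the key still maps to this exact reference;
            // the key may have been put again with a fresh value since.
            if (map.get(ref.key) == ref) {
                map.remove(ref.key);
            }
        }
    }

    @Override
    public V get(Object key) {
        expungeStaleEntries();
        SoftValue<K, V> ref = map.get(key);
        return ref == null ? null : ref.get();
    }

    @Override
    public V put(K key, V value) {
        expungeStaleEntries();
        SoftValue<K, V> old = map.put(key, new SoftValue<K, V>(key, value, queue));
        return old == null ? null : old.get();
    }

    @Override
    public V remove(Object key) {
        expungeStaleEntries();
        SoftValue<K, V> old = map.remove(key);
        return old == null ? null : old.get();
    }

    @Override
    public void clear() {
        map.clear();
    }

    // Approximate: may still count references cleared but not yet enqueued.
    @Override
    public int size() {
        expungeStaleEntries();
        return map.size();
    }

    // A read-only snapshot of the entries whose values are still alive.
    @Override
    public Set<Map.Entry<K, V>> entrySet() {
        expungeStaleEntries();
        Set<Map.Entry<K, V>> entries = new HashSet<Map.Entry<K, V>>();
        for (Map.Entry<K, SoftValue<K, V>> e : map.entrySet()) {
            V value = e.getValue().get();
            if (value != null) {
                entries.add(new AbstractMap.SimpleImmutableEntry<K, V>(e.getKey(), value));
            }
        }
        return entries;
    }
}
```

Unlike the copy-and-paste SoftHashMap above, the keys here are ordinary strong references, so an entry only disappears when the garbage collector clears the cached value itself. There is still no LRU ordering, disk overflow, or expiration, which is what libraries such as JCS or OSCache add on top.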

More Information

API documentation of the java.lang.ref package for an explanation of the Reference classes and object reachability.
API documentation of the WeakHashMap class for more information on the limitations and applications of the WeakHashMap class.

Sunday, September 7, 2008

Work environments and the Flow

In this post I will talk about the Flow state, and why it is important to provide developers with environments that enable them to attain that state. First, what is Flow?

"Flow" is the name given to a mental state of deep concentration, in which you can spend literally hours on a problem without getting distracted. In this state people can work on harder problems than they usually can, because they can keep track of more problem variables and levels of abstraction. When you are in the Flow, time seems to fly by, and in the end you are still amazed at how much work you accomplished during that period. Flow is the state you are usually in when you are alone at the office working late with nothing to distract you. During those late hours, you are often able to get more work done than during the whole day.

It is very difficult to get into the Flow, but it is very easy to lose it. It usually takes 10 to 15 minutes to get into a Flow state, and during this period any small distraction breaks your concentration and you have to start over. Once you are in the Flow things are better, and you only lose it if there is a distraction directed specifically at you, like a coworker calling your name to ask you a question. Every time you are pulled out of the Flow state, it takes you another 10 to 15 minutes to get back into it.

Flow is THE productivity silver bullet for software development. It enables developers to work faster and produce better quality code. If people are writing software outside of the Flow, they usually introduce many more bugs because of the lack of concentration.

Software companies should provide their developers with work environments that make the Flow possible. The problem is that this is very difficult. Here is a quick list of things a developer needs to get into the Flow and stay there.

  1. Quiet work environment. No one will get into the Flow if there are people talking around them all day long. Ideally, developers should have individual rooms with doors or, at most, small rooms where two or three people work together. This goes against some Agile evangelists who encourage open spaces on the theory that they promote communication. But people who encourage open spaces clearly don't do programming as their main activity.

  2. Kill other distractions. Things like IM software and mail popups can kill the concentration very easily. So developers should be allowed, and actually encouraged, to close their IM and email software whenever they like.

  3. Quick compile cycles. Having a quiet environment is no good if it takes you too long to compile the software, and too long in this case is anything longer than a few seconds. If compiling takes longer, the developer will most likely switch to the web browser to check the latest news, or to the email client to check the latest messages, and the hard-earned Flow goes down the drain. Test Driven Development helps you stay in the Flow, because it is quicker to compile the code and run the tests than to redeploy and restart the application every time.


Having said all that, most software companies do none of these things. It is very hard to convince upper management to give individual rooms to developers. It is also very hard to convince them that too much information at once is a bad thing and that it is OK to turn off the internal IM and email software.

But even if you don't have the ideal work environment, there are still things you can do by yourself. You can avoid asking other people silly questions that you can answer yourself with a little research. Otherwise, to save yourself a couple of minutes, you make the other person lose 10 minutes getting back into the Flow.

You can also take steps to isolate yourself from the surrounding distractions. The approach preferred by 9 in 10 developers is to listen to music on headphones. I don't like to listen to music all day, so I sometimes listen to pink noise. You can also try noise-canceling headphones.

Another suggestion is to move your desk so that you are not looking at the aisle. Try facing a window, a wall, or a corner of the office that has less movement. This avoids visual distractions.

Neal Ford suggests in his book The Productive Programmer that the team institute a Quiet Time, a few hours a day when people are left alone to work and chatter in the office is avoided. He says that on the team where it was instituted, people looked forward to those hours as the most anticipated period of the day. For it to work, though, the team members have to agree on it so that everyone respects it.

Getting into the Flow is very difficult, and most managers don't value it, since managers are expected to multitask and the Flow is not important to them. Setting up the environment so that the team can achieve the Flow state can be a daunting task, but if it is done, the productivity gains will be worth the effort.

More information:
Peopleware is a must read book for any software development manager. It discusses the concept of Flow and also good work environments for developers.

Sunday, August 24, 2008

FODA Methodology

This post is for Brazilian readers only.

Tim brought my attention to the FODA methodology (Feature-Oriented Domain Analysis). I don't know about the methodology but the name sounds great :).