Archive for the ‘Uncategorized’ Category

Developers performance and random variation

Monday, September 8th, 2008

Is it possible to rank the relative performance of developers in a team without falling foul of random variation?
Ben Goldacre wrote a very interesting piece (as usual) in The Guardian this weekend (Sat 6 September edition) about the silliness of national “studies” which fail to take a vetted statistical sample before coming to any conclusion. You can read the piece on Ben Goldacre’s blog here.
However, random variation is most interesting when we discover it at work in domains other than research, and it struck me as particularly applicable to evaluating developers’ performance in a team.

Individual performance

In order to evaluate performance, we must define measure points in all the work produced by the individual. These metrics will include quantity-based indicators like the number of lines of code written, the number of bugs solved, or the number of bugs introduced (for negative performance), but they would also need to account for quality-based indicators such as the percentage of code covered by tests, package and class cohesion, or the number of dependencies introduced.
The problem with the latter set of metrics is that it is very difficult to predict what impact they will have on overall system quality. Nonetheless, the practice is to keep them as healthy as possible.
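To make this a little more concrete, here is a minimal sketch of what a per-feature bag of such indicators could look like, with a deliberately naive composite score; the field names and the weights are purely illustrative assumptions, not a recommended scoring model.

// Illustrative sketch only: the indicators mirror the ones listed above,
// and the weights are arbitrary assumptions, not an established formula.
public class FeatureMetrics {
    // quantity-based indicators
    int linesOfCode;
    int bugsSolved;
    int bugsIntroduced;       // counts as negative performance

    // quality-based indicators
    double testCoverage;      // 0.0 to 1.0, share of the new code covered by tests
    double classCohesion;     // 0.0 to 1.0, as reported by a hypothetical analysis tool
    int dependenciesAdded;

    // A deliberately naive composite: reward output, coverage and cohesion,
    // penalise bugs introduced and new dependencies.
    double compositeScore() {
        return 0.01 * linesOfCode
             + 1.0  * bugsSolved
             - 2.0  * bugsIntroduced
             + 10.0 * testCoverage
             + 5.0  * classCohesion
             - 0.5  * dependenciesAdded;
    }
}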

Individual performance, therefore, can be measured absolutely from the code produced and compared to previous performance for the same individual. However, are these metrics enough? How has the developer coped with the work he was assigned? Was he able to complete the task in the most efficient way?

Feature-relative performance

Given 2 new features A and B to develop (or, say, bugs A and B to fix) with respective complexities Ca and Cb, how does an individual developer implement the code required to support them?
In an ideal world, every developer would code the same way on Monday at 9am as on Wednesday at 12.30pm or Friday at 5pm… but there is no single way to solve a problem: there is no single algorithm for sorting a table, nor a single convention for naming methods, and developers will choose whichever suits their state of mind at the very moment they need to implement it. That doesn’t even account for the boredom factor, which will see developers implement the famous Hello World! in every fashion possible just to keep themselves entertained!

With each feature having a different impact on the system and the developer being equally likely to implement the same feature in two different ways at different times, what is the value of those metrics when comparing them feature by feature?
And what is the value of comparing them between developers?

Team-relative performance

And here we introduce random variation: when you are trying to evaluate the relative performance of members of a team, how can you make sure the metrics you compare them with are actually comparable?
Let’s look at an example where we consider 2 developers of the same competence whom we want to evaluate against each other:

Iteration #1
Developer A has developed features F1 and F2 of complexities C1 and C2 and has achieved metrics M1 and M2. Developer B has developed features F3 and F4 of respective complexities C3 and C4, achieving metrics M3 and M4. Because the developers are of the same competence, we can assume that they have been allocated equivalent tasks and that C1+C2 ≈ C3+C4.

Iteration #2
Developer A is given features F5 and F6 to develop (complexities C5 and C6) and bug B1 (Cb1), found on F3, to fix; he completes F5 and fixes B1, but doesn’t complete F6 by himself. Developer B completes features F7 and F8 (C7 and C8), fixes bugs B2 and B3 found in F4 (Cb2 and Cb3) and helps developer A complete F6.
Again, the tasks have been allocated on the assumption that C5+C6+Cb1 ≈ C7+C8+Cb2+Cb3.

Notice how I didn’t mention metrics in Iteration #2; that is where the problems actually begin. While we can easily measure our metrics for F5, F7 and F8, and allocate them to the performance measurement of each developer, F6 raises the question of how to share its metric between the two developers.
Moreover, the bug fixes mean that new metrics have to be calculated on F3 and F4, as composites of the pre-existing code and the fix code; therefore M3′=M3+ΔMb1 and M4′=M4+ΔMb2+ΔMb3.
At this point we probably want to allocate only the delta performance for each bug to the developer who fixed it, and to subtract that same delta from the metric previously allocated to the developer who delivered the feature.
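As a minimal sketch of this bookkeeping, assuming the simple model just described (the fixer is credited with the delta metric, and the same delta is subtracted from whoever delivered the feature), the allocation could look like this; developer names and method names are hypothetical.

import java.util.HashMap;
import java.util.Map;

// Sketch of the delta allocation described above: a bug fix credits the fixer
// with ΔM and debits the same ΔM from the developer who owned the feature.
public class PerformanceLedger {
    private final Map<String, Double> scoreByDeveloper = new HashMap<>();

    void creditFeature(String developer, double featureMetric) {
        scoreByDeveloper.merge(developer, featureMetric, Double::sum);
    }

    void recordBugFix(String fixer, String featureOwner, double deltaMetric) {
        scoreByDeveloper.merge(fixer, deltaMetric, Double::sum);          // +ΔM to the fixer
        scoreByDeveloper.merge(featureOwner, -deltaMetric, Double::sum);  // -ΔM to the feature owner
    }

    double scoreOf(String developer) {
        return scoreByDeveloper.getOrDefault(developer, 0.0);
    }
}

// Example: bug B1, found on F3 (delivered by B), is fixed by A:
//   ledger.recordBugFix("A", "B", deltaMb1);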

We can also note that at the end of Iteration #2, we could conclude either that the complexity estimation for the tasks was wrong or that developer B is more skilled than developer A.
From experience, I can safely say that neither conclusion can be drawn automatically.

As a wrap-up of this small theoretical example, we try to determine the composite performance of each developer:

  • Developer A: Mx = M1+M2+M5+∂M6+ΔMb1
  • Developer B: My = M3+M4+M7+M8+∂M6+ΔMb2+ΔMb3-ΔMb1-ΔMb2-ΔMb3
    My = M3+M4+M7+M8+∂M6-ΔMb1

With the performance adjustments (negative deltas due to bugs), it becomes clear that the conclusions that we could possibly have drawn after Iteration #2 cannot be drawn with any level of confidence without looking at the bigger picture.

Normalisation of the random variation

The previous example shows that if we want to be able to measure relative performance accurately, we need to find ways to normalise our input performance data. In the example, we used negative deltas of performance to impact an individual’s performance over subsequent iterations. We could also sample our metrics on features with a given complexity. Evidently, the best possible measurement would be to allocate the same set of features to each developer, but that would be counter-productive and difficult to implement in a real-world project.
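One way to read “sample our metrics on features with a given complexity” is to normalise each raw metric by the estimated complexity of the feature it was measured on, so that scores from tasks of different sizes become roughly comparable; the simple ratio below is an assumption made for illustration, not an established formula.

// Hypothetical normalisation: a raw metric divided by the estimated complexity
// of the feature it was measured on. Only meaningful if the complexity estimates
// are themselves reasonably reliable (see the caveat below).
public final class Normalisation {
    static double normalisedMetric(double rawMetric, double estimatedComplexity) {
        if (estimatedComplexity <= 0) {
            throw new IllegalArgumentException("complexity estimate must be positive");
        }
        return rawMetric / estimatedComplexity;
    }
}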

Finally, we need to acknowledge that the complexity estimation for a given feature might be substantially wrong, and that solving a bug for that feature could uncover a whole new range of complexity, rendering our negative delta adjustment ineffective; we would therefore need to mitigate errors and changes in complexity in our performance evaluation.

There is probably more to it than that, and I happily welcome comments and discussion. How do you measure your team’s performance?


Observable systems

Thursday, September 27th, 2007

So, you have landed that juicy development contract: requirements, development, test, deployment, maintenance… you’ve got it all covered! But have you? Whatever method you choose for your development, there is an area that is often overlooked when developing software: production. How do you know your system is running well? How do you know what it is doing? Or even why it is doing it? In this post, I would like to introduce a paradigm in software development methodologies: the observable system.

Your client knows nothing about maintenance

The problem with developing maintainable software is two-fold: the quality of the code, and the fact that you haven’t thought about maintenance in the first place. You can’t be blamed, though.

When you develop software, you focus essentially on client requirements. When you are Agile, this focus takes on a new dimension: in very short iterations, you pick the client’s brain and implement straight away. When you are not Agile, you gather a huge mass of client requirements before trying to design and develop all of them at the same time (and often fail to deliver most of the software’s intended value). In either case, the approach is to listen and transcribe.

But the story the client tells you is about what they care about: making their business leaner, simpler, more competitive… (well, in fact, we wish they would all actually care about that, don’t we?)

What the client doesn’t know is that you will have to care for the newborn system. If someone is to maintain the software in production, they will need a wealth of information on the system’s behaviour, and the ability to react to it.

Observe the newborn system

I would like to take this newborn analogy further: when you are having your first baby, you take all the care in the world over conception and development. You care about the quality of the mother’s food, her resting patterns, multiple measurements (blood samples, morphological measurements, scans, etc.)… but when it comes to preparing for the baby’s arrival, you can’t actually prepare for anything specific, except by buying everything any baby could ever need; and let me tell you, that’s costly! That’s the (untrained) parents’ take.

[Image: baby dashboard]

The midwives’ approach is to have a set of defined measurements to track the baby’s needs in the early stages: weight, height, hours of sleep, amount of milk taken, length and frequency of feeding times… from these metrics, they decide to switch to formula, decrease the doses, add food supplements or send your baby to foster care!

The observable system paradigm

Intent: make a system maintainable by allowing observation of its behaviour while in production
Motivation: systems are usually run in production as if they were black boxes, and their maintenance is carried out by investigating production data and comparing it to test data and systems; there is a need to gather instant meta-information on the system, its processes and its state.
Applicability: virtually every software system can benefit from this paradigm
Structure:

[Figure: observable system structure]

Collaboration: the System exposes Meta-Information, and the Maintainer accesses the Meta-Information to assess the state of the System
Known uses: Java’s MXBeans (JMX)
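To make the “Known uses” entry concrete, here is a minimal JMX sketch: a bean exposing a couple of counters as Meta-Information that a Maintainer can read with any JMX console (jconsole, for instance). The OrderServiceMonitor name, its counters and the ObjectName are illustrative assumptions, not a prescribed API.

import java.lang.management.ManagementFactory;
import javax.management.MBeanServer;
import javax.management.ObjectName;

// Minimal sketch of an observable system using JMX. The monitored service,
// its counters and the ObjectName are invented for the example.
public class ObservableSystemExample {

    // MXBean interface: each getter becomes a read-only attribute in a JMX console.
    public interface OrderServiceMonitorMXBean {
        long getOrdersProcessed();
        long getOrdersFailed();
    }

    public static class OrderServiceMonitor implements OrderServiceMonitorMXBean {
        private volatile long processed;
        private volatile long failed;

        public long getOrdersProcessed() { return processed; }
        public long getOrdersFailed()    { return failed; }

        void orderProcessed() { processed++; }
        void orderFailed()    { failed++; }
    }

    public static void main(String[] args) throws Exception {
        // The System registers its Meta-Information with the platform MBean server...
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        OrderServiceMonitor monitor = new OrderServiceMonitor();
        server.registerMBean(monitor, new ObjectName("com.example:type=OrderServiceMonitor"));

        // ...and the Maintainer observes it from outside, e.g. with jconsole.
        monitor.orderProcessed();     // simulate some activity
        Thread.sleep(Long.MAX_VALUE); // keep the process alive so it can be observed
    }
}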

Implement your observable system

In a further post, I will show an example of an observable system. In the meantime, I would very much like to hear about your own experience of observable systems: as usual, contact@<my main domain>.com


About

Friday, October 13th, 2006

Guillaume BERTRAND

Facts

I’m living in London.
I work as a consultant in Information Systems.
I read Le Monde and The Guardian (in no special order and among other publications).
I often look through the window to find my muse.

Figures

Age: 29
Height: 1.86m (6ft 1in)
Average commute time: 30 min

Dreams

Winning the lottery. Becoming architect or designer. Traveling around the world. Understanding art, one day. Being able to remember everything.
