Tuesday, April 24, 2018

Proactively Manage Your Project With RAID Lists (Risks, Assumptions, Issues and Decisions)

RAID Lists
RAID is an acronym for Risks, Assumptions, Issues, and Decisions. Some use the "D" for dependencies instead of decisions, and some use the "A" for actions instead of assumptions. I personally track dependencies on my assumption list (because that is what dependencies are) and I have no need for a separate action list since I track actions in a separate column of each of the other lists.

Although I speak from four lists you can use one tool to maintain them and filter on type when necessary. This is actually advisable since items will move from one list to another on a regular basis, but we will come back to that later.

Often the terms risks, assumptions, and issues cause confusion. Their definitions are similar but applied differently in different context or areas of work.

Risks

Risks are made up of two parts: the probability of something going wrong, and the negative consequences if it does, often called impact. Caution, risk is not the same as uncertainty. Risk and uncertainty are different terms, but most people think they are the same and ignore them. Managing risk is easier because you can identify risks and develop a response plan in advance based on your past experience. However, managing uncertainty is very difficult as previous information is not available, too many parameters are involved, and you cannot predict the outcome. See Risk Management is Project Management for Adults.

Assumptions

Assumptions are hypotheses about necessary conditions, both internal and external, identified in a design to ensure that the presumed cause-effect relationships function as expected and that planned activities will produce expected results. In other words, an assumption is an event, condition or fact that we need to happen or stay the same in order to assure project success.

Issues

Issues are a current problem – something that is happening now. Issues can be either something that was not predicted or a risk that has materialized. It can take the form of an unresolved decision, situation or problem that will significantly impact the project.

We could explain these terms with the following simple expressions

Risk: "What if?"
Assumption: "We need that it happens/stays this way".
Issue: "Oh, damn!".

If you have predicted the issue and treated it as a risk, you would already have a response plan, and the expression would be: "Hmm, that was to be expected. Let's handle it as discussed".

Even though risk, assumption, and issue have different definitions, they are deeply connected. An assumption might have a response plan if it doesn't happen or change, a risk will become an issue at the moment it happens, and an (unpredicted) issue must be included in the risk list once we identify it (and there is a chance that it will happen again).

Decisions

Decisions have NOT been made until people know:

> the name of the person accountable for carrying it out;
> the deadline;
> the names of the people who will be affected by the decision and therefore have to know about, understand, and approve it—or at least not be strongly opposed to it; and       
> the names of the people who have to be informed of the decision, even if they are not directly affected by it.

An extraordinary number of organizational decisions run into trouble because these bases aren’t covered. The answers to these questions you should put in a list and act accordingly. It’s just as important to review decisions periodically—at a time that’s been agreed on in advance—as it is to make them carefully in the first place. That way, a poor decision can be corrected before it does real damage. These reviews can cover anything from the results to the assumptions underlying the decision. Such a review is especially important for the most crucial and most difficult of all decisions, the ones about hiring, firing and promoting people.

Closing Thoughts

The secret to good RAID lists is to record the right risks, assumptions, issues, and decisions at the right level of detail. Too many and in too much detail simply create unnecessary bureaucracy. Too few in too little details does not provide actionable insights. For example, I have often seen risk lists populated with boilerplate risks: these are generic project risks, such as 'we may not get sufficient executive sponsorship' which don't really add any insight to the project. If you genuinely think that is a risk, you are better off identifying the underlying reasons which might cause this, and then express the risk in those terms.

For me, these four lists are the base for proactive instead of reactive project management.

Proactive behavior involves acting in advance of a future situation, rather than just reacting. It means taking control and making things happen rather than just adjusting to a situation or waiting for something to happen.
RAID logs are an excellent governance mechanism, and worth keeping even if for no other reason than so that you have all of the information to hand if the internal audit department or some other stakeholder decides to audit your project. But don't forget that projects are fundamentally about doing things. So identifying risks and issues without also identifying actions to mitigate them, and making decisions to resolve them will not get you very far.

In a nutshell: Your RAID lists are the base for proactive instead of reactive project management.

Read more…

Saturday, April 14, 2018

Start Your Project With a Walking Skeleton

Start your project with a Walking Skeleton
In order to reduce risk on large software development projects, you need to figure out all the big unknowns as early as possible. The best way to do this is to have a real end-to-end test with no stubs against a system that’s deployed in production.

You could do this by building a so-called Walking Skeleton, a term coined by Alistair Cockburn. He defined it as a tiny implementation of the system that performs a small end-to-end function. It need not use the final architecture, but it should link together the main architectural components. The architecture and the functionality can then evolve in parallel. A similar concept called “Tracer Bullets” was introduced in The Pragmatic Programmer.

A Walking Skeleton is a tiny implementation of the system that performs a small end-to-end function. It need not use the final architecture, but it should link together the main architectural components. The architecture and the functionality can then evolve in parallel.
If the system needs to talk to one or more datastores then the walking skeleton should perform a simple query against each of them, as well as simple requests against any external or internal service. If it needs to output something to the screen, insert an item to a queue or create a file, you need to exercise these in the simplest possible way. As part of building it, you should write your deployment and build scripts, set up the project, including its tests, and make sure all the automations are in place — such as Continuous Integration, monitoring, and exception handling. The focus is the infrastructure, not the features. Only after you have your walking skeleton should you write your first automated acceptance tests.

This is only the skeleton of the application, but the parts are connected and the skeleton does walk in the sense that it exercises all the system’s parts as you currently understand them. Because of this partial understanding, you must make the walking skeleton minimal. But it’s not a prototype and not a proof of concept — it’s production code, so you should definitely write tests as you work on it.

High Risk First 

According to Hofstadter’s Law, “It always takes longer than you expect, even when you take into account Hofstadter’s Law”. This law is valid all too often. It makes sense than to work on the riskiest parts of the project first, which are usually the parts that have dependencies: on third-party services, on in-house services, on other groups in the organization you belong to. It makes sense to get the ball rolling with these groups simply because you don’t know how long it will take and what problems should arise.

Making changes to architecture is harder and more expensive the longer it has been around and the bigger it gets. We want to find mistakes early. This approach gives us a short feedback cycle, from which we can more quickly adapt and work iteratively as necessary to meet the business' prioritized list of runtime-discernable quality attributes. Assumptions about the architecture are validated earlier. The architecture is more easily evolved because problems are found at an earlier stage when less has been invested in its implementation.

The bigger the system, the more important it is to use this strategy. In a small application, one developer can implement a feature from top to bottom relatively quickly, but this becomes impractical with larger systems. It is quite common to have multiple developers on a single team or even on multiple, possibly distributed teams involved in implementing end-to-end. Consequently, more coordination is necessary.

No Shortcuts

It’s important to stress that until the walking skeleton is deployed to production (possibly behind a feature flag or just hidden from the outside world) you are not ready to write the first acceptance test. You want to exercise your deployment and build scripts and discover as many potential problems as you can as early as possible. The Walking Skeleton is a way to validate the architecture and get early feedback so that it can be improved. You will be missing this feedback if you cut corners or take shortcuts.

In a nutshell: Start with a Walking Skeleton, keep it running, and grow it incrementally.

Read more…

Monday, April 09, 2018

It's Never Too Early to Think About Performance

It's never too early to think about performance
Business users specify their needs primarily through functional requirements. The non-functional aspects of the systems, like performance, responsiveness, up-time, support needs, and so on, are left up to the development team.

Testing of these non-functional requirements is left until very late in the development cycle, and is sometimes delegated completely to the operations team. This is a big mistake that is made far too often. Having separate development and operations team is already a mistake by itself, but I will leave that discussion for another article.

I was recently part of two large software development projects were performance was addressed to late, and the costs and time necessary fixing it was a magnitude larger as it would have to address performance early in the project. Not to mention the bad reputation the teams and systems got after going live with such a bad performance that users could hardly do their daily work with the system. 

Besides knowing before you go live that users are not going to be happy (and therefore should NOT go live) there is another big advantage of early performance testing. If you aren't looking at performance until late in the project cycle, you have lost an incredible amount of information as to when performance changed. If performance is going to be an important architectural and design criterion, then performance testing should begin as soon as possible. If you are using an Agile methodology based on two-week iterations, I'd say performance testing should be included in the process no later than the third iteration.

Why is this so important? The biggest reason is that at the very least you know the kinds of changes that made performance fall off a cliff. Instead of having to think about the entire architecture when you encounter performance problems, you can focus on the most recent changes. 

Doing performance testing early and often provides you with a narrow range of changes on which to focus. In early testing, you may not even try to diagnose performance, but you do have a baseline of performance figures to work from. This trend data provides vital information in diagnosing the source of performance issues and resolving them.

This approach also allows for the architectural and design choices to be validated against the actual performance requirements. Particularly for systems with hard performance requirements, early validation is crucial to delivering the system in a timely fashion.

“Fast” Is Not a Requirement 

"Fast" is not a requirement. Neither is "responsive". Nor "extensible". The main reason why not is that you have no objective way to tell if they're met. 

Some simple questions to ask: How many? In what period? How often? How soon? Increasing or decreasing? At what rate? If these questions cannot be answered then the need is not understood. The answers should be in the business case for the system and if they are not, then some hard thinking needs to be done. If you work as an architect and the business hasn't (or won't) tell you these numbers ask yourself why not. Then go get them. The next time someone tells you that a system needs to be "scalable" ask them where new users are going to come from and why. Ask how many and by when? Reject "lots" and "soon" as answers.

Uncertain quantitative criteria must be given as a range: the least, the nominal, and the most. If this range cannot be given, then the required behavior is not understood. As an architecture unfolds it can be checked against these criteria to see if it is (still) in tolerance. As the performance against some criteria drifts over time, valuable feedback is obtained. Finding these ranges and checking against them is a time-consuming and expensive business. 

If no one cares enough about the system being "performant" (neither a requirement nor a word) to pay for performance tests, then more than likely performance doesn't matter. quote

You are then free to focus your efforts on aspects of the system that are worth paying for.

Automated Performance Testing

In order to keep costs and spend time on performance testing in check, I advise you to automate this as much as possible. Tools like Taurus simplifies the automation of performance testing, is built for developers and DevOps, and relies on JMeter, Selenium, Gatling and Grinder as underlying engines. It also enables parallel testing, its configuration format is readable and can be parsed by your version control system, it’s tool friendly and tests can be expressed using YAML or JSON.

Here are some types of tests you can run automated:

> Load Tests are conducted to understand the behavior of the system under a specific expected load.

> Stress Tests are used to understand the upper limits of capacity within the system.

> Soak Tests determine if the system can sustain the continuous expected load.

> Spike Tests determine if the system can sustain a suddenly increasing load generated by a large number of users.

> Isolation Tests determine if a previously detected system issue has been fixed by repeating a test execution that resulted in a system problem.

Closing Thoughts

Technical testing is notoriously difficult to get going. Setting up the appropriate environments, generating the proper data sets, and defining the necessary test cases all take a lot of time. By addressing performance testing early you can establish your test environment incrementally avoiding much more expensive efforts once after you discover performance issues.

In a nutshell: It's never too early to think about performance.

Read more…