Sunday, August 20, 2017

14 Essential Software Engineering Practices for Your Agile Project

Agile initiatives are found in almost any company that builds software. Many of these initiatives have not brought the results that were expected, nor are the people much happier in their work. There are many reasons for this, and a quick search on Google or LinkedIn will give you a plethora of them. But the one I am confronted with on almost every project I work on is a lack of experience with modern software engineering practices.

Flexibility (agility) can only happen when you are able to make changes to your product in an easy, fast, and flexible way. That is why:

Organizational Agility is constrained by Technical Agility

In other words, when you are slow in making changes to your product, it does not matter how you structure your teams or your organization, or which framework you adopt: you will be slow to respond to change. These changes include bug fixes, performance issues, feature requests, changed user behavior, new business opportunities, pivots, and so on.

Luckily, there is a set of software engineering practices that will help your team deliver a high-quality product that stays easy to change. These practices are not limited to writing software; they cover the whole process up to the actual delivery of working software to the end user.

Unfortunately, many software engineers I have met in the financial services industry have had very limited exposure to these practices and their supporting tools. And don't get me started on the newly born army of agile coaches and Scrum Masters who have never produced a single line of working code in their lives. Besides hindering agility, this frustrates the teams, because without these practices it is almost impossible to deliver high-quality working software within one Sprint.

Your organization will need to invest in these software engineering practices through training, infrastructure, tools, coaching, and continuous improvement if you want your agile initiatives to actually deliver agility.

1. Unit Testing

The purpose of unit testing is not to find bugs. A unit test is a specification of the expected behavior of the code under test, and the code under test is the implementation of that expected behavior. The unit tests and the code under test therefore check each other's correctness and protect each other. Later, when someone changes the code under test in a way that alters the behavior the original author expected, the test will fail. If your code is covered by a reasonable number of unit tests, you can maintain it without breaking existing features. That is why Michael Feathers defines legacy code in his book as code without unit tests. Without unit tests, every refactoring is a major risk.
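As a minimal sketch in Python (the VAT calculation is a made-up example), a unit test like the one below names and pins down the behavior the author expects:

import unittest

# Hypothetical code under test; in a real project this lives in its own module.
def add_vat(net_amount, vat_rate=0.21):
    """Return the gross amount for a net amount and a VAT rate."""
    return round(net_amount * (1 + vat_rate), 2)

class AddVatTest(unittest.TestCase):
    def test_default_rate_is_applied(self):
        # The test spells out the expected behavior: 21% VAT on 100.00 gives 121.00.
        self.assertEqual(add_vat(100.00), 121.00)

    def test_custom_rate_is_applied(self):
        self.assertEqual(add_vat(100.00, vat_rate=0.06), 106.00)

if __name__ == "__main__":
    unittest.main()

If a later change silently alters the VAT rounding, these tests fail and protect the original intent.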


2. Continuous Integration

Martin Fowler defines Continuous Integration (CI) in his key article as follows: "Continuous Integration is a software development practice where members of a team integrate their work frequently, usually each person integrates at least daily - leading to multiple integrations per day. Each integration is verified by an automated build (including test) to detect integration errors as quickly as possible. Many teams find that this approach leads to significantly reduced integration problems and allows a team to develop cohesive software more rapidly." You see, without unit tests and test automation it is impossible to do CI right. And only when you do CI right can you hope to succeed at Continuous Deployment.
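As a sketch of what "verified by an automated build" can mean in practice, assuming a Python project tested with pytest, a CI server could run a script like this on every push:

#!/usr/bin/env python3
"""Minimal CI build step: fail fast when the build or the tests break."""
import subprocess
import sys

def run(step_name, command):
    print(f"== {step_name}: {' '.join(command)}")
    if subprocess.run(command).returncode != 0:
        sys.exit(f"Build failed at step: {step_name}")

if __name__ == "__main__":
    run("static checks", ["python", "-m", "compileall", "-q", "."])
    run("unit tests", ["python", "-m", "pytest", "-q"])
    print("Build green: safe to integrate.")

Real CI servers such as Jenkins or GitLab CI add scheduling, history, and notifications, but the core feedback loop is exactly this: every integration triggers a build and a test run.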


3. Collective Code Ownership

Collective Code Ownership encourages everyone to contribute new ideas to all segments of the project. Any developer can change any line of code to add functionality, fix bugs, improve designs, or refactor. No one person becomes a bottleneck for changes. This is easy to do when you have all your code covered with unit tests and automated acceptance tests.


4. Refactoring

Code should be written to solve the problem as it is known at the time. Teams often become wiser about the problem they are solving, and continuously refactoring and changing the code ensures that the code base keeps meeting the current needs of the business in the most efficient way. To guarantee that changes do not break existing functionality, your regression tests must be automated. In other words, unit tests are essential.
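A tiny sketch of what this looks like in Python: the structure of the code improves while automated tests pin the behavior in place (the shipping rates are invented for the example).

# Before: duplicated conditional logic.
def shipping_cost_v1(country):
    if country == "NL":
        return 5.00
    elif country in ("BE", "DE"):
        return 7.50
    else:
        return 15.00

# After: same behavior, simpler structure (a lookup table).
SHIPPING_RATES = {"NL": 5.00, "BE": 7.50, "DE": 7.50}

def shipping_cost(country):
    return SHIPPING_RATES.get(country, 15.00)

# The automated regression test guards the refactoring: if it still passes,
# existing functionality is intact.
for c in ("NL", "BE", "DE", "US"):
    assert shipping_cost(c) == shipping_cost_v1(c)
print("Refactoring is safe: behavior unchanged.")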


5. Test Driven Development

Test-driven development is a development style that drives the design by tests, developed in short cycles of:

1. Write one test,
2. Implement just enough code to make it pass,
3. Refactor the code so it is clean.



Ward Cunningham argues that test-first coding is not testing. Test-first coding is not new; it is nearly as old as programming. It is an analysis technique: we decide what we are programming and what we are not programming, and we decide what answers we expect. Test-first is also a design technique.
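One full cycle, sketched in Python (the FizzBuzz rule is just a convenient toy problem):

import unittest

# Step 1 (red): write one test for behavior that does not exist yet.
class FizzBuzzTest(unittest.TestCase):
    def test_multiples_of_three_become_fizz(self):
        self.assertEqual(fizzbuzz(3), "Fizz")

# Step 2 (green): implement just enough code to make the test pass.
def fizzbuzz(n):
    if n % 3 == 0:
        return "Fizz"
    return str(n)

# Step 3 (refactor): clean up while the test stays green, then start the
# next cycle with the next test (for example, multiples of five).

if __name__ == "__main__":
    unittest.main()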


6. Automated Acceptance Testing

Also known as Specification by Example. Specification by Example or Acceptance test-driven development (A-TDD) is a collaborative requirements discovery approach where examples and automatable tests are used for specifying requirements—creating executable specifications. These are created with the team, Product Owner, and other stakeholders in requirements workshops. I have written about a successful implementation of this technique within Actuarial Modeling.
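A sketch of what an executable specification can look like, with a hypothetical discount rule agreed in a workshop and expressed directly as an automated acceptance test:

import unittest

def order_total(amount, loyalty_years):
    """Hypothetical business rule: 10% discount after two loyalty years."""
    discount = 0.10 if loyalty_years >= 2 else 0.0
    return round(amount * (1 - discount), 2)

class DiscountSpecification(unittest.TestCase):
    # Example from the workshop: a new customer pays the full price.
    def test_new_customer_pays_full_price(self):
        self.assertEqual(order_total(200.00, loyalty_years=0), 200.00)

    # Example from the workshop: a loyal customer gets 10% off.
    def test_loyal_customer_gets_ten_percent_discount(self):
        self.assertEqual(order_total(200.00, loyalty_years=3), 180.00)

if __name__ == "__main__":
    unittest.main()

The examples stay readable for the Product Owner and other stakeholders while remaining automatable, which is what makes the specification executable.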


7. Chaos Engineering

Even when all of the individual services in a distributed system are functioning properly, the interactions between those services can cause unpredictable outcomes. Unpredictable outcomes, compounded by rare but disruptive real-world events that affect production environments, make these distributed systems inherently chaotic.

You need to identify weaknesses before they manifest in system-wide, aberrant behaviors. Systemic weaknesses can take the form of improper fallback settings when a service is unavailable, retry storms from improperly tuned timeouts, outages when a downstream dependency receives too much traffic, cascading failures when a single point of failure crashes, and so on. You must address the most significant weaknesses proactively, before they affect your customers in production. You need a way to manage the chaos inherent in these systems, take advantage of increased flexibility and velocity, and have confidence in your production deployments despite the complexity they represent.

An empirical, systems-based approach addresses the chaos in distributed systems at scale and builds confidence in the ability of those systems to withstand realistic conditions. We learn about the behavior of a distributed system by observing it during a controlled experiment. This is called Chaos Engineering. A good example of this would be the Chaos Monkey of Netflix. 
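A toy, in-process illustration of the idea (all names are hypothetical): inject a failure deliberately and verify that the fallback preserves the steady state the user sees.

import random

def fetch_recommendations(user_id, inject_failure=False):
    # The dependency we experiment on; failure is injected deliberately.
    if inject_failure:
        raise ConnectionError("recommendation service unreachable")
    return ["personalized-item-1", "personalized-item-2"]

def homepage(user_id, inject_failure=False):
    try:
        return fetch_recommendations(user_id, inject_failure)
    except ConnectionError:
        # Fallback: degrade gracefully instead of failing the whole page.
        return ["popular-item-1", "popular-item-2"]

# Hypothesis of the experiment: even with the dependency down half the time,
# the homepage always renders something for the user.
for _ in range(100):
    result = homepage("user-42", inject_failure=random.random() < 0.5)
    assert len(result) == 2, "steady state violated: homepage is empty"
print("Experiment passed: the fallback preserves the steady state.")

Real chaos experiments run against production-like systems and measure steady state with metrics, but the experimental structure is the same: a hypothesis, a controlled injection of failure, and an observation.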

8. Continuous Deployment

Continuous delivery is a series of practices designed to ensure that code can be rapidly and safely deployed to production by delivering every change to a production-like environment and ensuring business applications and services function as expected through rigorous automated testing. Since every change is delivered to a staging environment using complete automation, you can have confidence the application can be deployed to production with a push of a button when the business is ready. Continuous deployment is the next step of continuous delivery: Every change that passes the automated tests is deployed to production automatically. Continuous deployment should be the goal of most companies that are not constrained by regulatory or other requirements.
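A sketch of the difference in code, assuming a pytest-tested project; deploy.sh and its targets are hypothetical placeholders for your release mechanism:

#!/usr/bin/env python3
"""Every change that passes the automated tests is released automatically."""
import subprocess
import sys

def passes(command):
    return subprocess.run(command).returncode == 0

if __name__ == "__main__":
    if not passes(["python", "-m", "pytest", "-q"]):
        sys.exit("Tests failed: change is not releasable.")
    if not passes(["./deploy.sh", "staging"]):
        sys.exit("Staging deployment failed.")
    # Continuous delivery stops here: a human pushes the button.
    # Continuous deployment goes one step further, fully automatically:
    if not passes(["./deploy.sh", "production"]):
        sys.exit("Production deployment failed.")
    print("Change deployed to production automatically.")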



9. Micro Services

A microservices architecture composes a single application out of a set of small services. These services are independent of each other, and communication between them happens through a well-defined interface that uses a lightweight mechanism, for example a REST API. Each service has a single function, which is what aligns microservices with business needs. Services can be written in different frameworks or programming languages, and they can be deployed and scaled independently, either alone or as a group.
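A minimal sketch of such a single-purpose service in Python, assuming the Flask library is installed (the pricing data is invented):

from flask import Flask, jsonify

app = Flask(__name__)

# This service does exactly one thing: quote the price of a product.
PRICES = {"apple": 0.40, "banana": 0.25}

@app.route("/price/<product>")
def price(product):
    if product not in PRICES:
        return jsonify(error="unknown product"), 404
    return jsonify(product=product, price=PRICES[product])

if __name__ == "__main__":
    # Other services talk to this one only through its HTTP interface.
    app.run(port=5000)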


10. Infrastructure as Code

Under this practice, infrastructure is provisioned and managed with code, using software development techniques such as version control and continuous integration. Engineers interact with infrastructure programmatically and at scale, rather than manually setting up and configuring individual resources. The API-driven model of the cloud makes this possible for system administrators and developers: they use code-based tools to interface with infrastructure, so it is treated like application code. Because infrastructure is defined in code, servers can be deployed quickly and to a fixed standard, updated with the latest patches and versions, and duplicated repeatably.
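As an illustrative sketch with the AWS SDK for Python (boto3), assuming the library is installed and credentials are configured; the AMI ID below is a placeholder, not a real image:

import boto3

ec2 = boto3.resource("ec2")

# The server is described in code, so this definition lives in version
# control, gets reviewed like any other change, and can be recreated on demand.
instances = ec2.create_instances(
    ImageId="ami-PLACEHOLDER",
    InstanceType="t2.micro",
    MinCount=1,
    MaxCount=1,
    TagSpecifications=[{
        "ResourceType": "instance",
        "Tags": [{"Key": "Name", "Value": "web-server"}],
    }],
)
print("Launched:", [i.id for i in instances])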


11. Configuration Management

Developers and system administrators automate the operating system, host configuration, operational tasks, and so on with code. Because configuration is expressed as code, configuration changes become standardized and repeatable. This relieves developers and system administrators of the burden of manually configuring the operating system, system applications, or server software.
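The core idea, sketched in plain Python in the spirit of tools such as Ansible or Puppet: describe the desired state and apply it idempotently (the file and its content are invented examples).

from pathlib import Path

DESIRED_STATE = {
    Path("/tmp/demo/motd"): "Welcome to the demo host.\n",
}

def ensure_state(desired):
    for path, content in desired.items():
        path.parent.mkdir(parents=True, exist_ok=True)
        if not path.exists() or path.read_text() != content:
            path.write_text(content)
            print(f"changed: {path}")
        else:
            print(f"ok: {path}")

if __name__ == "__main__":
    # Running this twice is safe: the second run reports everything as 'ok'.
    ensure_state(DESIRED_STATE)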


12. Policy as Code

With the cloud, both the infrastructure and its configuration are codified. This makes it possible for organizations to monitor and enforce compliance dynamically: it enables automatic tracking, validation, and reconfiguration of infrastructure. In that way, organizations can easily control changes to resources, and security measures are enforced properly and consistently across the estate. Resources that do not comply can be flagged automatically for further investigation or automatically remediated, which increases the speed at which teams within an organization can move.
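A small sketch of compliance rules expressed as code and evaluated against resource descriptions (the resources and rules are hypothetical):

# Policy as code: rules are plain functions, so they are versioned,
# reviewed, and executed automatically like any other code.
RESOURCES = [
    {"id": "bucket-1", "encrypted": True, "public": False},
    {"id": "bucket-2", "encrypted": False, "public": True},
]

POLICIES = [
    ("storage must be encrypted", lambda r: r["encrypted"]),
    ("storage must not be public", lambda r: not r["public"]),
]

def audit(resources, policies):
    violations = []
    for resource in resources:
        for rule, check in policies:
            if not check(resource):
                violations.append((resource["id"], rule))
    return violations

if __name__ == "__main__":
    # Non-compliant resources are flagged automatically for follow-up.
    for resource_id, rule in audit(RESOURCES, POLICIES):
        print(f"VIOLATION: {resource_id}: {rule}")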


13. Monitoring and Logging

Organizations monitor metrics and logs to gauge how the performance of applications and infrastructure affects the experience of their users. By capturing, categorizing, and analyzing the data and logs that applications and infrastructure generate, organizations understand how users are affected by changes or updates, which makes it easier to trace unexpected problems or changes back to their source. Constant monitoring is necessary to keep services steadily available while the pace of infrastructure updates increases. When this data is analyzed in real time, organizations can monitor their services proactively.
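A minimal sketch with only the Python standard library; a real system would ship these logs and metrics to a central platform:

import logging
import time

logging.basicConfig(level=logging.INFO,
                    format="%(asctime)s %(levelname)s %(message)s")
log = logging.getLogger("checkout")

request_latencies = []  # stand-in for a real time-series metric store

def handle_request(order_id):
    start = time.perf_counter()
    log.info("processing order %s", order_id)
    # ... business logic would run here ...
    elapsed = time.perf_counter() - start
    request_latencies.append(elapsed)
    log.info("order %s done in %.4fs", order_id, elapsed)

if __name__ == "__main__":
    for order in ("A-1", "A-2", "A-3"):
        handle_request(order)
    log.info("avg latency: %.4fs",
             sum(request_latencies) / len(request_latencies))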


14. Communication and Collaboration

This is the key feature of any Agile and/or DevOps model: as development, test, security, and operations come together and share their responsibilities, team spirit grows and communication improves. Chat applications, project tracking systems, and wikis are some of the tools that can be used not just by developers and operations but also by other teams such as marketing or sales. This brings all parts of the organization closer together as they cooperate to realize goals and deliver projects.


Monday, August 07, 2017

Would Your Team Work With the Chaos Monkey?

Advances in large-scale, distributed software systems are changing the game for software engineering. As an industry, we are quick to adopt practices that increase flexibility of development and velocity of deployment. An urgent question follows on the heels of these benefits: How much confidence can we have in the complex systems that we put into production?

Chaos Engineering

Even when all of the individual services in a distributed system are functioning properly, the interactions between those services can cause unpredictable outcomes. Unpredictable outcomes, compounded by rare but disruptive real-world events that affect production environments, make these distributed systems inherently chaotic.

We need to identify weaknesses before they manifest in system-wide, aberrant behaviors. Systemic weaknesses can take the form of improper fallback settings when a service is unavailable, retry storms from improperly tuned timeouts, outages when a downstream dependency receives too much traffic, cascading failures when a single point of failure crashes, and so on. We must address the most significant weaknesses proactively, before they affect our customers in production. We need a way to manage the chaos inherent in these systems, take advantage of increasing flexibility and velocity, and have confidence in our production deployments despite the complexity that they represent.

An empirical, systems-based approach addresses the chaos in distributed systems at scale and builds confidence in the ability of those systems to withstand realistic conditions. We learn about the behavior of a distributed system by observing it during a controlled experiment. This is called Chaos Engineering.

Chaos Monkey

Chaos Engineering was the philosophy when Netflix built Chaos Monkey, a tool that randomly disables Amazon Web Services (AWS) production instances to make sure you can survive this common type of failure without any customer impact. The name comes from the idea of unleashing a wild monkey with a weapon in your data center (or cloud region) to randomly shoot down instances and chew through cables — all the while you continue serving your customers without interruption. 

By running Chaos Monkey in the middle of a business day, in a carefully monitored environment with engineers standing by to address any problems, you can still learn the lessons about the weaknesses of your system, and build automatic recovery mechanisms to deal with them. So next time an instance fails at 3 am on a Sunday, you won’t even notice.

Chaos Monkey has a configurable schedule that allows simulated failures to occur at times when they can be closely monitored.  In this way, it’s possible to prepare for major unexpected errors rather than just waiting for catastrophe to strike and seeing how well you can manage.
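To make the idea concrete, here is a sketch of a Chaos-Monkey-style routine (not Netflix's actual tool), assuming the boto3 AWS SDK with valid credentials; the chaos-enabled tag is a hypothetical way to scope the blast radius:

import datetime
import random
import boto3

ec2 = boto3.resource("ec2")

def business_hours(now=None):
    # Only run when engineers are standing by to observe and respond.
    now = now or datetime.datetime.now()
    return now.weekday() < 5 and 9 <= now.hour < 17

def unleash_monkey():
    if not business_hours():
        return
    candidates = list(ec2.instances.filter(Filters=[
        {"Name": "tag:chaos-enabled", "Values": ["true"]},
        {"Name": "instance-state-name", "Values": ["running"]},
    ]))
    if candidates:
        victim = random.choice(candidates)
        print(f"Chaos Monkey terminating {victim.id}")
        victim.terminate()

if __name__ == "__main__":
    unleash_monkey()

A scheduler would invoke this during the monitored window; the point is not the termination itself but the recovery mechanisms it forces you to build.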

Chaos Monkey was the original member of Netflix’s Simian Army, a collection of software tools designed to test the AWS infrastructure. The software is open source (GitHub) to allow other cloud services users to adapt it for their use. Other Simian Army members have been added to create failures and check for abnormal conditions, configurations and security issues. 

An Agile and DevOps engineering culture doesn’t have a mechanism to force engineers to architect their code in any specific way. Instead, you can build strong alignment around resiliency by taking the pain of disappearing servers and bringing that pain forward. 

Most people think this is a crazy idea, but you can’t depend on the infrequent occurrence of outages to impact behavior. Knowing that this will happen on a frequent basis creates strong alignment among engineers to build in the redundancy and automation to survive this type of incident without any impact on your customers.

Would your team be willing to implement their own Chaos Monkey?




Saturday, August 05, 2017

Do we need Service Portfolio Management?

I am not a big fan of ITIL. DevOps and ITIL are basically both IT Service Management (ITSM). The patterns are the same though the execution is different. Service management is not one theory among many. Service management models reality and ITSM is the description of IT reality: of all the activities that happen in the real world.

Everybody manages change; everybody releases into the world; everybody manages capacity and availability and continuity; everybody manages incidents and requests; everybody defines strategy, plans, designs, and builds services; everybody does improvement. The levels of maturity might vary, but these practices are present in every organization.

DevOps is one codification of that reality and ITIL is another. I am a DevOps guy.

Nevertheless, there is a concept in ITIL that I like very much: service portfolio management. A company's service portfolio can be defined as the group of services the company offers, listed in terms of their value for the business. This portfolio can include everything from in-house services to outsourced ones. If we compare project and service portfolio management, they obviously have many things in common. First of all, they share the technique and goal of general portfolio management: making the right decisions about investments, which is not that different from classical financial portfolio management. The main goal of service portfolio management is to maximize the realization of value for the business while balancing the risks against the costs.

Too many organizations take a portfolio view of only programs and projects, whilst neglecting the operational systems. This is one order removed from a truly holistic view. Project portfolio management only looks at change, not the current state. Service portfolio management looks at the services in production as well as the proposed changes to service. It looks at the distribution of resources across both Build and Run, not just Build. It considers the impact on the existing services when we try to introduce new ones. Failure to do this is why so many IT departments are drowning in projects and have ever increasing operational costs.

Failure to manage across the whole portfolio of current and planned services, focusing instead on only the portfolio of change, means that there will be a continuous deluge of CapEx on new services with zero consideration of Run's ability to absorb them (and often zero budget to do so, too). Notice that the differentiation between Build and Run is also valid for DevOps teams. When they need most of their time to Run things, there is no time to Build things. On the other hand, when they are expected to Build continuously, there is no time to Run things. We hollow out our capacity to deliver the services because we are so busy building and changing them.

Balancing priorities and resources across the portfolio of change, of projects, of programs, is not enough. We must balance across all activity, across Build and Run together. ITIL Service Strategy tells us exactly that by defining the concept of service portfolio management. IT management is a constant balancing act of Build and Run.

If you look at project portfolio management historically, it is clearly not about services; it's about products. At the end of a project there is a result; once that is delivered, the project ends. There is a fundamental difference between the philosophies of project and service portfolio management. With our customers, we should talk about services. A service is a means of delivering value to customers by facilitating the outcomes customers want to achieve, without the ownership of specific costs and risks. This is what your customers want.

When it comes to successful service portfolio management, Product Managers and Product Owners play an important role as they are expected to manage the services and products throughout their lifecycle. Project Managers manage only the project.

A typical service portfolio contains three subsets:

1. The service pipeline includes the services that are currently under development or under consideration for a specific market or demographic. This is your project portfolio. The service pipeline can also include the priorities and the short-term and long-term goals of your business.

2. The service catalog is the section of the portfolio that is visible to your customers and provides insight into the services and products your business delivers.

3. Retired services are the products and services that are soon to be withdrawn or have already been phased out.
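As a toy illustration of these subsets (all service names are invented), note how the portfolio spans both planned change and current operation:

from dataclasses import dataclass

@dataclass
class Service:
    name: str
    status: str  # "pipeline" | "catalog" | "retired"

portfolio = [
    Service("mobile-payments", "pipeline"),  # under development: Build
    Service("online-banking", "catalog"),    # live and customer-visible: Run
    Service("fax-gateway", "retired"),       # being phased out
]

def subset(status):
    return [s.name for s in portfolio if s.status == status]

# A holistic view balances resources across all three subsets,
# not just the pipeline (the project portfolio).
print("Pipeline (project portfolio):", subset("pipeline"))
print("Catalog (in production):", subset("catalog"))
print("Retired:", subset("retired"))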


Alex Lichtenberger created a nice overview of these subsets and the relation with project portfolio management with the diagram below.



You can buy my books The Art of Technology Project Portfolio Management and The Science of Technology Project Portfolio Management separately or as a bundle in my store. Just click here or on the image.
