Monday, July 22, 2024

The Most Important Role on Any Large Transformation Project

Change Management and Your CAST Of Characters

The most important role on a large transformation project is the project sponsor. 

Not the project manager. 

According to the Project Management Institute (PMI)'s 2018 Pulse of the Profession In-Depth Report, "1 in 4 organisations (26%) report that the primary cause of failed projects is inadequate sponsor support". 

By contrast, "organisations with a higher percentage of projects that include actively engaged executive sponsors, report 40% more successful projects than those with a lower percentage of projects with actively engaged sponsors".

And according to the 2015 Annual Review of Projects of the UKs National Audit Office “the effectiveness of the project sponsor is the best single predictor of project success or failure”. 

Project sponsors on large and complex multi million dollar transformation projects are often senior executives and most are not trained in any way to be successful in their executive sponsor role. 

Nor do they take the time that is needed to execute this role.

Often the same is the case for the project steering committee members.

Guess what happens with these projects?

If you are in need for a training for executive sponsors and steering committee members have a look at my offering;

(Non)-Executive Crash Course - How to navigate large and complex transformation projects

Read more…

Tuesday, July 02, 2024

Case Study 18: How Excel Errors and Risk Oversights Cost JP Morgan $6 Billion

Case Study 17: The Disastrous Launch of Healthcare.gov

In the spring of 2012, JP Morgan Chase & Co. faced one of the most significant financial debacles in recent history, known as the "London Whale" incident. The debacle resulted in losses amounting to approximately $6 billion, fundamentally shaking the confidence in the bank's risk management practices. 

At the core of this catastrophe was the failure of the Synthetic Credit Portfolio Value at Risk (VaR) Model, a sophisticated financial tool intended to manage the risk associated with the bank's trading strategies. 

The failure of the VaR model not only had severe financial repercussions but also led to intense scrutiny from regulators and the public. It highlighted the vulnerabilities within JP Morgan's risk management framework and underscored the potential dangers of relying heavily on quantitative models without adequate oversight. 

This case study explores the intricacies of what went wrong and how such failures can be prevented in the future. By analyzing this incident, I seek to understand the systemic issues that contributed to the failure and to identify strategies that can mitigate similar risks in other financial institutions. The insights gleaned from this case are not just relevant to JP Morgan but to the broader financial industry, which increasingly depends on complex models to manage risk.

Background

The Synthetic Credit Portfolio (SCP) at JP Morgan was a part of the bank's Chief Investment Office (CIO), which managed the company's excess deposits through various investments, including credit derivatives. The SCP was specifically designed to hedge against credit risk by trading credit default swaps and other credit derivatives. The portfolio aimed to offset potential losses from the bank's other exposures, thereby stabilizing overall performance.

In 2011, JP Morgan developed the Synthetic Credit VaR Model to quantify and manage the risk associated with the SCP. The model was intended to provide a comprehensive measure of the potential losses the bank could face under various market conditions. This would enable the bank to make informed decisions about its trading strategies and risk exposures. The VaR model was implemented using a series of Excel spreadsheets, which were manually updated and managed.

Despite the sophistication of the model, its development was plagued by several critical issues. The model's architect lacked prior experience in developing VaR models, and the resources allocated to the project were inadequate. This led to a reliance on manual processes, increasing the risk of errors and inaccuracies. Furthermore, the model's implementation and monitoring were insufficiently rigorous, contributing to the eventual failure that led to massive financial losses.

The primary objective of JP Morgan's Synthetic Credit VaR Model was to provide an accurate and reliable measure of the risk associated with the bank's credit derivatives portfolio. This would enable the bank to manage its risk exposures effectively, ensuring that its trading strategies remained within acceptable limits. The model aimed to capture the potential losses under various market conditions, allowing the bank to make informed decisions about its investments.

In addition to the primary objective, the Synthetic Credit VaR Model was expected to provide a foundation for further advancements in the bank's risk management practices. By leveraging the insights gained from the model, JP Morgan hoped to develop more sophisticated tools and techniques for managing risk. This would enable the bank to stay ahead of emerging threats and maintain a competitive edge in the financial industry.

If you are an executive sponsor, steering committee member, or a non-executive board member and want to learn what you need to do so that your project does not land on my list with project failures? Then my (Non)-Executive Crash Course is what you are looking for.

If you want to know where you are standing with that large, multi-year, strategic project? Or you think one of your key projects is in trouble? Then a Project Review is what you are looking for.

If you just want to read more project failure case studies? Then have a look at the overview of all case studies I have written here.

Timeline of Events

Early 2011: Development of the Synthetic Credit VaR Model begins. The project is led by an individual with limited experience in developing VaR models. The model is built using Excel spreadsheets, which are manually updated and managed.

September 2011: The Synthetic Credit VaR Model is completed and implemented within the CIO. The model is intended to provide a comprehensive measure of the potential losses the bank could face under various market conditions.

January 2012: Increased trading activity in the SCP causes the CIO to exceed its stress loss risk limits. This breach continues for seven weeks. The bank informs the OCC of the ongoing breach, but no additional details are provided, and the matter is dropped.

March 23, 2012: Ina Drew, head of the CIO, orders a halt to SCP trading due to mounting concerns about the portfolio's risk exposure.

April 6, 2012: Bloomberg and the Wall Street Journal publish reports on the London Whale, revealing massive positions in credit derivatives held by Bruno Iksil and his team.

April 9, 2012: Thomas Curry becomes the 30th Comptroller of the Currency. Instead of planning for the upcoming 150th anniversary of the Office of the Comptroller of the Currency (OCC), Mr. Curry is confronted with the outbreak of news reports about the London Whale incident.

April 16, 2012: JP Morgan provides regulators with a presentation on SCP. The presentation states that the objective of the "Core Credit Book" since its inception in 2007 was to protect against a significant downturn in credit. However, internal reports indicate growing losses in the SCP.

May 4, 2012: JP Morgan reports SCP losses of $1.6 billion for the second quarter. The losses continue to grow rapidly even though active trading has stopped.

December 31, 2012: Total SCP losses reach $6.2 billion, marking one of the most significant financial debacles in the bank's history.

January 2013: The OCC issues a Cease and Desist Order against JP Morgan, directing the bank to correct deficiencies in its derivatives trading activity. The Federal Reserve issues a related Cease and Desist Order against JP Morgan's holding company.

September - October 2013: JP Morgan settles with regulators, paying $1.020 billion in penalties. The OCC levies a $300 million fine for inadequate oversight and governance, insufficient risk management processes, and other deficiencies.

What Went Wrong?

Model Development and Implementation Failures

The development of JP Morgan's Synthetic Credit VaR Model was marred by several critical issues. The model was built using Excel spreadsheets, which involved manual data entry and copying and pasting of data. This approach introduced significant potential for errors and inaccuracies. As noted in JP Morgan's internal report, "the spreadsheets ‘had to be completed manually, by a process of copying and pasting data from one spreadsheet to another’". This manual process was inherently risky, as even a minor error in data entry or formula could lead to significant discrepancies in the model's output.

Furthermore, the individual responsible for developing the model lacked prior experience in creating VaR models. This lack of expertise, combined with inadequate resources, resulted in a model that was not robust enough to handle the complexities of the bank's trading strategies. The internal report highlighted this issue: "The individual who was responsible for the model’s development had not previously developed or implemented a VaR model, and was also not provided sufficient support". This lack of support and expertise significantly compromised the quality and reliability of the model.

Insufficient Testing and Monitoring

The Model Review Group (MRG) did not conduct thorough testing of the new model. They relied on limited back-testing and did not compare results with the existing model. This lack of rigorous testing meant that potential issues and discrepancies were not identified and addressed before the model was implemented. The internal report criticized this approach: "The Model Review Group’s review of the new model was not as rigorous as it should have been". Without comprehensive testing, the model was not validated adequately, leading to unreliable risk assessments.

Moreover, the monitoring and oversight of the model's implementation were insufficient. The CIO risk management team played a passive role in the model's development, approval, implementation, and monitoring. They viewed themselves more as consumers of the model rather than as responsible for its development and operation. This passive approach resulted in inadequate quality control and frequent formula and code changes in the spreadsheets. The internal report noted, "Data were uploaded manually without sufficient quality control. Spreadsheet-based calculations were conducted with insufficient controls and frequent formula and code changes were made". This lack of oversight and quality control further compromised the reliability of the model.

Regulatory Oversight Failures

Regulatory oversight was inadequate throughout the development and implementation of the Synthetic Credit VaR Model. The OCC, JP Morgan's primary regulator, did not request critical performance data and failed to act on risk limit breaches. As highlighted in the Journal of Financial Crises, "JPM did not provide the OCC with required monthly reports... yet the OCC did not request the missing data". This lack of proactive oversight allowed significant issues to go unnoticed and unaddressed.

Additionally, the OCC was informed of risk limit breaches but did not investigate the causes or implications of these breaches. For instance, the OCC was contemporaneously notified in January 2012 that the CIO exceeded its Value at Risk (VaR) limit and the higher bank-wide VaR limit for four consecutive days. However, the OCC did not investigate why the breach happened or inquire why a new model would cause such a large reduction in VaR. This failure to follow up on critical risk indicators exemplified the shortcomings in regulatory oversight.

How JP Morgan Could Have Done Things Differently?

Improved Model Development Processes

One of the primary ways JP Morgan could have avoided the failure of the Synthetic Credit VaR Model was by improving the model development processes. Implementing automated systems for data management could have significantly reduced the risk of human error and improved accuracy. Manual data entry and copying and pasting of data in Excel spreadsheets were inherently risky practices. By automating these processes, the bank could have ensured more reliable and consistent data management.

Moreover, allocating experienced personnel and adequate resources for model development and testing would have ensured more robust results. The individual responsible for developing the model lacked prior experience in VaR models, and the resources allocated to the project were inadequate. By involving experts in the field and providing sufficient support, the bank could have developed a more sophisticated and reliable model. As highlighted in the internal report, "Inadequate resources were dedicated to the development of the model".

Conducting extensive back-testing and validation against existing models could have identified potential discrepancies and flaws. The Model Review Group did not conduct thorough testing of the new model, relying on limited back-testing. By implementing a more rigorous testing process, the bank could have validated the model's accuracy and reliability before its implementation.

Enhanced Oversight and Governance

Enhanced oversight and governance could have prevented the failure of the Synthetic Credit VaR Model. Ensuring regular, detailed reporting to regulators and internal oversight bodies would have maintained transparency and accountability. JP Morgan failed to provide the OCC with required monthly reports, and the OCC did not request the missing data. By establishing regular reporting protocols and ensuring compliance, the bank could have maintained better oversight of the model's performance.

Addressing risk limit breaches promptly and thoroughly would have mitigated escalating risks. The OCC was informed of risk limit breaches but did not investigate the causes or implications of these breaches. By taking immediate action to address and rectify risk limit breaches, the bank could have prevented further escalation of risks. Proactive risk management is crucial in identifying and mitigating potential issues before they lead to significant losses.

Implementing continuous monitoring and review processes for all models and strategies could have identified issues before they led to significant losses. The CIO risk management team played a passive role in the model's development, approval, implementation, and monitoring. By adopting a more proactive approach to monitoring and reviewing the model, the bank could have ensured that potential issues were identified and addressed promptly. Continuous monitoring and review processes are essential in maintaining the accuracy and reliability of risk management models.

Comprehensive Risk Management Framework

Developing a comprehensive risk management framework could have further strengthened JP Morgan's ability to manage risks effectively. This framework should have included clear policies and procedures for model development, implementation, and monitoring. By establishing a robust risk management framework, the bank could have ensured that all aspects of the model's lifecycle were adequately managed.

Additionally, enhancing collaboration and communication between different teams involved in risk management could have improved the model's reliability. The CIO risk management team viewed themselves more as consumers of the model rather than as responsible for its development and operation. By fostering collaboration and communication between different teams, the bank could have ensured that all stakeholders were actively involved in the model's development and monitoring.

Closing Thoughts

The failure of JP Morgan's Synthetic Credit VaR Model underscores the critical importance of rigorous development, testing, and oversight in financial risk management. This incident serves as a cautionary tale for financial institutions relying on complex models and emphasizes the need for robust governance and proactive risk management strategies. By learning from this failure, financial institutions can develop more reliable and effective risk management frameworks.

The insights gleaned from this case study are not just relevant to JP Morgan but to the broader financial industry, which increasingly depends on complex models to manage risk. By addressing the systemic issues that contributed to the failure and implementing the strategies outlined in this case study, financial institutions can mitigate similar risks in the future.

In conclusion, the London Whale incident highlights the vulnerabilities within JP Morgan's risk management framework and underscores the potential dangers of relying heavily on quantitative models without adequate oversight. By enhancing model development processes, improving oversight and governance, and developing a comprehensive risk management framework, financial institutions can ensure more reliable and effective risk management practices.

If you are an executive sponsor, steering committee member, or a non-executive board member and want to learn what you need to do so that your project does not land on my list with project failures? Then my (Non)-Executive Crash Course is what you are looking for.

If you want to know where you are standing with that large, multi-year, strategic project? Or you think one of your key projects is in trouble? Then a Project Review is what you are looking for.

If you just want to read more project failure case studies? Then have a look at the overview of all case studies I have written here.

Sources

1) Internal Report of JPMorgan Chase & Co. Management Task Force Regarding 2012 CIO Losses, January 16, 2013

2) A whale in shallow waters: JPMorgan Chase, the “London Whale” and the organisational catastrophe of 2012, François Valérian, November 2017

3) JPMorgan Chase London Whale E: Supervisory Oversight, Arwin G. Zeissler and Andrew Metrick, Journal of Financial Crises, 2019

4) JPMorgan Chase London Whale C: Risk Limits, Metrics, and Models, Arwin G. Zeissler and Andrew Metrick, Journal of Financial Crises, 2019

5) JPMorgan Chase Whale Trades: A Case History of Derivatives Risks and Abuses, Permanent Subcommittee on Investigations United States Senate, 2013

Read more…

Monday, July 01, 2024

Boards Must Understand Technology. Period.

Boards Must Understand Technology. Period.

Reflecting on the 2024 Swiss Board Day in Bern it has become even more clear to me that understanding the current technological landscape and its associated opportunities, challenges, and risks is now essential for both executive and non-executive board members. 

Equally important is staying informed about governance issues related to these technologies, including regulatory challenges and potential pitfalls. 

There is now way around it anymore, in order to set the company's vision and strategy, the board must understand how technology impacts the business and its future value creation.

Consider the narratives surrounding artificial reality (AI). While ChatGPT brought large language models into the spotlight, various AI applications like face ID, image recognition, customer service chatbots, and expert systems for tasks such as self-driving cars and chess have been in use for decades. 

Despite media focus on the risks of AI, such as deep fakes and cyber threats, there are significant defensive benefits, including enhancing cybersecurity and verification processes. Boards need to understand AI’s role within their organizations, lead the way in defining “responsible AI,” and ensure issues like privacy, bias, and equity are addressed in AI development and deployment.

Clients, regulators, and markets now expect rapid and effective integration of new business drivers into strategies. Building trust around new technologies with internal and external stakeholders is crucial. 

Cybersecurity, augmented reality (AR), robotics, and AI are just a few examples where companies must identify, measure, disclose, and adapt to strategic opportunities and risks. Not all technology is relevant for your company, but the ones that are should be evaluated in detail.

How can a board effectively oversee the long-term growth and evolution of their company amidst ongoing new opportunities and challenges, especially if they lack specific knowledge on existing and emerging technologies and its risks? 

Boards should start with leveraging internal company resources. You should seek out knowledge by visiting your company's offices, attend small group sessions, do production tours, and join town halls to witness new developments firsthand and understand their strategic alignment.

Dedicated training and workshops with relevant experts can help you grasp the business implications of key technologies within their industry. Your trainer(s) should have experience implementing technology in your industry. 

What is even more important is that your trainer(s) can explain technology in a way that non-technical people can understand and are able to apply their newly won knowledge onto their business.

The aim isn’t to create a board of tech experts but to shift mindsets, open new possibilities, evaluate risks, and enhance the board’s ability to challenge management in business development.

In a nutshell: In order to set the company's vision and strategy, the board must understand how technology impacts the business and its future value creation.

If your board is in need for such a training or workshop have a look at my offerings;

> (Non)-Executive Crash Course - Technology Trends Shaping Our Future

> (Non)-Executive Workshop - Technology Vision Definition

Read more…

Friday, June 14, 2024

How To Select a Good Project Manager for Your Large and Complex Transformation Project

How To Select a Good Project Manager for Your Large and Complex Transformation Project

One of your most important jobs as a project sponsor is to select a good project manager for your project. 

Selecting the right project manager is crucial for the success of your project. 

Here are the five key factors to consider when choosing the right person for the role:

1) Experience

Nothing beats relevant experience when it comes to managing large and complex transformation projects. On smaller and less complex projects you can give people a chance. On your business critical projects you should not. 

You will need to be looking for project managers that have managed projects that were;

> in the same industry. Bonus points when it was in your own company or a direct competitor.

> having a similar objective and scope. After a full cycle SAP implementation at three different companies you understand a thing or two. Unless it were completely different modules and products. 

> having a similar size and complexity. Rolling out a new software in one country is different from doing it in twelve. Having hundreds of products, thousands of clients

Your project manager should have a track record. Check references and past project outcomes. 

A project gone belly up is not necessarily the fault of the project manager, but you will need to look for successful project completions and satisfied clients or employers.

2) Leadership and Communication Skills

A good project manager should be able to lead a team, make decisions, and motivate team members. Effective communication is critical for ensuring that all stakeholders are on the same page. You will get a feeling for this during your interviews. But the easiest way to check this is by checking references and calling your own contacts that might have worked with them.

3) Problem Understanding and Solving Skills

They should be able to analyse and understand problems and come up with effective solutions quickly. Understanding your problem is half the solution.  You can assess this by presenting a number of the problems you want to address with your project to the project manager in an interview and ask them to come up with a solution on the fly. 

4) Team Dynamics

They should be able to work well with you and your existing team. Ensure the project manager’s work style and values align with your team and company’s culture. Micromanagement sucks for everybody. Involve key team members in the interview process to get their input on potential candidates.

5) Gut Feeling

If your intuition about a candidate’s fit is good, but one or more of the 4 factors above seems to be not good, then look for a better candidate. Don't rely only on your intuition in this case.

If your intuition about a candidate’s fit is bad, but all of the 4 factors above seem to be good, then look for a better candidate. Trust your intuition in this case.

If your candidate scores well on these five factors there is a high probability they are the right candidate for the job!

PS: What is absolutely not important are certifications. Possessing the PMP shouts to the world that they have passed a comprehensive exam and confirmed that they are aware of and understand the processes, terms, tools, and techniques as represented in the PMI's Guide to the Project Management Body of Knowledge. Thats it! The same for Prince 2, SAFe, IPMA, and others. 

Passing these exams does not confirm that they are an accomplished project manager with a long history of leading successful projects. To claim or even imply that earning such a certification is any more than an indicator of general knowledge in the field is questionable.

In a nutshell: Nothing beats relevant experience when it comes to managing large and complex transformation projects. On smaller and less complex projects you can give people a chance. On your business critical projects you should not.

If you are a senior (non)-executive in the role of a project sponsor or steering committee member in a large and complex transformation project, and you are confronted with topics like the above have a look at this training;

(Non)-Executive Crash Course - How to navigate large and complex transformation projects.

I will teach you the most relevant things you need to know in half a day.

Read more…

Sunday, March 19, 2023

Case Study 17: The Disastrous Launch of Healthcare.gov

Case Study 17: The Disastrous Launch of Healthcare.gov

Barack Obama was inaugurated on January 20, 2009, after defeating his opponent John McCain by 365 electoral college votes to 175. One of Obama's primary campaign issues was fixing America's healthcare system by providing affordable options to the 43.8 million uninsured Americans. 

In 2010, the year Obama signed the Affordable Care Act (ACA), the United States spent 17.6% of its GDP on health care, nearly double the OECD average of 9.5%, with the next closest developed nation, the Netherlands, spending 12%.

The 44th president was successful in introducing the ACA; however, the launch of the website that would connect Americans to the marketplace, Healthcare.gov, was a failure. While the platform would eventually enroll an estimated 10 million uninsured Americans in 2014, the rollout was a complete disaster that exposed the challenges the United States government faces in implementing technology.

According to a 2008 report by the Government Accountability Office (GAO), 48% of federal IT projects had to be restructured because of cost overages or changes in project goals. In addition, over half had to be restarted two or more times.

On the first day Heathcare.gov was launched, four million unique users visited the portal, but only six successfully registered. Over the next few days, the site experienced eight million visitors, but according to estimates, around 1% enrolled in a new healthcare plan. Even the users that did sign up experienced errors, including duplicates in enrollment applications submitted to insurers.

The trouble launching Healthcare.gov presents a seemingly reoccurring problem when the US government tech projects. Standish Group International Chairman Jim Johnson is on record praising the rollout based on the government's history of software failing by default. "Anyone who has written a line of code or built a system from the ground up cannot be surprised or even mildly concerned that Healthcare.gov did not work out of the gate. The real news would have been if it actually did work. The very fact that most of it did work at all is a success in itself."

However, there's far more to the failed launch of the federally facilitated marketplace (FFM). The agency responsible for the project, the Centers for Medicare and Medicaid Services (CMS), didn't follow many regulations in place to ensure transparency, proper oversight, and accountability. So was the project destined to fail from the start due to overwhelming layers of bureaucracy, or were the vendors tasked with developing the online marketplace to blame?

In this case study, we'll examine why Healthcare.gov failed to meet expectations and point out what CMS could have done differently to deliver a functioning FFM.

If you are an executive sponsor, steering committee member, or a non-executive board member and want to learn what you need to do so that your project does not land on my list with project failures? Then my (Non)-Executive Crash Course is what you are looking for.

If you want to know where you are standing with that large, multi-year, strategic project? Or you think one of your key projects is in trouble? Then a Project Review is what you are looking for.

If you just want to read more project failure case studies? Then have a look at the overview of all case studies I have written here.

Timeline of Events

2010

On March 23, 2010, President Barack Obama signed ACA, also known as Obamacare, into law. The legislation was the most comprehensive reform of the US medical system in 50 years and is still in place today.

Under the ACA, US citizens were required to have health insurance or pay a monthly fee. The law also required the establishment of online marketplaces that would allow individuals to compare and select health insurance policies by January 1, 2014. States could set up their own marketplace or use the FFM.

Each marketplace created under the ACA was intended to provide a seamless, single point of access for individuals to enroll in qualified healthcare plans and access income-based financial subsidies created under the new law.

Users were required to visit Healthcare.gov, register, verify their identity, determine their eligibility for subsidies, and enroll in a plan. The process appears straightforward; President Obama even touted the marketplace weeks before its launch by saying, "Now, this is real simple. It's a website where you can compare and purchase affordable health insurance plans, side by side, the same way you shop for a plane ticket on kayak… the same way you shop for a TV on Amazon."

However, building an identity verification platform on such a large scale alone is exceptionally challenging. The marketplace also required integration from databases in other government agencies. Once the user successfully was verified as an American citizen, income was determined, and they were filtered through state and federal government programs like Medicaid or the State Children's Health Insurance Program, then they would be matched with private health insurance plans.

The process was not simple and was far more complex than online shopping because it required integration with identification verification software, other government databases, and health insurance providers.

From day one, the project was underestimated. In addition, the requirements in the ACA that all citizens must enroll by January 1, 2014 or would be required to pay a fine created a hard deadline with economic and political consequences.

March 2010 - September 2011

Over a year passed between the ACA becoming a law and the CMS signing contracts with vendors who would build the FFM. During this period, problems directly affecting the launch were already beginning.

While the CMS was in charge of oversight of the project and hiring contractors, leadership was fractured across multiple government agencies. The project was headed by the CMS's Deputy CIO, Henry Chao, but the committee also included:

Todd Park – White House CTO

Jeanne Lambrew – Executive Office of Health Reform

Kathleen Sebelius and Bryan Sivak – Department of Health and Human Services

Members of the committee outside of the CMS executed a great deal of power and influence over the project; however, no one at the various agencies had visibility of the critical milestones that each group needed to reach to complete the project successfully.

The CMS awarded 60 contracts to 33 different vendors. The largest was granted to Conseillers en Gestion et Informatique (CGI), a Montreal-based IT company that employed more than 70,000 people. CGI grew to be worth more than $11 billion by 2013 by acquiring other companies, some of which handled US government contracts such as the 2004 purchase of American management Systems and the 2010 purchase of Stanley.

CGI’s contract consisted of developing the FFM and was valued at over $90 million. CGI was responsible for the most significant, user-facing aspect of the project but was not officially assigned a lead integrator role. CMS would later report they perceived CGI to be the project's lead integrator but didn't have written documentation outlining the agreement.

Representatives from CGI stated they did not have the same understanding at this point of the project.

US federal agencies are required to perform a procurement framework when awarding private companies with government contracts outlined by the Federal Acquisition Regulation (FAR). However, CMS failed to satisfy specific aspects of FAR, including:

> Preparing a written procurement strategy documenting the factors, assumptions, risks, and tradeoffs that would guide project decisions.

> Conduct thorough past performance reviews of potential contractors.

> Only used the federal government's contractor database (PPIRS) when evaluating bids on four of the six key contracts.

CMS leadership later claimed they were unaware that a procurement strategy was required. 

December 2011 – Summer 2013

Development of the Healthcare.gov project began in December 2011 consisting of four major components:

> Front-end website for the FFM (the marketplace)

Back-end data services hub

Enterprise identity management sub-system

Hosting infrastructure

CGI was responsible for the front-end website. The UI was developed with standard web tools, including Bootstrap, CSS, jQuery, Jekyll, Prose.io, and Ruby; however, it would later be revealed that common optimization features to aggregate and minify CSS and JS files were not used.

The back-end data services hub was developed by CGI and Quality Software Services, Inc (QSSI) using Java, JBoss, and the NoSQL MarkLogic database. The hub was responsible for orchestrating data and services from multiple external sources such as agent brokers, insurers, CMS, DHS, Experian, the IRS, state insurance exchanges, and the US Social Security Administration (SSA). While the integration was incredibly complex, utilizing data from multiple sources, the developers chose machine-generated middleware objects to save time.

Enterprise Identity Management (EIDM) was handled by QSSI but depended on the back-end to retrieve data from multiple sources. Before the launch, the EIDM was tested with an expected load of only 2,000 concurrent users.

The final major system component was the hardware infrastructure hosting of the website, FFM, data services hub, and EIDM. Akamai's CDN hosted Healthcare.gov's UI. The original back-end (it would be replaced after the failed launch) consisted of 48 VMWare virtual machine nodes running Red Hat Enterprise Linux and hosted on twelve physical servers in a Terremark data center. Some of the servers ran vSphere v4.1, with others running v5.1; the network was also running at 1 Gb/sec, far below its capacity of 4 Gb/sec.

Failures on all critical components of the project exemplify that CMS didn't have the personnel or experience to handle an IT project of this magnitude.

Late 2013

With only a couple of months left before the scheduled launch, CMS raised its concerns about CGI's performance but didn't take steps to hold the contractor accountable. In September 2013, CMS moved into CGI's offices for on-site support.

CMS delayed governance reviews that would have exposed the issues and did not receive the required approvals to move forward. However, they decided to continue with an on-schedule launch with all the platform's features, available to every American citizen needing affordable healthcare. The estimated project cost at this point was $500 million.

On October 1, 2013, the Healthcare.gov website was online. Most visitors experienced crashes, delays, errors, and slow performance throughout the week. By the weekend, the decision was made to take the site down because it was practically unusable.

Later in the month, HHS announced the following changes to the project:

Project management was centralized and led by Jeffrey Zients, former OMB director who had a reputation within the Whitehouse for solving tough problems and managing teams.

Todd Park, White House CTO, reorganized the technology leadership team, demoted some underperforming CMS employees and 3rd party contractors, and recruited top talent from Silicon Valley for a government sabbatical to save the site.

A Tiger team was formed with the narrow mandate of getting the FFM working properly.

The new team scrummed daily, triaged existential risks, and prioritized defects based on urgency. Over the next six weeks, the Tiger team resolved 400 system defects, increased system concurrency to 25,000 users, and improved the site's responsiveness to one second. The site went back online, and enrollment jumped from 26,000 in October to 975,000 in December.  

By Christmas, most problems had been fixed, but the site was still not fully operational.

2014

CGI was replaced by Accenture as the lead contractor and awarded a $90 billion contract to replace the FFM.

The individual mandate requirement was pushed back to March 31, 2014, giving uninsured Americans more time to sign up without being penalized.  

In July, the GOA released a detailed report outlining the critical failures of the project.

According to the GOA's findings, FFM obligations increased from $56 million to more than $209 million. Similarly, data hub obligations increased from $30 million to nearly $85 million from September 2011 to February 2014. The study recommended that "CMS take immediate actions to assess increasing contract costs and ensure that acquisition strategies are completed, and oversight tools are used as required, among other actions. CMS concurred with four recommendations and partially concurred with one. "

In August, the Office of Inspector General released a report finding that the total cost of the Healthcare.gov website had reached $1.7 billion. A month later, Bloomberg News reported the cost exceeded $2 billion.

By November, open enrollment on Healthcare.gov began for 2015.

What Went Wrong?

A multitude of issues caused the failed launch of the Healthcare.gov website. While it is a clear example of the federal government's continuous struggle to implement functioning, secure software, the problems go beyond Washington's bureaucracy. Nevertheless, much can be learned about releasing digital solutions and managing large-scale projects with many moving parts in general.

Overconfidence

The project started with overconfidence and unrealistic expectations set by the White House. Obama's campaign staff had a reputation for being technologically savvy because they pioneered using social media and data mining in the 2008 presidential election. However, running a social media campaign and releasing a single point of contact that pulls from multiple government agencies and insurance companies aren't comparable.

Underestimated Scale of the Project

Due to overconfidence and unrealistic expectations, the project scale was drastically underestimated, resulting in mismanagement in organizational structure, leadership, accountability, and transparency.

As the deadline approached, the project scope grew, while CMS identified 45 critical and 324 severe code defects across FFM modules.

Politics

Launching large-scale software projects are extremely challenging but adding a hostile political climate made the rollout even more difficult. Not only did CMS not have the personnel or experience to handle an FFM, but they also experienced pressure and influence from outside the agency. One of the most significant examples came in August of 2013, the White House and executive Office of Health Reform insisted on requiring site user registration before shopping for insurance so that concrete user numbers could be shown as proof of the system's success.

Members of the project committee that weren't from the CMS exhibited a great deal of influence over critical decision-making while not having access to accurate progress reports. In addition, the 2012 presidential elections likely impacted delays. Polarizing decisions, such as final rules on private insurance premiums, coverage availability, risk pools, and catastrophic plans, were put off till after the election cycle. These rules had to be translated into software and tested before a successful rollout was possible.

Lack of technology understanding and experience at CMS

The CMS was not prepared to handle a technology project at this scale. Other government agencies, such as the DOD and NASA, had decades of experience navigating the institutional challenges required to develop, deliver, and operate reliable IT systems.

Throughout procurement, development, and launch, CMS made it clear that its personnel didn't understand the requirements necessary to oversee a large-scale technology project, let alone one that had additional regulatory hurdles set by government agencies.

Poor Project Management

The project committee was spread across various government agencies, including the CMS, the White House, the Office of Health Reform, and the Department of Health and Human Services. As a result, there wasn't an organizational structure standard on even small-scale projects. Fractured leadership contributed to the lack of project management, but CMS was primarily responsible. While the problem could have been lessened if CMS had followed the guidelines from FAR, the operators from the agency pleaded ignorance in the GAO report, strengthening the case that CMS didn't have personnel suitable for the project.  

Failed to Postpone Launch

All the problems accumulated, resulting in a failed launch. Typically, the release is delayed when a website or software isn't ready. The engineers perform more testing, fix the problems, and launch at a later time. Healthcare.gov was a unique project with consequences transcending a poor user experience. The time constraints pressured CMS to go forward with the launch rather than being transparent and communicating that the infrastructure couldn't handle millions of users.

Leading up to the launch, CMS was given more business rules and a broader scope. While they had no control over the pressure coming from the Whitehouse, the agency failed to be upfront, and instead of delaying the launch or releasing in stages, they scrambled to save the project.

How CMS Could Have Done Things Differently

CMS was in charge of the project, but the blame falls on the Obama administration. Appointing CMS to lead the project was the first mistake made in a number of shortcomings that led to billions of wasted tax dollars and US citizens being delayed health care coverage.

While appointing a different agency, one that had experience delivering technology projects, was the first significant error, analyzing the oversights made by CMS leading up to the launch is the most practical way to assess what could have been done differently.  

Preparation

An understanding of the scale of Healthcare.gov would have dramatically influenced CMS to prepare better and implement project management best practices. One way the leadership at CMS could have prevented a launch failure was to realize they couldn't handle heading the project. Assigning lead manager and integrator roles to an outside firm or the lead vendor would have been more sensible.

Procurement

Many institutional challenges out of the hands of CMS contributed to the failed launch of Healthcare.gov. However, the agency had complete control over the procurement of government contractors. Simply following the Federal Acquisition Regulation (FAR) procurement framework could have prevented many organizational problems that led to the failure.

Had CMS adhered to standard government agency procurement guidelines, issues like the confusion around who was lead integrator wouldn't have existed. Furthermore, decisions like depending on data provided by Experian, a data source that neither the government nor the other contractors could do any data quality work on, would have likely been questioned if not denied when submitted for approval.

Adopt Iterative Software Development Framework

The project would have benefited if an iterative system development philosophy and a lean software manufacturing process such as Scrum or Kanban were adopted. Project managers could have organized sprints driven by the top priorities, including the complete end-to-end testing of the technology solution. An Iterative Development Framework would have increased visibility across multiple contractors and federal agencies, improved development quality, and reduced time to market.

Strong Leadership

Leadership was a fundamental problem that affected every aspect of the Healthcare.gov launch. Distribution of authority, management, and accountability created an environment where a functioning FFM was impossible to deliver on time. The project steering committee should have elected one project executive to make final decisions, hold contractors accountable, and communicate with other government agencies.

See "Consensus Is the Absence of Leadership" for  for more insights on this topic.

Set and Guard Technical Standards

A rushed, unqualified, and fractured committee led to the development of poor technology. As a result, CGI and other contractors experience zero to very little oversight allowing for an unfinished UI, partially operating back-end, and unstable hosting. 

The developers from CGI delivered an unpolished UI with excessive typos, a bloated directory, sloppy code, and even Lorem Ipsum placeholder text on the web pages. In addition, best practices were not followed or ignored regarding file compression, causing the website to take eight seconds to load and 71 seconds for user account registration pages with client-side loading, according to a report by AppDynamics. 

A basic small business website delivered at this standard would be unacceptable, let alone the focal point of one of the most transformative pieces of legislation in recent US history.

CGI should have been held to higher standards and required to deliver a polished, functioning UI.

While the front end was a disaster, fixing HTML, CSS, and JS and optimizing webpages can be done overnight. Healthcare.gov's back-end was a different story requiring systemic changes to function. More oversight was needed on the database and server-side development, but it could have only gone smoothly with drastic changes to leadership and organizational structure.

Improve Security

Healthcare.gov requires an abundance of personal data to be submitted and collected. While security wasn't the primary failure of the project, the servers were hacked in July 2014. The malware was uploaded into the system but failed to communicate to any external IP addresses. In addition, multiple security defects were found, including the insecure transmission of personal data, unvalidated password resets, error stack traces revealing internal components, and violations of user data privacy.

Comprehensive security audits must be conducted before the launch rather than on the fly after a site is live.

Implement Testing and Bug Fixing Protocols

Adequate testing could have prevented many of the FFM's problems; however, CMS was well aware of the issues but failed to communicate with HHS and other government agencies. Still, testing protocols and a strategy to manage and fix bugs are necessary for an IT project of this scale.

CMS needed to coordinate unit and component testing by 3rd parties much sooner than a couple of weeks before launch. Testing conducted by teams outside of specific features of the project ensures there aren't biases and that the project works on various devices, browsers, and servers.

Another issue CMS encountered was communication with states implementing their own healthcare marketplaces. The date was pushed from November 2012 to December 2012, and some were confirmed as late as February 2013. Uncertainty of the traffic volume should have led to overpreparation and expanded capacity testing of concurrent users. Instead, a load of only 2,000 simultaneous users has been tested rather than the tens of thousands they should have expected.  

Phased or Staged Rollout

One way the project steering committee could have responded to the issues without pushing back the launch was to release the project in phases or stages. Below are multiple options the committee could have taken instead of the Big Bang launch approach:

> Released certain features that were ready or could have been prepared when prioritized before October 1, 2013.

> Roll out a beta version of the platform to a small number of applicants (by region, government employees, sample group, etc.) months before the deadline.

> Release the platform in strategic phases leading up to the deadline, e.g., encourage applicants to register early in August, request eligibility in September, and shop for plans in October.

> Limit the scope of the project months before the launch and focus on minimal requirements rather than continue expanding leading up to the hard deadline.

See "How Your Rollout in Waves Can End in a Tsunami" for more insights on this topic.

Face Reality

The problems facing the Healthcare.gov launch were apparent, and there's evidence in the GAO report that suggests CMS was well aware of the issues. In addition, a McKinsey report was released just a few months before the scheduled rollout in April 2013. The report highlighted the initiative's complexity and identified more than a dozen critical risks spanning the marketplace technology and project governance. 

The problems we've covered were clearly outlined in the report, as well as multiple definitions of success and concerns with a Big Bang launch approach. The McKinsey report also suggested several actionable methods to mitigate the risks. Still, the project steering committee did not act upon the McKinsey report's findings and recommendations before the system launch.

Whether CMS caved under the pressure of the deadline, the political consequences, or was just incompetent enough to expect a positive outcome is unclear. All the warning signs pointed to an unsuccessful launch and should have been taken seriously, resulting in making the necessary adjustments.

See "It Is Time to Face Your Project's Reality" for more insights on this topic.

Closing Thoughts

The Healthcare.gov project exemplifies just about everything that can go wrong with software integration between the federal government and the private sector. The project's failures are incredibly complicated due to the influence of multiple government agencies and a hostile political climate. When one problem is identified, more come to the surface with increasingly challenging solutions.

But was the project doomed to fail just because of the layers of government bureaucracy? I don't believe so.

Had the committee executed strong leadership with personnel who had experience working on institutional software, the project could have gone much differently. The remarkable aspect of Healthcare.gov is how fast it was turned around, given the state of the project after the launch. Once the Whitehouse was fully aware of the problems and competent leaders were put in place, the site was made functional in a few weeks and essentially operational in December 2013.

In a nutshell: Experienced leadership leads to realistic expectations, a healthy organizational structure, and strong communication. Never underestimate the importance of the person or group that is in charge.

If you are an executive sponsor, steering committee member, or a non-executive board member and want to learn what you need to do so that your project does not land on my list with project failures? Then my (Non)-Executive Crash Course is what you are looking for.

If you want to know where you are standing with that large, multi-year, strategic project? Or you think one of your key projects is in trouble? Then a Project Review is what you are looking for.

If you just want to read more project failure case studies? Then have a look at the overview of all case studies I have written here.

Sources

> https://www.bloomberg.com/news/articles/2014-09-24/obamacare-website-costs-exceed-2-billion-study-finds

> https://oig.hhs.gov/oei/reports/oei-03-14-00231.asp

> https://hackernoon.com/small-is-beautiful-the-launch-failure-of-healthcare-gov-5e60f20eb967

> https://www.appdynamics.com/blog/product/technical-deep-dive-whats-impacting-healthcare-gov/

https://www.gao.gov/products/gao-14-694


Read more…

Monday, November 14, 2022

How Your Rollout in Waves Can End in a Tsunami

How Your Rollout in Waves Can End in a Tsunami

Many multinational organizations are bringing larger system implementations to a screeching halt because they misunderstand what it means to do a rollout in waves. 

We’re probably all familiar with the “phased rollout”. A phased rollout means you roll a project out to all targeted users at once but don’t deploy all of its planned functionality.

A good example of this would be rolling out a new CRM system to your organization. You go live in the first phase with Contact, Client, and Opportunity Management, and Account Management and Pipeline Management follow in the second phase.

Another popular type of rollout is the so-called staged rollout (also known as a rollout in waves). A rollout in waves or stages means that all the planned functionalities will be rolled out at once, but not for all users.

A rollout in waves gives you time to analyze the system’s quality, stability, and performance against your business goals. You can then decide if you want to roll out the system to more users, wait for more data, or stop the rollout. 

A rollout in waves is one of the core building blocks of making continuous delivery a reality. Facebook, Netflix, Microsoft, Google, and similar companies all rely heavily on staged rollouts.

One wave rollout method frequently used by multinational companies for new system implementation is the rollout by country or geographical territory. 

This is the preferred approach by companies implementing a new CRM, ERP, HCM, or some other key business application. Sometimes it’s combined with a phased approach.

Rolling out in waves is usually a good idea, especially compared to a “big bang” rollout. 

But before undertaking a rollout in waves, you have to carefully consider the following three realities:

1) The moment you switch on a new system in one country, you’ll need to address a bunch of Business As Usual (BAU) activities including Release Management, Change Management, New User Training … you name it. Your users will also discover bugs in the system and/or interfaces that weren’t discovered during testing. Many of them will be critical and need to be fixed ASAP. You’ll probably find that performance issues will be more common than not. Some companies call the first few months “Hyper Care” or some equivalent, but it is nothing else as BAU. 

2) As is always the case with a new system, it won’t work completely as expected. In addition to the bugs that need to be addressed within the BAU process, you’ll have a high number of Change Requests, because only now will users realize they need additional or different functionality to do their work. Again, a number of these requests will be critical and/or urgent. Users will probably ask for many additional reports because they don’t understand the data they see in the new system. If you combine your rollout in waves with a phased rollout, you’ll need to build and test the functionalities for the next phase.

3) At the same time, you’ll want to proceed with the next waves of your rollout, and you’ll need people to work on this. Think about discovery, migration, configuration, training, etc. for each new country that needs to be onboarded. The big idea is always to have one system for everyone, but local legislation and regulations and differences in how business is done in each country will force you to implement additional Change Requests in the system.

The number-one mistake I see is that organizations allocate a single team to accomplish all of the above tasks after the first-wave rollout. This approach always fails miserably and will bring the rollout to a screeching halt.

For a successful rollout in waves, you’ll need three different teams after the first wave:  one for BAU activities, one to deliver Change Requests, and one to onboard additional waves. Some people may work on more than one team, but this really should be the exception.

You’ll need to plan and budget for these teams, hire and train people for them, and define their organizational setup. 

And you’ll need to do all of this before you go live with the first wave – not after!

In a nutshell: You will need three teams for a successful system rollout in waves.

Read more…

Saturday, November 12, 2022

My Talk "Why Big Technology Projects Fail" @ Synergy

Why Big Technology Projects Fail@ Synergy
In September I was invited by Synergex to give a talk at their 2022 Synergy DevPartner Conference. 

The title of my talk was "Why Big Technology Projects Fail" and it covers my personal top ten reasons why this happens so often. 

If you want to watch the talk you can do this as Synergex have put the talk online.  

And if you think I might be a good match for speaking at one of your events just have a look at my speaking page

Read more…