Tuesday, July 02, 2024

Case Study 18: How Excel Errors and Risk Oversights Cost JP Morgan $6 Billion

Case Study 18: How Excel Errors and Risk Oversights Cost JP Morgan $6 Billion

In the spring of 2012, JP Morgan Chase & Co. faced one of the most significant financial debacles in recent history, known as the "London Whale" incident. The debacle resulted in losses amounting to approximately $6 billion, fundamentally shaking the confidence in the bank's risk management practices. 

At the core of this catastrophe was the failure of the Synthetic Credit Portfolio Value at Risk (VaR) Model, a sophisticated financial tool intended to manage the risk associated with the bank's trading strategies. 

The failure of the VaR model not only had severe financial repercussions but also led to intense scrutiny from regulators and the public. It highlighted the vulnerabilities within JP Morgan's risk management framework and underscored the potential dangers of relying heavily on quantitative models without adequate oversight. 

This case study explores the intricacies of what went wrong and how such failures can be prevented in the future. By analyzing this incident, I seek to understand the systemic issues that contributed to the failure and to identify strategies that can mitigate similar risks in other financial institutions. The insights gleaned from this case are not just relevant to JP Morgan but to the broader financial industry, which increasingly depends on complex models to manage risk.

Background

The Synthetic Credit Portfolio (SCP) at JP Morgan was a part of the bank's Chief Investment Office (CIO), which managed the company's excess deposits through various investments, including credit derivatives. The SCP was specifically designed to hedge against credit risk by trading credit default swaps and other credit derivatives. The portfolio aimed to offset potential losses from the bank's other exposures, thereby stabilizing overall performance.

In 2011, JP Morgan developed the Synthetic Credit VaR Model to quantify and manage the risk associated with the SCP. The model was intended to provide a comprehensive measure of the potential losses the bank could face under various market conditions. This would enable the bank to make informed decisions about its trading strategies and risk exposures. The VaR model was implemented using a series of Excel spreadsheets, which were manually updated and managed.

Despite the sophistication of the model, its development was plagued by several critical issues. The model's architect lacked prior experience in developing VaR models, and the resources allocated to the project were inadequate. This led to a reliance on manual processes, increasing the risk of errors and inaccuracies. Furthermore, the model's implementation and monitoring were insufficiently rigorous, contributing to the eventual failure that led to massive financial losses.

The primary objective of JP Morgan's Synthetic Credit VaR Model was to provide an accurate and reliable measure of the risk associated with the bank's credit derivatives portfolio. This would enable the bank to manage its risk exposures effectively, ensuring that its trading strategies remained within acceptable limits. The model aimed to capture the potential losses under various market conditions, allowing the bank to make informed decisions about its investments.

In addition to the primary objective, the Synthetic Credit VaR Model was expected to provide a foundation for further advancements in the bank's risk management practices. By leveraging the insights gained from the model, JP Morgan hoped to develop more sophisticated tools and techniques for managing risk. This would enable the bank to stay ahead of emerging threats and maintain a competitive edge in the financial industry.

If you are an executive sponsor, steering committee member, or non-executive board member and want to learn what you need to do so that your project does not land on my list of project failures, then my (Non)-Executive Crash Course is what you are looking for.

If you want to know where you stand with that large, multi-year, strategic project, or you think one of your key projects is in trouble, then a Project Review is what you are looking for.

If you just want to read more project failure case studies, then have a look at the overview of all the case studies I have written here.

Timeline of Events

Early 2011: Development of the Synthetic Credit VaR Model begins. The project is led by an individual with limited experience in developing VaR models. The model is built using Excel spreadsheets, which are manually updated and managed.

September 2011: The Synthetic Credit VaR Model is completed and implemented within the CIO. The model is intended to provide a comprehensive measure of the potential losses the bank could face under various market conditions.

January 2012: Increased trading activity in the SCP causes the CIO to exceed its stress loss risk limits. This breach continues for seven weeks. The bank informs the OCC of the ongoing breach, but no additional details are provided, and the matter is dropped.

March 23, 2012: Ina Drew, head of the CIO, orders a halt to SCP trading due to mounting concerns about the portfolio's risk exposure.

April 6, 2012: Bloomberg and the Wall Street Journal publish reports on the London Whale, revealing massive positions in credit derivatives held by Bruno Iksil and his team.

April 9, 2012: Thomas Curry becomes the 30th Comptroller of the Currency. Instead of planning for the upcoming 150th anniversary of the Office of the Comptroller of the Currency (OCC), Mr. Curry is confronted with the outbreak of news reports about the London Whale incident.

April 16, 2012: JP Morgan provides regulators with a presentation on SCP. The presentation states that the objective of the "Core Credit Book" since its inception in 2007 was to protect against a significant downturn in credit. However, internal reports indicate growing losses in the SCP.

May 4, 2012: JP Morgan reports SCP losses of $1.6 billion for the second quarter. The losses continue to grow rapidly even though active trading has stopped.

December 31, 2012: Total SCP losses reach $6.2 billion, marking one of the most significant financial debacles in the bank's history.

January 2013: The OCC issues a Cease and Desist Order against JP Morgan, directing the bank to correct deficiencies in its derivatives trading activity. The Federal Reserve issues a related Cease and Desist Order against JP Morgan's holding company.

September - October 2013: JP Morgan settles with regulators, paying $1.020 billion in penalties. The OCC levies a $300 million fine for inadequate oversight and governance, insufficient risk management processes, and other deficiencies.

What Went Wrong?

Model Development and Implementation Failures

The development of JP Morgan's Synthetic Credit VaR Model was marred by several critical issues. The model was built using Excel spreadsheets, which involved manual data entry and copying and pasting of data. This approach introduced significant potential for errors and inaccuracies. As noted in JP Morgan's internal report, "the spreadsheets ‘had to be completed manually, by a process of copying and pasting data from one spreadsheet to another’". This manual process was inherently risky, as even a minor error in data entry or formula could lead to significant discrepancies in the model's output.
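
As a purely illustrative sketch (not the bank's actual spreadsheet logic), consider how a single wrong denominator in a manually maintained volatility calculation can roughly halve the result, and with it every VaR figure scaled from it:

```python
# Purely illustrative: a single wrong denominator in a manual calculation
# can roughly halve a volatility estimate and the VaR figure derived from it.
old_rate, new_rate = 0.042, 0.038            # two hypothetical spread observations

correct = (new_rate - old_rate) / ((new_rate + old_rate) / 2)   # divide by the average
buggy   = (new_rate - old_rate) / (new_rate + old_rate)         # divide by the sum

print(f"correct relative change: {correct:.2%}")   # -10.00%
print(f"buggy relative change:   {buggy:.2%}")     # -5.00%, half the true size
# A volatility series built from the buggy changes understates risk by roughly 50%,
# and no amount of copying and pasting the output from sheet to sheet would flag it.
```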

Furthermore, the individual responsible for developing the model lacked prior experience in creating VaR models. This lack of expertise, combined with inadequate resources, resulted in a model that was not robust enough to handle the complexities of the bank's trading strategies. The internal report highlighted this issue: "The individual who was responsible for the model’s development had not previously developed or implemented a VaR model, and was also not provided sufficient support". This lack of support and expertise significantly compromised the quality and reliability of the model.

Insufficient Testing and Monitoring

The Model Review Group (MRG) did not conduct thorough testing of the new model. They relied on limited back-testing and did not compare results with the existing model. This lack of rigorous testing meant that potential issues and discrepancies were not identified and addressed before the model was implemented. The internal report criticized this approach: "The Model Review Group’s review of the new model was not as rigorous as it should have been". Without comprehensive testing, the model was not validated adequately, leading to unreliable risk assessments.

Moreover, the monitoring and oversight of the model's implementation were insufficient. The CIO risk management team played a passive role in the model's development, approval, implementation, and monitoring. They viewed themselves more as consumers of the model rather than as responsible for its development and operation. This passive approach resulted in inadequate quality control and frequent formula and code changes in the spreadsheets. The internal report noted, "Data were uploaded manually without sufficient quality control. Spreadsheet-based calculations were conducted with insufficient controls and frequent formula and code changes were made". This lack of oversight and quality control further compromised the reliability of the model.

Regulatory Oversight Failures

Regulatory oversight was inadequate throughout the development and implementation of the Synthetic Credit VaR Model. The OCC, JP Morgan's primary regulator, did not request critical performance data and failed to act on risk limit breaches. As highlighted in the Journal of Financial Crises, "JPM did not provide the OCC with required monthly reports... yet the OCC did not request the missing data". This lack of proactive oversight allowed significant issues to go unnoticed and unaddressed.

Additionally, the OCC was informed of risk limit breaches but did not investigate the causes or implications of these breaches. For instance, the OCC was contemporaneously notified in January 2012 that the CIO exceeded its Value at Risk (VaR) limit and the higher bank-wide VaR limit for four consecutive days. However, the OCC did not investigate why the breach happened or inquire why a new model would cause such a large reduction in VaR. This failure to follow up on critical risk indicators exemplified the shortcomings in regulatory oversight.

How JP Morgan Could Have Done Things Differently

Improved Model Development Processes

One of the primary ways JP Morgan could have avoided the failure of the Synthetic Credit VaR Model was by improving the model development processes. Implementing automated systems for data management could have significantly reduced the risk of human error and improved accuracy. Manual data entry and copying and pasting of data in Excel spreadsheets were inherently risky practices. By automating these processes, the bank could have ensured more reliable and consistent data management.

Moreover, allocating experienced personnel and adequate resources for model development and testing would have ensured more robust results. The individual responsible for developing the model lacked prior experience in VaR models, and the resources allocated to the project were inadequate. By involving experts in the field and providing sufficient support, the bank could have developed a more sophisticated and reliable model. As highlighted in the internal report, "Inadequate resources were dedicated to the development of the model".

Conducting extensive back-testing and validation against existing models could have identified potential discrepancies and flaws. The Model Review Group did not conduct thorough testing of the new model, relying on limited back-testing. By implementing a more rigorous testing process, the bank could have validated the model's accuracy and reliability before its implementation.
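
At its core, a VaR back-test is simply a count of how often realized losses exceeded the model's estimate, compared with the number of breaches the confidence level allows. The sketch below uses hypothetical numbers, but the same comparison, run against both the old and the new model before switching, is the kind of check that was skipped:

```python
import random

def backtest_var(daily_pnl, daily_var, confidence=0.99):
    """Count how often realized losses were worse than the VaR estimate.

    daily_pnl: realized daily P&L (losses are negative numbers)
    daily_var: the model's VaR for each day, expressed as a positive loss amount
    """
    breaches = sum(1 for pnl, var in zip(daily_pnl, daily_var) if pnl < -var)
    expected = (1 - confidence) * len(daily_pnl)   # e.g. ~2.5 breaches in a 250-day year at 99%
    return breaches, expected

# Hypothetical data: one trading year of P&L and a flat 99% VaR estimate.
random.seed(0)
pnl = [random.gauss(0, 1_000_000) for _ in range(250)]
var = [2_330_000] * 250                            # ~99th percentile of the simulated P&L
breaches, expected = backtest_var(pnl, var)
print(f"breaches: {breaches}, expected at 99%: {expected:.1f}")
# Far more breaches than expected means the model understates risk; far fewer breaches,
# or a sudden large drop in reported VaR under a new model, deserves the same scrutiny.
```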

Enhanced Oversight and Governance

Enhanced oversight and governance could have prevented the failure of the Synthetic Credit VaR Model. Ensuring regular, detailed reporting to regulators and internal oversight bodies would have maintained transparency and accountability. JP Morgan failed to provide the OCC with required monthly reports, and the OCC did not request the missing data. By establishing regular reporting protocols and ensuring compliance, the bank could have maintained better oversight of the model's performance.

Addressing risk limit breaches promptly and thoroughly would have mitigated escalating risks. The OCC was informed of risk limit breaches but did not investigate the causes or implications of these breaches. By taking immediate action to address and rectify risk limit breaches, the bank could have prevented further escalation of risks. Proactive risk management is crucial in identifying and mitigating potential issues before they lead to significant losses.

Implementing continuous monitoring and review processes for all models and strategies could have identified issues before they led to significant losses. The CIO risk management team played a passive role in the model's development, approval, implementation, and monitoring. By adopting a more proactive approach to monitoring and reviewing the model, the bank could have ensured that potential issues were identified and addressed promptly. Continuous monitoring and review processes are essential in maintaining the accuracy and reliability of risk management models.

Comprehensive Risk Management Framework

Developing a comprehensive risk management framework could have further strengthened JP Morgan's ability to manage risks effectively. This framework should have included clear policies and procedures for model development, implementation, and monitoring. By establishing a robust risk management framework, the bank could have ensured that all aspects of the model's lifecycle were adequately managed.

Additionally, enhancing collaboration and communication between different teams involved in risk management could have improved the model's reliability. The CIO risk management team viewed themselves more as consumers of the model rather than as responsible for its development and operation. By fostering collaboration and communication between different teams, the bank could have ensured that all stakeholders were actively involved in the model's development and monitoring.

Closing Thoughts

The failure of JP Morgan's Synthetic Credit VaR Model underscores the critical importance of rigorous development, testing, and oversight in financial risk management. This incident serves as a cautionary tale for financial institutions relying on complex models and emphasizes the need for robust governance and proactive risk management strategies. By learning from this failure, financial institutions can develop more reliable and effective risk management frameworks.

The insights gleaned from this case study are not just relevant to JP Morgan but to the broader financial industry, which increasingly depends on complex models to manage risk. By addressing the systemic issues that contributed to the failure and implementing the strategies outlined in this case study, financial institutions can mitigate similar risks in the future.

In conclusion, the London Whale incident highlights the vulnerabilities within JP Morgan's risk management framework and underscores the potential dangers of relying heavily on quantitative models without adequate oversight. By enhancing model development processes, improving oversight and governance, and developing a comprehensive risk management framework, financial institutions can ensure more reliable and effective risk management practices.

If you are an executive sponsor, steering committee member, or non-executive board member and want to learn what you need to do so that your project does not land on my list of project failures, then my (Non)-Executive Crash Course is what you are looking for.

If you want to know where you stand with that large, multi-year, strategic project, or you think one of your key projects is in trouble, then a Project Review is what you are looking for.

If you just want to read more project failure case studies, then have a look at the overview of all the case studies I have written here.

Sources

1) Internal Report of JPMorgan Chase & Co. Management Task Force Regarding 2012 CIO Losses, January 16, 2013

2) A whale in shallow waters: JPMorgan Chase, the “London Whale” and the organisational catastrophe of 2012, François Valérian, November 2017

3) JPMorgan Chase London Whale E: Supervisory Oversight, Arwin G. Zeissler and Andrew Metrick, Journal of Financial Crises, 2019

4) JPMorgan Chase London Whale C: Risk Limits, Metrics, and Models, Arwin G. Zeissler and Andrew Metrick, Journal of Financial Crises, 2019

5) JPMorgan Chase Whale Trades: A Case History of Derivatives Risks and Abuses, Permanent Subcommittee on Investigations United States Senate, 2013

Read more…

Monday, July 01, 2024

Boards Must Understand Technology. Period.

Boards Must Understand Technology. Period.

Reflecting on the 2024 Swiss Board Day in Bern, it has become even clearer to me that understanding the current technological landscape and its associated opportunities, challenges, and risks is now essential for both executive and non-executive board members.

Equally important is staying informed about governance issues related to these technologies, including regulatory challenges and potential pitfalls. 

There is no way around it anymore: in order to set the company's vision and strategy, the board must understand how technology impacts the business and its future value creation.

Consider the narratives surrounding artificial intelligence (AI). While ChatGPT brought large language models into the spotlight, various AI applications like face ID, image recognition, customer service chatbots, and expert systems for tasks such as self-driving cars and chess have been in use for decades.

Despite media focus on the risks of AI, such as deep fakes and cyber threats, there are significant defensive benefits, including enhancing cybersecurity and verification processes. Boards need to understand AI’s role within their organizations, lead the way in defining “responsible AI,” and ensure issues like privacy, bias, and equity are addressed in AI development and deployment.

Clients, regulators, and markets now expect rapid and effective integration of new business drivers into strategies. Building trust around new technologies with internal and external stakeholders is crucial. 

Cybersecurity, augmented reality (AR), robotics, and AI are just a few examples where companies must identify, measure, disclose, and adapt to strategic opportunities and risks. Not all technologies are relevant for your company, but the ones that are should be evaluated in detail.

How can a board effectively oversee the long-term growth and evolution of their company amidst ongoing new opportunities and challenges, especially if they lack specific knowledge of existing and emerging technologies and their risks?

Boards should start by leveraging internal company resources. Seek out knowledge by visiting your company's offices, attending small group sessions, doing production tours, and joining town halls to witness new developments firsthand and understand their strategic alignment.

Dedicated training and workshops with relevant experts can help you grasp the business implications of key technologies within your industry. Your trainer(s) should have experience implementing technology in your industry.

What is even more important is that your trainer(s) can explain technology in a way that non-technical people can understand, so that they are able to apply their newly gained knowledge to their business.

The aim isn’t to create a board of tech experts but to shift mindsets, open new possibilities, evaluate risks, and enhance the board’s ability to challenge management in business development.

In a nutshell: In order to set the company's vision and strategy, the board must understand how technology impacts the business and its future value creation.

If your board is in need of such a training or workshop, have a look at my offerings:

> (Non)-Executive Crash Course - Technology Trends Shaping Our Future

> (Non)-Executive Workshop - Technology Vision Definition

Read more…

Friday, June 14, 2024

How To Select a Good Project Manager for Your Large and Complex Transformation Project

How To Select a Good Project Manager for Your Large and Complex Transformation Project

One of your most important jobs as a project sponsor is to select a good project manager for your project. 

Selecting the right project manager is crucial for the success of your project. 

Here are the five key factors to consider when choosing the right person for the role:

1) Experience

Nothing beats relevant experience when it comes to managing large and complex transformation projects. On smaller and less complex projects you can give people a chance. On your business critical projects you should not. 

You will need to look for project managers who have managed projects that were:

> in the same industry. Bonus points when it was in your own company or a direct competitor.

> having a similar objective and scope. After a full-cycle SAP implementation at three different companies you understand a thing or two. Unless those were completely different modules and products.

> having a similar size and complexity. Rolling out new software in one country is different from doing it in twelve, and having hundreds of products and thousands of clients is different from having a handful of each.

Your project manager should have a track record. Check references and past project outcomes. 

A project gone belly up is not necessarily the fault of the project manager, but you will need to look for successful project completions and satisfied clients or employers.

2) Leadership and Communication Skills

A good project manager should be able to lead a team, make decisions, and motivate team members. Effective communication is critical for ensuring that all stakeholders are on the same page. You will get a feeling for this during your interviews. But the easiest way to check this is by checking references and calling your own contacts that might have worked with them.

3) Problem Understanding and Solving Skills

They should be able to analyse and understand problems and come up with effective solutions quickly. Understanding your problem is half the solution. You can assess this by presenting, in an interview, a number of the problems you want to address with your project and asking the candidate to come up with a solution on the fly.

4) Team Dynamics

They should be able to work well with you and your existing team. Ensure the project manager’s work style and values align with your team and company’s culture. Micromanagement sucks for everybody. Involve key team members in the interview process to get their input on potential candidates.

5) Gut Feeling

If your intuition about a candidate's fit is good, but one or more of the 4 factors above seem weak, then look for a better candidate. Don't rely only on your intuition in this case.

If your intuition about a candidate’s fit is bad, but all of the 4 factors above seem to be good, then look for a better candidate. Trust your intuition in this case.

If your candidate scores well on these five factors there is a high probability they are the right candidate for the job!

PS: What is absolutely not important are certifications. Possessing the PMP shouts to the world that they have passed a comprehensive exam and confirmed that they are aware of and understand the processes, terms, tools, and techniques as represented in the PMI's Guide to the Project Management Body of Knowledge. That's it! The same goes for PRINCE2, SAFe, IPMA, and others.

Passing these exams does not confirm that they are an accomplished project manager with a long history of leading successful projects. To claim or even imply that earning such a certification is any more than an indicator of general knowledge in the field is questionable.

In a nutshell: Nothing beats relevant experience when it comes to managing large and complex transformation projects. On smaller and less complex projects you can give people a chance. On your business critical projects you should not.

If you are a senior (non)-executive in the role of a project sponsor or steering committee member in a large and complex transformation project, and you are confronted with topics like the above, have a look at this training:

(Non)-Executive Crash Course - How to navigate large and complex transformation projects.

I will teach you the most relevant things you need to know in half a day.

Read more…

Sunday, March 19, 2023

Case Study 17: The Disastrous Launch of Healthcare.gov

Case Study 17: The Disastrous Launch of Healthcare.gov

Barack Obama was inaugurated on January 20, 2009, after defeating his opponent John McCain by 365 electoral college votes to 173. One of Obama's primary campaign issues was fixing America's healthcare system by providing affordable options to the 43.8 million uninsured Americans.

In 2010, the year Obama signed the Affordable Care Act (ACA), the United States spent 17.6% of its GDP on health care, nearly double the OECD average of 9.5%, with the next closest developed nation, the Netherlands, spending 12%.

The 44th president was successful in introducing the ACA; however, the launch of the website that would connect Americans to the marketplace, Healthcare.gov, was a failure. While the platform would eventually enroll an estimated 10 million uninsured Americans in 2014, the rollout was a complete disaster that exposed the challenges the United States government faces in implementing technology.

According to a 2008 report by the Government Accountability Office (GAO), 48% of federal IT projects had to be restructured because of cost overages or changes in project goals. In addition, over half had to be restarted two or more times.

On the first day Healthcare.gov was launched, four million unique users visited the portal, but only six successfully registered. Over the next few days, the site experienced eight million visitors, but according to estimates, around 1% enrolled in a new healthcare plan. Even the users that did sign up experienced errors, including duplicates in enrollment applications submitted to insurers.

The trouble launching Healthcare.gov presents a seemingly recurring problem with US government tech projects. Standish Group International Chairman Jim Johnson is on record praising the rollout based on the government's history of software failing by default. "Anyone who has written a line of code or built a system from the ground up cannot be surprised or even mildly concerned that Healthcare.gov did not work out of the gate. The real news would have been if it actually did work. The very fact that most of it did work at all is a success in itself."

However, there's far more to the failed launch of the federally facilitated marketplace (FFM). The agency responsible for the project, the Centers for Medicare and Medicaid Services (CMS), didn't follow many regulations in place to ensure transparency, proper oversight, and accountability. So was the project destined to fail from the start due to overwhelming layers of bureaucracy, or were the vendors tasked with developing the online marketplace to blame?

In this case study, we'll examine why Healthcare.gov failed to meet expectations and point out what CMS could have done differently to deliver a functioning FFM.

If you are an executive sponsor, steering committee member, or non-executive board member and want to learn what you need to do so that your project does not land on my list of project failures, then my (Non)-Executive Crash Course is what you are looking for.

If you want to know where you stand with that large, multi-year, strategic project, or you think one of your key projects is in trouble, then a Project Review is what you are looking for.

If you just want to read more project failure case studies, then have a look at the overview of all the case studies I have written here.

Timeline of Events

2010

On March 23, 2010, President Barack Obama signed the ACA, also known as Obamacare, into law. The legislation was the most comprehensive reform of the US medical system in 50 years and is still in place today.

Under the ACA, US citizens were required to have health insurance or pay a monthly fee. The law also required the establishment of online marketplaces that would allow individuals to compare and select health insurance policies by January 1, 2014. States could set up their own marketplace or use the FFM.

Each marketplace created under the ACA was intended to provide a seamless, single point of access for individuals to enroll in qualified healthcare plans and access income-based financial subsidies created under the new law.

Users were required to visit Healthcare.gov, register, verify their identity, determine their eligibility for subsidies, and enroll in a plan. The process appears straightforward; President Obama even touted the marketplace weeks before its launch by saying, "Now, this is real simple. It's a website where you can compare and purchase affordable health insurance plans, side by side, the same way you shop for a plane ticket on kayak… the same way you shop for a TV on Amazon."

However, building an identity verification platform on such a large scale is exceptionally challenging on its own. The marketplace also required integration with databases in other government agencies. Once the user was successfully verified as an American citizen, their income was determined and they were filtered through state and federal government programs like Medicaid or the State Children's Health Insurance Program; only then would they be matched with private health insurance plans.

The process was not simple and was far more complex than online shopping because it required integration with identification verification software, other government databases, and health insurance providers.
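
To make the complexity concrete, here is a heavily simplified, hypothetical sketch of the enrollment flow described above. Every step in the real system depended on an external party (SSA, IRS, Experian, state programs, insurers); the function names, thresholds, and data below are illustrative stand-ins, not CMS's actual interfaces:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Applicant:
    name: str
    state: str
    ssn: str
    household_income: float

def verify_identity(a: Applicant) -> bool:
    # Stand-in for SSA / credit-bureau identity checks
    return len(a.ssn) == 9

def screen_public_programs(a: Applicant) -> Optional[str]:
    # Stand-in for Medicaid / CHIP eligibility screening against state systems
    return "Medicaid" if a.household_income < 15_000 else None

def calculate_subsidy(a: Applicant) -> float:
    # Stand-in for the income-based subsidy determination using IRS data
    return max(0.0, 5_000 - 0.1 * a.household_income)

def enroll(a: Applicant) -> str:
    if not verify_identity(a):
        return "identity verification failed"
    program = screen_public_programs(a)
    if program:
        return f"routed to {program}"
    subsidy = calculate_subsidy(a)
    return f"shop private plans in {a.state} with a ${subsidy:,.0f} subsidy"

print(enroll(Applicant("Jane Doe", "TX", "123456789", 32_000.0)))
# Each of these calls hides an integration with a separate organization,
# which is what made the flow so much harder than ordinary online shopping.
```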

From day one, the project was underestimated. In addition, the ACA's requirement that all citizens enroll by January 1, 2014 or pay a fine created a hard deadline with economic and political consequences.

March 2010 - September 2011

Over a year passed between the ACA becoming a law and the CMS signing contracts with vendors who would build the FFM. During this period, problems directly affecting the launch were already beginning.

While the CMS was in charge of oversight of the project and hiring contractors, leadership was fractured across multiple government agencies. The project was headed by the CMS's Deputy CIO, Henry Chao, but the committee also included:

Todd Park – White House CTO

Jeanne Lambrew – Executive Office of Health Reform

Kathleen Sebelius and Bryan Sivak – Department of Health and Human Services

Members of the committee outside of the CMS exercised a great deal of power and influence over the project; however, no one at the various agencies had visibility of the critical milestones that each group needed to reach to complete the project successfully.

The CMS awarded 60 contracts to 33 different vendors. The largest was granted to Conseillers en Gestion et Informatique (CGI), a Montreal-based IT company that employed more than 70,000 people. CGI grew to be worth more than $11 billion by 2013 by acquiring other companies, some of which handled US government contracts, such as the 2004 purchase of American Management Systems and the 2010 purchase of Stanley.

CGI’s contract consisted of developing the FFM and was valued at over $90 million. CGI was responsible for the most significant, user-facing aspect of the project but was not officially assigned a lead integrator role. CMS would later report they perceived CGI to be the project's lead integrator but didn't have written documentation outlining the agreement.

Representatives from CGI stated they did not have the same understanding at this point of the project.

US federal agencies are required to follow the procurement framework outlined by the Federal Acquisition Regulation (FAR) when awarding government contracts to private companies. However, CMS failed to satisfy specific aspects of FAR, including:

> Preparing a written procurement strategy documenting the factors, assumptions, risks, and tradeoffs that would guide project decisions.

> Conducting thorough past performance reviews of potential contractors.

> Using the federal government's contractor database (PPIRS) when evaluating bids; it was consulted for only four of the six key contracts.

CMS leadership later claimed they were unaware that a procurement strategy was required. 

December 2011 – Summer 2013

Development of the Healthcare.gov project began in December 2011 and consisted of four major components:

> Front-end website for the FFM (the marketplace)

> Back-end data services hub

> Enterprise identity management sub-system

> Hosting infrastructure

CGI was responsible for the front-end website. The UI was developed with standard web tools, including Bootstrap, CSS, jQuery, Jekyll, Prose.io, and Ruby; however, it would later be revealed that common optimization features to aggregate and minify CSS and JS files were not used.

The back-end data services hub was developed by CGI and Quality Software Services, Inc (QSSI) using Java, JBoss, and the NoSQL MarkLogic database. The hub was responsible for orchestrating data and services from multiple external sources such as agent brokers, insurers, CMS, DHS, Experian, the IRS, state insurance exchanges, and the US Social Security Administration (SSA). While the integration was incredibly complex, utilizing data from multiple sources, the developers chose machine-generated middleware objects to save time.

Enterprise Identity Management (EIDM) was handled by QSSI but depended on the back-end to retrieve data from multiple sources. Before the launch, the EIDM was tested with an expected load of only 2,000 concurrent users.

The final major system component was the hardware infrastructure hosting of the website, FFM, data services hub, and EIDM. Akamai's CDN hosted Healthcare.gov's UI. The original back-end (it would be replaced after the failed launch) consisted of 48 VMWare virtual machine nodes running Red Hat Enterprise Linux and hosted on twelve physical servers in a Terremark data center. Some of the servers ran vSphere v4.1, with others running v5.1; the network was also running at 1 Gb/sec, far below its capacity of 4 Gb/sec.

Failures across all critical components of the project show that CMS didn't have the personnel or experience to handle an IT project of this magnitude.

Late 2013

With only a couple of months left before the scheduled launch, CMS raised its concerns about CGI's performance but didn't take steps to hold the contractor accountable. In September 2013, CMS moved into CGI's offices for on-site support.

CMS delayed governance reviews that would have exposed the issues and did not receive the required approvals to move forward. However, they decided to continue with an on-schedule launch with all the platform's features, available to every American citizen needing affordable healthcare. The estimated project cost at this point was $500 million.

On October 1, 2013, the Healthcare.gov website was online. Most visitors experienced crashes, delays, errors, and slow performance throughout the week. By the weekend, the decision was made to take the site down because it was practically unusable.

Later in the month, HHS announced the following changes to the project:

> Project management was centralized and led by Jeffrey Zients, a former OMB director who had a reputation within the White House for solving tough problems and managing teams.

> Todd Park, White House CTO, reorganized the technology leadership team, demoted some underperforming CMS employees and 3rd party contractors, and recruited top talent from Silicon Valley for a government sabbatical to save the site.

> A Tiger team was formed with the narrow mandate of getting the FFM working properly.

The new team scrummed daily, triaged existential risks, and prioritized defects based on urgency. Over the next six weeks, the Tiger team resolved 400 system defects, increased system concurrency to 25,000 users, and improved the site's responsiveness to one second. The site went back online, and enrollment jumped from 26,000 in October to 975,000 in December.  

By Christmas, most problems had been fixed, but the site was still not fully operational.

2014

CGI was replaced by Accenture as the lead contractor and awarded a $90 million contract to replace the FFM.

The individual mandate requirement was pushed back to March 31, 2014, giving uninsured Americans more time to sign up without being penalized.  

In July, the GAO released a detailed report outlining the critical failures of the project.

According to the GAO's findings, FFM obligations increased from $56 million to more than $209 million. Similarly, data hub obligations increased from $30 million to nearly $85 million from September 2011 to February 2014. The study recommended that "CMS take immediate actions to assess increasing contract costs and ensure that acquisition strategies are completed, and oversight tools are used as required, among other actions." CMS concurred with four recommendations and partially concurred with one.

In August, the Office of Inspector General released a report finding that the total cost of the Healthcare.gov website had reached $1.7 billion. A month later, Bloomberg News reported the cost exceeded $2 billion.

By November, open enrollment on Healthcare.gov began for 2015.

What Went Wrong?

A multitude of issues caused the failed launch of the Healthcare.gov website. While it is a clear example of the federal government's continuous struggle to implement functioning, secure software, the problems go beyond Washington's bureaucracy. Nevertheless, much can be learned about releasing digital solutions and managing large-scale projects with many moving parts in general.

Overconfidence

The project started with overconfidence and unrealistic expectations set by the White House. Obama's campaign staff had a reputation for being technologically savvy because they pioneered using social media and data mining in the 2008 presidential election. However, running a social media campaign and releasing a single point of contact that pulls from multiple government agencies and insurance companies aren't comparable.

Underestimated Scale of the Project

Due to overconfidence and unrealistic expectations, the project scale was drastically underestimated, resulting in mismanagement in organizational structure, leadership, accountability, and transparency.

As the deadline approached, the project scope grew, while CMS identified 45 critical and 324 severe code defects across FFM modules.

Politics

Launching large-scale software projects is extremely challenging, but adding a hostile political climate made the rollout even more difficult. Not only did CMS not have the personnel or experience to handle an FFM, but they also experienced pressure and influence from outside the agency. One of the most significant examples came in August 2013, when the White House and Executive Office of Health Reform insisted on requiring site user registration before shopping for insurance so that concrete user numbers could be shown as proof of the system's success.

Members of the project committee that weren't from the CMS exhibited a great deal of influence over critical decision-making while not having access to accurate progress reports. In addition, the 2012 presidential elections likely impacted delays. Polarizing decisions, such as final rules on private insurance premiums, coverage availability, risk pools, and catastrophic plans, were put off till after the election cycle. These rules had to be translated into software and tested before a successful rollout was possible.

Lack of technology understanding and experience at CMS

The CMS was not prepared to handle a technology project at this scale. Other government agencies, such as the DOD and NASA, had decades of experience navigating the institutional challenges required to develop, deliver, and operate reliable IT systems.

Throughout procurement, development, and launch, CMS made it clear that its personnel didn't understand the requirements necessary to oversee a large-scale technology project, let alone one that had additional regulatory hurdles set by government agencies.

Poor Project Management

The project committee was spread across various government agencies, including the CMS, the White House, the Office of Health Reform, and the Department of Health and Human Services. As a result, there wasn't even the organizational structure that is standard on small-scale projects. Fractured leadership contributed to the lack of project management, but CMS was primarily responsible. While the problem could have been lessened if CMS had followed the guidelines from FAR, the operators from the agency pleaded ignorance in the GAO report, strengthening the case that CMS didn't have personnel suitable for the project.

Failed to Postpone Launch

All the problems accumulated, resulting in a failed launch. Typically, the release is delayed when a website or software isn't ready. The engineers perform more testing, fix the problems, and launch at a later time. Healthcare.gov was a unique project with consequences transcending a poor user experience. The time constraints pressured CMS to go forward with the launch rather than being transparent and communicating that the infrastructure couldn't handle millions of users.

Leading up to the launch, CMS was given more business rules and a broader scope. While they had no control over the pressure coming from the White House, the agency failed to be upfront, and instead of delaying the launch or releasing in stages, they scrambled to save the project.

How CMS Could Have Done Things Differently

CMS was in charge of the project, but the blame falls on the Obama administration. Appointing CMS to lead the project was the first in a number of missteps that led to billions of wasted tax dollars and delayed health care coverage for US citizens.

While appointing a different agency, one that had experience delivering technology projects, was the first significant error, analyzing the oversights made by CMS leading up to the launch is the most practical way to assess what could have been done differently.  

Preparation

An understanding of the scale of Healthcare.gov would have dramatically influenced CMS to prepare better and implement project management best practices. One way the leadership at CMS could have prevented a launch failure was to realize they couldn't handle heading the project. Assigning lead manager and integrator roles to an outside firm or the lead vendor would have been more sensible.

Procurement

Many institutional challenges out of the hands of CMS contributed to the failed launch of Healthcare.gov. However, the agency had complete control over the procurement of government contractors. Simply following the Federal Acquisition Regulation (FAR) procurement framework could have prevented many organizational problems that led to the failure.

Had CMS adhered to standard government agency procurement guidelines, issues like the confusion around who was lead integrator wouldn't have existed. Furthermore, decisions like depending on data provided by Experian, a data source that neither the government nor the other contractors could do any data quality work on, would have likely been questioned if not denied when submitted for approval.

Adopt Iterative Software Development Framework

The project would have benefited if an iterative system development philosophy and a lean software manufacturing process such as Scrum or Kanban were adopted. Project managers could have organized sprints driven by the top priorities, including the complete end-to-end testing of the technology solution. An Iterative Development Framework would have increased visibility across multiple contractors and federal agencies, improved development quality, and reduced time to market.

Strong Leadership

Leadership was a fundamental problem that affected every aspect of the Healthcare.gov launch. Distribution of authority, management, and accountability created an environment where a functioning FFM was impossible to deliver on time. The project steering committee should have elected one project executive to make final decisions, hold contractors accountable, and communicate with other government agencies.

See "Consensus Is the Absence of Leadership" for more insights on this topic.

Set and Guard Technical Standards

A rushed, unqualified, and fractured committee led to the development of poor technology. As a result, CGI and other contractors experienced zero to very little oversight, allowing for an unfinished UI, a partially operating back-end, and unstable hosting.

The developers from CGI delivered an unpolished UI with excessive typos, a bloated directory, sloppy code, and even Lorem Ipsum placeholder text on the web pages. In addition, best practices regarding file compression were ignored, causing the website to take eight seconds to load and user account registration pages, with client-side loading, to take 71 seconds, according to a report by AppDynamics.

A basic small business website delivered at this standard would be unacceptable, let alone the focal point of one of the most transformative pieces of legislation in recent US history.

CGI should have been held to higher standards and required to deliver a polished, functioning UI.

While the front end was a disaster, fixing HTML, CSS, and JS and optimizing webpages can be done overnight. Healthcare.gov's back-end was a different story requiring systemic changes to function. More oversight was needed on the database and server-side development, but it could have only gone smoothly with drastic changes to leadership and organizational structure.

Improve Security

Healthcare.gov requires an abundance of personal data to be submitted and collected. While security wasn't the primary failure of the project, the servers were hacked in July 2014. Malware was uploaded into the system but failed to communicate with any external IP addresses. In addition, multiple security defects were found, including the insecure transmission of personal data, unvalidated password resets, error stack traces revealing internal components, and violations of user data privacy.

Comprehensive security audits must be conducted before the launch rather than on the fly after a site is live.

Implement Testing and Bug Fixing Protocols

Adequate testing could have prevented many of the FFM's problems; however, CMS was well aware of the issues but failed to communicate with HHS and other government agencies. Still, testing protocols and a strategy to manage and fix bugs are necessary for an IT project of this scale.

CMS needed to coordinate unit and component testing by 3rd parties much sooner than a couple of weeks before launch. Testing conducted by teams not involved in building specific features ensures there are no biases and that the project works on various devices, browsers, and servers.

Another issue CMS encountered was communication with states implementing their own healthcare marketplaces. The deadline for states to decide was pushed from November 2012 to December 2012, and some decisions were confirmed as late as February 2013. Uncertainty about the traffic volume should have led to overpreparation and expanded capacity testing of concurrent users. Instead, a load of only 2,000 simultaneous users was tested rather than the tens of thousands they should have expected.
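
A capacity test does not need to be sophisticated to be sized correctly. Below is a minimal, hypothetical concurrency smoke test (the URL and numbers are made up, not CMS's actual setup); real pre-launch testing at this scale needs dedicated, distributed load-generation tooling, but even a simple script makes the point that the test load should reflect expected demand, not a fraction of it:

```python
import time
from concurrent.futures import ThreadPoolExecutor
from urllib.request import urlopen

STAGING_URL = "https://staging.example.gov/registration"   # hypothetical endpoint
TOTAL_REQUESTS = 20_000        # sized for expected peak demand, not 2,000
WORKERS = 500                  # concurrent workers cycling through the request load

def one_request(_):
    start = time.perf_counter()
    try:
        with urlopen(STAGING_URL, timeout=10) as resp:
            ok = resp.status == 200
    except Exception:
        ok = False
    return ok, time.perf_counter() - start

with ThreadPoolExecutor(max_workers=WORKERS) as pool:
    results = list(pool.map(one_request, range(TOTAL_REQUESTS)))

errors = sum(1 for ok, _ in results if not ok)
latencies = sorted(t for _, t in results)
p95 = latencies[int(0.95 * len(latencies))]
print(f"errors: {errors}/{TOTAL_REQUESTS}, p95 latency: {p95:.2f}s")
```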

Phased or Staged Rollout

One way the project steering committee could have responded to the issues without pushing back the launch was to release the project in phases or stages. Below are multiple options the committee could have taken instead of the Big Bang launch approach:

> Release only the features that were ready, or that could have been made ready with prioritization, before October 1, 2013.

> Roll out a beta version of the platform to a small number of applicants (by region, government employees, sample group, etc.) months before the deadline.

> Release the platform in strategic phases leading up to the deadline, e.g., encourage applicants to register early in August, request eligibility in September, and shop for plans in October.

> Limit the scope of the project months before the launch and focus on minimal requirements rather than continue expanding leading up to the hard deadline.

See "How Your Rollout in Waves Can End in a Tsunami" for more insights on this topic.

Face Reality

The problems facing the Healthcare.gov launch were apparent, and there's evidence in the GAO report that suggests CMS was well aware of the issues. In addition, a McKinsey report was released in April 2013, just a few months before the scheduled rollout. The report highlighted the initiative's complexity and identified more than a dozen critical risks spanning the marketplace technology and project governance.

The problems we've covered were clearly outlined in the report, as well as multiple definitions of success and concerns with a Big Bang launch approach. The McKinsey report also suggested several actionable methods to mitigate the risks. Still, the project steering committee did not act upon the McKinsey report's findings and recommendations before the system launch.

Whether CMS caved under the pressure of the deadline, the political consequences, or was just incompetent enough to expect a positive outcome is unclear. All the warning signs pointed to an unsuccessful launch and should have been taken seriously, resulting in making the necessary adjustments.

See "It Is Time to Face Your Project's Reality" for more insights on this topic.

Closing Thoughts

The Healthcare.gov project exemplifies just about everything that can go wrong with software integration between the federal government and the private sector. The project's failures are incredibly complicated due to the influence of multiple government agencies and a hostile political climate. When one problem is identified, more come to the surface with increasingly challenging solutions.

But was the project doomed to fail just because of the layers of government bureaucracy? I don't believe so.

Had the committee exercised strong leadership with personnel who had experience working on institutional software, the project could have gone much differently. The remarkable aspect of Healthcare.gov is how fast it was turned around, given the state of the project after the launch. Once the White House was fully aware of the problems and competent leaders were put in place, the site was made functional in a few weeks and essentially operational in December 2013.

In a nutshell: Experienced leadership leads to realistic expectations, a healthy organizational structure, and strong communication. Never underestimate the importance of the person or group that is in charge.

If you are an executive sponsor, steering committee member, or non-executive board member and want to learn what you need to do so that your project does not land on my list of project failures, then my (Non)-Executive Crash Course is what you are looking for.

If you want to know where you stand with that large, multi-year, strategic project, or you think one of your key projects is in trouble, then a Project Review is what you are looking for.

If you just want to read more project failure case studies, then have a look at the overview of all the case studies I have written here.

Sources

> https://www.bloomberg.com/news/articles/2014-09-24/obamacare-website-costs-exceed-2-billion-study-finds

> https://oig.hhs.gov/oei/reports/oei-03-14-00231.asp

> https://hackernoon.com/small-is-beautiful-the-launch-failure-of-healthcare-gov-5e60f20eb967

> https://www.appdynamics.com/blog/product/technical-deep-dive-whats-impacting-healthcare-gov/

> https://www.gao.gov/products/gao-14-694


Read more…

Monday, November 14, 2022

How Your Rollout in Waves Can End in a Tsunami

How Your Rollout in Waves Can End in a Tsunami

Many multinational organizations are bringing larger system implementations to a screeching halt because they misunderstand what it means to do a rollout in waves. 

We’re probably all familiar with the “phased rollout”. A phased rollout means you roll a project out to all targeted users at once but don’t deploy all of its planned functionality.

A good example of this would be rolling out a new CRM system to your organization. You go live in the first phase with Contact, Client, and Opportunity Management, and Account Management and Pipeline Management follow in the second phase.

Another popular type of rollout is the so-called staged rollout (also known as a rollout in waves). A rollout in waves or stages means that all the planned functionalities will be rolled out at once, but not for all users.

A rollout in waves gives you time to analyze the system’s quality, stability, and performance against your business goals. You can then decide if you want to roll out the system to more users, wait for more data, or stop the rollout. 

A rollout in waves is one of the core building blocks of making continuous delivery a reality. Facebook, Netflix, Microsoft, Google, and similar companies all rely heavily on staged rollouts.
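
Under the hood, such consumer-scale staged rollouts are usually gated by deterministic bucketing: each user is hashed into a stable bucket, and the rollout percentage is widened wave by wave as quality and performance hold up. Here is a minimal sketch (names and percentages are illustrative); enterprise rollouts by country simply pick the wave explicitly instead:

```python
import hashlib

def rollout_bucket(user_id: str, buckets: int = 100) -> int:
    """Map a user to a stable bucket 0-99, so widening a wave never reshuffles users."""
    digest = hashlib.sha256(user_id.encode()).hexdigest()
    return int(digest, 16) % buckets

def in_current_wave(user_id: str, rollout_percentage: int) -> bool:
    return rollout_bucket(user_id) < rollout_percentage

# Wave 1 covers 5% of users; later waves widen to 25%, 50%, and 100%.
for user in ["alice@example.com", "bob@example.com", "carol@example.com"]:
    print(user, in_current_wave(user, rollout_percentage=5))
```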

One wave rollout method frequently used by multinational companies for new system implementation is the rollout by country or geographical territory. 

This is the preferred approach by companies implementing a new CRM, ERP, HCM, or some other key business application. Sometimes it’s combined with a phased approach.

Rolling out in waves is usually a good idea, especially compared to a “big bang” rollout. 

But before undertaking a rollout in waves, you have to carefully consider the following three realities:

1) The moment you switch on a new system in one country, you'll need to address a bunch of Business As Usual (BAU) activities including Release Management, Change Management, New User Training … you name it. Your users will also discover bugs in the system and/or interfaces that weren't discovered during testing. Many of them will be critical and need to be fixed ASAP. You'll probably find that performance issues will be more common than not. Some companies call the first few months "Hyper Care" or something equivalent, but it is nothing other than BAU.

2) As is always the case with a new system, it won’t work completely as expected. In addition to the bugs that need to be addressed within the BAU process, you’ll have a high number of Change Requests, because only now will users realize they need additional or different functionality to do their work. Again, a number of these requests will be critical and/or urgent. Users will probably ask for many additional reports because they don’t understand the data they see in the new system. If you combine your rollout in waves with a phased rollout, you’ll need to build and test the functionalities for the next phase.

3) At the same time, you’ll want to proceed with the next waves of your rollout, and you’ll need people to work on this. Think about discovery, migration, configuration, training, etc. for each new country that needs to be onboarded. The big idea is always to have one system for everyone, but local legislation and regulations and differences in how business is done in each country will force you to implement additional Change Requests in the system.

The number-one mistake I see is that organizations allocate a single team to accomplish all of the above tasks after the first-wave rollout. This approach always fails miserably and will bring the rollout to a screeching halt.

For a successful rollout in waves, you’ll need three different teams after the first wave:  one for BAU activities, one to deliver Change Requests, and one to onboard additional waves. Some people may work on more than one team, but this really should be the exception.

You’ll need to plan and budget for these teams, hire and train people for them, and define their organizational setup. 

And you’ll need to do all of this before you go live with the first wave – not after!

In a nutshell: You will need three teams for a successful system rollout in waves.

Read more…

Saturday, November 12, 2022

My Talk "Why Big Technology Projects Fail" @ Synergy

In September I was invited by Synergex to give a talk at their 2022 Synergy DevPartner Conference. 

The title of my talk was "Why Big Technology Projects Fail", and it covers my personal top ten reasons why big technology projects fail so often.

If you want to watch the talk, you can do so, as Synergex has put the recording online.

And if you think I might be a good match for speaking at one of your events, just have a look at my speaking page.


Sunday, October 16, 2022

Case Study 16: Nike’s 100 Million Dollar Supply Chain "Speed bump"


“This is what you get for 400 million, huh?” 

Nike President and CEO Phil Knight famously raised the question in a conference call days before announcing the company would miss its third-quarter earnings by at least 28% due to a glitch in the new supply chain management software. The announcement would then send Nike’s stock down 19.8%. In addition, Dallas-based supply-chain vendor i2 Technologies, which Nike blamed for the failure, would suffer a 22.4% drop in stock price.

The relationship would ultimately cost Nike an estimated $100 million. Each company blamed the other for the failure, but the damage could have been dramatically reduced if realistic expectations had been set early on and a proper software implementation plan had been put in place. Most companies wouldn’t overcome such a disastrous supply chain glitch or “speed bump,” as Knight would call it, but Nike would recover due to its dominant position in the retail footwear and apparel market.

In 1999, two years before Knight’s famous outburst, Nike paid i2 $10 million to centralize its supply, demand, and collaboration planning system, with a total estimated implementation cost of $40 million. The i2 implementation was the first phase of The Nike Supply Chain (NSC) project. The plan was to implement i2 to replace the existing system and then introduce enterprise resource planning (ERP) software from SAP and customer relationship management (CRM) software from Siebel Systems.

The goal of the NSC project was to improve Nike’s existing 9-month product cycle and fractured supply chain. As the brand experienced rapid growth and market dominance in the 1990s, it accumulated 27 separate order management systems around the globe. Each was entirely different from the next and poorly linked to Nike’s headquarters in Beaverton, Oregon.

At the time, there wasn’t a model to follow at the scale Nike required. Competitors like Reebok struggled to find a functional supply chain solution specific to the retail footwear and apparel industry. In an effort to solidify its position as the leader in sportswear, Nike decided to move forward quickly with i2’s predictive demand application and its supply chain planner software.

"Once we got into this, we quickly realized that what we originally thought was going to be a two-to-three-year effort would be more like five to seven," - Roland Wolfram, Nike’s vice president of global operations and technology.

The NSC project would ultimately be a success, and Nike would eventually accomplish all its supply chain goals. However, the process took much longer than expected and cost the company an additional $100 million, much of which could have been avoided had the operators of both companies taken a different approach to implementation.

"I think it will, in the long run, be a competitive advantage." – Phil Knight

In the end, Knight was right, but there are many valuable lessons to learn from the Nike i2 failure.

If you are an executive sponsor, steering committee member, or a non-executive board member and want to learn what you need to do so that your project does not land on my list of project failures? Then my (Non)-Executive Crash Course is what you are looking for.

If you want to know where you stand with that large, multi-year, strategic project? Or you think one of your key projects is in trouble? Then a Project Review is what you are looking for.

If you just want to read more project failure case studies? Then have a look at the overview of all case studies I have written here.

So, before we get into the case study, let’s look at precisely what happened...

Timeline of Events

1996 - 1999

Nike experienced incredible growth during this period but was at a crossroads. Strategic endorsement deals and groundbreaking marketing campaigns gave the company a clear edge over Adidas and Reebok, its two most substantial competitors in the 80s and 90s. However, as Nike became a world-renowned athletics brand, its supply chain became more complex and challenging to manage.

Part of the strategy that separated Nike from its competitors was its centralized approach. Product design, factory contracting, and order fulfillment were coordinated from headquarters in Oregon. The process resulted in some of the most iconic designs and athlete partnerships in sports history. However, manufacturing was much more disorganized.

During the 1970s and 80s, Nike battled to develop and control the emerging Asian sneaker supply chain. Eventually, the brand won the market but struggled to expand because of the nine-month manufacturing cycle.

At the time, there wasn’t an established method to outsource manufacturing from Asia, making the ordering process disorganized and inefficient across the industry. In addition, Nike’s fractured order management system contained tens of millions of product numbers with different business rules and data formats. The brand needed a new way to measure consumer demand and manage purchasing orders, but the state of the legacy system would make implementing new software difficult.

1999

At the beginning of 1999, Nike decided to implement the first stage of its NSC project, integrating i2 with the existing system. The i2 software cost the company $10 million, and Nike estimated the entire project would cost upwards of $400 million. The project would be one of the most ambitious supply chain overhauls by a company of Nike’s size.

i2 Technologies is a Dallas, Texas-based software company specializing in designing solutions that simplify supply and demand chain management while maximizing efficiency and minimizing cost. Before the Nike relationship, i2 was an emerging player in logistics software with year-over-year growth. Involvement in the Nike project would position the company as the leading name in supply chain management software.

Nike’s vision for the i2 phase of NSC was “achieving greater flexibility in planning execution and delivery processes…looking for better forecasting and more profitable order fulfillment." When successfully implemented, the manufacturing cycle would be reduced from nine months to six. This would convert the supply chain to make-to-order rather than make-to-sell, an accomplishment not yet achieved in the footwear and apparel industry.

Predicting demand required inputting historical sales numbers into i2’s software. “Crystal balling” the market in this way had substantial support at the time among SCM companies. The belief that you could enter numbers into an algorithm and have it spit out a magical prediction didn’t age well; in any case, the methodology required reliable, uniform data sets to function.
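As a rough illustration of why uniform data matters here, consider a deliberately naive forecast: a simple moving average over historical sales. This is only a sketch with invented numbers; i2’s actual demand-planning algorithms were far more sophisticated, but they were just as dependent on the quality of the history fed into them.

```python
# Minimal sketch with invented figures; this is not i2's actual algorithm.
def moving_average_forecast(history: list[float], window: int = 3) -> float:
    """Forecast next-period demand as the average of the last `window` periods."""
    if len(history) < window:
        raise ValueError("not enough history to forecast")
    return sum(history[-window:]) / window

# A clean, uniform monthly sales history (pairs of shoes) gives a sensible forecast...
print(moving_average_forecast([120_000, 130_000, 125_000]))  # 125000.0

# ...but if one source reports weekly figures into the same feed, the forecast collapses.
print(moving_average_forecast([120_000, 130_000, 31_000]))   # ~93,667 (garbage in, garbage out)
```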

Nike decided on a “Big Bang” ERP approach when switching to i2 for supply chain management. With this method, the entire business goes live on the new system at once, rather than phasing out the old system gradually. Nike also opted for a single-instance strategy for the implementation. The CIO at the time, Gordon Steele, is quoted as saying, “single instance is a decision, not a discussion.” Typically, global corporations choose a multi-instance ERP solution, using separate instances in various regions or for different product categories.

2000

By June of 2000, various problems with the new system had already become apparent. According to documents filed by Nike and i2 shareholders in class-action suits, the system used different business rules and stored data in various formats, making integration difficult. In addition, the software needed customization beyond the 10-15% limit recommended by i2. Heavy customization slowed down the software. For example, entries were reportedly taking over a minute to be recorded. In addition, the SCM system frequently crashed as it struggled to handle Nike’s tens of millions of product numbers.

The issues persisted but were fixable. Unfortunately, the software was linked to core business processes, specifically factory orders, and sent a ripple effect through the business that resulted in over- and under-purchasing of critical products. The demand planner would also delete ordering data six to eight weeks after it was entered. As a result, planners couldn’t access purchasing orders that had already been sent to factories.

Problems in the system generated far too many factory orders for less popular shoes like the Air Garnett III and too few for popular shoes like the Air Jordan to meet market demand. Foot Locker was forced to reduce the price of the Air Garnett to $90 instead of the projected retail price of $140 to move the product. Many shoes were also delivered late due to production delays. As a result, Nike had to ship the shoes by plane at $4-$8 a pair instead of sending them across the Pacific by boat for $0.75.

November 2000

According to Nike, all the problems with i2’s supply chain management system were resolved by the fall. Once the issues were identified, Nike built manual workarounds. For example, programmers had to download data from i2’s demand predictor and reload it into the supply chain planner on a weekly basis. While the software glitches were fixed and orders were no longer being duplicated or disappearing, the damage was done. Sales for the following quarter were dramatically affected by the purchasing-order errors, resulting in over $100 million in lost sales.
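For illustration only, the weekly workaround described above amounts to a small export, transform, and reload job between two systems. The sketch below assumes hypothetical record layouts, field names, and figures; it does not reflect i2’s or SAP’s real interfaces.

```python
# Minimal, self-contained sketch of a weekly "bridge" between two planning systems.
# All field names and numbers are invented for illustration.

def export_from_demand_planner() -> list[dict]:
    # Stand-in for "download data from the demand predictor".
    return [
        {"product_id": "SHOE-A", "period": "2000-W45", "forecast": 250_000},
        {"product_id": "SHOE-B", "period": "2000-W45", "forecast": 40_000},
    ]

def load_into_supply_planner(rows: list[dict]) -> list[dict]:
    # Stand-in for "reload it into the supply chain planner":
    # remap each record to the layout the receiving system expects.
    return [
        {"sku": r["product_id"], "week": r["period"], "planned_pairs": r["forecast"]}
        for r in rows
    ]

if __name__ == "__main__":
    for record in load_into_supply_planner(export_from_demand_planner()):
        print(record)
```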

2001

Nike made the problem public on February 27, 2001. The company was forced to disclose the expected earnings shortfall to stakeholders to avoid repercussions from the SEC. As a result, the stock price dove 20%, numerous class-action lawsuits were filed, and Phil Knight famously voiced his opinion on the implementation: "This is what you get for $400 million, huh?"

In the meeting, Nike told shareholders they expected profits from the quarter to decline from around $0.50 a share to about $0.35. In addition, the inventory problems would persist for the next six to nine months as the overproduced products were sold off.

As for the future of NSC, the company, including its CEO and President, expressed optimism. Knight said, "We believe that we have addressed the issues around this implementation and that over the long term, we will achieve significant financial and organizational benefit from our global supply-chain initiative."

A spokeswoman from Nike also assured stakeholders that the problems would be resolved; she said that they were working closely with i2 to solve the problems by creating “some technical and operational workarounds” and that the supply chain software was now stable.

While Nike was positive about the implementation process moving forward, they placed full blame on the SCM software and i2 Technologies.

Nike stopped using i2’s demand-planning software for short- and medium-range sneaker planning; however, it still used the application for short-range planning in its emerging apparel business. By the spring of 2001, Nike had integrated i2 into its more extensive SAP ERP system, focusing more on orders and invoices than on predictive modeling.

What Went Wrong?

While the failures damaged each company’s reputation in the IT industry, both companies would go on to recover from the poorly executed software implementation. Each side has assigned blame outward, but after reviewing all the events, it's safe to say each had a role in the breakdown of the supply chain management system.

Underestimating Complexity

Implementing software at this scale always has risks. Tom Harwick, Giga Information Group’s research director for supply chain management, said, “Implementing a supply-chain management solution is like crossing a street, high risk if you don't look both ways, but if you do it right, low risk.”

One of Nike's most significant mistakes was underestimating the complexity of implementing software at such a large scale. According to Roland Wolfram, Nike’s operators had a false sense of security regarding the i2 installation because it was small compared to the larger NSC project. "This felt like something we could do a little easier since it wasn’t changing everything else [in the business]," he says. "But it turned out it was very complicated."

Part of the reason the project was so complicated was Nike’s fractured legacy supply chain system and disorganized data sets. i2’s software wasn’t designed for the footwear and apparel industry, let alone Nike’s unique position in the market.

Data Quality

Execution by both parties was also to blame. i2 Technologies is on record recommending that customization not exceed 10-15%. Nike and i2 should have recognized early on that it would be impossible to stay within this range given the state of the existing SCM systems.

Choosing a Big Bang implementation strategy didn’t make sense in this scenario. Nike’s legacy system data was too disorganized to be integrated into the i2 software without making dramatic changes before a full-on launch.

Poor Communication

Communication between Nike and i2 from 1999 to the summer of 2000 was poor. i2 claimed not to be aware of problems until Knight assigned blame publicly. Greg Brady, the President of i2 Technologies, who was directly involved with the project, reacted to the finger-pointing by saying, "If our deployment was creating a business problem for them, why were we never informed?" Brady also claimed, "There is no way that software is responsible for Nike's earnings problem." i2 blamed Nike’s failure to follow the customization limitations, which was driven by the need to link to Nike’s back-end systems.

Rush to Market

At the time, Nike was on the verge of solidifying its position as the leader in footwear and sports apparel for decades to come. Building a solid supply chain that could adapt to market trends and reduce the manufacturing cycle was the last step toward complete market dominance. In addition, the existing supply chain solutions built for the footwear and apparel industry weren’t ready to deploy on a large scale. This gave Nike the opportunity to develop its own SCM system, putting the company years ahead of competitors. Implementing functional demand-planning software would be highly valuable for Nike and its retail clients.

i2 was also experiencing market pressure to deploy a major project. Had the implementation gone smoothly, i2 would have gained a massive competitive advantage. The desire to please Nike likely played a role in i2’s missteps; the failure to set clear expectations and communicate throughout the process might not have happened with a less prominent client.

Failure to Train

After the problems became apparent in the summer of 2000, Nike had to hire consultants to create workarounds to make the SCM system operational. This clearly indicates that Nike’s internal team wasn’t adequately trained to handle the complexity of the new software.

Nike’s CIO at the time reflected on the situation. "Could we have taken more time with the rollout?" he asked. "Probably. Could we have done a better job with software quality? Sure. Could the planners have been better prepared to use the system before it went live? You can never train enough."

How Nike Could Have Done Things Differently

While Nike and i2 attempted to implement software that had never been successfully deployed in the global footwear and apparel industry, many problems could have been avoided. We can learn from the mistakes and from how Nike overcame its challenges with i2 to build a functioning ERP system.

Understanding and Managing Complexity

Nike’s failure to assess the complexity of the problem is at the root of the situation. Even though the i2 implementation was just the beginning of a larger project, it involved a significant transition from the legacy systems. Nike’s leadership should have realized the scale of the project and the importance of starting NSC off on the right foot.

i2 is also to blame for not setting realistic expectations. As a software vendor, i2 was responsible for making its client aware of the software’s limitations and the potential risks of a failed deployment.

See "Understanding and Managing Your Project’s Complexity" for more insights on this topic.

Collaborate with i2 Technologies

Both companies should have realized that Nike required more than 10-15% customization. Working together during the implementation process could have prevented the ordering issues that were the reason for the lost revenue.

Collaboration before deployment and in the early stages of implementation is critical when integrating a new system with fractured data. Nike and i2 should have coordinated throughout the process to ensure a smooth rollout; instead, both parties executed poor project management, resulting in significant financial and reputational blows.

See "Solving Your Between Problems" for more insights on this topic.

Hire a 3rd Party Integration Company

Nike’s failure to grasp the complexity of an SCM implementation is difficult to understand. If i2 was being truthful when it claimed not to know about the problems with its software, then Nike apparently decided not to involve the software company during the implementation process.

Assuming that is the case, Nike should have hired a 3rd party to help with the integration process. Unfortunately, Nike’s internal team was not ready for the project. Outside integrators could have prevented the problems before the damage was done.

Not seeking outside help may be the most significant aspect of Nike’s failure to implement a new SCM system.   

See "Be a Responsible Buyer of Technology" for more insights on this topic.

Deploy in Stages

A “Big Bang” implementation strategy was a massive mistake by Nike. While i2 should have made it clear this was not the logical path considering the capabilities of their software and Nike’s legacy system, this was Nike’s decision.

Ego, rush to market, or failure to understand the complexities of the project could all have been a factor in the decision. Lee Geishecker, a Gartner analyst, stated that Nike chose to go live a little over a year after starting the project, while projects of this scale should take two years before deployment. In addition, the system should be rolled out in stages, not all at once.

Brent Thill, an analyst at Credit Suisse First Boston, is on record saying he would have kept the old system running for three years while testing i2’s software. In another analysis, Larry Lapide commented on the i2 project by saying, "Whenever you put software in, you don't go big bang, and you don't go into production right away. Usually, you get these bugs worked out . . . before it goes live across the whole business."

Train Employees Sufficiently

At the time, Nike’s planners weren’t prepared for the project. While we will never know what would have happened if the team had been adequately trained, proper preparation would have put Nike in a much better position to handle the glitches and required customizations.

See "User Enablement is Critical for Project Success" for more insights on this topic.

Practice Patience in Software Implementation

At the time, a software glitch causing a ripple effect that would impact the entire supply chain was a novel idea. Nike likely decided to risk the “Big Bang” strategy, deploy within a year without phases or proper testing, and forgo outside help because it assumed the repercussions of a glitch wouldn’t be catastrophic.

Impatience resulted in avoidable errors. A more conservative implementation strategy with adequate testing would have likely caught the mistakes.

See "Going Live Too Early Can Be Worse as Going Late" for more insights on this topic.

Closing Thoughts

One of the most incredible aspects of Nike’s implementation failure is how quickly the company bounced back. While Nike undoubtedly made numerous mistakes during the process, NSC was 80% operational in 2004.

Nike turned the project around by making adjustments and learning patience. Few companies can suffer a $100 million “speed bump” without filing for bankruptcy, but Nike could because of its resilience and dominant market position. The SAP installation wasn’t rushed and retained many aspects of the original strategy. In addition, a training culture was established as a result of the i2 failures. Customer service representatives receive 140 to 180 hours of training from highly skilled “super users,” and all employees are locked out of the system until they complete their required training courses.

Aside from the $100 million loss, the NSC project was successful. Lead times were reduced from nine months to six (the initial goal), and Nike’s factory inventory levels were reduced from a month to a week in some cases. Implementing a new SCM system also created better integration between departments, better visibility of customer orders, and increased gross margins.

While Nike could have executed far more efficiently, Phil Knight’s early assessment of the i2 failure turned out to be true. In the long run, the process gave Nike a competitive advantage and was instrumental in building an effective SCM system. 

In a nutshell: Failing to demonstrate patience, failing to seek outside help, and rushing a software implementation can have drastic consequences.

If you are an executive sponsor, steering committee member, or a non-executive board member and want to learn what you need to do so that your project does not land on my list of project failures? Then my (Non)-Executive Crash Course is what you are looking for.

If you want to know where you stand with that large, multi-year, strategic project? Or you think one of your key projects is in trouble? Then a Project Review is what you are looking for.

If you just want to read more project failure case studies? Then have a look at the overview of all case studies I have written here.

Sources

> Nike says i2 hurt its profits

> I2 Technologies, Inc.

> How Not to Spend $400 Million

> i2-Nike fallout a cautionary tale

> Nike rebounds: How Nike recovered from its supply chain disaster

> SCM and ERP Software Implementation at Nike – From Failure to Success

> I2 Says: "You Too, Nike"
