I’m sure many of you have followed the launch (and re-launch) of HealthCare.gov over the last few weeks, as this must rank as one of the most highly publicized (and scrutinized) technology projects in recent memory.
IT projects are never easy, and the complexity of this initiative is staggering (not to mention the politics involved). While I hate to rub salt in the wounds of the many IT professionals working hard to get this new system up and running, I have to think this is a teachable moment for the technology industry.
So, in the spirit of learning from the past, and fully aware that this is Monday-morning quarterbacking, here are a few politically neutral “lessons learned” from the launch that will hopefully help with your next nonprofit technology project.
Selection and Procurement
President Obama made a number of interesting comments related to the initiation of the overall IT project. “When we (the Federal Government) buy I.T. services generally, it is so bureaucratic and so cumbersome that a whole bunch of it doesn’t work or it ends up being way over cost.”
I can’t speak to the Federal procurement process in particular, but I’d encourage organizations with similarly bureaucratic or rigid selection processes (e.g., strictly scripted demonstrations, limited input from end-users, extensive requirements documentation neither vendor nor stakeholders can fully understand) to take note, as the selection process sets the tone and informs the quality of the rest of the project.
The contract for development of the site was awarded in December 2011, but “the government was so slow in issuing specifications that the firm did not start writing software code until this spring (of 2013).” That means that development didn’t start until 6-8 months prior to go-live (?!), which of course introduces a host of risks into the project, and leaves little room for testing and adjustments (more on that later). I suspect this was a result of dependencies on policymakers developing regulations and business processes, which created bottlenecks for requirements gathering, business analysis, and subsequently development. Regardless, an unrealistic burden was placed on the development team.
On top of that, I suspect the go-live date of October 1st was determined well in advance, with the rest of the project plan developed to back into the go-live date. Less than ideal, highly risky, but perhaps inevitable with such a high-publicity initiative.
Changes late in a development cycle are exponentially more labor-intensive (and expensive) to implement than changes early in the cycle (this is a must-read on the topic).
According to the New York Times, “as late as the last week of September, officials were still changing features of the Web site.” Making major design decisions less than a month prior to a rollout date is a nightmare for developers, and it puts time pressure on acceptance testing because not all of the components are in place to test the end-to-end solution. This leaves no “wiggle room” for addressing issues identified in testing or for other adjustments to the plan.
Healthcare.gov reportedly runs on “500 million” lines of code, about “25 times the size of Facebook, one of the world’s busiest sites,” according to experts at a Congressional Hearing. Here’s a visualization that puts this in perspective.
If true, the maintenance and troubleshooting costs of the system are going to be astronomical, and the system is going to be an example of “Software Bloat” in Management Information Systems textbooks for years to come.
While I am slightly skeptical of this claim, I have to assume that, if true, it is due to building large numbers of components from scratch and/or the work to develop integrations and maintain interoperability across disparate back-end platforms and different agencies.
User Adoption and Change Management
Preparing end-users to change their ways and use a new system is critical to any IT project – what’s the point of going through a deployment if the system doesn’t get used in production (see here for some of my high-level thoughts on the topic)?
The HealthCare.gov launch has a great example of this: a group of “Navigators” (think of a Change Team) promoting the new system and communicating its value to end users on a personal level. I can’t speak to the success of the Navigators program, but I have to applaud the intent of involving end users in the project in a meaningful and personalized way (on a side note, here’s a great case study on another large technology project gone wrong because end users weren’t involved in design and other aspects of the implementation).
According to CBS News, the website failed in tests of 200-300 users “just before launch date,” which explains why the site was failing under a load of multiple tens of thousands of concurrent users.
So, the systemic risks related to system performance were known, but as I mentioned above, the go-live date was likely “hard coded” (no pun intended). I suspect the launch date was mandated for political reasons, as “must go live by” dates often are, which unfortunately made the failure of the site all but inevitable. See my comments above about delays in requirements definition leading to delays in development, which in turn puts a time crunch on testing. It goes without saying that testing only works if the team has time to address failures prior to launch.
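To make the testing point concrete, here is a minimal sketch of a concurrency test in Python. It does not reflect how HealthCare.gov was actually tested; `FakeSite`, its `capacity` parameter, and `run_load_test` are hypothetical names, and the "site" is an in-process stand-in that simply rejects requests once all of its slots are busy.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


class FakeSite:
    """Hypothetical stand-in for a real endpoint: serves at most `capacity` requests at once."""

    def __init__(self, capacity):
        self._slots = threading.BoundedSemaphore(capacity)

    def handle(self):
        # Reject immediately when every slot is busy, like an overloaded server.
        if not self._slots.acquire(blocking=False):
            return False
        try:
            time.sleep(0.05)  # simulated work per request (assumed value)
            return True
        finally:
            self._slots.release()


def run_load_test(site, concurrent_users):
    """Send `concurrent_users` requests at the same moment and return the failure rate."""
    barrier = threading.Barrier(concurrent_users)

    def user():
        barrier.wait()  # hold every simulated user until all are ready
        return site.handle()

    with ThreadPoolExecutor(max_workers=concurrent_users) as pool:
        futures = [pool.submit(user) for _ in range(concurrent_users)]
        results = [f.result() for f in futures]
    return results.count(False) / concurrent_users


if __name__ == "__main__":
    for users in (5, 50, 200):
        rate = run_load_test(FakeSite(capacity=5), users)
        print(f"{users} concurrent users: {rate:.0%} of requests failed")
```

A test like this is only useful if it runs early enough, and at realistic enough volumes, for the team to act on what it finds. Against a real system you would point the simulated users at a staging environment rather than an in-process fake.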
Big bang go-lives introduce the possibility of a big-boom event. In the real world, projects are constrained by dates, expectations, and executive mandates, but I’d suggest that the most successful software implementations are smaller, more agile, and incremental in nature. In my experience, the most successful systems started from humble beginnings.
Could the site have been piloted with smaller groups? Were mission-critical items prioritized, with “nice to have” items phased in over time? Keep in mind that increases in requirements have an exponential relationship to work effort and project risk: large projects with significant requirements carry much more technical debt and many more points of failure.
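One rough way to see why scope drives risk so sharply is to count the places where components can interact. The sketch below assumes, purely for illustration, that every pair of components is a potential integration point; `pairwise_interactions` is a hypothetical helper, and this particular count grows quadratically, so treat it as a conservative lower bound on the intuition rather than a precise model.

```python
from math import comb


def pairwise_interactions(n_components):
    """Number of distinct component pairs, i.e. potential integration points."""
    return comb(n_components, 2)


if __name__ == "__main__":
    for n in (5, 10, 20, 40):
        print(f"{n} components -> {pairwise_interactions(n)} potential integration points")
    # 5 -> 10, 10 -> 45, 20 -> 190, 40 -> 780
```

Doubling the number of components roughly quadruples the number of places where something can go wrong, which is one argument for piloting a small core first and phasing in the rest.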
“Begin with the end in mind,” as Stephen Covey might say. How will you know you are successful if you can’t measure results? This is a problem on every project and with every initiative, and often one that takes the form of an onion: once you peel back one layer of reporting and analysis, there’s another, deeper one below.
At this point, it would be great to focus on value and report the number of individuals who are now insured through the system, putting the various glitches and proverbial bumps in the road in perspective. However, it is apparently not possible to report on the number of people who have successfully used the site.
Reporting is built on data, which of course is determined by underlying business practices and solution design. Data quality is not something that can be added at the end of a project. Similarly, it is very difficult to “reverse-engineer” reporting and information delivery after launch.
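As a small illustration of designing reporting in from the start, the sketch below records an event each time a user reaches a stage of the process, so that "how many people finished?" becomes a simple count. The stage names, `FunnelLog`, and its methods are all hypothetical; a real system would define its own stages and persist the events in a database.

```python
from collections import defaultdict

# Hypothetical funnel stages, in order; a real project would define its own.
STAGES = ("account_created", "application_submitted", "plan_selected", "enrolled")


class FunnelLog:
    """Tracks which distinct users have reached each stage."""

    def __init__(self):
        self._users_by_stage = defaultdict(set)

    def record(self, user_id, stage):
        if stage not in STAGES:
            raise ValueError(f"unknown stage: {stage}")
        # A set ignores repeat visits, so each user is counted once per stage.
        self._users_by_stage[stage].add(user_id)

    def report(self):
        """Return the number of distinct users who reached each stage."""
        return {stage: len(self._users_by_stage[stage]) for stage in STAGES}


if __name__ == "__main__":
    log = FunnelLog()
    log.record("u1", "account_created")
    log.record("u2", "account_created")
    log.record("u1", "application_submitted")
    log.record("u1", "plan_selected")
    log.record("u1", "enrolled")
    print(log.report())
```

The point is less the code than the timing: the stages have to be agreed on and instrumented while the solution is being designed, because the events cannot be reconstructed after the fact.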
Hindsight is always 20/20, but hopefully this will provide some food for thought for those planning their next IT project. As always, comments are welcome and appreciated!
Bonus: Related Reading
Like many others in the technology space, I’ve been captivated by this topic, and (for better or worse) have read much more than could ever fit neatly into a single blog post.
So, here are a few other articles on the topic that I found interesting:
What to do when a project goes wrong
Fascinating discussion from launch day offered here only for its technical commentary (semi-NSFW like everything on reddit)
Neither Quality nor Security can be retrofitted into a solution
Another interesting read, although I can’t vouch for all of the technical details
If this wasn’t painful enough for you, or if you want to feel better about a current project gone wrong, read more IT fails here