The Santiago de Compostela Train Accident
Revisiting a tale of two stories *
Almost one year after the tragic train accident just outside the Spanish city of Santiago de Compostela, which cost 79 passengers their lives and injured everyone else on board, the national accident investigation board released its findings in the final investigation report.
It concluded that “human error was the sole cause of the accident”. The smoking gun the investigators pointed to was the train driver's timetable. It clearly prescribed the speed limit, which had been grossly exceeded in the accident.
This had of course been known to be the immediate cause right from the beginning. It had been recorded by a surveillance camera on the track, and the footage circulated heavily through the mass media. Further confirming this perspective was the train driver's public confession of his sins. He himself stated shortly after the accident that he had recklessly exceeded the speed limit and would have to deal with the guilt of what happened for the rest of his life.
In this light, the response triggered within the judicial system is hardly surprising. The driver was dragged into criminal proceedings, accused of 79 counts of reckless manslaughter, and would face up to 5 years of imprisonment.
Note: Unfortunately, it has not been possible to find any public accounts of how these proceedings ended.
But let us start with a quick recount of the events leading up to the accident on the 24th of July 2013.
Santiago de Compostela was in the midst of preparations for its large religious festival, the Feast of Saint James, which is always held on the 25th of July. The city is a religious center for Catholics and the final destination of the famous Spanish pilgrim route known as the Camino.
18 months earlier, the new high-speed train line had been inaugurated as the last part of the south-north axis from Madrid to Ferrol. It brought a sense of pride to this northern region, which has often felt underprivileged and somewhat forgotten by the national government. Finally, its people were participants in the modern age of train transportation, with high-speed performance, comfort and punctuality that would make quite a few other European countries envious.
On the evening of the 24th of July, however, this harmony was heavily disturbed.
The high-speed train, on its way from Ourense to the main station at Santiago de Compostela, entered a sharp bend with an 80 km/h speed limit at 179 km/h. It was thrown off the rails and slammed into the concrete wall that surrounded the outer perimeter of the curve. A large-scale crisis intervention followed and, understandably, the upcoming festival was cancelled.
How could something like this happen?
Apart from the usual rumors about inappropriate use of an iPad and alleged FB posts in which the train driver bragged about speeding in his train, it very soon became known that the train driver had been engaged in a phone conversation and thereby missed the standard braking point some 4 km before the curve. As the phone conversation ended, the train driver noticed the approaching curve and immediately commenced emergency braking. This, however, was only sufficient to reduce the speed from 199 km/h to the 179 km/h at which the train eventually entered the curve. Hence, from this account, the accident could easily be attributed to inattention and non-adherence to the prescribed speed limit. “Shouldn't he be driving his train instead of talking on the phone?”
This is what constitutes the first story of accidents.
Human error is seen as the cause of accidents. Individuals at the sharp end undermine otherwise well-functioning systems, and the search for failure stops when we have found an individual who could have or should have done something different that would have averted the outcome.
This can be contrasted with the second story of accidents.
The second story of accidents sees human error as the consequence of deeper trouble inside the system. It is a story of multiple factors that create the conditions which lead to operator errors. This is also the concept which was later refined by Prof. Sidney Dekker and led to the formulation of the New View on Human Error.
The revisionist account
The following is my attempt to provide this second story in an abbreviated form. In a more extensive version, it has been part and parcel of my investigator and leadership training for the last 5 years. Interestingly, all the data for this account come from the official accident investigation report, which is the primary source of information. Translation posed some challenge, since the report has only been released in Spanish.
To understand the larger Spanish railway system better, it is necessary to zoom out a bit first. The high-speed trajectory from Madrid to Ferrol consists of a mixture of different system components. The tracks themselves are built partly to a Spanish standard and partly to European standards, which means they differ in track gauge. Some sections are double track, some single track. Some sections are equipped with electrical overhead power lines, while others have to be operated with diesel engines. Last but not least, the system also comprises two different ground-based train management systems: a Spanish one called ASFA and the European standard ERTMS. To operate this line, a multifunctional train was built. It can run on different track gauges, on both diesel and electrical power, at speeds up to 220 km/h, and it communicates with both ASFA and ERTMS.
On the infrastructure side, the design ended up having a sharp bend with a maximum speed of 80 km/h directly after a long stretch of completely straight high-speed track designed for speeds up to 300 km/h. This was not the original intent, but expropriation of land turned out to be more difficult than anticipated and led to this trade-off in the design. (see pic below)
Train management systems:
This final part of the high-speed line was originally also foreseen to have full ERTMS coverage, as this system offers a wider range of functionalities. As an example, ASFA is only able to apply auto braking when a red signal is passed, while ERTMS can apply more advanced logic, including reminders for safety-critical operations. But as projects have a tendency to change in the face of resource scarcity, the design was revised to have only partial ERTMS coverage. This was subsequently approved by the regulator. However, the design was still somewhat resilient. The revised system foresaw ERTMS coverage up to the point where braking would have to be commenced in order to reach the low-speed bend at the prescribed 80 km/h. Here the more advanced logic of ERTMS offered the option to sound a reminder alarm for exiting ERTMS coverage. If such an alarm is not acknowledged by the train driver, auto braking is initiated and creates a final systemic barrier to help avoid overspeed in the curve. The question obviously arises: how was this accident possible with such a design feature in place? More on this in the next section. (pic below of the revised project incl. partial ERTMS coverage)
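The difference between the two barriers can be sketched as a toy model. This is only an illustration of the description above, not of how ASFA or ERTMS are actually implemented; the names `System` and `automatic_braking` and the two boolean inputs are hypothetical.

```python
from enum import Enum


class System(Enum):
    """The two train management systems named in the text."""
    ASFA = "ASFA"
    ERTMS = "ERTMS"


def automatic_braking(system, signal_is_red, reminder_acknowledged):
    """Return True if the (grossly simplified) system would brake on its own.

    Assumption: ASFA only reacts to a red signal, while ERTMS additionally
    brakes when the coverage-exit reminder is left unacknowledged.
    """
    if system is System.ASFA:
        # As described in the text: auto braking only when a red signal is passed.
        return signal_is_red
    # ERTMS (simplified): red signal OR an unacknowledged reminder alarm.
    return signal_is_red or not reminder_acknowledged


# A distracted driver (no acknowledgement) passing a green signal:
print(automatic_braking(System.ERTMS, signal_is_red=False, reminder_acknowledged=False))
print(automatic_braking(System.ASFA, signal_is_red=False, reminder_acknowledged=False))
```

Under these assumptions, the ERTMS branch brakes the train when the reminder goes unacknowledged, while the ASFA branch does not intervene as long as the signal is green.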
Organizational decision making:
As early as November 2011, during testing of the new tracks and during the driver conversion training, train drivers raised concerns. They were worried about the potential risk of the sharp curve directly following the high-speed trajectory. This led to the creation of an internal memo for management to address. The inauguration went ahead as planned on the 9th of December the same year. The memo containing the safety concern was on the agenda of 5 management meetings in the following months. At the 6th management meeting in mid-2012, the memo disappeared from the agenda without a trace of any decision having been made.
In June 2012 the operator RENFE faced a new challenge. ERTMS was foreseen to be updated to version 2.0, but this turned out to be impossible due to compatibility problems resulting from the interoperability with ASFA. Unfortunately, the report shares only a few details on this problem and provides no explanation of why a fallback to the previous version was not an option either. In order to continue operating, RENFE consequently asked the Spanish railway authorities for a waiver to run without a functioning ERTMS system. On the 23rd of June this waiver was granted. Furthermore, the waiver contained no expiry date.
From here on, visually detecting the correct braking point, marked by the small sign E'7 below the light signal, while travelling at 200 km/h was the only remaining barrier to ensure timely braking towards the curve. (The point can be seen in the picture below. The signal was green when the train driver passed it.)
Still, the drivers continued safe operations for 13 months and one day under these conditions.
Driving the train on the 24th of July
As the train left Ourense, which is the station just prior to Santiago, it had accumulated a delay of three minutes. This was considered substantial on this line, which prided itself on its punctuality. The train driver still operated the train conservatively in relation to the prescribed timetable and accelerated the train to 199 km/h where 220 km/h was the limit. The train conductor, however, was concerned about the delay and had formed a plan. If they could get a different track at the central station of Santiago de Compostela, they could make up some lost time through faster embarking and disembarking. The conductor wanted to propose this solution to the train driver and called him on the cell phone, this being their only means of internal communication.
From the first ring tone until the conversation was terminated, less than 2 minutes elapsed. Yet shortly into the conversation, the sign to commence braking was passed, and when the driver hung up, he was just about to exit the last tunnel before the curve. He immediately recognized his position and initiated the emergency braking procedure. 20 seconds later the train derailed and crashed into the concrete wall.
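For a sense of scale, here is a rough back-of-the-envelope calculation of what braking at the standard braking point would have required. It assumes constant deceleration and takes the "some 4 km" between the braking point and the curve at face value; the function names and the constant-deceleration simplification are illustrative assumptions, not taken from the investigation report.

```python
def required_deceleration(v_start_kmh, v_end_kmh, distance_m):
    """Constant deceleration (m/s^2) needed to slow down over distance_m."""
    v_start = v_start_kmh / 3.6  # km/h -> m/s
    v_end = v_end_kmh / 3.6
    # From v_end^2 = v_start^2 - 2 * a * d
    return (v_start**2 - v_end**2) / (2 * distance_m)


def braking_time(v_start_kmh, v_end_kmh, distance_m):
    """Time (s) the manoeuvre takes under constant deceleration."""
    v_start = v_start_kmh / 3.6
    v_end = v_end_kmh / 3.6
    # Distance equals average speed times time when deceleration is constant.
    return 2 * distance_m / (v_start + v_end)


# Figures from the text: 199 km/h down to the 80 km/h limit over roughly 4 km.
a = required_deceleration(199, 80, 4000)
t = braking_time(199, 80, 4000)
print(f"{a:.2f} m/s^2 over {t:.0f} s")
```

Under these assumptions the result is roughly 0.32 m/s² sustained for about 103 seconds, which gives an idea of how much of the available braking window had already passed by the time the call ended.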
A second story perspective
Even in this abbreviated version, I hope it has become visible how the simple first story of human error as a cause has now unfolded into a complex organizational accident. It is an accident that contains a long list of ingredients, including at least the following:
Integrating European harmonization requirements
Reporting of safety concerns
Contribution of the protective structure (Regulator)
Traceability of managerial decision making
Unruly technology including unforeseen compatibility problems
Managing risk associated with change, incl. removal of a safety-critical barrier
Not addressing these themes creates the potential for new failures. Failures that can remain dormant for extended periods of time but then manifest in completely different and unforeseen ways.
Rather than clear-cut causes, these themes are more akin to explanations. They also involve a multitude of stakeholders, ranging from the operator and the infrastructure provider to regulatory bodies, and cover issues across the entire range from the front-line sharp end to societal questions at the blunt end. But compared to complicated tree structures, they provide a very simple, communicable short list. They can also be used for new ways of working with learning processes. This topic will be expanded in an upcoming article.
Working with first and second stories
I am fully aware that this content may no longer be as fundamentally new to many readers as it was in 1998. Still, I have experienced how many organizations wrestle with how to enact the good intention of moving away from blaming individuals. Here the concept of first and second stories can provide an excellent addition to the existing tool kit. The mere fact that human error can still surface as the main cause in reports from accident investigation boards in 2014 indicates the need for more work in this area.
Does your organization still struggle to move beyond human error and to identify systemic issues rather than individual shortcomings? NewView Consulting offers tailor-made workshops and trainings for both leadership and investigators that provide reflection processes and tools to ensure progress in this matter. A range of external incidents can be combined with local content to provide the best basis for internal discussions and to create a company-specific framework that fosters optimal learning from incidents.
* This term is humbly borrowed from a landmark work in the safety science literature by Prof. David Woods and Dr. Richard Cook from 1998. I highly recommend it as an introduction to the difference between the individual and the system perspective.