
No warnings of IT outage that cancelled treatments at Wellington, Wairarapa hospitals

2:50 pm on 23 August 2023

Photo: RNZ / DOM THOMAS

Wellington Hospital says it had no way of knowing its computer systems would overheat, causing a major shutdown that cancelled treatments in June.

The country's public health system is in the middle of assessing a plethora of old, unreliable and insecure IT systems that need billions of dollars to upgrade.

However, Te Whatu Ora said it had never recorded any risks or issues with Wellington's computer chillers.

In fact, annual maintenance had been done on them in late May, just weeks before the main chiller and two standbys malfunctioned.

An Official Information Act response said the failure was compounded by a cloud-computing outage that prevented remote checks on the chillers. Servers began shutting down around 3am, taking out data services across the city, Hutt Valley and Wairarapa.

"As far as Te Whatu Ora is aware, there were no flags/warnings/alerts that this incident would occur," the interim head of integration and special projects, Stuart Bloomfield, said in an OIA response.

"Prior to 21 June, 2023, there was no indication that a chiller failure was about to occur."

Wellington Hospital has had a host of problems with its electrical infrastructure - it was at "high risk of a catastrophic failure", a 2017 report said.

As for IT, a partial stocktake at hospitals nationwide in 2020 found huge problems it said would take $2.3 billion and a decade to fix.

Three years on, not much has changed.

The health minister was told in May that the main barrier to an IT transformation "is the current state and complexity of data and digital systems (including over 4000 applications nationally) and the deficit of data and digital skills across the health ecosystem", the OIA showed.

At Wellington Hospital, the temperature in the data centre began to rise when the main chiller broke down on 21 June.

"The standby chillers should have automatically come online when the primary chiller malfunctioned, but this did not occur."

By 3am the next morning, servers began shutting down "due to this excessive heat and to avoid damage to the hardware".

The service desk began getting calls 45 minutes later.

Technicians were on the job by 6.30am and took two hours to bring the temperature down to an acceptable 23C.

The company that serviced the chillers could not see remotely what was going on, because the cloud-connected services hub it used for monitoring was also down.

"At the time of the incident, the service provider's remote visibility of the equipment status onsite was not available." This was later fixed.

Computer services were restored around 2pm, though they took another 12 hours to come fully up to speed.

It turned out that one chiller's pump had played up, shutting down the chiller management system.

"The system used the cold water stored in the buffer tank, which eventually ran out."

The management system has since run properly.

"We will be replacing parts and carrying out further testing," Bloomfield said.