Possible temperature sensor issue causing chassis downtime (Investigating) Medium

Affecting System - UK Data Centre

  • 11/01/2025 11:30
  • Last Updated 21/01/2025 11:37

UPDATE 4 (21/Jan): The PSU swap has been completed and power supply health is reporting green across the board. We will continue to monitor closely and will update this status report if any further issues arise.


UPDATE 4 (21/Jan): We are upgrading the PSUs on the Dell blade chassis where we believe services were tripped into a "safe mode".


UPDATE 4 (20/Jan): All services are back online and operating as expected. As mentioned previously, new Dell PSUs are arriving at the data centre this week; they are newer and higher grade, and will replace the existing ones. Our investigation after the original incident identified the existing PSUs as the fault. We are carrying out a second round of investigation to see whether this latest incident gives us any further information.


UPDATE 4 (20/Jan): We have started a full power cycle of the Dell nodes, which should resolve the issue. We are also expecting replacement PSU parts in the coming days to help resolve this issue fully going forward.


UPDATE 3 (20/Jan): A new occurrence has been detected on the same chassis; we are investigating.


UPDATE 2 (11/Jan): All services are back online. Early indications point to a possible tripped temperature sensor, which caused servers to go into a safe mode. Once we have details or any other updates, we will share them via this notice. We are lowering the severity to medium while we monitor. Thank you, and once again, sorry for the inconvenience caused.


UPDATE (11/Jan): We found that one of our Dell chassis has network issues, which our team are working on. We hope to have all servers back online as soon as possible.


We are working with our data centre to investigate an issue affecting a specific set of servers and their linked IPs. We are sorry for the inconvenience caused.