Tuesday, 14 July 2009

[INCIDENT 2009/001] July 14th 2009 - Unexpected server failure

Incident log for July 14th 2009
Attending: dwm, ncm
Status: Completed at 1132hrs, July 14th 2009.
Summary:
  • The server kalimdor.tastycake.net stopped functioning correctly at or shortly after 0900.02hrs for reasons unknown. This was detected at 1045hrs, and normal service was restored at 1132hrs.

Transcript (times are in GMT+1, most recent entry first):
  • [1132] Normal services restored.
  • [1130] All filesystems pass checks. Bring server up into normal multi-user mode.
  • [1113] Server booted into single-user mode using secondary kernel image. Checking all local filesystems for errors.
  • [1057] Reboot into single-user mode failed; initial ramdisk for primary kernel image found to be corrupted or truncated. Rebooting into secondary kernel image.
  • [1045] Service failure discovered. Emergency reboot triggered after serial console found unresponsive.
  • [0902] Clients running on kalimdor.tastycake.net time out from remote services.
  • [0900] kalimdor.tastycake.net stops logging to local system log.
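
For reference, the headline durations can be derived from the timestamps above. The following is a minimal sketch only: it assumes the failure began at 0900 (the log says "at or shortly after"), and the helper names are illustrative rather than part of any tooling used during the incident.

```python
from datetime import datetime

# All times are GMT+1 on 14 July 2009, taken from the transcript.
INCIDENT_DATE = "2009-07-14"


def parse_hhmm(stamp: str) -> datetime:
    """Parse a transcript timestamp such as '1045' (hypothetical helper)."""
    return datetime.strptime(f"{INCIDENT_DATE} {stamp}", "%Y-%m-%d %H%M")


def minutes_between(start: datetime, end: datetime) -> int:
    """Whole minutes elapsed from start to end."""
    return int((end - start).total_seconds() // 60)


failure = parse_hhmm("0900")   # assumed start: local logging stops
detected = parse_hhmm("1045")  # service failure discovered
restored = parse_hhmm("1132")  # normal services restored

time_to_detect = minutes_between(failure, detected)
time_to_restore = minutes_between(detected, restored)
total_outage = minutes_between(failure, restored)

print(f"Time to detect:  {time_to_detect} min")
print(f"Time to restore: {time_to_restore} min")
print(f"Total outage:    {total_outage} min")
```

On these assumptions the failure went undetected for 105 minutes, recovery took a further 47 minutes, and the total outage was 152 minutes.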