Saturday 9 February 2008

[AT RISK 2008/001] February 9th 2008 - scheduled maintenance

Maintenance log for February 9th 2008
Attending: dwm
Status: completed at 15:11hrs
Summary:
  • Tastycake.net server kalimdor.tastycake.net being taken offline at 12:00 noon GMT for maintenance.
  • No Tastycake.net services will be available whilst works are in progress.
  • Works to be carried out:
    • Install third 250GB hard-drive into RAID mirror. (Complete)
    • Install GRUB bootloader on third drive. (Complete)
    • Replace old 127GB hard-drive with new 250GB replacement. (Complete)
    • Install GRUB bootloader on replacement drive. (Complete)
    • Create new RAID mirror set on as-yet unallocated space. (Complete)
    • Expand LVM working set using new RAID mirror set. (Complete)
    • Upgrade local kernel to 2.6.24. (Complete)
    • Discontinue local NFS server, use read-only bind mount for /vol/recover instead.
      (New feature in 2.6.24.) (Cancelled)
Transcript, times are in GMT:
  • [1511] Final checks complete, at-risk period ends.
  • [1506] Performing final checks prior to announcing end of at-risk period.
  • [1505] Read-only bind mounts don't seem to be functioning; we perhaps need an updated mount-utils. This can be done safely at a later date.
  • [1456] Initialized /dev/md0 as new LVM PV and added PV to existing volume group; total capacity: 232GB.
  • [1454] Added new second disk to main RAID mirror set /dev/md7.
  • [1451] Created new RAID mirror across previously-unused disk space.
    (NOTE: /dev/md0 is not the RAID mirror containing /boot; /dev/md6 is.)
  • [1440] Kernel installed, rebooting to verify correct operation and to reload DOS partition tables.
  • [1437] Installing updated kernel packages. (And SNMP security updates, whilst we're here.)
  • [1434] Partitioned unallocated space on all three disks. Leaving creation of new RAID mirror array until last, as it would only be interrupted by reboots anyway..
  • [1431] Partitioned drive 2 to match other disks. Added drive 2 partition to /boot RAID mirror volume. Installed bootloader on new drive.
  • [1423] Replacement drive 2 installed, booting.
  • [1415] Re-sync completed. Rebooting to replace drive 2.
  • [1402] Re-sync 90% complete. (Unfortunately, it seems to be slowing down to about 15MB/sec again.)
  • [1357] Installed spare GRUB bootloader on new disk.
  • [1350] Re-sync 80% complete.
  • [1334] Re-sync 66.6% complete.
  • [1318] Re-sync 50% complete. (Now peaking at ~22MB/sec; ETA at present rate: 44mins.)
  • [1301] Re-sync 33.3% complete. (Now peaking at ~19MB/sec; ETA at present rate: 67mins.)
  • [1250] Re-sync 25% complete. (Looks like it's speeding up as it proceeds, probably due to disk geometry.)
  • [1230] Re-sync 10% complete.
  • [1219] RAID re-sync in progress; need to wait for it to complete before replacing disk 2. ETA @ ~15MB/sec: 115mins.
  • [1211] Adding new disk 3 partitions to RAID mirror sets.
  • [1209] Disk installed, server rebooted. Partitioned disk 3 to match existing layout.
  • [1201] Serial terminal up; sent reboot instruction with 2-minute grace.
  • [1159] Sent final warning via wall to save all state; disk installed in caddy and ready for reboot.
  • [1150] Readying disk three for hot-insertion. (Though, because we're running on IDE, this will require a reboot..)
  • [1147] Had to abort the transfer drive update; don't have access to the rear of the rack, and the front-side USB port is far too slow. Will just have to do today's work carefully..
  • [1105] Taking full filesystem image backup to spare transfer drive.
  • [1045] Initial update of offsite maintenance log.
  • [1035] Arrived at Telehouse Docklands.
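
For anyone wanting to sanity-check the re-sync ETAs quoted above, the arithmetic can be sketched roughly as follows. This is only an illustration, not the tool that was used on the day: the function names are hypothetical, a steady copy rate is assumed, and MB/GB are taken as decimal units (1GB = 1000MB).

```python
def implied_remaining_mb(rate_mb_per_s: float, eta_minutes: float) -> float:
    """Data still to be copied, as implied by a quoted rate and ETA."""
    return rate_mb_per_s * eta_minutes * 60


def eta_minutes(remaining_mb: float, rate_mb_per_s: float) -> float:
    """Minutes left to copy remaining_mb at a steady rate_mb_per_s."""
    if rate_mb_per_s <= 0:
        raise ValueError("rate must be positive")
    return remaining_mb / rate_mb_per_s / 60


# Figures quoted in the transcript:
# (time, fraction complete, peak MB/sec, quoted ETA in minutes).
quoted = [
    ("1301", 1 / 3, 19, 67),
    ("1318", 1 / 2, 22, 44),
]

for time, done, rate, eta in quoted:
    remaining = implied_remaining_mb(rate, eta)
    # Assumption: the percentage in the log refers to the same data
    # that the quoted ETA covers.
    total = remaining / (1 - done)
    print(f"[{time}] remaining ~{remaining / 1000:.1f}GB, "
          f"implied total ~{total / 1000:.1f}GB")
```

Both entries imply a total of roughly 115GB to re-sync, so the quoted rates and ETAs are at least consistent with one another.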