Annotation: Discusses design tradeoffs for giant-scale services that must degrade gracefully under load and failure and support online evolution. Advocates automatic upgrade systems and describes three approaches: fast reboot (upgrade every node at once), rolling upgrade (upgrade nodes one at a time, round-robin), and big flip (partition the system, then upgrade each partition in turn). Insists that these systems need a safe and fast way to roll back to the old version, since new versions tend to be buggy. Mentions that many systems use a staging area where the new software is set up alongside the old software before going live, which makes switchover (in either direction) easy.
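As a rough illustration of the three approaches named in the annotation, the sketch below models each one as a schedule of batches, where every node in a batch is upgraded together. Nothing here comes from the paper itself: the function names, the representation of nodes as string labels, the `healthy` callback, and the all-or-nothing rollback are assumptions chosen for illustration.

```python
from typing import Callable, Dict, List


def fast_reboot(nodes: List[str]) -> List[List[str]]:
    """Upgrade every node at once: a single batch."""
    return [list(nodes)] if nodes else []


def rolling_upgrade(nodes: List[str]) -> List[List[str]]:
    """Upgrade one node at a time, in order."""
    return [[n] for n in nodes]


def big_flip(nodes: List[str], partitions: int = 2) -> List[List[str]]:
    """Split the nodes into partitions and upgrade each partition in turn."""
    if not nodes:
        return []
    size = -(-len(nodes) // partitions)  # ceiling division
    return [list(nodes[i:i + size]) for i in range(0, len(nodes), size)]


def apply_upgrade(
    versions: Dict[str, str],
    schedule: List[List[str]],
    new_version: str,
    healthy: Callable[[Dict[str, str]], bool],
) -> bool:
    """Apply a schedule batch by batch (hypothetical helper).

    If the assumed health check fails after any batch, restore every node
    to its previous version and report failure.
    """
    previous = dict(versions)  # snapshot kept for rollback
    for batch in schedule:
        for node in batch:
            versions[node] = new_version
        if not healthy(versions):
            versions.clear()
            versions.update(previous)
            return False
    return True


nodes = ["n1", "n2", "n3", "n4"]
versions = {n: "v1" for n in nodes}

# A successful big flip: two partitions, each upgraded in turn.
ok = apply_upgrade(versions, big_flip(nodes), "v2", healthy=lambda v: True)

# A failed rolling upgrade: the check rejects "v3", so all nodes stay on "v2".
rolled_back = dict(versions)
failed = apply_upgrade(rolled_back, rolling_upgrade(nodes), "v3",
                       healthy=lambda v: "v3" not in v.values())
```

Keeping a snapshot of the previous versions plays the role of the staging area in this toy model: the old state stays available, so switching back is a single step.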
BibTeX entry:
@article{brewer01lessons,
author = {Eric A. Brewer},
title = {Lessons from Giant-Scale Services},
journal = {IEEE Internet Computing},
month = jul,
year = {2001}
}
Sameer Ajmani