Cloud computing makes high availability affordable. You can launch extra servers when your system goes down, and you no longer need to keep spare hardware in stock. Costs for your failover systems are low because you only pay for what you actually use.

High availability does not mean your system will never be offline. But you can reduce downtime to a minimum, down to a level agreed between the IT department and senior management in the form of a Service Level Agreement (SLA). The higher the availability, the higher the costs, so you will need to find an acceptable balance.
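To make the cost trade-off concrete, it helps to translate an availability percentage from an SLA into the downtime it actually permits. The following Python sketch does that arithmetic. The function name `allowed_downtime_hours` is hypothetical, and the sketch assumes a 365-day year with availability measured over the whole year; real SLAs may use monthly windows or exclude scheduled maintenance.

```python
# Convert an availability target (a percentage, as written in an SLA)
# into the maximum downtime it allows over a given period.
# Assumption: a 365-day year; leap years and maintenance windows are ignored.

HOURS_PER_YEAR = 365 * 24


def allowed_downtime_hours(availability_percent: float,
                           period_hours: float = HOURS_PER_YEAR) -> float:
    """Return the maximum downtime in hours permitted by the availability target."""
    if not 0 <= availability_percent <= 100:
        raise ValueError("availability must be between 0 and 100 percent")
    # The unavailable fraction of the period is what the SLA tolerates as downtime.
    return period_hours * (1 - availability_percent / 100)


if __name__ == "__main__":
    for target in (99.0, 99.9, 99.99):
        hours = allowed_downtime_hours(target)
        print(f"{target}% -> {hours:.2f} h/year ({hours * 60:.1f} min)")
```

Under these assumptions, 99% allows about 87.6 hours of downtime per year, 99.9% about 8.76 hours, and 99.99% less than an hour (about 52.6 minutes). Each additional "nine" cuts the permitted downtime by a factor of ten, which helps explain why higher availability gets expensive.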

The goal in designing your system is a certain degree of fault tolerance: if part of your system goes down, your application still works, albeit more slowly or with limited functionality. For example, you might run two database servers that are kept in sync, so that one can take over if the other fails. Achieving this means eliminating single points of failure (SPOFs), that is, components whose failure alone would take down the whole system.
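The two-database example can be illustrated with a minimal client-side failover sketch in Python. It assumes that each server is represented by a plain function and that an unreachable server raises `ConnectionError`; `query_with_failover`, `AllServersFailed`, `primary` and `replica` are hypothetical stand-ins, not a real database driver API. In practice, failover is often handled by a load balancer or by the database service itself rather than in application code.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


class AllServersFailed(Exception):
    """Raised when no server in the list could answer the request."""


def query_with_failover(servers: Sequence[Callable[[str], T]], sql: str) -> T:
    """Try each server in order and return the first successful result.

    With two servers kept in sync, the second takes over when the first
    raises ConnectionError, so the request still succeeds.
    """
    errors: List[ConnectionError] = []
    for server in servers:
        try:
            return server(sql)
        except ConnectionError as exc:
            errors.append(exc)  # remember why this server failed, then try the next one
    raise AllServersFailed(errors)


# Hypothetical stand-ins: a primary that is down and a healthy, in-sync replica.
def primary(sql: str) -> str:
    raise ConnectionError("primary unreachable")


def replica(sql: str) -> str:
    return f"replica answered: {sql}"


if __name__ == "__main__":
    print(query_with_failover([primary, replica], "SELECT 1"))
```

Passing a list with only one server turns that server into a single point of failure: if it is unreachable, the call fails with `AllServersFailed` instead of falling back to a second copy.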

Part of planning for high availability is preparing a disaster recovery (DR) plan, which should include the processes, policies and procedures for restoring your systems after a catastrophic event. In it you also need to define how long your system may be offline in the worst case; we call this the Recovery Time Objective (RTO). The Recovery Point Objective (RPO), in turn, defines how much data loss is acceptable. For example, it may be acceptable to lose the data of people who subscribed to your newsletter in the last hour. If you don’t need to recover all data, you might be able to bring your systems back up faster, e.g. from a snapshot taken at regular intervals, which reduces your recovery time.
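Snapshots tie recovery time to data loss: when you restore from the latest snapshot, everything written after it is lost, so in the worst case (a failure just before the next snapshot) you lose one full snapshot interval of data. The Python sketch below checks a snapshot schedule against an RPO. The function names and the hourly schedule on a fixed date are hypothetical examples, and the sketch ignores the time a snapshot takes to complete as well as any replication lag.

```python
from datetime import datetime, timedelta
from typing import Sequence


def data_loss_window(snapshots: Sequence[datetime], failure: datetime) -> timedelta:
    """Time between the last snapshot taken before the failure and the failure itself.

    Everything written in this window is lost when restoring from that snapshot.
    """
    earlier = [s for s in snapshots if s <= failure]
    if not earlier:
        raise ValueError("no snapshot exists before the failure")
    return failure - max(earlier)


def meets_rpo(snapshot_interval: timedelta, rpo: timedelta) -> bool:
    """Worst case: the failure happens just before the next snapshot,
    so up to one full interval of data is lost."""
    return snapshot_interval <= rpo


if __name__ == "__main__":
    # Hypothetical schedule: one snapshot every hour on 1 January 2024.
    start = datetime(2024, 1, 1)
    hourly = [start + timedelta(hours=h) for h in range(24)]
    failure = datetime(2024, 1, 1, 5, 40)
    print(data_loss_window(hourly, failure))
    print(meets_rpo(timedelta(hours=1), rpo=timedelta(hours=1)))
```

With hourly snapshots and a failure at 05:40, the last 40 minutes of newsletter sign-ups would be lost, which stays within the one-hour RPO from the newsletter example. Taking snapshots every two hours would no longer meet that RPO.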