Relational Database Service (RDS)

RDS is a Database as a Service (DBaaS) product. It can be used to provision a fully functional database without the admin overhead traditionally associated with DB platforms. It can perform at scale, be made publicly accessible, and can be configured for demanding availability and durability scenarios.

RDS supports a number of database engines:

MySQL, MariaDB, PostgreSQL, Oracle, Microsoft SQL Server
Aurora: An in-house developed engine with substantial feature and performance enhancements

RDS can be deployed in single AZ or Multi-AZ mode (for resilience) and supports the following instance types:

General purpose (currently DB.M4 and DB.M5)
Memory optimized (currently DB.R4 and DB.R5, plus DB.X1e and DB.X1 for Oracle)
Burstable (DB.T2 and DB.T3)

Two types of storage are supported:

General Purpose SSD (gp2): 3 IOPS per GiB, burst to 3,000 IOPS (pool architecture like EBS)
Provisioned IOPS SSD (io1): 1,000 to 80,000 IOPS (engine dependent); size and IOPS can be configured independently
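
The gp2 figures above can be turned into a quick sizing calculation. This is a minimal sketch that uses only the two numbers quoted in these notes (3 IOPS per GiB and a 3,000 IOPS burst ceiling); the function names are hypothetical, and any service-imposed minimum or maximum is deliberately ignored.

```python
GP2_IOPS_PER_GIB = 3      # baseline rate quoted in the notes
GP2_BURST_IOPS = 3_000    # burst ceiling quoted in the notes


def gp2_baseline_iops(size_gib: int) -> int:
    """Baseline IOPS for a gp2 volume, using only the 3 IOPS per GiB rule."""
    if size_gib <= 0:
        raise ValueError("size_gib must be positive")
    return size_gib * GP2_IOPS_PER_GIB


def gp2_can_burst(size_gib: int) -> bool:
    """True while the baseline is below the burst ceiling, so bursting adds headroom."""
    return gp2_baseline_iops(size_gib) < GP2_BURST_IOPS
```

For example, a 100 GiB volume has a 300 IOPS baseline and can still burst, while a volume of 1,000 GiB or more already has a baseline at or above the burst figure.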

RDS instances are charged based on:

Instance size
Provisioned storage (billed on what is allocated, not on what is actually used)
IOPS if using io1
Data transferred out
Any backup/snapshot storage beyond the free allowance, which equals 100% of the provisioned storage of each DB instance
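
The billing dimensions above can be combined into a rough estimate. The sketch below is illustrative only: every rate is a hypothetical parameter supplied by the caller (no real AWS prices are assumed), and the function names are made up for this example. The one rule taken from the notes is that backup storage is free up to 100% of provisioned storage.

```python
def billable_backup_gib(provisioned_gib: float, backup_gib: float) -> float:
    """Backup storage charged after the free allowance (100% of provisioned storage)."""
    return max(0.0, backup_gib - provisioned_gib)


def estimate_monthly_cost(
    instance_hours: float,
    hourly_rate: float,          # hypothetical price per instance hour
    provisioned_gib: float,
    storage_rate: float,         # hypothetical price per provisioned GiB-month
    backup_gib: float = 0.0,
    backup_rate: float = 0.0,    # hypothetical price per billable backup GiB-month
    provisioned_iops: int = 0,   # only relevant for io1
    iops_rate: float = 0.0,      # hypothetical price per provisioned IOPS-month
    transfer_out_gib: float = 0.0,
    transfer_rate: float = 0.0,  # hypothetical price per GiB transferred out
) -> float:
    """Sum the charge dimensions listed in the notes."""
    return (
        instance_hours * hourly_rate
        + provisioned_gib * storage_rate       # charged on provisioned, not used
        + provisioned_iops * iops_rate
        + transfer_out_gib * transfer_rate
        + billable_backup_gib(provisioned_gib, backup_gib) * backup_rate
    )
```

With 100 GiB provisioned and 150 GiB of backups, only 50 GiB of backup storage is billable; with 80 GiB of backups, none is.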

RDS supports encryption with the following limits/restrictions/conditions:

Encryption can be configured when creating DB instances.
Encryption can be added to an existing unencrypted instance by taking a snapshot, making an encrypted copy of that snapshot, and creating a new encrypted instance from the encrypted copy.
Encryption cannot be removed.
Read Replicas need to be the same state as the primary instance (encrypted or not).
Encrypted snapshots can be copied between regions, but the copy is encrypted with a KMS CMK from the destination region (because keys are region specific).
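
The snapshot, encrypted copy, and restore sequence for adding encryption can be sketched as below. This is a hedged outline rather than a production procedure: it assumes `rds` is a boto3 RDS client (or any object exposing the same methods), the function name and the snapshot naming scheme are hypothetical, and error handling, cleanup, and cutover of the application are omitted.

```python
def encrypt_existing_instance(rds, source_instance_id, new_instance_id, kms_key_id):
    """Create an encrypted copy of an unencrypted instance via snapshot, copy, restore."""
    plain_snapshot = f"{source_instance_id}-plain"          # hypothetical naming
    encrypted_snapshot = f"{source_instance_id}-encrypted"  # hypothetical naming
    waiter = rds.get_waiter("db_snapshot_available")

    # 1. Snapshot the unencrypted instance.
    rds.create_db_snapshot(
        DBInstanceIdentifier=source_instance_id,
        DBSnapshotIdentifier=plain_snapshot,
    )
    waiter.wait(DBSnapshotIdentifier=plain_snapshot)

    # 2. Copy the snapshot, supplying a KMS key so the copy is encrypted.
    rds.copy_db_snapshot(
        SourceDBSnapshotIdentifier=plain_snapshot,
        TargetDBSnapshotIdentifier=encrypted_snapshot,
        KmsKeyId=kms_key_id,
    )
    waiter.wait(DBSnapshotIdentifier=encrypted_snapshot)

    # 3. Restore a new instance from the encrypted snapshot.
    rds.restore_db_instance_from_db_snapshot(
        DBInstanceIdentifier=new_instance_id,
        DBSnapshotIdentifier=encrypted_snapshot,
    )
    return encrypted_snapshot
```

With boto3 installed and credentials configured, the client would typically come from `boto3.client("rds")`. The original instance stays unencrypted; the new instance has a different endpoint, so the application has to be pointed at it.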

Network access to an RDS instance is controlled by a security group (SG) associated with the RDS instance.

RDS supports several types of backup. Automated backups to S3 occur daily and can be retained for 0 to 35 days (0 disables them). Manual snapshots are user-initiated and exist until deleted. Point-in-time, log-based backups are also stored on S3.

Multi-AZ

RDS can be provisioned in single or Multi-AZ mode.
Multi-AZ provisions a primary instance and a standby instance in a different AZ of the same region.
Only the primary can be accessed using the instance CNAME.
There is no performance benefit, but it provides a better RTO than restoring a snapshot.

Replication of data is synchronous: it’s copied in real time from the primary to the standby as it’s written. The primary and standby each have their own storage. Backups are taken using the standby, ensuring no performance impact. Maintenance is performed on the standby first, which is then promoted to minimize downtime.

Multi-AZ

Provisions and maintains a standby replica in a different AZ
The primary synchronously replicates to the standby instance for redundancy
Can reduce downtime in the event of a failure on the primary
The feature can be turned on from the console or API
Amazon automatically handles replication
Replication can cause higher write latency: Using Provisioned IOPS is recommended

Maintenance

AWS will perform the following steps:

Perform maintenance on the standby
Promote the standby
Perform maintenance on the old primary, now the standby

Read Replicas are read-only copies of an RDS instance that can be created in the same region or a different region from the primary instance.

Read Replicas can be addressed independently (each having its own DNS name) and used for read workloads, allowing you to scale reads. Five Read Replicas can be created from an RDS instance, allowing a 5x increase in reads. Read Replicas can be created from Read Replicas, can be promoted to primary instances, and can themselves be Multi-AZ.

Reads from a Read Replica are eventually consistent. Replication lag is normally seconds, but the application needs to tolerate stale reads.

Scaling for Performance

Read replicas can be used to offload work from the main database:
- Writes go to the source instance.
- Reads go to the read replica(s).
Replication to Read Replicas is asynchronous (not at the same time).
Data is written to the source instance and then replicated to the read replica(s).
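
The write/read split described above is usually implemented in the application or a proxy. The sketch below shows one simple way to do it, assuming hypothetical endpoint names and a deliberately naive rule (statements starting with SELECT go to a replica, everything else goes to the source instance). Because replicas lag behind the source, reads that must see a write just made should still be sent to the source.

```python
import itertools


class EndpointRouter:
    """Send writes to the source instance and rotate reads across read replicas."""

    def __init__(self, writer_endpoint, reader_endpoints):
        self.writer_endpoint = writer_endpoint
        # Fall back to the source instance when no replicas are configured.
        self._readers = itertools.cycle(list(reader_endpoints) or [writer_endpoint])

    def endpoint_for(self, sql):
        words = sql.split(None, 1)
        first_word = words[0].upper() if words else ""
        if first_word == "SELECT":
            return next(self._readers)   # round-robin over replicas
        return self.writer_endpoint      # INSERT/UPDATE/DELETE/DDL and anything else
```

A router built with one writer and two replicas alternates SELECT statements between the two replicas and sends every other statement to the writer.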

Scenario: You need to pull data for analysis, but you don’t want to degrade performance on your production database.

Solution: Create a read replica that’s only used for this reason.

AWS RDS Read Replication vs. Multi-AZ Failover Deployments

Read replicas are built primarily for performance and offloading work.
Multi-AZ deployments are used for high availability and durability.
Multi-AZ deployments give us synchronous replication instead of asynchronous.
The standby in a Multi-AZ deployment is used only for failover; it does not serve traffic the rest of the time.
Read replicas are used to serve legitimate traffic.
It is often beneficial to use both of these as complements.

What Can Trigger a Failover?

Loss of availability in the primary Availability Zone
Loss of network connectivity to the primary instance
Resource failure with the underlying virtualized resources
Storage failure on the primary database
The DB instance’s server type is changed
Maintenance

How Do Failovers Work?

The process is automated by AWS:

1. Amazon detects an issue and starts the failover process.

2. DNS records are modified to point to the standby instance.

3. The application re-establishes any existing DB connections.

The application requires no changes since the DB endpoint is the same.
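
Although the endpoint does not change, step 3 still means the application has to reconnect while DNS switches over. One common approach is to retry with exponential backoff, sketched below. The `connect` callable is a placeholder for whatever the database driver provides, and catching the built-in `ConnectionError` is an assumption; real drivers raise their own exception types.

```python
import time


def connect_with_retry(connect, attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call connect() until it succeeds, doubling the wait after each failure."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(attempts):
        try:
            return connect()
        except ConnectionError as error:   # assumption: the driver raises this type
            last_error = error
            if attempt < attempts - 1:
                sleep(base_delay * 2 ** attempt)   # 1s, 2s, 4s, ...
    raise last_error
```

Keeping connection attempts short-lived and avoiding long client-side DNS caching helps the application pick up the new address sooner.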

How Do We Know When a Failover Happens?

Use RDS events to notify via email or SMS.
Use the API or console to manually check events.
Use the API or console to check the state of the Multi-AZ deployment.
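
Checking events through the API can be reduced to filtering the returned list. The sketch below assumes event dictionaries shaped like the `Events` entries returned by the boto3 `describe_events` call (keys such as `SourceIdentifier`, `Message`, and `EventCategories`) and assumes that failovers carry the `failover` category; the function name is hypothetical.

```python
def failover_events(events):
    """Return (identifier, message) pairs for events tagged with the 'failover' category."""
    return [
        (event.get("SourceIdentifier"), event.get("Message"))
        for event in events
        if "failover" in event.get("EventCategories", [])
    ]
```

With boto3, the input would typically be `boto3.client("rds").describe_events(SourceType="db-instance")["Events"]`.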

Monitoring for performance and availability

Managed database web service: AWS manages patching, backups, failure detection, and recovery
Supports these engines: MySQL, MariaDB, PostgreSQL, Oracle, Microsoft SQL Server, and Amazon Aurora

Instance Classes

General Purpose (M4, M5)
Memory Optimized (R4, R5, X1e, X1)
Burstable Performance (T2, T3)

Storage Type

General Purpose (SSD): 3 IOPS per GiB, burst to 3,000 IOPS
Provisioned IOPS (SSD): 1,000 to 80,000 IOPS (depending on the engine)

Monitoring

CloudWatch metrics:
- SwapUsage: Increase = low or no available RAM
- ReadIOPS/WriteIOPS: Use this to determine storage type changes
- ReadLatency/WriteLatency: Higher latency = more IOPS needed
- ReadThroughput/WriteThroughput: Average bytes per second
RDS events:
- A record of instance, snapshot, security group, and parameter group events
Enhanced monitoring:
- Real-time metrics for the OS of the DB instance
- Gets metrics from an agent on the instance
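
The "Increase = low or no available RAM" rule for swap can be checked mechanically once metric samples have been fetched. This is a small sketch that assumes the samples are already available as a time-ordered list of numbers (for example, SwapUsage datapoints pulled from CloudWatch); the function name and the strictly-increasing rule are illustrative choices, not an AWS-defined threshold.

```python
def is_steadily_rising(samples):
    """True when there are at least two samples and each is greater than the one before."""
    return len(samples) >= 2 and all(
        later > earlier for earlier, later in zip(samples, samples[1:])
    )
```

A steadily rising swap series would be a prompt to look at memory pressure and consider a larger or memory-optimized instance class.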