Skip to content

How to migrate PostgreSQL data from RDS to Aurora

Introduction

This document describes two methods on how to migrate data from Relational Database Service (RDS) to Aurora Serverless. Both processes will require a maintenance window in which downtime is expected.

Use method 1 if you are migrating from an AWS RDS database to Aurora PostgreSQL (non-Serverless). Use method 2 if you are migrating from any other database to Aurora Serverless or Aurora PostgreSQL. Check each method for any limitations or special considerations.

Prerequisites

Migrate using RDS snapshot

By using a snapshot of the RDS instance, you will be able to create a new Aurora cluster based on the data that exists in the database at the moment the snapshot was taken.

Any data written to the database after the snapshot was taken will not be migrated. Consider turning off any services that might write to the database during the migration process.

Limitations

  • Major version upgrades are not supported for example PostgreSQL 9.6 to 10.7

Step-by-step using snapshot

  1. Get the RDS instance identifier and engine version of the database you want to migrate data from:

    aws rds describe-db-instances --query 'DBInstances[*].DBInstanceIdentifier'
    
  2. Get the engine version of the database you want to migrate data from:

    # Replace <db-instance-identifier> with the RDS instance identifier from step 1
    aws rds describe-db-instances --db-instance-identifier <db-instance-identifier> --query 'DBInstances[*].EngineVersion'
    
  3. Create a snapshot of the RDS instance:

    # Replace <rds-instance-id> with the RDS instance identifier from step 1
    # Replace <snapshot-name> with the name of the snapshot
    aws rds create-db-snapshot --db-instance-identifier <rds-instance-id> --db-snapshot-identifier <snapshot-name>
    
  4. Wait for the snapshot to be created:

    You can check the progress in RDS console or by waiting for the aws-cli command below to complete.

    # Replace <snapshot-name> with the name of the snapshot used in step 3
    aws rds wait db-snapshot-completed --db-snapshot-identifier <snapshot-name>
    # This command will return when the snapshot is created, but you won't see anything in the process.
    
  5. Get the ARN of the snapshot for use in the next step:

    # Replace <snapshot-name> with the name of the snapshot used in step 3
    aws rds describe-db-snapshots --db-snapshot-identifier <snapshot-name> --query 'DBSnapshots[*].DBSnapshotArn'
    
  6. Update the Terraform file which represents the new database

    This file usually exists in the infra/ directory in your IaC repository and is usually one of either postgres-aurora or postgres-aurora-severless.

       module "args_rds" {
          source = "git@github.com:oslokommune/golden-path-iac//terraform/modules/args-rds-aurora?ref=args-rds-aurora-v0.1.0"
       }
    
       locals {
          args_rds = module.args_rds.data.serverless.small
       }
    
       module "rds_aurora_serverless" {
          source  = "terraform-aws-modules/rds-aurora/aws"
    
          name        = local.db_name
          engine      = local.args_rds.engine
          engine_mode = local.args_rds.engine_mode
          # Replace <engine-version> with the engine version from step 2, or upgrade to another valid version
          engine_version = "<engine-version>"
          ..
    
          # Replace <snapshot-arn> with the ARN of the snapshot from step 5
          snapshot_identifier = "<snapshot-arn>"
    
          tags = local.common_tags
       }
    
  7. The migration starts when the Terraform plan is applied. The migration will take some time depending on the size of the database. You can check the progress of the migration in the AWS RDS console and look at the Status column.

  8. You can now update endpoints and other services to point to the new database.

  9. Once the migration is done you can remove the snapshot identifier from the Terraform file and apply again. This will remove the snapshot identifier from the Terraform state.

  10. Verify that the migration was successful and remove the old database.

Migrate using pg_dump and pg_restore

By using pg_dump and pg_restore you can create a dump of the database and then restore it to the new database. Similar to the snapshot method, any data written to the database after the dump was taken will not be migrated. Consider turning off any services that might write to the database during the migration process.

Step-by-step using pg_dump and pg_restore

You need to have access to the database you want to migrate data from. This can be done by locally connecting to the database using a bastion host and then dumping the database to your machine, or by running the commands from within a EC2 instance that has access to the database and a storage medium where the dump can be stored to and subsequently restored from to new the new database.

EC2

Team Kjøremiljo recommends dumping and restoring from an EC2 instance, as it will be faster, secure and more reliable than doing it from a local machine. Read more about setting up a EC2 instance with access to RDS.

Dump the database

  1. Have a shell open to the EC2 instance or bastion host that has access to the database you want to migrate data from.

  2. Create a dump of the database:

    # Replace <hostname> with the hostname of the database you want to migrate data from
    # Replace <username> with the master username
    # Replace <database> with the name of the database
    # Replace <output-file> with the name of the dump file
    pg_dump --host <hostname> --format=directory --create --jobs 5 --dbname <database> --username <username> --file <output-file>.dump
    
  3. Copy the dump file to a storage medium that can be accessed from the new database. This can be done by copying the file to an S3 bucket or to a shared drive.

Restore the database

  1. Create a new database using the Terraform file from the first method. You can use the same Terraform file as the one used in the first method, but you need to remove the snapshot identifier from the file.

  2. Have a shell open to the EC2 instance or bastion host that has access to the database you want to migrate data to.

  3. Restore the dump file to the new database:

    # Replace <hostname> with the hostname of the new database
    # Replace <username> with the master username
    # Replace <database> with the name of the database
    # Replace <input-file> with the name of the dump file
    pg_restore --host <hostname> --jobs 5 --dbname <database> --username <username> --file <input-file>.dump
    
  4. You can now update endpoints and other services to point to the new database.

  5. Verify that the data has been migrated correctly and delete the dump file from the storage medium.

See also

Best practices for migrating PostgreSQL databases to Amazon RDS and Amazon Aurora