Create DR Cluster for PostgreSQL

1. Introduction

There are three methods for creating a DR cluster, each with two options. A PostgreSQL configuration must be created before attempting to create a DR Cluster.

WAL (Write Ahead Logs) Streaming
- Hot Standby
- Warm Standby
WAL (Write Ahead Logs) file Archiving
- Hot Standby
- Warm Standby
WAL (Write Ahead Logs) file Shipping
- Hot Standby
- Warm Standby

For Hot Standby, the standby database is accessible by the application as a read-only cluster, and the databases under the cluster can be queried and used for reporting purposes. This can happen while the standby cluster databases are still getting synced by the primary cluster.

The Warm Standby Cluster databases are not accessible by the application but are still getting synced by the primary cluster.

The process of creating a configuration is same for all the three method of creating a DR cluster

To create a DR cluster, you can either click on setup now? or “Set Up Disaster Recovery”

2. Create DR Cluster using WAL Streaming Mode

2.1 Explanation

Below is the diagrammatic and explanation of steps that happen on the background when we run the creation of DR cluster using the streaming mode.

Trigger CSD in streaming mode

Create a Replication Slot on Primary

Run the following SQL query on the primary cluster to create a physical replication slot named SLOT_NAME

SELECT * FROM pg_create_physical_replication_slot(SLOT_NAME, false, false) WHERE NOT EXISTS (SELECT * FROM pg_replication_slots WHERE slot_name = SLOT_NAME)

Tar PostgreSQL Config Files on Primary Server:
- postgres.conf
- pg_ident.conf
- pg_hba.conf
Transfer Config Files to Standby Server
- Transfer the tarball containing PostgreSQL config files from the primary cluster to the standby cluster
Data Directory Setup Using pg_basebackup
- Initiate a streaming backup from the standby cluster to the primary cluster . This process retrieves all files from the primary cluster’s data directory and includes the previously established replication slot on step 2
```
/opt/rh/rh-postgresql13/root/usr/bin/pg_basebackup -D /var/lib/pgsql/data/13/main -F plain -X stream -S dbvisit_2 -R --progress --verbose --checkpoint=fast -d postgres://postgres@postgresqlcentos1:5432
```
Config Directory Setup
- Extract files from tarball received in step 4 and place them in the target configuration directory on the standby server
- Configure the PostgreSQL instance on the standby cluster for streaming replication based on the CSD parameter and the replication slot created earlier
If the operating system is Windows, register a Windows service for the PostgreSQL cluster
Restart Standby cluster
- After setting up the data directory and config directory, restart the standby cluster.
Applying Configuration Changes
- Applying the latest changes on configuration coming CSD and persisting configuration data on the Control Center

2.2 Streaming mode using Dbvisit

WAL Streaming replication allows real-time transfer of primary cluster database changes to standby. The databases are kept in sync. There are three processes wal receiver, wal sender and startup. The wal receiver process sends the LSN(Log Sequence Number) up until the standby (slave) has replayed and send it to the primary. The wal sender process sends the WAL data until the latest LSN, starting from the LSN sent by the wal receiver and the same process writes the data to the WAL Segments. The startup process on the slave replays the data written to the WAL segment.

The primary cluster must update a few parameters to ensure the streaming happens.

In Postgresql.conf file we need to ensure that the listen_address parameter is set to provide a list of IP addresses that it listens to. This is usually the standby server address. In the below example, we have kept the listen_address to “*” so it listens to all the addresses.

We also have to change the pg_hba.conf file to provide connections to the standby server and also add replication connections from localhost to the remote host.

host    all             all             10.0.2.23/24 (Address of remote server)          md5 (scram-sha-256 in version after 10 if needed)

host    replication     all                 10.0.2.22/30   (Address of local server needed after Graceful Switchover)        md5 (scram-sha-256 in version after 10 if needed)
host    replication     all                 10.0.2.23/30  (Address of remote server)         md5 (scram-sha-256 in version after 10 if needed)

Line 1 of the above code block is just for remote connection. Lines 3 &4 are mainly for replication connections that are for all privileged users (usually postgres or some specific user). Line 3 is a local server address and is mainly used to ensure that the switchback happens without any issues.

The top screen displays the usual configuration details

Configuration name of the PostgreSQL for which DR cluster will be created
The primary hostname and the status of the PostgreSQL cluster (ONLINE)
The standby hostname and also mentioning that no standby cluster exists
The locations of the standby cluster datafiles. This location is recommended to be the same as the primary, but this can be changed and provided a custom location. For Linux the permission of these folders must be like below.

-bash-4.2$ ls -lrt /u01
drwx------.  5 postgres postgres  4096 Aug 16 16:41 tablespace
drwx------. 20 postgres postgres  4096 Aug 26 12:06 PG5435-13

The location of the configuration files (postgresql.conf, pg_hba.conf, and pg_indent.conf). This location can be different from the primary data directory, but the example below has these two in the same directory. Keeping the same as the primary is recommended, but this can also be different in the standby.
The tablespace location of the primary is in a different location from the default data directory. The above code snippet also shows the permission of these folders.

The method used for creating the DR cluster (WAL Streaming)
The replication user that is used for streaming connection to the primary cluster. By default, the Postgres user is used. It is recommended to use a different user for establishing this connection. Sample user creation SQL used. SUPERUSER role needed here because only SUPERUSER can retrieve the sync status between primary database and standby database

create user repuser replication login encrypted password 'xxxxxx';
alter ROLE repuser SUPERUSER;

The password of the replication user that is used for streaming connection to the primary cluster.
A replication test that can be done to ensure the correct username and password with necessary privileges and the whether the pg_hba.conf updates are correct.
The result of the replication test.
By, default the observer is enabled that will start monitoring the configuration as soon the DR cluster is created.
Click start to start creating the DR cluster using WAL streaming method.

The report of successful completion of the DR Cluster creation using the WAL Streaming method. The time it took to complete the DR Cluster creation ( 1 ). The overview of the steps performed during this process ( 2 ).

The configuration name and the status of the WAL Streaming which is STREAMING(In Sync).
The observer is monitoring the configuration.
The primary hostname and the status of the production cluster. The port on which the primary cluster is running.
The method used for creating the DR cluster. We have used WAL Streaming method (STREAMING).
The standby hostname and the status of the standby cluster (HOT STANDBY). The port in which the standby cluster is running.

The configuration parameter changes done by Dbvisit as part of the DR cluster creation.

PRIMARY:

No changes here

STANDBY:

postgresql.conf

#------------------------------------------------------------------------------
# CUSTOMIZED OPTIONS
#------------------------------------------------------------------------------

# Add settings for extensions here

# Added by Dbvisit StandbyMP
archive_mode = 'off'

# Added by Dbvisit StandbyMP
hot_standby = 'on'

# Added by Dbvisit StandbyMP
log_destination = 'stderr'

# Added by Dbvisit StandbyMP
log_directory = 'log'

# Added by Dbvisit StandbyMP
log_filename = 'postgresql-%Y-%m-%d_%H%M%S.log'

# Added by Dbvisit StandbyMP
log_rotation_age = '1d'

# Added by Dbvisit StandbyMP
logging_collector = 'on'

# Added by Dbvisit StandbyMP
primary_conninfo = ''

# Added by Dbvisit StandbyMP
primary_slot_name = ''

# Added by Dbvisit StandbyMP
data_directory = '/u01/PG5435-13'

postgresql.auto.conf

primary_conninfo = 'user=postgres password=pg channel_binding=prefer host=postgresql1 port=5434 sslmode=prefer sslcompression=0 sslsni=1 ssl_min_protocol_version=TLSv1.2 gssencmode=prefer krbsrvname=postgres target_session_attrs=any'
primary_slot_name = 'dbvisit_3'

3. Create DR Cluster using WAL Archive Mode

3.1 Explanation

Below is a detailed explanation of how the DR cluster is created in the background.

Trigger CSD in archive mode.
Create Base Backup Files
- Generate the essential base backup files using the following command (By default Postgres is installed under /usr/pgsql-<version> directory)
```
$PG_HOME/bin/pg_basebackup -D tempBackupDir -F tar -X stream -R --progress --verbose 
```
- The aforementioned command will generate the subsequent files to be transferred to the Standby to facilitate the cluster restoration process.
  - base.tar: PostgreSQL base files, including settings and etc
  - pg_wal.tar: wal data files for standby to restore
  - tblspc_xxxxx.tar: customized table space files
  - backup_manifest: backup metadata, is used for pg_verifybackup to verify the backup
Tar the following PostgreSQL configuration files on the primary server
- Postgresql.conf
- pg_ident.conf
- pg_hba.conf
Transfer Backup Files from Primary to Standby, which are generated on step 3 and step 2
Data directory setup
- On the Standby, receive and extract all files generated from Step 2 and Step 3. Place these files in the designated data folder, previously allocated, before initiating CSD.
Config directory setup
- Extract the files from the tarball received in Step 4 and place them into the target configuration directory on the standby server.
- Configure PostgreSQL in archive mode as per the Dbvisit DR cluster creation parameter
If the OS is Windows, register the Windows service for the cluster
After configuring the data and config directories, restart the standby PostgreSQL server
Apply Configuration Changes from CSD
- Applying the latest changes on configuration coming CSD and persisting configuration data on the Control Center

3.2 Steps for creating DR cluster using Archive Mode

To start the DR cluster creation using archive mode

The Primary Cluster will be restarted to enable the archive mode and also provide the location for archive and restore command along with archive timeout parameter. This is updated on postgresql.conf file.

The primary cluster will be restarted even if the archive mode is already enabled to update other parameters.

The primary cluster has already been configured with archive backup in archive mode. Setting up a new configuration with Dbvisit DR in archive mode could potentially disrupt the existing backup setup. Therefore, it is advisable to create a backup of the postgresql.conf file if deemed necessary.

The creation of the DR cluster can be started either by clicking on Set up now? ( 1 ) or from the ACTIONS menu by clicking on “Set Up Disaster Recovery” ( 2 )

The top screen displays the usual configuration details

Configuration name of the PostgreSQL for which DR cluster will be created
The primary hostname and the status of the PostgreSQL cluster (ONLINE)
The standby hostname and also mentioning that no standby cluster exists
The locations of the standby cluster datafiles. This location is recommended to be the same as the primary, but this can be changed and provided a custom location. For Linux the permission of these folders must be like below.

-bash-4.2$ ls -lrt /u01
drwx------.  5 postgres postgres  4096 Aug 16 16:41 tablespace
drwx------. 20 postgres postgres  4096 Aug 26 12:06 PG5432-12

The location of the configuration files (postgresql.conf, pg_hba.conf, and pg_indent.conf). This location can be different from the primary data directory, but the example below has these two in the same directory. Keeping the same as the primary is recommended, but this can also be different in the standby.
The tablespace location of the primary is in a different location from the default data directory. The above code snippet also shows the permission of these folders.
The tablespace location of the standby. This folder can also be different from the primary but it is recommended to keep the directory structures same as primary.

After this choose the WAL file archiving method with a Hot or Warm standby option.

For this example, a Hot standby is created that can be accessible for reporting purposes as the databases in the cluster are in read only mode.
The WAL file archiving method is used to create the standby cluster.
The location on a primary where the base backup is taken and the location to which it's transferred in the standby. This location provided is a default location but it's recommended to provide a different location to accommodate backups that are larger in size.
The custom backup location of primary and standby in which the WAL archive files are copied in the primary and transferred to standby.
The archive timeout in seconds. This corresponds to the archive_timeout parameter in postgresql.conf. The archive_timeout specifies the maximum time interval between the completion of a WAL file and its archiving, and also indicates the maximum lag standby (replica) database can potentially fall behind the primary database in terms of the time of changes applied.
By, default the observer is enabled that will start monitoring the configuration as soon the DR cluster is created.
The primary cluster will be restarted irrespective of whether the archive_mode is off or on in the postgresql.conf file.
Click start to create the DR cluster.

After successful completion. The summary displayed will show the time it took to create the DR cluster ( 1 ) and also the step by steps for the DR cluster ( 2 ).

The dashboard will now show the following

The configuration is in sync. The standby is in sync with the primary.
The observer is monitoring the configuration.
The method used in creating the DR cluster (WAL file archiving).
The standby status is now “HOT STANDBY”. The standby cluster database is in read-only status.

Below are the parameters that are modified on both Primary and Standby clusters.

PRIMARY:

#------------------------------------------------------------------------------
# CUSTOMIZED OPTIONS
#------------------------------------------------------------------------------

# Add settings for extensions here

# Added by Dbvisit StandbyMP
archive_command = '"/usr/dbvisit/standbymp/bin/dbvpgarchive" "-configurationId" "6" "-targetAddr" "centlin72.oraclekiwi.co.nz" "-targetPort" "7890" "-sourceFile" "/u01/PG5432-12/%p" "-targetFile" "/u01/backup/PG5432-12/6/%f" "-localAddr" "" "-localPort" "7890" "-pgBin" "/usr/pgsql-12/bin" "-pgData" "/u01/PG5432-12" "-backupFile" "/u01/backup/PG5432-12/6/%f"'

# Added by Dbvisit StandbyMP
archive_mode = 'on'

# Added by Dbvisit StandbyMP
archive_timeout = 60

# Added by Dbvisit StandbyMP
data_directory = '/u01/PG5432-12'

STANDBY:

#------------------------------------------------------------------------------
# CUSTOMIZED OPTIONS
#------------------------------------------------------------------------------

# Add settings for extensions here

# Added by Dbvisit StandbyMP
archive_mode = 'off'

# Added by Dbvisit StandbyMP
archive_timeout = 60

# Added by Dbvisit StandbyMP
hot_standby = 'on'

# Added by Dbvisit StandbyMP
archive_cleanup_command = '"/usr/pgsql-12/bin/pg_archivecleanup" "/u01/backup/PG5432-12/6" "%r"'

# Added by Dbvisit StandbyMP
data_directory = '/u01/PG5432-12'

# Added by Dbvisit StandbyMP
port = 5432

# Added by Dbvisit StandbyMP
restore_command = '"/usr/dbvisit/standbymp/bin/dbvpgrestore" "-configurationId" "6" "-sourceFile" "/u01/backup/PG5432-12/6/%f" "-targetFile" "%p" "-localAddr" "" "-localPort" "7890" "-pgBin" "/usr/pgsql-12/bin" "-pgData" ""'

4. Create DR Cluster using WAL File shipping Mode

4.1 Technical explanation

This method of creating the standby is specific to Dbvisit. The wal files from the primary are copied to standby at regular intervals and applied on the standby.

Trigger CSD in archive mode.

Create a Replication Slot on the Primary

Run the following SQL query on the primary cluster to create a physical replication slot named SLOT_NAME

SELECT * FROM pg_create_physical_replication_slot(SLOT_NAME, false, false) WHERE NOT EXISTS (SELECT * FROM pg_replication_slots WHERE slot_name = SLOT_NAME)

Create Base Backup Files
1. Generate the essential base backup files using the following command
```
/opt/rh/rh-postgresql13/root/usr/bin/pg_basebackup -D tempBackupDir -F tar -X stream -R --progress --verbose -S slot_name 
```
2. The aforementioned command will generate the subsequent files to be transferred to the Standby to facilitate the cluster restoration process.
  1. base.tar: PostgreSQL base files, including settings.
  2. pg_wal.tar: wal data files for standby to restore
  3. tblspc_xxxxx.tar: customized table space files
  4. backup_manifest: backup metadata, is used for pg_verifybackup to verify the backup
3. Note: Different archive mode base backups and log shipping backups also include the replication_slot within the base backup files and transfer it to the standby.
Tar the following PostgreSQL configuration files on the primary server
1. postgresql.conf
2. pg_ident.conf
3. pg_hba.conf
Transfer Backup Files from Primary to Standby, which are generated in step 3
Data directory setup
1. On the standby server, receive and extract all files generated from Step 2 and Step 3. Place these files in the designated data folder, previously allocated, before initiating DR Cluster creation.
Config directory setup
1. Extract the files from the tarball received in Step 4 and place them into the target configuration directory on the standby server.
2. Configure PostgreSQL in archive mode as per the CSD parameter
If the OS is Windows, register the Windows service for the cluster
After configuring the data and config directories, restart the standby PostgreSQL cluster
Apply Configuration Changes from DR Cluster creation
1. Applying the latest changes on configuration coming during DR cluster creation and persisting configuration data on the Control Center

4.2 Steps for creating DR cluster using Archive Shipping.

The top half of the DR Cluster creation page is similar to WAL Streaming and WAL archive mode.

Configuration name of the PostgreSQL for which DR cluster will be created.
The primary hostname and the status of the PostgreSQL cluster (ONLINE).
The standby hostname and also mentioning that no standby cluster exists.
The locations of the standby cluster datafiles. This location is recommended to be the same as the primary, but this can be changed and provided a custom location. For Linux the permission of these folders must be like below.

-bash-4.2$ ls -lrt /u01
drwx------.  5 postgres postgres  4096 Aug 16 16:41 tablespace
drwx------. 20 postgres postgres  4096 Aug 26 12:06 PG5433-12

The location of the configuration files (postgresql.conf, pg_hba.conf, and pg_indent.conf). This location can be different from the primary data directory, but the example below has these two in the same directory. Keeping the same as the primary is recommended, but this can also be different in the standby.
For this example, a Hot standby is created that can be accessible for reporting purposes as the databases in the cluster are in read only mode.

The second half of the DR cluster is specific to the method used for the creation.

The method being used to create the DR cluster. In the below example, the WAL file shipping method is used.
The location on a primary where the base backup is taken and the location to which it's transferred in the standby. This location is a default location, but it's recommended to provide a different location to accommodate larger backups.
The custom backup location of primary and standby in which the WAL archive files are copied in the primary and transferred to standby. It is recommended to provide different locations to ensure the location has enough free space.
The automated standby update time interval in seconds. Every 300 sec (default) the WAL files are copied from primary to standby to keep the standby in sync. The minimum time interval cannot be below 59 seconds.
By, default the observer is enabled that will start monitoring the configuration as soon the DR cluster is created.
Click start to create the DR cluster with the WAL file shipping method.

The final screen after successfully completing the DR cluster creation. The configuration for which the cluster is created ( 1 ). The duration of the process ( 2 ) and the steps for the process ( 3 )

The dashboard for the configuration is now updated, and the details are updated.

The configuration name and the time gap between standby and primary.
The automated standby update is enabled, and also the observer is enabled.
The primary cluster is online
The method using which the standby is synced with primary. In this case its the SHIPPING ( WAL file shipping) method.
The standby cluster status is now “HOT STANDBY,” and the standby cluster databases are accessible via read-only mode.

The changes done on postgresql.conf file done by Dbvisit.

PRIMARY:

No changes here

STANDBY:

#------------------------------------------------------------------------------
# CUSTOMIZED OPTIONS
#------------------------------------------------------------------------------

# Add settings for extensions here

# Added by Dbvisit StandbyMP
archive_mode = 'off'

# Added by Dbvisit StandbyMP
hot_standby = 'on'

# Added by Dbvisit StandbyMP
log_destination = 'stderr'

# Added by Dbvisit StandbyMP
log_directory = 'log'

# Added by Dbvisit StandbyMP
log_filename = 'postgresql-%Y-%m-%d_%H%M%S.log'

# Added by Dbvisit StandbyMP
log_rotation_age = '1d'

# Added by Dbvisit StandbyMP
logging_collector = 'on'