Observer / Automatic Failover


The Observer is a Dbvisit StandbyMP component embedded in dbvcontrol; it starts and stops automatically together with dbvcontrol. Its purpose is to periodically check the primary and standby databases and servers, send notifications, and/or initiate an automatic failover.

1. Observer Concept

The Observer checks whether a host is alive by contacting its dbvagentmanager. If the dbvagentmanager is unreachable, the Observer reports the host as down. The Observer also checks the database status through the dbvagentmanager.

image-20250310-102106.png

The Observer concept and its checks are the same for all three database platforms: Oracle, Postgres and SQL Server.

The Observer sends a notification every time a server or database becomes unreachable. However, the reported error is always the same, “Primary/Standby database check failed”, regardless of whether the whole server or only the database failed the status check.

You can customize the interval between checks and the number of failed checks. The number of failed checks has the following purpose:

  • If the number of failed checks is exceeded for the primary or standby database/server and the Observer is set to “Notifications Only”, the Observer disables any further notifications but keeps checking silently in the background.

  • If the number of failed checks is exceeded for the primary database/server and the Observer is set to “Perform Automatic Failover”, the Observer activates the standby database and disables itself for that configuration.

  • If the number of failed checks is exceeded for the standby database/server and the Observer is set to “Perform Automatic Failover”, the situation is handled as if the Observer were set to “Notifications Only”.
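To make the counting rules above concrete, here is a minimal Python sketch that models them for a single DDC. It is not Dbvisit code: the class, method and attribute names are hypothetical. Because this page does not say whether a successful check resets the failed-check counter, the sketch simply accumulates failures per side (primary or standby) and treats the threshold as reached when the count equals the configured number.

```python
from enum import Enum


class EmergencyAction(Enum):
    NOTIFICATIONS_ONLY = "Notifications Only"
    AUTOMATIC_FAILOVER = "Perform Automatic Failover"


class ObserverModel:
    """Hypothetical model of the failed-check rules for one DDC (not Dbvisit code)."""

    def __init__(self, action: EmergencyAction, max_failed_checks: int) -> None:
        self.action = action
        self.max_failed_checks = max_failed_checks
        self.failed = {"primary": 0, "standby": 0}  # failures counted per side
        self.notifications_enabled = True
        self.enabled = True             # False once the Observer disables itself
        self.standby_activated = False  # True once failover has been triggered

    def record_check(self, side: str, ok: bool) -> None:
        """Record one check result for "primary" or "standby"."""
        if not self.enabled or ok:
            return  # assumption: successful checks do not reset the counter
        self.failed[side] += 1
        if self.notifications_enabled:
            # Same message whether the whole server or only the database failed.
            print(f"{side.capitalize()} database check failed")
        if self.failed[side] < self.max_failed_checks:
            return
        if self.action is EmergencyAction.AUTOMATIC_FAILOVER and side == "primary":
            self.standby_activated = True  # activate the standby database
            self.enabled = False           # Observer disables itself for this DDC
        else:
            # Notifications Only, or a standby failure under automatic failover:
            # stop notifying but keep checking silently.
            self.notifications_enabled = False
```

With “Perform Automatic Failover” and a threshold of 3, three failed primary checks set standby_activated and disable the model; the same number of standby failures only silences notifications.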

2. Observer Configuration

The Observer is always enabled separately for each DDC configuration. To configure it, click “Observer/Auto Failover” in the ACTIONS pane on the right:

image-20250310-104953.png

You will then see the following Observer configuration options:

image-20250310-110452.png

Explanation of individual options:

  • 1 Enable/Disable Observer: Enables or disables the Observer entirely for the particular DDC. The Observer is also disabled automatically during certain processes; for example, it is disabled at the start of a graceful switchover and enabled again once the switchover has completed.

  • 2 Selected Emergency Action: The preferred emergency action, which takes place once the number of failed checks is reached:

    • Notifications Only = the Observer only sends notifications. If the number of failed checks is reached, notifications are disabled.

    • Perform Automatic Failover = if the number of failed checks is reached, the Observer activates the standby database and disables itself for that configuration.

  • 3 Number of failed checks and interval: You can specify the number of failed checks and the interval between checks. When using automatic failover, consider routine events such as scheduled server reboots, and set the interval and the number of checks so that such a reboot cannot cause an accidental activation of the standby database.

  • 4 Type of notifications to send: You can choose between Email and Slack notifications; the Observer then sends notifications via the chosen channel. If neither Email nor Slack notifications are enabled, the Observer notifies the user through events visible in the dashboard task pane.

  • 5 Heartbeat Message: This option is only valid if Email or Slack notifications are selected. At the specified time, a notification about the status of Dbvisit synchronization is sent. The heartbeat notification contains the following text:

Heartbeat for <DDC> The Dbvisit Observer is monitoring configuration <DDC>. The primary database on <primary host> is ONLINE. The standby database on <standby host> is RECOVERING.. Automated Standby Update is not enabled. The current Time Gap is 2 minutes 38 seconds.

The time is specified in 24-hour format, for example:

image-20250310-111148.png

  • 6 Custom Observer Scripts: Lets you integrate custom check scripts with the default Observer checks. Described in detail in section 3, Custom Observer Scripts.

  • 7 Save or discard changes: Saves or discards the changes made to the Observer configuration.
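As a rough planning aid for option 3, the sketch below estimates how many failed checks are needed so that a planned outage of known length (for example a scheduled reboot) does not trigger automatic failover. It assumes that the failed checks must be consecutive, that checks run exactly every interval, and that the duration of each check is negligible. In the worst case the first failed check runs right after the host goes down, so the Nth failure comes about (N - 1) × interval seconds later. The function name and the margin parameter are hypothetical.

```python
import math


def min_failed_checks(outage_seconds: float, interval_seconds: float,
                      margin_seconds: float = 0.0) -> int:
    """Smallest failed-check count that should ride out a planned outage.

    Assumes consecutive failed checks spaced exactly interval_seconds apart
    and negligible check duration. The Nth failure happens about
    (N - 1) * interval_seconds after the first one, and we want that to be
    at least outage_seconds + margin_seconds.
    """
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    window = outage_seconds + margin_seconds
    return math.ceil(window / interval_seconds) + 1


# Example: a planned 5-minute reboot with checks every 60 seconds.
print(min_failed_checks(300, 60))                     # 6
print(min_failed_checks(300, 60, margin_seconds=60))  # 7
```

The margin is worth setting in practice, because a reboot that runs slightly longer than planned, or services that start late, would otherwise use up the last check.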

 

SQL Server and Postgres configurations can additionally enable Observer notifications based on the time gap:

image-20250310-111957.png

The gap is specified in seconds. No dashboard events are created for this metric.

3. Custom Observer Scripts

3.1 Custom Observer Scripts Concept

The default Observer checks can be replaced or complemented by custom user scripts, so the DBA or system administrator can add their own checks to the environment: connectivity checks, storage checks, application checks, or anything else they want to validate.

The user script must exist on both the primary and the standby server (the folder location can be custom), and it may return only two possible exit codes, which the Observer monitors:

  • 0 = OK

  • 1 = ERROR

Each ERROR exit of the custom script counts towards the total number of failed checks. Once the specified number of failed checks is reached, the Observer executes the Selected Emergency Action.
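As an illustration of the 0/1 exit-code contract, here is a minimal sketch of a custom observer script written in Python that performs a simple storage check. The checked path and the 10% free-space threshold are assumptions chosen for the example, not Dbvisit defaults; any check that ends with exit code 0 or 1 fits the same pattern.

```python
#!/usr/bin/env python3
# Hypothetical custom observer script: a simple storage check.
# The Observer only looks at the exit code: 0 = OK, 1 = ERROR.
import shutil

CHECK_PATH = "/"          # assumption: the filesystem you want to watch
MIN_FREE_FRACTION = 0.10  # assumption: require at least 10% free space


def storage_ok(path: str = CHECK_PATH, min_free: float = MIN_FREE_FRACTION) -> bool:
    """Return True if the filesystem holding `path` has enough free space."""
    usage = shutil.disk_usage(path)
    return usage.free / usage.total >= min_free


def main() -> int:
    """Return the exit code the Observer expects (0 = OK, 1 = ERROR)."""
    try:
        return 0 if storage_ok() else 1
    except OSError:
        # If the check itself cannot run (e.g. the path is missing), report
        # ERROR so the script never exits with anything other than 0 or 1.
        return 1
```

When saving this as the actual script file, finish it with `raise SystemExit(main())` so that the value returned by main() becomes the process exit code seen by the Observer. Catching OSError keeps the script within the two permitted exit codes even when the check cannot run.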

A failure of the custom observer script check (whether caused by an unreachable server or by the check itself failing) always creates a new event in the dashboard. These events are not cleared.

3.2 Custom Observer Scripts vs. Default Observer Checks

Custom (user) scripts can be combined with the default Observer checks in the following logical relations:

A. Custom Observer Scripts Failed Checks only

In this mode, the custom observer script is the only thing that increases the number of failed checks. Default Observer checks are completely disabled and do not count towards the total number of failed checks.

If the primary or standby server becomes unreachable, the custom script cannot be executed, and the Observer is therefore disabled immediately.

Example of event when whole server goes down:

image-20250310-185358.png
image-20250310-185423.png

B. Custom Observer Scripts Rule OR Default Observer Failed Checks (Either)

In this mode, the total number of failed checks increases by one whenever the default Observer check fails OR the custom observer script fails.

The default Observer check is executed first; if it fails, the user script check is not executed afterwards.

If the whole host goes down, the custom script is ignored, because the default Observer checks take precedence.

C. Custom Observer Scripts Rule AND Default Observer Failed Checks (Both)

In this mode, the total number of failed checks increases by one only if the custom observer script check fails AND the default Observer check fails as well.

If the primary or standby host becomes unreachable, the Observer automatically disables the custom script check and relies only on the default Observer checks.

Example of failed Custom script check with BOTH rule:

image-20250310-133234.png
image-20250310-133257.png

Example of the primary host going down and the user script check being disabled:

image-20250310-184802.png
image-20250310-184736.png
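The three modes can be summarised as a decision per check cycle. The Python sketch below models that decision as described in this section; it is not Dbvisit code, and the enum, function and parameter names are hypothetical. It is stateless, so it does not model the Observer staying disabled (mode A) or the script check staying disabled (mode C) after a host outage.

```python
from enum import Enum
from typing import Callable


class Mode(Enum):
    CUSTOM_ONLY = "Custom Observer Scripts"  # A
    EITHER = "Either/Or"                     # B
    BOTH = "Both"                            # C


class Outcome(Enum):
    OK = "ok"
    FAILED = "failed"                        # counts towards the failed checks
    OBSERVER_DISABLED = "observer disabled"


def evaluate(mode: Mode, host_reachable: bool, default_ok: bool,
             run_script: Callable[[], int]) -> Outcome:
    """Hypothetical model of how one check cycle is counted in each mode.

    run_script runs the custom script and returns its exit code (0 = OK, 1 = ERROR).
    """
    if mode is Mode.CUSTOM_ONLY:
        # A: default checks are ignored; an unreachable host means the script
        # cannot run, and the Observer is disabled immediately.
        if not host_reachable:
            return Outcome.OBSERVER_DISABLED
        return Outcome.FAILED if run_script() != 0 else Outcome.OK

    # B and C: the default check always runs first.
    default_failed = not (host_reachable and default_ok)

    if mode is Mode.EITHER:
        # B: a failed default check counts on its own and the script is skipped.
        if default_failed:
            return Outcome.FAILED
        return Outcome.FAILED if run_script() != 0 else Outcome.OK

    # C: an unreachable host disables the script check; the default check decides.
    if not host_reachable:
        return Outcome.FAILED
    # Both checks must fail. The script result cannot change the outcome when the
    # default check passes, so this sketch only runs the script after a failure.
    if default_failed and run_script() != 0:
        return Outcome.FAILED
    return Outcome.OK
```

Passing the script runner as a callable keeps the model testable and makes it visible in which modes the script is skipped.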

3.3 Configuring Custom Observer Scripts

Custom observer scripts can be enabled in the Observer configuration as follows:

image-20250310-132757.png

Custom Observer Scripts = A. Custom Observer Scripts Failed Checks only

Either/Or = B. Custom Observer Scripts Rule OR Default Observer Failed Checks (Either)

Both = C. Custom Observer Scripts Rule AND Default Observer Failed Checks (Both)

3.4 Custom Observer Scripts Additional Information

When using “Either” or “Both” mode, the default Observer check is always executed first. Once the default Observer check is done, the Observer executes the user script with the following five parameters:

  • DDC name - a name of DDC for which the check is being executed.

  • status - the state of the previous rule in the rule chain (the previous rule is the Observer's own connectivity and system health check); can be one of the following:

    • ok - previous rule returned no errors

    • fail - previous rule returned an error

  • role - the database role of the database where the error occurred, can be one of the following:

    • not_applicable - this is when there's no error

    • P - the previous error was on primary

    • S - the previous error was on standby

  • error_type - the type of error that occurred:

    • err_none - no error

    • err_connection - there was a connection error

    • err_database - there was an error on a database, or the database is in an invalid state

  • hostname - if error_type is not err_none, contains the hostname of the node where the error occurred; otherwise, this parameter is empty.
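A user script typically starts by reading and validating these parameters. The Python sketch below assumes that the Observer passes them as positional command-line arguments in the order listed (so you would call it with `sys.argv[1:]`), and that the hostname may be either omitted or passed as an empty string. The class and function names are hypothetical. It accepts any documented role regardless of status, because the example calls on this page show role P together with err_none.

```python
from dataclasses import dataclass
from typing import List, Optional

VALID_STATUS = {"ok", "fail"}
VALID_ROLE = {"not_applicable", "P", "S"}
VALID_ERROR = {"err_none", "err_connection", "err_database"}


@dataclass
class ObserverCall:
    ddc: str
    status: str              # "ok" or "fail"
    role: str                # "not_applicable", "P" or "S"
    error_type: str          # "err_none", "err_connection" or "err_database"
    hostname: Optional[str]  # None when no hostname was passed


def parse_args(argv: List[str]) -> ObserverCall:
    """Parse the positional parameters passed to a user script (hypothetical helper)."""
    if len(argv) not in (4, 5):
        raise ValueError(f"expected 4 or 5 parameters, got {len(argv)}")
    ddc, status, role, error_type = argv[:4]
    # The hostname can be missing entirely or passed as an empty string.
    hostname = argv[4] if len(argv) == 5 and argv[4] else None
    if status not in VALID_STATUS:
        raise ValueError(f"unknown status: {status}")
    if role not in VALID_ROLE:
        raise ValueError(f"unknown role: {role}")
    if error_type not in VALID_ERROR:
        raise ValueError(f"unknown error_type: {error_type}")
    return ObserverCall(ddc, status, role, error_type, hostname)
```

Rejecting unknown values early lets the script exit with 1 (ERROR) on unexpected input instead of acting on a misread parameter.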

For example, if the default Observer check is successful, the user script for configuration SLASH on primary host czlin0231 is called with the following parameters:

SLASH ok P err_none

If the default Observer check is unsuccessful, the user script for configuration SLASH on primary host czlin0231 is called with the following parameters (in this example, the primary database is down):

SLASH fail P err_connection czlin0231

You can use these parameters to implement various corrective actions in your script.
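As a sketch of such branching, the Python function below picks a corrective action from the parameters shown in the two example calls above. The action names (run_own_checks, alert_network_team, collect_db_diagnostics, log_only) are hypothetical placeholders for whatever your environment needs; they are not Dbvisit features, and the function only decides what to do without performing any action itself.

```python
from typing import List, Tuple


def choose_action(argv: List[str]) -> Tuple[str, str]:
    """Pick a hypothetical corrective action from the Observer's parameters.

    Returns (action, message). Action names are placeholders, not Dbvisit features.
    """
    # Pad so that short argument lists do not raise during unpacking.
    ddc, status, role, error_type = (argv + [""] * 4)[:4]
    hostname = argv[4] if len(argv) > 4 and argv[4] else "unknown host"
    if status == "ok":
        # Default check passed: only the script's own checks are needed.
        return "run_own_checks", f"{ddc}: default check passed"
    side = {"P": "primary", "S": "standby"}.get(role, "unknown role")
    if error_type == "err_connection":
        return "alert_network_team", f"{ddc}: connection error on {side} host {hostname}"
    if error_type == "err_database":
        return "collect_db_diagnostics", f"{ddc}: database problem on {side} host {hostname}"
    return "log_only", f"{ddc}: unexpected parameters {argv}"
```

For the second example call above (SLASH fail P err_connection czlin0231), this returns the action alert_network_team with the message "SLASH: connection error on primary host czlin0231".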
