Observer / Automatic Failover
Observer is a Dbvisit StandbyMP component that is embedded in dbvcontrol and is started and stopped automatically together with dbvcontrol. The purpose of the observer is to periodically check the primary and standby databases and the primary and standby servers, and to send notifications and/or initiate automatic failover.
1. Observer Concept
Observer checks whether a host is alive by contacting its dbvagentmanager. If the dbvagentmanager is unreachable, observer reports the host as down. Observer also checks the database status through dbvagentmanager.
The observer concept and the checks it performs are the same for all three database platforms: Oracle, Postgres and SQL Server.
Observer sends a notification every time a server or database becomes unreachable. However, the reported error is always the same, “Primary/Standby database check failed”, regardless of whether the whole server or only the database fails the status check.
The user can customize the interval between checks and the number of failed checks. The number of failed checks has the following purpose:
If the number of failed checks is exceeded for the primary or standby database/server and Observer is set to “Notifications Only”, observer disables any further notifications but keeps checking silently in the background.
If the number of failed checks is exceeded for the primary database/server and Observer is set to “Perform Automatic Failover”, observer activates the standby database and disables itself for that particular configuration.
If the number of failed checks is exceeded for the standby database/server and Observer is set to “Perform Automatic Failover”, the situation is treated as if Observer were set to “Notifications Only”.
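The three rules above can be summarized as a small decision function. This is only an illustrative sketch of the documented behavior, not Dbvisit code: the function name, the `"primary"`/`"standby"` labels and the returned strings are hypothetical.

```python
NOTIFY_ONLY = "Notifications Only"
AUTO_FAILOVER = "Perform Automatic Failover"


def action_when_limit_exceeded(mode: str, failed_side: str) -> str:
    """Describe what the observer does once the failed-check limit is exceeded.

    mode        -- one of the two emergency actions named in the text
    failed_side -- "primary" or "standby" (hypothetical labels for this sketch)
    """
    if mode not in (NOTIFY_ONLY, AUTO_FAILOVER):
        raise ValueError(f"unknown mode: {mode!r}")
    if failed_side not in ("primary", "standby"):
        raise ValueError(f"unknown side: {failed_side!r}")

    # Only a failing primary in failover mode triggers standby activation.
    if mode == AUTO_FAILOVER and failed_side == "primary":
        return "activate standby, then disable observer for this configuration"

    # Notifications Only, or a failing standby in failover mode.
    return "disable notifications, keep checking silently"
```

For example, `action_when_limit_exceeded(AUTO_FAILOVER, "standby")` yields the same result as either call in “Notifications Only” mode.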
2. Observer Configuration
Observer is always enabled separately for each DDC configuration. To configure the observer, click “Observer/Auto Failover” in the right ACTIONS pane:
You will then see the following Observer Configuration options:
Explanation of individual options:
Number | Description | Setting Explanation |
---|---|---|
1 | Enable/Disable Observer | Enables or disables the observer entirely for the particular DDC. Observer is automatically disabled during certain processes. For example, it is disabled at the start of a graceful switchover and enabled again once the switchover is completed. |
2 | Selected Emergency Action | Preferred emergency action that takes place once the number of failed checks is reached. Notifications Only = observer only sends notifications; if the number of failed checks is reached, notifications are disabled. Perform Automatic Failover = if the number of failed checks is reached, observer activates the standby database and disables itself for the particular configuration. |
3 | Number of failed checks and interval | You can specify the number of failed checks and the interval between checks. When using automatic failover, it is helpful to consider common situations such as scheduled server reboots, and to set the interval and the number of checks so that a scheduled reboot does not cause an accidental standby database activation. |
4 | Type of notifications to send | You can choose between Email and Slack notifications. Observer then sends notifications via the chosen channel. If neither Email nor Slack notifications are enabled, observer notifies the user through events visible in the dashboard task pane. |
5 | Heartbeat Message | This option is only valid if either Email or Slack notifications are selected. At the specified time (given in 24hr format), a notification is sent about the status of the Dbvisit synchronization. The heartbeat notification contains the following text: “Heartbeat for <DDC>” / “The Dbvisit Observer is monitoring configuration <DDC>.” / “The primary database on <primary host> is ONLINE.” / “The standby database on <standby host> is RECOVERING..” / “Automated Standby Update is not enabled.” / “The current Time Gap is 2 minutes 38 seconds.” |
6 | Custom Observer Scripts | Described in detail further down this page. The user can integrate a custom check script with the default observer checks. |
7 | Save or discard changes | |
SQL Server and Postgres additionally offer the option to enable an observer notification based on the time gap.
The gap is specified in seconds, and no events are created in the dashboard for this metric.
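For option 3 in the table above (number of failed checks and interval), a rough sizing check can help avoid an accidental activation during a scheduled reboot. The sketch below assumes that the emergency action fires after approximately "number of failed checks × interval" seconds; this is a simplification, since the actual timing also depends on how long each check takes. The function names and the sample values are hypothetical.

```python
def seconds_until_emergency_action(failed_checks: int, interval_seconds: int) -> int:
    """Approximate time before the emergency action fires (simplified model)."""
    return failed_checks * interval_seconds


def tolerates_outage(failed_checks: int, interval_seconds: int, outage_seconds: int) -> bool:
    """True if a planned outage of the given length ends before the action fires."""
    return seconds_until_emergency_action(failed_checks, interval_seconds) > outage_seconds


# Hypothetical values: a scheduled reboot that takes about 3 minutes.
reboot_seconds = 180
wide_margin = tolerates_outage(failed_checks=5, interval_seconds=60, outage_seconds=reboot_seconds)
too_tight = tolerates_outage(failed_checks=3, interval_seconds=30, outage_seconds=reboot_seconds)
```

With these sample values, 5 checks at 60 seconds give a 300 second window and cover the reboot, while 3 checks at 30 seconds give only 90 seconds and do not.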
3. Custom Observer Scripts
3.1 Custom Observer Scripts Concept
The default checks done by the observer can be replaced or complemented by custom user scripts. The DBA or system administrator can add their own checks to the environment, such as connectivity checks, storage checks, application checks, or anything else they want to validate for the environment.
The user script must exist on both the primary and the standby (the folder location can be custom), and it may return only two exit codes, which are monitored by the Observer:
0 = OK
1 = ERROR
Each ERROR exit of a custom script execution counts towards the total count of Failed Checks. After the specified number of failed checks, the observer executes the Selected Emergency Action.
A check failure of the custom observer script (whether caused by an unreachable server or by a failed check) always creates a new event in the dashboard. These events will not be cleared.
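As an illustration of the exit-code contract above, here is a minimal sketch of a storage check that maps its result to 0 (OK) or 1 (ERROR). It assumes the hosts can run a Python script; this page does not state which script types the Observer supports. The threshold, the checked path and the function name are hypothetical.

```python
import shutil

# Hypothetical threshold: report ERROR when less than 10% of the volume is free.
MIN_FREE_FRACTION = 0.10


def check_free_space(path: str, min_free_fraction: float = MIN_FREE_FRACTION) -> int:
    """Return 0 (OK) if enough space is free on the volume holding `path`, else 1 (ERROR)."""
    usage = shutil.disk_usage(path)
    free_fraction = usage.free / usage.total
    return 0 if free_fraction >= min_free_fraction else 1


# When saved as a standalone script, end it by passing the result to the shell,
# for example: raise SystemExit(check_free_space("/"))
```

Keeping the check in a function that returns the exit code makes it easy to test separately from the final exit call.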
3.2 Custom Observer Scripts vs. Default Observer Checks
Custom (or user) scripts can then be combined with the existing Observer checks in the following logical relations:
A. Custom Observer Scripts Failed Checks only
In this case, custom observer scripts are the only entity that increases the number of Failed Checks. Default Observer Checks are completely disabled and are not counted towards the total number of failed checks.
If the primary or standby server becomes unreachable, the custom script cannot be executed and therefore the Observer is immediately disabled.
Example of event when whole server goes down:
B. Custom Observer Scripts Rule OR Default Observer Failed Checks (Either)
In this case, the total number of failed checks increases by one whenever the Default Observer check fails OR the custom observer script fails.
The default observer check is executed first; if it fails, the user script check is not executed afterwards.
If the whole host goes down, the custom script is ignored because Default Observer Checks have precedence.
C. Custom Observer Scripts Rule AND Default Observer Failed Checks (Both)
In this case, the total number of failed checks increases by one only if the custom observer script check fails AND the Default Observer check fails as well.
If the primary or standby host becomes unreachable, Observer automatically disables the custom script check and relies only on Default Observer checks.
Example of a failed custom script check with the BOTH rule:
Example of primary host down and user script check disabled:
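The three relations A, B and C described in this section can be compared side by side with a small model. This is a sketch of the documented rules only, not the Observer's implementation. The mode labels, the function name and the result strings are hypothetical, and an unreachable host is assumed to make the default check fail.

```python
def evaluate_check(mode: str, host_reachable: bool, default_ok: bool, custom_ok: bool) -> str:
    """Return "ok", "failed" (failed-check count +1) or "observer_disabled"."""
    if mode == "custom_only":  # relation A
        if not host_reachable:
            # The custom script cannot be executed, so the Observer is disabled.
            return "observer_disabled"
        return "ok" if custom_ok else "failed"

    if mode == "either":  # relation B
        # The default check runs first; if it fails, the custom script is skipped.
        if not host_reachable or not default_ok:
            return "failed"
        return "ok" if custom_ok else "failed"

    if mode == "both":  # relation C
        if not host_reachable:
            # The custom check is disabled; the default check alone decides.
            return "failed"
        return "failed" if (not default_ok and not custom_ok) else "ok"

    raise ValueError(f"unknown mode: {mode!r}")
```

For instance, a failing custom script with a healthy default check counts as a failed check in mode B but not in mode C.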
3.3 Configuring Custom Observer Scripts
Custom observer scripts can be enabled in the Observer configuration as follows:
Custom Observer Scripts = A. Custom Observer Scripts Failed Checks only
Either/Or = B. Custom Observer Scripts Rule OR Default Observer Failed Checks (Either)
Both = C. Custom Observer Scripts Rule AND Default Observer Failed Checks (Both)
3.4 Custom Observer Scripts Additional Information
When using “Either” or “Both” mode, the default observer check is always executed first. Once the default observer check is done, the observer executes the user script with the following five parameters:
DDC name - a name of DDC for which the check is being executed.
status - the state of the previous rule in the rule chain (the previous rule is the Observer's own connectivity and system health check); can be one of the following:
  ok - previous rule returned no errors
  fail - previous rule returned an error
role - the database role of the database where the error occurred; can be one of the following:
  not_applicable - this is when there's no error
  P - the previous error was on primary
  S - the previous error was on standby
error_type - the type of error that occurred:
  err_none - no error
  err_connection - there was a connection error
  err_database - there was an error on a database, or the database is in an invalid state
hostname - if the error_type is not err_none - will contain a hostname of the node where the error occurred, otherwise, this parameter is empty.
For example, if the Default observer check is successful, the user script for configuration SLASH on primary host czlin0231 is called with the following parameters:
SLASH ok P err_none
If the Default observer check is unsuccessful, the user script for configuration SLASH on primary host czlin0231 is called with the following parameters (in this example the primary database is down):
SLASH fail P err_connection czlin0231
You can make use of this behavior to take various corrective actions in your code.
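A user script could read the five parameters described above along these lines. This is a minimal sketch that assumes the parameters arrive as positional command-line arguments in the listed order; the type and function names are hypothetical. Because the page does not say whether an empty hostname is passed as an empty argument or omitted, the parser accepts both.

```python
from typing import NamedTuple, Sequence


class ObserverArgs(NamedTuple):
    ddc: str
    status: str      # "ok" or "fail"
    role: str        # "not_applicable", "P" or "S"
    error_type: str  # "err_none", "err_connection" or "err_database"
    hostname: str    # empty when error_type is "err_none"


def parse_observer_args(argv: Sequence[str]) -> ObserverArgs:
    """Build ObserverArgs from the script arguments (without the script name)."""
    fields = list(argv[:5])
    # Pad missing trailing values (for example an omitted hostname) with "".
    fields += [""] * (5 - len(fields))
    return ObserverArgs(*fields)


# The two calls from the examples above:
ok_call = parse_observer_args(["SLASH", "ok", "P", "err_none"])
failed_call = parse_observer_args(["SLASH", "fail", "P", "err_connection", "czlin0231"])
```

In a real script the list would come from `sys.argv[1:]`, and a corrective action could then branch on `failed_call.error_type` and `failed_call.hostname`.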