Target Kafka (via Kafka Connect)

Dbvisit Replicate and the Dbvisit Replicate Connector for Kafka

Dbvisit Replicate supports streaming Oracle database change data (inserts, updates, deletes and limited DDL) to Kafka as a target through the Dbvisit Replicate Connector for Kafka, an open source Java connector initiated by the Dbvisit team. The connector runs within the Kafka Connect framework developed by Confluent, which can be thought of as an import/export layer for Kafka that simplifies both ingesting data into Kafka and delivering it from Kafka to various endpoints.

Overview

Dbvisit Replicate's MINE process generates PLOG files in the regular manner and delivers these to a location accessible by the Replicate Connector for Kafka. No partner APPLY process runs in this configuration, because the Replicate Connector effectively takes over that function. The connector itself runs within the Kafka Connect framework, and we recommend installing the Confluent Platform for this. Key resources are listed below:

Notes and Limitations

NOTE

  • Only limited DDL replication is currently supported for Kafka as a target (add tables, add column, remove column).
  • Two-way replication with Kafka is not currently supported (the same applies to other non-Oracle targets).
  • Tables should have SUPPLEMENTAL LOG DATA (ALL) COLUMNS enabled, so that all columns are written to the redo log, if LOAD is used as the instantiation method.

            SQL> ALTER TABLE <owner>.<table_name> ADD SUPPLEMENTAL LOG DATA (ALL) COLUMNS;

  • In Step 2 of the setup wizard, change the response to the following question from the default No to Yes:

         Should where clauses (and Event Streaming) include all columns, not just changed and PK? (Yes/No) [No] YES  

  • Edit config/*-onetime.ddc and uncomment the following line (formerly #set...):
         set REPLICATE_INTERNAL_TABLES NO 
  • See additional information on supported datatypes, including LOBs, at http://replicate-connector-for-kafka.readthedocs.io/en/latest/source_connector.html#lobs
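
Two of the notes above lend themselves to small scripts. The following is a minimal sketch and not part of the product: gen_suplog_sql and uncomment_internal_tables are hypothetical helper names, and the second one assumes the commented line in the onetime.ddc file starts with "#" followed by "set REPLICATE_INTERNAL_TABLES NO".

```shell
# Print one ALTER TABLE statement per OWNER.TABLE argument, using the
# statement shown in the note above. Nothing is executed against the database.
gen_suplog_sql() {
  for tbl in "$@"; do
    case $tbl in
      *.*) printf 'ALTER TABLE %s ADD SUPPLEMENTAL LOG DATA (ALL) COLUMNS;\n' "$tbl" ;;
      *)   echo "skipping '$tbl': expected OWNER.TABLE" >&2 ;;
    esac
  done
}

# Uncomment "set REPLICATE_INTERNAL_TABLES NO" in the given DDC file.
# Assumption: the line is commented out with a leading "#".
# A copy of the original file is kept as <file>.bak.
uncomment_internal_tables() {
  ddc_file=$1
  [ -f "$ddc_file" ] || { echo "no such file: $ddc_file" >&2; return 1; }
  cp "$ddc_file" "$ddc_file.bak" || return 1
  sed 's/^[[:space:]]*#[[:space:]]*\(set[[:space:]]\{1,\}REPLICATE_INTERNAL_TABLES[[:space:]]\{1,\}NO\)/\1/' \
    "$ddc_file.bak" > "$ddc_file"
}
```

For example, gen_suplog_sql SOE.ORDERS SOE.CUSTOMERS prints two statements that can be reviewed before being run in SQL*Plus, and uncomment_internal_tables config/REPCON-TEST-onetime.ddc edits the file from the wizard example further down in place.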

Setup Outlined

The installation and configuration of Dbvisit Replicate, when supporting Kafka as a target, is essentially the same as for a regular implementation. The general prerequisites and approach outlined in the Installing and Upgrading section of the user guide therefore apply and should be reviewed. You can install Dbvisit Replicate directly on the Oracle database server itself, use the lightweight FETCHER as an alternative to minimise any impact on the database server, or look at NFS mount options. Whichever option is chosen, you need to ensure that the location where Dbvisit Replicate writes the PLOG files is accessible to the Replicate Connector for Kafka, which runs inside the Kafka Connect framework.
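
A quick way to confirm the accessibility requirement is to test the PLOG directory as the operating system user that runs Kafka Connect. This is a minimal sketch; check_plog_dir is a hypothetical helper name, and the directory you pass in depends on your own MINE_PLOG setting or NFS mount point.

```shell
# Report whether a PLOG directory exists and can be listed and read
# by the current user. Run it as the user that runs Kafka Connect.
check_plog_dir() {
  plog_dir=$1
  if [ ! -d "$plog_dir" ]; then
    echo "missing: $plog_dir" >&2
    return 1
  fi
  if [ ! -r "$plog_dir" ] || [ ! -x "$plog_dir" ]; then
    echo "not readable by $(id -un): $plog_dir" >&2
    return 1
  fi
  echo "ok: $plog_dir"
}
```

With the wizard example used later on this page, the call would be check_plog_dir /home/oracle/REPCON-TEST/mine.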

At a high level the setup process for the Dbvisit Replicate piece of this configuration is as follows:

  1. install the Dbvisit Replicate software (this only needs to be done on the source side, against which the MINE process will run)
  2. rename the executable to dbvpf
  3. configure the replication (the MINE process) using the setup wizard

The Process

Installing the software

Download the Dbvisit Replicate software from the Dbvisit website and follow the user guide instructions to install the software on your server/platform. 

NOTE

  • If the FETCHER component is used then the software must be installed on both the Source (database) and Mine servers.

Rename the dbvrep executable

To trigger the specific functionality that supports Kafka as a target, the dbvrep executable needs to be renamed to dbvpf. Navigate to your Dbvisit Replicate installation directory and, as a user with appropriate privileges, rename the executable as follows:

[root@dbvrep01 /usr/dbvisit/replicate] # mv dbvrep dbvpf
[root@dbvrep01 /usr/dbvisit/replicate] # ls -ltr
total 66340
-rw-r--r-- 1 root root 100518 Oct 27 05:42 README.txt
-rw-r--r-- 1 root root 120 Oct 27 05:42 online_user_guide_reference_dbvrep.txt
-rwxr-xr-x 1 root root 67806725 Oct 27 05:42 dbvpf
-rw-r--r-- 1 root root 15892 Oct 27 05:42 Dbvisit-MIB-SNMP.txt
[root@dbvrep01 /usr/dbvisit/replicate] #

NOTE

  • If the FETCHER component is used then the executables must be renamed to dbvpf on both the Source (database) and Mine servers.
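
Because the rename may have to be repeated on more than one server, it can help to make it safe to re-run. The following is a minimal sketch; rename_to_dbvpf is a hypothetical helper name, and the installation directory is passed in rather than assumed.

```shell
# Rename dbvrep to dbvpf in the given installation directory.
# Safe to run twice: if dbvpf is already there, nothing is changed.
rename_to_dbvpf() {
  install_dir=$1
  if [ -f "$install_dir/dbvpf" ]; then
    echo "dbvpf already present in $install_dir"
  elif [ -f "$install_dir/dbvrep" ]; then
    mv "$install_dir/dbvrep" "$install_dir/dbvpf" || return 1
    echo "renamed dbvrep to dbvpf in $install_dir"
  else
    echo "neither dbvrep nor dbvpf found in $install_dir" >&2
    return 1
  fi
}
```

For the installation shown above, the call would be rename_to_dbvpf /usr/dbvisit/replicate, run as a user with write access to that directory.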


Run the setup wizard

You can then invoke the Dbvisit Replicate console by calling the executable as follows, and launch the setup wizard to configure the replication:

[oracle@dbvrep01 REPCON]$ /usr/dbvisit/replicate/dbvpf
Initializing......done
Dbvisit PureFlow version 2.9.00
Copyright (C) Dbvisit Software Limited. All rights reserved.
No DDC file loaded.
Run "setup wizard" to start the configuration wizard or try "help" to see all commands available.
dbvpf> setup wizard
[Integration option enabled]
This wizard configures Dbvisit PureFlow.

The setup wizard creates configuration scripts, which need to be run after the wizard ends. No changes to the databases are made before that.

The progress is saved every time a list of databases, replications, etc. is shown. It will be re-read if wizard is restarted and the same DDC name and script path is selected.
Run the wizard now? [Yes]

Setup Wizard Options Available

We recommend you review both the Setup Wizard overview and the Setup Wizard Reference details prior to running the Setup Wizard itself.

The key point to be aware of when running the wizard to set up Kafka as a target (dbvpf/Replicate Connector for Kafka) is that you will be presented with fewer options, due to the nature of the target itself and because you will not be configuring an APPLY process. That piece of the replication is handled by the Replicate Connector for Kafka, running inside Kafka Connect, with its own set of configuration options and requirements.

As a starting point, accepting many of the default options offered by the Setup Wizard will be sufficient to get you going, but a few parameters are worth commenting on:

  • LOAD - selecting this option under "Step 2 - Replication pairs > What data instantiation script to create?" uses built-in functionality in Dbvisit Replicate to select all existing data from the replicated table(s) and deliver it to Kafka, to initialise or baseline these data sets in Kafka itself, as outlined in the connector documentation.
     
  • Replicated Tables - this is the key step of identifying those Oracle schemas and tables you want to have replicated through to Kafka - and each must be specifically included here.
     
  • Limited DDL - under "Step 2 - Replication pairs > Will limited DDL replication be enabled?" you can enable or disable the limited DDL options supported for delivery to Kafka. If this is set to Yes and you have included schema-level replication, then any new tables created under those schemas will also be replicated through to Kafka.
     
  • Where Clauses and ALL Columns - supplemental logging needs to be enabled for ALL columns with dbvpf for delivery to Kafka. Answering Yes to this question will set this accordingly in the configuration files.

Setup Wizard Example

[oracle@dbvrep01 REPCON]$ /usr/dbvisit/replicate/dbvpf
Initializing......done
Dbvisit PureFlow version 2.9.00
Copyright (C) Dbvisit Software Limited.  All rights reserved.
No DDC file loaded.
Run "setup wizard" to start the configuration wizard or try "help" to see all commands available.
dbvpf> setup wizard
[Integration option enabled]
This wizard configures Dbvisit PureFlow.

The setup wizard creates configuration scripts, which need to be run after the wizard ends. No changes to the databases are made before that.

The progress is saved every time a list of databases, replications, etc. is shown. It will be re-read if wizard is restarted and the same DDC name and
script path is selected.
Run the wizard now? [Yes] yes

Before starting the actual configuration, some basic information is needed. The DDC name and script path determines where all files created by the wizard go
(and where to reread them if wizard is rerun) and the license key determines which options are available for this configuration.
(DDC_NAME) - Please enter a name for this replication: [] REPCON-TEST
(LICENSE_KEY) - Please enter your license key: [I0JSW-898RU-TL6QF-BJLHR-0U7GN-M99I2-F2OP5]
(SETUP_SCRIPT_PATH) - Please enter a directory for location of configuration scripts on this machine: [/home/oracle/REPCON-TEST]

Network configuration files were detected on this system in these locations:
/u01/app/oracle/product/11.2.0/xe/network/admin
(TNS_ADMIN) - Please enter TNS configuration directory for this machine: [/u01/app/oracle/product/11.2.0/xe/network/admin]

Step 1 - Describe databases
========================================
The first step is to describe databases used in the replication. There are usually two of them (source and target).
Store SYSDBA and DBA passwords? Passwords only required during setup and initialization? (Yes/No) [Yes]
Let's configure the database, describing its type, connectivity, user names etc.
What type of database is this? (Oracle/MySQL/Google Cloud SQL/SQL Server/Oracle AWS RDS/CSV/Hadoop): [Oracle]
Please enter database TNS alias: [] XE
Please enter SYSDBA user name: [SYS]
Please enter password for this user: [change_on_install]
Please enter user with DBA role: [SYSTEM]
Please enter password for this user: [manager]

Connecting to database XE as SYSTEM to query list of tablespaces and to detect ASM (by looking whether any redo logs or archived logs are stored in ASM).
Enter the Dbvisit PureFlow owner (this user will be created by this script): [dbvrep]
Please enter password for this user: [dbvpasswd]

Permanent tablespaces detected on the database: DATA, USERS.
Please enter default permanent tablespace for this user: [DATA] USERS

Temporary tablespaces detected on the database: TEMP.
Please enter default temporary tablespace for this user: [TEMP]

Following databases are now configured:
1: Oracle XE, SYS/***, SYSTEM/***, dbvrep/***, USERS/TEMP, dbvrep/, ASM:No, TZ: +00:00
Enter the number of the database to modify it or "done": [done]

Step 2 - Replication pairs
========================================
The second step is to set source and targets for each replication pair.
Let's configure the replication pair, selecting source and target.
Following databases are described:
1: XE (Oracle)
2: Dbvisit PureFlow (dbvpf) (cannot be source, is not Oracle)
Select source database: [1]
Select target database: [2]
Will limited DDL replication be enabled? (Yes/No) [Yes]
Use fetcher to offload the mining to a different server? (Yes/No) [No]
Should where clauses (and Event Streaming) include all columns, not just changed and PK? (Yes/No) [Yes]
Would you like to encrypt the data across the network? (Yes/No) [No]
Would you like to compress the data across the network? (Yes/No) [No]
How long do you want to set the network timeouts. Recommended range between 60-300 seconds [60]
Lock and copy the data initially one-by-one or at a single SCN?
one-by-one : Lock tables one by one and capture SCN
single-scn : One SCN for all tables
ddl-only   : Only DDL script for target objects
resetlogs  : Use SCN from last resetlogs operation (standby activation, rman incomplete recovery)
no-lock    : Do not lock tables. Captures previous SCN of oldest active transaction. Requires pre-requisite running of pre-all.sh script
(one-by-one/single-scn/ddl-only/resetlogs/no-lock) [single-scn] no-lock

What data instantiation script to create?
scn_list       : A file with SCN for every replicated object is created (APPLY.txt)
load           : All replicated data is created and loaded automatically
none
(scn_list/load/none) [scn_list] load

Following replication pairs are now configured:
1: XE (Oracle) ==> Dbvisit PureFlow (dbvpf), DDL: Yes, fetcher: No, process suffix: (no suffix), compression: No, encryption: No, network timeout: 60,
prepare type: no-lock, data load: load_keep
Enter number of replication pair to modify it, or "add", or "done": [done]

Step 3 - Replicated tables
========================================
The third step is to choose the schemas and tables to be replicated. If the databases are reachable, the tables are checked for existence, datatype support,
etc., schemas are queried for tables. Note that all messages are merely hints/warnings and may be ignored if issues are rectified before the scripts are
actually executed.

Following tables are defined for replication pairs:
1: XE (Oracle) ==> Dbvisit PureFlow (dbvpf), DDL: Yes, suffix: (no suffix), prepare: no-lock
  No tables defined.
Enter number of replication pair to modify it, or "done": [1]

Please enter list of all individual tables to be replicated. Enter schema name(s) only to replicate all tables in that schema. Use comma or space to delimit
the entries.
Enter the tables and schemas: [] SOE,SCOTT
Selected schemas: SCOTT,SOE
Add more tables or schemas? (Yes/No) [No]

You can also specify some advanced options:
1. Exclude some tables from schema-level replication
2. Rename schemas or tables.
3. Specify filtering conditions.
4. (Tables only) Configure Event Streaming; this does not maintain a copy of the source table, but logs all operations as separate entries. This is useful
for ETL or as an audit trail. This usually requires adding of new columns (timestamps, old/new values etc.) to the target table.
Specify rename name, filter condition or audit for any of the specified schemas? (Yes/No) [No]
(PREPARE_SCHEMA_EXCEPTIONS) - Specify tables to exclude from PREPARE SCHEMA, if any: []

Following tables are defined for replication pairs:
1: XE (Oracle) ==> Dbvisit PureFlow (dbvpf), DDL: Yes, suffix: (no suffix), prepare: no-lock
  SCOTT(tables), SOE(tables)
Enter number of replication pair to modify it, or "done": [done]

Step 4 - Process configuration
========================================
The fourth step is to configure the replication processes for each replication.

Following processes are defined:
1: MINE on XE
  Not configured.
Enter number of process to modify it, or "done": [1]
Fully qualified name of the server for the process (usually co-located with the database, unless mine is offloaded using fetcher): [dbvrep01]
Server type (Windows/Linux/Unix): [Linux]
Enable email notifications about problems? (Yes/No) [No]
Enable SNMP traps/notifications about problems? (Yes/No) [No]
Directory with DDC file and default where to create log files etc. (recommended: same as global setting, if possible)? [/home/oracle/REPCON-TEST]

Following settings were pre-filled with defaults or your reloaded settings:
----------------------------------------
[MINE_REMOTE_INTERFACE]: Network remote interface: dbvrep01:7901
[MINE_DATABASE]: Database TNS: XE
[TNS_ADMIN]: tnsnames.ora path: /u01/app/oracle/product/11.2.0/xe/network/admin
[MINE_PLOG]: Filemask for generated plogs: /home/oracle/REPCON-TEST/mine/%S.%E.%Z (%S is sequence, %T thread, %F original filename (stripped extension), %P
process type, %N process name, %E default extension)
[LOG_FILE]: General log file: /home/oracle/REPCON-TEST/log/dbvpf_%N_%D.%E
[LOG_FILE_TRACE]: Error traces: /home/oracle/REPCON-TEST/log/trace/dbvpf_%N_%D_%I_%U.%E

Checking that these settings are valid...
Do you want to change any of the settings? [No]

Following processes are defined:
1: MINE on XE
  Host: dbvrep01, SMTP: No, SNMP: No
Enter number of process to modify it, or "done": [done]
Created file /home/oracle/REPCON-TEST/REPCON-TEST-MINE.ddc.
Created file /home/oracle/REPCON-TEST/config/REPCON-TEST-setup.dbvpf.
Created file /home/oracle/REPCON-TEST/config/REPCON-TEST-dbsetup_XE_dbvrep.sql.
Created file /home/oracle/REPCON-TEST/config/REPCON-TEST-grants_XE_dbvrep.sql.
Created file /home/oracle/REPCON-TEST/config/REPCON-TEST-pre-suplog_XE_dbvrep.sql.
Created file /home/oracle/REPCON-TEST/REPCON-TEST-pre-all.sh.
Created file /home/oracle/REPCON-TEST/config/REPCON-TEST-onetime.ddc.
Created file /home/oracle/REPCON-TEST/start-console.sh.
Created file /home/oracle/REPCON-TEST/REPCON-TEST-run-dbvrep01.sh.
Created file /home/oracle/REPCON-TEST/scripts/REPCON-TEST-dbvrep01-start-MINE.sh.
Created file /home/oracle/REPCON-TEST/scripts/REPCON-TEST-dbvrep01-stop-MINE.sh.
Created file /home/oracle/REPCON-TEST/scripts/REPCON-TEST-dbvrep01-dbvpf-MINE.sh.
Created file /home/oracle/REPCON-TEST/scripts/systemd-dbvpf-MINE_REPCON-TEST.service.
Created file /home/oracle/REPCON-TEST/scripts/upstart-dbvpf-MINE_REPCON-TEST.conf.
Created file /home/oracle/REPCON-TEST/Nextsteps.txt.
Created file /home/oracle/REPCON-TEST/REPCON-TEST-all.sh.
===========================================================================================================================================================

Dbvisit PureFlow wizard completed
Script /home/oracle/REPCON-TEST/REPCON-TEST-pre-all.sh created. This needs to be run when transactions are not active on source database either during
maintenance window or when there is no/low activity on source database.
Script /home/oracle/REPCON-TEST/REPCON-TEST-all.sh created. This runs all the above created scripts. Please exit out of dbvpf, review and run script as
current user to setup and start Dbvisit PureFlow.
===========================================================================================================================================================
Optionally, the script can be invoked now by this wizard.
Run this script now? (Yes/No) [No]
dbvpf> exit

Completing the Configuration Setup

If you chose not to run the scripts generated by the setup wizard (as in our example above), you need to run them manually. If they complete cleanly, you can then start the replication and connect to the console to review its progress. So the steps are:
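
The wizard output above names the files involved. As a rough, non-authoritative sketch, the hypothetical helper below checks that they exist and prints them in the order the wizard messages suggest: the pre-all.sh script first (required here because no-lock was selected), then the all.sh script, then the console. The ordering is an assumption drawn from those messages, and the generated Nextsteps.txt is assumed to describe how to start the MINE process, so read it before continuing. The helper executes nothing.

```shell
# Hypothetical helper: list the wizard-generated files in an assumed order.
# It only checks for their existence and prints them; nothing is run.
list_setup_steps() {
  setup_dir=$1   # SETUP_SCRIPT_PATH from the wizard
  ddc_name=$2    # DDC_NAME from the wizard
  status=0
  step=1
  for item in "$ddc_name-pre-all.sh" "$ddc_name-all.sh" "Nextsteps.txt" "start-console.sh"; do
    if [ -f "$setup_dir/$item" ]; then
      echo "$step. $setup_dir/$item"
    else
      echo "$step. MISSING: $setup_dir/$item" >&2
      status=1
    fi
    step=$((step + 1))
  done
  return "$status"
}
```

For the example above, the call would be list_setup_steps /home/oracle/REPCON-TEST REPCON-TEST. Review each script before running it as the current user, as the wizard advises.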

NOTE

The Replicate Connector for Kafka, a source connector which runs inside Kafka Connect, has its own set of configuration options and requirements; the details can be found in the connector documentation.

Monitoring

A number of options are available for monitoring Dbvisit Replicate once it is up and running; please follow the provided links for more information on each. Note that these notifications only report on the operations of the Dbvisit Replicate MINE process, which is not aware of the Dbvisit Replicate Connector for Kafka. The connector must be monitored separately.

  • the command console (which looks as follows):

    \ Dbvisit PureFlow 2.9.00(MAX edition) - Evaluation License expires in 60 days
    MINE is running. Currently at plog 1231 and SCN 27364050 (11/26/2016 08:53:32).
    Progress of replication REPCON:MINE->APPLY: total/this execution (stale)
    ------------------------------------------------------------------------------------------
    SOE.CUSTOMERS:                  Mine:105171/105171       Unrecov:0/0
    SOE.ADDRESSES:                  Mine:105359/105359       Unrecov:0/0
    SOE.CARD_DETAILS:               Mine:105232/105232       Unrecov:0/0
    SOE.WAREHOUSES:                 Mine:1000/1000           Unrecov:0/0
    SOE.ORDER_ITEMS:                Mine:722468/722468       Unrecov:0/0
    SOE.ORDERS:                     Mine:282339/282339       Unrecov:0/0
    SOE.INVENTORIES:                Mine:902640/902640       Unrecov:0/0
    SOE.PRODUCT_INFORMATION:        Mine:1000/1000           Unrecov:0/0
    SOE.LOGON:                      Mine:769848/769848       Unrecov:0/0
    SOE.PRODUCT_DESCRIPTIONS:       Mine:1000/1000           Unrecov:0/0
    SOE.ORDERENTRY_METADATA:        Mine:4/4                 Unrecov:0/0
    SOE.TEST1:                      Mine:19/19               Unrecov:0/0
    ------------------------------------------------------------------------------------------
    12 tables listed.
    dbvpf>

     
     

  • Email (for the MINE process)
  • SNMP (for the MINE process)
  • Console silent mode operations, outlined in this Dbvisit blog post.
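
If console output in the format shown above has been captured to a file or a pipe, the per-table counters can be checked with a small filter. This is a minimal sketch under that assumption; check_unrecov is a hypothetical helper name, and it relies only on the "Mine:total/this execution" and "Unrecov:total/this execution" layout visible in the sample.

```shell
# Read console output on stdin and print "<table> <total>" for every table
# whose Unrecov total (the first of the two numbers) is greater than zero.
# Returns 1 if any such table was found, 0 otherwise.
check_unrecov() {
  awk '
    /Mine:[0-9]+\/[0-9]+/ && /Unrecov:[0-9]+\/[0-9]+/ {
      table = $1
      sub(/:$/, "", table)                       # drop the trailing colon
      for (i = 2; i <= NF; i++) {
        if ($i ~ /^Unrecov:/) {
          split(substr($i, 9), parts, "/")       # text after "Unrecov:"
          if (parts[1] + 0 > 0) {
            print table, parts[1]
            bad = 1
          }
        }
      }
    }
    END { exit (bad ? 1 : 0) }
  '
}
```

A call such as check_unrecov < console.txt (where console.txt is a placeholder for your captured output) prints nothing and returns 0 for the sample above, since every table shows Unrecov:0/0.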

Operations

How do we regenerate a PLOG?

If PLOG files become corrupted, or are deleted accidentally before being ingested into Kafka, it is possible to recreate them. A PLOG can be regenerated by the MINE process using an internal engine command, provided that the related redo/archive logs are still available.

The process is:

  1. Stop the Replicate Connector for Kafka (if it is running) and then the MINE process
  2. In the console type (use the PLOG sequence instead of X):

    dbvpf> ENGINE MINE RESET TO PLOG X
  3. On the source or Mine server, move all the plogs >= sequence X that are in the "mine" directory to another location for backup purposes.  
  4. Restart the MINE process and the Replicate Connector for Kafka
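
Step 3 can be scripted. The following is a minimal sketch; backup_plogs_from is a hypothetical helper name, and it assumes PLOG file names begin with the numeric sequence followed by a dot, as the wizard's default MINE_PLOG filemask (%S.%E.%Z) suggests. Files without a numeric prefix are left alone.

```shell
# Move every file in mine_dir whose leading sequence number is >= min_seq
# into backup_dir. Assumption: names look like <sequence>.<extension>...
backup_plogs_from() {
  mine_dir=$1
  min_seq=$2
  backup_dir=$3
  mkdir -p "$backup_dir" || return 1
  for f in "$mine_dir"/*; do
    [ -f "$f" ] || continue
    name=${f##*/}           # file name without the directory
    plog_seq=${name%%.*}    # text before the first dot
    case $plog_seq in
      ''|*[!0-9]*) continue ;;   # not a purely numeric prefix: skip
    esac
    if [ "$plog_seq" -ge "$min_seq" ]; then
      mv "$f" "$backup_dir/" && echo "moved $name"
    fi
  done
}
```

With the example configuration on this page, a call might look like backup_plogs_from /home/oracle/REPCON-TEST/mine 1231 /home/oracle/plog_backup, where the backup directory is a placeholder of your choosing. Run it only while the MINE process and the connector are stopped, as described in step 1.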

NOTE

The LCR numbering for specific changes made by operations within transactions may differ between multiple iterations of generating the same PLOG(s). For records that have already been partially processed, and depending on how you use or reference LCRs in Kafka, it may be important to take this into account. This is expected to be addressed in a future release of Dbvisit Replicate.