Understanding Reasons For and Against Bi-Directional Replication

Problem Description

First and foremost, strongly consider the reasons you're looking to go fully bi-directional. I found through personal experience that it is difficult to think of all the things that could go wrong with replication (until they did). Hindsight is truly 20/20. I found that others have had similar experiences and have consolidated theirs and mine into one document.

I hope you find them helpful.

Requirement for Success: “If you do not have the application owners' 100% buy-in to resolve issues, including data issues, the project is not going to succeed.”

 Article

1 ) Sequences

If any sequences are used to drive a table's PK, you're going to have to consider alternate approaches. Alternatives include odd/even sequences for the two sites, using GUIDs, or a mod(n) approach, where n is the number of replication points of origin you have.
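As a rough illustration, here is a minimal sketch of the odd/even approach for two sites (the sequence name is hypothetical):

```sql
-- Site A hands out odd values, site B hands out even values,
-- so generated keys can never collide across the two origins.

-- On site A
CREATE SEQUENCE orders_seq START WITH 1 INCREMENT BY 2;   -- 1, 3, 5, ...

-- On site B
CREATE SEQUENCE orders_seq START WITH 2 INCREMENT BY 2;   -- 2, 4, 6, ...

-- The mod(n) generalisation: with n points of origin, give origin i
-- START WITH i and INCREMENT BY n so each origin owns its own value range.
```

The GUID alternative (for example SYS_GUID()) avoids coordinating sequences altogether, at the cost of larger, non-sequential keys.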

 

2 ) The source database should be in force logging mode, with enough supplemental logging to support the keys, so you're going to need space for a bit more redo.

By default, Dbvisit enables supplemental logging on the primary key. If keys are not present, supplemental log groups should be created to ensure all the columns are written to the redo (or, at a minimum, those that make the row unique).
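A minimal sketch of what this can look like in Oracle, assuming a hypothetical app.orders table with no usable key (the schema, table and column names are illustrative only):

```sql
-- Database-level settings
ALTER DATABASE FORCE LOGGING;
ALTER DATABASE ADD SUPPLEMENTAL LOG DATA;          -- minimal supplemental logging

-- Log only the columns that make a row unique...
ALTER TABLE app.orders
  ADD SUPPLEMENTAL LOG GROUP orders_slg (order_no, line_no) ALWAYS;

-- ...or, as a heavier fallback, log all columns
ALTER TABLE app.orders ADD SUPPLEMENTAL LOG DATA (ALL) COLUMNS;
```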

 

3 ) Every table should have a PK or UK. If it doesn't, add a surrogate one. Trying to uniquely identify rows with multiple columns will, at the very least, require lots of extra redo from supplemental logging (all columns).

Additionally, you may wish to determine whether keys can be added to such tables.
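For illustration, one possible sketch of adding a GUID-based surrogate key to a keyless table (the app.order_notes table is hypothetical; a GUID avoids the cross-site sequence clashes from point 1):

```sql
ALTER TABLE app.order_notes ADD (note_id RAW(16));
UPDATE app.order_notes SET note_id = SYS_GUID();   -- backfill: each existing row gets its own GUID
ALTER TABLE app.order_notes MODIFY (note_id DEFAULT SYS_GUID());
ALTER TABLE app.order_notes MODIFY (note_id NOT NULL);
ALTER TABLE app.order_notes ADD CONSTRAINT order_notes_pk PRIMARY KEY (note_id);
```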

 

Now for the hard ones…

4 ) Data consistency issues

Remember that you're replicating data between two databases, not replicating a database. As such, this poses a new set of problems. What happens when the data already exists in an environment (from a prior insert) and the database sees an insert operation? If there is a PK and/or unique constraint, it will fail. You will want to do regular comparisons of the data. It's not a question of if you'll run into inconsistencies; the real question is when, and how you'll resolve them.

Even with automated conflict handlers, you must validate with the business the correctness of the rule and the action it should take. Consider the conflict resolution rules carefully: when you hit a conflict that is not auto-resolved by a rule, all replication will stop until you resolve it, introducing significant replication delay. When you have conflicting records, there is no way you'll understand which record is 'correct' without the business' help. Consider how you will *prove* you have the same data in all DBs. What is your reconciliation process? Get buy-in from your app/business owner out of the gate that they will help resolve data consistency issues.
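As a starting point for a reconciliation process, here is a minimal sketch of a cross-site comparison (the app.accounts table and the site_b database link are assumptions made for the example):

```sql
-- Rows present on this site but missing or different on the other site
SELECT account_id, email, status FROM app.accounts
MINUS
SELECT account_id, email, status FROM app.accounts@site_b;

-- And the reverse direction
SELECT account_id, email, status FROM app.accounts@site_b
MINUS
SELECT account_id, email, status FROM app.accounts;
```

For very large tables, comparing hashes of the column values (for example with ORA_HASH) per key range is usually more practical than pulling full rows over the link.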

 

5 ) "In flight" transactions

Any replication methodology on earth will introduce some sort of delay. Simple physics puts the transmission delay between the US east coast and the US west coast at roughly 70 ms. What happens when people insert (or update) data in multiple data centers at the same time? Which record do you want to keep in that scenario?
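If the answer is something like "the newest change wins", that rule needs a timestamp it can trust. A minimal sketch of maintaining one (the app.accounts table and trigger name are hypothetical; the resolution rule itself is then configured in the replication tool):

```sql
ALTER TABLE app.accounts ADD (last_modified TIMESTAMP DEFAULT SYSTIMESTAMP NOT NULL);

CREATE OR REPLACE TRIGGER app.accounts_last_mod
BEFORE INSERT OR UPDATE ON app.accounts
FOR EACH ROW
BEGIN
  :NEW.last_modified := SYSTIMESTAMP;   -- every change records when it happened
END;
/
```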

 

6 ) Application and schema upgrades now require 10 times more effort

If you're running live out of both data centers, you have to ensure that your application is well decoupled from the database schema. Think of this scenario: your application uses all columns of a given table. It is running live in both data centers and knows the table structure underneath it. You upgrade the application in data center A and add a column to a table. What does the application in data center B think when it sees the new column? Conversely, what happens when the new column has a NOT NULL constraint and data replicates in from data center B with that column holding a null value?
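One common mitigation (a sketch only; the table and column names are made up) is to add the column with a default first and defer the NOT NULL constraint until both sites run the upgraded application:

```sql
-- Step 1: deploy with the upgrade in data center A.
-- Inserts replicated from the not-yet-upgraded site carry no value
-- for the new column, so the default fills it in.
ALTER TABLE app.orders ADD (sales_channel VARCHAR2(20) DEFAULT 'UNKNOWN');

-- Step 2: only after data center B is also upgraded, tighten the constraint.
ALTER TABLE app.orders MODIFY (sales_channel NOT NULL);
```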

 

7 ) Additionally, there are several issues you may want to consider as part of your implementation plan.

There is time needed to mine the transaction from the source database and then send it to the target database, where it is applied. This may range from a few seconds to minutes or hours, depending on the load on the server and database, the type of storage, network speed, etc. Under normal operating conditions it should be within a few minutes, which is fine until an application that normally runs in data center A 'fails over' to data center B; for example, the app servers are rebooted in data center A and connectivity is lost, so GSLB redirects the requests to data center B. The reality today is that client requests never stop; they are now asking for data they expect to exist when the app reconnects to data center B, and that database doesn't know about it yet (because of replication delay). You and the business will need to determine how these situations should be handled.

An example would be where a client creates an account to make a purchase. They create the account in the database in data center A. The application servers then go down in data center A, and when the user goes to make an additional purchase the application prompts them to create an account, so they create the account again and repurchase the two items. Now the application comes back up in data center A, and the original transaction of the initial account creation and purchase results in a conflict, because that account already exists in the database in data center B. Now you have a conflict, and replication stops unless you have an automated way of resolving it according to your business rules, or someone manually reviews the data in both databases at data center A and data center B.
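Purely as an illustration of spotting this kind of duplicate (using the same hypothetical app.accounts table and site_b link as above), a query like the following could flag accounts created independently on both sites:

```sql
SELECT a.email,
       a.account_id AS site_a_id,
       b.account_id AS site_b_id
FROM   app.accounts a
JOIN   app.accounts@site_b b ON b.email = a.email
WHERE  a.account_id <> b.account_id;   -- same person, two different accounts
```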

There is no way a DBA will know all the business rules for an application, which is why you need the assistance and backing of the business. From a technical perspective, each of these issues is easy to deal with. The problem is that behind every one of these technical issues is a people / political problem waiting to happen. A great item to research is the CAP theorem, also known as Brewer's theorem (Consistency, Availability, Partition tolerance). It will give you a fair understanding of the issues you'll be facing and the ways an application can be engineered to work around some of them.

 

Conflict resolution is a massive undertaking which cannot be left to chance, or put off until you are in production to figure out. It requires a significant amount of planning. Many meetings need to be held with business teams, DBAs and developers, discussing "so with this table... what happens when we have an update clash? What should we do? What value wins? Should we merge the data?" I read a comment on a thread with an interesting saying that summed it up pretty well: "Like Pokemon, you gotta catch 'em all, or you will end up in a whole world of pain: paused replication, incorrect resolutions causing data discrepancies, system functional issues and anxious senior management." These conflicts are impossible to resolve at a purely database level. This is why it's crucial to have complete buy-in from both the business and the application team, i.e. the 'Requirement for Success' above.