What is replica lag?

To understand replica lag, we first need a bit of context. Replica lag occurs on a special type of database instance called a read replica. A read replica is created from a source DB instance that acts as the primary database. Changes made to the primary database (inserts, updates, and deletes) are copied asynchronously to the read replica. Read replicas let you reduce the load on the primary DB instance by routing read queries to them. Replication between the instances takes place over a secure communication channel.

So what is replica lag, and how does it affect our instances? Replica lag is the delay between a change being written on the primary database and that change being applied on the read replica. To check the status of replication, look at three threads: the binlog dump thread on the primary, and the I/O thread (io_thread) and SQL thread (sql_thread) on the replica. Running SHOW MASTER STATUS on the primary and SHOW SLAVE STATUS on the replica, together with continuous monitoring, helps us identify any delay that could cause reads from the replica to return stale, inconsistent data.
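As a rough illustration of the status check described above, the following Python sketch classifies one row of SHOW SLAVE STATUS output. It assumes the row has already been fetched as a dictionary by whatever database client you use; the function name classify_replica and the 30-second threshold are hypothetical choices made for this example, not values from our setup. The column names Slave_IO_Running, Slave_SQL_Running, and Seconds_Behind_Master are the ones the command returns.

```python
from typing import Mapping, Optional


def classify_replica(
    status: Mapping[str, Optional[object]],
    max_lag_seconds: int = 30,  # illustrative threshold, tune for your workload
) -> str:
    """Classify one row of SHOW SLAVE STATUS output (hypothetical helper)."""
    io_ok = status.get("Slave_IO_Running") == "Yes"
    sql_ok = status.get("Slave_SQL_Running") == "Yes"
    if not (io_ok and sql_ok):
        # One of the two replica threads is stopped: replication is not running.
        return "broken"

    lag = status.get("Seconds_Behind_Master")
    if lag is None:
        # The server reports NULL when it cannot compute the lag.
        return "unknown"
    if int(lag) > max_lag_seconds:
        return "lagging"
    return "ok"
```

A monitoring job could call this function on a schedule and raise an alert whenever the result is anything other than "ok".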

We cannot eliminate replica lag, but we should reduce it as much as possible. At Geko Cloud, we usually look for configuration differences between the primary and replica instances, excessive write workload on the primary instance, long-running transactions, and incorrect parameter settings, and we review the version changelogs. These checks help us decide whether an upgrade or some other action should be carried out.
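One of these checks, comparing the configuration of the primary and the replica, can be sketched in a few lines of Python. The example assumes you have already collected the output of SHOW GLOBAL VARIABLES from each instance into a dictionary. The helper name diff_variables and the sample values are hypothetical and only serve the illustration.

```python
from typing import Dict, Mapping, Optional, Tuple


def diff_variables(
    primary: Mapping[str, str],
    replica: Mapping[str, str],
) -> Dict[str, Tuple[Optional[str], Optional[str]]]:
    """Return variables whose values differ, as name -> (primary, replica).

    A value of None means the variable is missing on that instance.
    """
    diffs: Dict[str, Tuple[Optional[str], Optional[str]]] = {}
    for name in sorted(set(primary) | set(replica)):
        p_value, r_value = primary.get(name), replica.get(name)
        if p_value != r_value:
            diffs[name] = (p_value, r_value)
    return diffs


# Made-up sample values, for illustration only.
primary_vars = {"sync_binlog": "1", "innodb_flush_log_at_trx_commit": "1"}
replica_vars = {"sync_binlog": "0", "innodb_flush_log_at_trx_commit": "1",
                "slave_parallel_threads": "4"}

for name, (p_value, r_value) in diff_variables(primary_vars, replica_vars).items():
    print(f"{name}: primary={p_value} replica={r_value}")
```

Not every difference is a problem, since some settings are intentionally different on a replica, so the output is a starting point for review rather than a list of errors.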

Replica lag example

This is what recently happened to a client. After upgrading the primary database and its MariaDB replicas to version 10.4.24, they began to notice the replica lag getting worse, as we can see in the graph.

Our team investigated the problem. After reviewing the changelogs from version 10.5.0 to 10.5.17, we noticed, among other things, that version 10.5.17 fixed the replication lag problems caused by the previous version. We also concluded that the best option in this case was to upgrade the replicas to version 10.5.17 and leave the primary database at version 10.4.24. To do this, we created two more read replicas and upgraded them to that version, so that we could verify that the replica lag had indeed dropped again, as can be seen in the graph.

These graphs were taken from two read replicas, one running MariaDB 10.4.24 and the other 10.5.17, over the same time slot to verify that the replica lag had actually decreased.
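The same comparison can be made numerically instead of by eye. The sketch below is a minimal Python example, assuming you have exported lag samples in seconds for both replicas over the same time slot. The function names summarize_lag and lag_improved are hypothetical, and the sample numbers are invented for illustration; they are not the client's measurements.

```python
from statistics import mean
from typing import Dict, Sequence


def summarize_lag(samples: Sequence[float]) -> Dict[str, float]:
    """Return the mean and maximum lag (in seconds) for one replica."""
    return {"mean": mean(samples), "max": max(samples)}


def lag_improved(before: Sequence[float], after: Sequence[float]) -> bool:
    """True if the mean lag dropped and the maximum lag did not increase."""
    b, a = summarize_lag(before), summarize_lag(after)
    return a["mean"] < b["mean"] and a["max"] <= b["max"]


# Invented samples for the same time slot, in seconds (illustration only).
old_version_lag = [40, 55, 60]
new_version_lag = [1, 0, 2]

print(summarize_lag(old_version_lag))
print(summarize_lag(new_version_lag))
print(lag_improved(old_version_lag, new_version_lag))
```

Comparing both the mean and the maximum guards against a replica that looks better on average but still suffers occasional large spikes.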

 

We hope this gives you an idea of how the Geko Cloud team can help you if you need to deploy your services in a cloud environment. Just contact us and we will be happy to help you.
