Tips&Tricks - When was my data replicated in RA-GRS?

Today is Friday, so it's time for something quick and easy. When working with Azure Storage, you might wonder from time to time when your data was last replicated. The answer gives some insight into how Azure Storage works internally and what the drawbacks of this component are. Let's find the last synchronization date and consider for a moment what it really means for us.

Synchronization date and time

Before we start - for a synchronization timestamp to exist at all, asynchronous replication has to actually happen. This means that the feature simply won't work for a storage account in LRS mode. You may ask why - the answer is fairly simple. LRS replicates data only within a single datacenter and, even more importantly, it does so synchronously. There's no replication between regions, so there's no such thing as a last synchronization: either the data is saved and replicated, or the whole operation fails.

You might presume that the synchronization timestamp is available in the other three replication models - ZRS, GRS and RA-GRS - but surprisingly... it's not. This feature works only for RA-GRS accounts for one simple reason - this is the only mode that allows you to read data from the secondary location. Of course it has some limits (for example, you cannot declare a failover to another region yourself), but at least you'll be able to read the replicated data.

You can easily read the last synchronization date by going to the Azure Portal and opening e.g. Tables:

Initial status of Table Storage with RA-GRS mode
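
The same timestamp can also be read outside the portal. As far as I know, the Get Table Service Stats REST operation (a GET on the account's secondary endpoint with restype=service&comp=stats) returns a small XML document with the replication status and LastSyncTime. Below is a minimal Python sketch that parses such a response. The sample body and the helper name parse_service_stats are illustrative assumptions, not output from a real account:

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional, Tuple
import xml.etree.ElementTree as ET

# Illustrative response body in the documented shape; the values are made up.
SAMPLE_STATS_XML = """<StorageServiceStats>
  <GeoReplication>
    <Status>live</Status>
    <LastSyncTime>Fri, 06 Apr 2018 09:15:00 GMT</LastSyncTime>
  </GeoReplication>
</StorageServiceStats>"""


def parse_service_stats(xml_text: str) -> Tuple[str, Optional[datetime]]:
    """Return (status, last_sync_time in UTC) from a service stats body.

    last_sync_time is None when the element is missing or empty,
    which can happen while the secondary is not yet available.
    """
    root = ET.fromstring(xml_text)
    status = (root.findtext("GeoReplication/Status") or "").strip()
    raw_time = (root.findtext("GeoReplication/LastSyncTime") or "").strip()
    if not raw_time:
        return status, None
    # LastSyncTime is an RFC 1123 date such as "Fri, 06 Apr 2018 09:15:00 GMT".
    last_sync = parsedate_to_datetime(raw_time).astimezone(timezone.utc)
    return status, last_sync


if __name__ == "__main__":
    status, last_sync = parse_service_stats(SAMPLE_STATS_XML)
    print(status, last_sync)
```

In a real script you would fetch the body from the secondary endpoint with an authenticated request (or use the storage SDK) and pass it to the parser.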


Is my data replicated?

This is a serious question - if geo-replication happens asynchronously, will my data be copied to the secondary location intact? Well, it depends - there's an obvious lag between the primary and the secondary storage, and the answer depends directly on when the disaster happens. What is important here is that Azure Storage out of the box doesn't guarantee that each and every record or blob will be replicated in time. Of course this doesn't mean that half a day of data will be lost - we're talking about a few minutes - but for some systems losing even one record means being totally unreliable.
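
To make the gap more concrete: writes made to the primary before the last synchronization time are the ones you can count on finding in the secondary, while anything written later may or may not be there. A small Python sketch of that reasoning follows. The function names and the timestamps are hypothetical, picked only for illustration:

```python
from datetime import datetime, timedelta, timezone


def is_guaranteed_replicated(write_time: datetime, last_sync_time: datetime) -> bool:
    """True when the write happened before the last sync time.

    Writes after that moment may or may not have reached the secondary.
    """
    return write_time < last_sync_time


def exposure_window(now: datetime, last_sync_time: datetime) -> timedelta:
    """How much recent data could be missing if the primary failed at 'now'."""
    return max(now - last_sync_time, timedelta(0))


if __name__ == "__main__":
    # Made-up timestamps: last sync at 09:15 UTC, disaster at 09:20 UTC.
    last_sync = datetime(2018, 4, 6, 9, 15, tzinfo=timezone.utc)
    now = datetime(2018, 4, 6, 9, 20, tzinfo=timezone.utc)
    early_write = datetime(2018, 4, 6, 9, 10, tzinfo=timezone.utc)
    late_write = datetime(2018, 4, 6, 9, 18, tzinfo=timezone.utc)

    print(is_guaranteed_replicated(early_write, last_sync))
    print(is_guaranteed_replicated(late_write, last_sync))
    print(exposure_window(now, last_sync))
```

If the exposure window is larger than what your system can afford to lose, the built-in replication alone is probably not enough.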

What can you do to improve your guarantees and consistency in geo-replication? I strongly advise reading this article regarding possible outages in Azure Storage and their consequences. What you can certainly do is implement your own backup policy and support the built-in replication mechanism by performing synchronous writes to additional storage. Depending on your needs and expectations, using a different storage (like CosmosDB, which introduced Table Storage on steroids) could also be a viable solution.
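
One way to picture synchronous writes to additional storage is a thin wrapper that reports success only when both stores have accepted the record. This is a rough Python sketch, not a production pattern. InMemoryStore stands in for real storage clients, and every name here is hypothetical:

```python
from typing import Dict


class InMemoryStore:
    """Hypothetical stand-in for a real storage client."""

    def __init__(self) -> None:
        self.items: Dict[str, str] = {}

    def put(self, key: str, value: str) -> None:
        self.items[key] = value

    def delete(self, key: str) -> None:
        self.items.pop(key, None)


def write_to_both(primary: InMemoryStore, backup: InMemoryStore,
                  key: str, value: str) -> None:
    """Write to the primary, then to the backup, before returning.

    If the backup write fails, the primary write is undone and the error
    is re-raised, so the caller never sees a half-finished write.
    Simplification: the rollback assumes the key is new; an update would
    need the previous value restored instead of a delete.
    """
    primary.put(key, value)
    try:
        backup.put(key, value)
    except Exception:
        primary.delete(key)
        raise


if __name__ == "__main__":
    primary, backup = InMemoryStore(), InMemoryStore()
    write_to_both(primary, backup, "order-1", "paid")
    print(primary.items == backup.items)
```

The price is latency and availability: every write now waits for two stores, and it fails when either of them is down.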
