Tips & Tricks - When was my data replicated in RA-GRS?

Today is Friday, so it's time for something quick and easy. When working with Azure Storage, you might wonder from time to time when your data was last replicated. The answer gives some insight into how Azure Storage works internally and what the drawbacks of this component are. Let's find the last synchronization date and consider for a moment what it really means for us.

Synchronization date and time

Before we start - to be able to find the synchronization timestamp at all, replication actually has to happen. This means that for an LRS storage account this feature just won't work. You may ask why - the answer is fairly simple. LRS replicates data only within a single datacenter and, what is even more important, it does it synchronously. There's no replication between regions, so there's no such thing as a synchronization timestamp - either the data is saved and replicated, or the whole operation fails.

Presumably the synchronization timestamp should also be available in the other three replication models - ZRS, GRS and RA-GRS - but surprisingly... it's not. This feature works only for RA-GRS accounts, for one simple reason - it's the only mode which allows you to read data from the secondary location. Of course it has some limits (for example, you cannot trigger a failover to the other region yourself), but at least you're able to read the replicated data.

You can easily read the last synchronization date in the Azure Portal, e.g. by opening Tables:

Initial status of Table Storage with RA-GRS mode
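The same Last Sync Time can also be read programmatically. Below is a minimal sketch assuming the azure-storage-blob v12 Python SDK (the statistic is exposed per service, so I'm using the Blob service client here) and a connection string kept in an environment variable of my own choosing; the call only succeeds for RA-GRS accounts, which matches the limitation described above.

```python
import os

from azure.storage.blob import BlobServiceClient

# Assumption: the connection string of an RA-GRS account sits in this variable.
connection_string = os.environ["STORAGE_CONNECTION_STRING"]

client = BlobServiceClient.from_connection_string(connection_string)

# get_service_stats() reads replication statistics from the secondary endpoint,
# so it works only when read access to the secondary location (RA-GRS) is enabled.
stats = client.get_service_stats()
geo = stats["geo_replication"]

print(f"Replication status: {geo['status']}")
print(f"Last sync time:     {geo['last_sync_time']}")
```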

 

Is my data replicated?

This is a serious question - if geo-replication happens asynchronously, will my data be copied to the secondary location without loss? Well, it depends - there's an obvious gap between the primary and the secondary storage, and the answer depends directly on when a disaster happens. What is important here is that Azure Storage out-of-the-box doesn't guarantee that each and every record or blob will be replicated in time. Of course it doesn't mean that half a day of data will be lost - we're talking about a few minutes - but still, for some systems losing even one record means being totally unreliable.

What can you do to improve your guarantees and consistency in geo-replication? I strongly advise reading this article about possible outages in Azure Storage and their consequences. What you can certainly do is implement your own backup policy and support the built-in replication mechanism by performing synchronous writes to an additional storage. Depending on your needs and expectations, using a different storage (like Cosmos DB, which introduced Table Storage on steroids) could also be a viable solution.
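The "own backup policy" part can be as simple as a dual write: save the entity to a second table in another account and only report success when both writes succeed. A minimal sketch, assuming the azure-data-tables Python SDK - the environment variable names, the 'orders' table and the save_order helper are all invented for this example:

```python
import os

from azure.data.tables import TableClient

# Assumption: two separate storage accounts (ideally in different regions),
# both already containing an 'orders' table.
primary = TableClient.from_connection_string(
    os.environ["PRIMARY_STORAGE_CONNECTION_STRING"], table_name="orders")
backup = TableClient.from_connection_string(
    os.environ["BACKUP_STORAGE_CONNECTION_STRING"], table_name="orders")


def save_order(entity: dict) -> None:
    """Treat a write as successful only when both accounts accepted it."""
    primary.upsert_entity(entity)
    # If this call throws, you know immediately that the backup copy is missing -
    # unlike with geo-replication, where the gap stays invisible until a disaster.
    backup.upsert_entity(entity)


save_order({"PartitionKey": "orders-2018", "RowKey": "1", "Total": 99.0})
```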

Achieving consistency in Azure Table Storage #2

In the previous post I presented some easy ways of achieving consistency in Table Storage by storing data within one partition. This works in the majority of scenarios; there are, however, cases when you have to divide records (because of the design, to preserve scalability or to logically separate different concerns) and still ensure that you can perform operations within a transaction. You may be a bit concerned - after all, we just said that storing data within a single partition (at least from the transaction point of view) is required to be able to perform EGTs. Well - as always, there's a way to go beyond those limits and achieve what we're aiming for. Let's go!

Eventually consistent transactions

Just a note before we start - this pattern won't guarantee that a transaction is isolated. This means that a client will be able to read data while a transaction is being processed. Unfortunately, there's no easy way to completely lock tables while an inter-partition operation is being performed.

Let's get back to our eventual consistency. What does it mean? The answer is pretty simple - once a transaction is finished, our data can be considered consistent. All right - but how is this anything new? What's the difference compared to a transaction performed as an EGT?

In an EGT you perform at most 100 operations without any possibility of observing the ongoing process. In other words - you always see only the result of a transaction. With eventual consistency you can divide the process into separate steps:

  • get an entity from Partition 1
  • insert an entity into Partition 2
  • delete an entity from Partition 1

Of course you can have more than just 3 steps. The crux here is the clear division between the steps. Now consider other operations performed while such a transaction is in progress:

  • get an entity from Partition 1
  • get an entity from Partition 2
  • insert an entity into Partition 2
  • get an entity from Partition 1
  • delete an entity from Partition 1

The whole picture should now be clearer. With eventual consistency the two extra reads (the second and fourth steps above) stand for operations which clearly fall victim to read phenomena - they can observe an intermediate state of the transaction. Always consider the possible drawbacks of a solution like this and, if needed, use a different database which isolates transactions.
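To make the step division more tangible, here is a sketch of the "move an entity from Partition 1 to Partition 2" flow, again assuming the azure-data-tables Python SDK; the table name and the move_entity helper are invented for this example. Between the insert and the delete a concurrent reader will see the entity in both partitions - exactly the lack of isolation described above. Making these steps safe to re-run is covered in the worker section below.

```python
import os

from azure.data.tables import TableClient

table = TableClient.from_connection_string(
    os.environ["STORAGE_CONNECTION_STRING"], table_name="orders")


def move_entity(row_key: str, source_pk: str, target_pk: str) -> None:
    # Step 1: get the entity from the source partition.
    entity = table.get_entity(partition_key=source_pk, row_key=row_key)

    # Step 2: insert a copy into the target partition.
    entity["PartitionKey"] = target_pk
    table.create_entity(entity)

    # Between step 2 and step 3 any reader finds the entity in *both* partitions -
    # this is the window in which the data is only eventually consistent.

    # Step 3: delete the original from the source partition.
    table.delete_entity(partition_key=source_pk, row_key=row_key)
```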

Achieving eventual consistency

To achieve eventual consistency we have to introduce a few additional components into our architecture. We'll need at least two things:

  • a queue which holds the actions that should be performed within a transaction
  • worker roles which read messages from the queue and perform the actual transactions

Now let's talk about each new component in detail.

Queue

By using a queue we're able to easily orchestrate the operations which should be performed by worker roles. The easiest example is a project which archives records stored in Table Storage. Thanks to a queue we can post a message saying 'archive this record', which can be read by other components and processed. Finally, workers can post their own messages saying that an action has been finished.
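The producer side could look like the sketch below, assuming the azure-storage-queue Python SDK; the queue name, the message format and the connection string variable are my own invention for this example:

```python
import json
import os

from azure.core.exceptions import ResourceExistsError
from azure.storage.queue import QueueClient

# Assumption: 'archive-requests' is a queue name made up for this example.
queue = QueueClient.from_connection_string(
    os.environ["STORAGE_CONNECTION_STRING"], "archive-requests")

try:
    queue.create_queue()
except ResourceExistsError:
    pass  # the queue is already there

# Each message describes one logical transaction for a worker:
# "archive the record identified by this PartitionKey/RowKey".
queue.send_message(json.dumps({
    "action": "archive",
    "partitionKey": "orders-2018",
    "rowKey": "1",
}))
```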

Worker role

When we say 'workers' we mean simple services which perform some part of a flow. In the eventual consistency pattern they're responsible for handling the transaction logic. Coming back to the example from the previous point, a worker would be responsible for copying an entity from one table to another and then deleting the original. The important note here is idempotence - you have to ensure that you won't add more than one instance of an entity in case the flow is restarted. The same goes for deleting things - you should delete only if the entity still exists.
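A sketch of the consumer side follows, again assuming the azure-storage-queue and azure-data-tables Python SDKs; the 'orders' and 'ordersarchive' tables and the queue name are invented, and the hosting model (classic worker role, container, function) doesn't matter for the idea. The two idempotence rules from the paragraph above translate into an upsert on the target table and a tolerant delete on the source one, and the queue message is removed only after the whole flow succeeded, so a crashed worker simply re-runs the same steps once the message becomes visible again:

```python
import json
import os
import time

from azure.core.exceptions import ResourceNotFoundError
from azure.data.tables import TableClient, UpdateMode
from azure.storage.queue import QueueClient

conn = os.environ["STORAGE_CONNECTION_STRING"]
source = TableClient.from_connection_string(conn, table_name="orders")
archive = TableClient.from_connection_string(conn, table_name="ordersarchive")
queue = QueueClient.from_connection_string(conn, "archive-requests")

while True:
    for message in queue.receive_messages(visibility_timeout=60):
        request = json.loads(message.content)

        try:
            entity = source.get_entity(
                partition_key=request["partitionKey"], row_key=request["rowKey"])
            # Upsert instead of insert: re-running the flow must not create duplicates.
            archive.upsert_entity(entity, mode=UpdateMode.REPLACE)
        except ResourceNotFoundError:
            pass  # the source entity is already gone - a previous run got this far

        try:
            # Delete only if it still exists; a retried flow may have removed it already.
            source.delete_entity(
                partition_key=request["partitionKey"], row_key=request["rowKey"])
        except ResourceNotFoundError:
            pass

        # The transaction is complete - only now is the message taken off the queue.
        queue.delete_message(message)

    time.sleep(5)  # avoid hammering the queue when it's empty
```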

Considerations

You can apply this pattern not only to operations between different partitions - it also works when you're dealing with other components like blobs. It has some obvious drawbacks, like the lack of isolation or the additional moving parts (the queue and the workers) which have to be handled in your code. On the other hand it's a perfectly valid approach, especially in a table-plus-other-storage scenario.