Achieving consistency in Azure Table Storage #1

In the upcoming two posts I'll present two ways of achieving consistency in Azure Table Storage. I'm splitting this topic into two parts mostly because one has to understand how transactions work in Table Storage before we can go any further. Enough talking for now - let's dive in!

EGTs or Entity Group Transactions

This is something not so obvious initially and, to be honest, I wasn't aware of it when I started working with Table Storage. The reason is simple - in the documentation the two terms EGT and batch transaction are often used interchangeably, but in reality they are the same thing. I guess most people are familiar with batching in this Azure component, but for the sake of clarity, I'll quote a bit of information here.

Tables in Azure Storage allow you to perform batches, which will be executed as a single atomic operation. Consider the following example (taken from https://docs.microsoft.com/en-us/azure/cosmos-db/table-storage-how-to-use-dotnet):

// Retrieve the storage account from the connection string.
CloudStorageAccount storageAccount = CloudStorageAccount.Parse(
    CloudConfigurationManager.GetSetting("StorageConnectionString"));

// Create the table client.
CloudTableClient tableClient = storageAccount.CreateCloudTableClient();

// Create the CloudTable object that represents the "people" table.
CloudTable table = tableClient.GetTableReference("people");

// Create the batch operation.
TableBatchOperation batchOperation = new TableBatchOperation();

// Create a customer entity and add it to the table.
CustomerEntity customer1 = new CustomerEntity("Smith", "Jeff");
customer1.Email = "Jeff@contoso.com";
customer1.PhoneNumber = "425-555-0104";

// Create another customer entity and add it to the table.
CustomerEntity customer2 = new CustomerEntity("Smith", "Ben");
customer2.Email = "Ben@contoso.com";
customer2.PhoneNumber = "425-555-0102";

// Add both customer entities to the batch insert operation.
batchOperation.Insert(customer1);
batchOperation.Insert(customer2);

// Execute the batch operation.
table.ExecuteBatch(batchOperation);

There are however some restrictions:

  • all operations have to be executed within a single partition
  • you're limited to 100 operations per batch
  • the total payload of a batch cannot exceed 4 MB
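
To give you an idea of what the operation limit means in practice, here's a minimal sketch (reusing the CustomerEntity class and the table from the example above, and assuming customers is a list of entities sharing one partition key) that splits a larger set into chunks of 100. Keep in mind that each chunk is then atomic only on its own - this is not one big transaction:

// EGTs accept at most 100 operations, so larger sets have to be split;
// each ExecuteBatch call is atomic on its own, but the whole loop is not
for (var i = 0; i < customers.Count; i += 100)
{
    var batchOperation = new TableBatchOperation();

    foreach (var customer in customers.Skip(i).Take(100))
    {
        batchOperation.Insert(customer);
    }

    table.ExecuteBatch(batchOperation);
}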

Depending on how you designed your table and what you're actually building, those limits can be more or less problematic. Let's investigate some patterns which can be helpful here.

Denormalization

Since many developers come from the world of relational databases, normalizing tables in a database is their second nature. This is a great skill... and unfortunately it becomes a real pain in the arse when working with NoSQL storages. Let's say we have a many-to-one relation. Now, if on the "many" side we have more than 100 items which we'd like to change together with the related entity, we can lose consistency, since the number of operations we can perform at once is limited. In such a scenario it could be viable to store references to the items directly in the main entity, so we're still able to perform a transaction (of course as long as we are in a single-partition scenario).
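
A minimal sketch of this idea - the OrderEntity type, the JSON-serialized property and the itemRowKeys variable (via Newtonsoft.Json's JsonConvert) are just assumptions for illustration:

// Hypothetical entity for the "one" side of the relation - it keeps
// references to all of its items in a single property
public class OrderEntity : TableEntity
{
    public OrderEntity() { }

    public OrderEntity(string customerId, string orderId)
        : base(customerId, orderId) { }

    // JSON-serialized list of item row keys, e.g. ["item-1","item-2"]
    public string ItemReferences { get; set; }
}

// Updating the references is now a single operation, no matter how many
// items there are - the 100-operation limit no longer applies
var order = new OrderEntity("customer-1", "order-42")
{
    ItemReferences = JsonConvert.SerializeObject(itemRowKeys)
};
table.Execute(TableOperation.InsertOrReplace(order));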

Denormalization & performance

We can extend the previous example a bit and consider the following scenario - we'd like to improve performance by limiting the number of requests we have to perform when a query spans more than one table (e.g. employee + employee's last payslip). To do so we could duplicate data and store the second entity type as an extension of the first. To achieve consistency we'd have to ensure that both entity types live in the same table and partition (so we can update both of them in one transaction).
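
Here's a minimal sketch of that scenario, assuming both entity types are kept in one table and the partition key is the department (all names and values are illustrative):

// The employee row carries a duplicated copy of the latest payslip data;
// both entity types share a partition key, so one EGT keeps them in sync
var employee = new DynamicTableEntity("department-1", "employee-jsmith");
employee.Properties["LastPayslipAmount"] = new EntityProperty(4200.0);

var payslip = new DynamicTableEntity("department-1", "payslip-jsmith-2017-09");
payslip.Properties["Amount"] = new EntityProperty(4200.0);

var batchOperation = new TableBatchOperation();
batchOperation.InsertOrReplace(employee);
batchOperation.InsertOrReplace(payslip);
table.ExecuteBatch(batchOperation);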

Intra-partition secondary index pattern

This is a similar pattern to the inter-partition secondary index pattern, which I described previously. This one, however, lets you achieve consistency by using EGTs, since all data is stored in the same partition.
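
A minimal sketch of the pattern with hypothetical employee data: the same information is stored twice within one partition - once under the employee's ID and once under the e-mail address, which acts as the secondary index:

// Primary row and "index" row share a partition, so inserting both in
// one EGT guarantees the index can never diverge from the data
var byId = new DynamicTableEntity("department-1", "employee-001");
byId.Properties["Email"] = new EntityProperty("jeff@contoso.com");

var byEmail = new DynamicTableEntity("department-1", "email-jeff@contoso.com");
byEmail.Properties["EmployeeId"] = new EntityProperty("employee-001");

var batchOperation = new TableBatchOperation();
batchOperation.Insert(byId);
batchOperation.Insert(byEmail);
table.ExecuteBatch(batchOperation);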

Considerations

When considering consistency in Table Storage and storing data in the same partition (or at least duplicating it by creating secondary indexes), you have to think about your scalability targets. As you may know, minimizing the number of partitions can affect how your solution scales in the future, because partitions are the main unit used to load balance requests. As always, it all depends on the characteristics of your application and which features of this storage you're interested in the most.

What's next?

In the next post we'll focus on inter-partition transactions and what can be done in that area. Stay tuned!

It's so easy - back up build and release definitions from VSTS using Azure Functions

When working with build and release definitions in VSTS, we're blessed with the possibility to check in the audit logs what was changed, when and by whom. This - together with a proper permissions setup - allows for proper access management and an easy rollback if something was misused. Unfortunately, VSTS lacks an easy way to export those definitions so we can back them up or version them in our repository. In this post I'll show you a quick way to schedule daily backups using Azure Functions.

Prerequisites

To perform the actions from this post you'll need Visual Studio 2017 15.3 with the Azure Functions SDK installed. Since those tools are no longer in preview, I no longer use CSX to create examples and proofs of concept. I strongly advise you to update to the latest VS version so you can get the most out of the new SDK.

What is more, you'll also need a personal access token (PAT) for VSTS. Please read this article if you haven't already, for an idea of how to get one.

Creating functions

To be able to schedule our backups, we'll need two functions. Both will be triggered by a timer and both will upload a blob to a Blob Storage container. Here's the infrastructure we need:

ARM template visualization created by ARMata

As you can see, this is the basic infrastructure needed to use Functions, and it can be easily set up in the Azure Portal. Once we have the required components provisioned, we can prepare the code which will create the backups.

In VS, when you go to the Create project wizard, you'll see a window with the available templates. Go to the Cloud tab and you should see the Azure Functions template ready to be used:

Once the project is created, right-click on it, go to the Add menu and select New item:

From the available positions select Azure Function and click Add. You'll see plenty of different function templates, from which we have to choose Timer trigger. Change the schedule to 0 0 0 */1 * * so it will be triggered once a day, and click OK.
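
If the expression looks cryptic: Functions timer triggers use the NCRONTAB format with six fields (including seconds), so the schedule above reads as follows:

// {second} {minute} {hour} {day} {month} {day-of-week}
//     0        0       0    */1     *         *
// -> fires at midnight (UTC by default) every day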

Creating a backup

To create the backups we'll once more use the VSTS REST API. Here are the endpoints we'll use:

  • https://{instance}.visualstudio.com/DefaultCollection/{project}/_apis/build/definitions - for build definitions
  • https://{instance}.vsrm.visualstudio.com/{project}/_apis/Release/definitions - for release definitions

They return JSON definitions, which can be easily stored and versioned. The actual code for creating a backup of build definitions looks like this:

using System;
using System.IO;
using System.Net.Http;
using System.Net.Http.Headers;
using System.Text;
using System.Threading.Tasks;
using Microsoft.Azure.WebJobs;
using Microsoft.Azure.WebJobs.Host;
using Newtonsoft.Json.Linq;

public static class BuildBackup
{
	// Placeholders - use your own PAT, VSTS account name and project name here
	private const string Personalaccesstoken = "PAT";
	private const string Instance = "instance";
	private const string Project = "project";

	[FunctionName("BackupBuild")]
	public static async Task Run([TimerTrigger("0 0 0 */1 * *")]TimerInfo myTimer, [Blob("devops/build.json", FileAccess.Write)] Stream output, TraceWriter log)
	{
		try
		{
			using (var client = new HttpClient())
			{
				client.DefaultRequestHeaders.Accept.Add(
					new MediaTypeWithQualityHeaderValue("application/json"));

				// VSTS accepts a PAT as the password part of basic authentication
				client.DefaultRequestHeaders.Authorization = new AuthenticationHeaderValue("Basic",
					Convert.ToBase64String(
						Encoding.ASCII.GetBytes(
							string.Format("{0}:{1}", "", Personalaccesstoken))));

				// Fetch the list of build definitions first...
				using (var response = await client.GetAsync(
					$"https://{Instance}.visualstudio.com/DefaultCollection/{Project}/_apis/build/definitions?api-version=2.0")
				)
				{
					response.EnsureSuccessStatusCode();
					var data = await response.Content.ReadAsAsync<JObject>();
					foreach (var definition in data.SelectToken("$.value"))
					{
						// ...then fetch the full JSON of each definition and append it to the blob
						var id = definition.SelectToken("$.id");
						using (var build = await client.GetAsync(
							$"https://{Instance}.visualstudio.com/DefaultCollection/{Project}/_apis/build/definitions/{id}?api-version=2.0")
						)
						{
							build.EnsureSuccessStatusCode();
							var buildData = await build.Content.ReadAsStringAsync();
							var bytes = Encoding.UTF8.GetBytes(buildData);
							await output.WriteAsync(bytes, 0, bytes.Length);
						}
					}
				}
			}
		}
		catch (Exception ex)
		{
			log.Info(ex.ToString());
		}
	}
}
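
A side note before we move on: the Blob output binding above doesn't specify a connection explicitly, so it falls back to the storage account from the AzureWebJobsStorage setting. When testing locally, you can point it at the storage emulator in local.settings.json:

{
  "IsEncrypted": false,
  "Values": {
    "AzureWebJobsStorage": "UseDevelopmentStorage=true"
  }
}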

To create a backup of release definitions you can use the following function:

// (the same using directives as in BuildBackup apply here)
public static class ReleaseBackup
{
	// Placeholders - use your own PAT, VSTS account name and project name here
	private const string Personalaccesstoken = "PAT";
	private const string Instance = "instance";
	private const string Project = "project";

	[FunctionName("BackupRelease")]
	public static async Task Run([TimerTrigger("0 0 0 */1 * *")]TimerInfo myTimer, [Blob("devops/release.json", FileAccess.Write)] Stream output, TraceWriter log)
	{
		try
		{
			using (var client = new HttpClient())
			{
				client.DefaultRequestHeaders.Accept.Add(
					new MediaTypeWithQualityHeaderValue("application/json"));

				client.DefaultRequestHeaders.Authorization = new AuthenticationHeaderValue("Basic",
					Convert.ToBase64String(
						Encoding.ASCII.GetBytes(
							string.Format("{0}:{1}", "", Personalaccesstoken))));

				// Note the vsrm subdomain - Release Management has its own REST endpoint
				using (var response = await client.GetAsync(
					$"https://{Instance}.vsrm.visualstudio.com/{Project}/_apis/Release/definitions")
				)
				{
					response.EnsureSuccessStatusCode();
					var data = await response.Content.ReadAsAsync<JObject>();
					foreach (var definition in data.SelectToken("$.value"))
					{
						var id = definition.SelectToken("$.id");
						using (var release = await client.GetAsync(
							$"https://{Instance}.vsrm.visualstudio.com/{Project}/_apis/Release/definitions/{id}")
						)
						{
							release.EnsureSuccessStatusCode();
							var releaseData = await release.Content.ReadAsStringAsync();
							var bytes = Encoding.UTF8.GetBytes(releaseData);
							await output.WriteAsync(bytes, 0, bytes.Length);
						}
					}
				}
			}
		}
		catch (Exception ex)
		{
			log.Info(ex.ToString());
		}
	}
}

Some details:

  • I used a blob container named devops - of course you can use any name you like
  • Unfortunately there's no way to combine those two functions into one (as long as you'd like to use a different blob for holding build and release definitions)
  • You can easily version those JSON definitions by - instead of storing them in Blob Storage - calling the VSTS REST API for a repository and pushing the files there; a rough sketch of such a call follows below
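
To give you an idea of the last point, below is a rough outline of the VSTS Git "pushes" endpoint, which can create a commit with the definition file. Treat the exact api-version and the oldObjectId (the current tip of the branch, which you have to fetch first) as details to verify against the REST API reference:

POST https://{instance}.visualstudio.com/DefaultCollection/{project}/_apis/git/repositories/{repository}/pushes?api-version=2.0-preview

{
  "refUpdates": [
    { "name": "refs/heads/master", "oldObjectId": "<current tip of the branch>" }
  ],
  "commits": [
    {
      "comment": "Daily backup of build definitions",
      "changes": [
        {
          "changeType": "edit",
          "item": { "path": "/build.json" },
          "newContent": { "content": "<JSON definition goes here>", "contentType": "rawtext" }
        }
      ]
    }
  ]
}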