Migrating schema and data in Azure Table Storage

Recently I faced a problem where I had to change and adjust the schema of tables stored in Azure Table Storage. The challenge was to automate the changes so I didn't have to perform them manually on each environment. This is why I created a simple library called AzureTableStorageMigrator, which helps with such tasks and eases the whole process.

The basics

The basic idea was to create two things:

  • a simple fluent API, which will take care of chaining all tasks
  • a table which will hold all migration metadata

The current version (1.0) gives you the following possibilities:

  • void Insert<T>(string tableName, T entity, bool createIfNotExists = false)
  • void DeleteTable(string tableName)
  • void CreateTable(string tableName)
  • void RenameTable<T>(string originTable, string destinationTable)
  • void Delete<T>(string tableName, T entity)
  • void Delete(string tableName, string partitionKey)
  • void Delete(string tableName, string partitionKey, string rowKey)
  • void Clear(string tableName)

and when you take a look at an example of its usage:

var migrator = new Migrator();
migrator.CreateMigration(_ =>
{
  _.CreateTable("table1");
  _.CreateTable("table2");
  _.Insert("table1", new DummyEntity { PartitionKey = "pk", RowKey = DateTime.UtcNow.Ticks.ToString(), Name = "foo"});
  _.Insert("table1", new DummyEntity { PartitionKey = "pk", RowKey = DateTime.UtcNow.Ticks.ToString(), Name = "foo2"});
  _.Insert("table2", new DummyEntity { PartitionKey = "pk", RowKey = DateTime.UtcNow.Ticks.ToString(), Name = "foo"});
}, 1, "1.1", "My first migration!");

you'll see that it's pretty straightforward and self-describing.

The way it works is very simple - each CreateMigration() call is described using three values: its id, a version number, and a description. Each time this method is called, it adds a new record to the versionData table to make sure that the metadata is saved and the same migration won't be run twice.
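To give a rough idea of what such a record could look like, here's a minimal sketch of a metadata entity - the class and property names below are illustrative only, the actual entity stored by ATSM in the versionData table may differ:

using Microsoft.WindowsAzure.Storage.Table;

// Illustrative only - the real schema used by AzureTableStorageMigrator may differ.
public class MigrationMetadataEntity : TableEntity
{
  public MigrationMetadataEntity() { } // parameterless constructor required by the Table Storage SDK

  public MigrationMetadataEntity(int id, string version, string description)
  {
    PartitionKey = "migrations";  // hypothetical partitioning scheme
    RowKey = id.ToString();       // the id passed to CreateMigration()
    Version = version;            // e.g. "1.1"
    Description = description;    // e.g. "My first migration!"
  }

  public string Version { get; set; }
  public string Description { get; set; }
}

With a record like this in place, a migration whose id is already present in versionData can simply be skipped, which is presumably how the library guarantees that the same migration won't run twice.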

Why should I use it?

In fact, it's not a matter of what you "should" do but rather what is "good" for your project. Versioning is generally a good idea, especially if you follow a CI/CD pattern, where the goal is to deploy and roll back with ease. If you perform migrations by hand, you'll eventually face a situation where a rollback is either very time-consuming or almost impossible. 

It's good to remember that making your database a part of your repository (in terms of storing the schema, of course, not the data) is considered a good practice and is one of the main building blocks of many modern projects.

What's next?

I published ATSM because I couldn't find a similar tool that would help me version tables in Table Storage easily. Some new features will surely be added in the future; in the meantime, if you find this project interesting, feel free to post an issue or a feature request - I'll be more than happy to discuss it.

Limiting data being logged using Application Insights in Azure Functions

As you may know, Azure Functions has a preview of Application Insights integration available. This is another great addition to our serverless architecture since we don't have to add this dependency on our own - it's just there. However, there are some problems when it comes to handling the amount of data that is collected, especially when you're on an MSDN subscription.

Problem

When you enable Application Insights for your Function App, each function will start collecting different metrics (traces, errors, requests) at different scales. When you go to the Azure Portal and open the Data volume management tab in the Application Insights blade, you'll see that there's one metric which really exceeds expectations (at least when it comes to the volume of data traced):

As you can see, Message data takes 75% of the total amount of data collected

When you click on any bar, you'll access the Data point volume tab - now we can understand what kind of 'message' is really being logged:

Although the chart says Message, the data type related to this particular kind of message is Trace

Configuring AI integration

Logging traces is perfectly fine, but we don't always want to do so (especially if you're on an MSDN subscription and don't want to be blocked). If you go to this page, you'll find detailed information on both enabling and working with Application Insights. The most interesting part for us is the configuration section:

{
  "logger": {
    "categoryFilter": {
      "defaultLevel": "Information",
      "categoryLevels": {
        "Host.Results": "Error",
        "Function": "Error",
        "Host.Aggregator": "Information"
      }
    },
    "aggregator": {
      "batchSize": 1000,
      "flushTimeout": "00:00:30"
    }
  },
  "applicationInsights": {
    "sampling": {
      "isEnabled": true,
      "maxTelemetryItemsPerSecond" : 5
    }
  }
}

As you can see, we're able to set a different level for each category of data being logged. According to the comments in this issue on GitHub, the easiest way to limit the data being logged is to set your configuration to the following:

{
  "logger": {
    "categoryFilter": {
      "defaultLevel": "Error",
      "categoryLevels": {
        "Host.Aggregator": "Information"
      }
    }
  }
}

This way you should be able to avoid logging too much data or hitting your daily cap for Application Insights.
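To see what that filter actually affects in your own code, here's a minimal sketch of a function using the ILogger-based logging model (the function name and timer trigger below are just an example):

using System;
using Microsoft.Azure.WebJobs;
using Microsoft.Extensions.Logging;

public static class LoggingSample
{
  [FunctionName("LoggingSample")]
  public static void Run([TimerTrigger("0 */5 * * * *")] TimerInfo timer, ILogger log)
  {
    // With defaultLevel set to "Error", only this entry reaches Application Insights...
    log.LogError("Something went wrong");

    // ...while information-level traces like this one are filtered out,
    // which is exactly what keeps the Message/Trace volume down.
    log.LogInformation("Function executed at {time}", DateTime.UtcNow);
  }
}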

I strongly recommend playing with the AI integration in Azure Functions and providing feedback regarding possible features or enhancements. It's a great way to collaborate with the team responsible for the product and a chance to make it even better.