Analyze Table Storage queries locally

In general Table Storage has this awesome feature to check what kind of heresy we're creating by turning on diagnostics logs in the portal. It allows us to analyze queries and operations, so it's possible to find potentially problematic places and tweak them with ease. This functionality has one major drawback - we have to actually deploy our application and run it in the cloud. 

For most of the time you'll be developing using Table Storage Emulator. It'd great to check all our queries right after developing a feature. What options do we have? In fact Emulator has its own logs - they won't be as good and smooth as with full diagnostics, but for most scenarios they should be sufficient.

In the portal we have plenty of options we can use to analyze Table Storage performance

Enabling logging locally

To enable logging, you have to rutn this feature on in the Emulator config. You can find it located here:

C:\Users\{USER}\AppData\Local\AzureStorageEmulator

My config looks like this:

/
<?xml version="1.0"?>
<StorageEmulator xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <SQLInstance>(localdb)\MSSQLLocalDB</SQLInstance>
  <PageBlobRoot>C:\Users\kamz\AppData\Local\AzureStorageEmulator\PageBlobRoot</PageBlobRoot>
  <BlockBlobRoot>C:\Users\kamz\AppData\Local\AzureStorageEmulator\BlockBlobRoot</BlockBlobRoot>
  <LogPath>C:\Users\kamz\AppData\Local\AzureStorageEmulator\Logs</LogPath>
  <LoggingEnabled>false</LoggingEnabled>
</StorageEmulator>

To turn logging on you have to change LoggingEnabled property to true. If you have Emulator running, you'll have to restart it.

Reading logs

Once logging is enabled you'll see, that in the following location:

C:\Users\{USER}\AppData\Local\AzureStorageEmulator\Logs

Some files are starting to appear. There're logs with different levels of severity + some performance related information. If everything's running smoothly most of those files won't be helpful for you. In my scenario I find Verbose_debug file most interesting. 

This file contains information about requests which are handled by our Emulator. Consider following log:

/
16.02.2018 22:13:16 [Verbose_debug] [ActivityId=c7a80f0d-4c22-4cdf-bfc2-89b983d5a363] Got request GET http://127.0.0.1:10002/devstoreaccount1/Tables('User') 2015-12-11
16.02.2018 22:13:16 [Verbose_debug] [ActivityId=c7a80f0d-4c22-4cdf-bfc2-89b983d5a363] Host-style URI resolution failed. Allowing path-style for uri http://127.0.0.1:10002/devstoreaccount1/Tables('User')
16.02.2018 22:13:16 [Verbose_debug] [ActivityId=c7a80f0d-4c22-4cdf-bfc2-89b983d5a363] StorageAccount.GetPropertiesImpl(PropertyNames=0x0020, ServiceMetadataPropertyNames=0x000000007807C000,,00:00:30)
16.02.2018 22:13:16 [Verbose_debug] [ActivityId=c7a80f0d-4c22-4cdf-bfc2-89b983d5a363] Setting status code to 200
16.02.2018 22:13:16 [Verbose_debug] [ActivityId=c7a80f0d-4c22-4cdf-bfc2-89b983d5a363] In the process end callback
16.02.2018 22:13:16 [Verbose_debug] [ActivityId=c7a80f0d-4c22-4cdf-bfc2-89b983d5a363] Entering BasicHttpRestProcessor.EndProcess
16.02.2018 22:13:16 [Verbose_debug] [ActivityId=c7a80f0d-4c22-4cdf-bfc2-89b983d5a363] Leaving concurrent request: 0
16.02.2018 22:13:16 [Verbose_debug] [ActivityId=2131ae3e-cee7-4bbd-a2b9-2ab795dfb60e] Entering concurrent request: 1
16.02.2018 22:13:16 [Verbose_debug] [ActivityId=2131ae3e-cee7-4bbd-a2b9-2ab795dfb60e] Got request GET http://127.0.0.1:10002/devstoreaccount1/Tables('UserXCompany') 2015-12-11
16.02.2018 22:13:16 [Verbose_debug] [ActivityId=2131ae3e-cee7-4bbd-a2b9-2ab795dfb60e] Host-style URI resolution failed. Allowing path-style for uri http://127.0.0.1:10002/devstoreaccount1/Tables('UserXCompany')
16.02.2018 22:13:16 [Verbose_debug] [ActivityId=2131ae3e-cee7-4bbd-a2b9-2ab795dfb60e] StorageAccount.GetPropertiesImpl(PropertyNames=0x0020, ServiceMetadataPropertyNames=0x000000007807C000,,00:00:30)
16.02.2018 22:13:16 [Verbose_debug] [ActivityId=2131ae3e-cee7-4bbd-a2b9-2ab795dfb60e] Setting status code to 200
16.02.2018 22:13:16 [Verbose_debug] [ActivityId=2131ae3e-cee7-4bbd-a2b9-2ab795dfb60e] In the process end callback
16.02.2018 22:13:16 [Verbose_debug] [ActivityId=2131ae3e-cee7-4bbd-a2b9-2ab795dfb60e] Entering BasicHttpRestProcessor.EndProcess
16.02.2018 22:13:16 [Verbose_debug] [ActivityId=2131ae3e-cee7-4bbd-a2b9-2ab795dfb60e] Leaving concurrent request: 0
16.02.2018 22:13:16 [Verbose_debug] [ActivityId=93d3b264-e4b1-4233-950f-225c7756ee14] Entering concurrent request: 1
16.02.2018 22:13:16 [Verbose_debug] [ActivityId=93d3b264-e4b1-4233-950f-225c7756ee14] Got request GET http://127.0.0.1:10002/devstoreaccount1/Tables('Company') 2015-12-11
16.02.2018 22:13:16 [Verbose_debug] [ActivityId=93d3b264-e4b1-4233-950f-225c7756ee14] Host-style URI resolution failed. Allowing path-style for uri http://127.0.0.1:10002/devstoreaccount1/Tables('Company')
16.02.2018 22:13:16 [Verbose_debug] [ActivityId=93d3b264-e4b1-4233-950f-225c7756ee14] StorageAccount.GetPropertiesImpl(PropertyNames=0x0020, ServiceMetadataPropertyNames=0x000000007807C000,,00:00:30)
16.02.2018 22:13:16 [Verbose_debug] [ActivityId=93d3b264-e4b1-4233-950f-225c7756ee14] Setting status code to 200
16.02.2018 22:13:16 [Verbose_debug] [ActivityId=93d3b264-e4b1-4233-950f-225c7756ee14] In the process end callback
16.02.2018 22:13:16 [Verbose_debug] [ActivityId=93d3b264-e4b1-4233-950f-225c7756ee14] Entering BasicHttpRestProcessor.EndProcess
16.02.2018 22:13:16 [Verbose_debug] [ActivityId=93d3b264-e4b1-4233-950f-225c7756ee14] Leaving concurrent request: 0
16.02.2018 22:13:16 [Verbose_debug] [ActivityId=eba444dc-2752-4d92-a993-a96f4b8ab771] Entering concurrent request: 1
16.02.2018 22:13:16 [Verbose_debug] [ActivityId=eba444dc-2752-4d92-a993-a96f4b8ab771] Got request GET http://127.0.0.1:10002/devstoreaccount1/User?$top=20 2015-12-11
16.02.2018 22:13:16 [Verbose_debug] [ActivityId=eba444dc-2752-4d92-a993-a96f4b8ab771] Host-style URI resolution failed. Allowing path-style for uri http://127.0.0.1:10002/devstoreaccount1/User?$top=20
16.02.2018 22:13:16 [Verbose_debug] [ActivityId=eba444dc-2752-4d92-a993-a96f4b8ab771] StorageAccount.GetPropertiesImpl(PropertyNames=0x0020, ServiceMetadataPropertyNames=0x000000007807C000,,00:00:30)
16.02.2018 22:13:16 [Verbose_debug] [ActivityId=eba444dc-2752-4d92-a993-a96f4b8ab771] Setting status code to 200
16.02.2018 22:13:16 [Verbose_debug] [ActivityId=eba444dc-2752-4d92-a993-a96f4b8ab771] In the process end callback
16.02.2018 22:13:16 [Verbose_debug] [ActivityId=eba444dc-2752-4d92-a993-a96f4b8ab771] Entering BasicHttpRestProcessor.EndProcess
16.02.2018 22:13:16 [Verbose_debug] [ActivityId=eba444dc-2752-4d92-a993-a96f4b8ab771] Leaving concurrent request: 0
16.02.2018 22:13:17 [Verbose_debug] [ActivityId=e08f0361-1901-493f-b1cc-885d8a3fd6d8] Entering concurrent request: 1
16.02.2018 22:13:17 [Verbose_debug] [ActivityId=e08f0361-1901-493f-b1cc-885d8a3fd6d8] Got request GET http://127.0.0.1:10002/devstoreaccount1/UserXCompany?$filter=PartitionKey%20eq%20%27b55cf6d6-0012-480d-a792-9d4bb0813431%27 2015-12-11
16.02.2018 22:13:17 [Verbose_debug] [ActivityId=e08f0361-1901-493f-b1cc-885d8a3fd6d8] Host-style URI resolution failed. Allowing path-style for uri http://127.0.0.1:10002/devstoreaccount1/UserXCompany?$filter=PartitionKey%20eq%20%27b55cf6d6-0012-480d-a792-9d4bb0813431%27
16.02.2018 22:13:17 [Verbose_debug] [ActivityId=e08f0361-1901-493f-b1cc-885d8a3fd6d8] StorageAccount.GetPropertiesImpl(PropertyNames=0x0020, ServiceMetadataPropertyNames=0x000000007807C000,,00:00:30)
16.02.2018 22:13:17 [Verbose_debug] [ActivityId=e08f0361-1901-493f-b1cc-885d8a3fd6d8] Setting status code to 200
16.02.2018 22:13:17 [Verbose_debug] [ActivityId=e08f0361-1901-493f-b1cc-885d8a3fd6d8] In the process end callback
16.02.2018 22:13:17 [Verbose_debug] [ActivityId=e08f0361-1901-493f-b1cc-885d8a3fd6d8] Entering BasicHttpRestProcessor.EndProcess
16.02.2018 22:13:17 [Verbose_debug] [ActivityId=e08f0361-1901-493f-b1cc-885d8a3fd6d8] Leaving concurrent request: 0
16.02.2018 22:13:17 [Verbose_debug] [ActivityId=11f9e8e6-f345-4a65-8221-0be27c9edbaa] Entering concurrent request: 1
16.02.2018 22:13:17 [Verbose_debug] [ActivityId=11f9e8e6-f345-4a65-8221-0be27c9edbaa] Got request GET http://127.0.0.1:10002/devstoreaccount1/Company?$filter=RowKey%20eq%20%272de37ce9-b012-43f9-9c09-ba1107bddaaf%27&$top=1 2015-12-11
16.02.2018 22:13:17 [Verbose_debug] [ActivityId=11f9e8e6-f345-4a65-8221-0be27c9edbaa] Host-style URI resolution failed. Allowing path-style for uri http://127.0.0.1:10002/devstoreaccount1/Company?$filter=RowKey%20eq%20%272de37ce9-b012-43f9-9c09-ba1107bddaaf%27&$top=1
16.02.2018 22:13:17 [Verbose_debug] [ActivityId=11f9e8e6-f345-4a65-8221-0be27c9edbaa] StorageAccount.GetPropertiesImpl(PropertyNames=0x0020, ServiceMetadataPropertyNames=0x000000007807C000,,00:00:30)
16.02.2018 22:13:17 [Verbose_debug] [ActivityId=11f9e8e6-f345-4a65-8221-0be27c9edbaa] Setting status code to 200
16.02.2018 22:13:17 [Verbose_debug] [ActivityId=11f9e8e6-f345-4a65-8221-0be27c9edbaa] In the process end callback
16.02.2018 22:13:17 [Verbose_debug] [ActivityId=11f9e8e6-f345-4a65-8221-0be27c9edbaa] Entering BasicHttpRestProcessor.EndProcess
16.02.2018 22:13:17 [Verbose_debug] [ActivityId=11f9e8e6-f345-4a65-8221-0be27c9edbaa] Leaving concurrent request: 0

There're many information which we can get from it:

  • How many operations were handled during our queries
  • Which operations were handled concurrently
  • Which tables were queries
  • What queries were used to obtain information

Let's check how it related to the real function. Above log was generated for the following function:

/
[FunctionName("UserList")]
public static Task<HttpResponseMessage> UserList(
	[HttpTrigger(AuthorizationLevel.Anonymous, "get", Route = "user")] HttpRequestMessage req,
	[Table("User", "AzureWebJobsStorage")] IQueryable<UserEntity> users,
	[Table("UserXCompany", "AzureWebJobsStorage")] IQueryable<UserXCompanyEntity> usersXCompanies,
	[Table("Company", "AzureWebJobsStorage")] IQueryable<Company.CompanyEntity> companies,
	TraceWriter log)
{
	var usersQuery = users.Take(20).ToList().Select(_ => new
	{
		Login = _.Login,
		Name = _.Name,
		Surname = _.Surname,
		Company = FindUserCompanies(_, usersXCompanies, companies)
	});

	var response = req.CreateResponse(HttpStatusCode.OK,
		usersQuery);

	return Task.FromResult(response);
}

private static List<string> FindUserCompanies(UserEntity user, IQueryable<UserXCompanyEntity> usersXCompanies,
	IQueryable<Company.CompanyEntity> companies)
{
	var values = usersXCompanies.Where(_ => _.PartitionKey == user.RowKey).ToList();
	return values.Select(value => companies.Where(_ => _.RowKey == value.RowKey).First().Name).ToList();
}

I was expecting to see 3 queries(per each User loaded). Note that those queries were created dynamically thanks to using IQueryable<>:

  • GET 'User', TOP 20
  • GET 'UserXCompany', PartitionKey = User.RowKey
  • GET 'Company', TOP 1, RowKey = UserXCompany.RowKey

Did I get them?

16.02.2018 22:13:16 [Verbose_debug] [ActivityId=eba444dc-2752-4d92-a993-a96f4b8ab771] Got request GET http://127.0.0.1:10002/devstoreaccount1/User?$top=20 2015-12-11

16.02.2018 22:13:17 [Verbose_debug] [ActivityId=e08f0361-1901-493f-b1cc-885d8a3fd6d8] Got request GET http://127.0.0.1:10002/devstoreaccount1/UserXCompany?$filter=PartitionKey%20eq%20%27b55cf6d6-0012-480d-a792-9d4bb0813431%27 2015-12-11

16.02.2018 22:13:17 [Verbose_debug] [ActivityId=11f9e8e6-f345-4a65-8221-0be27c9edbaa] Got request GET http://127.0.0.1:10002/devstoreaccount1/Company?$filter=RowKey%20eq%20%272de37ce9-b012-43f9-9c09-ba1107bddaaf%27&$top=1 2015-12-11

What now?

Now when I'm able to see how it looks under the hood I'm both confident, that the code I wrote works as expected and I'm able to tweak queries if I must. Maybe I should store Company information with a User to save one query? Maybe I should limit UserXCompany rows fetched? There many possibilities but once you're ready for changes, things should be easier in the future.

 

 

Azure Table Storage good practices - log tail pattern

Although Azure Table Storage is pretty simple and straightforward solution, designing it so you can take the most from it is not an easy task. I've seen some projects, where invalid decisions regarding tables structure, wrong partition keys values or "who cares" row keys led to constant degradation of performance and raising cost of maintenance. Many of those problems could be avoided by introducing smart yet easy to introduce patterns, from which one I like the most is the log tail pattern.

The problem

Let's consider following table:

/
PK | RK | Timestamp | Event

As you may know, rows inserted into Azure Storage table are always stored in ascending order using a row key. This allow you to go from the start to the end of a segments pretty quickly and with predictable performance. If we assume, that we have following rows in our table:

/
PK | RK | Timestamp  | Event
foo   1   01/01/2000  { "Event": "Some_event"}
foo   2   01/01/2000  { "Event": "Some_event"}
foo   3   01/01/2000  { "Event": "Some_event"}
(...)
foo   9999   01/01/2000  { "Event": "Some_event"}
foo   10000   01/01/2000  { "Event": "Some_event"}

we can really quickly fetch a particular row because we can easily query it using PK and RK. We know that 1 will be before 10 and 100 will be stored before 6578. The problem happens when we cannot quickly determine which RK is bigger - mostly because we used e.g. a custom identifier like combination of a GUID and a timestamp. This forces us often to query large portions of a table just to find the most recent records. It'd possible to use a statement in our query like WHERE RowKey > $some_value, but it still introduces some overhead, which could be critical in some scenarios. How can we store our data in Table Storage and retrieve most recent records quickly and efficiently?

Log tail pattern

Fortunately the solution is really easy here and if decision is made early, it doesn't require much effort to introduce. The idea is simple - find a row key, which will "reorder" our table in a way, that the newest rows are also the first ones in a table. The concept seems to be tricky initially, but soon will be the first thing you think about when you hear "Azure Table Storage" ;)

Let's consider following solution(taken from here):

/
string invertedTicks = string.Format("{0:D19}", DateTime.MaxValue.Ticks - DateTime.UtcNow.Ticks);
DateTime dt = new DateTime(DateTime.MaxValue.Ticks - Int64.Parse(invertedTicks));

This will reverse the order how rows are stored in a table and allow you to quickly fetch those you're interested in the most. It's especially useful when creating all kinds of logs and appending to them, where you're usually interested mostly in the most recent records.

Summary

I rely heavily on this pattern in many projects I've created(both for my private use and commercial) and it really helps in creating efficient table structure, which can be easily queried and is optimized for a particular use. Of course it cannot be used in all scenarios, but for some it's a must have.

One of the things you have to consider here is that you must pad the reverse tick value with leading zeroes to ensure the string value sorts as expected(something that is fixed by string.Format() in the example). Without this fix you can end with incorrectly ordered rows. Nonetheless it's a small price you have to pay for a proper design and performance.