Monitoring available threads for an application in Azure

Recently I've been thinking about how to track the number of threads available to an application hosted in Azure. The trigger was a set of performance issues we discovered early, which should be addressed sooner rather than later. Since we've decided to go async-heavy and made many potentially breaking changes, we need solid metrics covering different areas of the application.

Since we already use Application Insights to measure performance, using it to track the threads available to our web app seems natural and uncomplicated. Fortunately, the Microsoft.ApplicationInsights NuGet package contains the ITelemetryInitializer interface, which looks like this:

using Microsoft.ApplicationInsights.Channel;

namespace Microsoft.ApplicationInsights.Extensibility
{
  /// <summary>
  /// Represents an object that initializes <see cref="T:Microsoft.ApplicationInsights.Channel.ITelemetry" /> objects.
  /// </summary>
  /// <remarks>
  /// The <see cref="T:Microsoft.ApplicationInsights.DataContracts.TelemetryContext" /> instances use <see cref="T:Microsoft.ApplicationInsights.Extensibility.ITelemetryInitializer" /> objects to
  /// automatically initialize properties of the <see cref="T:Microsoft.ApplicationInsights.Channel.ITelemetry" /> objects.
  /// </remarks>
  public interface ITelemetryInitializer
  {
    /// <summary>
    /// Initializes properties of the specified <see cref="T:Microsoft.ApplicationInsights.Channel.ITelemetry" /> object.
    /// </summary>
    void Initialize(ITelemetry telemetry);
  }
}

All you need to do is implement the Initialize() method.

It's worth noting that such an initializer will be hit by every trace, so don't go crazy and use it to log data from external resources etc.

With such an interface, creating a custom initializer that logs information about threads is a piece of cake:

using System.Threading;
using Microsoft.ApplicationInsights.Channel;
using Microsoft.ApplicationInsights.Extensibility;

public class AvailableThreadsInitializer : ITelemetryInitializer
{
	public void Initialize(ITelemetry telemetry)
	{
		int workerThreadsMax;
		int completionPortThreadsMax;
		int workerThreadsAvailable;
		int completionPortThreadsAvailable;

		// Read the thread pool limits and the number of threads that are currently free.
		ThreadPool.GetMaxThreads(out workerThreadsMax, out completionPortThreadsMax);
		ThreadPool.GetAvailableThreads(out workerThreadsAvailable, out completionPortThreadsAvailable);

		// Attach the values to the telemetry item as custom properties.
		telemetry.Context.Properties["Max. threads"] = workerThreadsMax.ToString();
		telemetry.Context.Properties["Max. threads(async I/O)"] = completionPortThreadsMax.ToString();
		telemetry.Context.Properties["Available threads"] = workerThreadsAvailable.ToString();
		telemetry.Context.Properties["Available threads(async I/O)"] = completionPortThreadsAvailable.ToString();
	}
}

All that's left is to add this initializer to the collection of already defined initializers:

TelemetryConfiguration.Active.TelemetryInitializers.Add(new AvailableThreadsInitializer());

Thanks to this change, every trace will carry information about the threads available to your application. It should make diagnosis much easier when something goes wrong with your web application.
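When reading these properties later, the number of threads currently in use is often easier to interpret than the raw values. Below is a minimal sketch of that calculation, assuming "busy" simply means maximum minus available. The ThreadPoolUsage class and its method names are hypothetical helpers made up for this illustration; they are not part of Application Insights or the initializer above.

```csharp
using System;
using System.Threading;

// Hypothetical helper for illustration only.
public static class ThreadPoolUsage
{
    // Returns how many worker and I/O completion port threads are in use,
    // computed as "maximum minus available".
    public static (int BusyWorkers, int BusyIo) GetBusyThreads()
    {
        ThreadPool.GetMaxThreads(out int workerMax, out int ioMax);
        ThreadPool.GetAvailableThreads(out int workerAvailable, out int ioAvailable);
        return (workerMax - workerAvailable, ioMax - ioAvailable);
    }

    // Formats the busy counts as a single line suitable for a log message.
    public static string Describe()
    {
        var (busyWorkers, busyIo) = GetBusyThreads();
        return $"Busy worker threads: {busyWorkers}, busy I/O threads: {busyIo}";
    }
}
```

Calling Console.WriteLine(ThreadPoolUsage.Describe()) prints the current counts. The two ThreadPool calls are not atomic, so treat the result as a rough snapshot rather than an exact figure.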

Thanks to @marekgrabarz for revealing this feature to me!

Looking under the hood

Because of some people I've met in my career, I've picked up the habit of looking under the hood when using a method, e.g. from the FCL. It's one of the best things you can learn and adopt - always make sure you have at least a rough idea of what's going on behind the scenes.

Recently I've been playing with Parallel.ForEach, so I went through its code. The very first call looks like this:

Parallel.ForEach() -> ForEachWorker<TSource, object>()

There's nothing fancy going on here. But let's go deeper and investigate the ForEachWorker<TSource, object>() method:

// If it's an array, we can use a fast-path that uses ldelems in the IL.
TSource[] sourceAsArray = source as TSource[];
if (sourceAsArray != null)
    return ForEachWorker<TSource, TLocal>(
        sourceAsArray, parallelOptions, body, bodyWithState, bodyWithStateAndIndex, bodyWithStateAndLocal,
        bodyWithEverything, localInit, localFinally);

// If we can index into the list, we can use a faster code-path that doesn't result in
// contention for the single, shared enumerator object.
IList<TSource> sourceAsList = source as IList<TSource>;
if (sourceAsList != null)
    return ForEachWorker<TSource, TLocal>(
        sourceAsList, parallelOptions, body, bodyWithState, bodyWithStateAndIndex, bodyWithStateAndLocal,
        bodyWithEverything, localInit, localFinally);

// This is an honest-to-goodness IEnumerable.  Wrap it in a Partitioner and defer to our
// ForEach(Partitioner) logic.
return PartitionerForEachWorker<TSource, TLocal>(Partitioner.Create(source), parallelOptions, body, bodyWithState,
    bodyWithStateAndIndex, bodyWithStateAndLocal, bodyWithEverything, localInit, localFinally);

Because this part is really well documented, you can clearly see what steps were taken to improve the performance of this method. Knowing this, you can adjust your code to make sure you're using it in the most efficient way.
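As an illustration of such an adjustment, here is a minimal sketch that hands Parallel.ForEach an array, so the first branch of the ForEachWorker code quoted above applies instead of the generic IEnumerable fallback. It assumes the sequence is small enough to copy into memory; whether the extra copy pays off depends on your data, so measure before adopting it. ForEachFastPath and SumOfSquares are hypothetical names used only for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical example class for illustration only.
public static class ForEachFastPath
{
    // Sums the squares of all items in parallel.
    public static long SumOfSquares(IEnumerable<int> source)
    {
        // Assumption: the sequence fits in memory. Reuse the array if the caller
        // already passed one, otherwise materialize the sequence once so that
        // Parallel.ForEach receives a TSource[] rather than a lazy enumerable.
        int[] items = source as int[] ?? source.ToArray();

        long total = 0;
        // Interlocked.Add keeps the shared accumulator safe across threads.
        Parallel.ForEach(items, item => Interlocked.Add(ref total, (long)item * item));
        return total;
    }
}
```

For example, ForEachFastPath.SumOfSquares(Enumerable.Range(1, 100)) copies the lazy range into an array first and then sums the squares in parallel.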
I strongly encourage you to always read the source code of the libraries you're using, so you can understand all the gotchas and implementation details. In less than a minute you can find something that will help you make your code better and faster.