Ldstr, where are you?

In the meantime I'm working on a Fody add-in that validates any SQL query it finds in an assembly. Details will follow soon; for now I'd like to describe one issue I ran into while analyzing MSIL code for it.

Internally the add-in scans the MSIL instructions in an assembly, looking for the ldstr opcode in method bodies. There is no magic here: I haven't found a better way to locate strings in the generated code, mostly because string literals are not marked in MSIL in any special way. It worked just fine for most of the test cases I created, so I decided to try it in my side project against more realistic use cases. The output turned out to be somewhat surprising:

Fody: Fody (version 1.29.4.0) Executing
Fody/Stamp:   Starting search for git repository in SolutionDir
Fody/Stamp:   Found git repository in E:\Codenova\Klienci\MikroSystem\vNext\.git\
Fody/QueryValidator:   Found 0 queries to validate.
Fody/QueryValidator:   Trying to get configuration for E:\Codenova\Klienci\MikroSystem\vNext\LicznikNET.vNext.Api\obj\Debug\LicznikNET.vNext.Api.exe
Fody/QueryValidator:   Found configuration file E:\Codenova\Klienci\MikroSystem\vNext\LicznikNET.vNext.Api\app.config
Fody/QueryValidator:   Connection string is Data Source=.\SQLEXPRESS;Initial Catalog=vNext;Persist Security Info=True;User ID=foo;Password=bar;MultipleActiveResultSets=True;

What the heck?! I have almost 100 SQL queries in my code and still it found nothing? Impossibru!

I decided to run dotPeek and do some "visual debugging". Let's say we have the following method:

public void Bar()
{
     using (var connection = new SqlConnection())
     {
          connection.Query(@"|> SELECT * FROM dbo.Foo");
     }
}

This works perfectly fine, the generated MSIL is what I expected:

IL_0006: ldloc.0      // connection
IL_0007: ldstr        "|> SELECT * FROM dbo.Foo"
IL_000c: ldnull       
IL_000d: ldnull       
IL_000e: ldc.i4.1     
IL_000f: ldloca.s     V_1
IL_0011: initobj      valuetype [mscorlib]System.Nullable`1<int32>
IL_0017: ldloc.1      // V_1
IL_0018: ldloca.s     V_2
IL_001a: initobj      valuetype [mscorlib]System.Nullable`1<valuetype [System.Data]System.Data.CommandType>
IL_0020: ldloc.2      // V_2
IL_0021: call         class [mscorlib]System.Collections.Generic.IEnumerable`1<object> [Dapper]Dapper.SqlMapper::Query(class [System.Data]System.Data.IDbConnection, string, object, class [System.Data]System.Data.IDbTransaction, bool, valuetype [mscorlib]System.Nullable`1<int32>, valuetype [mscorlib]System.Nullable`1<valuetype [System.Data]System.Data.CommandType>)
IL_0026: pop    

But this doesn't reproduce my problem, because in my side project all DB calls are asynchronous (using Dapper, of course). A corrected example could look like this:

public async Task Foo()
{
     using (var connection = new SqlConnection())
     {
          await connection.QueryAsync(@"|> SELECT * FROM dbo.Foo");
     }
}

Now, when I check MSIL, the root of all evil will be revealed:

IL_0000: ldloca.s     V_0
IL_0002: call         valuetype [mscorlib]System.Runtime.CompilerServices.AsyncTaskMethodBuilder [mscorlib]System.Runtime.CompilerServices.AsyncTaskMethodBuilder::Create()
IL_0007: stfld        valuetype [mscorlib]System.Runtime.CompilerServices.AsyncTaskMethodBuilder QueryValidator.Fody.TestWeb.TestClass/'<Foo>d__0'::'<>t__builder'
IL_000c: ldloca.s     V_0
IL_000e: ldc.i4.m1    
IL_000f: stfld        int32 QueryValidator.Fody.TestWeb.TestClass/'<Foo>d__0'::'<>1__state'
IL_0014: ldloc.0      // V_0
IL_0015: ldfld        valuetype [mscorlib]System.Runtime.CompilerServices.AsyncTaskMethodBuilder QueryValidator.Fody.TestWeb.TestClass/'<Foo>d__0'::'<>t__builder'
IL_001a: stloc.1      // V_1
IL_001b: ldloca.s     V_1
IL_001d: ldloca.s     V_0
IL_001f: call         instance void [mscorlib]System.Runtime.CompilerServices.AsyncTaskMethodBuilder::Start<valuetype QueryValidator.Fody.TestWeb.TestClass/'<Foo>d__0'>(!!0/*valuetype QueryValidator.Fody.TestWeb.TestClass/'<Foo>d__0'*/&)
IL_0024: ldloca.s     V_0
IL_0026: ldflda       valuetype [mscorlib]System.Runtime.CompilerServices.AsyncTaskMethodBuilder QueryValidator.Fody.TestWeb.TestClass/'<Foo>d__0'::'<>t__builder'
IL_002b: call         instance class [mscorlib]System.Threading.Tasks.Task [mscorlib]System.Runtime.CompilerServices.AsyncTaskMethodBuilder::get_Task()
IL_0030: ret  

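When implementing my add-in I forgot that the compiler rewrites every async method into a state machine (to be more specific, into a nested type implementing the IAsyncStateMachine interface). The original method is left with only the stub shown above, which creates and starts the state machine; the actual body, including the ldstr instruction, ends up in the MoveNext() method of the nested '<Foo>d__0' type. Because I wasn't scanning nested types (which was a bug anyway), no async query could be found. After loading the instructions of nested types as well, it works as intended: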

Fody: Fody (version 1.29.4.0) Executing
Fody/Stamp:   Starting search for git repository in SolutionDir
Fody/Stamp:   Found git repository in E:\Codenova\Klienci\MikroSystem\vNext\.git\
Fody/QueryValidator:   Found 94 queries to validate.
Fody/QueryValidator:   Trying to get configuration for E:\Codenova\Klienci\MikroSystem\vNext\LicznikNET.vNext.Api\obj\Debug\LicznikNET.vNext.Api.exe
Fody/QueryValidator:   Found configuration file E:\Codenova\Klienci\MikroSystem\vNext\LicznikNET.vNext.Api\app.config
Fody/QueryValidator:   Connection string is Data Source=.\SQLEXPRESS;Initial Catalog=vNext;Persist Security Info=True;User ID=foo;Password=bar;MultipleActiveResultSets=True;
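
The fix described above can be sketched roughly as follows. This is not the add-in's actual code, only a minimal illustration under the assumption that the scan is done with Mono.Cecil (the library Fody hands to its add-ins); StringLiteralScanner and its method names are made up for this example. The important part is the recursion into NestedTypes, because ModuleDefinition.Types lists top-level types only:

```csharp
using System.Collections.Generic;
using Mono.Cecil;
using Mono.Cecil.Cil;

// Hypothetical helper, not taken from the add-in itself.
public static class StringLiteralScanner
{
    // Returns every string literal loaded by an ldstr instruction in the module.
    public static IEnumerable<string> FindStringLiterals(ModuleDefinition module)
    {
        // module.Types contains top-level types only, so each of them
        // has to be walked recursively to reach nested ones.
        foreach (var type in module.Types)
        {
            foreach (var literal in FindInType(type))
                yield return literal;
        }
    }

    private static IEnumerable<string> FindInType(TypeDefinition type)
    {
        foreach (var method in type.Methods)
        {
            // Abstract and extern methods have no IL body to scan.
            if (!method.HasBody)
                continue;

            foreach (var instruction in method.Body.Instructions)
            {
                if (instruction.OpCode.Code == Code.Ldstr)
                    yield return (string)instruction.Operand;
            }
        }

        // Compiler-generated async state machines (e.g. '<Foo>d__0') live here.
        foreach (var nested in type.NestedTypes)
        {
            foreach (var literal in FindInType(nested))
                yield return literal;
        }
    }
}
```

With a scan like this, the literal from Foo() is found in the MoveNext() method of the nested state machine instead of in Foo() itself.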

Boxing/unboxing - treacherous conversion

Boxing/unboxing conversions are among the most popular interview questions, so I'm not going to explain them in this post (who wants to read yet another description anyway). Instead I will present one example that lets you check whether you really understand what is going on. The example comes from the great book "CLR via C#" by Jeffrey Richter. If you haven't had a chance to read it yet, I strongly recommend it - it's a fantastic collection of gotchas in C#/CLR.

Let's say we have the following struct in our code:

public struct Point
{
        private int _x;
        private int _y;

        public Point(int x, int y)
        {
            _x = x;
            _y = y;
        }

        public void Change(int x, int y)
        {
            _x = x;
            _y = y;
        }

        public override string ToString()
        {
            return $"{_x},{_y}";
        }
}

(yes, I know that mutable structs are evil - that's not the point here). Let's play with it and display something:

class Program
{
        static void Main(string[] args)
        {
            var point = new Point(1, 1);
            Console.WriteLine(point);

            point.Change(2, 2);
            Console.WriteLine(point);

            var o = (object)point;
            Console.WriteLine(o);

            ((Point)o).Change(3, 3);
            Console.WriteLine(o);

            Console.ReadLine();
        }
}

The question is - what do you expect the console to display?

1,1
2,2
2,2
3,3

This was my first thought, and it is what intuition tells us. But hey, let's run the program:

1,1
2,2
2,2
2,2

This is unexpected. How is it possible that the change to (3, 3) got lost?

The "problem" with this example for most people is that they forget how unboxing works. Casting o to Point doesn't change the type of the object that o refers to. The boxed value lives on the managed heap, and unboxing copies it into a value-type instance on the current thread's stack. To hold that copy, the compiler has to emit an additional local variable. Let's check the MSIL for this operation:

IL_003a: ldloc.1      // o
IL_003b: unbox.any    Program.Point
IL_0040: stloc.2      // V_2
IL_0041: ldloca.s     V_2
IL_0043: ldc.i4.3     
IL_0044: ldc.i4.3     
IL_0045: call         instance void Program.Point::Change(int32, int32)
IL_004a: nop 

As you can see, the compiler emitted a V_2 variable to store the result of unboxing. The address of this variable is then loaded onto the evaluation stack (ldloca.s) and the Change() method is invoked on it. Because we have no reference to V_2 in our C# code, we don't notice that we are changing a Point we never expected to be created. Just to make sure, we can check the code emitted for writing the result:

IL_004b: ldloc.1      // o
IL_004c: call         void [mscorlib]System.Console::WriteLine(object)
IL_0051: nop   

If we compare it with the local variables:

.locals init (
      [0] valuetype Program.Point point,
      [1] object o,
      [2] valuetype Program.Point V_2
 )

we can see that Console.WriteLine() is called for the o variable, while Change() was called on V_2, so the boxed Point referenced by o is never modified.
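
Translated back to C#, the IL above corresponds roughly to the sketch below. The local named temp is my own stand-in for the compiler-generated V_2; the Point struct is repeated only to keep the example self-contained:

```csharp
using System;

public struct Point
{
    private int _x;
    private int _y;

    public Point(int x, int y) { _x = x; _y = y; }

    public void Change(int x, int y) { _x = x; _y = y; }

    public override string ToString() { return $"{_x},{_y}"; }
}

public static class UnboxingDemo
{
    public static void Main()
    {
        object o = new Point(2, 2);   // boxing: the value is copied to the heap

        // What ((Point)o).Change(3, 3) boils down to:
        Point temp = (Point)o;        // unboxing: the boxed value is copied into a local
        temp.Change(3, 3);            // only the copy is modified

        Console.WriteLine(temp);      // prints 3,3
        Console.WriteLine(o);         // prints 2,2 - the boxed Point is untouched
    }
}
```

In the original program the copy has no name, so its new state is simply lost.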

Summary

Boxing/unboxing conversions can be treacherous because of all the differences between value and reference types. The above example can be fixed by using an interface that declares the Change() method: casting o to the interface doesn't unbox it, so Change() operates directly on the boxed Point. If you are interested in such "not-so-obvious" cases, I strongly recommend checking out the ProblemBook.NET book, where you can find even more examples.
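
For illustration, the interface-based fix mentioned above could look like the sketch below. IChangeable is a name invented for this example; any interface declaring Change() would do:

```csharp
using System;

// Hypothetical interface introduced only for this example.
public interface IChangeable
{
    void Change(int x, int y);
}

public struct Point : IChangeable
{
    private int _x;
    private int _y;

    public Point(int x, int y) { _x = x; _y = y; }

    public void Change(int x, int y) { _x = x; _y = y; }

    public override string ToString() { return $"{_x},{_y}"; }
}

public static class InterfaceFixDemo
{
    public static void Main()
    {
        object o = new Point(2, 2);       // boxed Point on the heap

        // Casting to an interface is a reference conversion: no unboxing,
        // no temporary copy. Change() runs against the boxed instance itself.
        ((IChangeable)o).Change(3, 3);

        Console.WriteLine(o);             // prints 3,3
    }
}
```

The struct is still mutable, so this only removes the hidden copy; it doesn't make mutable structs any less evil.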