Leverage static lambdas in C# to avoid memory allocations
08 Apr 2025

Today I will take a look at how using delegates in C# affects an app’s memory footprint and how this can be optimized in certain scenarios, e.g. when closures are involved.
Intro
Recently, within one of my pull requests, I changed the LINQ statement .Where(item => item.SomeProperty == "SomeFixedValue" && item.State == States.Ready) to .Where(static item => item.SomeProperty == "SomeFixedValue" && item.State == States.Ready). As you can see, all I did was add the static keyword to the declaration of the anonymous delegate/lambda. A colleague reviewing this PR approached me and told me that he’d never seen this before and was interested in why I did it. So I thought this might be a good topic for a dedicated blog post.
For the sake of brevity, I will use the term lambda as a synonym for an anonymous function, anonymous delegate, expression lambda, and statement lambda. Of course, there are differences (see the official Microsoft docs), but they are not relevant to this article.
💡 In this article, the concept of closures will be referred to, so let’s quickly recap what a closure is. For this, take the following code:

var multiplier = 2;
Enumerable.Range(0, 100_000).Sum(item => item * multiplier);

As you can see, a sequence of 100.000 elements is created. Afterwards, every item is multiplied by 2 and everything is summed up. In this case, the variable multiplier is a closure of the lambda item => item * multiplier, as the compiler has to capture the value of multiplier and provide it to the lambda upon execution.
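To visualize what this capture means, here is a hand-written sketch of roughly what the compiler generates for such a closure. The class and member names are mine - the real compiler-generated names look like <>c__DisplayClass0_0, as we’ll see later:

```csharp
using System;
using System.Linq;

// Usage: the captured variable lives on a heap-allocated object.
var closure = new DisplayClassSketch { Multiplier = 2 }; // heap allocation!
long sum = Enumerable.Range(0, 100_000).Sum(new Func<int, long>(closure.Body));
Console.WriteLine(sum); // prints 9999900000

// Illustrative hand-written equivalent of a compiler-generated display class.
class DisplayClassSketch
{
    public long Multiplier; // the captured variable becomes a field

    public long Body(int item) => item * Multiplier; // the lambda body
}
```

This is exactly why a closure costs an allocation: the captured variable has to live somewhere on the heap so the delegate can reach it later.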
Running some benchmarks
With the scene being set, let’s dive into some code 🤓 I’m a “numbers guy”, and I want to see measurement results, stuff like that. Therefore, I really love the tool BenchmarkDotNet for doing micro benchmarking, as it takes care of a ton of aspects to provide meaningful, reproducible measurements.
First of all, this is our benchmarking code which we will go through line by line in a second:
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;

BenchmarkRunner.Run<Benchmarks>();

[MemoryDiagnoser]
[ShortRunJob]
public class Benchmarks
{
    private static readonly long StaticMultiplier = 2;

    private int[] _items = [];

    [Params(100_000, 10_000_000)]
    public int NumberOfItems;

    [GlobalSetup]
    public void GlobalSetup() => _items = Enumerable.Range(0, NumberOfItems).ToArray();

    [Benchmark(Baseline = true)]
    public long LinqWithMultiplierAsClosure()
    {
        var multiplier = 2L;
        return _items.Sum(item => item * multiplier);
    }

    [Benchmark]
    public long LinqWithMultiplierAsConstant() => _items.Sum(item => item * 2L);

    [Benchmark]
    public long LinqStaticWithMultiplierAsConstant() => _items.Sum(static item => item * 2L);

    [Benchmark]
    public long LinqStaticWithMultiplierAsStaticField() => _items.Sum(static item => item * StaticMultiplier);
}
By executing BenchmarkRunner.Run<Benchmarks>(), BenchmarkDotNet will run all benchmarks marked with the [Benchmark] attribute inside the class Benchmarks. So there are the following four benchmarks:

- LinqWithMultiplierAsClosure → this is our baseline to compare against; it captures the variable multiplier as a closure.
- LinqWithMultiplierAsConstant → instead of capturing the multiplier as a closure, the value 2 is embedded directly into the lambda.
- LinqStaticWithMultiplierAsConstant → like the previous one, but the lambda is additionally marked static, so it cannot capture a closure.
- LinqStaticWithMultiplierAsStaticField → instead of a dedicated variable multiplier for the value 2, the multiplier is moved into a static field, which a static lambda may reference. This also avoids the closure capture.
Last but not least, the benchmark is parameterized via the public field NumberOfItems, i.e. all four benchmark methods will be run twice: once with an array of 100.000 elements and once with 10.000.000 elements. We do this to see whether the amount of input data affects the allocations.
Benchmarking results
Running the benchmark gives the following results:
Method | NumberOfItems | Allocated | Alloc Ratio |
---|---|---|---|
LinqWithMultiplierAsClosure | 100.000 | 120 B | 1.00 |
LinqWithMultiplierAsConstant | 100.000 | 32 B | 0.27 |
LinqStaticWithMultiplierAsConstant | 100.000 | 32 B | 0.27 |
LinqStaticWithMultiplierAsStaticField | 100.000 | 32 B | 0.27 |
LinqWithMultiplierAsClosure | 10.000.000 | 132 B | 1.00 |
LinqWithMultiplierAsConstant | 10.000.000 | 44 B | 0.33 |
LinqStaticWithMultiplierAsConstant | 10.000.000 | 44 B | 0.33 |
LinqStaticWithMultiplierAsStaticField | 10.000.000 | 44 B | 0.33 |
That’s a lot of numbers 🤯 so let me explain:
- Method → the method name of the current benchmark.
- NumberOfItems → the number of elements within the array.
- Allocated → the amount of memory allocated by the code under benchmark.
- Alloc Ratio → the allocated memory relative to the baseline.
As we can see, both static cases perform better in terms of memory (32 bytes vs. 120 bytes). Interestingly though, there is no difference between the two static cases. To understand this better, we need to take a look at what’s going on under the hood, i.e. what the compiler does. Of course, we could drive stick shift, jump directly into the IL code emitted by the compiler, and figure it out on a very low level. Although this is sometimes inevitable when analyzing certain issues, in this case checking the lowered C# code is sufficient.
💡 Lowering is the process by which the compiler converts a high-level language feature into a lower-level one. For example, the C# compiler lowers a foreach loop over an array into an old-fashioned for loop.
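To make lowering concrete, here is a hand-written sketch of what the compiler roughly does with a foreach over an array. The temporary variable names are illustrative - the real lowered code uses unspeakable generated names:

```csharp
using System;

int[] numbers = { 1, 2, 3 };

// What we write:
foreach (int n in numbers)
    Console.WriteLine(n);

// Roughly what the compiler emits for an array - a plain for loop
// with an index variable (names are illustrative):
for (int index = 0; index < numbers.Length; index++)
{
    int m = numbers[index];
    Console.WriteLine(m);
}
```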
There are a bunch of tools out there for this, but as I’d like to stay in the inner dev loop, I prefer to use the IL Viewer tool within my JetBrains Rider IDE and set it to Low-Level C#.
Static vs. non-static with closure
Here’s the lowered C# code for LinqWithMultiplierAsClosure vs. LinqStaticWithMultiplierAsConstant:
public class Benchmarks
{
    public long LinqWithMultiplierAsClosure()
    {
        Benchmarks.<>c__DisplayClass4_0 cDisplayClass40 = new Benchmarks.<>c__DisplayClass4_0();
        cDisplayClass40.multiplier = 2L;
        return ((IEnumerable<int>) this._items).Sum<int>(new Func<int, long>((object) cDisplayClass40, __methodptr(<LinqWithMultiplierAsClosure>b__0)));
    }

    public long LinqStaticWithMultiplierAsConstant()
    {
        return ((IEnumerable<int>) this._items).Sum<int>(Benchmarks.<>c.<>9__5_0 ?? (Benchmarks.<>c.<>9__5_0 = new Func<int, long>((object) Benchmarks.<>c.<>9, __methodptr(<LinqStaticWithMultiplierAsConstant>b__5_0))));
    }

    [CompilerGenerated]
    private sealed class <>c
    {
        public static Func<int, long> <>9__5_0;

        internal long <LinqStaticWithMultiplierAsConstant>b__5_0(int item)
        {
            return (long) item * 2L;
        }
    }

    [CompilerGenerated]
    private sealed class <>c__DisplayClass4_0
    {
        public long multiplier;

        internal long <LinqWithMultiplierAsClosure>b__0(int item)
        {
            return (long) item * this.multiplier;
        }
    }
}
Of course, this is rather ugly to read - the compiler has to make sure that the lowered code does not collide with any identifier we could legally write ourselves. So while it’s not valid for us to name a class <>c, this is perfectly fine for the compiler. And don’t confuse the angle brackets with generics - they’re really just names.
This time, let’s start bottom-up:

- The class <>c__DisplayClass4_0 has a public instance field multiplier which is used within the method <LinqWithMultiplierAsClosure>b__0, i.e. every instance of <>c__DisplayClass4_0 can have its own multiplier. The method takes an int as input (one array element) and returns the calculation result (long).
- The class <>c has a public delegate field <>9__5_0 of type Func<int, long>, i.e. the delegate takes an int as input (one array element) and returns a long (the calculation result). Note that this is not an instance member but a static field, i.e. while there could be n instances of class <>c, there is only one instance of the member <>c.<>9__5_0.
- The class <>c also contains the method <LinqStaticWithMultiplierAsConstant>b__5_0, which takes an int as input (one array element) and returns the calculation result (long).
Now comes the interesting part, where the compiler uses the two generated classes <>c and <>c__DisplayClass4_0 within the class Benchmarks:

- In LinqStaticWithMultiplierAsConstant, the compiler passes the static delegate field <>9__5_0 from class <>c to LINQ’s Sum method. If it is null, a new Func<int, long> gets instantiated, pointing to the already existing method <LinqStaticWithMultiplierAsConstant>b__5_0 of class <>c. Due to the null-coalescing operator ??, only one instance will ever be created and then reused.
- In LinqWithMultiplierAsClosure, the compiler creates a new instance of the <>c__DisplayClass4_0 class and assigns the captured multiplier value to the field multiplier. This is the actual closure capture. This instance then gets passed to LINQ’s Sum method together with the instance method <LinqWithMultiplierAsClosure>b__0 of class <>c__DisplayClass4_0, encapsulated in a new Func<int, long>.
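The caching from the first case can be written by hand, too. Here is a sketch of the pattern - the field and method names are mine, not the compiler’s, but the null-coalescing trick is the same:

```csharp
using System;
using System.Linq;

Console.WriteLine(DoubledSum.Compute(Enumerable.Range(0, 5).ToArray())); // prints 20
Console.WriteLine(DoubledSum.Compute(new[] { 1, 2, 3 })); // reuses the cached delegate

static class DoubledSum
{
    // Cached delegate - allocated at most once, then reused. This is a
    // hand-written version of the compiler's <>9__5_0 caching field.
    private static Func<int, long>? _cached;

    private static long Selector(int item) => item * 2L;

    public static long Compute(int[] items)
    {
        // The same null-coalescing pattern the compiler emits:
        // instantiate the Func on first use, reuse it afterwards.
        return items.Sum(_cached ?? (_cached = Selector));
    }
}
```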
If you’re a nitpicker, you will have noticed the following:

- LinqWithMultiplierAsClosure → creates two instances per call:
  - 1x <>c__DisplayClass4_0
  - 1x Func<int, long>
- LinqStaticWithMultiplierAsConstant → creates only one instance in total:
  - 1x Func<int, long>
And this is where the difference of 88 bytes (120 B - 32 B and 132 B - 44 B, respectively) comes from.
Non-static with vs. without closure
The results also show that the same saving applies to LinqWithMultiplierAsConstant, too, i.e. the compiler is smart enough to detect that the lambda contains no closure and thereby avoids the corresponding allocation. For the sake of brevity, I will skip the corresponding lowered code here, as it looks exactly the same as for LinqStaticWithMultiplierAsConstant above.
Static vs. static
For the sake of completeness, here’s the lowered C# code for LinqStaticWithMultiplierAsConstant vs. LinqStaticWithMultiplierAsStaticField:
public class Benchmarks
{
    private static readonly long StaticMultiplier;

    public long LinqStaticWithMultiplierAsConstant()
    {
        return ((IEnumerable<int>) this._items).Sum<int>(Benchmarks.<>c.<>9__5_0 ?? (Benchmarks.<>c.<>9__5_0 = new Func<int, long>((object) Benchmarks.<>c.<>9, __methodptr(<LinqStaticWithMultiplierAsConstant>b__5_0))));
    }

    public long LinqStaticWithMultiplierAsStaticField()
    {
        return ((IEnumerable<int>) this._items).Sum<int>(Benchmarks.<>c.<>9__6_0 ?? (Benchmarks.<>c.<>9__6_0 = new Func<int, long>((object) Benchmarks.<>c.<>9, __methodptr(<LinqStaticWithMultiplierAsStaticField>b__6_0))));
    }

    [CompilerGenerated]
    private sealed class <>c
    {
        public static Func<int, long> <>9__5_0;
        public static Func<int, long> <>9__6_0;

        internal long <LinqStaticWithMultiplierAsConstant>b__5_0(int item)
        {
            return (long) item * 2L;
        }

        internal long <LinqStaticWithMultiplierAsStaticField>b__6_0(int item)
        {
            return (long) item * Benchmarks.StaticMultiplier;
        }
    }
}
Given what we’ve learned before, it quickly becomes obvious why there is no benefit in terms of allocations. The compiler-generated class <>c now contains two public delegate fields (<>9__5_0 and <>9__6_0) and two methods for the calculation (<LinqStaticWithMultiplierAsConstant>b__5_0 and <LinqStaticWithMultiplierAsStaticField>b__6_0). But when they are used inside the methods LinqStaticWithMultiplierAsConstant and LinqStaticWithMultiplierAsStaticField of class Benchmarks, there is no difference: we always end up with exactly one new Func<int, long>, pointing to the corresponding calculation method. Therefore, we see no further gain.
And now what?
You might be asking yourself: why the heck does this guy care about 88 bytes? And that’s a great question!
So let’s slightly modify our Benchmarks
class:
public class Benchmarks
{
    private int[] _items = [];

    [Params(100, 1_000)]
    public int NumberOfItems;

    [GlobalSetup]
    public void GlobalSetup() => _items = Enumerable.Range(0, NumberOfItems).ToArray();

    [Benchmark(Baseline = true)]
    public long LinqWithMultiplierAsClosure()
    {
        long sum = 0;
        for (int i = 0; i < _items.Length; i++)
        {
            sum += _items.Select(item => item * i).Sum();
        }
        return sum;
    }

    [Benchmark]
    public long LinqStaticWithMultiplierFromPassedState()
    {
        long sum = 0;
        for (int i = 0; i < _items.Length; i++)
        {
            sum += _items.Select(static (item, index) => item * index).Sum();
        }
        return sum;
    }
}
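A quick note on the overload used in LinqStaticWithMultiplierFromPassedState: the two-parameter Select passes each element’s zero-based index within the sequence as the second lambda argument, so nothing has to be captured. A small sketch of its behavior:

```csharp
using System;
using System.Linq;

int[] items = { 10, 20, 30 };

// The two-parameter Select overload supplies each element's zero-based
// index within the sequence as the second lambda argument - no capture needed.
int[] products = items.Select(static (item, index) => item * index).ToArray();

Console.WriteLine(string.Join(", ", products)); // prints 0, 20, 60
```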
Here are the benchmark results:
Method | NumberOfItems | Allocated | Alloc Ratio |
---|---|---|---|
LinqWithMultiplierAsClosure | 100 | 10.96 KB | 1.00 |
LinqStaticWithMultiplierFromPassedState | 100 | 10.16 KB | 0.93 |
LinqWithMultiplierAsClosure | 1.000 | 109.4 KB | 1.00 |
LinqStaticWithMultiplierFromPassedState | 1.000 | 101.56 KB | 0.93 |
Now this little change begins to add up: for 100 iterations, the closure costs ≈800 extra bytes; for 1.000 iterations, already ≈8.000 bytes. So let’s check the lowered C# code again:
public class Benchmarks
{
    public long LinqWithMultiplierAsClosure()
    {
        long sum = 0;
        Benchmarks.<>c__DisplayClass3_0 cDisplayClass30 = new Benchmarks.<>c__DisplayClass3_0();
        for (cDisplayClass30.i = 0; cDisplayClass30.i < this._items.Length; cDisplayClass30.i++)
            sum += (long) ((IEnumerable<int>) this._items).Select<int, int>(new Func<int, int>((object) cDisplayClass30, __methodptr(<LinqWithMultiplierAsClosure>b__0))).Sum();
        return sum;
    }

    public long LinqStaticWithMultiplierFromPassedState()
    {
        long sum = 0;
        for (int i = 0; i < this._items.Length; ++i)
            sum += (long) ((IEnumerable<int>) this._items).Select<int, int>(Benchmarks.<>c.<>9__4_0 ?? (Benchmarks.<>c.<>9__4_0 = new Func<int, int, int>((object) Benchmarks.<>c.<>9, __methodptr(<LinqStaticWithMultiplierFromPassedState>b__4_0)))).Sum();
        return sum;
    }

    [CompilerGenerated]
    private sealed class <>c
    {
        public static readonly Benchmarks.<>c <>9;
        public static Func<int, int, int> <>9__4_0;

        internal int <LinqStaticWithMultiplierFromPassedState>b__4_0(int item, int index)
        {
            return item * index;
        }
    }

    [CompilerGenerated]
    private sealed class <>c__DisplayClass3_0
    {
        public int i;

        internal int <LinqWithMultiplierAsClosure>b__0(int item)
        {
            return item * this.i;
        }
    }
}
Due to the changing captured value, the compiler has to provide a different multiplier for every one of the 1.000 loop iterations within Benchmarks.LinqWithMultiplierAsClosure, i.e. a new Func<int, int> is allocated on every iteration. But for LinqStaticWithMultiplierFromPassedState, the passed lambda has the shape Func<int, int, int>, i.e. it accepts one more parameter: the element’s index within the sequence, which the overload .Select(static (item, index) => ...) passes in as state, so nothing has to be captured from the outer scope. (Strictly speaking, the two benchmarks don’t compute identical sums - one multiplies by the loop variable i, the other by the element’s index - but for comparing the allocation behavior that’s fine.)
Imagine we’re in a hot path of an app - you hopefully see my point by now: since it’s so easy to capture a closure, one can quickly end up with a lot of memory traffic, causing allocations (memory pressure) and garbage collection (CPU time).
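By the way, if you want a quick look at allocations in such a hot path without setting up a full benchmark, GC.GetAllocatedBytesForCurrentThread() gives a rough before/after delta. A sketch - the numbers will vary by runtime, so for trustworthy measurements stick with BenchmarkDotNet:

```csharp
using System;
using System.Linq;

int[] items = Enumerable.Range(0, 1_000).ToArray();
long multiplier = 2;

// Warm up so JIT/first-call allocations don't skew the delta.
long warmup = items.Sum(item => item * multiplier);

long before = GC.GetAllocatedBytesForCurrentThread();
long sum = items.Sum(item => item * multiplier); // closure + delegate allocations happen here
long after = GC.GetAllocatedBytesForCurrentThread();

// Rough numbers only - use BenchmarkDotNet for meaningful measurements.
Console.WriteLine($"Sum: {sum}, roughly {after - before} bytes allocated");
```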
Conclusions
Now that we’ve analyzed the implications of using lambdas in terms of memory, let’s focus on some conclusions.
Awareness of implications
With the rise of Language Integrated Query (LINQ), lambdas are almost everywhere in modern C# code - for example, when working with EF Core:
var address = await GetAddressFromSomewhereElseAsync();
var personsAtThisAddress = dbContext.Persons.Where(person => person.StreetId == address.StreetId && person.HouseId == address.HouseId);
If you want to work with EF Core, you cannot avoid lambdas. In general, that’s not a problem - it all depends 😉 It’s all about trading benefits against costs: if a tool like EF Core (which implies heavy use of lambdas) provides high value and the corresponding code is not a super hot path, you’re good to go. But you should be aware that there is a certain cost, so that you can actually weigh it against the benefits.
Furthermore, let me repeat: just marking a lambda static does not save you a single bit on its own. It’s more about the code’s intent: by using a static lambda, you express that this portion must not capture any closures - a statement for the next reader of this code (maybe you). Furthermore, it protects you from accidentally capturing a closure in the midst of a bigger refactoring, where such subtle changes can happen quickly: if a lambda is static, the compiler will raise an error as soon as it captures anything.
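To see this guard rail in action, here is a minimal sketch - the commented-out line is rejected by the compiler, because the static lambda would have to capture the local variable multiplier:

```csharp
using System;
using System.Linq;

int[] items = { 1, 2, 3 };
long multiplier = 2;

// Does NOT compile - a static anonymous function cannot capture
// the local variable 'multiplier':
// long broken = items.Sum(static item => item * multiplier);

// Compiles fine - constants (and static fields) are allowed:
long ok = items.Sum(static item => item * 2L);
Console.WriteLine($"{ok} (multiplier was {multiplier})"); // prints 12 (multiplier was 2)
```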
So should you avoid lambdas altogether? Not at all.
Should you clutter all your lambdas with static, e.g. .OrderBy(static person => person.Age)? Definitely not - as we’ve seen in the non-static with vs. without closure scenario, the compiler already avoids the allocation when nothing is captured.
But you should be aware of the implications and might want to consider marking certain complex lambdas as static:
public async Task<List<Building>> GetAllHeavyAndHighBuildingsFromTheFirstCenturyAsync()
{
    return await _dbContext
        .Buildings
        .Where(static building =>
            building.BuiltInYear >= 0
            && building.DestroyedInYear < 100
            && building.MassInElephants > 10_000
            && building.HeightInElephants > 30)
        .OrderBy(building => building.Key)
        .ToListAsync();
}
Look for other APIs
Although I didn’t explicitly mention it, we already saw another alternative: APIs that avoid closure allocations by letting you pass additional state. Using .Select(static (item, index) => ...) is one such example, where capturing the index from the outer scope is avoided by using another overload of Select. So if you’re lucky, you might find other APIs allowing you to pass additional state.
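You can apply the same idea to your own APIs: accept a state parameter alongside the delegate, so callers can use a static lambda instead of capturing. A minimal sketch - the Map extension and all its names are mine, purely illustrative:

```csharp
using System;
using System.Collections.Generic;

int[] items = { 1, 2, 3 };
long multiplier = 2;

// No closure: the multiplier travels as explicit state.
List<long> doubled = items.Map(multiplier, static (item, m) => item * m);

Console.WriteLine(string.Join(", ", doubled)); // prints 2, 4, 6

static class StatefulEnumerable
{
    // State-passing overload: callers hand in their state explicitly,
    // so the selector can be a static lambda without captures.
    public static List<TResult> Map<T, TState, TResult>(
        this IEnumerable<T> source,
        TState state,
        Func<T, TState, TResult> selector)
    {
        var result = new List<TResult>();
        foreach (var item in source)
            result.Add(selector(item, state));
        return result;
    }
}
```

Note that the delegate itself is still allocated; what this avoids is the per-call display-class allocation for the captured state.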
Here’s another example from some EF Core code:
public async Task<int> RunInTransactionWithClosureAsync(CustomDbContext context, CancellationToken cancellationToken = default)
{
    IExecutionStrategy strategy = context.Database.CreateExecutionStrategy();
    return await strategy.ExecuteAsync(async () => await InnerRunInTransactionAsync(context, cancellationToken));
}

public async Task<int> RunInTransactionWithLessClosureAsync(CustomDbContext context, CancellationToken cancellationToken = default)
{
    IExecutionStrategy strategy = context.Database.CreateExecutionStrategy();
    return await strategy.ExecuteAsync(
        context,
        async (innerContext, innerToken) => await InnerRunInTransactionAsync(innerContext, innerToken),
        cancellationToken);
}
As you can see, IExecutionStrategy.ExecuteAsync provides an additional overload which accepts some state (the CustomDbContext in this case) and an external CancellationToken. By using this specific overload, the closure captures of context and cancellationToken can be avoided. However, the lambda still has to call the instance method InnerRunInTransactionAsync, so a delegate referencing this instance is still allocated.
Let your IDE help you
Modern IDEs come with a ton of features. But from my experience, only a small fraction of developers use them - so get to know your toolbelt! 🤓
For example, in JetBrains Rider/ReSharper, there is the inspection Lambda expression/anonymous method can be made static. It is disabled by default; in my environments, I’ve configured it as a hint so that I get informed whenever there is an opportunity to make a lambda static. I started with it as a warning, but that was too verbose for me, as it is also raised for simple statements like .OrderBy(person => person.Age).
And again for the JetBrains lovers, there is the fantastic Heap Allocations Viewer plugin for Rider and ReSharper, which statically analyzes your code and (among other things) flags closure allocations with a nice little hint, too.
Summary
In this post, we’ve seen how capturing closures with anonymous functions/lambdas leads to memory allocations. Furthermore, we analyzed the compiler-generated low-level C# code to see what’s going on under the hood. In closing, we took a look at when to use static lambdas and how JetBrains tools can support you with that.
Thx for reading and take care!
