Introduction
In early June I attended the .NET Summit conference in Minsk, Belarus. Besides interesting presentations (recordings are available on YouTube, in Russian and English) there were interesting quizzes.
One of the questions was:
What is the difference between `ToArray` and `ToList`?
Hmm… what could be the difference besides one returning `T[]` and the other returning `List<T>`?
If you don’t know the answer or just like digging into the sources of dotnet/corefx, then I invite you to join me and find out!
Note
All code snippets were taken from the release/3.0 branch of the dotnet/corefx repository. All code snippets were truncated to make them easier to read (input argument checks, contracts, etc. removed).
Tests were targeting .NET Core 3.0 preview 7. However, most of the information should be relevant for .NET Core 2.0+ and .NET Framework 4.5+.
Benchmark
Every journey has its first step. I usually start with a simple benchmark.
Here is one I’ve created using BenchmarkDotNet:
```csharp
[CoreJob]
[RPlotExporter, MemoryDiagnoser, RankColumn]
public class ToArrayToList
{
    private IEnumerable<int> data;

    [Params(10, 100, 1000, 10000)]
    public int N;

    [GlobalSetup]
    public void Setup()
    {
        this.data = new int[this.N];
    }

    [Benchmark]
    public int[] ToArray() => this.data.ToArray();

    [Benchmark]
    public List<int> ToList() => this.data.ToList();
}

public class Program
{
    public static void Main(string[] args)
    {
        var summary = BenchmarkRunner.Run<ToArrayToList>();
    }
}
```
On my machine this benchmark produces the following results:
Method | N | Mean | Gen 0 | Gen 1 | Allocated |
---|---|---|---|---|---|
ToArray | 10 | 49.527 ns | 0.0153 | – | 64 B |
ToList | 10 | 45.850 ns | 0.0229 | – | 96 B |
ToArray | 100 | 82.490 ns | 0.1013 | – | 424 B |
ToList | 100 | 78.811 ns | 0.1090 | – | 456 B |
ToArray | 1000 | 324.100 ns | 0.9613 | – | 4024 B |
ToList | 1000 | 321.039 ns | 0.9689 | – | 4056 B |
ToArray | 10000 | 3,223.046 ns | 9.5215 | 1.1902 | 40024 B |
ToList | 10000 | 3,278.019 ns | 9.5215 | 1.9035 | 40056 B |
At first glance there is no significant difference between the two methods. However, in such cases it is important to be absolutely sure, and the only way to be sure is to examine the source code.
Examining ToArray and ToList methods
All required source code is on GitHub, so it is easy to find the implementations of the `ToArray` and `ToList` methods inside the dotnet/corefx repository:
```csharp
public static partial class Enumerable
{
    // ...
    public static TSource[] ToArray<TSource>(
        this IEnumerable<TSource> source)
    {
        return source is IIListProvider<TSource> arrayProvider
            ? arrayProvider.ToArray()
            : EnumerableHelpers.ToArray(source);
    }
    // ...
    public static List<TSource> ToList<TSource>(
        this IEnumerable<TSource> source)
    {
        return source is IIListProvider<TSource> listProvider
            ? listProvider.ToList()
            : new List<TSource>(source);
    }
    // ...
}
```
* the code snippets for Enumerable.ToArray and Enumerable.ToList were obtained from dotnet/corefx repository
Both methods start with an attempt to cast the input `source` to `IIListProvider<TSource>` (lines: 7, 15), and since in the benchmark the input sequence is initialized as an `int[]` array:
```csharp
// ...
[GlobalSetup]
public void Setup()
{
    this.data = new int[this.N];
}
// ...
```
… and in .NET all arrays are descendants of the `System.Array` class, which doesn’t implement the `IIListProvider<T>` interface, we can be sure that execution proceeds straight to the `else` branch.
Stepping into Enumerable.ToArray “else” branch
The `else` branch of `Enumerable.ToArray` continues with a call to the `EnumerableHelpers.ToArray` method:
```csharp
internal static partial class EnumerableHelpers
{
    // ...
    internal static T[] ToArray<T>(
        IEnumerable<T> source)
    {
        if (source is ICollection<T> collection)
        {
            int count = collection.Count;
            if (count == 0)
            {
                return Array.Empty<T>();
            }

            var result = new T[count];
            collection.CopyTo(result, arrayIndex: 0);
            return result;
        }

        var builder = new LargeArrayBuilder<T>(initialize: true);
        builder.AddRange(source);
        return builder.ToArray();
    }
    // ...
}
```
* the code snippet for EnumerableHelpers.ToArray was obtained from dotnet/corefx repository
The method starts with an attempt to cast the input `source` to `ICollection<T>`. A successful cast results in the “happy path”, where a new array is allocated and all `source` (aliased as `collection`) items are copied into it.
Stepping into Enumerable.ToList “else” branch
The `else` branch of `Enumerable.ToList` continues with the instantiation of a new `List<T>` object with `source` passed as the constructor argument:
```csharp
public class List<T>
{
    // ...
    public List(IEnumerable<T> collection)
    {
        if (collection is ICollection<T> c)
        {
            int count = c.Count;
            if (count == 0)
            {
                _items = s_emptyArray;
            }
            else
            {
                _items = new T[count];
                c.CopyTo(_items, 0);
                _size = count;
            }
        }
        else
        {
            _size = 0;
            _items = s_emptyArray;
            using (IEnumerator<T> en = collection!.GetEnumerator())
            {
                while (en.MoveNext())
                {
                    Add(en.Current);
                }
            }
        }
    }
    // ...
}
```
* the code snippet for List<T> was obtained from dotnet/corefx repository
The constructor starts with the same attempt to cast the input `collection` to `ICollection<T>`. A successful cast results in the “happy path”, where a new internal array is allocated and all items (from the cast result, aliased as `c`) are copied into it.
Unexpected discovery
You might have already noticed the reason why the benchmark results are so close 🙂 If not, let’s take another look at the benchmark setup:
```csharp
// ...
[GlobalSetup]
public void Setup()
{
    this.data = new int[this.N];
}
// ...
```
Here we initialize the sequence with an `int[]` array. As mentioned previously, all arrays are descendants of the `System.Array` class, which does implement the `ICollection<T>` interface.
So both `ToArray` and `ToList` methods basically do the same thing:
```csharp
// ...
var array = new T[collection.Count];
collection.CopyTo(array, 0);
// ...
```
While this isn’t quite what one would expect from the benchmark, it is always important to know the edge cases.
Benchmark #2
Let’s modify the benchmark to use the `Enumerable.Range` method:
```csharp
// ...
[GlobalSetup]
public void Setup()
{
    this.data = Enumerable.Range(0, this.N);
}
// ...
```
Executing this new benchmark on my machine produces the following results:
Method | N | Mean | Gen 0 | Gen 1 | Allocated |
---|---|---|---|---|---|
ToArray | 10 | 15.56 ns | 0.0153 | – | 64 B |
ToList | 10 | 31.97 ns | 0.0229 | – | 96 B |
ToArray | 100 | 87.93 ns | 0.1013 | – | 424 B |
ToList | 100 | 176.19 ns | 0.1090 | – | 456 B |
ToArray | 1000 | 771.31 ns | 0.9613 | – | 4024 B |
ToList | 1000 | 1,568.32 ns | 0.9689 | – | 4056 B |
ToArray | 10000 | 7,296.05 ns | 9.5215 | 0.0076 | 40024 B |
ToList | 10000 | 15,276.89 ns | 9.5215 | 0.0153 | 40056 B |
Wooow! These results look amazing! Do you feel this? The feeling of an uncovered conspiracy theory? 🙂
Before starting the celebration… have you noticed one very strange thing in these results?
Here is a union of the two tables (#1 are the results of the first benchmark and #2 are the results of the current one):
Method | N | (#1) Mean | (#2) Mean | (#1) Allocated | (#2) Allocated |
---|---|---|---|---|---|
ToArray | 10 | 49.527 ns | 15.56 ns | 64 B | 64 B |
ToList | 10 | 45.850 ns | 31.97 ns | 96 B | 96 B |
ToArray | 100 | 82.490 ns | 87.93 ns | 424 B | 424 B |
ToList | 100 | 78.811 ns | 176.19 ns | 456 B | 456 B |
ToArray | 1000 | 324.100 ns | 771.31 ns | 4024 B | 4024 B |
ToList | 1000 | 321.039 ns | 1,568.32 ns | 4056 B | 4056 B |
ToArray | 10000 | 3,223.046 ns | 7,296.05 ns | 40024 B | 40024 B |
ToList | 10000 | 3,278.019 ns | 15,276.89 ns | 40056 B | 40056 B |
Can you see it? Both benchmarks allocate exactly the same amount of memory. This is suspicious: in benchmark #1 we allocated a fixed-size array for the entire input sequence, which, by the way, is the bare minimum of memory needed to store the sequence, so any extra allocation (which definitely should happen in benchmark #2) should make a difference.
Unexpected discovery #2
To better understand what is happening, let’s see what exactly is returned by the `Enumerable.Range` method:
```csharp
public static partial class Enumerable
{
    // ...
    public static IEnumerable<int> Range(
        int start, int count)
    {
        long max = ((long)start) + count - 1;
        if (count == 0)
        {
            return Empty<int>();
        }

        return new RangeIterator(start, count);
    }
    // ...
}
```
* the code snippet for Enumerable.Range was obtained from dotnet/corefx repository
It returns a new instance of the `RangeIterator` class, which implements the internal `IPartition<T>` interface:
```csharp
private sealed partial class RangeIterator : IPartition<int>
{
    // ...
}
```
* the code snippet for RangeIterator was obtained from dotnet/corefx repository
… which in turn extends the `IIListProvider<TElement>` interface:
```csharp
internal interface IPartition<TElement> : IIListProvider<TElement>
{
    // ...
}
```
* the code snippet for IPartition<T> was obtained from dotnet/corefx repository
So the benchmark actually executes the `ToArray` and `ToList` methods of the `RangeIterator` class, because now `source` is convertible to `IIListProvider<TElement>`:
```csharp
public static partial class Enumerable
{
    // ...
    public static TSource[] ToArray<TSource>(
        this IEnumerable<TSource> source)
    {
        // ...
        return source is IIListProvider<TSource> arrayProvider
            ? arrayProvider.ToArray()
            : EnumerableHelpers.ToArray(source);
    }
    // ...
    public static List<TSource> ToList<TSource>(
        this IEnumerable<TSource> source)
    {
        // ...
        return source is IIListProvider<TSource> listProvider
            ? listProvider.ToList()
            : new List<TSource>(source);
    }
    // ...
}
```
* the code snippets for Enumerable.ToArray and Enumerable.ToList were obtained from dotnet/corefx repository
The implementation of `RangeIterator` clearly reveals the reason for such a small memory footprint: both methods pre-allocate the `int[]` and the `List<int>` to exactly the required size:
```csharp
private sealed partial class RangeIterator : IPartition<int>
{
    // ...
    public int[] ToArray()
    {
        int[] array = new int[_end - _start];
        int cur = _start;
        for (int i = 0; i != array.Length; ++i)
        {
            array[i] = cur;
            ++cur;
        }

        return array;
    }
    // ...
    public List<int> ToList()
    {
        List<int> list = new List<int>(_end - _start);
        for (int cur = _start; cur != _end; cur++)
        {
            list.Add(cur);
        }

        return list;
    }
    // ...
}
```
* the code snippet for RangeIterator was obtained from dotnet/corefx repository
This benchmark demonstrates one very important point: you never know where the optimization is.
Benchmark #3
Let’s modify the benchmark to use a compiler-generated iterator:
```csharp
// ...
[GlobalSetup]
public void Setup()
{
    this.data = this.GetEnumerable();
}

private IEnumerable<int> GetEnumerable()
{
    for (var i = 0; i < this.N; ++i)
    {
        yield return i;
    }
}
// ...
```
Executing this new benchmark on my machine produces the following results:
Method | N | Mean | Gen 0 | Gen 1 | Allocated |
---|---|---|---|---|---|
ToArray | 10 | 185.7 ns | 0.0610 | – | 256 B |
ToList | 10 | 142.7 ns | 0.0610 | – | 256 B |
ToArray | 100 | 804.8 ns | 0.2842 | – | 1192 B |
ToList | 100 | 761.6 ns | 0.2918 | – | 1224 B |
ToArray | 1000 | 5,890.1 ns | 2.0370 | – | 8536 B |
ToList | 1000 | 5,770.2 ns | 2.0218 | – | 8464 B |
ToArray | 10000 | 56,033.2 ns | 25.2686 | – | 106224 B |
ToList | 10000 | 59,923.0 ns | 31.1890 | 10.3760 | 131440 B |
These results look much more realistic. However, there is no absolute winner here, but rather a sign to continue our journey: there is no obvious explanation for such a significant difference in the Allocated column of the last benchmark.
Because the compiler-generated iterator implements neither the `ICollection<T>` nor the `IIListProvider<T>` interface, we can now be sure these results are produced by this piece of the `EnumerableHelpers.ToArray` method:
```csharp
internal static partial class EnumerableHelpers
{
    // ...
    internal static T[] ToArray<T>(
        IEnumerable<T> source)
    {
        // ...
        var builder = new LargeArrayBuilder<T>(initialize: true);
        builder.AddRange(source);
        return builder.ToArray();
    }
    // ...
}
```
* the code snippet for EnumerableHelpers.ToArray was obtained from dotnet/corefx repository
… and this piece of the `List<T>` constructor:
```csharp
public List(IEnumerable<T> collection)
{
    // ...
    if (collection is ICollection<T> c)
    {
        // ...
    }
    else
    {
        _size = 0;
        _items = s_emptyArray;
        using (IEnumerator<T> en = collection!.GetEnumerator())
        {
            while (en.MoveNext())
            {
                Add(en.Current);
            }
        }
    }
}
```
* the code snippet for List<T> was obtained from dotnet/corefx repository
To better understand the results of the last benchmark, we need to find out how exactly `LargeArrayBuilder<T>` and `List<T>` allocate memory.
How does LargeArrayBuilder<T> allocate?
`LargeArrayBuilder<T>` is an internal struct used to create dynamically sized arrays. At a high level, it allocates multiple fixed-size arrays (buffers) to store input items and then combines them into a single array. This overview is mostly enough to understand the big picture, but in our case the details are important.
The `EnumerableHelpers.ToArray` method invokes the following `LargeArrayBuilder<T>` members:
LargeArrayBuilder(initialize: true)
LargeArrayBuilder<T>.AddRange(source)
LargeArrayBuilder<T>.ToArray()
Let’s see what each of these invocations does.
LargeArrayBuilder(initialize: true)
All instances of the `LargeArrayBuilder<T>` struct are initialized with some maximum capacity value. This value limits the amount of memory the instance can consume. In our case the new instance is initialized with the `int.MaxValue` value:
```csharp
internal struct LargeArrayBuilder<T>
{
    // ...
    private const int StartingCapacity = 4;
    private const int ResizeLimit = 8;
    // ...
    private readonly int _maxCapacity;
    private T[] _first;
    private ArrayBuilder<T[]> _buffers;
    private T[] _current;
    private int _index;
    private int _count;
    // ...
    public LargeArrayBuilder(bool initialize)
        : this(maxCapacity: int.MaxValue)
    {
    }

    public LargeArrayBuilder(int maxCapacity)
        : this()
    {
        _first = _current = Array.Empty<T>();
        _maxCapacity = maxCapacity;
    }
    // ...
}
```
* the code snippet for LargeArrayBuilder<T> was obtained from dotnet/corefx repository
Besides the maximum capacity, the `LargeArrayBuilder<T>` struct has a bunch of private fields and constants. Let’s see what each of them does:
Field | Description |
---|---|
ResizeLimit | Integer constant, has value of 8. |
StartingCapacity | Integer constant, has value of 4. |
_maxCapacity | The maximum capacity builder can have. |
_first | The first buffer items are stored in. Resized until ResizeLimit . |
_buffers | After ResizeLimit * 2 , previously filled out buffers are stored here. |
_current | Currently filling buffer. If _count <= ResizeLimit , this is _first . |
_index | Index into the current buffer. |
_count | Count of all of the items in this builder. |
LargeArrayBuilder<T>.AddRange(source)
The next stop is the `LargeArrayBuilder<T>.AddRange` method. It iterates over the input sequence and stores items in the current buffer (aliased as `destination`, line: 8):
```csharp
internal struct LargeArrayBuilder<T>
{
    // ...
    public void AddRange(IEnumerable<T> items)
    {
        using (IEnumerator<T> enumerator = items.GetEnumerator())
        {
            T[] destination = _current;
            int index = _index;
            while (enumerator.MoveNext())
            {
                T item = enumerator.Current;

                if ((uint)index >= (uint)destination.Length)
                {
                    AddWithBufferAllocation(item, ref destination, ref index);
                }
                else
                {
                    destination[index] = item;
                }

                index++;
            }

            _count += index - _index;
            _index = index;
        }
    }
    // ...
}
```
* the code snippet for LargeArrayBuilder<T>.AddRange was obtained from dotnet/corefx repository
When the current buffer is filled (line: 14), control is passed to the `LargeArrayBuilder<T>.AddWithBufferAllocation` method (line: 16):
```csharp
internal struct LargeArrayBuilder<T>
{
    // ...
    private void AddWithBufferAllocation(
        T item,
        ref T[] destination,
        ref int index)
    {
        _count += index - _index;
        _index = index;
        AllocateBuffer();
        // ...
    }
    // ...
}
```
* the code snippet for LargeArrayBuilder<T>.AddWithBufferAllocation was obtained from dotnet/corefx repository
This method increases the count of items stored in the builder by the number of items put into the current buffer (line: 9) and calls the `LargeArrayBuilder<T>.AllocateBuffer` method (line: 11):
```csharp
internal struct LargeArrayBuilder<T>
{
    // ...
    private void AllocateBuffer()
    {
        if ((uint)_count < (uint)ResizeLimit)
        {
            int nextCapacity = Math.Min(
                _count == 0 ? StartingCapacity : _count * 2,
                _maxCapacity);

            _current = new T[nextCapacity];
            Array.Copy(_first, 0, _current, 0, _count);
            _first = _current;
        }
        else
        {
            int nextCapacity;

            if (_count == ResizeLimit)
            {
                nextCapacity = ResizeLimit;
            }
            else
            {
                _buffers.Add(_current);
                nextCapacity = Math.Min(_count, _maxCapacity - _count);
            }
            _current = new T[nextCapacity];
            _index = 0;
        }
    }
    // ...
}
```
* the code snippet for LargeArrayBuilder<T>.AllocateBuffer was obtained from dotnet/corefx repository
The `LargeArrayBuilder<T>.AllocateBuffer` method is responsible for allocating a new buffer. However, what exactly this method does depends on the current count of items stored in the builder:
- When adding the 1st item (`_count == 0`) it allocates a buffer of `StartingCapacity` size (which is 4) and stores it in the `_first` and `_current` fields. This buffer is used to store the first 4 items (lines: 9-15).
- When adding the 5th item (`_count == 4`) it allocates a buffer of `StartingCapacity * 2` size (which is 8) and stores it in the `_first` and `_current` fields. The items from the previous buffer are copied into the new buffer. This buffer is used to store the first 8 items (lines: 9-15).
- When adding the 9th item (`_count == 8`) it allocates a buffer of `ResizeLimit` size (which is 8) and stores it in the `_current` field, while leaving the previous buffer in the `_first` field. This new buffer is used to store the next 8 items (lines: 22, 29).
- When adding more items (`_count > 8`) they are stored in the `_current` buffer. When the `_current` buffer becomes full, it is pushed into the `_buffers` list and a new buffer of `_count` size is allocated. This buffer is stored in the `_current` field and used to store the next `_count` items (lines: 26-27, 29).
On return, `LargeArrayBuilder<T>.AddWithBufferAllocation` aliases the current buffer (line: 11) and the current index (line: 12) as `destination` and `index`, and stores the new item in the current buffer (line: 13):
```csharp
internal struct LargeArrayBuilder<T>
{
    // ...
    private void AddWithBufferAllocation(
        T item,
        ref T[] destination,
        ref int index)
    {
        // ...
        AllocateBuffer();
        destination = _current;
        index = _index;
        _current[index] = item;
    }
    // ...
}
```
* the code snippet for LargeArrayBuilder<T>.AddWithBufferAllocation was obtained from dotnet/corefx repository
Execution continues this way until the end of the input sequence.
Looks a bit complicated? Let’s take a small example and see how this works for a sequence of 20 items:
Item # | Trace | _count | _first | _current | _buffers |
---|---|---|---|---|---|
1 | 4-item buffer allocated (new T1[4]); 1 item stored in T1[0]; 3 free slots left | 0 | T1[4] | T1[4] | (empty) |
2-4 | Storing items in T1[1]-T1[3] | 1-4 | – | – | – |
5 | 8-item buffer allocated (new T2[8]); 4 items copied from T1[0-3] to T2[0-3]; 1 item stored in T2[4]; 3 free slots left | 4 | T2[8] | T2[8] | (empty) |
6-8 | Storing items in T2[5]-T2[7] | 5-8 | – | – | – |
9 | 8-item buffer allocated (new T3[8]); 1 item stored in T3[0]; 7 free slots left | 8 | T2[8] | T3[8] | (empty) |
10-16 | Storing items in T3[1]-T3[7] | 9-16 | – | – | – |
17 | 16-item buffer allocated (new T4[16]); 1 buffer stored (T3); 1 item stored in T4[0]; 15 free slots left | 16 | T2[8] | T4[16] | (T3[8]) |
18-20 | Storing items in T4[1]-T4[3] | 17-20 | – | – | – |
20 | 12 free slots remain in T4 | 20 | T2[8] | T4[16] | (T3[8]) |
Total | 4 buffers allocated; 1 buffer stored; 36 items allocated |
As you can see, storing a sequence of 20 items requires `LargeArrayBuilder<T>` to allocate memory for 36 items (almost twice as many!). At first glance this may look quite inefficient; however, before blaming `LargeArrayBuilder<T>` for inefficiency, you should note that it will allocate the same 36 items for any input sequence of 17 to 32 items inclusive.
Now the question is: how many items will it allocate for sequences of 100, 1000 or 10000 items? We obviously can’t draw a table for such large sequences (well, we can, but it isn’t a productive way of doing things), but we can calculate the number as the sum of the capacities of all allocated buffers.
We already know how `LargeArrayBuilder<T>` allocates new buffers, so we can express the capacity of the last current buffer (hereinafter C) as a function of the input sequence length (hereinafter S):

C = 4, when S ≤ 4
C = 8, when 4 < S ≤ 8
C = 8 · 2^(n−1), when S > 8

The third case represents the capacity as the n-th term of a geometric progression, where 8 is the scale factor and 2 is the common ratio. Such a representation is possible because when `_count` exceeds the `ResizeLimit` value (which is 8), the new buffer’s capacity is chosen to be equal to the `_count` value, which leads to a doubling of the capacity on each subsequent buffer allocation. This allows us to represent all possible capacity values as a geometric progression.
Allocating the same number of items as are already stored leads to another interesting effect: the capacity of the last current buffer (hereinafter Cₙ) should be enough to store at least half of the input sequence:

Cₙ ≥ S / 2

Replacing Cₙ with the formula for the n-th term:

8 · 2^(n−1) ≥ S / 2

Now we can express n, the position of the term representing the capacity of the last current buffer, which also equals the count of buffers allocated after `_count` exceeds the `ResizeLimit` value:

n = ln(S / 16) / ln(2) + 1
Here is a step-by-step explanation (feel free to skip it if the solution is obvious to you):

Equation: 8 · 2^(n−1) = S / 2
Divide both sides by 8: 2^(n−1) = S / 16
Then take log₂ of both sides: log₂(2^(n−1)) = log₂(S / 16)
Now use the property of logarithms (power rule) to simplify the left side: (n − 1) · log₂(2) = log₂(S / 16)
Because log₂(2) = 1, remove it: n − 1 = log₂(S / 16)
Increment both sides by 1: n = log₂(S / 16) + 1
And change the logarithm base from 2 to e: n = ln(S / 16) / ln(2) + 1
Now we need to address a very important point: because allocated buffers store multiple items, the above formula should return the same n for a whole range of sequence lengths (i.e. it should return the same result for sequences of 9 to 16 items). Such behavior is achieved by ceiling the rational part of the equation:

n = ⌈ln(S / 16) / ln(2)⌉ + 1
Using n we can now express the total count of buffers allocated (hereinafter B), the total count of items allocated (hereinafter I), and the total count of buffers stored:

Total count of buffers allocated:

B = 1, when S ≤ 4
B = 2, when 4 < S ≤ 8
B = n + 2, when S > 8

Author’s note: The case for S > 8 is the number of buffers allocated after the 8th item (n) plus the number of previously allocated buffers (2).

Total count of items allocated:

I = 4, when S ≤ 4
I = 12, when 4 < S ≤ 8
I = 12 + 8 · (2ⁿ − 1), when S > 8

Author’s note: The case for S > 8 is a sum of a geometric progression (8 · (2ⁿ − 1)) plus the count of previously allocated items (12).

Total count of buffers stored:

0, when S ≤ 8
n − 1, when S > 8

Author’s note: The case for S > 8 is the number of buffers allocated after the 8th item (n) minus the currently active buffer.
Here are the calculations for benchmark sequences:
Sequence length | Total count of buffers allocated | Total count of items allocated | Total count of buffers stored |
---|---|---|---|
10 | 3 | 20 | 0 |
100 | 6 | 132 | 3 |
1000 | 9 | 1028 | 6 |
10000 | 13 | 16388 | 10 |
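The table above can be double-checked with a small Python function (the helper name is mine) that evaluates the closed-form expressions for the S > 8 case:

```python
import math

def builder_allocation_stats(s):
    """Evaluate the closed forms derived above for an input length s > 8.
    Returns (buffers allocated, items allocated, buffers stored)."""
    if s <= 8:
        raise ValueError("the closed forms below assume S > 8")
    n = math.ceil(math.log(s / 16) / math.log(2)) + 1
    buffers_allocated = n + 2
    items_allocated = 12 + 8 * (2 ** n - 1)
    buffers_stored = n - 1
    return buffers_allocated, items_allocated, buffers_stored
```

For the benchmark lengths it reproduces the table exactly, e.g. `builder_allocation_stats(10000)` gives 13 buffers, 16388 items and 10 stored buffers.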
LargeArrayBuilder<T>.ToArray()
The `LargeArrayBuilder<T>.ToArray` method is responsible for allocating the final array (line: 7) and copying the buffered items into it (line: 8). The copy part is delegated to the `LargeArrayBuilder<T>.CopyTo` method:
```csharp
internal struct LargeArrayBuilder<T>
{
    // ...
    public T[] ToArray()
    {
        // ...
        array = new T[_count];
        CopyTo(array, 0, _count);
        return array;
    }
    // ...
}
```
* the code snippet for LargeArrayBuilder<T>.ToArray was obtained from dotnet/corefx repository
The `LargeArrayBuilder<T>.CopyTo` method is very straightforward. It iterates over all allocated buffers and copies their items into the output array. The correct buffer is selected using the `LargeArrayBuilder<T>.GetBuffer` method:
```csharp
internal struct LargeArrayBuilder<T>
{
    // ...
    public void CopyTo(T[] array, int arrayIndex, int count)
    {
        for (int i = 0; count > 0; i++)
        {
            T[] buffer = GetBuffer(index: i);
            int toCopy = Math.Min(count, buffer.Length);
            Array.Copy(buffer, 0, array, arrayIndex, toCopy);

            count -= toCopy;
            arrayIndex += toCopy;
        }
    }
    // ...
    public T[] GetBuffer(int index)
    {
        return index == 0
            ? _first
            : index <= _buffers.Count
                ? _buffers[index - 1]
                : _current;
    }
    // ...
}
```
* the code snippets for LargeArrayBuilder<T>.CopyTo and LargeArrayBuilder<T>.GetBuffer were obtained from dotnet/corefx repository
Summing up
Knowing how `LargeArrayBuilder<T>` allocates memory allows us to estimate how much memory is required to build an array depending on the input sequence length.

The memory required to hold a single array instance is the memory required to hold the `System.Array` object itself (hereinafter H, the array object overhead) plus the memory required to hold all array elements:

M(array) = H + sizeof(T) · Capacity

Hence, the amount of memory required to hold all buffers can be expressed through the Total count of buffers allocated (B) and Total count of items allocated (I) values as:

M(buffers) = H · B + sizeof(T) · I

… and the amount of memory required to hold the final array:

M(final) = H + sizeof(T) · S

So the estimated amount of memory required to build an array can be expressed as:

M = H · (B + 1) + sizeof(T) · (I + S)

Considering the following values on 64-bit systems (in bytes): H = 24 and sizeof(int) = 4, the above formula can be simplified to:

M = 24 · (B + 1) + 4 · (I + S)
Besides the memory allocated by the `LargeArrayBuilder<T>` instance, the benchmark results also include the memory allocated by the compiler-generated enumerator (40 bytes on an x64 system) and the memory allocated by the `ArrayBuilder<T>` structure (the `_buffers` field). Because we don’t know how `ArrayBuilder<T>` allocates, we can treat its allocations as estimation error; however, because we do know the amount of memory consumed by the compiler-generated enumerator, we should subtract its size from the benchmark results:
Sequence length | Total count of buffers allocated | Total count of items allocated | Total bytes estimated | Total bytes allocated (iterator adjusted) | Estimation error % |
---|---|---|---|---|---|
10 | 3 | 20 | 216 | 216 | 0 |
100 | 6 | 132 | 1096 | 1152 | 0.0486 |
1000 | 9 | 1028 | 8352 | 8496 | 0.0169 |
10000 | 13 | 16388 | 105888 | 106184 | 0.0027 |
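The byte estimates in the table can be reproduced with a few lines of Python (constants follow the 64-bit values used above; names are mine):

```python
def estimated_toarray_bytes(s, buffers_allocated, items_allocated):
    """Estimate ToArray memory: all intermediate buffers plus the final array."""
    ARRAY_OVERHEAD = 24  # bytes per System.Array object on x64 (per the article)
    INT_SIZE = 4         # sizeof(int)
    buffers_bytes = buffers_allocated * ARRAY_OVERHEAD + items_allocated * INT_SIZE
    final_array_bytes = ARRAY_OVERHEAD + s * INT_SIZE
    return buffers_bytes + final_array_bytes
```

For example, `estimated_toarray_bytes(10000, 13, 16388)` gives the 105888 bytes shown in the last row.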
How does List<T> allocate?
When a new instance of `List<T>` is created from an `IEnumerable<T>` (which isn’t an `ICollection<T>`), all items from the input sequence are sequentially added to the empty `List<T>` instance (lines: 16-21):
```csharp
public class List<T>
{
    // ...
    public List(IEnumerable<T> collection)
    {
        // ...
        if (collection is ICollection<T> c)
        {
            // ...
        }
        else
        {
            _size = 0;
            _items = s_emptyArray;

            using (IEnumerator<T> en = collection!.GetEnumerator())
            {
                while (en.MoveNext())
                {
                    Add(en.Current);
                }
            }
        }
    }
    // ...
}
```
* the code snippet for List<T> was obtained from dotnet/corefx repository
Items are added using the `List<T>.Add` method:
```csharp
public class List<T>
{
    // ...
    public void Add(T item)
    {
        _version++;
        T[] array = _items;
        int size = _size;
        if ((uint)size < (uint)array.Length)
        {
            _size = size + 1;
            array[size] = item;
        }
        else
        {
            AddWithResize(item);
        }
    }
    // ...
}
```
* the code snippet for List<T>.Add was obtained from dotnet/corefx repository
This method checks whether there is enough capacity in the internal array to store the new item (line: 9), and if not, it switches execution to the `List<T>.AddWithResize` method (line: 16):
```csharp
public class List<T>
{
    // ...
    private void AddWithResize(T item)
    {
        int size = _size;
        EnsureCapacity(size + 1);
        _size = size + 1;
        _items[size] = item;
    }
    // ...
    private void EnsureCapacity(int min)
    {
        if (_items.Length < min)
        {
            int newCapacity = _items.Length == 0
                ? DefaultCapacity
                : _items.Length * 2;
            if ((uint)newCapacity > Array.MaxArrayLength)
                newCapacity = Array.MaxArrayLength;
            if (newCapacity < min)
                newCapacity = min;
            Capacity = newCapacity;
        }
    }
    // ...
}
```
* the code snippets for List<T>.AddWithResize and List<T>.EnsureCapacity were obtained from dotnet/corefx repository
`List<T>.AddWithResize` works in pair with the `List<T>.EnsureCapacity` method: the latter ensures there will be enough space to store one more item by recalculating the internal array capacity, either setting it to `List<T>.DefaultCapacity` (which is 4 items) when the current internal array is empty (line: 17) or doubling the current capacity (line: 18).
Author’s note: The internal array is reallocated inside the `List<T>.Capacity` property setter. Besides allocating the new array, it also copies the items from the old array into it.
Looks quite simple, right? I am sure it does, but still, let’s take a small example and see how this works for a sequence of 20 items:
Item # | Trace | Capacity |
---|---|---|
1 | 4-item array allocated (T1[4]); 1 item stored in T1[0]; 3 free slots left | 4 |
2-4 | Storing items in T1[1]-T1[3] | – |
5 | 8-item array allocated (T2[8]); 4 items copied from T1[0-3] to T2[0-3]; 1 item stored in T2[4]; 3 free slots left | 8 |
6-8 | Storing items in T2[5]-T2[7] | – |
9 | 16-item array allocated (T3[16]); 8 items copied from T2[0-7] to T3[0-7]; 1 item stored in T3[8]; 7 free slots left | 16 |
10-16 | Storing items in T3[9]-T3[15] | – |
17 | 32-item array allocated (T4[32]); 16 items copied from T3[0-15] to T4[0-15]; 1 item stored in T4[16]; 15 free slots left | 32 |
18-20 | Storing items in T4[17]-T4[19] | – |
Total | 4 arrays allocated; 60 items allocated |
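The doubling behavior is easy to simulate in Python (a sketch of the described strategy, not the real `List<T>` code; the function name is mine):

```python
def list_growth(n_items):
    """Return the capacity of every internal array List<T> allocates
    while n_items are added one by one to an empty list."""
    arrays = []    # capacity of every internal array ever allocated
    capacity = 0   # current internal array capacity
    size = 0       # items stored so far
    for _ in range(n_items):
        if size == capacity:  # internal array is full -> reallocate
            capacity = 4 if capacity == 0 else capacity * 2  # DefaultCapacity, then double
            arrays.append(capacity)
        size += 1
    return arrays
```

For 20 items it yields arrays of 4, 8, 16 and 32 items, 60 items allocated in total, matching the table above.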
So `List<T>` allocates 60 items in order to store 20 items. That is much more than `LargeArrayBuilder<T>`, which allocates 36 items for the same input. Before making any decisions, let’s calculate how many internal arrays and items are allocated for sequences of 10, 100, 1000 and 10000 items.
This can be easily done because, as you might have already noticed, the `List<T>` capacity is always doubled… which means its value can be expressed through a geometric progression where `List<T>`’s default capacity of 4 is the scale factor and 2 is the common ratio. The n-th term of the progression is calculated as:

Cₙ = 4 · 2^(n−1)

Because `List<T>`’s internal array should be able to store all items stored in the previous internal array, we can say that the last internal array capacity (hereinafter Cₙ) should be at least enough to hold all sequence items:

Cₙ ≥ S

Replacing Cₙ with the formula for the n-th term:

4 · 2^(n−1) ≥ S

n is then expressed as:

n = ⌈ln(S / 4) / ln(2)⌉ + 1
Author’s note: I won’t repeat the solution here. It is absolutely the same as the one we did for `LargeArrayBuilder<T>`.
Using n we can now express the total count of arrays allocated (hereinafter A) and the total count of items allocated (hereinafter I):

Total count of arrays allocated: A = n

Total count of items allocated: I = 4 · (2ⁿ − 1)
Here are the calculations for benchmark sequences:
Sequence length | Total count of arrays allocated | Total count of items allocated |
---|---|---|
10 | 3 | 28 |
100 | 6 | 252 |
1000 | 9 | 2044 |
10000 | 13 | 32764 |
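Again, the table can be double-checked against the closed-form expressions (the helper name is mine):

```python
import math

def list_allocation_stats(s):
    """Evaluate the closed forms derived above for input length s.
    Returns (arrays allocated, items allocated)."""
    n = math.ceil(math.log(s / 4) / math.log(2)) + 1
    arrays_allocated = n
    items_allocated = 4 * (2 ** n - 1)
    return arrays_allocated, items_allocated
```

For example, `list_allocation_stats(10000)` gives 13 arrays and 32764 items, as in the last row.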
We can also calculate exactly how much memory is required to create the `List<T>` depending on the input sequence length.

We already know how to calculate the memory required to hold a single array instance:

M(array) = H + sizeof(T) · Capacity

Hence, the amount of memory required to hold all internal arrays can be expressed through the Total count of arrays allocated (A) and Total count of items allocated (I) values as:

M(arrays) = H · A + sizeof(T) · I

Hence, the exact amount of memory required is the size of the `List<T>` object itself (hereinafter L) plus the memory required to hold all internal arrays:

M = L + H · A + sizeof(T) · I

Considering the following values on 64-bit systems (in bytes): L = 32, H = 24 and sizeof(int) = 4, the above formula can be simplified to:

M = 32 + 24 · A + 4 · I
Author’s note: You might remember (or not) that in the first “What is wrong with this code?” post we counted the size of an empty `List<T>` as 40 bytes. That was 100% correct for previous versions of .NET Core. In .NET Core 3.0 most of the collection classes had their `_syncRoot` field removed (here is a PR), which resulted in smaller collection sizes.
Besides the memory allocated by the `List<T>` instance, the benchmark results also include the memory allocated by the compiler-generated enumerator (40 bytes on an x64 system), which we should subtract from the benchmark results:
Sequence length | Total count of arrays allocated | Total count of items allocated | Total bytes allocated (exact) | Total bytes allocated (iterator adjusted) |
---|---|---|---|---|
10 | 3 | 28 | 216 | 216 |
100 | 6 | 252 | 1184 | 1184 |
1000 | 9 | 2044 | 8424 | 8424 |
10000 | 13 | 32764 | 131400 | 131400 |
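The exact byte counts in the table can be reproduced as (constants follow the 64-bit values used above; names are mine):

```python
def tolist_bytes(arrays_allocated, items_allocated):
    """Exact ToList memory: the List<T> object plus all internal arrays."""
    LIST_OVERHEAD = 32   # empty List<T> on x64 in .NET Core 3.0 (per the article)
    ARRAY_OVERHEAD = 24  # bytes per System.Array object on x64
    INT_SIZE = 4         # sizeof(int)
    return LIST_OVERHEAD + arrays_allocated * ARRAY_OVERHEAD + items_allocated * INT_SIZE
```

For example, `tolist_bytes(13, 32764)` gives the 131400 bytes shown in the last row.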
Unexpected discovery #3
Now that we understand how both `LargeArrayBuilder<T>` and `List<T>` allocate, we can compare how many items they allocate for the same number of input elements:
Sequence length | Total count of items allocated (List<T> ) | Total count of items allocated + count of items in final array (LargeArrayBuilder<T> ) |
---|---|---|
10 | 28 | 20 + 10 = 30 |
100 | 252 | 132 + 100 = 232 |
1000 | 2044 | 1028 + 1000 = 2028 |
10000 | 32764 | 16388 + 10000 = 26388 |
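A quick sanity check of the 10000-item row, using the totals from the tables above:

```python
# Items allocated by List<T> for 10000 inputs vs. items allocated by
# LargeArrayBuilder<T> (its buffers plus the final array), per the tables above.
list_items = 32764
builder_items = 16388 + 10000
difference = list_items - builder_items
print(difference)  # 6376
```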
This data explains the difference we saw in the Allocated column of the last benchmark: in that case `List<T>` allocated 6,376 more items than `LargeArrayBuilder<T>`.
Conclusion
If you are reading these lines, then first of all: thank you for reading this huge post. I hope you enjoyed it.
Here is a small reward in the form of a short summary:
Result #1
In cases when `source` is an `ICollection<T>`, both `ToArray` and `ToList` demonstrate almost the same performance characteristics, because internally they execute almost identical code.
Result #2
In cases when the input sequence is a product of one or more LINQ iterators, the performance of both `ToArray` and `ToList` depends on the optimizations implemented by those iterators and in some cases can differ significantly.
Result #3
In cases when the input sequence is a plain enumerable, both `ToArray` and `ToList` demonstrate very close performance in terms of speed. However, for large sequences `ToArray` can have a smaller memory footprint. At the same time, it is important to understand that because `ToArray` allocates multiple intermediate arrays, it can cause memory fragmentation (and even object promotion), because the GC can’t collect these arrays until the `ToArray` operation is done. It is hard to say how critical these differences are, but it is always good to know about them.
Bonus
Did you know that the `ArrayBuilder<T>` structure used inside `LargeArrayBuilder<T>` has the same allocation strategy as `List<T>`? 🙂