Asynchronous Programming - Coroutine Deep Dive

Deep Dive

If the previous post was mostly about how to use coroutines, this post digs deep into what a coroutine really is.
If you don’t know coroutines yet, this isn’t the right post for you — I’d recommend reading the previous post and trying them out first.
IEnumerator, IEnumerable
When you use a coroutine, you declare a function with the return type IEnumerator and call it through StartCoroutine.
void Start()
{
StartCoroutine(TestCoroutine());
}
IEnumerator TestCoroutine()
{
yield return null;
}
But IEnumerator wasn’t actually created for coroutines.
Its original purpose was as an interface for iterating over collections like List, Dictionary, and Array.
List<int> numbers = new List<int> { 1, 2, 3 };
foreach (int n in numbers)
{
Debug.Log(n);
}
foreach iterates over the List and runs Debug.Log, but how is foreach able to iterate in the first place?
IEnumerable
For foreach to work, the collection must implement IEnumerable internally.
Let’s take a look inside the IEnumerable interface.
- Github Link : dotnet/runtime/IEnumerable.cs
public interface IEnumerable
{
IEnumerator GetEnumerator();
}
Any type that implements this interface has to provide a function called GetEnumerator whose return type is IEnumerator.
Let’s check whether List actually implements it.
- Github Link : dotnet/runtime/List.cs
public Enumerator GetEnumerator() => new Enumerator(this);
IEnumerator<T> IEnumerable<T>.GetEnumerator() =>
Count == 0 ? SZGenericArrayEnumerator<T>.Empty :
GetEnumerator();
IEnumerator IEnumerable.GetEnumerator() => ((IEnumerable<T>)this).GetEnumerator();
We can confirm that List does implement the interface and its GetEnumerator function.
Our goal was just to verify that List implements it, so there’s no need to understand the code itself.
That said, as the name GetEnumerator suggests, we can tell it has to return an Enumerator.
So let’s find out what an Enumerator actually is.
IEnumerator
Let’s look at the List.cs code again.
- Github Link : dotnet/runtime/List.cs
public struct Enumerator : IEnumerator<T>, IEnumerator
{
private readonly List<T> _list;
private readonly int _version;
private int _index;
private T? _current;
internal Enumerator(List<T> list)
{
_list = list;
_version = list._version;
}
public void Dispose()
{
}
public bool MoveNext()
{
.
.
return false;
}
public T Current => _current!;
object? IEnumerator.Current
{
get
{
.
.
return _current;
}
}
void IEnumerator.Reset()
{
.
.
_index = 0;
_current = default;
}
}
As before, I removed parts of the original code since understanding how it works isn’t the goal.
The point to notice here is that Enumerator implements IEnumerator.
Aha — so we can understand the structure of List in the following order.
- For foreach to iterate, the collection must implement IEnumerable internally.
- IEnumerable requires a function GetEnumerator whose return type is IEnumerator.
- GetEnumerator returns an Enumerator.
- Enumerator implements IEnumerator.
Finally, it’s IEnumerator’s turn. Let’s check the source code.
- Github Link : dotnet/runtime/IEnumerator.cs
public interface IEnumerator
{
bool MoveNext();
object Current { get; }
void Reset();
}
The function names are remarkably intuitive. It feels like the puzzle is coming together.
So foreach was iterating using the functions of the collection’s IEnumerator all along.
foreach probably works in a way like this:
// roughly what foreach does
IEnumerator enumerator = list.GetEnumerator();
while (enumerator.MoveNext())
{
var item = enumerator.Current;
}
So far we’ve looked at what IEnumerator and IEnumerable really are.
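To check this chain outside Unity, here is a minimal sketch of a hand-written collection. CountUp, CountUpEnumerator, and ForeachDemo are hypothetical names made up for illustration; only IEnumerable and IEnumerator come from .NET. It shows that foreach and the manual MoveNext()/Current loop produce the same items.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// Hypothetical collection for illustration: counts from 1 up to a limit.
public sealed class CountUp : IEnumerable
{
    private readonly int _limit;
    public CountUp(int limit) { _limit = limit; }

    // foreach calls this to obtain the object that does the stepping.
    public IEnumerator GetEnumerator() => new CountUpEnumerator(_limit);

    private sealed class CountUpEnumerator : IEnumerator
    {
        private readonly int _limit;
        private int _value; // 0 means "before the first element"

        public CountUpEnumerator(int limit) { _limit = limit; }

        // Advance one step; return false once the limit has been reached.
        public bool MoveNext()
        {
            if (_value >= _limit) return false;
            _value++;
            return true;
        }

        // Boxed to object because the non-generic interface returns object.
        public object Current => _value;

        public void Reset() { _value = 0; }
    }
}

public static class ForeachDemo
{
    // Collects the items the way an expanded foreach would.
    public static List<int> CollectManually(IEnumerable source)
    {
        var result = new List<int>();
        IEnumerator enumerator = source.GetEnumerator();
        try
        {
            while (enumerator.MoveNext())
            {
                result.Add((int)enumerator.Current);
            }
        }
        finally
        {
            // foreach also disposes the enumerator when it is IDisposable.
            (enumerator as IDisposable)?.Dispose();
        }
        return result;
    }

    public static void Main()
    {
        foreach (int n in new CountUp(3))
        {
            Console.WriteLine(n); // prints 1, 2, 3 on separate lines
        }
        Console.WriteLine(string.Join(",", CollectManually(new CountUp(3)))); // 1,2,3
    }
}
```

Nothing here is specific to List. Any type that hands out an object with MoveNext() and Current can be walked by foreach.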
Then how did Unity make coroutines by leveraging IEnumerator?
StartCoroutine
Let’s trace inside StartCoroutine().
// L108
public Coroutine StartCoroutine(IEnumerator routine)
{
...
return StartCoroutineManaged2(routine);
}
// L195
extern Coroutine StartCoroutineManaged2(IEnumerator enumerator);
StartCoroutineManaged2 is declared as extern.
extern means the actual implementation lives in C++ native code, and since the Unity engine core is closed-source, we can’t look beyond that point. Still, we can infer its internal behavior from the header declaration at the top of the file and from the official Unity documentation.
// L19
[NativeHeader("Runtime/Scripting/DelayedCallUtility.h")]
“All of the code from a coroutine’s first resumption point until its completion is executed within the DelayedCallManager inside Unity’s main loop.”
“A coroutine runs as an instance of a class automatically generated by the C# compiler. This object tracks the coroutine’s internal state and remembers where to resume after a yield.”
Putting the header declaration and the official Unity documentation together, it appears that when StartCoroutine is called, the IEnumerator is handed to the engine and later resumed from the DelayedCallManager. From then on, the engine inspects Current to decide when to resume, and when the time comes, it calls MoveNext().
Just as foreach iterates over a collection with MoveNext() and Current, Unity repurposed IEnumerator to control execution flow in the exact same way.
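Since the native side is closed-source, the following is only a sketch of the idea and not Unity’s actual implementation. MiniScheduler, WaitTicks, and SchedulerDemo are hypothetical names, and one “tick” stands in for one frame. The scheduler looks at Current to decide how long to wait and calls MoveNext() to resume.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// Hypothetical yield value: "resume me after this many ticks".
public sealed class WaitTicks
{
    public readonly int Ticks;
    public WaitTicks(int ticks) { Ticks = ticks; }
}

// Hypothetical stand-in for the engine side; not Unity's real code.
public sealed class MiniScheduler
{
    private sealed class Entry
    {
        public IEnumerator Routine;
        public int TicksLeft;
    }

    private readonly List<Entry> _entries = new List<Entry>();

    public int ActiveCount => _entries.Count;

    // Runs the routine up to its first yield, then keeps it if it is not done.
    public void Start(IEnumerator routine)
    {
        var entry = new Entry { Routine = routine };
        if (Advance(entry)) _entries.Add(entry);
    }

    // One simulated frame: resume every routine whose wait has elapsed.
    public void Tick()
    {
        for (int i = _entries.Count - 1; i >= 0; i--)
        {
            Entry entry = _entries[i];
            entry.TicksLeft--;
            if (entry.TicksLeft > 0) continue;          // still waiting
            if (!Advance(entry)) _entries.RemoveAt(i);  // routine finished
        }
    }

    // Calls MoveNext() once, then reads Current to schedule the next resume.
    private static bool Advance(Entry entry)
    {
        if (!entry.Routine.MoveNext()) return false;
        entry.TicksLeft = entry.Routine.Current is WaitTicks wait
            ? Math.Max(1, wait.Ticks)
            : 1; // null (or anything else) means "next tick"
        return true;
    }
}

public static class SchedulerDemo
{
    public static readonly List<string> Log = new List<string>();

    public static IEnumerator Routine()
    {
        Log.Add("start");
        yield return null;             // wait one tick
        Log.Add("after null");
        yield return new WaitTicks(2); // wait two ticks
        Log.Add("after WaitTicks(2)");
    }

    public static void Main()
    {
        var scheduler = new MiniScheduler();
        scheduler.Start(Routine());
        while (scheduler.ActiveCount > 0) scheduler.Tick();
        Console.WriteLine(string.Join(" | ", Log)); // start | after null | after WaitTicks(2)
    }
}
```

The routine itself never sleeps or blocks. It only returns, and the scheduler decides when to call it again.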
Compilation
Coroutines are part of the built-in library — they require no external dependencies and are simple and easy to use.
But in this post, let’s peel back the convenience and expose the ugly truth about coroutines.
After all, everything comes with a trade-off.
An online C# decompiler lets you enter C# code and see what the compiler generates from it.
Let’s write a simple coroutine like the one below and check the compiled result.
- Before compiling
// before compiling
using System.Collections;
class Test
{
IEnumerator TestCoroutine()
{
int count = 0;
yield return null;
count++;
yield return null;
count++;
}
}
- After compiling
// after compiling
internal class Test
{
[CompilerGenerated]
private sealed class <TestCoroutine>d__0 : IEnumerator<object>, IEnumerator, IDisposable
{
private int <>1__state;
private object <>2__current;
public Test <>4__this;
private int <count>5__1;
object IEnumerator<object>.Current
{
[DebuggerHidden]
get
{
return <>2__current;
}
}
object IEnumerator.Current
{
[DebuggerHidden]
get
{
return <>2__current;
}
}
[DebuggerHidden]
public <TestCoroutine>d__0(int <>1__state)
{
this.<>1__state = <>1__state;
}
[DebuggerHidden]
void IDisposable.Dispose()
{
}
private bool MoveNext()
{
switch (<>1__state)
{
default:
return false;
case 0:
<>1__state = -1;
<count>5__1 = 0;
<>2__current = null;
<>1__state = 1;
return true;
case 1:
<>1__state = -1;
<count>5__1++;
<>2__current = null;
<>1__state = 2;
return true;
case 2:
<>1__state = -1;
<count>5__1++;
return false;
}
}
bool IEnumerator.MoveNext()
{
//ILSpy generated this explicit interface implementation from .override directive in MoveNext
return this.MoveNext();
}
[DebuggerHidden]
void IEnumerator.Reset()
{
throw new NotSupportedException();
}
}
[NullableContext(1)]
[IteratorStateMachine(typeof(<TestCoroutine>d__0))]
private IEnumerator TestCoroutine()
{
<TestCoroutine>d__0 <TestCoroutine>d__ = new <TestCoroutine>d__0(0);
<TestCoroutine>d__.<>4__this = this;
return <TestCoroutine>d__;
}
}
Understanding the compiled code isn’t the goal. There are two things we should focus on.
First : The State Machine
The TestCoroutine() function was transformed into a class called <TestCoroutine>d__0.
At the same time, a state value called <>1__state appeared inside it.
private sealed class <TestCoroutine>d__0
{
private int <>1__state;
}
Based on this state value, every time MoveNext() is called it branches with a switch statement.
private bool MoveNext()
{
switch (<>1__state)
{
default:
return false;
case 0: // first execution
<>1__state = -1;
<count>5__1 = 0;
<>2__current = null;
<>1__state = 1; // move to next state
return true; // yield return null
case 1: // after the first yield
<>1__state = -1;
<count>5__1++;
<>2__current = null;
<>1__state = 2; // move to next state
return true; // yield return null
case 2: // after the second yield return
<>1__state = -1;
<count>5__1++;
return false; // coroutine ends
}
}
The function doesn’t actually pause. The coroutine is turned into a state machine: each MoveNext() call saves the next state value and returns true, and the following MoveNext() call resumes from the case that matches the saved state value.
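The state transitions can be observed from plain C# by calling MoveNext() by hand. This is a small sketch: StepDemo and its Log list are hypothetical helpers added only to record which part of the body ran.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

public static class StepDemo
{
    public static readonly List<string> Log = new List<string>();

    public static IEnumerator TestCoroutine()
    {
        Log.Add("state 0");
        yield return null;
        Log.Add("state 1");
        yield return null;
        Log.Add("state 2");
    }

    public static void Main()
    {
        IEnumerator routine = TestCoroutine();
        Console.WriteLine(Log.Count);          // 0: calling the method runs none of the body

        Console.WriteLine(routine.MoveNext()); // True: ran up to the first yield
        Console.WriteLine(routine.MoveNext()); // True: ran up to the second yield
        Console.WriteLine(routine.MoveNext()); // False: ran to the end
        Console.WriteLine(routine.MoveNext()); // False: nothing left to run

        Console.WriteLine(string.Join(", ", Log)); // state 0, state 1, state 2
    }
}
```

Calling TestCoroutine() only constructs the state machine object. The body runs one slice at a time, and only when something calls MoveNext().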
Second : Local Variables Become Class Fields
Did you notice why the compiler turned this into a class?
private sealed class <TestCoroutine>d__0
{
private int <count>5__1;
}
A normal function disappears from the stack once it finishes executing, so its local variables vanish along with it. But a coroutine pauses with yield return and resumes later, so the value of the local variable count must persist even while it’s paused. Since this isn’t possible on the stack, the compiler generates a class, hoists the local variable up into a field of that class, and keeps it on the heap.
Now that it’s been made into a class, all that’s left is to create it with new and return it.
private IEnumerator TestCoroutine()
{
<TestCoroutine>d__0 <TestCoroutine>d__ = new <TestCoroutine>d__0(0);
}
Here, the bare face of coroutines is revealed.
The coroutine was turned into a class in order to remember its state, and creating that class with new triggers a heap allocation.
In Unity, a heap allocation means that once the coroutine ends, it becomes a target for the GC to collect.
If heap allocations pile up and the GC runs frequently, it can lead to frame drops.
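Both observations can be checked from plain C#. In this sketch, AllocationDemo and Counter are hypothetical names; the count is yielded only so the demo can read the hoisted local from outside.

```csharp
using System;
using System.Collections;

public static class AllocationDemo
{
    public static IEnumerator Counter()
    {
        int count = 0;
        while (true)
        {
            count++;
            yield return count; // yielded only so the demo can observe the local
        }
    }

    public static void Main()
    {
        IEnumerator first = Counter();
        IEnumerator second = Counter();

        // Each call to the iterator method returns a separate heap object.
        Console.WriteLine(ReferenceEquals(first, second)); // False
        Console.WriteLine(first.GetType().IsClass);        // True

        // The local survives between MoveNext() calls because it is a field,
        // and each object carries its own copy.
        first.MoveNext();
        first.MoveNext();
        second.MoveNext();
        Console.WriteLine(first.Current);  // 2
        Console.WriteLine(second.Current); // 1
    }
}
```

Every call to Counter() is a new object, so starting the same coroutine a hundred times creates a hundred state machines.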
Coroutine
In the Compilation section, we confirmed that one state machine class gets allocated on the heap.
But the allocation at startup doesn’t end there.
Let’s check the signature of StartCoroutine we saw earlier once more.
public Coroutine StartCoroutine(IEnumerator routine)
The return type is Coroutine. Let’s find out what Coroutine really is too.
public sealed class Coroutine : YieldInstruction
{
internal IntPtr m_Ptr;
Coroutine() {}
~Coroutine()
{
ReleaseCoroutine(m_Ptr);
}
[FreeFunction("Coroutine::CleanupCoroutineGC", true)]
extern static void ReleaseCoroutine(IntPtr ptr);
}
See it? Coroutine is a class, not a struct.
It’s a wrapper class pointing to the actual coroutine the engine manages internally, and a new one is created and returned every time StartCoroutine is called. As you can tell from the destructor ~Coroutine calling CleanupCoroutineGC, this thing is a heap object managed by the GC.
In the end, starting a coroutine once triggers two heap allocations.
- The state machine class created by the compiler
- The Coroutine wrapper object returned by Unity
YieldInstruction
In the section above, we learned that starting a coroutine once triggers two heap allocations.
But there’s also something we allocate with new by our own hands when we use coroutines.
It’s the Wait family: WaitForSeconds and friends. These are called YieldInstructions.
public class WaitForSeconds : YieldInstruction { ... }
public class WaitForFixedUpdate : YieldInstruction { ... }
public class WaitForEndOfFrame : YieldInstruction { ... }
Hold on. Taking a closer look, these are also classes that inherit from something called YieldInstruction.
We’ve been directly creating instances of these classes with the new keyword when using coroutines.
Here the grim reality of coroutines is revealed.
The coroutine itself is turned into a class that gets a heap allocation, and YieldInstruction triggers a heap allocation too.
Let’s take a look at a horrifying example.
void Start()
{
StartCoroutine(TestCoroutine());
}
IEnumerator TestCoroutine()
{
while (true)
{
yield return new WaitForSeconds(1f); // new on every loop
}
}
If you can see how horrifying the code above is, you’ve gotten everything this post has to offer.
Two heap allocations happened when TestCoroutine started, and after that the WaitForSeconds allocations keep piling up every second.
Since this is a commonly seen pattern when using coroutines, it’s all the more horrifying.
There’s a way to make the situation above better. It’s caching.
private WaitForSeconds _waitForSeconds = new WaitForSeconds(1f);
IEnumerator TestCoroutine()
{
while (true)
{
yield return _waitForSeconds;
}
}
By creating the WaitForSeconds object only once, putting it on the heap, and reusing it, no additional heap allocation occurs.
But do we always want to wait exactly 1 second?
Since we want to wait for whatever duration we choose on each call, in practice we mostly use it like this:
IEnumerator TestCoroutine(float waitTime)
{
while (true)
{
yield return new WaitForSeconds(waitTime); // heap alloc per call
}
}
When the wait time changes dynamically, caching is virtually impossible.
Caching is an optimization that’s only valid when you always wait for the same amount of time; it’s hard to apply in the dynamic case.
To sum up, heap allocations occur in two places.
One is the moment you start a coroutine (twice — the state machine plus the Coroutine wrapper), and the other is the moment you new a fresh YieldInstruction. Conversely, the mechanism of a coroutine pausing with yield and then resuming carries no extra allocation on its own. One caveat: returning a value type like yield return 0 gets boxed into an object and allocates every time, so for a one-frame wait you must use yield return null.
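The boxing caveat can be seen in plain C# by comparing the objects that come out of Current. BoxingDemo and its methods are hypothetical names used for this sketch.

```csharp
using System;
using System.Collections;

public static class BoxingDemo
{
    public static IEnumerator YieldZero()
    {
        while (true) yield return 0;    // the int is boxed into a new object each time
    }

    public static IEnumerator YieldNull()
    {
        while (true) yield return null; // no object is created
    }

    // Returns true when two consecutive yields produced two different objects.
    public static bool YieldsFreshObjects(IEnumerator routine)
    {
        routine.MoveNext();
        object first = routine.Current;
        routine.MoveNext();
        object second = routine.Current;
        return first != null && !ReferenceEquals(first, second);
    }

    public static void Main()
    {
        Console.WriteLine(YieldsFreshObjects(YieldZero())); // True
        Console.WriteLine(YieldsFreshObjects(YieldNull())); // False
    }
}
```

Both versions wait one step per yield, but only yield return 0 leaves a new object behind for the GC on each iteration.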
The real problem is patterns that repeat allocations. Code that calls StartCoroutine anew every frame, or allocates a fresh YieldInstruction each time inside a loop, is exactly that. On the other hand, keeping a single coroutine alive for a long time and looping with yield return null allocates once at the start and that’s it — so it actually carries less overhead.
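As one illustration of the “single coroutine looping with yield return null” pattern, a dynamic wait can be written as a frame-counting loop. This is only a sketch under stated assumptions: WaitDemo and WaitSeconds are hypothetical names, and the deltaTime delegate is a stand-in for Unity’s Time.deltaTime so that the example runs outside the engine.

```csharp
using System;
using System.Collections;

public static class WaitDemo
{
    // deltaTime is a hypothetical stand-in for Unity's Time.deltaTime.
    public static IEnumerator WaitSeconds(float seconds, Func<float> deltaTime, Action onDone)
    {
        float elapsed = 0f;
        while (elapsed < seconds)
        {
            yield return null;      // wait one frame; nothing is allocated here
            elapsed += deltaTime(); // add the time that passed during that frame
        }
        onDone();
    }

    public static void Main()
    {
        bool done = false;
        int frames = 0;
        IEnumerator wait = WaitSeconds(1f, () => 0.25f, () => done = true);

        // Simulate the engine resuming the coroutine once per frame.
        while (wait.MoveNext()) frames++;

        Console.WriteLine(frames); // 4
        Console.WriteLine(done);   // True
    }
}
```

The delegates in this sketch exist only to make it testable and do allocate once when they are created. Inside a MonoBehaviour the loop would read Time.deltaTime directly, leaving just the startup allocations described above.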
Wrapping Up
Coroutines are clearly a powerful and convenient tool.
But as we’ve seen, they aren’t suited for logic that runs every frame or very frequently.
Coroutines have been around since the early days of Unity, roughly 2005, which makes them a feature about 20 years old as of this writing.
Because the feature itself is so old, it has a bit of a legacy feel, and at the same time it’s so deeply embedded into MonoBehaviour that it seems hard to pull out.
Have people solved these problems?
Isn’t there a more powerful asynchronous tool that can replace coroutines like these?
Reference
- .NET runtime - IEnumerable.cs
- .NET runtime - IEnumerator.cs
- .NET runtime - List.cs
- Gamasutra - C# Memory Management for Unity Developers
- UnityCsReference - MonoBehaviour.bindings.cs
- UnityCsReference - Coroutine.bindings.cs
- Unity Documentation - Coroutines
- Unity Documentation - Write and run coroutines
- Unity Discussions - Why do my Coroutines allocate memory when they execute?