A formulation for scoped tasks in Rust


For just about as long as I’ve been working on async Rust, the topic of scoped tasks has come up. These are async tasks that borrow from their environment, and they would come in handy in a lot of situations.

Last year the standard library stabilized thread::scope which allows synchronous threads to do this. You could imagine a similar API, but with async:

async fn fanout(data: &Vec<Foo>) {
    task::scope(|s| {
        // Spawn subtasks to run in parallel.
        for chunk in data.chunks(50) {
            s.spawn(async { process(chunk).await });
        }
    }).await;
}

Similar to thread::scope, the task::scope future will wait on all spawned threads to complete, at which point our function will return. In async programming, this pattern is known as structured concurrency.1

Unfortunately, it turns out that this API can’t be implemented soundly in Rust today. I’ll go into the reasons below, but the short version is that there’s no way to prevent data from being used after it was freed. The ideas of scoped spawn and structured concurrency have been coming up again lately, and in these discussions it’s sometimes hard to pin down exactly what features we’re talking about and what is possible to do soundly.

In this post, I attempt to state as precisely as possible the essential features that we might want from scoped spawn APIs and where those run into fundamental language constraints. In particular, by essential here I mean that removing the feature:

My hope is that this creates a useful framework for understanding and reasoning about the problem space. I also attempt to be pretty concise in the formulation, meaning I don’t attempt to explain every aspect of the problem here, but do give a brief overview. We’ll then explore existing points on the design space and speculate about future possibilities through the lens of this formulation. If that sounds interesting to you, read on!

Brief recap on the problem

Scoped threads are nice because they’re allowed to borrow freely from the stack, and Rust makes this completely safe.

fn fanout_sync(data: &Vec<Foo>) {
    thread::scope(|s| {
        for chunk in data.chunks(50) {
            s.spawn(|| process_sync(chunk));
        }
    });
}

The thread::scope API works by calling a closure that is allowed to spawn threads and borrow from its environment. After that closure returns, the function blocks the current thread until all spawned threads are complete. This guarantees that any borrowed data remains in scope while the spawned threads are running.

Unfortunately when async is involved, this isn’t so easy. Our hypothetical API from the introduction might look pretty nice, but it can easily be misused:

async fn evil_fanout(data: &Vec<Foo>) {
    let scope_fut = task::scope(|s| async {
        for chunk in data.chunks(50) {
            s.spawn(|| async { process(chunk).await });
        }
    });

    // Let's get some tasks going on other threads...
    let mut scope_fut = Box::pin(scope_fut);
    futures::poll!(&mut scope_fut);
    futures::poll!(&mut scope_fut);

    // ...now pull out the rug! 😱
    std::mem::forget(scope_fut);
    return;
}

By the time we forget the scope future, it already has tasks running on other threads. The library couldn’t prevent us from exiting the current scope and invalidating its borrow of data before they completed, leading to a race condition and undefined behavior.

Formulation

The desire for scoped spawn is to combine the following features:

  1. Structure: We want the ability to reason from the code that a scope’s tasks have all completed before control flow leaves the scope. Additionally, errors (including panics) should propagate from tasks to their parent scopes.
  2. Borrowing: We want the ability to borrow freely from the parent scope inside child tasks. Depends on Structure.
  3. Nesting: We want the ability to spawn tasks from inside other tasks and (when Structure is present) declare async scopes inside other scopes.
  4. Parallelism: We want the subtasks we spawn to fan out to other threads.
  5. Safety: Using the APIs must not result in use-after-free or other UB.

Today, it must also work under the following language constraints:

  1. Non-blocking: We may not block the thread of an async executor, or bad things (performance issues and potentially deadlocks) will happen. Anything called in an async context must suspend in a timely manner.
  2. Non-cooperative cancellation: Suspended asynchronous computations are represented as values (futures), which may be dropped at any time. There is no requirement to poll them to completion.
  3. Forgetting: Values can leak. There’s no way to guarantee that code (such as a destructor) runs before a particular value goes out of scope.

Note that the first two constraints only apply to async. Combined, these constraints mean that we cannot guarantee any code in an async context runs before some set of values go out of scope.

Running some kind of hook turns out to be necessary for ensuring soundness. Tasks that borrow from the parent scope must be stopped before that scope is invalidated. Therefore, there is no way to implement the fully featured version in async Rust today.

Subsets of features

The nice thing about this formulation is that every feature listed above can be relaxed, with an existing, sound API to show for it:

  1. Relaxing Structure: tokio::spawn and similar global APIs in use today. (These also relax Borrowing, which depends on Structure.)
  2. Relaxing Borrowing: The task_scope and async_nursery crates.
  3. Relaxing Nesting: tokio::block_on. async_scoped::scope_and_block is equivalent.
  4. Relaxing Parallelism: FuturesUnordered, select, moro::async_scope!.
  5. Relaxing Safety: the unsafe async_scoped::scope_and_collect. The user must ensure the scope is not leaked. In exchange they get properties 1-4.

Examples

Now that we have the basic formulation, let’s expand on it through the lens of some known points on the design space that were already mentioned.

Synchronous code

thread::scope and its precursors are able to work because Non-blocking does not apply to them, simply because they run synchrounously instead of in async. They exploit the fact that upward control flow through a call frame is observable; that is, that if you write a function you can always “notice” that it is returning or unwinding.

Async removes this property through the confluence of the above constraints. Another way of looking at it is that async introduces a new direction for control flow to take. From the perpsective of an async context, control flow doesn’t travel “up” or “down” the call stack, but “out” of it completely.

Relaxing Parallelism

FuturesUnordered and select in the futures crate allow making progress on an arbitrary number of futures concurrently, and those futures can even borrow from the current scope.

They work by managing concurrency internally within the current async context, which makes Parallelism impossible. The way they work also seems to be unintuitive for some people, which leads to a number of common bugs related to progress of the child futures.2

Moro also works by relaxing Parallelism, but it solves the issues I just mentioned by making sure every spawned sub-task is always eligible to be polled. It polls the sub-tasks concurrently with the spawning context rather than nested under it.3

Moro still ensures that all the tasks within a scope run on the same thread. This gives us control over their progress: they only execute when the object that owns the current scope lets them. Even if that object is forgotten, there are no soundness issues, because it means the subtasks can never resume.

Parallelism gives child tasks the ability to make unbounded progress, decoupling them from the progress of the current thread. This is in fact exactly the property we usually desire from it! At the same time, it removes a way to tie the progress of a sub-task to the validity of the scope it borrows from.

But what about…?

I talk about new language features and library approaches below, but first let’s dispense with the most common ideas I usually see people gravitate toward.4

Forcing .await with a macro

Instead of returning a future from our hypothetical task::scope API, we could instead provide a macro that produced the future and then immediately awaited it. That would prevent anyone from handing the future to mem::forget (or its equivalent) in the middle of polling. Effectively, we attempt to relax the Non-cooperative cancellation constraint for the scope future specifically.

Remember though that for Nesting to be satisfied, it must be possible to define a scope in an async context. Indeed, both async examples above take place in an async fn. This means that our caller can itself choose to poll us a few times and then forget us – which transitively causes the scope future to be polled and then forgotten. Since we don’t control the caller, it doesn’t matter that we await the scope future immediately.

Pinning

You may recall that Pin has a nice-looking property: it guarantees that pinned data has its destructor called before its memory is invalidated.5 And in fact, futures are pinned before ever being awaited. Surely we can use this guarantee to our advantage?

While enticing, this intuition doesn’t match the actual behavior of Pin. The fact that our parent future is pinned actually has nothing to do with whether its internal data – the stuff actually being borrowed – is pinned.6 This is easy to see, because I can declare a local inside of an async block and move it or forget it around just like any other variable; the pinning guarantees do not apply.7

Likewise, the fact that a child future is pinned has nothing to do with whether its captures are pinned, so we can’t rely on that either.8

Unvisited points on the design space

As far as I know these haven’t been experimented with yet, though the language features have certainly been discussed before. If there are other solutions in the space (visited or not), I’d be happy to know about those too.

Please note that I’m not proposing anything in this post, just laying out the design space as well as I can.

In libraries

Choosing between parallelism and borrowing

Staying within the world of structured spawn APIs, you could combine the non-Parallelism API in moro with a non-Borrowing API and give users the option to pick one each time they spawn.

If the scope is dropped, you would have to come up with something reasonable to do with the tasks on other threads. You might cancel them and block on their destructors running, which at least in theory shouldn’t introduce any long blocks to the system.

Forgetting the scope would break the ability to reason about Structure, but would not affect soundness. It’s also expected to be quite rare; you could reasonably be upset with any code that does this.

This would be an interesting parallel to the async_scoped crate, which lets you choose between relaxing Nesting and Safety.

Restricting Borrowing

In the formulation above I said that subtasks should be able to “freely borrow” from their parent contexts. It’s possible that a more restricted form of borrowing might be possible to allow.

The way I’m imagining it, it would have to go through a couple of control layers that would make it significantly less ergonomic to use than “just capture a reference in your async block”. I’m also not sure yet if it would be sound. But it might be worth tinkering with, if only to see more clearly the limits of what the language can express.

Relaxing Non-blocking in async

This constraint directly arises from the use of async. The only way out of it would be to make the executor reentrant or spawn another thread to ensure the current thread’s tasks continue to run. I don’t think either is desirable.

We could make our executor reentrant, though it might be complex. But consider the implication. Every time any scope is started it will take up space on the stack until it’s finished, and scopes can be created dynamically. A web server servicing tens of thousands of concurrent requests would probably run out of stack space quickly. Regular backtraces would be a nightmare!

That said, I have seen it done in Swift (and Obj-C) test frameworks.

In the language

Relaxing Forgetting

Values are allowed to leak in Rust, but we could make some values un-leakable. This would mean that our scope guard destructor is guaranteed to run before the values it borrows from go out of scope.

Relaxing Non-cooperative cancellation

Solutions include:

Aside: What about async drop?

Even if Rust had async drop, by itself it wouldn’t be strong enough to relax any constraints. An async destructor would return a Future just like any other, and the caller would not be required to continue polling it.

As it turns out, async drop is basically orthogonal to the whole project. If we get async drop in some form, it can be used as the cancellation path for both the subtasks and the nursery itself. It would use one of the other mechanisms listed here to suspend progress in the scope defining code until subtasks are no longer running using.

Conclusion

This problem space is complex. It seems like every time I think I understand it, I come across some question that I realize I don’t quite know the answer to, then spend anywhere from a few minutes to a few hours thinking about it.

This post is my attempt to summarize and collect as much information about the problem as possible and act as a reference guide to myself and anyone else who’s wrestling with it. I hope it presents a useful model for thinking about the problem. If there are defects, I’d like to know!

The async working group is drafting a vision for the next few years, and one of the big questions to answer is how structured concurrency and scoped spawning fit into our long term goals. The closer we get to a useful model of the problem, the clearer I hope those answers will become.

You can comment on this post on the internals thread.

Special thanks to Nick Cameron, Manish Goregaokar, Eric Holk, Niko Matsakis, Alice Ryhl, and Yoshua Wuyts for reading a draft of this post and leaving feedback.


  1. Structured concurrency is compellingly explained in this blog post by Nathaniel J. Smith. ↩︎

  2. For example, FuturesUnordered allows “spawning” futures with its push API, but they don’t make progress unless the stream is actively being awaited. Here are some user experience stories related to this:

    • Solving a deadlock: Not getting concurrency after spawn, then trying to await tasks that have backpressure from the subtasks before starting to await them.
    • There are a couple combinator patterns that lead to you alternating serially between progressing the subtasks and processing their results, instead of doing it all concurrently. #2387, BBBS

    select works by waiting for the first future in a list of futures to complete, then dropping the rest. This exacerbates issues with non-cooperative cancellation, which creates subtle bugs due to a mismatch in expectations between a user’s intuitions about control flow and the fact an async function can be cancelled at any suspension point. ↩︎

  3. In other words, the spawner and spawnee are on equal footing; there is no need for the spawner to then poll the sub-executor to ensure that the spawned futures continue to make progress.

    Just as importantly, perhaps, the spawnee doesn’t begin execution until the spawner next suspends (which could be waiting on the spawnee, or some completely different future). This helps prevent the kind of weird race conditions you sometimes get with eager execution of spawned tasks. ↩︎

  4. “People” includes me, at times. This post is for me as much as anyone. ↩︎

  5. To be precise, “pinned data” refers to data of type T pointed to by some Pin<P: Deref<Target = T>> where T: !Unpin. The !Unpin part is important, because otherwise the drop guarantee (and other Pin guarantees) don’t apply. ↩︎

  6. In Pinning parlance, pinning is not structural for the locals of an async block. See the std::pin docs for more. ↩︎

  7. It’s also not enough to know that the future containing a local is pinned, because pinning doesn’t prevent forgetting, only invalidation. This means that the memory would still be valid, but if the future is forgotten any memory it refers to could have already gone out of scope. If we slap on a 'static requirement on the future to fix that, it might help, but then we are deep into the realm of restricting borrowing. ↩︎

  8. It’s possible that requiring captures to be pinned would help. But there’s no way to require that directly as a bound in Rust today, and it would still be a significant restriction on what can be borrowed. See the Restricting borrowing section. ↩︎