# Introducing cxx-async

Patrick Walton · August 19, 2022

Update 9/6/2022: Fixed a potential use-after-free on the C++ side if the future got dropped. Thanks to David Tolnay for pointing this out.

I'm happy to announce a new Rust crate that I've been working on for a while at Meta: cxx-async. cxx-async is an extension to the cxx crate that allows for bidirectional interoperability between C++ coroutines and asynchronous Rust functions. With it, you can await C++ coroutines and co_await Rust functions; as much as possible, everything "just works". The biggest practical benefit of C++ coroutine interoperability is elimination of awkward callback patterns when interfacing with C++.

Let's build a simple service to demonstrate this. Suppose we want to build a Rust service that uses the C stb_image and stb_image_write libraries to convert JPEG images to PNG. (Note: Don't expose the stb_image libraries to untrusted input in a real application, as they aren't designed to be secure. They just have a simple API that's good for demonstration.) We might use Actix Web, which runs on top of Tokio, to make a service like this:

use actix_web::web::Bytes;
use actix_web::{post, App, HttpServer, Responder};
use std::future::Future;
use std::io::Result as IoResult;

// The server entry point.
#[actix_web::main]
async fn main() -> IoResult<()> {
    HttpServer::new(|| App::new().service(convert))
        .bind(("127.0.0.1", 8765))?
        .run()
        .await
}


On the C++ side, we might wrap the stb_image libraries like so:

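// Both stb headers need their implementation macro defined in exactly one source file.
#define STB_IMAGE_IMPLEMENTATION
#define STB_IMAGE_WRITE_IMPLEMENTATION

#include "coroutine_example.h"
#include "stb_image.h"
#include "stb_image_write.h"
#include <cstdint>
#include <cstdlib>
#include <stdexcept>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>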

// Synchronously reencodes a JPEG to PNG.
rust::Vec<uint8_t> reencode_jpeg(rust::Slice<const uint8_t> jpeg_data)
{
    // Decode the JPEG, forcing 4 channels (RGBA).
    int width, height, channels;
    uint8_t *pixels = stbi_load_from_memory(
        jpeg_data.data(), static_cast<int>(jpeg_data.size()), &width, &height, &channels, 4);
    if (pixels == nullptr)
        throw std::runtime_error("Couldn't decode image!");

    // Write the PNG to a temporary file.
    // stb_image_write doesn't support writing directly to an in-memory buffer, so we have to go
    // through a file first.
    char tmpPath[32] = "/tmp/imageXXXXXX";
    int fd = mkstemp(tmpPath);
    if (fd < 0)
        throw std::runtime_error("Couldn't create temporary file!");
    int ok = stbi_write_png(tmpPath, width, height, 4, pixels, width * 4);
    stbi_image_free(pixels);
    if (ok == 0)
        throw std::runtime_error("Couldn't reencode image!");

    // Read that temporary file back to memory.
    rust::Vec<uint8_t> encodedPNG;
    uint8_t buffer[4096];
    ssize_t nRead;
    while ((nRead = read(fd, buffer, sizeof(buffer))) > 0) {
        for (ssize_t i = 0; i < nRead; i++)
            encodedPNG.push_back(buffer[i]);
    }
    if (nRead < 0)
        throw std::runtime_error("Failed to reread written image file to memory!");

    // Clean up and return the encoded PNG to Rust.
    close(fd);
    unlink(tmpPath);

    return encodedPNG;
}


And then declare the cxx bridge to connect the Rust and C++ together:

#[cxx::bridge]
mod ffi {
    unsafe extern "C++" {
        include!("coroutine_example.h");
        fn reencode_jpeg(jpeg_data: &[u8]) -> Vec<u8>;
    }
}


And the REST service endpoint:

#[post("/convert")]
async fn convert(jpeg_data: Bytes) -> impl Responder {
    ffi::reencode_jpeg(&jpeg_data)
}


After starting the server with cargo run, we can verify that this all works:

$ curl --data-binary @ferris.jpg -H "Content-Type: image/jpeg" --output out.png http://127.0.0.1:8765/convert
$ file out.png
out.png: PNG image data, 275 x 183, 8-bit/color RGBA, non-interlaced
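
If you'd rather check the response from code than with file, the same facts can be read straight out of the first bytes of the PNG. Here is a minimal sketch, assuming only the standard PNG layout (an 8-byte signature followed by the IHDR chunk, whose first two fields are the big-endian width and height). The function name png_dimensions is made up for this example and isn't part of the service.

```rust
/// The fixed 8-byte signature that starts every PNG file.
const PNG_SIGNATURE: [u8; 8] = [0x89, b'P', b'N', b'G', 0x0D, 0x0A, 0x1A, 0x0A];

/// Returns `(width, height)` if `bytes` starts like a PNG file, or `None` otherwise.
/// Only the signature and the start of the IHDR chunk are checked; nothing is decoded.
pub fn png_dimensions(bytes: &[u8]) -> Option<(u32, u32)> {
    // Layout: signature (8) + chunk length (4) + chunk type (4) + width (4) + height (4).
    if bytes.len() < 24 || bytes[..8] != PNG_SIGNATURE || bytes[12..16] != *b"IHDR" {
        return None;
    }
    let width = u32::from_be_bytes([bytes[16], bytes[17], bytes[18], bytes[19]]);
    let height = u32::from_be_bytes([bytes[20], bytes[21], bytes[22], bytes[23]]);
    Some((width, height))
}
```

Reading out.png into a Vec<u8> with std::fs::read and passing it to this function should report the same dimensions that file prints.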


So we have a working service, but it has scalability problems. We're doing the one thing you should never do when working with async I/O: blocking the event loop with a long-running computation. Let's use a thread pool on the C++ side[1] to fix this problem. Using the boost::asio::thread_pool type from Boost, we can rewrite the C++ side to look like this:

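...
#include <boost/asio/post.hpp>
...

rust::Vec<uint8_t> reencode_jpeg(std::vector<uint8_t> jpeg_data)
{
    ...
}

// Asynchronously reencodes a JPEG to PNG via a thread pool.
void reencode_jpeg_async(rust::Slice<const uint8_t> jpeg_data)
{
    // Copy the input, since the slice is only valid for the duration of this call.
    std::vector<uint8_t> data(jpeg_data.begin(), jpeg_data.end());
    // `g_thread_pool` is assumed to be a boost::asio::thread_pool defined in the elided code.
    boost::asio::post(g_thread_pool, [data = std::move(data)]() mutable {
        reencode_jpeg(std::move(data));
    });
}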


But now we have a problem: how do we get the data back from the reencode_jpeg function? We don't want to block on retrieving the results from the thread pool, as that would keep blocking our Tokio event loop. What we need is a way to return to the event loop while the conversion process runs and subsequently to enqueue a task to send the results back to the client. Traditionally, we'd use a callback for this. On the C++ side:

void reencode_jpeg(
    std::vector<uint8_t> jpeg_data,
    rust::Fn<void(rust::Box<CallbackContext>, rust::Vec<uint8_t>)> callback,
    rust::Box<CallbackContext> context)
{
    ...

    callback(std::move(context), std::move(encodedPNG));
}

// Asynchronously reencodes a JPEG to PNG via a thread pool.
void reencode_jpeg_async(
    rust::Slice<const uint8_t> jpeg_data,
    rust::Fn<void(rust::Box<CallbackContext>, rust::Vec<uint8_t>)> callback,
    rust::Box<CallbackContext> context)
{
    std::vector<uint8_t> data(jpeg_data.begin(), jpeg_data.end());
    // `g_thread_pool` is assumed to be a boost::asio::thread_pool defined in the elided code.
    boost::asio::post(g_thread_pool, [
        data = std::move(data),
        callback = std::move(callback),
        context = std::move(context)
    ]() mutable {
        reencode_jpeg(std::move(data), std::move(callback), std::move(context));
    });
}


And on the Rust side:

// The cxx bridge that declares the asynchronous function we want to call.
#[cxx::bridge]
mod ffi {
    extern "Rust" {
        type CallbackContext;
    }

    unsafe extern "C++" {
        include!("coroutine_example.h");
        fn reencode_jpeg_async(
            jpeg_data: &[u8],
            callback: fn(Box<CallbackContext>, result: Vec<u8>),
            context: Box<CallbackContext>,
        );
    }
}

// `Sender` and `oneshot` are assumed to come from a one-shot channel module such as
// futures::channel::oneshot.
pub struct CallbackContext(Sender<Vec<u8>>);

// Our REST endpoint, which calls the asynchronous C++ function.
#[post("/convert")]
async fn convert(jpeg_data: Bytes) -> impl Responder {
    let (sender, receiver) = oneshot::channel();
    let context = Box::new(CallbackContext(sender));
    ffi::reencode_jpeg_async(
        &jpeg_data,
        |context, encoded| drop(context.0.send(encoded)),
        context,
    );
    // Wait for the callback to deliver the encoded PNG.
    receiver.await.unwrap()
}
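
Stripped of the FFI, the pattern above is a one-shot handoff: the callback stores a result and wakes whoever is waiting for it. The following std-only sketch shows that plumbing in plain Rust, with a spawned thread standing in for the C++ thread pool. Every name in it (Completer, Completion, fake_reencode_with_callback, block_on) is invented for illustration, the "work" is just a byte reversal, and none of it is how cxx-async itself is implemented.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::{Arc, Condvar, Mutex};
use std::task::{Context, Poll, Wake, Waker};
use std::thread;

// State shared between the callback and the awaiting side.
struct Shared {
    value: Option<Vec<u8>>,
    waker: Option<Waker>,
}

// Hypothetical analogue of `CallbackContext`: handed to the callback.
pub struct Completer(Arc<Mutex<Shared>>);

// The future that the async caller waits on.
pub struct Completion(Arc<Mutex<Shared>>);

impl Completer {
    // Stores the result, then wakes whichever task is waiting for it.
    pub fn complete(self, value: Vec<u8>) {
        let waker = {
            let mut shared = self.0.lock().unwrap();
            shared.value = Some(value);
            shared.waker.take()
        };
        if let Some(waker) = waker {
            waker.wake();
        }
    }
}

impl Future for Completion {
    type Output = Vec<u8>;

    fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Vec<u8>> {
        let mut shared = self.0.lock().unwrap();
        match shared.value.take() {
            Some(value) => Poll::Ready(value),
            None => {
                // Not done yet: remember how to wake this task, then yield.
                shared.waker = Some(cx.waker().clone());
                Poll::Pending
            }
        }
    }
}

// Stand-in for the callback-taking C++ function: does placeholder "work"
// (reversing the bytes) on another thread, then invokes the callback.
fn fake_reencode_with_callback(input: Vec<u8>, callback: fn(Completer, Vec<u8>), context: Completer) {
    thread::spawn(move || {
        let output: Vec<u8> = input.into_iter().rev().collect();
        callback(context, output);
    });
}

fn on_done(context: Completer, output: Vec<u8>) {
    context.complete(output);
}

// Starts the work and returns a future for its result.
pub fn reencode(input: Vec<u8>) -> Completion {
    let shared = Arc::new(Mutex::new(Shared { value: None, waker: None }));
    fake_reencode_with_callback(input, on_done, Completer(Arc::clone(&shared)));
    Completion(shared)
}

// A tiny single-future executor, only here so the sketch runs without a runtime.
struct ThreadWaker {
    woken: Mutex<bool>,
    cond: Condvar,
}

impl Wake for ThreadWaker {
    fn wake(self: Arc<Self>) {
        *self.woken.lock().unwrap() = true;
        self.cond.notify_one();
    }
}

pub fn block_on<F: Future>(future: F) -> F::Output {
    let mut future = Box::pin(future);
    let signal = Arc::new(ThreadWaker { woken: Mutex::new(false), cond: Condvar::new() });
    let waker = Waker::from(Arc::clone(&signal));
    let mut cx = Context::from_waker(&waker);
    loop {
        if let Poll::Ready(output) = future.as_mut().poll(&mut cx) {
            return output;
        }
        // Sleep until the waker fires, then poll again.
        let mut woken = signal.woken.lock().unwrap();
        while !*woken {
            woken = signal.cond.wait(woken).unwrap();
        }
        *woken = false;
    }
}
```

Calling block_on(reencode(vec![1, 2, 3])) drives the future to completion on the current thread; inside an async function you would write reencode(data).await and let the runtime do the polling. One shortcut to be aware of: if the Completer is dropped without being completed, this future never resolves, whereas a real one-shot channel reports the cancellation.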


What a mess! We've replaced an elegant service with callback spaghetti and boilerplate. Surely there must be a better way.

It turns out that now there is, with cxx-async. We can upgrade our C++ code to C++20 and use the coroutines feature to dramatically reduce the boilerplate. First, we replace Boost with the thread pool from Folly:

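...
#include <folly/Executor.h>
...

folly::coro::Task<rust::Vec<uint8_t>> reencode_jpeg(std::vector<uint8_t> jpeg_data)
{
    ...
}

// Asynchronously reencodes a JPEG to PNG via a thread pool.
RustFutureVecU8 reencode_jpeg_async(rust::Slice<const uint8_t> jpeg_data)
{
    // The executor passed to via() is an assumption; any Folly thread pool executor would do.
    co_return co_await reencode_jpeg(std::vector<uint8_t>(jpeg_data.begin(), jpeg_data.end()))
        .semi()
        .via(folly::getGlobalCPUExecutor());
}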


(We could equally well have used CppCoro, or another lightweight package, instead of Folly; we just need some C++ library that exposes thread pool work items as co_awaitable tasks. Folly was chosen for this example because it's the most popular coroutine library as of this writing.)

Now we modify our C++ functions to be coroutines instead of taking callbacks:

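#define FOLLY_HAS_COROUTINES 1

...
...

folly::coro::Task<rust::Vec<uint8_t>> reencode_jpeg(std::vector<uint8_t> jpeg_data)
{
    ...

    co_return encodedPNG;
}

// Asynchronously reencodes a JPEG to PNG via a thread pool.
RustFutureVecU8 reencode_jpeg_async(rust::Slice<const uint8_t> jpeg_data)
{
    // The executor passed to via() is an assumption; any Folly thread pool executor would do.
    co_return co_await reencode_jpeg(std::vector<uint8_t>(jpeg_data.begin(), jpeg_data.end()))
        .semi()
        .via(folly::getGlobalCPUExecutor());
}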



In Rust, we write a cxx_async::bridge declaration and modify the cxx::bridge declaration like this:

// The cxx bridge that declares the asynchronous function we want to call.
#[cxx::bridge]
mod ffi {
    unsafe extern "C++" {
        include!("coroutine_example.h");
        type RustFutureVecU8 = crate::RustFutureVecU8;
        fn reencode_jpeg_async(jpeg_data: &[u8]) -> RustFutureVecU8;
    }
}

// The cxx_async bridge that defines the future we want to return.
#[cxx_async::bridge]
unsafe impl Future for RustFutureVecU8 {
    type Output = Vec<u8>;
}



And inside the service endpoint, all the callbacks disappear:

// Our REST endpoint, which calls the C++ coroutine.
#[post("/convert")]
async fn convert(jpeg_data: Bytes) -> impl Responder {
    ffi::reencode_jpeg_async(&jpeg_data).await.unwrap()
}


Notice how much nicer this is. We can keep the straight-line code that async Rust allows us to write, even across language boundaries. Additionally, the Rust code doesn't have to know about Folly, and the C++ code doesn't have to know about Tokio: cxx-async is generic over Rust coroutine libraries. Even if you aren't using C++ coroutines today, it may be worth introducing them to your project just to eliminate callbacks: there is no other solution I'm aware of that allows for callback-free async interoperability between languages.

The complete worked example can be found on GitHub. For comparison, the bad blocking version can be found in the blocking branch, while the callback-based version is in the callback-soup branch.

As noted before, cxx-async supports both CppCoro and Folly; an example for CppCoro can be found here, and here is an example for Folly. If you're using another C++ coroutine library, you can add support for it to cxx-async using the same mechanisms. In both frameworks, asynchronous Rust code can call C++ coroutines, and C++ coroutines can co_await asynchronous Rust. Extra effort has been expended to ensure that Folly semifutures can run directly on Rust executors without the overhead of having to create a separate executor on the C++ side; in theory, this can make asynchronous Folly code faster than the equivalent code using callbacks.

The cxx-async crate is available on crates.io. Please feel free to try it out, report feedback, report issues, and send pull requests! Integration with cxx is a possibility in the future as cxx-async stabilizes and matures.

[1] In this particular toy example, we could instead dispatch to a thread pool on the Rust side and avoid the need to use cxx-async. But this doesn't work for C++ libraries that use coroutines internally.
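
For the curious, the Rust-side alternative mentioned in this footnote might look roughly like the sketch below, which uses only the standard library. WorkerPool and blocking_reencode are hypothetical names, and blocking_reencode is a placeholder (it adds one to every byte) for a blocking call such as the synchronous reencode_jpeg binding. In a real Tokio or Actix service you would more likely reach for an existing facility such as tokio::task::spawn_blocking or actix_web::web::block than write your own pool.

```rust
use std::sync::mpsc::{self, Receiver, Sender};
use std::sync::{Arc, Mutex};
use std::thread;

type Job = Box<dyn FnOnce() + Send + 'static>;

// Placeholder for a blocking, CPU-heavy call such as the synchronous `reencode_jpeg`.
fn blocking_reencode(input: &[u8]) -> Vec<u8> {
    input.iter().map(|byte| byte.wrapping_add(1)).collect()
}

// A fixed set of worker threads pulling jobs from one shared queue.
pub struct WorkerPool {
    jobs: Sender<Job>,
}

impl WorkerPool {
    pub fn new(threads: usize) -> WorkerPool {
        let (jobs, queue) = mpsc::channel::<Job>();
        let queue = Arc::new(Mutex::new(queue));
        for _ in 0..threads {
            let queue = Arc::clone(&queue);
            thread::spawn(move || loop {
                // The lock is released as soon as one job has been taken,
                // so different workers can run their jobs in parallel.
                let job = queue.lock().unwrap().recv();
                match job {
                    Ok(job) => job(),
                    Err(_) => break, // the pool was dropped, so shut down
                }
            });
        }
        WorkerPool { jobs }
    }

    // Queues the work and returns a channel that will receive the result.
    pub fn submit(&self, input: Vec<u8>) -> Receiver<Vec<u8>> {
        let (result_tx, result_rx) = mpsc::channel();
        let job: Job = Box::new(move || {
            // Ignore the error if the caller stopped waiting for the result.
            let _ = result_tx.send(blocking_reencode(&input));
        });
        self.jobs.send(job).expect("worker threads have exited");
        result_rx
    }
}
```

A caller would create the pool once, for example WorkerPool::new(4), and call submit for each request. Note that calling recv on the returned channel blocks the calling thread, so an async handler would need an awaitable handoff instead, which is exactly what the runtime-provided helpers above give you.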