Making OpenTelemetry play well with Kotlin Coroutines

When using OpenTelemetry with different execution models you might notice that Context is not properly propagated by default between execution flows. This is not an OpenTelemetry-specific problem, but there isn’t much documentation on the integration with Kotlin in particular, which makes it especially challenging.

What happens when we use the OpenTelemetry Java SDK in Kotlin Coroutines (and other execution models)?

Even though the Java SDK is properly suited for usage in Kotlin applications, you will notice that when trying to create nested/child spans, these won’t be connected - in fact, they will be exported as their own Trace!

This is obviously a no-go and defeats the purpose of Tracing in general. Thankfully, OpenTelemetry provides some abstractions and extensions to deal with these kinds of problems.

A deeper look into the OpenTelemetry SDK

Two abstractions that frameworks usually implement to bridge their internal Context-passing mechanisms to OpenTelemetry are io.opentelemetry.context.ContextStorageProvider and io.opentelemetry.context.ContextStorage. These are not really needed for Kotlin Coroutines in particular, but you may see them in other types of integrations. We’ll explore this in a future post on the Eclipse Vert.x integration with OpenTelemetry.

What you need to know for now is that this abstraction allows OpenTelemetry to save and restore the state of its internal context (which we change when we perform actions such as setting the active span for the current OpenTelemetry Context). By default, OpenTelemetry uses ThreadLocal as the backing storage.
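
To make the ThreadLocal behaviour concrete, here is a minimal sketch. It assumes only the opentelemetry-context artifact is on the classpath and that the default storage is in use (no agent or custom ContextStorageProvider). The REQUEST_ID key and the currentRequestId helper are hypothetical names used for illustration.

```kotlin
import io.opentelemetry.context.Context
import io.opentelemetry.context.ContextKey
import kotlin.concurrent.thread

// Hypothetical key, used only for this illustration.
val REQUEST_ID: ContextKey<String> = ContextKey.named("request-id")

// Reads the value from whatever Context is current on the calling thread.
fun currentRequestId(): String? = Context.current().get(REQUEST_ID)

fun main() {
    val context = Context.root().with(REQUEST_ID, "abc-123")

    // makeCurrent() stores the context in the current thread's storage
    // and returns a Scope that restores the previous one when closed.
    context.makeCurrent().use {
        println("same thread:  ${currentRequestId()}")   // same thread:  abc-123

        // A different thread has its own ThreadLocal slot, so it sees the root context.
        thread { println("other thread: ${currentRequestId()}") }.join()   // other thread: null
    }
}
```

A coroutine that resumes on another thread is in the same position as the second thread here, which is why the spans end up disconnected.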

However, we use a different approach for Kotlin Coroutines. Looking into the GitHub repository for the OpenTelemetry Java SDK you’ll find this module: https://github.com/open-telemetry/opentelemetry-java/tree/main/extensions/kotlin

It mainly consists of an implementation of CoroutineContext.Element, which you may have seen if you’ve looked into Coroutines and especially Coroutine Contexts in the past. Specifically, OpenTelemetry defines a KotlinContextElement which implements ThreadContextElement. This interface allows implementers to define state that should be installed into ThreadLocal each time a coroutine resumes execution.

Two methods are relevant to this implementation:

  • updateThreadContext: invoked before a coroutine is resumed on the current thread; OpenTelemetry sets its context as current and stores the resulting scope;
  • restoreThreadContext: invoked after a coroutine has suspended on the current thread; OpenTelemetry takes the previously stored scope and closes it.
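
The two methods above can be sketched as follows. This is not the library’s source, just a simplified element written from the description, assuming kotlinx-coroutines-core and opentelemetry-context are available. SketchContextElement, REQUEST_ID and readAcrossSuspension are hypothetical names.

```kotlin
import io.opentelemetry.context.Context
import io.opentelemetry.context.ContextKey
import io.opentelemetry.context.Scope
import kotlin.coroutines.CoroutineContext
import kotlinx.coroutines.Dispatchers
import kotlinx.coroutines.ThreadContextElement
import kotlinx.coroutines.delay
import kotlinx.coroutines.runBlocking
import kotlinx.coroutines.withContext

// Simplified stand-in for KotlinContextElement (hypothetical, for illustration only).
class SketchContextElement(private val otelContext: Context) : ThreadContextElement<Scope> {
    companion object Key : CoroutineContext.Key<SketchContextElement>

    override val key: CoroutineContext.Key<*> get() = Key

    // Called before the coroutine runs on a thread: install the context, keep the scope.
    override fun updateThreadContext(context: CoroutineContext): Scope =
        otelContext.makeCurrent()

    // Called after the coroutine leaves the thread: closing the scope restores the old context.
    override fun restoreThreadContext(context: CoroutineContext, oldState: Scope) {
        oldState.close()
    }
}

// Hypothetical key, used only to observe which Context is current.
val REQUEST_ID: ContextKey<String> = ContextKey.named("request-id")

suspend fun readAcrossSuspension(): List<String?> {
    val otelContext = Context.root().with(REQUEST_ID, "abc-123")
    return withContext(Dispatchers.Default + SketchContextElement(otelContext)) {
        val before = Context.current().get(REQUEST_ID)
        delay(10) // suspension point; the coroutine may resume on a different worker thread
        val after = Context.current().get(REQUEST_ID)
        listOf(before, after)
    }
}

fun main(): Unit = runBlocking {
    println(readAcrossSuspension()) // [abc-123, abc-123]
}
```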

But how do you use these extensions?

Ah! OpenTelemetry also provides a single Kotlin file with useful extension methods: https://github.com/open-telemetry/opentelemetry-java/blob/main/extensions/kotlin/src/main/kotlin/io/opentelemetry/extension/kotlin/ContextExtensions.kt

What we really need is to have a way to transform our current Span or Context into a CoroutineContext.Element, so that we can use it when building our CoroutineContext and then run our coroutines with the correct state. That’s what the asContextElement methods provide here. By the way, Spans implement ImplicitContextKeyed, so that’s what you see as the receiver for one of the extension methods.

To use them, you would use a syntax like this:

withContext(span.asContextElement()) {
    someCode()
}
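
For a slightly fuller picture, the sketch below compares a child span started with and without the element. It assumes the opentelemetry-sdk and opentelemetry-extension-kotlin artifacts are on the classpath and builds a bare SdkTracerProvider with no exporter, which is enough to get real trace IDs. The startChild helper and the span names are made up for the example.

```kotlin
import io.opentelemetry.api.trace.Span
import io.opentelemetry.api.trace.Tracer
import io.opentelemetry.extension.kotlin.asContextElement
import io.opentelemetry.sdk.trace.SdkTracerProvider
import kotlinx.coroutines.Dispatchers
import kotlinx.coroutines.runBlocking
import kotlinx.coroutines.withContext

// Starts (and immediately ends) a span whose parent is taken from Context.current().
fun startChild(tracer: Tracer): Span =
    tracer.spanBuilder("child").startSpan().also { it.end() }

fun main(): Unit = runBlocking {
    // Assumption: a provider without processors or exporters, used only to generate IDs.
    val tracer = SdkTracerProvider.builder().build().get("example-tracer")
    val parent = tracer.spanBuilder("parent").startSpan()
    try {
        // Nothing makes the parent current on the worker thread: the child starts a new trace.
        val detached = withContext(Dispatchers.Default) { startChild(tracer) }

        // The element installs the parent as current each time the coroutine resumes.
        val attached = withContext(Dispatchers.Default + parent.asContextElement()) {
            startChild(tracer)
        }

        println(detached.spanContext.traceId == parent.spanContext.traceId) // false
        println(attached.spanContext.traceId == parent.spanContext.traceId) // true
    } finally {
        parent.end()
    }
}
```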

Higher-level extension methods for cleaner code

Now that we’ve got the fundamentals out of the way, here are some of the ways I like to use these features in my own code. Having to pass around Tracers and build spans manually within code blocks is boring and noisy. Most of the time you just want to mark a code block as “relevant” for tracing and add attributes to spans as needed.

For this purpose, let’s start with a simple suspend function:

suspend fun getUser(request: UserRequest): UserReply {
    // some code
}

When instrumenting for the first time you might do something like this (note that this doesn’t deal with context propagation yet):

val tracer = GlobalOpenTelemetry.get().getTracer("my-tracer")

suspend fun getUser(request: UserRequest): UserReply {
    val span: Span = tracer.spanBuilder("my-span").startSpan()

    // using the closeable pattern or we could just close the scope in the finally block
    span.makeCurrent().use {
        try {
            // run some code, maybe add a few attributes
        } catch (throwable: Throwable) {
            span.setStatus(StatusCode.ERROR)
            span.recordException(throwable)
            throw throwable
        } finally {
            span.end()
        }
    }
}

However, I argue that there is a better way:

suspend fun getUser(request: UserRequest): UserReply = withSpan("my-span") { span ->
    // run some code, maybe add a few attributes
}

Much cleaner, right? This seems like some kind of black magic, but there’s really not much to it. Let’s take a look at this new utility.

suspend fun <T> withSpan(
    spanName: String,
    parameters: (SpanBuilder.() -> Unit)? = null,
    coroutineContext: CoroutineContext = EmptyCoroutineContext,
    block: suspend (span: Span) -> T
): T = tracer.startSpan(spanName, parameters, coroutineContext, block)

At this point, you don’t see any implementation details, but there are a couple of things I’d like to point out:

  • for this syntax to work properly we need to allow the return type to be preserved;
  • a few optional parameters make the utility more flexible, in case you need to manually set the parent span or run a coroutine in a different execution context;
  • as with everything else, composition is the key to flexibility - these utilities allow for usage with or without a provided tracer.

And finally:

suspend fun <T> Tracer.startSpan(
    spanName: String,
    parameters: (SpanBuilder.() -> Unit)? = null,
    coroutineContext: CoroutineContext = EmptyCoroutineContext,
    block: suspend (span: Span) -> T
): T {
    val span: Span = this.spanBuilder(spanName).run {
        if (parameters != null) parameters()
        startSpan()
    }

    return withContext(coroutineContext + span.asContextElement()) {
        try {
            block(span)
        } catch (throwable: Throwable) {
            span.setStatus(StatusCode.ERROR)
            span.recordException(throwable)
            throw throwable
        } finally {
            span.end()
        }
    }
}

One thing you might find weird is the syntax for building the CoroutineContext passed into withContext. However, this is a pretty common pattern: CoroutineContexts are composable, so we can add them together with the + operator to build our final context.
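
Here is a small sketch of that composition using only the Kotlin standard library (kotlin.coroutines). RequestId and TenantId are hypothetical elements invented for the example. It also shows that adding an element whose key is already present replaces the earlier one.

```kotlin
import kotlin.coroutines.AbstractCoroutineContextElement
import kotlin.coroutines.CoroutineContext
import kotlin.coroutines.EmptyCoroutineContext

// Two hypothetical context elements, each identified by its companion Key.
class RequestId(val value: String) : AbstractCoroutineContextElement(RequestId) {
    companion object Key : CoroutineContext.Key<RequestId>
}

class TenantId(val value: String) : AbstractCoroutineContextElement(TenantId) {
    companion object Key : CoroutineContext.Key<TenantId>
}

fun main() {
    // "+" merges elements into one context; each key appears at most once.
    val combined: CoroutineContext = EmptyCoroutineContext + RequestId("r-1") + TenantId("t-1")
    println(combined[RequestId]?.value) // r-1
    println(combined[TenantId]?.value)  // t-1

    // Adding an element with an existing key replaces the previous element.
    val overridden = combined + RequestId("r-2")
    println(overridden[RequestId]?.value) // r-2
    println(overridden[TenantId]?.value)  // t-1
}
```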

One scenario where you might find this useful is when your framework already provides a CoroutineContext, so you’d want to reuse that for sure and add the OpenTelemetry context on top.
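
As a rough usage sketch of that scenario, the snippet below passes a pre-existing context through the coroutineContext parameter. The helper is repeated in condensed form only so the snippet compiles on its own. frameworkContext, loadGreeting and the attribute names are hypothetical, and a bare SdkTracerProvider (from opentelemetry-sdk) stands in for a real tracer setup.

```kotlin
import io.opentelemetry.api.trace.Span
import io.opentelemetry.api.trace.SpanBuilder
import io.opentelemetry.api.trace.StatusCode
import io.opentelemetry.api.trace.Tracer
import io.opentelemetry.extension.kotlin.asContextElement
import io.opentelemetry.sdk.trace.SdkTracerProvider
import kotlin.coroutines.CoroutineContext
import kotlin.coroutines.EmptyCoroutineContext
import kotlinx.coroutines.CoroutineName
import kotlinx.coroutines.Dispatchers
import kotlinx.coroutines.runBlocking
import kotlinx.coroutines.withContext

// Condensed copy of the helper from this post, repeated so the example is self-contained.
suspend fun <T> Tracer.startSpan(
    spanName: String,
    parameters: (SpanBuilder.() -> Unit)? = null,
    coroutineContext: CoroutineContext = EmptyCoroutineContext,
    block: suspend (span: Span) -> T
): T {
    val span = spanBuilder(spanName).apply { parameters?.invoke(this) }.startSpan()
    return withContext(coroutineContext + span.asContextElement()) {
        try {
            block(span)
        } catch (throwable: Throwable) {
            span.setStatus(StatusCode.ERROR)
            span.recordException(throwable)
            throw throwable
        } finally {
            span.end()
        }
    }
}

// Stand-in for a CoroutineContext handed to you by a framework (hypothetical).
val frameworkContext: CoroutineContext = Dispatchers.IO + CoroutineName("request-handler")

suspend fun loadGreeting(tracer: Tracer, userId: String): String =
    tracer.startSpan(
        spanName = "load-greeting",
        parameters = { setAttribute("user.id", userId) }, // configure the SpanBuilder
        coroutineContext = frameworkContext               // reuse the framework's context
    ) { span ->
        span.setAttribute("greeting.cached", false)
        "hello, $userId"
    }

fun main(): Unit = runBlocking {
    val tracer = SdkTracerProvider.builder().build().get("example-tracer")
    println(loadGreeting(tracer, "42")) // hello, 42
}
```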

Conclusion

Dealing with framework internals can be daunting at first, but you’ll end up understanding how things work and how to make them work for you. I hope this post has helped you understand how OpenTelemetry works with Kotlin Coroutines, and how higher-level abstractions can help you instrument your code while keeping it clean and expressive.