Comptime

CubeCL isn't just a new compute language: though it feels like you are writing GPU kernels, you are, in fact, writing compiler plugins that you can fully customize! Comptime is a way to modify the compiler IR at runtime when compiling a kernel for the first time.

This enables a lot of optimizations and flexibility without having to write many separate variants of the same kernels to ensure maximal performance.

Loop Unrolling

You can easily unroll loops in CubeCL using the unroll attribute on top of a for loop.

#![allow(unused)]
fn main() {
#[cube(launch)]
fn sum<F: Float>(input: &Array<F>, output: &mut Array<F>, #[comptime] end: Option<u32>) {
    let unroll = end.is_some();
    let end = end.unwrap_or_else(|| input.len());
    let mut sum = F::new(0.0);

    #[unroll(unroll)]
    for i in 0..end {
        sum += input[i];
    }

    output[ABSOLUTE_POS] = sum;
}
}

Note that if you provide a variable end that can't be determined at compile time, a panic will arise when trying to execute that kernel.

Feature Specialization

You could also achieve the sum using plane operations. We will write a kernel that uses that instruction when available based on a comptime feature flag. When it isn't available, it will fall back on the previous implementation, essentially making it portable.

#![allow(unused)]
fn main() {
#[cube(launch)]
fn sum_plane<F: Float>(
    input: &Array<F>,
    output: &mut Array<F>,
    #[comptime] plane: bool,
    #[comptime] end: Option<u32>,
) {
    if plane {
        output[UNIT_POS] = plane_sum(input[UNIT_POS]);
    } else {
        sum_basic(input, output, end);
    }
}
}

Note that no branching will actually occur on the GPU, since three different kernels can be generated from the last code snippet. You can also use the trait system to achieve a similar behavior.

Keyboard shortcuts

The CubeCL Book 🧊

Comptime

Loop Unrolling

Feature Specialization