Overview
This release brings major performance improvements to tensor operations, particularly in matrix
multiplication and convolution, along with experimental ROCm/HIP and SPIR-V support enabled by CubeCL
runtimes. It also introduces foundational features for multi-backend compatibility and adds new
quantization operations.
ONNX support has been expanded with additional operators and bug fixes, broadening operator coverage.
As with previous releases, this version includes various bug fixes, further performance optimizations,
new tensor operations, and enhanced documentation.
Module & Tensor
• Remove copy restriction for const generic modules
#2222 @laggui
• Add dim checks on output rank for unsqueeze and stack
#2331 @laggui
• [Breaking] Change LR schedulers to return the initial LR at first `.step()`
#2337 @towerpark
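The LR scheduler change above can be sketched as follows. This is a hypothetical exponential-decay scheduler written to illustrate the new behavior only; the struct and method names are illustrative and are not Burn's actual `LrScheduler` API.

```rust
// Hypothetical scheduler illustrating the breaking change: the first call
// to `step()` now returns the *initial* LR, and decay is only applied on
// subsequent calls. Previously, decay was applied before the first return,
// so the configured initial LR was never actually used.
struct ExponentialLr {
    lr: f64,
    gamma: f64,
    started: bool,
}

impl ExponentialLr {
    fn new(initial_lr: f64, gamma: f64) -> Self {
        Self { lr: initial_lr, gamma, started: false }
    }

    fn step(&mut self) -> f64 {
        if self.started {
            self.lr *= self.gamma;
        }
        self.started = true;
        self.lr
    }
}

fn main() {
    let mut sched = ExponentialLr::new(0.1, 0.5);
    assert_eq!(sched.step(), 0.1); // first step returns the initial LR
    assert_eq!(sched.step(), 0.05); // decay applied from the second step on
}
```

Code that relied on the old behavior (decay applied before the first returned value) may need its schedule shifted by one step.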
Bug Fixes
• Nonzero should return an empty vec for zero tensors
#2212 @laggui
• Change ndarray mask_where implementation to correctly deal with NaNs
#2272 @laggui
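The NaN pitfall behind the `mask_where` fix can be shown in a few lines. A common way to implement `mask_where` is an arithmetic blend, `mask * value + (1 - mask) * source`, but this poisons the result whenever the branch *not* taken holds a NaN, because `0.0 * NaN` is NaN. A per-element select never touches the unselected operand. The functions below are a standalone sketch of the two approaches, not Burn's ndarray implementation.

```rust
// Arithmetic blend: NaN in the unselected operand leaks into the output.
fn mask_where_blend(source: &[f64], mask: &[bool], value: &[f64]) -> Vec<f64> {
    source
        .iter()
        .zip(mask)
        .zip(value)
        .map(|((&s, &m), &v)| {
            let m = if m { 1.0 } else { 0.0 };
            m * v + (1.0 - m) * s // 0.0 * NaN == NaN poisons the sum
        })
        .collect()
}

// Per-element select: the unselected value is never read into arithmetic.
fn mask_where_select(source: &[f64], mask: &[bool], value: &[f64]) -> Vec<f64> {
    source
        .iter()
        .zip(mask)
        .zip(value)
        .map(|((&s, &m), &v)| if m { v } else { s })
        .collect()
}

fn main() {
    let source = [1.0, f64::NAN];
    let mask = [true, true]; // both positions should take `value`
    let value = [10.0, 20.0];

    // The blend leaks the NaN from the masked-out source element.
    assert!(mask_where_blend(&source, &mask, &value)[1].is_nan());
    // The select returns the expected values.
    assert_eq!(mask_where_select(&source, &mask, &value), vec![10.0, 20.0]);
}
```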
Backends
• Add candle `CudaDevice` and `MetalDevice` to avoid creating a new unique device each time
#2290 @laggui
• Add `BackendRouter` to handle multiple backends on the way to distributed
#2353 #2419 @laggui
Documentation & Examples
• Add documentation for custom `cubecl` kernels, update some outdated docs
#2404 @wingertge
Fixes
• Fix target convert in batcher and align guide imports
#2215 @laggui
• Fix debugger settings doc in contributor book
#2223 @tiruka
• Contributor Book: Fix the link of primitive types in the "Serialization" page
#2362 @towerpark
• Fix xtask args which are unmodified when upgrading xtask commands
#2364 @tiruka
ONNX Support
• Allow onnx-import expand op with non-const shapes
#2189 @hexd0t
• Add missing output padding to conv transpose ONNX
#2216 @laggui
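For context on the expand change: the ONNX `Expand` op broadcasts a tensor to a target shape following numpy-style rules (a size-1 axis is repeated), and that target shape can now be a runtime value rather than a compile-time constant. The function below is a minimal 1-D-to-2-D sketch of the broadcasting semantics only; Burn's onnx-import handles the general N-D case.

```rust
// Toy ONNX `Expand`: broadcast a 1-D tensor into a (rows, cols) target.
// A size-1 input repeats along the axis; a matching size maps directly.
// Illustrative only, not Burn's onnx-import implementation.
fn expand_1d_to_2d(input: &[f64], target: (usize, usize)) -> Vec<f64> {
    let (rows, cols) = target;
    assert!(input.len() == cols || input.len() == 1, "not broadcastable");
    (0..rows)
        .flat_map(|_| (0..cols).map(|c| input[c % input.len()]))
        .collect()
}

fn main() {
    // [1, 2, 3] expanded to shape (2, 3) repeats the row.
    assert_eq!(
        expand_1d_to_2d(&[1.0, 2.0, 3.0], (2, 3)),
        vec![1.0, 2.0, 3.0, 1.0, 2.0, 3.0]
    );
    // A single scalar-like element broadcasts across every position.
    assert_eq!(expand_1d_to_2d(&[7.0], (2, 2)), vec![7.0, 7.0, 7.0, 7.0]);
}
```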
Enhancements
• Introduce autotuning to `conv2d` and `conv_transpose2d` with a new `im2col`/`GEMM` algorithm
#2287 @wingertge
• Add bounds checking to implicit GEMM to allow arbitrary input shapes
#2354 @wingertge
• Initialize accumulator to bias for implicit GEMM to save an expensive `float_add`
#2383 @wingertge
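The idea behind the new `im2col`/`GEMM` algorithm is to unroll each convolution window into a row so the whole convolution becomes one matrix product, which GPUs execute very efficiently. Below is a toy single-channel, stride-1, unpadded sketch of the lowering; shapes and names are illustrative and bear no relation to CubeCL's actual kernels.

```rust
// Toy im2col: gather every k*k window of an h*w input into its own row,
// so conv2d reduces to a matrix product of windows against the flattened
// kernel. Single channel, stride 1, no padding, for illustration only.
fn im2col(input: &[f64], h: usize, w: usize, k: usize) -> Vec<Vec<f64>> {
    let (oh, ow) = (h - k + 1, w - k + 1);
    let mut rows = Vec::with_capacity(oh * ow);
    for y in 0..oh {
        for x in 0..ow {
            let mut patch = Vec::with_capacity(k * k);
            for dy in 0..k {
                for dx in 0..k {
                    patch.push(input[(y + dy) * w + (x + dx)]);
                }
            }
            rows.push(patch);
        }
    }
    rows
}

fn main() {
    // 3x3 input with a 2x2 averaging kernel yields a 2x2 output,
    // computed here as one dot product per im2col row.
    let input = [1., 2., 3., 4., 5., 6., 7., 8., 9.];
    let kernel = [0.25; 4];
    let out: Vec<f64> = im2col(&input, 3, 3, 2)
        .iter()
        .map(|patch| patch.iter().zip(&kernel).map(|(p, k)| p * k).sum())
        .collect();
    assert_eq!(out, vec![3.0, 4.0, 6.0, 7.0]);
}
```

The GEMM formulation trades extra memory for the window matrix against the highly tuned throughput of matrix multiplication, which is why autotuning picks between it and the direct algorithms per shape.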
Miscellaneous
• Refactor xtask to use tracel-xtask and refactor CI workflow
#2063 @syl20bnr
• Update CI workflow for last version of setup-linux action
#2248 @syl20bnr
• [CI] Fix llvmpipe, lavapipe install for valgrind and vulnerabilities
#2264 @syl20bnr
• Move conv autotune under feature flag (except key)
#2330 @laggui