Runtime Dispatch
RL-Kernel routes operators through KernelRegistry. Callers request an operator by
logical type, and the registry selects the first available backend for the current device.
Dispatch Flow
- Detect the platform from device_ctx.
- Load the priority list for the requested operator type.
- Try each backend in priority order.
- Cache successfully constructed operator instances.
- Skip backends that already failed in the current process.
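The steps above can be sketched as a small registry class. This is a minimal illustration under stated assumptions, not RL-Kernel's KernelRegistry: SketchRegistry, DeviceCtx, and the factory mapping are hypothetical names, the platform is read from a plain string field instead of the real device_ctx object, and failures are remembered per backend name for the lifetime of the process.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Set, Tuple


# Hypothetical stand-in for device_ctx; only the platform string is modeled.
@dataclass(frozen=True)
class DeviceCtx:
    platform: str  # e.g. "cuda", "rocm", or "cpu"


# A factory builds an operator instance or raises if the backend cannot load.
Factory = Callable[[], Any]


@dataclass
class SketchRegistry:
    # (platform, op_type) -> backend names, highest priority first
    priorities: Dict[Tuple[str, str], List[str]]
    # backend name -> factory for that backend's operator
    factories: Dict[str, Factory]
    # successfully constructed operators, keyed by (platform, op_type)
    _cache: Dict[Tuple[str, str], Any] = field(default_factory=dict)
    # backends that raised during construction in this process
    _failed: Set[str] = field(default_factory=set)

    def get(self, op_type: str, device_ctx: DeviceCtx) -> Any:
        platform = device_ctx.platform  # step 1: detect the platform
        key = (platform, op_type)
        if key in self._cache:  # step 4: reuse a cached instance
            return self._cache[key]
        for name in self.priorities.get(key, []):  # step 2: load the priority list
            if name in self._failed:  # step 5: skip backends that already failed
                continue
            try:
                op = self.factories[name]()  # step 3: try backends in order
            except Exception:
                # Missing factories and construction errors both mark the backend as failed.
                self._failed.add(name)
                continue
            self._cache[key] = op
            return op
        raise RuntimeError(f"no backend available for {op_type!r} on {platform}")
```

With priorities {("cuda", "logp"): ["flashinfer", "cuda_generic"]} and a flashinfer factory that raises ImportError, the first get("logp", DeviceCtx("cuda")) records flashinfer as failed and returns the cuda_generic operator; later calls return the cached instance without calling any factory again.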
LogP Priority
| Platform | Backend priority (highest first) |
|---|---|
| CUDA | SM90 fused LogP when available, CUDA generic, FlashInfer, Triton generic, PyTorch native |
| ROCm | AITER, Triton generic, PyTorch native |
| CPU | PyTorch native |
For CUDA devices with compute capability 9.0 or newer, the registry inserts the SM90 LogP backend at the front of the CUDA priority list.
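The LogP table and the SM90 rule can be expressed together as a small lookup. This is a sketch under assumptions: the backend identifiers (sm90_logp, cuda_generic, and so on) are hypothetical, and capability is assumed to be a (major, minor) tuple such as the one torch.cuda.get_device_capability() returns. The function only decides ordering; whether a backend is actually available is still settled when the registry tries to construct it.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical backend identifiers mirroring the LogP priority table.
BASE_LOGP_PRIORITY: Dict[str, List[str]] = {
    "cuda": ["cuda_generic", "flashinfer", "triton_generic", "torch_native"],
    "rocm": ["aiter", "triton_generic", "torch_native"],
    "cpu": ["torch_native"],
}


def logp_priority(platform: str, capability: Optional[Tuple[int, int]] = None) -> List[str]:
    """Return the LogP backend order, prepending SM90 on CUDA capability >= 9.0."""
    order = list(BASE_LOGP_PRIORITY[platform])  # copy so the base table is never mutated
    # Tuple comparison: (9, 0) and (10, 0) qualify, (8, 9) does not.
    if platform == "cuda" and capability is not None and capability >= (9, 0):
        order.insert(0, "sm90_logp")
    return order


print(logp_priority("cuda", (9, 0)))
# prints: ['sm90_logp', 'cuda_generic', 'flashinfer', 'triton_generic', 'torch_native']
print(logp_priority("cuda", (8, 0)))
# prints: ['cuda_generic', 'flashinfer', 'triton_generic', 'torch_native']
```

Copying the base list before inserting keeps the shared table unchanged, so one process can compute orders for GPUs with different compute capabilities.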
Relevant Files
- rl_engine/kernels/registry.py
- rl_engine/platforms/device.py
- rl_engine/kernels/ops/