Quick Start
RL-Kernel exposes operators through a runtime registry. The registry selects a backend based on the current device and available compiled extensions.
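```python
import torch
from rl_engine.kernels.registry import kernel_registry

# Example batch: 16 rows of logits over a 4096-token vocabulary.
# Assumes a GPU visible to PyTorch as "cuda" (ROCm builds also use this name).
logits = torch.randn(16, 4096, device="cuda", dtype=torch.bfloat16).contiguous()
# One target token id per row.
token_ids = torch.randint(0, 4096, (16,), device="cuda", dtype=torch.int32)

# Look up the "logp" operator; the registry picks the backend for this device.
logp = kernel_registry.get_op("logp")
# Log-probability of each row's selected token.
selected_log_probs = logp(logits, token_ids)
```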
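The operator name and the `selected_log_probs` variable suggest that `logp` returns, for each row `i`, the log-softmax of `logits[i]` evaluated at `token_ids[i]`. Assuming that definition, a framework-free reference like the one below can help sanity-check results on small inputs. `logp_reference` is a hypothetical helper, not part of RL-Kernel. It subtracts the row maximum before exponentiating so that `exp` cannot overflow.

```python
import math
from typing import List, Sequence


def logp_reference(logits: Sequence[Sequence[float]], token_ids: Sequence[int]) -> List[float]:
    """Hypothetical reference: log_softmax(logits[i])[token_ids[i]] for each row i."""
    if len(logits) != len(token_ids):
        raise ValueError("need exactly one token id per row of logits")
    out = []
    for row, tok in zip(logits, token_ids):
        # Subtract the row max before exponentiating so exp() cannot overflow.
        m = max(row)
        log_sum_exp = m + math.log(sum(math.exp(x - m) for x in row))
        # log p(tok) = logit[tok] - logsumexp(row)
        out.append(row[tok] - log_sum_exp)
    return out
```

To compare against the kernel, pass `logits.float().cpu().tolist()` and `token_ids.tolist()`, then check against `selected_log_probs.float().cpu().tolist()`. Use a tolerance loose enough for bfloat16 inputs.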
In environments without a compiled CUDA/ROCm extension, the registry falls back to a PyTorch implementation of the operator, provided that operator supports one.
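The selection-with-fallback behavior described above can be pictured with a small, purely illustrative registry. This is not RL-Kernel's implementation. The class `KernelRegistry`, its `register` method, the explicit `available` argument to `get_op`, and the backend names `"cuda"`, `"rocm"` and `"torch"` are all assumptions. The sketch walks a fixed priority list and returns the first implementation that is both registered for the operator and available in the current environment. An operator with no usable implementation raises instead of falling back.

```python
from typing import Callable, Dict, Iterable, List


class KernelRegistry:
    """Hypothetical registry mapping op name -> {backend name -> implementation}."""

    def __init__(self, backend_priority: List[str]) -> None:
        # Backends in order of preference, e.g. ["cuda", "rocm", "torch"].
        self._priority = backend_priority
        self._ops: Dict[str, Dict[str, Callable]] = {}

    def register(self, op: str, backend: str, fn: Callable) -> None:
        # Record one backend-specific implementation of an operator.
        self._ops.setdefault(op, {})[backend] = fn

    def get_op(self, op: str, available: Iterable[str]) -> Callable:
        # `available` lists the backends usable right now (device + compiled extensions).
        usable = set(available)
        impls = self._ops.get(op, {})
        for backend in self._priority:
            if backend in usable and backend in impls:
                return impls[backend]
        # No compiled backend and no PyTorch fallback registered for this op.
        raise KeyError(f"no available backend implements {op!r}")
```

With `available=["torch"]` (no compiled extension loaded), `get_op("logp", ...)` returns whatever was registered under `"torch"`. An operator registered only for `"cuda"` raises `KeyError`.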