IR Design Philosophy
Morok uses a single unified IR (UOp) for all compilation stages, following Tinygrad’s design philosophy.
The UOp Structure
#![allow(unused)]
fn main() {
pub struct UOp {
pub id: u64, // Unique stable ID
pub(crate) op: Op, // Operation (Rust tagged union)
pub(crate) dtype: DType, // Data type
// Cached properties (computed lazily)
pub(crate) shape_cache: OnceCell<...>,
pub(crate) vmin_vmax_cache: OnceCell<...>,
}
}
Key fields:
op: AnOpenum with 80+ operations spanning all abstraction levelsdtype: Type information (scalars, vectors, pointers)id: Unique identifier for caching and provenance tracking
Why One IR?
Traditional ML compilers use multiple IRs:
- TensorFlow: Graph → XLA HLO → MLIR → LLVM IR
- PyTorch: Python AST → TorchScript → FX Graph → Inductor IR → Triton
- JAX: Python → Jaxpr → StableHLO → MHLO → platform IR
Morok/Tinygrad uses one IR that represents all abstraction levels:
Same UOp at Different Stages
# High-level (after tensor ops):
BUFFER.reshape([2,3]).reduce(ADD, axis=0)
# Mid-level (after rangeify):
RANGE(2) -> RANGE(3) -> LOAD -> REDUCE -> STORE -> END
# Low-level (after codegen passes):
DEFINE_GLOBAL -> INDEX -> LOAD -> ADD -> STORE
Enabling Factors
-
Graph Rewriting as Universal Mechanism
Pattern matching + rewriting handles all transformations:
#![allow(unused)] fn main() { let optimized = graph_rewrite(&patterns, uop_graph, &mut ctx); } -
Hash Consing (Structural Sharing)
Identical subgraphs share memory via a thread-local cache:
#![allow(unused)] fn main() { thread_local! { static CACHE: RefCell<HashMap<UOpKey, Weak<UOp>>> = ...; } }Benefits:
- O(1) equality checking (pointer comparison)
- No duplicate subgraphs in memory
- Pattern matching can use pointer identity
-
Lazy Property Computation
Expensive analyses computed once and cached:
#![allow(unused)] fn main() { pub(crate) shape_cache: OnceCell<Result<Option<Shape>>>, pub(crate) vmin_vmax_cache: OnceCell<(ConstValue, ConstValue)>, } -
Operation Hierarchy
Ops organized by level to support progressive lowering:
#![allow(unused)] fn main() { impl Op { pub fn is_movement(&self) -> bool { ... } pub fn is_buffer(&self) -> bool { ... } pub fn is_alu(&self) -> bool { ... } } }
Trade-offs
| Aspect | Single IR | Multi-IR |
|---|---|---|
| Complexity | Lower | Higher |
| Translation bugs | None | Possible |
| Cross-level optimization | Natural | Requires bridging |
| Compile-time safety | Runtime checks | Per-IR guarantees |
| Codebase size | ~15k lines | 100k+ lines |
Morok vs Tinygrad UOp
| Aspect | Tinygrad | Morok |
|---|---|---|
| Language | Python dataclass | Rust struct |
| Children | src: tuple[UOp, ...] | Encoded in Op variants |
| Type safety | Runtime | Compile-time |
| Extra data | arg: Any (untyped) | Typed per variant |
| Memory | Weakref + GC | Rc<UOp> + explicit cleanup |
Morok’s Rust implementation adds compile-time guarantees:
#![allow(unused)]
fn main() {
// Each Op variant encodes its exact structure
Op::Binary(BinaryOp::Add, lhs, rhs) // vs Tinygrad's (Ops.ADD, (lhs, rhs))
Op::Reduce { src, ranges, reduce_op } // Named fields, typed
}