Hands-On: From Tensors to Models
This chapter teaches Morok through progressive examples. You’ll start with basic tensor operations and build up to a working neural network classifier.
What you’ll learn:
- Creating and manipulating tensors
- Shape operations (reshape, transpose, broadcast)
- Matrix multiplication
- Building reusable layers
- Composing a complete model
Prerequisites:
- Basic Rust knowledge
- Add
morok_tensorto yourCargo.toml
Key pattern: Morok uses lazy evaluation. Operations build a computation graph without executing. Call realize() to compile and run everything at once.
Example 1: Hello Tensor
Let’s create tensors, perform operations, and get results.
use morok_tensor::Tensor;
fn main() -> Result<(), Box<dyn std::error::Error>> {
// Create tensors from slices
let a = Tensor::from_slice(&[1.0f32, 2.0, 3.0, 4.0]);
let b = Tensor::from_slice(&[10.0f32, 20.0, 30.0, 40.0]);
// Lazy operations (no execution yet)
let sum = &a + &b;
let scaled = &sum * &Tensor::from_slice(&[0.1f32]);
// Execute and get results
let result = scaled.realize()?;
let data = result.to_ndarray::<f32>()?;
println!("Result: {:?}", data);
// Output: [1.1, 2.2, 3.3, 4.4]
Ok(())
}
What’s happening:
-
Tensor::from_slice()creates a tensor from a Rust slice. Thef32suffix tells Rust the element type. -
&a + &bdoesn’t compute anything yet. It returns a newTensorthat represents the addition. The&borrows the tensors so we can reuse them. -
realize()is where the magic happens. Morok:- Analyzes the computation graph
- Fuses operations where possible
- Generates optimized code
- Executes on the target device
-
to_ndarray()extracts the result as anndarray::ArrayDfor inspection.
Try this: Remove the realize() call. The code still runs, but data would be empty—nothing was computed.
Example 2: Shape Gymnastics
Neural networks constantly reshape data. Let’s master the basics.
#![allow(unused)]
fn main() {
fn shape_example() -> Result<(), Box<dyn std::error::Error>> {
// Create a 1D tensor with 6 elements
let data = Tensor::from_slice(&[1.0f32, 2.0, 3.0, 4.0, 5.0, 6.0]);
println!("Original shape: {:?}", data.shape()); // [6]
// Reshape to a 2x3 matrix
let matrix = data.try_reshape(&[2, 3])?;
println!("Matrix shape: {:?}", matrix.shape()); // [2, 3]
// [[1, 2, 3],
// [4, 5, 6]]
// Transpose to 3x2
let transposed = matrix.try_transpose(0, 1)?;
println!("Transposed shape: {:?}", transposed.shape()); // [3, 2]
// [[1, 4],
// [2, 5],
// [3, 6]]
// Broadcasting: add a row vector to every row
// [3, 2] + [1, 2] → [3, 2]
let bias = Tensor::from_slice(&[100.0f32, 200.0])
.try_reshape(&[1, 2])?;
let biased = &transposed + &bias;
let result = biased.realize()?;
println!("{:?}", result.to_ndarray::<f32>()?);
// [[101, 204],
// [102, 205],
// [103, 206]]
Ok(())
}
}
Key operations:
| Operation | What it does |
|---|---|
try_reshape(&[2, 3]) | Change shape (same total elements) |
try_reshape(&[-1, 3]) | Infer dimension from total size |
try_transpose(0, 1) | Swap dimensions 0 and 1 |
try_squeeze(dim) | Remove dimension of size 1 |
try_unsqueeze(dim) | Add dimension of size 1 |
Broadcasting rules (same as NumPy/PyTorch):
- Shapes align from the right
- Each dimension must match or be 1
- Dimensions of size 1 are “stretched” to match
[3, 2] + [1, 2] → [3, 2] ✓ (1 broadcasts to 3)
[3, 2] + [2] → [3, 2] ✓ (implicit [1, 2])
[3, 2] + [3] → error ✗ (2 ≠ 3)
Example 3: Matrix Multiply
Matrix multiplication is the workhorse of neural networks. Every layer uses it.
#![allow(unused)]
fn main() {
fn matmul_example() -> Result<(), Box<dyn std::error::Error>> {
// Input: 4 samples, 3 features each → shape [4, 3]
let input = Tensor::from_slice(&[
1.0f32, 2.0, 3.0, // sample 0
4.0, 5.0, 6.0, // sample 1
7.0, 8.0, 9.0, // sample 2
10.0, 11.0, 12.0, // sample 3
]).try_reshape(&[4, 3])?;
// Weights: 3 inputs → 2 outputs → shape [3, 2]
let weights = Tensor::from_slice(&[
0.1f32, 0.2, // feature 0 → outputs
0.3, 0.4, // feature 1 → outputs
0.5, 0.6, // feature 2 → outputs
]).try_reshape(&[3, 2])?;
// Matrix multiply: [4, 3] @ [3, 2] → [4, 2]
let output = input.dot(&weights)?;
let result = output.realize()?;
println!("Output shape: {:?}", result.shape()); // [4, 2]
println!("{:?}", result.to_ndarray::<f32>()?);
// Each row: weighted sum of that sample's features
Ok(())
}
}
Shape rules for dot():
| Left | Right | Result |
|---|---|---|
[M, K] | [K, N] | [M, N] |
[K] | [K, N] | [N] (vector-matrix) |
[M, K] | [K] | [M] (matrix-vector) |
[B, M, K] | [B, K, N] | [B, M, N] (batched) |
The inner dimensions must match (the K). Think of it as: “for each row of left, dot product with each column of right.”
Example 4: Building a Linear Layer
A linear layer computes y = x @ W.T + b. Let’s build one from scratch.
#![allow(unused)]
fn main() {
use morok_tensor::{Tensor, Error};
struct Linear {
weight: Tensor, // shape: [out_features, in_features]
bias: Tensor, // shape: [out_features]
}
impl Linear {
fn new(in_features: usize, out_features: usize) -> Self {
// Simple initialization (real code would use proper random init)
let weight_data: Vec<f32> = (0..in_features * out_features)
.map(|i| (i as f32 * 0.1).sin() * 0.1)
.collect();
let bias_data = vec![0.0f32; out_features];
Self {
weight: Tensor::from_slice(&weight_data)
.try_reshape(&[out_features as isize, in_features as isize])
.expect("reshape failed"),
bias: Tensor::from_slice(&bias_data),
}
}
fn forward(&self, x: &Tensor) -> Result<Tensor, Error> {
// y = x @ W.T + b
let weight_t = self.weight.try_transpose(0, 1)?;
let out = x.dot(&weight_t)?;
Ok(&out + &self.bias)
}
}
fn linear_example() -> Result<(), Box<dyn std::error::Error>> {
// Create a layer: 4 inputs → 2 outputs
let layer = Linear::new(4, 2);
// Single sample with 4 features
let input = Tensor::from_slice(&[1.0f32, 2.0, 3.0, 4.0]);
// Forward pass
let output = layer.forward(&input)?;
let result = output.realize()?;
println!("Output: {:?}", result.to_ndarray::<f32>()?);
Ok(())
}
}
Why transpose the weights?
PyTorch convention stores weights as [out_features, in_features]. For a layer mapping 4 → 2:
- Weight shape:
[2, 4] - Input shape:
[4]or[batch, 4] - We need:
input @ weight.T=[batch, 4] @ [4, 2]=[batch, 2]
This convention makes it easy to read the weight matrix: row i contains all weights feeding into output i.
Example 5: MNIST Classifier
Let’s build a complete neural network that could classify handwritten digits.
#![allow(unused)]
fn main() {
/// Two-layer neural network for MNIST
/// Architecture: 784 (28×28 pixels) → 128 (hidden) → 10 (digits)
struct MnistNet {
fc1: Linear,
fc2: Linear,
}
impl MnistNet {
fn new() -> Self {
Self {
fc1: Linear::new(784, 128),
fc2: Linear::new(128, 10),
}
}
fn forward(&self, x: &Tensor) -> Result<Tensor, Error> {
// Layer 1: linear + ReLU activation
let x = self.fc1.forward(x)?;
let x = x.relu()?;
// Layer 2: linear (no activation — raw logits)
self.fc2.forward(&x)
}
fn predict(&self, x: &Tensor) -> Result<Tensor, Error> {
let logits = self.forward(x)?;
// Convert logits to probabilities
logits.softmax(-1)
}
}
fn mnist_example() -> Result<(), Box<dyn std::error::Error>> {
let model = MnistNet::new();
// Simulate a 28×28 grayscale image (flattened to 784)
let fake_image: Vec<f32> = (0..784)
.map(|i| (i as f32) / 784.0)
.collect();
let input = Tensor::from_slice(&fake_image)
.try_reshape(&[1, 784])?; // batch size 1
// Forward pass
let logits = model.forward(&input)?;
let probs = logits.softmax(-1)?;
// Get results
let probs_result = probs.realize()?;
println!("Probabilities: {:?}", probs_result.to_ndarray::<f32>()?);
// Get predicted class
let prediction = logits.argmax(Some(-1))?;
let pred_result = prediction.realize()?;
println!("Predicted digit: {:?}", pred_result.to_ndarray::<i32>()?);
Ok(())
}
}
Key concepts:
-
ReLU activation:
x.relu()returnsmax(0, x). It introduces non-linearity—without it, stacking linear layers would just be one big linear layer. -
Logits vs probabilities: The raw output of the last layer (logits) can be any real number.
softmax()converts them to probabilities that sum to 1. -
argmax: Returns the index of the maximum value—the predicted class.
-
Batch dimension: We use shape
[1, 784]for a single image. For 32 images, use[32, 784]. The model handles batches automatically.
Example 6: Under the Hood
Want to see what Morok generates? Here’s how to inspect the IR and generated code.
#![allow(unused)]
fn main() {
fn inspect_compilation() -> Result<(), Box<dyn std::error::Error>> {
let a = Tensor::from_slice(&[1.0f32, 2.0, 3.0]);
let b = Tensor::from_slice(&[4.0f32, 5.0, 6.0]);
let c = &a + &b;
// Print the computation graph (before compilation)
println!("=== IR Graph ===");
println!("{}", c.uop().tree());
// Compile and execute
let result = c.realize()?;
// Inspect generated kernels
println!("\n=== Generated Kernels ===");
for (i, kernel) in result.kernels().iter().enumerate() {
println!("Kernel {}: {}", i, kernel.name);
println!("Backend: {}", kernel.backend);
println!("Code:\n{}\n", kernel.code);
}
Ok(())
}
}
What you’ll see:
-
IR Graph: The UOp tree shows operations like
BUFFER,LOAD,ADD,STORE. This is Morok’s intermediate representation before optimization. -
Generated Code: The actual LLVM IR or GPU code that runs. Notice how Morok fuses the loads and add into a single kernel—no intermediate buffers needed.
Debugging tip: If something seems slow or wrong, print the IR tree. Look for:
- Unexpected operations (redundant reshapes, extra copies)
- Missing fusion (separate kernels where one would do)
- Shape mismatches (often the root cause of errors)
Summary
You’ve learned the core patterns for using Morok:
| Task | Code |
|---|---|
| Create tensor | Tensor::from_slice(&[1.0f32, 2.0]) |
| Arithmetic | &a + &b, &a * &b, -&a |
| Reshape | t.try_reshape(&[2, 3])? |
| Transpose | t.try_transpose(0, 1)? |
| Matrix multiply | a.dot(&b)? |
| Activation | t.relu()?, t.softmax(-1)? |
| Execute | t.realize()? |
| Extract data | result.to_ndarray::<f32>()? |
The lazy evaluation pattern:
- Build your computation graph with operations
- Call
realize()once at the end - Morok optimizes and executes everything together
Next steps:
- Op Bestiary — Reference for IR operations
- Execution Pipeline — How compilation works
- Optimization System — Pattern-based rewrites