Chapter 2: Tensor
What is a Tensor, and why do we need one?
A numpy array is just data—numbers sitting in memory. It doesn't know where it came from or what created it.
A Tensor is a wrapper around a numpy array that also remembers:
- Where did this data come from?
- What math operation created it?
- Should we track gradients for backpropagation?
Think of it like a package vs. a tracked package. A numpy array is the item. A Tensor is the item plus its shipping history—where it's been, what happened to it along the way.
This "history" is called the computation graph, and it's what makes automatic differentiation possible (which we'll cover in Chapter 3).
What's a computation graph?
A graph that shows:
- Numbers (Tensors) as nodes
- Operations (add, multiply, etc.) as nodes
- Edges showing how data flows from inputs → operations → outputs
When you write c = a + b, the graph records: "c was created by adding a and b." Later, when computing gradients, we walk backwards through this graph.
Tensor Attributes
Every Tensor stores these attributes:
| Attribute | What it is |
|---|---|
| data | The actual numbers (always a numpy array) |
| dtype | The type of the elements in the array (see below) |
| grad | Gradients computed during backpropagation (starts as None) |
| requires_grad | Should we track this tensor for gradient computation? |
| _op | What operation created this tensor (None for tensors you create directly) |
| _inputs | What tensors were used to create this one (empty list for tensors you create directly) |
What's dtype?
A numpy array is always an ndarray object, but the elements inside can be different numeric types:
np.array([1, 2, 3]) # int64 (default for integers)
np.array([1.0, 2.0, 3.0]) # float64 (default for floats)
np.array([1, 2, 3], dtype="float32") # float32 (explicit)
Deep learning typically uses float32 because it's precise enough for neural network math, uses half the memory of float64, and GPUs are optimized for it. That's why our Tensor defaults to dtype="float32".
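You can check the memory difference yourself with plain NumPy (nothing Tensor-specific here):
import numpy as np

x64 = np.array([1.0, 2.0, 3.0])                   # float64 by default
x32 = np.array([1.0, 2.0, 3.0], dtype="float32")  # explicit float32

print(x64.dtype, x64.nbytes)  # float64 24  -> 8 bytes per element
print(x32.dtype, x32.nbytes)  # float32 12  -> 4 bytes per element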
What's _op and _inputs?
The last two (_op and _inputs) are how we build the computation graph. When you do c = a + b, the resulting tensor c will have _op = "add" and _inputs = [a, b]. That's how it "remembers" its history.
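Once the arithmetic operators are wired up to build the graph (that happens in Chapter 3), you can expect behavior roughly like this sketch:
a = Tensor([1.0, 2.0])
b = Tensor([3.0, 4.0])
c = a + b                # assumes __add__ records the graph (Chapter 3)

print(c._op)             # "add"
print(c._inputs)         # [a, b]
print(a._op, a._inputs)  # None [] -- a was created directly, not by an operation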
Exercise 2.1: The __init__ method
The Problem
Users will create Tensors in different ways:
Tensor([1, 2, 3]) # from a Python list
Tensor(5) # from a single number
Tensor(np.array([1, 2, 3])) # from a numpy array
Tensor(some_existing_tensor) # from another Tensor
Your job: make all of these work. Internally, self.data should always end up as a numpy array—no matter what gets passed in.
Why does this matter?
The rest of your library assumes self.data is a numpy array. If it's sometimes a list, sometimes a Tensor, sometimes an array—everything breaks. So __init__ normalizes the input into a consistent format.
Thinking through it
You need to handle three cases:
| What's passed in | What to do | Why |
|---|---|---|
| A Tensor | Extract its .data | A Tensor contains a numpy array inside—unwrap it |
| A numpy array | Use it as-is | It's already what we want |
| Anything else | Convert with np.array() | See below |
The "anything else" case uses np.array(), which is quite flexible:
np.array([1, 2, 3]) # list → array([1, 2, 3])
np.array(5) # scalar → array(5)
np.array((1, 2, 3)) # tuple → array([1, 2, 3])
np.array([[1, 2], [3, 4]]) # nested list → 2D array
So instead of checking for every possible type (list? tuple? int? float?), we just say "if it's not a Tensor or ndarray, hand it to numpy and let numpy figure it out." If someone passes something truly invalid (like a string), np.array() will either convert it to a string array or error—which is fine, because passing a string to a Tensor doesn't make sense anyway.
Starter Code
import numpy as np
class Tensor:
def __init__(self, input, *, device=None, dtype="float32", requires_grad=True):
"""
Create a new tensor.
Args:
input: Array-like input (list, numpy array, or another Tensor)
device: Device placement (currently ignored, CPU only)
dtype: Data type for the array
requires_grad: Whether to track gradients for this tensor
"""
# Step 1: Normalize the input to a numpy array
if isinstance(input, Tensor):
# Case 1: Unwrap the Tensor
input = _____
elif isinstance(input, np.ndarray):
# Case 2: Already a numpy array
_____
else:
# Case 3: Something else (list, scalar, etc.)
input = _____
# Store with the correct dtype
self.data = input.astype(dtype)
# Initialize remaining attributes
self.requires_grad = _____
self.grad = _____
self._device = _____
self._op = _____
self._inputs = _____
What's isinstance()?
isinstance(x, SomeType) returns True if x is of type SomeType, otherwise False. For example:
isinstance([1, 2, 3], list) # True
isinstance([1, 2, 3], np.ndarray) # False
isinstance(np.array([1,2,3]), np.ndarray) # True
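If you want to check your work, here is one way the blanks could be filled in. Treat it as a sketch of the idea, not the only correct answer:
import numpy as np

class Tensor:
    def __init__(self, input, *, device=None, dtype="float32", requires_grad=True):
        # Step 1: Normalize the input to a numpy array
        if isinstance(input, Tensor):
            # Case 1: Unwrap the Tensor -- grab the numpy array inside it
            input = input.data
        elif isinstance(input, np.ndarray):
            # Case 2: Already a numpy array -- nothing to do
            pass
        else:
            # Case 3: Something else (list, scalar, tuple, ...) -- let numpy convert it
            input = np.array(input)

        # Store with the correct dtype
        self.data = input.astype(dtype)

        # Initialize remaining attributes
        self.requires_grad = requires_grad
        self.grad = None        # no gradients until backpropagation runs
        self._device = device   # currently ignored, CPU only
        self._op = None         # no operation created this tensor
        self._inputs = []       # no parent tensors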
2.2 Data operations
Now that we have the basic Tensor class, let's extend it by adding a few simple methods that are helpful for working with tensors. For now, we'll focus on:
- shape
- dtype
- device
- ndim
- size
These are simple properties of the Tensor class.
class Tensor:
@property
def shape(self):
"""Shape of the tensor."""
return self.data.shape
@property
def dtype(self):
"""Data type of the tensor."""
return self.data.dtype
@property
def ndim(self):
"""Number of dimensions."""
return self.data.ndim
@property
def size(self):
"""Total number of elements."""
return self.data.size
@property
def device(self):
"""Device where tensor lives."""
return self._device
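A quick usage sketch (assuming the __init__ from Exercise 2.1 is filled in):
x = Tensor([[1, 2, 3], [4, 5, 6]])

print(x.shape)   # (2, 3)
print(x.dtype)   # float32 -- we cast in __init__
print(x.ndim)    # 2
print(x.size)    # 6
print(x.device)  # None -- device is currently ignored, CPU only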
Now that we have the basic properties, let's focus on getting the actual data stored in a Tensor.
We'll start with a simple method: numpy().
- The purpose of numpy() is to extract the raw NumPy array from a Tensor.
- This is useful when you want to inspect, visualize, or use the data with other Python libraries without touching anything in the computation graph.
class Tensor:
# existing code...
def numpy(self):
"""
Return the data as a NumPy array (detached from the computation graph).
This returns a copy, so modifying the result will not affect
the tensor's data.
Examples:
>>> x = Tensor([1, 2, 3])
>>> y = x + 1 # y is still a Tensor, part of the graph
>>> z = x.numpy() + 1 # z is a NumPy array, not part of the graph
Returns:
np.ndarray: A copy of the tensor's data as a NumPy array.
"""
return self.data.copy()
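A small sketch of the copy behavior:
x = Tensor([1, 2, 3])

arr = x.numpy()   # a plain np.ndarray, detached from the graph
arr[0] = 99       # modify the copy...

print(arr)        # [99.  2.  3.]
print(x.data)     # [1. 2. 3.] -- the tensor's data is untouched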
Sometimes we want a clone of a Tensor that has the same data but is not connected to the computation graph.
This is useful when we want to inspect or manipulate the data without affecting the graph. For example, during training you might want to log the current loss value for monitoring, but you don't need gradients for that—you just want the number. Or you might want to "freeze" part of a model so gradients don't flow through it.
The detach() method creates a new Tensor with the same data as the original, but detached from the computation graph.
class Tensor:
def detach(self):
"""
Creates a new Tensor with same data but no gradient tracking.
Useful when you want to use values without building
computation graph.
Returns:
Tensor: New tensor with requires_grad=False
Example:
>>> x = Tensor([1, 2, 3], requires_grad=True)
>>> y = x.detach() # y doesn't track gradients
>>> z = y * 2 # This operation won't be in graph
"""
return Tensor(self.data, requires_grad=False)
detach() creates a Tensor with requires_grad=False. That means it won't participate in the computation graph.
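The difference from numpy() is that detach() still gives you a Tensor, just one that opts out of gradient tracking. A short sketch:
x = Tensor([1, 2, 3], requires_grad=True)

y = x.detach()
print(isinstance(y, Tensor))   # True -- still a Tensor
print(y.requires_grad)         # False -- but gradients won't be tracked

z = x.numpy()
print(isinstance(z, Tensor))   # False -- just a plain numpy array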
Vocabulary
| Term | What it means |
|---|---|
| Tensor | A wrapper around a numpy array that also tracks its history (where it came from, what operations created it). |
| Gradient | A number that tells you how much an output changes when you nudge an input. Used to adjust model weights during training. |
| Computation graph (also called "autograd graph") | The record of operations that created a tensor. Like a receipt showing the chain of math that happened. |
| Autograd | "Automatic gradient" — the process of walking the computation graph backwards to compute gradients for you. |
| Forward pass | Running your inputs through the model to get an output. This builds the computation graph. |
| Backward pass | Walking the computation graph in reverse to compute gradients. This is what backward() does. |
| requires_grad | A flag that says "track this tensor in the computation graph so we can compute its gradient later." |
| detach() | Create a copy of a tensor that is NOT tracked in the computation graph. |
What's Next?
Right now, our Tensor is just a fancy wrapper around a numpy array. The magic happens in Chapter 3, where we'll implement backward() and make it so that when you do math with Tensors, they automatically track their history for computing gradients.
def backward(self):
# Coming in Chapter 3!
pass
Original: zekcrates/chapter1 (revised for clarity)