Chapter 2: Tensor

What is a Tensor, and why do we need one?

A numpy array is just data—numbers sitting in memory. It doesn't know where it came from or what created it.

A Tensor is a wrapper around a numpy array that also remembers:

  • what operation created it
  • which tensors it was created from
  • its gradient, once backpropagation computes one

Think of it like a package vs. a tracked package. A numpy array is the item. A Tensor is the item plus its shipping history—where it's been, what happened to it along the way.

This "history" is called the computation graph, and it's what makes automatic differentiation possible (which we'll cover in Chapter 3).

What is a Computation Graph?
A graph that shows:
  • Numbers (Tensors) as nodes
  • Operations (add, multiply, etc.) as nodes
  • Edges showing how data flows from inputs → operations → outputs
When you do c = a + b, the graph records: "c was created by adding a and b." Later, when computing gradients, we walk backwards through this graph.

Tensor Attributes

Every Tensor stores these attributes:

  • data: the actual numbers (always a numpy array)
  • dtype: the type of the elements in the array (see below)
  • grad: gradients computed during backpropagation (starts as None)
  • requires_grad: should we track this tensor for gradient computation?
  • _op: what operation created this tensor (None for tensors you create directly)
  • _inputs: what tensors were used to create this one (empty list for tensors you create directly)

What's dtype?

A numpy array is always an ndarray object, but the elements inside can be different numeric types:

np.array([1, 2, 3])                  # int64 (default for integers)
np.array([1.0, 2.0, 3.0])            # float64 (default for floats)
np.array([1, 2, 3], dtype="float32") # float32 (explicit)

Deep learning typically uses float32 because it's precise enough for neural network math, uses half the memory of float64, and GPUs are optimized for it. That's why our Tensor defaults to dtype="float32".
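You can see the memory difference directly with numpy's nbytes attribute:

```python
import numpy as np

x64 = np.array([1.0, 2.0, 3.0])                   # float64: 8 bytes per element
x32 = np.array([1.0, 2.0, 3.0], dtype="float32")  # float32: 4 bytes per element

print(x64.nbytes)  # 24
print(x32.nbytes)  # 12
```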

What are _op and _inputs?

The last two (_op and _inputs) are how we build the computation graph. When you do c = a + b, the resulting tensor c will have _op = "add" and _inputs = [a, b]. That's how it "remembers" its history.
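To make this concrete, here is a minimal sketch of how an operation could record its history. The standalone add function is illustrative only; the real operator methods arrive in Chapter 3:

```python
import numpy as np

class Tensor:
    def __init__(self, data):
        self.data = np.asarray(data, dtype="float32")
        self._op = None      # operation that produced this tensor
        self._inputs = []    # tensors it was produced from

def add(a, b):
    out = Tensor(a.data + b.data)
    out._op = "add"          # remember how out was made...
    out._inputs = [a, b]     # ...and from what
    return out

a, b = Tensor([1, 2]), Tensor([3, 4])
c = add(a, b)
print(c._op)  # add
```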

Exercise 2.1: The __init__ method

The Problem

Users will create Tensors in different ways:

Tensor([1, 2, 3])                    # from a Python list
Tensor(5)                            # from a single number
Tensor(np.array([1, 2, 3]))          # from a numpy array
Tensor(some_existing_tensor)         # from another Tensor

Your job: make all of these work. Internally, self.data should always end up as a numpy array—no matter what gets passed in.

Why does this matter?

The rest of your library assumes self.data is a numpy array. If it's sometimes a list, sometimes a Tensor, sometimes an array—everything breaks. So __init__ normalizes the input into a consistent format.

Thinking through it

You need to handle three cases:

  • A Tensor: extract its .data (a Tensor already contains a numpy array inside; unwrap it)
  • A numpy array: use it as-is (it's already what we want)
  • Anything else: convert with np.array() (see below)

The "anything else" case uses np.array(), which is quite flexible:

np.array([1, 2, 3])        # list → array([1, 2, 3])
np.array(5)                # scalar → array(5)
np.array((1, 2, 3))        # tuple → array([1, 2, 3])
np.array([[1, 2], [3, 4]]) # nested list → 2D array

So instead of checking for every possible type (list? tuple? int? float?), we just say "if it's not a Tensor or ndarray, hand it to numpy and let numpy figure it out." If someone passes something truly invalid (like a string), np.array() will either convert it to a string array or error—which is fine, because passing a string to a Tensor doesn't make sense anyway.

Starter Code

import numpy as np

class Tensor:
    def __init__(self, input, *, device=None, dtype="float32", requires_grad=True):
        """
        Create a new tensor.

        Args:
            input: Array-like input (list, numpy array, or another Tensor)
            device: Device placement (currently ignored, CPU only)
            dtype: Data type for the array
            requires_grad: Whether to track gradients for this tensor
        """
        # Step 1: Normalize the input to a numpy array

        if isinstance(input, Tensor):
            # Case 1: Unwrap the Tensor
            input = _____

        elif isinstance(input, np.ndarray):
            # Case 2: Already a numpy array
            _____

        else:
            # Case 3: Something else (list, scalar, etc.)
            input = _____

        # Store with the correct dtype
        self.data = input.astype(dtype)

        # Initialize remaining attributes
        self.requires_grad = _____
        self.grad = _____
        self._device = _____
        self._op = _____
        self._inputs = _____

Understanding isinstance()

isinstance(x, SomeType) returns True if x is an instance of SomeType (or of a subclass of it), and False otherwise. For example:

isinstance([1, 2, 3], list)                  # True
isinstance([1, 2, 3], np.ndarray)            # False
isinstance(np.array([1, 2, 3]), np.ndarray)  # True
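If you want to check your answer, here is one way the blanks could be filled in. This is a sketch, not the only valid solution:

```python
import numpy as np

class Tensor:
    def __init__(self, input, *, device=None, dtype="float32", requires_grad=True):
        if isinstance(input, Tensor):
            input = input.data        # Case 1: unwrap the inner numpy array
        elif isinstance(input, np.ndarray):
            pass                      # Case 2: already a numpy array
        else:
            input = np.array(input)   # Case 3: let numpy convert it

        self.data = input.astype(dtype)
        self.requires_grad = requires_grad
        self.grad = None
        self._device = device
        self._op = None
        self._inputs = []

print(Tensor([1, 2, 3]).data.dtype)   # float32
print(Tensor(5).data)                 # 5.0
print(Tensor(Tensor([1.0])).data)     # [1.]
```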

2.2 Data operations

Now that we have the basic Tensor class, let's extend it by adding a few simple methods that are helpful for working with tensors. For now, we'll focus on:

  • basic properties: shape, dtype, ndim, size, and device
  • numpy(): getting the data out as a plain numpy array
  • detach(): cloning a tensor off the computation graph

The properties are the simplest, so we'll start there.

class Tensor:
    @property
    def shape(self):
        """Shape of the tensor."""
        return self.data.shape

    @property
    def dtype(self):
        """Data type of the tensor."""
        return self.data.dtype

    @property
    def ndim(self):
        """Number of dimensions."""
        return self.data.ndim

    @property
    def size(self):
        """Total number of elements."""
        return self.data.size

    @property
    def device(self):
        """Device where tensor lives."""
        return self._device
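Each of these properties simply forwards to the matching attribute on the underlying numpy array, so you can sanity-check the expected values against numpy directly:

```python
import numpy as np

# Same values the Tensor properties would report for this data.
a = np.array([[1, 2], [3, 4]], dtype="float32")

print(a.shape)  # (2, 2)
print(a.ndim)   # 2
print(a.size)   # 4
print(a.dtype)  # float32
```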

Now that we have the basic properties, let's focus on getting the actual data stored in a Tensor.

We'll start with a simple method: numpy().

class Tensor:
    # existing code...

    def numpy(self):
        """
        Return the data as a NumPy array (detached from the computation graph).
        This returns a copy, so modifying the result will not affect
        the tensor's data.

        Examples:
            >>> x = Tensor([1, 2, 3])
            >>> y = x + 1   # y is still a Tensor, part of the graph
            >>> z = x.numpy() + 1  # z is a NumPy array, not part of the graph

        Returns:
            np.ndarray: A copy of the tensor's data as a NumPy array.
        """
        return self.data.copy()
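The .copy() is the important detail: if numpy() returned self.data directly, callers could mutate the tensor's internals by accident. A quick numpy-only demonstration of the difference:

```python
import numpy as np

data = np.array([1.0, 2.0], dtype="float32")

view = data          # no copy: both names share one buffer
copy = data.copy()   # independent buffer

view[0] = 99.0       # mutates the original array
copy[1] = 42.0       # leaves the original untouched

print(data)  # [99.  2.]
```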

Sometimes we want a clone of a Tensor that has the same data but is not connected to the computation graph.

This is useful when we want to inspect or manipulate the data without affecting the graph. For example, during training you might want to log the current loss value for monitoring, but you don't need gradients for that—you just want the number. Or you might want to "freeze" part of a model so gradients don't flow through it.

The detach() method creates a new Tensor with the same underlying data as the original Tensor.

class Tensor:
    def detach(self):
        """
        Creates a new Tensor with same data but no gradient tracking.
        Useful when you want to use values without building
        computation graph.

        Returns:
            Tensor: New tensor with requires_grad=False

        Example:
            >>> x = Tensor([1, 2, 3], requires_grad=True)
            >>> y = x.detach()  # y doesn't track gradients
            >>> z = y * 2       # This operation won't be in graph
        """
        return Tensor(self.data, requires_grad=False)

Note: detach() creates a Tensor with requires_grad=False. That means it won't participate in the computation graph.
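Putting it together with a stripped-down version of the class from Exercise 2.1 (only the pieces detach() needs):

```python
import numpy as np

class Tensor:
    def __init__(self, input, *, requires_grad=True):
        if isinstance(input, Tensor):
            input = input.data
        elif not isinstance(input, np.ndarray):
            input = np.array(input)
        self.data = input.astype("float32")  # astype copies the data
        self.requires_grad = requires_grad

    def detach(self):
        return Tensor(self.data, requires_grad=False)

x = Tensor([1, 2, 3], requires_grad=True)
y = x.detach()

print(y.requires_grad)  # False
y.data[0] = 99.0
print(x.data[0])        # 1.0  (astype copied, so x is unaffected)
```

Because __init__ runs astype (which copies by default), this detach gives the new tensor its own copy of the data. PyTorch's detach, by contrast, shares storage with the original tensor; either choice is defensible for a teaching library.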

Vocabulary

  • Tensor: a wrapper around a numpy array that also tracks its history (where it came from, what operations created it).
  • Gradient: a number that tells you how much an output changes when you nudge an input. Used to adjust model weights during training.
  • Computation graph (also called "autograd graph"): the record of operations that created a tensor. Like a receipt showing the chain of math that happened.
  • Autograd: short for "automatic gradient", the process of walking the computation graph backwards to compute gradients for you.
  • Forward pass: running your inputs through the model to get an output. This builds the computation graph.
  • Backward pass: walking the computation graph in reverse to compute gradients. This is what backward() does.
  • requires_grad: a flag that says "track this tensor in the computation graph so we can compute its gradient later."
  • detach(): create a copy of a tensor that is NOT tracked in the computation graph.

What's Next?

Right now, our Tensor is just a fancy wrapper around a numpy array. The magic happens in Chapter 3, where we'll implement backward() and make it so that when you do math with Tensors, they automatically track their history for computing gradients.

def backward(self):
    # Coming in Chapter 3!
    pass

Original: zekcrates/chapter1