Chapter 2: Tensor

What is a Tensor, and why do we need one?

A numpy array is just data—numbers sitting in memory. It doesn't know where it came from or what created it.

A Tensor is a wrapper around a numpy array that also remembers:

  • what operation created it
  • which tensors it was created from
  • its gradient, once backpropagation computes one

Think of it like a package vs. a tracked package. A numpy array is the item. A Tensor is the item plus its shipping history—where it's been, what happened to it along the way.

This "history" is called the computation graph, and it's what makes automatic differentiation possible (which we'll cover in Chapter 3).

What is a Computation Graph?
A graph that shows:
  • Numbers (Tensors) as nodes
  • Operations (add, multiply, etc.) as nodes
  • Edges showing how data flows from inputs → operations → outputs
When you do c = a + b, the graph records: "c was created by adding a and b." Later, when computing gradients, we walk backwards through this graph.

Tensor Attributes

Every Tensor stores these attributes:

  • data: the actual numbers (always a numpy array)
  • dtype: the type of the elements in the array (see below)
  • grad: gradients computed during backpropagation (starts as None)
  • requires_grad: should we track this tensor for gradient computation?
  • _op: what operation created this tensor (None for tensors you create directly)
  • _inputs: what tensors were used to create this one (empty list for tensors you create directly)

What's dtype?

A numpy array is always an ndarray object, but the elements inside can be different numeric types:

np.array([1, 2, 3])                  # int64 (default for integers)
np.array([1.0, 2.0, 3.0])            # float64 (default for floats)
np.array([1, 2, 3], dtype="float32") # float32 (explicit)

Deep learning typically uses float32 because it's precise enough for neural network math, uses half the memory of float64, and GPUs are optimized for it. That's why our Tensor defaults to dtype="float32".
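You can see the memory difference directly with numpy's nbytes attribute:

```python
import numpy as np

x64 = np.array([1.0, 2.0, 3.0])                   # float64: 8 bytes per element
x32 = np.array([1.0, 2.0, 3.0], dtype="float32")  # float32: 4 bytes per element

print(x64.nbytes)  # 24
print(x32.nbytes)  # 12
```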

What are _op and _inputs?

The last two (_op and _inputs) are how we build the computation graph. When you do c = a + b, the resulting tensor c will have _op = "add" and _inputs = [a, b]. That's how it "remembers" its history.
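To make this concrete, here is a minimal sketch of how an operation could record its history. The standalone add function is illustrative only; the real operator methods arrive in Chapter 3:

```python
import numpy as np

class Tensor:
    def __init__(self, data):
        self.data = np.asarray(data, dtype="float32")
        self._op = None      # operation that produced this tensor
        self._inputs = []    # tensors it was produced from

def add(a, b):
    out = Tensor(a.data + b.data)
    out._op = "add"          # remember how out was made...
    out._inputs = [a, b]     # ...and from what
    return out

a, b = Tensor([1, 2]), Tensor([3, 4])
c = add(a, b)
print(c._op)  # add
```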

Exercise 2.1: The __init__ method

The Problem

Users will create Tensors in different ways:

Tensor([1, 2, 3])                    # from a Python list
Tensor(5)                            # from a single number
Tensor(np.array([1, 2, 3]))          # from a numpy array
Tensor(some_existing_tensor)         # from another Tensor

Your job: make all of these work. Internally, self.data should always end up as a numpy array—no matter what gets passed in.

Why does this matter?

The rest of your library assumes self.data is a numpy array. If it's sometimes a list, sometimes a Tensor, sometimes an array—everything breaks. So __init__ normalizes the input into a consistent format.

Thinking through it

You need to handle three cases:

  • A Tensor: extract its .data (a Tensor already contains a numpy array inside; unwrap it)
  • A numpy array: use it as-is (it's already what we want)
  • Anything else: convert with np.array() (see below)

The "anything else" case uses np.array(), which is quite flexible:

np.array([1, 2, 3])        # list → array([1, 2, 3])
np.array(5)                # scalar → array(5)
np.array((1, 2, 3))        # tuple → array([1, 2, 3])
np.array([[1, 2], [3, 4]]) # nested list → 2D array

So instead of checking for every possible type (list? tuple? int? float?), we just say "if it's not a Tensor or ndarray, hand it to numpy and let numpy figure it out." If someone passes something truly invalid (like a string), np.array() will either convert it to a string array or error—which is fine, because passing a string to a Tensor doesn't make sense anyway.

Starter Code

import numpy as np

class Tensor:
    def __init__(self, input, *, device=None, dtype="float32", requires_grad=True):
        """
        Create a new tensor.

        Args:
            input: Array-like input (list, numpy array, or another Tensor)
            device: Device placement (currently ignored, CPU only)
            dtype: Data type for the array
            requires_grad: Whether to track gradients for this tensor
        """
        # Step 1: Normalize the input to a numpy array

        if isinstance(input, Tensor):
            # Case 1: Unwrap the Tensor
            input = _____

        elif isinstance(input, np.ndarray):
            # Case 2: Already a numpy array
            _____

        else:
            # Case 3: Something else (list, scalar, etc.)
            input = _____

        # Store with the correct dtype
        self.data = input.astype(dtype)

        # Initialize remaining attributes
        self.requires_grad = _____
        self.grad = _____
        self._device = _____
        self._op = _____
        self._inputs = _____

Understanding isinstance()

isinstance(x, SomeType) returns True if x is an instance of SomeType (or of a subclass of it), and False otherwise. For example:

isinstance([1, 2, 3], list)                  # True
isinstance([1, 2, 3], np.ndarray)            # False
isinstance(np.array([1, 2, 3]), np.ndarray)  # True
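If you want to check your answer, here is one way the blanks could be filled in. This is a sketch, not the only valid solution:

```python
import numpy as np

class Tensor:
    def __init__(self, input, *, device=None, dtype="float32", requires_grad=True):
        if isinstance(input, Tensor):
            input = input.data        # Case 1: unwrap the inner numpy array
        elif isinstance(input, np.ndarray):
            pass                      # Case 2: already a numpy array
        else:
            input = np.array(input)   # Case 3: let numpy convert it

        self.data = input.astype(dtype)
        self.requires_grad = requires_grad
        self.grad = None
        self._device = device
        self._op = None
        self._inputs = []

print(Tensor([1, 2, 3]).data.dtype)   # float32
print(Tensor(5).data)                 # 5.0
print(Tensor(Tensor([1.0])).data)     # [1.]
```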

2.2 Data operations

Now that we have the basic Tensor class, let's extend it by adding a few simple methods that are helpful for working with tensors. For now, we'll focus on:

  • basic properties: shape, dtype, ndim, size, and device
  • numpy(): getting the data out as a plain numpy array
  • detach(): cloning a tensor off the computation graph

The properties are the simplest, so we'll start there.

class Tensor:
    @property
    def shape(self):
        """Shape of the tensor."""
        return self.data.shape

    @property
    def dtype(self):
        """Data type of the tensor."""
        return self.data.dtype

    @property
    def ndim(self):
        """Number of dimensions."""
        return self.data.ndim

    @property
    def size(self):
        """Total number of elements."""
        return self.data.size

    @property
    def device(self):
        """Device where tensor lives."""
        return self._device
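Each of these properties simply forwards to the matching attribute on the underlying numpy array, so you can sanity-check the expected values against numpy directly:

```python
import numpy as np

# Same values the Tensor properties would report for this data.
a = np.array([[1, 2], [3, 4]], dtype="float32")

print(a.shape)  # (2, 2)
print(a.ndim)   # 2
print(a.size)   # 4
print(a.dtype)  # float32
```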

Now that we have the basic properties, let's focus on getting the actual data stored in a Tensor.

We'll start with a simple method: numpy().

class Tensor:
    # existing code...

    def numpy(self):
        """
        Return the data as a NumPy array (detached from the computation graph).
        This returns a copy, so modifying the result will not affect
        the tensor's data.

        Examples:
            >>> x = Tensor([1, 2, 3])
            >>> y = x + 1   # y is still a Tensor, part of the graph
            >>> z = x.numpy() + 1  # z is a NumPy array, not part of the graph

        Returns:
            np.ndarray: A copy of the tensor's data as a NumPy array.
        """
        return self.data.copy()
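The .copy() is the important detail: if numpy() returned self.data directly, callers could mutate the tensor's internals by accident. A quick numpy-only demonstration of the difference:

```python
import numpy as np

data = np.array([1.0, 2.0], dtype="float32")

view = data          # no copy: both names share one buffer
copy = data.copy()   # independent buffer

view[0] = 99.0       # mutates the original array
copy[1] = 42.0       # leaves the original untouched

print(data)  # [99.  2.]
```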

Sometimes we want a clone of a Tensor that has the same data but is not connected to the computation graph.

This is useful when we want to inspect or manipulate the data without affecting the graph. For example, during training you might want to log the current loss value for monitoring, but you don't need gradients for that—you just want the number. Or you might want to "freeze" part of a model so gradients don't flow through it.

The detach() method creates a new Tensor with the same underlying data as the original Tensor.

class Tensor:
    def detach(self):
        """
        Creates a new Tensor with same data but no gradient tracking.
        Useful when you want to use values without building
        computation graph.

        Returns:
            Tensor: New tensor with requires_grad=False

        Example:
            >>> x = Tensor([1, 2, 3], requires_grad=True)
            >>> y = x.detach()  # y doesn't track gradients
            >>> z = y * 2       # This operation won't be in graph
        """
        return Tensor(self.data, requires_grad=False)

Note: detach() creates a Tensor with requires_grad=False. That means it won't participate in the computation graph.
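Putting it together with a stripped-down version of the class from Exercise 2.1 (only the pieces detach() needs):

```python
import numpy as np

class Tensor:
    def __init__(self, input, *, requires_grad=True):
        if isinstance(input, Tensor):
            input = input.data
        elif not isinstance(input, np.ndarray):
            input = np.array(input)
        self.data = input.astype("float32")  # astype copies the data
        self.requires_grad = requires_grad

    def detach(self):
        return Tensor(self.data, requires_grad=False)

x = Tensor([1, 2, 3], requires_grad=True)
y = x.detach()

print(y.requires_grad)  # False
y.data[0] = 99.0
print(x.data[0])        # 1.0  (astype copied, so x is unaffected)
```

Because __init__ runs astype (which copies by default), this detach gives the new tensor its own copy of the data. PyTorch's detach, by contrast, shares storage with the original tensor; either choice is defensible for a teaching library.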

Vocabulary

  • Tensor: a wrapper around a numpy array that also tracks its history (where it came from, what operations created it).
  • Gradient: a number that tells you how much an output changes when you nudge an input. Used to adjust model weights during training.
  • Computation graph (also called "autograd graph"): the record of operations that created a tensor. Like a receipt showing the chain of math that happened.
  • Autograd: short for "automatic gradient", the process of walking the computation graph backwards to compute gradients for you.
  • Forward pass: running your inputs through the model to get an output. This builds the computation graph.
  • Backward pass: walking the computation graph in reverse to compute gradients. This is what backward() does.
  • requires_grad: a flag that says "track this tensor in the computation graph so we can compute its gradient later."
  • detach(): create a copy of a tensor that is NOT tracked in the computation graph.

What's Next?

Right now, our Tensor is just a fancy wrapper around a numpy array. The magic happens in Chapter 3, where we'll implement backward() and make it so that when you do math with Tensors, they automatically track their history for computing gradients.

def backward(self):
    # Coming in Chapter 3!
    pass

Original: zekcrates/chapter1