Open this tutorial in a Jupyter `notebook`_.

.. _notebook: https://github.com/arthurdjn/nets/blob/master/0_Getting_Started_with_NETS.ipynb


======
Tensor
======


Definition
==========


A tensor is a multi-dimensional array, similar to a NumPy array.
What sets tensors apart is their ability to store previous gradients and operations.
Their architecture is heavily inspired by the *PyTorch* documentation.

In *PyTorch*, you create tensors with the built-in ``tensor`` function or by using the ``Tensor`` class directly (although instantiating a tensor this way is not recommended).
*NETS* does not (yet) have such built-in functions, so you will need to use the ``Tensor`` class to create a tensor.


Tensors support basic mathematical operations.
All of these operations are defined in the ``nets/ops.py`` module. Here is a list of some of the operations currently supported:

- ``add``, which adds two tensors (broadcasting is supported),
- ``subtract``, which subtracts two tensors (broadcasting is supported),
- ``multiply``, which multiplies two tensors of the same shape element-wise,
- ``dot``, which computes the dot product ``@`` of two matrices,
- ``exp``, which computes the element-wise exponential of a tensor,
- ``sum``, which sums up all elements in a tensor,
- ``transpose``, which transposes a tensor.


.. code-block:: python

    import nets

    t1 = nets.Tensor([[1, 3],
                      [5, 7]])
    t2 = nets.Tensor([[2, 4],
                      [6, 8]])

    print(f"t1 =\n{t1}")
    print(f"t2 =\n{t2}")

    print("\nSome basic operations")
    print(f"t1 + t2: \n{t1 + t2}")
    print(f"t1 - t2: \n{t1 - t2}")
    print(f"t1 * t2: \n{t1 * t2}")
    print(f"t1 @ t2: \n{t1 @ t2}")
    print(f"t1 ** 2: \n{t1 ** 2}")
    print(f"t1 / t2: \n{t1 / t2}")
    print(f"t1 / 10: \n{t1 / 10}")


.. rst-class:: sphx-glr-script-out

     Out:

    .. code-block:: none

        t1 =
        Tensor([[1, 3],
                [5, 7]])
        t2 =
        Tensor([[2, 4],
                [6, 8]])

        Some basic operations
        t1 + t2:
        Tensor([[ 3,  7],
                [11, 15]])
        t1 - t2:
        Tensor([[-1, -1],
                [-1, -1]])
        t1 * t2:
        Tensor([[ 2, 12],
                [30, 56]])
        t1 @ t2:
        Tensor([[20, 28],
                [52, 76]])
        t1 ** 2:
        Tensor([[ 1,  9],
                [25, 49]])
        t1 / t2:
        Tensor([[0.5000, 0.7500],
                [0.8333, 0.8750]])
        t1 / 10:
        Tensor([[0.1, 0.3],
                [0.5, 0.7]])

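To make the difference between ``multiply`` (``*``) and ``dot`` (``@``) concrete, the sketch below recomputes both results of the example in plain Python, without *NETS*. The helper names ``elementwise_mul`` and ``matmul`` are hypothetical and only serve this illustration; they are not part of the library.

.. code-block:: python

    t1 = [[1, 3],
          [5, 7]]
    t2 = [[2, 4],
          [6, 8]]

    def elementwise_mul(a, b):
        """Multiply two same-shape matrices entry by entry (what ``*`` does)."""
        return [[x * y for x, y in zip(row_a, row_b)]
                for row_a, row_b in zip(a, b)]

    def matmul(a, b):
        """Matrix product: row i of ``a`` dotted with column j of ``b`` (what ``@`` does)."""
        cols_b = list(zip(*b))
        return [[sum(x * y for x, y in zip(row, col)) for col in cols_b]
                for row in a]

    product = elementwise_mul(t1, t2)
    matrix_product = matmul(t1, t2)
    print(product)
    print(matrix_product)

The two results differ because ``*`` pairs entries at the same position, whereas ``@`` combines a whole row of the first matrix with a whole column of the second.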


Gradients
=========

*NETS* uses a custom *autograd* system, built with NumPy.
Some vanilla architectures, like ``DNN`` networks, do not depend on this functionality, however.
All of this will be detailed in the section on models.

As you may have seen, a tensor has a ``requires_grad`` attribute, set to ``False`` by default when the tensor is created.
This attribute works similarly to its *PyTorch* counterpart. If set to ``True``,
previous gradients will be registered and saved in this tensor, in the ``_hooks`` attribute.
This attribute is basically a list containing all previous gradients.
That is, when calling the ``backward`` method on this tensor with an upstream gradient,
the gradient will propagate through all previous gradients.

Let's see some basic examples:

.. code-block:: python

    t1 = nets.Tensor([1, 3], requires_grad=True)
    t2 = nets.Tensor([2, 4], requires_grad=True)

    # Some operations
    t3 = t1 + t2 + 4
    t4 = t3 * t2

    print(f"t1 =\n{t1}")
    print(f"t2 =\n{t2}")

    print("\nOperation:")
    print("t3 = t1 + t2 + 4")
    print("t4 = t3 * t2")

    print(f"\nt3 =\n{t3}")
    print(f"t4 =\n{t4}")

    print("\nBefore backpropagation")
    print(f"t1 gradient: {t1.grad}")
    print(f"t2 gradient: {t2.grad}")
    print(f"t3 gradient: {t3.grad}")
    print(f"t4 gradient: {t4.grad}")

    # Upstream gradient
    grad = nets.Tensor([-1, 2])

    # Back-propagation
    t4.backward(grad)

    print("\nAfter backpropagation")
    print(f"t1 gradient: {t1.grad}")
    print(f"t2 gradient: {t2.grad}")
    print(f"t3 gradient: {t3.grad}")
    print(f"t4 gradient: {t4.grad}")




.. rst-class:: sphx-glr-script-out

     Out:

    .. code-block:: none

        t1 =
        Tensor([1, 3], requires_grad=True)
        t2 =
        Tensor([2, 4], requires_grad=True)

        Operation:
        t3 = t1 + t2 + 4
        t4 = t3 * t2

        t3 =
        Tensor([ 7, 11], requires_grad=True)
        t4 =
        Tensor([14, 44], requires_grad=True)

        Before backpropagation
        t1 gradient: Tensor([0., 0.])
        t2 gradient: Tensor([0., 0.])
        t3 gradient: Tensor([0., 0.])
        t4 gradient: Tensor([0., 0.])

        After backpropagation
        t1 gradient: Tensor([-2.,  8.])
        t2 gradient: Tensor([-9., 30.])
        t3 gradient: Tensor([-2.,  8.])
        t4 gradient: Tensor([-1.,  2.])

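The mechanism can be illustrated with a toy, scalar-only sketch in plain Python. The ``Scalar`` class below is hypothetical and is not the *NETS* ``Tensor``; in particular, storing each hook as a ``(parent, grad_fn)`` pair is an assumption made for this illustration. It reproduces the first component of the example above (``t1 = 1``, ``t2 = 2``, upstream gradient ``-1``).

.. code-block:: python

    class Scalar:
        """Toy scalar with reverse-mode autodiff (hypothetical, not the NETS Tensor)."""

        def __init__(self, data, hooks=()):
            self.data = float(data)
            self.grad = 0.0
            # Each hook is a pair (parent, grad_fn): grad_fn maps the gradient
            # received by this node to the gradient sent to the parent.
            self.hooks = list(hooks)

        def __add__(self, other):
            other = other if isinstance(other, Scalar) else Scalar(other)
            return Scalar(self.data + other.data,
                          hooks=[(self, lambda g: g), (other, lambda g: g)])

        def __mul__(self, other):
            other = other if isinstance(other, Scalar) else Scalar(other)
            return Scalar(self.data * other.data,
                          hooks=[(self, lambda g: g * other.data),
                                 (other, lambda g: g * self.data)])

        def backward(self, grad=1.0):
            # Accumulate the upstream gradient, then pass it on to the parents.
            self.grad += grad
            for parent, grad_fn in self.hooks:
                parent.backward(grad_fn(grad))


    t1, t2 = Scalar(1), Scalar(2)
    t3 = t1 + t2 + 4
    t4 = t3 * t2
    t4.backward(-1.0)
    print(t1.grad, t2.grad, t3.grad, t4.grad)

Note that ``t2`` receives a gradient from two paths (through ``t3`` and directly through the product), which is why its gradient is the sum of both contributions.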

Autograd
========

The autograd system dynamically builds a computational graph, which is used to update deep learning models.


The backward pass computes the gradients using the computational graph created from all previous operations.
This backward pass can be called using the ``backward`` method of a ``Module``.
In this case, you will need to write the gradient computation on your own, as the gradient equations depend on the model.
This *vanilla* back-propagation is implemented for standard models, like ``DNN``, ``CNN`` and ``RNN``.
However, you won't be able to mix them using this technique unless you change some parts of the ``backward`` method.

Alternatively, you can use the *autograd* system. As a ``Tensor`` keeps track of the computational graph in its ``hooks``,
it is easier to compute the back-propagation. The back-propagation is decomposed into elementary operations (+, -, /, \*, exp)
and the gradient given by a ``Loss`` function is then propagated through the computational graph.

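As a worked illustration of this decomposition, consider the operations :math:`t_3 = t_1 + t_2 + 4` and :math:`t_4 = t_3 \odot t_2` from the previous section. The symbols used here are introduced for this sketch only: :math:`\odot` denotes the element-wise product, and :math:`L` stands for whatever scalar the upstream gradient :math:`g = \partial L / \partial t_4` comes from. Applying the chain rule one operation at a time gives:

.. math::

    \frac{\partial L}{\partial t_3} = g \odot t_2, \qquad
    \frac{\partial L}{\partial t_1} = \frac{\partial L}{\partial t_3}, \qquad
    \frac{\partial L}{\partial t_2} = g \odot t_3 + \frac{\partial L}{\partial t_3}

With :math:`t_1 = (1, 3)`, :math:`t_2 = (2, 4)`, :math:`t_3 = (7, 11)` and :math:`g = (-1, 2)`, this yields :math:`(-2, 8)` for :math:`t_1` and :math:`t_3`, and :math:`(-7, 22) + (-2, 8) = (-9, 30)` for :math:`t_2`, which matches the output printed in that section.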

Example:

.. code-block:: python

    t1 = nets.Tensor([[1., 3.],
                      [5., 7.]], requires_grad=True)
    t2 = nets.Tensor([[2., 4.],
                      [6., 8.]], requires_grad=True)

    # Some operations
    t3 = nets.tanh(t1 * t2)

    print(f"t1 =\n{t1}")
    print(f"t2 =\n{t2}")

    print("\nBefore backpropagation")
    print(f"t1 gradient:\n{t1.grad}")
    print(f"t2 gradient:\n{t2.grad}")
    print('t3 = tanh(t1 * t2)')
    print(f"t3 gradient:\n{t3.grad}")

    # Upstream gradient
    grad = nets.Tensor([[-1., 2.],
                        [-3., 4.]])

    # Back-propagation
    t3.sum(axis=1).backward(grad[0])

    print("\nAfter backpropagation")
    print(f"t1 gradient:\n{t1.grad}")
    print(f"t2 gradient:\n{t2.grad}")
    print('t3 = tanh(t1 * t2)')
    print(f"t3 gradient:\n{t3.grad}")



.. rst-class:: sphx-glr-script-out

     Out:

    .. code-block:: none

        t1 =
        Tensor([[1., 3.],
                [5., 7.]], requires_grad=True)
        t2 =
        Tensor([[2., 4.],
                [6., 8.]], requires_grad=True)

        Before backpropagation
        t1 gradient:
        Tensor([[0., 0.],
                [0., 0.]])
        t2 gradient:
        Tensor([[0., 0.],
                [0., 0.]])
        t3 = tanh(t1 * t2)
        t3 gradient:
        Tensor([[0., 0.],
                [0., 0.]])

        After backpropagation
        t1 gradient:
        Tensor([[-1.4130e-01, -6.0402e-10],
                [ 0.0000e+00,  0.0000e+00]])
        t2 gradient:
        Tensor([[-7.0651e-02, -4.5302e-10],
                [ 0.0000e+00,  0.0000e+00]])
        t3 = tanh(t1 * t2)
        t3 gradient:
        Tensor([[-1., -1.],
                [ 2.,  2.]])

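The values above can be checked by hand. For ``t3 = tanh(t1 * t2)``, the chain rule gives a gradient of ``g * (1 - tanh(t1 * t2) ** 2) * t2`` for ``t1`` and ``g * (1 - tanh(t1 * t2) ** 2) * t1`` for ``t2``, where ``g`` is the gradient received by ``t3``. The plain-Python sketch below uses only the standard ``math`` module; the helper ``tanh_mul_grads`` is a hypothetical name for this illustration, not a *NETS* function.

.. code-block:: python

    import math

    def tanh_mul_grads(a, b, upstream):
        """Gradients of tanh(a * b) with respect to a and b, for scalars."""
        local = 1.0 - math.tanh(a * b) ** 2   # derivative of tanh at a * b
        return upstream * local * b, upstream * local * a

    # Entry [0, 0]: t1 = 1, t2 = 2, gradient received by t3 is -1
    grad_t1_00, grad_t2_00 = tanh_mul_grads(1.0, 2.0, -1.0)
    print(f"{grad_t1_00:.4e} {grad_t2_00:.4e}")

    # Entry [1, 0]: t1 = 5, t2 = 6, gradient received by t3 is 2
    grad_t1_10, grad_t2_10 = tanh_mul_grads(5.0, 6.0, 2.0)
    print(grad_t1_10, grad_t2_10)

The second row of both gradients is exactly zero even though ``t3`` receives a non-zero gradient there: for inputs as large as 30 or 56, ``tanh`` saturates to 1 in floating point, so its local derivative vanishes.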

Example
=======

With the backward pass, you can minimize any basic function (that is, any function that can be decomposed into elementary operations).
As the *autograd* system records the gradients, you can determine the slope at the current point and adjust your inputs in the opposite direction, scaled by a coefficient called the learning rate :math:`l_r`.

This is a lot of information, so let's visualize how gradients can be used to minimize a function.

The function we will try to minimize is the so-called *three-hump camel* function:

.. math::

    \forall x_1, x_2 \in \mathbb{R}, \quad f(x_1, x_2) = 2 x_1^2 - 1.05 x_1^4 + \frac{x_1^6}{6} + x_1 x_2 + x_2^2


.. code-block:: python

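    import matplotlib.pyplot as plt
    import numpy as np
    # Only needed on older Matplotlib versions, where it registers the 3D projection
    from mpl_toolkits.mplot3d import Axes3D  # noqa: F401

    fig = plt.figure()
    # fig.gca(projection='3d') no longer works on recent Matplotlib releases
    ax = fig.add_subplot(projection='3d')

    # Make data
    def camel(x, y):
        """Three-hump camel function."""
        return 2*x**2 - 1.05*x**4 + (x**6)/6 + x*y + y**2

    X = np.linspace(-1, 1, 100)
    Y = np.linspace(-1, 1, 100)
    X, Y = np.meshgrid(X, Y)
    Z = camel(X, Y)

    # Plot the surface
    ax.view_init(30, 105)
    ax.plot_surface(X, Y, Z, cmap='coolwarm', linewidth=0)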


.. image:: images/camelfunc.png

Let's minimize this function with *NETS*:


.. code-block:: python

    import random as rd
    from nets.optim import SGD

    # Random x points
    data = rd.uniform(-1, 1)
    xn = nets.Parameter(data)
    # Random y points
    data = rd.uniform(-1, 1)
    yn = nets.Parameter(data)

    # Learning rate
    lr = 0.1
    optimizer = SGD([xn, yn], lr=lr, momentum=0)
    # Keep track of the loss
    nets_history = []
    nets_points_history = []
    
    # Run the simulation 50 times
    for i in range(50):
        # Gradients accumulate, so they must be cleared at each epoch
        optimizer.zero_grad()
        # Outputs, which can be seen as "predictions"
        zn = camel(xn, yn)
        # Compute the loss (sum of all values)
        # As the global minimum is at (0, 0), the lower the loss, the closer we are to this minimum
        loss = zn.sum()  # a 0-dimensional tensor
        # Get the gradients
        loss.backward()
        # Update the points
        optimizer.step()
        # Add the loss to the history and display the current loss in the console
        nets_history.append(loss.item())
        nets_points_history.append((xn.item(), yn.item(), zn.item()))
        print(f"\repoch: {i:4d} | loss: {loss.item():1.2E}", end="")



.. rst-class:: sphx-glr-script-out

     Out:

    .. code-block:: none

        epoch:   49 | loss: 3.61E-09

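For comparison, here is what the loop above does conceptually, written in plain Python with hand-derived gradients instead of *autograd*. This is a sketch under assumptions: it takes ``SGD`` with ``momentum=0`` to be the plain update :math:`x \leftarrow x - l_r \, \partial f / \partial x`, which is the usual definition but has not been checked against the *NETS* implementation. The helper ``camel_grad`` and the fixed starting point ``(0.5, 0.5)`` are choices made for this illustration.

.. code-block:: python

    def camel(x, y):
        """Three-hump camel function."""
        return 2 * x**2 - 1.05 * x**4 + x**6 / 6 + x * y + y**2

    def camel_grad(x, y):
        """Partial derivatives of the three-hump camel function, derived by hand."""
        df_dx = 4 * x - 4.2 * x**3 + x**5 + y
        df_dy = x + 2 * y
        return df_dx, df_dy

    lr = 0.1
    x, y = 0.5, 0.5   # arbitrary fixed start, instead of a random one
    history = []

    for epoch in range(50):
        history.append(camel(x, y))
        df_dx, df_dy = camel_grad(x, y)
        # Step against the gradient, scaled by the learning rate
        x -= lr * df_dx
        y -= lr * df_dy

    print(f"final point: ({x:.2e}, {y:.2e}) | loss: {camel(x, y):.2e}")

The advantage of *autograd* over this version is that ``camel_grad`` never has to be written by hand: the gradients are obtained from the recorded operations.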

