But what is a Neural Network? | Deep learning, chapter 1
From the video, I have learned:
As the name suggests, a neural network is inspired by the brain. In its plainest form, it is a computer system that recognizes handwritten digits using a network of functions that take an input and translate it into another form. Learning comes down to tuning some mathematical formulas until they produce the desired output.
Neural networks are used in many real-life problems, including speech recognition and image recognition. This video covers the process of image recognition.
Algorithm:
The example shown in the video has:
- A bunch of neurons holding a 28×28-pixel input image, which makes 28 × 28 = 784 neurons.
- Each neuron holds the grayscale value of its corresponding pixel, in the range 0 to 1: 0 means the pixel is black, 1 means it is white.
- The neurons are organized into layers. The first (input) layer has 784 neurons, the two hidden layers have 16 neurons each, and the output layer has 10. Ideally, the first hidden layer picks up edges and the second picks up patterns like loops and lines; at the last layer, the pieces come together to recognize the digit.
- Each neuron's value comes from a formula combining the activations of the previous layer: every connection has a weight describing its strength, and a bias says how high the weighted sum needs to be before the neuron meaningfully activates. Counting every weight and bias in this network gives 13,002 parameters in total.
- The whole network is a function that takes in activations between 0 and 1 for each pixel and spits out ten numbers as output using the layers. The layers in between are called hidden layers. A minimal sketch of this forward pass is shown after this list.
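Here is that sketch. The layer sizes come from the video; the sigmoid squashing function and the random stand-in weights and biases are assumptions for illustration, not trained values:

```python
import numpy as np

def sigmoid(z):
    # Squashes any real number into the 0-1 range used for activations.
    return 1.0 / (1.0 + np.exp(-z))

# Layer sizes from the video: 784 input pixels, two hidden layers of 16, 10 outputs.
sizes = [784, 16, 16, 10]

# Random stand-ins for the learned weights and biases.
rng = np.random.default_rng(0)
weights = [rng.standard_normal((n_out, n_in)) for n_in, n_out in zip(sizes, sizes[1:])]
biases = [rng.standard_normal(n_out) for n_out in sizes[1:]]

# 784*16 + 16*16 + 16*10 weights plus 16 + 16 + 10 biases = 13,002 parameters.
print(sum(w.size for w in weights) + sum(b.size for b in biases))  # 13002

def forward(x):
    # x holds 784 grayscale values in [0, 1]; each layer computes sigmoid(W a + b).
    a = x
    for W, b in zip(weights, biases):
        a = sigmoid(W @ a + b)
    return a  # ten numbers, one per digit

print(forward(rng.random(784)))
```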
Gradient descent, how neural networks learn | Deep learning, chapter 2
From the video, I have learned what gradient descent is and how it works.
Gradient descent:
It is used to minimize the cost function as far as possible by finding good values for the function's parameters. It starts by defining initial parameters and then uses calculus to repeatedly adjust those values so the cost decreases.
To reduce the cost, the gradient tells us how to nudge all of the weights and biases so as to produce the fastest decrease in the value of the cost function. The cost function itself involves an average over all the training data.
Consider the gradient as the slope of the function: the higher the gradient, the steeper the slope, and the faster the model learns. The slope should not be zero, because then learning stops. What matters is the direction and how steep the slope is; take a step accordingly, and then repeat that over and over again.
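As a tiny illustration of that loop, here is gradient descent on a made-up one-variable cost f(x) = (x - 3)^2; the function, starting point, and learning rate are all assumptions for the example:

```python
# Derivative (slope) of the made-up cost f(x) = (x - 3)^2.
def grad(x):
    return 2 * (x - 3)

x = 0.0     # initial parameter value
lr = 0.1    # learning rate: how big a step to take against the slope
for _ in range(100):
    x -= lr * grad(x)   # steeper slope -> bigger step

print(x)  # approaches 3, the minimum, where the slope (and the learning) goes to zero
```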
Find which weights and biases minimize a specific cost function:
- Take the output that the network gives and the output that is expected from the network.
- Find the difference between each pair of components, square those differences, and add them up.
- Repeat this for every example in the training set, then average the results.
- This average is the total cost of the network. We want the negative gradient of the cost function, which shows how to change all the weights and biases of all the connections to bring the cost down most efficiently (see the sketch after this list).
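A small sketch of that cost computation; exact conventions vary (e.g. whether the sum is halved), so this version simply follows the steps in the list above, on two made-up two-component examples rather than full ten-component outputs:

```python
import numpy as np

def cost(outputs, targets):
    # outputs, targets: one row of values per training example.
    # Difference per component, squared, summed per example, averaged over examples.
    return np.mean(np.sum((outputs - targets) ** 2, axis=1))

# Toy check with two made-up examples.
outputs = np.array([[0.8, 0.1], [0.2, 0.9]])
targets = np.array([[1.0, 0.0], [0.0, 1.0]])
print(cost(outputs, targets))  # 0.05
```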
The network can still stumble on unfamiliar handwriting: because it is tuned so tightly to the training set, a new image that looks unlike the training examples can produce a wrong result.
Question 3:
The backpropagation algorithm performs a backward pass through the network, focusing on adjusting the model's parameters, its weights and biases.
In this video, we have learned how the algorithm works by making little adjustments that propagate backward through the layers, one on top of the other. It takes as an example a training image of the digit 2. Through this example, the video explains:
- Assume the network is not yet well trained, so for this image the output neurons hold assorted values (such as 0.5 or 0.2) instead of the desired output, which is 1 for the "2" neuron and 0 for the rest. We cannot change those activations directly.
- So for this training image of a 2, we want the activation of the "2" output neuron to be nudged up and all the others nudged down.
- To get those nudges, we want changes in the activations of the previous layer's neurons, and to carry that out we change the weights and biases, because they have the most significant effect on how brightly an activation neuron fires. The strength of a connection between layers is given by its weight.
- Once we have the full list of nudges, we apply the same idea recursively to the relevant weights and biases of the earlier layers, repeating this in the backward direction.
- A true gradient descent step would involve all the training examples, calculating the average of the desired changes each one produces; however, that is slow. So instead the data is randomly divided into mini-batches and a step is taken per batch, which still drives the cost function down and is a good approximation of the true step (see the sketch below).
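A minimal sketch of that mini-batch loop, assuming a hypothetical grad(params, batch) supplied by the caller that returns the gradient averaged over one batch; the toy demo at the bottom just fits a single number to some data:

```python
import random

def sgd(params, grad, data, lr=0.1, batch_size=32, epochs=20):
    # Shuffle the data, split it into mini-batches, and take one
    # gradient step per batch instead of one step per full pass.
    data = list(data)
    for _ in range(epochs):
        random.shuffle(data)
        for i in range(0, len(data), batch_size):
            batch = data[i:i + batch_size]
            params = params - lr * grad(params, batch)  # step downhill
    return params

# Toy demo: the cost per point x is (p - x)^2, so the batch-averaged gradient is:
def grad(p, batch):
    return sum(2 * (p - x) for x in batch) / len(batch)

print(sgd(0.0, grad, [1.0, 2.0, 3.0, 4.0], batch_size=2))  # ~2.5 (the mean)
```

Each mini-batch step only approximates the full gradient, but over many steps the path still heads downhill on the overall cost, just along a more wandering route.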