#003B Gradient Descent

Gradient Descent The term “deep learning” refers to training neural networks, sometimes very large neural networks. They are widely used for classification and regression problems. The main goal is to optimize the parameters \(w\) and \(b\) (both in logistic regression and in neural networks). Gradient Descent has achieved amazing results in solving these optimization tasks, even though it is not new: it was developed in the 1940s, 1950s and 1960s. In short, Gradient Descent is an algorithm that tries to minimize the error (cost)…
Read more
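To make the idea concrete, here is a minimal sketch of gradient descent on the logistic-regression cost mentioned above. The learning rate, iteration count, and helper names (`sigmoid`, `gradient_descent`) are illustrative choices, not taken from the post:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gradient_descent(X, y, lr=0.1, n_iters=1000):
    """Minimize the logistic-regression cost with plain gradient descent."""
    m, n = X.shape
    w = np.zeros(n)   # parameter w ...
    b = 0.0           # ... and parameter b, as in the post
    for _ in range(n_iters):
        a = sigmoid(X @ w + b)     # predictions on the training set
        dw = X.T @ (a - y) / m     # gradient of the cost w.r.t. w
        db = np.sum(a - y) / m     # gradient of the cost w.r.t. b
        w -= lr * dw               # step downhill along the gradient
        b -= lr * db
    return w, b
```

Each iteration moves \(w\) and \(b\) a small step against the gradient of the cost, which is exactly the "minimize the error" behaviour the excerpt describes.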

#002B Image representation in a computer

Image representation in a computer The computer stores 3 separate matrices corresponding to the red, green and blue (RGB) color channels of the image. If the input image is 64 by 64 pixels, then we would have three 64 by 64 matrices corresponding to the red, green and blue pixel intensity values for our image. To create a feature vector, we unroll these pixel values; for a 64 by 64 image, the total dimension of this vector will be \(64 \times 64 \times 3 = 12288\). Notation that we will follow…
Read more
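A quick sketch of the arithmetic above; the random array stands in for a real image:

```python
import numpy as np

# Stand-in for a 64x64 RGB image: three 64x64 channel matrices
# stacked into a single (64, 64, 3) array of pixel intensities.
image = np.random.randint(0, 256, size=(64, 64, 3), dtype=np.uint8)

# Unroll all pixel values into one feature vector.
x = image.reshape(-1, 1)

print(x.shape)  # (12288, 1), since 64 * 64 * 3 = 12288
```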

#001B Deep Learning, wait but why now?

Deep Learning, wait but why now? If the basic technical idea behind deep learning neural networks has been around for decades, why are they only now taking off? To answer this question, we plot a figure where the x-axis shows the amount of labelled data we have for a task, and the y-axis shows the performance (accuracy) of our learning algorithm. For example, we want to measure the accuracy of our…
Read more

#010 C Random initialization of parameters in a Neural Network

Why do we need a random initialization? If we have, for example, this shallow Neural Network: Parameters for this shallow neural network are \(\textbf{W}^{[1]}\), \(\textbf{W}^{[2]}\), \(b^{[1]}\) and \(b^{[2]}\). If we initialize the matrices \(\textbf{W}^{[1]}\) and \(\textbf{W}^{[2]}\) to zeros, then unit 1 and unit 2 will give the same output, so \(a_1^{[1]}\) and \(a_2^{[1]}\) would be equal. In other words, unit 1 and unit 2 are symmetric, and it can be shown by induction that these two units are computing…
Read more
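A minimal sketch of the symmetry problem and the usual fix, assuming a two-unit hidden layer with a tanh/sigmoid activation; the layer sizes and the 0.01 scaling factor are common choices, not quoted from the post:

```python
import numpy as np

n_x, n_h = 3, 2  # input size and hidden-layer size (illustrative)

# Zero initialization: every row of W1 is identical, so both hidden
# units compute the same activation and receive the same gradient;
# they stay symmetric through every update.
W1_zeros = np.zeros((n_h, n_x))

# Random initialization breaks the symmetry. The small 0.01 factor
# keeps pre-activations near zero, where tanh/sigmoid gradients are large.
W1 = np.random.randn(n_h, n_x) * 0.01
b1 = np.zeros((n_h, 1))  # biases can safely start at zero
W2 = np.random.randn(1, n_h) * 0.01
b2 = np.zeros((1, 1))
```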

#005B Logistic Regression: Scratch vs. Scikit-Learn

Logistic Regression: from scratch vs. Scikit-Learn Let’s now compare Logistic Regression implemented from scratch with Logistic Regression from scikit-learn. Our dataset consists of two classes, class 0 and class 1, which we generated randomly. The training set has 2000 examples coming from the first and second class; the test set has 1000 examples, 500 from each class. When we plot these datasets, it looks like this: Python’s library scikit-learn provides the LogisticRegression class, and we will use it…
Read more
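Here is a self-contained sketch of the scikit-learn side of the comparison. The excerpt does not show how the two classes were generated, so the Gaussian blobs, per-class counts, and random seed below are assumptions:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Two randomly generated classes: 2000 training examples in total.
X0 = rng.normal(loc=-1.0, scale=1.0, size=(1000, 2))  # class 0
X1 = rng.normal(loc=+1.0, scale=1.0, size=(1000, 2))  # class 1
X_train = np.vstack([X0, X1])
y_train = np.concatenate([np.zeros(1000), np.ones(1000)])

clf = LogisticRegression()
clf.fit(X_train, y_train)

# Test set: 1000 examples, 500 from each class, drawn the same way.
X_test = np.vstack([rng.normal(-1.0, 1.0, (500, 2)),
                    rng.normal(+1.0, 1.0, (500, 2))])
y_test = np.concatenate([np.zeros(500), np.ones(500)])
print("test accuracy:", clf.score(X_test, y_test))
```

The from-scratch version would then be evaluated on the same train/test split so the two accuracies are directly comparable.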