#002 Binary classification
Binary Classification
Binary classification is the task of classifying elements of a given set into two groups. Logistic regression is an algorithm for binary classification.
Example of a binary classification problem:
- We have an input image \(x\), and the output \(y\) is a label that classifies the image.
- A label of 1 means there is a cat in the image; a label of 0 means there is a non-cat object in the image.
In binary classification, our goal is to learn a classifier that can input an image represented by its feature vector \(x\) and predict whether the corresponding label is 1 or 0. That is, whether this is a cat image or a non-cat image.

Logistic regression
Logistic regression is a supervised learning algorithm that we can use when the labels are either 0 or 1; this is the so-called binary classification problem. An input feature vector \(x\) may correspond to an image that we want to recognize as either a cat picture (1) or a non-cat picture (0). That is, we want an algorithm that outputs a prediction \(\hat{y}\), which is an estimate of \(y\):
$$ \hat{y} = P\left( y=1 \mid x \right), \quad x \in \mathbb{R}^{n_x} $$
$$ \text{Parameters}: w \in \mathbb{R}^{n_x}, \; b \in \mathbb{R} $$
More formally, we want \(\hat{y}\) to be the probability that \(y\) is equal to 1, given the input features \(x\). In other words, if \(x\) is a picture, we want \(\hat{y}\) to tell us the chance that this is a cat picture.
Here, \(x\) is an \(n_{x}\)-dimensional vector. The parameters of logistic regression are \(w\), which is also an \(n_{x}\)-dimensional vector, and \(b\), which is a real number.
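As a concrete illustration, these shapes can be written down in NumPy. This is a minimal sketch, where the dimension \(n_x = 4\) is an arbitrary choice made just for the example:

```python
import numpy as np

n_x = 4                  # number of input features (arbitrary for this example)

x = np.random.rand(n_x)  # input feature vector, shape (n_x,)
w = np.zeros(n_x)        # weight vector, same dimension as x
b = 0.0                  # bias, a single real number
```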
Given an input \(x\) and the parameters \(w\) and \(b\), how do we generate the output \(\hat{y}\)? One thing we could try, which doesn’t work, would be \(\hat{y}=w^{T}x+b\). This is a linear function of the input and, in fact, it is what we would use if we were doing Linear Regression.
However, this is not a very good algorithm for binary classification, because we want \(\hat{y}\) to be the probability that \(y\) is equal to 1, so \(\hat{y}\) should be between 0 and 1.
It is difficult to enforce this because \(w^{T}x+b \) can be much bigger than 1 or can even be negative, which doesn’t make sense for a probability that must lie between 0 and 1. We can conclude that we need a function that transforms \(w^{T}x+b \) into the range between 0 and 1.
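A quick numeric check makes the problem concrete; the values of \(w\), \(b\), and \(x\) below are made up purely for illustration:

```python
import numpy as np

w = np.array([2.0, -3.0])
b = 1.5
x = np.array([4.0, 0.5])

y_hat = np.dot(w, x) + b  # w^T x + b
print(y_hat)              # 8.0 -- far outside the [0, 1] range of a probability
```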
Let’s see one function that can help us do that. In logistic regression, the output is going to be the Sigmoid Function:
$$ \sigma(z) = \frac{1}{1+e^{-z}} $$
We can see that it increases smoothly from 0 to 1 as its argument \(z\) goes from large negative to large positive values.

The sigmoid function can be easily implemented in Python.
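For example, a minimal NumPy version, which also works element-wise on arrays, might look like this:

```python
import numpy as np

def sigmoid(z):
    """Sigmoid activation: maps any real number into the range (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))
```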
We are going to use \(z\) to denote the quantity \(w^{T}x+b\).
Next, we have: \(\hat{y} = \sigma(w^{T}x+b) = \sigma(z)\).
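Putting the pieces together, a logistic regression prediction for a single example can be sketched as follows, reusing the sigmoid function defined above (the parameters \(w\) and \(b\) are assumed to be given):

```python
def predict(x, w, b):
    """Compute y_hat = sigmoid(w^T x + b) for a single example x."""
    z = np.dot(w, x) + b
    return sigmoid(z)
```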
If \(z\) is a large positive number, then \(e^{-z} \approx 0\), so: \(\sigma(z) = \frac{1}{1+e^{-z}} \approx \frac{1}{1+0} = 1\)
In case \(z\) is a large negative number, \(e^{-z}\) becomes a huge number, so: \(\sigma(z) = \frac{1}{1+e^{-z}} \approx \frac{1}{1+\infty} \approx 0\)
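We can verify this limiting behavior numerically with the sigmoid function defined above:

```python
print(sigmoid(100.0))   # ~1.0, since e^{-100} is essentially 0
print(sigmoid(-100.0))  # ~3.7e-44, essentially 0, since e^{100} is huge
```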
When we implement logistic regression, our job is to try to learn parameters \(w \) and \(b \), so that \(\hat{y} \) becomes a good estimate of the chance of \(y\) being equal to 1.
In the next post we will learn about Image Representation in Computers.