In our earlier articles we considered logistic regression for binary classification: we collected some data and tried to predict which of two possible labels would appear. For example, we could take the time spent on the site or the number of pages viewed as input variables and try to predict whether the user would buy something or not. As you remember, if the problem has only two input variables, we can plot the data on a two-dimensional plane and then draw a line separating the two classes, “buy” and “do not buy.”
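As a reminder of how that looks in code, here is a minimal sketch of a logistic regression prediction with those two input variables. The weights and bias below are hypothetical, chosen only for illustration, not values learned from the store's data:

```python
import numpy as np

# Hypothetical weights for a two-feature logistic regression:
# x1 = time spent on the site (minutes), x2 = number of pages viewed.
w = np.array([0.4, 0.7])
b = -3.0  # bias term

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def predict_buy(x):
    """Probability that the user buys, a number strictly between 0 and 1."""
    return sigmoid(w @ x + b)

print(predict_buy(np.array([5.0, 2.0])))
```

The decision boundary, where `predict_buy` equals 0.5, is exactly the straight line w @ x + b = 0 drawn between the two classes.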

If such a line between the classes can be found, the classes are called linearly separable. Recall that when we are dealing with a linearly separable problem, logistic regression is a perfectly suitable solution, since it is a linear classifier.

The last thing to mention here is that we are, of course, not limited to just two input variables. In particular, as you have seen, the data for the online store project has more than two input variables. With only two input variables, we can easily display the solution graphically. With three, the solution can still be displayed graphically, but it becomes much harder to analyze, and in higher dimensions, as you understand, a person cannot visualize the solution at all.

At the same time, many real problems have data sets containing hundreds or even thousands of input variables, so we will have to work with the data mathematically. Recall that in linear classification in higher dimensions, the classes are no longer separated by a straight line but by a plane or a hyperplane. The crucial point is that all of these are flat; in other words, they are not curved.
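The mathematics is the same in any number of dimensions. The sketch below, with an arbitrary random weight vector purely for illustration, classifies a 100-dimensional input with the same w @ x + b rule as before; the boundary w @ x + b = 0 is now a hyperplane, but it is still flat:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 100                  # hundreds of input variables
w = rng.normal(size=d)   # hypothetical weight vector (not learned values)
b = 0.5

def linear_decision(x):
    # The set of points where w @ x + b = 0 is a hyperplane in d dimensions:
    # flat, never curved, no matter how large d is.
    return 1 if w @ x + b > 0 else 0

x = rng.normal(size=d)
print(linear_decision(x))
```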

We will now go further with the help of neural networks and consider the case when the data are not linearly separable, so logistic regression does not work. Fortunately, neural networks are well suited to such problems.

Remember that a linear function has the form wᵀx. Anything that cannot be expressed as wᵀx is non-linear, and as you will see, neural networks are nonlinear. Expressions like x² or x³ are not linear either, but neural networks are nonlinear in a particular way: they become nonlinear precisely by combining several logistic regression units.
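The classic example of a problem that no single wᵀx rule can solve is XOR. Below is a minimal sketch showing how stacking logistic units produces a nonlinear classifier that handles it. The weights are hand-picked for illustration, not learned: one hidden unit behaves roughly like OR, the other like AND, and the output unit combines them into “OR and not AND,” which is XOR:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hand-picked weights (an illustration, not values found by training).
W1 = np.array([[20.0, 20.0],    # hidden unit 1: fires roughly like OR
               [20.0, 20.0]])   # hidden unit 2: fires roughly like AND
b1 = np.array([-10.0, -30.0])
w2 = np.array([20.0, -20.0])    # output unit: OR and not AND, i.e. XOR
b2 = -10.0

def xor_net(x):
    h = sigmoid(W1 @ x + b1)      # two logistic units
    return sigmoid(w2 @ h + b2)   # one more logistic unit on top

for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x, round(float(xor_net(np.array(x, dtype=float)))))
```

Each unit on its own is just a logistic regression; it is the composition that makes the overall decision boundary curved.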

So, what will we cover in this section?

I will show you how to build a nonlinear classifier, a neural network, by combining several logistic regression units, or neurons.

We discussed binary classification in the articles on logistic regression. But what if we want to classify more than two things at once? Instead of simply predicting “buy” or “not buy,” we might want to predict any number of user actions: a purchase, the start of payment, adding an item to the basket, emptying the basket, and so on. Another example is classifying a car's brand from its image: there are far more than two brands of cars in the world, so this requires a classifier capable of distinguishing more than two classes.

We will discuss how to classify more than two things in the article devoted to the sigmoid and softmax functions. Note that you can also use the softmax function in logistic regression and, conversely, if you only need binary classification in a neural network, you can simply use a sigmoid.

The rest of this section is devoted to translating the preceding theory into code. In particular, we will implement the softmax function, and then, using what we have learned about softmax, write the code for a complete neural network. Finally, we will apply it to a real problem for our online store project.

This section focuses on how to make predictions with a neural network: given a vector of input data, we will calculate the vector of output data and interpret the numbers obtained at the output. As you will see, a neural network's outputs are the probabilities that the input belongs to each particular class.
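A minimal sketch of such a forward pass is shown below. The layer sizes and the random weights are assumptions made purely for illustration; since the network has not been trained, the probabilities it produces do not mean anything yet, but their shape is exactly what a trained network would output:

```python
import numpy as np

rng = np.random.default_rng(42)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(a):
    e = np.exp(a - np.max(a))
    return e / e.sum()

# Hypothetical sizes: D inputs, M hidden units, K classes.
D, M, K = 4, 5, 3
W1 = rng.normal(size=(M, D)); b1 = np.zeros(M)   # untrained random weights
W2 = rng.normal(size=(K, M)); b2 = np.zeros(K)

def forward(x):
    h = sigmoid(W1 @ x + b1)     # hidden layer of logistic units
    return softmax(W2 @ h + b2)  # output layer: probabilities over K classes

x = rng.normal(size=D)
p = forward(x)
print(p)  # K numbers in (0, 1) that sum to 1
```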

It should be noted that in this section the outputs of our neural network will not yet be meaningful. Recall that logistic regression includes a set of weights, and since a neural network consists of many logistic units, it also includes a set of weights. In logistic regression, we use gradient descent to determine the values of these weights and call this process “learning.” Remember that we deal with two basic operations: prediction and training. This section is devoted to prediction, while the next one covers training. Only after we learn how to train a neural network will its outputs make sense, and not before.

Do not forget that these are the typical stages of machine learning. First, we create a “brain,” but this brain is not very smart yet, because it has not been trained, so its predictions will not be very accurate. Then we train it, and only after that can the brain give accurate predictions, precisely because we have taught it.