Neural networks have a long history in the field of artificial intelligence, dating back to the middle of the 20th century.

The basic building block of the brain is the neuron, so the term “neural network” literally means a network of neurons. Look at the picture:

Neurons transmit information using electrical and chemical signals. We can measure voltage differences across cell membranes. Neurons are called excitable because they can produce sudden voltage surges known as action potentials. An action potential travels along the neuron and is transmitted to other neurons; this is how neurons communicate.

Each neuron has the same basic structure: dendrites, which receive input signals from many other neurons; the cell body; and the axon, the output that carries the signal onward to the dendrites of other neurons.

There are extremely detailed models of neurons based on systems of nonlinear differential equations. The first was the Hodgkin-Huxley model:

As you can see, this is a model of an electrical circuit, and in fact some neuron models treat the neuron as a kind of electrical cable. So you can apply your circuit-analysis skills to find the neuron’s parameters.

There is also a simpler model, called the FitzHugh-Nagumo model:

As you can see, it is still a system of nonlinear differential equations, but for our purposes we will not need to deal with them directly.
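Just to see what such a model looks like in practice, here is a minimal sketch of the FitzHugh-Nagumo equations integrated with forward Euler. The parameter values (a, b, tau, I) and the initial conditions are illustrative choices, not from the article:

```python
import numpy as np

# FitzHugh-Nagumo model: dv/dt = v - v^3/3 - w + I,  dw/dt = (v + a - b*w) / tau.
# Parameter values below are illustrative, not from the article.
a, b, tau, I = 0.7, 0.8, 12.5, 0.5
dt, steps = 0.01, 50000

v, w = -1.0, -0.5          # membrane potential and recovery variable
vs = np.empty(steps)
for t in range(steps):
    dv = v - v**3 / 3 - w + I
    dw = (v + a - b * w) / tau
    v += dt * dv
    w += dt * dw
    vs[t] = v

# With this input current the trace shows repeated spikes (action potentials).
```

Even this “simple” model needs numerical integration to simulate; this is exactly the complexity that the classifier analogy below lets us skip.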

But here’s something we can take from what we already know about neurons:

- The first analogy with a linear classifier: a neuron receives many input signals and processes them to form an output signal.
- Secondly, the neuron has an excitability threshold: if the voltage reaches this threshold, a spike occurs; otherwise it does not. This is just like a binary classifier with a “yes”/“no” or 0/1 output.
- Analogy number three. Let’s take a microscope and look at the connection between two neurons. One neuron transmits a signal to another, and the point of their connection is called a synapse. A synapse can be excitatory or inhibitory: it can amplify or weaken the signal. A weakened signal decreases the overall incoming signal, so that an action potential becomes unlikely (the same role a negative weighting factor plays in logistic regression).
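These three analogies can be sketched as a tiny artificial neuron: weighted inputs, a summation, and a threshold. All the numbers here are made up for illustration:

```python
def neuron(inputs, weights, threshold):
    """Fire (return 1) if the weighted sum of inputs reaches the threshold."""
    total = sum(x * w for x, w in zip(inputs, weights))
    return 1 if total >= threshold else 0

# An excitatory synapse gets a positive weight, an inhibitory one a negative weight.
print(neuron([1.0, 1.0], [0.8, 0.6], threshold=1.0))   # 1: enough excitation, a spike occurs
print(neuron([1.0, 1.0], [0.8, -0.6], threshold=1.0))  # 0: inhibition prevents the spike
```

This hard-threshold unit is the crudest possible model; the logistic regression discussed below replaces the step with a smooth sigmoid.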

Now it is easy to understand why scientists were so excited about this model. It raises a question: is it possible to create an intelligent, thinking brain inside a computer by combining a group of neurons? Is it possible to create consciousness?

**How to calculate the output of a logistic neural classifier. Theory**

Now we will depict logistic regression graphically and show the forward-pass calculations, in other words, how to get the prediction using logistic regression. As an example, we consider the case of two-dimensional input data.

We would like to remind you that we have written an article on logistic regression that also covers classification with logistic regression.

So, we have two input variables, x_{1} and x_{2}, and the output y (we will explain the meaning of the circles a bit later). We know that each input variable must be multiplied by a weighting coefficient, so the two incoming edges can be labelled w_{1}x_{1} and w_{2}x_{2}; the circles therefore denote multiplication. The next node computes the sum of its incoming values: w_{1}x_{1} + w_{2}x_{2}. You can then think of logistic regression as the circle that immediately precedes y. The function it applies is called the logistic function, or sigmoid.

Let’s draw the sigmoid so you can see what it looks like. In general, it is an S-shaped curve with one horizontal asymptote as x tends to infinity and another as x tends to minus infinity.

There are several sigmoid functions, but two of them are commonly used. One is the hyperbolic tangent, tanh(x), which ranges from minus one to one and crosses zero at x = 0. The other, which we will use, is the logistic function (sigmoid): σ(x) = 1/(1 + e^{-x}).

This function ranges from zero to one and crosses the y-axis at y = 0.5, that is, σ(0) = 0.5. Considering all this, we can say that the output of logistic regression is the value σ(w^{T}x).

What does it mean?

If the scalar product of w and x is positive and very large, the result is a value very close to one.

If the product is very large in magnitude but negative, we get a value very close to zero. And if the value of the function is 0.5, the product is zero, which means we are exactly on the boundary between the two classes. In other words, the probability of belonging to a class is 50%.
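A quick numeric check of these three cases (the input values here are arbitrary):

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

print(sigmoid(10))   # ~0.99995: a large positive product gives a value close to one
print(sigmoid(-10))  # ~0.000045: a large negative product gives a value close to zero
print(sigmoid(0))    # 0.5: the product is zero, exactly on the class boundary
```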

What is the difference between logistic regression and the general linear classifier discussed earlier? Logistic regression gives us a number between zero and one, and to classify we say that everything larger than 0.5 belongs to class 1 and the rest belongs to class 0. In effect this is the same decision rule, since the value of the sigmoid at zero is 0.5.

**How to calculate the output of a logistic neural classifier. The Code**

Now we will write the code to implement logistic regression.

So, first of all, we need data to work with. We will create a 100×2 matrix of data drawn from a standard normal distribution.

```python
import numpy as np

N = 100
D = 2

X = np.random.randn(N, D)
```

As we know, we need to add a bias term. We could create a two-dimensional vector of weighting coefficients, take the scalar product of x with the weights, and then add the bias separately. But the usual trick is to add a column of ones to the original data and include the bias term in the weight vector w, and that is exactly what we will do.

```python
ones = np.array([[1]*N]).T
```

The nested brackets matter here: np.array([1]*N) would be one-dimensional, and we need a two-dimensional column vector, hence the transpose. We then concatenate this column with our original data and call the result Xb.

```python
Xb = np.concatenate((ones, X), axis=1)
```

Next, we initialize the vector of weighting coefficients randomly. It must now have dimension D + 1.

```python
w = np.random.randn(D + 1)
```

Please note that in this case the magnitudes of the weights do not matter, since we have no labels yet; we just need to compute the sigmoid.

So, the first thing to do is to calculate the scalar product between each row of Xb and w. You might think of doing this with a for loop, but in fact it is much more efficient to use the matrix multiplication built into NumPy.

If you are used to MATLAB, you may expect the asterisk to mean matrix multiplication, but in Python (NumPy) it means element-by-element multiplication. To multiply matrices, use the dot function. That is what we will do.
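To see the difference, here is a tiny sketch with made-up 2×2 matrices:

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4]])
B = np.array([[5, 6],
              [7, 8]])

print(A * B)      # element-wise product: [[ 5 12], [21 32]]
print(A.dot(B))   # matrix product:       [[19 22], [43 50]]
# (In Python 3.5+, A @ B is equivalent to A.dot(B).)
```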

And here is another subtlety. We write the prediction as the scalar product w^{T}x, where w and x are column vectors, but our data matrix Xb has dimension N×(D + 1), so each x appears in it as a row vector of dimension 1×(D + 1). In code, things are therefore a bit backwards: we compute Xb·w, which takes the scalar product of each row with w.

```python
z = Xb.dot(w)
```

This will give us a vector of N values, which you can check by printing it.

Now we need to compute the sigmoid, so let’s write the corresponding function. What is remarkable about the NumPy library is that it works with vectors as well as with scalars, so we can call the function on z directly.

```python
def sigmoid(z):
    return 1 / (1 + np.exp(-z))

print(sigmoid(z))
```

Thus, as you can see, the output is a vector of length N, and the values lie between zero and one, as we expected.
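For reference, here is the whole forward pass from this section assembled into one self-contained script:

```python
import numpy as np

N = 100  # number of samples
D = 2    # input dimensionality

# Random data drawn from a standard normal distribution.
X = np.random.randn(N, D)

# Add a column of ones so the bias term lives inside w.
ones = np.array([[1]*N]).T
Xb = np.concatenate((ones, X), axis=1)

# Random weights; D + 1 to account for the bias.
w = np.random.randn(D + 1)

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

p = sigmoid(Xb.dot(w))
print(p.shape)            # (100,)
print(p.min(), p.max())   # all values lie strictly between 0 and 1
```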

In the next article, we will build a project for an online store: we will collect data about its customers and use that data to create a sales forecast.