# The Cross-entropy error function. Theory and Code

We continue our series “Fundamentals of Deep Learning” with a discussion of the cross-entropy error function. Do not be put off by the name; the idea is actually very simple. Recall that in our previous articles on logistic regression we considered the quadratic error function. For classification it is a less suitable choice, because it assumes that the errors follow a Gaussian distribution.

## The cross-entropy error function. Theory

Of course, the error cannot have a Gaussian distribution in logistic regression, since the output lies between zero and one and the target variable takes only the values 0 and 1. That is why we need a different error function.

What do we need from the error function?

First of all, it should be zero when there are no errors and grow larger as the errors grow. Let's see how the cross-entropy function meets these requirements.

The cross-entropy error function is defined as follows:

J = -[t·log(y) + (1 - t)·log(1 - y)],

where t is the target variable and y is the output of the logistic regression.

First of all, note that for any given example only one term of the expression is nonzero. This is because the target variable t takes only the values 0 and 1: if t = 1, the first term remains; if t = 0, the second term remains.

Since y, the output of logistic regression, takes values between zero and one, its logarithm ranges from zero down to minus infinity. But pay attention to the minus sign in the formula: minus infinity turns into plus infinity. Let us now consider several examples. Let t = 1 and y = 1. We get one multiplied by zero, that is, zero, so the error is zero. Great. Note that t = 0, y = 0 gives the same result.

Now let t = 1, y = 0.9, so the model is almost correct. The error is about 0.11, a small value, so everything is fine here too. If we set t = 1, y = 0.5, so the model is on the verge of correctness, the error function gives about 0.69. This is a serious error, indicating that the model is not working properly. Finally, let t = 0, y = 0.9, so the model is badly wrong. The error is about 2.3, larger still.

Thus, the cross-entropy error function works as expected.
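The examples above are easy to reproduce. A minimal sketch (the helper name `cross_entropy_single` is ours, not part of the article's code):

```python
import numpy as np

def cross_entropy_single(t, y):
    # Per-example cross-entropy: J = -(t*log(y) + (1 - t)*log(1 - y))
    return -(t * np.log(y) + (1 - t) * np.log(1 - y))

print(cross_entropy_single(1, 0.9))  # about 0.11: almost correct, small error
print(cross_entropy_single(1, 0.5))  # about 0.69: borderline prediction
print(cross_entropy_single(0, 0.9))  # about 2.3: badly wrong, large error
```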

Finally, note that to optimize the model parameters over the whole data set we need the total error. To obtain it, we sum the individual errors over all examples from 1 to N, which gives the total cross-entropy error function:

J = -Σ [t_n·log(y_n) + (1 - t_n)·log(1 - y_n)], where the sum runs over n = 1, …, N.

## The cross-entropy error function. The Code

Now let's examine the calculation of the cross-entropy error function in code.

First, we transfer a part of the code from the previous file.

import numpy as np

N = 100

D = 2

X = np.random.randn(N,D)

Now we create our two classes, so that the first 50 points are concentrated around the center (-2, -2) and the other 50 around the center (+2, +2).

X[:50, :] = X[:50, :] - 2*np.ones((50, D))

X[50:, :] = X[50:, :] + 2*np.ones((50, D))

Let's create an array of target variables: the first 50 points belong to class 0, and the other 50 to class 1.

T = np.array([0]*50 + [1]*50)

We can also copy the creation of a column of ones from the previous code, as well as the part of the code devoted to calculating the sigmoid.

ones = np.array([[1]*N]).T

Xb = np.concatenate((ones, X), axis=1)

w = np.random.randn(D + 1)

z = Xb.dot(w)

def sigmoid(z):
    return 1/(1 + np.exp(-z))

Y = sigmoid(z)

Now let's write a function to compute the cross-entropy error. It takes the target variables and the outputs of the logistic regression.

def cross_entropy(T, Y):
    E = 0
    for i in range(N):
        if T[i] == 1:
            E -= np.log(Y[i])
        else:
            E -= np.log(1 - Y[i])
    return E

print(cross_entropy(T, Y))

Run the program. Our cross-entropy error is 52.1763501199 (the exact value will vary from run to run, since the data are random).
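Note that the same sum can also be written without an explicit loop. A sketch of a vectorized equivalent (the name `cross_entropy_vec` is ours, and the small T and Y here are toy data rather than the data from the script):

```python
import numpy as np

def cross_entropy(T, Y):
    # Loop version, as in the article
    E = 0
    for i in range(len(T)):
        if T[i] == 1:
            E -= np.log(Y[i])
        else:
            E -= np.log(1 - Y[i])
    return E

def cross_entropy_vec(T, Y):
    # Vectorized total cross-entropy: J = -sum(T*log(Y) + (1-T)*log(1-Y))
    return -np.sum(T * np.log(Y) + (1 - T) * np.log(1 - Y))

T = np.array([0, 0, 1, 1])
Y = np.array([0.2, 0.4, 0.6, 0.9])
print(cross_entropy(T, Y), cross_entropy_vec(T, Y))  # the two agree
```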

It would be nice to compare this with the closed-form solution for logistic regression that we considered earlier. Using it is legitimate here, since our data have the same variance in both classes. We recommend that you verify for yourself that the bias is zero and both weights equal 4, to understand where exactly these numbers come from.

w = np.array([0, 4, 4])

z = Xb.dot(w)

Y = sigmoid(z)

print(cross_entropy(T, Y))

If you run the program, you can see that the cross-entropy error becomes significantly smaller when we use this particular solution of logistic regression.
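Where do the weights 0, 4, 4 come from? For two Gaussian classes with identity covariance and equal priors, the Bayes solution is w = μ1 - μ0 and b = -(‖μ1‖² - ‖μ0‖²)/2. A quick check with the cloud centers used above:

```python
import numpy as np

mu0 = np.array([-2.0, -2.0])  # center of class 0
mu1 = np.array([2.0, 2.0])    # center of class 1

# Bayes solution for identity covariance and equal priors
w = mu1 - mu0                       # [4, 4]
b = -(mu1 @ mu1 - mu0 @ mu0) / 2.0  # 0, since the centers are symmetric

print(b, w)
```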

## Visualization of the linear separator

In this section of the article, we plot the solution of the Bayes classifier that we found above.

We copy the code from the previous article. The only difference is that we also import the Matplotlib library this time.

import numpy as np

import matplotlib.pyplot as plt

N = 100

D = 2

X = np.random.randn(N,D)

We already have the code that creates the data: two Gaussian clouds, one centered at (-2, -2) and the other at (+2, +2).

X[:50, :] = X[:50, :] - 2*np.ones((50, D))

X[50:, :] = X[50:, :] + 2*np.ones((50, D))

T = np.array([0]*50 + [1]*50)

We also have the target variable, which takes the values zero and one, and we have already computed the Bayes classifier and the sigmoid.

ones = np.array([[1]*N]).T

Xb = np.concatenate((ones, X), axis=1)

def sigmoid(z):
    return 1/(1 + np.exp(-z))

We already know that the solution of the Bayes classifier is 0, 4, 4, where 0 is the bias term (the intercept with the y-axis). Consequently, the separating line has the form y = -x. We set the scatter-plot colors according to the target variable, the point size to 100, and the transparency alpha to 0.5.

plt.scatter(X[:, 0], X[:, 1], c=T, s=100, alpha=0.5)

After drawing the points, we draw the line itself: we take 100 evenly spaced values of x from -6 to 6 and plot y = -x.

x_axis = np.linspace(-6, 6, 100)

y_axis = -x_axis

plt.plot(x_axis, y_axis)

plt.show()

Run the program. We can see the two Gaussian data clouds and the separating line.
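As a final sanity check that y = -x really is the boundary: any point on this line gives z = 0, and therefore sigmoid(z) = 0.5, so the model is exactly undecided there. A small sketch:

```python
import numpy as np

def sigmoid(z):
    return 1/(1 + np.exp(-z))

w = np.array([0.0, 4.0, 4.0])   # bias, w1, w2: the Bayes solution
x = 1.7                         # an arbitrary point on the line y = -x
point = np.array([1.0, x, -x])  # [bias input, x, y]

z = point.dot(w)                # 0 + 4*x - 4*x = 0
print(sigmoid(z))               # 0.5 on the decision boundary
```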