**Solving the “Donut Problem” with Logistic Regression**

In this section, we will look at a practical problem related to logistic regression, known as the “donut problem.” The meaning of the name will become clear once we plot the data.

In this example, we use considerably more data to make the effect clearly visible.

```python
import numpy as np
import matplotlib.pyplot as plt

N = 1000
D = 2
```

The idea is that we have two radii, an inner and an outer one, equal to 5 and 10 respectively. Half of the data is scattered around the inner radius: each point’s radius is drawn from a Gaussian centered at R_inner and its angle uniformly from [0, 2π), after which we convert the polar coordinates to Cartesian ones.

```python
R_inner = 5
R_outer = 10

R1 = np.random.randn(N//2) + R_inner
theta = 2*np.pi*np.random.random(N//2)
X_inner = np.concatenate([[R1 * np.cos(theta)], [R1 * np.sin(theta)]]).T
```

We do the same for the outer radius, again with uniformly distributed angles.

```python
R2 = np.random.randn(N//2) + R_outer
theta = 2*np.pi*np.random.random(N//2)
X_outer = np.concatenate([[R2 * np.cos(theta)], [R2 * np.sin(theta)]]).T
```
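As a quick sanity check (my addition, not part of the original lecture), we can verify that the two clouds really sit at the intended radii: the mean distance from the origin should be close to R_inner and R_outer. The random seed here is illustrative.

```python
import numpy as np

# Illustrative, self-contained check; the seed is an assumption for reproducibility
np.random.seed(42)
N = 1000
R_inner, R_outer = 5, 10

R1 = np.random.randn(N//2) + R_inner
theta = 2*np.pi*np.random.random(N//2)
X_inner = np.concatenate([[R1 * np.cos(theta)], [R1 * np.sin(theta)]]).T

R2 = np.random.randn(N//2) + R_outer
theta = 2*np.pi*np.random.random(N//2)
X_outer = np.concatenate([[R2 * np.cos(theta)], [R2 * np.sin(theta)]]).T

# The norm of (r*cos(t), r*sin(t)) is |r|, so these means should land near 5 and 10
mean_inner = np.linalg.norm(X_inner, axis=1).mean()
mean_outer = np.linalg.norm(X_outer, axis=1).mean()
print(mean_inner, mean_outer)
```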

Now we assemble the full data matrix X and, of course, the target variable.

```python
X = np.concatenate([ X_inner, X_outer ])
T = np.array([0]*(N//2) + [1]*(N//2))
```

And we display this graphically so you can see how it looks.

```python
plt.scatter(X[:,0], X[:,1], c=T)
plt.show()
```

In the diagram, you can see the “donut problem.”

The linear separator of logistic regression seems inappropriate here, since no straight line can separate these two classes. But I will show that we can, in fact, solve this problem.

As usual, we create a column of ones for the bias term.

```python
ones = np.ones((N, 1))
```

The trick for circumventing the “donut problem” is to add one more column holding the radius of each point. With this extra feature, the data becomes linearly separable.

```python
r = np.zeros((N, 1))
for i in range(N):
    r[i] = np.sqrt(X[i,:].dot(X[i,:]))

Xb = np.concatenate((ones, r, X), axis=1)
```
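As an aside, the radius loop above can be replaced by a single vectorized call; this equivalent alternative is my addition, not what the lecture uses:

```python
import numpy as np

np.random.seed(0)
N = 1000
X = np.random.randn(N, 2)  # stand-in data just to compare the two computations

# Loop version, as in the text
r_loop = np.zeros((N, 1))
for i in range(N):
    r_loop[i] = np.sqrt(X[i,:].dot(X[i,:]))

# Equivalent vectorized version: Euclidean norm of each row
r_vec = np.linalg.norm(X, axis=1, keepdims=True)

print(np.allclose(r_loop, r_vec))  # True
```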

We initialize the weights randomly again.

```python
w = np.random.rand(D + 2)
```

The rest of the code, such as the sigmoid function, remains unchanged from the previous lectures.

```python
z = Xb.dot(w)

def sigmoid(z):
    return 1/(1 + np.exp(-z))

Y = sigmoid(z)
```

```python
def cross_entropy(T, Y):
    E = 0
    for i in range(N):
        if T[i] == 1:
            E -= np.log(Y[i])
        else:
            E -= np.log(1 - Y[i])
    return E
```
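For reference, the same loss can be computed in one vectorized line; this equivalent form is my addition, not part of the lecture code:

```python
import numpy as np

np.random.seed(1)
N = 100
# Stand-in targets and predictions, clipped away from 0 and 1 so the logs are finite
T = (np.random.rand(N) > 0.5).astype(float)
Y = np.clip(np.random.rand(N), 1e-10, 1 - 1e-10)

def cross_entropy_loop(T, Y):
    E = 0
    for i in range(N):
        if T[i] == 1:
            E -= np.log(Y[i])
        else:
            E -= np.log(1 - Y[i])
    return E

# One-line vectorized equivalent of the loop above
E_vec = -(T*np.log(Y) + (1 - T)*np.log(1 - Y)).sum()

print(np.allclose(cross_entropy_loop(T, Y), E_vec))  # True
```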

We use a specially chosen learning rate and number of iterations. In general, you need to experiment a little to find good values, or use something like cross-validation. We also track how the error changes over time: we print the cross-entropy every 100 passes and use gradient descent with regularization.

```python
learning_rate = 0.0001
error = []
for i in range(5000):
    e = cross_entropy(T, Y)
    error.append(e)
    if i % 100 == 0:
        print(e)

    # Gradient ascent on the log-likelihood, with L2 regularization
    w += learning_rate * (np.dot((T - Y).T, Xb) - 0.01*w)
    Y = sigmoid(Xb.dot(w))
```
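If you want to convince yourself that the update direction Xb.T.dot(T - Y) really is (minus) the gradient of the cross-entropy, a finite-difference check on a small made-up problem confirms it; this self-contained sketch is my addition:

```python
import numpy as np

np.random.seed(0)
n, d = 20, 4  # small made-up problem, names are illustrative
X = np.random.randn(n, d)
T = (np.random.rand(n) > 0.5).astype(float)
w = np.random.randn(d)

def sigmoid(z):
    return 1/(1 + np.exp(-z))

def cross_entropy(w):
    Y = sigmoid(X.dot(w))
    return -(T*np.log(Y) + (1 - T)*np.log(1 - Y)).sum()

# Analytic gradient of the cross-entropy with respect to w: X^T (Y - T)
grad = X.T.dot(sigmoid(X.dot(w)) - T)

# Central finite-difference approximation, one coordinate at a time
eps = 1e-6
num = np.zeros(d)
for j in range(d):
    dw = np.zeros(d)
    dw[j] = eps
    num[j] = (cross_entropy(w + dw) - cross_entropy(w - dw)) / (2*eps)

print(np.allclose(grad, num, atol=1e-4))  # True
```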

Finally, we plot the error over time and print the final weights and the classification rate.

```python
plt.plot(error)
plt.title("Cross-entropy")
plt.show()

print("Final w:", w)
print("Final classification rate:", 1 - np.abs(T - np.round(Y)).sum() / N)
```

Run the program. We again see the “donut,” and the cross-entropy printed every 100 iterations.

In the end, we get a very good classification rate.

The final weights are also very interesting: the coefficients for x and y are almost zero.

Thus, the classification does not depend on the Cartesian coordinates themselves, and our model has discovered this on its own.

It does, however, depend on the bias and radius terms: for a small radius, the negative bias dominates and pushes the prediction toward zero, while a large radius pushes the prediction toward one.
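To make this concrete, here is a sketch with purely hypothetical weight values (the actual learned weights will differ from run to run). Since the x and y weights are near zero, the model predicts 0.5 exactly where bias + w_r·r = 0, i.e. at radius r = −bias/w_r, a circle lying between the two rings:

```python
import numpy as np

# Hypothetical learned weights [bias, radius, x, y] -- illustrative values only
w = np.array([-12.0, 1.6, 0.01, -0.02])
b, w_r = w[0], w[1]

# Ignoring the tiny x and y weights, sigmoid(b + w_r*r) = 0.5 when b + w_r*r = 0
r_boundary = -b / w_r
print(r_boundary)  # 7.5, between the inner (5) and outer (10) radii
```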

That’s how you can solve the problem using logistic regression.