“Machine learning has already helped marketers use customer data more efficiently, and it complements what they have always had: intuition and experience,” said Jeff Hardison, vice president of Lytics, a platform for collecting customer data.

“Our data are 80–90% accurate in predicting what will happen in the next 30 days. In three days we can predict what the dynamics of development will be over 30 days,” said Saif Adjani, the head of Keyhole, an analytics company that uses Google's TensorFlow machine learning tools.

“Machine learning does a great job with large sets of data and helps us solve problems such as classification. It also helps us identify common elements of content that is becoming popular. In general, we see the benefits of machine learning for processing large data in translation, image recognition, and spam protection,” said Steve Rayson, the director of BuzzSumo, a tool that lets users evaluate the popularity of content.

Are you impressed after reading these? We certainly have been inspired!

So let's come back to our online store project, for which we are collecting data for analysis.

**The online store project. Data preparation**

Let's continue our project for the online store, or rather, look at processing the data.

I have already imported the NumPy and Pandas libraries.

import numpy as np

import pandas as pd

We use the pd.read_csv function to load the data from the file ecommerce_data.csv.

df = pd.read_csv('ecommerce_data.csv')

If you want to see what is in the file, use the command

df.head()

It shows the first five rows of the file.

So, let's get out of the shell and start working on the processing file. If you do not want to write the code yourself but want to view it right away, go to GitHub; the corresponding file is called process.py. First of all, we load the NumPy and Pandas libraries.

import numpy as np

import pandas as pd

Next, we write the get_data function. First, it reads the data from the file, as we did earlier; second, it converts the data into a NumPy matrix, because that form is easier to work with.

def get_data():
    df = pd.read_csv('ecommerce_data.csv')
    data = df.to_numpy()  # df.as_matrix() was removed in recent versions of pandas

Next, we need to separate our X and Y. Y is the last column, so X is all the other columns except the last one.

X = data[:, :-1]

Y = data[:, -1]

Then, as we have said, it is necessary to normalize the data: take x₁ minus its mean, divided by the standard deviation of x₁. The same goes for x₂.

X[:,1] = (X[:,1] - X[:,1].mean()) / X[:,1].std()

X[:,2] = (X[:,2] - X[:,2].mean()) / X[:,2].std()
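The two normalization lines can also be written as a single slice operation. A minimal sketch with made-up numbers, since the real values come from ecommerce_data.csv:

```python
import numpy as np

# toy stand-in for the data matrix; the values here are invented,
# only the column positions (1 and 2 are numeric) match the tutorial
X = np.array([[1., 10., 200., 0.],
              [1., 20., 100., 2.],
              [1., 30., 300., 1.]])

# normalize columns 1 and 2 in one slice instead of one line per column
X[:, 1:3] = (X[:, 1:3] - X[:, 1:3].mean(axis=0)) / X[:, 1:3].std(axis=0)

print(X[:, 1:3].mean(axis=0))  # ~[0. 0.]
print(X[:, 1:3].std(axis=0))   # ~[1. 1.]
```

After the transformation each normalized column has mean 0 and standard deviation 1, which is exactly what the per-column version above produces.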

Now let's work on the categorical column, which is the time of day. To do this, take the shape of the original X and create a new matrix X2 of dimension N×(D+3) on its basis, since we have four categories: the single time-of-day column becomes four indicator columns.

N, D = X.shape

X2 = np.zeros((N, D+3))

X2[:,0:(D-1)] = X[:,0:(D-1)]

Now let's write the one-hot encoding for the remaining columns. First, we do it the simple way: for each observation, we read the time-of-day value, which, as you remember, takes the values 0, 1, 2, and 3, and set the corresponding indicator in X2.

for n in range(N):
    t = int(X[n,D-1])
    X2[n,t+D-1] = 1

There is another way: we can create a new N×4 matrix for the four columns and then fill it with direct (fancy) indexing.

Z = np.zeros((N, 4))

Z[np.arange(N), X[:,D-1].astype(np.int32)] = 1

In this case, you will need to assign Z to the last four columns; an assert can then confirm that both methods give the same result.

X2[:,-4:] = Z

assert(np.abs(X2[:,-4:] - Z).sum() < 10e-10)
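To see what the fancy-indexing assignment does, here is a tiny, hypothetical example with three samples whose category codes are 0, 2, and 3:

```python
import numpy as np

codes = np.array([0, 2, 3])   # made-up time-of-day codes for 3 samples
Z = np.zeros((3, 4))          # one column per category
Z[np.arange(3), codes] = 1    # row i gets a 1 in column codes[i]
print(Z)
```

Each row ends up with exactly one 1, in the column matching its category code, which is the one-hot encoding the loop builds element by element.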

And that is the end of the function.

return X2, Y

For our logistic regression classes, we need only binary data, not the complete set, so we write the get_binary_data function, which calls get_data and then filters its result, selecting only the classes 0 and 1.

def get_binary_data():
    X, Y = get_data()
    X2 = X[Y <= 1]
    Y2 = Y[Y <= 1]
    return X2, Y2

That is all.
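Putting the fragments above together, process.py looks roughly like this. This is a sketch assembled from the pieces in this section; the actual file on GitHub may differ in small details:

```python
import numpy as np
import pandas as pd

def get_data():
    df = pd.read_csv('ecommerce_data.csv')
    data = df.to_numpy()

    # split off the targets: Y is the last column, X is everything else
    X = data[:, :-1]
    Y = data[:, -1]

    # normalize the two numeric columns
    X[:, 1] = (X[:, 1] - X[:, 1].mean()) / X[:, 1].std()
    X[:, 2] = (X[:, 2] - X[:, 2].mean()) / X[:, 2].std()

    # one-hot encode the time-of-day column (4 categories)
    N, D = X.shape
    X2 = np.zeros((N, D + 3))
    X2[:, 0:(D - 1)] = X[:, 0:(D - 1)]
    for n in range(N):
        t = int(X[n, D - 1])
        X2[n, t + D - 1] = 1
    return X2, Y

def get_binary_data():
    # keep only the samples belonging to classes 0 and 1
    X, Y = get_data()
    X2 = X[Y <= 1]
    Y2 = Y[Y <= 1]
    return X2, Y2
```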

**The online shop project. Creating predictions**

So, first of all, we need to load our data. Run the already written function and assign the values to our X and Y.

import numpy as np

from process import get_binary_data

X, Y = get_binary_data()

After that, we can set the dimension and random weights for our model, with the bias term set to zero.

D = X.shape[1]

W = np.random.randn(D)

b = 0

We need to write a few more functions. First, a function for computing the sigmoid, and then a forward function that returns the sigmoid of the expression WX + b.

def sigmoid(a):
    return 1 / (1 + np.exp(-a))

def forward(X, W, b):
    return sigmoid(X.dot(W) + b)
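A quick way to sanity-check these two helpers before trusting the predictions: with all-zero inputs and a zero bias, the model has no information, so every output should sit at exactly sigmoid(0) = 0.5.

```python
import numpy as np

def sigmoid(a):
    return 1 / (1 + np.exp(-a))

def forward(X, W, b):
    return sigmoid(X.dot(W) + b)

# zero inputs and zero bias: X.dot(W) + b is 0 for every sample,
# so the sigmoid returns 0.5 regardless of the random weights
X = np.zeros((2, 3))
W = np.random.randn(3)
print(forward(X, W, 0))  # [0.5 0.5]
```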

Then we create two variables, P_Y_given_X and predictions.

P_Y_given_X = forward(X, W, b)

predictions = np.round(P_Y_given_X)
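np.round turns each predicted probability into a class label: values above 0.5 become 1, values below become 0 (NumPy rounds exact halves to the nearest even value, so 0.5 itself becomes 0). A small illustration with made-up probabilities:

```python
import numpy as np

# hypothetical sigmoid outputs for three samples
p = np.array([0.1, 0.6, 0.9])
print(np.round(p))  # [0. 1. 1.]
```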

And there is one more function, classification_rate, which accepts targets and predictions as arguments and returns the proportion of correct answers.

def classification_rate(Y, P):

return np.mean(Y == P)
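For example, with hypothetical targets and predictions where 3 of the 4 labels match, the function returns 0.75:

```python
import numpy as np

def classification_rate(Y, P):
    # Y == P gives a boolean array; its mean is the fraction of matches
    return np.mean(Y == P)

Y = np.array([0, 1, 1, 0])  # made-up targets
P = np.array([0, 1, 0, 0])  # made-up predictions: 3 of 4 correct
print(classification_rate(Y, P))  # 0.75
```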

And, finally, print the result.

print("Score:", classification_rate(Y, predictions))

Run the program. As you can see, with randomly initialized weights we do not get a good result: only about 32% accuracy.

Next, we will examine how to set the weights to obtain a more accurate result.