Implementing Multiclass Logistic Regression using BigQuery ML

Other algorithm-specific hyperparameters should be specified in this options block as well.

    # Train a model
    train_query = """
        create or replace model `bigquery_demo.consumer_complaint_model`
        options (
            model_type='logistic_reg',
            auto_class_weights=true,
            input_label_cols=['Product'],
            max_iterations=10) AS
        select Product, Consumer_complaint_narrative
        from `bigquery_demo.consumer_complaint`
        where Consumer_complaint_narrative is not null
            and Date_received <= "2017-09-30 00:00:00"
        limit 100000
    """
    training_job = client.query(train_query).to_dataframe()
    print(training_job)

Figure: Train a logistic regression classifier

Step 2: Check training summary.

An ML.TRAINING_INFO call on the model trained in the previous step returns the training summary.

    training_info = """
        select *
        from ML.TRAINING_INFO(MODEL `bigquery_demo.consumer_complaint_model`)
    """
    training_info_job = client.query(training_info).to_dataframe()
    print(training_info_job)

Figure: Training summary

Step 3: Evaluate the model on the test set.

An ML.EVALUATE call on the model, combined with a select query over the data, evaluates the model. In this case the select query passes every record in the table with a date later than 30-Sep-2017 to the model for evaluation. The average accuracy is 0.28, which is not very good. It might improve with some hyperparameter tuning, but that is outside the scope of this article. One important thing to note here is that the entire pre-processing of the data for training and evaluation is managed by BigQuery, which might also have some impact on the accuracy.

    # perform evaluation
    query_evaluate = """
        select *
        from ML.EVALUATE (MODEL `bigquery_demo.consumer_complaint_model`,
            (
                select Product, Consumer_complaint_narrative
                from `bigquery_demo.consumer_complaint`
                where Consumer_complaint_narrative is not null
                    and Date_received > "2017-09-30 00:00:00"
            ))
    """
    evaluation_job = client.query(query_evaluate).to_dataframe()
    print(evaluation_job)

Figure: Model evaluation

Step 4: Get predictions

An ML.PREDICT call on the model performs predictions on the data passed by the select query.

    # perform prediction
    query_predict = """
        select *
        from ML.PREDICT (MODEL `bigquery_demo.consumer_complaint_model`,
            (
                select Consumer_complaint_narrative
                from `bigquery_demo.consumer_complaint`
                where Consumer_complaint_narrative is not null
                limit 2
            ))
    """
    prediction_job = client.query(query_predict).to_dataframe()
    prediction_job.head()

Figure: Predictions from the model

Step 5: Evaluate the response

The prediction returns a dataframe that contains a probability distribution over all labels.

    prediction_job["predicted_Product_probs"][0]
    prediction_job["predicted_Product_probs"][1]

Figure: Probability distribution among classes

Limitations:

BigQuery ML is still in beta. It is a relatively new product with some very promising features. However, it still has a long way to go. As of now, it supports only linear regression, binary logistic regression and multiclass logistic regression. The maximum number of unique labels allowed in multiclass logistic regression is 50.

The source code of this exercise is on GitHub.
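Returning to Step 5, the probability distribution for each row can be reduced to its most likely labels. The snippet below is a minimal sketch, not part of the original exercise. It assumes that each entry of predicted_Product_probs arrives in the dataframe as a list of dictionaries with "label" and "prob" keys; the helper name top_k_labels and the example labels and probabilities are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def top_k_labels(probs: Sequence[Dict[str, object]], k: int = 3) -> List[Tuple[str, float]]:
    """Return the k most probable (label, prob) pairs, highest probability first.

    Assumption: each item looks like {"label": ..., "prob": ...}.
    """
    ranked = sorted(probs, key=lambda item: item["prob"], reverse=True)
    return [(str(item["label"]), float(item["prob"])) for item in ranked[:k]]


# Hypothetical row, made up for illustration; real values come from
# prediction_job["predicted_Product_probs"][0].
example_row = [
    {"label": "Mortgage", "prob": 0.12},
    {"label": "Debt collection", "prob": 0.55},
    {"label": "Credit card", "prob": 0.33},
]

print(top_k_labels(example_row, k=2))
```

If the assumption about the column layout holds, the same helper could be applied to every prediction with prediction_job["predicted_Product_probs"].apply(top_k_labels).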
