Traditional vs Deep Learning Algorithms in Retail Industry — I

Motivation: The Importance of Analytics in Retail

Sharmistha Chatterjee, May 27

The retail industry faces a difficult situation: retailers want to understand which products are very similar to each other, so that they can decide which of the similar products should not be promoted in the same week, in order to increase sales activity and profit margins.

Retailers need to consider various aspects of individual store management in geographically distributed areas to allow continuous flow of inventory at a reduced cost and minimum wastage.

In addition, the huge volume of online sales has prompted them to design automated Machine Learning (ML) retail platforms.

The problems addressed by the ML platform include:

Inventory and supply chain management with assortment planning.

Analysis of customer buying patterns, including invalid or fraudulent requests.

Analyzing customer interactions with virtual assistants and chatbots.

Performing retail analytics at scale to understand year on year growth.

Personalized recommendations using Collaborative filtering, Content-based filtering, Hybrid filtering and others.

Identifying item displacement/removal in stores.

Interpreting text and images from bills, packing lists, invoices.

The below figure depicts the importance of analytics and machine learning in the retail domain for increasing profits.

Source : https://www.



pdf

This blog is structured as follows:

Understanding the life cycle of machine learning algorithms in retail, so as to continuously learn and improve model accuracy.

Studying some of the applications of ensemble learning, clustering, PCA and time series forecasting in retail.

Investigating the retail areas that can benefit from the advanced use of deep learning.

In this context, I explain a few deep CNN models such as R-CNN, Fast R-CNN, Faster R-CNN and Mask R-CNN.

Life Cycle of Machine Learning Algorithms in Retail

The typical life cycle of a retail machine learning algorithm runs from gathering the training data, through training the model, to continuously improving model performance in a feedback loop.

Source: https://www.


com/ideas-portfolio/how-ai-machine-learning-big-data-shape-retail/

Dynamic Pricing

Machine learning collects data from customers and helps retailers to check and continuously monitor the prices of competitors.

This enables retailers to automatically match them or even offer a lower price, so the customers get the best possible deal and stay with the same brand.
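The price-matching idea above can be sketched as a simple rule. This is a minimal illustration, not a production pricing engine: the function name `match_price`, the `min_margin` floor and the `undercut` amount are all assumptions introduced here.

```python
def match_price(own_price, competitor_prices, cost, min_margin=0.05, undercut=0.0):
    """Match (or undercut) the cheapest competitor without selling below a margin floor.

    All parameters are hypothetical: `min_margin` is the smallest acceptable
    markup over cost, `undercut` is how far below the competitor to go.
    """
    floor = cost * (1 + min_margin)          # never price below this
    if not competitor_prices:
        return own_price                     # nothing to match against
    target = min(competitor_prices) - undercut
    # In this sketch the price is only ever lowered, never raised.
    return round(max(floor, min(own_price, target)), 2)


print(match_price(20.00, [18.50, 19.25], cost=12.00))  # matches the cheapest competitor
print(match_price(20.00, [12.10, 19.25], cost=12.00))  # clamped at the margin floor
```

A real system would learn the floor and the undercut from demand data rather than fixing them by hand.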

Elaborating further on this topic, key international AI & ML experts will discuss the ways in which AI & ML assists in boosting sales and reducing costs in retail at the World AI Show on 24 July in Singapore.

The figure below shows the dynamic pricing fluctuations of Amazon and other third-party providers over a period of time.

Source: https://www.



Fashion Trend Analysis

Retailers have been successful in collecting social media data such as likes and comments, analyzing customer sentiment, identifying customer interests and the hottest trends, and finally making relevant offers to specific customers as well as identifying what’s selling.


Inventory Management

Machine learning has helped retailers to cope with demand prediction and so prevent additional expenses.

It has made inventory management easier by ordering items based on the level of demand and on the stock already replenished.

Ensemble Modeling in Retail

Source: https://www.

com/ideas-portfolio/how-ai-machine-learning-big-data-shape-retail/

Ensemble modeling in retail is mostly used for uplift modeling in up-selling, cross-selling, churn prevention and retention activities.

The objective is not to predict the likelihood of a customer buying, but to determine what can be done to increase the likelihood of a customer making a purchase.

The model aims at predicting the outcome of a marketing campaign by selecting the individuals for whom the action will be most profitable.

Ensemble methods like Bagging, Random Forests and Stacking can be used in uplift modeling.

Other ensemble methods exist, such as Extremely Randomized Trees or Random Decision Trees.

The difference in the methods lies only in the way by which randomness is injected into the tree learning algorithm to ensure that models in the ensemble are diverse.

Uplift modeling is frequently applied in the marketing domain, although the retail industry has limited access to marketing datasets collected in real time.

The success of uplift modeling lies in the high performance of ensembling techniques, which benefit from the ensemble diversity that arises when the predictor variables are only weakly correlated with customer behavior.
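To make the quantity being modeled concrete, the sketch below estimates uplift as the difference in purchase rate between customers who received an offer and those who did not, per segment. It is deliberately not an ensemble: it only illustrates the target that an uplift ensemble tries to estimate at a finer, per-customer level. The segment names and the campaign log are hypothetical.

```python
from collections import defaultdict


def uplift_by_segment(records):
    """records: iterable of (segment, treated, purchased) tuples.

    Returns {segment: P(purchase | treated) - P(purchase | control)}.
    """
    # For each segment keep [buyers, total] for the treated and control groups.
    counts = defaultdict(lambda: {True: [0, 0], False: [0, 0]})
    for segment, treated, purchased in records:
        cell = counts[segment][treated]
        cell[0] += int(purchased)
        cell[1] += 1
    uplift = {}
    for segment, groups in counts.items():
        (t_buy, t_n), (c_buy, c_n) = groups[True], groups[False]
        if t_n and c_n:  # both groups are needed to estimate an effect
            uplift[segment] = t_buy / t_n - c_buy / c_n
    return uplift


# Hypothetical campaign log: (segment, received_offer, purchased)
log = (
    [("loyal", True, True)] * 8 + [("loyal", True, False)] * 2
    + [("loyal", False, True)] * 8 + [("loyal", False, False)] * 2
    + [("occasional", True, True)] * 6 + [("occasional", True, False)] * 4
    + [("occasional", False, True)] * 2 + [("occasional", False, False)] * 8
)
scores = uplift_by_segment(log)
best = max(scores, key=scores.get)
print(scores, best)
```

In this made-up log, loyal customers buy at the same rate with or without the offer, so the campaign budget is better spent on the occasional segment, even though loyal customers have the higher purchase likelihood.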

Clustering in Retail

Store clustering is a popular analytic technique in the retail domain.

It is employed to build relevant store segments that are homogeneous in certain behavioral aspects and can be targeted using the same marketing strategy.

Retail store segmentation can help to group together stores that have a similar customer base and location (e.g. stores in the heart of a city may cater to similar customer bases).

It helps in differentiated marketing strategies for each store segment, targeted to specific customers.

Category-based clustering strategy zooms in on specifics and helps to group items and products of a certain category in a specific group as shown in the figure below.

Customer Segmentation also uses unsupervised clustering techniques like k-means, latent class analysis, hierarchical clustering, etc.

Clustering techniques used in retail sales are: K-means, K-medoids, density-based clustering, hierarchical clustering, farthest-first clustering and filtered clustering.
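As an illustration of the first technique in the list, here is a minimal k-means (Lloyd's algorithm) sketch for store clustering. The two store features and the choice of initial centroids are assumptions made for the example; in practice features should be scaled and the initialization randomized.

```python
import math


def nearest(point, centroids):
    """Index of the centroid closest to `point` (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))


def kmeans(points, centroids, iterations=10):
    """Plain Lloyd's k-means; `centroids` holds the initial centers."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            clusters[nearest(p, centroids)].append(p)
        # Move each centroid to the mean of its cluster (keep it if the cluster is empty).
        centroids = [
            tuple(sum(dim) / len(c) for dim in zip(*c)) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    labels = [nearest(p, centroids) for p in points]
    return centroids, labels


# Hypothetical store features: (weekly footfall in thousands, average basket value)
stores = [(2.0, 15.0), (2.5, 14.0), (3.0, 16.0), (9.0, 40.0), (10.0, 42.0), (9.5, 38.0)]
centers, labels = kmeans(stores, centroids=[stores[0], stores[3]])
print(labels)
print(centers)
```

Stores that share a label can then be targeted with the same marketing strategy.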

Source: https://www.


com/blog/store-clustering-store-based-vs-category-based-clustering

Principal Component Analysis in Retail

Principal Component Analysis (PCA) is used for reducing and analyzing large multivariate data sets with underlying linear structure, and for discovering previously unsuspected relationships.

PCA is a dimensionality reduction technique, closely related to factor analysis, that reduces the initial set of variables to a smaller set of linear combinations.

The factor matrix contains the factor loadings of all the variables on all the extracted factors.

The eigenvalues give the total variance explained by each factor.

The job of principal component analysis is to identify patterns in the data and to express the data in a way that highlights their similarities and differences.

Factor analysis is useful for condensing variables, uncovering clusters of responses, and automatically weighting each of the variables.

A consumer goods company wishes to evaluate customer reactions to several attributes of a new moisturizer cream.

Its marketing agency carries out a principal component analysis to find out whether the attributes can be reduced to a smaller set of uncorrelated variables that are easier to interpret and analyze.

The results reveal the following patterns:

Smell, color and texture form a “Moisturizer quality” component.

Cleanliness, anti-aging and glow form an “Effect on skin” component.

The amount that has to be applied and the cost form a “Value” component.

The evaluation metrics are as follows:

Level of customer satisfaction obtained from reviews and feedback.

Product recommendation to friends and family members.

Likelihood of product purchase in future for existing and returning customers.

Principal component analysis can thus be used to reveal the most informative features among many correlated variables in high-dimensional data.
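The moisturizer example can be mimicked with a small PCA sketch based on the eigen-decomposition of the covariance matrix. The survey data here is synthetic and purely hypothetical: three attributes are generated from one hidden "quality" factor and a fourth (cost) is independent, so two components should capture almost all of the variance.

```python
import numpy as np


def pca(X, n_components):
    """Return (components, explained_variance_ratio) via eigen-decomposition."""
    Xc = X - X.mean(axis=0)                  # center each variable
    cov = np.cov(Xc, rowvar=False)           # covariance between variables
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigh returns eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]        # largest variance first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    ratio = eigvals / eigvals.sum()          # share of variance explained per component
    return eigvecs[:, :n_components].T, ratio[:n_components]


# Synthetic survey scores; columns: smell, color, texture, cost (all hypothetical)
rng = np.random.default_rng(0)
quality = rng.normal(size=200)               # hidden "Moisturizer quality" factor
X = np.column_stack([
    quality + 0.1 * rng.normal(size=200),    # smell
    quality + 0.1 * rng.normal(size=200),    # color
    quality + 0.1 * rng.normal(size=200),    # texture
    rng.normal(size=200),                    # cost, unrelated to quality
])
components, ratio = pca(X, n_components=2)
print(ratio)        # variance explained by each of the two components
print(components)   # loadings of the four variables on each component
```

The rows of `components` play the role of the factor loadings described above, and `ratio` is derived from the eigenvalues.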

Time Series Modeling in Retail

In this section, let us look at the solutions available for handling the missing and incomplete data that frequently occur in retail analytics.

In addition, we look at how ARIMA/SARIMA can be used to forecast stocked and replenished items weekly, monthly, or over any other period.

Sources of missing data and imputation techniques

Crowdsourced datasets are often incomplete due to missing observations or missing values for important attributes.

Data imputation methods can fill in missing values in crowdsourced, regular, discrete, retail price time series datasets.

Retail Price Time Series Imputation (RPTSI) method uses an ensemble of three constituent methods for imputing retail prices in a univariate time series dataset based upon retail prices that occur in the time series: Price Change Lookup, Central Moving Average, and Polynomial Interpolation.
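Of the three constituent methods, the central moving average is the simplest to sketch. The code below is only an illustration of that one constituent, not the RPTSI ensemble itself; the function name, the `window` parameter and the price series are assumptions.

```python
def impute_central_moving_average(prices, window=1):
    """Fill None entries with the mean of the observed values within
    `window` steps on either side of the gap."""
    filled = list(prices)
    for i, value in enumerate(prices):
        if value is None:
            lo, hi = max(0, i - window), min(len(prices), i + window + 1)
            neighbors = [p for p in prices[lo:hi] if p is not None]
            if neighbors:  # leave the gap untouched if nothing nearby was observed
                filled[i] = sum(neighbors) / len(neighbors)
    return filled


# Hypothetical weekly prices with three missing observations
weekly_price = [2.0, None, 3.0, 3.0, None, None, 4.0]
print(impute_central_moving_average(weekly_price, window=1))
```

Note that only originally observed prices are averaged, so a run of consecutive gaps is filled from the observed values at its edges.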

Other techniques for handling missing or sparse data are:

Model Based Imputation (MBI) is a popular and effective collaborative approach for imputing missing data.

If a data set contains k features, MBI builds k different classifiers, each trained to predict a different feature as its target.

Each classifier treats the instances where the target feature is observed as the training set and the instances where it is missing as the test set.

Matrix Factorization (MF) exploits hidden features of a matrix by learning latent matrices whose dot product approximates the original matrix.

Recommender systems use various MF strategies for collaborative filtering of sparse user-item ratings, with the aim of recommending unrated pairs.

Supervised Matrix factorization approximates the original data set by learning latent features matrices and enables the linear separation of classes in the projected data space.

The dot product can reconstruct the original matrix.

An important aspect of matrix factorization is that it creates a latent feature space in which all time-series instances jointly share information and all features are treated equally in the objective function, via a stochastic learning technique specifically designed for sparse data sets.

It improves the separability of the latent data by enforcing a classification accuracy loss function (e.g. the logistic regression loss function).

However, it is not specifically tailored to time series, where the features/points are correlated with their neighboring points.
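The basic idea of learning latent matrices from a sparse matrix can be sketched with plain (unsupervised) matrix factorization trained by stochastic gradient descent. This is not the supervised variant described above: there is no classification loss, only a squared reconstruction error on the observed entries. The ratings matrix, the rank and the learning-rate settings are assumptions chosen for illustration.

```python
import numpy as np


def factorize(R, mask, rank=2, lr=0.01, reg=0.01, epochs=2000, seed=0):
    """Learn U, V so that U @ V.T approximates R on the entries where mask is True."""
    rng = np.random.default_rng(seed)
    U = 0.1 * rng.standard_normal((R.shape[0], rank))
    V = 0.1 * rng.standard_normal((R.shape[1], rank))
    observed = np.argwhere(mask)             # (row, column) pairs with known values
    for _ in range(epochs):
        for i, j in observed:
            err = R[i, j] - U[i] @ V[j]      # reconstruction error for one entry
            u_old = U[i].copy()
            U[i] += lr * (err * V[j] - reg * U[i])
            V[j] += lr * (err * u_old - reg * V[j])
    return U, V


def rmse(R, mask, U, V):
    """Root mean squared error on the observed entries only."""
    diff = (R - U @ V.T)[mask]
    return float(np.sqrt(np.mean(diff ** 2)))


# Hypothetical customer-by-product ratings; 0 marks a missing value
R = np.array([
    [5, 3, 0, 1],
    [4, 0, 0, 1],
    [1, 1, 0, 5],
    [1, 0, 0, 4],
    [0, 1, 5, 4],
], dtype=float)
mask = R > 0
U, V = factorize(R, mask)
print(rmse(R, mask, U, V))       # error on the known entries
print(np.round(U @ V.T, 2))      # reconstruction, including the formerly missing cells
```

The values that `U @ V.T` places in the masked cells are the imputed (or, in a recommender, predicted) entries.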

Spline Interpolation: Splines are effective intra-series alternatives for reconstructing missing segments while preserving the continuity and smoothness of the curve.

Cubic Spline Interpolation (CSP) interpolates a missing segment by fitting a local cubic polynomial (spline), ensuring that the first derivative (slope) and the second derivative (rate of change of slope) are preserved at the connection points to the observed neighboring segments.

Forecasting with Time Series

SARIMA and regression-SARIMA models can be used to forecast monthly retail commodity prices at the wholesale level.

Weekdays and holidays act as external variables in the regression-SARIMA model, which helps to identify the importance of predictor variables.

The SARIMA model with external factors (SARIMAX) was proposed to overcome the disadvantages of the traditional SARIMA model in forecasting the daily sales of fresh foods in a retail store.

ARIMA and Holt-Winters (HW) models are used for forecasting the time series of a group of perishable dairy products.

To account for seasonality and holiday effects at different store locations, ARIMAX and SARIMAX models are better suited.
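The seasonal part of these models rests on seasonal differencing, which the sketch below shows together with a seasonal-naive forecast with drift. This is a simple baseline built on the same differencing step, not a fitted SARIMA model; the sales series and the season length of 4 are assumptions.

```python
def seasonal_difference(series, period):
    """y[t] - y[t - period]: the seasonal differencing step that SARIMA applies."""
    return [series[t] - series[t - period] for t in range(period, len(series))]


def seasonal_naive_forecast(series, period, horizon):
    """Repeat the last season, shifted by the average season-over-season change."""
    diffs = seasonal_difference(series, period)
    drift = sum(diffs) / len(diffs)
    history = list(series)
    for _ in range(horizon):
        history.append(history[-period] + drift)
    return history[len(series):]


# Hypothetical sales with a 4-step season and a steady rise of 2 units per season
sales = [10, 20, 15, 5, 12, 22, 17, 7]
print(seasonal_difference(sales, period=4))
print(seasonal_naive_forecast(sales, period=4, horizon=4))
```

A baseline like this is a useful yardstick: a SARIMA or SARIMAX model is only worth its complexity if it forecasts better than repeating last season.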

Apart from the time series models used in demand forecasting, Markov chains also play an important role in sales prediction.

Word2Vec to Prod2Vec

In the word2vec model (with Continuous Bag of Words (CBoW)), each sentence is composed of words, and words similar to a given word are identified by considering its context words. A similar model for the retail domain, known as Prod2Vec, can be introduced.

Prod2Vec assumes an individual basket is composed of products, and similar products are identified by considering other products purchased in the same basket.

The figure below visualizes the resulting product vectors after applying a dimensionality reduction technique called t-SNE (t-Distributed Stochastic Neighbor Embedding).

Each color in the figure represents a certain set of products, and the closer two products are, the more similar they are.

This helps us to translate a product into an n-dimensional vector and solve multiple use cases in the retail industry.
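The intuition that products bought in the same baskets are similar can be shown without training a neural embedding. The sketch below is a stand-in, not Prod2Vec: each product is represented by counts of the products it co-occurs with, and similarity is the cosine between those count vectors. The baskets are hypothetical.

```python
import math
from collections import Counter, defaultdict
from itertools import permutations


def cooccurrence_vectors(baskets):
    """Map each product to a Counter of the products bought in the same basket."""
    vectors = defaultdict(Counter)
    for basket in baskets:
        for a, b in permutations(set(basket), 2):
            vectors[a][b] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0


baskets = [
    ["cola", "chips", "salsa"],
    ["lemonade", "chips", "salsa"],
    ["cola", "chips"],
    ["lemonade", "chips"],
    ["shampoo", "soap"],
]
vec = cooccurrence_vectors(baskets)
print(cosine(vec["cola"], vec["lemonade"]))  # same context products, high similarity
print(cosine(vec["cola"], vec["soap"]))      # no shared context
```

Cola and lemonade never share a basket here, yet they come out as similar because they are bought with the same other products, which is the same signal a Prod2Vec model learns from.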

Source : https://www.


com/single-post/2017/08/15/Neural-networks-for-retail-industry

Deep learning in Retail Environments

There are several use cases of item and video identification in the retail business where deep learning has helped.

Detecting Item Removal/Displacement

Convolutional Neural Networks play an important role in detecting item removal in a store.

The input to the deep neural network is video of people interacting with items on the shelves of a vending machine.

The camera is mounted at the top of the machine and triggered to record video only when the door is open.

An example of a raw video frame is shown in the figure below.

Source: http://cs231n.



pdf

Several different deep neural network architectures can be used to classify whether items have been removed from or added to the shelf within the time frame of the video.

Since the video dataset is small, using pre-trained models available online together with transfer learning (to avoid overfitting) makes it easier to detect items removed from shelves. Two options are:

Use an image classification model, SqueezeNet, for the video classification problem.

Use video classification models like C3D (3D ConvNets) with batch normalization and transfer learning to identify customer actions in the videos.

A typical C3D model is used to identify objects and scenes and to track customer actions, as illustrated in the diagram below.

Source : https://research.


com/c3d-generic-features-for-video-analysis/

Other object tracking methods are based on an observation model, which can be either generative or discriminative.

The generative method uses a generative model, such as PCA, to describe the appearance of the object and searches for the object by minimizing the reconstruction error.

The discriminative method, one of the most robust mechanisms, distinguishes between the object and the background.

To achieve tracking by detection, candidate objects are detected in all frames and a deep learning method is used to recognize the wanted object among the candidates.

Product Classification

With the growth of online shopping, it becomes essential for a retailer to know the popular design styles of clothing products, so that it can increase the production of those styles and achieve more profit.

Therefore, if a system can classify garment products according to style, texture, size, etc., it can automatically suggest different products to customers based on their choices.

Source: https://pdfs.



pdf

Effective design classification based on textures, i.e. local spatial variations of intensity or color in images, has been an important topic of interest in the past decades.

A successful classification, detection or segmentation requires an efficient description of image textures.

Hand-engineered feature extraction methods such as Census Transform Histogram (CENTRIST), Local Binary Pattern (LBP) and Histogram of Oriented Gradients (HOG) exist, and some of them have become popular because of their computational simplicity and good accuracy.
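To show how simple such a hand-engineered descriptor can be, here is a minimal sketch of the basic 3x3 Local Binary Pattern code for a single pixel. The clockwise bit order starting at the top-left neighbor is an assumption; implementations differ in the ordering they use.

```python
def lbp_code(patch):
    """8-bit Local Binary Pattern of the center pixel of a 3x3 patch (list of 3 rows)."""
    center = patch[1][1]
    # Neighbors visited clockwise, starting from the top-left corner.
    ring = [patch[0][0], patch[0][1], patch[0][2], patch[1][2],
            patch[2][2], patch[2][1], patch[2][0], patch[1][0]]
    # A neighbor contributes a 1 bit when it is at least as bright as the center.
    bits = [1 if value >= center else 0 for value in ring]
    return sum(bit << i for i, bit in enumerate(bits))


checker = [[9, 1, 9],
           [1, 5, 1],
           [9, 1, 9]]
flat = [[5, 5, 5],
        [5, 5, 5],
        [5, 5, 5]]
print(lbp_code(checker))  # alternating bright and dark neighbors
print(lbp_code(flat))     # uniform region
```

A texture descriptor is then the histogram of these codes over an image region, which is what a classifier would consume.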

However, a specific deep CNN model, shown in the figure below, has achieved better accuracy in identifying garment design classes than hand-engineered feature extraction methods.

The fully connected (FC) layers and convolutional layers of a deep CNN represent the features more elaborately than any of the hand-engineered feature extraction techniques.

Source : https://pdfs.



pdf

Convolutional Neural Networks were also evaluated on other classification tasks, such as differentiating persons from products and classifying product images and the gender and age of persons.

Object Detection with Region-based Convolutional Neural Networks (RCNN)

The task of detecting objects within images usually involves outputting bounding boxes and labels for individual objects.

This differs from the classification/localization task in that classification and localization are applied to many objects instead of just a single dominant object.

There are only 2 classes of bounding boxes: those that contain an object and those that do not.

For example, in a physical retail grocery basket, all items in the basket need to be detected from a given image, each with its bounding box.

One approach is to use the sliding window technique to classify and localize objects, where a CNN is applied to many different crops of the image.

Because the CNN classifies each crop as object or background, applying it to a huge number of locations and scales becomes computationally expensive.
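A quick count makes the cost of sliding windows concrete. The sketch below enumerates square crops over an image for a few window sizes; the image size, window sizes and stride are hypothetical, and each crop would need its own CNN forward pass.

```python
def sliding_windows(width, height, window, stride):
    """Yield (x, y, w, h) for every position of a square window inside the image."""
    for y in range(0, height - window + 1, stride):
        for x in range(0, width - window + 1, stride):
            yield (x, y, window, window)


def count_crops(width, height, windows, stride):
    """Total number of crops across all window sizes."""
    return sum(
        sum(1 for _ in sliding_windows(width, height, w, stride))
        for w in windows
    )


# Hypothetical 640x480 image, three window sizes, stride of 16 pixels
total = count_crops(640, 480, windows=[64, 128, 256], stride=16)
print(total)
```

Even this coarse grid with only square windows already produces thousands of crops; adding more scales, aspect ratios or a finer stride multiplies the count further.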

Source : https://www.


com/blog/the-five-computer-vision-techniques-that-will-change-how-you-see-the-world

In order to scale efficiently, neural network researchers proposed using regions instead: first find “blobby” image regions that are likely to contain objects, and run the CNN only on those.

This helps to scale and run with better latency.

The initial model designed was R-CNN (Region-based Convolutional Neural Network).

In R-CNN, the entire input image is scanned for possible objects using an algorithm called Selective Search, generating ~2,000 region proposals.

Following this, a CNN is run on top of each of these region proposals.

Finally, the output of each CNN is fed into an SVM to classify the region and into a linear regressor to tighten the bounding box of the object.

In this way, object detection is reduced to an image classification problem.

An immediate descendant of R-CNN is Fast R-CNN, which improves the detection speed through 2 augmentations:

Performing feature extraction before proposing regions, thus running only one CNN over the entire image, and

Replacing the SVM with a softmax layer, thus extending the neural network for predictions instead of creating a new model.

Source : https://www.


com/blog/the-five-computer-vision-techniques-that-will-change-how-you-see-the-world

Fast R-CNN performed much better in terms of speed because it trains just one CNN for the entire image.

Its main disadvantage is that the selective search algorithm still takes a lot of time to generate region proposals.

The speed improvement is limited because the region proposals are generated separately by another model.

The next development in the R-CNN family is Faster R-CNN, which is a canonical model for deep learning-based object detection.

It replaces the slow selective search algorithm with a fast neural network by inserting a Region Proposal Network (RPN) to predict proposals from features.

Faster R-CNN performs feature extraction and ROI proposal (ROI pooling) with 2 more convolution layers.

The RPN acts as a chief decision-maker that infers which areas of the image to investigate, reducing the computational requirements of the overall inference process.

The RPN quickly and efficiently scans every location in order to assess whether further processing needs to be carried out in a given region.

The scan outputs k bounding box proposals at each location, each with 2 scores representing the probability that the box does or does not contain an object.
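The k proposals per location come from a fixed set of reference boxes (anchors). The sketch below generates them for one location; the defaults of 3 scales and 3 aspect ratios (k = 9) follow the commonly cited Faster R-CNN configuration, while the function name and the example location are assumptions. The RPN itself, which scores and refines these boxes, is not shown.

```python
def anchors_at(cx, cy, scales=(128, 256, 512), ratios=(0.5, 1.0, 2.0)):
    """Return k = len(scales) * len(ratios) boxes (x1, y1, x2, y2) centered at (cx, cy)."""
    boxes = []
    for s in scales:
        for r in ratios:
            # r is height / width; the box area stays close to s * s for every ratio.
            w = s / r ** 0.5
            h = s * r ** 0.5
            boxes.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return boxes


boxes = anchors_at(300, 200)
print(len(boxes))   # k anchors at this location
print(boxes[1])     # the square anchor at the smallest scale
```

For each of these k boxes the RPN predicts the 2 objectness scores mentioned above plus coordinate offsets that adjust the box to the object.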

The region proposals are then fed straight into what is essentially a Fast R-CNN.

This Fast R-CNN stage contributes a pooling layer, some fully-connected layers, and finally a softmax classification layer and a bounding box regressor, which together with the RPN make up Faster R-CNN.

Faster R-CNN achieves better speeds and higher accuracy.

Although later deep CNN models did a lot to increase detection speed, few managed to outperform Faster R-CNN by a significant margin.

The drawback of Faster R-CNN is that the feature maps selected by RoIPool (Region of Interest Pooling) are slightly misaligned with the regions of the original image.

Source : https://www.


com/blog/the-five-computer-vision-techniques-that-will-change-how-you-see-the-world

MaskRCNN

MaskRCNN is a model built to conduct object detection and image segmentation very successfully. It adds a branch (a Fully Convolutional Network on top of a CNN-based feature map) to Faster R-CNN that outputs a binary mask indicating whether or not a given pixel is part of an object.

The Mask R-CNN is a multi-task network that works on a single input (image) to predict multiple kinds of outputs.

The figure below shows one more variant (bottom right) of how such a mask can be built.

Source: https://medium.com/@jonathan_hui/image-segmentation-with-mask-r-cnn-ebe6d793272

ROI Align in Mask R-CNN

One of the unique features of Mask R-CNN is its refinement of ROI pooling.

In ROI pooling, the warping is quantized (top left diagram below): the cell boundaries of the target feature map are forced to realign with the boundaries of the input feature map.

Mask R-CNN uses ROI Align (Region of Interest Align), which does not quantize the cell boundaries (top right) and makes every target cell the same size (bottom right).

It applies bilinear interpolation to calculate the feature map values within each cell more accurately, which avoids rounding errors and the resulting inaccuracies in detection and segmentation.
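The difference between quantizing a sampling point and interpolating at it can be seen on a tiny feature map. This is a minimal sketch of bilinear interpolation at a single point, not a full ROI Align layer; the feature map values and the sampling point are made up.

```python
import math


def bilinear(feature, x, y):
    """Bilinearly interpolate a 2-D grid (list of rows) at a continuous point (x, y)."""
    x0, y0 = math.floor(x), math.floor(y)
    x1 = min(x0 + 1, len(feature[0]) - 1)    # clamp at the right edge
    y1 = min(y0 + 1, len(feature) - 1)       # clamp at the bottom edge
    dx, dy = x - x0, y - y0
    top = feature[y0][x0] * (1 - dx) + feature[y0][x1] * dx
    bottom = feature[y1][x0] * (1 - dx) + feature[y1][x1] * dx
    return top * (1 - dy) + bottom * dy


feature_map = [
    [0.0, 1.0, 2.0],
    [3.0, 4.0, 5.0],
    [6.0, 7.0, 8.0],
]
x, y = 0.5, 1.5                                # a sampling point between four cells
aligned = bilinear(feature_map, x, y)          # ROI Align style: interpolate
quantized = feature_map[int(y)][int(x)]        # ROI pooling style: snap to a cell
print(aligned, quantized)
```

The snapped value comes from a single neighboring cell, while the interpolated value blends all four; summed over many sampling points, that gap is the misalignment ROI Align removes.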

Mask R-CNN has gained popularity due to its major advantages: good inference speed, good accuracy, an easier implementation process and extensibility.

Mask R-CNN combines the predicted masks with the classifications and bounding boxes from Faster R-CNN to generate precise segmentations.

However, its major drawbacks are false alerts and missing labels.

When an image of a few retail items arranged on a table is fed into the model, MaskRCNN is able to identify a large variety of scene objects, including juice, glasses, cold drink bottles, detergent bottles, cups, mugs, snacks and others.

Further, inside each identified bounding box, the colorful shaded region identifies exactly which pixels in an image correspond to the object.

This is referred to as pixel segmentation, and each pixel in the image receives a predicted classification label about the kind of object that pixel belongs to (or background).

The model executes three things: object detection (green boxes), object classification, and segmentation (colorful shaded regions).

The green bounding boxes on the image below are the outputs of the model and above each box is a prediction of what kind of object is contained within.

Source : https://www.


com/watch?v=tlC2O9T9jks

Detecting Text Objects

Object detection models such as MaskRCNN and its predecessors provide a very flexible mechanism for identifying regions of interest inside images.

Non-traditional OCR based on object detection primarily has two classes of objects: text objects and everything else.

A model very similar to MaskRCNN can be trained to identify regions of interest (RoI) in an image that are highly likely to contain text, a process known as text localization.

An example output of such a model is shown below.

A text localization model applied to a cell phone picture of a receipt.

Text segments are identified apart from scene objects and background pixels.

Some of the challenges for text extraction are that the document of interest occurs alongside background objects and that the text within the document is highly unstructured.

This necessitates identifying all possible text blocks.

The output of the model is overlaid on the image above: regions of text are identified with dotted-line bounding boxes, along with the estimated pixel mask for the text.

Above each box is the predicted class and confidence score; the class is “Text” in all detected cases because there is only one object class of interest.

The bounding boxes are tight and encapsulate the text regions fairly accurately.

MaskRCNN needs to be modified and trained on OCR-relevant datasets to provide an effective approach for text localization.

The inclusion of pixel masks in the multi-task learning forces the localization (bounding box regression) to be even more accurate.

However, identifying the RoIs of an image that correspond to text blocks is, on its own, of limited utility for OCR.

The next job is to read the text contained in each image region, a process known as text recognition.

OCR in Retail Industry

OCR systems have been a big help to the retail industry, allowing companies to scan and extract data from bills of lading, packing lists, invoices, purchase orders and more.

Advanced OCR capture software is a dynamic system that looks for keywords on each page to create a template for each invoice based on the defined keywords.

Conclusion

In this article, we have seen how traditional learning and deep learning serve the retail industry in different problem spaces, with structured as well as unstructured input data.

However, much work remains to be done in the following areas to increase model predictability:

Incorporating the nature and volume of discounts, inventory policy, safety stock maintained, and assumptions regarding shoppers as decision variables.

Using transfer learning to price the products in the same cluster.

Using meta-learning to automate and expedite learning.

Introducing promotion pricing, membership pricing and other features that govern pricing.

Enhancing the usage of deep learning based object detection methods for intelligent store monitoring, maintenance and customer validation.

In the next blog let’s look at how IoT together with Machine Learning, Reinforcement Based Learning and BlockChain is set to create a revolution in the retail industry.






