One out of every seven people was wanted on a warrant.
One out of every 21 was carrying a weapon, from box cutters up to Uzi submachine guns.
So the New York miracle, if you will, began with fare evasion enforcement on the subway 25 years ago.
“The current process of data collection for estimating fare evasion is far from perfect and relies on human-generated data and sampling techniques.” — MTA fare evasion data collection (source)
I believe that by leveraging advances in computer vision, it is possible to build a cost-effective automated system that measures (rather than estimates) fare evasion in subways using security camera footage.
Most turnstiles are covered by security cameras, and this footage is recorded and saved on central servers.
It is possible to engineer a software pipeline that consumes video data from these servers, uses computer vision to detect turnstile jumpers, and records this information in a database.
This automation will have the following benefits:
- No need to install edge devices or on-site hardware, which means minimal implementation cost.
- Fare evasion data will have precise time granularity, as opposed to sampling.
- Fare evasion data will have fine spatial granularity (station-entrance level), unlike the system-wide estimates produced by the current method.
- Potential to be transformed into a real-time information system.
- Accurate estimates will enable the MTA to do better cost-benefit analysis and decide on its next steps in a data-driven fashion.
- The spatio-temporal nature of the data would enable efficient staff scheduling to prevent fare evasion.
- It will cut the cost of manual data collection.

Computer Vision to the rescue
We will rephrase the problem in a way that computer vision can address.
The goal is to identify frames in a video where fare evasion occurs.
And fare evasion occurs when a person jumps over the turnstile rather than walking through it.
So if you can differentiate between a walking person and a jumping person in a given frame, you can build upon that to detect fare evasion.
A number of different architectures based on Convolutional Neural Networks are possible to address this issue.
But training these models from scratch requires a huge amount of data, a luxury we do not possess for this specific problem (unless you have a relative at the MTA).
But in modern computer vision, you rarely need to start from scratch.
There is a plethora of pre-trained models available on the web that can be used either to initialize your model partially or completely, or to extract features on which you can train shallow classifiers.
This is referred to as transfer learning.
Pose estimation systems appeared to be a perfect candidate for feature extraction, since pose features should be able to distinguish a jumping person from a walking one.
Pose estimation systems use convolutional neural networks to estimate the locations of selected key points on the human body from images.
So the solution was divided into two modules:
- Pose feature extraction: This module detects all humans in the frame and estimates their pose by locating key body points. OpenPose, an open-source human pose estimation system, was used to extract the pose features.
- Pose classification: Based on the intuition that the posture of a person walking through a turnstile is distinctly different from that of a person jumping over it, this module classifies a pose as jumping or walking based on the pose features.
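The first module can be sketched as follows. The sketch assumes OpenPose was run with its COCO model (18 keypoints) and the `--write_json` flag, which emits per-person `pose_keypoints_2d` arrays of `[x, y, confidence]` triplets. The post does not specify its exact feature construction, so using normalized y-coordinates (one value per keypoint, giving an 18-dimensional vector) is an illustrative choice:

```python
import json
import numpy as np

def pose_features(json_path):
    """Turn one OpenPose keypoint JSON file into per-person feature vectors.

    Assumes the COCO model (18 keypoints) and OpenPose's --write_json
    output format. The feature construction (normalized y-coordinates)
    is an illustrative choice, not necessarily the one used in the post.
    """
    with open(json_path) as f:
        frame = json.load(f)
    features = []
    for person in frame["people"]:
        kp = np.array(person["pose_keypoints_2d"]).reshape(-1, 3)  # (18, 3)
        ys, conf = kp[:, 1], kp[:, 2]
        visible = conf > 0.1  # ignore keypoints OpenPose could not find
        if visible.sum() < 2:
            continue
        # Normalize y-coordinates to the person's own vertical extent so the
        # features are invariant to where the person stands in the frame.
        y_min, y_max = ys[visible].min(), ys[visible].max()
        ys_norm = np.where(visible, (ys - y_min) / max(y_max - y_min, 1e-6), 0.0)
        features.append(ys_norm)  # one 18-dimensional vector per person
    return features
```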
Proof of Concept Solution
To test the viability of the proposed approach, I developed a proof-of-concept solution.
A small dataset of about 200 images was prepared from Google Image Search.
Half the images were of people standing and the other half were of people jumping in the air.
OpenPose was used to extract pose features from the images, giving the positions of 18 key points on the body.
Subsequently, a small random forest was trained on the 18-dimensional feature space to solve the two-way classification problem.
The classifier achieves a precision of 91% and a recall of 85% even with a 50/50 train/test split, indicating that this is a straightforward classification problem.
The reason such a small dataset yields good results is that, in the 18-dimensional feature space, the jumping and walking poses are easily separable.
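The training and evaluation step above can be sketched with scikit-learn. The function and variable names are illustrative: `X` is assumed to be an `(n_samples, 18)` array of pose features and `y` a 0/1 label array (walking vs. jumping); hyperparameters like the number of trees are not stated in the post.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import precision_score, recall_score
from sklearn.model_selection import train_test_split

def train_pose_classifier(X, y, seed=0):
    """Train a small random forest on pose features and report test metrics."""
    # 50/50 train/test split, mirroring the proof of concept.
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.5, stratify=y, random_state=seed)
    clf = RandomForestClassifier(n_estimators=100, random_state=seed)
    clf.fit(X_tr, y_tr)
    pred = clf.predict(X_te)
    return clf, precision_score(y_te, pred), recall_score(y_te, pred)
```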
Example predictions on unseen data
Now we will try running a video through the model to see if it can identify the frames that involve a jumping person. It does!
Me walking through a poorly simulated turnstile.
Me jumping over a poorly simulated turnstile.
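The frame-by-frame scan over a video can be sketched like this. `extract_pose_features` (a wrapper around the OpenPose step) and `clf` (the trained random forest) are stand-ins for the components described above, and sampling every `stride`-th frame is an added efficiency assumption, not something the post specifies:

```python
def flag_jump_frames(frames, extract_pose_features, clf, stride=5):
    """Return indices of frames in which any detected person is jumping.

    `frames` is any iterable of images; `extract_pose_features` returns one
    18-dimensional pose vector per detected person, and `clf` is the trained
    pose classifier (label 1 == jumping).
    """
    flagged = []
    for idx, frame in enumerate(frames):
        if idx % stride:
            continue  # sample every `stride`-th frame to keep it cheap
        feats = extract_pose_features(frame)
        if len(feats) and clf.predict(feats).any():
            flagged.append(idx)
    return flagged

def iter_video_frames(video_path):
    """Yield frames from a video file via OpenCV."""
    import cv2  # lazy import: only needed when reading a real video
    cap = cv2.VideoCapture(video_path)
    try:
        while True:
            ok, frame = cap.read()
            if not ok:
                break
            yield frame
    finally:
        cap.release()
```

Used together, `flag_jump_frames(iter_video_frames("station.mp4"), extract_pose_features, clf)` would return the timestamps (as frame indices) where fare evasion is suspected.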
The results are promising, and it appears possible to engineer an automatic fare evasion detection system using computer vision.
We believe performance can be improved by retraining or fine-tuning the entire model pipeline on labelled MTA surveillance data.
This is because every system generates data with characteristic properties: for example, the patterns of occlusion, the angle of capture, and the lighting conditions will be unique to MTA surveillance footage.
Training on similar data should enhance performance.
The code is available on GitHub: muaz-urwa/Fare-Evasion-Detection-using-Computer-Vision.
Feel free to play around with it, and leave a star if you like the work.