Top 5 Data Science GitHub Repositories and Reddit Discussions (January 2019)

Introduction

There’s nothing quite like GitHub and Reddit for data science.

Both platforms have been of immense help to me in my data science journey.

GitHub is the ultimate one-stop platform for hosting your code.

It excels at easing the collaboration process between team members.

 Most leading data scientists and organizations use GitHub to open-source their libraries and frameworks.

So not only do we stay up-to-date with the latest developments in our field, we also get to replicate their models on our own machines!

Reddit discussions are on the same end of that spectrum.

Leading researchers and brilliant minds come together to discuss and build on the latest topics and breakthroughs in machine learning and data science.

There is A LOT to learn from these two platforms.

I have made it a habit to check both these platforms at least twice a week.

It’s changed the way I learn data science.

I encourage everyone reading this to do the same!

In this article, we’ll focus on the latest open-source GitHub libraries and Reddit discussions from January 2019.

Happy learning!

You can also browse through the 25 best GitHub repositories from 2018.

The list contains libraries covering a diverse range of domains, including NLP, computer vision, GANs, and AutoML, among others.

GitHub Repositories

Flair (State-of-the-Art NLP Library)

2018 was a watershed year for Natural Language Processing (NLP).

Releases like ELMo and Google’s BERT were groundbreaking.

As Sebastian Ruder said, “NLP’s ImageNet moment has arrived”!

Let’s keep that trend going into the new year!

Flair is another superb NLP library that’s easy to understand and implement.

And the best part? It’s very much state-of-the-art!

Flair was developed and open-sourced by Zalando Research and is based on PyTorch.

The library has outperformed previous approaches on a wide range of NLP tasks, with F1 score as the evaluation metric.
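For readers new to the metric, F1 is the harmonic mean of precision and recall. The snippet below is a minimal sketch of that calculation in plain Python. The function name and the counts are made up for illustration and are not taken from Flair or its benchmark results.

```python
def f1_score(true_positives: int, false_positives: int, false_negatives: int) -> float:
    """Return F1, the harmonic mean of precision and recall."""
    if true_positives == 0:
        # No correct predictions: precision and recall are both zero.
        return 0.0
    precision = true_positives / (true_positives + false_positives)
    recall = true_positives / (true_positives + false_negatives)
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts for an entity tagger (illustrative only).
score = f1_score(true_positives=80, false_positives=20, false_negatives=20)
print(round(score, 3))
```

Because it is a harmonic mean, F1 stays low unless precision and recall are both high, which is why it is a common choice for tasks like named entity recognition.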

I am currently exploring this library and plan to pen down my thoughts in an article soon.

Keep watching this space!

face.evoLVe – High Performance Face Recognition Library

Face recognition algorithms for computer vision are ubiquitous in data science now.

We covered a few libraries in last year’s GitHub series as well.

Add this one to the growing list of face recognition libraries you must try out.


face.evoLVe is a “High Performance Face Recognition Library” based on PyTorch.

It provides comprehensive functions for face-related analytics and applications, including:

- Face alignment (detection, landmark localization, affine transformation)
- Data pre-processing (e.g., augmentation, data balancing, normalization)
- Various backbones (e.g., ResNet, DenseNet, LightCNN, MobileNet, etc.)
- Various losses (e.g., Softmax, Center, SphereFace, AmSoftmax, Triplet, etc.)
- A bag of tricks for improving performance (e.g., training refinements, model tweaks, knowledge distillation, etc.)


This library is a must-have for the practical use and deployment of high performance deep face recognition, especially for researchers and engineers.

YOLOv3

YOLO is a supremely fast and accurate framework for performing object detection tasks.

It was launched three years back and has seen a few iterations since, each better than the last.

This repository is a complete pipeline of YOLOv3 implemented in TensorFlow.

This can be used on a dataset to train and evaluate your own object detection model.

Below are the key highlights of this repository:

- Efficient tf.data pipeline
- Weights converter
- Extremely fast GPU non-maximum suppression
- Full training pipeline
- K-means algorithm to select prior anchor boxes

If you’re new to YOLO and are looking to understand how it works, I highly recommend checking out this essential tutorial.
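To give a feel for what non-maximum suppression does, here is a minimal greedy version in plain Python. It is a CPU sketch for illustration only, not the repository's GPU implementation. The function names, box format, and threshold value are assumptions made for this example.

```python
from typing import List, Tuple

# Assumed box format for this sketch: (x_min, y_min, x_max, y_max).
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def non_max_suppression(boxes: List[Box], scores: List[float],
                        iou_threshold: float = 0.5) -> List[int]:
    """Return indices of the boxes kept by greedy NMS, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        # Keep a box only if it does not overlap too much with any kept box.
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


# Two heavily overlapping detections and one separate detection (made-up data).
boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
kept = non_max_suppression(boxes, scores)
print(kept)
```

The second box overlaps the first with an IoU of roughly 0.68, so it is dropped in favour of the higher-scoring one, while the third box is far away and survives.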

FaceBoxes: A CPU Real-Time Face Detector with High Accuracy

One of the biggest challenges in computer vision is managing computational resources.

Not everyone has multiple GPUs lying around.

It’s been quite a hurdle to overcome.

Step up FaceBoxes.

It’s a novel face detection approach that has shown impressive speed and accuracy on CPUs.

This repository is a PyTorch implementation of FaceBoxes.

It contains the code to install, train and evaluate a face detection model.

No more complaining about a lack of computation power – give FaceBoxes a try today!

Transformer-XL from Google AI

Here’s another game-changing NLP framework.

It’s no surprise to see the Google AI team behind it (they’re the ones who came up with BERT as well).

Long-range dependencies have been a thorn in the side of NLP.

Even with the significant progress made last year, this problem wasn’t fully solved.

RNNs and vanilla Transformers were used, but they were not quite good enough.

That gap has now been filled by Google AI’s Transformer-XL.

A few key points to note about this library:

- Transformer-XL is able to learn long-range dependencies that are about 80% longer than RNNs and 450% longer than vanilla Transformers
- Even on the computational front, Transformer-XL is up to 1,800+ times faster than a vanilla Transformer during evaluation!
- Transformer-XL has better performance in perplexity (more accurate at predicting a sample) on long sequences because of long-term dependency modeling

This repository contains the code for Transformer-XL in both TensorFlow and PyTorch.
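Since perplexity comes up in the points above, here is a minimal sketch of how it is computed: the exponential of the average negative log-probability a model assigns to the correct tokens. The function name and the probabilities are hypothetical and are not Transformer-XL outputs.

```python
import math
from typing import Sequence


def perplexity(token_probabilities: Sequence[float]) -> float:
    """Perplexity = exp of the average negative log-probability per token."""
    if not token_probabilities:
        raise ValueError("need at least one token probability")
    avg_nll = -sum(math.log(p) for p in token_probabilities) / len(token_probabilities)
    return math.exp(avg_nll)


# Hypothetical probabilities two models assign to the same four correct tokens.
confident_model = [0.5, 0.5, 0.5, 0.5]
uncertain_model = [0.25, 0.25, 0.25, 0.25]
print(perplexity(confident_model), perplexity(uncertain_model))
```

Lower is better: the model that puts more probability on the correct tokens ends up with the lower perplexity.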

See if you can match (or even beat) the state-of-the-art results in NLP!

There were a few other awesome data science repositories created in January.

Make sure you check them out:

- Machine Translation Reading list
- Using the latest advancements in AI to predict stock market movements
- Super-SloMo

Reddit Discussions

Data Scientist is the new Business Analyst

Don’t be fooled by the hot take in the headline.

This is a serious discussion about the current state of data science and how it’s taught around the world.

It’s always been difficult to pin down specific labels on different data science roles.

The functions and tasks vary – so who should learn exactly what?
