Data Science with Optimus.
Part 1: Intro.
Breaking down data science with Python, Spark and Optimus.
Favio VázquezBlockedUnblockFollowFollowingFeb 7Don’t worry if you don’t know what these logos are, I’ll explain them in next articles :)Data science has reached new levels of complexity and of course awesomeness.
I’ve been doing this for years now, I’m what I want for people is to have a clear and easy path to do their job.
I’ve been talking about data science and more for a while now, but it’s time to get our hands dirty and code together.
This is the beginning of a series on articles about data science with Optimus, Spark and Python.
What is Optimus?https://github.
com/ironmussa/OptimusIf you been following me for a while, you know that me and the Iron-AI team created a library called Optimus.
Optimus V2 was created to make data cleaning a breeze.
The API was designed to be super easy for newcomers and very familiar for people that come from working with pandas.
Optimus expands the Spark DataFrame functionality, adding .
rows and .
With Optimus you can clean your data, prepare it, analyze it, create profilers and plots, and perform machine learning and deep learning, all in a distributed fashion, because on the back-end we have Spark, TensorFlow, Sparkling Water and Keras.
It’s super easy to use.
It’s like the evolution of pandas, with a piece of dplyr, joined by Keras and Spark.
The code you create with Optimus will work on your local machine, and with a simple change of masters, it can run on your local cluster or in the cloud.
You will see a lot of interesting functions created to help with every step of the data science cycle.
Optimus is perfect as a companion for an agile methodology for data science because it can help you in almost all the steps of the process, and it can easily connect to other libraries and tools.
If you want to read more about an Agile DS Methodology check this out:Agile Framework For Creating An ROI-Driven Data Science PracticeData Science is an amazing field of research that is under active development both from the academia and the industry…www.
ioInstalling OptimusThe install process it’s very simple.
Just run the command:pip install optimuspysparkAnd you are ready to rock and roll.
Running Optimus (and Spark, Python, etc.
)Creating a data science environment should be easy.
Both for trying stuff and for production.
When I was starting thinking on these series of articles I was shocked to find out how hard is to prepare a reproducible environment for data science with free tools.
Some of the contestants were:Google ColaboratoryEdit descriptioncolab.
comMicrosoft Azure Notebooks – Online Jupyter NotebooksProvides free online access to Jupyter notebooks running in the cloud on Microsoft Azure.
comHome – cnvrgEdit descriptioncnvrg.
ioBut without a doubt the winner for me was MatrixDS:A Community for Data Scientists by Data ScientistsEdit descriptionmatrixds.
comWith this tool you have a free environment for Python (with JupyterLab) and for R (with R Studio), also tools for presenting like Shiny and Bokeh, and much more.
And for free.
You will be able to run everything in the repo:FavioVazquez/ds-optimusHow to do data science with Optimus, Spark and Python.
comInside of MatrixDS with a simple fork of the project:MatrixDS | The Data Project WorkbenchMatrixDS is a place to build, share and manage data projects at any scale.
comJust create an account and you’re good to go.
The journeyDon’t worry if you don’t know what these logos are, I’ll explain them in next articles :)The path above is how I’m going to structure the different blogs, tutorials and articles in the series.
I have to tell you right now that I’m preparing a full course that will cover some of this tools with a business perspective with Business Science, here you can see more:Business Science UniversityLearn from Virtual Workshops that take you through the entire Data-Science-for-Business process of solving problems…university.
ioThe first part is on MatrixDS and GitHub, so you can just Fork in on GitHub and Forklift it on MatrixDS.
MatrixDS | The Data Project WorkbenchMatrixDS is a place to build, share and manage data projects at any scale.
comOn MatrixDS click on Forklift:FavioVazquez/ds-optimusHow to do data science with Optimus, Spark and Python.
comHere’s the notebook, but please check it out in the platforms :)For updates follow me on Twitter and LinkedIn :).