Exactly: data. We need data, and we need a lot of it.
In almost every getting-started example on the web you use one of the common MNIST datasets, either Fashion-MNIST (created by researchers at Zalando Research) or the classic digit dataset.
And if you train a model on one of them, you might reach pretty good results, but you miss a lot of practice.
Create your own dataset instead and you will learn how to collect, store, and process large datasets.
Data example

Tip 3: Knowing how to work with your own data is an important skill that is required in real-world applications.
Start with a simple example: collect text with a corresponding label and save it in a CSV.
This might be text from StackOverflow with the corresponding tags, where you want to predict the programming language based on a given text.
There are so many possibilities; choose one.
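As a minimal sketch of that idea, here is how you could save such a labeled text dataset to a CSV with Python's standard library. The file name `stackoverflow.csv`, the column names, and the example rows are all hypothetical:

```python
import csv

# Hypothetical examples: StackOverflow-style question titles with the
# programming language as the label.
rows = [
    ("How do I read a file line by line?", "python"),
    ("Segmentation fault when freeing a pointer", "c"),
    ("Promise chain never resolves in async function", "javascript"),
]

# Write a two-column CSV: the text and its label.
with open("stackoverflow.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    writer.writerow(["text", "label"])  # header row
    writer.writerows(rows)
```

In a real project the rows would come from a scraper or a data dump (StackOverflow publishes its data publicly), but the storage format can stay this simple.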
Learn to understand your data

Extracting insights from your dataset helps you understand your data.
This is the perfect time to get familiar with Pandas and pyplot in a Jupyter Notebook.
Ask yourself anything you want to know about your dataset.
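Most of these questions are only a few pandas calls away. The tiny in-memory dataset below stands in for a hypothetical `stackoverflow.csv` with `text` and `label` columns:

```python
import pandas as pd

# Hypothetical stand-in for the StackOverflow dataset.
df = pd.DataFrame({
    "text": ["read a file", "free a pointer", "await a promise", "list comprehension"],
    "label": ["python", "c", "javascript", "python"],
})

print(df.dtypes)                    # what kind of data is in my dataset?
print(df["label"].nunique())        # how many classes are there?
print(df["label"].value_counts())   # how is the data balanced?
```

`value_counts()` in particular tells you immediately whether one class dominates the dataset, which matters later when you evaluate the model.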
What kind of data is in my dataset?
How many classes are there?
How is the data balanced?
…and many more questions; be creative.

Environment

Everything starts with your machine learning environment.
This can be either on your local machine or in the cloud.
Your machine learning environment is the area where you install all the tools you need.
One of those local environments is Anaconda, used for data science and machine learning.
Or use Colaboratory if you prefer a cloud-based environment that is free to use and requires no setup.
Use already existing technology

I want you to use AutoML before you start building custom machine learning models in TensorFlow.
You might ask “why AutoML, and what is AutoML?” and those are good questions.
With AutoML you can achieve good machine learning results within hours instead of weeks or months.
AutoML takes away a bunch of steps that require time and knowledge, such as data splitting (train and test), encoding, embeddings, evaluating different architectures, and hyperparameter tuning, to mention a few, and all of it is automated for you.
There are different types of AutoML products; for the sake of simplicity we stay with Google AutoML.
It is as easy as it sounds: upload your training data, start the training, and grab a ☕ till the training is done.
Once it is done you can evaluate the results. This is a good point to learn about the different evaluation metrics, like precision and recall, or how to interpret a confusion matrix.
And even better, you already have a production-ready machine learning model.
Google provides you with a direct endpoint to make predictions.
Google offers different types of AutoML products: AutoML Natural Language, AutoML Vision, AutoML Translation, and AutoML Tables, with more coming soon ヽ(•‿•)ノ.
There is one thing you should also consider.
Do you really need to train a machine learning model? Or can you use an already existing API with a pre-trained model to solve your issue? There are many ready-to-use APIs.
Tip 4: AutoML or pre-trained APIs should be the first product of choice if you want to solve a problem.
Preprocessing

If Google AutoML or pre-trained APIs do not fit your needs, you can still go with a custom machine learning model in TensorFlow or Keras.
Unfortunately, you cannot simply use the data which you collected and start training a machine learning model.
Preprocessing steps are required before the data is ready to use.
The preprocessing depends on the data which you use.
For text, you need different preprocessing than for images.
Cleaning your data should be the first preprocessing step. To get back to our StackOverflow.csv example: maybe there are rows without text, texts without a label, special characters that are not required, or other issues.
Cleaning up that data might improve the performance of the model.
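A minimal cleaning pass with pandas could look like this. The sample rows are hypothetical, and the character filter below is just one possible choice (it keeps letters, digits, whitespace, and question marks, since question marks may carry meaning in StackOverflow titles):

```python
import pandas as pd

# Hypothetical rows with typical problems: empty text, missing label.
df = pd.DataFrame({
    "text": ["How do I merge two dicts?", "", None, "What is a segfault?!?"],
    "label": ["python", "python", None, "c"],
})

# Drop rows where text or label is missing.
df = df.dropna(subset=["text", "label"])

# Drop rows whose text is empty or whitespace only.
df = df[df["text"].str.strip() != ""]

# Remove special characters that are not required.
df["text"] = df["text"].str.replace(r"[^A-Za-z0-9\s?]", "", regex=True)

print(df)
```

Which characters count as "not required" depends entirely on your data; decide that with the insights you gathered while exploring the dataset.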
Shuffling your data matters for various reasons. Usually our data is sorted, like our StackOverflow.csv, which might be sorted by programming language.
We shuffle to make sure that our training and test datasets represent the overall distribution of the data.
There are more reasons and as mentioned in the first tip, I am sure you will learn all of them over time.
Split the data into two sets: training and testing.
As a rule of thumb, 80% is training data and 20% is test data.
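Shuffling and an 80/20 split together are only a few lines of pandas. The dataset below is synthetic and deliberately sorted by label, like our hypothetical StackOverflow.csv might be:

```python
import pandas as pd

# Synthetic data, sorted by label on purpose.
df = pd.DataFrame({
    "text": [f"question {i}" for i in range(100)],
    "label": ["python"] * 50 + ["c"] * 50,
})

# Shuffle so train and test reflect the overall distribution.
# frac=1 samples 100% of the rows in random order; the seed makes it reproducible.
df = df.sample(frac=1, random_state=42).reset_index(drop=True)

# 80/20 split.
split = int(len(df) * 0.8)
train_df, test_df = df[:split], df[split:]
print(len(train_df), len(test_df))
```

Libraries like scikit-learn offer `train_test_split` for the same job, including stratified splitting that keeps the class balance identical in both sets.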
Embedding and encoding are another important part, and if you google them, they might sound confusing.
To give you a simple intuition, imagine them as methods that convert text to numbers in a more or less intelligent way.
Tip 5: Data preprocessing is a necessary step and a complete chapter itself.
Get familiar with data preprocessing before you continue and train your custom model with TensorFlow.
And guess what: you can use the insights you gained while digging through your dataset.
Build your TensorFlow model

We have reached the point where we can train a custom model with TensorFlow.
This is by far the most complex and time-consuming part; I am talking about a range from days to a month.
The first step is to consume the data; this depends on the framework you use: TensorFlow, TensorFlow Hub modules, or Keras.
Don’t worry too much about the different types; you will learn them on the way.
After that, you have to choose an algorithm. This is the time to learn about the different types of algorithms, how and when to use them, and what the differences are.
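To make this concrete, here is a minimal sketch of a Keras text classifier for our StackOverflow example, assuming the texts have already been encoded as sequences of word ids. The vocabulary size, number of classes, and layer sizes are all hypothetical choices, not prescriptions:

```python
import numpy as np
import tensorflow as tf

VOCAB_SIZE = 10_000   # assumed vocabulary size
NUM_CLASSES = 5       # e.g. five programming languages to predict

model = tf.keras.Sequential([
    tf.keras.layers.Embedding(VOCAB_SIZE, 16),   # word ids -> dense vectors
    tf.keras.layers.GlobalAveragePooling1D(),    # average the vectors over the sequence
    tf.keras.layers.Dense(16, activation="relu"),
    tf.keras.layers.Dense(NUM_CLASSES, activation="softmax"),  # one probability per class
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# A single fake encoded sentence, just to show the output shape.
preds = model.predict(np.array([[1, 2, 3, 4]]))
```

Training would then be a call to `model.fit` with your encoded training texts and labels; the architecture itself is the part you will iterate on the longest.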
The next step is to train your model. While training small datasets and simple models might perform well on your laptop, others require a large amount of processing power.
For that, you can take advantage of training your models in the cloud.
Tip 6: The first few steps with TensorFlow are the hardest, don’t give up and continue.
Building your own projects gives you the most out of it.
While you build your projects read the TensorFlow documentation and get familiar with the framework.
Evaluate your machine learning model

One final and most important step is still missing: the model evaluation.
How well did your model solve the issue? Does the model also work on previously unseen data (also called generalization)? It is important to understand your results, and classification accuracy alone is simply not enough to evaluate a model.
Tip 7: Get familiar with different evaluation metrics on how and when to use them.
Use different metrics like a confusion matrix, the F1 score, and so on.
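scikit-learn makes these metrics easy to try out. The true and predicted labels below are hypothetical, standing in for what your StackOverflow classifier might return on a test set:

```python
from sklearn.metrics import confusion_matrix, precision_score, recall_score, f1_score

# Hypothetical true and predicted labels for a 3-class problem.
y_true = ["python", "python", "c", "javascript", "c", "python"]
y_pred = ["python", "c",      "c", "javascript", "c", "python"]

labels = ["python", "c", "javascript"]

# Rows are true labels, columns are predicted labels.
print(confusion_matrix(y_true, y_pred, labels=labels))
print(precision_score(y_true, y_pred, average="macro"))
print(recall_score(y_true, y_pred, average="macro"))
print(f1_score(y_true, y_pred, average="macro"))
```

The confusion matrix shows exactly which classes get mixed up (here, one "python" question was predicted as "c"), which is far more actionable than a single accuracy number.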
Bring your model into production

Depending on the issue you want to solve, it is not necessary to achieve 100% accuracy.
Think about how your model might help to solve an issue either to save time, money or automate a process.
And if you can solve a highly time-consuming process with an accuracy of 80%, you might already save a lot of money.
Your model training is done and the results look good enough to solve the issue; what now? Now it is time to bring your model into production.
Give the people the opportunity to use it.
You can host models in different ways like TensorFlow ModelServer or Google AI Platform.
Last Tip: Be consistent and create a GitHub repository for all of your projects.
It is the best way to build your own portfolio.
Even better, you will re-use a lot of code in other projects.
Your first year with ML might feel overwhelming, but be aware that at some point you will have enough knowledge to build ML products with confidence.
What’s next?

That’s not all: I will share examples and further posts with you to give you some hands-on experience.
Follow me on Twitter @HeyerSascha if you don’t want to miss it.
Collecting datasets
Dataset preprocessing
Google AutoML
Keras Named Entity Extraction trained and deployed with Google AI Platform
Kubeflow

Thanks for reading.
If you enjoyed my article, feel free to leave some claps 👏.
Your feedback and questions are highly appreciated, you can find me on Twitter @HeyerSascha.