Through a portfolio of projects or a certification.
A certificate says to future clients and employers, ‘Hey, I’ve got the skills and I’ve put in the effort to get accredited.’
Google’s one-liner sums it up.
Demonstrate your proficiency to design and build data processing systems and create machine learning models on Google Cloud Platform.
If you don’t have the skills already, going through the learning materials for the certification means you’ll learn all about how to build world-class data processing systems on Google Cloud.
Who would want to do a Google Cloud Professional Data Engineer Certification?
You’ve seen the figures.
The cloud is growing.
And it’s here to stay.
If you haven’t seen the figures, trust me, the cloud is growing.
If you’re already a data scientist, a data engineer, data analyst, machine learning engineer or looking for a career change into the world of data, the Google Cloud Professional Data Engineer Certification is for you.
Being able to use cloud technologies is becoming a requirement for any kind of data-focused role.
Do you need the certificate to be a good data engineer/data scientist/machine learning engineer?
No.
You can still use Google Cloud to work on data solutions without the certificate.
A certificate is only one validation method of existing skills.
How much does it cost?
Sitting the certification exam costs $200 USD.
If you fail, you will have to pay the fee again to resit.
There are costs associated with the preparation courses and using the platform itself.
Platform costs are what you’ll be charged for using Google Cloud’s services.
If you are an avid user, you’ll be well aware of these.
If not, and you’re only going through the training materials in this article, you could create a new Google Cloud account and complete them all well within the $300 credits Google offers on sign up.
We’ll get to course costs in a second.
How long does the certification last?
Two years.
After that, you’ll need to take the exam again.
And since Google Cloud is evolving every day, it’s likely what’s required for the certificate has changed (as I found out was the case when I started writing this article).
What do you need to get ready for the exam?
Google recommends 3+ years of industry experience and 1+ years designing and managing solutions using GCP for professional level certifications.
I didn’t have either of these.
It was more like 6 months of each.
To supplement this, I utilised a combination of online training resources.
What courses did I take?
If you’re like me and don’t have the recommended requirements, you may want to look into some of the following courses to upskill yourself.
The following courses are what I used to prepare for the certification.
They’re listed in order of completion.
I’ve listed the costs, timelines and helpfulness towards passing the certification exam for each.
Some of the incredible online learning resources I used to upskill myself for the exam.
In order, A Cloud Guru, Linux Academy, Coursera.
Data Engineering on Google Cloud Platform Specialization on Coursera
Cost: $49 USD per month (after 7-day free trial)
Time: 1–2 months, 10+ hours per week
Helpfulness: 8/10
The Data Engineering on Google Cloud Platform Specialization on Coursera is made in collaboration with Google Cloud.
It’s broken into five sub-courses, each of which takes about 10 hours per week of study time.
If you’re unfamiliar with Data Processing on Google Cloud, this Specialization is like a 0 to 1.
You’ll go through a range of practical exercises using an iterative platform called QwikLabs.
Before these, there are lectures led by Google Cloud practitioners on how to use different services such as Google BigQuery, Cloud Dataproc, Dataflow and Bigtable.
A Cloud Guru Introduction to Google Cloud Platform
Cost: Free
Time: 1 week, 4–6 hours
Helpfulness: 4/10
Don’t take the low helpfulness score to mean this course is useless.
It’s far from it.
The only reason it gets a lower score is it’s not focused on the Professional Data Engineer Certification (this could be gathered from the title).
I took this as a refresher after completing the Coursera Specialization because I’d only been using Google Cloud for a few specialised use cases.
If you’re coming from another cloud service provider or have never used Google Cloud before, you may want to take this course.
It’s a great introduction to Google Cloud Platform as a whole.
Linux Academy Google Certified Professional Data Engineer
Cost: $49 USD per month (after 7-day free trial)
Time: 1–4 weeks, 4+ hours per week
Helpfulness: 10/10
After completing the exam and reflecting on the courses I’d done, the Linux Academy Google Certified Professional Data Engineer course was the most helpful.
The videos, along with the Data Dossier eBook (a great free learning resource which came with the course) and the practice exams made the course one of the best learning resources I’ve ever used.
I even recommended it as the go-to resource in some Slack notes to the team after the exam.
Slack Notes
• Some things on the exam weren’t in Linux Academy or A Cloud Guru or the Google Cloud Practice exams (expected)
• 1 question with a graph of data points and what equation you’d need to cluster them (e.g. cos(X) or X²+Y²)
• Knowing the difference between Dataflow, Dataproc, Datastore, Bigtable, BigQuery, Pub/Sub and how they can each be used is a must
• The two case studies in the exam are the exact same as the ones in the practice, though I didn’t read the studies at all during the exam (the questions gave enough insight)
• Knowing some basic SQL query syntax is very helpful, especially for the BigQuery questions
• The practice exams provided by Linux Academy and GCP are very similar style questions to the exam, I’d do each of these multiple times and use them to figure out where you’re weak
• A little rhyme to help with Dataproc: “Dataproc the croc and Hadoop the elephant plan to Spark a fire and cook a Hive of Pigs” (Dataproc deals with Hadoop, Spark, Hive and Pig)
• “Dataflow is a flowing Beam of light” (Dataflow deals with Apache Beam)
• “Everyone around the world can relate to a well-made ACID washed Spanner.” (Cloud Spanner is a DB designed for the cloud from the ground up, it’s ACID compliant and globally available)
• Handy to know the names of the old-school equivalents of relational and non-relational database options (e.g. MongoDB, Cassandra)
• IAM roles are slightly different for each service but understanding how to separate users from being able to see data versus design workflows is helpful (e.g. Dataflow Worker role can design workflows but not see the data)
This is probably enough for now.
Mileage will probably vary from each exam.
Linux Academy’s course will supply 80% of the knowledge.
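The clustering question in the notes above (cos(X) versus X²+Y²) is easier to picture with a small example. The sketch below is only an illustration and not the exam question itself: it assumes the plotted data formed two concentric rings, and the `ring` helper, the radii and the threshold are all made up for the demo. With that layout, the raw x coordinate of the two groups overlaps, but the single feature X²+Y² (the squared distance from the origin) separates them with one threshold.

```python
import math

def ring(radius, n=8):
    """Return n points evenly spaced on a circle of the given radius."""
    return [
        (radius * math.cos(2 * math.pi * k / n),
         radius * math.sin(2 * math.pi * k / n))
        for k in range(n)
    ]

# Hypothetical data: two groups arranged as concentric rings.
inner = ring(1.0)
outer = ring(3.0)

def squared_radius(point):
    """The X² + Y² feature: squared distance from the origin."""
    x, y = point
    return x ** 2 + y ** 2

# On the raw x axis the two groups overlap, so x alone cannot split them.
inner_x = [x for x, _ in inner]
outer_x = [x for x, _ in outer]
x_overlaps = min(outer_x) < max(inner_x) and max(outer_x) > min(inner_x)

# On the X² + Y² axis the groups sit near 1 and 9, so one threshold works.
threshold = 5.0  # assumed value, anywhere between 1 and 9 would do
inner_ok = all(squared_radius(p) < threshold for p in inner)
outer_ok = all(squared_radius(p) > threshold for p in outer)

print("x overlaps:", x_overlaps)
print("X^2 + Y^2 separates the rings:", inner_ok and outer_ok)
```

A periodic feature such as cos(X) would be the better fit if the groups repeated at regular intervals along the x axis instead, which is presumably why both options appeared.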
Google Cloud 1-minute videos
Cost: Free
Time: 1–2 hours
Helpfulness: 5/10
These were recommended on the A Cloud Guru forums.
Many of them weren’t related to the Professional Data Engineer Certification; however, I cherry-picked some of the ones I recognised.
Some of the services can seem complex when going through a course, so it was good to hear a particular service described in a minute.
Preparing for the Cloud Professional Data Engineer Exam
Cost: $49 USD for the certificate or free (no certificate)
Timeline: 1–2 weeks, 6+ hours per week
Helpfulness: N/A
I found this resource the day before my exam was scheduled.
I didn’t do it due to time restrictions, hence the lack of helpfulness rating.
However, after going through the course overview page it looks like a great resource to bring together all the things you’ve been learning about Data Engineering on Google Cloud and to highlight any weak points.
I sent this course as a resource to one of my colleagues who’s preparing for the certification.
Google Data Engineering Cheatsheet by Maverick Lin
Cost: Free
Timeline: N/A
Helpfulness: N/A
This was another resource I stumbled upon after the exam.
I took a look at it and it’s comprehensive yet concise.
Plus, it’s free.
This could be used as something to read over in between practice exams or even after the certification to remind yourself.
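To get a feel for the total spend, the figures quoted above can be added up. This is a rough sketch under a few assumptions that are not in the article: subscriptions are billed in whole months, the free trials are ignored, platform usage stays inside the $300 sign-up credits, and the optional $49 exam preparation course is left out. The function name and parameters are hypothetical.

```python
EXAM_FEE = 200     # USD per attempt, as quoted above
MONTHLY_FEE = 49   # USD per month for Coursera and for Linux Academy

def total_cost(coursera_months, linux_academy_months, exam_attempts=1):
    """Estimate total spend in USD for courses plus exam attempts."""
    course_cost = MONTHLY_FEE * (coursera_months + linux_academy_months)
    return EXAM_FEE * exam_attempts + course_cost

# Coursera took 1-2 months; Linux Academy's 1-4 weeks fits in one month.
low = total_cost(coursera_months=1, linux_academy_months=1)
high = total_cost(coursera_months=2, linux_academy_months=1)
with_resit = total_cost(coursera_months=2, linux_academy_months=1, exam_attempts=2)

print(f"Estimated range: ${low} to ${high} USD")
print(f"With one resit:  ${with_resit} USD")
```

Under those assumptions the estimate lands between $298 and $347 USD, or $547 USD if a resit is needed.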
What did I do after the courses?
After getting close to completing the courses, I booked the exam with a week’s notice.
Having a deadline is a great motivation for going over what you’ve learned.
I went through the practice exams from Linux Academy and Google Cloud multiple times each until I could complete them at 95%+ accuracy every time.
Passing the Linux Academy practice exam with over 90% for the first time.
The quizzes from each platform are similar but I found going over the answers I kept getting wrong and writing down why I got them wrong helped fix my weak points.
The exam I took used designing data processing systems on Google Cloud for two case studies as the theme (this has changed since March 29, 2019).
It was multiple choice the whole way through.
It took me about 2 hours and was about 20% harder than any of the practice exams I’d taken.
I can’t stress the value of the practice exams enough.
What would I change if I went to do it again?
More practice exams.
More practical knowledge.
Of course, there’s always more preparation you could do.
The recommended requirements do list 3+ years of industry experience and 1+ years of using GCP.
But I didn’t have this so I had to deal with what I had.
Extras
The exam was updated on March 29, 2019.
The materials in this article will still give you a good foundation; however, it’s important to note some changes.
Different sections of the Google Cloud Professional Data Engineer Exam (Version 1)
1. Designing data processing systems
2. Building and maintaining data structures and databases
3. Analysing data and enabling machine learning
4. Modelling business processes for analysis and optimisation
5. Visualizing data and advocating policy
7. Designing for security and compliance
Different sections of the Google Cloud Professional Data Engineer Exam (Version 2)
1. Designing data processing systems
2. Building and Operationalizing Data Processing Systems
3. Operationalizing Machine Learning Models (most of the changes have happened here) [NEW]
4. Ensuring Solution Quality
Version 2 has combined sections 1, 2, 4 and 6 of Version 1 into sections 1 and 2.
It has also combined sections 5 and 7 from Version 1 into section 4.
And section 3 of Version 2 has been expanded to encompass all of Google Cloud’s new machine learning capabilities.
Because these changes have occurred so recently, many training materials have not had a chance to be updated.
However, going through the materials in this article should be enough to cover 70% of what you need.
I’d combine it with some of your own research on the following (these were introduced in Version 2 of the exam):
• Google Machine Learning (ML) APIs
• Google Cloud Machine Learning Engine
• Google Cloud TPUs (a custom piece of hardware Google has built specifically for ML training)
• Google Glossary of ML terms
As you can see, the latest update to the exam had a big focus on Google Cloud’s ML capabilities.
After the exam
When you complete the exam you’ll only receive a pass or fail result.
The advice is to aim for at least 70%, hence why I aimed for a minimum of 90% on the practice exams.
Once you’ve passed, you’ll be emailed a redemption code alongside your official Google Cloud Professional Data Engineer certificate.
Congratulations!
You can use the redemption code on an exclusive Google Cloud Professional Data Engineer store which is packed with swag.
There are t-shirts, backpacks and hoodies (these may vary in stock when you get there).
I chose the hoodie.
Now you’re certified, you can show off your skillset (officially) and get back to doing what you do best: building.
See you in two years to get recertified.
PS if you have any questions, or would like something clarified, you can find me on Twitter and LinkedIn.
There’s also a video version of this article on YouTube.