This brings me to the second hurdle: bringing a data-aware product team into the fold.
A data-aware product team is essential for bringing the data triage (analytics, machine learning and engineering) together and bridging the business gap while reining in those expectations.
Unfortunately, product understanding is not always present within the data domain.
It is common to encounter rushed implementations because an algorithm that made an impact in one domain is assumed to make an impact in another. With the misperceived smell of success in the air, it can be difficult to convince anyone to take a step back and reevaluate.
It is in such cases that a data-aware product team is essential for setting more manageable expectations.
To infinity and within…
I have often heard people allude to the idea that because something is in the cloud it is all-expansive, with infinite memory and processing power, but this could not be further from the truth.
Firstly, although you can use some pretty insane resources, they are not unlimited and are certainly not cheap.
It helps to match the resources to the business case. For example, for “realtime” inference, try running smaller scripts on a serverless architecture like AWS Lambda rather than on a persistent and costly AWS EC2 instance.
As another example, rather than requesting database resources, you may be better off loading your data from catalogued file storage.
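To make the serverless inference idea concrete, here is a minimal sketch of a small scoring script written as a Python Lambda-style handler. The `handler(event, context)` signature is the standard one for Python on AWS Lambda; everything else is an assumption for illustration: the feature names, the weights, and the API Gateway-style `statusCode`/`body` response shape are hypothetical, and a real model would be loaded from a file or object store rather than hard-coded.

```python
import json

# Hypothetical model parameters, defined at import time so that warm
# invocations reuse them instead of rebuilding them on every request.
WEIGHTS = {"age": 0.02, "visits": 0.1}
BIAS = -1.0


def score(features):
    """Linear score over the known features; missing features count as 0."""
    return BIAS + sum(
        weight * float(features.get(name, 0.0))
        for name, weight in WEIGHTS.items()
    )


def handler(event, context):
    """Entry point in the (event, context) shape that AWS Lambda calls."""
    body = event.get("body") or "{}"
    # The body may arrive as a JSON string (assumed here) or as a dict.
    features = json.loads(body) if isinstance(body, str) else body
    return {
        "statusCode": 200,
        "body": json.dumps({"score": score(features)}),
    }
```

Because the function holds no state between requests and does very little work, you pay only for the milliseconds it runs rather than for an instance that sits idle between predictions.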
The second, and possibly most important, reason stems from the belief that resources don’t matter and that there is no need to optimise.
It is not uncommon to see massive data frames loaded into memory only to perform a single value lookup, or massive and expensive compute resources used for what can efficiently be done with serverless architecture. And don’t even get me started on some of the SQL queries running out there.
Basic programming principles should be followed: algorithms should be well documented, code should not be repeated and you should only load what you wish to consume.
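As a rough illustration of "only load what you wish to consume", the sketch below performs a single value lookup by streaming a CSV row by row instead of loading the whole table into memory. It uses only the standard library; the column names and sample data are hypothetical, and `lookup_value` is a helper invented for this example.

```python
import csv
import io


def lookup_value(lines, key_column, key, value_column):
    """Stream rows and return the first matching value, or None.

    Only one row is held in memory at a time, and reading stops at the
    first match.
    """
    for row in csv.DictReader(lines):
        if row[key_column] == key:
            return row[value_column]
    return None


# Hypothetical data standing in for a large file on disk.
sample = io.StringIO(
    "customer_id,country,spend\n"
    "1,ZA,10.5\n"
    "2,DE,99.0\n"
    "3,US,42.0\n"
)
print(lookup_value(sample, "customer_id", "2", "spend"))
```

If you do need a data frame, the same principle applies: pandas `read_csv`, for instance, accepts a `usecols` argument so that only the columns you actually use are parsed.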
Unicorns, rainbows and wizards
I am often asked, “What is a data scientist?” To this day I am still not certain.
I used to think those crazy geniuses with advanced degrees in mathematics (previously called statisticians) were data scientists.
These days I am torn between two schools of thought.
The first is that data science, in general, encompasses all those players who fall within the data spectrum (data engineering, machine learning and data analytics), somewhat like a genealogical species tree.
It is at this point that I wish to highlight a concept previously mentioned: the data triage.
This is not a unique concept but, in my opinion, it is the minimum requirement to effectively bring machine learning into production, irrespective of product and business teams.
It needs machine learning to efficiently compute the expected outcome, engineering to build the plumbing and analytics to make sure the insights can be fed into the business machine.
This concept does, however, come with a few exceptions, most significantly that in some cases unicorns exist.
The unicorn is an effective hybrid of those three subspecies, a proverbial chosen one who is capable of creating amazing algorithms, attaching all the engineering pipes and deriving its business value over time with ease.
Whether you are a unicorn or a member of the data triage, I am certain that you will at some stage encounter problems similar to those I have encountered thus far.
Hopefully, with the right objectives, a little ingenuity and the right team, you will bring that world-dominating AI into production.
More importantly, a balance between Data and Product needs to be struck. Such a relationship is often fragile and tenuous, and requires consistent vigilance to maintain.