However, in our experience over the last two months, many candidates treat these two roles as interchangeable. We noticed a lot of people with an ML background applying to our Data Engineering team: nearly 78% of candidates had only ML-related experience, and just 11% mentioned data engineering in their applications.
We checked our role description, and it looked fine to us. We are looking for someone who has worked in a cloud environment with large volumes and a wide variety of data (using both RDBMSs and cluster processing); someone who can set up a healthy backbone for our ML/DL models and has basic knowledge of how those models work and what kind of data they ingest. The perfect candidate would have a set of skills drawn from the ones presented in the following diagram:
They should be able to build scalable cloud solutions with existing providers (AWS, GCP, Azure) and manipulate data within those services (simple storage such as S3, data streaming with Kinesis Firehose). They should also be able to use existing infrastructure such as Oracle or Postgres databases (or big data equivalents like AWS Athena or GCP BigQuery) to create the pipelines that matter to stakeholders. And, finally, they should be able to scale it all out with distributed computing frameworks to support big data analysis.
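To make the pipeline side of this concrete, here is a minimal extract-transform-load sketch. Everything in it is illustrative: the `raw_events` list stands in for records landed in S3 via Firehose, and `sqlite3` stands in for a Postgres (or Athena/BigQuery) warehouse, just so the example runs anywhere.

```python
import sqlite3

# Hypothetical raw events, standing in for JSON records delivered
# to S3 by a Kinesis Firehose stream.
raw_events = [
    {"user_id": 1, "event": "click", "value": "3"},
    {"user_id": 2, "event": "view", "value": "7"},
    {"user_id": 1, "event": "click", "value": "5"},
]

def transform(events):
    # Cast string fields to proper types and keep only the events
    # the stakeholders care about (here: clicks).
    return [(e["user_id"], int(e["value"])) for e in events if e["event"] == "click"]

def load(rows, conn):
    # Load into a warehouse table; sqlite3 plays the role of
    # Postgres / Athena / BigQuery in this toy example.
    conn.execute("CREATE TABLE IF NOT EXISTS clicks (user_id INTEGER, value INTEGER)")
    conn.executemany("INSERT INTO clicks VALUES (?, ?)", rows)
    conn.commit()

conn = sqlite3.connect(":memory:")
load(transform(raw_events), conn)

# Downstream consumers (dashboards, ML feature jobs) query the clean table.
total = conn.execute("SELECT SUM(value) FROM clicks").fetchone()[0]
print(total)  # sum of click values: 3 + 5
```

In a real deployment each stage would be a separate, scheduled, and monitored job (Airflow, Glue, or similar), and the transform step is where cluster frameworks like Spark take over once a single machine no longer suffices.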
We know that back in the old Data Science days (around 2015) this would have been like searching for a unicorn. But this kind of specialization is more cohesive: it groups together skills and technologies that sit close to each other.
Still, candidates mostly declared ML-related skills (training classification models, deep learning, hyperparameter search). Some of them had experience working with cloud storage such as S3, but they rarely mentioned ETL pipelines, building data warehouse solutions, or parallel computing clusters.
And how do things look in your Data Science department? Have you noticed similar trends, or maybe the opposite? Please share your story with us.