Career options in data

Discovering roles and possibilities for newcomers to the world of data and ML.

2023-06-06

Introduction

If you're new to the world of data and machine learning (ML), it's easy to get overwhelmed by the career possibilities. There are many different roles, each filling a different function and requiring different skills. It's hard to know what's out there, what's possible, and what you might enjoy. Since I've been working in the field for a few years now, I want to share what I've learned and help you make an informed decision about your career path.

Firstly, you might be surprised to know that there are more options than just data science and machine learning! I will be focusing on "technical" roles in this post, which generally consist of a mix of data and software work. Some roles are more data-focused, others are more software-focused, and others are in between.

Disclaimers

This post is aimed at people entering the world of data and ML, regardless of their other prior experience. As a result, some of the roles I'll be discussing are not entry-level, but may be accessible to readers with relevant experience.

This post will be full of generalisations. This is because roles, titles, responsibilities, and skills vary widely between companies. Additionally, the lines are fuzzy and continually changing. It's hard to essentialise which roles even exist. Roles are even impacted by which other roles exist at a company!

This is just a general overview of how I personally have seen roles differentiated in the industry. I'm drawing on my experience, coworkers' and friends' experiences, and job postings I've seen. However, I am still biased by my perspective.

Use this post as a starting point so you know what to research further. I want to show you what's possible but not widely known; I'm not claiming to tell you what's universally true.

Roles in data and ML

There are a variety of roles which each require a mix of data and/or machine learning skills along with software engineering skills. Usually, different roles will collaborate to fulfil a business need or function. There are three main areas of work: insights, models, and systems.

Roles in the same area share high-level deliverables, but don't necessarily perform similar work. You can think of these areas as interdisciplinary teams, but it's possible that there are several teams or individuals across teams who together fulfil each of these business needs at any company. These are not hard divisions; some roles can cross boundaries between areas.

Insights

Insights teams support data-driven decision-making. To do this, they gather and maintain data for analytics purposes. Then they analyse and interpret the data to derive insights. Finally, they surface these insights to inform business decisions.

Data Analyst

What they do

Responsibilities:

Sample problems:

Common titles:

Skills they have

Primary skills:

Secondary skills:

Data Scientist

What they do

Responsibilities:

Sample problems:

Common titles:

Skills they have

Primary skills:

Secondary skills:

Analytics Engineer

What they do

Responsibilities:

Sample problems:

Common titles:

Skills they have

Primary skills:

Secondary skills:

Models

Models teams create and support machine learning (ML) models and services. They design and train models, then productionise and integrate them. They also support the end-to-end ML workflow and optimise the ML infrastructure. These models are usually, but not always, deep learning models.

ML Scientist

What they do

Responsibilities:

Sample problems:

Common titles:

Skills they have

Primary skills:

Secondary skills:

ML Engineer

What they do

Responsibilities:

Sample problems:

Common titles:

Skills they have

Primary skills:

Secondary skills:

ML Platform Engineer

What they do

Responsibilities:

Sample problems:

Common titles:

Skills they have

Primary skills:

Secondary skills:

ML Accelerators Engineer

What they do

Responsibilities:

Sample problems:

Common titles:

Skills they have

Primary skills:

Secondary skills:

Systems

Systems teams build the services and platforms which support operations and analytics. They design and maintain foundational and operational infrastructure, and develop services for internal or external use. They may focus on supporting the core products of the company, or on internal tooling.

Data Platform Engineer

What they do

Responsibilities:

Sample problems:

Common titles:

Skills they have

Primary skills:

Secondary skills:

Infrastructure Engineer

What they do

Responsibilities:

Sample problems:

Common titles:

Skills they have

Primary skills:

Secondary skills:

Backend Engineer

What they do

Responsibilities:

Sample problems:

Common titles:

Skills they have

Primary skills:

Secondary skills:

Working together

Some roles are similar to one another or work closely together. This can make it hard to differentiate between them, although it also means it's easier to transition between them with some skill-building.

Data and science

Models and insights are often separate teams, but they both depend on data. Without data, there is nothing to draw insights from or to train models on. But without insights and models, there is no value in data.

As a result, analytics engineers are necessary to gather and maintain analytical data for both insights and models. Data analysts, data scientists, and ML scientists are often the stakeholders for analytics engineers' work. If the analytics data platforms are supported by data platform engineers, then analytics engineers may in turn be stakeholders for data platform engineers' work.

Analysts and scientists

Data analysts and data scientists are usually both part of insights teams because they both generally support business decision-making by directly obtaining insights from data. Although ML models can also be used for insights generation, the models themselves are often viewed as products or services, and ML scientists aim to improve them for downstream usage. As a result, companies which build their own ML models often have a separate team to focus on them.

There is some overlap between data scientist and ML scientist skillsets, and the roles sometimes have the same name. For the sake of this post, I'm differentiating them based on both model type (classical vs deep learning) and focus (business vs research).

Because data scientists require the same skills as data analysts do and more, data scientist is commonly seen as the more senior role. In practice, data analysts and data scientists often work together, with data analysts focusing more on business and communication, and data scientists focusing more on statistics and modelling.

It's very possible to have a successful career as a data analyst without becoming a data scientist. I know someone who transitioned from being a data scientist to a data analyst because they preferred the work, and they're much happier now. However, it's typically easier to get an entry-level job as a data analyst than a data scientist.

Machine learning

ML scientists typically work on research and experimentation, while ML engineers work on productionising and integrating models. For example, ML scientists might experiment with different model architectures, while ML engineers productionise the best one. ML platform engineers work on the infrastructure and tooling that supports ML workflows end-to-end, which includes both ML scientists' and ML engineers' work. Finally, ML accelerators engineers work on optimising models and their training and serving infrastructure to reduce cost and improve performance.

Smaller teams may have only ML scientists or only ML engineers, with the title depending on the primary focus of the role. In this case, people in these roles may be responsible for both types of work. Smaller teams may also not have ML platform engineers, with ML engineers performing that work instead. ML accelerators engineers are uncommon outside of companies which build very large models, have very high performance requirements, or create accelerators themselves.

Platform engineers

Infrastructure engineers, data platform engineers, and ML platform engineers all work on infrastructure and tooling. Infrastructure engineers work on foundational infrastructure, data platform engineers work on data infrastructure, and ML platform engineers work on ML infrastructure. This means that infrastructure engineers build the underlying systems which data platform engineers and ML platform engineers use to build their own systems, which in turn support data and ML processes. As a result, infrastructure engineering is more removed from data and ML work than the other two roles.

Data platform engineers and ML platform engineers perform similar types of work, but with different focuses. In my experience, ML platform work often focuses on reducing toil in human workflows, while data platform work often focuses on reducing load on systems, but this is not always the case.

Typically, ML platform engineers are part of machine learning teams because they support ML workflows specifically, while data platform engineers are likely to be part of software engineering teams, like infrastructure engineers are. The specific team may depend on the company and the products or services that the data platform engineer role supports.

All three of these roles are often given "Ops" titles: DevOps Engineer, DataOps Engineer, and MLOps Engineer. This is a trend that originally started with the "DevOps" title, and has since been applied to other roles. However, DevOps (and by extension, DataOps and MLOps) is a set of practices and principles that can be applied to any role, not a role in itself.

Data and software

I know what you might be thinking: What about data engineers? As the data engineering role developed, the required skills broadened to encompass both analytics-focused and platform-focused work. "Analytics" data engineers focused on data modelling and transformation, providing clean data, and supporting insight generation. "Platform" data engineers focused on data products and platforms, requiring more backend and infrastructure skills.

When I first began my career, I noticed that some companies had "data engineers" in two separate parts of the company: the "analytics" or "insights" team, and the "platform" or "software engineering" team. More recently, the distinct titles of "analytics engineer" and "data platform engineer" have emerged to differentiate between these two types of work.

This is a welcome change for me, because it makes it easier to understand what work a role entails and what skills are required. (Especially because I personally enjoy data platform engineering but not analytics engineering!) However, these titles are far from universal, and many companies use the "data engineer" title for both types of work.

I'm not saying that the title "data engineer" should go away. Maybe when the dust settles, we'll end up with "analytics engineer" and "data engineer", or "data engineer" and "data platform engineer". I'm just trying to convey an understanding of the different possible roles, not to prescribe what they should be called.

Many companies won't have a specific "data platform engineer" role, and instead the work will be done by backend engineers. As a result, backend engineers working in data and ML contexts may have to know "data platform" skills. This is because data platform engineering requires strong software engineering skills, and backend engineering is a fairly broad software engineering discipline.

The world has changed

Despite having only worked in the industry for a few years, I've seen a lot of change. Data and ML are having a "moment", and the current situation for newcomers is not the same as it was for me.

When I was looking for my first job, the vast majority of data science and ML job postings required a Master's degree or PhD, even if the degrees weren't in fields relevant to the role. There is a lot less gatekeeping in job requirements now, and more focus on actual skills. This is great for newcomers, because it means that you don't need to spend more time and money on higher education to get a job in the field.

There are also more jobs in ML now due to the ML boom, but there are fewer jobs in the overall market due to the tech crash. There is also more competition for the ML jobs that exist due to the increased interest in the field. I'm not sure how these factors balance out, but from personal experience, it's still hard for employers to find qualified candidates. You may just have to put in more work to stand out.

The quality of hosted and open-source models has leapt forward in the past couple of years across a variety of domains. This reduces opportunities for ML scientists and ML engineers, especially in smaller companies, as companies can use these models instead of building their own. However, there is still demand for applying transfer learning on top of open-source models.

Business interest in ML has increased, which increases opportunity for roles that build systems around models, open-source or not. There has been a huge boom in startups building new products and services which use ML under the hood. Additionally, business insights can't be easily replaced, so there are still many opportunities there.

Finally, ChatGPT and similar tools are making it easier to learn and work independently. I used to parse through dozens, if not hundreds, of articles to build enough understanding to answer each specific question I'd have. Now, you can just ask ChatGPT and it will give you a comprehensive answer that considers both the context of your question and your level of understanding, and it will do it faster than you can even read its response. (Seriously, if you haven't created an account yet, go do it now.)

On the other hand, ChatGPT reduces the need for junior skillsets because intermediate+ engineers can delegate well-defined tasks to ChatGPT instead of junior engineers. However, I think ChatGPT is still a net positive because it enables junior engineers to focus on more interesting work and to learn at an unprecedented speed.

What to do?

Now that you know what's out there, how do you decide what to do? I can't make your decision for you, but I can give you some advice.

First, I suggest that you figure out what role you're most interested in. Even if you don't have the skills for it yet, you can set this role as your north star. Then you can figure out what skills you need to get there, and the path you need to take along the way.

Second, I suggest that you figure out what skills you already have. You may be surprised to find that you already have some skills that are relevant to your desired role. If not, you might instead have skills that are applicable to a different role which you can use as a stepping stone.

Personally, I believe that it's better to be working in a role that you're skilled in at a good company than in the role you wanted at a terrible company. Take it from me: I missed out on a ton of career growth and compensation, only to realise that I didn't want the role that I thought I did anyway. But of course, you want to be happy with your role too, and you may be lucky enough to find the right role at a good company.

Finally, remember that interests change and skills are transferable. Even if the role you take isn't totally aligned with your interests, it might be the best for your growth and future opportunities. You can potentially transfer roles within a good company too.

You might not have the skills for your desired role right now, but you can get there. I believe in you, and I'm sure that you have people in your life who believe in you too. You've got this!

If you want to build your skills and experience, I suggest that you check out my post on data projects.