Roles in Data Science

Exploring your options in an exciting new field

Nowadays it seems like everyone wants a job in the data science field. But data science is a field with far more than a single type of job.

In this article we’ll briefly look at the Data Scientist, Data Analyst, Data Engineer, AI Engineer, and Database Administrator roles and see how their skills and responsibilities differ.

Finally, we’ll explore which role might be a good fit for you.

This content is also available in video form on YouTube

Let’s take a look, starting with the most recognized role.

Data Scientists

A computer with code on it and a Python book

Photo by Christina @ wocintechchat.com on Unsplash

According to Microsoft, a data scientist is “someone who applies machine learning techniques to train, evaluate, and deploy models that solve business problems.”

I largely agree with this definition, though I believe data scientists do more than it covers, particularly in working with raw data: preparing, cleaning, and analyzing it for potential use in machine learning solutions.

In my eyes a data scientist’s job entails:

  • Cleaning Data
  • Performing Exploratory Data Analysis
  • Extracting Meaningful Features from Data
  • Creating Machine Learning Experiments
  • Evaluating Machine Learning Experiments
  • Creating Machine Learning Pipelines
  • Monitoring Deployed Models

Here data scientists are taking raw data, cleaning it (handling invalid values and ensuring it is in a state where it can be analyzed), performing some basic analysis on it, then taking that data and creating machine learning models. These models are then evaluated, refined, and deployed through machine learning pipelines where they are monitored on an ongoing basis for data drift, accuracy, and bias.
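To make the cleaning and monitoring steps a little more concrete, here is a minimal sketch using only Python's standard library. The sample ages, the valid range of 0 to 120, and the `clean` and `drift_score` helpers are all assumptions made up for illustration. Real projects typically rely on dedicated data and machine learning libraries and on more robust drift measures.

```python
import statistics

# Hypothetical raw records; None and -1 stand in for invalid values.
raw_ages = [34, None, 29, -1, 41, 38, None, 52]


def clean(values):
    """Drop missing entries and values outside an assumed plausible range."""
    return [v for v in values if v is not None and 0 <= v <= 120]


def drift_score(training, live):
    """Shift of the live mean, measured in training standard deviations."""
    spread = statistics.stdev(training)
    return abs(statistics.mean(live) - statistics.mean(training)) / spread


training_ages = clean(raw_ages)

# Hypothetical values seen by the deployed model, noticeably older than the training data.
live_ages = [61, 58, 66, 70, 63]

print(training_ages)
print(round(drift_score(training_ages, live_ages), 2))
```

A large score suggests the data the model sees in production no longer resembles what it was trained on, which is one signal that it may be time to retrain.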

Data Scientists should have the following skills and traits:

  • Analytical thinking
  • Python or R Programming Competency
  • Ability to use a charting library in their language of choice
  • Ability to use a machine learning library in their language of choice
  • Basic knowledge of standard machine learning workloads
  • Understanding of Statistics
  • Familiarity with APIs
  • An understanding of the ethical considerations in machine learning

Data Analysts

A young man analyzing financial data at a sink

Photo by Joshua Mayo on Unsplash

Data analysts are responsible for looking at raw data to extract insights that can be shared with business stakeholders. They need analytical skills as well as substantial domain knowledge of the business concerns behind the data they’re analyzing.

Key responsibilities of a data analyst include:

  • Cleaning Data
  • Transforming Data
  • Performing Exploratory Data Analysis
  • Building Reports & Visualizations
  • Converting Raw Data to Insights
  • Communicating Insights to Stakeholders

Data analysts should have the following skills and traits:

  • Analytical thinking
  • Proficiency in spreadsheet software (including pivot tables, formulas, and lookups)
  • Knowledge of data visualization techniques
  • Proficiency in a data visualization tool such as [Power BI](https://powerbi.microsoft.com/) or [Tableau](https://www.tableau.com/)
  • The ability to write and run SQL queries
  • Understanding of Statistics
  • Written and verbal communication skills
  • Domain knowledge of the data being analyzed
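As a rough idea of the kind of SQL an analyst might write, the sketch below builds a tiny in-memory SQLite database and summarizes revenue by region. The `sales` table, its columns, and the sample rows are hypothetical and exist only for illustration.

```python
import sqlite3

# In-memory database with a hypothetical sales table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [
        ("North", 120.0),
        ("South", 80.0),
        ("North", 60.0),
        ("South", 40.0),
        ("West", 200.0),
    ],
)

# Aggregate raw rows into a per-region summary, highest revenue first.
query = """
    SELECT region, COUNT(*) AS orders, SUM(amount) AS revenue
    FROM sales
    GROUP BY region
    ORDER BY revenue DESC
"""
summary = conn.execute(query).fetchall()

for region, orders, revenue in summary:
    print(f"{region}: {orders} orders, {revenue:.2f} revenue")
```

The same grouping could be done with a pivot table in a spreadsheet. SQL tends to become the better fit as the data grows or lives in a shared database.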

Data Engineers

A woman with a laptop and several monitors overlooking a view from a skyscraper

Photo by Christina @ wocintechchat.com on Unsplash

If data scientists and data analysts are on the front lines working directly with the data, data engineers are the ones supplying them with the raw data they need.

Data engineers create and maintain data ingestion pipelines that ensure useful data is stored in data lakes, data warehouses, and data marts, where data analysts, data scientists, and the organization’s software applications can work with it.

The duties of a data engineer involve:

  • Creating, configuring and maintaining data storage solutions
  • Creating and maintaining data transformation pipelines using ETL and ELT
  • Cleaning data to ensure it matches the standards required of its end destination
  • Delivering data reliably to locations where other data professionals and applications can use it
  • Ensuring data privacy concerns are respected
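A very small ETL (extract, transform, load) pipeline might look something like the sketch below. The CSV content, the `customers` table, and the cleaning rules are assumptions chosen for illustration, and an in-memory SQLite database stands in for a real warehouse.

```python
import csv
import io
import sqlite3

# Hypothetical source extract; in practice this might be a file or an API response.
SOURCE_CSV = """customer_id,email,signup_date
1,ANA@EXAMPLE.COM,2023-01-15
2,,2023-02-01
3,bo@example.com,2023-03-10
"""


def extract(text):
    """Parse the CSV text into a list of dictionaries, one per row."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Drop rows without an email and normalize the rest to lowercase."""
    return [
        (int(r["customer_id"]), r["email"].strip().lower(), r["signup_date"])
        for r in rows
        if r["email"].strip()
    ]


def load(rows, conn):
    """Write the cleaned rows to the destination table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS customers "
        "(customer_id INTEGER PRIMARY KEY, email TEXT, signup_date TEXT)"
    )
    conn.executemany("INSERT INTO customers VALUES (?, ?, ?)", rows)
    conn.commit()


warehouse = sqlite3.connect(":memory:")
load(transform(extract(SOURCE_CSV)), warehouse)

loaded = warehouse.execute("SELECT * FROM customers ORDER BY customer_id").fetchall()
print(loaded)
```

In an ELT workflow the order changes: the raw rows would be loaded first and transformed inside the destination system afterward.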

In order to accomplish these goals, data engineers must have the following skills:

  • Programming proficiency (including SQL)
  • Knowledge of relational and NoSQL databases
  • Knowledge of data lakes, data warehouses, and data marts
  • Skill with batch and stream processing
  • Skill with ETL and ELT workloads
  • Knowledge of data-related cloud services
  • A wide knowledge of the needs of various teams and applications in the organization
  • An understanding of the various pieces of data flowing through the application

AI Engineers

A close-up view of a tablet displaying a robot’s vision

Photo by ThisisEngineering RAEng on Unsplash

AI Engineers are not usually considered in posts like this, but their role is unique enough that I’ve included them here.

According to Microsoft, “AI Engineers use Cognitive Services, Machine Learning, and Knowledge Mining to architect and implement Microsoft AI solutions.”

AI engineers do not work as closely with raw data as the other roles. Instead, they harness machine learning capabilities to build applications with amazing capabilities. AI engineers build larger applications out of machine learning solutions provided by data scientists or pre-built solutions from services like Azure Cognitive Services.

AI engineers are responsible for:

  • Creating comprehensive user-facing applications
  • Integrating existing and new machine learning offerings
  • Communicating with in-house data scientists
  • Understanding and communicating the risks and tradeoffs of machine learning solutions
  • Creating and maintaining conversational AI and virtual help agents

In order to achieve this, AI Engineers need the following skills:

  • Programming proficiency - particularly in building reliable systems and working with APIs
  • Knowledge of cloud-based machine learning offerings
  • Data science literacy
  • Communication skills to work with other professionals and to explain the limitations of their solutions
  • User Interface and User Experience design skills or the ability to work with dedicated professionals in those areas
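One way to picture this kind of work is an application that treats the model as a replaceable component and plans for uncertain predictions. The sketch below is only an illustration: `SentimentModel`, `KeywordModel`, `route_message`, and the 0.7 confidence threshold are all hypothetical names and values, and the keyword-based stand-in is not a real machine learning model or cloud service.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass
class Prediction:
    label: str
    confidence: float


class SentimentModel(Protocol):
    """Anything with a matching predict method can be plugged into the application."""

    def predict(self, text: str) -> Prediction: ...


class KeywordModel:
    """Stand-in for a trained model or cloud service; purely illustrative."""

    def predict(self, text: str) -> Prediction:
        lowered = text.lower()
        if "refund" in lowered or "broken" in lowered:
            return Prediction("negative", 0.9)
        if "thanks" in lowered or "great" in lowered:
            return Prediction("positive", 0.85)
        return Prediction("neutral", 0.4)


def route_message(text: str, model: SentimentModel, threshold: float = 0.7) -> str:
    """Decide what the application does with a message based on the prediction."""
    result = model.predict(text)
    if result.confidence < threshold:
        # Low-confidence predictions go to a person instead of being trusted blindly.
        return "human_review"
    return "escalate" if result.label == "negative" else "auto_reply"


print(route_message("My order arrived broken", KeywordModel()))
print(route_message("Where is my parcel?", KeywordModel()))
```

Because `route_message` depends only on the `predict` method, the stand-in could later be swapped for a model built by in-house data scientists or a wrapper around a cloud offering without changing the rest of the application.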

Database Administrators

A woman leaning against a glass wall outside of a server room

Photo by Christina @ wocintechchat.com on Unsplash

Database administration is not exclusively a data science role, but the skills and responsibilities of database administrators amplify the effect of data science professionals.

Database administrators are responsible for the ongoing monitoring, performance, security, and maintenance of an organization’s databases. This responsibility also includes concerns around availability and disaster recovery should servers go offline.

More responsibilities include:

  • Monitoring and tuning database performance
  • Providing performance recommendations to application developers and database engineers
  • Implementing a backup and recovery strategy
  • Implementing a high availability strategy
  • Managing database security
  • Granting and revoking privileges to users and applications
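As a small illustration of the backup and recovery item, the sketch below copies one SQLite database into another using Python's built-in `sqlite3` module and then verifies the copy. The `accounts` table and the `backup_and_verify` helper are hypothetical. Production databases use their own backup tooling, and server databases usually manage privileges with statements such as GRANT and REVOKE, which SQLite does not support.

```python
import sqlite3


def backup_and_verify(source: sqlite3.Connection, target: sqlite3.Connection) -> bool:
    """Copy the source database into target, then run SQLite's integrity check."""
    source.backup(target)
    (status,) = target.execute("PRAGMA integrity_check").fetchone()
    return status == "ok"


# Hypothetical primary database with a couple of rows.
primary = sqlite3.connect(":memory:")
primary.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, owner TEXT)")
primary.executemany("INSERT INTO accounts (owner) VALUES (?)", [("ana",), ("bo",)])
primary.commit()

# The backup target; a real strategy would write to separate storage.
replica = sqlite3.connect(":memory:")
backup_ok = backup_and_verify(primary, replica)

# Confirm the data can actually be read back from the copy.
restored = replica.execute("SELECT owner FROM accounts ORDER BY id").fetchall()
print(backup_ok, restored)
```

Checking that a backup can be read back matters as much as taking it, since an unreadable backup offers no protection when a server goes offline.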

Database administrators must have the following skills:

  • A strong understanding of the database technologies being used
  • Computer networking skills
  • Knowledge of data-related cloud services
  • Knowledge of identity and authentication on a cloud service provider
  • Familiarity with legal and organizational policies around data retention and privacy

Which is Right for You?

Okay, so I’ve now given you an overview of five major roles in the data science world. How do you determine which one is right for you?

There’s no hard and fast rule for this, though I’ll show you a flowchart in a moment.

For now, let’s look at the rough skill levels each role typically requires:

| Role | Programming | Statistics | Communication | Technical Knowledge | Domain Knowledge |
| --- | --- | --- | --- | --- | --- |
| Data Scientist | Moderate | High | Medium | Medium | Medium |
| Data Analyst | Low | High | High | Low | High |
| Data Engineer | High | None | High | High | High |
| AI Engineer | High | Low | Medium | Medium | Low |
| Database Administrator | Low | None | Medium | High | Medium |

This table may not be accurate for every role out there. For example, in some organizations data analysts may write a significant amount of code to generate interactive data visualizations using D3.js, Python, or other programming languages, while in others they might work entirely through dedicated tooling.

However, the table may be helpful as a guideline for those preferring to emphasize certain skills or avoid others.

I also created a flowchart to help guide your decision-making process:

Flowchart of choosing a Data Science Role

Not everyone can read flowcharts, and not everyone can see, so let’s walk through each step of the decision-making process.

First, you must ask yourself: Do I want to work directly with machine learning?

If you do want to be involved in machine learning, a data scientist role might make sense if you prefer hands-on work evaluating machine learning models, conducting experiments, and setting up machine learning pipelines. However, if you are more interested in the potential applications of machine learning, the role of an AI Engineer might be one to consider, since AI Engineers get to work with machine learning solutions, including pre-built ones from services like Azure Cognitive Services.

However, if you’d like to work in the field of data science but don’t want to be directly involved in machine learning or its immediate applications, there are a few other questions to ask yourself.

First of all, if you are keenly interested in analyzing data to find and communicate relationships and trends, a data analyst role might be right for you.

If that’s not your thing and you prefer working with code to integrate various sources of data into a centrally available data storage area like a data lake, you should investigate a data engineer role.

Finally, if none of these things seem interesting but you still want to investigate data science and big data, a database administrator role might be one to consider. You get to focus on specific databases and ensure they operate securely, continue to perform well, and can be recovered in the event of a catastrophe.
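The questions above can also be written down as a short function. This is just a sketch of the walkthrough, not a definitive career test, and the `suggest_role` function and its parameter names are made up for illustration.

```python
def suggest_role(
    wants_ml: bool,
    prefers_building_apps: bool = False,
    likes_analysis: bool = False,
    likes_data_pipelines: bool = False,
) -> str:
    """Walk through the questions in order and return a role to investigate."""
    if wants_ml:
        # Applications of machine learning point to AI engineering;
        # hands-on experiments and pipelines point to data science.
        return "AI Engineer" if prefers_building_apps else "Data Scientist"
    if likes_analysis:
        return "Data Analyst"
    if likes_data_pipelines:
        return "Data Engineer"
    return "Database Administrator"


print(suggest_role(wants_ml=True))
print(suggest_role(wants_ml=False, likes_data_pipelines=True))
```

The order of the checks mirrors the order of the questions, so someone who enjoys both analysis and building pipelines would be pointed to the data analyst role first.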

Conclusion

There is no one tried-and-true guide for which role you should pursue, though I’ve offered a few methods above that may help your decision-making process. Ultimately, you should explore and find what interests you.

One way to investigate these roles is to take a free Microsoft Learn module on exploring a few of these roles in more depth.

Another, more involved option would be to study for the Azure AI Fundamentals exam, which will give you an introduction to most of these roles.

Whichever path you take, I hope you have a fun and memorable data journey.