Studying for the DP-100 Azure Data Scientist Exam
My study plan for becoming an Azure Data Scientist Associate
Last week I took and passed the DP-100 exam and am now officially a Microsoft Certified: Azure Data Scientist Associate. This was my first attempt at the exam, and I scored 934 out of 1000, well above the 700 points needed to pass.
In this article I’ll cover the high-level exam topics, my study plan, and what I would have changed now that I’ve taken and passed it.
What is the DP-100 Exam?
The DP-100 exam measures your ability to design and administer data science solutions on the Azure platform. At a broad level this covers the following areas:
- Compute, Endpoint, Dataset, User, Notebook, and Experiment Management in Azure Machine Learning Studio
- Automated ML through Azure Machine Learning Studio
- Automated ML through the Python SDK
- The Azure Machine Learning Studio Designer
- Manual experiments using the Python SDK
- Hyperdrive Experiments
- Using Databricks as Attached Compute resources
- Common algorithms, tasks, and metrics from machine learning experiments
- Fairness and Privacy
- Model Explainability
This is a somewhat intimidating list of things to cover in an exam, though it becomes less daunting if you have taken the AI-900 Azure AI Fundamentals exam first.
The exact contents of the exam change from time to time, so make sure you visit the official exam page for their current topics before starting to study or taking the exam yourself.
Benefits of Passing the Exam
Once you pass the DP-100 exam you earn the Microsoft Certified: Azure Data Scientist Associate certification and associated badge for a full calendar year before the credential expires.
Note: Microsoft currently allows people to re-test and renew certifications for free using their web browser.
Having this certificate can help in getting promotions, raises, or entirely new jobs. In my case, the certification gave me confidence that I have learned machine learning on Azure well enough to teach it to others and represent it in the community. The certification may also help my odds of being accepted to speak at conferences or user groups.
How I studied
It took me about 6 months off and on to study before I felt ready to take this certification exam. I don’t recommend you spend as long as I did, but let’s explore my approach and the resources that I used in case it helps you on your own journey.
Azure AI Fundamentals Exam
The first thing I did in studying was to pass the AI-900 Azure AI Fundamentals exam last fall. While the DP-100 exam does not have any pre-requisites, I do strongly recommend that you take and pass the AI-900 exam in advance because the AI-900 will give you early exposure to Azure Machine Learning Studio, machine learning tasks and metrics, and fairness and explainability.
I have an entire article and video available on studying for that exam if you’ve not yet taken it.
Unlike the associate and expert-level certifications, the fundamentals certifications do not expire.
Machine Learning Projects
I ran a number of machine learning projects using Azure Machine Learning Studio’s Automated ML and Designer. Most notable of these projects was my experiment that conclusively proved that Die Hard is a Christmas movie using an Automated ML classification task.
Running a few experiments in Azure Machine Learning Studio helped me understand the extent of the user interface, model explanations and metrics, and compute management.
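To make the metrics part concrete, here is a minimal sketch of how the common binary classification metrics are derived from true and predicted labels. This is plain Python rather than anything from Azure Machine Learning; the function name `classification_metrics` and the sample labels are hypothetical, chosen only for illustration.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall, and F1 for a binary task."""
    pairs = list(zip(y_true, y_pred))
    # Count the confusion matrix cells.
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn

    # Guard against division by zero when a class is never predicted or present.
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical labels: 1 = positive class, 0 = negative class.
metrics = classification_metrics(
    y_true=[1, 1, 1, 0, 0, 0, 0, 0],
    y_pred=[1, 1, 0, 1, 0, 0, 0, 0],
)
print(metrics)
```

Working through a small case like this by hand is a reasonable way to check that you can tell precision from recall when a question describes a confusion matrix.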
One thing I am very glad I did here was deploy the model as a real-time prediction endpoint using an Azure Container Instance.
Playing around with the Designer also helped me understand the interrelationships between tasks and how individual models may reuse their results when you make subsequent runs.
No exam prep is complete without going through the relevant Microsoft Learn modules for that exam. In fact, these modules are so important that Microsoft links to them directly from the bottom of the exam page!
I went through each one of these learning paths, read everything, and did every lab. I paid extra attention to the knowledge check questions at the end of each module as well since these questions - or ones similar to them - often appear on exams. These questions also give you a great way of validating your understanding of the material as you go through.
I read a pair of books on the DP-100 exam. One book, from IP Specialist, seemed to be almost completely plagiarized from MS Learn and is unworthy of further mention here.
The other book, Azure Data Scientist Associate Certification Guide by Andreas Botsikas and Michael Hlobil, was a great resource and one I’d strongly recommend. The one drawback of this book is that I felt it was somewhat light on the Python SDK and Databricks.
I watched a number of online videos about DP-100 from platforms I use for learning.
Pluralsight has a DP-100 learning path that contained roughly 8 hours of content when I watched it, but more content is being added to this path at the time of this writing. I found the Pluralsight content useful from a familiarization perspective, but maybe not comprehensive enough for certification prep work. Still, it was useful.
Coursera partnered with Microsoft to offer a DP-100 test prep specialization which I found to be very helpful. Additionally, at the time I took it, they offered a discount on the actual exam for completing the specialization.
Udemy has a number of resources out there. The best one I encountered was the DP-100 Microsoft Azure Data Scientist Complete Exam Prep course by Scott Duffy. This was fine, but not in too much depth and in general I’d recommend the Coursera option instead.
A Cloud Guru offers 3 separate DP-100 courses, but the content is old and has not aged well and so I cannot recommend it.
Cloud Academy offers a DP-100 exam prep learning path that worked well and featured interactive lab environments, giving it a very hands-on approach. These labs had issues in the fall of 2020, and neither Cloud Academy nor Azure support was especially helpful in getting the issues resolved, but they were resolved and working when I looked into them in the spring of 2022.
I took the Machine Learning Engineer on Microsoft Azure Nanodegree from Udacity and completed it in about 10 days.
This was a very good course, though the content is somewhat out of date in places.
The place where this Nanodegree shined, however, was the major projects they had you build. These projects forced me to write code in the Python SDK to conduct Auto ML experiments, do hyperparameter tuning, get, evaluate, and deploy models, and more. This essentially was the missing piece for me technically and complemented my Azure Machine Learning Studio portal experience quite well.
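For readers who have not done hyperparameter tuning before, the idea behind a random sampling run can be sketched in a few lines of plain Python. To be clear, this is not the Azure Machine Learning SDK and not the code from my Nanodegree projects; `random_search`, `toy_objective`, and the search space below are hypothetical stand-ins for "train a model with these settings and report its primary metric".

```python
import random


def random_search(objective, space, n_trials=20, seed=0):
    """Try randomly sampled hyperparameter combinations and keep the best."""
    rng = random.Random(seed)  # seeded so repeated runs give the same trials
    best_params, best_score = None, float("-inf")
    for _ in range(n_trials):
        # Sample one value per hyperparameter from its list of choices.
        params = {name: rng.choice(options) for name, options in space.items()}
        score = objective(params)  # higher is better in this sketch
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


def toy_objective(params):
    # Hypothetical stand-in for training a model and returning a metric.
    # It peaks (at 0) when learning_rate is 0.1 and max_depth is 6.
    return (-(params["learning_rate"] - 0.1) ** 2
            - (params["max_depth"] - 6) ** 2 / 100)


search_space = {
    "learning_rate": [0.001, 0.01, 0.1, 0.5],
    "max_depth": [2, 4, 6, 8, 10],
}
best_params, best_score = random_search(toy_objective, search_space, n_trials=15)
print(best_params, best_score)
```

A managed tuning service adds the parts this sketch leaves out, such as running trials in parallel on a cluster and stopping poorly performing runs early, but the core loop of sample, train, and compare is the same.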
The final project of this course was to run a machine learning project of your own choosing and this added freedom and flexibility was wonderful to have. I wound up building a regression model to predict the total number of penalty minutes that would be assessed in a given hockey game matchup. It wasn’t a perfect experiment, but it gave me comfort and experience using the SDK in a guided environment.
Unfortunately, I find some of Udacity’s cancellation policies to be somewhat predatory toward newer learners. I attempted to get connected with a community manager regarding my concerns, but Udacity was unwilling to discuss them beyond the support triage level at the time of this writing. I’d urge caution should you wish to investigate Udacity further, particularly if you are a newer learner or have very limited finances.
In the week of the exam, I scanned the various sources of content I used while learning and created a number of flash cards for each topic. For things that I had no exposure to or seemed shaky on, I created even more flash cards. I then reviewed the book and MS Learn materials and jotted down notes on these cards.
The day of the exam I looked over the knowledge check quizzes in MS Learn and reviewed my notecards until I was ready to go.
The Exam Experience
I took the exam online from my guest bedroom. I needed to present my driver’s license (or some other valid legal photo ID) and photograph the room, but didn’t wind up interacting with the test proctor at all, which helped make me comfortable.
My exam wound up being fewer than 50 questions, with some multiple choice questions, some “hot area” questions where you select an option, and some drag and drop tasks to create a sequence of events.
The exam took me about 35 minutes total and was different than I expected. I anticipated deeper or more nuanced technical questions, while the questions on my exam were typically higher-level and often dealt with strategic decision-making. I liked this approach since the technical details are things you’d easily search for in the real world when stuck, while the high-level approach is more important to internalize. However, this wasn’t the approach I prepared for!
I knew about 10 questions in that I was going to pass the exam unless my luck with questions changed. As soon as I finished the exam and completed the exam feedback section, the system gave me my score and told me that I passed by a comfortable margin. Within a few hours I had a formal email to that effect and could claim my certification badge.
What I should have done differently
If I had to study for the DP-100 exam again, I’d do a few things differently. Some of these may be things that I encountered on the exam while others might be things I felt that I could grow further in.
Security and Administration
I feel the majority of the things I missed or felt low confidence on were around permissions management and security. You may benefit from taking the SC-900 Microsoft Security, Compliance, and Identity Fundamentals certification before DP-100, and I am considering studying for SC-900 to address this weak area of mine.
I did most of my experimentation on a compute instance to try to save money. This wound up being fairly tedious and possibly hurt me overall because the compute instance had to be manually activated whereas a cluster could dynamically add nodes to meet demand. Additionally, using Databricks as an attached compute for practice would have been helpful.
See my article on compute management in Azure Machine Learning for additional details on compute instances vs compute clusters.
Azure Kubernetes Service
I never deployed anything to Azure Kubernetes Service (AKS) during my prep work. As a result I didn’t work through any sort of token-based authentication scenario or deal with dynamic scaling configuration. While my experiments didn’t need AKS, AKS is absolutely part of the exam and I should have played with it more - or at least looked over its configuration options!
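As a rough idea of what the client side of a token-based authentication scenario looks like, here is a minimal sketch that builds (but does not send) a scoring request using only the Python standard library. The URL, the token value, and the `{"data": ...}` payload shape are assumptions for illustration; the real payload depends on the scoring script behind your endpoint, and how you obtain and refresh the token depends on how the service was deployed.

```python
import json
import urllib.request


def build_scoring_request(scoring_uri, token, rows):
    """Build a POST request that passes a bearer token to a scoring endpoint."""
    # Assumed payload shape; your scoring script may expect something else.
    body = json.dumps({"data": rows}).encode("utf-8")
    return urllib.request.Request(
        scoring_uri,
        data=body,
        headers={
            "Content-Type": "application/json",
            # The token (or key) travels in the Authorization header.
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


# Hypothetical endpoint and token; nothing is sent over the network here.
request = build_scoring_request(
    "https://example.invalid/score", "my-token", [[5.1, 3.5, 1.4, 0.2]]
)
print(request.get_method(), request.full_url)
print(request.get_header("Authorization"))
# To actually call a live endpoint: urllib.request.urlopen(request)
```

Because tokens expire while keys generally do not, a real client would also need some way to request a fresh token before the old one lapses, which is exactly the kind of detail I never exercised.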
See my article on compute management in Azure Machine Learning for additional details on Azure Container Instances vs Azure Kubernetes Service.
While I studied the syntax around MLflow, I never played with it much. Additionally, my preparation for the exam didn’t involve any sort of deployment automation practice or trying to use App Insights to query for specific messages. How git repositories and Azure Machine Learning Studio can work together is also something I never studied.
I’ve still not run many clustering experiments using Azure Machine Learning Studio or the Python SDK and clustering is a significant and distinct workload in machine learning.
I personally enjoyed the DP-100 exam. While there was a lot on the exam, it was a fun one to take. In fact, of the 4 certification exams I’ve taken through Microsoft, I’d put the DP-100 exam in second place for enjoyment purposes with the AI-900 Azure AI Fundamentals exam ahead of it.
I benefitted a lot from the DP-100 exam and now feel more comfortable using the phrase “Data Scientist” to describe myself. Preparing for DP-100 gave me more breadth and depth, and by studying at a deeper level than the exam required, I learned a lot about the individual machine learning algorithms. This knowledge will help me in the future.
While I do not expect to take my Azure Data Scientist certification and get a new job with it or negotiate higher pay at my current job where I teach software engineering, I now feel more capable and competent in the things I’m doing with machine learning. That confidence will translate to better talks in my communities. Additionally, seeing “Microsoft Certified: Azure Data Scientist Associate” next to my name on a conference submission might give some of my topics more credibility and increase my odds of being accepted to speak.
Your journey is different than mine. If you’re just starting out and wondering what you can do with ML in Azure, check out the AI-900 exam. However, if you know you’re interested in Machine Learning on Azure and want to get deeper into it, check out the DP-100 exam and watch how much you’ll learn while studying.