Tips, Tricks and Topics to study for the DP-700 Fabric Data Engineer Associate

I took the beta exam for the DP-700 Fabric Data Engineer Associate certification last week, and while I don’t know the results yet, I figured I would share my thoughts on studying for the exam, and a few notes on the topics included on the questions I received.

If you want to schedule the exam, here is the link: Exam DP-700: Implementing Data Engineering Solutions Using Microsoft Fabric (beta) – Certifications | Microsoft Learn

Resources

In my preparation for the exam, I trawled the limited but great resources out there from community members who had already taken the test:

Andy Cutler compiled an excellent list of Docs/Learn references based on the official ‘Skills Measured’ guide: DP-700 Microsoft Fabric Data Engineering Associate – Skills Measured Resources

Kevin Chant put together a good list of resources and links: Checklist for the DP-700 beta exam for the new Microsoft Fabric Data Engineering certification – Kevin Chant

Sam Debruyn wrote an article similar to this one, with perspectives worth reading: I just took the new Fabric DP-700 Data Engineering Exam: here’s what you should know – Sam Debruyn

Ali Stoops has some good reflections as well: Microsoft Fabric Data Engineer Associate (DP-700) Beta – My Experience & Tips

Official Microsoft Collections:

And then I reviewed a bunch of Microsoft Learn articles (both based on Andy’s list above and by browsing through tens, if not hundreds, of pages). The reason: Microsoft Learn is accessible to you during the exam, so familiarity with the articles, and with the search bar, is invaluable.

In no particular order, below are some Learn articles that came in handy for me, and which I was able to look up again during the exam, through the embedded Microsoft Learn tab in the exam software:

Topics worth spending time practicing

While the study guide provides a decent overall picture of the topics in scope for the exam, I thought I’d give slightly more granular insight. Below are some of the categories of questions and topics I encountered:

Comparing and choosing the right tool for the job:

  • Evaluating which Batch Data Ingestion tool is best suited given the context (Often comparing Dataflows, Notebooks, Pipelines)
  • Evaluating which Streaming Data Ingestion tool is the best fit given the context (Often mentioning Spark Structured Streaming, Eventstreams and even Streaming Dataflows).
  • Evaluating Transformation tools for Batch Data (Pipelines, Notebooks, Dataflows, Stored Procedures)
  • Evaluating Transformation tools for Streaming Data (Notebooks, Event Processing in Eventstreams), including very specific details about how to optimize and output certain aggregations (see the Structured Streaming sketch after this list).
  • Evaluating the best place to monitor specific workloads with the least effort (e.g. the queryinsights views for Fabric Warehouse, the Monitoring Hub for Notebook runs, and more).
  • Evaluating when to use Deployment Pipelines in Fabric vs. Azure Pipelines
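
To make the streaming-transformation topic more concrete, here is a minimal sketch of a windowed aggregation with Spark Structured Streaming in a Fabric notebook. The table names, column names, and checkpoint path are hypothetical placeholders of my own, not something taken from the exam or the docs:

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()

# Stream events from a (hypothetical) Delta table in the Lakehouse.
events = spark.readStream.table("raw_device_events")

# Tumbling 5-minute average per device; the watermark bounds the state
# and allows late-arriving events to be dropped after 10 minutes.
aggregated = (
    events
    .withWatermark("event_time", "10 minutes")
    .groupBy(F.window("event_time", "5 minutes"), "device_id")
    .agg(F.avg("temperature").alias("avg_temperature"))
)

# Append finalized windows to a Delta table; a checkpoint location is required.
query = (
    aggregated.writeStream
    .format("delta")
    .outputMode("append")
    .option("checkpointLocation", "Files/checkpoints/device_agg")  # hypothetical path
    .toTable("device_temperature_5min")
)
```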

Configure a Fabric Item to do a certain job:

  • How to set up Metadata-Driven Ingestion with Data Pipelines (as in, which activities you need to include, how to use and configure control tables, and how to use parameters in activities)
  • How to set up Dynamic Data Masking on Fabric Data Warehouses (knowing the keywords to apply data masking, and knowing which masking methods to use to accommodate specific masking/governance requirements).
  • How to set up row-level security in Fabric Data Warehouse.
  • How to set up Primary Keys in Data Warehouse.
  • How to configure Incremental Refresh in Pipelines, Notebooks, and Dataflows (a notebook-based sketch follows this list).
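
As an example of the notebook option for incremental refresh, here is a minimal sketch of a watermark-plus-merge pattern against Lakehouse Delta tables. The table names (sales, sales_staging), key column (sale_id), and watermark column (modified_at) are hypothetical:

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from delta.tables import DeltaTable

spark = SparkSession.builder.getOrCreate()

# 1. Read the current high-water mark from the target table
#    (initial-load handling for an empty table is omitted here).
last_watermark = (
    spark.table("sales")
    .agg(F.max("modified_at").alias("wm"))
    .collect()[0]["wm"]
)

# 2. Pick up only the rows that changed since the last load.
changes = (
    spark.table("sales_staging")
    .filter(F.col("modified_at") > F.lit(last_watermark))
)

# 3. Upsert the changed rows into the target Delta table on the business key.
target = DeltaTable.forName(spark, "sales")
(
    target.alias("t")
    .merge(changes.alias("s"), "t.sale_id = s.sale_id")
    .whenMatchedUpdateAll()
    .whenNotMatchedInsertAll()
    .execute()
)
```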

Syntax Review-style questions, asking you to fill in the blanks in code or identify which option performs the intended operation:

  • Review KQL statements (knowing the right keywords and the order of functions for creating tables, creating aggregations, filtering, displaying columns, and creating sorting/window functions).
  • Fill in the blanks when creating DAGs in Notebooks (knowing the right keywords and how to create dependencies; see the sketch after this list).
  • Distinguish between different PySpark functions, and be able to choose the right one for the desired output.
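
For the DAG questions, it helps to have actually built one. Below is a minimal sketch using notebookutils.notebook.runMultiple in a Fabric notebook; the notebook names, parameters, and timeout values are hypothetical, and the exact DAG options are best double-checked against the notebookutils documentation:

```python
# notebookutils is available by default in the Fabric Spark runtime.
DAG = {
    "activities": [
        {
            "name": "Ingest_Sales",           # unique activity name
            "path": "Ingest_Sales",           # notebook to run (hypothetical)
            "timeoutPerCellInSeconds": 120,
            "args": {"load_date": "2024-12-01"},
        },
        {
            "name": "Transform_Sales",
            "path": "Transform_Sales",
            "timeoutPerCellInSeconds": 120,
            "dependencies": ["Ingest_Sales"],      # runs only after Ingest_Sales succeeds
        },
        {
            "name": "Load_Gold",
            "path": "Load_Gold",
            "timeoutPerCellInSeconds": 120,
            "retry": 1,
            "dependencies": ["Transform_Sales"],
        },
    ],
    "timeoutInSeconds": 3600,   # overall timeout for the whole DAG
    "concurrency": 2,           # max notebooks running in parallel
}

notebookutils.notebook.runMultiple(DAG)
```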

Platform-related questions:

  • How to set up Git Integration, both to Azure DevOps and GitHub, including which pieces of information from DevOps and GitHub you need during setup, and which permissions you need in those tools (not just which permissions are needed in Fabric).
  • Shortcut Caching, and what triggers data to be loaded from cache vs from the source.
  • Data Discovery / Promotion: How to apply Endorsements and Certifications, who can do it, and what their internal hierarchy is.
  • Domains, and what can be delegated to domain administrators.

Access / Security questions:

  • Intricate scenarios about granting full/partial/limited access to Items and Workspaces.
    • What Workspace Roles grant which types of access.
    • How and when to instead share Fabric Items directly.
    • How and when to share data directly.
    • What permissions are required to create/run Deployment Pipelines.
    • What permissions are required to modify Workspace Settings and set up Git Integration.
  • Where and how to configure settings for allowing/preventing Workspace Creation and Fabric Item creation.

Happy Certification Hunting!

Also check out these other blogs:

Bulk Write-Back w. Translytical Task Flows in Microsoft Fabric / Power BI: Writing a single value back to multiple records at the same time


Fabric Quick Tips – Pushing transformation upstream with Self Service Views and Tables in Visual Queries for Lakehouses/Warehouses/SQL DB


Organizing your Microsoft Fabric Data Platform: Tags and Task Flows

