Microsoft Fabric Analytics Engineer (DP-600) — Complete Guide
OneLake · Lakehouse · Data Warehouse · Semantic Models · DAX · Dataflows · Pipelines · DirectLake · Real-Time Analytics · Scenarios · Cheat Sheet
Top Hashtags: #MicrosoftFabric, #DP600, #FabricAnalytics, #OneLake, #DataEngineering, #SemanticModels, #DAX, #PowerBI, #DataAnalytics, #MicrosoftCertified
Table of Contents
- Exam Overview & Microsoft Fabric Platform
- Plan, Implement & Manage Analytics Solutions (10–15%)
- Prepare and Serve Data (40–45%)
- Implement and Manage Semantic Models (20–25%)
- Explore and Analyze Data (20–25%)
- Scenario-Based Questions
- Cheat Sheet — Quick Reference
1. Exam Overview & Microsoft Fabric Platform
DP-600 Exam at a Glance
The DP-600 Microsoft Fabric Analytics Engineer exam validates expertise in designing, creating, and managing enterprise-scale analytics solutions using Microsoft Fabric. Candidates should have subject matter expertise in designing, creating, and managing analytical assets such as semantic models, warehouses, or lakehouses — and be able to query and analyze data using SQL, KQL, and DAX.
| Skill Domain | Exam Weight |
|---|---|
| Plan, implement, and manage a solution for data analytics | 10–15% |
| Prepare and serve data | 40–45% |
| Implement and manage semantic models | 20–25% |
| Explore and analyze data | 20–25% |
Key fact: Candidates should have expertise in designing, creating, and deploying enterprise-scale data analytics solutions using Microsoft Fabric components like lakehouses, data warehouses, notebooks, dataflows, data pipelines, semantic models, and reports.
What is Microsoft Fabric and how does it unify the analytics platform?
Microsoft Fabric is a unified, end-to-end analytics platform that brings together data engineering, data integration, data warehousing, data science, real-time analytics, and business intelligence in a single SaaS offering.
Microsoft Fabric workloads (all in one platform):
┌─────────────────────────────────────────────────────────┐
│ Microsoft Fabric │
├──────────┬──────────┬──────────┬──────────┬─────────────┤
│ Data │ Data │ Data │ Data │ Real-Time │
│Engineering│Warehouse│ Science │ Factory │ Analytics │
│(Lakehouse│(SQL │(Notebooks│(Pipelines│(Eventstream │
│ Spark) │ Analytics│ ML) │ Dataflows│ KQL DB) │
├──────────┴──────────┴──────────┴──────────┴─────────────┤
│ Power BI (Business Intelligence) │
├─────────────────────────────────────────────────────────┤
│ OneLake (Storage) │
└─────────────────────────────────────────────────────────┘
Key unification points:
→ OneLake: single storage layer — all workloads read/write the same data
→ One security model: workspace-level and item-level permissions
→ One governance: Microsoft Purview integration for data lineage and cataloguing
→ One billing: Fabric capacity (F-SKUs) covers all workloads
→ No data silos: Lakehouse tables appear in Warehouse, Power BI, Notebooks
What is OneLake and why is it the foundation of Fabric?
OneLake:
→ Fabric's single, unified data lake — built on Azure Data Lake Storage Gen2
→ One lake per Fabric tenant (like OneDrive for data)
→ All Fabric items (Lakehouses, Warehouses, KQL Databases) store
their data in OneLake automatically
→ Delta Parquet format: all table data stored as Delta Lake tables
→ No data duplication: Warehouse tables are Delta tables in OneLake;
same files accessible from Lakehouse SQL endpoint, Notebooks, Power BI
OneLake hierarchy:
Tenant → Workspaces → Items (Lakehouses, Warehouses, KQL DBs)
(like folders) (each item = a folder in OneLake)
Shortcuts:
→ Virtual links to data stored OUTSIDE OneLake (or in other Fabric items)
→ No data movement — shortcut references the data in place
→ Sources: Azure Data Lake Storage Gen2, Amazon S3, Google Cloud Storage,
another Fabric workspace (cross-workspace shortcut)
→ Use case: query S3 data from a Fabric Lakehouse without copying it
OneLake file explorer:
→ Windows app that mounts OneLake as a network drive (like OneDrive sync)
→ Browse and manage OneLake files directly from Windows Explorer
What are the Fabric experience roles and when do you use each?
Lakehouse (Data Engineering experience):
→ Combines a data lake (unstructured files) + a data warehouse (Delta tables)
→ Two endpoints automatically created:
SQL analytics endpoint: T-SQL read-only access to Delta tables
Semantic model: default Power BI dataset over all Delta tables
→ Storage: Files section (raw files) + Tables section (Delta Lake tables)
→ Compute: Apache Spark notebooks or Spark pipelines
→ Use for: raw data ingestion, Spark transformations, medallion architecture
Data Warehouse (Data Warehouse experience):
→ Enterprise SQL data warehouse with full T-SQL read/write support
→ Columnar storage, MPP (Massively Parallel Processing)
→ Supports: INSERT, UPDATE, DELETE, CREATE TABLE, stored procedures
→ Storage: Delta tables in OneLake (same as Lakehouse!)
→ Cross-database queries: query Lakehouse SQL endpoint from Warehouse
→ Use for: structured analytics, BI reporting, complex SQL workloads
KQL Database (Real-Time Analytics experience):
→ Optimised for time-series, log, and telemetry data
→ Query with Kusto Query Language (KQL)
→ Ingestion from: Eventstream, Azure Event Hubs, IoT Hub, REST API
→ Use for: IoT telemetry, application logs, clickstream, security events
Notebooks (Data Science / Data Engineering):
→ Interactive Jupyter notebooks with Spark (Python, Scala, R, SQL)
→ Connect to Lakehouse as default storage
→ Libraries: pandas, scikit-learn, PyTorch, MLflow, SynapseML
→ Use for: data exploration, ML model training, complex transformations
Dataflows Gen2 (Data Factory experience):
→ Power Query-based ETL — low-code data transformation
→ 150+ connectors (SQL Server, Salesforce, SharePoint, REST APIs)
→ Output: write transformed data to Lakehouse, Warehouse, or other destinations
→ Use for: self-service ETL, non-technical data preparation
Data Pipelines (Data Factory experience):
→ Orchestration tool for complex multi-step data workflows
→ Activities: Copy Data, Notebook, Stored Procedure, Dataflow, Web, Delete
→ Scheduling, monitoring, error handling
→ Use for: orchestrating end-to-end data movement and transformation
2. Plan, Implement & Manage Analytics Solutions (10–15%)
How do you design a Fabric workspace strategy?
Workspace = collaboration unit in Fabric (like a project or team space)
Workspace strategy patterns:
1. Environment-based (recommended for enterprise):
Prod-Sales-WS → production reports and models
Dev-Sales-WS → development and testing
Staging-Sales-WS → pre-production validation
2. Domain-based:
Finance-WS → all Finance items (Lakehouse, Warehouse, Reports)
HR-WS → all HR items
Marketing-WS → all Marketing items
3. Hybrid: domain + environment
Finance-Dev-WS / Finance-Prod-WS
Workspace capacity:
→ Each workspace assigned to a Fabric capacity (F2, F4, F8, F16 ... F2048)
→ Capacity determines: compute power, storage throughput, concurrency
→ Premium capacity (P-SKUs) or Fabric capacity (F-SKUs) required for
most Fabric features (Lakehouse, Warehouse, etc.)
→ Free tier: limited features (Power BI reports only)
F-SKU vs P-SKU:
F-SKUs (Fabric capacities): pay-as-you-go or reserved
P-SKUs (Power BI Premium): legacy, being phased out in favour of F-SKUs
F64 ≈ P1 in compute equivalence
Workspace roles:
Admin: full control — manage members, settings, delete workspace
Member: create/edit/delete all items, manage permissions on items
Contributor: create/edit items they own, cannot delete others' items
Viewer: read-only access — cannot create or edit items
What is Git integration in Microsoft Fabric and why does it matter?
Fabric Git integration:
→ Connect a Fabric workspace to an Azure DevOps Git repository (or GitHub)
→ Fabric items serialised to YAML/JSON definition files in the repo
→ Enables: version control, code review (PRs), branching strategies,
rollback, CI/CD deployment pipelines
Supported items for Git sync:
→ Semantic models (Power BI datasets)
→ Reports
→ Notebooks
→ Data pipelines
→ Dataflows Gen2
→ Lakehouses (metadata, not data)
→ Warehouses (metadata, not data)
Git workflow:
Developer → edits item in Fabric workspace
→ commits changes to feature branch in Git
→ raises Pull Request for review
→ approved PR merges to main branch
→ CI/CD pipeline deploys to production workspace
Deployment Pipelines (Fabric native CI/CD):
→ Move Fabric items between Dev → Test → Production workspaces
→ Deployment rules: map workspace-specific settings
(e.g., different data source connection strings per environment)
→ Compare stages: see what changed between Dev and Production
→ Supported with both manual promotion and automated triggers
Git + Deployment Pipelines together:
Git = source control (version history, branching, review)
Deployment Pipelines = promotion workflow (Dev → Staging → Prod)
Best practice: use both together for enterprise Fabric governance
How do you implement security in Microsoft Fabric?
Fabric security layers:
1. Workspace-level security (Entra ID groups):
Assign Entra ID security groups to workspace roles
All items in workspace inherit the workspace permissions
2. Item-level permissions (share individual items):
Share a specific report or semantic model without granting
workspace access → viewer can only see that one item
3. OneLake data access roles (preview):
Grant read access to specific folders/tables in a Lakehouse
without exposing the entire Lakehouse
4. Semantic model Row-Level Security (RLS):
Define roles in the semantic model with DAX filter rules
Users see only the data their role permits
e.g., UK managers see only UK rows; US managers see only US rows
5. Object-level security (OLS):
Restrict access to specific tables or columns in a semantic model
Hidden tables/columns don't appear in the model for restricted users
6. Warehouse column-level security:
T-SQL GRANT/DENY on specific columns
Restrict PII columns from certain users
7. Dynamic data masking:
Mask sensitive column values (e.g., show **** instead of SSN)
without changing the underlying data
Microsoft Purview integration:
→ Fabric data assets automatically catalogued in Purview
→ Sensitivity labels flow from Purview to Fabric items
→ Data lineage: track data flow from source through transformations to report
→ Scan Fabric workspaces for sensitive information (PII, financial data)
3. Prepare and Serve Data (40–45%)
What is the Medallion Architecture in Fabric and how do you implement it?
Medallion Architecture (Bronze → Silver → Gold):
→ Layered data organisation pattern for lakehouses
→ Each layer increases data quality and refinement
Bronze (Raw) layer:
→ Land raw data exactly as received — no transformation
→ Append-only: never delete or modify
→ Formats: CSV, JSON, Parquet, Delta, images, whatever the source sends
→ Serves as the "source of truth" — can reprocess from here if needed
→ Implementation: Files section of Lakehouse or Bronze Lakehouse
Silver (Cleansed/Conformed) layer:
→ Validated, deduplicated, standardised data
→ Apply business rules: correct data types, handle nulls, join related data
→ Delta Lake tables: schema enforced, history preserved
→ Implementation: Silver Lakehouse (separate workspace item)
Gold (Business/Serving) layer:
→ Aggregated, business-ready data modelled for analytics
→ Star schema: fact and dimension tables
→ Optimised for query performance (ZORDER, vacuum, partitioning)
→ Powers semantic models, reports, dashboards
→ Implementation: Gold Lakehouse or Fabric Data Warehouse
Fabric implementation:
Bronze Lakehouse → Notebook/Pipeline → Silver Lakehouse
→ Notebook/Pipeline → Gold Lakehouse/Warehouse
→ Semantic Model → Power BI
Shortcuts between layers:
→ Silver Lakehouse shortcuts Bronze files — no data copy
→ Gold Warehouse shortcuts Silver Lakehouse tables
→ Same physical data, different logical views
What are the data ingestion options in Microsoft Fabric?
1. Data Pipelines (Copy Data activity):
→ Orchestrated, scheduled data movement
→ 100+ connectors: SQL Server, Oracle, Salesforce, REST APIs, SAP, etc.
→ Supports: full load, incremental load (watermark or Change Data Capture)
→ Configure in: Fabric workspace → New item → Data Pipeline
→ Best for: production ETL, scheduled batch ingestion, complex orchestration
2. Dataflows Gen2 (Power Query):
→ Low-code ETL with 150+ connectors
→ Transformations: filter, merge, pivot, custom columns, M language
→ Output destinations: Lakehouse, Warehouse, Azure SQL, etc.
→ Best for: self-service ETL, non-technical data prep, connecting to
data sources that pipelines don't natively support well
3. Notebooks (Spark):
→ Python, PySpark, Scala, R, Spark SQL
→ Full programmatic control — complex transformations, ML prep
→ Write directly to Lakehouse Delta tables
→ Best for: complex transformations, ML feature engineering, custom logic
4. Eventstream (Real-Time Analytics):
→ Ingest streaming data: Azure Event Hubs, IoT Hub, Kafka, custom endpoints
→ Route, transform, and fan-out streams to: KQL Database, Lakehouse, Warehouse
→ Best for: IoT telemetry, application events, real-time data feeds
5. Mirroring:
→ Continuously replicate data FROM external databases INTO OneLake
→ Sources: Azure SQL Database, Azure Cosmos DB, Snowflake, Azure Databricks
→ Near real-time replication — no pipeline needed
→ Data appears as Delta tables in Fabric — queryable immediately
→ Best for: zero-ETL analytics on operational data
6. COPY INTO (Warehouse T-SQL):
→ Bulk load Parquet or CSV files from OneLake into Warehouse tables
COPY INTO dbo.Sales
FROM 'https://onelake.dfs.fabric.microsoft.com/.../Files/sales.parquet'
WITH (FILE_TYPE = 'PARQUET');
How do Delta Lake tables work in Fabric Lakehouses?
Delta Lake:
→ Open-source storage layer that adds ACID transactions to data lakes
→ All Lakehouse Tables are Delta Lake tables in Fabric
→ File format: Parquet data files + _delta_log (transaction log as JSON)
Delta Lake key features:
ACID transactions: reads and writes are atomic — no partial data
Schema enforcement: reject data that doesn't match table schema
Schema evolution: add/rename columns without rewriting data
Time travel: query historical versions of the table
SELECT * FROM lakehouse.Sales VERSION AS OF 5 -- version 5
SELECT * FROM lakehouse.Sales TIMESTAMP AS OF '2025-01-01'
Upserts (MERGE): update + insert in one operation
Creating and writing Delta tables in Notebooks:
# PySpark — write DataFrame as Delta table:
df.write.format("delta").mode("overwrite").saveAsTable("Sales")
# Append mode:
df.write.format("delta").mode("append").saveAsTable("Sales")
# Upsert (MERGE):
from delta.tables import DeltaTable
deltaTable = DeltaTable.forName(spark, "Sales")
deltaTable.alias("target").merge(
updates_df.alias("source"),
"target.SaleId = source.SaleId"
).whenMatchedUpdateAll().whenNotMatchedInsertAll().execute()
Delta maintenance:
OPTIMIZE: compacts small files into larger Parquet files (better query performance)
OPTIMIZE lakehouse.Sales
OPTIMIZE lakehouse.Sales ZORDER BY (Region, SaleDate) -- sort for better pruning
VACUUM: removes old files no longer needed (frees storage)
VACUUM lakehouse.Sales RETAIN 168 HOURS -- keep 7 days of history
V-ORDER: Fabric-specific write-time optimisation for faster reads
Automatically applied when writing Delta tables in Fabric
Uses sorting, row groups, and compression for Power BI performance
What is the Lakehouse SQL Analytics Endpoint?
SQL Analytics Endpoint:
→ Auto-generated, read-only T-SQL interface over all Delta tables
in a Lakehouse — no configuration needed
→ Every Lakehouse gets one automatically
→ Accessible via: SQL Server Management Studio (SSMS), Azure Data Studio,
any JDBC/ODBC tool using the connection string
What you can do:
→ Query Delta tables with T-SQL (SELECT, JOINs, aggregations)
→ Create SQL views, stored procedures, functions
→ Share with business users for SQL-based reporting
→ Connect from Power BI Desktop (DirectQuery or Import)
What you cannot do:
→ INSERT, UPDATE, DELETE on Lakehouse tables from SQL endpoint
(data modifications only via Spark notebooks or pipelines)
→ CREATE TABLE with new data (read-only for tables created by Spark)
Cross-item queries (Fabric virtual network of items):
-- Query Lakehouse table from Warehouse:
SELECT s.*, c.CustomerName
FROM [Sales_Lakehouse].[dbo].[FactSales] s
JOIN [DimCustomers_Warehouse].[dbo].[Customer] c
ON s.CustomerId = c.CustomerId
-- Cross-workspace query:
SELECT * FROM [OtherWorkspace_Lakehouse].[dbo].[FactOrders]
-- Requires appropriate permissions on the other workspace
What is the Fabric Data Warehouse and how does it differ from Lakehouse?
Feature Lakehouse Data Warehouse
Primary compute: Apache Spark (notebooks) T-SQL (SQL engine)
DML support: Spark only (not SQL EP) Full T-SQL (INSERT/UPDATE/DELETE)
Schema: Schema-on-read (flexible) Schema-on-write (enforced)
Best for: Big data, ML, raw data BI reporting, structured analytics
Tables format: Delta Lake in OneLake Delta Lake in OneLake (same!)
SQL endpoint: Read-only (SQL analytics EP) Full read/write T-SQL
Stored procedures: No Yes
Cross-DB queries: Via shortcuts Via cross-database T-SQL
Streaming ingest: Via Spark notebooks Via pipelines or COPY INTO
Star schema: Can implement Recommended approach
Fabric Warehouse T-SQL capabilities:
→ CREATE TABLE, CREATE VIEW, CREATE PROCEDURE, CREATE FUNCTION
→ INSERT, UPDATE, DELETE, MERGE (full DML)
→ Distributed transactions across warehouse tables
→ Column-level security and row-level security
→ Automatic statistics maintenance for query optimisation
→ Cross-database queries: query Lakehouse SQL endpoint tables
→ COPY INTO: bulk load from OneLake files
When to use Warehouse over Lakehouse:
→ Traditional BI/DWH team comfortable with T-SQL
→ Need stored procedures and full DML for complex ETL
→ Strict schema enforcement required
→ Multiple data consumers using BI/reporting tools via SQL
→ Existing SQL Server/Synapse Analytics migration
4. Implement and Manage Semantic Models (20–25%)
What is a Fabric semantic model (Power BI dataset)?
Semantic model:
→ The business logic layer on top of data — defines metrics, relationships,
hierarchies, and calculations for business users and reports
→ All Power BI reports connect to a semantic model
→ Every Lakehouse and Warehouse auto-creates a default semantic model
→ Custom semantic models can be created manually in Power BI Desktop
and published to Fabric workspace
Semantic model components:
Tables: imported or DirectQuery/DirectLake views of data
Relationships: define how tables join (one-to-many, many-to-many)
Measures: DAX calculations (Total Sales, YoY Growth, Profit Margin)
Calculated columns: row-level computed values in a table
Hierarchies: drill paths (Year → Quarter → Month → Day)
Perspectives: subset views of the model for different audiences
Row-Level Security (RLS): data access filters per user/role
Connectivity modes:
Import: data copied INTO the semantic model (fast, but has refresh lag)
DirectQuery: queries go to source on each visual interaction (live, slower)
DirectLake: Fabric-only: reads Delta files directly — combines speed
of Import with freshness of DirectQuery
Composite: mix of Import and DirectQuery tables in one model
What is DirectLake mode and why is it a game-changer?
DirectLake (Fabric-exclusive):
→ Power BI reads Delta Parquet files from OneLake directly
→ No data import, no DirectQuery round-trips to a database
→ Transcoding: Delta Parquet → in-memory columnar format on first access
→ Parquet data stays hot in memory — subsequent queries serve from memory
Performance comparison:
Import: Fast (in-memory), but data is stale until next refresh
DirectQuery: Always fresh, but every visual = live SQL query (slower)
DirectLake: Fast AS Import (in-memory after first load) AND fresh
AS DirectQuery (always reads latest Delta files)
How DirectLake works:
1. Delta table updated (by pipeline, notebook, or streaming)
2. Power BI report loads — no import/refresh needed
3. On query: Fabric engine reads Delta files, transcodes to memory
4. Subsequent queries: served from in-memory transcoded data
5. When Delta table updated again: cache invalidated, re-read on next query
DirectLake limitations (when it falls back to DirectQuery):
→ Complex DAX not optimisable in DirectLake engine
→ Non-Delta data sources
→ Exceeds row/column limits for the capacity tier
DirectLake framing (parquet file optimisation):
→ Framing: Fabric creates an optimised version of Delta table for DL
→ Run manually: Semantic model → Refresh → Frame table
→ V-ORDER on write automatically helps DirectLake performance
→ Avoid excessively wide tables (many columns) — increases frame size
What are the key DAX patterns for DP-600?
Measures vs Calculated Columns:
Calculated column: evaluated row by row at data refresh time
stored in the model, increases model size
use for: row-level values used in slicers/filters
Measure: evaluated at query time based on filter context
not stored, always calculated on the fly
use for: aggregations, KPIs, dynamic calculations
ALWAYS prefer measures over calculated columns
Common DAX functions tested:
CALCULATE — modify filter context:
Total Sales EU = CALCULATE([Total Sales], Region[Region] = "EU")
Sales LY = CALCULATE([Total Sales], SAMEPERIODLASTYEAR(Date[Date]))
FILTER — row-by-row table filter:
High Value Sales = CALCULATE([Total Sales],
FILTER(Sales, Sales[Amount] > 10000))
SUMX / AVERAGEX / MAXX (iterator functions):
Weighted Avg Price = DIVIDE(
SUMX(Sales, Sales[Quantity] * Sales[UnitPrice]),
SUM(Sales[Quantity]))
Time intelligence:
YTD Sales = TOTALYTD([Total Sales], Date[Date])
MTD Sales = TOTALMTD([Total Sales], Date[Date])
QTD Sales = TOTALQTD([Total Sales], Date[Date])
YoY Growth = DIVIDE([Total Sales] - [Sales LY], [Sales LY])
Rolling 12M = CALCULATE([Total Sales],
DATESINPERIOD(Date[Date], LASTDATE(Date[Date]), -12, MONTH))
ALLSELECTED — include slicer filters but remove model filters:
Market Share = DIVIDE([Total Sales],
CALCULATE([Total Sales], ALLSELECTED(Product[Category])))
Window functions (DAX 2023+):
Rank = RANKX(ALL(Product[Product]), [Total Sales],, DESC)
Running Total = CALCULATE([Total Sales],
WINDOW(1, ABS, 0, REL, ORDERBY(Date[Date])))
Variables (always use for readability and performance):
YoY % =
VAR CurrentSales = [Total Sales]
VAR PriorYearSales = CALCULATE([Total Sales], SAMEPERIODLASTYEAR(Date[Date]))
VAR YoYGrowth = DIVIDE(CurrentSales - PriorYearSales, PriorYearSales)
RETURN
IF(ISBLANK(PriorYearSales), BLANK(), YoYGrowth)
What are calculation groups and when do you use them?
Calculation groups:
→ Reusable sets of DAX calculations applied to any measure in the model
→ Solve the "many measures × many time periods" explosion
→ Defined in Tabular Editor (external tool — not native in Power BI Desktop)
Without calculation groups:
Sales Current Month, Sales Prior Month, Sales YoY, Sales YTD
Profit Current Month, Profit Prior Month, Profit YoY, Profit YTD
Revenue Current Month, Revenue Prior Month, Revenue YoY, Revenue YTD
→ 12 measures! For 10 base measures: 40 measures to maintain
With calculation groups:
Time Intelligence calculation group with items:
Current: SELECTEDMEASURE() -- passes through whatever measure is selected
Prior Month: CALCULATE(SELECTEDMEASURE(), PREVIOUSMONTH(Date[Date]))
YoY: CALCULATE(SELECTEDMEASURE(), SAMEPERIODLASTYEAR(Date[Date]))
YTD: TOTALYTD(SELECTEDMEASURE(), Date[Date])
→ 4 calculation items × any number of base measures = instant time intelligence
→ User selects measure (Sales) + time item (YoY) → sees Sales YoY automatically
Other calculation group use cases:
→ Currency conversion: apply FX rates to any financial measure
→ Format strings: dynamic number formatting per calculation item
→ Scenario comparison: Actual vs Budget vs Forecast for any metric
Precedence:
Multiple calculation groups applied to one measure:
Time Intelligence (precedence 10) applied first
Currency (precedence 5) applied second
Lower precedence number = applied later (outer context)
What is Row-Level Security (RLS) in semantic models?
Row-Level Security (RLS):
→ Restricts which data rows a user can see in reports
→ Defined as roles with DAX filter expressions in the semantic model
→ Works in Power BI service — enforced by the service (users cannot bypass)
Static RLS:
Create role "UK Sales" with table filter:
Geography[Country] = "UK"
→ Users assigned to "UK Sales" role see ONLY UK rows
Dynamic RLS (most common for enterprise):
Create role "Employee Data Filter" with table filter:
SalesPerson[Email] = USERPRINCIPALNAME()
→ User alice@contoso.com sees only rows where SalesPerson email = alice@contoso.com
→ One role definition, automatically filters per signed-in user
RLS implementation steps:
1. Power BI Desktop: Modelling → Manage Roles → New Role → add DAX filter
2. Publish to Fabric workspace
3. Semantic model → Security → assign Entra ID users/groups to each role
4. Test: Semantic model → Security → "Test as role" to verify filter
Object-Level Security (OLS):
→ Hide entire tables or columns from specific roles
→ Hidden objects don't appear in the model at all for restricted users
→ Defined in Tabular Editor (not Power BI Desktop)
RLS vs OLS:
RLS: controls WHICH ROWS a user sees
OLS: controls WHICH TABLES/COLUMNS a user sees
Use both together for comprehensive data security
5. Explore and Analyze Data (20–25%)
How do you query data in Fabric using SQL, KQL, and Notebooks?
T-SQL in Fabric Warehouse or Lakehouse SQL endpoint:
-- Aggregation:
SELECT
d.Year,
d.Quarter,
p.Category,
SUM(f.SalesAmount) AS TotalSales,
COUNT(DISTINCT f.CustomerId) AS UniqueCustomers
FROM FactSales f
JOIN DimDate d ON f.DateKey = d.DateKey
JOIN DimProduct p ON f.ProductKey = p.ProductKey
GROUP BY d.Year, d.Quarter, p.Category
ORDER BY d.Year, d.Quarter, TotalSales DESC;
-- Window function:
SELECT
ProductName,
SalesAmount,
RANK() OVER (PARTITION BY Category ORDER BY SalesAmount DESC) AS RankInCategory,
SUM(SalesAmount) OVER (ORDER BY SaleDate) AS RunningTotal
FROM FactSales;
-- CROSS DATABASE query (Lakehouse → Warehouse):
SELECT w.ProductKey, w.UnitPrice, l.StockLevel
FROM [SalesWarehouse].[dbo].[DimProduct] w
JOIN [InventoryLakehouse].[dbo].[StockLevels] l
ON w.ProductKey = l.ProductKey;
KQL in KQL Database or Real-Time Analytics:
// Query telemetry data:
SensorReadings
| where Timestamp > ago(1h)
| where DeviceId startswith "factory-"
| summarize avg(Temperature), max(Temperature), count()
by bin(Timestamp, 5m), DeviceId
| order by Timestamp desc
// Detect anomalies:
AppLogs
| where Level == "Error"
| summarize ErrorCount=count() by bin(Timestamp, 1h), AppName
| where ErrorCount > 100 // alert threshold
| render timechart
PySpark in Notebooks:
# Read Delta table:
df = spark.read.format("delta").load("Tables/FactSales")
# Transformation:
from pyspark.sql.functions import col, sum, year, month
monthly_sales = df \
.withColumn("Year", year(col("SaleDate"))) \
.withColumn("Month", month(col("SaleDate"))) \
.groupBy("Year", "Month", "ProductCategory") \
.agg(sum("SalesAmount").alias("TotalSales")) \
.orderBy("Year", "Month")
# Write to Gold Lakehouse:
monthly_sales.write \
.format("delta") \
.mode("overwrite") \
.option("overwriteSchema", "true") \
.saveAsTable("GoldLakehouse.MonthlySalesSummary")
What is Activator (Data Activator) in Fabric?
Microsoft Fabric Activator (formerly Data Activator):
→ No-code, event-driven alerting and automation within Fabric
→ Monitors data in real time and triggers actions when conditions are met
Key concepts:
Reflex: the container for data monitoring rules and actions
Object: business entity being monitored (Order, Device, Customer)
Trigger: condition that fires an action (temperature > 80°C,
order value > £10,000, stock level < 100 units)
Action: what happens when trigger fires (Teams alert, Power Automate
flow, send email, call Logic App)
Data sources for Activator:
→ Power BI report visuals (monitor a KPI on a report)
→ Eventstream (real-time streaming data)
→ Fabric Real-Time Analytics KQL query results
Example use cases:
→ Alert operations team via Teams when IoT temperature exceeds threshold
→ Notify sales manager when a high-value deal is marked as Lost in CRM
→ Trigger Power Automate reorder flow when inventory drops below min level
→ Alert finance when daily spend exceeds budget threshold
Integration:
→ Activator natively integrates with Teams, Outlook, Power Automate
→ Accessible in Fabric: New item → Activator (Reflex)
How do you implement data lineage and impact analysis in Fabric?
Lineage view in Fabric:
→ Visual map showing data flow from source to report
→ Access: Workspace → Lineage view (top-right button)
→ Shows: source → Dataflow/Pipeline → Lakehouse → Semantic Model → Reports
Impact analysis:
→ Before modifying a Lakehouse table or Warehouse view, check impact
→ Right-click a Fabric item → Impact analysis
→ Shows: all downstream items that depend on the selected item
→ Critical before: renaming tables, changing column types, deleting data
Scenarios requiring impact analysis:
→ Renaming a column in a Lakehouse table:
Impact analysis → shows which semantic models use this column
→ update semantic models and reports before renaming
→ Modifying a dataflow output:
Impact analysis → shows which Lakehouses and semantic models
receive output from this dataflow
→ Deleting a data source connection:
Impact analysis → shows all pipelines and dataflows using the connection
Microsoft Purview integration:
→ Fabric workspaces automatically scanned by Purview
→ Data map: lineage tracked across Fabric, Azure Data Factory, ADLS
→ Sensitivity labels: propagate from Purview to Fabric items
→ Data catalogue: business glossary terms linked to Fabric tables
→ Access governance: classify and govern Fabric data assets in Purview
6. Scenario-Based Questions
Scenario: Design a medallion lakehouse for a retail company's sales analytics platform.
Requirements: ingest from 5 source systems, daily batch + real-time,
Power BI reports for 500 business users, 3-year data history
Architecture:
SOURCE SYSTEMS:
SQL Server (ERP), Salesforce CRM, Azure Event Hubs (clickstream),
SharePoint (product catalogue), REST API (weather data)
BRONZE Lakehouse — Raw Zone:
Data Pipeline (Copy Data):
→ SQL Server → incremental CDC → Delta tables
→ Salesforce → Dataflow Gen2 → CSV → Bronze Files
Eventstream:
→ Azure Event Hubs → Eventstream → Bronze Lakehouse Delta table
Notebooks schedule (daily):
→ REST API weather data → JSON files → Bronze Files
SILVER Lakehouse — Cleansed Zone:
Notebooks (PySpark, scheduled by Pipeline):
→ Deduplicate customer records (fuzzy match on email/phone)
→ Standardise: date formats, currency, country codes
→ Validate: null checks, referential integrity, range checks
→ Upsert MERGE → Silver Delta tables
Schema: FactSales (raw), DimCustomer (cleansed), DimProduct (cleansed)
GOLD Lakehouse — Serving Zone:
Notebooks (PySpark):
→ Build star schema: FactSales, DimCustomer, DimProduct, DimDate
→ Aggregations: DailySalesSummary, MonthlyRegionalSales
→ Apply OPTIMIZE + ZORDER for query performance
→ V-ORDER enabled automatically in Fabric
SEMANTIC MODEL (DirectLake):
→ Connect to Gold Lakehouse via DirectLake mode
→ Define: measures (Total Sales, YoY Growth, Customer LTV)
→ RLS: regional managers see only their region's data
→ Calculation groups: Current / Prior Period / YTD / Rolling 12M
POWER BI REPORTS:
→ Executive dashboard (summary KPIs)
→ Regional sales report (filtered by RLS)
→ Real-time clickstream dashboard (DirectQuery to KQL Database)
Scenario: A DirectLake semantic model is running slowly. How do you diagnose and fix it?
-
Check fallback to DirectQuery: Open Performance Analyzer in Power BI Desktop. If queries show "Direct Query" instead of "Storage Engine", DirectLake has fallen back. Common causes: complex DAX, non-Delta tables, unsupported features.
-
Check V-ORDER: Fabric applies V-ORDER on Delta writes automatically. Verify the table was written using Spark in Fabric (not copied from external Delta without V-ORDER). Re-write the table via a Fabric notebook to apply V-ORDER.
-
Run OPTIMIZE: small Delta files slow DirectLake framing.
# In a Fabric notebook: spark.sql("OPTIMIZE GoldLakehouse.FactSales ZORDER BY (DateKey, ProductKey)") -
Reduce table width: DirectLake reads all columns in a table into memory. Remove unused columns from the Lakehouse table. Don't include columns that aren't needed in the semantic model.
-
Check capacity: if the Fabric capacity is undersized (F2 or F4), increase to F8 or F16 — DirectLake in-memory transcoding is CPU-bound.
-
Review DAX measures: complex measures force DirectLake to DirectQuery fallback. Simplify measures, use variables, pre-aggregate data in Gold layer rather than computing in DAX.
-
Frame the tables: manually trigger framing for large tables: Semantic model → Settings → Scheduled refresh → Refresh now (includes framing).
Scenario: How do you implement a near real-time analytics solution for IoT sensor data?
Architecture:
IoT Devices
↓
[Azure IoT Hub]
↓
[Fabric Eventstream]
↓ (fan-out to multiple destinations simultaneously)
├── [KQL Database] ← real-time querying (last 30 days hot data)
│ ↓
│ [Activator Reflex] ← alert when temperature > 85°C
│ ↓
│ Teams alert to Operations team
│
└── [Lakehouse (Bronze)] ← all raw events for historical analysis
↓ (hourly Notebook job)
[Lakehouse (Silver)] ← cleansed, validated sensor readings
↓ (daily Notebook job)
[Lakehouse (Gold)] ← aggregated: hourly avg/max per device/location
↓
[Semantic Model (DirectLake)] ← Power BI operational dashboard
↓
[Power BI Real-Time Dashboard] ← auto-refresh every 5 minutes
KQL real-time query:
SensorReadings
| where Timestamp > ago(1h)
| summarize
AvgTemp = avg(Temperature),
MaxTemp = max(Temperature),
AlertCount = countif(Temperature > 85)
by bin(Timestamp, 5m), DeviceId, Location
| order by Timestamp desc
Activator trigger:
Object: DeviceId
Monitor: MaxTemp every 1 minute from Eventstream
Trigger: when MaxTemp > 85 for 3 consecutive readings
Action: Send Teams message to #ops-alerts channel
Scenario: Migrate a traditional Power BI Premium dataset to a Fabric semantic model with DirectLake.
Current state:
→ Power BI Premium workspace with large Import dataset
→ 50GB semantic model, 3-hour daily refresh window
→ 8 reports connected, 200 daily users
→ Data source: Azure Synapse Analytics SQL Pool
Target state:
→ Fabric workspace with DirectLake semantic model
→ No scheduled refresh needed — always current
→ Same reports and users, same RLS rules
Migration steps:
Step 1 — Set up Fabric workspace:
→ Create Fabric workspace on F64 capacity (equivalent to P1)
→ Configure Git integration with Azure DevOps
Step 2 — Create Fabric Lakehouse:
→ New Lakehouse "SalesLakehouse" in the workspace
→ Create Shortcut to the Synapse SQL Pool data
OR
→ Create Data Pipeline: Synapse SQL → Lakehouse Delta tables (initial load)
→ Incremental pipeline: daily CDC from Synapse → Lakehouse (MERGE)
Step 3 — Optimise Lakehouse tables:
→ Notebook: OPTIMIZE + ZORDER on all large fact tables
→ Verify V-ORDER enabled (default in Fabric)
→ Apply partitioning on date columns for FactSales
Step 4 — Create DirectLake semantic model:
→ Fabric workspace → New semantic model
→ Select Lakehouse tables
→ Import existing DAX measures from old .pbix file
→ Re-create RLS roles (DAX filters identical)
→ Re-create calculation groups in Tabular Editor
Step 5 — Reconnect reports:
→ Each report: Change data source → new Fabric semantic model
→ Verify all visuals render correctly with DirectLake
Step 6 — Validate and switch:
→ Run Query Performance Analyzer — confirm DirectLake (not DirectQuery)
→ Compare report output: DirectLake vs old Import model (data parity check)
→ Update app/workspace access for 200 users
→ Decommission old Power BI Premium workspace
Benefits achieved:
→ Eliminated 3-hour refresh window — data always fresh
→ Reduced model size (no data import — reads Delta files directly)
→ Reduced costs: one Fabric capacity covers all workloads
7. Cheat Sheet — Quick Reference
DP-600 Exam Domain Quick Reference
Domain Weight Key Topics
Plan, implement & manage (10-15%) Low Workspaces, Git, Deployment
Pipelines, Security, Purview
Prepare and serve data (40-45%) HIGHEST Lakehouse, Warehouse, Delta,
Pipelines, Dataflows, Mirroring,
Medallion architecture
Implement & manage semantic models (20-25%) High DAX, DirectLake, RLS, OLS,
Calculation groups, Relationships
Explore and analyze data (20-25%) High SQL, KQL, Spark notebooks,
Lineage, Impact analysis, Activator
Fabric Item Selection Guide
Need to... → Use
Ingest streaming data in real time → Eventstream
Query time-series/log data with KQL → KQL Database
Big data transformation with Spark/Python → Lakehouse + Notebooks
SQL-based ETL with full DML → Data Warehouse
Low-code ETL from 150+ sources → Dataflow Gen2
Orchestrate multi-step data workflows → Data Pipeline
No-code alerting when data conditions met → Activator
Replicate operational DB data to Fabric → Mirroring
Create business metrics and KPIs → Semantic Model
Self-service BI for business users → Power BI Reports
Always-fresh Power BI (no scheduled refresh) → DirectLake semantic model
Delta Lake Quick Reference
Write modes:
overwrite → replace all data
append → add new rows (no deduplication)
merge/upsert → update existing + insert new (no duplicates)
Read historical data (time travel):
df = spark.read.format("delta").option("versionAsOf", 5).load(...)
df = spark.read.format("delta").option("timestampAsOf", "2025-01-01").load(...)
Maintenance:
OPTIMIZE tableName → compact small files
OPTIMIZE tableName ZORDER BY (col) → sort for query pruning
VACUUM tableName RETAIN 168 HOURS → clean old files (keep 7-day history)
V-ORDER: Fabric automatically applies on write → faster DirectLake reads
DAX Quick Reference
Key patterns:
Total Sales = SUM(FactSales[SalesAmount])
Sales LY = CALCULATE([Total Sales], SAMEPERIODLASTYEAR(Date[Date]))
YoY % = DIVIDE([Total Sales] - [Sales LY], [Sales LY])
YTD = TOTALYTD([Total Sales], Date[Date])
Running Total = CALCULATE([Total Sales], DATESYTD(Date[Date]))
Market Share = DIVIDE([Total Sales], CALCULATE([Total Sales], ALL(Product)))
Dynamic = SELECTEDMEASURE() ← inside calculation group item
Variables (always use):
Measure =
VAR x = [Base Measure]
VAR y = CALCULATE([Base Measure], PREVIOUSMONTH(Date[Date]))
RETURN DIVIDE(x - y, y)
Measures vs Calculated Columns:
Calculated columns: row-level, at refresh, stored → use sparingly
Measures: query-time, dynamic, not stored → always prefer
Security Layers Summary
Layer Scope Tool
Workspace roles All items in workspace Entra ID groups → Admin/Member/
Contributor/Viewer
Item-level sharing Specific item Share individual item
OneLake data access Specific folder/table OneLake data access roles (preview)
Semantic model RLS Row-level data filter DAX roles in semantic model
Semantic model OLS Table/column hiding Tabular Editor roles
Warehouse column sec. Column-level access T-SQL GRANT/DENY
Sensitivity labels Item classification Microsoft Purview labels
Medallion Architecture Summary
Layer Storage Purpose Quality
Bronze Files/Delta Raw as-received Untransformed
Silver Delta tables Cleaned, validated Deduplicated, typed
Gold Delta tables Business-ready Aggregated, star schema
Compute per layer:
Bronze: Pipelines (Copy Data) + Eventstream
Silver: Notebooks (PySpark MERGE) + Dataflows Gen2
Gold: Notebooks (aggregations, star schema) + Pipelines
Performance optimisation:
→ OPTIMIZE + ZORDER on Gold tables (query pruning)
→ V-ORDER on all writes (DirectLake speed)
→ Partition large tables by date column
→ VACUUM regularly (remove old Delta files)
Top 10 Tips
-
Prepare and serve data is 40–45% of the exam — the largest domain by far. Prioritise: medallion architecture, Delta Lake operations (OPTIMIZE, VACUUM, MERGE, time travel), Lakehouse vs Warehouse decisions, and all ingestion patterns (Pipelines, Dataflows, Notebooks, Mirroring, Eventstream).
-
DirectLake is Fabric's biggest innovation — it combines Import speed with DirectQuery freshness by reading Delta Parquet files directly. Know: when it falls back to DirectQuery, how V-ORDER helps, how to optimise tables for DirectLake, and how framing works.
-
OneLake unifies everything — all Fabric items (Lakehouse, Warehouse, KQL DB) store data in OneLake as Delta Parquet. The same physical files are accessible from all Fabric experiences. Shortcuts let you reference external data without copying it.
-
Measures over calculated columns always — calculated columns are stored in the model and computed at refresh time. Measures are computed at query time and are dynamic. Using calculated columns where measures should be used is a common design mistake.
-
DAX variables improve both readability and performance — always use
VARto avoid evaluating the same expression multiple times. Variables are evaluated once in the current filter context — referencing them multiple times doesn't re-evaluate. -
Calculation groups solve measure proliferation — instead of creating YTD/YoY/Prior Period versions of every measure, one calculation group applies those calculations to any measure dynamically. This is a senior analytics engineer pattern.
-
RLS is enforced in the service, not the model — users cannot bypass RLS through Power BI Desktop or external tools once published. Dynamic RLS using
USERPRINCIPALNAME()is the enterprise pattern — one role definition scales to all users. -
Impact analysis before modifying anything — always run impact analysis before renaming columns, changing table schemas, or deleting items. Unplanned changes cascade to semantic models and reports, breaking downstream consumers.
-
Git integration + Deployment Pipelines together — Git provides source control and code review (PRs). Deployment Pipelines manage Dev → Staging → Production promotion. Using both together is the enterprise Fabric governance best practice.
-
Know when to use Lakehouse vs Warehouse — Lakehouse (Spark + flexible schema) for big data, ML, and raw/intermediate layers. Warehouse (T-SQL + full DML) for structured BI, stored procedures, and when your team is SQL-native. The Gold layer can use either — choose based on team skills and query patterns.
No comments:
Post a Comment