Sunday, May 17, 2026

Microsoft Fabric Analytics Engineer (DP-600) Complete Guide

Microsoft Fabric Analytics Engineer (DP-600) — Complete Guide

OneLake · Lakehouse · Data Warehouse · Semantic Models · DAX · Dataflows · Pipelines · DirectLake · Real-Time Analytics · Scenarios · Cheat Sheet

Top Hashtags: #MicrosoftFabric, #DP600, #FabricAnalytics, #OneLake, #DataEngineering, #SemanticModels, #DAX, #PowerBI, #DataAnalytics, #MicrosoftCertified


Table of Contents

  1. Exam Overview & Microsoft Fabric Platform
  2. Plan, Implement & Manage Analytics Solutions (10–15%)
  3. Prepare and Serve Data (40–45%)
  4. Implement and Manage Semantic Models (20–25%)
  5. Explore and Analyze Data (20–25%)
  6. Scenario-Based Questions
  7. Cheat Sheet — Quick Reference

1. Exam Overview & Microsoft Fabric Platform

DP-600 Exam at a Glance

The DP-600 Microsoft Fabric Analytics Engineer exam validates expertise in designing, creating, and managing enterprise-scale analytics solutions using Microsoft Fabric. Candidates should have subject matter expertise in designing, creating, and managing analytical assets such as semantic models, warehouses, or lakehouses — and be able to query and analyze data using SQL, KQL, and DAX.

Skill Domain Exam Weight
Plan, implement, and manage a solution for data analytics 10–15%
Prepare and serve data 40–45%
Implement and manage semantic models 20–25%
Explore and analyze data 20–25%

Key fact: Candidates should have expertise in designing, creating, and deploying enterprise-scale data analytics solutions using Microsoft Fabric components like lakehouses, data warehouses, notebooks, dataflows, data pipelines, semantic models, and reports.


What is Microsoft Fabric and how does it unify the analytics platform?

Microsoft Fabric is a unified, end-to-end analytics platform that brings together data engineering, data integration, data warehousing, data science, real-time analytics, and business intelligence in a single SaaS offering.

Microsoft Fabric workloads (all in one platform):
┌─────────────────────────────────────────────────────────┐
│                    Microsoft Fabric                     │
├──────────┬──────────┬──────────┬──────────┬─────────────┤
│  Data    │  Data    │   Data   │   Data   │  Real-Time  │
│Engineering│Warehouse│ Science  │  Factory │  Analytics  │
│(Lakehouse│(SQL      │(Notebooks│(Pipelines│(Eventstream │
│ Spark)   │ Analytics│ ML)      │ Dataflows│  KQL DB)    │
├──────────┴──────────┴──────────┴──────────┴─────────────┤
│              Power BI (Business Intelligence)           │
├─────────────────────────────────────────────────────────┤
│                    OneLake (Storage)                    │
└─────────────────────────────────────────────────────────┘

Key unification points:
→ OneLake: single storage layer — all workloads read/write the same data
→ One security model: workspace-level and item-level permissions
→ One governance: Microsoft Purview integration for data lineage and cataloguing
→ One billing: Fabric capacity (F-SKUs) covers all workloads
→ No data silos: Lakehouse tables appear in Warehouse, Power BI, Notebooks

What is OneLake and why is it the foundation of Fabric?

OneLake:
→ Fabric's single, unified data lake — built on Azure Data Lake Storage Gen2
→ One lake per Fabric tenant (like OneDrive for data)
→ All Fabric items (Lakehouses, Warehouses, KQL Databases) store
  their data in OneLake automatically
→ Delta Parquet format: all table data stored as Delta Lake tables
→ No data duplication: Warehouse tables are Delta tables in OneLake;
  same files accessible from Lakehouse SQL endpoint, Notebooks, Power BI

OneLake hierarchy:
Tenant → Workspaces → Items (Lakehouses, Warehouses, KQL DBs)
         (like folders)   (each item = a folder in OneLake)

Shortcuts:
→ Virtual links to data stored OUTSIDE OneLake (or in other Fabric items)
→ No data movement — shortcut references the data in place
→ Sources: Azure Data Lake Storage Gen2, Amazon S3, Google Cloud Storage,
            another Fabric workspace (cross-workspace shortcut)
→ Use case: query S3 data from a Fabric Lakehouse without copying it

OneLake file explorer:
→ Windows app that mounts OneLake as a network drive (like OneDrive sync)
→ Browse and manage OneLake files directly from Windows Explorer

What are the Fabric experience roles and when do you use each?

Lakehouse (Data Engineering experience):
→ Combines a data lake (unstructured files) + a data warehouse (Delta tables)
→ Two endpoints automatically created:
  SQL analytics endpoint: T-SQL read-only access to Delta tables
  Semantic model: default Power BI dataset over all Delta tables
→ Storage: Files section (raw files) + Tables section (Delta Lake tables)
→ Compute: Apache Spark notebooks or Spark pipelines
→ Use for: raw data ingestion, Spark transformations, medallion architecture

Data Warehouse (Data Warehouse experience):
→ Enterprise SQL data warehouse with full T-SQL read/write support
→ Columnar storage, MPP (Massively Parallel Processing)
→ Supports: INSERT, UPDATE, DELETE, CREATE TABLE, stored procedures
→ Storage: Delta tables in OneLake (same as Lakehouse!)
→ Cross-database queries: query Lakehouse SQL endpoint from Warehouse
→ Use for: structured analytics, BI reporting, complex SQL workloads

KQL Database (Real-Time Analytics experience):
→ Optimised for time-series, log, and telemetry data
→ Query with Kusto Query Language (KQL)
→ Ingestion from: Eventstream, Azure Event Hubs, IoT Hub, REST API
→ Use for: IoT telemetry, application logs, clickstream, security events

Notebooks (Data Science / Data Engineering):
→ Interactive Jupyter notebooks with Spark (Python, Scala, R, SQL)
→ Connect to Lakehouse as default storage
→ Libraries: pandas, scikit-learn, PyTorch, MLflow, SynapseML
→ Use for: data exploration, ML model training, complex transformations

Dataflows Gen2 (Data Factory experience):
→ Power Query-based ETL — low-code data transformation
→ 150+ connectors (SQL Server, Salesforce, SharePoint, REST APIs)
→ Output: write transformed data to Lakehouse, Warehouse, or other destinations
→ Use for: self-service ETL, non-technical data preparation

Data Pipelines (Data Factory experience):
→ Orchestration tool for complex multi-step data workflows
→ Activities: Copy Data, Notebook, Stored Procedure, Dataflow, Web, Delete
→ Scheduling, monitoring, error handling
→ Use for: orchestrating end-to-end data movement and transformation

2. Plan, Implement & Manage Analytics Solutions (10–15%)

How do you design a Fabric workspace strategy?

Workspace = collaboration unit in Fabric (like a project or team space)

Workspace strategy patterns:
1. Environment-based (recommended for enterprise):
   Prod-Sales-WS    → production reports and models
   Dev-Sales-WS     → development and testing
   Staging-Sales-WS → pre-production validation

2. Domain-based:
   Finance-WS   → all Finance items (Lakehouse, Warehouse, Reports)
   HR-WS        → all HR items
   Marketing-WS → all Marketing items

3. Hybrid: domain + environment
   Finance-Dev-WS / Finance-Prod-WS

Workspace capacity:
→ Each workspace assigned to a Fabric capacity (F2, F4, F8, F16 ... F2048)
→ Capacity determines: compute power, storage throughput, concurrency
→ Premium capacity (P-SKUs) or Fabric capacity (F-SKUs) required for
  most Fabric features (Lakehouse, Warehouse, etc.)
→ Free tier: limited features (Power BI reports only)

F-SKU vs P-SKU:
F-SKUs (Fabric capacities):    pay-as-you-go or reserved
P-SKUs (Power BI Premium):     legacy, being phased out in favour of F-SKUs
F64 ≈ P1 in compute equivalence

Workspace roles:
Admin:       full control — manage members, settings, delete workspace
Member:      create/edit/delete all items, manage permissions on items
Contributor: create/edit items they own, cannot delete others' items
Viewer:      read-only access — cannot create or edit items

What is Git integration in Microsoft Fabric and why does it matter?

Fabric Git integration:
→ Connect a Fabric workspace to an Azure DevOps Git repository (or GitHub)
→ Fabric items serialised to YAML/JSON definition files in the repo
→ Enables: version control, code review (PRs), branching strategies,
           rollback, CI/CD deployment pipelines

Supported items for Git sync:
→ Semantic models (Power BI datasets)
→ Reports
→ Notebooks
→ Data pipelines
→ Dataflows Gen2
→ Lakehouses (metadata, not data)
→ Warehouses (metadata, not data)

Git workflow:
Developer → edits item in Fabric workspace
         → commits changes to feature branch in Git
         → raises Pull Request for review
         → approved PR merges to main branch
         → CI/CD pipeline deploys to production workspace

Deployment Pipelines (Fabric native CI/CD):
→ Move Fabric items between Dev → Test → Production workspaces
→ Deployment rules: map workspace-specific settings
  (e.g., different data source connection strings per environment)
→ Compare stages: see what changed between Dev and Production
→ Supported with both manual promotion and automated triggers

Git + Deployment Pipelines together:
Git = source control (version history, branching, review)
Deployment Pipelines = promotion workflow (Dev → Staging → Prod)
Best practice: use both together for enterprise Fabric governance

How do you implement security in Microsoft Fabric?

Fabric security layers:

1. Workspace-level security (Entra ID groups):
   Assign Entra ID security groups to workspace roles
   All items in workspace inherit the workspace permissions

2. Item-level permissions (share individual items):
   Share a specific report or semantic model without granting
   workspace access → viewer can only see that one item

3. OneLake data access roles (preview):
   Grant read access to specific folders/tables in a Lakehouse
   without exposing the entire Lakehouse

4. Semantic model Row-Level Security (RLS):
   Define roles in the semantic model with DAX filter rules
   Users see only the data their role permits
   e.g., UK managers see only UK rows; US managers see only US rows

5. Object-level security (OLS):
   Restrict access to specific tables or columns in a semantic model
   Hidden tables/columns don't appear in the model for restricted users

6. Warehouse column-level security:
   T-SQL GRANT/DENY on specific columns
   Restrict PII columns from certain users

7. Dynamic data masking:
   Mask sensitive column values (e.g., show **** instead of SSN)
   without changing the underlying data

Microsoft Purview integration:
→ Fabric data assets automatically catalogued in Purview
→ Sensitivity labels flow from Purview to Fabric items
→ Data lineage: track data flow from source through transformations to report
→ Scan Fabric workspaces for sensitive information (PII, financial data)

3. Prepare and Serve Data (40–45%)

What is the Medallion Architecture in Fabric and how do you implement it?

Medallion Architecture (Bronze → Silver → Gold):
→ Layered data organisation pattern for lakehouses
→ Each layer increases data quality and refinement

Bronze (Raw) layer:
→ Land raw data exactly as received — no transformation
→ Append-only: never delete or modify
→ Formats: CSV, JSON, Parquet, Delta, images, whatever the source sends
→ Serves as the "source of truth" — can reprocess from here if needed
→ Implementation: Files section of Lakehouse or Bronze Lakehouse

Silver (Cleansed/Conformed) layer:
→ Validated, deduplicated, standardised data
→ Apply business rules: correct data types, handle nulls, join related data
→ Delta Lake tables: schema enforced, history preserved
→ Implementation: Silver Lakehouse (separate workspace item)

Gold (Business/Serving) layer:
→ Aggregated, business-ready data modelled for analytics
→ Star schema: fact and dimension tables
→ Optimised for query performance (ZORDER, vacuum, partitioning)
→ Powers semantic models, reports, dashboards
→ Implementation: Gold Lakehouse or Fabric Data Warehouse

Fabric implementation:
Bronze Lakehouse → Notebook/Pipeline → Silver Lakehouse
                                     → Notebook/Pipeline → Gold Lakehouse/Warehouse
                                                         → Semantic Model → Power BI

Shortcuts between layers:
→ Silver Lakehouse shortcuts Bronze files — no data copy
→ Gold Warehouse shortcuts Silver Lakehouse tables
→ Same physical data, different logical views

What are the data ingestion options in Microsoft Fabric?

1. Data Pipelines (Copy Data activity):
→ Orchestrated, scheduled data movement
→ 100+ connectors: SQL Server, Oracle, Salesforce, REST APIs, SAP, etc.
→ Supports: full load, incremental load (watermark or Change Data Capture)
→ Configure in: Fabric workspace → New item → Data Pipeline
→ Best for: production ETL, scheduled batch ingestion, complex orchestration

2. Dataflows Gen2 (Power Query):
→ Low-code ETL with 150+ connectors
→ Transformations: filter, merge, pivot, custom columns, M language
→ Output destinations: Lakehouse, Warehouse, Azure SQL, etc.
→ Best for: self-service ETL, non-technical data prep, connecting to
  data sources that pipelines don't natively support well

3. Notebooks (Spark):
→ Python, PySpark, Scala, R, Spark SQL
→ Full programmatic control — complex transformations, ML prep
→ Write directly to Lakehouse Delta tables
→ Best for: complex transformations, ML feature engineering, custom logic

4. Eventstream (Real-Time Analytics):
→ Ingest streaming data: Azure Event Hubs, IoT Hub, Kafka, custom endpoints
→ Route, transform, and fan-out streams to: KQL Database, Lakehouse, Warehouse
→ Best for: IoT telemetry, application events, real-time data feeds

5. Mirroring:
→ Continuously replicate data FROM external databases INTO OneLake
→ Sources: Azure SQL Database, Azure Cosmos DB, Snowflake, Azure Databricks
→ Near real-time replication — no pipeline needed
→ Data appears as Delta tables in Fabric — queryable immediately
→ Best for: zero-ETL analytics on operational data

6. COPY INTO (Warehouse T-SQL):
→ Bulk load Parquet or CSV files from OneLake into Warehouse tables
COPY INTO dbo.Sales
FROM 'https://onelake.dfs.fabric.microsoft.com/.../Files/sales.parquet'
WITH (FILE_TYPE = 'PARQUET');

How do Delta Lake tables work in Fabric Lakehouses?

Delta Lake:
→ Open-source storage layer that adds ACID transactions to data lakes
→ All Lakehouse Tables are Delta Lake tables in Fabric
→ File format: Parquet data files + _delta_log (transaction log as JSON)

Delta Lake key features:
ACID transactions:   reads and writes are atomic — no partial data
Schema enforcement:  reject data that doesn't match table schema
Schema evolution:    add/rename columns without rewriting data
Time travel:         query historical versions of the table
  SELECT * FROM lakehouse.Sales VERSION AS OF 5  -- version 5
  SELECT * FROM lakehouse.Sales TIMESTAMP AS OF '2025-01-01'
Upserts (MERGE):     update + insert in one operation

Creating and writing Delta tables in Notebooks:
# PySpark — write DataFrame as Delta table:
df.write.format("delta").mode("overwrite").saveAsTable("Sales")

# Append mode:
df.write.format("delta").mode("append").saveAsTable("Sales")

# Upsert (MERGE):
from delta.tables import DeltaTable
deltaTable = DeltaTable.forName(spark, "Sales")
deltaTable.alias("target").merge(
  updates_df.alias("source"),
  "target.SaleId = source.SaleId"
).whenMatchedUpdateAll().whenNotMatchedInsertAll().execute()

Delta maintenance:
OPTIMIZE: compacts small files into larger Parquet files (better query performance)
  OPTIMIZE lakehouse.Sales
  OPTIMIZE lakehouse.Sales ZORDER BY (Region, SaleDate)  -- sort for better pruning

VACUUM: removes old files no longer needed (frees storage)
  VACUUM lakehouse.Sales RETAIN 168 HOURS  -- keep 7 days of history

V-ORDER: Fabric-specific write-time optimisation for faster reads
  Automatically applied when writing Delta tables in Fabric
  Uses sorting, row groups, and compression for Power BI performance

What is the Lakehouse SQL Analytics Endpoint?

SQL Analytics Endpoint:
→ Auto-generated, read-only T-SQL interface over all Delta tables
  in a Lakehouse — no configuration needed
→ Every Lakehouse gets one automatically
→ Accessible via: SQL Server Management Studio (SSMS), Azure Data Studio,
                  any JDBC/ODBC tool using the connection string

What you can do:
→ Query Delta tables with T-SQL (SELECT, JOINs, aggregations)
→ Create SQL views, stored procedures, functions
→ Share with business users for SQL-based reporting
→ Connect from Power BI Desktop (DirectQuery or Import)

What you cannot do:
→ INSERT, UPDATE, DELETE on Lakehouse tables from SQL endpoint
  (data modifications only via Spark notebooks or pipelines)
→ CREATE TABLE with new data (read-only for tables created by Spark)

Cross-item queries (Fabric virtual network of items):
-- Query Lakehouse table from Warehouse:
SELECT s.*, c.CustomerName
FROM [Sales_Lakehouse].[dbo].[FactSales] s
JOIN [DimCustomers_Warehouse].[dbo].[Customer] c
  ON s.CustomerId = c.CustomerId

-- Cross-workspace query:
SELECT * FROM [OtherWorkspace_Lakehouse].[dbo].[FactOrders]
-- Requires appropriate permissions on the other workspace

What is the Fabric Data Warehouse and how does it differ from Lakehouse?

Feature              Lakehouse                    Data Warehouse
Primary compute:     Apache Spark (notebooks)    T-SQL (SQL engine)
DML support:         Spark only (not SQL EP)      Full T-SQL (INSERT/UPDATE/DELETE)
Schema:              Schema-on-read (flexible)   Schema-on-write (enforced)
Best for:            Big data, ML, raw data       BI reporting, structured analytics
Tables format:       Delta Lake in OneLake         Delta Lake in OneLake (same!)
SQL endpoint:        Read-only (SQL analytics EP) Full read/write T-SQL
Stored procedures:   No                           Yes
Cross-DB queries:    Via shortcuts                Via cross-database T-SQL
Streaming ingest:    Via Spark notebooks          Via pipelines or COPY INTO
Star schema:         Can implement                Recommended approach

Fabric Warehouse T-SQL capabilities:
→ CREATE TABLE, CREATE VIEW, CREATE PROCEDURE, CREATE FUNCTION
→ INSERT, UPDATE, DELETE, MERGE (full DML)
→ Distributed transactions across warehouse tables
→ Column-level security and row-level security
→ Automatic statistics maintenance for query optimisation
→ Cross-database queries: query Lakehouse SQL endpoint tables
→ COPY INTO: bulk load from OneLake files

When to use Warehouse over Lakehouse:
→ Traditional BI/DWH team comfortable with T-SQL
→ Need stored procedures and full DML for complex ETL
→ Strict schema enforcement required
→ Multiple data consumers using BI/reporting tools via SQL
→ Existing SQL Server/Synapse Analytics migration

4. Implement and Manage Semantic Models (20–25%)

What is a Fabric semantic model (Power BI dataset)?

Semantic model:
→ The business logic layer on top of data — defines metrics, relationships,
  hierarchies, and calculations for business users and reports
→ All Power BI reports connect to a semantic model
→ Every Lakehouse and Warehouse auto-creates a default semantic model
→ Custom semantic models can be created manually in Power BI Desktop
  and published to Fabric workspace

Semantic model components:
Tables:        imported or DirectQuery/DirectLake views of data
Relationships: define how tables join (one-to-many, many-to-many)
Measures:      DAX calculations (Total Sales, YoY Growth, Profit Margin)
Calculated columns: row-level computed values in a table
Hierarchies:   drill paths (Year → Quarter → Month → Day)
Perspectives:  subset views of the model for different audiences
Row-Level Security (RLS): data access filters per user/role

Connectivity modes:
Import:      data copied INTO the semantic model (fast, but has refresh lag)
DirectQuery: queries go to source on each visual interaction (live, slower)
DirectLake:  Fabric-only: reads Delta files directly — combines speed
             of Import with freshness of DirectQuery
Composite:   mix of Import and DirectQuery tables in one model

What is DirectLake mode and why is it a game-changer?

DirectLake (Fabric-exclusive):
→ Power BI reads Delta Parquet files from OneLake directly
→ No data import, no DirectQuery round-trips to a database
→ Transcoding: Delta Parquet → in-memory columnar format on first access
→ Parquet data stays hot in memory — subsequent queries serve from memory

Performance comparison:
Import:       Fast (in-memory), but data is stale until next refresh
DirectQuery:  Always fresh, but every visual = live SQL query (slower)
DirectLake:   Fast AS Import (in-memory after first load) AND fresh
              AS DirectQuery (always reads latest Delta files)

How DirectLake works:
1. Delta table updated (by pipeline, notebook, or streaming)
2. Power BI report loads — no import/refresh needed
3. On query: Fabric engine reads Delta files, transcodes to memory
4. Subsequent queries: served from in-memory transcoded data
5. When Delta table updated again: cache invalidated, re-read on next query

DirectLake limitations (when it falls back to DirectQuery):
→ Complex DAX not optimisable in DirectLake engine
→ Non-Delta data sources
→ Exceeds row/column limits for the capacity tier

DirectLake framing (parquet file optimisation):
→ Framing: Fabric creates an optimised version of Delta table for DL
→ Run manually: Semantic model → Refresh → Frame table
→ V-ORDER on write automatically helps DirectLake performance
→ Avoid excessively wide tables (many columns) — increases frame size

What are the key DAX patterns for DP-600?

Measures vs Calculated Columns:
Calculated column:  evaluated row by row at data refresh time
                    stored in the model, increases model size
                    use for: row-level values used in slicers/filters
Measure:            evaluated at query time based on filter context
                    not stored, always calculated on the fly
                    use for: aggregations, KPIs, dynamic calculations
                    ALWAYS prefer measures over calculated columns

Common DAX functions tested:

CALCULATE — modify filter context:
Total Sales EU = CALCULATE([Total Sales], Region[Region] = "EU")
Sales LY = CALCULATE([Total Sales], SAMEPERIODLASTYEAR(Date[Date]))

FILTER — row-by-row table filter:
High Value Sales = CALCULATE([Total Sales],
  FILTER(Sales, Sales[Amount] > 10000))

SUMX / AVERAGEX / MAXX (iterator functions):
Weighted Avg Price = DIVIDE(
  SUMX(Sales, Sales[Quantity] * Sales[UnitPrice]),
  SUM(Sales[Quantity]))

Time intelligence:
YTD Sales = TOTALYTD([Total Sales], Date[Date])
MTD Sales = TOTALMTD([Total Sales], Date[Date])
QTD Sales = TOTALQTD([Total Sales], Date[Date])
YoY Growth = DIVIDE([Total Sales] - [Sales LY], [Sales LY])
Rolling 12M = CALCULATE([Total Sales],
  DATESINPERIOD(Date[Date], LASTDATE(Date[Date]), -12, MONTH))

ALLSELECTED — include slicer filters but remove model filters:
Market Share = DIVIDE([Total Sales],
  CALCULATE([Total Sales], ALLSELECTED(Product[Category])))

Window functions (DAX 2023+):
Rank = RANKX(ALL(Product[Product]), [Total Sales],, DESC)
Running Total = CALCULATE([Total Sales],
  WINDOW(1, ABS, 0, REL, ORDERBY(Date[Date])))

Variables (always use for readability and performance):
YoY % =
VAR CurrentSales = [Total Sales]
VAR PriorYearSales = CALCULATE([Total Sales], SAMEPERIODLASTYEAR(Date[Date]))
VAR YoYGrowth = DIVIDE(CurrentSales - PriorYearSales, PriorYearSales)
RETURN
  IF(ISBLANK(PriorYearSales), BLANK(), YoYGrowth)

What are calculation groups and when do you use them?

Calculation groups:
→ Reusable sets of DAX calculations applied to any measure in the model
→ Solve the "many measures × many time periods" explosion
→ Defined in Tabular Editor (external tool — not native in Power BI Desktop)

Without calculation groups:
Sales Current Month, Sales Prior Month, Sales YoY, Sales YTD
Profit Current Month, Profit Prior Month, Profit YoY, Profit YTD
Revenue Current Month, Revenue Prior Month, Revenue YoY, Revenue YTD
→ 12 measures! For 10 base measures: 40 measures to maintain

With calculation groups:
Time Intelligence calculation group with items:
  Current:      SELECTEDMEASURE()  -- passes through whatever measure is selected
  Prior Month:  CALCULATE(SELECTEDMEASURE(), PREVIOUSMONTH(Date[Date]))
  YoY:          CALCULATE(SELECTEDMEASURE(), SAMEPERIODLASTYEAR(Date[Date]))
  YTD:          TOTALYTD(SELECTEDMEASURE(), Date[Date])

→ 4 calculation items × any number of base measures = instant time intelligence
→ User selects measure (Sales) + time item (YoY) → sees Sales YoY automatically

Other calculation group use cases:
→ Currency conversion: apply FX rates to any financial measure
→ Format strings: dynamic number formatting per calculation item
→ Scenario comparison: Actual vs Budget vs Forecast for any metric

Precedence:
Multiple calculation groups applied to one measure:
  Time Intelligence (precedence 10) applied first
  Currency (precedence 5) applied second
  Lower precedence number = applied later (outer context)

What is Row-Level Security (RLS) in semantic models?

Row-Level Security (RLS):
→ Restricts which data rows a user can see in reports
→ Defined as roles with DAX filter expressions in the semantic model
→ Works in Power BI service — enforced by the service (users cannot bypass)

Static RLS:
Create role "UK Sales" with table filter:
  Geography[Country] = "UK"
→ Users assigned to "UK Sales" role see ONLY UK rows

Dynamic RLS (most common for enterprise):
Create role "Employee Data Filter" with table filter:
  SalesPerson[Email] = USERPRINCIPALNAME()
→ User alice@contoso.com sees only rows where SalesPerson email = alice@contoso.com
→ One role definition, automatically filters per signed-in user

RLS implementation steps:
1. Power BI Desktop: Modelling → Manage Roles → New Role → add DAX filter
2. Publish to Fabric workspace
3. Semantic model → Security → assign Entra ID users/groups to each role
4. Test: Semantic model → Security → "Test as role" to verify filter

Object-Level Security (OLS):
→ Hide entire tables or columns from specific roles
→ Hidden objects don't appear in the model at all for restricted users
→ Defined in Tabular Editor (not Power BI Desktop)

RLS vs OLS:
RLS: controls WHICH ROWS a user sees
OLS: controls WHICH TABLES/COLUMNS a user sees
Use both together for comprehensive data security

5. Explore and Analyze Data (20–25%)

How do you query data in Fabric using SQL, KQL, and Notebooks?

T-SQL in Fabric Warehouse or Lakehouse SQL endpoint:
-- Aggregation:
SELECT
  d.Year,
  d.Quarter,
  p.Category,
  SUM(f.SalesAmount) AS TotalSales,
  COUNT(DISTINCT f.CustomerId) AS UniqueCustomers
FROM FactSales f
JOIN DimDate d ON f.DateKey = d.DateKey
JOIN DimProduct p ON f.ProductKey = p.ProductKey
GROUP BY d.Year, d.Quarter, p.Category
ORDER BY d.Year, d.Quarter, TotalSales DESC;

-- Window function:
SELECT
  ProductName,
  SalesAmount,
  RANK() OVER (PARTITION BY Category ORDER BY SalesAmount DESC) AS RankInCategory,
  SUM(SalesAmount) OVER (ORDER BY SaleDate) AS RunningTotal
FROM FactSales;

-- CROSS DATABASE query (Lakehouse → Warehouse):
SELECT w.ProductKey, w.UnitPrice, l.StockLevel
FROM [SalesWarehouse].[dbo].[DimProduct] w
JOIN [InventoryLakehouse].[dbo].[StockLevels] l
  ON w.ProductKey = l.ProductKey;

KQL in KQL Database or Real-Time Analytics:
// Query telemetry data:
SensorReadings
| where Timestamp > ago(1h)
| where DeviceId startswith "factory-"
| summarize avg(Temperature), max(Temperature), count()
    by bin(Timestamp, 5m), DeviceId
| order by Timestamp desc

// Detect anomalies:
AppLogs
| where Level == "Error"
| summarize ErrorCount=count() by bin(Timestamp, 1h), AppName
| where ErrorCount > 100  // alert threshold
| render timechart

PySpark in Notebooks:
# Read Delta table:
df = spark.read.format("delta").load("Tables/FactSales")

# Transformation:
from pyspark.sql.functions import col, sum, year, month

monthly_sales = df \
  .withColumn("Year", year(col("SaleDate"))) \
  .withColumn("Month", month(col("SaleDate"))) \
  .groupBy("Year", "Month", "ProductCategory") \
  .agg(sum("SalesAmount").alias("TotalSales")) \
  .orderBy("Year", "Month")

# Write to Gold Lakehouse:
monthly_sales.write \
  .format("delta") \
  .mode("overwrite") \
  .option("overwriteSchema", "true") \
  .saveAsTable("GoldLakehouse.MonthlySalesSummary")

What is Activator (Data Activator) in Fabric?

Microsoft Fabric Activator (formerly Data Activator):
→ No-code, event-driven alerting and automation within Fabric
→ Monitors data in real time and triggers actions when conditions are met

Key concepts:
Reflex:       the container for data monitoring rules and actions
Object:       business entity being monitored (Order, Device, Customer)
Trigger:      condition that fires an action (temperature > 80°C,
              order value > £10,000, stock level < 100 units)
Action:       what happens when trigger fires (Teams alert, Power Automate
              flow, send email, call Logic App)

Data sources for Activator:
→ Power BI report visuals (monitor a KPI on a report)
→ Eventstream (real-time streaming data)
→ Fabric Real-Time Analytics KQL query results

Example use cases:
→ Alert operations team via Teams when IoT temperature exceeds threshold
→ Notify sales manager when a high-value deal is marked as Lost in CRM
→ Trigger Power Automate reorder flow when inventory drops below min level
→ Alert finance when daily spend exceeds budget threshold

Integration:
→ Activator natively integrates with Teams, Outlook, Power Automate
→ Accessible in Fabric: New item → Activator (Reflex)

How do you implement data lineage and impact analysis in Fabric?

Lineage view in Fabric:
→ Visual map showing data flow from source to report
→ Access: Workspace → Lineage view (top-right button)
→ Shows: source → Dataflow/Pipeline → Lakehouse → Semantic Model → Reports

Impact analysis:
→ Before modifying a Lakehouse table or Warehouse view, check impact
→ Right-click a Fabric item → Impact analysis
→ Shows: all downstream items that depend on the selected item
→ Critical before: renaming tables, changing column types, deleting data

Scenarios requiring impact analysis:
→ Renaming a column in a Lakehouse table:
  Impact analysis → shows which semantic models use this column
  → update semantic models and reports before renaming
→ Modifying a dataflow output:
  Impact analysis → shows which Lakehouses and semantic models
  receive output from this dataflow
→ Deleting a data source connection:
  Impact analysis → shows all pipelines and dataflows using the connection

Microsoft Purview integration:
→ Fabric workspaces automatically scanned by Purview
→ Data map: lineage tracked across Fabric, Azure Data Factory, ADLS
→ Sensitivity labels: propagate from Purview to Fabric items
→ Data catalogue: business glossary terms linked to Fabric tables
→ Access governance: classify and govern Fabric data assets in Purview

6. Scenario-Based Questions

Scenario: Design a medallion lakehouse for a retail company's sales analytics platform.

Requirements: ingest from 5 source systems, daily batch + real-time,
Power BI reports for 500 business users, 3-year data history

Architecture:
SOURCE SYSTEMS:
  SQL Server (ERP), Salesforce CRM, Azure Event Hubs (clickstream),
  SharePoint (product catalogue), REST API (weather data)

BRONZE Lakehouse — Raw Zone:
  Data Pipeline (Copy Data):
    → SQL Server → incremental CDC → Delta tables
    → Salesforce → Dataflow Gen2 → CSV → Bronze Files
  Eventstream:
    → Azure Event Hubs → Eventstream → Bronze Lakehouse Delta table
  Notebooks schedule (daily):
    → REST API weather data → JSON files → Bronze Files

SILVER Lakehouse — Cleansed Zone:
  Notebooks (PySpark, scheduled by Pipeline):
    → Deduplicate customer records (fuzzy match on email/phone)
    → Standardise: date formats, currency, country codes
    → Validate: null checks, referential integrity, range checks
    → Upsert MERGE → Silver Delta tables
  Schema: FactSales (raw), DimCustomer (cleansed), DimProduct (cleansed)

GOLD Lakehouse — Serving Zone:
  Notebooks (PySpark):
    → Build star schema: FactSales, DimCustomer, DimProduct, DimDate
    → Aggregations: DailySalesSummary, MonthlyRegionalSales
    → Apply OPTIMIZE + ZORDER for query performance
    → V-ORDER enabled automatically in Fabric

SEMANTIC MODEL (DirectLake):
  → Connect to Gold Lakehouse via DirectLake mode
  → Define: measures (Total Sales, YoY Growth, Customer LTV)
  → RLS: regional managers see only their region's data
  → Calculation groups: Current / Prior Period / YTD / Rolling 12M

POWER BI REPORTS:
  → Executive dashboard (summary KPIs)
  → Regional sales report (filtered by RLS)
  → Real-time clickstream dashboard (DirectQuery to KQL Database)

Scenario: A DirectLake semantic model is running slowly. How do you diagnose and fix it?

  1. Check fallback to DirectQuery: Open Performance Analyzer in Power BI Desktop. If queries show "Direct Query" instead of "Storage Engine", DirectLake has fallen back. Common causes: complex DAX, non-Delta tables, unsupported features.

  2. Check V-ORDER: Fabric applies V-ORDER on Delta writes automatically. Verify the table was written using Spark in Fabric (not copied from external Delta without V-ORDER). Re-write the table via a Fabric notebook to apply V-ORDER.

  3. Run OPTIMIZE: small Delta files slow DirectLake framing.

    # In a Fabric notebook:
    spark.sql("OPTIMIZE GoldLakehouse.FactSales ZORDER BY (DateKey, ProductKey)")
    
  4. Reduce table width: DirectLake reads all columns in a table into memory. Remove unused columns from the Lakehouse table. Don't include columns that aren't needed in the semantic model.

  5. Check capacity: if the Fabric capacity is undersized (F2 or F4), increase to F8 or F16 — DirectLake in-memory transcoding is CPU-bound.

  6. Review DAX measures: complex measures force DirectLake to DirectQuery fallback. Simplify measures, use variables, pre-aggregate data in Gold layer rather than computing in DAX.

  7. Frame the tables: manually trigger framing for large tables: Semantic model → Settings → Scheduled refresh → Refresh now (includes framing).


Scenario: How do you implement a near real-time analytics solution for IoT sensor data?

Architecture:
IoT Devices
  ↓
[Azure IoT Hub]
  ↓
[Fabric Eventstream]
  ↓ (fan-out to multiple destinations simultaneously)
  ├── [KQL Database] ← real-time querying (last 30 days hot data)
  │     ↓
  │   [Activator Reflex] ← alert when temperature > 85°C
  │     ↓
  │   Teams alert to Operations team
  │
  └── [Lakehouse (Bronze)] ← all raw events for historical analysis
        ↓ (hourly Notebook job)
      [Lakehouse (Silver)] ← cleansed, validated sensor readings
        ↓ (daily Notebook job)
      [Lakehouse (Gold)] ← aggregated: hourly avg/max per device/location
        ↓
      [Semantic Model (DirectLake)] ← Power BI operational dashboard
        ↓
      [Power BI Real-Time Dashboard] ← auto-refresh every 5 minutes

KQL real-time query:
SensorReadings
| where Timestamp > ago(1h)
| summarize
    AvgTemp = avg(Temperature),
    MaxTemp = max(Temperature),
    AlertCount = countif(Temperature > 85)
  by bin(Timestamp, 5m), DeviceId, Location
| order by Timestamp desc

Activator trigger:
Object: DeviceId
Monitor: MaxTemp every 1 minute from Eventstream
Trigger: when MaxTemp > 85 for 3 consecutive readings
Action: Send Teams message to #ops-alerts channel

Scenario: Migrate a traditional Power BI Premium dataset to a Fabric semantic model with DirectLake.

Current state:
→ Power BI Premium workspace with large Import dataset
→ 50GB semantic model, 3-hour daily refresh window
→ 8 reports connected, 200 daily users
→ Data source: Azure Synapse Analytics SQL Pool

Target state:
→ Fabric workspace with DirectLake semantic model
→ No scheduled refresh needed — always current
→ Same reports and users, same RLS rules

Migration steps:

Step 1 — Set up Fabric workspace:
→ Create Fabric workspace on F64 capacity (equivalent to P1)
→ Configure Git integration with Azure DevOps

Step 2 — Create Fabric Lakehouse:
→ New Lakehouse "SalesLakehouse" in the workspace
→ Create Shortcut to the Synapse SQL Pool data
  OR
→ Create Data Pipeline: Synapse SQL → Lakehouse Delta tables (initial load)
→ Incremental pipeline: daily CDC from Synapse → Lakehouse (MERGE)

Step 3 — Optimise Lakehouse tables:
→ Notebook: OPTIMIZE + ZORDER on all large fact tables
→ Verify V-ORDER enabled (default in Fabric)
→ Apply partitioning on date columns for FactSales

Step 4 — Create DirectLake semantic model:
→ Fabric workspace → New semantic model
→ Select Lakehouse tables
→ Import existing DAX measures from old .pbix file
→ Re-create RLS roles (DAX filters identical)
→ Re-create calculation groups in Tabular Editor

Step 5 — Reconnect reports:
→ Each report: Change data source → new Fabric semantic model
→ Verify all visuals render correctly with DirectLake

Step 6 — Validate and switch:
→ Run Query Performance Analyzer — confirm DirectLake (not DirectQuery)
→ Compare report output: DirectLake vs old Import model (data parity check)
→ Update app/workspace access for 200 users
→ Decommission old Power BI Premium workspace

Benefits achieved:
→ Eliminated 3-hour refresh window — data always fresh
→ Reduced model size (no data import — reads Delta files directly)
→ Reduced costs: one Fabric capacity covers all workloads

7. Cheat Sheet — Quick Reference

DP-600 Exam Domain Quick Reference

Domain                                    Weight   Key Topics
Plan, implement & manage (10-15%)         Low      Workspaces, Git, Deployment
                                                   Pipelines, Security, Purview
Prepare and serve data (40-45%)           HIGHEST  Lakehouse, Warehouse, Delta,
                                                   Pipelines, Dataflows, Mirroring,
                                                   Medallion architecture
Implement & manage semantic models (20-25%) High   DAX, DirectLake, RLS, OLS,
                                                   Calculation groups, Relationships
Explore and analyze data (20-25%)         High     SQL, KQL, Spark notebooks,
                                                   Lineage, Impact analysis, Activator

Fabric Item Selection Guide

Need to...                                      → Use
Ingest streaming data in real time              → Eventstream
Query time-series/log data with KQL             → KQL Database
Big data transformation with Spark/Python       → Lakehouse + Notebooks
SQL-based ETL with full DML                     → Data Warehouse
Low-code ETL from 150+ sources                  → Dataflow Gen2
Orchestrate multi-step data workflows           → Data Pipeline
No-code alerting when data conditions met       → Activator
Replicate operational DB data to Fabric         → Mirroring
Create business metrics and KPIs                → Semantic Model
Self-service BI for business users              → Power BI Reports
Always-fresh Power BI (no scheduled refresh)    → DirectLake semantic model

Delta Lake Quick Reference

Write modes:
overwrite     → replace all data
append        → add new rows (no deduplication)
merge/upsert  → update existing + insert new (no duplicates)

Read historical data (time travel):
df = spark.read.format("delta").option("versionAsOf", 5).load(...)
df = spark.read.format("delta").option("timestampAsOf", "2025-01-01").load(...)

Maintenance:
OPTIMIZE tableName                 → compact small files
OPTIMIZE tableName ZORDER BY (col) → sort for query pruning
VACUUM tableName RETAIN 168 HOURS  → clean old files (keep 7-day history)

V-ORDER: Fabric automatically applies on write → faster DirectLake reads

DAX Quick Reference

Key patterns:
Total Sales = SUM(FactSales[SalesAmount])
Sales LY = CALCULATE([Total Sales], SAMEPERIODLASTYEAR(Date[Date]))
YoY % = DIVIDE([Total Sales] - [Sales LY], [Sales LY])
YTD = TOTALYTD([Total Sales], Date[Date])
Running Total = CALCULATE([Total Sales], DATESYTD(Date[Date]))
Market Share = DIVIDE([Total Sales], CALCULATE([Total Sales], ALL(Product)))
Dynamic = SELECTEDMEASURE()  ← inside calculation group item

Variables (always use):
Measure =
VAR x = [Base Measure]
VAR y = CALCULATE([Base Measure], PREVIOUSMONTH(Date[Date]))
RETURN DIVIDE(x - y, y)

Measures vs Calculated Columns:
Calculated columns: row-level, at refresh, stored → use sparingly
Measures:           query-time, dynamic, not stored → always prefer

Security Layers Summary

Layer                   Scope                   Tool
Workspace roles         All items in workspace  Entra ID groups → Admin/Member/
                                                Contributor/Viewer
Item-level sharing      Specific item           Share individual item
OneLake data access     Specific folder/table   OneLake data access roles (preview)
Semantic model RLS      Row-level data filter   DAX roles in semantic model
Semantic model OLS      Table/column hiding     Tabular Editor roles
Warehouse column sec.   Column-level access     T-SQL GRANT/DENY
Sensitivity labels      Item classification     Microsoft Purview labels

Medallion Architecture Summary

Layer    Storage          Purpose              Quality
Bronze   Files/Delta      Raw as-received      Untransformed
Silver   Delta tables     Cleaned, validated   Deduplicated, typed
Gold     Delta tables     Business-ready       Aggregated, star schema

Compute per layer:
Bronze: Pipelines (Copy Data) + Eventstream
Silver: Notebooks (PySpark MERGE) + Dataflows Gen2
Gold:   Notebooks (aggregations, star schema) + Pipelines

Performance optimisation:
→ OPTIMIZE + ZORDER on Gold tables (query pruning)
→ V-ORDER on all writes (DirectLake speed)
→ Partition large tables by date column
→ VACUUM regularly (remove old Delta files)

Top 10 Tips

  1. Prepare and serve data is 40–45% of the exam — the largest domain by far. Prioritise: medallion architecture, Delta Lake operations (OPTIMIZE, VACUUM, MERGE, time travel), Lakehouse vs Warehouse decisions, and all ingestion patterns (Pipelines, Dataflows, Notebooks, Mirroring, Eventstream).

  2. DirectLake is Fabric's biggest innovation — it combines Import speed with DirectQuery freshness by reading Delta Parquet files directly. Know: when it falls back to DirectQuery, how V-ORDER helps, how to optimise tables for DirectLake, and how framing works.

  3. OneLake unifies everything — all Fabric items (Lakehouse, Warehouse, KQL DB) store data in OneLake as Delta Parquet. The same physical files are accessible from all Fabric experiences. Shortcuts let you reference external data without copying it.

  4. Measures over calculated columns always — calculated columns are stored in the model and computed at refresh time. Measures are computed at query time and are dynamic. Using calculated columns where measures should be used is a common design mistake.

  5. DAX variables improve both readability and performance — always use VAR to avoid evaluating the same expression multiple times. Variables are evaluated once in the current filter context — referencing them multiple times doesn't re-evaluate.

  6. Calculation groups solve measure proliferation — instead of creating YTD/YoY/Prior Period versions of every measure, one calculation group applies those calculations to any measure dynamically. This is a senior analytics engineer pattern.

  7. RLS is enforced in the service, not the model — users cannot bypass RLS through Power BI Desktop or external tools once published. Dynamic RLS using USERPRINCIPALNAME() is the enterprise pattern — one role definition scales to all users.

  8. Impact analysis before modifying anything — always run impact analysis before renaming columns, changing table schemas, or deleting items. Unplanned changes cascade to semantic models and reports, breaking downstream consumers.

  9. Git integration + Deployment Pipelines together — Git provides source control and code review (PRs). Deployment Pipelines manage Dev → Staging → Production promotion. Using both together is the enterprise Fabric governance best practice.

  10. Know when to use Lakehouse vs Warehouse — Lakehouse (Spark + flexible schema) for big data, ML, and raw/intermediate layers. Warehouse (T-SQL + full DML) for structured BI, stored procedures, and when your team is SQL-native. The Gold layer can use either — choose based on team skills and query patterns.



No comments:

Post a Comment

Featured Post

Microsoft Azure Solutions Architect Expert (AZ-305) Complete Guide

Microsoft Azure Solutions Architect Expert (AZ-305) — Complete Guide Well-Architected Framework · Identity & Governance · Data Storage ...

Popular posts