Sunday, April 26, 2026

Microsoft Power BI — Complete Guide

Core Concepts · Data Modelling · DAX · Power Query · RLS · Service & ALM · Scenarios · Cheat Sheet


Table of Contents

  1. Core Concepts — Basics
  2. Data Modelling
  3. DAX — Data Analysis Expressions
  4. Power Query & M Language
  5. Row-Level Security & Object-Level Security
  6. Power BI Service, Gateway & ALM
  7. Performance Optimisation
  8. Scenario-Based Questions
  9. Cheat Sheet — Quick Reference

1. Core Concepts — Basics

What is Power BI and what are its main components?

Power BI is a business analytics service from Microsoft that enables users to connect to data, transform it, build data models, create visualisations, and share insights.

Component Description
Power BI Desktop Free Windows app for building reports and data models
Power BI Service Cloud SaaS (app.powerbi.com) for publishing, sharing, collaboration
Power BI Mobile iOS/Android apps for consuming reports on mobile
Power BI Report Server On-premises report hosting for organisations that cannot use cloud
Power BI Embedded Embed Power BI visuals/reports in custom applications via REST API
Power BI Gateway Bridge between on-premises data sources and the Power BI Service

What is the difference between a Dataset, Report, and Dashboard?

Dataset (Semantic model): the data layer — contains tables, relationships, measures, and the connection to the data source. Multiple reports can share a single dataset.

Report: the visualisation layer — one or more pages of charts, tables, and visuals built from a single dataset. Created in Power BI Desktop, published to the Service.

Dashboard: a single-page canvas in the Power BI Service only (not Desktop). Pins individual visuals from multiple reports. Live tiles update automatically. Cannot be filtered interactively like a report page.

Key distinction: Reports are created in Desktop from one dataset. Dashboards are assembled in the Service from multiple reports. Datasets are shared and reused across many reports.


What are the data connectivity modes in Power BI?

Mode Data Storage Freshness Performance Best For
Import Copied into VertiPaq in-memory engine As of last refresh Fastest Most scenarios, large models
DirectQuery Not stored — queries sent live to source Always current Slower (source-dependent) Real-time requirements
Live Connection Not stored — connects to existing AS model Always current Fast (remote model) Reusing existing SSAS/Azure AS models
Composite model Mix of Import and DirectQuery tables Mixed Variable Large-scale with some real-time tables

Tip: Import = best performance, periodic refresh. DirectQuery = always current, slower. This is one of the most frequently asked architecture questions.


What is the difference between Power BI Pro, Premium Per User (PPU), and Premium capacity?

Licence Key Features Dataset Limit Refreshes/Day
Free Personal use, My Workspace only 1GB 8
Pro Publish, share, collaborate 1GB per dataset (10GB workspace storage) 8
Premium Per User (PPU) All Pro + paginated reports, deployment pipelines, AI 100GB 48
Premium capacity (P/EM/F SKUs) Free viewers, dedicated compute, very large models 400GB+ Unlimited (API)

Warning: Free users can only view reports in Premium capacity workspaces — not Pro workspaces. This is a common licensing trap in enterprise deployments.


What is a Power BI Workspace and what types exist?

A workspace is a collaboration container in the Power BI Service holding datasets, reports, dashboards, and dataflows.

Type Description
My Workspace Personal workspace for each user. Not for sharing — private.
Collaborative workspace Shared team workspace backed by Microsoft 365 group. Roles: Admin, Member, Contributor, Viewer.
Premium workspace Assigned to Premium capacity — enables advanced features and free viewer access.

Tip: Always publish to collaborative workspaces, not My Workspace. Content in My Workspace is lost if the user leaves the organisation.


What is the VertiPaq engine in Power BI?

VertiPaq is the in-memory columnar storage engine that powers Power BI Import mode datasets. Key characteristics:

  • Stores data in compressed column segments (not row-based like traditional databases)
  • High compression ratios — especially on low-cardinality columns (like Country, Status, Category)
  • Loads entire dataset into RAM for fast query processing
  • DAX queries are evaluated against the VertiPaq in-memory store

Understanding VertiPaq explains why star schemas with low-cardinality dimension columns compress and query so efficiently in Power BI.


2. Data Modelling

What is a star schema and why is it recommended for Power BI?

A star schema organises data into a central fact table (measurable events/transactions) surrounded by dimension tables (descriptive attributes).

Dimension tables (one side):
DimDate     → DateKey, Year, Month, Quarter, WeekDay, IsHoliday
DimProduct  → ProductKey, Name, Category, SubCategory, Cost, ListPrice
DimCustomer → CustomerKey, Name, City, Country, Segment
DimEmployee → EmployeeKey, Name, Email, ManagerEmail, Region

Fact table (many side):
FactSales → DateKey, ProductKey, CustomerKey, EmployeeKey,
            Quantity, Revenue, Cost, DiscountAmount

Why recommended for Power BI:

  1. Optimised for VertiPaq — column-store compression works best on low-cardinality dimension columns
  2. Simpler DAX — relationships are clear and unambiguous
  3. Faster query performance — fewer joins, fewer cross-filter paths
  4. Easier to understand for report authors

Tip: Never bring in a flat denormalised table from a data warehouse as-is. Always reshape into a star schema in Power Query or the model layer.


What is the difference between a fact table and a dimension table?

Fact table: contains measurable, quantitative data about business events. High row count, fewer columns. Contains numeric measures (Revenue, Quantity, Cost) and foreign keys to dimension tables. Examples: Sales, Orders, Transactions, Web events.

Dimension table: contains descriptive, categorical attributes used to filter, group, and label fact data. Fewer rows, more columns. Contains the "who, what, when, where, why" context. Examples: Date, Product, Customer, Geography, Employee.

Warning: A common mistake is mixing descriptive and transactional data in a single table. Keep fact and dimension tables strictly separated.


What are relationship cardinality types and cross-filter directions?

Cardinality types:

Cardinality Description Use Case
One-to-Many (1:*) Standard star schema relationship Dimension → Fact
Many-to-One (*:1) Same as 1:* from the other side Fact → Dimension
One-to-One (1:1) Rare — often indicates tables should be merged Splitting wide tables
Many-to-Many (*:*) Complex scenarios — use with caution Bridge tables, non-unique keys on both sides

Cross-filter direction:

  • Single: filter flows from one-side (dimension) to many-side (fact). Recommended default.
  • Both: filter flows in both directions. Can cause ambiguous filter paths and unexpected results.

Critical: Avoid bi-directional relationships in star schemas. Use CROSSFILTER() in DAX to apply bi-directional filtering only where specifically needed.
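
A minimal sketch of scoping bi-directional filtering to a single measure with CROSSFILTER(), assuming a standard DimCustomer → FactSales relationship on CustomerKey:

Active Customers =
  CALCULATE(
    COUNTROWS(DimCustomer),
    CROSSFILTER(DimCustomer[CustomerKey], FactSales[CustomerKey], BOTH)
  )
-- DimCustomer is filtered by FactSales only inside this measure;
-- the model relationship itself stays single-direction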


What is a date table and why is it essential in Power BI?

A date table is a dedicated dimension with one row per date containing columns for Year, Quarter, Month, Week, Day, Weekday, IsWeekend, FiscalYear, etc.

Why essential:

  1. Time intelligence DAX functions (SAMEPERIODLASTYEAR, DATESINPERIOD, etc.) require a properly marked date table
  2. Consistent date filtering across all fact tables sharing the date dimension
  3. Fiscal year/period calculations differing from calendar year
  4. Handling gaps in fact table dates (weekends, holidays)

Requirements for marking a date table:
✓ One row per date, no gaps
✓ Date column must be Date data type (not DateTime)
✓ No blanks in the date column
✓ Covers the full range of all dates in all fact tables

Mark via: Table Tools → Mark as date table

Tip: Always create or import a proper date table. Never rely on Power BI's auto date/time feature in production — it creates hidden tables that bloat the model and cannot be customised. Disable it in Options → Data Load.
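
A date table can also be generated directly in DAX — a sketch, assuming a 2020–2026 range (adjust it to cover all dates in all fact tables):

DimDate =
  ADDCOLUMNS(
    CALENDAR(DATE(2020, 1, 1), DATE(2026, 12, 31)),
    "Year", YEAR([Date]),
    "MonthNumber", MONTH([Date]),
    "Month", FORMAT([Date], "MMM"),
    "Quarter", "Q" & QUARTER([Date]),
    "IsWeekend", WEEKDAY([Date]) IN {1, 7}
  )
-- Then: Table Tools → Mark as date table, using the [Date] column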


What is the difference between calculated columns and measures?

Calculated Column Measure
Evaluated Row by row at refresh time Dynamically at query time
Storage Stored in model (consumes memory) Not stored (computed on fly)
Returns A value per row A single aggregate value
Used in Slicers, filters, rows/columns Values area of visuals
Context Row context Filter context
Best for Categorisation, bucketing, flags KPIs, aggregations, time intelligence

Calculated column (stored, row-level):
Profit = FactSales[Revenue] - FactSales[Cost]
AgeGroup = IF(DimCustomer[Age] < 30, "Young", IF(DimCustomer[Age] < 50, "Mid", "Senior"))

Measure (dynamic, context-aware):
Total Revenue = SUM(FactSales[Revenue])
YoY Growth % = DIVIDE([Revenue] - [Revenue LY], [Revenue LY])

Critical: Calculated columns are frequently overused. Every calculated column consumes memory and slows refresh. If the computation depends on filters or context — it must be a measure.


3. DAX — Data Analysis Expressions

What is the difference between filter context and row context in DAX?

Filter context: the set of filters applied to the model when a measure is evaluated. Comes from slicers, report filters, visual columns/rows, and CALCULATE() filter arguments. Determines which rows are included in aggregations.

Row context: exists inside calculated columns and iterators (SUMX, AVERAGEX, etc.). Represents the current row being processed. Each row has its own context with all column values accessible.

Filter context:
Total Revenue = SUM(FactSales[Revenue])
-- SUM includes only rows matching the current filter context
-- If slicer filters Year=2024, only 2024 rows are summed

Row context in calculated column:
Profit = FactSales[Revenue] - FactSales[Cost]
-- Revenue and Cost are the values of the CURRENT row

Row context in iterator measure:
Weighted Avg Price =
  DIVIDE(
    SUMX(FactSales, FactSales[Quantity] * FactSales[UnitPrice]),
    SUM(FactSales[Quantity])
  )
-- SUMX iterates every row; Quantity * UnitPrice uses row context per row

Tip: Filter context = what data is visible. Row context = which row am I on. Context transition = when row context converts to filter context inside CALCULATE().
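
Context transition in one line — a sketch using the star schema from section 2:

Avg Revenue per Customer =
  AVERAGEX(DimCustomer, CALCULATE(SUM(FactSales[Revenue])))
-- AVERAGEX creates a row context over DimCustomer; CALCULATE converts the
-- current customer row into a filter context, so the inner SUM sees only
-- that customer's sales rows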


What does CALCULATE() do and why is it the most important DAX function?

CALCULATE(expression, filter1, filter2, ...) evaluates an expression in a modified filter context. It is the only DAX function that can change the filter context — making it the foundation of almost all advanced DAX.

Filter override:
Revenue 2024 =
  CALCULATE(SUM(FactSales[Revenue]), DimDate[Year] = 2024)

Remove all filters on a column:
Revenue All Products =
  CALCULATE(SUM(FactSales[Revenue]), ALL(DimProduct))

Market share %:
Market Share =
  DIVIDE(
    SUM(FactSales[Revenue]),
    CALCULATE(SUM(FactSales[Revenue]), ALL(DimProduct))
  )

Year-over-year:
Revenue LY =
  CALCULATE(
    SUM(FactSales[Revenue]),
    SAMEPERIODLASTYEAR(DimDate[Date])
  )

YoY Growth %:
YoY Growth =
  DIVIDE([Revenue] - [Revenue LY], [Revenue LY])

Tip: If a measure needs to "ignore" a filter, "change" a filter, or "look at a different time period" — the answer is CALCULATE(). Almost every complex measure uses it.


What is the difference between ALL(), ALLEXCEPT(), and ALLSELECTED()?

ALL(table or column)
→ Removes ALL filters from the specified table/column
→ Used inside CALCULATE to ignore slicers/filters
% of Total =
  DIVIDE([Revenue], CALCULATE([Revenue], ALL(DimProduct)))

ALLEXCEPT(table, column1, column2, ...)
→ Removes ALL filters EXCEPT those on specified columns
→ Keep some filters, remove the rest
% of Category =
  DIVIDE([Revenue],
    CALCULATE([Revenue], ALLEXCEPT(DimProduct, DimProduct[Category])))
-- Each product shows its % within its own category

ALLSELECTED(table or column)
→ Removes filters from the visual level only
→ Keeps slicer/page/report filter context
% of Selection =
  DIVIDE([Revenue], CALCULATE([Revenue], ALLSELECTED(DimProduct)))
-- % within whatever is currently selected in slicers

What are the key time intelligence functions in DAX?

Year-to-date:
Revenue YTD = TOTALYTD([Revenue], DimDate[Date])
Revenue YTD = CALCULATE([Revenue], DATESYTD(DimDate[Date]))

Fiscal year YTD (fiscal year ends June 30):
Revenue FYTD = TOTALYTD([Revenue], DimDate[Date], "06-30")

Same period last year:
Revenue LY = CALCULATE([Revenue], SAMEPERIODLASTYEAR(DimDate[Date]))

Year-over-year growth:
YoY Growth % = DIVIDE([Revenue] - [Revenue LY], [Revenue LY])

Rolling 3 months:
Revenue Last 3M =
  CALCULATE([Revenue],
    DATESINPERIOD(DimDate[Date], LASTDATE(DimDate[Date]), -3, MONTH))

Month-to-date / Quarter-to-date:
Revenue MTD = TOTALMTD([Revenue], DimDate[Date])
Revenue QTD = TOTALQTD([Revenue], DimDate[Date])

Previous month:
Revenue PM = CALCULATE([Revenue], PREVIOUSMONTH(DimDate[Date]))

Warning: Time intelligence functions require a contiguous date table with no gaps. With an unmarked or gappy date table they can silently return BLANK or incorrect results.


What is DIVIDE() and why should you use it instead of the / operator?

Unsafe — misbehaves on division by zero:
Margin % = [Revenue] / [Cost]
-- Division by zero yields infinity or an error → breaks the visual

Safe — handles zero denominator gracefully:
Margin % = DIVIDE([Revenue], [Cost])
-- Returns BLANK if Cost = 0

With custom fallback:
Margin % = DIVIDE([Revenue], [Cost], 0)
-- Returns 0 instead of BLANK when Cost = 0

Best practice: Always use DIVIDE() for any division in DAX — never the / operator.


What is RELATED() vs RELATEDTABLE() and when do you use each?

RELATED(column)
→ Used in calculated column on the MANY side (fact table)
→ Traverses the relationship to get a value from the ONE side (dimension)
→ Like a VLOOKUP — fetches a related table's value for the current row

Example (calculated column on FactSales):
Product Category = RELATED(DimProduct[Category])
-- For each sales row, fetches the product's category from DimProduct

RELATEDTABLE(table)
→ Used in calculated column on the ONE side (dimension table)
→ Returns a TABLE of all related rows from the MANY side
→ Used with COUNTROWS, SUMX, AVERAGEX, etc.

Example (calculated column on DimProduct):
Total Units Sold = COUNTROWS(RELATEDTABLE(FactSales))
-- For each product row, counts all its sales transactions

What are iterator functions in DAX and when should you use them?

Iterator functions (SUMX, AVERAGEX, MAXX, MINX, COUNTX, RANKX, FILTER) evaluate an expression for each row of a table in row context, then aggregate the results.

SUMX — sum of a row-by-row calculation:
Total Margin = SUMX(FactSales, FactSales[Revenue] - FactSales[Cost])
-- Cannot use SUM(Revenue) - SUM(Cost) when you need row-level calc first

AVERAGEX — average of a row-by-row expression:
Avg Order Value = AVERAGEX(FactSales,
  FactSales[Quantity] * FactSales[UnitPrice])

RANKX — rank within a table:
Product Revenue Rank =
  RANKX(ALL(DimProduct), [Revenue], , DESC, Dense)
-- Ranks each product by revenue across all products

FILTER — returns a filtered subset of a table:
Top Customers =
  COUNTROWS(FILTER(DimCustomer, [Revenue] > 100000))

4. Power Query & M Language

What is Power Query and what is it used for in Power BI?

Power Query (M language) is the ETL (Extract, Transform, Load) layer of Power BI. It connects to data sources, shapes and cleans the data, and loads it into the data model. Every transformation step is recorded and executed in order during refresh.

Common transformations:

  • Remove/rename/reorder columns
  • Filter rows by condition
  • Change data types, handle nulls/blanks
  • Split columns, merge columns, extract substrings
  • Unpivot columns (wide → tall, normalise denormalised data)
  • Pivot columns (tall → wide)
  • Merge queries (JOIN two tables)
  • Append queries (UNION multiple tables)
  • Group by (aggregate)
  • Add custom columns using M expressions
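
Several of these steps chained in a single M query — a sketch with a hypothetical sales.csv path and column names:

let
    Source   = Csv.Document(File.Contents("C:\Data\sales.csv"), [Delimiter = ","]),
    Promoted = Table.PromoteHeaders(Source),
    Typed    = Table.TransformColumnTypes(Promoted,
                 {{"OrderDate", type date}, {"Revenue", type number}}),
    Filtered = Table.SelectRows(Typed, each [Revenue] <> null and [Revenue] > 0),
    Grouped  = Table.Group(Filtered, {"OrderDate"},
                 {{"TotalRevenue", each List.Sum([Revenue]), type number}})
in
    Grouped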

What is the difference between Merge and Append in Power Query?

Merge (JOIN): combines two queries horizontally — adds columns from one query to another based on a matching key. Equivalent to SQL JOIN.

Join Type Description
Inner Only matching rows from both tables
Left Outer All rows from left, matching from right
Right Outer All rows from right, matching from left
Full Outer All rows from both tables
Left Anti Rows in left with NO match in right
Right Anti Rows in right with NO match in left

Append (UNION): combines two or more queries vertically — stacks rows. Equivalent to SQL UNION ALL. Column names must match.

Tip: Append is the correct way to combine multiple files from a folder (monthly Excel files). Merge is correct for enriching a fact table with dimension attributes.
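
In M, the two operations look like this — a sketch assuming FactSales/DimProduct queries and three hypothetical monthly queries:

// Merge: enrich FactSales with the product category (like a SQL LEFT JOIN)
Merged   = Table.NestedJoin(FactSales, {"ProductKey"},
             DimProduct, {"ProductKey"}, "DimProduct", JoinKind.LeftOuter),
Expanded = Table.ExpandTableColumn(Merged, "DimProduct", {"Category"}),

// Append: stack monthly extracts (like SQL UNION ALL)
Appended = Table.Combine({SalesJan, SalesFeb, SalesMar})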


What is query folding in Power Query and why is it important?

Query folding is the ability of Power Query to translate transformation steps into native queries executed by the data source — rather than downloading all data and transforming locally in Power BI.

With folding (efficient):
Source: SQL Server
Steps: Filter rows → Select columns → Group by
Power BI sends: SELECT col1, col2 FROM table WHERE ... GROUP BY ...
→ Source does the heavy lifting, only result is transferred

Without folding (inefficient):
Steps after a non-foldable operation
→ Power BI downloads ALL rows THEN transforms locally
→ Extremely slow for large tables

Check folding: Right-click a step → "View native query". If greyed out, folding is broken at that step.

Common folding breakers: adding an index column, certain custom column expressions using M functions not translatable to SQL.

Warning: Once folding breaks at a step, all subsequent steps also lose folding. Structure your query so foldable steps come first, non-foldable steps last.
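
Step ordering in practice — a sketch against a SQL Server source (server/database names reused from the parameters example below), with the folding-breaking step kept last:

let
    Source   = Sql.Database("prod-sql.database.windows.net", "SalesDB"),
    Sales    = Source{[Schema = "dbo", Item = "FactSales"]}[Data],
    Filtered = Table.SelectRows(Sales, each [OrderDate] >= #date(2024, 1, 1)),
               // folds to a WHERE clause
    Selected = Table.SelectColumns(Filtered, {"OrderDate", "ProductKey", "Revenue"}),
               // folds to a SELECT list
    Indexed  = Table.AddIndexColumn(Selected, "RowID", 1, 1)
               // breaks folding — keep last
in
    Indexed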


What are parameters in Power Query and how are they used?

Parameters are named, typed values referenceable across multiple queries.

Parameter: ServerName = "prod-sql.database.windows.net"
Parameter: DatabaseName = "SalesDB"
Parameter: DaysToLoad = 90

Used in query:
Source = Sql.Database(ServerName, DatabaseName),
Filtered = Table.SelectRows(Source,
  each [LoadDate] >= Date.AddDays(Date.From(DateTime.LocalNow()), -DaysToLoad))

Common uses:

  • Environment switching (dev/test/prod server names)
  • Date range filtering (load last N days)
  • File path management (move files without breaking reports)
  • API base URLs (switch between environments)

What is incremental refresh and how does it work?

Incremental refresh loads only new and changed data on each refresh, rather than reloading the entire table. Dramatically reduces refresh time for large tables.

Setup steps:

  1. Create two Power Query parameters: RangeStart (DateTime) and RangeEnd (DateTime)
  2. Filter the date column in Power Query: [Date] >= RangeStart and [Date] < RangeEnd
  3. In Power BI Desktop: right-click the table → Incremental refresh
  4. Define: store data for X years, refresh data for Y days
  5. Publish to Premium workspace (required for incremental refresh)

How it works at refresh time:

  • Only the most recent Y days are re-queried from the source
  • Historical partitions (older than Y days) are kept as-is
  • Source must support query folding for incremental refresh to work
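
The step-2 filter in M — RangeStart and RangeEnd are the reserved parameter names Power BI looks for:

Filtered = Table.SelectRows(Source,
  each [OrderDate] >= RangeStart and [OrderDate] < RangeEnd)
-- Use >= on one boundary and < on the other; using >= (or <=) on both
-- would load boundary rows into two partitions and duplicate them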

5. Row-Level Security & Object-Level Security

What is Row-Level Security (RLS) in Power BI and how is it implemented?

RLS restricts which rows a user can see in a report, based on their identity.

Static RLS (hardcoded filter):
Role: "APAC Region"
Filter on DimGeography: [Region] = "APAC"
→ Anyone assigned this role sees only APAC data

Dynamic RLS (user identity — scalable):
Role: "SalesRLS"
Filter on DimEmployee: [Email] = USERPRINCIPALNAME()
→ Each user sees only their own rows
→ No separate role per user needed

Hierarchical RLS (manager sees team):
Filter on DimEmployee:
[Email] = USERPRINCIPALNAME()
  || [ManagerEmail] = USERPRINCIPALNAME()
→ Each user sees their own rows + their direct reports' rows

Implementation steps:

  1. Define roles in Power BI Desktop (Modelling → Manage roles)
  2. Publish dataset to Power BI Service
  3. Assign users/groups to roles (Dataset → Security)
  4. Test with "View as role" in both Desktop and Service

Tip: Dynamic RLS with USERPRINCIPALNAME() is the scalable enterprise pattern — one role covers all users. Static RLS requires a separate role per filter value — unmanageable at scale.


What is Object-Level Security (OLS) and how does it differ from RLS?

RLS: restricts which rows a user can see. The table and columns are visible — only rows are filtered.

OLS: restricts access to entire tables or columns. Users without permission cannot see the table/column at all — completely hidden from their model view.

RLS OLS
Restricts Rows Tables or columns
Visibility Table/columns visible, rows filtered Table/column completely hidden
Configured in Power BI Desktop (Manage roles) Tabular Editor or REST API
Use case Region, department, personal data filtering Hide salary, SSN, margin columns from certain roles

Tip: Use OLS to hide sensitive columns (salary, SSN, cost/margin) from certain roles. Use RLS to show the same columns to all users but filter which rows they see. Both can be combined.


6. Power BI Service, Gateway & ALM

What is a Power BI Gateway and when is it needed?

A Power BI Gateway is a software agent installed on an on-premises machine that bridges the Power BI Service (cloud) and on-premises data sources.

Gateway Type Users Use Cases
Standard (enterprise) Multiple users, multiple data sources Scheduled refresh, DirectQuery, Live Connection
Personal Single user only Scheduled refresh only — personal development

Common on-premises sources needing a gateway: SQL Server (on-prem), Oracle, SharePoint (on-prem), Excel/CSV on file shares, SAP, MySQL on-prem

Warning: The gateway machine must always be running when scheduled refreshes are due. Gateway outages are the #1 cause of refresh failures in enterprise deployments. Use a cluster of gateways for high availability.


What are Deployment Pipelines in Power BI?

Deployment Pipelines (Premium feature) provide a Dev → Test → Production promotion path for Power BI content within the Service.

Setup:

  1. Create pipeline with three stages: Development, Test, Production
  2. Assign a workspace to each stage
  3. Develop in the Development workspace
  4. Deploy to Test with one click — content copied to Test workspace
  5. After testing, deploy to Production

Deployment rules: override data source connections and parameters per stage (dev DB → prod DB) without editing the report.

Tip: Deployment rules are the equivalent of environment variables for Power BI — they let you override connection strings per stage without touching the report file.


What are Dataflows in Power BI and when should you use them?

Dataflows are self-service ETL pipelines defined in the Power BI Service using Power Query Online. They extract, transform, and store data as reusable tables (entities) in Azure Data Lake Storage.

When to use:

  • Centralise common data preparation logic reused across many reports
  • Offload heavy transformations from individual datasets to a shared layer
  • Reduce refresh load — dataflow refreshes once, multiple datasets consume the result
  • Enable self-service analytics on top of IT-governed prepared data

Tip: Dataflows implement a "prepare once, use many" pattern — the Power BI equivalent of a data warehouse staging layer without needing Azure Synapse or Databricks.


What are sensitivity labels and data protection in Power BI?

Power BI integrates with Microsoft Purview Information Protection (MIP) to apply sensitivity labels to datasets, reports, and dashboards.

Labels flow downstream: if a dataset has a "Confidential" label, reports and exports built from it inherit the label automatically.

Capabilities:

  • Prevent exporting confidential reports to Excel/PDF/CSV
  • Encrypt exported files
  • Audit label usage in the Microsoft Purview compliance portal
  • Mandatory labelling policies — require users to apply a label before publishing

7. Performance Optimisation

What are the key Power BI performance best practices?

Data model:

  • Use star schema — avoid snowflake schemas (extra joins slow queries)
  • Remove unused columns and rows before loading
  • Avoid high-cardinality columns in dimension tables (prefer grouping)
  • Use integer surrogate keys for relationships (not text/GUID)
  • Disable auto date/time (Options → Data Load)

DAX:

  • Prefer measures over calculated columns for context-dependent logic
  • Use variables (VAR) to avoid repeating expensive expressions
  • Use DIVIDE() instead of /
  • Avoid FILTER(ALL(...), ...) patterns — expensive; use KEEPFILTERS() where possible
  • Avoid row-by-row operations on large tables without need
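
Variables in practice — each expression is evaluated once and reused (a sketch using the model's FactSales columns):

Margin % =
  VAR TotalRevenue = SUM(FactSales[Revenue])
  VAR TotalCost    = SUM(FactSales[Cost])
  RETURN
    DIVIDE(TotalRevenue - TotalCost, TotalRevenue)
-- Without VAR, SUM(FactSales[Revenue]) would be evaluated twice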

Power Query:

  • Enable query folding — apply filters and column selections early (before non-foldable steps)
  • Use incremental refresh for large tables
  • Remove unused columns at the source step, not after transformations

Report:

  • Limit visuals per page — each visual executes a separate query
  • Use bookmarks/buttons instead of many pages
  • Use aggregation tables for summary visuals over large fact tables

What is the Performance Analyzer and how do you use it?

Performance Analyzer (View → Performance Analyzer in Desktop) records the time each visual takes to render, broken down into:

  • DAX query: time to evaluate the measure/query
  • Visual display: time to render the chart/visual
  • Other: overhead

How to use:

  1. Open Performance Analyzer panel
  2. Click "Start recording"
  3. Interact with the report (click visuals, change slicers)
  4. Review query times — slow DAX queries indicate model/DAX issues; slow visual display indicates rendering issues
  5. Copy the DAX query for slow visuals and analyse in DAX Studio

8. Scenario-Based Questions

Scenario: A Power BI report refreshes slowly (45 minutes). How do you optimise it?

Situation: Dataset has 10 tables, 50M+ rows, refresh takes 45 minutes.

  1. Enable incremental refresh — only load new/changed rows each refresh. Reduces time by 80–90% for large tables. Requires Premium workspace.
  2. Check query folding in Power Query — ensure filters and transformations fold to source. Non-folding steps force full data download.
  3. Remove unused columns and rows in Power Query — never import what reports don't need.
  4. Use aggregation tables for summary-level visuals — pre-aggregate so most queries hit the small summary table, not 50M rows.
  5. Consider DirectQuery or Composite model for the largest tables — remove the need to import them.
  6. Parallelise independent queries — split one large query into multiple smaller ones where possible.
  7. Upgrade to Premium capacity for more compute resources and parallel refresh threads.

Scenario: Implement RLS where each salesperson sees only their own region, but their manager sees all their direct reports' regions.

Role: "SalesRLS"
Filter on DimEmployee:
[Email] = USERPRINCIPALNAME()
  || [ManagerEmail] = USERPRINCIPALNAME()

How it works:

  • Salesperson (a@company.com): only their own row passes → their region only
  • Manager (mgr@company.com): all rows where ManagerEmail = mgr@company.com → all direct reports' regions

Steps:

  1. Ensure DimEmployee has Email and ManagerEmail columns from Azure AD or HR system
  2. Create the "SalesRLS" role with the DAX filter above
  3. Publish and assign ALL sales users to "SalesRLS" in Power BI Service
  4. Test with "View as" for a salesperson, then a manager

Tip: This hierarchical pattern works for any manager-subordinate structure. Maintain ManagerEmail accuracy in the source system — it drives the entire access model.


Scenario: Your DAX measure shows the same total value for every row in a matrix. What is wrong?

Cause: The measure is not being filtered by the row dimension. Check in order:

  1. Missing relationship — the field in matrix rows comes from a table not related to the fact table. No relationship = no filter propagation. Fix: add the relationship.
  2. Wrong filter direction — relationship exists but single-direction filter flows away from the dimension, not toward the fact. Fix: verify direction in Model view.
  3. Inactive relationship — relationship exists but is dashed (inactive). Fix: use USERELATIONSHIP() in the measure:
     Revenue by Ship Date =
       CALCULATE([Revenue],
         USERELATIONSHIP(DimDate[Date], FactSales[ShipDate]))

  4. ALL() in the measure — the measure explicitly removes filters. Review the measure definition.

Scenario: Design a Power BI solution for an executive dashboard with near-real-time KPIs.

Requirements: C-suite dashboard showing Revenue, Orders, NPS, Inventory — updated every 15 minutes, accessible on mobile.

Architecture:

  1. Data layer: Import mode with incremental refresh in a Premium workspace. Scheduled refresh allows up to 48 refreshes/day (every 30 minutes); for a true 15-minute cadence, trigger refreshes via the REST API, or use DirectQuery for real-time data.
  2. Data model: star schema — DimDate, DimProduct, DimCustomer, DimRegion against FactSales, FactOrders.
  3. Key measures: Revenue MTD, Revenue vs Target (%), Orders Today, NPS Score, Inventory Days of Cover, YoY Growth %.
  4. Report design: KPI card visuals at top. Trend line charts. Geographic map visual. Mobile layout configured in Desktop → View → Mobile layout.
  5. Dashboard: pin KPI cards to a Power BI Dashboard — enables data alerts (notify when Revenue drops below threshold).
  6. Security: OLS on cost/margin columns for non-finance roles. RLS if execs are region-specific.
  7. Distribution: publish to Premium workspace. Create an App for the dashboard. Enable mobile access.

Scenario: How do you handle multiple date relationships in a single fact table?

Problem: FactSales has three date columns — OrderDate, ShipDate, DeliveryDate. You have one DimDate table. You need to analyse by each date independently.

Setup:
- Create active relationship: DimDate[DateKey] → FactSales[OrderDate]
- Create inactive relationships: DimDate[DateKey] → FactSales[ShipDate]
                                  DimDate[DateKey] → FactSales[DeliveryDate]

Measures using inactive relationships:
Revenue by Ship Date =
  CALCULATE([Revenue],
    USERELATIONSHIP(DimDate[Date], FactSales[ShipDate]))

Revenue by Delivery Date =
  CALCULATE([Revenue],
    USERELATIONSHIP(DimDate[Date], FactSales[DeliveryDate]))

Default measure (no USERELATIONSHIP needed):
Revenue by Order Date = [Revenue]
-- Uses the active relationship automatically

9. Cheat Sheet — Quick Reference

Connectivity Modes

Import       → data copied into VertiPaq. Fastest. Periodic refresh.
DirectQuery  → live queries to source on every interaction. Always current. Slower.
Live Connect → connects to existing AS/SSAS model. No local model.
Composite    → mix of Import + DirectQuery tables in one model.

DAX Functions Quick Reference

Aggregation:
SUM(column)          AVERAGE(column)      COUNT(column)
MIN(column)          MAX(column)          DISTINCTCOUNT(column)

Logical:
IF(condition, true, false)
SWITCH(expression, val1, result1, val2, result2, else)
AND(bool1, bool2)    OR(bool1, bool2)     NOT(bool)
IFERROR(expr, alt)   ISBLANK(expr)        COALESCE(val1, val2)

Filter context:
CALCULATE(expr, filter1, ...)
ALL(table/column)    ALLEXCEPT(tbl, col)  ALLSELECTED(col)
FILTER(table, condition)
KEEPFILTERS(filter)  REMOVEFILTERS(col)

Relationship navigation:
RELATED(column)           -- many side → one side
RELATEDTABLE(table)       -- one side → many side
USERELATIONSHIP(col1, col2) -- activate inactive relationship

Iterators (row context):
SUMX(table, expression)   AVERAGEX(table, expr)
MAXX(table, expr)         MINX(table, expr)
COUNTX(table, expr)       RANKX(table, expr)

Time intelligence:
TOTALYTD([measure], date)     TOTALMTD  TOTALQTD
SAMEPERIODLASTYEAR(date)      PREVIOUSMONTH(date)
DATESINPERIOD(date, last, n, period)
DATESYTD(date, [fiscalYearEnd])

Text:
CONCATENATE(text1, text2)   UPPER(text)    LOWER(text)
LEFT(text, n)   RIGHT(text, n)   MID(text, start, n)
LEN(text)   SUBSTITUTE(text, old, new)   FORMAT(value, format)

Math:
DIVIDE(num, denom, [alt])   ROUND(number, decimals)
ABS(number)   INT(number)   MOD(number, divisor)

Variables:
VAR x = expression
RETURN x + 1

Star Schema Template

DimDate       ← DateKey PK | Date | Year | Month | Quarter | Week | Weekday
DimProduct    ← ProductKey PK | Name | Category | SubCategory | Cost | Price
DimCustomer   ← CustomerKey PK | Name | Email | City | Country | Segment
DimEmployee   ← EmployeeKey PK | Name | Email | ManagerEmail | Region | BU

FactSales     ← DateKey FK | ProductKey FK | CustomerKey FK | EmployeeKey FK
               | Quantity | Revenue | Cost | Discount | OrderID

Relationships: All Dim[Key] → Fact[Key], one-to-many, single filter direction
Date table: marked as date table, no gaps, full date range coverage

RLS Patterns

Static RLS:
[Region] = "APAC"

Dynamic RLS (per user):
[Email] = USERPRINCIPALNAME()

Hierarchical (user + manager):
[Email] = USERPRINCIPALNAME()
  || [ManagerEmail] = USERPRINCIPALNAME()

% of Total (ignores RLS for denominator):
% Share = DIVIDE([Revenue],
  CALCULATE([Revenue], ALL(DimProduct)))

Test RLS: Desktop → Modelling → View as roles
         Service  → Dataset → Security → Test as role

Common Performance Issues & Fixes

Issue Cause Fix
Slow refresh Full table reload every time Enable incremental refresh
Slow refresh Non-folding Power Query steps Restructure query — foldable steps first
Slow visual Expensive DAX measure Use VAR, avoid FILTER(ALL()), use aggregations
Same value every row Missing/wrong relationship Add/fix relationship or use USERELATIONSHIP()
Blank time intelligence Date table not marked Mark table as date table, fix gaps
RLS not working User not assigned to role in Service Assign user to role in Dataset → Security
DirectQuery slow Complex joins at source Add indexes at source; simplify Power Query steps
Model too large Too many columns imported Remove unused columns in Power Query

Top 10 Tips

  1. Filter context vs row context is the most-tested DAX concept. Know the difference, and know what "context transition" means (CALCULATE converts row context to filter context).
  2. CALCULATE() is the answer to almost every advanced DAX question — "ignore filter", "change filter", "compare time periods" all use CALCULATE().
  3. DIVIDE() not / — always use DIVIDE() for safe division; the / operator breaks on zero denominators.
  4. Star schema vs flat table — always recommend star schema. Know why (VertiPaq compression, simpler DAX, faster queries).
  5. Mark date table — time intelligence functions silently fail without a properly marked date table. Know the requirements.
  6. Dynamic RLS with USERPRINCIPALNAME() is the scalable pattern — one role for all users. Static RLS = one role per filter value = unmanageable.
  7. Query folding — know what it is, why it matters, and what breaks it. 
  8. Calculated column vs measure — columns are stored and consume memory; measures are dynamic. Overusing calculated columns is the most common performance mistake.
  9. Incremental refresh — the answer to "how do you refresh a 100M row table in minutes." Requires Premium, Power Query parameters, and a foldable date filter.
  10. Deployment pipelines + deployment rules — the ALM answer for Power BI. Know that deployment rules are how you change connection strings per environment stage.

