Sunday, May 17, 2026

Microsoft Azure Administrator (AZ-104) Complete Guide

Identity & Governance · Storage · Compute · Virtual Networking · Monitoring & Backup · ARM & Bicep · Scenarios · Cheat Sheet

Exam Overview & Identity/Governance (20–25%)
Implement and Manage Storage (15–20%)
Deploy and Manage Azure Compute (20–25%)
Implement and Manage Virtual Networking (15–20%)
Monitor and Maintain Azure Resources (10–15%)
ARM Templates, Bicep & Automation
Scenario-Based Questions
Cheat Sheet — Quick Reference

1. Exam Overview & Identity/Governance (20–25%)

AZ-104 Exam at a Glance

The AZ-104 Microsoft Azure Administrator exam validates expertise in implementing, managing, and monitoring an organisation's Azure environment. It covers virtual networks, storage, compute, identity, security, and governance.

Skill Domain	Exam Weight
Manage Azure identities and governance	20–25%
Deploy and manage Azure compute resources	20–25%
Implement and manage virtual networking	15–20%
Implement and manage storage	15–20%
Monitor and maintain Azure resources	10–15%

Prerequisites: No formal prerequisites, but 6+ months of hands-on Azure experience is strongly recommended. Familiarity with PowerShell, Azure CLI, the Azure portal, and ARM/Bicep templates is expected.

What is Azure Resource Manager (ARM) and how does it underpin everything in Azure?

Azure Resource Manager (ARM):
→ The management layer for ALL Azure resources
→ Every management (control-plane) request from the Portal, CLI, PowerShell, REST API, or Terraform goes through ARM
→ Provides: authentication, authorisation, tagging, locking, templates

ARM concepts:
Subscription:      billing and access boundary — contains resource groups
Resource Group:    logical container for related Azure resources
                   resources in an RG should share the same lifecycle
Resource:          individual Azure service (VM, storage account, VNet)
Resource Provider: the service that supplies a resource type
                   Microsoft.Compute (VMs), Microsoft.Network (VNets),
                   Microsoft.Storage (storage accounts)

ARM operations:
Control plane:   manage resources (create, delete, configure) via ARM
Data plane:      access resource data (read a blob, send a queue message)
                 controlled by resource-level access (keys, RBAC data roles)

Resource Group best practices:
→ Group by lifecycle: resources deleted together go in same RG
→ Group by environment: Production-RG, Dev-RG, Test-RG
→ Region: RG has a region (for metadata), but can contain resources
  from any region
→ RBAC applied at RG level applies to all resources in the RG
→ Delete RG = delete ALL resources in it (use locks to protect)

What are Azure Management Groups and how do they enable governance at scale?

Management Group hierarchy:
Root Management Group (tenant level)
  └── Management Group (e.g., "Production")
        └── Management Group (e.g., "EMEA")
              └── Subscription A
              └── Subscription B
        └── Management Group (e.g., "APAC")
              └── Subscription C

Why Management Groups:
→ Apply Azure Policy across multiple subscriptions at once
→ Apply RBAC role assignments across multiple subscriptions
→ Manage hundreds of subscriptions from a single hierarchy
→ Up to 6 levels of hierarchy (not counting root)

Example governance with Management Groups:
Root
  ├── Corporate (MG) — Audit policy applied here (all subs inherit)
  │     ├── Production (MG) — No public IP policy applied here
  │     │     ├── Prod-UK subscription
  │     │     └── Prod-US subscription
  │     └── Non-Production (MG) — Dev/test allowed policies
  │           ├── Dev subscription
  │           └── Test subscription
  └── Sandbox (MG) — Allow all (experimental)
        └── Sandbox subscription
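Inheritance down the hierarchy can be sketched in a few lines of Python. The tree mirrors the example above; the `PARENT` and `ASSIGNMENTS` maps and the subscription name "Sandbox-sub" are hypothetical stand-ins for what Azure tracks internally, not an Azure API. A policy assigned at a management group reaches every child management group and subscription beneath it.

```python
# Hypothetical model of the example hierarchy: child scope -> parent scope.
PARENT = {
    "Corporate": "Root",
    "Production": "Corporate",
    "Non-Production": "Corporate",
    "Sandbox": "Root",
    "Prod-UK": "Production",
    "Prod-US": "Production",
    "Dev": "Non-Production",
    "Test": "Non-Production",
    "Sandbox-sub": "Sandbox",
}

# Policies assigned directly at each scope (illustrative names only).
ASSIGNMENTS = {
    "Corporate": ["Audit policy"],
    "Production": ["No public IP"],
    "Non-Production": ["Dev/test allowed policies"],
}


def ancestors(scope: str) -> list:
    """Return the scope followed by each parent up to the root."""
    chain = [scope]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def effective_policies(scope: str) -> list:
    """Collect assignments from the root down to `scope` (inherited ones first)."""
    policies = []
    for node in reversed(ancestors(scope)):
        policies.extend(ASSIGNMENTS.get(node, []))
    return policies


print(effective_policies("Prod-UK"))      # ['Audit policy', 'No public IP']
print(effective_policies("Sandbox-sub"))  # []
```

Walking upwards from the subscription is also why a policy assigned at a higher scope cannot be escaped by moving a subscription between sibling groups under the same parent.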

What is Azure Policy and how does it work?

Azure Policy enforces organisational standards and assesses compliance
at scale across subscriptions and resource groups.

Policy components:
Policy definition:  the rule (e.g., "VMs must use managed disks")
Initiative:         a collection of policy definitions (e.g., "CIS Azure benchmark")
Assignment:         apply policy/initiative to a scope (MG/subscription/RG)
Compliance:         dashboard showing compliant vs non-compliant resources
Remediation:        fix non-compliant resources (deployIfNotExists,
                    modify effects)

Policy effects (what happens when condition is met):
Deny:                block the resource creation/update (strongest)
Audit:               allow but mark as non-compliant (reporting only)
Append:              add fields to the resource (e.g., required tags)
Modify:              change resource properties (e.g., disable public IP)
DeployIfNotExists:   deploy a related resource if missing (e.g., deploy
                     diagnostics extension if not present)
AuditIfNotExists:    audit if a related resource is missing

Common built-in policies:
→ "Allowed locations" — restrict deployments to specific regions
→ "Require a tag and its value" — enforce tagging standards
→ "Allowed virtual machine SKUs" — restrict VM sizes
→ "VMs should use managed disks" — no unmanaged disk VMs
→ "Secure transfer to storage accounts should be enabled"
→ "Allowed resource types" — permit only approved resource types (allow-list)

Policy vs RBAC:
Azure Policy: WHAT can be deployed (resource properties/config)
Azure RBAC:   WHO can deploy/manage resources (identity-based)
Both needed: RBAC controls access, Policy controls configuration
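A policy rule is an "if" condition plus a "then" effect. The sketch below evaluates a simplified "Allowed locations" rule (parameter already resolved to two UK regions, chosen for illustration) against resource properties. It supports only a tiny subset of the real condition language (`not`, `allOf`, `anyOf`, `in`, `equals`) and, unlike Azure Policy, compares values case-sensitively.

```python
from typing import Any, Dict

# Simplified "Allowed locations" rule with its parameter already resolved
# (the real built-in definition has extra exclusions, e.g. for "global").
ALLOWED_LOCATIONS = {
    "if": {"not": {"field": "location", "in": ["uksouth", "ukwest"]}},
    "then": {"effect": "deny"},
}


def matches(condition: Dict[str, Any], resource: Dict[str, Any]) -> bool:
    """Evaluate a tiny subset of the policy condition language."""
    if "not" in condition:
        return not matches(condition["not"], resource)
    if "allOf" in condition:
        return all(matches(c, resource) for c in condition["allOf"])
    if "anyOf" in condition:
        return any(matches(c, resource) for c in condition["anyOf"])
    value = resource.get(condition["field"])
    if "in" in condition:
        return value in condition["in"]
    if "equals" in condition:
        return value == condition["equals"]
    raise ValueError(f"unsupported condition: {condition}")


def evaluate(rule: Dict[str, Any], resource: Dict[str, Any]) -> str:
    """Return the rule's effect when its 'if' block matches, otherwise 'compliant'."""
    return rule["then"]["effect"] if matches(rule["if"], resource) else "compliant"


print(evaluate(ALLOWED_LOCATIONS, {"name": "st1", "location": "eastus"}))   # deny
print(evaluate(ALLOWED_LOCATIONS, {"name": "st2", "location": "uksouth"}))  # compliant
```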

What are Azure resource locks and when do you use them?

Resource locks prevent accidental deletion or modification of
critical Azure resources — even by subscription owners.

Lock types:
CanNotDelete (Delete lock):
→ Users can read and modify the resource but CANNOT delete it
→ Most common lock — protect production resources from accidental deletion

ReadOnly:
→ Users can read the resource but CANNOT modify OR delete it
→ Equivalent to applying Reader role to everyone
→ Use with caution — may break operations that need to modify resource
   (e.g., restarting a VM requires a write operation)

Lock scope (inherited by children):
Subscription → Resource Group → Resource
Lock on RG protects all resources in the RG

Lock hierarchy:
If RG has Delete lock: resources in RG cannot be deleted
If Resource has Delete lock: that specific resource cannot be deleted
Most restrictive lock wins
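The inheritance and "most restrictive wins" rules can be sketched as follows. The ranking dictionary and helper names are hypothetical; `CanNotDelete` and `ReadOnly` are the real lock level names.

```python
from typing import Optional

# Lock levels ranked from least to most restrictive.
RANK = {None: 0, "CanNotDelete": 1, "ReadOnly": 2}


def effective_lock(subscription: Optional[str], resource_group: Optional[str],
                   resource: Optional[str]) -> Optional[str]:
    """Locks are inherited downwards, so the most restrictive one in the chain applies."""
    return max((subscription, resource_group, resource), key=lambda lock: RANK[lock])


def is_allowed(operation: str, lock: Optional[str]) -> bool:
    """operation: 'read', 'write' (any modification, e.g. restarting a VM) or 'delete'."""
    if lock == "ReadOnly":
        return operation == "read"
    if lock == "CanNotDelete":
        return operation != "delete"
    return True


# Delete lock on the resource group, nothing on the VM itself:
lock = effective_lock(None, "CanNotDelete", None)
print(lock, is_allowed("write", lock), is_allowed("delete", lock))  # CanNotDelete True False
```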

Commands:
# Apply lock:
az lock create --name "ProductionLock" --resource-group "Prod-RG" \
  --lock-type CanNotDelete

# PowerShell:
New-AzResourceLock -LockName "ProductionLock" -LockLevel CanNotDelete `
  -ResourceGroupName "Prod-RG"

# Remove lock (must remove before deleting resource):
az lock delete --name "ProductionLock" --resource-group "Prod-RG"

Tip: Resource locks are the last safety net for production resources. Always apply a CanNotDelete lock to production resource groups. Locks must be removed before deleting — this intentional friction prevents accidents.

What is Azure Cost Management and how do you control spending?

Azure Cost Management + Billing:
→ Monitor, allocate, and optimise Azure spending
→ Access: Azure Portal → Cost Management + Billing

Key features:
Cost analysis:    visualise spending by service, resource, tag, location
Budgets:          set spending thresholds with email/action group alerts
Cost alerts:      alert at thresholds you choose (e.g., 50%, 75%, 90%, 100%)
Advisor:          personalised recommendations to reduce cost and improve
                  security, performance, and reliability

Cost control tools:
Tags:             tag resources (Environment=Production, Team=Finance)
                  filter Cost Analysis by tag to see team-level spend
Reservations:     commit to 1 or 3 years → 40-72% discount vs pay-as-you-go
Savings Plans:    flexible hourly commitment → 11-65% discount
Azure Hybrid Benefit: use existing Windows Server/SQL Server licences
                      on Azure VMs → save up to 40%
Dev/Test pricing: reduced rates for non-production subscriptions
Auto-shutdown:    schedule VMs to shut down outside business hours

Budget alert example:
Budget: £10,000/month for Production subscription
Alerts:
  → 80% (£8,000) reached: email Finance team
  → 100% (£10,000) reached: email Finance + CTO + trigger action group
  → Action group: call Azure Automation runbook to tag new VMs for review
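Azure evaluates budget thresholds for you; the sketch below only shows the arithmetic behind the example (a £10,000 budget with alerts at 80% and 100%). The function name is hypothetical, and integer maths is used so that exactly 80% counts as reached.

```python
from typing import List


def crossed_thresholds(spend: int, budget: int, thresholds: List[int]) -> List[int]:
    """Return the alert thresholds (percent of budget) that current spend has reached."""
    return [t for t in sorted(thresholds) if spend * 100 >= t * budget]


BUDGET = 10_000  # £ per month, as in the example above
print(crossed_thresholds(8_250, BUDGET, [80, 100]))   # [80]  -> email Finance
print(crossed_thresholds(10_400, BUDGET, [80, 100]))  # [80, 100] -> Finance + CTO + action group
```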

2. Implement and Manage Storage (15–20%)

What are Azure Storage account types and redundancy options?

Storage account types:
Standard General Purpose v2 (GPv2):
→ Supports: Blob, File, Queue, Table storage
→ Performance: Standard (HDD-backed)
→ Use for: most workloads, backup, archive

Premium Block Blobs:
→ Supports: Block blobs and append blobs only
→ Performance: SSD-backed, very low latency
→ Use for: high-throughput scenarios, AI/ML data pipelines

Premium File Shares:
→ Supports: Azure Files only
→ Performance: SSD-backed
→ Use for: high-performance file shares (databases, FSLogix profiles)

Premium Page Blobs:
→ Supports: Page blobs only (used for VM disks)
→ Performance: SSD-backed
→ Use for: unmanaged VM disks (legacy); managed disks need no storage account of your own

Redundancy options (data durability):
LRS (Locally Redundant Storage):
→ 3 copies within a single datacenter (single availability zone)
→ 11 nines durability (99.999999999%)
→ Cheapest — no protection against datacenter failure

ZRS (Zone-Redundant Storage):
→ 3 copies across 3 availability zones in the same region
→ 12 nines durability
→ Survives datacenter failure
→ Recommended for most production workloads

GRS (Geo-Redundant Storage):
→ LRS in primary region + LRS in secondary region (100s of miles away)
→ Secondary is readable only after a failover (customer-initiated or Microsoft-managed)
→ 16 nines durability

GZRS (Geo-Zone-Redundant Storage):
→ ZRS in primary + LRS in secondary
→ Highest durability (16 nines)
→ Most expensive — use for critical data

RA-GRS / RA-GZRS:
→ Adds read access to secondary region at all times (not just after failover)
→ Use when: need to read from secondary for DR or read scale-out
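The redundancy choice reduces to three yes/no questions. The returned strings are the SKU names Azure uses for Standard accounts (for example, the `--sku` value of `az storage account create`); the helper function itself is hypothetical.

```python
def redundancy_sku(survive_zone_failure: bool, survive_region_failure: bool,
                   read_secondary_anytime: bool = False) -> str:
    """Map resilience requirements to a Standard storage account SKU name."""
    if not survive_region_failure:
        if read_secondary_anytime:
            raise ValueError("reading a secondary region requires a geo-redundant option")
        return "Standard_ZRS" if survive_zone_failure else "Standard_LRS"
    base = "GZRS" if survive_zone_failure else "GRS"
    return f"Standard_RA{base}" if read_secondary_anytime else f"Standard_{base}"


print(redundancy_sku(True, False))        # Standard_ZRS
print(redundancy_sku(True, True, True))   # Standard_RAGZRS
```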

What are the Azure Blob storage access tiers?

Access tiers (balance cost vs access frequency):

Hot:
→ Highest storage cost, lowest access cost
→ Use for: frequently accessed data (active databases, app files)
→ Minimum storage duration: none

Cool:
→ Lower storage cost than Hot, higher access cost
→ Use for: infrequently accessed data kept for at least 30 days
→ Minimum storage duration: 30 days (early deletion fees apply)

Cold (newer tier):
→ Lower storage cost than Cool, higher access cost than Cool
→ Use for: rarely accessed data, kept at least 90 days
→ Minimum storage duration: 90 days

Archive:
→ Lowest storage cost (~90% cheaper than Hot)
→ OFFLINE — data is not immediately accessible
→ Rehydration required before reading:
  Standard rehydration: up to 15 hours
  High priority rehydration: under 1 hour for blobs smaller than 10 GB (higher cost)
→ Use for: long-term backup, compliance data, rarely accessed archives
→ Minimum storage duration: 180 days

Lifecycle Management policies:
→ Automatically transition blobs between tiers based on age/conditions
→ Example:
  After 30 days → move from Hot to Cool
  After 90 days → move from Cool to Cold
  After 365 days → move to Archive
  After 2555 days (7 years) → delete
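The lifecycle example above can be expressed as a management policy document. The sketch builds that JSON (action names such as `tierToCool` and `daysAfterModificationGreaterThan` follow the lifecycle policy schema) and models which state a blob of a given age should end up in. The rule name and the `mycontainer/logs/` prefix are hypothetical; the saved JSON could then be applied, for example, with `az storage account management-policy create --account-name <account> --resource-group <rg> --policy @policy.json`. Azure runs policies on its own schedule, so `expected_state` models only the end state, not timing.

```python
import json

# (days since last modification, lifecycle action, resulting state) from the example above.
STEPS = [
    (30, "tierToCool", "Cool"),
    (90, "tierToCold", "Cold"),
    (365, "tierToArchive", "Archive"),
    (2555, "delete", "Deleted"),
]


def lifecycle_policy(rule_name: str, prefix: str) -> dict:
    """Build a lifecycle management policy for block blobs under `prefix`."""
    actions = {action: {"daysAfterModificationGreaterThan": days} for days, action, _ in STEPS}
    return {
        "rules": [
            {
                "enabled": True,
                "name": rule_name,
                "type": "Lifecycle",
                "definition": {
                    "actions": {"baseBlob": actions},
                    "filters": {"blobTypes": ["blockBlob"], "prefixMatch": [prefix]},
                },
            }
        ]
    }


def expected_state(age_days: int) -> str:
    """State a blob of this age should reach under the rule ("GreaterThan" is strict)."""
    state = "Hot"
    for days, _, result in STEPS:
        if age_days > days:
            state = result
    return state


print(json.dumps(lifecycle_policy("age-out", "mycontainer/logs/"), indent=2))
print([expected_state(d) for d in (10, 45, 200, 400, 3000)])
# ['Hot', 'Cool', 'Cold', 'Archive', 'Deleted']
```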

# Azure CLI - set a blob's access tier:
az storage blob set-tier --account-name mystorageaccount --container-name mycontainer \
  --name myfile.pdf --tier Archive --auth-mode login

What is Azure Files and how does it compare to Blob storage?

Azure Files:
→ Fully managed cloud file shares accessible via SMB (2.1/3.x), NFS 4.1 (premium shares only), and REST
→ Can be mounted as a network drive on Windows, Linux, macOS
→ Compatible with on-prem file server applications
→ Use cases: lift-and-shift of on-prem file servers, shared config files,
             FSLogix profile containers (Azure Virtual Desktop)

Azure File Sync:
→ Caches Azure file shares on Windows Server on-premises
→ On-prem server has frequently used files locally (fast access)
→ Infrequently used files stored only in Azure (cloud tiering)
→ Supports multiple on-prem servers all syncing to the same Azure share
→ Use for: hybrid file server scenarios, replacing DFS Replication (DFS-R)

Azure Blob vs Azure Files:
                    Blob Storage          Azure Files
Access protocol:    HTTP/HTTPS REST       SMB, NFS, REST
Mount as drive:     No                    Yes (SMB/NFS)
Use case:           Unstructured data,    File shares, home dirs
                    media, backup         app config, lift-and-shift
Max file size:      190.7 TB per blob     4 TB per file (Standard)
Performance:        Scales massively      Depends on share tier
POSIX permissions:  No                    Yes (NFS shares)

What are Shared Access Signatures (SAS) and when do you use them?

SAS (Shared Access Signature):
→ A URI that grants limited, time-bound access to Azure Storage resources
→ No need to share the storage account key
→ Granular control: which resource, what operations, how long, from where

SAS types:
Account SAS:    access to multiple storage services (Blob + Queue + Table)
Service SAS:    access to resources in one service only (Blob, Queue, Table, or Files)
User delegation SAS: secured with Entra ID credentials via a user delegation key
                     (most secure); Blob storage only, no account key needed

SAS parameters:
sv:  service version
st:  start time
se:  expiry time (always set — limit exposure window)
sr:  signed resource, service SAS (b=blob, c=container, f=file, s=share)
sp:  permissions (r=read, w=write, d=delete, l=list, c=create)
sig: cryptographic signature

Example SAS URI:
https://mystorageaccount.blob.core.windows.net/mycontainer/file.pdf
?sv=2023-11-03&st=2025-01-01T00%3A00%3A00Z
&se=2025-01-31T23%3A59%3A00Z&sr=b&sp=r
&sig=xxxxxxxxxxxxxxxxxxxxxxxxxxx
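The query parameters of a SAS can be inspected with Python's standard library, which is handy when debugging "403 AuthenticationFailed" errors caused by an expired token or missing permission. This sketch decodes the example URI and checks the time window only; it does not verify `sig`, which only the storage service can do. The helper names are hypothetical.

```python
from datetime import datetime, timezone
from urllib.parse import parse_qs, urlsplit

SAS_URL = (
    "https://mystorageaccount.blob.core.windows.net/mycontainer/file.pdf"
    "?sv=2023-11-03&st=2025-01-01T00%3A00%3A00Z"
    "&se=2025-01-31T23%3A59%3A00Z&sr=b&sp=r"
    "&sig=xxxxxxxxxxxxxxxxxxxxxxxxxxx"
)


def _utc(timestamp: str) -> datetime:
    """SAS times are UTC strings such as 2025-01-31T23:59:00Z."""
    return datetime.strptime(timestamp, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


def inspect_sas(url: str, now: datetime) -> dict:
    """Decode service SAS parameters; does NOT verify the signature."""
    params = {key: values[0] for key, values in parse_qs(urlsplit(url).query).items()}
    start = _utc(params["st"]) if "st" in params else now  # st is optional
    expiry = _utc(params["se"])
    return {
        "resource": params["sr"],
        "permissions": set(params["sp"]),
        "valid_now": start <= now < expiry,
    }


print(inspect_sas(SAS_URL, datetime(2025, 1, 15, tzinfo=timezone.utc)))
# {'resource': 'b', 'permissions': {'r'}, 'valid_now': True}
```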

Stored Access Policy:
→ Define SAS constraints on the container/queue/table (server-side)
→ Allows revoking a SAS by modifying or deleting the stored access policy
→ Without a stored access policy, a SAS can only be revoked before expiry by rotating the account key that signed it (or revoking user delegation keys)

Access control hierarchy (most to least preferred):
1. Entra ID RBAC (Storage Blob Data Reader/Contributor) — no keys needed
2. User delegation SAS — Entra-backed, audited
3. Service/Account SAS — key-based, harder to revoke
4. Storage account key — full access, treat like a root password

3. Deploy and Manage Azure Compute (20–25%)

What are Azure Virtual Machines and key configuration concepts?

VM creation key decisions:
Region:         where the VM runs — affects latency, compliance, availability
Size/SKU:       CPU, RAM, temp disk, max NICs, max data disks
                B-series: burstable (dev/test)
                D-series: general purpose (most workloads)
                E-series: memory optimised (databases, in-memory analytics)
                F-series: compute optimised (batch, gaming)
                N-series: GPU (AI/ML training, graphics)
Image:          OS (Windows Server 2022, Ubuntu 22.04, RHEL, custom)
Disk:           OS disk (required) + data disks (optional)
Authentication: SSH key pair (Linux, preferred over passwords) or username/password (Windows)
Networking:     VNet, subnet, NSG, public IP (avoid if possible)
Availability:   Availability Zones, Availability Sets, VMSS

VM disk types:
Standard HDD:  cheapest, high latency — backup/archive workloads
Standard SSD:  lower latency than HDD — dev/test, light workloads
Premium SSD:   SLA-backed, low latency — production workloads
Premium SSD v2: more granular performance control
Ultra Disk:    highest IOPS (up to 400,000) — SAP HANA, top-tier DBs

Managed disks:
→ Azure manages storage account for disk — no self-management needed
→ Snapshots: point-in-time copy of a managed disk
→ Images: generalised snapshot used to create new VMs
→ Disk encryption: Server-Side Encryption is always on (platform-managed keys
               by default, customer-managed keys optional); Azure Disk Encryption
               (BitLocker/dm-crypt) adds encryption inside the guest OS

VM extensions:
Custom Script Extension:  run scripts on VM after deployment
Azure Monitor Agent:      collect metrics/logs to Azure Monitor
DSC Extension:            apply PowerShell DSC configurations
Diagnostics Extension:    send guest OS metrics to Azure Monitor
Microsoft Antimalware:    antivirus for Windows VMs

What are Availability Sets and Availability Zones?

Availability Sets:
→ Protect VMs from hardware failures WITHIN a single datacenter
→ Fault domains: separate physical hardware (rack, power, network)
→ Update domains: separate maintenance windows
→ VMs spread across fault domains (up to 3) + update domains (up to 20)
→ SLA: 99.95% uptime (for 2+ VMs in an availability set)
→ Does NOT protect against datacenter/zone failure
→ Legacy approach — prefer Availability Zones for new deployments
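Azure decides placement itself; this sketch assumes a simple round-robin assignment of VMs to fault and update domains (a simplification for illustration) and counts how many VMs stay up when one rack fails or one maintenance wave runs. The defaults of 2 fault domains and 5 update domains and all function names are assumptions.

```python
from typing import List, Optional, Tuple


def place_vms(count: int, fault_domains: int = 2, update_domains: int = 5) -> List[Tuple[int, int]]:
    """Assumed round-robin placement: VM i -> (fault domain, update domain)."""
    if not 1 <= fault_domains <= 3 or not 1 <= update_domains <= 20:
        raise ValueError("availability sets allow 1-3 fault and 1-20 update domains")
    return [(i % fault_domains, i % update_domains) for i in range(count)]


def survivors(placement: List[Tuple[int, int]], failed_fd: Optional[int] = None,
              updating_ud: Optional[int] = None) -> int:
    """VMs still running during a rack failure and/or a maintenance wave."""
    return sum(1 for fd, ud in placement if fd != failed_fd and ud != updating_ud)


vms = place_vms(6, fault_domains=3, update_domains=5)
print(vms)                            # [(0, 0), (1, 1), (2, 2), (0, 3), (1, 4), (2, 0)]
print(survivors(vms, failed_fd=0))    # 4
print(survivors(vms, updating_ud=0))  # 4
```

The point of spreading is that no single rack failure or maintenance wave takes down every VM; neither helps if the whole datacenter is lost, which is what Availability Zones address.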

Availability Zones:
→ Physically separate datacenters within the same Azure region
→ Each zone has independent power, cooling, and networking
→ Deploy VM in Zone 1, Zone 2, Zone 3 = protected from datacenter failure
→ SLA: 99.99% uptime (for 2+ VMs in different zones)
→ Not all regions have 3 AZs — check region capabilities first
→ Zone-resilient services: ZRS storage, zone-redundant App Gateways

VM Scale Sets (VMSS):
→ Deploy and manage a group of identical, load-balanced VMs
→ Auto-scale: add/remove VMs based on CPU, memory, schedule, or custom metrics
→ Uniform mode: all VMs use the same image and size
→ Flexible mode: mix of VMs (different sizes/images) — recommended
→ Works with Azure Load Balancer or Application Gateway

Decision:
Single datacenter protection   → Availability Set (legacy)
Datacenter/zone protection     → Availability Zones (recommended)
Auto-scaling web tier          → VM Scale Sets in Availability Zones
Highest HA (99.99% SLA)        → 2+ VMs across 2+ Availability Zones
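The difference between the two SLAs is easier to judge as permitted downtime. A worked conversion, assuming a 30-day month:

```python
def max_monthly_downtime_minutes(sla_percent: float, days: int = 30) -> float:
    """Downtime an SLA still permits in a month of `days` days, in minutes."""
    return round((100 - sla_percent) / 100 * days * 24 * 60, 2)


for sla in (99.95, 99.99):
    print(sla, max_monthly_downtime_minutes(sla))
# 99.95 21.6
# 99.99 4.32
```

Moving from an Availability Set to Availability Zones cuts the permitted downtime from about 21.6 to about 4.3 minutes per month.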

What is Azure App Service and what are its key features?

Azure App Service:
→ Fully managed PaaS for hosting web apps, REST APIs, mobile backends
→ Supports: .NET, Java, Node.js, Python, PHP, Ruby, custom containers
→ No server management — Azure handles OS patching, scaling, load balancing

App Service Plan:
→ The compute resources underlying the App Service
→ Defines: region, number of VM instances, size of VM instances, pricing tier

Pricing tiers:
Free/Shared: no SLA, shared infrastructure, limited — dev only
Basic:       dedicated compute, manual scale, no slots
Standard:    auto-scale, deployment slots (staging), custom domains/SSL
Premium:     more instances, more slots, higher performance (VNet integration also available from Basic up)
Isolated:    App Service Environment (ASE) — private, VNet-injected, highest isolation

Key features:
Deployment slots:
→ Separate environments (staging, QA) on the same App Service
→ Swap slots: zero-downtime deployment
   Staging → Production swap: production gets staging's code instantly
   If the new release misbehaves: swap back (rollback in seconds)
→ Standard tier: 5 slots, Premium: 20 slots

Auto-scale:
→ Scale out (add instances): based on CPU %, memory, HTTP queue length
→ Scale in (remove instances): when metrics drop
→ Schedule-based: scale out at 8am, scale in at 6pm
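A metric-based rule pair (scale out above one threshold, scale in below another, within minimum and maximum limits) can be sketched as a single evaluation step. The 70%/30% CPU thresholds, the 2-10 instance limits, and all names are hypothetical; real autoscale also averages metrics over a time window and waits for a cooldown between actions, which this sketch leaves out.

```python
from dataclasses import dataclass


@dataclass
class AutoscaleProfile:
    minimum: int = 2
    maximum: int = 10
    scale_out_above: float = 70.0  # average CPU % (hypothetical threshold)
    scale_in_below: float = 30.0   # average CPU % (hypothetical threshold)
    step: int = 1


def next_instance_count(current: int, avg_cpu: float, p: AutoscaleProfile) -> int:
    """One evaluation of a scale-out/scale-in rule pair, clamped to the limits."""
    if avg_cpu > p.scale_out_above:
        target = current + p.step
    elif avg_cpu < p.scale_in_below:
        target = current - p.step
    else:
        target = current
    return max(p.minimum, min(p.maximum, target))


profile = AutoscaleProfile()
print(next_instance_count(3, 85.0, profile))   # 4
print(next_instance_count(2, 10.0, profile))   # 2 (never below minimum)
print(next_instance_count(10, 95.0, profile))  # 10 (never above maximum)
```

Keeping a gap between the two thresholds avoids "flapping", where a scale-in immediately pushes CPU back over the scale-out threshold.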

VNet Integration:
→ App Service can make outbound calls to resources in a VNet
→ Required for: private SQL, private storage, on-prem via VPN/ExpressRoute
→ Does NOT allow inbound traffic from VNet (use Private Endpoint for that)

Deployment methods:
→ GitHub Actions / Azure DevOps CI/CD pipeline (recommended)
→ Local Git deployment
→ ZIP deploy (az webapp deploy --type zip)
→ Container deployment (ACR → App Service)

What is Azure Kubernetes Service (AKS) and key admin concepts?

AKS: managed Kubernetes cluster on Azure
→ Microsoft manages: control plane (API server, etcd, scheduler)
→ You manage: worker nodes (node pools), workloads

Key concepts:
Node pool:      group of VMs with same size/config — can have multiple pools
System pool:    runs kube-system pods (required)
User pool:      runs application workloads
Node size:      choose VM SKU for the node pool
Autoscaler:     cluster autoscaler — adds/removes nodes based on pod demand
HPA:            Horizontal Pod Autoscaler — adds/removes pods based on CPU/memory

Networking modes:
Kubenet:        nodes get IPs from the Azure VNet, pods get IPs from a separate
                pod CIDR; pods use NAT to communicate outside the cluster
Azure CNI:      pods get IP directly from VNet subnet
                enables direct pod-to-pod communication from VNet
                requires more IP addresses (plan subnet size carefully)

AKS + Entra ID integration:
→ RBAC for Kubernetes: use Entra ID groups as Kubernetes RBAC subjects
→ Managed Identity: AKS uses Managed Identity to pull from ACR, access Key Vault
→ Workload Identity: pods authenticate to Azure services using Entra ID

AKS storage:
Azure Disk:     ReadWriteOnce — one pod on one node
Azure Files:    ReadWriteMany — multiple pods on multiple nodes (SMB)
Azure Blob:     ReadWriteMany (via BlobFuse) — large unstructured data

4. Implement and Manage Virtual Networking (15–20%)

What are Virtual Networks (VNets) and subnets in Azure?

Virtual Network (VNet):
→ Isolated, private network in Azure — your own address space
→ Resources in a VNet communicate privately by default
→ Define address space: CIDR notation (e.g., 10.0.0.0/16)
→ VNet scoped to a single region (cannot span regions)

Subnets:
→ Subdivide VNet address space into smaller ranges
→ Each resource goes in a specific subnet
→ Reserved addresses per subnet: first 4 + last 1 = 5 IPs unusable
  e.g., 10.0.0.0/24 = 256 addresses, 251 usable

Subnet design example:
VNet: 10.0.0.0/16 (65,536 addresses)
  ├── WebSubnet:    10.0.1.0/24 (251 usable — web servers/App Service)
  ├── AppSubnet:    10.0.2.0/24 (251 usable — application tier)
  ├── DataSubnet:   10.0.3.0/24 (251 usable — databases)
  ├── GatewaySubnet: 10.0.4.0/27 (27 usable — VPN/ExpressRoute GW)
  └── AzureBastionSubnet: 10.0.5.0/26 (59 usable — Bastion host)
      (must be named exactly "AzureBastionSubnet")
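The usable-address arithmetic can be checked with Python's `ipaddress` module. The constant of 5 reflects the addresses Azure reserves in every subnet (network address, default gateway, two addresses for Azure DNS, and broadcast); the helper names are hypothetical.

```python
import ipaddress

AZURE_RESERVED = 5  # .0 network, .1 gateway, .2/.3 Azure DNS, last address broadcast


def usable_hosts(cidr: str) -> int:
    """Addresses you can assign to resources in an Azure subnet of this size."""
    return ipaddress.ip_network(cidr).num_addresses - AZURE_RESERVED


def reserved_addresses(cidr: str) -> list:
    net = ipaddress.ip_network(cidr)
    return [str(net[i]) for i in (0, 1, 2, 3, -1)]


vnet = ipaddress.ip_network("10.0.0.0/16")
subnets = {
    "WebSubnet": "10.0.1.0/24",
    "GatewaySubnet": "10.0.4.0/27",
    "AzureBastionSubnet": "10.0.5.0/26",
}
for name, cidr in subnets.items():
    assert ipaddress.ip_network(cidr).subnet_of(vnet), f"{name} is outside the VNet"
    print(f"{name:20} {cidr:14} usable={usable_hosts(cidr)}")
print(reserved_addresses("10.0.1.0/24"))
# ['10.0.1.0', '10.0.1.1', '10.0.1.2', '10.0.1.3', '10.0.1.255']
```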

VNet peering:
→ Connect two VNets privately through Microsoft backbone (no internet)
→ Can be same region (VNet Peering) or different regions (Global VNet Peering)
→ NOT transitive: A↔B, B↔C does NOT mean A↔C
   Solution: hub-spoke topology or Azure Virtual WAN
→ Address spaces of the peered VNets must not overlap
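Both peering rules (no overlapping address spaces, no transitivity) can be illustrated with a hypothetical hub-spoke layout in which B is the hub and A and C are spokes peered only with B. The VNet names, CIDRs, and helper functions are assumptions for illustration.

```python
import ipaddress
from itertools import combinations

# Hypothetical hub-spoke: B is the hub, A and C are spokes peered only with B.
VNETS = {"A": "10.1.0.0/16", "B": "10.2.0.0/16", "C": "10.3.0.0/16"}
PEERINGS = {frozenset({"A", "B"}), frozenset({"B", "C"})}


def can_peer(x: str, y: str) -> bool:
    """A peering can only be created when the two address spaces do not overlap."""
    return not ipaddress.ip_network(VNETS[x]).overlaps(ipaddress.ip_network(VNETS[y]))


def directly_connected(x: str, y: str) -> bool:
    """Not transitive: A-B and B-C peerings do not connect A and C."""
    return frozenset({x, y}) in PEERINGS


for x, y in combinations(sorted(VNETS), 2):
    print(x, y, "can peer" if can_peer(x, y) else "overlap",
          "| peered" if directly_connected(x, y) else "| not peered")
```

A and C can still be peered with each other (their ranges do not overlap), but until that peering exists, or traffic is routed through the hub, they cannot talk.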

Service endpoints:
→ Extend VNet identity to Azure PaaS services (Storage, SQL, Key Vault)
→ Traffic stays on Microsoft backbone (not internet)
→ PaaS service can be locked to your VNet's traffic only

Private Endpoints:
→ Gives the PaaS service a private IP address from a subnet in your VNet
→ Public network access on the PaaS resource can then be disabled
→ Name resolution via a Private DNS Zone (e.g., privatelink.blob.core.windows.net
  for Blob storage)
→ More secure than Service Endpoints — completely private
→ Recommended over Service Endpoints for production

What are Network Security Groups (NSGs) and how do they work?

NSG (Network Security Group):
→ Stateful firewall controlling inbound/outbound traffic
→ Applied to: subnet (all resources in subnet) or NIC (specific VM)

NSG rule components:
Priority:          100-4096 (lower number = higher priority)
Name:              descriptive name
Source:            IP, CIDR range, Service Tag, or Application Security Group
Source port:       * or specific port/range
Destination:       IP, CIDR range, Service Tag, or Application Security Group
Destination port:  80, 443, 3389, * or range
Protocol:          TCP, UDP, ICMP, or Any
Action:            Allow or Deny

Default rules (cannot be deleted, lowest priority):
Inbound:  AllowVNetInbound (65000), AllowAzureLoadBalancerInbound (65001),
          DenyAllInbound (65500)
Outbound: AllowVNetOutbound (65000), AllowInternetOutbound (65001),
          DenyAllOutbound (65500)
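Inbound evaluation can be sketched as "sort by priority, first match decides". This model checks only source address and destination port (no protocol, source port, or destination address, and no connection state), approximates the VirtualNetwork service tag with the example VNet's 10.0.0.0/16 range, and uses a hypothetical custom rule named "Allow-HTTPS".

```python
import ipaddress
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Rule:
    priority: int             # 100-4096 for custom rules; lower number wins
    name: str
    source: str               # CIDR or "*"; service tags are not modelled
    dest_port: Optional[int]  # None means any port
    action: str               # "Allow" or "Deny"


# Default inbound rules (VirtualNetwork tag approximated by the example VNet CIDR).
DEFAULT_INBOUND = [
    Rule(65000, "AllowVnetInBound", "10.0.0.0/16", None, "Allow"),
    Rule(65001, "AllowAzureLoadBalancerInBound", "168.63.129.16/32", None, "Allow"),
    Rule(65500, "DenyAllInBound", "*", None, "Deny"),
]


def evaluate(rules: List[Rule], src_ip: str, dest_port: int) -> str:
    """Check rules in ascending priority order; the first match decides."""
    for rule in sorted(rules, key=lambda r: r.priority):
        source_ok = rule.source == "*" or ipaddress.ip_address(src_ip) in ipaddress.ip_network(rule.source)
        port_ok = rule.dest_port is None or rule.dest_port == dest_port
        if source_ok and port_ok:
            return f"{rule.action} ({rule.name})"
    return "Deny (no rule matched)"  # unreachable while DenyAllInBound is present


rules = [Rule(100, "Allow-HTTPS", "*", 443, "Allow")] + DEFAULT_INBOUND
print(evaluate(rules, "203.0.113.10", 443))   # Allow (Allow-HTTPS)
print(evaluate(rules, "203.0.113.10", 3389))  # Deny (DenyAllInBound)
print(evaluate(rules, "10.0.2.4", 3389))      # Allow (AllowVnetInBound)
```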

Service Tags (pre-defined IP ranges maintained by Microsoft):
Internet:           all public IP addresses
VirtualNetwork:     VNet address space + peered VNets + on-prem
AzureLoadBalancer:  Azure Load Balancer's health probe IP (168.63.129.16)
Storage:            Azure Storage IP ranges
Sql:                Azure SQL IP ranges
AppService:         App Service IP ranges

Application Security Groups (ASG):
→ Group VMs logically (WebServers, AppServers, DBServers)
→ Write NSG rules referencing ASG instead of IP addresses
→ As VMs are added to ASG, rules apply automatically — no IP updates needed

NSG best practices:
→ Apply at subnet level (not NIC) for simpler management
→ Never allow RDP (3389) or SSH (22) from Internet — use Azure Bastion
→ Use Service Tags instead of hard-coded IP ranges
→ Use ASGs for application-tier segmentation

What are Azure Load Balancer and Application Gateway?

Azure Load Balancer (Layer 4 — TCP/UDP):
→ Distributes inbound flows across backend pool VMs
→ Layer 4 (transport layer) — balances based on IP + port
→ No SSL termination, no URL routing, no WAF

Types:
Public LB:   balances internet-facing traffic
Internal LB: balances traffic within a VNet (no public IP)

SKUs:
Basic:   no SLA, no AZ support; retired on 30 September 2025, migrate to Standard
Standard: zone-redundant, secure by default, SLA 99.99%, HTTPS health probes

Standard LB components:
Frontend IP:    public or private IP that receives traffic
Backend pool:   VMs or VMSS that receive distributed traffic
Health probe:   TCP, HTTP, or HTTPS check — removes unhealthy VMs
Load balancing rule: frontend port → backend port mapping
Inbound NAT rule:   direct specific port to specific VM (e.g., RDP to VM1)

Application Gateway (Layer 7 — HTTP/HTTPS):
→ Web traffic load balancer with SSL termination, URL routing, WAF
→ Layer 7 (application layer) — makes routing decisions based on URL/headers

Key features:
SSL termination:     decrypts HTTPS at the gateway — backend can use HTTP
URL path routing:    /images → ImageServers pool, /api → APIServers pool
Host-based routing:  store.contoso.com → StoreFrontend, api.contoso.com → APIBackend
WAF (Web Application Firewall): protects against OWASP Top 10 attacks (SQLi, XSS, etc.)
Autoscaling:         Application Gateway v2 scales automatically
Cookie-based affinity: stick user sessions to same backend server
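Host-based and path-based routing can be combined: the host name selects a listener, then the path selects a backend pool, with a default pool when no path matches. This sketch uses a hypothetical routing table built from the examples above and assumes the first listed matching path wins; a trailing `*` matches any suffix.

```python
from typing import Optional

# Hypothetical routing table combining the host and path examples above.
HOST_LISTENERS = {
    "store.contoso.com": {
        "paths": [("/images/*", "ImageServers"), ("/api/*", "APIServers")],
        "default": "StoreFrontend",
    },
    "api.contoso.com": {"paths": [], "default": "APIBackend"},
}


def path_matches(pattern: str, path: str) -> bool:
    """A trailing '*' matches any suffix; otherwise the path must match exactly."""
    if pattern.endswith("*"):
        return path.startswith(pattern[:-1])
    return path == pattern


def route(host: str, path: str) -> Optional[str]:
    """Pick the listener by host name, then the backend pool by URL path."""
    listener = HOST_LISTENERS.get(host)
    if listener is None:
        return None  # no listener configured for this host name
    for pattern, pool in listener["paths"]:  # assumption: first listed match wins
        if path_matches(pattern, path):
            return pool
    return listener["default"]


print(route("store.contoso.com", "/images/logo.png"))  # ImageServers
print(route("store.contoso.com", "/checkout"))         # StoreFrontend
```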

Decision:
Non-HTTP traffic (TCP/UDP)       → Azure Load Balancer
HTTP/HTTPS with URL routing/WAF  → Application Gateway
Global HTTP routing (multi-region) → Azure Front Door
DNS-based routing (any protocol)  → Azure Traffic Manager

What is Azure VPN Gateway and ExpressRoute?

Azure VPN Gateway:
→ Connects on-premises networks to Azure via encrypted VPN tunnel (IPsec/IKE)
→ Requires GatewaySubnet in the VNet

VPN types:
Site-to-Site (S2S):
→ Persistent IPsec tunnel between on-prem VPN device and Azure
→ Traffic encrypted over public internet
→ Bandwidth: up to 10 Gbps (depends on SKU)
→ Use for: branch office connectivity, hybrid workloads

Point-to-Site (P2S):
→ Individual client devices connect to Azure VNet via VPN
→ Supports: OpenVPN, SSTP, IKEv2
→ Use for: remote workers connecting to Azure resources

VPN Gateway SKUs:
Basic:      dev/test, limited bandwidth, no zone-redundancy
VpnGw1-5:  production, varying bandwidth (650 Mbps to 10 Gbps)
VpnGw1-5AZ: zone-redundant (recommended for production)

Azure ExpressRoute:
→ Private, dedicated connection from on-prem to Azure (NOT over internet)
→ Through a connectivity provider (BT, Equinix, Megaport)
→ Bandwidth: 50 Mbps to 100 Gbps
→ Latency: consistent, predictable (dedicated circuit)
→ SLA: 99.95% uptime

ExpressRoute vs VPN:
                    VPN Gateway          ExpressRoute
Connectivity:       Over internet        Private (dedicated)
Encryption:         Yes (IPsec)          Not by default (MACsec/IPsec optional)
Bandwidth:          Up to 10 Gbps        Up to 100 Gbps
Latency:            Variable             Consistent/low
Cost:               Lower                Higher
Use for:            Dev/test, smaller    Production, regulated,
                    workloads            high bandwidth needs

5. Monitor and Maintain Azure Resources (10–15%)

What is Azure Monitor and what does it collect?

Azure Monitor:
→ Comprehensive monitoring solution for Azure resources
→ Collects: metrics, logs, distributed traces, changes

Data types:
Metrics:    numerical time-series data (CPU %, disk IOPS, request count)
            stored 93 days in Azure Monitor Metrics store
            typically collected at 1-minute granularity
Logs:       structured/unstructured text data (activity logs, resource logs)
            stored in Log Analytics workspace (configurable retention)
            queried with Kusto Query Language (KQL)
Traces:     distributed transaction traces (App Insights)
Changes:    resource configuration changes (Change Analysis)

Azure Monitor sources:
Activity Log:    ALL control plane operations in a subscription
                 (who created/deleted/modified which resource)
                 90 days retention, export to Log Analytics for longer
Resource logs:   diagnostic logs from Azure resources
                 (VM guest OS, Storage analytics, SQL query logs)
                 not collected by default — must enable Diagnostic Settings
Guest OS metrics: CPU, memory, disk from inside the VM
                  requires Azure Monitor Agent installed on VM
Application logs: custom app telemetry via Application Insights SDK

Azure Monitor Agent (AMA):
→ Replaces deprecated Log Analytics agent (MMA) and Diagnostics extension
→ Collects: Windows Event Logs, Linux Syslog, performance counters
→ Configured via Data Collection Rules (DCRs) — flexible, per-VM rules
→ Deploy via Azure Policy for all VMs or Arc-connected servers

What is Log Analytics and KQL (Kusto Query Language)?

Log Analytics workspace:
→ Central repository for all Azure Monitor log data
→ All resource diagnostic logs, VM logs, Activity Log sent here
→ Queried using KQL
→ Retention: 30 days default, configurable to 2 years (longer via archive)

KQL basics (most common exam patterns):
// Get VMs whose average CPU exceeded 90% in any hour of the last 24 hours:
Perf
| where TimeGenerated > ago(24h)
| where ObjectName == "Processor" and CounterName == "% Processor Time"
| summarize AvgCPU = avg(CounterValue) by Computer, bin(TimeGenerated, 1h)
| where AvgCPU > 90
| order by AvgCPU desc

// Find all failed login attempts:
SecurityEvent
| where TimeGenerated > ago(7d)
| where EventID == 4625  // Failed logon
| summarize count() by Account, Computer
| order by count_ desc

// Storage account operations:
StorageBlobLogs
| where TimeGenerated > ago(1d)
| where OperationName == "DeleteBlob"
| project TimeGenerated, CallerIpAddress, Uri, StatusCode

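// Join two tables: deleted VMs and the last heartbeat each one sent
AzureActivity
| where OperationNameValue =~ "Microsoft.Compute/virtualMachines/delete"
| extend VmId = tolower(_ResourceId)
| join kind=inner (
    Heartbeat
    | summarize LastSeen = max(TimeGenerated) by VmId = tolower(_ResourceId), Computer
  ) on VmId
| project TimeGenerated, Caller, Computer, LastSeen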

KQL operators:
where:      filter rows
project:    select columns
summarize:  aggregate (count, sum, avg, max, min)
order by:   sort results
join:       combine tables
extend:     add calculated columns
ago():      time ago (ago(1h), ago(7d), ago(30d))
bin():      group time into buckets (bin(TimeGenerated, 1h))

What is Azure Backup and how does it work?

Azure Backup:
→ Cloud-based backup service — no on-prem backup infrastructure needed
→ Central management: Recovery Services vault or Backup vault (newer workloads such as Azure Blobs and Azure Disks)

What can be backed up:
Azure VMs:          full VM backup (OS + data disks) — agent-less
Azure SQL in VM:    workload-aware backup (transaction log backup)
Azure Files:        file share snapshots
Azure Blobs:        operational backup (point-in-time restore)
On-prem Windows:    via MARS agent → files, folders, system state
On-prem servers:    via Azure Backup Server (MABS) or DPM

Azure VM Backup:
→ Snapshot → transferred to Recovery Services Vault
→ Crash-consistent: captures disk state at a single point in time (as after a power loss), no app quiescing
→ Application-consistent: quiesce app before snapshot (VSS on Windows)
→ Retention: up to 9,999 recovery points
→ Restore options: Create new VM | Replace existing disk | Restore files

Recovery Services Vault key settings:
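Backup policy:      frequency (daily/weekly; hourly with the Enhanced policy)
                    + retention schedule (daily, weekly, monthly and yearly points;
                    yearly points can be kept for up to 99 years)
Soft delete:        deleted backup data kept for 14 days by default (up to 180 days
                    with enhanced soft delete) and recoverable during that window;
                    protects against ransomware/malicious deletion. Set it to
                    "always-on" so it cannot be switched off
Cross-region restore: restore to the paired region for DR testing (requires a GRS vault)
Immutable vault:    blocks operations that would delete recovery points or shorten
                    their retention; once locked, immutability cannot be disabled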

Azure Site Recovery (ASR):
→ Disaster Recovery — replicate VMs to another Azure region
→ Continuous replication → low RPO (crash-consistent recovery points every few minutes)
→ Failover: redirect traffic to replica VMs in DR region
→ Failback: return to primary region after incident resolved
→ Test failover: validate DR plan without impacting production

RTO vs RPO:
RTO (Recovery Time Objective):  how long before service is restored?
RPO (Recovery Point Objective): how much data loss is acceptable?
ASR typically achieves: RPO < 15 minutes, RTO < 2 hours
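A quick way to reason about RPO is the actual data-loss window: the gap between the last recovery point and the moment of failure. The sketch below assumes recovery points are taken at a fixed interval, comparing a hypothetical nightly backup at 02:00 with a hypothetical 5-minute replication interval; real ASR recovery-point timing varies.

```python
from datetime import datetime, timedelta


def last_recovery_point(first: datetime, interval: timedelta, failure: datetime) -> datetime:
    """Most recent recovery point at or before the failure time."""
    if failure < first:
        raise ValueError("failure happened before the first recovery point")
    return first + ((failure - first) // interval) * interval


def data_loss(first: datetime, interval: timedelta, failure: datetime) -> timedelta:
    """Everything written after the last recovery point is lost."""
    return failure - last_recovery_point(first, interval, failure)


if __name__ == "__main__":
    failure = datetime(2026, 5, 17, 23, 33)
    nightly = data_loss(datetime(2026, 5, 17, 2, 0), timedelta(days=1), failure)
    replicated = data_loss(datetime(2026, 5, 17, 0, 0), timedelta(minutes=5), failure)
    print("nightly backup loses", nightly)        # 21:33:00
    print("5-min replication loses", replicated)  # 0:03:00
```

The same failure costs over 21 hours of data with a nightly backup but only minutes with continuous replication, which is why Azure Backup and ASR answer different requirements.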

6. ARM Templates, Bicep & Automation

What are ARM templates and Bicep?

ARM Templates (JSON):
→ Declarative infrastructure-as-code for Azure resources
→ Define WHAT you want — Azure figures out HOW to create it
→ Idempotent: apply same template multiple times → same result
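The declarative, idempotent model can be illustrated with a toy reconciler: compare the desired state (the template) with the actual state (what exists in the resource group) and derive only the changes needed. This is a conceptual sketch, not ARM's real deployment engine, and the resource names and properties are hypothetical. It also shows ARM's two deployment modes: Incremental (the default, which leaves resources not in the template alone) and Complete (which deletes them).

```python
State = dict[str, dict]  # resource name -> properties


def plan(desired: State, actual: State, mode: str = "Incremental") -> list[tuple[str, str]]:
    """Return (action, resource) pairs needed to make actual match desired."""
    if mode not in ("Incremental", "Complete"):
        raise ValueError("mode must be 'Incremental' or 'Complete'")
    actions = []
    for name, props in desired.items():
        if name not in actual:
            actions.append(("create", name))
        elif actual[name] != props:
            actions.append(("update", name))
    if mode == "Complete":
        # Complete mode removes anything the template does not declare.
        actions += [("delete", name) for name in actual if name not in desired]
    return actions


def apply(desired: State, actual: State, mode: str = "Incremental") -> State:
    """Apply the plan and return the new actual state."""
    new_state = dict(actual)
    for action, name in plan(desired, actual, mode):
        if action == "delete":
            del new_state[name]
        else:
            new_state[name] = dict(desired[name])
    return new_state


if __name__ == "__main__":
    template = {"myVM": {"vmSize": "Standard_D2s_v3"}}
    rg = {"oldDisk": {"sizeGb": 128}}
    once = apply(template, rg)
    print(plan(template, once))                     # [] -> redeploying is a no-op
    print(apply(template, once) == once)            # True (idempotent)
    print(sorted(apply(template, rg, "Complete")))  # ['myVM']
```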

ARM template structure:
{
  "$schema": "https://schema.management.azure.com/schemas/...",
  "contentVersion": "1.0.0.0",
  "parameters": {         // inputs to the template
    "vmName": { "type": "string", "defaultValue": "myVM" },
    "vmSize": { "type": "string", "allowedValues": ["Standard_D2s_v3"] }
  },
  "variables": {          // computed values
    "storageAccountName": "[concat('storage', uniqueString(resourceGroup().id))]"
  },
  "resources": [          // resources to deploy
    {
      "type": "Microsoft.Compute/virtualMachines",
      "apiVersion": "2023-03-01",
      "name": "[parameters('vmName')]",
      "location": "[resourceGroup().location]",
      "properties": { ... }
    }
  ],
  "outputs": {            // values to return after deployment
    "vmPublicIp": { "type": "string", "value": "[reference(...).ipAddress]" }
  }
}

Bicep (recommended over JSON ARM):
→ Domain-specific language that transpiles to ARM JSON, so it has the same capabilities
→ Cleaner syntax, better IntelliSense, type safety; easier to write and review

// Bicep equivalent of the template above:
param vmName string = 'myVM'

@allowed([
  'Standard_D2s_v3'
])
param vmSize string = 'Standard_D2s_v3'

var storageAccountName = 'storage${uniqueString(resourceGroup().id)}'

resource vm 'Microsoft.Compute/virtualMachines@2023-03-01' = {
  name: vmName
  location: resourceGroup().location
  properties: {
    hardwareProfile: {
      vmSize: vmSize
    }
    ...
  }
}

output vmId string = vm.id

Deploy ARM/Bicep:
# Azure CLI:
az deployment group create \
  --resource-group "MyRG" \
  --template-file "main.bicep" \
  --parameters vmName="MyVM"

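# PowerShell (the backtick ` continues a line; deploying .bicep files
# requires the Bicep CLI to be installed):
New-AzResourceGroupDeployment `
  -ResourceGroupName "MyRG" `
  -TemplateFile "main.bicep" `
  -vmName "MyVM"   # template parameters become dynamic parameters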

What is Azure Automation and what are Runbooks?

Azure Automation:
→ Cloud-based automation service for process automation, configuration,
  update management, and desired state configuration

Components:
Runbooks:          scripts run on demand, on a schedule or via webhook
                   Types: PowerShell, PowerShell Workflow, Python, Graphical
Schedules:         trigger runbooks on a time-based schedule
Webhooks:          trigger runbooks via HTTP POST (from alerts, Logic Apps)
Managed Identity:  authenticate runbooks to Azure without stored credentials
Hybrid Runbook Worker: run runbooks on on-prem servers (not just Azure)

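Update Management (retired 31 August 2024; replaced by Azure Update Manager,
which needs no Automation account or Log Analytics agent):
→ Assess and deploy OS updates to Azure VMs and on-prem/Arc-enabled servers
→ Scheduled maintenance windows for controlled patching
→ Compliance reporting: which VMs are missing updates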

Desired State Configuration (DSC):
→ Ensure VMs maintain a specific configuration (PowerShell DSC)
→ E.g., ensure IIS is always installed, specific registry keys are set
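→ Nodes pull their assigned configuration from Azure Automation (the pull server),
  report compliance, and remediate drift when set to ApplyAndAutoCorrect
  (successor service: Azure Machine Configuration)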

Common Runbook scenarios:
→ Start/stop VMs on schedule (cost saving)
→ Clean up old snapshots/resources
→ Auto-remediate Azure Policy non-compliance
→ Scale App Service Plan based on business hours
→ Rotate storage account keys and update Key Vault

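# PowerShell Runbook example: stop (deallocate) all running VMs in a resource group.
# The Automation account's managed identity needs e.g. Virtual Machine Contributor on the RG.
Connect-AzAccount -Identity | Out-Null  # Use the managed identity: no stored credentials

$rg  = "Dev-RG"
$vms = Get-AzVM -ResourceGroupName $rg -Status |
       Where-Object { $_.PowerState -eq "VM running" }

foreach ($vm in $vms) {
    # Stop-AzVM deallocates by default, so compute billing stops
    Stop-AzVM -Name $vm.Name -ResourceGroupName $rg -Force
    Write-Output "Stopped: $($vm.Name)"
}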

7. Scenario-Based Questions

Scenario: Design a highly available web application architecture on Azure.

Requirement: 99.99% SLA, auto-scaling, DDoS protection, private backend

Architecture:
Internet
  ↓
[Azure DDoS Network Protection] (formerly DDoS Protection Standard): protects the public endpoint
  ↓
[Application Gateway v2 + WAF] — Zone-redundant, SSL termination,
  ↓                               URL routing, OWASP protection
[VM Scale Set — WebTier]        — Windows/Linux web servers
  3 zones × auto-scale (2-20 instances)
  App Gateway health probes send traffic only to healthy instances
  ↓
[Internal Load Balancer]        — distributes to app tier
  ↓
[VM Scale Set — AppTier]        — application servers
  3 zones × auto-scale (2-10 instances)
  ↓
[Azure SQL — Business Critical] — zone-redundant, auto-failover group
  + Azure Cache for Redis        — session cache, query cache

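Networking:
VNet: 10.0.0.0/16
  AppGwSubnet:        10.0.0.0/24 (Application Gateway v2 only; it requires a dedicated subnet)
  WebSubnet:          10.0.1.0/24 (Web tier)
  AppSubnet:          10.0.2.0/24 (App tier)
  DataSubnet:         10.0.3.0/24 (SQL private endpoint)
  GatewaySubnet:      10.0.4.0/27 (for future VPN/ExpressRoute; this exact name is required)
  AzureBastionSubnet: 10.0.5.0/26 (Azure Bastion; this exact name, /26 or larger)

Security:
NSG on AppGwSubnet: allow 80/443 from Internet, plus inbound 65200-65535 from the
                    GatewayManager service tag (required by App Gateway v2)
NSG on WebSubnet:   allow 80/443 only from AppGwSubnet, deny all else inbound
NSG on AppSubnet:   allow only from WebSubnet (no direct internet)
NSG on DataSubnet:  allow only from AppSubnet on SQL port 1433 (enable network
                    policies for private endpoints on the subnet so the NSG applies)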
Private Endpoint:  Azure SQL accessed via private IP in DataSubnet
Bastion:           Azure Bastion for admin RDP/SSH (no public IP on VMs)
Key Vault:         store connection strings, API keys (no secrets in config)
Managed Identity:  VMs access Key Vault and Storage — no credentials in code
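Before deploying an address plan like the one above, it helps to check that every subnet sits inside the VNet, that no two subnets overlap, and how many addresses are actually usable: Azure reserves five addresses in every subnet (the first four and the last). The sketch below uses Python's standard ipaddress module with the WebSubnet, AppSubnet, DataSubnet and gateway-subnet prefixes from the plan (the gateway subnet is written GatewaySubnet, the name Azure requires); other subnets can be added to the dictionary the same way.

```python
import ipaddress
from itertools import combinations

AZURE_RESERVED_PER_SUBNET = 5  # network address, 3 Azure-reserved, broadcast


def check_plan(vnet_cidr: str, subnets: dict[str, str]) -> dict[str, int]:
    """Validate a VNet address plan and return usable IPs per subnet."""
    vnet = ipaddress.ip_network(vnet_cidr)
    nets = {name: ipaddress.ip_network(cidr) for name, cidr in subnets.items()}
    for name, net in nets.items():
        if not net.subnet_of(vnet):
            raise ValueError(f"{name} {net} is outside {vnet}")
    for (a, na), (b, nb) in combinations(nets.items(), 2):
        if na.overlaps(nb):
            raise ValueError(f"{a} {na} overlaps {b} {nb}")
    return {name: net.num_addresses - AZURE_RESERVED_PER_SUBNET for name, net in nets.items()}


if __name__ == "__main__":
    plan = {
        "WebSubnet": "10.0.1.0/24",
        "AppSubnet": "10.0.2.0/24",
        "DataSubnet": "10.0.3.0/24",
        "GatewaySubnet": "10.0.4.0/27",
    }
    for name, usable in check_plan("10.0.0.0/16", plan).items():
        print(f"{name}: {usable} usable addresses")
```

Each /24 leaves 251 usable addresses and the /27 leaves 27, which matters when sizing scale sets that may grow to 20 instances.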

Scenario: A VM cannot connect to Azure SQL Database. How do you troubleshoot?

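Check NSG rules: VM's subnet NSG and NIC NSG. Is outbound port 1433 allowed to the SQL server IP or the Sql service tag? From inside Azure, the default Redirect connection policy also needs outbound ports 11000-11999.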
Check SQL firewall: Azure SQL → Networking → is the VM's subnet added as a VNet rule (Service Endpoint) or is there a private endpoint? Is public access disabled?
Check routing: is there a custom UDR (User Defined Route) redirecting SQL traffic through a firewall/NVA? Ensure the route to SQL doesn't black-hole traffic.

Test connectivity from VM:

# Test TCP connection on port 1433:
Test-NetConnection -ComputerName "myserver.database.windows.net" -Port 1433
# Check for: TcpTestSucceeded: True

Check Private DNS: if using a Private Endpoint, the VM must resolve myserver.database.windows.net to the private IP (10.x.x.x), not the public IP. Check that the privatelink.database.windows.net Private DNS zone holds an A record for the server and is linked to the VM's VNet.
Check SQL authentication: connection string correct? Entra ID auth vs SQL auth? User has permissions on the database?
Network Watcher IP Flow Verify: simulate traffic from the VM to SQL on port 1433 and see which NSG rule allows or blocks it.
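On a Linux VM without PowerShell, the DNS and TCP checks above can be combined in a short script. This is a sketch using only the Python standard library: it reports whether the server name resolves to a private address (as it should with a Private Endpoint and a linked Private DNS zone) and whether a TCP connection to port 1433 succeeds. It does not test SQL authentication, and myserver is a placeholder name.

```python
import ipaddress
import socket


def check_sql_endpoint(host: str, port: int = 1433, timeout: float = 3.0) -> dict:
    """Resolve host, flag whether it maps to a private IP, and try a TCP connect."""
    result = {"host": host, "ip": None, "private": None, "tcp_ok": False, "error": None}
    try:
        ip = socket.getaddrinfo(host, port, proto=socket.IPPROTO_TCP)[0][4][0]
    except socket.gaierror as exc:
        result["error"] = f"DNS resolution failed: {exc}"
        return result
    result["ip"] = ip
    result["private"] = ipaddress.ip_address(ip).is_private
    try:
        with socket.create_connection((ip, port), timeout=timeout):
            result["tcp_ok"] = True
    except OSError as exc:
        result["error"] = f"TCP connect failed: {exc}"
    return result


if __name__ == "__main__":
    # Replace with your server name; with a Private Endpoint, expect private=True.
    print(check_sql_endpoint("myserver.database.windows.net"))
```

A result with private=False points at DNS (zone missing or not linked), while private=True with tcp_ok=False points at NSGs, UDRs or the SQL firewall.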

Scenario: Reduce Azure costs for a development environment running 24/7.

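Auto-shutdown: enable VM auto-shutdown at 6pm (it only stops the VM; use an Azure Automation runbook or Start/Stop VMs v2 to start it again at 8am). With the VM off 14 of every 24 hours, this saves ~58% of VM compute cost (managed disks are still billed).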
Spot VMs for dev workloads: use Azure Spot VMs for dev/test (up to 90% discount) — acceptable for non-critical workloads that can tolerate interruption.
B-series VMs: switch dev VMs to B-series (burstable) — much cheaper than D-series for low-average-load workloads.
Dev/Test pricing: activate Dev/Test subscription pricing for non-production workloads — discounted Windows Server and SQL licensing.
Azure Hybrid Benefit: if you have existing Windows Server or SQL Server licences with Software Assurance → apply Hybrid Benefit on VMs to eliminate licence cost.
Reserved Instances: for always-on resources (build server, dev database) → 1-year reservation saves ~40%.
Right-size: Azure Advisor Cost recommendations → identify over-provisioned VMs with <5% average CPU utilisation → downsize.
Storage lifecycle policies: dev storage → Cool tier after 30 days, Archive after 90 days for old test data.
Delete unused resources: set Azure Policy to audit/alert on resources without required tags. Runbook: delete resources tagged Environment=Dev older than 30 days with no activity.
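The ~58% figure follows from the schedule: a VM that runs 08:00-18:00 is off for 14 of every 24 hours. The small calculation below is a sketch covering compute hours only (disks keep billing while a VM is deallocated); it also shows the effect of keeping dev VMs off at weekends.

```python
def shutdown_savings(start_hour: int, stop_hour: int, days_on_per_week: int = 7) -> float:
    """Fraction of weekly compute hours saved by running only start..stop on some days."""
    if not (0 <= start_hour < stop_hour <= 24 and 0 <= days_on_per_week <= 7):
        raise ValueError("invalid schedule")
    hours_on = (stop_hour - start_hour) * days_on_per_week
    return 1 - hours_on / (24 * 7)


if __name__ == "__main__":
    print(f"Every day:     {shutdown_savings(8, 18):.1%}")     # 58.3%
    print(f"Weekdays only: {shutdown_savings(8, 18, 5):.1%}")  # 70.2%
```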

Scenario: How do you implement a backup strategy for Azure VMs in a regulated industry?

Requirements: 7-year retention, immutable backup, cross-region DR, audit trail

Implementation:
1. Recovery Services Vault per region (Primary + Secondary)

2. Backup policy:
   Daily backup:   retained 30 days
   Weekly backup:  retained 1 year (every Sunday)
   Monthly backup: retained 7 years (first Sunday of month)
   Yearly backup:  retained 7 years (January 1st)

3. Soft delete: enabled (default 14 days)
   Enhanced soft delete: up to 180 days for regulated environments

4. Immutable vault (locked):
   → Blocks operations that would delete recovery points or shorten their retention;
     locking is irreversible
   → Protects against ransomware that tries to delete backups
   → Enable: Vault → Properties → Immutability → Locked

5. Cross-region restore: enabled on vault (requires GRS vault storage)
   → Restore to secondary region for DR testing

6. Azure Monitor + Log Analytics:
   → Diagnostic settings on vault → Log Analytics
   → Alert on: backup job failures, unexpected deletions
   → KQL query: AzureDiagnostics | where Category == "AzureBackupReport"
                 | where OperationName == "DeleteBackupData"

7. Azure Backup Reports (Azure Monitor workbook over vault diagnostics in Log Analytics):
   → Track compliance: which VMs are not backed up?
   → Report on backup storage consumption
   → Evidence for compliance audits

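8. Azure Policy:
   → Audit: built-in policy "Azure Backup should be enabled for Virtual Machines"
   → Auto-enrol new VMs: assign a built-in DeployIfNotExists policy such as
     "Configure backup on virtual machines without a given tag to an existing
     recovery services vault in the same location"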
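To see how the step 2 schedule tags individual recovery points, here is a small sketch. It assumes one backup per day, weekly points on Sunday, monthly points on the first Sunday of the month and yearly points on 1 January, exactly as listed; years are approximated as 365 days and "1 year" of weeklies as 52 weeks. Azure Backup applies these rules itself; this only models them.

```python
from datetime import date, timedelta

# Retention periods from the step 2 policy (approximations noted above).
RETENTION = {
    "daily": timedelta(days=30),
    "weekly": timedelta(weeks=52),
    "monthly": timedelta(days=7 * 365),
    "yearly": timedelta(days=7 * 365),
}


def retention_tags(d: date) -> list[str]:
    """Which retention rules apply to the backup taken on day d."""
    tags = ["daily"]
    if d.weekday() == 6:  # Sunday
        tags.append("weekly")
        if d.day <= 7:  # the first Sunday of a month falls on day 1-7
            tags.append("monthly")
    if d.month == 1 and d.day == 1:
        tags.append("yearly")
    return tags


def expiry(d: date) -> date:
    """A recovery point is kept until its longest applicable retention ends."""
    return d + max(RETENTION[t] for t in retention_tags(d))


if __name__ == "__main__":
    for d in (date(2026, 1, 1), date(2026, 5, 3), date(2026, 5, 17), date(2026, 5, 18)):
        print(d, retention_tags(d), "kept until", expiry(d))
```

A Monday backup expires after 30 days, while a first-Sunday backup is kept for roughly 7 years, which is what drives long-term storage cost in this policy.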

8. Cheat Sheet — Quick Reference

AZ-104 Exam Domain Weights

Domain                                      Weight    Priority
Manage Azure identities and governance      20-25%    ⭐⭐⭐⭐⭐
Deploy and manage Azure compute resources   20-25%    ⭐⭐⭐⭐⭐
Implement and manage virtual networking     15-20%    ⭐⭐⭐⭐
Implement and manage storage                15-20%    ⭐⭐⭐⭐
Monitor and maintain Azure resources        10-15%    ⭐⭐⭐

VM Availability Quick Reference

Protect against:              Use:
Hardware failure in rack      Availability Set (2+ fault domains)
Datacenter failure            Availability Zone (2+ VMs in 2+ zones)
Region failure                Azure Site Recovery (replicate to paired region)
Instance failure (spread load) Azure Load Balancer or App Gateway (with health probes)

SLAs:
Single VM (Premium SSD):  99.9%
Availability Set:         99.95%
Availability Zone (2+):   99.99%
Zone + Load Balancer:     ~99.98% composite (99.99% × 99.99%)
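SLAs of components that must all be up for the application to work multiply, so a chain is always less available than its weakest link. The sketch below uses 99.99% for both the zonal VMs and the Standard Load Balancer and converts the result into the downtime it permits per 30-day month; check current Microsoft SLA documents before relying on any figure.

```python
from math import prod


def composite_sla(*slas: float) -> float:
    """Availability of components in series: all must be up, so SLAs multiply."""
    return prod(slas)


def monthly_downtime_minutes(sla: float, days: int = 30) -> float:
    """Maximum downtime per month permitted by an SLA."""
    return (1 - sla) * days * 24 * 60


if __name__ == "__main__":
    combined = composite_sla(0.9999, 0.9999)  # zonal VMs, Standard Load Balancer
    print(f"Zone VMs + Load Balancer: {combined:.4%}")  # 99.9800%
    print(f"Allowed downtime: {monthly_downtime_minutes(combined):.1f} min/month")
```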

Storage Tier Decision

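Data access pattern:                         Use tier:
Frequent access (multiple times per day)     Hot
Infrequent access, kept at least 30 days     Cool
Rarely accessed, kept at least 90 days       Cold
Rarely accessed, kept at least 180 days      Archive (offline; rehydration can
                                             take up to 15 hours)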
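Lifecycle management rules express tiering thresholds as days since last modification. The sketch below builds a policy document in the shape Azure Storage lifecycle management uses (rules → definition → actions.baseBlob) with 30, 90 and 180 days, the minimum storage durations for Cool, Cold and Archive, so each blob stays in a tier at least that long before moving on. The rule name tier-by-age and the dev-data/ prefix are hypothetical. The JSON it prints could be saved to a file and applied with az storage account management-policy create --policy @policy.json.

```python
import json


def lifecycle_policy(prefix: str, cool: int = 30, cold: int = 90, archive: int = 180) -> dict:
    """Build a blob lifecycle management policy that tiers block blobs by age."""
    if not 0 < cool < cold < archive:
        raise ValueError("thresholds must increase: cool < cold < archive")
    return {
        "rules": [
            {
                "enabled": True,
                "name": "tier-by-age",
                "type": "Lifecycle",
                "definition": {
                    "filters": {"blobTypes": ["blockBlob"], "prefixMatch": [prefix]},
                    "actions": {
                        "baseBlob": {
                            "tierToCool": {"daysAfterModificationGreaterThan": cool},
                            "tierToCold": {"daysAfterModificationGreaterThan": cold},
                            "tierToArchive": {"daysAfterModificationGreaterThan": archive},
                        }
                    },
                },
            }
        ]
    }


if __name__ == "__main__":
    print(json.dumps(lifecycle_policy("dev-data/"), indent=2))
```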

Redundancy decision:
No DR needed:                 LRS (cheapest)
Zone outage protection:       ZRS (recommended for most prod)
Region outage protection:     GRS or GZRS
Read access to secondary:     RA-GRS or RA-GZRS

Networking Quick Reference

Resource                 Purpose
VNet                     Private network — your isolated Azure network
Subnet                   Subdivide VNet — group resources by tier/function
NSG                      Stateful firewall on subnet or NIC
ASG                      Logical grouping of VMs for NSG rules
VNet Peering             Connect VNets privately (non-transitive)
Private Endpoint         Private IP for PaaS services in your VNet
Service Endpoint         Route PaaS traffic via VNet backbone
Azure Bastion            Secure RDP/SSH without public IP
Azure Firewall           Managed, stateful L4-L7 firewall (centralised)
Load Balancer            L4 TCP/UDP load balancing
Application Gateway      L7 HTTP/HTTPS with WAF and URL routing
VPN Gateway              Encrypted VPN to on-prem (over internet)
ExpressRoute             Private dedicated circuit to on-prem
Azure Front Door         Global HTTP load balancing + CDN + WAF
Traffic Manager          DNS-based global routing (any protocol)
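NSG rules are evaluated in priority order (lowest number first) and the first matching rule decides the outcome; if no custom rule matches, the default rules apply, ending with DenyAllInBound at 65500. The sketch below models inbound evaluation with plain CIDRs and ports. It is a simplification (service tags are replaced by CIDRs, and protocols, ASGs and the separate subnet-then-NIC evaluation are ignored), and the custom rule shown is hypothetical. IP Flow Verify in Network Watcher runs this check against your real rules.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    name: str
    priority: int
    source: str     # CIDR, or "*" for any
    dest_port: str  # single port, "low-high" range, or "*"
    access: str     # "Allow" or "Deny"

    def matches(self, src_ip: str, port: int) -> bool:
        if self.source != "*" and ipaddress.ip_address(src_ip) not in ipaddress.ip_network(self.source):
            return False
        if self.dest_port == "*":
            return True
        low, _, high = self.dest_port.partition("-")
        return int(low) <= port <= int(high or low)


def default_inbound(vnet_cidr: str) -> list[Rule]:
    """Simplified default inbound rules (service tags replaced by CIDRs)."""
    return [
        Rule("AllowVnetInBound", 65000, vnet_cidr, "*", "Allow"),
        Rule("AllowAzureLoadBalancerInBound", 65001, "168.63.129.16/32", "*", "Allow"),
        Rule("DenyAllInBound", 65500, "*", "*", "Deny"),
    ]


def evaluate(rules: list[Rule], src_ip: str, port: int) -> tuple[str, str]:
    """Return (access, rule name) of the first matching rule by priority."""
    for rule in sorted(rules, key=lambda r: r.priority):
        if rule.matches(src_ip, port):
            return rule.access, rule.name
    raise LookupError("no rule matched")  # unreachable while DenyAllInBound is present


if __name__ == "__main__":
    custom = [Rule("Allow-HTTPS-Internet", 100, "*", "443", "Allow")]
    rules = custom + default_inbound("10.0.0.0/16")
    print(evaluate(rules, "203.0.113.7", 443))   # ('Allow', 'Allow-HTTPS-Internet')
    print(evaluate(rules, "203.0.113.7", 3389))  # ('Deny', 'DenyAllInBound')
    print(evaluate(rules, "10.0.2.4", 3389))     # ('Allow', 'AllowVnetInBound')
```

The last line is a reminder that the default AllowVnetInBound rule lets any VNet host reach any port unless a lower-numbered custom rule denies it.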

Useful Azure CLI Commands

# Login and set subscription:
az login
az account set --subscription "Production"

# Create resource group:
az group create --name "Prod-RG" --location "uksouth"

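# Create VM (Windows images take a password; --generate-ssh-keys is for Linux images).
# $ADMIN_PASSWORD is a shell variable you set beforehand:
az vm create --resource-group "Prod-RG" --name "MyVM" \
  --image "Win2022Datacenter" --size "Standard_D2s_v3" \
  --admin-username "azureadmin" \
  --admin-password "$ADMIN_PASSWORD"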

# Start/Stop VM (stop keeps compute allocated and billed; deallocate releases it):
az vm start --resource-group "Prod-RG" --name "MyVM"
az vm stop  --resource-group "Prod-RG" --name "MyVM"
az vm deallocate --resource-group "Prod-RG" --name "MyVM"

# Create storage account (name: globally unique, 3-24 lowercase letters/digits):
az storage account create --name "mystorageacc" \
  --resource-group "Prod-RG" --location "uksouth" \
  --sku "Standard_ZRS" --kind "StorageV2"

# Create VNet and subnet:
az network vnet create --resource-group "Prod-RG" \
  --name "MyVNet" --address-prefix "10.0.0.0/16"
az network vnet subnet create --resource-group "Prod-RG" \
  --vnet-name "MyVNet" --name "WebSubnet" --address-prefix "10.0.1.0/24"

# Apply resource lock:
az lock create --name "ProdLock" --resource-group "Prod-RG" \
  --lock-type "CanNotDelete"

# Deploy Bicep:
az deployment group create \
  --resource-group "Prod-RG" --template-file "main.bicep"

# Query logs (KQL via CLI; --workspace takes the workspace ID GUID, not its name):
az monitor log-analytics query \
  --workspace "<workspace-id-guid>" \
  --analytics-query "AzureActivity | where ActivityStatus == 'Failed' | take 10"

Top 10 Tips

ARM is the control plane for everything — every Azure action (Portal, CLI, PowerShell, Terraform) goes through ARM. Understanding ARM hierarchy (Management Group → Subscription → Resource Group → Resource) is fundamental to every governance, RBAC, and policy question.
Policy controls WHAT, RBAC controls WHO — Azure Policy enforces resource configuration (must use managed disks, must be in specific region). RBAC controls who can deploy/manage resources. Both are needed — not interchangeable.
Availability Zones over Availability Sets for new deployments — Availability Zones protect against datacenter failure (99.99% SLA). Availability Sets only protect within one datacenter (99.95% SLA). Always recommend AZs for production.
Private Endpoints over Service Endpoints for production: Service Endpoints route traffic via the Azure backbone, but the PaaS service still has a public endpoint. Private Endpoints give the PaaS service a private IP in your VNet, so you can disable public access completely. This is more secure and the usual enterprise choice.
ZRS for most production storage: ZRS replicates across 3 availability zones (survives a datacenter failure), while LRS keeps 3 copies within one datacenter. Use GRS/GZRS for cross-region requirements. Choosing the right redundancy option for the scenario is heavily tested.
Blob lifecycle management policies for cost control — move blobs from Hot → Cool → Cold → Archive automatically based on age. This pattern (with the day thresholds) is a common scenario question.
NSG is stateful, so response traffic is automatic: if you allow inbound port 80, the response traffic is automatically allowed outbound. You don't need a separate outbound rule for responses, only rules for the direction in which connections are initiated.
Azure Bastion eliminates public IP on VMs — never open RDP (3389) or SSH (22) to the internet from an NSG. Use Azure Bastion (deployed in AzureBastionSubnet) for secure RDP/SSH via the Azure Portal. This is the correct answer to any "how do you securely administer VMs" question.
Soft delete + Immutable vault = ransomware protection for backups: soft delete keeps deleted backups for 14 days by default (up to 180 with enhanced soft delete). A locked immutable vault blocks deletion of backup data before its retention ends. Together they are the answer to "how do you protect backups against ransomware."
Bicep over JSON ARM for new IaC — Bicep is the modern, recommended IaC language for Azure (compiles to ARM JSON). Cleaner syntax, type safety, better tooling. Know how to deploy Bicep via CLI (az deployment group create --template-file main.bicep).
