Microsoft Azure Administrator (AZ-104) — Complete Guide
Identity & Governance · Storage · Compute · Virtual Networking · Monitoring & Backup · ARM & Bicep · Scenarios · Cheat Sheet
Table of Contents
- Exam Overview & Identity/Governance (20–25%)
- Implement and Manage Storage (15–20%)
- Deploy and Manage Azure Compute (20–25%)
- Implement and Manage Virtual Networking (15–20%)
- Monitor and Maintain Azure Resources (10–15%)
- ARM Templates, Bicep & Automation
- Scenario-Based Questions
- Cheat Sheet — Quick Reference
1. Exam Overview & Identity/Governance (20–25%)
AZ-104 Exam at a Glance
The AZ-104 Microsoft Azure Administrator exam validates expertise in implementing, managing, and monitoring an organisation's Azure environment. It covers virtual networks, storage, compute, identity, security, and governance.
| Skill Domain | Exam Weight |
|---|---|
| Manage Azure identities and governance | 20–25% |
| Deploy and manage Azure compute resources | 20–25% |
| Implement and manage virtual networking | 15–20% |
| Implement and manage storage | 15–20% |
| Monitor and maintain Azure resources | 10–15% |
Prerequisites: There are no formal prerequisites, but 6+ months of hands-on Azure experience is strongly recommended. Familiarity with PowerShell, Azure CLI, the Azure Portal, and ARM/Bicep templates is expected.
What is Azure Resource Manager (ARM) and how does it underpin everything in Azure?
Azure Resource Manager (ARM):
→ The management layer for ALL Azure resources
→ Every action (Portal, CLI, PowerShell, REST API, Terraform) goes through ARM
→ Provides: authentication, authorisation, tagging, locking, templates
ARM concepts:
Subscription: billing and access boundary — contains resource groups
Resource Group: logical container for related Azure resources
resources in an RG should share the same lifecycle
Resource: individual Azure service (VM, storage account, VNet)
Resource Provider: the service that supplies a resource type
Microsoft.Compute (VMs), Microsoft.Network (VNets),
Microsoft.Storage (storage accounts)
ARM operations:
Control plane: manage resources (create, delete, configure) via ARM
Data plane: access resource data (read a blob, send a queue message)
controlled by resource-level access (keys, RBAC data roles)
Resource Group best practices:
→ Group by lifecycle: resources deleted together go in same RG
→ Group by environment: Production-RG, Dev-RG, Test-RG
→ Region: RG has a region (for metadata), but can contain resources
from any region
→ RBAC applied at RG level applies to all resources in the RG
→ Delete RG = delete ALL resources in it (use locks to protect)
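The subscription, resource group, resource provider and resource layers above all show up in a resource's ARM ID. The following Python sketch splits such an ID into its parts. The function name `parse_resource_id` and the sample ID (placeholder subscription GUID, VM name `web01`) are illustrative assumptions, and only simple top-level resource IDs are handled.

```python
def parse_resource_id(resource_id):
    """Split a resource-level ARM ID into its containment layers."""
    parts = resource_id.strip("/").split("/")
    # Expected layout:
    # subscriptions/{sub}/resourceGroups/{rg}/providers/{namespace}/{type}/{name}
    if (
        len(parts) < 8
        or parts[0].lower() != "subscriptions"
        or parts[2].lower() != "resourcegroups"
        or parts[4].lower() != "providers"
    ):
        raise ValueError(f"not a resource-level ARM ID: {resource_id}")
    return {
        "subscription": parts[1],
        "resource_group": parts[3],
        "provider": parts[5],   # e.g. Microsoft.Compute
        "type": parts[6],       # e.g. virtualMachines
        "name": parts[7],
    }


# Hypothetical ID used only for illustration.
example_id = (
    "/subscriptions/00000000-0000-0000-0000-000000000000"
    "/resourceGroups/Prod-RG/providers/Microsoft.Compute"
    "/virtualMachines/web01"
)
parsed = parse_resource_id(example_id)
print(parsed["resource_group"], parsed["provider"], parsed["type"], parsed["name"])
```

Reading an ID this way is a quick check of which resource group (and therefore which locks and RBAC assignments) a resource inherits from.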
What are Azure Management Groups and how do they enable governance at scale?
Management Group hierarchy:
Root Management Group (tenant level)
└── Management Group (e.g., "Production")
└── Management Group (e.g., "EMEA")
└── Subscription A
└── Subscription B
└── Management Group (e.g., "APAC")
└── Subscription C
Why Management Groups:
→ Apply Azure Policy across multiple subscriptions at once
→ Apply RBAC role assignments across multiple subscriptions
→ Manage hundreds of subscriptions from a single hierarchy
→ Up to 6 levels of hierarchy (not counting root)
Example governance with Management Groups:
Root
├── Corporate (MG) — Audit policy applied here (all subs inherit)
│ ├── Production (MG) — No public IP policy applied here
│ │ ├── Prod-UK subscription
│ │ └── Prod-US subscription
│ └── Non-Production (MG) — Dev/test allowed policies
│ ├── Dev subscription
│ └── Test subscription
└── Sandbox (MG) — Allow all (experimental)
└── Sandbox subscription
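Inheritance in the example tree can be sketched in a few lines of Python: a subscription's effective policies are its own assignments plus those of every ancestor management group. This is a simplified model, not how Azure stores assignments. The dictionary names, the `Sandbox-Sub` label and the two policy labels are assumptions taken from the example above.

```python
# Parent of each management group or subscription in the example tree.
PARENT = {
    "Corporate": "Root",
    "Sandbox": "Root",
    "Production": "Corporate",
    "Non-Production": "Corporate",
    "Prod-UK": "Production",
    "Prod-US": "Production",
    "Dev": "Non-Production",
    "Test": "Non-Production",
    "Sandbox-Sub": "Sandbox",
}

# Policy assignments made directly at a scope (only the two named in the example).
ASSIGNMENTS = {
    "Corporate": ["Audit policy"],
    "Production": ["No public IP"],
}


def effective_policies(scope):
    """Collect assignments from the scope and every ancestor, top-most first."""
    collected = []
    while scope is not None:
        collected = ASSIGNMENTS.get(scope, []) + collected
        scope = PARENT.get(scope)  # None once we step above Root
    return collected


print(effective_policies("Prod-UK"))      # inherits from Corporate and Production
print(effective_policies("Dev"))          # inherits from Corporate only
print(effective_policies("Sandbox-Sub"))  # nothing assigned on this branch
```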
What is Azure Policy and how does it work?
Azure Policy enforces organisational standards and assesses compliance
at scale across subscriptions and resource groups.
Policy components:
Policy definition: the rule (e.g., "VMs must use managed disks")
Initiative: a collection of policy definitions (e.g., "CIS Azure benchmark")
Assignment: apply policy/initiative to a scope (MG/subscription/RG)
Compliance: dashboard showing compliant vs non-compliant resources
Remediation: fix non-compliant resources (deployIfNotExists,
modify effects)
Policy effects (what happens when condition is met):
Deny: block the resource creation/update (strongest)
Audit: allow but mark as non-compliant (reporting only)
Append: add fields to the resource (e.g., required tags)
Modify: change resource properties (e.g., disable public IP)
DeployIfNotExists: deploy a related resource if missing (e.g., deploy
diagnostics extension if not present)
AuditIfNotExists: audit if a related resource is missing
Common built-in policies:
→ "Allowed locations" — restrict deployments to specific regions
→ "Require a tag and its value" — enforce tagging standards
→ "Allowed virtual machine SKUs" — restrict VM sizes
→ "VMs should use managed disks" — no unmanaged disk VMs
→ "Secure transfer to storage accounts should be enabled"
→ "Allowed resource types" — whitelist only approved resource types
Policy vs RBAC:
Azure Policy: WHAT can be deployed (resource properties/config)
Azure RBAC: WHO can deploy/manage resources (identity-based)
Both needed: RBAC controls access, Policy controls configuration
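The "both needed" point can be illustrated with a toy check in Python: a deployment goes through only if RBAC grants the identity write access and Policy does not deny the configuration. Everything here is a simplified assumption for illustration: the users, the role table and the single "Allowed locations" rule are hypothetical, and real evaluation happens inside ARM.

```python
# Hypothetical role assignments at the target scope (WHO).
ROLE_ASSIGNMENTS = {"alice": "Contributor", "bob": "Reader"}
WRITE_ROLES = {"Owner", "Contributor"}

# Hypothetical "Allowed locations" policy assignment with a Deny effect (WHAT).
ALLOWED_LOCATIONS = {"uksouth", "ukwest"}


def can_deploy(user, location):
    """Return (allowed, reason) for a deployment request."""
    if ROLE_ASSIGNMENTS.get(user) not in WRITE_ROLES:
        return False, "RBAC: identity has no write permission"
    if location not in ALLOWED_LOCATIONS:
        return False, "Policy: location is not allowed"
    return True, "allowed"


print(can_deploy("alice", "uksouth"))  # passes both checks
print(can_deploy("alice", "eastus"))   # blocked by Policy
print(can_deploy("bob", "uksouth"))    # blocked by RBAC
```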
What are Azure resource locks and when do you use them?
Resource locks prevent accidental deletion or modification of
critical Azure resources — even by subscription owners.
Lock types:
CanNotDelete (Delete lock):
→ Users can read and modify the resource but CANNOT delete it
→ Most common lock — protect production resources from accidental deletion
ReadOnly:
→ Users can read the resource but CANNOT modify OR delete it
→ Equivalent to applying Reader role to everyone
→ Use with caution: may break operations that look harmless but are not plain reads
(e.g., starting or restarting a VM is a POST action, which a ReadOnly lock blocks)
Lock scope (inherited by children):
Subscription → Resource Group → Resource
Lock on RG protects all resources in the RG
Lock hierarchy:
If RG has Delete lock: resources in RG cannot be deleted
If Resource has Delete lock: that specific resource cannot be deleted
Most restrictive lock wins
Commands:
# Apply lock:
az lock create --name "ProductionLock" --resource-group "Prod-RG" \
--lock-type CanNotDelete
# PowerShell:
New-AzResourceLock -LockName "ProductionLock" -LockLevel CanNotDelete `
-ResourceGroupName "Prod-RG"
# Remove lock (must remove before deleting resource):
az lock delete --name "ProductionLock" --resource-group "Prod-RG"
Tip: Resource locks are the last safety net for production resources. Always apply a CanNotDelete lock to production resource groups. Locks must be removed before deleting — this intentional friction prevents accidents.
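The "most restrictive lock wins" rule can be sketched as follows. This is a minimal Python model, assuming three operation kinds (`read`, `write`, `delete`) and hypothetical helper names; it is not how ARM evaluates locks internally.

```python
# Higher number = more restrictive. None means no lock at that scope.
RESTRICTIVENESS = {None: 0, "CanNotDelete": 1, "ReadOnly": 2}


def effective_lock(*locks):
    """Combine locks inherited from subscription, resource group and resource."""
    return max(locks, key=lambda lock: RESTRICTIVENESS[lock], default=None)


def is_allowed(operation, lock):
    """Check whether 'read', 'write' or 'delete' is allowed under a lock."""
    if lock == "ReadOnly":
        return operation == "read"
    if lock == "CanNotDelete":
        return operation != "delete"
    return True


# No subscription lock, CanNotDelete on the RG, ReadOnly on the resource itself.
lock = effective_lock(None, "CanNotDelete", "ReadOnly")
print(lock, is_allowed("write", lock), is_allowed("read", lock))
```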
What is Azure Cost Management and how do you control spending?
Azure Cost Management + Billing:
→ Monitor, allocate, and optimise Azure spending
→ Access: Azure Portal → Cost Management + Billing
Key features:
Cost analysis: visualise spending by service, resource, tag, location
Budgets: set spending thresholds with email/action group alerts
Cost alerts: alert at thresholds you define (e.g., 50%, 75%, 90%, 100% of budget)
Advisor: AI recommendations to reduce cost, improve security,
performance, reliability
Cost control tools:
Tags: tag resources (Environment=Production, Team=Finance)
filter Cost Analysis by tag to see team-level spend
Reservations: commit to 1 or 3 years → 40-72% discount vs pay-as-you-go
Savings Plans: flexible hourly commitment → 11-65% discount
Azure Hybrid Benefit: use existing Windows Server/SQL Server licences
on Azure VMs → save up to 40%
Dev/Test pricing: reduced rates for non-production subscriptions
Auto-shutdown: schedule VMs to shut down outside business hours
Budget alert example:
Budget: £10,000/month for Production subscription
Alerts:
→ 80% (£8,000) reached: email Finance team
→ 100% (£10,000) reached: email Finance + CTO + trigger action group
→ Action group: call Azure Automation runbook to tag new VMs for review
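The threshold logic in the budget example is simple arithmetic. A small Python sketch, using the 80% and 100% thresholds from the example (the function name and the spend figures are made up for illustration):

```python
def triggered_alerts(budget, spend, thresholds=(80, 100)):
    """Return the percentage thresholds that the current spend has reached."""
    # Integer comparison avoids floating-point surprises at exact boundaries.
    return [t for t in thresholds if spend * 100 >= t * budget]


budget = 10_000  # monthly budget in pounds, as in the example above
for spend in (7_999, 8_000, 9_500, 10_000):
    print(spend, triggered_alerts(budget, spend))
```

A spend of £8,000 reaches the 80% threshold only; £10,000 reaches both, which is where the action group would fire.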
2. Implement and Manage Storage (15–20%)
What are Azure Storage account types and redundancy options?
Storage account types:
Standard General Purpose v2 (GPv2):
→ Supports: Blob, File, Queue, Table storage
→ Performance: Standard (HDD-backed)
→ Use for: most workloads, backup, archive
Premium Block Blobs:
→ Supports: Block blobs and append blobs only
→ Performance: SSD-backed, very low latency
→ Use for: high-throughput scenarios, AI/ML data pipelines
Premium File Shares:
→ Supports: Azure Files only
→ Performance: SSD-backed
→ Use for: high-performance file shares (databases, FSLogix profiles)
Premium Page Blobs:
→ Supports: Page blobs only (used for VM disks)
→ Performance: SSD-backed
→ Use for: unmanaged VM disks (VHDs); managed disks are provisioned without a storage account you manage
Redundancy options (data durability):
LRS (Locally Redundant Storage):
→ 3 copies within a single datacenter (single availability zone)
→ 11 nines durability (99.999999999%)
→ Cheapest — no protection against datacenter failure
ZRS (Zone-Redundant Storage):
→ 3 copies across 3 availability zones in the same region
→ 12 nines durability
→ Survives datacenter failure
→ Recommended for most production workloads
GRS (Geo-Redundant Storage):
→ LRS in primary region + LRS in secondary region (100s of miles away)
→ Secondary becomes readable only after a failover (customer-initiated or Microsoft-initiated)
→ 16 nines durability
GZRS (Geo-Zone-Redundant Storage):
→ ZRS in primary + LRS in secondary
→ Highest durability (16 nines)
→ Most expensive — use for critical data
RA-GRS / RA-GZRS:
→ Adds read access to secondary region at all times (not just after failover)
→ Use when: need to read from secondary for DR or read scale-out
What are the Azure Blob storage access tiers?
Access tiers (balance cost vs access frequency):
Hot:
→ Highest storage cost, lowest access cost
→ Use for: frequently accessed data (active databases, app files)
→ Minimum storage duration: none
Cool:
→ Lower storage cost than Hot, higher access cost
→ Use for: infrequently accessed data (30-day minimum recommended)
→ Minimum storage duration: 30 days (early deletion fees apply)
Cold (newer tier):
→ Lower storage cost than Cool, higher access cost than Cool
→ Use for: rarely accessed data, kept at least 90 days
→ Minimum storage duration: 90 days
Archive:
→ Lowest storage cost (~90% cheaper than Hot)
→ OFFLINE — data is not immediately accessible
→ Rehydration required before reading:
Standard rehydration: up to 15 hours
High priority rehydration: under 1 hour (higher cost)
→ Use for: long-term backup, compliance data, rarely accessed archives
→ Minimum storage duration: 180 days
Lifecycle Management policies:
→ Automatically transition blobs between tiers based on age/conditions
→ Example:
After 30 days → move from Hot to Cool
After 90 days → move from Cool to Cold
After 365 days → move to Archive
After 2555 days (7 years) → delete
# PowerShell - set blob access tier through the blob client:
$blob = Get-AzStorageBlob -Container "mycontainer" -Blob "myfile.pdf" -Context $storageContext
$blob.BlobClient.SetAccessTier("Archive", $null, "Standard")
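The lifecycle example above (30 / 90 / 365 / 2555 days) maps a blob's age to a target tier. A minimal Python sketch of that mapping follows. It assumes "after N days" means strictly more than N days since last modification, and the function name `tier_for_age` is hypothetical; real policies are defined as JSON rules on the storage account.

```python
# (age threshold in days, resulting action), checked from oldest to newest.
LIFECYCLE_RULES = [
    (2555, "Delete"),
    (365, "Archive"),
    (90, "Cold"),
    (30, "Cool"),
]


def tier_for_age(age_days):
    """Return the tier (or 'Delete') a blob of this age ends up in."""
    for threshold, action in LIFECYCLE_RULES:
        if age_days > threshold:
            return action
    return "Hot"


for age in (10, 45, 100, 400, 3000):
    print(age, tier_for_age(age))
```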
What is Azure Files and how does it compare to Blob storage?
Azure Files:
→ Fully managed cloud file shares accessible via SMB (3.0/2.1) and NFS
→ Can be mounted as a network drive on Windows, Linux, macOS
→ Compatible with on-prem file server applications
→ Use cases: lift-and-shift of on-prem file servers, shared config files,
FSLogix profile containers (Azure Virtual Desktop)
Azure File Sync:
→ Caches Azure file shares on Windows Server on-premises
→ On-prem server has frequently used files locally (fast access)
→ Infrequently used files stored only in Azure (cloud tiering)
→ Supports multiple on-prem servers all syncing to the same Azure share
→ Use for: hybrid file server scenarios, replacing DFS
Azure Blob vs Azure Files:
| Feature | Blob Storage | Azure Files |
|---|---|---|
| Access protocol | HTTP/HTTPS REST | SMB, NFS, REST |
| Mount as drive | No | Yes (SMB/NFS) |
| Use case | Unstructured data, media, backup | File shares, home dirs, app config, lift-and-shift |
| Max file size | 190.7 TB per blob | 4 TB per file (Standard) |
| Performance | Scales massively | Depends on share tier |
| POSIX permissions | No | Yes (NFS shares) |
What are Shared Access Signatures (SAS) and when do you use them?
SAS (Shared Access Signature):
→ A URI that grants limited, time-bound access to Azure Storage resources
→ No need to share the storage account key
→ Granular control: which resource, what operations, how long, from where
SAS types:
Account SAS: access to multiple storage services (Blob + Queue + Table)
Service SAS: access to a specific service (Blob only)
User delegation SAS: signed with Entra ID user credentials (most secure)
no storage account key needed — audit trail in Entra logs
SAS parameters:
sv: service version
st: start time
se: expiry time (always set — limit exposure window)
sr: signed resource (b=blob, c=container, f=file, s=share)
sp: permissions (r=read, w=write, d=delete, l=list, c=create)
sig: cryptographic signature
Example SAS URI:
https://mystorageaccount.blob.core.windows.net/mycontainer/file.pdf
?sv=2023-11-03&st=2025-01-01T00%3A00%3A00Z
&se=2025-01-31T23%3A59%3A00Z&sr=b&sp=r
&sig=xxxxxxxxxxxxxxxxxxxxxxxxxxx
Stored Access Policy:
→ Define SAS constraints on the container/queue/table (server-side)
→ Allows revoking a SAS by modifying or deleting the stored access policy
→ Without a stored access policy, a SAS can only be revoked early by rotating the key that signed it
Access control hierarchy (most to least preferred):
1. Entra ID RBAC (Storage Blob Data Reader/Contributor) — no keys needed
2. User delegation SAS — Entra-backed, audited
3. Service/Account SAS — key-based, harder to revoke
4. Storage account key — full access, treat like a root password
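To see what the SAS parameters carry, the example URI can be taken apart with Python's standard library. This sketch only inspects the token (permissions, resource type, validity window); it does not and cannot verify the signature. The function name `describe_sas` and the test dates are assumptions for illustration.

```python
from datetime import datetime, timezone
from urllib.parse import urlsplit, parse_qs

# The example SAS URI from above, with its placeholder signature.
sas_uri = (
    "https://mystorageaccount.blob.core.windows.net/mycontainer/file.pdf"
    "?sv=2023-11-03&st=2025-01-01T00%3A00%3A00Z"
    "&se=2025-01-31T23%3A59%3A00Z&sr=b&sp=r"
    "&sig=xxxxxxxxxxxxxxxxxxxxxxxxxxx"
)


def _parse_utc(value):
    """Parse the ISO 8601 UTC timestamps used in st/se."""
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


def describe_sas(uri, now):
    """Summarise a SAS token: what it allows and whether it is inside its window."""
    query = parse_qs(urlsplit(uri).query)  # also decodes %3A back to ':'
    start = _parse_utc(query["st"][0])
    expiry = _parse_utc(query["se"][0])
    return {
        "resource": query["sr"][0],
        "permissions": query["sp"][0],
        "valid_now": start <= now <= expiry,
    }


print(describe_sas(sas_uri, datetime(2025, 1, 15, tzinfo=timezone.utc)))
print(describe_sas(sas_uri, datetime(2025, 2, 1, tzinfo=timezone.utc)))
```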
3. Deploy and Manage Azure Compute (20–25%)
What are Azure Virtual Machines and key configuration concepts?
VM creation key decisions:
Region: where the VM runs — affects latency, compliance, availability
Size/SKU: CPU, RAM, temp disk, max NICs, max data disks
B-series: burstable (dev/test)
D-series: general purpose (most workloads)
E-series: memory optimised (databases, in-memory analytics)
F-series: compute optimised (batch, gaming)
N-series: GPU (AI/ML training, graphics)
Image: OS (Windows Server 2022, Ubuntu 22.04, RHEL, custom)
Disk: OS disk (required) + data disks (optional)
Authentication: SSH key pair (Linux) or password (Windows — avoid for prod)
Networking: VNet, subnet, NSG, public IP (avoid if possible)
Availability: Availability Zones, Availability Sets, VMSS
VM disk types:
Standard HDD: cheapest, high latency — backup/archive workloads
Standard SSD: lower latency than HDD — dev/test, light workloads
Premium SSD: SLA-backed, low latency — production workloads
Premium SSD v2: more granular performance control
Ultra Disk: highest IOPS (up to 400,000) — SAP HANA, top-tier DBs
Managed disks:
→ Azure manages storage account for disk — no self-management needed
→ Snapshots: point-in-time copy of a managed disk
→ Images: generalised snapshot used to create new VMs
→ Disk encryption: Azure Disk Encryption (BitLocker/dm-crypt)
or Server-Side Encryption with customer-managed keys
VM extensions:
Custom Script Extension: run scripts on VM after deployment
Azure Monitor Agent: collect metrics/logs to Azure Monitor
DSC Extension: apply PowerShell DSC configurations
Diagnostics Extension: send guest OS metrics to Azure Monitor
Microsoft Antimalware: antivirus for Windows VMs
What are Availability Sets and Availability Zones?
Availability Sets:
→ Protect VMs from hardware failures WITHIN a single datacenter
→ Fault domains: separate physical hardware (rack, power, network)
→ Update domains: separate maintenance windows
→ VMs spread across fault domains (up to 3) + update domains (up to 20)
→ SLA: 99.95% uptime (for 2+ VMs in an availability set)
→ Does NOT protect against datacenter/zone failure
→ Legacy approach — prefer Availability Zones for new deployments
Availability Zones:
→ Physically separate datacenters within the same Azure region
→ Each zone has independent power, cooling, and networking
→ Deploy VM in Zone 1, Zone 2, Zone 3 = protected from datacenter failure
→ SLA: 99.99% uptime (for 2+ VMs in different zones)
→ Not all regions have 3 AZs — check region capabilities first
→ Zone-resilient services: ZRS storage, zone-redundant App Gateways
VM Scale Sets (VMSS):
→ Deploy and manage a group of identical, load-balanced VMs
→ Auto-scale: add/remove VMs based on CPU, memory, schedule, or custom metrics
→ Uniform mode: all VMs use the same image and size
→ Flexible mode: mix of VMs (different sizes/images) — recommended
→ Works with Azure Load Balancer or Application Gateway
Decision:
Single datacenter protection → Availability Set (legacy)
Datacenter/zone protection → Availability Zones (recommended)
Auto-scaling web tier → VM Scale Sets in Availability Zones
Highest HA (99.99% SLA) → 2+ VMs across 2+ Availability Zones
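The difference between a 99.95% and a 99.99% SLA is easier to judge as allowed downtime. A short Python calculation, assuming a 30-day month (the function name is illustrative):

```python
def max_downtime_minutes(sla_percent, days=30):
    """Minutes of downtime per period that still meet the SLA."""
    total_minutes = days * 24 * 60
    return total_minutes * (100 - sla_percent) / 100


for label, sla in (("Availability Set", 99.95), ("Availability Zones", 99.99)):
    print(f"{label}: {sla}% allows about {max_downtime_minutes(sla):.2f} minutes per 30 days")
```

Over 30 days this works out to roughly 21.6 minutes at 99.95% versus about 4.3 minutes at 99.99%.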
What are Azure App Service and its key features?
Azure App Service:
→ Fully managed PaaS for hosting web apps, REST APIs, mobile backends
→ Supports: .NET, Java, Node.js, Python, PHP, Ruby, custom containers
→ No server management — Azure handles OS patching, scaling, load balancing
App Service Plan:
→ The compute resources underlying the App Service
→ Defines: region, number of VM instances, size of VM instances, pricing tier
Pricing tiers:
Free/Shared: no SLA, shared infrastructure, limited — dev only
Basic: dedicated compute, manual scale, no slots
Standard: auto-scale, deployment slots (staging), custom domains/SSL
Premium: more instances, more slots, VNet integration, higher performance
Isolated: App Service Environment (ASE) — private, VNet-injected, highest isolation
Key features:
Deployment slots:
→ Separate environments (staging, QA) on the same App Service
→ Swap slots: zero-downtime deployment
Staging → Production swap: production gets staging's code instantly
If deployment fails: swap back (rollback in seconds)
→ Standard tier: 5 slots, Premium: 20 slots
Auto-scale:
→ Scale out (add instances): based on CPU %, memory, HTTP queue length
→ Scale in (remove instances): when metrics drop
→ Schedule-based: scale out at 8am, scale in at 6pm
VNet Integration:
→ App Service can make outbound calls to resources in a VNet
→ Required for: private SQL, private storage, on-prem via VPN/ExpressRoute
→ Does NOT allow inbound traffic from VNet (use Private Endpoint for that)
Deployment methods:
→ GitHub Actions / Azure DevOps CI/CD pipeline (recommended)
→ Local Git deployment
→ ZIP deploy (az webapp deploy --type zip)
→ Container deployment (ACR → App Service)
What is Azure Kubernetes Service (AKS) and key admin concepts?
AKS: managed Kubernetes cluster on Azure
→ Microsoft manages: control plane (API server, etcd, scheduler)
→ You manage: worker nodes (node pools), workloads
Key concepts:
Node pool: group of VMs with same size/config — can have multiple pools
System pool: runs kube-system pods (required)
User pool: runs application workloads
Node size: choose VM SKU for the node pool
Autoscaler: cluster autoscaler — adds/removes nodes based on pod demand
HPA: Horizontal Pod Autoscaler — adds/removes pods based on CPU/memory
Networking modes:
Kubenet: nodes get IP from Azure VNet, pods get IP from overlay network
pods use NAT to communicate outside cluster
Azure CNI: pods get IP directly from VNet subnet
enables direct pod-to-pod communication from VNet
requires more IP addresses (plan subnet size carefully)
AKS + Entra ID integration:
→ RBAC for Kubernetes: use Entra ID groups as Kubernetes RBAC subjects
→ Managed Identity: AKS uses Managed Identity to pull from ACR, access Key Vault
→ Workload Identity: pods authenticate to Azure services using Entra ID
AKS storage:
Azure Disk: ReadWriteOnce — one pod on one node
Azure Files: ReadWriteMany — multiple pods on multiple nodes (SMB)
Azure Blob: ReadWriteMany (via BlobFuse) — large unstructured data
4. Implement and Manage Virtual Networking (15–20%)
What are Virtual Networks (VNets) and subnets in Azure?
Virtual Network (VNet):
→ Isolated, private network in Azure — your own address space
→ Resources in a VNet communicate privately by default
→ Define address space: CIDR notation (e.g., 10.0.0.0/16)
→ VNet scoped to a single region (cannot span regions)
Subnets:
→ Subdivide VNet address space into smaller ranges
→ Each resource goes in a specific subnet
→ Reserved addresses per subnet: first 4 + last 1 = 5 IPs unusable
e.g., 10.0.0.0/24 = 256 addresses, 251 usable
Subnet design example:
VNet: 10.0.0.0/16 (65,536 addresses)
├── WebSubnet: 10.0.1.0/24 (251 usable — web servers/App Service)
├── AppSubnet: 10.0.2.0/24 (251 usable — application tier)
├── DataSubnet: 10.0.3.0/24 (251 usable — databases)
├── GatewaySubnet: 10.0.4.0/27 (27 usable — VPN/ExpressRoute GW)
└── AzureBastionSubnet: 10.0.5.0/26 (59 usable — Bastion host)
(must be named exactly "AzureBastionSubnet")
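The usable-address counts in the subnet design can be verified with Python's `ipaddress` module: take the size of the CIDR block and subtract the 5 addresses Azure reserves. The helper name `usable_addresses` is an assumption; the prefixes are the ones from the example.

```python
import ipaddress

AZURE_RESERVED = 5  # first 4 + last 1 address in every subnet

VNET = ipaddress.ip_network("10.0.0.0/16")
SUBNETS = {
    "WebSubnet": "10.0.1.0/24",
    "AppSubnet": "10.0.2.0/24",
    "DataSubnet": "10.0.3.0/24",
    "GatewaySubnet": "10.0.4.0/27",
    "AzureBastionSubnet": "10.0.5.0/26",
}


def usable_addresses(cidr):
    """Addresses left for resources after Azure's reserved IPs."""
    return ipaddress.ip_network(cidr).num_addresses - AZURE_RESERVED


for name, cidr in SUBNETS.items():
    inside = ipaddress.ip_network(cidr).subnet_of(VNET)
    print(f"{name}: {cidr} -> {usable_addresses(cidr)} usable, inside VNet: {inside}")
```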
VNet peering:
→ Connect two VNets privately through Microsoft backbone (no internet)
→ Can be same region (VNet Peering) or different regions (Global VNet Peering)
→ NOT transitive: A↔B, B↔C does NOT mean A↔C
Solution: hub-spoke topology or Azure Virtual WAN
→ Address spaces of peered VNets must not overlap
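Two peering rules lend themselves to a quick check: address spaces must not overlap, and connectivity is not transitive. The Python sketch below models both. The VNet names, address ranges and the `PEERINGS` set are hypothetical, and the model ignores gateways and routing appliances that a real hub-spoke design would add.

```python
import ipaddress


def can_peer(space_a, space_b):
    """Peering requires non-overlapping address spaces."""
    return not ipaddress.ip_network(space_a).overlaps(ipaddress.ip_network(space_b))


# Hypothetical hub-spoke layout: each spoke is peered with the hub only.
PEERINGS = {("hub", "spoke1"), ("hub", "spoke2")}


def directly_peered(vnet_x, vnet_y):
    """Traffic flows only between VNets that share a peering (no transit)."""
    return (vnet_x, vnet_y) in PEERINGS or (vnet_y, vnet_x) in PEERINGS


print(can_peer("10.0.0.0/16", "10.1.0.0/16"))    # distinct ranges
print(can_peer("10.0.0.0/16", "10.0.128.0/17"))  # second range sits inside the first
print(directly_peered("spoke1", "hub"))
print(directly_peered("spoke1", "spoke2"))       # both peer with hub, not with each other
```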
Service endpoints:
→ Extend VNet identity to Azure PaaS services (Storage, SQL, Key Vault)
→ Traffic stays on Microsoft backbone (not internet)
→ PaaS service can be locked to your VNet's traffic only
Private Endpoints:
→ Give a PaaS service a private IP address from your VNet subnet
→ Public internet access on the PaaS resource can then be disabled
→ Resolved through a Private DNS Zone, e.g. privatelink.blob.core.windows.net
(record: mystorageaccount.privatelink.blob.core.windows.net)
→ More secure than Service Endpoints — completely private
→ Recommended over Service Endpoints for production
What are Network Security Groups (NSGs) and how do they work?
NSG (Network Security Group):
→ Stateful firewall controlling inbound/outbound traffic
→ Applied to: subnet (all resources in subnet) or NIC (specific VM)
NSG rule components:
Priority: 100-4096 (lower number = higher priority)
Name: descriptive name
Source: IP, CIDR range, Service Tag, or Application Security Group
Source port: * or specific port/range
Destination: IP, CIDR range, Service Tag, or Application Security Group
Destination port: 80, 443, 3389, * or range
Protocol: TCP, UDP, ICMP, or Any
Action: Allow or Deny
Default rules (cannot be deleted, lowest priority):
Inbound: AllowVNetInbound (65000), AllowAzureLoadBalancerInbound (65001),
DenyAllInbound (65500)
Outbound: AllowVNetOutbound (65000), AllowInternetOutbound (65001),
DenyAllOutbound (65500)
Service Tags (pre-defined IP ranges maintained by Microsoft):
Internet: all public IP addresses
VirtualNetwork: VNet address space + peered VNets + on-prem
AzureLoadBalancer: Azure Load Balancer's health probe IP (168.63.129.16)
Storage: Azure Storage IP ranges
Sql: Azure SQL IP ranges
AppService: App Service IP ranges
Application Security Groups (ASG):
→ Group VMs logically (WebServers, AppServers, DBServers)
→ Write NSG rules referencing ASG instead of IP addresses
→ As VMs are added to ASG, rules apply automatically — no IP updates needed
NSG best practices:
→ Apply at subnet level (not NIC) for simpler management
→ Never allow RDP (3389) or SSH (22) from Internet — use Azure Bastion
→ Use Service Tags instead of hard-coded IP ranges
→ Use ASGs for application-tier segmentation
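NSG processing is "lowest priority number first, first match wins". The following Python sketch models that for inbound traffic. It is deliberately simplified: sources are matched as service tag strings only, ports are single values, and the two custom rules (`Allow-HTTPS`, `Deny-RDP-Internet`) are hypothetical; the three default rules use the names and priorities listed above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Rule:
    priority: int
    name: str
    source: str           # service tag, or "*" for any source
    port: Optional[int]   # None means any destination port
    action: str           # "Allow" or "Deny"


INBOUND_RULES = [
    Rule(200, "Deny-RDP-Internet", "Internet", 3389, "Deny"),
    Rule(100, "Allow-HTTPS", "Internet", 443, "Allow"),
    Rule(65000, "AllowVNetInbound", "VirtualNetwork", None, "Allow"),
    Rule(65001, "AllowAzureLoadBalancerInbound", "AzureLoadBalancer", None, "Allow"),
    Rule(65500, "DenyAllInbound", "*", None, "Deny"),
]


def evaluate(rules, source, port):
    """Return (action, rule name) of the first matching rule by priority."""
    for rule in sorted(rules, key=lambda r: r.priority):
        if rule.source in ("*", source) and rule.port in (None, port):
            return rule.action, rule.name
    return "Deny", "no matching rule"


print(evaluate(INBOUND_RULES, "Internet", 443))
print(evaluate(INBOUND_RULES, "Internet", 22))
print(evaluate(INBOUND_RULES, "VirtualNetwork", 3389))
```

SSH from the Internet is never mentioned in a custom rule, so it falls through to the default deny, while RDP from inside the VNet is allowed by the default VNet rule.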
What are Azure Load Balancer and Application Gateway?
Azure Load Balancer (Layer 4 — TCP/UDP):
→ Distributes inbound flows across backend pool VMs
→ Layer 4 (transport layer) — balances based on IP + port
→ No SSL termination, no URL routing, no WAF
Types:
Public LB: balances internet-facing traffic
Internal LB: balances traffic within a VNet (no public IP)
SKUs:
Basic: limited features, no SLA, no AZ support — avoid for production
Standard: zone-redundant, secure by default, SLA 99.99%, HTTPS health probes
Standard LB components:
Frontend IP: public or private IP that receives traffic
Backend pool: VMs or VMSS that receive distributed traffic
Health probe: TCP, HTTP, or HTTPS check — removes unhealthy VMs
Load balancing rule: frontend port → backend port mapping
Inbound NAT rule: direct specific port to specific VM (e.g., RDP to VM1)
Application Gateway (Layer 7 — HTTP/HTTPS):
→ Web traffic load balancer with SSL termination, URL routing, WAF
→ Layer 7 (application layer) — makes routing decisions based on URL/headers
Key features:
SSL termination: decrypts HTTPS at the gateway — backend can use HTTP
URL path routing: /images → ImageServers pool, /api → APIServers pool
Host-based routing: store.contoso.com → StoreFrontend, api.contoso.com → APIBackend
WAF (Web App Firewall): protects against OWASP Top 10 (SQLi, XSS, etc.)
Autoscaling: Application Gateway v2 scales automatically
Cookie-based affinity: stick user sessions to same backend server
Decision:
Non-HTTP traffic (TCP/UDP) → Azure Load Balancer
HTTP/HTTPS with URL routing/WAF → Application Gateway
Global HTTP routing (multi-region) → Azure Front Door
DNS-based routing (any protocol) → Azure Traffic Manager
What is Azure VPN Gateway and ExpressRoute?
Azure VPN Gateway:
→ Connects on-premises networks to Azure via encrypted VPN tunnel (IPsec/IKE)
→ Requires GatewaySubnet in the VNet
VPN types:
Site-to-Site (S2S):
→ Persistent IPsec tunnel between on-prem VPN device and Azure
→ Traffic encrypted over public internet
→ Bandwidth: up to 10 Gbps (depends on SKU)
→ Use for: branch office connectivity, hybrid workloads
Point-to-Site (P2S):
→ Individual client devices connect to Azure VNet via VPN
→ Supports: OpenVPN, SSTP, IKEv2
→ Use for: remote workers connecting to Azure resources
VPN Gateway SKUs:
Basic: dev/test, limited bandwidth, no zone-redundancy
VpnGw1-5: production, varying bandwidth (650 Mbps to 10 Gbps)
VpnGw1-5AZ: zone-redundant (recommended for production)
Azure ExpressRoute:
→ Private, dedicated connection from on-prem to Azure (NOT over internet)
→ Through a connectivity provider (BT, Equinix, Megaport)
→ Bandwidth: 50 Mbps to 100 Gbps
→ Latency: consistent, predictable (dedicated circuit)
→ SLA: 99.95% uptime
ExpressRoute vs VPN:
| Feature | VPN Gateway | ExpressRoute |
|---|---|---|
| Connectivity | Over internet | Private (dedicated) |
| Encryption | Yes (IPsec) | No (provider responsibility) |
| Bandwidth | Up to 10 Gbps | Up to 100 Gbps |
| Latency | Variable | Consistent/low |
| Cost | Lower | Higher |
| Use for | Dev/test, smaller workloads | Production, regulated, high bandwidth needs |
5. Monitor and Maintain Azure Resources (10–15%)
What is Azure Monitor and what does it collect?
Azure Monitor:
→ Comprehensive monitoring solution for Azure resources
→ Collects: metrics, logs, distributed traces, changes
Data types:
Metrics: numerical time-series data (CPU %, disk IOPS, request count)
stored 93 days in Azure Monitor Metrics store
sub-minute granularity available
Logs: structured/unstructured text data (activity logs, resource logs)
stored in Log Analytics workspace (configurable retention)
queried with Kusto Query Language (KQL)
Traces: distributed transaction traces (App Insights)
Changes: resource configuration changes (Change Analysis)
Azure Monitor sources:
Activity Log: ALL control plane operations in a subscription
(who created/deleted/modified which resource)
90 days retention, export to Log Analytics for longer
Resource logs: diagnostic logs from Azure resources
(Key Vault audit events, Storage read/write/delete logs, SQL query logs)
not collected by default — must enable Diagnostic Settings
Guest OS metrics: CPU, memory, disk from inside the VM
requires Azure Monitor Agent installed on VM
Application logs: custom app telemetry via Application Insights SDK
Azure Monitor Agent (AMA):
→ Replaces deprecated Log Analytics agent (MMA) and Diagnostics extension
→ Collects: Windows Event Logs, Linux Syslog, performance counters
→ Configured via Data Collection Rules (DCRs) — flexible, per-VM rules
→ Deploy via Azure Policy for all VMs or Arc-connected servers
What is Log Analytics and KQL (Kusto Query Language)?
Log Analytics workspace:
→ Central repository for all Azure Monitor log data
→ All resource diagnostic logs, VM logs, Activity Log sent here
→ Queried using KQL
→ Retention: 30 days default, configurable to 2 years (longer via archive)
KQL basics (most common exam patterns):
// Get all VMs that had high CPU in last 24 hours:
Perf
| where TimeGenerated > ago(24h)
| where ObjectName == "Processor" and CounterName == "% Processor Time"
| where CounterValue > 90
| summarize avg(CounterValue) by Computer, bin(TimeGenerated, 1h)
| order by avg_CounterValue desc
// Find all failed login attempts:
SecurityEvent
| where TimeGenerated > ago(7d)
| where EventID == 4625 // Failed logon
| summarize count() by Account, Computer
| order by count_ desc
// Storage account operations:
StorageBlobLogs
| where TimeGenerated > ago(1d)
| where OperationName == "DeleteBlob"
| project TimeGenerated, CallerIpAddress, Uri, StatusCode
// Join two tables (deleted VMs with their last heartbeat):
AzureActivity
| where OperationNameValue =~ "Microsoft.Compute/virtualMachines/delete"
| join kind=inner (
Heartbeat | summarize LastSeen=max(TimeGenerated) by _ResourceId
) on _ResourceId
KQL operators:
where: filter rows
project: select columns
summarize: aggregate (count, sum, avg, max, min)
order by: sort results
join: combine tables
extend: add calculated columns
ago(): time ago (ago(1h), ago(7d), ago(30d))
bin(): group time into buckets (bin(TimeGenerated, 1h))
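For readers new to KQL, it can help to see what `where`, `bin()`, `summarize` and `order by` do to the rows. The Python sketch below reproduces the high-CPU query on a few made-up samples; the data, the computer names and the function name are assumptions, and this is an analogy rather than how Log Analytics executes queries.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical Perf samples: (Computer, TimeGenerated, CounterValue)
samples = [
    ("vm1", datetime(2025, 1, 1, 10, 5), 95.0),
    ("vm1", datetime(2025, 1, 1, 10, 45), 97.0),
    ("vm1", datetime(2025, 1, 1, 11, 10), 92.0),
    ("vm2", datetime(2025, 1, 1, 10, 20), 40.0),
]


def hourly_high_cpu(rows, threshold=90):
    buckets = defaultdict(list)
    for computer, timestamp, value in rows:
        if value > threshold:                                             # where CounterValue > 90
            hour = timestamp.replace(minute=0, second=0, microsecond=0)   # bin(TimeGenerated, 1h)
            buckets[(computer, hour)].append(value)
    averages = {key: sum(v) / len(v) for key, v in buckets.items()}       # summarize avg(...) by ...
    return sorted(averages.items(), key=lambda item: item[1], reverse=True)  # order by ... desc


for (computer, hour), avg_cpu in hourly_high_cpu(samples):
    print(computer, hour.isoformat(), avg_cpu)
```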
What is Azure Backup and how does it work?
Azure Backup:
→ Cloud-based backup service — no on-prem backup infrastructure needed
→ Central management: Recovery Services vault (VMs, SQL in VM, Azure Files, MARS) or Backup vault (newer workloads such as Blobs and Managed Disks)
What can be backed up:
Azure VMs: full VM backup (OS + data disks) — agent-less
Azure SQL in VM: workload-aware backup (transaction log backup)
Azure Files: file share snapshots
Azure Blobs: operational backup (point-in-time restore)
On-prem Windows: via MARS agent → files, folders, system state
On-prem servers: via Azure Backup Server (MABS) or DPM
Azure VM Backup:
→ Snapshot → transferred to Recovery Services Vault
→ Crash-consistent backup: VM point-in-time consistent
→ Application-consistent: quiesce app before snapshot (VSS on Windows)
→ Retention: up to 9,999 recovery points
→ Restore options: Create new VM | Replace existing disk | Restore files
Recovery Services Vault key settings:
Backup policy: frequency (daily/weekly) + retention schedule
(daily points: 7 to 9,999 days; weekly, monthly and yearly points can be kept for years)
Soft delete: deleted backup data retained 14 days — protects against
ransomware deletion attacks (cannot be disabled in 14-day window)
Cross-region restore: restore to paired region for DR testing
Immutable vault: lock vault to prevent deletion of backup data
Azure Site Recovery (ASR):
→ Disaster Recovery — replicate VMs to another Azure region
→ Continuous replication → near-zero RPO (minutes)
→ Failover: redirect traffic to replica VMs in DR region
→ Failback: return to primary region after incident resolved
→ Test failover: validate DR plan without impacting production
RTO vs RPO:
RTO (Recovery Time Objective): how long before service is restored?
RPO (Recovery Point Objective): how much data loss is acceptable?
ASR typically achieves: RPO < 15 minutes, RTO < 2 hours
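RPO and RTO become concrete when compared against what a protection method delivers. In this Python sketch the worst-case data loss is taken to be the gap between recovery points, and the restore duration stands in for RTO. All figures (daily backup with a 3-hour restore, 5-minute replication with a 1-hour failover) are hypothetical, as is the function name.

```python
def objectives_met(recovery_point_interval_min, restore_duration_min, rpo_min, rto_min):
    """Compare what a protection method delivers against the RPO/RTO targets."""
    return {
        "rpo_met": recovery_point_interval_min <= rpo_min,  # worst-case data loss
        "rto_met": restore_duration_min <= rto_min,         # time until service is back
    }


# Targets taken from the typical ASR figures above: RPO 15 minutes, RTO 2 hours.
daily_backup = objectives_met(24 * 60, 180, rpo_min=15, rto_min=120)
replication = objectives_met(5, 60, rpo_min=15, rto_min=120)
print("daily backup:", daily_backup)
print("continuous replication:", replication)
```

Under these assumptions a daily backup misses both targets, which is why backup and Site Recovery are complementary rather than interchangeable.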
6. ARM Templates, Bicep & Automation
What are ARM templates and Bicep?
ARM Templates (JSON):
→ Declarative infrastructure-as-code for Azure resources
→ Define WHAT you want — Azure figures out HOW to create it
→ Idempotent: apply same template multiple times → same result
ARM template structure:
{
  "$schema": "https://schema.management.azure.com/schemas/...",
  "contentVersion": "1.0.0.0",
  "parameters": {                     // inputs to the template
    "vmName": { "type": "string", "defaultValue": "myVM" },
    "vmSize": { "type": "string", "allowedValues": ["Standard_D2s_v3"] }
  },
  "variables": {                      // computed values
    "storageAccountName": "[concat('storage', uniqueString(resourceGroup().id))]"
  },
  "resources": [                      // resources to deploy
    {
      "type": "Microsoft.Compute/virtualMachines",
      "apiVersion": "2023-03-01",
      "name": "[parameters('vmName')]",
      "location": "[resourceGroup().location]",
      "properties": { ... }
    }
  ],
  "outputs": {                        // values to return after deployment
    "vmPublicIp": { "type": "string", "value": "[reference(...).ipAddress]" }
  }
}
Bicep (recommended over JSON ARM):
→ Domain-specific language that compiles to ARM JSON
→ Cleaner syntax, better IntelliSense, type safety
→ Same deployment capabilities as ARM JSON, easier to write
// Bicep equivalent of above:
param vmName string = 'myVM'
param vmSize string = 'Standard_D2s_v3'
var storageAccountName = 'storage${uniqueString(resourceGroup().id)}'
resource vm 'Microsoft.Compute/virtualMachines@2023-03-01' = {
  name: vmName
  location: resourceGroup().location
  properties: {
    hardwareProfile: {
      vmSize: vmSize
    }
    ...
  }
}
output vmId string = vm.id
Deploy ARM/Bicep:
# Azure CLI:
az deployment group create \
--resource-group "MyRG" \
--template-file "main.bicep" \
--parameters vmName="MyVM"
# PowerShell (line continuation is a backtick, not a backslash):
New-AzResourceGroupDeployment `
-ResourceGroupName "MyRG" `
-TemplateFile "main.bicep" `
-vmName "MyVM"
What is Azure Automation and what are Runbooks?
Azure Automation:
→ Cloud-based automation service for process automation, configuration,
update management, and desired state configuration
Components:
Runbooks: PowerShell or Python scripts run automatically
Types: PowerShell, PowerShell Workflow, Python, Graphical
Schedules: trigger runbooks on a time-based schedule
Webhooks: trigger runbooks via HTTP POST (from alerts, Logic Apps)
Managed Identity: authenticate runbooks to Azure without stored credentials
Hybrid Runbook Worker: run runbooks on on-prem servers (not just Azure)
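Update Management (the Azure Automation feature was retired on 31 August 2024; Azure Update Manager is its replacement):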
→ Assess and deploy OS updates to Azure VMs and on-prem servers
→ Scheduled maintenance windows for controlled patching
→ Compliance reporting: which VMs are missing updates
Desired State Configuration (DSC):
→ Ensure VMs maintain a specific configuration (PowerShell DSC)
→ E.g., ensure IIS is always installed, specific registry keys are set
→ Azure Automation acts as the DSC pull server: VMs pull their assigned configuration from it and remediate drift
Common Runbook scenarios:
→ Start/stop VMs on schedule (cost saving)
→ Clean up old snapshots/resources
→ Auto-remediate Azure Policy non-compliance
→ Scale App Service Plan based on business hours
→ Rotate storage account keys and update Key Vault
# PowerShell Runbook example: stop all running VMs in a resource group
Connect-AzAccount -Identity   # Use Managed Identity, no stored credentials
$vms = Get-AzVM -ResourceGroupName "Dev-RG" -Status |
    Where-Object { $_.PowerState -eq "VM running" }
foreach ($vm in $vms) {
    # Stop-AzVM deallocates by default, so compute billing stops
    Stop-AzVM -Name $vm.Name -ResourceGroupName "Dev-RG" -Force
    Write-Output "Stopped: $($vm.Name)"
}
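The scheduling decision behind a start/stop runbook can be sketched separately from the Azure calls. The Python below is an illustrative sketch only: the 08:00-18:00 weekday window and the function names are assumptions, and the `"VM running"` string mirrors the `PowerState` value used in the runbook above.

```python
from datetime import datetime


def should_be_running(now, start_hour=8, stop_hour=18):
    """True during weekday business hours, False otherwise."""
    is_weekday = now.weekday() < 5                 # Monday=0 ... Friday=4
    in_hours = start_hour <= now.hour < stop_hour
    return is_weekday and in_hours


def action_for(power_state, now):
    """Map the current power state and time to 'start', 'stop' or 'none'."""
    want_running = should_be_running(now)
    if want_running and power_state != "VM running":
        return "start"
    if not want_running and power_state == "VM running":
        return "stop"
    return "none"


print(action_for("VM deallocated", datetime(2024, 5, 1, 9, 0)))  # Wednesday morning
print(action_for("VM running", datetime(2024, 5, 1, 20, 0)))     # Wednesday evening
```

A scheduled runbook could evaluate a rule like this on each run and only call start or stop when the result is not `none`, which keeps repeated runs harmless.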
7. Scenario-Based Questions
Scenario: Design a highly available web application architecture on Azure.
Requirement: 99.99% SLA, auto-scaling, DDoS protection, private backend
Architecture:
Internet
↓
[Azure DDoS Protection Standard] — protects public endpoint
↓
[Application Gateway v2 + WAF] — Zone-redundant, SSL termination,
↓ URL routing, OWASP protection
[VM Scale Set — WebTier] — Windows/Linux web servers
3 zones × auto-scale (2-20 instances)
Standard LB (internal) for health distribution
↓
[Internal Load Balancer] — distributes to app tier
↓
[VM Scale Set — AppTier] — application servers
3 zones × auto-scale (2-10 instances)
↓
[Azure SQL — Business Critical] — zone-redundant, auto-failover group
+ Azure Cache for Redis — session cache, query cache
Networking:
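VNet: 10.0.0.0/16
AppGwSubnet: 10.0.0.0/24 (Application Gateway v2, which requires its own dedicated subnet)
WebSubnet: 10.0.1.0/24 (Web tier)
AppSubnet: 10.0.2.0/24 (App tier)
DataSubnet: 10.0.3.0/24 (SQL private endpoint)
GatewaySubnet: 10.0.4.0/27 (for future VPN/ExpressRoute; this exact subnet name is required)
AzureBastionSubnet: 10.0.5.0/26 (Azure Bastion; this exact name and /26 or larger are required)
Security:
NSG on AppGwSubnet: allow 80/443 from Internet, plus GatewayManager on ports 65200-65535 (required by Application Gateway v2)
NSG on WebSubnet: allow 80/443 only from AppGwSubnet, deny all else inbound
NSG on AppSubnet: allow only from WebSubnet (no direct internet)
NSG on DataSubnet: allow only from AppSubnet on SQL port 1433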
Private Endpoint: Azure SQL accessed via private IP in DataSubnet
Bastion: Azure Bastion for admin RDP/SSH (no public IP on VMs)
Key Vault: store connection strings, API keys (no secrets in config)
Managed Identity: VMs access Key Vault and Storage — no credentials in code
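An address plan like the one above can be checked before deployment. The following Python sketch uses only the standard `ipaddress` module and the 10.0.1.0/24 to 10.0.4.0/27 ranges from the plan; the `validate_plan` helper and the short tier labels are illustrative assumptions. It confirms that every subnet sits inside the VNet, that none overlap, and how many addresses remain after the five that Azure reserves in each subnet.

```python
import ipaddress
from itertools import combinations

AZURE_RESERVED_PER_SUBNET = 5  # Azure reserves 5 addresses in every subnet


def validate_plan(vnet_cidr, subnets):
    """Return (problems, usable_addresses) for a VNet address plan."""
    vnet = ipaddress.ip_network(vnet_cidr)
    nets = {name: ipaddress.ip_network(cidr) for name, cidr in subnets.items()}
    problems = []
    for name, net in nets.items():
        if not net.subnet_of(vnet):
            problems.append(f"{name} is outside {vnet}")
    for (a, net_a), (b, net_b) in combinations(nets.items(), 2):
        if net_a.overlaps(net_b):
            problems.append(f"{a} overlaps {b}")
    usable = {
        name: net.num_addresses - AZURE_RESERVED_PER_SUBNET
        for name, net in nets.items()
    }
    return problems, usable


plan = {
    "web": "10.0.1.0/24",
    "app": "10.0.2.0/24",
    "data": "10.0.3.0/24",
    "gateway": "10.0.4.0/27",
}
problems, usable = validate_plan("10.0.0.0/16", plan)
print(problems)  # []
print(usable)    # {'web': 251, 'app': 251, 'data': 251, 'gateway': 27}
```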
Scenario: A VM cannot connect to Azure SQL Database. How do you troubleshoot?
- Check NSG rules: VM's subnet NSG and NIC NSG. Is outbound port 1433 allowed to the SQL server IP or the Sql service tag?
- Check SQL firewall: Azure SQL → Networking. Is the VM's subnet added as a VNet rule (Service Endpoint), or is there a private endpoint? Is public access disabled?
- Check routing: is there a custom UDR (User Defined Route) redirecting SQL traffic through a firewall/NVA? Ensure the route to SQL doesn't black-hole traffic.
- Test connectivity from VM:
# Test TCP connection on port 1433:
Test-NetConnection -ComputerName "myserver.database.windows.net" -Port 1433
# Check for: TcpTestSucceeded: True
- Check Private DNS: if using Private Endpoint, the VM must resolve myserver.database.windows.net to the private IP (10.x.x.x), not the public IP. Check if the VNet is linked to the Private DNS Zone.
- Check SQL authentication: connection string correct? Entra ID auth vs SQL auth? User has permissions on the database?
- Network Watcher (IP Flow Verify): simulate traffic from the VM to SQL and see which NSG rule is blocking.
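On a Linux VM, or anywhere PowerShell is unavailable, the DNS and TCP checks above can be approximated with the Python standard library. This is a minimal sketch, not a replacement for Network Watcher: the helper names are illustrative, and `is_private` only tells you whether the resolved address is in a private range, which is the symptom to look for when a Private Endpoint is expected.

```python
import ipaddress
import socket


def is_private(ip):
    """True when the address is in a private range such as 10.0.0.0/8."""
    return ipaddress.ip_address(ip).is_private


def tcp_port_open(host, port, timeout=3.0):
    """Attempt a TCP connection, roughly like Test-NetConnection -Port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, unreachable, ...
        return False


def diagnose(hostname, port=1433):
    """Resolve the name, classify the address, then test the TCP port."""
    try:
        ip = socket.gethostbyname(hostname)
    except socket.gaierror:
        return {"resolved": None, "private": None, "tcp_open": False}
    return {
        "resolved": ip,
        "private": is_private(ip),
        "tcp_open": tcp_port_open(ip, port),
    }
```

Calling `diagnose("myserver.database.windows.net")` from the VM returns the resolved address, whether it is private, and whether port 1433 answered. A public address with a Private Endpoint in place points at the Private DNS Zone link.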
Scenario: Reduce Azure costs for a development environment running 24/7.
- Auto-shutdown: enable VM auto-shutdown at 6pm and start the VMs again at 8am with an Azure Automation runbook (the built-in auto-shutdown setting only stops VMs). Being deallocated 14 of every 24 hours saves ~58% of VM compute cost.
- Spot VMs for dev workloads: use Azure Spot VMs for dev/test (up to 90% discount). Acceptable for non-critical workloads that can tolerate interruption.
- B-series VMs: switch dev VMs to B-series (burstable), which is much cheaper than D-series for low-average-load workloads.
- Dev/Test pricing: activate Dev/Test subscription pricing for non-production workloads to get discounted Windows Server and SQL licensing.
- Azure Hybrid Benefit: if you have existing Windows Server or SQL Server licences with Software Assurance → apply Hybrid Benefit on VMs to eliminate licence cost.
- Reserved Instances: for always-on resources (build server, dev database) → 1-year reservation saves ~40%.
- Right-size: Azure Advisor Cost recommendations → identify over-provisioned VMs with <5% average CPU utilisation → downsize.
- Storage lifecycle policies: dev storage → Cool tier after 30 days, Archive after 90 days for old test data.
- Delete unused resources: set Azure Policy to audit/alert on resources without required tags. Runbook: delete resources tagged Environment=Dev that are older than 30 days and show no activity.
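The ~58% figure for a 6pm-to-8am shutdown is simple arithmetic, shown here as a short Python sketch. The `weekends_off` option is an added assumption to show how much further the saving goes. The result covers compute charges only, since disks are still billed while a VM is deallocated.

```python
def shutdown_saving(stop_hour=18, start_hour=8, weekends_off=False):
    """Fraction of weekly VM compute hours avoided by a scheduled shutdown."""
    hours_on_per_day = stop_hour - start_hour
    days_on = 5 if weekends_off else 7
    hours_on = hours_on_per_day * days_on
    return 1 - hours_on / (24 * 7)


print(f"{shutdown_saving():.0%}")                   # 58%
print(f"{shutdown_saving(weekends_off=True):.0%}")  # 70%
```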
Scenario: How do you implement a backup strategy for Azure VMs in a regulated industry?
Requirements: 7-year retention, immutable backup, cross-region DR, audit trail
Implementation:
1. Recovery Services Vault per region (Primary + Secondary)
2. Backup policy:
Daily backup: retained 30 days
Weekly backup: retained 1 year (every Sunday)
Monthly backup: retained 7 years (first Sunday of month)
Yearly backup: retained 7 years (January 1st)
3. Soft delete: enabled (default 14 days)
Enhanced soft delete: extend retention to 180 days for regulated environments
4. Immutable vault (locked):
→ Once locked, backup data cannot be deleted or modified
→ Protects against ransomware that tries to delete backups
→ Enable: Vault → Properties → Immutability → Locked
5. Cross-region restore: enabled on vault
→ Restore to secondary region for DR testing
6. Azure Monitor + Log Analytics:
→ Diagnostic settings on vault → Log Analytics
→ Alert on: backup job failures, unexpected deletions
→ KQL query: AzureDiagnostics | where Category == "AzureBackupReport"
| where OperationName == "DeleteBackupData"
7. Azure Backup Reports (Power BI dashboard):
→ Track compliance: which VMs are not backed up?
→ Report on backup storage consumption
→ Evidence for compliance audits
8. Azure Policy:
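→ Built-in policy "Azure Backup should be enabled for Virtual Machines" audits VMs that have no backup
→ To automatically enrol new VMs in a backup policy, assign one of the built-in DeployIfNotExists backup policies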
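The tiered retention schedule in step 2 can be modelled to see how long any given recovery point would be kept. The Python below is a simplified sketch of that schedule only (daily 30 days, Sunday weekly 1 year, first-Sunday monthly and 1 January yearly 7 years). It is not how Azure Backup evaluates policies internally, and it approximates a year as 365 days.

```python
from datetime import date

DAILY, WEEKLY, LONG_TERM = 30, 365, 7 * 365  # retention in days (assumed values)


def retention_days(backup_date):
    """Longest retention that applies to a recovery point taken on backup_date."""
    is_sunday = backup_date.weekday() == 6        # Monday=0 ... Sunday=6
    if backup_date.month == 1 and backup_date.day == 1:
        return LONG_TERM                          # yearly point (1 January)
    if is_sunday and backup_date.day <= 7:
        return LONG_TERM                          # monthly point (first Sunday)
    if is_sunday:
        return WEEKLY                             # weekly point
    return DAILY                                  # ordinary daily point


for d in (date(2024, 1, 1), date(2024, 1, 7), date(2024, 1, 14), date(2024, 1, 10)):
    print(d.isoformat(), retention_days(d))
```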
8. Cheat Sheet — Quick Reference
AZ-104 Exam Domain Weights
| Domain | Weight | Priority |
|---|---|---|
| Manage Azure identities and governance | 20-25% | ⭐⭐⭐⭐⭐ |
| Deploy and manage Azure compute resources | 20-25% | ⭐⭐⭐⭐⭐ |
| Implement and manage virtual networking | 15-20% | ⭐⭐⭐⭐ |
| Implement and manage storage | 15-20% | ⭐⭐⭐⭐ |
| Monitor and maintain Azure resources | 10-15% | ⭐⭐⭐ |
VM Availability Quick Reference
| Protect against | Use |
|---|---|
| Hardware failure in rack | Availability Set (2+ fault domains) |
| Datacenter failure | Availability Zones (2+ VMs in 2+ zones) |
| Region failure | Azure Site Recovery (replicate to paired region) |
| App crashes (load balance) | Azure Load Balancer or App Gateway |
SLAs:
Single VM (Premium SSD): 99.9%
Availability Set: 99.95%
Availability Zone (2+): 99.99%
Zone + Load Balancer: 99.99%
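SLA percentages are easier to compare as minutes of permitted downtime. The Python sketch below converts the figures above and also shows the usual composite-SLA rule for components in series, where availabilities multiply. The helper names are illustrative, and a 365-day year is assumed.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600


def downtime_minutes_per_year(sla_percent):
    """Maximum downtime per year that still meets the SLA."""
    return (100 - sla_percent) / 100 * MINUTES_PER_YEAR


def composite_sla(*sla_percents):
    """SLA of components in series: all must be up, so availabilities multiply."""
    result = 1.0
    for percent in sla_percents:
        result *= percent / 100
    return result * 100


for label, sla in [
    ("Single VM (Premium SSD)", 99.9),
    ("Availability Set", 99.95),
    ("Availability Zones", 99.99),
]:
    print(f"{label}: {downtime_minutes_per_year(sla):.1f} min/year")

print(f"{composite_sla(99.99, 99.95):.4f}")  # two dependent tiers in series
```

The composite result is always lower than the weakest component, which is why a 99.99% target needs every tier in the request path to be zone-redundant.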
Storage Tier Decision
| Data access frequency | Use tier |
|---|---|
| Multiple times per day | Hot |
| Once per 30 days | Cool |
| Once per 90 days | Cold |
| Once per year or less | Archive (+ rehydration time) |
Redundancy decision:
No DR needed: LRS (cheapest)
Zone outage protection: ZRS (recommended for most prod)
Region outage protection: GRS or GZRS
Read access to secondary: RA-GRS or RA-GZRS
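The two decisions above can be written as small lookup functions. This Python sketch encodes the rules of thumb from this cheat sheet only; the exact day thresholds and the function names are assumptions for illustration, not Azure defaults.

```python
def choose_access_tier(days_between_reads):
    """Pick a blob access tier from the typical interval between reads."""
    if days_between_reads < 30:
        return "Hot"
    if days_between_reads < 90:
        return "Cool"
    if days_between_reads < 365:
        return "Cold"
    return "Archive"  # remember the rehydration delay before data is readable


def choose_redundancy(zone=False, region=False, read_secondary=False):
    """Pick a redundancy option from the outage types that must be survived."""
    if not region:
        # Read access to a secondary only exists for geo-redundant options.
        return "ZRS" if zone else "LRS"
    sku = "GZRS" if zone else "GRS"
    return f"RA-{sku}" if read_secondary else sku


print(choose_access_tier(1), choose_access_tier(45), choose_access_tier(400))
print(choose_redundancy(zone=True), choose_redundancy(zone=True, region=True, read_secondary=True))
```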
Networking Quick Reference
| Resource | Purpose |
|---|---|
| VNet | Private network: your isolated Azure network |
| Subnet | Subdivide VNet: group resources by tier/function |
| NSG | Stateful firewall on subnet or NIC |
| ASG | Logical grouping of VMs for NSG rules |
| VNet Peering | Connect VNets privately (non-transitive) |
| Private Endpoint | Private IP for PaaS services in your VNet |
| Service Endpoint | Route PaaS traffic via VNet backbone |
| Azure Bastion | Secure RDP/SSH without public IP |
| Azure Firewall | Managed, stateful L4-L7 firewall (centralised) |
| Load Balancer | L4 TCP/UDP load balancing |
| Application Gateway | L7 HTTP/HTTPS with WAF and URL routing |
| VPN Gateway | Encrypted VPN to on-prem (over internet) |
| ExpressRoute | Private dedicated circuit to on-prem |
| Azure Front Door | Global HTTP load balancing + CDN + WAF |
| Traffic Manager | DNS-based global routing (any protocol) |
Useful Azure CLI Commands
# Login and set subscription:
az login
az account set --subscription "Production"
# Create resource group:
az group create --name "Prod-RG" --location "uksouth"
# Create VM:
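az vm create --resource-group "Prod-RG" --name "MyVM" \
--image "Win2022Datacenter" --size "Standard_D2s_v3" \
--admin-username "azureadmin"
# Windows images authenticate with a password (the CLI prompts for one if
# --admin-password is omitted); --generate-ssh-keys applies to Linux images only.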
# Start/Stop VM:
az vm start --resource-group "Prod-RG" --name "MyVM"
az vm stop --resource-group "Prod-RG" --name "MyVM"        # powers off; compute is still billed
az vm deallocate --resource-group "Prod-RG" --name "MyVM"  # releases compute; compute billing stops
# Create storage account:
az storage account create --name "mystorageacc" \
--resource-group "Prod-RG" --location "uksouth" \
--sku "Standard_ZRS" --kind "StorageV2"
# Create VNet and subnet:
az network vnet create --resource-group "Prod-RG" \
--name "MyVNet" --address-prefix "10.0.0.0/16"
az network vnet subnet create --resource-group "Prod-RG" \
--vnet-name "MyVNet" --name "WebSubnet" --address-prefix "10.0.1.0/24"
# Apply resource lock:
az lock create --name "ProdLock" --resource-group "Prod-RG" \
--lock-type "CanNotDelete"
# Deploy Bicep:
az deployment group create \
--resource-group "Prod-RG" --template-file "main.bicep"
# Query logs (KQL via CLI); --workspace takes the workspace ID (GUID), not the workspace name:
az monitor log-analytics query \
--workspace "<workspace-id-guid>" \
--analytics-query "AzureActivity | where ActivityStatusValue == 'Failure' | take 10"
Top 10 Tips
- ARM is the control plane for everything: every Azure action (Portal, CLI, PowerShell, Terraform) goes through ARM. Understanding the ARM hierarchy (Management Group → Subscription → Resource Group → Resource) is fundamental to every governance, RBAC, and policy question.
- Policy controls WHAT, RBAC controls WHO: Azure Policy enforces resource configuration (must use managed disks, must be in a specific region). RBAC controls who can deploy/manage resources. Both are needed and they are not interchangeable.
- Availability Zones over Availability Sets for new deployments: Availability Zones protect against datacenter failure (99.99% SLA). Availability Sets only protect within one datacenter (99.95% SLA). Always recommend AZs for production.
- Private Endpoints over Service Endpoints for production: Service Endpoints route traffic via the backbone, but the PaaS service still has a public endpoint. Private Endpoints give the PaaS service a private IP in your VNet, so you can completely disable public access. More secure, more enterprise.
- ZRS for most production storage: ZRS replicates across 3 availability zones (survives datacenter failure). LRS only replicates within one datacenter. GRS/GZRS for cross-region requirements. Knowing the right redundancy option for the scenario is heavily tested.
- Blob lifecycle management policies for cost control: move blobs from Hot → Cool → Cold → Archive automatically based on age. This pattern (with the day thresholds) is a common scenario question.
- NSG is stateful, so response traffic is automatic: if you allow inbound port 80, the response traffic is automatically allowed outbound. You don't need a separate outbound rule for responses; you only need rules for initiated connections.
- Azure Bastion eliminates public IPs on VMs: never open RDP (3389) or SSH (22) to the internet from an NSG. Use Azure Bastion (deployed in AzureBastionSubnet) for secure RDP/SSH via the Azure Portal. This is the correct answer to any "how do you securely administer VMs" question.
- Soft delete + Immutable vault = ransomware protection for backups: soft delete keeps deleted backups for 14 days. Immutable vault (locked) prevents backup data deletion entirely. Both are the answer to "how do you protect backups against ransomware."
- Bicep over JSON ARM for new IaC: Bicep is the modern, recommended IaC language for Azure (compiles to ARM JSON). Cleaner syntax, type safety, better tooling. Know how to deploy Bicep via CLI (az deployment group create --resource-group <rg> --template-file main.bicep).