Monday, May 18, 2026

VMware — Complete Guide

vSphere · ESXi · vCenter · NSX-T · vSAN · Tanzu · HCX · Site Recovery · Aria · VCF · Cheat Sheet

Table of Contents

Core Concepts — VMware Portfolio Overview
vSphere & ESXi — Deep Dive
vCenter Server & Administration
VMware NSX-T — Network Virtualization
vSAN — Software-Defined Storage
VMware Tanzu & Cloud-Native
VMware Cloud Foundation & Multi-Cloud
Scenario-Based Questions
Cheat Sheet — Quick Reference

1. Core Concepts — VMware Portfolio Overview

What is VMware and how is its product portfolio organised?

VMware (now part of Broadcom) is the industry leader in virtualization and software-defined infrastructure — enabling organisations to abstract compute, network, and storage resources from underlying hardware and manage them as software.

Category	Products
Compute Virtualization	vSphere (ESXi + vCenter), VMware Workstation, VMware Fusion
Network Virtualization	NSX-T Data Center, NSX Advanced Load Balancer (Avi)
Storage Virtualization	vSAN, vSAN ESA, vSphere Virtual Volumes (vVols)
Cloud Management	Aria Suite (Operations, Automation, Log Insight, Network Insight)
Cloud-Native / Containers	Tanzu (TKGs, TKGi, TMC, App Platform)
Hybrid Cloud	VMware Cloud Foundation (VCF), VMware Cloud on AWS/Azure/GCP
Disaster Recovery	Site Recovery Manager (SRM), VMware Live Recovery
Migration	VMware HCX, vSphere Replication
Desktop Virtualization	Horizon (VDI), App Volumes, Dynamic Environment Manager

Tip: VMware's shift under Broadcom consolidates the portfolio into VMware Cloud Foundation (VCF) as the primary commercial offering — bundling vSphere, vSAN, and NSX-T into a single subscription. Understanding VCF is essential for modern VMware deployments.

What is the difference between VMware vSphere, ESXi, and vCenter?

ESXi (hypervisor):
→ Type-1 bare-metal hypervisor — runs directly on physical hardware
→ Replaces the OS: no underlying Windows or Linux required
→ Manages: CPUs, memory, storage, network interfaces
→ Hosts: virtual machines (VMs) and containers
→ Single host: can be managed standalone via DCUI or Host Client
→ Version: ESXi 8.x (current under Broadcom)

vCenter Server:
→ Centralised management platform for multiple ESXi hosts
→ Provides: vSphere Client (HTML5 UI), REST APIs, PowerCLI
→ Features: DRS, HA, vMotion, DPM, vSphere Lifecycle Manager
→ Deployment: vCenter Server Appliance (VCSA — Linux OVA)
→ No Windows vCenter — VCSA only since vSphere 7.x
→ Single Sign-On (SSO): authentication domain for all vSphere services

vSphere:
→ The product suite = ESXi + vCenter Server + related features
→ Not a single component — it's the umbrella term for the platform
→ Licences: vSphere Standard, vSphere Enterprise Plus (now VCF bundles)

Relationship:
Physical Server → ESXi (hypervisor) → Virtual Machines
ESXi (one or many) → managed by → vCenter Server
vCenter Server → part of → vSphere (the platform)

What are the core virtualisation concepts every VMware professional must know?

Hypervisor types:
Type-1 (bare-metal): ESXi, Hyper-V, KVM
  → Runs directly on hardware — no host OS
  → Lower overhead, better performance, enterprise use
Type-2 (hosted):     VMware Workstation, VirtualBox, Fusion
  → Runs on top of an existing OS (Windows/macOS)
  → Higher overhead, dev/test use only

Virtual Machine (VM):
→ Software-emulated computer — has vCPU, vRAM, vNIC, vDisk
→ Isolated from other VMs on same host
→ Runs any OS independently of underlying hardware
→ Portable: can migrate between hosts (vMotion), datacentres (HCX)

VM files:
.vmx       — VM configuration file (settings, hardware definition)
.vmdk      — virtual disk descriptor (small text file that points to the data extent)
-flat.vmdk — actual disk data referenced by the descriptor .vmdk
.nvram     — VM BIOS/EFI state
.vmsd      — snapshot metadata
.vmsn      — snapshot state
.vmxf      — supplemental config
.log       — VM log file

Snapshots:
→ Point-in-time state capture (disk + memory + VM state optional)
→ NOT a backup — child disks grow; performance degrades over time
→ Best practice: no more than 2–3 snapshots per VM
→ Consolidate before decommissioning: right-click → Manage Snapshots

VMFS (Virtual Machine File System):
→ VMware's clustered file system for shared storage (SAN/iSCSI/FC)
→ Allows multiple ESXi hosts to access the same datastore concurrently
→ Provides the shared storage that HA restarts and classic vMotion/DRS rely on (NFS, vSAN, and vVols datastores also qualify)
→ Current version: VMFS-6 (supports 4K native drives, automatic UNMAP)

2. vSphere & ESXi — Deep Dive

How do vMotion, Storage vMotion, and Cross-vCenter vMotion work?

vMotion (live migration — compute):
→ Migrates a running VM between ESXi hosts with zero downtime
→ Requirements:
   - Shared storage (datastore visible to both hosts); without it, the migration must also move the disks
   - Compatible CPUs (EVC mode if CPU generations differ)
   - L2 network connectivity (same port group or NSX-T stretched segment)
   - vMotion VMkernel port on both hosts
→ Process:
   1. Source host pre-copies VM memory pages to the destination host (vCenter orchestrates)
   2. VM continues running on source; changed pages tracked
   3. At switchover: VM pauses (< 1 second), final memory state transferred
   4. VM resumes on destination; source VM deleted
→ Use case: host maintenance (Enter Maintenance Mode triggers auto-vMotion)
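
The pre-copy loop in steps 1 to 3 can be sketched as a small simulation. This is a simplified model, not VMware's implementation: the function name `precopy_rounds`, the 64 MB switchover threshold, and the constant dirty rate are assumptions chosen for illustration.

```python
def precopy_rounds(memory_mb, dirty_rate_mb_s, bandwidth_mb_s,
                   switchover_mb=64, max_rounds=20):
    """Simulate iterative pre-copy; return (rounds, remaining_mb).

    Each round sends the memory that is still outstanding. While it is
    being sent, the running VM dirties more pages, which become the
    work for the next round. The loop stops once the remainder is small
    enough to transfer during the brief switchover pause.
    """
    if dirty_rate_mb_s >= bandwidth_mb_s:
        raise ValueError("dirty rate must be below link bandwidth to converge")
    remaining = float(memory_mb)
    rounds = 0
    while remaining > switchover_mb and rounds < max_rounds:
        copy_time_s = remaining / bandwidth_mb_s    # time to send this round
        remaining = copy_time_s * dirty_rate_mb_s   # pages dirtied meanwhile
        rounds += 1
    return rounds, remaining


if __name__ == "__main__":
    # Hypothetical VM: 16 GB RAM, 100 MB/s dirty rate, ~1000 MB/s vMotion link
    print(precopy_rounds(16384, 100, 1000))
```

The model shows why a fast vMotion network matters: the remainder shrinks by the ratio of dirty rate to bandwidth on every round, so a busy VM on a slow link takes many more rounds to converge.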

Storage vMotion (live migration — storage):
→ Migrates VM disk files between datastores with VM running
→ No CPU/network compatibility requirements (moves storage, not compute)
→ Requirements: vCenter; for RDMs only the mapping file moves unless a virtual-mode RDM is converted to a VMDK
→ Use case: datastore maintenance, storage tiering, LUN reclamation

Cross-vCenter vMotion (xvMotion — long-distance):
→ Migrate VM between different vCenter Servers / sites
→ Can move compute + storage simultaneously
→ Requires: Enhanced Linked Mode OR HCX for stretched layer-2

EVC (Enhanced vMotion Compatibility):
→ Masks CPU features to a common baseline across a cluster
→ Allows vMotion between hosts with different CPU generations
→ Set at cluster level: Intel Broadwell, Cascade Lake, Ice Lake, etc.
→ EVC mode cannot be lowered if running VMs have higher baseline
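
A minimal sketch of the EVC rules above, assuming a short, illustrative ordering of Intel EVC modes (real clusters offer more modes than the four listed here). `EVC_ORDER`, `cluster_baseline`, and `can_lower_to` are hypothetical names, not part of any VMware API.

```python
# Oldest to newest; an illustrative subset of Intel EVC modes.
EVC_ORDER = ["Broadwell", "Skylake", "Cascade Lake", "Ice Lake"]


def cluster_baseline(host_cpus):
    """Return the highest EVC mode every host can support,
    which is the oldest CPU generation present in the cluster."""
    return min(host_cpus.values(), key=EVC_ORDER.index)


def can_lower_to(new_mode, running_vm_modes):
    """A mode can only be lowered if no running VM was powered on
    with a newer baseline than the proposed mode."""
    limit = EVC_ORDER.index(new_mode)
    return all(EVC_ORDER.index(mode) <= limit for mode in running_vm_modes)


if __name__ == "__main__":
    hosts = {"esxi01": "Ice Lake", "esxi02": "Cascade Lake", "esxi03": "Broadwell"}
    print(cluster_baseline(hosts))
    print(can_lower_to("Broadwell", ["Cascade Lake"]))
```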

What are vSphere HA, DRS, and DPM and when does each activate?

vSphere HA (High Availability):
→ Restarts VMs on surviving hosts if a host fails
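→ Heartbeat: HA agents (FDM) exchange network heartbeats every second over the management network (port 8182)
→ Failure detection: no network heartbeats for about 10-15s, and no datastore heartbeat or ping reply → host declared failed
→ VM restart: the HA primary host restarts the affected VMs on remaining hosts (works even if vCenter is down)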
→ VM/App monitoring: restarts VMs that stop sending VMware Tools heartbeats
→ Admission Control policies:
   - Percentage cluster resources reserved (e.g., 25% = tolerate 1/4 hosts)
   - Slot-based: reserves slots per host for worst-case failover
   - Dedicated failover hosts
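
The percentage-based policy reduces to simple arithmetic when all hosts are the same size. The sketch below assumes identical hosts and rounds up so the reservation is never short; `reserved_percentage` is a hypothetical helper, not a vSphere setting name.

```python
import math


def reserved_percentage(total_hosts, host_failures_to_tolerate):
    """Share of cluster CPU/memory to reserve so the VMs of the failed
    hosts still fit on the survivors (assumes equally sized hosts)."""
    if not 0 < host_failures_to_tolerate < total_hosts:
        raise ValueError("failures to tolerate must be between 1 and hosts - 1")
    return math.ceil(100 * host_failures_to_tolerate / total_hosts)


if __name__ == "__main__":
    for hosts, failures in [(4, 1), (6, 2), (8, 1)]:
        print(hosts, failures, reserved_percentage(hosts, failures))
```

With four hosts and one tolerated failure this gives the 25% figure quoted above. With mixed host sizes the reservation should be based on the largest hosts instead.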

DRS (Distributed Resource Scheduler):
→ Balances VM workloads across hosts in a cluster automatically
→ Monitors: CPU and memory utilisation per host (every 5 minutes in vSphere 6.x; every minute from 7.0)
→ Automation levels:
   Manual:         suggestions only — admin approves each migration
   Partially Auto: initial placement auto, balance manual
   Fully Auto:     vCenter migrates VMs automatically
→ DRS rules:
   Affinity:      keep VMs on same host (e.g., app + DB for latency)
   Anti-affinity: keep VMs on different hosts (e.g., HA pairs)
   Host affinity: pin VMs to specific hosts (licensing, hardware)

DPM (Distributed Power Management):
→ Consolidates workloads and powers down idle hosts to save energy
→ Wakes hosts when capacity is needed (IPMI/iLO/iDRAC WoL)
→ Best practice: enable only if hosts support remote power management

Cluster design best practice:
→ N+1 minimum for HA (lose 1 host, all VMs still fit)
→ N+2 for business-critical clusters
→ Never over-commit with HA admission control disabled
→ DRS Fully Automatic + Balanced threshold for production clusters
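
The N+1 and N+2 rules can be checked with a coarse capacity test. This sketch assumes memory is the limiting resource and ignores fragmentation (a single large VM may still not fit on any one host); `survives_failures` and the sample figures are illustrative only.

```python
def survives_failures(host_memory_gb, vm_demand_gb, failures=1):
    """Return True if total VM demand still fits after losing the
    `failures` largest hosts (the worst case for remaining capacity)."""
    if failures >= len(host_memory_gb):
        return False
    survivors = sorted(host_memory_gb)[:len(host_memory_gb) - failures]
    return sum(survivors) >= vm_demand_gb


if __name__ == "__main__":
    hosts = [512, 512, 512, 512]   # GB of RAM per host (hypothetical)
    demand = 1400                  # GB of configured VM memory (hypothetical)
    print("N+1:", survives_failures(hosts, demand, failures=1))
    print("N+2:", survives_failures(hosts, demand, failures=2))
```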

What is the VMkernel and what ports does it require?

VMkernel adapters (vmk), the host's own network interfaces on ESXi:
→ Not a VM NIC — it's ESXi's own network stack
→ Each vmk port has: IP, subnet mask, gateway, VLAN tag, MTU

Service               Example vmk  Port / Protocol
─────────────────────────────────────────────────────────
Management            vmk0         TCP 443 (vSphere Client, API)
                                   TCP/UDP 902 (host agent; UDP heartbeat to vCenter)
vMotion               vmk1         TCP 8000 (vMotion data)
vSAN                  vmk2         UDP 12321, 23451; TCP 2233 (vSAN traffic)
iSCSI / NFS           vmk3         TCP 3260 (iSCSI)
                                   TCP/UDP 111, 2049 (NFS)
Fault Tolerance (FT)  vmk4         TCP/UDP 8100, 8200, 8300 (FT logging)
Replication           vmk5         TCP 31031, 44046 (vSphere Replication)
(only vmk0 exists by default; the other vmk numbers are a common convention)

Best practice:
→ Dedicate separate vmk ports to each service type
→ Use separate physical NICs (pNICs) for management vs vMotion vs vSAN
→ Jumbo frames (MTU 9000) for vMotion and vSAN traffic
→ vSAN and vMotion on 10/25/100 GbE NICs; management can share 1GbE
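
Jumbo frames only help if every hop honours the larger MTU, which is usually verified with a don't-fragment ping sized to fill the frame. The sketch below derives that payload size (MTU minus the 20-byte IPv4 header and the 8-byte ICMP header) and builds the matching `vmkping` command line; the interface name and the target address are placeholders.

```python
IP_HEADER_BYTES = 20    # IPv4 header without options
ICMP_HEADER_BYTES = 8   # ICMP echo header


def max_ping_payload(mtu):
    """Largest ICMP payload that fits in one unfragmented frame."""
    return mtu - IP_HEADER_BYTES - ICMP_HEADER_BYTES


def vmkping_command(vmk, target_ip, mtu=9000):
    """Build a vmkping call: -I selects the vmk interface, -d sets
    don't-fragment, -s sets the payload size."""
    return f"vmkping -I {vmk} -d -s {max_ping_payload(mtu)} {target_ip}"


if __name__ == "__main__":
    # vmk1 and 192.0.2.10 are placeholder values
    print(vmkping_command("vmk1", "192.0.2.10"))
```

If the ping succeeds at a payload of 1472 but fails at 8972, some device in the path is still set to a 1500-byte MTU.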

Networking constructs:
vSwitch (Standard): configured per host — no shared state
dvSwitch (VDS):     configured in vCenter — shared across hosts
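  → Not required for vMotion, DRS, or HA (standard switches work if port groups are named identically on every host), but keeps configuration consistent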
  → Port groups: named network segments with policy (VLAN, security, QoS)
NSX-T segments:     overlay networks — no VLAN dependency

3. vCenter Server & Administration

How do you manage vSphere via PowerCLI?

# Connect to vCenter:
Connect-VIServer -Server vcenter.domain.local -Credential (Get-Credential)

# List all VMs in a cluster:
Get-Cluster "Production" | Get-VM | Select Name, PowerState, NumCpu, MemoryGB

# Power operations:
Start-VM -VM "WebServer01"
Stop-VMGuest -VM "WebServer01" -Confirm:$false    # graceful shutdown via Tools
Stop-VM -VM "WebServer01"     -Confirm:$false    # power off (hard)

# vMotion — live migrate VM to a different host:
Move-VM -VM "AppServer01" -Destination (Get-VMHost "esxi02.domain.local")

# Storage vMotion — migrate VM disk to a different datastore:
Move-VM -VM "AppServer01" -Datastore (Get-Datastore "SSD-DS-02")

# Create a snapshot (quiesced, no memory):
New-Snapshot -VM "DBServer01" -Name "Pre-Patch-$(Get-Date -f yyyyMMdd)" `
             -Description "Before monthly patching" -Memory:$false -Quiesce:$true

# Remove all snapshots on a VM:
Get-VM "DBServer01" | Get-Snapshot | Remove-Snapshot -Confirm:$false

# VM inventory export:
Get-VM | Select Name, @{N="ToolsVersion";E={$_.ExtensionData.Guest.ToolsVersion}},
                @{N="GuestOS";E={$_.ExtensionData.Guest.GuestFullName}},
                NumCpu, MemoryGB, ProvisionedSpaceGB |
         Export-Csv "vm-inventory.csv" -NoTypeInformation

# List hosts in maintenance mode:
Get-VMHost | Where-Object {$_.ConnectionState -eq "Maintenance"} |
             Select Name, ConnectionState, Version

# Get datastore space usage — sorted by used %:
Get-Datastore | Select Name, CapacityGB,
  @{N="FreeGB";E={[math]::Round($_.FreeSpaceGB,1)}},
  @{N="UsedPct";E={[math]::Round((1-($_.FreeSpaceGB/$_.CapacityGB))*100,1)}} |
  Sort UsedPct -Descending

# Place host into maintenance mode (evacuates VMs via vMotion):
Set-VMHost -VMHost "esxi01.domain.local" -State Maintenance

What are the key vCenter components and their dependencies?

vCenter Server Appliance (VCSA) components:
────────────────────────────────────────────────────────────────
Service              Port       Purpose
────────────────────────────────────────────────────────────────
vSphere Client       443        HTML5 web UI
vCenter SSO          443        Authentication, issues SAML tokens (7444 in vSphere 5.x)
Lookup Service       443        Service registry for vSphere services (7444 in vSphere 5.x)
Inventory Service    10443      VM/host/datastore inventory (legacy; removed in vSphere 6.5)
vSphere API (SDK)    443        REST + SOAP APIs for automation
PostgreSQL DB        5432       Embedded DB (events, tasks, inventory)
vSAN Health          8006       vSAN monitoring service
Update Manager       9084/9087  vSphere Lifecycle Manager (patching)
────────────────────────────────────────────────────────────────

VCSA sizing (production recommendations; vSphere 7.0 defaults, 8.0 is slightly higher):
Small:   up to 100 hosts /  1,000 VMs  —  4 vCPU, 19 GB RAM, 480 GB storage
Medium:  up to 400 hosts /  4,000 VMs  —  8 vCPU, 28 GB RAM, 700 GB storage
Large:   up to 1000 hosts / 10,000 VMs — 16 vCPU, 37 GB RAM, 1065 GB storage
X-Large: up to 2000 hosts / 35,000 VMs — 24 vCPU, 56 GB RAM, 1805 GB storage
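
Choosing a deployment size is a lookup against the host and VM limits listed above. A minimal sketch, using only those limits; `pick_vcsa_size` is a hypothetical helper and leaves no headroom for growth.

```python
# (size name, max hosts, max VMs) taken from the sizing list above
VCSA_TIERS = [
    ("Small", 100, 1_000),
    ("Medium", 400, 4_000),
    ("Large", 1_000, 10_000),
    ("X-Large", 2_000, 35_000),
]


def pick_vcsa_size(hosts, vms):
    """Return the smallest tier whose host AND VM limits both cover
    the inventory, or None if even X-Large is too small."""
    for name, max_hosts, max_vms in VCSA_TIERS:
        if hosts <= max_hosts and vms <= max_vms:
            return name
    return None


if __name__ == "__main__":
    print(pick_vcsa_size(80, 900))
    print(pick_vcsa_size(150, 900))
```

Note that either limit can push the choice up a tier: 150 hosts with only 900 VMs still needs Medium.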

vCenter High Availability (vCHA):
→ Active-Passive pair with a Witness node (3 nodes total)
→ Automatic failover if Active VCSA fails (RTO: ~5 minutes)
→ No impact on running VMs (ESXi HA is independent of vCenter)
→ Requires: 3 VMs, shared/stretched network, 250 Mbps between sites

vSphere Lifecycle Manager (vLCM):
→ Manages ESXi patches, updates, firmware (replaces VUM in 7.x+)
→ Image-based management: define desired ESXi image per cluster
→ Depot: download patches from VMware or air-gapped offline bundle
→ Remediation: puts host in maintenance mode → patches → reboots → exits

4. VMware NSX-T — Network Virtualization

What is NSX-T and what problem does it solve?

NSX-T Data Center decouples networking and security from physical hardware — delivering software-defined networking, micro-segmentation, and consistent policy across on-premises and cloud environments.

Why NSX-T:
→ Traditional VLANs require physical switch changes for every new segment
→ Firewall rules applied at perimeter only — east-west traffic unprotected
→ No consistent networking policy across vSphere, bare-metal, and cloud
→ NSX-T solves all three with overlay networking and distributed security

NSX-T Architecture:
┌─────────────────────────────────────────────────────────────┐
│ NSX Manager (Control Plane + Management Plane)              │
│  - 3-node cluster for HA                                    │
│  - Stores policy, pushes config to data plane               │
└──────────────────────────────┬──────────────────────────────┘
                               │ Pushes fabric config
                               ▼
┌─────────────────────────────────────────────────────────────┐
│ Transport Nodes (Data Plane — ESXi hosts + bare-metal)      │
│  - NSX kernel modules installed on ESXi                     │
│  - N-VDS or VDS7+ as the virtual switch                     │
│  - TEP (Tunnel Endpoint): GENEVE overlay tunnels            │
└─────────────────────────────────────────────────────────────┘

Key NSX-T constructs:
Overlay Segments:           Layer-2 logical switches — no VLAN dependency
Tier-1 (T1) Router:         per-tenant/application gateway
Tier-0 (T0) Router:         datacenter edge — BGP/OSPF to physical fabric
Edge Nodes:                 NSX-T gateways for N-S traffic
Distributed Firewall (DFW): stateful firewall in ESXi kernel — every vNIC
NSX ALB (Avi):              L4/L7 load balancer, WAF, GSLB

Micro-segmentation with DFW:
→ Firewall rules enforced at vNIC level — every VM protected
→ Rules follow the VM (vMotion, migrations) — not tied to IP/port
→ Groups: dynamic workload grouping by tag, OS, VM name
→ East-west traffic secured without hairpinning to perimeter firewall
→ Example policy:
   Group "WebTier" (tag:Web) → Group "AppTier" (tag:App) → Allow TCP 8080
   Group "AppTier"            → Group "DBTier"  (tag:DB)  → Allow TCP 1433
   Any                        → Any                        → Block (default deny)
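
The example policy is evaluated top-down with the first matching rule winning. The sketch below models that behaviour with plain tags and TCP ports; it is a toy evaluator under those assumptions, not the NSX data model, and the `Rule` class, `POLICY` list, and `evaluate` function are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Rule:
    src_tag: Optional[str]   # None matches any source
    dst_tag: Optional[str]   # None matches any destination
    port: Optional[int]      # None matches any TCP port
    action: str


# The three rules from the example policy, in order.
POLICY = [
    Rule("Web", "App", 8080, "ALLOW"),
    Rule("App", "DB", 1433, "ALLOW"),
    Rule(None, None, None, "DROP"),   # default deny
]


def evaluate(src_tags, dst_tags, port, policy=POLICY):
    """Return the action of the first rule that matches the flow."""
    for rule in policy:
        if rule.src_tag is not None and rule.src_tag not in src_tags:
            continue
        if rule.dst_tag is not None and rule.dst_tag not in dst_tags:
            continue
        if rule.port is not None and rule.port != port:
            continue
        return rule.action
    return "DROP"   # nothing matched: fail closed


if __name__ == "__main__":
    print(evaluate({"Web"}, {"App"}, 8080))
    print(evaluate({"Web"}, {"DB"}, 1433))
```

Because matching uses tags rather than addresses, a VM keeps the same verdicts after vMotion or an IP change, which is the property the bullets above describe.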

How does NSX-T routing work (Tier-0 / Tier-1)?

Two-tier routing model:

Tier-0 (T0) Gateway:
→ Connects NSX overlay to physical network (underlay)
→ Runs BGP/OSPF with physical ToR switches
→ Handles: N-S traffic, NAT (SNAT/DNAT), VPN (IPsec/L2VPN)
→ Deployed on Edge Transport Nodes (dedicated VMs or bare-metal)
→ Active-Active or Active-Standby HA modes

Tier-1 (T1) Gateway:
→ Per-application or per-tenant logical router
→ Connected to T0 upstream, overlay segments downstream
→ Handles: inter-segment routing, DHCP relay/server, DNS forwarder
→ Runs distributed on all ESXi transport nodes (low latency)

Traffic flow examples:
VM-to-VM (same segment):             switch locally on host — no router
VM-to-VM (different segments, T1):   routed at distributed T1 in host kernel
VM-to-VM (different T1s):            routed up to T0, back down to T1
VM-to-Internet:                      T1 SR → T0 SR on Edge → physical → Internet
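
The four traffic-flow cases can be expressed as a small decision function. This sketch only classifies a flow by segment and Tier-1 membership; the `Endpoint` type and `describe_path` are hypothetical names, and the returned strings mirror the list above.

```python
from typing import NamedTuple, Optional


class Endpoint(NamedTuple):
    segment: str   # overlay segment the VM is attached to
    t1: str        # Tier-1 gateway that segment hangs off


def describe_path(src: Endpoint, dst: Optional[Endpoint]) -> str:
    """Classify a flow; dst=None stands for an external destination."""
    if dst is None:
        return "T1 SR -> T0 SR on Edge -> physical -> Internet"
    if src.segment == dst.segment:
        return "switched locally, no router"
    if src.t1 == dst.t1:
        return "routed at distributed T1 in host kernel"
    return "routed up to T0, back down to T1"


if __name__ == "__main__":
    web = Endpoint("seg-web", "t1-shop")
    app = Endpoint("seg-app", "t1-shop")
    hr = Endpoint("seg-hr", "t1-hr")
    for dst in (web, app, hr, None):
        print(describe_path(web, dst))
```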

Route redistribution:
→ T0 advertises NSX overlay prefixes into BGP to physical fabric
→ Physical fabric advertises default route / external routes into T0
→ T1 routes auto-redistributed to T0 via internal protocol

5. vSAN — Software-Defined Storage

What is vSAN and how does it work?

VMware vSAN aggregates local disks from ESXi hosts into a shared distributed datastore — eliminating the need for external SAN or NAS in most workloads.

vSAN architecture:

Each ESXi host contributes: NVMe/SSD (cache) + SSD/HDD (capacity)
→ All-Flash:  NVMe cache tier + SSD capacity tier (best performance)
→ Hybrid:     SSD cache tier + HDD capacity tier (lower cost)
→ ESA (Express Storage Architecture): all-NVMe, single-tier, vSAN 8.x

vSAN Objects and Components:
VM Storage Objects:
  - VM Home namespace (vmx, logs)
  - VM Swap object
  - VMDK data object (one per disk)
Each object is split into Components distributed across hosts:
  Component = chunk of data stored on a single host's disk group
  Witness   = tie-breaker metadata component (no user data)

Storage Policy Based Management (SPBM):
→ Per-VM storage policies — not per-datastore
→ FTT (Failures to Tolerate):
   FTT=1: survives 1 host/disk failure
   FTT=2: survives 2 failures
→ RAID type:
   RAID-1 (mirroring):  FTT=1 requires 3 hosts
   RAID-5 (erasure):    FTT=1 requires 4 hosts (more efficient)
   RAID-6:              FTT=2 requires 6 hosts

vSAN Cluster sizing:
Minimum hosts: 3 (FTT=1 RAID-1), 4 (FTT=1 RAID-5), 6 (FTT=2 RAID-6)
Recommended:   6+ hosts for production flexibility
vSAN network:  dedicated VMkernel port, 10/25 GbE, MTU 9000
2-node cluster: requires separate Witness Host (no storage contributed)
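
The host minimums and capacity cost of each policy follow a simple pattern: RAID-1 needs 2×FTT+1 hosts and stores FTT+1 full copies, while the erasure-coding layouts are fixed at 3+1 (RAID-5) and 4+2 (RAID-6). A minimal sketch of that arithmetic, assuming the classic (OSA) layouts described here and ignoring slack space and deduplication; `min_hosts` and `raw_capacity_needed` are hypothetical helpers.

```python
def min_hosts(raid, ftt):
    """Minimum hosts for a policy, per the figures in this section."""
    if raid == "RAID-1":
        return 2 * ftt + 1           # FTT+1 copies plus FTT witnesses
    if (raid, ftt) == ("RAID-5", 1):
        return 4                     # 3 data + 1 parity
    if (raid, ftt) == ("RAID-6", 2):
        return 6                     # 4 data + 2 parity
    raise ValueError(f"unsupported combination: {raid} with FTT={ftt}")


def raw_capacity_needed(usable_gb, raid, ftt):
    """Raw capacity consumed to store `usable_gb` of VM data."""
    min_hosts(raid, ftt)             # reject invalid combinations
    multiplier = {"RAID-1": ftt + 1, "RAID-5": 4 / 3, "RAID-6": 1.5}[raid]
    return usable_gb * multiplier


if __name__ == "__main__":
    for raid, ftt in [("RAID-1", 1), ("RAID-5", 1), ("RAID-6", 2)]:
        print(raid, ftt, min_hosts(raid, ftt), raw_capacity_needed(1000, raid, ftt))
```

This is why RAID-5 is called more efficient above: it tolerates the same single failure as RAID-1 while consuming about a third less raw capacity.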

vSAN Stretched Cluster:
→ Two active sites + witness site
→ All writes mirrored to both sites (synchronous) — RPO = 0
→ Automatic HA failover between sites
→ Requires: < 5ms RTT between sites, < 200ms RTT to witness

What are vSAN performance monitoring best practices?

Key vSAN metrics (Skyline Health / vSAN Proactive Tests):
Latency:      read/write latency (target: <1ms read, <5ms write — all-flash)
Throughput:   MB/s per disk group and per VM object
IOPS:         read/write IOPS per VM, per host, per disk group
Congestion:   >0 congestion events = disk group bottleneck
Resync:       % resync traffic — elevated after host failure/add
Cache hit %:  flash read cache hit ratio (hybrid tier only)

Common vSAN issues and resolutions:
Issue                               Resolution
─────────────────────────────────────────────────────────────────────
vSAN component inaccessible         Check host connectivity, disk health
High congestion / latency           Check disk group IOPS; add hosts/disks
Resync traffic impacting production Enable resync throttling (IOPS limit)
Capacity imbalance between hosts    Rebalance via vSAN proactive rebalance
Disk failures                       Replace disk; policy compliance auto-restores
Snapshot performance degradation    Limit snapshot count; consolidate regularly

PowerCLI — vSAN health check:
# Confirm the property names in your PowerCLI release first: Get-VsanDisk | Get-Member
Get-VsanDisk | Select VMHost, CanonicalName, State, Tier, OperationalState |
               Where-Object {$_.OperationalState -ne "ok"} |
               Format-Table -AutoSize

Get-VsanClusterConfiguration -Cluster "vSAN-Cluster" |
  Select SpaceEfficiencyEnabled, EncryptionEnabled, HealthCheckEnabled

6. VMware Tanzu & Cloud-Native

What is VMware Tanzu and how does it fit with vSphere?

VMware Tanzu is VMware's portfolio of products for running, managing, and securing Kubernetes workloads — both on-premises (on vSphere) and across clouds.

Tanzu product breakdown:
────────────────────────────────────────────────────────────────────────
Product                       Description
────────────────────────────────────────────────────────────────────────
TKGs (Tanzu K8s Grid          K8s clusters provisioned in vSphere Namespaces
 with Supervisor)              vSphere 7/8 required; NSX-T or NSX ALB for LB

TKGi (Tanzu K8s Grid          Enterprise K8s with BOSH; air-gapped, FIPS
 Integrated)                   Successor to Enterprise PKS (upgrade path for PKS)

Tanzu Mission Control (TMC)   Centralised multi-cluster K8s management (SaaS)
                               Policy, compliance, cost visibility

Tanzu Application Platform    Developer-focused app platform (TAP) on K8s
(TAP)                          Supply chain security, app accelerators

Tanzu Service Mesh             Service mesh for micro-services (Global Namespace)
────────────────────────────────────────────────────────────────────────

vSphere with Tanzu (Workload Control Plane — WCP):
→ vSphere 7+ Supervisor feature: run K8s natively in vCenter
→ vSphere Namespace: RBAC-controlled project space for K8s clusters
→ Storage: vSAN/vVols as PersistentVolumes via CNS (Container Native Storage)
→ Networking: NSX-T recommended; VDS 7+ also supported

Key concepts:
Supervisor Cluster:   vSphere control plane K8s cluster (runs on ESXi hosts)
Tanzu K8s Cluster:    guest cluster in a vSphere Namespace for workloads
VM Class:             pre-defined VM sizes (best-effort, guaranteed)
Storage Policy:       maps to vSAN SPBM policy for PVC provisioning
Load Balancer:        NSX ALB or HAProxy for external service exposure

kubectl on vSphere:
# Login to Supervisor:
kubectl vsphere login --vsphere-username admin@vsphere.local \
                      --server wcp.domain.local

# List vSphere Namespaces:
kubectl get namespaces

# Apply a Tanzu K8s Cluster manifest:
kubectl apply -f tanzu-cluster.yaml

# Access workload cluster:
kubectl vsphere login --server wcp.domain.local \
  --vsphere-username admin@vsphere.local \
  --tanzu-kubernetes-cluster-name prod-cluster \
  --tanzu-kubernetes-cluster-namespace dev-namespace

7. VMware Cloud Foundation & Multi-Cloud

What is VMware Cloud Foundation (VCF) and what does it include?

VCF = VMware's integrated private cloud platform
→ Bundles: vSphere, vSAN, NSX-T, Aria Suite under one subscription
→ Deployed on validated hardware (VxRail, off-the-shelf servers)
→ Managed by: SDDC Manager (lifecycle automation, brownfield import)

VCF components:
┌─────────────────────────────────────────────────────────────────────┐
│ SDDC Manager — orchestrates bring-up, patching, scaling             │
├─────────────────────────────────────────────────────────────────────┤
│ vCenter Server (management) │ vSAN (storage) │ NSX-T (networking)   │
├─────────────────────────────────────────────────────────────────────┤
│ Aria Operations │ Aria Automation │ Aria Log Insight │ Aria Networks  │
└─────────────────────────────────────────────────────────────────────┘

VCF domains:
Management Domain:   vCenter, NSX Manager, SDDC Manager, Aria Suite
Workload Domains:    isolated vSphere+vSAN+NSX-T cluster for production VMs
VI Workload Domain:  standard VM workloads
VVD (VMware Validated Design): pre-tested reference architecture for specific use cases

Bring-up workflow:
1. Physical server preparation (BIOS settings, NIC connectivity verified)
2. Cloud Builder VM: orchestrates automated VCF bring-up (JSON spec input)
3. SDDC Manager deployed → manages ESXi patching, domain expansion
4. Workload domain: add hosts → SDDC Manager creates vSphere+vSAN+NSX cluster

VMware HCX (Hybrid Cloud Extension):
→ Application mobility across any vSphere environment or VMware Cloud
→ Migration types:
   Cold migration:              offline VM copy — no vMotion required
   vMotion:                     live migration (zero downtime) up to 150ms RTT
   Bulk migration:              mass move with maintenance window
   Replication-assisted vMotion (RAV): pre-seeds data, then live cutover
→ Components: HCX Manager, Service Mesh, Interconnect, WAN Opt, Network Ext
→ Network Extension: stretch L2 VLAN/NSX segment between sites — no IP renumbering

VMware Cloud on AWS (VMC):
→ VMware-managed vSphere SDDC running on dedicated AWS bare-metal
→ Same vSphere/NSX-T/vSAN stack — no re-training required
→ Native access to AWS services (RDS, S3, ELB in same AZ)
→ HCX included: migrate VMs from on-prem to VMC non-disruptively
→ Use case: DR, datacenter extension, cloud bursting

8. Scenario-Based Questions

Scenario 1: A production ESXi host fails. Walk through the HA and recovery process.

Immediate HA response (automatic):

Failure detection: surviving ESXi hosts stop receiving heartbeats from the failed host (default 10 seconds). vCenter marks the host as "Not Responding."
HA Master election: the HA Master host (elected by the cluster's HA agents over the management network) takes ownership of recovery and uses datastore heartbeats to distinguish a failed host from an isolated one.
VM restart: the Master instructs surviving hosts to restart VMs from the failed host. Restart order follows VM restart priority (High → Medium → Low → Disabled).
Admission Control validation: HA checks reserved capacity (N+1 design). If sufficient, all VMs restart within 1–3 minutes.
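
The restart ordering in step 3 amounts to sorting the failed host's VMs by priority and skipping those set to Disabled. A minimal sketch using the four priority levels named above; the VM names and the `restart_order` helper are hypothetical, and ties are broken alphabetically only to keep the output deterministic.

```python
PRIORITY_RANK = {"High": 0, "Medium": 1, "Low": 2}


def restart_order(vm_priorities):
    """Return VM names in restart order; 'Disabled' VMs are left off."""
    candidates = [
        (PRIORITY_RANK[priority], name)
        for name, priority in vm_priorities.items()
        if priority != "Disabled"
    ]
    return [name for _, name in sorted(candidates)]


if __name__ == "__main__":
    failed_host_vms = {
        "web01": "Low",
        "db01": "High",
        "app01": "Medium",
        "test01": "Disabled",
    }
    print(restart_order(failed_host_vms))
```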

Post-failure remediation:

Identify root cause: check iDRAC/iLO hardware alerts, DCUI on host, vCenter Alarms, Aria Log Insight for ESXi syslog around failure time.
Host replacement/repair: replace failed components (PSU, NIC, disk). If connectivity issue, check ToR switch and vmk0 management network.
Reintroduce host: reconnect host to vCenter. vSphere Lifecycle Manager (vLCM) remediates host to cluster image baseline.
vSAN resync: if vSAN cluster, failed host's components are rebuilt on surviving hosts within FTT policy. Monitor resync completion in vSAN Health.
Post-mortem: document timeline, RCA, and verify HA design covers future single-host failures.

Scenario 2: Design a VMware platform for a 500-VM production environment with DR.

Primary datacenter design:

Compute: 6 ESXi hosts per cluster (N+2 for HA; DRS Fully Automatic). Two clusters: Management (vCenter, NSX Manager, Aria) + Workload (production VMs). All hosts: 2× 25GbE (vSAN + NSX-T TEP), 2× 10GbE (management + vMotion).
Storage: vSAN All-Flash on workload hosts. Policy: FTT=1 RAID-5 for standard workloads, FTT=2 RAID-6 for Tier-1 databases. vSAN stretched cluster for RPO=0 if budget permits.
Networking: NSX-T Data Center. Tier-0 BGP peering to ToR switches (ECMP for bandwidth). Tier-1 per application tier. DFW micro-segmentation: default deny east-west. NSX ALB for application load balancing.
Management: VCSA in vCHA (3-node HA) on management cluster. NSX Manager 3-node cluster. Aria Operations for performance monitoring and capacity planning.

DR site design:

Site Recovery Manager (SRM): policy-based replication and automated failover orchestration. Recovery Plans define failover order, IP customisation, pre/post scripts.
vSphere Replication: RPO 5-minute minimum per VM (no shared storage required). Configure replication groups matching application tiers.
RTO/RPO targets: RTO 4 hours (SRM automated failover + smoke tests), RPO 15 minutes. Tier-1: RPO 5 minutes.
DR testing: SRM supports non-disruptive test failovers (isolated network bubble). Test quarterly; review recovery plan annually.

Scenario 3: Migrate 200 VMs from on-premises vSphere to VMware Cloud on AWS.

Assessment: inventory VM sizes, dependencies, storage consumption, and network segments using RVTools and vSphere Replication assessment.
HCX deployment: deploy HCX Connector on-premises (OVA), activate with VMC HCX Cloud Manager. Create Service Mesh (interconnect, WAN optimisation, network extension).
Network Extension: extend on-premises VLANs/NSX segments to VMC via HCX Network Extension. VMs retain IP addresses — no renumbering during migration.
Migration wave planning: group VMs by application dependency (DB before App, App before Web). Schedule maintenance windows for stateful workloads.
Migration execution:
- Tier-3 (dev/test): Bulk Migration — replicate overnight, cutover in window
- Tier-2 (standard): RAV — pre-seed data, live cutover under 1 minute
- Tier-1 (business-critical): vMotion — zero downtime live migration
DNS/LB cutover: update DNS records and load balancer backends to VMC IPs after each wave. Validate application health before proceeding.
Network extension retirement: once all VMs migrated, unextend segments, update routing, decommission HCX Network Extension.
Post-migration: right-size VMs using Aria Operations recommendations. Apply VMC-appropriate storage policies. Remove HCX Service Mesh.
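
Wave planning by dependency (DB before App, App before Web) is a topological sort. The sketch below uses Python's standard `graphlib` module (Python 3.9 or later) to group VMs into waves in which every dependency has already moved; the VM names and the `plan_waves` helper are hypothetical.

```python
from graphlib import TopologicalSorter


def plan_waves(depends_on):
    """Group VMs into ordered migration waves.

    depends_on maps each VM to the set of VMs that must migrate
    before it. Raises graphlib.CycleError on circular dependencies.
    """
    sorter = TopologicalSorter(depends_on)
    sorter.prepare()
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())   # everything unblocked right now
        waves.append(ready)
        sorter.done(*ready)                  # unblock their dependants
    return waves


if __name__ == "__main__":
    dependencies = {
        "app01": {"db01"},
        "web01": {"app01"},
        "batch01": set(),    # no dependencies: can go in the first wave
    }
    for number, wave in enumerate(plan_waves(dependencies), start=1):
        print(f"Wave {number}: {', '.join(wave)}")
```

In practice each wave would be split further by migration type and maintenance window, but the ordering constraint stays the same.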

Scenario 4: Troubleshoot a VM with network connectivity issues in an NSX-T environment.

Step 1 — VM layer:
→ Ping default gateway from within VM
→ Check NIC driver and VMware Tools version
→ Verify IP/subnet/gateway configuration

Step 2 — NSX-T logical port:
→ NSX Manager → Networking → Segments → find VM's segment → check logical port
→ Verify port state: UP, Admin State: UP, Attachment: correct VM
→ API: GET /api/v1/logical-ports?attachment_type=VIF&attachment_id={vif-id}

Step 3 — Distributed Firewall:
→ NSX Manager → Security → Distributed Firewall → check flow analysis
→ Enable DFW logging on suspect rules temporarily
→ On ESXi host:
   summarize-dvfilter                    (find the filter name of the VM's vNIC)
   vsipioctl getrules -f <filter-name>   (DFW rules applied to vNIC)
   vsipioctl getflows -f <filter-name>   (active connections)

Step 4 — Segment and T1 Gateway:
→ Check T1 routing table: NSX Manager → Tier-1 → Routes
→ Verify segment is attached to correct T1
→ Test: SSH to T1 edge SR node → ping VM gateway IP

Step 5 — T0 / Edge Node:
→ Check T0 BGP status: NSX Manager → Tier-0 → BGP Neighbours
→ Verify routes propagated to physical switches
→ Test N-S: ping from VM to external IP → check SNAT rules

Step 6 — Transport Node (ESXi):
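→ NSX Manager → System → Fabric → Transport Nodes → Health (check TEP tunnel status)
→ On host: vmkping ++netstack=vxlan -d -s 1572 <remote-TEP-IP>   (TEP reachability, assuming a 1600-byte underlay MTU; the TEP netstack is still named "vxlan")
→ Packet capture on the TEP vmkernel (often vmk10): pktcap-uw --dir 2 --vmk vmk10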

9. Cheat Sheet — Quick Reference

VMware Component Selection Guide

Requirement                                → Component
─────────────────────────────────────────────────────────────────────
Server virtualisation                      → vSphere (ESXi + vCenter)
Shared storage (all-flash, hyperconverged) → vSAN All-Flash / ESA
Network virtualisation + micro-segment.    → NSX-T Data Center
Load balancing (L4/L7, WAF, GSLB)         → NSX Advanced LB (Avi)
Kubernetes on vSphere                      → vSphere with Tanzu (WCP)
Multi-cluster K8s management               → Tanzu Mission Control (TMC)
VM performance monitoring / forecasting    → Aria Operations
Log aggregation (vSphere/NSX/guest)        → Aria Log Insight
IaC / self-service cloud automation        → Aria Automation
Network visibility and flow analysis       → Aria Networks (vRNI)
Live VM migration between sites            → vMotion / HCX vMotion
Mass migration on-prem → VMC/cloud         → HCX Bulk / RAV
DR orchestration + automated failover      → Site Recovery Manager (SRM)
Replication (no shared storage required)   → vSphere Replication
Integrated SDDC lifecycle management       → VMware Cloud Foundation (VCF)
VDI / virtual desktops                     → VMware Horizon

vSphere HA / DRS / DPM / FT Summary

Feature	Trigger	Action
HA	Host heartbeat loss (10s)	Restart VMs on surviving hosts
HA App Monitor	VMware Tools heartbeat loss	Restart VM OS
DRS	CPU/Mem imbalance (checked every 1 to 5 min, by version)	vMotion VMs to balance hosts
DRS Rules	Affinity policy defined	Keep / separate VMs on hosts
DPM	Low cluster utilisation	Power off hosts; wake on demand
FT	Host failure (primary VM's host)	Instant failover to secondary VM (RPO=0, RTO=0)

ESXi Networking Quick Reference

Standard vSwitch (VSS):
→ Per-host config — no central management
→ No central policy: port groups must be created with identical names on every host for vMotion to work
→ Use for: management vmk0 on standalone/small hosts only

Distributed vSwitch (VDS):
→ Managed centrally in vCenter — config pushed to all member hosts
→ Not required for DRS, vMotion, or HA (these also work on standard switches), but needed for Network I/O Control and NSX on vSphere 7+
→ LACP, LLDP, NetFlow, port mirroring support
→ Use for: all production clusters

NSX-T N-VDS / VDS7+:
→ Overlay networking — VMs on virtual segments (no VLAN dependency)
→ GENEVE encapsulation for TEP tunnels (replaces VXLAN in NSX-T)
→ vSphere 7.x+: N-VDS merged into VDS — single switch for both

NIC Teaming policies (VDS):
Route based on originating port: default; good for most workloads
Route based on IP hash:          requires a static EtherChannel on the physical switch (LACP instead uses a VDS LAG)
Route based on physical NIC load: LBT — VDS only; dynamic balancing
Use explicit failover order:      active-passive, deterministic
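
The IP hash policy above can be illustrated with a short Python sketch. It assumes the commonly documented simplification of the algorithm (XOR of the source and destination IPv4 addresses, modulo the number of active uplinks). The function name ip_hash_uplink and the sample addresses are hypothetical, and the real ESXi implementation may differ in detail.

```python
import ipaddress


def ip_hash_uplink(src_ip: str, dst_ip: str, uplink_count: int) -> int:
    """Return the 0-based uplink index chosen for a source/destination pair.

    Simplified model (an assumption, not the ESXi source code):
    (source IP XOR destination IP) modulo the number of active uplinks.
    """
    if uplink_count < 1:
        raise ValueError("uplink_count must be at least 1")
    src = int(ipaddress.IPv4Address(src_ip))
    dst = int(ipaddress.IPv4Address(dst_ip))
    return (src ^ dst) % uplink_count


if __name__ == "__main__":
    vm_ip = "10.0.0.30"  # hypothetical VM address
    for client_ip in ("10.0.0.1", "10.0.0.2", "10.0.0.3"):
        # With 2 uplinks these three flows land on uplinks 1, 0, 1.
        print(client_ip, "->", ip_hash_uplink(vm_ip, client_ip, 2))
```

The sketch shows why IP hash only spreads load when a VM talks to many different peers: a single source/destination pair always maps to the same uplink.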

PowerCLI Quick Reference

# Connect / Disconnect:
Connect-VIServer -Server vcenter.domain.local
Disconnect-VIServer -Confirm:$false

# VM management:
Get-VM | Where PowerState -eq "PoweredOff"
Get-VM "VMName" | Start-VM
Get-VM "VMName" | Restart-VMGuest
Get-VM "VMName" | Get-Snapshot | Remove-Snapshot -Confirm:$false
Move-VM -VM "VMName" -Destination (Get-VMHost "esxi02")
Move-VM -VM "VMName" -Datastore  (Get-Datastore "DS01")

# Host management:
Get-VMHost | Select Name, ConnectionState, Version, NumCpu, MemoryTotalGB
Set-VMHost -VMHost "esxi01" -State Maintenance
Get-VMHost -State Maintenance | Set-VMHost -State Connected

# Inventory and reporting:
Get-VM | Select Name, NumCpu, MemoryGB, ProvisionedSpaceGB |
         Export-Csv "vm-report.csv" -NoTypeInformation
Get-Datastore | Select Name, CapacityGB, FreeSpaceGB |
                Sort FreeSpaceGB | Format-Table

# vSAN health:
Get-VsanDisk | Where OperationalState -ne "ok"
Get-VsanClusterConfiguration -Cluster "vSAN-Cluster"

# Alarms and events:
Get-AlarmDefinition | Where Enabled -eq $true
Get-VIEvent -MaxSamples 100 -Types Error | Select CreatedTime, FullFormattedMessage
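
A CSV produced by the inventory export above can be totalled outside PowerCLI. The following Python sketch assumes the column names from that Select statement (Name, NumCpu, MemoryGB, ProvisionedSpaceGB) and dot-decimal number formatting. The function name summarise_vm_report and the sample rows are hypothetical.

```python
import csv
import io


def summarise_vm_report(csv_file):
    """Total vCPUs, memory, and provisioned space from an exported VM report."""
    totals = {"vms": 0, "vcpus": 0, "memory_gb": 0.0, "provisioned_gb": 0.0}
    for row in csv.DictReader(csv_file):
        totals["vms"] += 1
        totals["vcpus"] += int(row["NumCpu"])
        totals["memory_gb"] += float(row["MemoryGB"])
        totals["provisioned_gb"] += float(row["ProvisionedSpaceGB"])
    return totals


if __name__ == "__main__":
    # Hypothetical sample data in the shape Export-Csv writes.
    sample = io.StringIO(
        '"Name","NumCpu","MemoryGB","ProvisionedSpaceGB"\n'
        '"web01","2","8","120.5"\n'
        '"db01","8","64","900.25"\n'
    )
    print(summarise_vm_report(sample))
```

For a real file, open it with open("vm-report.csv", newline="", encoding="utf-8-sig") so that a byte-order mark, if present, does not corrupt the first column name.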

vSAN Storage Policy Reference

FTT	RAID Type	Min Hosts	Overhead	Use Case
1	RAID-1	3	100%	Small clusters, Tier-1 (mirroring)
1	RAID-5	4	33%	Standard production (efficient)
2	RAID-1	5	200%	Highest resilience, Tier-0
2	RAID-6	6	50%	Balanced resilience + efficiency
0	None	1	0%	Test/Dev only — no protection

All-Flash recommendations: Tier-1 (DB/critical): FTT=2 RAID-6 (6+ hosts) · Tier-2 (standard VMs): FTT=1 RAID-5 (4+ hosts) · Tier-3 (dev/test): FTT=1 RAID-1 (3+ hosts). Never use FTT=0 in production.
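
The overhead column in the policy table translates directly into raw capacity. The Python sketch below works through that arithmetic, assuming the 3+1 (RAID-5) and 4+2 (RAID-6) layouts implied by the 33% and 50% figures. It ignores slack space, deduplication, and compression, and the names POLICIES and raw_capacity_gb are hypothetical.

```python
# (FTT, RAID type) -> (total components, data components, minimum hosts)
# Values mirror the policy table above.
POLICIES = {
    (0, "None"): (1, 1, 1),
    (1, "RAID-1"): (2, 1, 3),
    (1, "RAID-5"): (4, 3, 4),
    (2, "RAID-1"): (3, 1, 5),
    (2, "RAID-6"): (6, 4, 6),
}


def raw_capacity_gb(usable_gb: float, ftt: int, raid: str, hosts: int) -> float:
    """Raw vSAN capacity consumed to store usable_gb under a given policy."""
    total, data, min_hosts = POLICIES[(ftt, raid)]
    if hosts < min_hosts:
        raise ValueError(
            f"FTT={ftt} {raid} needs at least {min_hosts} hosts, got {hosts}"
        )
    return usable_gb * total / data


if __name__ == "__main__":
    # Compare every policy for 1000 GB of VM data on a 6-host cluster.
    for (ftt, raid) in POLICIES:
        raw = raw_capacity_gb(1000, ftt, raid, hosts=6)
        print(f"FTT={ftt} {raid:<7} -> {raw:,.0f} GB raw")
```

For 1000 GB of VM data, FTT=1 RAID-5 consumes roughly 1333 GB raw against 2000 GB for FTT=1 RAID-1, which is why erasure coding is the efficient choice once the cluster has enough hosts.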

Top 10 VMware Tips

vSphere is not just ESXi. vSphere is the platform (ESXi + vCenter + features). Always distinguish ESXi (hypervisor), VCSA (management), and vSAN/NSX-T (infrastructure) when designing or troubleshooting solutions.
Snapshots are not backups. Snapshots consume disk space proportional to VM change rate and degrade performance over time. Use a dedicated backup solution (Veeam, VADP-based) for data protection. Consolidate snapshots before every maintenance window.
EVC protects vMotion across CPU generations. Set EVC at cluster creation, before adding hosts. You can raise EVC mode but cannot lower it while VMs are powered on. Baseline to the oldest CPU family in the cluster.
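vSAN jumbo frames (MTU 9000) must be consistent end-to-end. Jumbo frames are optional for vSAN but commonly recommended. If you enable them, configure MTU 9000 on the VMkernel port, the VDS, ToR switch ports, and all inter-switch links. An MTU mismatch silently drops vSAN frames larger than the smallest MTU on the path, which is often misdiagnosed as disk failure. Verify with vmkping -I vmkX -s 8972 -d <peer-IP> (8972 = 9000 minus 28 bytes of IP and ICMP headers).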
NSX-T DFW follows the VM — not the IP. Firewall rules applied to Security Groups (dynamic by VM tag/name) stay with VMs through vMotion, DR failover, and migrations. Design security policy around workload identity, not network location.
vCenter HA is not the same as vSphere HA. vSphere HA restarts VMs on host failure. vCenter HA protects the VCSA management appliance itself. Both coexist independently. During a vCenter outage, running VMs keep running and vSphere HA still restarts VMs, but DRS, vMotion, and central management are unavailable until vCenter returns.
HCX is the safest migration path for large-scale moves. HCX Network Extension eliminates IP renumbering. Use RAV (Replication Assisted vMotion) for Tier-1 workloads: replication pre-seeds data in bulk and the final cutover is a vMotion, so the VM stays powered on. HCX Bulk Migration, by contrast, reboots the VM at cutover.
SPBM decouples storage policy from infrastructure. Define VM storage requirements (FTT, RAID, IOPS) in a policy; vSAN enforces it automatically. When hosts are added, vSAN rebalances to meet all policies without manual intervention.
VCF SDDC Manager owns lifecycle management. In VCF deployments, never patch ESXi, vCenter, or NSX manually. SDDC Manager orchestrates the patching sequence (SDDC Manager → NSX → vCenter → ESXi, with vSAN updated as part of ESXi) to maintain component compatibility. Manual patching leaves SDDC Manager's recorded state out of sync with the environment.
Monitor capacity proactively with Aria Operations. vSAN "time remaining" and Aria capacity forecasting give weeks of notice before exhaustion. Set alarms at 70% capacity threshold (not 90%) to allow orderly expansion without emergency procurement.
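
The difference between a 70% and a 90% alarm threshold in the last tip can be made concrete with a simple linear projection. This Python sketch is an illustration only: Aria Operations uses its own forecasting models, and the function name days_until_threshold and the sample figures are hypothetical.

```python
import math


def days_until_threshold(capacity_gb, used_gb, growth_gb_per_day, threshold_pct=70):
    """Days until usage reaches threshold_pct of capacity, assuming linear growth.

    Returns 0 if the threshold is already reached, or None if usage is not growing.
    """
    limit_gb = capacity_gb * threshold_pct / 100
    if used_gb >= limit_gb:
        return 0
    if growth_gb_per_day <= 0:
        return None
    return math.ceil((limit_gb - used_gb) / growth_gb_per_day)


if __name__ == "__main__":
    # Hypothetical cluster: 100 TB capacity, 50 TB used, growing 500 GB per day.
    at_70 = days_until_threshold(100_000, 50_000, 500, threshold_pct=70)
    at_90 = days_until_threshold(100_000, 50_000, 500, threshold_pct=90)
    at_100 = days_until_threshold(100_000, 50_000, 500, threshold_pct=100)
    print(f"70% alarm fires in {at_70} days, leaving {at_100 - at_70} days to expand")
    print(f"90% alarm fires in {at_90} days, leaving {at_100 - at_90} days to expand")
```

In this example the 70% alarm leaves 60 days before exhaustion, while the 90% alarm leaves only 20, which is often shorter than a hardware procurement cycle.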
