GPU HA & Sharing Strategy
(VMware + NVIDIA + IIoT workloads)
Audience: Infra, Ops, IoT App, Architecture
Scope: L4 GPUs, VMware, Kubernetes, PCI Passthrough vs vGPU, SAN vs No-SAN
Goal: Maximum clarity on when, why, and how to design HA correctly—without overengineering.
1. Ground Truth
1.1 What HA really means (often confused)
| Layer | HA Means |
|---|---|
| VMware HA | VM restarts on another host |
| GPU HA | GPU workload continues with acceptable downtime |
| Application HA | App survives failure (may restart) |
| Business HA | SLA respected (minutes vs seconds matter) |
🚨 GPU hardware itself is NOT live-migratable (for L4 class GPUs today).
So HA ≠ zero downtime.
HA = fast, predictable recovery.
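The "fast, predictable recovery" point can be made concrete with simple availability arithmetic. The MTBF/MTTR numbers below are illustrative assumptions, not measurements:

```python
# Steady-state availability = MTBF / (MTBF + MTTR).
# Numbers are illustrative assumptions, not vendor data.

def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Fraction of time the service is up, given mean time between
    failures and mean time to recover."""
    return mtbf_hours / (mtbf_hours + mttr_hours)

# A GPU node failing ~once a year (8760 h), recovered in 5 minutes
# by an automated restart:
fast = availability(8760, 5 / 60)
# The same failure needing 4 hours of manual GPU rebinding:
slow = availability(8760, 4)

print(f"automated restart: {fast:.6f}")  # well above 99.99%
print(f"manual recovery:   {slow:.6f}")  # noticeably worse
```

The recovery path, not the failure rate, dominates the result: same hardware, same failure frequency, very different SLA.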
2. GPU Allocation Models (Core Decision)
2.1 PCI Passthrough (Most misunderstood)
What it is
- GPU exclusively bound to one VM
- Bare-metal-like performance
Truth
- ❌ No automatic GPU failover
- ❌ VM restart requires manual GPU rebind
- ❌ vMotion not supported
When it makes sense
- Dedicated inference nodes
- Predictable load
- App-level restart is acceptable
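In a Kubernetes-managed passthrough setup, a pod claims the whole GPU through the NVIDIA device plugin's `nvidia.com/gpu` resource. A minimal sketch follows, expressed as the Python dict you would serialize to YAML; the pod, container, and image names are hypothetical placeholders:

```python
# Sketch of a pod spec claiming one whole GPU via the NVIDIA device
# plugin resource name "nvidia.com/gpu". Names/images are hypothetical.
pod_spec = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {"name": "cv-inference"},                 # hypothetical name
    "spec": {
        "restartPolicy": "Always",                        # app-level HA: restart, not migrate
        "containers": [{
            "name": "inference",
            "image": "registry.example/cv-model:latest",  # hypothetical image
            # One full GPU per pod -- the scheduler will never co-locate
            # another GPU pod on the same device.
            "resources": {"limits": {"nvidia.com/gpu": 1}},
        }],
    },
}

assert pod_spec["spec"]["containers"][0]["resources"]["limits"]["nvidia.com/gpu"] == 1
```

This mirrors the passthrough model one level up: exclusive binding, no sharing, recovery by restart.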
2.2 NVIDIA vGPU (Licensed, controlled sharing)
What it is
- GPU sliced into profiles (e.g., L4-1Q, 2Q, 4Q)
- VMware + NVIDIA vGPU license required
Truth
- ✅ Better utilization
- ✅ Multiple VMs per GPU
- ❌ Still no live GPU migration
- ❌ Requires SAN for VMware HA
When it makes sense
- Multiple small/medium workloads
- Shared GPU economics
- You already accept SAN cost
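The "shared GPU economics" point is simple division: the number in a Q-profile name is its frame-buffer size in GB, and an L4 carries 24 GB, so the profile size fixes how many VMs one card can host (assuming the usual constraint that a single GPU runs one profile size at a time):

```python
# VMs per L4 as a function of the vGPU profile size.
# L4 frame buffer is 24 GB; profile number = GB per VM (e.g., L4-4Q = 4 GB).
L4_FRAMEBUFFER_GB = 24

def vms_per_gpu(profile_gb: int, total_gb: int = L4_FRAMEBUFFER_GB) -> int:
    """Assumes the common rule that one physical GPU hosts a single,
    homogeneous profile size at a time."""
    return total_gb // profile_gb

for profile in (1, 2, 4):
    print(f"L4-{profile}Q -> {vms_per_gpu(profile)} VMs per GPU")
```

Density is what you are buying with the vGPU license; nothing here changes the failover story.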
3. SAN & HBA — The Non-Negotiable Reality
3.1 Why SAN is required for VMware HA
VMware HA needs every host in the cluster to access the VM's files, so the VM can be restarted on a surviving host.
That means:
- Shared storage
- SAN (FC / iSCSI)
- HBA or NIC, depending on SAN type
3.2 When HBA card is required
| Scenario | HBA Needed? | Why |
|---|---|---|
| FC SAN | ✅ Yes | Fibre Channel protocol |
| iSCSI SAN | ❌ No (NIC enough) | Ethernet based |
| No SAN | ❌ No | Local disks only |
📌 HBA has NOTHING to do with GPU
HBA exists only for shared storage access
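The adapter decision in 3.2 is a pure lookup on the storage transport, which is exactly why it has nothing to do with the GPU. A minimal sketch:

```python
# The HBA decision from section 3.2 as a lookup table:
# the adapter follows the storage transport, never the GPU.
ADAPTER_FOR_SAN = {
    "fc":    "HBA (Fibre Channel)",
    "iscsi": "NIC (Ethernet is enough)",
    "none":  "local disks only, no HBA",
}

def adapter_needed(san_type: str) -> str:
    return ADAPTER_FOR_SAN[san_type.lower()]

assert adapter_needed("FC").startswith("HBA")
```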
4. Scenario-Based Strategy (This is the heart)
Scenario A — Dedicated GPU Nodes, No SAN (Recommended Default)
Architecture
- Independent GPU hosts, each with an L4 bound via PCI Passthrough
- Each host is a Kubernetes worker running on local disks
Characteristics
- PCI Passthrough
- Local disks only
- Kubernetes controls restart
Failure Handling
- Node fails → Kubernetes reschedules the pods onto remaining GPU capacity (restart, not migration)
Why this works
- GPU workloads are stateless (CV inference)
- Startup < 2–5 minutes acceptable
- No SAN dependency
- Lowest operational complexity
Verdict
✅ BEST for IIoT / CV / Edge inference
✅ Cheapest
✅ Clearest ownership
❌ No VM-level HA (but app HA exists)
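Restart-based HA is defensible only if the end-to-end recovery time fits inside the SLA stated above. A budget check, with stage durations that are illustrative assumptions rather than measurements:

```python
# Scenario A recovery budget. Stage durations are illustrative
# assumptions, not measurements from a real cluster.
RECOVERY_STAGES_S = {
    "node-failure detection":        40,
    "pod eviction + reschedule":     30,
    "image pull (warm cache)":       20,
    "model load into GPU memory":    60,
}

total_s = sum(RECOVERY_STAGES_S.values())
SLA_S = 5 * 60  # the doc's "startup < 2-5 minutes acceptable" upper bound

print(f"estimated recovery: {total_s}s against an SLA of {SLA_S}s")
assert total_s <= SLA_S, "restart-based HA would break the SLA; redesign needed"
```

If your measured stages blow this budget (for example, a multi-GB model pull over a slow link), that is the signal to revisit the app-level HA assumption, not to bolt on a SAN.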
Scenario B — VMware HA with SAN (Enterprise Traditional)
Architecture
- SAN-backed VMware cluster; GPUs sliced with vGPU profiles; VM disks on shared storage
Failure Handling
- Host fails → VMware HA restarts the VM on another host; the GPU workload restarts from scratch (no live migration)
Tradeoffs
| Aspect | Reality |
|---|---|
| Cost | High (SAN + HBA + vGPU license) |
| Complexity | High |
| GPU failover | Restart only |
| Performance | Slight overhead |
Verdict
⚠️ Only if VM-centric org mandates VMware HA
❌ Overkill for inference workloads
Scenario C — Hybrid: GPU Passthrough + Buffer Node (Smart HA)
Architecture
- Dedicated passthrough GPU nodes, plus one spare (buffer) GPU node kept idle or on low-priority work
Failure Handling
- Node fails → Kubernetes reschedules its workloads onto the buffer node's spare capacity
Why this is powerful
- No SAN
- No vGPU license
- Operationally predictable
- Capacity-based HA, not infra-based HA
📌 This matches industrial HA philosophy (N+1 capacity)
5. N+1 Capacity Rule (Non-Negotiable)
If:
- Peak load needs N GPU nodes
Then:
- Provision N+1 nodes, so the full load of one failed node can be absorbed by the survivors
🚨 If you run all GPUs at 100%, no HA exists
This is physics, not software.
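The rule can be expressed as a one-function capacity check, treating each node as one unit of capacity (utilization figures are illustrative):

```python
# N+1 capacity check: after losing any one node, total demand must
# still fit in the remaining capacity. Each node = 1.0 unit of capacity.

def survives_one_failure(node_utilizations: list[float]) -> bool:
    """True if total demand still fits after any single node's
    capacity is removed (the failed node's load must move somewhere)."""
    demand = sum(node_utilizations)
    capacity_after_loss = len(node_utilizations) - 1
    return demand <= capacity_after_loss

assert survives_one_failure([0.6, 0.6, 0.6, 0.0])      # N+1 with headroom
assert not survives_one_failure([1.0, 1.0, 1.0, 1.0])  # all at 100%: no HA exists
```

No scheduler, hypervisor, or license changes this inequality.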
6. Decision Matrix (Print This)
| Requirement | Best Choice |
|---|---|
| Lowest cost | PCI Passthrough |
| Fastest inference | PCI Passthrough |
| Shared GPU usage | vGPU |
| VMware-only ops | vGPU + SAN |
| Edge / IIoT | No SAN |
| Predictable restart | Kubernetes |
| Zero infra drama | Independent nodes |
7. Monitoring & Ops (Often Ignored)
Mandatory
- NVIDIA DCGM
- GPU temperature, memory, power
- Node-level alerts
Facilities
- UPS sizing for GPU surge
- Cooling validation (L4 still draws spikes)
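DCGM is the mandated tool above; as a minimal fallback sketch, the same signals can be scraped by parsing the CSV output of `nvidia-smi --query-gpu=temperature.gpu,memory.used,power.draw --format=csv,noheader,nounits` (real nvidia-smi flags; the alert thresholds below are illustrative assumptions):

```python
# Minimal node-level alert sketch on nvidia-smi CSV output.
# Thresholds are illustrative assumptions; production should use DCGM.

def parse_gpu_line(line: str) -> dict:
    """Parse one 'temp, mem_used, power' CSV line (nounits format)."""
    temp, mem, power = (float(x) for x in line.split(","))
    return {"temp_c": temp, "mem_mib": mem, "power_w": power}

def alerts(sample: dict, max_temp_c: float = 85.0, max_power_w: float = 72.0) -> list:
    """L4 TDP is 72 W; the temperature limit is an assumed alert threshold."""
    out = []
    if sample["temp_c"] > max_temp_c:
        out.append("temperature")
    if sample["power_w"] > max_power_w:
        out.append("power")
    return out

sample = parse_gpu_line("64, 10240, 55.3")  # example line in nvidia-smi's format
assert alerts(sample) == []
```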
8. Final Recommendation (Clear & Defensible)
Primary Strategy
Independent GPU nodes with PCI Passthrough + Kubernetes HA + N+1 capacity
SAN + HBA only when
- VMware HA is mandated
- Stateful VMs must survive host loss
- Budget + ops maturity exist
vGPU only when
- Multiple small workloads must share GPU
- License cost justified
- SAN already exists
9. One-Line Summary
“We design HA at the application and capacity level, not by forcing GPUs into legacy virtualization models.”
HIGH-LEVEL DECISION FLOW
1. Stateless workload? → Yes: app-level HA (Kubernetes). No: consider a SAN-based design.
2. GPU sharing needed? → Yes: vGPU (license + SAN). No: PCI Passthrough.
3. Restart of 2–5 minutes acceptable? → Yes: N+1 capacity. No: redesign the application.
SAN & HBA SUB-DECISION (Embed This Separately)
- FC SAN → HBA required
- iSCSI SAN → NIC is enough
- No SAN → local disks only, no HBA
QUICK DECISION TABLE (For Meetings)
| Question | If Answer is YES | If Answer is NO |
|---|---|---|
| Stateless workload? | Proceed with app-level HA | SAN-based design |
| GPU sharing needed? | vGPU | PCI Passthrough |
| VMware HA mandatory? | SAN required | No SAN |
| Restart acceptable? | K8s + N+1 | Redesign app |
| Budget constrained? | No SAN | SAN + vGPU |
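For meetings that end in a runbook, the quick decision table above can be encoded as a single function returning the doc's recommended direction for each question:

```python
# The quick decision table as one function. Each boolean answer maps
# to the corresponding row's recommendation.

def gpu_ha_decision(stateless: bool, gpu_sharing: bool,
                    vmware_ha_mandatory: bool, restart_ok: bool) -> list:
    plan = []
    plan.append("app-level HA" if stateless else "SAN-based design")
    plan.append("vGPU" if gpu_sharing else "PCI Passthrough")
    plan.append("SAN required" if vmware_ha_mandatory else "no SAN")
    plan.append("K8s + N+1" if restart_ok else "redesign app")
    return plan

# The recommended default (Scenario A):
assert gpu_ha_decision(True, False, False, True) == [
    "app-level HA", "PCI Passthrough", "no SAN", "K8s + N+1"]
```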