GPU HA & Sharing Strategy

(VMware + NVIDIA + IIoT workloads)

Audience: Infra, Ops, IoT App, Architecture
Scope: L4 GPUs, VMware, Kubernetes, PCI Passthrough vs vGPU, SAN vs No-SAN
Goal: Maximum clarity on when, why, and how to design HA correctly—without overengineering.


1. Ground Truth

1.1 What HA really means (often confused)

| Layer | What HA Means |
| --- | --- |
| VMware HA | VM restarts on another host |
| GPU HA | GPU workload continues with acceptable downtime |
| Application HA | App survives failure (may restart) |
| Business HA | SLA respected (minutes vs. seconds matter) |

🚨 GPU hardware itself is NOT live-migratable (for L4-class GPUs today).

So HA ≠ zero downtime.
HA = fast, predictable recovery.


2. GPU Allocation Models (Core Decision)

2.1 PCI Passthrough (Most misunderstood)

What it is

  • GPU exclusively bound to one VM

  • Bare-metal-like performance

Truth

  • ❌ No automatic GPU failover

  • ❌ VM restart requires manual GPU rebind

  • ❌ vMotion not supported

When it makes sense

  • Dedicated inference nodes

  • Predictable load

  • App-level restart is acceptable
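
When running passthrough, it is worth verifying from inside the guest that the GPU actually landed in the VM. A minimal sketch in Python, assuming the NVIDIA driver (and therefore nvidia-smi) is installed in the guest:

```python
import subprocess

def passthrough_gpu_visible() -> bool:
    """Check from inside the guest VM that the passthrough GPU is visible.

    Assumes the NVIDIA driver (and hence nvidia-smi) is installed in
    the guest; returns False if no GPU is enumerated.
    """
    try:
        out = subprocess.run(
            ["nvidia-smi", "--query-gpu=name,memory.total", "--format=csv,noheader"],
            capture_output=True, text=True, check=True,
        )
    except (FileNotFoundError, subprocess.CalledProcessError):
        return False
    gpus = [line for line in out.stdout.splitlines() if line.strip()]
    for gpu in gpus:
        print(f"Detected GPU: {gpu}")
    return len(gpus) > 0

if __name__ == "__main__":
    print("Passthrough OK" if passthrough_gpu_visible() else "No GPU visible")
```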


2.2 NVIDIA vGPU (Licensed, controlled sharing)

What it is

  • GPU sliced into profiles (e.g., L4-1Q, 2Q, 4Q)

  • VMware + NVIDIA vGPU license required

Truth

  • ✅ Better utilization

  • ✅ Multiple VMs per GPU

  • ❌ Still no live GPU migration

  • ❌ Requires SAN for VMware HA

When it makes sense

  • Multiple small/medium workloads

  • Shared GPU economics

  • You already accept SAN cost
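
A quick sanity check on vGPU density is simple arithmetic: framebuffer divided by profile size. A minimal sketch, assuming a 24 GB L4 framebuffer; the exact profiles and per-GPU VM limits you are licensed for come from the NVIDIA vGPU documentation:

```python
# Hypothetical density check: how many vGPU VMs fit on one L4?
# Assumes a 24 GB framebuffer; the actually supported profiles and
# counts come from NVIDIA's vGPU docs for your driver branch.
L4_FRAMEBUFFER_GB = 24

PROFILE_SIZES_GB = {  # profile name -> framebuffer per VM
    "L4-1Q": 1,
    "L4-2Q": 2,
    "L4-4Q": 4,
}

for profile, size_gb in PROFILE_SIZES_GB.items():
    vms_per_gpu = L4_FRAMEBUFFER_GB // size_gb
    print(f"{profile}: up to {vms_per_gpu} VMs per L4 ({size_gb} GB each)")
```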


3. SAN & HBA — The Non-Negotiable Reality

3.1 Why SAN is required for VMware HA

VMware HA needs:

VM Disk (VMDK) must be accessible by ALL ESXi hosts

That means:

  • Shared storage

  • SAN (FC / iSCSI)

  • HBA or NIC depending on SAN type

3.2 When HBA card is required

| Scenario | HBA Needed? | Why |
| --- | --- | --- |
| FC SAN | ✅ Yes | Fibre Channel protocol |
| iSCSI SAN | ❌ No (NIC is enough) | Ethernet based |
| No SAN | ❌ No | Local disks only |

📌 The HBA has NOTHING to do with the GPU.
It exists only for shared-storage access.


4. Scenario-Based Strategy (This is the heart)


Scenario A — Dedicated GPU Nodes, No SAN (Recommended Default)

Architecture

Node-1 (L4) → VM-1 → App Pod
Node-2 (L4) → VM-2 → App Pod
Node-3 (L4) → VM-3 → App Pod
+ Buffer Node (CPU / optional GPU)

Characteristics

  • PCI Passthrough

  • Local disks only

  • Kubernetes controls restart

Failure Handling

Node failure →
VM lost →
K8s reschedules pod →
App starts on another GPU node

Why this works

  • GPU workloads are stateless (CV inference)

  • Startup < 2–5 minutes acceptable

  • No SAN dependency

  • Lowest operational complexity

Verdict

BEST for IIoT / CV / Edge inference
✅ Cheapest
✅ Clearest ownership
❌ No VM-level HA (but app HA exists)
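
For the restart path in this scenario, a sketch of the Kubernetes side: a Deployment that claims one GPU per replica, so pods from a failed node get rescheduled onto surviving GPU nodes. This uses the official kubernetes Python client; the deployment name and image are hypothetical:

```python
from kubernetes import client, config

def gpu_inference_deployment() -> client.V1Deployment:
    """Build a Deployment whose pods each claim one GPU.

    If the node (and VM) hosting a replica dies, Kubernetes reschedules
    the pod onto another node that still advertises nvidia.com/gpu.
    The name and image are illustrative, not from the design doc.
    """
    container = client.V1Container(
        name="cv-inference",
        image="registry.example.com/cv-inference:latest",  # hypothetical image
        resources=client.V1ResourceRequirements(
            limits={"nvidia.com/gpu": "1"},  # one passthrough L4 per pod
        ),
    )
    template = client.V1PodTemplateSpec(
        metadata=client.V1ObjectMeta(labels={"app": "cv-inference"}),
        spec=client.V1PodSpec(restart_policy="Always", containers=[container]),
    )
    spec = client.V1DeploymentSpec(
        replicas=3,  # one per GPU node; failover headroom comes from N+1 capacity
        selector=client.V1LabelSelector(match_labels={"app": "cv-inference"}),
        template=template,
    )
    return client.V1Deployment(
        metadata=client.V1ObjectMeta(name="cv-inference"),
        spec=spec,
    )

if __name__ == "__main__":
    config.load_kube_config()  # assumes a reachable cluster kubeconfig
    apps = client.AppsV1Api()
    apps.create_namespaced_deployment(namespace="default", body=gpu_inference_deployment())
```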


Scenario B — VMware HA with SAN (Enterprise Traditional)

Architecture

ESXi Cluster (3 nodes)
├── SAN (shared datastore)
├── HBA / iSCSI
└── NVIDIA vGPU

Failure Handling

Host failure →
VM restarted on another host →
vGPU reattached →
App restarts

Tradeoffs

| Aspect | Reality |
| --- | --- |
| Cost | High (SAN + HBA + vGPU license) |
| Complexity | High |
| GPU failover | Restart only |
| Performance | Slight overhead |

Verdict

⚠️ Only if VM-centric org mandates VMware HA
❌ Overkill for inference workloads


Scenario C — Hybrid: GPU Passthrough + Buffer Node (Smart HA)

Architecture

3 GPU Nodes (PCI Passthrough)
1 Buffer Node (CPU / spare GPU)
Kubernetes controls placement

Failure Handling

GPU node down →
Load shifts to remaining nodes →
Buffer absorbs burst →
Manual rebind if needed

Why this is powerful

  • No SAN

  • No vGPU license

  • Operationally predictable

  • Capacity-based HA, not infra-based HA

📌 This matches industrial HA philosophy (N+1 capacity)


5. N+1 Capacity Rule (Non-Negotiable)

If:

Each L4 handles 10 streams
You run 3 nodes

Then:

Max production load = 20 streams
3rd node = failover capacity

🚨 If you run all GPUs at 100%, no HA exists
This is physics, not software.
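
The rule as arithmetic, using the numbers above; a minimal sketch:

```python
# N+1 capacity check: with one node reserved for failover,
# production load must fit on the remaining nodes.
def max_safe_load(nodes: int, streams_per_node: int, spare_nodes: int = 1) -> int:
    """Return the maximum production load that still leaves failover headroom."""
    usable_nodes = nodes - spare_nodes
    if usable_nodes <= 0:
        raise ValueError("Need at least one node beyond the spare capacity")
    return usable_nodes * streams_per_node

# Example from this section: 3 nodes x 10 streams each, N+1.
print(max_safe_load(nodes=3, streams_per_node=10))  # -> 20 streams
print(max_safe_load(nodes=4, streams_per_node=10))  # -> 30 streams
```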


6. Decision Matrix (Print This)

| Requirement | Best Choice |
| --- | --- |
| Lowest cost | PCI Passthrough |
| Fastest inference | PCI Passthrough |
| Shared GPU usage | vGPU |
| VMware-only ops | vGPU + SAN |
| Edge / IIoT | No SAN |
| Predictable restart | Kubernetes |
| Zero infra drama | Independent nodes |

7. Monitoring & Ops (Often Ignored)

Mandatory

  • NVIDIA DCGM

  • GPU temperature, memory, power

  • Node-level alerts
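
DCGM is the right production tool for this list; as a minimal polling sketch of the same basics (temperature, memory, power), NVML via the pynvml package (assumed installed) is enough. The thresholds are illustrative, not vendor limits:

```python
import pynvml

# Illustrative alert thresholds -- tune to your facility, not vendor specs.
TEMP_LIMIT_C = 85
MEM_USED_LIMIT = 0.90  # fraction of framebuffer

def poll_gpus() -> None:
    """Print temperature, memory, and power for every visible GPU via NVML."""
    pynvml.nvmlInit()
    try:
        for i in range(pynvml.nvmlDeviceGetCount()):
            handle = pynvml.nvmlDeviceGetHandleByIndex(i)
            temp = pynvml.nvmlDeviceGetTemperature(handle, pynvml.NVML_TEMPERATURE_GPU)
            mem = pynvml.nvmlDeviceGetMemoryInfo(handle)
            power_w = pynvml.nvmlDeviceGetPowerUsage(handle) / 1000.0  # NVML reports mW
            mem_frac = mem.used / mem.total
            print(f"GPU{i}: {temp} C, {mem_frac:.0%} memory, {power_w:.1f} W")
            if temp > TEMP_LIMIT_C or mem_frac > MEM_USED_LIMIT:
                print(f"GPU{i}: ALERT -- raise a node-level alarm here")
    finally:
        pynvml.nvmlShutdown()

if __name__ == "__main__":
    poll_gpus()
```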

Facilities

  • UPS sizing for GPU surge

  • Cooling validation (the L4 still draws power spikes)


8. Final Recommendation (Clear & Defensible)

Primary Strategy

Independent GPU nodes with PCI Passthrough + Kubernetes HA + N+1 capacity

SAN + HBA only when

  • VMware HA is mandated

  • Stateful VMs must survive host loss

  • Budget + ops maturity exist

vGPU only when

  • Multiple small workloads must share GPU

  • License cost justified

  • SAN already exists


9. One-Line Summary

“We design HA at the application and capacity level, not by forcing GPUs into legacy virtualization models.”

HIGH-LEVEL DECISION FLOW

START
|
v
Is GPU workload STATELESS?
(CV inference, analytics, AI inference)
|
+-- NO --> (Training / Stateful GPU / Long jobs)
| |
| v
| Use Dedicated GPU + SAN
| (vGPU optional)
| END
|
+-- YES -->
|
v
Is ZERO or NEAR-ZERO DOWNTIME required?
(< 1 min, no restart allowed)
|
+-- YES -->
| |
| v
| GPU live migration NOT supported
| → Redesign application (active-active)
| → Use traffic routing, not VMware HA
| END
|
+-- NO -->
|
v
Is VM-LEVEL HA mandatory by Infra policy?
|
+-- YES -->
| |
| v
| SAN REQUIRED
| |
| Is GPU sharing required?
| |
| +-- YES --> vGPU + SAN + HBA/iSCSI
| |
| +-- NO --> PCI Passthrough + SAN
| |
| END
|
+-- NO -->
|
v
Do you want GPU sharing across workloads?
|
+-- YES -->
| |
| v
| vGPU needed
| SAN recommended
| (Cost + license justified?)
| |
| +-- YES --> vGPU + SAN
| |
| +-- NO --> Re-evaluate workload
| END
|
+-- NO -->
|
v
Can the application tolerate a 2–5 min restart?
|
+-- YES -->
| |
| v
| PCI Passthrough
| Independent GPU nodes
| Kubernetes HA
| N+1 capacity
| NO SAN
| END
|
+-- NO -->
|
v
Add BUFFER NODE
(CPU or spare GPU)
Capacity-based HA
END
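
The same flow condensed into code, so it can live next to the runbooks; a minimal sketch that mirrors the branches above (the input booleans are the questions in the diagram):

```python
def gpu_ha_decision(
    stateless: bool,
    near_zero_downtime: bool,
    vm_ha_mandatory: bool,
    gpu_sharing: bool,
    restart_tolerable: bool,
) -> str:
    """Mirror the decision flow above; return the recommended design."""
    if not stateless:
        # Training / stateful GPU / long jobs
        return "Dedicated GPU + SAN (vGPU optional)"
    if near_zero_downtime:
        # GPU live migration is not supported: HA must move into the app.
        return "Redesign application as active-active; use traffic routing, not VMware HA"
    if vm_ha_mandatory:
        base = "vGPU" if gpu_sharing else "PCI Passthrough"
        return f"{base} + SAN (HBA for FC, NIC for iSCSI)"
    if gpu_sharing:
        return "vGPU + SAN if cost and license are justified; otherwise re-evaluate workload"
    if restart_tolerable:
        return "PCI Passthrough + independent GPU nodes + Kubernetes HA + N+1, no SAN"
    return "Add a buffer node (CPU or spare GPU); capacity-based HA"

# Default IIoT / CV path from the flow:
print(gpu_ha_decision(True, False, False, False, True))
```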

SAN & HBA SUB-DECISION (Embed This Separately)

Do we need VMware HA (VM restart on another host)?
|
+-- NO --> NO SAN
|          NO HBA
|
+-- YES -->
|
v
Shared Storage Required
|
+-- FC SAN --> HBA REQUIRED
|
+-- iSCSI --> NIC sufficient
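
And the storage sub-decision as a small helper; a sketch (the "fc"/"iscsi" labels are this sketch's own):

```python
def storage_requirements(vmware_ha: bool, san_type: str | None = None) -> str:
    """Encode the SAN & HBA sub-decision above.

    san_type is "fc" or "iscsi" once VMware HA is required.
    """
    if not vmware_ha:
        return "No SAN, no HBA (local disks only)"
    if san_type == "fc":
        return "Shared storage: FC SAN, HBA required"
    if san_type == "iscsi":
        return "Shared storage: iSCSI SAN, NIC sufficient"
    return "Shared storage required: choose FC (HBA) or iSCSI (NIC)"

print(storage_requirements(vmware_ha=False))
print(storage_requirements(vmware_ha=True, san_type="fc"))
```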

QUICK DECISION TABLE (For Meetings)

| Question | If Answer is YES | If Answer is NO |
| --- | --- | --- |
| Stateless workload? | Proceed with app-level HA | SAN-based design |
| GPU sharing needed? | vGPU | PCI Passthrough |
| VMware HA mandatory? | SAN required | No SAN |
| Restart acceptable? | K8s + N+1 | Redesign app |
| Budget constrained? | No SAN | SAN + vGPU |

RECOMMENDED DEFAULT PATH (IIoT / CV / Edge)

If:

Stateless CV inference
Restart acceptable
No VMware HA mandate

Then:

PCI Passthrough
Independent GPU nodes
Kubernetes scheduling
N+1 capacity
Optional buffer node
