● Cisco UCS® C885A Rack Servers with NVIDIA HGX™ H200 and Spectrum™-X Ethernet.
● Cisco® Silicon One® NPU-based 8000 Series SONiC Switches.
● Cisco Optics and cables.
● Cisco provisioning, observability and security frameworks.
● C: Number of CPUs in the node.
● G: Number of GPUs in the node.
● N: Number of network adapters (NICs), categorized into:
● B: Average network bandwidth per GPU in gigabits per second (Gbps); a small sketch after this list illustrates the notation.
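As a quick illustration of this notation, the sketch below encodes one Cisco UCS C885A M8 node using values given later in this document (2 CPUs, 8 GPUs, 8x 400G east-west NICs). The class and field names are ours for illustration only and are not part of any Cisco tooling.

```python
# Minimal sketch of the C/G/N/B node notation, assuming the C885A M8 values
# listed later in this document (2 CPUs, 8 GPUs, 8x 400G east-west NICs).
from dataclasses import dataclass

@dataclass
class NodeProfile:
    cpus: int               # C: number of CPUs in the node
    gpus: int               # G: number of GPUs in the node
    east_west_nics: int     # N: east-west NICs (other NIC categories omitted here)
    nic_speed_gbps: int     # line rate of each east-west NIC

    @property
    def bandwidth_per_gpu_gbps(self) -> float:
        """B: average east-west network bandwidth per GPU, in Gbps."""
        return self.east_west_nics * self.nic_speed_gbps / self.gpus

c885a = NodeProfile(cpus=2, gpus=8, east_west_nics=8, nic_speed_gbps=400)
print(c885a.bandwidth_per_gpu_gbps)  # 400.0 Gbps per GPU
```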
Device | Supported optics and cables
B3140H, B3240 | QSFP-400G-DR4 with CB-M12-M12-SMF cable
B3220, B3220L | QSFP-200G-SR4 with CB-M12-M12-MMF cable
8122-64EHF-O | OSFP-800G-DR8 with dual CB-M12-M12-SMF cable
8101-32FH-O | QDD-400G-DR4 with CB-M12-M12-SMF cable; QDD-400G-SR8-S with CB-M16-M12-MMF cable; QDD-2Q200-CU3M passive copper cable; QSFP-200G-SR4 with CB-M12-M12-MMF cable
8101-60Z4FH-O | QDD-400G-DR4 with CB-M12-M12-SMF cable; SFP-1G-T-X for 1G with CAT5E cable; SFP-10G-T-X for 10G with CAT6A cable
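When planning orders, the pairings above can be treated as a simple lookup. The sketch below merely mirrors the table in a Python dictionary; the SUPPORTED_OPTICS name and the optics_for helper are illustrative and not part of any Cisco tool, and the table is not an exhaustive compatibility matrix.

```python
# Illustrative lookup of the optic/cable pairings tabulated above.
SUPPORTED_OPTICS = {
    "B3140H": [("QSFP-400G-DR4", "CB-M12-M12-SMF")],
    "B3240": [("QSFP-400G-DR4", "CB-M12-M12-SMF")],
    "B3220": [("QSFP-200G-SR4", "CB-M12-M12-MMF")],
    "B3220L": [("QSFP-200G-SR4", "CB-M12-M12-MMF")],
    "8122-64EHF-O": [("OSFP-800G-DR8", "dual CB-M12-M12-SMF")],
    "8101-32FH-O": [
        ("QDD-400G-DR4", "CB-M12-M12-SMF"),
        ("QDD-400G-SR8-S", "CB-M16-M12-MMF"),
        ("QDD-2Q200-CU3M", "passive copper cable, no separate optic"),
        ("QSFP-200G-SR4", "CB-M12-M12-MMF"),
    ],
    "8101-60Z4FH-O": [
        ("QDD-400G-DR4", "CB-M12-M12-SMF"),
        ("SFP-1G-T-X", "CAT5E"),
        ("SFP-10G-T-X", "CAT6A"),
    ],
}

def optics_for(device: str):
    """Return the (optic, cable) pairings listed for a device, or an empty list."""
    return SUPPORTED_OPTICS.get(device, [])
```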
● East-West Compute Network
● Converged North-South Storage and Management Network
128 | 1024 | 4 | 16 | 16 | 8 | 1024 | 512 | 2048 | 1024 | 2048
256 | 2048 | 8 | 32 | 32 | 16 | 2048 | 1024 | 4096 | 2048 | 4096
512 | 4096 | 16 | 64 | 64 | 32 | 4096 | 2048 | 8192 | 4096 | 8192
1024 | 8192 | 32 | 128 | 128 | 64 | 8192 | 4096 | 16384 | 8192 | 16384
2048 | 16384 | 64 | 256 | 256 | 128 | 16384 | 8192 | 32768 | 16384 | 32768
4096 | 32768 | 128 | 512 | 512 | 256 | 32768 | 16384 | 65536 | 32768 | 65536
● Provides compute nodes with access to high-performance storage
● Provides host-management access to the compute nodes from the management nodes
● Interconnects with border leaf exit switches to forward traffic in and out of the cluster
● Allows interconnection to additional customer infrastructure, such as data lakes and other nodes for support, monitoring, log collection, etc., that a cloud provider wishes to add.
● The compute nodes are grouped into two SUs to create a half SU-group consisting of 64 nodes. Each half SU-group uses two 8122-64EHF-O front-end leaf switches. A total of eight half SU-groups (four SU-groups) are required to deploy 4K GPUs. Each 8122-64EHF-O front-end leaf switch uses 64 400GE downlinks to the compute nodes and 32 400GE uplinks (8 400GE to each spine switch) to provide more than the 12.5 Gbps target storage bandwidth per H200 GPU.
● On the storage side, the network will be pre-provisioned to deliver at least the bandwidth requirements mentioned above. Five redundant leaf-switch pairs, each with 64 200GE downlink ports, will be used, for a total of ten 8101-32FH-O storage leaf switches and 320 200GE downlink ports. For uplinks, each storage leaf uses 16 400GE ports (2 400GE links per spine), for a total of 160 400GE ports. The number of storage nodes connected to the storage leaf switches will vary based on the cloud partner's throughput and capacity needs on top of the minimum requirements.
● The management node network consists of two parts:
● Each core-group will consist of a single 8122-64EHF-O switch with 128 400GE ports (for a total of four switches across four core-groups) to meet the scale of 8K GPUs.
● 128 compute nodes (1K GPUs) are grouped into an SU-group connected to four leaf switches. Two SU-groups (2K GPUs) connect to eight 8122-64EHF-O leaf switches, which in turn connect to four 8122-64EHF-O spine switches. Four such parallel planes, each with two SU-groups, are required to deploy 8K GPUs. In this design, each leaf switch with 64 400GE downlink ports connects to each spine switch via 8 400GE ports, which still allows 12.5 Gbps of target storage bandwidth to each GPU (these figures are recomputed in the sizing sketch that follows these bullets). Each of the four spine switches connects to its respective core-group switch via 16 400GE ports.
● The storage network consists of 10 pairs of 8101-32FH-O leaf switches, each connected to four 8122-64EHF-O spine switches. Each redundant leaf-switch pair has 64 200GE downlink ports and 32 400GE uplink ports (8 to each spine switch), for a total of 20 leaf switches, 640 200GE downlink ports, and 320 400GE uplink ports. Each spine switch connects to its corresponding core-group switch via 40 400GE ports. On an aggregate basis, this provides around 64 Tbps of storage bandwidth, or 8 Gbps to each GPU.
● The management node network consists of two parts:
● The number of switches within a core-group is doubled to two, for a total of eight core switches.
● Eight parallel planes, each with two SU-groups of compute nodes (2K GPUs), are used to deploy 16K GPUs. Spine switches in each of these parallel planes connect to their respective core-group switches via 16 400GE links (8 400GE per core switch).
● The storage network consists of 20 pairs of 8101-32FH-O leaf switches, each connected to eight 8122-64EHF-O spine switches. Each redundant leaf-switch pair has 64 200GE downlink ports and 32 400GE uplink ports (4 to each spine switch), for a total of 40 leaf switches, 1280 200GE downlink ports, and 640 400GE uplink ports. Each spine switch connects to its corresponding core-group switch via 40 400GE ports. On an aggregate basis, this provides around 128 Tbps of storage bandwidth, or 8 Gbps to each GPU.
● The management node network consists of two parts:
● The number of switches within a core-group is doubled to four, for a total of 16 core switches.
● 16 parallel planes, each with two SU-groups of compute nodes (2K GPUs), are used to deploy 32K GPUs. Spine switches in each of these parallel planes connect to their respective core-group switches via 16 400GE links (4 400GE links per core switch).
● The storage network consists of 40 pairs of 8101-32FH-O leaf switches, each connected to 16 8122-64EHF-O spine switches. Each redundant leaf-switch pair has 64 200GE downlink ports and 32 400GE uplink ports (2 to each spine switch), for a total of 80 leaf switches, 2560 200GE downlink ports, and 1280 400GE uplink ports. Each spine switch connects to its corresponding core-group switches via 40 400GE ports. On an aggregate basis, this provides around 256 Tbps of storage bandwidth, or 8 Gbps to each GPU.
● The management node network consists of two parts:
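As a rough cross-check of the 8K-GPU front-end design described in the bullets above, the sketch below recomputes the leaf-switch count and the per-GPU north-south bandwidth from the figures quoted there. The constant names are ours, only the elements mentioned in those bullets are modeled (core and border connectivity are ignored), and the result should be read as an upper bound against the 12.5 Gbps target rather than a definitive sizing.

```python
# Hedged sizing check for the 8K-GPU front-end (north-south) network described above.
GPUS_PER_NODE = 8
NODES_PER_SU_GROUP = 128          # 1K GPUs per SU-group
SU_GROUPS_PER_PLANE = 2           # 2K GPUs per parallel plane
PLANES = 4                        # four planes -> 8K GPUs

LEAFS_PER_PLANE = 8               # 8122-64EHF-O front-end leaf switches per plane
SPINES_PER_PLANE = 4
LEAF_UPLINKS_PER_SPINE = 8        # 400GE links from each leaf to each spine

nodes = NODES_PER_SU_GROUP * SU_GROUPS_PER_PLANE * PLANES
gpus = nodes * GPUS_PER_NODE                                   # 8192
leaf_switches = LEAFS_PER_PLANE * PLANES                       # 32

# Leaf-to-spine capacity of one plane, shared by that plane's 2048 GPUs.
plane_uplink_gbps = LEAFS_PER_PLANE * SPINES_PER_PLANE * LEAF_UPLINKS_PER_SPINE * 400
per_gpu_ns_gbps = plane_uplink_gbps / (gpus / PLANES)

print(f"{nodes} compute nodes, {gpus} GPUs, {leaf_switches} front-end leaf switches")
print(f"leaf-to-spine bandwidth per GPU: {per_gpu_ns_gbps:.0f} Gbps "
      f"(well above the 12.5 Gbps storage target)")
```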
PID | Description | 4K GPUs | 8K GPUs | 16K GPUs | 32K GPUs
UCSC-885A-M8-HC1 | Cisco UCS C885A M8 Rack Server with NVIDIA HGX | 512 | 1024 | 2048 | 4096
8122-64EHF-O | Cisco 8000 series switch, 64x800Gbps OSFP | 184 | 378 | 754 | 1506
8101-32FH-O | Cisco 8000 series switch, 32x400Gbps QSFP-DD | 12 | 26 | 50 | 98
8101-60Z4FH-O | Cisco 8000 series switch, 60x50G SFP28 + 4x400G QSFP-DD | 36 | 72 | 144 | 288
OSFP-800G-DR8 | 800G OSFP transceiver, 800GBASE-DR8, SMF dual MPO-12 APC, 500m | 11500 | 23128 | 46256 | 92512
QDD-400G-DR4 | 400G QSFP-DD transceiver, 400GBASE-DR4, MPO-12, 500m parallel | 324 | 640 | 1264 | 2528
QSFP-400G-DR4 | 400G QSFP112 transceiver, 400GBASE-DR4, MPO-12, 500m parallel | 5120 | 10240 | 20480 | 40960
QDD-400G-SR8-S | 400G QSFP-DD transceiver, 400GBASE-SR8, MPO-16 APC, 100m | 184 | 368 | 736 | 1472
QSFP-200G-SR4-S | 200G QSFP transceiver, 200GBASE-SR4, MPO-12, 100m | 368 | 736 | 1472 | 2944
SFP-1G-T-X | 1G SFP | 616 | 1232 | 2464 | 4928
SFP-10G-T-X | 10G SFP | 1128 | 2256 | 4512 | 9024
CB-M12-M12-SMF | MPO-12 to MPO-12 SMF cables | 14164 | 28696 | 57392 | 114784
CB-M16-M12-MMF | MPO-16 to dual MPO-12 breakout cables | 184 | 368 | 736 | 1472
CAT6A | Copper cable for 10G | 1128 | 2256 | 4512 | 9024
CAT5E | Copper cable for 1G | 616 | 1232 | 2464 | 4928
UCSC-C225-M8N (storage server) | Cisco UCS C225-M8 1RU Rack Server | 80 | 160 | 320 | 640
UCSC-C225-M8N or UCSC-C245-M8SX (management node) | Cisco UCS C225-M8 1RU Rack Server or Cisco UCS C245-M8 2RU Rack Server | 24 | 48 | 96 | 192
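One way to sanity-check the transceiver quantities above is to tie them back to the per-node NIC counts: each compute node carries eight east-west B3140H NICs and two north-south B3240 NICs, all of which use QSFP-400G-DR4 optics per the earlier optics table. A minimal sketch, assuming only those node-side optics are counted:

```python
# Minimal cross-check of the QSFP-400G-DR4 quantities in the bill of materials,
# assuming 10 node-side 400G optics per compute node (8x B3140H + 2x B3240).
COMPUTE_NODES = {"4K GPUs": 512, "8K GPUs": 1024, "16K GPUs": 2048, "32K GPUs": 4096}
OPTICS_PER_NODE = 8 + 2   # east-west NICs + north-south NICs

for scale, nodes in COMPUTE_NODES.items():
    print(scale, nodes * OPTICS_PER_NODE)   # 5120, 10240, 20480, 40960 -> matches the table
```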
● Provisioning the compute nodes via Cisco Intersight, NVIDIA Base Command Manager (BCM), or additional provisioning tools/frameworks.
● Setting up Slurm and/or Kubernetes control nodes for orchestrating jobs on the worker compute nodes (see the sketch after this list)
● Additional infrastructure for observability, monitoring, and log collection
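As a small, non-authoritative example of the glue such control nodes typically run, the sketch below shells out to Slurm's standard sinfo command to list worker compute nodes and their state. It assumes Slurm is already installed and configured on the control node; the script itself is illustrative and not part of any Cisco or NVIDIA deliverable.

```python
# Minimal sketch: list Slurm worker nodes and their state from a control node.
# Assumes Slurm is installed and the cluster is already configured.
import subprocess

def worker_node_states() -> dict[str, str]:
    """Return a mapping of node name -> Slurm state (e.g., 'idle', 'alloc', 'down')."""
    out = subprocess.run(
        ["sinfo", "--Node", "--noheader", "--format", "%N %t"],
        capture_output=True, text=True, check=True,
    ).stdout
    states = {}
    for line in out.splitlines():
        name, state = line.split()
        states[name] = state
    return states

if __name__ == "__main__":
    for node, state in sorted(worker_node_states().items()):
        print(f"{node}: {state}")
```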
● Cisco Secure Firewall
● Cisco Isovalent
● Cisco Hypershield
● Cisco AI Defense
Form Factor
8RU Rack Server (Air Cooled)
Compute + Memory
2x 5th Gen AMD EPYC 9575F (400W, 64-core, up to 5 GHz)
24x 96GB DDR5 RDIMMs, up to 6,000 MT/s (recommended memory config)
24x 128GB DDR5 RDIMMs, up to 6,000 MT/s (max supported memory config)
Storage
Dual 1 TB M.2 NVMe with RAID support (boot device)
Up to 16 PCIe5 x4 2.5” U.2 1.92 TB NVMe SSD (data cache)
GPUs
8x NVIDIA H200 GPUs (700W each)
Network Cards
8x PCIe x16 HHHL NVIDIA BlueField-3 B3140H East-West NICs
2x PCIe x16 FHHL NVIDIA BlueField-3 B3240 North-South NICs
1x OCP 3.0 X710-T2L for host management
Cooling
16 hot-swappable (N+1) fans for system cooling
Front IO
2 USB 2.0, 1 ID button, 1 power button
Rear IO
1 USB 3.0 A, 1 USB 3.0 C, mDP, 1 ID button, 1 power button, 1 USB 2.0 C, 1 RJ45
Power Supply
6x 54V 3kW MCRPS (4+2 redundancy) and 2x 12V 2.7kW CRPS (1+1 redundancy)
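As a rough, illustrative check of the power configuration above: the GPUs and CPUs alone draw about 6.4 kW, comfortably within the capacity that remains after the stated PSU redundancy. The sketch below counts only the components with ratings listed in this table, so the real system draw (DIMMs, drives, NICs, fans) is higher.

```python
# Rough power check for the C885A M8 node, using only the ratings listed above.
gpu_w = 8 * 700                    # 8x NVIDIA H200 GPUs at 700 W each
cpu_w = 2 * 400                    # 2x AMD EPYC 9575F at 400 W each
listed_load_w = gpu_w + cpu_w      # 6400 W, ignoring memory, drives, NICs, and fans

# Capacity that survives the stated redundancy: 4 of 6 3 kW MCRPS and 1 of 2 2.7 kW CRPS.
redundant_capacity_w = 4 * 3000 + 1 * 2700   # 14,700 W

print(f"GPU + CPU load alone: {listed_load_w} W")
print(f"PSU capacity after redundancy: {redundant_capacity_w} W")
```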
Form Factor
1RU Rack Server (Air Cooled)
Compute + Memory
1x 4th Gen AMD EPYC 9454P (48 cores)
12x 32GB DDR5 RDIMMs, 4800 MT/s
Storage
Dual 1 TB M.2 SATA SSD with RAID (boot device)
Up to 10x 2.5-inch PCIe Gen4 x4 NVMe PCIe SSDs (each with capacity 1.9 to 15.3 TB) - Optional
Network Cards
1 PCIe x16 FHHL NVIDIA BlueField-3 B3220L configured in DPU mode
Or
1 PCIe x16 FHHL NVIDIA BlueField-3 B3140H configured in DPU mode
1 OCP 3.0 X710-T2L (2 x 10G RJ45) for x86 host management
Cooling
8 hot-swappable (N+1) fans for system cooling
Power Supply
2x 1.2kW MCRPS PSUs with N+1 redundancy
BMC
1G RJ45 for BMC management