Inside the AI Rack: Power, Fabric, and Fiber
Preface: A Personal Note on Complexity, from IBM’s TCM to Today’s AI Racks
Early in my career, I worked for a company that supplied chip‑packaging products and assembly tools to IBM. We were involved in the design and manufacturing processes for the Thermal Conduction Module (TCM) used in the IBM System/3090 mainframes (one of the most advanced semiconductor packaging designs of its era).
The TCM was a remarkable innovation: a densely integrated, liquid‑cooled bipolar logic module at a time when compute requirements were outpacing the capabilities of conventional cooling and interconnect approaches.
Yet its sophistication did not guarantee longevity. When CMOS displaced bipolar logic, the justification for the TCM’s complexity evaporated. A generation of exquisitely engineered packaging became obsolete almost overnight (not because it failed, but because the design constraints changed).
I see a parallel today.
AI racks are becoming extraordinarily complex systems: 100‑kW‑plus thermal envelopes, dense direct‑to‑chip liquid cooling, thousands of copper and fiber terminations, and extremely tight mechanical and routing constraints. These are the TCMs of our era (brilliant engineering pushing against physical limits).
And, as with the TCM, the industry may be nearing a pivot. Analysts expect early co-packaged optics (CPO) adoption to begin in 2026, with meaningful deployments accelerating in 2027–2028, potentially reshaping the balance between copper, pluggable optics, and optical fabrics inside the rack. With optics breaking the roughly 2-meter reach limit of copper interconnects, it may no longer be necessary to pack all accelerators, switches, and networking into a single rack, and packaging can be simplified.
Some of today’s most intricate rack architectures may prove to be transitional—necessary steps on the way to more optically integrated, less mechanically constrained designs. If history repeats, complexity may peak just before a major architectural simplification.
Dr. Alan Keizer, Senior Technical Advisor, AFL
From Node to Rack to Pod
In the first Market Shifts blog, the discussion focused on why AI adoption is accelerating. Models continue to improve, hardware capabilities are advancing, and costs per token are declining. The outcome is straightforward: people and machines are using AI far more extensively. That demand is not an abstraction in the cloud; it materializes in large data halls filled with racks that consume significant power and move vast amounts of data.
For most of the last decade, the basic unit of design was a server or a small GPU node. Capacity growth came from adding more nodes, expanding racks, and scaling the cluster. In AI data centers, the basic unit is shifting. The rack and the pod are becoming the primary design blocks. A single rack can now contain enough accelerators and networking to resemble a small supercomputer. A pod consists of multiple such racks connected by a shared fabric.
This blog explores that environment in more detail. The focus is on how an AI rack is built today, how racks are interconnected within a pod, and how these architectures affect the fiber plant at layer zero. The aim is to provide a clear mental model for those who design, build, procure, install, or take a broader interest in AI infrastructure.
The Modern AI Rack as a Building Block
A few years ago, a “large” AI system typically consisted of an 8-GPU server. Multiple systems were installed in a rack alongside storage and networking, and a fully populated rack often drew 10 to 20 kilowatts. At the time, this was considered dense. That model is now outdated.
Modern AI racks function as rack-scale systems:
- A current 8-GPU node can draw on the order of 10 kW
- Multiple nodes may share a rack and a local high-speed fabric
- Newer rack platforms integrate dozens of accelerators into a single tightly coupled domain (one leading example from NVIDIA incorporates 72 accelerators per rack and operates at approximately 120 kW of IT load)
From the perspective of an AI model, a rack is no longer a collection of independent GPUs. The accelerators are joined by a high-bandwidth scale-up fabric, allowing the rack to behave as a single, very large processor.
This shift has two important consequences:
Racks and pods as the new building blocks of AI workloads
First, the rack becomes the atomic unit of compute. Capacity planning for training workloads is increasingly expressed in racks and pods rather than individual servers. Design discussions focus on the number of rack-scale domains required, how those domains interconnect, and how power and cooling are delivered at that scale.
Designing for reliability in high-density racks
Second, the rack becomes a significant failure domain. The loss of a 72-accelerator rack represents the loss of a substantial portion of system capacity, not a small fraction of the cluster. This influences redundancy strategies, maintenance planning, and the way power and connectivity are routed to prevent localized faults from cascading across the system.
An AI pod can be viewed as a grid of rack-scale “engines” connected by a high-bandwidth fabric. The remainder of the data hall exists primarily to deliver power, cooling, and fiber to support those engines.
Inside the Rack: Power, Cooling and Networking
When rack power reaches the 100-kW range, air cooling alone becomes insufficient. Most AI racks now rely on some form of liquid cooling:
- Rear-door heat exchangers, where warm air from the rack passes through a liquid-cooled door
- Direct-to-chip cooling, with cold plates on accelerators and a dedicated liquid loop
- Fully direct liquid cooling in the most aggressive AI pods, where nearly all IT heat is removed by liquid rather than air
The shift to full liquid cooling extends beyond the rack, fundamentally changing the facility’s water loop. Cold air supply in the room is no longer required, and water carrying heat from the racks can operate at higher temperatures.
This delivers two major benefits:
- Higher water temperatures improve cooling efficiency, allowing chillers, if used, to operate with better coefficients of performance
- In appropriate climates, active chillers may be unnecessary (warm-water loops can reject heat through dry coolers or adiabatic systems, reducing both energy consumption and plant complexity)
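To put rough numbers on this, the sketch below estimates the coolant flow a direct-to-chip rack requires at different supply/return temperature rises. The 120 kW load and the temperature deltas are illustrative assumptions only; real loops depend on the coolant distribution unit, cold-plate design, and facility water temperatures.

```python
# Rough estimate of coolant flow for a direct-to-chip liquid-cooled rack.
# All inputs are illustrative assumptions, not vendor specifications.

WATER_SPECIFIC_HEAT = 4186.0  # J/(kg*K), specific heat of water

def required_flow_lpm(it_load_kw: float, delta_t_k: float) -> float:
    """Coolant flow (litres per minute) needed to remove it_load_kw
    at a given supply/return temperature rise, for a water-like coolant."""
    mass_flow_kg_s = (it_load_kw * 1000.0) / (WATER_SPECIFIC_HEAT * delta_t_k)
    return mass_flow_kg_s * 60.0  # ~1 kg of water per litre

if __name__ == "__main__":
    for delta_t in (6.0, 10.0, 15.0):
        flow = required_flow_lpm(120.0, delta_t)  # assumed 120 kW rack
        print(f"120 kW rack, dT = {delta_t:4.1f} K -> ~{flow:5.1f} L/min")
```

The same arithmetic scales directly as rack power climbs, which is why the manifolds and hoses described below take up so much space at the rear of the rack.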
At the rack level, the impact of high-density AI deployments is immediately visible: the rear is crowded.
Manifolds, hoses, valves, and sensors compete for space alongside busways, power distribution units, and additional monitoring or safety hardware. Surfaces are densely packed, with little clearance between components.
Two AI networks interconnect the accelerators: the scale-up network within the rack and the scale-out network linking racks across a cluster. The scale-up network is currently implemented using circuit boards with controlled-impedance stripline traces and twin-ax micro-cables. Both approaches face bandwidth and reach limitations, typically capping 100–200G links at around 2 meters. Optical connections are expected to become necessary for higher-bandwidth, multi-rack scale-up fabrics, bringing fiber inside the rack.
The scale-out network between racks is (and will remain) fiber based. For inter-rack connections, three practical guidelines emerge:
Prioritize front-of-rack access
Patching, testing, and reconfiguring from the front aisle is safer and simpler. Technicians stay clear of the hottest, highest-power areas, and cabling avoids weaving through congested rear infrastructure.
Separate fiber from power and liquid
Fiber should have dedicated trays and pathways. Where proximity to power or liquid is unavoidable, cables should be protected and clearly routed. Ad-hoc mixed runs in dense AI pods increase the risk of damage and slow future maintenance.
Design controlled fiber paths
Cables must be guided along defined routes while remaining accessible, often navigating sharper bends and tighter spaces. This approach favors:
- Bend-insensitive single-mode fiber (e.g., G.657-class)
- Compact patch cords and trunks with robust jackets and strain relief
- Routing hardware that enforces minimum bend radii
In short, high rack power and full liquid cooling influence not only mechanical and thermal design but also how fiber is routed into and out of the rack. Well-planned pathways, and disciplined installation practice to match, are essential in high-density racks.
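As a small illustration of that discipline, the sketch below checks a planned route against a fiber's minimum bend radius. The radii shown are commonly cited design minima for G.657 sub-classes and the route itself is hypothetical; always confirm limits against the actual cable datasheet.

```python
# Minimal check of a planned fiber route against a minimum bend radius.
# Radii are commonly cited design minima for G.657 sub-classes; the
# planned route is hypothetical. Confirm values against the datasheet.

MIN_BEND_RADIUS_MM = {
    "G.657.A1": 10.0,
    "G.657.A2": 7.5,
    "G.657.B3": 5.0,
}

def check_route(bends_mm: list[float], fiber_class: str) -> list[int]:
    """Return the indices of bends tighter than the fiber's minimum radius."""
    limit = MIN_BEND_RADIUS_MM[fiber_class]
    return [i for i, radius in enumerate(bends_mm) if radius < limit]

if __name__ == "__main__":
    planned_bends_mm = [30.0, 12.0, 8.0, 6.0]  # hypothetical route
    for cls in MIN_BEND_RADIUS_MM:
        bad = check_route(planned_bends_mm, cls)
        status = "OK" if not bad else f"violations at bends {bad}"
        print(f"{cls}: {status}")
```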
Across Racks and Rows: Scale-Out Fabrics and Fiber Density
Racks do not operate in isolation. Training large AI models or supporting high-volume inference requires connecting multiple rack-scale domains into a unified fabric. Within an AI pod, this fabric is built from scale-out networking:
- Most operators rely on Ethernet with a lossless transport layer, such as RDMA over Converged Ethernet (RoCE)
- Future optimizations for AI networks are in development through initiatives like the Ultra Ethernet Consortium (UEC), SUE, and ESUN
- We anticipate most large AI clusters will adopt an Ethernet-derived protocol
- Some deployments use InfiniBand, leveraging its low latency and congestion guarantees
The pattern is straightforward. Each accelerator maintains both scale-up and scale-out connections. The scale-out fabric typically has two or three switch layers, with the first layer located in the accelerator rack, mid-row, or at the row’s end. The fabric is organized into multiple planes, with every accelerator connected to each plane.
Port speeds on these links have increased steadily over time:
- 100G at scale beginning around 2015
- 400G for many cloud back-ends
- 800G as the standard for new AI pods
- 1.6T emerging on roadmaps and in early trials
At these speeds, copper links cannot reach far. Almost all inter-rack connections in an AI pod are optical, which drives fiber counts higher. Consider a single 1U leaf switch with 64 800G ports. Each 800G port can break out into two 400G connections, each carried over four parallel lanes on eight fibers (four transmit, four receive). In parallel-optic formats, a single port therefore terminates on 16 fibers. Multiply by 64 ports, and a single switch face can present 1,024 fibers; the short sketch after the list below walks through the same arithmetic at rack, row, and pod scale.
Scaling up:
- A high-power AI accelerator rack can host hundreds to over a thousand fiber terminations when accounting for uplinks and east-west connections
- A row of 20 racks can carry tens of thousands of fiber cores
- A pod with multiple rows can reach into the hundreds of thousands
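The arithmetic behind these counts is easy to reproduce. The sketch below follows the 64-port breakout example above; the per-accelerator fiber count and the rack, row, and pod sizes are assumptions chosen only to show how quickly the totals grow.

```python
# Back-of-envelope fiber counts for an AI pod, following the breakout
# example above. Rack, row, and pod sizes are illustrative assumptions.

FIBERS_PER_400G_BREAKOUT = 8   # 4 lanes, each with a transmit + receive fiber
BREAKOUTS_PER_800G_PORT = 2    # one 800G port split into two 400G connections

def fibers_per_switch(ports_800g: int) -> int:
    """Fibers terminating on one switch face in a parallel-optic breakout."""
    return ports_800g * BREAKOUTS_PER_800G_PORT * FIBERS_PER_400G_BREAKOUT

if __name__ == "__main__":
    per_switch = fibers_per_switch(64)   # 64 x 800G leaf switch
    per_rack = per_switch + 72 * 8       # assumed: 72 accelerators, 8 fibers each
    per_row = 20 * per_rack              # assumed: 20 racks per row
    per_pod = 8 * per_row                # assumed: 8 rows per pod
    print(f"per switch face: {per_switch:>8,}")  # 1,024
    print(f"per rack:        {per_rack:>8,}")
    print(f"per row:         {per_row:>8,}")
    print(f"per pod:         {per_pod:>8,}")
```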
Topology choices introduce additional complexity.
Early cloud fabrics often used a simple spine-leaf design. Many AI fabrics continue to use this approach, but there is a gradual shift toward:
- Multiple planes (Ethernet) or rails (InfiniBand) to improve resilience and enable traffic engineering
- Additional “express” links and partial mesh topologies in very large clusters
From the fiber perspective, the exact logical topology is less important than the physical layout. AI pods generate more east-west traffic and diagonal paths across rows, which can quickly turn a neat front-to-back arrangement into a dense, tangled web if not carefully planned.
Structured cabling is becoming the default for AI pods:
- High-fiber-count trunks run between defined distribution points
- Breakout harnesses and cassettes connect trunks to leaf and spine ports
- Instead of hundreds of individual jumpers between racks, a smaller number of large backbone cables reduces on-site connector work and centralizes complexity at patch panels
For architects and installers, the advantages of this approach include reduced connector work in the most difficult areas, more systematic testing of high-count links, and a clear physical structure for future upgrades from 800G to 1.6T and beyond.
Viewed from above, an AI pod is no longer a random forest of cables, but rather a grid of high-power racks joined by planned optical corridors. The design and execution of these corridors determine how easily the fabric can grow and be operated over time.
Optics, Connectors, and the Structured Fiber Plant
The scale‑out fabric in AI pods is overwhelmingly optical. Short in‑rack links may still use direct‑attach copper cables (DACs) or short active optical cables (AOCs), but inter‑rack and leaf/spine connections rely on single‑mode fiber (SMF) for reach, bandwidth, and density. This is where fiber counts rise quickly.
Port speeds and form factors
400G remains a dependable workhorse, while new AI pods favor 800G to simplify topologies and reduce congestion. 1.6T ports are appearing on roadmaps and in early trials. Pluggables are typically OSFP (for higher power/thermal headroom) or QSFP‑DD (for ecosystem continuity). Consistent lane and breakout policies across sites help limit SKU proliferation and unexpected power or cooling issues.
High-density connectivity
MPO/MTP connectors handle parallel optics, while compact duplex connectors (e.g., MDC/MMC) are used where space is tight. The goal is to terminate more high-speed ports without consuming additional rack units or proliferating patch panels.
Structured by design
Pre‑terminated trunks, cassettes, and defined meet-points reduce on-site terminations, speed up testing, and leave a clear upgrade path from 800G to 1.6T and beyond. Plan for front-of-rack access, separate pathways for fiber versus power and liquid, and controlled routes that respect minimum bend radius, using G.657‑class bend-insensitive SMF for tighter spaces. Validate polarity, cleanliness, and insertion-loss budgets up front, and minimize mated pairs on critical runs.
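A loss-budget check can be as simple as the sketch below. The attenuation and connector-loss figures and the 3 dB channel budget are placeholder assumptions; real designs should use the transceiver datasheet and measured or specified component losses.

```python
# Simple single-mode channel insertion-loss estimate for a structured link.
# All figures are placeholder assumptions for illustration only.

FIBER_LOSS_DB_PER_KM = 0.4   # assumed installed SMF attenuation at 1310 nm
CONNECTOR_LOSS_DB = 0.35     # assumed loss per mated connector pair
SPLICE_LOSS_DB = 0.1         # assumed loss per fusion splice

def channel_loss_db(length_m: float, mated_pairs: int, splices: int) -> float:
    """Estimated end-to-end insertion loss for one fiber in the channel."""
    return (
        (length_m / 1000.0) * FIBER_LOSS_DB_PER_KM
        + mated_pairs * CONNECTOR_LOSS_DB
        + splices * SPLICE_LOSS_DB
    )

if __name__ == "__main__":
    budget_db = 3.0  # assumed maximum channel insertion loss for the optic
    # Hypothetical leaf-to-spine run through two patch panels:
    loss = channel_loss_db(length_m=80.0, mated_pairs=4, splices=2)
    print(f"estimated loss: {loss:.2f} dB, margin: {budget_db - loss:.2f} dB")
```

Every additional mated pair eats a fixed slice of the budget, which is why structured designs concentrate connections at a small number of defined distribution points.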
A quick sense of scale
A single leaf switch face can exceed 1,000 fiber cores. A high-power accelerator rack can have hundreds to over 1,000 terminations when uplinks and east-west links are counted.
The Power Delivery Shift
AI racks now routinely draw 50–120 kW and are trending higher, exceeding the limits of traditional low-voltage, AC-centric distribution. Many operators are exploring higher-voltage AC and high-voltage DC (HVDC) topologies to reduce conversion losses, minimize copper usage, limit heat, and simplify pod-scale deployment.
This evolution affects space, routing, and safety assumptions across the facility. Busways, trays, and liquid manifolds compete for overhead and rear-of-rack space. Reliable designs treat power, cooling, and fiber as a coordinated system: fiber pathways align with busway routes, thermal zones and leak protection dictate tray placement, and isolation from high-energy conductors minimizes risk.
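A quick current calculation shows why distribution voltage matters at these power levels. The 120 kW load, the voltages, and the power factor below are illustrative assumptions, not recommendations.

```python
# Feeder current for the same rack load under different distribution schemes.
# Load, voltages, and power factor are illustrative assumptions.
import math

def three_phase_current_a(power_kw: float, line_voltage_v: float,
                          power_factor: float = 0.95) -> float:
    """Line current for a balanced three-phase AC feed."""
    return power_kw * 1000.0 / (math.sqrt(3) * line_voltage_v * power_factor)

def dc_current_a(power_kw: float, voltage_v: float) -> float:
    """Current for a DC feed at the given pole-to-pole voltage."""
    return power_kw * 1000.0 / voltage_v

if __name__ == "__main__":
    load_kw = 120.0  # assumed rack load
    print(f"415 V three-phase AC: ~{three_phase_current_a(load_kw, 415.0):.0f} A per phase")
    print(f"+/-400 V DC (800 V pole-to-pole): ~{dc_current_a(load_kw, 800.0):.0f} A")
```

Lower feeder current at higher voltage is what allows smaller conductors, less heat in the busway, and simpler distribution at pod scale.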
Patterns That Age Well (2026–2028)
Designs that support smooth upgrades generally follow a few consistent patterns:
- A clear optics strategy with a limited set of pluggables and well-defined breakout policies
- Direct-to-chip liquid cooling with warm-water loops, reducing air-side complexity
- Structured SMF paths using ultra-high-fiber-count (UHFC) trunks, defined distribution points, front-of-rack access, and separation from power and liquid
- Bend-insensitive G.657-class fiber, ample tray capacity, and scalable patch panels to support 800G today and 1.6T and beyond
- Procedural discipline: up-to-date as-builts, labeling, acceptance testing, and inspection/cleaning runbooks
The optical layer becomes an enduring anchor. It must be treated as a managed product: carefully designed, documented, tested, and prepared for incremental upgrades.
Final Thought
AI racks are the building blocks of the AI factory. Individually, they resemble small supercomputers and connect into pod-, hall-, and campus-level mega-systems. As with every generation, today’s solutions may give way to new technologies, just as bipolar mainframe modules were replaced by CMOS designs.
The evolution continues, and the optical layer will anchor the next decade of that transition.
Look for the upcoming final blog in this series, where we will explore the impact of advanced technologies and macro-environmental factors shaping AI deployments.
Contributors:
Antonio Castano, Global Market Development Director, AFL
Ben Atherton, Technical Writer, AFL