[Product Innovation]
Chip Set Creates High-Speed, Fail-Safe Switch Fabrics
Distributed queuing and a highly integrated control architecture let this 160-Gbit/s single-stage switch-fabric chip set remove data bottlenecks.
As more data moves around and between various networks, data-switching speeds need to increase. But higher speed isn't the only requirement because networks now carry audio, video, and other time-dependent information. So, quality of service (QoS) has become a key issue in current and future systems to ensure the delivery of time-dependent packets without delays. Additionally, networks have permeated all levels of business and industry, and disruptions in service can cause both economic and professional damage. Therefore, future switching systems must also incorporate fail-safe redundancy capabilities.
High data bandwidth, QoS, and redundancy can all be addressed by using a plethora of components, but the result is an expensive, power-hungry, rack-filling system. To tackle all of these issues simultaneously, engineers at Vitesse Semiconductor have developed TeraStream. This synchronous switch-fabric chip set lets designers craft network-protocol-independent Layer-1 switch fabrics with the lowest cost per 10-Gbit port commercially available: less than $1270 per port for a fully redundant system.
The chips allow the creation of fail-safe switch fabrics with a scalable user bandwidth of up to 160 Gbits/s. Moreover, the company has defined a roadmap that will lead to switch bandwidths of 320 and then 640 Gbits/s.
Before TeraStream, Vitesse had released the less feature-rich GigaStream and CrossStream switch fabrics. TeraStream promises to greatly reduce system power, size, complexity, and cost, while adding features such as queuing, QoS, and redundancy to satisfy future system demands. To accomplish that goal, the chip set consists of two key components: the VSC871 queuing engine and the VSC881 packet exchange matrix (PEM). Both chips are fabricated in CMOS using 0.18-µm design rules and operate from 2.5- and 1.8-V power supplies.
The high operating speeds of the chips will require some careful cooling considerations, however. On average, the switch fabric will consume about 1.4 W per gigabit/s of bandwidth. For a 160-Gbit/s system, that translates into about 224 W, not counting the network interface circuits and the network processor or other control subsystem. This power level is still well below the power that alternative system solutions would consume.
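The power estimate above is a single multiplication, sketched here so other fabric sizes can be checked the same way. The function name is hypothetical, and scaling the 1.4-W figure linearly to smaller configurations is an assumption. The article gives only the average and the 160-Gbit/s result.

```python
# Figure quoted in the article: average fabric power per Gbit/s of bandwidth.
WATTS_PER_GBPS = 1.4


def fabric_power_watts(bandwidth_gbps: float) -> float:
    """Estimate switch-fabric power (hypothetical helper, linear scaling assumed)."""
    if bandwidth_gbps < 0:
        raise ValueError("bandwidth must be non-negative")
    return bandwidth_gbps * WATTS_PER_GBPS


if __name__ == "__main__":
    # Excludes network interface circuits and the network processor,
    # as the article notes.
    for gbps in (40, 80, 160):
        print(f"{gbps:>3} Gbit/s fabric: about {fabric_power_watts(gbps):.0f} W")
```

For the full 160-Gbit/s configuration this gives the 224 W cited in the text.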
In a system implementation, the queuing engine chip resides on each line card in a switching system, while multiple instances of the PEM chip form the actual switch-matrix fabric (Fig. 1). A 16-port OC-192 switch fabric might include 16 line cards, each with a queuing engine, and a card or two containing the switch matrix, which is composed of multiple PEM chips.
On one side, the queuing engine communicates over an industry-standard CSIX-compatible interface with the network processor or other traffic manager that also resides on the line card. The CSIX interface on each VSC871 can be organized as four OC-48-capable 32-bit ports, or as one OC-192-capable 128-bit port. The VSC871 contains data flow paths for both unicast and multicast data streams, enabling the chip to perform one-to-one or one-to-many data transfers.
Unlike previous multicast approaches, which required the line card to repeatedly replicate the data packets for transfer to the switch matrix, the new chip set only needs to send one set of data to the switch matrix. Circuitry on the PEM chip will then replicate the data for multicast operations. That greatly reduces data traffic on the serial backplane and decouples ingress traffic from egress congestion.
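The traffic saving can be illustrated with a toy model that counts how many cells cross the ingress link for one multicast packet under each approach. This is only a sketch. The function names and the cell representation (a destination paired with a payload) are assumptions made for illustration, not the chip set's actual cell format.

```python
from collections import defaultdict


def send_with_line_card_replication(payload, egress_ports):
    """Older approach: the line card sends one copy per destination port."""
    return [(port, payload) for port in egress_ports]


def send_with_fabric_replication(payload, egress_ports):
    """Approach described above: one cell carries the whole destination set."""
    return [(frozenset(egress_ports), payload)]


def fabric_deliver(cells):
    """Toy fabric: fan each cell out to its destination port or ports."""
    delivered = defaultdict(list)
    for dest, payload in cells:
        ports = dest if isinstance(dest, frozenset) else (dest,)
        for port in ports:
            delivered[port].append(payload)
    return dict(delivered)


if __name__ == "__main__":
    ports = [3, 7, 12, 30]
    old = send_with_line_card_replication("cell", ports)
    new = send_with_fabric_replication("cell", ports)
    print("cells on ingress link:", len(old), "vs", len(new))
    print("same delivery:", fabric_deliver(old) == fabric_deliver(new))
```

Both paths deliver the same payload to every egress port, but with replication in the fabric only one cell crosses the backplane link from the line card, regardless of fan-out.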
For unicast traffic, the chip creates a set of virtual-output queues that minimize head-of-line blocking at the switch-fabric input ports. A total of 64 virtual-output queuing planes are implemented. The PEM can address 32 of them. Plus, a total of 16 per-class queues are implemented for both unicast and multicast traffic. These queues can be arbitrarily divided into a mix of strict priority queues and weighted round robin queues, allowing the designer to set the desired QoS level for each stream.
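To make the mix of strict-priority and weighted-round-robin queues concrete, here is a minimal software sketch of such a scheduler. It is not the VSC871's actual arbitration logic, which the article does not detail. The class name, the credit-based round-robin scheme, and the rule that strict queues always preempt weighted ones are all assumptions.

```python
from collections import deque

NUM_CLASS_QUEUES = 16  # per-class queue count cited in the article


class ClassScheduler:
    """Hypothetical scheduler: strict-priority classes first, then WRR classes."""

    def __init__(self, strict_classes, wrr_weights):
        if set(strict_classes) & set(wrr_weights):
            raise ValueError("a class cannot be both strict and weighted")
        if len(strict_classes) + len(wrr_weights) > NUM_CLASS_QUEUES:
            raise ValueError("more classes than available queues")
        if any(w < 1 for w in wrr_weights.values()):
            raise ValueError("weights must be positive integers")
        self.strict = list(strict_classes)   # highest priority first
        self.weights = dict(wrr_weights)     # class id -> cells per round
        self.queues = {c: deque() for c in [*self.strict, *self.weights]}
        self._credits = dict(self.weights)

    def enqueue(self, cls, cell):
        self.queues[cls].append(cell)

    def dequeue(self):
        # Strict-priority classes are always served before weighted ones.
        for c in self.strict:
            if self.queues[c]:
                return self.queues[c].popleft()
        backlogged = [c for c in self.weights if self.queues[c]]
        if not backlogged:
            return None
        # Start a new round once every backlogged class has spent its credit.
        if all(self._credits[c] == 0 for c in backlogged):
            self._credits = dict(self.weights)
        for c in backlogged:
            if self._credits[c] > 0:
                self._credits[c] -= 1
                return self.queues[c].popleft()


if __name__ == "__main__":
    sched = ClassScheduler(strict_classes=[0], wrr_weights={1: 2, 2: 1})
    for cell in ("a1", "a2", "a3", "a4"):
        sched.enqueue(1, cell)
    for cell in ("b1", "b2"):
        sched.enqueue(2, cell)
    sched.enqueue(0, "s1")
    print([sched.dequeue() for _ in range(7)])
```

In this example class 0 is served first even though it was queued last, and classes 1 and 2 then share the link in a 2:1 ratio. Raising a weight, or moving a class into the strict list, is how a designer would tune the service level for a stream in this model.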
A key characteristic of the VSC881 matrix is that it implements crosspoint queues for unicast traffic and input queues for multicast traffic. With a queue at every input-output crosspoint, the total number of unicast crosspoints is 1024, and because every crosspoint implements two separate queue priorities, the grand total of unicast queues is 2048. For multicast traffic, two separate queue priorities are implemented per input, which translates into 64 available multicast queues.
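The queue totals follow from simple counting. The sketch below assumes a 32-input, 32-output matrix. The article does not state that figure explicitly, but it is the port count consistent with 1024 crosspoints, 64 multicast queues, and the 32 queuing planes the PEM can address. The function name is hypothetical.

```python
def pem_queue_counts(ports: int = 32, priorities: int = 2) -> dict:
    """Count PEM queues for an assumed ports-by-ports matrix."""
    crosspoints = ports * ports  # one crosspoint per input-output pair
    return {
        "unicast_crosspoints": crosspoints,
        "unicast_queues": crosspoints * priorities,  # queues per crosspoint
        "multicast_queues": ports * priorities,      # queues per input
    }


if __name__ == "__main__":
    for name, count in pem_queue_counts().items():
        print(f"{name}: {count}")
```

With the default arguments this reproduces the article's totals of 1024 crosspoints, 2048 unicast queues, and 64 multicast queues.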