Electronic Design

  
[Product Innovation]
Quad 64-Bit Multiprocessor Targets Comm Applications
Construct a super computer or super switch using chip’s advanced HyperTransport/SPI-4 links.

William Wong  |   ED Online ID #1968  |   October 14, 2002


Getting data to and from a processor quickly is key to high-performance network processing. Broadcom's new BCM1400 multiprocessor tackles this problem with a trio of flexible advanced HyperTransport/SPI-4 Phase 2 links. Of course, packing four 64-bit MIPS processors into the same package didn't hurt either. The result is a chip that provides multiprocessing support alone or in an array of HyperTransport-linked chips.

The BCM1400 targets communication-oriented applications that need significant computational support, like Internet service routers and switches with deep content switching and differentiated services such as quality-of-service (QoS) and virtual private networks (VPNs). In addition, the BCM1400 addresses Internet-Protocol (IP) servers and subscriber-management platforms, servers supporting high computational requirements for scientific or Enterprise Java environments, and wireless infrastructure equipment. The multiprocessing architecture also makes it suitable for scientific and embedded applications requiring significant computational capabilities.

The chip contains a number of peripherals along with its sophisticated memory and communication support (see the table). Up to eight chips can be connected via the HyperTransport links, for a 32-processor symmetrical multiprocessing (SMP) system (see "Multifunctional HyperTransport," p. 48).

What differentiates the BCM1400's SMP support from most small-scale SMP systems with two to eight processors is its use of a nonuniform memory access (NUMA) architecture. This is similar to the NUMA used with AMD's new Opteron 64-bit CPU. The NUMA architecture is often used by medium-scale multiprocessor systems with eight to 32 processors. Broadcom's solution is unusual because of its high integration, low power consumption, and multiplexing of memory and I/O traffic on the same link.

In a conventional SMP system, all processors have the same memory access time. A bus or switch acts as an interface between processors and the memory subsystem. Cache coherence is maintained by monitoring the bus or the switch traffic.

With NUMA, the memory address space is made up of the combined local memory from each node in the system. A processor can access its local memory faster than nonlocal memory. NUMA systems have the advantage of being easily expanded, while adding a processor to a conventional SMP shared memory architecture is more difficult because an additional port is needed.
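As a minimal sketch of the combined address space described above, the Python model below lays each node's local memory end to end and resolves a global address to its home node. The node count, the 1-Gbyte memory size, and the function names are illustrative assumptions, not BCM1400 specifications.

```python
from bisect import bisect_right
from itertools import accumulate

# Assumed for illustration: four nodes with 1 GiB of local memory each.
GIB = 1 << 30
NODE_MEMORY_BYTES = [GIB, GIB, GIB, GIB]

# Start address of each node's slice of the global address space,
# plus one trailing entry that marks the end of the combined space.
NODE_BASES = [0] + list(accumulate(NODE_MEMORY_BYTES))


def home_node(address):
    """Return (node, offset) for a global address in the combined space."""
    if not 0 <= address < NODE_BASES[-1]:
        raise ValueError("address outside the combined address space")
    node = bisect_right(NODE_BASES, address) - 1
    return node, address - NODE_BASES[node]


def is_local(requesting_node, address):
    """True when the address lives in the requesting node's own memory."""
    node, _ = home_node(address)
    return node == requesting_node
```

In this model, adding a node only appends another entry to the list of memory sizes, which hints at why the text calls NUMA systems easy to expand.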

Broadcom uses a cache-coherent form of NUMA, or ccNUMA. This allows on-chip caches to remain up to date even while data moves through the processor/memory interconnect. The BCM1400's on-chip double-data-rate (DDR) memory controller supports the chip's local, off-chip memory. Its HyperTransport links provide ccNUMA support.

Three-Way HyperTransport/SPI-4 Links: The BCM1400's triple HyperTransport link architecture is critical to its use in communication and multichip multiprocessing support (see the figure). Each link can be configured as an 8- or 16-bit HyperTransport connection, or as a streaming SPI-4 interface. The SPI-4 support includes hardware hash and route acceleration functions.

In addition, the HyperTransport links work with a mix of HyperTransport transactions, including encapsulated SPI-4 packets and nonlocal NUMA memory access.

The key is that hardware handles movement of information. For example, nonlocal memory accesses are determined by the memory mapping hardware that generates a HyperTransport request for reads or writes. These packets are automatically routed to the proper node that handles memory requests via its local memory. Operating systems simply set up the memory maps and HyperTransport links.
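The flow described above can be modeled in a few lines of Python. This is a toy software model under stated assumptions, not the chip's actual logic: the `LinkRequest` class, the equal per-node memory size, and the direct hand-off to the home node are all hypothetical stand-ins for the real memory-mapping hardware and HyperTransport packets.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed for illustration: every node contributes the same amount of memory.
NODE_MEMORY_BYTES = 1 << 20


@dataclass(frozen=True)
class LinkRequest:
    """Hypothetical stand-in for a read or write request sent over a link."""
    source_node: int
    home_node: int
    offset: int
    write_value: Optional[int] = None  # None means the request is a read


class Node:
    def __init__(self, node_id, cluster):
        self.node_id = node_id
        self.cluster = cluster   # list of all nodes, shared by reference
        self.memory = {}         # sparse model of this node's local memory
        self.requests_sent = 0   # counts accesses that left this node

    def access(self, address, write_value=None):
        """Read (write_value is None) or write a global address."""
        home, offset = divmod(address, NODE_MEMORY_BYTES)
        if not 0 <= home < len(self.cluster):
            raise ValueError("address outside the combined address space")
        if home == self.node_id:
            return self._serve(offset, write_value)
        # Nonlocal access: build a request and let the home node serve it.
        request = LinkRequest(self.node_id, home, offset, write_value)
        self.requests_sent += 1
        return self.cluster[home].handle(request)

    def handle(self, request):
        """Serve a request that arrived from another node."""
        return self._serve(request.offset, request.write_value)

    def _serve(self, offset, write_value):
        if write_value is None:
            return self.memory.get(offset, 0)
        self.memory[offset] = write_value
        return write_value


def make_cluster(node_count):
    """Build node_count nodes that all share one cluster list."""
    cluster = []
    for node_id in range(node_count):
        cluster.append(Node(node_id, cluster))
    return cluster
```

The caller uses the same `access()` call for local and nonlocal addresses, which mirrors the point that software only sets up the maps and never builds the requests itself.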

Although ccNUMA incurs an access-time penalty, the effects of using nonlocal memory are mitigated by on-chip caches and the HyperTransport transfers that occur at high speeds. So there's an initial delay when filling a cache entry. But subsequent memory accesses by a processor happen at faster cache speeds than even local memory accesses.
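To put rough numbers on this argument, the sketch below computes an average access time from a cache hit rate. The simple model (every access pays the cache lookup, and misses also pay the memory latency) and all latency values are placeholder assumptions chosen for illustration; none of them are measured BCM1400 figures.

```python
def average_access_ns(hit_rate, cache_ns, memory_ns):
    """Average access time when a miss pays the cache lookup plus memory."""
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    return cache_ns + (1.0 - hit_rate) * memory_ns


# Placeholder latencies, assumed only for this example.
CACHE_NS = 5.0
LOCAL_MEMORY_NS = 100.0
NONLOCAL_MEMORY_NS = 250.0

if __name__ == "__main__":
    for hit_rate in (0.0, 0.90, 0.99):
        local = average_access_ns(hit_rate, CACHE_NS, LOCAL_MEMORY_NS)
        nonlocal_ = average_access_ns(hit_rate, CACHE_NS, NONLOCAL_MEMORY_NS)
        print(f"hit rate {hit_rate:.2f}: local {local:.1f} ns, "
              f"nonlocal {nonlocal_:.1f} ns")
```

Under these assumed values, the gap between local and nonlocal memory narrows as the hit rate rises, because fewer accesses reach memory at all.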

Code prefetching effectively masks the latency of the system. A large 1-Mbyte, level 2 cache per BCM1400 means that only small, random, nonlocal memory accesses will cause any significant slowdown. Moving large amounts of sequential memory via nonlocal memory isn't a problem as only the transfer initiation incurs a latency penalty—a small fraction of the time necessary to send the block of data. The 64-kbyte level 1 cache per processor is split between a 32-kbyte instruction and 32-kbyte data cache.
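The claim that transfer initiation is a small fraction of a large sequential transfer can be checked with simple arithmetic. The following sketch assumes a one-time initiation latency followed by streaming at a constant rate; the function name and the example numbers are hypothetical, not BCM1400 or HyperTransport specifications.

```python
def initiation_fraction(block_bytes, initiation_ns, bytes_per_ns):
    """Fraction of total transfer time spent on the one-time initiation."""
    if block_bytes <= 0 or bytes_per_ns <= 0 or initiation_ns < 0:
        raise ValueError("sizes and rates must be positive")
    streaming_ns = block_bytes / bytes_per_ns
    return initiation_ns / (initiation_ns + streaming_ns)


if __name__ == "__main__":
    # Assumed values: 200 ns to start a transfer, 1 byte/ns while streaming.
    for block_bytes in (32, 4096, 65536):
        share = initiation_fraction(block_bytes, 200.0, 1.0)
        print(f"{block_bytes:6d} bytes: {share:.1%} of time is initiation")
```

With these assumed values, the initiation share is large for a 32-byte access and falls below 1% for a 64-kbyte block, which is the behavior the paragraph describes.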

