Electronic Design



[Embedded in Electronic Design]
Intel/ADI DSP Delivers 600M 16-Bit MACs, Low Power

Ray Weiss  |   ED Online ID #3844  |   July 9, 2001


Intel/Analog Devices' joint DSP design effort has produced a new, flexible instruction-set architecture (ISA). Analog Devices' (ADI's) BlackFin implementation is a 300-MHz, 16-bit DSP that supports dual-MAC execution and low-power operation.

It is the latest of the fourth-generation DSPs that have emerged to power today's networked, Internet-driven applications. Its competitors include ADI's TigerSHARC, Agere/Motorola's StarCore, and Texas Instruments' C6x. The new Micro Signal Architecture (MSA) will give them a run for their money. The 16-bit design is intended to scale to 1 GHz and beyond.

A 16-bit, dual-MAC DSP architecture, MSA builds on ADI's high-performance VLIW and SIMD DSP architectures and on Intel's memory-management, power-management, performance-monitoring, and SIMD technology. The resulting DSP implements dynamic power reduction, a memory management unit (MMU), and performance monitoring.

MSA delivers an innovative multi-instruction ISA that supports high-density 16-bit instructions, 32-bit instructions with large immediates, and 64-bit packed DSP instructions. In each pipelined cycle it can execute two 16-bit MACs, two 32/40-bit arithmetic operations, a 32/40-bit shift or rotate, or four 8-bit video operations.

This fourth-generation DSP targets high-performance, midrange 16-bit DSP applications. It packs 308 kbytes of on-chip memory, enough for many tasks. In addition, the MSA supports low-power operation for portables and Internet appliances: under software control, the core voltage and clock rate can be varied to cut power.
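Why varying both knobs matters can be seen from the standard first-order model of CMOS dynamic power, P = C × V² × f. The sketch below assumes a constant switched capacitance and ignores leakage; the 1.5-V and 1.0-V operating points are hypothetical, not MSA data-sheet values.

```python
# First-order CMOS dynamic power model: P = C * V**2 * f.
# Assumptions: switched capacitance C is constant and leakage is ignored.
# The voltage/clock pairs used below are hypothetical illustration values.

def dynamic_power(c_farads: float, v_core: float, f_hz: float) -> float:
    """Dynamic switching power in watts."""
    return c_farads * v_core ** 2 * f_hz

def relative_power(v_new: float, f_new: float, v_ref: float, f_ref: float) -> float:
    """Power at (v_new, f_new) as a fraction of power at (v_ref, f_ref)."""
    return (v_new / v_ref) ** 2 * (f_new / f_ref)

if __name__ == "__main__":
    # Halving the clock alone halves dynamic power.
    print(relative_power(1.5, 150e6, 1.5, 300e6))            # 0.5
    # Halving the clock AND lowering the core voltage (hypothetical 1.5 V -> 1.0 V)
    # cuts power much further, because voltage enters squared.
    print(round(relative_power(1.0, 150e6, 1.5, 300e6), 3))  # 0.222
```

Because voltage enters squared, a lower clock rate that also permits a lower core voltage saves far more than clock scaling alone.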

A balanced architecture, MSA combines high code density with a simplified ISA. The keys to MSA's flexible instruction design are:

  • Load/store architecture:   work from registers
  • 16-bit basic instruction:   high code density
  • Extended 32-bit instruction:   large immediates
  • Combined instructions:   multi-issue instructions

The DSP was designed around 16-bit instructions for high code density, and most control instructions are 16-bitters. But for operations that need larger immediate values or more fields, the ISA was extended to a 32-bit instruction.
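The density argument is simple arithmetic: every instruction that fits in 16 bits takes half the space of a 32-bit one. The sketch below compares a mixed encoding with a fixed 32-bit encoding; the 700/300 instruction mix is a hypothetical input, not a measurement of real MSA code.

```python
# Code-size comparison: mixed 16/32-bit encoding vs. a fixed 32-bit encoding.
# The instruction counts below are hypothetical, chosen only for illustration.

def code_bytes(n16: int, n32: int) -> int:
    """Bytes needed when n16 instructions fit in 16 bits and n32 need 32 bits."""
    return 2 * n16 + 4 * n32

def fixed32_bytes(n16: int, n32: int) -> int:
    """Bytes needed if every instruction were 32 bits wide."""
    return 4 * (n16 + n32)

n16, n32 = 700, 300               # hypothetical mix: mostly short control ops
mixed = code_bytes(n16, n32)      # 1400 + 1200 = 2600 bytes
fixed = fixed32_bytes(n16, n32)   # 4000 bytes
print(mixed, fixed, f"{1 - mixed / fixed:.0%} smaller")
```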

For DSP operations, the designers added a 64-bit multi-issue instruction. A composite, this instruction is made up of two 16-bit instructions and a 32-bit instruction. This combination can specify complex DSP operations with two data loads, yet it takes only one instruction fetch. Even better, it can reuse the decode logic already implemented for the standard 16-bit and 32-bit instructions. The decoder takes in a 64-bit-wide fetch and can issue one, two, or three instructions per cycle.
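The reuse of decode logic can be sketched as follows: a 64-bit packet is split into its 32-bit slot and two 16-bit slots, and each slot goes to the same decoder used for standalone instructions. The slot layout and the placeholder decoders (`decode_32`, `decode_16`) are hypothetical; the article does not give the real MSA bit layout.

```python
# Sketch: split a 64-bit multi-issue packet into one 32-bit and two 16-bit
# sub-instructions and hand each to the SAME decoders used for standalone
# instructions. The slot layout is a hypothetical choice for illustration.
from typing import List, Tuple

def decode_32(word: int) -> str:
    """Hypothetical placeholder for the existing 32-bit decoder."""
    return f"op32:{word:08X}"

def decode_16(half: int) -> str:
    """Hypothetical placeholder for the existing 16-bit decoder."""
    return f"op16:{half:04X}"

def split_packet(packet: int) -> Tuple[int, int, int]:
    """Return (slot32, slot16_a, slot16_b) from a 64-bit packet.

    Assumed layout: bits 0-31 hold the 32-bit DSP op; bits 32-47 and 48-63
    hold the two 16-bit ops (e.g., the two data loads).
    """
    if not 0 <= packet < 1 << 64:
        raise ValueError("packet must be a 64-bit unsigned value")
    slot32 = packet & 0xFFFF_FFFF
    slot16_a = (packet >> 32) & 0xFFFF
    slot16_b = (packet >> 48) & 0xFFFF
    return slot32, slot16_a, slot16_b

def issue_multi(packet: int) -> List[str]:
    """Decode all three slots of one fetch; they issue in the same cycle."""
    s32, a, b = split_packet(packet)
    return [decode_32(s32), decode_16(a), decode_16(b)]

print(issue_multi(0xBBBB_AAAA_1234_5678))
```

Only the splitter is new hardware in this model; the decoders themselves are shared with the 16-bit and 32-bit instruction paths.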

For speed, the DSP is pipelined with eight stages. Two of the stages execute the dual MACs, which feed dual 40-bit accumulators. Because the pipeline can start a dual-MAC instruction every cycle, the core sustains an effective throughput of two MACs per cycle.
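The headline figure follows directly: peak MAC rate equals clock rate times two MACs per cycle. This is a peak number that assumes the pipeline never stalls.

```python
# Peak MAC throughput: one dual-MAC instruction started per cycle.
# Assumes no pipeline stalls, so these are peak, not sustained, figures.

MACS_PER_CYCLE = 2  # two 16 x 16-bit multipliers

def peak_macs_per_second(clock_hz: float) -> float:
    return clock_hz * MACS_PER_CYCLE

for clock_mhz in (300, 1000):  # 300-MHz BlackFin; 1-GHz scaling target
    rate = peak_macs_per_second(clock_mhz * 1e6) / 1e6
    print(f"{clock_mhz} MHz -> {rate:.0f}M MACs/s")
```

At 300 MHz this gives the 600M 16-bit MACs/s in the headline.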

This DSP core breaks down into separate addressing and execution sections. The addressing section incorporates dual data address generators (DAGs), supported by a pointer register file of eight 32-bit registers and an addressing register file. The latter has four entries, each a set of four 32-bit registers for index, modify, length, and base address. These four entries support four addressing contexts, minimizing context saves on interrupts. The execution section consists of two 16- by 16-bit multipliers, two 32/40-bit ALUs, quad 8-bit video ALUs, a 40-bit barrel shifter, and dual 40-bit accumulators.
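The index, modify, length, and base registers are what DSPs conventionally use for circular-buffer addressing, as in FIR delay lines. The sketch below models one such register set; the wrap rule (length 0 means linear addressing, otherwise the index stays within [base, base + length)) is the conventional DSP behavior and is assumed here rather than quoted from MSA documentation. `AddrContext` is a hypothetical name.

```python
# Sketch of post-modify circular addressing driven by one set of
# index (I), modify (M), length (L), and base (B) registers.
# Assumed wrap rule: L == 0 -> linear; otherwise I stays in [B, B + L).
from dataclasses import dataclass

@dataclass
class AddrContext:  # hypothetical model of one register-file entry
    i: int  # index: current address
    m: int  # modify: signed step applied after each access
    l: int  # length of the circular buffer in bytes (0 = linear)
    b: int  # base address of the buffer

    def post_modify(self) -> int:
        """Return the current address, then advance I with wraparound."""
        addr = self.i
        nxt = self.i + self.m
        if self.l:
            # Assumes |M| < L, so a single correction always suffices.
            if nxt >= self.b + self.l:
                nxt -= self.l
            elif nxt < self.b:
                nxt += self.l
        self.i = nxt & 0xFFFF_FFFF  # registers are 32 bits wide
        return addr

# Four such contexts mirror the four register sets, so an interrupt
# handler can use its own set without saving the others.
ctx = AddrContext(i=0x1000, m=4, l=16, b=0x1000)
print([hex(ctx.post_modify()) for _ in range(6)])
```

The fifth access wraps back to the base address, which is what lets a filter walk its delay line without any explicit bounds checks in the inner loop.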

This is a load/store architecture. The next set of operands for the dual-MAC operations is fetched as two 32-bit words from L1 memory (data cache or scratchpad RAM) and loaded into 32-bit data registers. These furnish the next X and Y values to the DSP execution units on the following cycle. For dual MACs, the 16-bit operands are grouped in 32-bit sets: two X and two Y 16-bit values.
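A minimal model of one dual-MAC step on packed operands is shown below: each 32-bit register holds two signed 16-bit samples, and both products are accumulated in the same step. The lane pairing (low × low into A0, high × high into A1) and the saturating 40-bit accumulators are assumptions for illustration.

```python
# Sketch of a dual-MAC step on packed 16-bit operands.
# Assumptions: low halves pair into A0, high halves into A1, and the
# 40-bit accumulators saturate rather than wrap.
from typing import Tuple

ACC_BITS = 40
ACC_MAX = (1 << (ACC_BITS - 1)) - 1   # largest signed 40-bit value
ACC_MIN = -(1 << (ACC_BITS - 1))      # smallest signed 40-bit value

def s16(v: int) -> int:
    """Interpret a 16-bit field as a signed (two's-complement) value."""
    return v - 0x10000 if v & 0x8000 else v

def halves(word32: int) -> Tuple[int, int]:
    """Split a 32-bit register into signed (high, low) 16-bit values."""
    return s16((word32 >> 16) & 0xFFFF), s16(word32 & 0xFFFF)

def pack(hi: int, lo: int) -> int:
    """Pack two signed 16-bit values into one 32-bit register image."""
    return ((hi & 0xFFFF) << 16) | (lo & 0xFFFF)

def saturate40(v: int) -> int:
    return max(ACC_MIN, min(ACC_MAX, v))

def dual_mac(a0: int, a1: int, x: int, y: int) -> Tuple[int, int]:
    """A0 += X.low * Y.low and A1 += X.high * Y.high in one step."""
    xh, xl = halves(x)
    yh, yl = halves(y)
    return saturate40(a0 + xl * yl), saturate40(a1 + xh * yh)

# Dot product of two 4-tap vectors, two taps per step.
xs = [1, -2, 3, 4]
ys = [5, 6, -7, 8]
a0 = a1 = 0
for k in range(0, len(xs), 2):
    # One 32-bit load per operand stream brings in two 16-bit samples.
    a0, a1 = dual_mac(a0, a1, pack(xs[k + 1], xs[k]), pack(ys[k + 1], ys[k]))
print(a0 + a1)  # 4, the dot product of xs and ys
```

The 8 guard bits above the 32-bit product width are what let long sums build up in the accumulators before saturation becomes a concern.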

Also, for higher processing bandwidth, the hardware performs SIMD operations, applying the same operation across the four 8-bit video ALUs at once. This can speed up video pixel processing by as much as four times. The ALUs can also perform dual 16-bit or single 32-bit ALU operations and shifts.
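What the quad 8-bit ALUs do in one operation can be emulated in software with the classic SWAR (SIMD within a register) trick: four byte lanes packed in one 32-bit word are added at once, with no carry leaking between lanes. Per-lane wraparound rather than saturation is an assumption of this sketch; video code often wants saturating adds instead.

```python
# SWAR emulation of a quad 8-bit SIMD add: four byte lanes in one 32-bit
# word are added simultaneously, with no carries crossing lane boundaries.
# Assumption: each lane wraps modulo 256 (no saturation).

LOW7 = 0x7F7F7F7F   # low 7 bits of every byte lane
HIGH1 = 0x80808080  # top bit of every byte lane

def quad_add8(a: int, b: int) -> int:
    """Add four packed unsigned bytes lane by lane, modulo 256 per lane."""
    low = (a & LOW7) + (b & LOW7)                   # cannot carry out of a lane
    return (low ^ ((a ^ b) & HIGH1)) & 0xFFFF_FFFF  # fix up each lane's top bit

def quad_add8_reference(a: int, b: int) -> int:
    """One lane at a time, for checking the packed version."""
    out = 0
    for shift in (0, 8, 16, 24):
        lane = (((a >> shift) & 0xFF) + ((b >> shift) & 0xFF)) & 0xFF
        out |= lane << shift
    return out

pixels = 0x01FF8040
offset = 0x01010101
print(hex(quad_add8(pixels, offset)))  # 0x2008141
```

The 0xFF lane wraps to 0x00 without disturbing its neighbors, which is exactly the lane isolation the dedicated video ALUs provide in hardware.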

On-chip memory is organized in two levels. Level one (L1) interfaces directly with the CPU. It has a 16-kbyte instruction cache, a 32-kbyte data cache, and a 4-kbyte scratchpad SRAM. These memories have a two-cycle access time and can deliver two 32-bit data words and one instruction to the core per clock cycle. The second level, a larger SRAM, functions as a unified instruction and data memory. The L1 caches can be configured as SRAM or as mixed cache and SRAM, and they support cache locking.
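Two figures follow from the numbers quoted above. The L2 size below is inferred, not stated: it assumes the 308-kbyte on-chip total includes the L1 memories. The bandwidth figure is a peak that assumes both data words are delivered every cycle at 300 MHz.

```python
# Back-of-envelope arithmetic from the article's figures.
# Assumption: the 308-kbyte on-chip total includes the L1 memories,
# so L2 is inferred as the remainder.

L1_KB = {"instruction cache": 16, "data cache": 32, "scratchpad SRAM": 4}
TOTAL_ON_CHIP_KB = 308

l1_total = sum(L1_KB.values())            # 52 kbytes
l2_implied = TOTAL_ON_CHIP_KB - l1_total  # 256 kbytes (inferred)

# Peak L1 data bandwidth: two 32-bit words per cycle at 300 MHz.
bytes_per_cycle = 2 * 4
clock_hz = 300e6
data_bw_gb_s = bytes_per_cycle * clock_hz / 1e9  # about 2.4 Gbytes/s

print(l1_total, l2_implied, data_bw_gb_s)
```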

To speed accesses, the hardware supports relaxed ordering between loads and stores, so loads can take precedence over pending stores. There are also two write queues: one from the CPU to L1 memory and one from L1 memory to the system interface. Memory is addressable at both byte and word granularity.
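A generic way to let loads go ahead of queued stores without returning stale data is a write queue with store-to-load forwarding. The model below is a textbook sketch of that idea; `WriteQueue` is a hypothetical name, and the forwarding rule is an assumption, not the documented MSA policy.

```python
# Generic model of a write queue that lets loads bypass older pending stores.
# Assumption: a load that matches a queued store's address returns the newest
# queued value (store-to-load forwarding). Not the documented MSA policy.
from collections import deque
from typing import Deque, Dict, Tuple

class WriteQueue:  # hypothetical illustration
    def __init__(self, memory: Dict[int, int]):
        self.memory = memory
        self.pending: Deque[Tuple[int, int]] = deque()  # (addr, value), oldest first

    def store(self, addr: int, value: int) -> None:
        self.pending.append((addr, value))  # buffered, not yet written to memory

    def load(self, addr: int) -> int:
        # Loads take precedence: they do not wait for the queue to drain.
        for a, v in reversed(self.pending):  # newest matching store wins
            if a == addr:
                return v
        return self.memory.get(addr, 0)

    def drain_one(self) -> None:
        """Retire the oldest pending store to memory when the bus is free."""
        if self.pending:
            a, v = self.pending.popleft()
            self.memory[a] = v

mem = {0x10: 1, 0x14: 2}
q = WriteQueue(mem)
q.store(0x10, 99)
print(q.load(0x14), q.load(0x10), mem[0x10])  # 2 99 1
q.drain_one()
print(mem[0x10])  # 99
```

The load to 0x14 proceeds even though a store is still queued, while the load to 0x10 sees the buffered value before it reaches memory.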

Designed for C/C++ coding, the ISA supports two software stacks (user, system), held in the scratchpad RAM for fast access. Plus, unlike many DSPs, the MSA supports I/D MMUs for memory protection. It supports emulation, system, and user execution modes. For coding simplicity, the assembler implements an algebraic notation.


