[Design Application]
Factors To Consider When Choosing The Right DSP For The Job
DSP Performance Isn't Just About MIPS. Application-Specific Issues Can Strongly Affect A Chip's Performance.
Contributing Author | ED Online ID #7615 | June 8, 1998
The recently introduced Texas Instruments TMS320C67x and the
well-established Analog Devices ADSP-2106x SHARC processors are the two
highest-performance floating-point DSPs on the market today.
Which of these two processors provides the higher system performance?
As we shall see, the answer depends on the kind of task you're
trying to perform. Keep in mind that Analog Devices (ADI) will be releasing
its next-generation SHARC processors, and Texas Instruments (TI) has
an aggressive plan to increase the speed of the 'C67x range.
System engineers must select the device that provides the most effective
solution to meet the requirements of their DSP application. While the
obvious step is to compare the raw processing power of the two processors,
this comparison will give little indication of expected system performance,
especially in highly demanding multiprocessing applications.
Choosing the most suitable DSP platform, from a systems perspective,
requires an analysis of many aspects of the application. First, the I/O
data rates and channel density must be reviewed to determine the bandwidth
in and out of the system.
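As a rough illustration of that first step, the sketch below totals the bandwidth in and out of a system from its channel count and data rate. The function name and the example figures (16 channels of 32-bit samples at 1 MHz) are hypothetical and are not taken from either vendor's data sheet.

```python
def io_bandwidth_bytes_per_s(channels, sample_rate_hz, bytes_per_sample):
    """Aggregate one-way data rate for a bank of identical channels."""
    return channels * sample_rate_hz * bytes_per_sample


# Hypothetical system: 16 channels of 32-bit samples at 1 MHz, in and out.
inbound = io_bandwidth_bytes_per_s(16, 1_000_000, 4)
outbound = io_bandwidth_bytes_per_s(16, 1_000_000, 4)

total_mbytes_per_s = (inbound + outbound) / 1e6
print(f"{total_mbytes_per_s:.0f} Mbytes/s")  # prints "128 Mbytes/s"
```

A figure like this is only a starting point, since it says nothing about how the traffic is split across links, ports, and the external bus of each processor.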
The next step involves the mapping of DSP algorithms to DSP devices.
This may be complex, and requires an understanding of I/O data paths,
memory management, interprocessor communication capability, and synchronization
mechanisms. While the resolution of these issues determines the best technical
solution, other factors also require consideration. For example, time-to-market
is influenced by the availability of third-party library support, and
the characteristics of the development tools accompanying each processor.
A comparison of the two components logically begins with an analysis
of the features of each device. Rather than a comprehensive feature list,
this section summarizes the features that differentiate the performance
of each (see the table). Full specifications are available
in the data sheets provided by each vendor. As a detailed specification
was not available for the 'C67x at the time of this writing, some parameters
(e.g. power consumption) are not addressed here.
From the table, it is clear that the 'C6701 outperforms the 21060 in
single-processor, low- and medium-bandwidth configurations. Using a conservative
estimate of the sustained computational capacity of the 'C6701, its raw
performance exceeds that of the 21060 by more than five to one.
However, the 21060, although less powerful, has other distinct advantages.
Applications requiring large internal memory resources, either program
or data, benefit from a configurable internal memory that is four times
the size of that in the 'C6701. In addition, multiprocessing applications can take
advantage of the efficient native multiprocessing support of the 21060
processor. Finally, the 21060 has a higher cumulative I/O bandwidth than
the 'C6701.
Of course, the 'C6701 has substantial I/O bandwidth and, with the assistance
of external hardware, it may also be used effectively in multiprocessing
architectures. This is investigated in the multiprocessing section.
Local Memory Support Is Key
It is clear that the SHARC gains the upper hand when it comes to internal
memory capacity. However, it is rare that an entire application and its
associated data can be accommodated in internal memory for either of these
devices. It is, therefore, worth investigating the external memory options
available in each case, and the performance each option delivers.
High-Performance Memory
There are many instances where the algorithm developer needs high-performance
external memory, and in some circumstances it is critical to the application.
For example, high performance is required when code must be executed directly
from external memory, and when critical variables (e.g. filter tap coefficients)
are stored externally due to a lack of internal resources. Both the SHARC
and the 'C67x support high-performance external memory.
A SHARC processor is easily interfaced to asynchronous SRAM (ASRAM),
accessible in a single 25-ns clock cycle. Of course, ASRAM is both expensive
and low in density, with a practical maximum capacity of 512k by 32 per
cluster in most commercial-off-the-shelf (COTS) implementations.
The 'C67x directly supports SBSRAM, SDRAM, and ASRAM as high-performance
resources. SBSRAM is currently available at 133 MHz, supporting an
access every two 6-ns clock cycles of the DSP. It will likely be available
at 166 MHz by the time the DSP is shipping, allowing for single-cycle
access. The pipeline delay of SBSRAM should be taken into account in throughput
considerations, as it adds another three cycles to each first access. The
consequence is that critical sections of code must be run from internal
DSP memory, because loading a single 256-bit instruction packet from any
external memory takes more than eight clock cycles. As with ASRAM,
SBSRAM is expensive and low in density, with a typical allocation of approximately
128k by 32 per DSP in COTS 'C6x boards.
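The cycle count behind that conclusion can be sketched as follows. This is a simplified model, not a timing diagram from the data sheet: it assumes a 32-bit-wide external bus (as the "by 32" memory organization suggests), so that one 256-bit fetch needs eight word accesses, and it assumes the three-cycle pipeline delay is paid once per burst. The function name is hypothetical.

```python
def burst_cycles(words, cycles_per_access, first_access_latency=3):
    """DSP clock cycles to read `words` consecutive 32-bit words.

    Assumes the pipeline latency is paid once, on the first access.
    """
    return first_access_latency + words * cycles_per_access


WORDS_PER_FETCH = 256 // 32  # eight 32-bit words per 256-bit fetch (assumed bus width)

# SBSRAM at 133 MHz: one access every two DSP cycles.
at_133_mhz = burst_cycles(WORDS_PER_FETCH, cycles_per_access=2)
# SBSRAM at 166 MHz: one access per DSP cycle.
at_166_mhz = burst_cycles(WORDS_PER_FETCH, cycles_per_access=1)

print(at_133_mhz, at_166_mhz)  # prints "19 11"
```

Under these assumptions, even the faster memory needs more than eight cycles per fetch, which is why time-critical code belongs in internal memory.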
In summary, the SBSRAM interface of the 'C67x gives it a major performance
advantage when accessing external memory: roughly four times the throughput of
a SHARC accessing ASRAM. However, this can only be realized for multiple
consecutive external accesses, where the pipeline delay becomes negligible.
Furthermore, in cases where consecutive instructions must be accessed
from external memory, the theoretical performance of the 'C67x can be
reduced from 1328 to 166 MIPS. The SHARC sustains its 40-MIPS rate whether
it executes from internal or external memory.
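The figures in this summary can be reproduced with a few lines of arithmetic. The sketch below assumes eight instructions issued per cycle at 166 MHz (inferred from the 1328-MIPS figure), one 32-bit instruction fetched per cycle from external memory, and a three-cycle pipeline delay per burst. The constant and function names are hypothetical.

```python
CLOCK_MHZ = 166              # projected 'C67x clock rate
INSTRUCTIONS_PER_CYCLE = 8   # assumption: 1328 MIPS / 166 MHz
SHARC_CYCLE_NS = 25          # SHARC ASRAM access time, one clock cycle

peak_mips = CLOCK_MHZ * INSTRUCTIONS_PER_CYCLE   # 1328, running from internal memory
external_mips = CLOCK_MHZ * 1                    # 166, one instruction fetched per cycle

sharc_mwords_per_s = 1000 / SHARC_CYCLE_NS       # 40 Mwords/s from ASRAM
throughput_ratio = CLOCK_MHZ / sharc_mwords_per_s  # about 4.15


def sustained_mwords_per_s(burst_words, latency_cycles=3):
    """Effective SBSRAM rate when each burst pays the pipeline delay once."""
    return CLOCK_MHZ * burst_words / (burst_words + latency_cycles)


# Short bursts lose a noticeable share of the peak rate; long bursts approach it.
for burst in (1, 8, 64, 1024):
    print(burst, round(sustained_mwords_per_s(burst), 1))
```

The loop shows why the fourfold advantage holds only for long runs of consecutive accesses: the shorter the burst, the larger the share of time spent on the pipeline delay.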
High-Density Memory Support
In data-driven applications (e.g. imaging and radar), the DSP requires
high-density memory for temporary storage of data. Usually, memory access
is sequential due to the correlated nature of the data.
With the addition of some external logic, the SHARC can be interfaced
to low-cost, bulk DRAM, with one or two 25-ns wait states. It is fairly
typical to find COTS configurations with 64 Mbytes or more of DRAM per
cluster. The 'C67x, on the other hand, supports a glue-less connection
to SDRAM.
As with SBSRAM, there is a pipeline latency of three cycles, but sequential
accesses take two 6-ns clock cycles. Paging and refresh delays also need
to be considered as these will result in non-deterministic delays of ten
cycles or more. In spite of this, SDRAM clearly has an advantage over
DRAM when making sequential accesses to large sets of data.
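To put rough numbers on that comparison, the sketch below estimates the time to read a block of sequential words from each memory type. It is a simplified model under stated assumptions: every SHARC DRAM access costs one 25-ns cycle plus its wait states, and the SDRAM pays a ten-cycle paging or refresh stall at a fixed interval. That interval (`words_per_stall`) is a hypothetical parameter chosen for illustration, as are the function names.

```python
def sharc_dram_ns(words, wait_states, cycle_ns=25):
    """Each access costs one clock cycle plus its wait states."""
    return words * (1 + wait_states) * cycle_ns


def c67x_sdram_ns(words, cycle_ns=6, cycles_per_access=2,
                  latency_cycles=3, stall_cycles=10, words_per_stall=256):
    """Sequential read with one pipeline delay and periodic stalls.

    words_per_stall is a hypothetical interval between paging or
    refresh stalls; real intervals depend on the SDRAM device.
    """
    stalls = words // words_per_stall
    cycles = latency_cycles + words * cycles_per_access + stalls * stall_cycles
    return cycles * cycle_ns


block = 1024  # sequential 32-bit words
print(sharc_dram_ns(block, wait_states=1))  # prints 51200
print(sharc_dram_ns(block, wait_states=2))  # prints 76800
print(c67x_sdram_ns(block))                 # prints 12546
```

Even with the stalls included, the SDRAM read in this model finishes in about a quarter of the time of the one-wait-state DRAM read, which is consistent with the advantage described above.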