# THE INSTITUTE FOR SYSTEMS RESEARCH

**ISR TECHNICAL REPORT 2015-05** 

# Digital Signal Processors: A brief summary

Aparna Kotha





ISR develops, applies and teaches advanced methodologies of design and analysis to solve complex, hierarchical, heterogeneous and dynamic problems of engineering technology and systems for industry and government.

ISR is a permanent institute of the University of Maryland, within the A. James Clark School of Engineering. It is a graduated National Science Foundation Engineering Research Center.

www.isr.umd.edu


Aparna Kotha, Graduate Student, University of Maryland, College Park. www.ece.umd.edu/~akotha, akotha@umd.edu

1 May 2008

## 1 Abstract

Most consumer electronics perform specific Digital Signal Processing computations. They require the system to consume low power, have a real-time response and provide high I/O performance. These needs of the consumer electronics market have driven the development of programmable Digital Signal Processors (DSPs) with architectural features that increase performance and reduce power for specific applications over general purpose processors. Companies such as Texas Instruments, Agere Systems, Analog Devices and Motorola have developed a wide variety of programmable DSPs to cater to these needs of the ever growing market.

This report summarizes the features of the numerous DSP architectures available in the market today, broadly classifying them as Enhanced Harvard, Very Long Instruction Word (VLIW) and Parallel / Multi-core.

## 2 Introduction

A Digital Signal Processor (DSP) is a specialized microprocessor designed specifically for a digital signal processing application performing real time computing [10].

The typical features that characterize a DSP include real-time computing, high performance with streaming data, low power consumption and single-cycle execution of many complex numerical computations. Some of the architectural features that help realize these characteristics are separate data and program memories, special instructions to speed up computation, direct memory access support, analog-to-digital and digital-to-analog converters, a floating point unit integrated directly into the data path, pipelined architectures, highly parallel accumulators and multipliers, special looping hardware that provides low- or zero-overhead looping, and a memory address calculation unit, to name a few.
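To make the role of the multiplier, accumulator and looping hardware concrete, the following is a minimal sketch of a finite impulse response (FIR) filter written as explicit multiply-accumulate steps. The function name `fir_filter` and the use of a Python `deque` as the delay line are illustrative choices, not taken from any DSP vendor's library. On a DSP, the inner loop is the part that the single-cycle multiply-accumulate and zero-overhead looping hardware are meant to speed up.

```python
from collections import deque


def fir_filter(samples, coeffs):
    """Direct-form FIR filter built from explicit multiply-accumulate steps."""
    # Delay line holds the most recent len(coeffs) inputs, newest first.
    delay = deque([0.0] * len(coeffs), maxlen=len(coeffs))
    out = []
    for x in samples:
        delay.appendleft(x)      # newest sample enters, oldest one drops off
        acc = 0.0                # accumulator is cleared once per output sample
        for c, d in zip(coeffs, delay):
            acc += c * d         # one multiply-accumulate (MAC) per filter tap
        out.append(acc)
    return out


# Two-tap moving average of a short ramp.
print(fir_filter([1.0, 2.0, 3.0, 4.0], [0.5, 0.5]))  # [0.5, 1.5, 2.5, 3.5]
```

Each output sample costs one multiply-accumulate per coefficient, which is why the speed of that single operation has such a large effect on overall throughput.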

A wide variety of DSPs exist in the market today, providing different performance and features, each suited for a different task. The price of a DSP can range from about USD \$1.50 to USD \$300.

## 3 DSPs versus ASICs

Many products that are implemented using programmable Digital Signal Processors (DSPs) could also be implemented using Application Specific Integrated Circuits (ASICs). ASICs provide more efficient hardware utilization because they include only what the end system designer needs, but DSPs are preferred for many applications because they have the following advantages [9]:

- Programmability: DSPs provide a fully debugged hardware platform along with development tools such as compilers, profilers and high-level language debuggers. Hence, they can be programmed easily and do not have a design cycle as long as that of ASICs. Also, if a bug is discovered at a later stage of design, a software patch can be provided to fix it, whereas this flexibility is not possible for ASICs. Many DSPs can easily be reconfigured to execute different functions.
- Low Power Consumption: Many DSP applications run on hand-held mobile devices, where low power consumption is essential. Power minimization techniques have been efficiently implemented in many DSPs, and DSP manufacturers provide power-down and sleep modes. With advancements in Integrated Circuit (IC) technology, supply voltages are decreasing, thereby reducing power consumption.
- High Performance and I/O: Many signal processing applications need a very high million instructions per second (MIPS) rate. As DSPs are designed to perform computation-intensive digital signal processing algorithms, they provide high performance. The development of parallel, VLIW and Enhanced Harvard DSP architectures boosts performance further. Hence DSPs are able to sustain the performance needs of consumer electronics.
- Reduced Cost: History shows that DSP unit pricing falls by about two orders of magnitude every decade.
- Reduced Form Factor: The form factor of DSPs is reduced in two ways. First, the development of customizable DSPs reduces system component count. Second, innovative packaging techniques help reduce the size of DSPs.
- On-chip Memory: With the advancement in IC technology, the size of on-chip memory is continuously increasing. More on-chip memory increases the processing power of DSPs. Also, as the on-chip memory is static RAM, its power dissipation is a function only of the number of accesses to it. Increasing on-chip memory therefore does not increase power dissipation but helps increase processing power. The increase in processing power favors the integration of a number of signal processing applications on one chip, thereby satisfying some of the needs of the consumer electronics market.

Note that improvements in IC technology benefit all of the above factors.
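As a rough illustration of the Reduced Cost item above, a drop of two orders of magnitude per decade can be converted into an equivalent yearly factor. The sketch below assumes the decline is a constant geometric rate, which the source does not state; the function name `annual_price_factor` is hypothetical.

```python
def annual_price_factor(orders_of_magnitude=2.0, years=10.0):
    """Yearly price multiplier implied by a drop of N orders of magnitude
    over `years`, assuming a constant geometric decline (an assumption)."""
    return 10.0 ** (-orders_of_magnitude / years)


factor = annual_price_factor()
print(f"price multiplier per year: {factor:.3f}")      # about 0.631
print(f"decline per year: {(1.0 - factor) * 100:.0f}%")  # about 37%
```

Under that assumption, the unit price falls by a little over a third each year.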

## 4 Applications

Programmable DSPs are predominantly used in consumer electronics. Their applications span a broad range, including [7]:

- Audio applications such as digital hearing aids, digital radio, MP3 players/recorders and portable media players.
- Communication and Telecom applications such as modems, routers, Bluetooth headsets, cell phones, IP phones, power line communications and servers.
- Computer Peripherals such as fingerprint biometrics, USB phones, USB speakers and wireless LAN cards.
- Consumer Electronics such as DVD players, digital set-top boxes, digital still cameras, digital video recorders, microwave ovens, notebook PCs, personal digital assistants, refrigerators, LCD and digital televisions, portable DVD players and washing machines.
- Industrial Applications such as meteorological equipment, navigation devices such as global positioning system (GPS) receivers and various temperature sensors.
- Medical equipment such as blood pressure monitors, electrocardiograms (ECG), magnetic resonance imaging (MRI), portable blood gas analyzers and ultrasound systems.
- Security Systems such as surveillance cameras and smoke detectors.
- Space and Defense systems such as radar/sonar and military imaging.

## 5 DSPs in the 1980s

In the 1980s DSPs were characterized by the time they took to perform the multiply-accumulate (MAC) operation [8]. They were able to process voice in real time but could not process video in real time. Compiler tools for these processors were still being developed and were not as powerful as those available today.

The next two subsections describe briefly the features present in two DSPs that were used extensively two decades ago.

### 5.1 TMS320C25

This is one of the early DSPs launched by Texas Instruments in the 1980s [6]. It had an instruction cycle time of 80 ns. Its features included 544 words of on-chip data RAM, 4K words of on-chip program ROM, 128K words of data/program space, a 32-bit ALU/accumulator, a 16 X 16-bit multiplier with a 32-bit product, support for repeat instructions, a serial port for direct codec interface, a synchronization input for synchronous multiprocessor configurations, wait states for communication with slow off-chip memories, an on-chip timer for control operations, and a single 5 V power supply using Complementary Metal-Oxide-Semiconductor (CMOS) technology. Its block diagram is shown in Figure 1.



**Figure 1:** TMS320C25
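The combination of a 16 X 16-bit multiplier with a 32-bit product and a 32-bit accumulator is what fixed-point multiply-accumulate code relies on. The sketch below emulates that data path with Python integers, using the common Q15 convention (16-bit values representing fractions in [-1, 1)). The helper names and the choice to saturate the accumulator on overflow are assumptions made for illustration; they are not a description of the TMS320C25's exact accumulator behavior.

```python
def to_q15(value):
    """Convert a float in [-1, 1) to a signed 16-bit Q15 integer, clamping at the limits."""
    scaled = int(round(value * 32768))
    return max(-32768, min(32767, scaled))


def saturate32(value):
    """Clamp an integer to the signed 32-bit range (saturation is an assumption here)."""
    return max(-2**31, min(2**31 - 1, value))


def mac_q15(acc, a, b):
    """acc + a*b with 16-bit operands, a 32-bit product and a 32-bit accumulator."""
    product = a * b                  # two signed 16-bit values always fit in 32 bits
    return saturate32(acc + product)


def dot_q15(xs, ys):
    """Dot product of two Q15 vectors; the result is in Q30 format."""
    acc = 0
    for a, b in zip(xs, ys):
        acc = mac_q15(acc, a, b)
    return acc


result = dot_q15([to_q15(0.5)], [to_q15(0.5)])
print(result / 2**30)  # 0.25
```

Multiplying two Q15 numbers yields a Q30 product, which is why the accumulator needs to be twice as wide as the operands.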

### 5.2 DSP56001

This is a member of Motorola's family of HCMOS, low-power, general-purpose DSPs launched in the 1980s [1]. It featured 512 words of full-speed on-chip program RAM (PRAM), two 256-word data RAMs, two preprogrammed data ROMs and special on-chip bootstrap hardware for convenient loading of user programs into the program RAM. The core of the processor has three 24-bit execution units operating in parallel: the data ALU, the address generation unit and the program controller. It has MCU-style on-chip peripherals, program and data memory, and a memory expansion port. It featured a microprocessor (MPU) style programming model and no-overhead DO and REPEAT instructions.

It achieved 16.5 MIPS at a 33 MHz clock and was suited for communication, high-speed control, numeric processing, computer and audio applications. The WAIT instruction could be used to shut off certain parts of the central processor, and the STOP instruction halted the internal oscillator. These instructions helped to achieve low power consumption. The block diagram of the architecture is shown in Figure 2.



**Figure 2:** DSP56001

Neither of these processors is in production anymore.

## 6 Case Studies of Present-Day DSPs

Since the early DSPs of the 1980s, DSPs have evolved to the point where they resemble small general-purpose microprocessors. The sections below present case studies of three more recent DSPs.

### 6.1 Very Long Instruction Word (VLIW): TMS320C6424

This was one of TI's highest-performing fixed-point VLIW DSPs in 2007 [3]. Some of its features include the following.

- ~2 ns instruction cycle time, 400-600 MHz clock rate, 3200-4800 MIPS
- Eight 32-bit instructions per cycle
- Eight highly independent functional units: six ALUs (32-/40-bit), each supporting single 32-bit, dual 16-bit or quad 8-bit arithmetic per clock cycle, and two multipliers supporting four 16 X 16-bit multiplies (32-bit results) or eight 8 X 8-bit multiplies (16-bit results) per clock cycle
- Load store architecture with non-aligned support
- 64 32-bit general purpose registers
- Instruction packing reduces code size
- All instructions are conditional
- Enhanced features include protected mode operation, exceptions support for error detection and program redirection and hardware support for modulo loop operation
- Instruction set features include compact 16-bit instructions, instructions to support complex multiplications, byte addressability, 8-bit overflow protection, and bit-field extract, set and clear
- L1/L2 memory architecture: 32KB L1 program RAM/cache, 80KB L1 data RAM/cache and 128KB L2 unified mapped RAM/cache
- Supports both little endian and big endian
- 32-bit DDR2 SDRAM memory controller, supporting a bus of up to 333 MHz (data rate) and interfacing to DDR2-400 SDRAM
- Asynchronous 16-bit wide external memory interfaces with up to 128MB address reach and flash memory interfaces
- Enhanced Direct Memory Access (DMA) controller (64 independent channels)
- Two 64-bit general purpose timers (Each configurable as two 32-bit timers)
- 64-bit watch dog timer
- JTAG, Ethernet MAC, UARTs, telecom interfaces, 16-bit host port interface, peripheral component interconnect (PCI)
- On-chip ROM bootloader
- Individual power savings mode
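Two of the items in the list above, quad 8-bit arithmetic within a 32-bit word and bit-field extraction, can be illustrated in software. The sketch below is not a model of any specific TMS320C6424 instruction: the function names are hypothetical, the lanes are treated as unsigned, and overflow protection is interpreted here as saturation, all of which are assumptions made for the example.

```python
def unpack_u8x4(word):
    """Split a 32-bit word into four unsigned 8-bit lanes, least significant first."""
    return [(word >> (8 * i)) & 0xFF for i in range(4)]


def pack_u8x4(lanes):
    """Pack four 8-bit lanes (least significant first) back into one 32-bit word."""
    word = 0
    for i, lane in enumerate(lanes):
        word |= (lane & 0xFF) << (8 * i)
    return word


def add_u8x4_saturating(a, b):
    """Add four 8-bit lanes independently; each lane clamps at 0xFF instead of wrapping."""
    lanes = [min(x + y, 0xFF) for x, y in zip(unpack_u8x4(a), unpack_u8x4(b))]
    return pack_u8x4(lanes)


def extract_field(word, lsb, width):
    """Return the `width`-bit field of `word` that starts at bit position `lsb`."""
    return (word >> lsb) & ((1 << width) - 1)


print(hex(add_u8x4_saturating(0x01F0FF10, 0x01200120)))  # 0x2ffff30
print(hex(extract_field(0xABCD1234, 8, 8)))              # 0x12
```

In hardware, the four lane additions happen in a single functional unit in one cycle, whereas this software version processes them one at a time.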

The functional block diagram is shown in Figure 3.

**Figure 3:** TMS320C6424

Applications of this DSP include Telecom, Audio and Industrial Applications.



**Figure 4:** SMJ320C80 (port legend: L = local port, G = global port, I = instruction port, C/D = crossbar/data port)

### 6.2 Parallel / Multi-core DSPs: SMJ320C80

The SMJ320C80 is a multi-core DSP built by Texas Instruments in the late 1990s [2]. Its features include the following.

- Single-Chip Parallel Multiple Instruction/Multiple Data (MIMD) Digital Signal Processor.
- More than 2 billion RISC Equivalent operations per second.
- The master processor is a 32-bit reduced instruction set computing (RISC) processor with IEEE-754 floating point capability. It has a 4KB instruction cache and a 4KB data cache.
- The four parallel processors are 32-bit advanced DSPs whose 64-bit opcode provides many parallel operations per cycle. Each of the parallel processors has a 2KB instruction cache and 8KB of data RAM.
- The Transfer Controller is capable of 64-bit data transfers at a rate of up to 400 megabytes per second (MB/s). It supports 32-bit addressing with a direct DRAM/VRAM interface.
- The Video Controller provides video timing and video random access memory (VRAM) control. It has dual frame timers for two simultaneous image captures and/or displays.
- It supports both big endian and little endian operations
- 50KB on-chip RAM
- 4GB address space
- 20ns instruction cycle time
- 3.3V operation
- IEEE standard 1149.1 test access port (JTAG).
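Support for both big endian and little endian operation, listed above, concerns the order in which the bytes of a multi-byte word are stored in memory. The following sketch shows the two layouts for one 32-bit value using Python's standard `int.to_bytes` and `struct` facilities; the helper names `byte_layout` and `swap32` are illustrative only.

```python
import struct


def byte_layout(value, byteorder):
    """Hex string of the four bytes of a 32-bit value, in memory order."""
    return value.to_bytes(4, byteorder).hex()


def swap32(word):
    """Reinterpret a 32-bit word stored big endian as if it were little endian."""
    return struct.unpack("<I", struct.pack(">I", word))[0]


value = 0x12345678
print(byte_layout(value, "big"))     # 12345678 (most significant byte first)
print(byte_layout(value, "little"))  # 78563412 (least significant byte first)
print(hex(swap32(value)))            # 0x78563412
```

A processor that supports both modes can exchange data with either kind of host or peripheral without byte swapping in software.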

The functional block diagram is shown in Figure 4.

### 6.3 Enhanced Harvard: TMS320VC5510

The TMS320C5X series of DSPs developed by Texas Instruments belongs to the Enhanced Harvard architecture family [4]. Some of the features of the TMS320VC5510 include the following.

- High performance and low power fixed point DSP
- 5-6.25 ns instruction cycle time, 160-200 MHz clock rate
- One or two instructions executed per cycle
- Dual multipliers that can give up to 400 million multiply accumulates per second
- Two arithmetic / logic units
- One internal program bus
- Three internal data / operand read buses
- Two internal data / operand write buses
- 24KB instruction cache
- 160K X 16-bit on-chip RAM composed of eight blocks of 4K X 16-bit dual access RAM and 32 blocks of 4K X 16-bit single access RAM
- 16K X 16-bit maximum addressable external memory space
- 32-bit external memory interface with glue-less interface to asynchronous static RAM, asynchronous EPROM, synchronous DRAM, synchronous burst SRAM
- Programmable low-power control of six device functional domains
- On-chip peripherals include two 20-bit timers, a six-channel direct memory access controller, three multichannel buffered serial ports, a 16-bit parallel enhanced host port interface, a programmable digital phase-locked loop (DPLL) clock generator, eight general-purpose I/O (GPIO) pins and dedicated general-purpose outputs

- On-chip scan based emulation logic
- IEEE std 1149.1 (JTAG) boundary scan logic
- 3.3V I/O supply voltage
- 1.6V core supply voltage
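Several figures in the list above follow from one another by simple arithmetic, and the sketch below reproduces them as a consistency check. It assumes one instruction cycle per clock period and one multiply-accumulate per multiplier per cycle; these assumptions, and the function names, are introduced here for illustration and are not stated in this form in the source.

```python
def cycle_time_ns(clock_mhz):
    """Instruction cycle time in nanoseconds, assuming one cycle per clock period."""
    return 1000.0 / clock_mhz


def peak_mmacs(clock_mhz, multipliers=2):
    """Peak million multiply-accumulates per second, assuming one MAC
    per multiplier per cycle."""
    return clock_mhz * multipliers


def on_chip_ram_kwords(dual_access_blocks=8, single_access_blocks=32, block_kwords=4):
    """Total on-chip RAM in K words, summed over all RAM blocks."""
    return (dual_access_blocks + single_access_blocks) * block_kwords


print(cycle_time_ns(160), cycle_time_ns(200))  # 6.25 5.0
print(peak_mmacs(200))                         # 400
print(on_chip_ram_kwords())                    # 160
```

The results match the quoted 5-6.25 ns cycle time, 400 million multiply-accumulates per second and 160K words of on-chip RAM.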

The functional block diagram of this DSP is shown in Figure 5.

### 6.4 Observations

Digital Signal Processors have come a long way since they were first designed in the 1980s. The advances can be broadly categorized as follows.

- Processing power: From an 80 ns instruction cycle and about 15 MIPS to an instruction cycle close to 2 ns, several thousand MIPS on a single VLIW core and more than 2 billion operations per second on a multi-core device, the industry has progressed a long way.
- Broader Application: In the 1980s these processors were mainly used for audio applications, but today DSPs can be found in almost every embedded application, ranging from military applications to medical imaging.
- Functional Units: Early DSPs of the 1980s had one accumulator and one multiplier/ALU per DSP core. Today's DSPs integrate several functional units on one chip.
- Architectural Features: Most of the early DSPs followed the Harvard architecture. Today's DSPs add features such as pipelining, multiple cores and VLIW execution.
- Interfaces: Early DSPs provided only serial interfaces. Today's DSPs provide many more interfaces.



**Figure 5:** TMS320VC5510

**Figure 6:** Block diagram of the audio system

## 7 Application: MP3 Player/Recorder

This section describes a system that uses a DSP. The application described is the MP3 player and recorder by Texas Instruments [5]. The block diagram of this system is shown in Figure 6.

The TMS320VC5X (Enhanced Harvard) can be used in this system. It performs audio encode/decode functions, executes post-processing algorithms such as equalization and bass management, and handles system-related tasks such as file management and user interface control. The memory stores the executing code and data/parameters. The peripheral interfaces allow users to control I/O and the display. The audio CODEC interfaces with the phone lines, audio input, microphone, headphone and speaker, and digitizes the audio for the DSP. The power unit converts the battery power to run the various functional blocks. Optional interfaces such as XM connect and play, NTSC Enc TV out and an FM tuner can be provided.
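To give a feel for the kind of post-processing step mentioned above, the following is a minimal sketch of a bass boost: a one-pole low-pass filter extracts the slowly varying part of the signal, and a scaled copy is added back to the original. This is a generic textbook technique chosen for illustration, not the algorithm used in the Texas Instruments reference design; the function name `bass_boost` and the parameters `gain` and `alpha` are hypothetical.

```python
def bass_boost(samples, gain=0.5, alpha=0.1):
    """Add `gain` times a one-pole low-pass copy of the signal back onto the signal.

    `alpha` (between 0 and 1) sets how quickly the low-pass state follows the input;
    smaller values keep only lower frequencies.
    """
    low = 0.0
    out = []
    for x in samples:
        low += alpha * (x - low)     # one-pole low-pass tracks the slow (bass) content
        out.append(x + gain * low)   # boosted output = original + scaled bass
    return out


# A constant (zero-frequency) input ends up amplified by 1 + gain.
steady = bass_boost([1.0] * 500)
print(round(steady[-1], 6))  # 1.5
```

A rapidly alternating input, by contrast, passes through almost unchanged, because the low-pass state stays close to zero.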

## 8 Conclusions

DSPs differ from microprocessors in a number of ways. Microprocessors are typically built for a range of general-purpose functions and normally run large blocks of software, such as operating systems like Windows or UNIX. Although today's microprocessors, including the popular and well-known Pentium family, are extremely fast (as fast as or faster than some DSPs), they are still not often called upon to perform real-time computation or signal processing. Usually, their bulk processing power is directed more at handling many tasks at once, managing huge amounts of memory and data, and controlling a wide variety of computer peripherals. Microprocessors such as Pentiums are notorious for the size, cost and power consumption needed to achieve their muscular performance, whereas DSPs are more dedicated: they perform a smaller range of functions at very high speed, yet cost less and require much less area and power to achieve their purpose.

## References

- [1] Digital signal processor. http://ppd.fnal.gov/experiments/e907/TPC/DAQ/DSP56001.pdf.
- [2] Digital signal processor. http://focus.ti.com/lit/ds/symlink/smj320c80.pdf.
- [3] Fixed-point digital signal processor. http://focus.ti.com/lit/ds/symlink/tms320c6424.pdf.
- [4] Fixed point digital signal processor. http://focus.ti.com/lit/ds/symlink/tms320vc5510a.pdf.
- [5] MP3 player/recorder (portable audio). http://focus.ti.com/docs/solution/folders/print/12.html.
- [6] Second generation digital signal processor. http://focus.ti.com/lit/ds/symlink/tms320c25.pdf.
- [7] Texas Instruments. www.ti.com.
- [8] E.A. Lee. Programmable DSP architectures: Part I. IEEE ASSP Magazine, 5(4):4–19, Oct 1988.
- [9] N. Seshan, G. Frantz, and K.-S. Lin. The advantages of digital signal processors in PCN. IEEE Transactions on Consumer Electronics, 38(3):410–416, Aug 1992.
- [10] Wikipedia. Digital signal processors. http://en.wikipedia.org/wiki/Digital\_signal\_processor.