The most important design goal in new processor developments is power efficiency. With silicon implementation technologies scaling rapidly to 90 nm and beyond, power consumption is a primary issue holding back SoC designers from integrating more functions on a single chip. At the same time, the popularity of portable applications, such as music players, cameras, and cell phones, has exploded in the past decade, and all of these applications demand high levels of chip integration. The challenge is to achieve this integration while meeting consumer demand for long battery life. When designing a new DSP, the best way to attack this problem is a holistic one. This article discusses how Synopsys and Philips met the power efficiency challenge.
Only a combination of low-power design approaches at every level of the design will result in an overall power-efficient solution. In terms of design constraints, this means the following:
- To achieve optimal power, the processor design must be application-specific, i.e., it must be optimized for a specific application domain (in this case, audio processing). Instructions and bit widths in the data path were chosen to best fit audio processing.
- The processor must be small. A small area, fewer toggling gates, and a compact layout all contribute directly to lower power consumption.
- Memory area and memory access must be minimized, as the memories contribute significantly to the total area and power consumption of an application chip built using the DSP core.
- The ratio between maximum clock frequency and required cycle count for target applications must be optimized, so that aggressive dynamic voltage scaling can be applied, leading to quadratic improvements in power consumption. Parallelism is introduced where applicable, yielding an optimal trade-off between size and reduced cycle count.
- Enabling application programmers to use the C language is key to productivity, and the combination of compiler and subroutine library must deliver performance comparable to assembly language coding. In addition, extensive profiling support is required so that the application programmer can make algorithmic and programming trade-offs to hit the optimal computing performance level while minimizing power.
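The quadratic effect mentioned in the voltage-scaling constraint above follows from the standard first-order CMOS switching-power model, P = C·V²·f. The sketch below is a minimal illustration of that model only, not a characterization of the CoolFlux DSP: the function names and the capacitance, voltage, and cycle-count values are hypothetical, and leakage power is ignored.

```python
def dynamic_power_watts(c_eff_farads, vdd_volts, freq_hz):
    """First-order CMOS switching power: P = C_eff * Vdd^2 * f."""
    return c_eff_farads * vdd_volts ** 2 * freq_hz


def task_energy_joules(c_eff_farads, vdd_volts, cycles):
    """Switching energy for a fixed amount of work: E = C_eff * Vdd^2 * cycles."""
    return c_eff_farads * vdd_volts ** 2 * cycles


# Hypothetical values, chosen only to show the scaling behavior.
C_EFF = 50e-12       # effective switched capacitance per cycle (assumed)
CYCLES = 1_000_000   # cycles needed by some task (assumed)

energy_nominal = task_energy_joules(C_EFF, 1.8, CYCLES)
energy_scaled = task_energy_joules(C_EFF, 0.9, CYCLES)

# Halving the supply voltage cuts the switching energy for the same work
# to about one quarter, because energy scales with Vdd squared.
print(energy_scaled / energy_nominal)
```

In this model, a lower cycle count at a given maximum clock frequency leaves more slack to reduce the clock and, with it, the supply voltage.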
Conceptual design of an ultra-low-power DSP core
From the outset, the design goals for CoolFlux DSP were ultra-low power consumption and C-compiler friendliness. The main application domain for the CoolFlux DSP is portable or wearable audio processing. A re-targetable C-compiler and simulator environment from Target Compiler Technologies formed the basis for architectural exploration for the Philips CoolFlux DSP design team. See Figure 1 for a target design flow diagram.

Figure 1. Target design flow.
A processor machine description is used as the basis for generating a C-compiler, which translates the application program(s) into machine code that can then be simulated and profiled. Cycle count and code size were optimized by iterating this design loop. At the same time, any architectural changes were quickly evaluated in the hardware implementation design flow, closing the loop with respect to gate count and performance. Philips was able to take advantage of its library of application software already deployed in commercial audio ICs to drive this optimization process.
Ultra-low-power DSP core implementation
The CoolFlux DSP hardware architecture (see Figure 2) includes a dual Harvard memory architecture, full 24/56-bit data paths, two 24 x 24-bit multipliers, and 56-bit accumulators. Extensive addressing modes ensure efficient memory access without cycle penalties. These include modulo protection and bit-reversed addressing for multiple dedicated address registers. All of these features are based on Philips’ many years of experience building power-efficient applications, including, for example, hearing aid ICs.

Figure 2. CoolFlux DSP hardware architecture.
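To make the addressing modes and data-path widths described above more concrete, the following Python sketch models them in software. It is an illustration under stated assumptions, not the CoolFlux DSP instruction set: all function names are hypothetical, the buffer address and length are made up, and wrap-around on accumulator overflow is assumed because the text does not say whether the hardware wraps or saturates.

```python
def next_modulo_address(addr, step, base, length):
    """Post-modify addr by step, wrapping inside [base, base + length)."""
    return base + (addr - base + step) % length


def bit_reverse(index, bits):
    """Reverse the lowest `bits` bits of index (FFT-style reordering)."""
    reversed_index = 0
    for _ in range(bits):
        reversed_index = (reversed_index << 1) | (index & 1)
        index >>= 1
    return reversed_index


def to_signed(value, bits):
    """Interpret the lowest `bits` bits of value as a two's-complement number."""
    value &= (1 << bits) - 1
    if value & (1 << (bits - 1)):
        value -= 1 << bits
    return value


def mac(acc, x, y):
    """Multiply two 24-bit operands and add the product to a 56-bit accumulator.

    Wrap-around on overflow is an assumption of this sketch.
    """
    product = to_signed(x, 24) * to_signed(y, 24)
    return to_signed(acc + product, 56)


# Walk a hypothetical 6-word circular buffer starting at address 0x100.
addr = 0x100
visited = []
for _ in range(8):
    visited.append(addr)
    addr = next_modulo_address(addr, 1, 0x100, 6)

# Bit-reversed access order for an 8-point transform (3 address bits).
order = [bit_reverse(i, 3) for i in range(8)]
```

A signed 24 x 24-bit product fits in 48 bits, so a 56-bit accumulator leaves 8 bits of headroom for summing many products before overflow becomes a concern.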
The core synthesizes to 43k gates. In a 0.18 µm CMOS process, the performance is 135 MHz (worst-case commercial conditions), which yields more than 1000 MOPS. MP3 decoding was implemented with only 14.5 MHz of required clock frequency, 3.9 Kwords of program memory, and 9.8 Kwords of data memory. These results were obtained using the C-compiler only, without assembly-level optimizations. Core power consumption for this application is less than 1 mW in 0.18 µm CMOS.
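A few derived quantities help put the figures quoted above in context. The short Python sketch below only does arithmetic on those published numbers; the function names are hypothetical, and because the MOPS and power figures are stated as bounds (more than 1000 MOPS, less than 1 mW), the results are bounds as well.

```python
MAX_CLOCK_MHZ = 135.0     # worst-case commercial conditions
PEAK_MOPS = 1000.0        # quoted as a lower bound
MP3_CLOCK_MHZ = 14.5      # clock frequency required for MP3 decoding
MP3_CORE_POWER_MW = 1.0   # quoted as an upper bound


def ops_per_cycle(mops, clock_mhz):
    """Average operations issued per clock cycle."""
    return mops / clock_mhz


def clock_headroom(max_mhz, required_mhz):
    """How many times faster the core can run than the task requires."""
    return max_mhz / required_mhz


def power_per_mhz_uw(power_mw, clock_mhz):
    """Core power normalized to clock frequency, in microwatts per MHz."""
    return power_mw * 1000.0 / clock_mhz


print(f"{ops_per_cycle(PEAK_MOPS, MAX_CLOCK_MHZ):.1f} ops/cycle (at least)")
print(f"{clock_headroom(MAX_CLOCK_MHZ, MP3_CLOCK_MHZ):.1f}x clock headroom")
print(f"{power_per_mhz_uw(MP3_CORE_POWER_MW, MP3_CLOCK_MHZ):.0f} uW/MHz (at most)")
```

By this arithmetic, MP3 decoding uses roughly one ninth of the available clock rate, which is the kind of slack that makes aggressive voltage scaling possible.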
Applications
The CoolFlux DSP is targeted at applications such as digital hearing instruments, headsets, portable audio equipment, and biomedical sensor processing. Philips has developed application software for CoolFlux DSP, including solutions for pitch control, virtual sound field rendering for headphones and speakers, audio beam forming, speech recognition, speech intelligibility improvement, noise cancellation, echo cancellation, full-duplex hands-free processing, and audio codecs such as MP3, SBC, G.722, and others.
Deployment: Synopsys DesignWare Star IP
The CoolFlux DSP is being distributed through the Synopsys DesignWare Star IP program. As such, the Design View, consisting of simulation and timing models plus documentation, is distributed as part of the DesignWare Library and the DesignWare Verification Library free of charge to the more than 25,000 DesignWare Library product licensees. This makes it very simple to evaluate the CoolFlux DSP core for new designs. If a designer decides to license the CoolFlux DSP for incorporation into their own design, Philips will license the core, and then Synopsys will deliver and support the Implementation View. The Implementation View includes the synthesizable RTL, scripts, and other deliverables that are necessary to implement the CoolFlux DSP core.
To serve a customer base as large as that of the DesignWare Library, Synopsys has redesigned the CoolFlux DSP for reuse. This includes creating both Verilog and VHDL versions of the core, ensuring that the RTL runs through the sweet spot of the entire implementation flow, and packaging it with coreBuilder. Synopsys delivers the core with coreConsultant, a tool that automates the process of configuration, implementation, and verification of the core. As part of making the core highly reusable, Synopsys has also enabled use with the leading Verilog and VHDL simulators.