## Electronic component: CS2461QL

- Features
- Applications
- Fast Fourier Transform
- CS2461 Symbol and Pin Description
- Programming the Core
- Input and Output Data Format
- Transform Computation
- Fixed Word Length and Accuracy
- Loading Input and Downloading Result
- Overflow handling
- Processing Time and Latency
- AVAILABILITY AND IMPLEMENTATION INFORMATION

application-specific core computes the FFT/IFFT based on a radix-4 algorithm in three computation passes. The CS2461 is available in both ASIC and FPGA versions that have been handcrafted by Amphion for maximum performance while minimizing power consumption and silicon area.


Features:

- 12-bit complex input/output in two's complement format (24-bit complex word)
- 13-bit twiddle factors generated inside the core
- 15-bit fixed-point internal arithmetic operation
- Programmable shift down control
- Radix-4 architecture
- Transform performed in three computation passes with zero-waiting
- Simultaneous loading/downloading supported
- Both input and output in normal order
- No external memory required
- Optimized for both ASIC and FPGA technologies with the same functionality
- Fully synchronous design

Applications:

- 802.11a and HiperLAN2
- Image processing
- Atmospheric imaging
- Spectral representation

defined below.

proportional to Nlog

of multiplication is required; however, more simultaneous data accesses are required, which makes the circuits more complicated. The radix-4 algorithm offers a balance between computational and circuit complexity and is often used in the construction of higher radix FFT computation units when designing high performance FFT/IFFT hardware.

IFFT core symbol, and the I/O interface descriptions respectively.

Unless otherwise stated, all signals are active high and bit(0) is the

when CLR is active

point block. The remaining data of the 64-point data block is loaded into the core in the following clock cycles in the natural order.

is active and returns to LOW when the last data of the 64-point block is loaded into the core. XBS is ignored when it is HIGH.

the last data of the 64-point block is loaded into the core and returns to LOW when the core is ready to accept the next input data block. XBS is ignored when it is HIGH.

radix-4, forward or inverse Fast Fourier Transforms on a 64-point complex data block. The transform is scheduled in three computation passes and the data is loaded into the core in normal sequential (natural) order. The transform result is also output from the core in natural order. The core is on-line programmable for the transform type and the scaling down control. Its input/output data and twiddle factor wordlengths are selected such that it can be used in a wide range of applications.

arithmetic with programmable shift down control on each computation pass to handle the possible wordlength growth and overflow in the transform. This achieves the maximal accuracy possible while maintaining the desired dynamic range for the output. This core is a synchronous design with all the flip-flops being triggered at the rising edge of the clock signal CLK.

synchronously reset. This is done by asserting signal CLR and applying the appropriate signals to the input ports IFFT and SDC, where port IFFT specifies the transform type, i.e., FFT or IFFT. Table 2 lists the FFT/IFFT value for programming

data during the 64-point transform. Theoretically, the 64-point FFT may have up to a total of 7 bits of word growth. The CS2461 core can perform up to 7 bits of controlled shifting down to avoid possible overflow and to allow the transform gain to be controlled. This is programmed through port SDC. The total number of shift down bits determines the transform scaling down factor. Table 3 lists the SDC values for programming the scaling factor.
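
The 7-bit figure quoted above can be sanity-checked with a short calculation. The sketch below is one plausible derivation, not taken from the datasheet: it assumes the bound comes from 64 accumulated terms, each contributing at most √2 times the full-scale component value. The helper `scale_factor` is a hypothetical name, and the mapping from SDC codes to shift counts (Table 3) is deliberately not modelled.

```python
import math

N = 64  # transform length of the core

# Assumed upper bound for one output component: each of the N input samples
# has real and imaginary parts at full scale, and the twiddle rotation can
# add them with weight |cos| + |sin| <= sqrt(2).
worst_case_gain = N * math.sqrt(2)
growth_bits = math.ceil(math.log2(worst_case_gain))


def scale_factor(total_shift_bits: int) -> float:
    """Overall transform scaling for a given total number of shift down bits."""
    return 2.0 ** -total_shift_bits


print(f"upper bound on gain : {worst_case_gain:.2f}")
print(f"word growth in bits : {growth_bits}")
print(f"scaling with 7 bits : {scale_factor(7)}")
```

Under this assumption, shifting down by a total of 7 bits scales the result by 1/128, which is enough to absorb the bound computed above.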

output transform result and returns to LOW when YEnab is asserted to download the result.

block is available on the output port. The remaining data of the 64-point transform result is available at the output of the core in the following clock cycles in natural order.

result

performed. It is reset when a new transform starts and is associated with the 64-point block.

core is reset to the default mode: 64-point FFT with a 7-bit shifting operation. Programming the core can be performed at any time subsequently. The programming signals are valid only when CLR is asserted. This is illustrated in Figure 3. Note that when CLR is applied the core is reset as well.

by 12-bit real and imaginary components, namely XRe and XIm, in two's complement format. The input data is loaded into the core in normal order, i.e., X(0) enters the core first, followed immediately in the next clock cycle by X(1), then X(2), etc. In total it takes 64 clock cycles for a data block to enter the core for FFT/IFFT processing. The transformed data is represented by complex numbers that consist of a 12-bit real component YRe and a 12-bit imaginary component YIm, both in two's complement format. The output data is burst out from the core once the transform has reached the stage that allows the result to be output and the output port is enabled. The result from the core is also in normal order, i.e., Y(0) first, followed by Y(1), Y(2), etc. in consecutive clock cycles.
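
For test benches it can help to convert between signed integers and the 12-bit two's complement fields described above. The following is a minimal sketch. The function names are hypothetical, and placing the real part in the upper 12 bits of the 24-bit complex word is an assumption made for illustration only; the text above does not specify the ordering.

```python
WIDTH = 12                    # bits per real or imaginary component
MASK = (1 << WIDTH) - 1       # 0xFFF
SIGN = 1 << (WIDTH - 1)       # weight of the sign bit


def to_twos(value: int) -> int:
    """Encode a signed integer as a 12-bit two's complement field."""
    if not -SIGN <= value < SIGN:
        raise ValueError(f"{value} does not fit in {WIDTH} bits")
    return value & MASK


def from_twos(field: int) -> int:
    """Decode a 12-bit two's complement field into a signed integer."""
    field &= MASK
    return field - (1 << WIDTH) if field & SIGN else field


def pack_complex(re: int, im: int) -> int:
    """Pack (re, im) into one 24-bit word (assumed layout: re high, im low)."""
    return (to_twos(re) << WIDTH) | to_twos(im)


def unpack_complex(word: int) -> tuple[int, int]:
    """Inverse of pack_complex under the same assumed layout."""
    return from_twos(word >> WIDTH), from_twos(word)


print(hex(pack_complex(-1, 1)))
print(unpack_complex(pack_complex(-2048, 2047)))
```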

each pass the controller fetches the intermediate data from the internal dual port memory, sends it to the processing unit, fetches the computation results from the processing unit and writes the result back to memory for the next pass or for the output. The CS2461 employs a Cooley-Tukey radix-4 decimation-in-frequency (DIF) algorithm to compute the FFT/IFFT. This algorithm requires the calculation of radix-4 butterflies and twiddle multiplications in multiple passes. Theoretically, the intermediate result of a radix-4 butterfly with twiddle operation may grow by a factor of up to 5.657. This represents up to three bits of wordlength growth.
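
The radix-4 DIF decomposition can be illustrated in software. The sketch below is a floating-point reference model only, assuming the textbook radix-4 DIF recursion; it does not reproduce the core's fixed-point arithmetic, memory scheduling or scaling, and the function names are hypothetical. For 64 points the recursion is three levels deep, which corresponds to the three computation passes. The last lines reproduce the growth figure under the assumption that it is 4·√2: a sum of four values followed by a twiddle rotation.

```python
import cmath
import math


def fft_radix4_dif(x, inverse=False):
    """Radix-4 decimation-in-frequency FFT; len(x) must be a power of 4.

    The inverse transform is returned without the 1/N scaling.
    """
    n = len(x)
    if n == 1:
        return list(x)
    if n % 4:
        raise ValueError("length must be a power of 4")
    q = n // 4
    rot = 1j if inverse else -1j          # quarter-turn used inside the butterfly
    branches = [[0j] * q for _ in range(4)]
    for i in range(q):
        a, b, c, d = x[i], x[i + q], x[i + 2 * q], x[i + 3 * q]
        w = cmath.exp(rot * 2 * math.pi * i / n)      # twiddle factor W^i
        branches[0][i] = a + b + c + d
        branches[1][i] = (a + rot * b - c - rot * d) * w
        branches[2][i] = (a - b + c - d) * w ** 2
        branches[3][i] = (a - rot * b - c + rot * d) * w ** 3
    out = [0j] * n
    for r in range(4):
        # Branch r yields the outputs with index 4k + r, so interleaving the
        # sub-transforms restores natural order.
        out[r::4] = fft_radix4_dif(branches[r], inverse)
    return out


def dft(x):
    """Direct O(N^2) DFT used as a reference."""
    n = len(x)
    return [sum(x[m] * cmath.exp(-2j * math.pi * k * m / n) for m in range(n))
            for k in range(n)]


x = [complex(i % 7 - 3, (3 * i) % 5 - 2) for i in range(64)]
max_err = max(abs(a - b) for a, b in zip(fft_radix4_dif(x), dft(x)))
print("max deviation from direct DFT:", max_err)

growth = 4 * math.sqrt(2)
print("assumed butterfly growth:", round(growth, 3), "->",
      math.ceil(math.log2(growth)), "bits")
```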

possible computation accuracy. When the intermediate value is

derived from the twiddle multiplication result, or the input to

the butterfly is scaled down, round-to-the-nearest operation is

possible for the given wordlength.

the intermediate result in the four passes, according to the scaling down control programmed. Table 4 lists the relationship between the programming input signal SDC and the number of scaling down bits performed in the four passes. Note that there is no overflow in the computation when the total number of shifting bits is equal to 7 bits.

transform. The twiddle factors (sine and cosine values), which are generated by the core internally, have 13-bit accuracy. At the end of each computation pass, the result is rounded to 12 bits. Figure 4 illustrates the word lengths at various computation stages in the CS2461 core.

computation accuracy possible for the given word lengths. When the intermediate value is derived from the twiddle multiplication result, the output from the butterflies is scaled down, or the intermediate result is right shifted, the core performs the round-to-the-nearest operation to keep the loss of accuracy minimal.
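
The difference between truncating and rounding on a right shift can be shown with a small integer example. This is a sketch only: the function name is hypothetical, and resolving exact ties upwards (by adding half an LSB before shifting) is an assumption, since the text above does not state how the core handles ties.

```python
def shift_round_nearest(value: int, bits: int) -> int:
    """Arithmetic right shift by `bits` with round-to-nearest.

    Half of the discarded weight is added before shifting, so exact ties
    round towards positive infinity (an assumed tie rule).
    """
    if bits == 0:
        return value
    return (value + (1 << (bits - 1))) >> bits


for v in (7, 6, 5, -5, -7):
    print(v, "truncated:", v >> 2, "rounded:", shift_round_nearest(v, 2))
```

Plain truncation always rounds towards negative infinity and therefore introduces a bias of about half an LSB per shift; rounding to the nearest value removes most of that bias.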

of the CS2461 core. The results are obtained by applying 64 blocks of 12-bit random input data to the core, with the scaling down control set such that there is just no overflow in the computation, i.e., the output magnitude is maximized while no overflow occurs. The 12-bit output data from the core is compared with the result of a double-precision FFT model. The error is measured in terms of the output LSB weight. Note that when overflow occurs the transform accuracy decreases severely.

signal XBS. Signal XBS should be asserted when the output signals XBIP and BUSY are de-asserted. It indicates the first data of the 64-point data block. The data is clocked in on the rising edge of the CLK signal. The remaining data of the 64-point data block is loaded successively on the rising edge of the clock in natural order. When the core starts to load a 64-point data block, signals XBIP and BUSY are asserted to indicate that loading a data block is in progress. The signal XBS is ignored if XBIP is HIGH. When the last data of the block is loaded into the core, signal XBIP is de-asserted and signal Busy remains asserted to indicate that the transform computation is in progress. Signal XBS is still ignored in this case until Busy returns to LOW.
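
The loading handshake described above can be captured in a small behavioural model for test bench planning. This is a rough sketch and not cycle-accurate: the class name and the `busy_tail` parameter (the number of cycles Busy stays HIGH after loading ends) are hypothetical, and the exact edge on which each flag changes is an assumption.

```python
class LoadHandshake:
    """Behavioural model of the XBS / XBIP / Busy input handshake."""

    def __init__(self, block_len: int = 64, busy_tail: int = 384):
        self.block_len = block_len
        self.busy_tail = busy_tail    # hypothetical cycles of Busy after loading
        self.loaded = 0
        self.tail_left = 0
        self.xbip = False
        self.busy = False

    def tick(self, xbs: bool) -> bool:
        """Advance one rising clock edge; return True if a sample is clocked in."""
        if self.xbip:                 # block in progress: XBS is ignored
            self.loaded += 1
            if self.loaded == self.block_len:
                self.xbip = False     # last sample loaded, Busy stays HIGH
                self.tail_left = self.busy_tail
            return True
        if self.busy:                 # computing: XBS is still ignored
            self.tail_left -= 1
            if self.tail_left <= 0:
                self.busy = False
            return False
        if xbs:                       # idle: XBS marks the first sample
            self.loaded = 1
            self.xbip = True
            self.busy = True
            return True
        return False


core = LoadHandshake(block_len=64, busy_tail=10)
accepted = sum(core.tick(xbs=True) for _ in range(64))
print("samples accepted:", accepted, "XBIP:", core.xbip, "Busy:", core.busy)
```

Holding `xbs` HIGH throughout, as in the usage lines, has no effect after the first sample because the model ignores it while XBIP or Busy is HIGH.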

of loading the 64-point data block when the required data has been loaded, i.e., the input data loading is overlapped with the first computation pass. This compensates for the latency introduced by the pipelined computation units so that the input data loading and the three computation passes can be completed in 7*64 clock cycles.

multiplier has been implemented using only two normal multipliers. This means that each full complex multiplication requires two clock cycles. Therefore, each of the three computation passes requires 2*64 clock cycles. The consequences of this are further explained in the Processing Time and Latency section.
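
The 7*64 cycle figure follows directly from the numbers above: 64 cycles to load a block plus three passes of 2*64 cycles each. The sketch below spells out that arithmetic. The 40 MHz clock in the usage lines is a hypothetical value chosen only for illustration and is not a specification of the core.

```python
N = 64                      # points per block
LOAD_CYCLES = N             # one sample is loaded per clock cycle
PASSES = 3                  # radix-4 passes for 64 points
CYCLES_PER_PASS = 2 * N     # two cycles per complex multiplication

TOTAL_CYCLES = LOAD_CYCLES + PASSES * CYCLES_PER_PASS


def block_time_us(clock_mhz: float) -> float:
    """Time for loading plus three passes at the given clock frequency."""
    return TOTAL_CYCLES / clock_mhz


print("cycles per block:", TOTAL_CYCLES, "=", TOTAL_CYCLES // N, "* 64")
print("time at an assumed 40 MHz clock:", block_time_us(40.0), "us")
```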

result is available. Downloading of the transform result is started by asserting the input signal YEnab when Done is asserted. The signal Done returns to LOW when downloading is started, and the first sample of the transform result is output from the core in natural order two clock cycles after the assertion of the YEnab signal. Output signal YAV is asserted when the data on ports YRe and YIm are valid, and output signal YBS is asserted when the first sample of the 64-point result is output from the core. The output data is burst out from the core in 64 clock cycles.

computation pass to achieve 7*64 clock cycles operation, if input signal YEnab is asserted as soon as the output signal Done goes HIGH. Loading the next data block can be started as soon as output signal Busy is de-asserted.

clock cycle I/O and transform operation. Note that the input signal YEnab can be constantly asserted; if so, the transform result is automatically downloaded when it is available.

samples compared