A236(TM) Parallel Video Digital Signal Processor Chip
The System Designer's Parallel DSP Chip (TM)
The first Video DSP Chip
Data Sheet Summary
Rev. February 1, 2001.
Table of Contents
1. Features
2. Packaging
3. Applications
4. Functional Description
5. Block Diagram, with links to additional description
See the A436 Video DSP Chip, which uses a fourth-generation Ax36 core. It is 20x more powerful
for imaging than the A236 and is signal pin-compatible.
1. Features
Fully user-programmable, stand-alone, Video Digital Signal Processor Chip optimized for
real-time image and video processing
Specifically intended for high volume, low cost, high performance, mass market embedded
applications
Provides simultaneous, continuous, video capture, processing AND display without any dedicated
frame buffers
1 of 6
2/2/01 4:40 PM
A236 Parallel Video DSP Chip from Oxford Micro Devices
file:///W|/a236-sum.html
Simple hardware design is easy to understand and use
Video interface device drivers, sample board-level schematics, application notes and training videos
speed you to market
Specifically designed for ease of programming using our parallel-enhanced, ANSI-compatible C
compiler
Existing C programs can be compiled to run on the A236 Chip, and performance-sensitive loops can
be modified to use our parallel enhancements to C
Programs are small, execute very efficiently and can be optimized easily, and their execution speeds
can be forecast easily
Suite of software development tools puts you in full control of the program
Highly integrated system-level building block combines many functions into a single chip and is easy
to design into systems
Glueless interfaces to common video encoder and decoder chips and low cost Synchronous DRAM
Simple interfaces to low cost digital image sensors, high speed, high resolution, digital image sensors
and other data streaming devices
Device drivers to interface A236 Chip to common video encoder and decoder chips are available
32-bit Structure Processing Instruction Set provides fast instruction execution and
single-instruction parallel operations on C-like parallel data structures
Four 2x8-/16-bit parallel arithmetic units, one 24-bit scalar arithmetic unit and on-chip Motion
Estimation/Pattern Matching Coprocessor for high performance
Four 16-bit x 16-bit multiply-adds, each with 40-bit accumulation, per CPU Clock
Supports 16-bit parallel arithmetic, and normal and saturation, signed and unsigned, 8-bit parallel
arithmetic
Native instruction-level support for efficient operations upon monochrome and YUV and SRGB
composite color video data
Superior data movement capabilities including extremely powerful, parallel-byte and -word
addressing on arbitrary byte addresses
Linear 16 MB address space is used for all program and data storage for ease of programming
On-chip, synchronous, burst pipelined, 64-bit wide, 1 KB Instruction and Data Caches with efficient,
64-byte transfers
Powerful memory management: programs simply address image data as it is needed; no block moves
are required
32-bit wide, 100 MHz, 400 MB/s memory port to various configurations of external, low cost, high
density Synchronous DRAMs
Memory port is adaptively timed for 100% utilization of speed of Synchronous DRAMs
Three 16-bit, double-buffered, bidirectional, packet-capable and video-aware DMA ports connect
directly to common video chips
DMA ports can access 32-bit address space of host processors and control I/O devices
General purpose, RS-232 serial port with programmable baud rate
Serial bus port for control of peripherals
Asynchronous port design - all ports are clocked totally independently of one another and CPU for
maximum performance
Multiple A236 Chips can easily be used together in serial or parallel when even higher performance
is required
Low cost, 0.6 µm, 5 V (3.3 V DRAM port), triple-layer metal, standard cell CMOS in 208-pin PQFP
package
Uses Oxford Micro Devices' third-generation Ax36 Video DSP core (see the A336 and A436 Video
DSPs for the fourth-generation Ax36 core)
2. Packaging
Photo of A236 Chip in 208-pin Plastic Quad Flat Pack
3. Applications
Real-time video/image capture, processing and display
Data acquisition and formatting, lens correction
Multimedia, digital office equipment
Biometrics, neural networks and pattern recognition
Signal processing and video effects generation
Communications, encryption, decryption
Programmable video compression and decompression
Internet video appliances and smart cards
4. Functional Description
The A236 Chip is a versatile, stand-alone, fully user-programmable, general purpose building block for
real-time digital image and video signal processing. Its unique combination of ease of programming,
parallel processing, three video-aware DMA ports, simple and powerful memory management that
automatically accesses data when it is needed without block moves, and a Synchronous DRAM port
provides more flexibility, much better performance, more memory, higher system-level integration for
ease of use, and lower cost than other fast DSPs. It has: (a) an enhanced single-instruction multiple-data
(SIMD) architecture with four 2x8- or 16-bit parallel arithmetic units that accumulate products to 40 bits
and have a total of 256 16-bit registers; (b) a 64-bit wide, 1 KB, 2-way set-associative, synchronous,
burst-pipelined data cache with sixteen 64-byte pages; (c) a 64-bit wide, 1 KB, 2-way set-associative,
synchronous, burst-pipelined, instruction cache with sixteen 64-byte pages; (d) a single, 32-bit instruction
unit supporting single-instruction operation on C-like parallel data structures; (e) a 24-bit scalar
arithmetic unit for program control and computing data and program addresses and loop counts; (f) a
Crossbar Switch that passes information among the on-chip arithmetic units and functions as a 64-bit
barrel shifter with 8-bit increments; and (g) barrel shifters built into the scalar and parallel arithmetic
units.
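The cache figures above fix the geometry: 1 KB divided into 64-byte pages gives sixteen pages, and 2-way set associativity gives eight sets. The sketch below shows how a 24-bit address might be split into page offset, set index and tag under those figures. The use of the low-order address bits for the offset and set index is an assumption (the data sheet summary does not state the indexing scheme), and the names `cache_addr_t` and `split_address` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Figures taken from the data sheet summary. */
enum {
    CACHE_BYTES = 1024, /* 1 KB per cache                 */
    PAGE_BYTES  = 64,   /* 64-byte pages                  */
    WAYS        = 2,    /* 2-way set-associative          */
    ADDR_BITS   = 24    /* linear 16 MB address space     */
};

/* Geometry derived from those figures. */
enum {
    NUM_PAGES = CACHE_BYTES / PAGE_BYTES, /* 16 pages */
    NUM_SETS  = NUM_PAGES / WAYS          /* 8 sets   */
};

typedef struct {
    uint32_t offset; /* byte position within the 64-byte page */
    uint32_t set;    /* which of the 8 sets the page maps to  */
    uint32_t tag;    /* remaining high-order address bits     */
} cache_addr_t;

/* ASSUMPTION: conventional indexing, where the low 6 bits select the
 * byte within a page and the next 3 bits select the set. The actual
 * A236 indexing scheme is not described in the summary. */
cache_addr_t split_address(uint32_t addr)
{
    cache_addr_t c;
    assert(addr < (1u << ADDR_BITS)); /* must lie in the 16 MB space */
    c.offset = addr % PAGE_BYTES;
    c.set    = (addr / PAGE_BYTES) % NUM_SETS;
    c.tag    = addr / (PAGE_BYTES * NUM_SETS);
    return c;
}
```

Under this assumed scheme, two addresses that are a multiple of 512 bytes apart map to the same set, so at most two such pages can be resident at once.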
The A236 Chip's general purpose Structure Processing Instruction Set is extremely powerful and handles
a wide range of high performance applications. It supports single-instruction, parallel operations on
C-like parallel data structures.
A single instruction can address a set of 4 or 8 parallel operands in
memory, fetch the parallel operands, operate upon all of the parallel operands and compute a new
memory address. Most instruction words are 32 bits long and execute at the rate of one per CPU clock
cycle. When processing four 16-bit operands, the A236 typically executes the equivalent of 12
conventional RISC CPU instructions for every instruction word as a result of its efficient parallel
architecture, providing the equivalent of 480 MIPS with only a 40 MHz CPU clock. When processing
eight 8-bit operands, it typically executes the equivalent of 24 conventional RISC CPU instructions
for every instruction word, providing the equivalent of 960 MIPS with only a 40 MHz CPU clock.
Even higher performance is obtained during motion estimation/pattern matching (equivalent to
approximately 4,000 MIPS on a conventional RISC CPU). Superior data movement capability is also
provided, which is extremely useful when manipulating color video data in the YUV and SRGB formats
and performing hierarchical or pyramid processing.
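The equivalent-MIPS figures above follow from multiplying the RISC-instruction equivalence per instruction word by the CPU clock rate. A minimal sketch of that arithmetic, assuming one instruction word per CPU clock as stated above (the function name `equivalent_mips` is hypothetical):

```c
/* Equivalent RISC MIPS = (RISC instructions replaced per instruction word)
 *                        x (instruction words per second, in millions).
 * Assumes one instruction word executes per CPU clock cycle, which the
 * text states holds for most instruction words. */
unsigned equivalent_mips(unsigned risc_ops_per_word, unsigned cpu_clock_mhz)
{
    return risc_ops_per_word * cpu_clock_mhz;
}
```

With a 40 MHz clock, `equivalent_mips(12, 40)` gives 480 for four 16-bit operands and `equivalent_mips(24, 40)` gives 960 for eight 8-bit operands, matching the figures quoted.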
A single linear 16 MB memory address space is used, simplifying program development. Data is accessed
simply by addressing it without the use of any block moves. The manipulation of quad and octal 8-bit and
quad 16-bit parallel variables stored on unaligned addresses is supported to maximize memory utilization
and performance. Color planes can be stored separately for manipulation then combined for video output,
or composite input video data can be split into separate color planes. Signed and unsigned, and normal
and saturating arithmetic can be performed on four 8-bit parallel variables simultaneously using 16-bit
precision. Normal and saturating arithmetic can be performed on eight 8-bit parallel variables
simultaneously using 9-bit precision.
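The difference between normal and saturating arithmetic on eight unsigned 8-bit values can be illustrated with a scalar C emulation. This is a sketch only: it loops over the lanes one at a time, whereas the chip is described as operating on all of them simultaneously, and the function names `add8_wrap` and `add8_sat` are hypothetical. The intermediate sum of two 8-bit values needs at most 9 bits, which is consistent with the 9-bit precision mentioned above.

```c
#include <stdint.h>

#define LANES 8 /* eight 8-bit parallel variables */

/* Normal arithmetic: each lane wraps around modulo 256 on overflow. */
void add8_wrap(const uint8_t a[LANES], const uint8_t b[LANES],
               uint8_t out[LANES])
{
    for (int i = 0; i < LANES; i++)
        out[i] = (uint8_t)(a[i] + b[i]);
}

/* Saturating arithmetic: each lane clamps to 255 on overflow. */
void add8_sat(const uint8_t a[LANES], const uint8_t b[LANES],
              uint8_t out[LANES])
{
    for (int i = 0; i < LANES; i++) {
        unsigned sum = (unsigned)a[i] + b[i]; /* 0..510, fits in 9 bits */
        out[i] = (uint8_t)(sum > 255u ? 255u : sum);
    }
}
```

For pixel data, saturation is usually the desired behavior: adding 100 to a pixel of brightness 200 yields 255 (white) rather than wrapping to 44 (dark).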
Several application-specific enhancements are provided. For motion estimation in video compression,
color plane alignment in scanners and pattern matching in fingerprints, Pixel Distance computes the sum
of the absolute values of the differences between four or eight pairs of pixels from two sets of four or
eight 8-bit pixels every CPU clock cycle, with one set of operands coming from memory; the sum is
accumulated to 16 bits to handle large blocks. The best match is also tracked. For video overlay
operations such as chroma keying, four or eight 8-bit or four 16-bit binary masks can be computed
simultaneously, eliminating the need for most jump instructions to merge two images. For convolution
and the updating of video frames, the addressing and use of successive 64-bit words on any address is
supported at the full CPU clock rate to provide a sliding window.
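The Pixel Distance computation described above is a sum of absolute differences over eight pixel pairs, with the best match tracked across candidates. The following scalar C sketch shows the computation being described; it is a reference emulation, not the chip's instruction or the vendor's API, and the names `pixel_distance8` and `best_match8` are hypothetical. The one-dimensional search over byte offsets is also an assumption chosen to keep the example short.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PIXELS_PER_STEP 8 /* eight 8-bit pixel pairs per step */

/* Sum of absolute differences between eight pairs of 8-bit pixels.
 * The maximum result is 8 * 255 = 2040, which fits in 16 bits. */
uint16_t pixel_distance8(const uint8_t *ref, const uint8_t *cand)
{
    uint16_t sum = 0;
    for (int i = 0; i < PIXELS_PER_STEP; i++) {
        int d = (int)ref[i] - (int)cand[i];
        sum = (uint16_t)(sum + (d < 0 ? -d : d));
    }
    return sum;
}

/* Slides an 8-pixel window over `search` one byte at a time and returns
 * the offset with the smallest distance to `ref` (first one on ties).
 * `search` must hold at least num_offsets + 7 bytes.
 * If best_dist is not NULL, the winning distance is stored there. */
size_t best_match8(const uint8_t *ref, const uint8_t *search,
                   size_t num_offsets, uint16_t *best_dist)
{
    assert(num_offsets > 0);
    size_t best = 0;
    uint16_t best_d = UINT16_MAX;
    for (size_t off = 0; off < num_offsets; off++) {
        uint16_t d = pixel_distance8(ref, search + off);
        if (d < best_d) {
            best_d = d;
            best = off;
        }
    }
    if (best_dist != NULL)
        *best_dist = best_d;
    return best;
}
```

Each candidate window starts on an arbitrary byte address, which is where the unaligned 64-bit access described above matters: a scalar CPU would need extra shifts or byte loads for every misaligned window.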
The A236 Chip is specifically designed to be programmed in C. Our breakthrough Symbolic Parallel
Programming Method and Parallel Programming Model are implemented in our Parallel-Enhanced,
ANSI-Compatible C Compiler, enabling the A236 Chip to be programmed quickly and easily using a
familiar, scalar processing programming model. Existing C programs can be compiled to run on the
A236 Chip. Performance-sensitive loops can be rewritten using the parallel enhancements in our C
compiler to use the parallel processing capability of the A236 Chip. Using a high level form of operator
overloading and built-in data structures including quad_long, quad_int, quad_short and oct_short,
simultaneous operations upon multiple data elements can be coded as simply as quad_B += quad_A. No
cryptic macros, in-line assembly code or function calls are required to utilize the parallel processing in the
A236 Chip. Careful code generation and optimization are done to avoid needless loads, stores and no-ops.
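To show what a statement such as quad_B += quad_A computes, the sketch below emulates a four-lane add in standard C. Standard C has no operator overloading and no built-in quad types, so a struct and a function stand in for them here; on the A236 compiler the same operation is described as a single overloaded `+=`. The type `quad16_emul` and the function `quad_add_assign` are hypothetical, and the 16-bit lane width is an assumption, since the summary does not state the lane width of each built-in type.

```c
#include <stdint.h>

/* Hypothetical stand-in for a built-in quad type: four lanes.
 * ASSUMPTION: 16-bit lanes; the summary does not give lane widths
 * for quad_long, quad_int, quad_short or oct_short. */
typedef struct {
    int16_t lane[4];
} quad16_emul;

/* Emulates `b += a` on all four lanes, one lane at a time.
 * Uses normal (wraparound) arithmetic rather than saturation. */
void quad_add_assign(quad16_emul *b, const quad16_emul *a)
{
    for (int i = 0; i < 4; i++) {
        /* Add as unsigned so overflow wraps instead of being undefined. */
        uint16_t sum = (uint16_t)((uint16_t)b->lane[i] + (uint16_t)a->lane[i]);
        b->lane[i] = (int16_t)sum;
    }
}
```

A loop that adds two arrays element by element could then be restructured to step four elements at a time, which is the kind of rewrite of performance-sensitive loops the text describes.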
The A236 Chip has six ports. Three 16-bit bidirectional, asynchronous, double-buffered, packet-capable
and video-aware DMA ports are provided for loading data and passing information among multiple A236
Chips. No glue logic is required to connect common video decoder and encoder chips to the A236 Chip.
Very simple interfaces to low cost digital image sensors, high speed, high resolution digital image
sensors and other data streaming devices are provided. Absolutely all frame buffering is provided within
the DRAM connected to the A236 Chip under control of the A236 Chip; no external frame buffers are
required. Polarities of the video control signals are software programmable for maximum flexibility. Any
number of pixels per line and number of lines per frame are supported in progressive and interlaced
modes. Two video inputs and one video output can be supported simultaneously. A 32-bit wide, high
performance memory port with 64-byte bursts provides a 400 MB/s interface to inexpensive
synchronous DRAMs for virtually instantaneous access to up to 16 MB of program, data and I/O buffers,
sustaining high performance for live video, large data sets and large programs. No memory bus resizing is
done, so the full memory bandwidth is available at all times. The memory interface is adaptively timed to
compensate for the relatively slow access time of Synchronous DRAMs, providing 100% utilization for
maximum performance. A serial bus port is provided for controlling peripherals such as video encoders
and decoders, and for loading small programs and/or Basic I/O System Software (BIOS) from serial
EEPROM into the synchronous DRAMs via the A236 Chip upon reset. Serial EEPROMs can be loaded
by the A236 Chip for ease of modification. In packet mode, which is fully handshaked, all of the
DMA ports can be used to access a host's 32-bit memory address space for obtaining data or transmitting
results, or to control I/O devices such as mass-storage units, enabling the A236 Chip to be used as the CPU
in low cost, stand-alone applications. An RS-232 port with programmable baud rate can be used for
programmed serial I/O and to provide test access to the A236 Chip for in-situ application development.
A basic system nucleus requires only three chips: an A236 Chip, a 32-bit synchronous DRAM and a
serial EEPROM, yet provides the ability to simultaneously and continuously capture, process and display
live video images. No external video capture or display buffers are required. Pixel, line, field and frame
sync signals are directly supported by the video-aware, parallel DMA ports for the utmost ease of video
interface. All ports are asynchronous of each other for maximum flexibility.
The A236 Development Environment, a suite of software development tools for the A236 Chip, runs
under Microsoft Windows 95/98. The A236 Chip executes a single task in parallel, so simple, familiar,
scalar processing programming techniques can be used, and a simple, single-task operating system can be
used for software development. An assembler, parallel-enhanced ANSI-compatible C compiler, linker,
loader, simulator and debugger are provided. A reduced functionality version is available from our Web
page. The most extensive Help capability in the industry is also provided within the tools. A
hardware/software evaluation kit for an IBM-compatible PC with a PCI bus and running Microsoft
Windows 95/98 is also available.
5. Block Diagram of A236 Parallel Video Digital Signal Processor Chip