Electronic component: IDT79R3081E-20PFB

IDT79R3081 RISController™ with FPA
IDT 79R3081™, 79R3081E
IDT 79RV3081, 79RV3081E
MILITARY AND COMMERCIAL TEMPERATURE RANGES
SEPTEMBER 1995
©1995 Integrated Device Technology, Inc.
DSC-9064/4
R3081 BLOCK DIAGRAM
[Figure: R3081 block diagram (2889 drw 01). Blocks shown:
Clock Generator Unit/Clock Doubler; Master Pipeline Control;
System Control Coprocessor (CP0) with Exception/Control Registers,
Memory Management Registers, and Translation Lookaside Buffer
(64 entries); Integer CPU Core with General Registers (32 x 32), ALU,
Shifter, Mult/Div Unit, Address Adder, and PC Control;
Floating Point Coprocessor (CP1) with Register Unit (16 x 64),
Exponent Unit, Add Unit, Divide Unit, Multiply Unit, and
Exception/Control; Configurable Instruction Cache (16kB/8kB);
Configurable Data Cache (4kB/8kB); Parity Generator;
R3051 Superset Bus Interface Unit with 4-deep Read Buffer,
4-deep Write Buffer, DMA Arbiter, BIU Control, and Coherency Logic.
Signals and buses labeled: ClkIn, SysClk, Int(5:0), BrCond(3:2,0),
FP Interrupt, Virtual Address, Physical Address Bus, Data Bus,
Address/Data, DMA Ctrl, Rd/Wr Ctrl, Invalidate Control.
Bus width labels: 32, 36.]
The IDT logo is a registered trademark, and RISController, R3041, R3051, R3052, R3071, R3081, R3720, R4400, R4600, IDT/kit, and IDT/sim are trademarks of Integrated Device Technology, Inc.
FEATURES
• Instruction set compatible with IDT79R3000A, R3041, R3051, and R3071 RISC CPUs
• High level of integration minimizes system cost
  -- R3000A Compatible CPU
  -- R3010A Compatible Floating Point Accelerator
  -- Optional R3000A compatible MMU
  -- Large Instruction Cache
  -- Large Data Cache
  -- Read/Write Buffers
• 43VUPS at 50MHz
  -- 13MFlops
• Flexible bus interface allows simple, low cost designs
• Optional 1x or 2x clock input
• 20 through 50MHz operation
• "V" version operates at 3.3V
• 50MHz at 1x clock input and 1/2 bus frequency only
• Large on-chip caches with user configurability
  -- 16kB Instruction Cache, 4kB Data Cache
  -- Dynamically configurable to 8kB Instruction Cache, 8kB Data Cache
  -- Parity protection over data and tag fields
• Low cost 84-pin packaging
• Superset pin- and software-compatible with R3051, R3071
• Multiplexed bus interface with support for low-cost, low-speed memory systems with a high-speed CPU
• On-chip 4-deep write buffer eliminates memory write stalls
• On-chip 4-deep read buffer supports burst or simple block reads
• On-chip DMA arbiter
• Hardware-based Cache Coherency Support
• Programmable power reduction mode
• Bus Interface can operate at half-processor frequency
INTRODUCTION
The IDT R3051 family is a series of high-performance 32-bit
microprocessors featuring a high level of integration, and
targeted to high-performance but cost sensitive processing
applications. The R3051 family is designed to bring the high
performance inherent in the MIPS RISC architecture into
low-cost, simplified, power sensitive applications.
Thus, functional units have been integrated onto the CPU
core in order to reduce the total system cost, rather than to
increase the inherent performance of the integer engine.
Nevertheless, the R3051 family is able to offer 43VUPS
performance at 50MHz without requiring external SRAM or
caches.
The R3081 extends the capabilities of the R3051 family, by
integrating additional resources into the same pin-out. The
R3081 thus extends the range of applications addressed by
the R3051 family, and allows designers to implement a single,
base system and software set capable of accepting a wide
variety of CPUs, according to the price/performance goals of
the end system.
In addition to the embedded applications served by the
R3051 family, the R3081 allows low-cost, entry level computer
systems to be constructed. These systems will offer many
times the performance of traditional PC systems, yet cost
approximately the same. The R3081 is able to run any
standard R3000A operating system, including ACE UNIX.
Thus, the R3081 can be used to build a low-cost ARC
compliant system, further widening the range of performance
solutions of the ACE Initiative.
An overview of this device, and quantitative electrical
parameters and mechanical data, is found in this data sheet;
consult the "R3081 Family Hardware User's Guide" for a
complete description of this processor.

DEVICE OVERVIEW
As part of the R3051 family, the R3081 extends the offering
of a wide range of functionality in a compatible interface. The
R3051 family allows the system designer to implement a
single base system, and utilize interface-compatible processors
of various complexity to achieve the price-performance goals
of the particular end system.
Differences among the various family members pertain to
the on-chip resources of the processor. Current family members
include:
• The R3052E, which incorporates an 8kB instruction cache,
a 2kB data cache, and a full function memory management
unit (MMU) including a 64-entry fully associative Translation
Lookaside Buffer (TLB).
• The R3052, which also incorporates an 8kB instruction
cache and 2kB data cache, but does not include the TLB,
and instead uses a simpler virtual to physical address
mapping.
• The R3051E, which incorporates 4kB of instruction cache
and 2kB of data cache, along with the full function MMU/TLB
of the R3000A.
• The R3051, which incorporates 4kB of instruction cache
and 2kB of data cache, but omits the TLB, and instead uses
a simpler virtual to physical address mapping.
• The R3081E, which incorporates a 16kB instruction cache,
a 4kB data cache, and a full function memory management
unit (MMU) including a 64-entry fully associative Translation
Lookaside Buffer (TLB). The cache on the R3081E is user
configurable to an 8kB Instruction Cache and 8kB Data
Cache.
• The R3081, which incorporates a 16kB instruction cache
and a 4kB data cache, but uses the simpler memory mapping
of the R3051/52, and thus omits the TLB. The cache on the
R3081 is user configurable to an 8kB Instruction Cache and
8kB Data Cache.
Figure 1 shows a block level representation of the functional
units within the R3081E. The R3081E could be viewed as the
embodiment of a discrete solution built around the R3000A
and R3010A. However, by integrating this functionality on a
single chip, dramatic cost and power reductions are achieved.

CPU Core
The CPU core is a full 32-bit RISC integer execution
engine, capable of sustaining close to single cycle execution.
The CPU core contains a five stage pipeline, and 32 orthogonal
32-bit registers. The R3081 uses the same basic integer
execution core as the entire R3051 family, which is the
R3000A implementation of the MIPS instruction set. Thus, the
R3081 family is binary compatible with the R3051, R3052,
R3000A, R3001, and R3500 CPUs. In addition, the R4000
represents an upwardly software compatible migration path to
still higher levels of performance.
The execution engine in the R3081 uses a five-stage
pipeline to achieve near single-cycle instruction execution
rates. A new instruction can be initiated in each clock cycle;
the execution engine actually processes five instructions
concurrently (in various pipeline stages). Figure 2 shows the
concurrency achieved in the R3081 execution pipeline.

System Control Co-Processor
The R3081 family also integrates on-chip the System
Control Co-processor, CP0. CP0 manages both the exception
handling capability of the R3081, as well as the virtual to
physical address mapping.
As with the R3051 and R3052, the R3081 offers two
versions of memory management and virtual to physical
address mapping: the extended architecture versions, the
R3051E, R3052E, and R3081E, incorporate the same MMU
as the R3000A. These versions contain a fully associative
64-entry TLB which maps 4kB virtual pages into the physical
address space. The virtual to physical mapping thus includes
kernel segments which are hard-mapped to physical
addresses, and kernel and user segments which are mapped
page by page by the TLB into anywhere in the 4GB physical
address space. In this TLB, 8 pages can be "locked" by the
kernel to ensure deterministic response in real-time applications.
Figure 3 illustrates the virtual to physical mapping found in the
R3081E.
The extended architecture versions of the R3051 family
(the R3051E, R3052E, and R3081E) allow the system designer
to implement kernel software which dynamically manages
user task utilization of system resources, and also allows the
kernel to protect certain resources from user tasks. These
capabilities are important in general computing applications
such as ARC computers, and are also important in a variety of
embedded applications, from process control (where protection
may be important) to X-Window display systems (where
virtual memory management can be used). The MMU can
also be used to simplify system debug.
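As an illustration of the extended-architecture address translation this data sheet describes (a fully associative 64-entry TLB mapping 4kB pages, with kseg0 and kseg1 hard-mapped), the following is a minimal Python sketch, not a model of the actual hardware. The names `TlbEntry` and `translate`, and the `asid`, `valid`, and `is_global` fields, are assumptions taken from the general R3000A architecture rather than from this text; exceptions, the dirty and non-cacheable bits, and entry locking are omitted.

```python
from dataclasses import dataclass
from typing import List, Optional

PAGE_SHIFT = 12      # 4kB pages
TLB_ENTRIES = 64     # fully associative, 64 entries


@dataclass
class TlbEntry:
    vpn: int                 # virtual page number (address bits 31..12)
    pfn: int                 # physical frame number
    asid: int = 0            # address space id (assumed field)
    valid: bool = True       # assumed field
    is_global: bool = False  # assumed field: match regardless of asid


def translate(vaddr: int, tlb: List[TlbEntry], asid: int = 0) -> Optional[int]:
    """Return the physical address, or None to model a TLB miss."""
    vaddr &= 0xFFFFFFFF
    if 0x80000000 <= vaddr < 0xC0000000:
        # kseg0 (cached) and kseg1 (uncached) bypass the TLB and are
        # hard-mapped onto the low 512MB of physical memory.
        return vaddr & 0x1FFFFFFF
    vpn = vaddr >> PAGE_SHIFT
    offset = vaddr & ((1 << PAGE_SHIFT) - 1)
    # Fully associative: every entry is a candidate for every page.
    for entry in tlb[:TLB_ENTRIES]:
        if entry.valid and entry.vpn == vpn and (entry.is_global or entry.asid == asid):
            return (entry.pfn << PAGE_SHIFT) | offset
    return None
```

A miss (`None`) stands in for the point at which kernel page management software would refill the TLB.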
R3051 family base versions (the R3051, R3052, and R3081)
remove the TLB and institute a fixed address mapping for the
various segments of the virtual address space. These devices
still support distinct kernel and user mode operation, but do
not require page management software, leading to a simpler
software model. The memory mapping used by these devices
is shown in Figure 4. Note that the reserved spaces are for
compatibility with future family members, which may map
on-chip resources to these addresses. References to these
addresses in the R3081 will be translated in the same fashion
as the rest of their respective segments, with no traps or
exceptions signalled.
When using the base versions of the architecture, the
system designer can implement a distinction between the
user tasks and the kernel tasks, without having to implement
page management software. This distinction can be
implemented by decoding the output physical address. In
systems which do not need memory protection, and wish to
have the kernel and user tasks operate out of the same
memory space, high-order address lines can be ignored by
the address decoder, and thus all references will be seen in
the lower gigabyte of the physical address space.
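The fixed mapping of the base versions, and the effect of an address decoder that ignores the high-order address lines, can be sketched as follows. This is a hedged illustration: the physical base addresses are an assumption inferred from the region sizes labeled in Figure 4 (512 MB, 512 MB, 2048 MB, 1024 MB stacked from address zero) and should be checked against the "R3081 Family Hardware User's Guide". The function names are hypothetical.

```python
def base_translate(vaddr: int) -> int:
    """Fixed virtual-to-physical mapping of the base (non-TLB) versions.

    Assumed physical layout, inferred from the sizes in Figure 4:
      0x00000000  512 MB  kernel boot and I/O (target of kseg0 and kseg1)
      0x20000000  512 MB  inaccessible
      0x40000000  2048 MB kernel/user cacheable tasks (target of kuseg)
      0xc0000000  1024 MB kernel cacheable tasks (target of kseg2)
    """
    vaddr &= 0xFFFFFFFF
    if vaddr < 0x80000000:            # kuseg: user, cached
        return vaddr + 0x40000000
    if vaddr < 0xA0000000:            # kseg0: kernel, cached
        return vaddr - 0x80000000
    if vaddr < 0xC0000000:            # kseg1: kernel, uncached
        return vaddr - 0xA0000000
    return vaddr                      # kseg2: kernel, cached


def decode_ignoring_high_lines(paddr: int) -> int:
    """Model an address decoder that ignores the two high-order address
    lines, so every reference lands in the lower gigabyte."""
    return paddr & 0x3FFFFFFF
```

With the two top address lines ignored, a user reference and a kernel reference can reach the same memory location, which is the shared-memory arrangement described above.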
Floating Point Co-Processor
The R3081 also integrates an R3010A compatible floating
point accelerator on-chip. The FPA is a high-performance co-
processor (co-processor 1 to the CPU) providing separate
add, multiply, and divide functional units for single and double
precision floating point arithmetic. The floating point accelerator
features low latency operations, and autonomous functional
units which allow differing types of floating point operations to
function concurrently with integer operations. The R3010A
appears to the software programmer as a simple extension of
the integer execution unit, with 16 dedicated 64-bit floating
point registers (software references these as 32 32-bit registers
when performing loads or stores). Figure 5 illustrates the
functional block diagram of the on-chip FPA.
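The two views of the floating point register file (16 64-bit registers for arithmetic, 32 32-bit registers for loads and stores) can be illustrated with a small Python sketch. It assumes the usual MIPS I convention that the even-numbered register of a pair holds the low-order word of a double; that convention is not stated in this text, and the class and method names are hypothetical.

```python
import struct


class FpRegisterFile:
    """Sketch of the FPA register file as seen by software."""

    def __init__(self):
        # 32 x 32-bit view, as referenced by floating point loads and stores.
        self.words = [0] * 32

    def store_word(self, reg: int, value: int) -> None:
        """Write one 32-bit register, as a 32-bit load into the FPA would."""
        self.words[reg] = value & 0xFFFFFFFF

    def write_double(self, even_reg: int, value: float) -> None:
        """Write a 64-bit double into an even/odd register pair."""
        if even_reg % 2:
            raise ValueError("doubles occupy an even/odd register pair")
        low, high = struct.unpack("<II", struct.pack("<d", value))
        self.words[even_reg] = low        # assumed: even register = low word
        self.words[even_reg + 1] = high   # assumed: odd register = high word

    def read_double(self, even_reg: int) -> float:
        """Read the 64-bit double held in an even/odd register pair."""
        if even_reg % 2:
            raise ValueError("doubles occupy an even/odd register pair")
        packed = struct.pack("<II", self.words[even_reg], self.words[even_reg + 1])
        return struct.unpack("<d", packed)[0]
```

Loading a double from memory therefore takes two 32-bit loads, one per half of the pair, after which arithmetic treats the pair as one 64-bit register.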
Clock Generator Unit
The R3081 is driven from a single input clock which can be
either at the processor rated speed, or at twice that speed.
On-chip, the clock generator unit is responsible for managing
the interaction of the CPU core, caches, and bus interface. The
R3081 includes an on-chip clock doubler to provide higher
frequency signals to the internal execution core; if 1x clock
mode is selected, the clock doubler will internally convert it to
a double frequency clock. The 2x clock mode is provided for
compatibility with the R3051. The clock generator unit replaces
the external delay line required in R3000A based applications.
[Figure 4. Virtual to Physical Mapping of Base Architecture Versions
(2889 drw 04). Virtual segments: Kernel Cached (kseg2) from 0xc0000000
to 0xffffffff; Kernel Uncached (kseg1) from 0xa0000000; Kernel Cached
(kseg0) from 0x80000000; User Cached (kuseg) from 0x00000000. Physical
regions: Kernel Cacheable Tasks (1024 MB); Kernel/User Cacheable Tasks
(2048 MB); Inaccessible (512 MB); Kernel Boot and I/O (512 MB). A 1MB
Kernel Rsvd region and a 1MB User Rsvd region are also labeled.]
[Figure 3. Virtual to Physical Mapping of Extended Architecture
Versions (2889 drw 03). Virtual segments: Kernel Mapped (kseg2) from
0xc0000000 to 0xffffffff; Kernel Uncached (kseg1) from 0xa0000000;
Kernel Cached (kseg0) from 0x80000000; User Mapped Cacheable (kuseg)
from 0x00000000. Physical side labels: Physical Memory, 3548MB;
Memory, 512 MB; the mapped segments are marked "Any".]
[Figure 2. R3081 5-Stage Pipeline (2889 drw 02). Five instructions
(I#1 through I#5) each pass through the IF, RD, ALU, MEM, and WB
stages, each starting one cycle after the previous one, so that all
five overlap in the current CPU cycle.]
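The instruction overlap that Figure 2 depicts for the five-stage pipeline (IF, RD, ALU, MEM, WB) can be reproduced with a short Python sketch. It models only the ideal case with no stalls or interlocks, and the function names are hypothetical.

```python
STAGES = ["IF", "RD", "ALU", "MEM", "WB"]


def pipeline_schedule(num_instructions: int):
    """Return one row per instruction; row[cycle] is the stage occupied
    in that cycle, or None when the instruction is not in the pipeline."""
    total_cycles = num_instructions + len(STAGES) - 1
    schedule = []
    for i in range(num_instructions):
        row = [None] * total_cycles
        # Instruction i enters IF in cycle i and advances one stage per cycle.
        for s, stage in enumerate(STAGES):
            row[i + s] = stage
        schedule.append(row)
    return schedule


def active_stages(schedule, cycle: int):
    """List the stages occupied in a given cycle, oldest instruction first."""
    return [row[cycle] for row in schedule if row[cycle] is not None]
```

For five instructions, cycle index 4 is the "current CPU cycle" of the figure: the oldest instruction is in WB while the newest is in IF, so five instructions are in flight at once.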
Instruction Cache
The R3081 implements a 16kB Instruction Cache. The
system may choose to repartition the on-chip caches, so that
the instruction cache is reduced to 8kB but the data cache is
increased to 8kB. The instruction cache is organized with a
line size of 16 bytes (four entries). This large cache achieves
hit rates in excess of 98% in most applications, and substantially
contributes to the performance inherent in the R3081. The
cache is implemented as a direct mapped cache, and is
capable of caching instructions from anywhere within the 4GB
physical address space. The cache is implemented using
physical addresses (rather than virtual addresses), and thus
does not require flushing on context switch.
The instruction cache is parity protected over the instruction
word and tag fields. Parity is generated by the read buffer
during cache refill; during cache references, the parity is
checked, and in the case of a parity error, a cache miss is
processed.
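To make the direct mapped, physically addressed organization concrete, the sketch below splits a physical address into tag, line index, and byte offset for a cache with 16-byte lines. It is a generic direct-mapped model under stated assumptions, not the R3081's actual tag layout; `split_address` and its parameters are hypothetical names.

```python
def split_address(paddr: int, cache_bytes: int = 16 * 1024, line_bytes: int = 16):
    """Split a physical address for a direct mapped cache.

    Returns (tag, index, offset):
      offset - byte position within the line
      index  - which cache line the address maps to
      tag    - remaining high-order bits, stored to identify the line
    """
    paddr &= 0xFFFFFFFF
    num_lines = cache_bytes // line_bytes
    offset = paddr % line_bytes
    index = (paddr // line_bytes) % num_lines
    tag = paddr // cache_bytes
    return tag, index, offset
```

Two addresses exactly one cache size apart share an index and differ only in tag, so in a direct mapped cache they displace each other. Passing `cache_bytes=8 * 1024` models the alternative 8kB instruction cache configuration.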
Data Cache
The R3081 incorporates an on-chip data cache of 4kB,
organized with a line size of 4 bytes (one word). The R3081
allows the system to reconfigure the on-chip cache from the
default 16kB I-Cache/4kB D-Cache to 8kB of Instruction and
8kB of Data caches.
The relatively large data cache achieves hit rates in excess
of 95% in most applications, and contributes substantially to
the performance inherent in the R3081. As with the instruction
cache, the data cache is implemented as a direct mapped
physical address cache. The cache is capable of mapping any
word within the 4GB physical address space.
The data cache is implemented as a write-through cache,
to ensure that main memory is always consistent with the
internal cache. In order to minimize processor stalls due to
data write operations, the bus interface unit incorporates a 4-
deep write buffer which captures address and data at the
processor execution rate, allowing it to be retired to main
memory at a much slower rate without impacting system
performance. Further, support has been provided to allow
hardware based data cache coherency in a multi-master
environment, such as one utilizing DMA from I/O to memory.
The data cache is parity protected over the data and tag
fields. Parity is generated by the read buffer during cache refill;
during cache references, the parity is checked, and in the case
of a parity error, a cache miss is processed.
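The behavior described above, where a parity error on a cache reference is handled as a miss, can be sketched in Python as follows. The parity scheme here (one even-parity bit over the tag and one over the data word) is an assumption chosen for brevity; the text does not state the granularity or polarity used by the device, and the function names are hypothetical.

```python
def even_parity(value: int) -> int:
    """Return the bit that makes the total number of 1 bits even."""
    return bin(value & 0xFFFFFFFF).count("1") & 1


def make_line(tag: int, data: int) -> dict:
    """Build a cache line, generating parity as it would be on refill."""
    return {
        "tag": tag,
        "data": data,
        "tag_parity": even_parity(tag),
        "data_parity": even_parity(data),
    }


def lookup(line, tag: int):
    """Return the cached word, or None for a miss.

    A parity mismatch on either field is treated exactly like a miss,
    so the word would be re-fetched from main memory.
    """
    if line is None:
        return None
    if even_parity(line["tag"]) != line["tag_parity"]:
        return None
    if even_parity(line["data"]) != line["data_parity"]:
        return None
    if line["tag"] != tag:
        return None
    return line["data"]
```

Because the data cache is write-through, main memory always holds a good copy, which is why treating a parity error as a miss is sufficient to recover.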
Bus Interface Unit
The R3081 uses its large internal caches to provide the
majority of the bandwidth requirements of the execution
engine, and thus can utilize a simple bus interface connected
to slower memory devices. Alternatively, a high-performance,
low-cost secondary cache can be implemented, allowing the
processor to increase performance in systems where bus
bandwidth is a performance limitation.
[Figure 5. FPA Functional Block Diagram (2889 drw 05). Blocks shown:
Register Unit (16 x 64), Control Unit and Clocks, Exponent Unit,
Add Unit with Round, Divide Unit, and Multiply Unit. Instructions and
operands arrive from the cache over the Data Bus; each arithmetic unit
takes A and B operands and produces a Result. Other labels: Condition
Codes, Exponent Part, Fraction. Path width labels: (32), (11), (53),
(56).]
As part of the R3051 family, the R3081 bus interface utilizes
a 32-bit address and data bus multiplexed onto a single set of
pins. The bus interface unit also provides an ALE (Address
Latch Enable) output signal to de-multiplex the A/D bus, and
simple handshake signals to process CPU read and write
requests. In addition to the read and write interface, the R3051
family incorporates a DMA arbiter, to allow an external master
to control the external bus.
The R3081 also supports hardware based cache coherency
during DMA writes. The R3081 can invalidate a specified line
of data cache, or in fact can perform burst invalidations during
burst DMA writes.
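The idea behind invalidating data cache lines during DMA writes can be sketched with a toy model of the default 4kB data cache with one-word lines. This is a hedged illustration only: whether the hardware compares tags before invalidating is not stated in this text, and this sketch invalidates only on a tag match. The class and method names are hypothetical.

```python
WORD_BYTES = 4


class DataCacheModel:
    """Direct mapped cache with one-word lines (default size 4kB)."""

    def __init__(self, size_bytes: int = 4 * 1024):
        self.num_lines = size_bytes // WORD_BYTES
        self.lines = {}  # index -> (tag, word)

    def _split(self, paddr: int):
        word_addr = (paddr & 0xFFFFFFFF) // WORD_BYTES
        return word_addr // self.num_lines, word_addr % self.num_lines

    def fill(self, paddr: int, word: int) -> None:
        tag, index = self._split(paddr)
        self.lines[index] = (tag, word)

    def read(self, paddr: int):
        """Return the cached word, or None on a miss."""
        tag, index = self._split(paddr)
        line = self.lines.get(index)
        return line[1] if line is not None and line[0] == tag else None

    def invalidate(self, paddr: int) -> None:
        """Drop the line holding paddr so the next read misses."""
        tag, index = self._split(paddr)
        line = self.lines.get(index)
        if line is not None and line[0] == tag:
            del self.lines[index]

    def burst_invalidate(self, paddr: int, num_words: int = 4) -> None:
        """Invalidate consecutive words, as during a burst DMA write."""
        for i in range(num_words):
            self.invalidate(paddr + i * WORD_BYTES)
```

After a DMA master writes new data to memory, invalidating the matching lines forces the CPU's next read of those addresses to miss and fetch the updated words.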
The R3081 incorporates a 4-deep write buffer to decouple
the speed of the execution engine from the speed of the
memory system. The write buffer captures processor address
and data information from store operations in FIFO order, and
presents it to the bus interface as write transactions at the
rate the memory system can accommodate.
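The decoupling provided by a 4-deep write buffer can be illustrated with a simple FIFO model. This is a sketch under stated assumptions: it shows only the queueing behavior (stores accepted until four are pending, retired in order), not bus timing, and the class and method names are hypothetical.

```python
from collections import deque


class WriteBuffer:
    """FIFO of pending (address, data) writes between CPU and memory."""

    def __init__(self, depth: int = 4):
        self.depth = depth
        self.fifo = deque()

    def store(self, address: int, data: int) -> bool:
        """CPU side: queue a write. Returns False when the buffer is
        full, which models the CPU stalling until an entry retires."""
        if len(self.fifo) >= self.depth:
            return False
        self.fifo.append((address, data))
        return True

    def retire(self):
        """Memory side: remove and return the oldest pending write,
        or None when nothing is pending."""
        return self.fifo.popleft() if self.fifo else None
```

As long as the memory system retires writes before a fifth store arrives, the CPU never sees the memory write latency.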
The R3081 read interface performs both single datum
reads and quad word reads. Single reads work with a simple
handshake, and quad word reads can either utilize the simple
handshake (in lower performance, simple systems) or utilize
a tighter timing mode when the memory system can burst data
at the processor clock rate. Thus, the system designer can
choose to utilize page or nibble mode DRAMs (and possibly
use interleaving, if desired, in high-performance systems), or
use simpler techniques to reduce complexity.
In order to accommodate slower quad word reads, the
R3081 incorporates a 4-deep read buffer FIFO, so that the
external interface can queue up data within the processor
before releasing it to perform a burst fill of the internal caches.
The R3081 is R3051 superset compatible in its bus interface.
Specifically, the R3081 has additional support to simplify the
design of very high frequency systems. This support includes
the ability to run the bus interface at one-half the processor
execution rate, as well as the ability to slow the transitions
between reads and writes to provide extra buffer disable time
for the memory interface. However, it is still possible to design
a system which, with no modification to the PC Board or
software, can accept either an R3041, R3051, R3052, R3071,
or R3081.
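The clocking options mentioned in this data sheet (a 1x or 2x input clock, and a bus interface running at full or half the processor rate) can be summarized in a small helper. This is a sketch for orientation only: the function name, parameter names, and dictionary keys are hypothetical, and the 50MHz restriction is taken from the feature list ("50MHz at 1x clock input and 1/2 bus frequency only").

```python
def clock_plan(cpu_mhz: float, clock_mode: str = "1x", half_speed_bus: bool = False) -> dict:
    """Return the input clock and bus frequencies for a configuration.

    clock_mode "1x": the input clock runs at the processor rated speed.
    clock_mode "2x": the input clock runs at twice that speed.
    """
    if clock_mode not in ("1x", "2x"):
        raise ValueError("clock_mode must be '1x' or '2x'")
    if cpu_mhz == 50 and not (clock_mode == "1x" and half_speed_bus):
        # Restriction stated in the feature list for 50MHz parts.
        raise ValueError("50MHz requires 1x clock input and 1/2 bus frequency")
    clkin_mhz = cpu_mhz if clock_mode == "1x" else 2 * cpu_mhz
    bus_mhz = cpu_mhz / 2 if half_speed_bus else cpu_mhz
    return {"clkin_mhz": clkin_mhz, "cpu_mhz": cpu_mhz, "bus_mhz": bus_mhz}
```

For example, a 50MHz part in 1x mode with the half-speed bus takes a 50MHz input clock and presents a 25MHz bus to the memory system.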
SYSTEM USAGE
The IDT R3051 family has been specifically designed to
allow a wide variety of memory systems. Low-cost systems
can use slow speed memories and simple controllers, while
other designers may choose to incorporate higher frequencies,
faster memories, and techniques such as DMA to achieve
maximum performance. The R3081 includes specific support
for high performance systems, including signals necessary to
implement external secondary caches, and the ability to
perform hardware based cache coherency in multi-master
systems.
Figure 6 shows a typical system implementation.
Transparent latches are used to de-multiplex the R3081
address and data busses from the A/D bus. The data paths
between the memory system elements and the A/D bus are
managed by simple octal devices. A small set of simple PALs
is used to control the various data path elements, and to
control the handshake between the memory devices and the
CPU.
Depending on the cost vs. performance tradeoffs appropriate
to a given application, the system design engineer could
include true burst support from the DRAM to provide for high-
performance cache miss processing, or utilize a simpler,
lower performance memory system to reduce cost and simplify
the design. Similarly, the system designer could choose to
implement techniques such as external secondary cache, or
DMA, to further improve system performance.
DEVELOPMENT SUPPORT
The IDT R3051 family is supported by a rich set of
development tools, ranging from system simulation tools
through PROM monitor and debug support, applications
software and utility libraries, logic analysis tools, sub-system
modules, and shrink wrap operating systems. The R3081,
which is pin and software compatible with the R3051, can
directly utilize these existing tools to reduce time to market.
Figure 7 is an overview of the system development process
typically used when developing R3051 family applications.
The R3051 family is supported in all phases of project
development. These tools allow timely, parallel development
of hardware and software for R3051 family applications, and
include tools such as:
• Optimizing compilers from MIPS, the acknowledged leader
in optimizing compiler technology.
• Cross development tools, available in a variety of
development environments.
• The IDT Evaluation Board, which includes RAM, EPROM,
I/O, and the IDT PROM Monitor.
• IDT/sim™, which implements a full PROM monitor
(diagnostics, remote debug support, peek/poke, etc.).
• IDT/kit™, which implements a run-time support package for
R3051 family systems.
PERFORMANCE OVERVIEW
The R3081 achieves a very high level of performance. This
performance is based on:
An efficient execution engine. The CPU performs ALU
operations and store operations in a single cycle, and has
an effective load time of 1.3 cycles, and branch execution
rate of 1.5 cycles (based on the ability of the compilers to
avoid software interlocks). Thus, the execution engine
achieves over 35 VUPS performance when operating out
of cache.
A full featured floating point accelerator/co-processor.
The R3081 incorporates an R3010A compatible floating
point accelerator on-chip, with independent ALUs for floating
point add, multiply, and divide. The floating point unit is fully
hardware interlocked, and features overlapped operation
and precise exceptions. The FPA allows floating point
adds, multiplies, and divides to occur concurrently with
each other, as well as concurrently with integer operations.
Large on-chip caches. The R3051 family contains caches
which are substantially larger than those on the majority of
today's microprocessors. These large caches minimize the
number of bus transactions required, and allow the R3051
family to achieve actual sustained performance very close
to its peak execution rate. The R3081 doubles the cache
available on the R3052, making it a suitable engine for