MIPS64™ 5Kc™ Processor Core Datasheet
November 19, 2001
MIPS64™ 5Kc™ Processor Core Datasheet, Revision 02.23
Copyright 1999-2001 MIPS Technologies Inc. All rights reserved.
The MIPS64™ 5Kc™ processor core from MIPS Technologies is a synthesizable, highly integrated 64-bit MIPS RISC
microprocessor core designed for high-performance, low-power, low-cost embedded applications. To semiconductor
manufacturing companies and system OEMs who are building complex System-On-Chip ASIC devices, the 5Kc core
offers the long-awaited benefits of an easy-to-integrate, synthesizable core that provides 64-bit address and data paths along
with the 64-bit computing power of an R5000-class processor. The 5Kc core is portable across processes, is highly
configurable, and is easily integrated into standard design flows, thereby reducing time to market and allowing designers
to focus their attention on end-user products. The 5Kc core is ideally positioned to support new products for emerging
segments of the digital consumer, network, and office automation markets. The power-management features of the 5Kc core
make it ideally suited for use in battery-powered applications.
The 5Kc core implements the MIPS64 Architecture. It provides special multiply-accumulate, conditional move, prefetch,
wait, and leading zero/one detect instructions, together with the 64-bit privileged resource architecture. A coprocessor
interface is also provided, which lets designers extend their architectures by adding custom functionality, such as
floating-point, network, or graphics coprocessors.
The memory management unit contains a configurable 16, 32, or 48 dual-entry Joint TLB (JTLB) with variable page sizes,
a 4-entry Instruction micro TLB (ITLB), and a 4-entry Data micro TLB (DTLB). Using a TLB with the 5Kc core is
optional. The alternative is to use a far simpler Fixed Mapping Translation (FMT) scheme.
Optional instruction and data caches are fully configurable from 0 to 64 KBytes in size, with a maximum size of 16 KBytes/way
in a 4-way set associative implementation. In addition, each cache can be organized as direct-mapped, 2-way, 3-way,
or 4-way set associative. The 5Kc core supports an instruction scheduling mechanism that eliminates pipeline stalls on
cache misses, and a load scheduling slot is also supported.
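The cache sizing rules above can be illustrated with a short Python sketch. It assumes the 32-byte line size listed under Programmable Cache Sizes; the function name, argument names, and return format are hypothetical and are not part of any 5Kc deliverable.

```python
LINE_BYTES = 32  # cache line size taken from the feature list

def cache_geometry(way_kbytes, ways):
    """Return (total_bytes, sets_per_way) for one cache configuration.

    way_kbytes: size of a single way in KBytes (0 means the cache is absent)
    ways:       associativity, 1 (direct-mapped) to 4
    """
    if ways not in (1, 2, 3, 4):
        raise ValueError("associativity must be 1, 2, 3, or 4 ways")
    if not 0 <= way_kbytes <= 16:
        raise ValueError("way size must be between 0 and 16 KBytes")
    way_bytes = way_kbytes * 1024
    # Each set holds one line per way, so the set count depends only on way size.
    sets_per_way = way_bytes // LINE_BYTES
    return way_bytes * ways, sets_per_way

print(cache_geometry(16, 4))
```

Under these assumptions the largest configuration (four ways of 16 KBytes) gives 64 KBytes and 512 sets, while a 3-way cache with the same way size gives 48 KBytes with the same number of sets.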
To ease software debugging, the EJTAG debug solution in the 5Kc core includes instruction software breakpoints, a single-
step feature, and a dedicated Debug Mode. Optional hardware breakpoints include 4 instruction and 2 data breakpoints. An
optional Test Access Port (TAP) forms the interface to an external debug host and provides a dedicated communication
channel for debugging of an embedded system.
Figure 1 shows a block diagram of the 5Kc core. The core is divided into required and optional blocks as shown.
Figure 1 5Kc Core Block Diagram
[Block diagram not reproduced. Labeled blocks: Execution Core, Mul/Div Unit, System Coprocessor, MMU (TLB, FMT), Cache Control, Instruction Cache, Data Cache, BIU with EC interface, COP interface, Power Mgmt., and EJTAG (Breakpoints, TAP Ctrl). Legend: Fixed/Required, Optional.]
Features
• 64-bit Data and Address Path (42-bit virtual and 36-bit physical address space)
• MIPS64 Compatible Instruction Set
    - Based on MIPS V™ Instruction Set Architecture
    - Multiply-Accumulate and Multiply-Subtract Instructions (MADD, MADDU, MSUB, MSUBU)
    - Targeted Multiply Instruction (MUL)
    - Zero/One Detect Instructions (CLZ, CLO, DCLO, DCLZ)
    - Wait Instruction (WAIT) for low power control
    - Conditional Move Instructions (MOVZ, MOVN)
    - Prefetch Instructions (PREF, PREFX)
• General Purpose FPU/Coprocessor Interface
    - Supports all MIPS V instructions, including advanced COP1X instructions
    - Supports both COP1 and COP2 coprocessors
    - Utilizes high-performance features of the integer unit
    - Dual-issue capable interface supports execution of an arithmetic coprocessor
      instruction and an integer or coprocessor load/store instruction every cycle
• Multiply/Divide Unit
    - Maximum issue rate of one 32x16 multiply per clock
    - Maximum issue rate of one 32x32 multiply every other clock
    - Maximum issue rate of one 64x64 multiply every 9 clocks
    - 37 clock latency on 32/32 divides
    - 69 clock latency on 64/64 divides
    - Early-in feature for divides allows results sooner for smaller dividend values
• MIPS64 privileged resource architecture
    - Count/Compare registers for real-time timer interrupts
    - Instruction and Data watch registers for software breakpoints
    - Separate interrupt exception vector
    - Supervisor Mode operation
    - Performance Monitoring logic for analyzing application speed
• Memory Management Unit
    - 16, 32, or 48 dual-entry JTLB with variable page sizes or a simple Fixed
      Mapping Translation (FMT) mechanism (optional)
    - 4-entry instruction micro TLB
    - 4-entry data micro TLB
    - Support for 8-bit ASID
    - Support for 4 KB to 16 MB page sizes
• Programmable Cache Sizes
    - Individually configurable instruction and data caches
    - Sizes from 0 to 16 KBytes/way (64 KBytes maximum)
    - Direct Mapped, 2-, 3-, or 4-Way Set Associative
    - Non-blocking loads
    - 32-byte cache line size, doubleword sectored
    - Virtually indexed, physically tagged
    - Support for locking cache lines
    - Non-blocking prefetches
    - Optional parity protection
• Simple Bus Interface Unit (BIU)
    - All I/Os fully registered
    - Separate, unidirectional 36-bit address and 64-bit data buses
    - 32-byte write buffer (4 doublewords)
    - 1-line (32-byte) eviction buffer
• Power Control
    - Minimum frequency: 0 MHz
    - Power-down mode (triggered by WAIT instruction)
    - Support for software controlled clock divider
    - Sleep mode: During this mode the clocks are shut off. Sleep mode is entered
      automatically from power-down mode after all bus activity stops.
• EJTAG Debug Support
    - Software Debug Breakpoint Instruction (SDBBP)
    - Single-step feature
    - Debug Mode
    - Optional hardware breakpoints (4 instruction and 2 data breakpoints)
    - Optional Test Access Port (TAP) interface to debug host, including fast data
      download/upload feature
• Testability for Production Test
    - Muxed-FF fullscan design with configurable number of scan chains. ATPG test
      coverage can exceed 99% (library and configuration dependent).
    - Optional memory BIST, either through integrated memory test (March C+ or
      IFA-13 algorithm) or by use of industry standard memory BIST CAD tools.
Architectural Overview
The 5Kc core contains both required and optional blocks.
Optional blocks can be added to the 5Kc core based on the
needs of the implementation. The required blocks are as
follows:
• Execution Unit
• Floating Point Unit (FPU)
• Multiply/Divide Unit (MDU)
• System Control Coprocessor (CP0)
• Memory Management Unit (MMU)
• Translation Lookaside Buffer (TLB) or Fixed Mapping Translation (FMT)
• Cache Controllers
• Bus Interface Unit (BIU)
• Basic EJTAG debug features
• Power Management
Optional blocks include:
• Instruction Cache
• Data Cache
• EJTAG Debug Test Access Port (TAP)
• EJTAG Hardware Breakpoints
• Memory BIST module
The section entitled "5Kc Core Required Logic Blocks" on
page 4 discusses the required blocks. The section entitled
"5Kc Core Optional Logic Blocks" on page 14 discusses the
optional blocks.
Pipeline Flow
The 5Kc core implements a high-performance 6-stage
pipeline:
• Instruction fetch (I stage)
• Dispatch (D stage)
• Register read (R stage)
• Execution (E stage)
• Memory access (M stage)
• Writeback (W stage)
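To visualize how instructions overlap in the six stages listed above, the following Python sketch prints one row per instruction and one column per clock. It is an idealized model that assumes one instruction enters the pipeline per clock with no stalls; the function name and the text layout are hypothetical.

```python
STAGES = ["I", "D", "R", "E", "M", "W"]  # stage order from the list above

def pipeline_chart(n_instructions):
    """Return one string per instruction; each character is one clock.

    Assumes an ideal flow: a new instruction enters the I stage every
    clock and nothing stalls. A '.' marks a clock in which that
    instruction is not in the pipeline.
    """
    total_clocks = n_instructions + len(STAGES) - 1
    rows = []
    for i in range(n_instructions):
        row = ["."] * total_clocks
        for offset, stage in enumerate(STAGES):
            row[i + offset] = stage
        rows.append("".join(row))
    return rows

for line in pipeline_chart(3):
    print(line)
```

Reading a column top to bottom shows which stage each instruction occupies in that clock; in this ideal case, the second instruction reaches the E stage in the same clock in which the first reaches the M stage.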
The 5Kc core implements a bypass mechanism that allows
the result of an operation to be forwarded directly to the
instruction that needs it without having to write the result
to the register and then read it back.
Figure 2 shows a timing diagram of the 5Kc core pipeline.
Figure 2 5Kc Core Pipeline
I Stage: Instruction Fetch
During the Instruction Fetch stage:
• The Translation Lookaside Buffer (TLB) or the Fixed
  Mapping Translation (FMT) performs the
  virtual-to-physical address translation for instruction
  fetch addresses.
• An instruction is fetched from the instruction cache.
D Stage: Dispatch
During the Dispatch stage:
• Branch decode and prediction are performed.
• An instruction is dispatched to the coprocessor/integer
  unit.
R Stage: Register Read
During the Register Read stage:
• The General Purpose Register (GPR) file is read.
• The instruction is decoded.
E Stage: Execution
During the Execution stage:
• The Arithmetic Logic Unit (ALU) computes the
  arithmetic or logical operation for register-to-register
  instructions.
• The ALU determines whether the branch condition is
  true.
• All multiply and divide operations begin.
• The ALU calculates the full virtual address for load
  and store instructions.
• The cache lookup starts for loads and stores.
M Stage: Memory Access
During the memory access stage:
• The Data Translation Lookaside Buffer (DTLB) or the
  Fixed Mapping Translation (FMT) performs the
  virtual-to-physical address translation for data
  load/store addresses.
• The data cache lookup completes.
• Load data is aligned.
[Figure 2 timing diagram not reproduced. Stages: I, D, R, E, M, W. Labeled activities: I$ Tag, I$ Data, ITLB, Tag Cmp., Way Select, Dispatch, Branch Tgt, GPR Read, Decode, Byp, Low Addr, ALU/Addr, D$ Tag, DTLB, Tag Cmp., D$ Data, Way Select, Load Align, GPR Write, and Bypass paths.]
W Stage: Writeback
During the writeback stage:
• For register-to-register or load instructions, the
  instruction result is written back to the register file.
Modes of Operation
The 5Kc core supports four modes of operation: User
Mode, Supervisor Mode, Kernel Mode, and Debug Mode.
User Mode is most often used for application programs.
Kernel and Supervisor Modes are typically used for
handling exceptions and operating system functions,
including CP0 management and I/O device accesses.
Debug Mode is used for EJTAG software debugging and is
similar to Kernel Mode, but also allows programming of
debug resources and has special handling of exceptions and
other debug-related issues.
The processor enters Kernel Mode both at reset and when
an exception is taken. While in Kernel Mode, software has
access to the entire address space as well as all CP0
registers. User Mode accesses are limited to a subset of the
virtual address space and can be inhibited from accessing
CP0 functions.
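The mode selection described above can be sketched in Python as follows. The bit positions used here (Status.EXL at bit 1, Status.ERL at bit 2, Status.KSU at bits 4:3) and the use of a Debug-mode flag are assumptions taken from the general MIPS64 privileged resource architecture, not from this datasheet; the function name is hypothetical.

```python
def operating_mode(status, debug_mode=False):
    """Classify the operating mode from a CP0 Status register value.

    Assumed field layout (MIPS64 privileged resource architecture):
      bit 1    EXL  exception level
      bit 2    ERL  error level
      bits 4:3 KSU  0 = kernel, 1 = supervisor, 2 = user
    debug_mode stands in for the EJTAG Debug Mode indication.
    """
    exl = (status >> 1) & 1
    erl = (status >> 2) & 1
    ksu = (status >> 3) & 0x3
    if debug_mode:
        return "Debug"
    # Reset and exceptions force Kernel Mode regardless of KSU.
    if exl or erl or ksu == 0:
        return "Kernel"
    if ksu == 1:
        return "Supervisor"
    if ksu == 2:
        return "User"
    return "Reserved"

print(operating_mode(0x10))  # KSU = 2, EXL = 0, ERL = 0
print(operating_mode(0x12))  # same KSU, but an exception is being handled
```

The second call illustrates the statement that the processor enters Kernel Mode when an exception is taken: with EXL set, the KSU field no longer determines the mode.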
5Kc Core Required Logic Blocks
The 5Kc core consists of the following required logic
blocks as shown in Figure 1. These logic blocks are defined
in the following subsections:
• Execution Unit
• Multiply/Divide Unit (MDU)
• System Control Coprocessor (CP0)
• Cache Controllers
• Memory Management Unit (MMU)
• Translation Lookaside Buffer (TLB) or Fixed Mapping Translation (FMT)
• Bus Interface Unit (BIU)
• Basic EJTAG debug features
• Power Management
Execution Unit
The 5Kc core execution unit implements a load/store
architecture with single-cycle ALU operations (logical,
shift, add, subtract). The 5Kc core contains thirty-two
64-bit general-purpose registers used for integer operations
and address calculation. The register file has two
read ports and two write ports and is fully bypassed to
minimize operation latency in the pipeline.
The execution unit includes:
• 64-bit adder used for calculating arithmetic results and
  data addresses
• Program counter for the next instruction address
• Logic for branch determination and branch target
  address calculation
• Load and store aligner
• Bypass multiplexers used to avoid stalls when
  executing instruction streams where data-producing
  instructions are followed closely by consumers of their
  results
• Instruction buffer that eliminates penalties to the
  pipeline when branches are predicted correctly, and
  reduces the penalty to one pipeline bubble when a
  branch is mispredicted
• Zero/One detect unit for implementing the CLZ,
  DCLZ, CLO, and DCLO instructions
• Logic unit for performing bitwise logical operations
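As a reference for what the Zero/One detect unit computes, the following Python sketch models the result of the CLZ/CLO instructions (32-bit operand) and DCLZ/DCLO (64-bit operand). It is a behavioral illustration only; the function names and the width parameter are hypothetical and say nothing about how the hardware is built.

```python
def count_leading_zeros(value, width=32):
    """Number of zero bits above the most significant 1 bit.

    width=32 models CLZ, width=64 models DCLZ. An all-zero operand
    yields the full width.
    """
    value &= (1 << width) - 1
    return width - value.bit_length()

def count_leading_ones(value, width=32):
    """Number of one bits above the most significant 0 bit.

    width=32 models CLO, width=64 models DCLO. Implemented by
    inverting the operand and counting leading zeros.
    """
    mask = (1 << width) - 1
    return count_leading_zeros(~value & mask, width)

print(count_leading_zeros(0x00010000))           # 32-bit operand
print(count_leading_ones(0xF000000000000000, 64))  # 64-bit operand
```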
Multiply/Divide Unit (MDU)
The 5Kc core contains a Multiply/Divide Unit (MDU) with
a separate pipeline for multiply and divide operations. This
pipeline operates in parallel with the Integer Unit (IU)
pipeline and does not stall when the IU pipeline stalls. This
allows long-running MDU operations, such as divides, to
be partially masked by system stalls and/or other integer
unit instructions.
The MDU consists of a 32x16 Booth-recoded multiplier,
result/accumulation registers (HI and LO), a divide state
machine, and all necessary multiplexers and control logic.
The first number shown ('32' of 32x16) represents the rs
operand. The second number ('16' of 32x16) represents the
rt operand. The 5Kc core only checks the value of the latter
(rt) operand to determine how many times the operation
must pass through the multiplier. The 16x16 and 32x16
operations pass through the multiplier once, allowing for a
multiply operation every clock. A 32x32 operation passes
through the multiplier twice, allowing for a multiply
operation every other clock. A 64x64 operation passes
through the multiplier nine times, allowing for a multiply
operation every nine clocks.
Appropriate interlocks are implemented to stall the issue of
back-to-back 32x32 and 64x64 multiply operations.
Multiply operand size is automatically determined by logic
built into the MDU.
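The relationship between operand shape, multiplier passes, and issue rate can be summarized in a small Python sketch. The table below contains only the four shapes named in the text; how the hardware classifies other shapes is not described here, so the sketch rejects them. The names PASSES and issue_clocks are hypothetical.

```python
# Passes through the 32x16 multiplier array, keyed by (rs bits, rt bits).
# Only the operand shapes named in the datasheet text are included.
PASSES = {
    (16, 16): 1,
    (32, 16): 1,
    (32, 32): 2,
    (64, 64): 9,
}

def issue_clocks(count, rs_bits, rt_bits):
    """Clocks needed to issue `count` back-to-back multiplies of one shape.

    Each multiply occupies the multiplier for its number of passes, so
    the interlocks described above space the issues that far apart.
    """
    try:
        passes = PASSES[(rs_bits, rt_bits)]
    except KeyError:
        raise ValueError("operand shape not described in the text") from None
    return count * passes

print(issue_clocks(4, 32, 16))
print(issue_clocks(4, 32, 32))
print(issue_clocks(4, 64, 64))
```

Under this model, four 32x16 multiplies issue in four clocks, four 32x32 multiplies in eight clocks, and four 64x64 multiplies in thirty-six clocks.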
Divide operations are implemented with a simple 1 bit per
clock iterative algorithm. A 32-bit divide requires 37 clock
cycles to complete, while a 64-bit divide requires 69 clock
cycles. Any attempt to issue a subsequent MDU instruction
while a divide is still active causes an IU pipeline stall until
the divide operation is completed.
However, the divider has an early-in feature which detects
the size of the dividend in 8-bit increments. When a smaller
dividend is detected, the algorithm reduces the number of
iterations accordingly.
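The effect of the early-in feature can be approximated with the following Python sketch. It assumes that the dividend size is rounded up to the next 8-bit increment and that latency equals that size plus three cycles, a pattern read off the Table 1 values (8 bit: 11 cycles, up to 64 bit: 67 cycles). Note that Table 1 lists 35 and 67 cycles for full-size divides, while the prose above quotes 37 and 69 cycles to complete; the sketch follows the table. The function name and parameters are hypothetical.

```python
def divide_latency(dividend_bits, is_64bit=False):
    """Estimated divide latency in cycles, following the Table 1 pattern.

    dividend_bits: number of significant bits in the dividend
    is_64bit:      True for DDIV/DDIVU, False for DIV/DIVU
    """
    limit = 64 if is_64bit else 32
    if not 0 <= dividend_bits <= limit:
        raise ValueError("dividend size out of range for this instruction")
    # Early-in detection works in 8-bit increments, with 8 bits as the minimum.
    rounded = max(8, -(-dividend_bits // 8) * 8)
    # One iteration per bit plus a fixed overhead of 3 cycles (from Table 1).
    return rounded + 3

print(divide_latency(20))        # rounds up to a 24-bit dividend
print(divide_latency(64, True))  # full-size 64-bit divide
```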
Table 1 lists the latencies (number of cycles until a result is
available) for the 5Kc core multiply and divide instructions.
The MIPS architecture defines that the results of a multiply
or divide operation be placed in the HI and LO registers.
Using the move-from-HI (MFHI) and move-from-LO
(MFLO) instructions, these values can be transferred to the
general purpose register file.
The 5Kc core implements an additional multiply
instruction, MUL, which specifies that multiply results be
placed in the general purpose register file instead of the HI/
LO register pair. This instruction avoids the explicit MFLO
instruction, normally required in order to use the results of
multiply operations.
Two other instructions, multiply-add (MADD) and
multiply-subtract (MSUB), are used to perform multiply-
accumulate operations. The MADD instruction multiplies
two numbers and then adds the product to the current
contents of the HI and LO registers. Similarly, the MSUB
instruction multiplies two operands and then subtracts the
product from the HI and LO registers. The MADD and
MSUB operations are commonly used in DSP algorithms.
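The multiply-accumulate behavior described above can be modeled in Python as a sketch. It assumes signed 32-bit operands and a 64-bit accumulator formed from the low 32 bits of HI and LO, which is the usual MIPS definition of MADD and MSUB; the class and function names are hypothetical and the model ignores timing.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF

def to_signed(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement number."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value

class HiLo:
    """Behavioral model of the HI/LO register pair for 32-bit operations."""

    def __init__(self):
        self.hi = 0
        self.lo = 0

    def value(self):
        """Return the 64-bit accumulator (HI:LO) as a signed integer."""
        return to_signed(((self.hi & MASK32) << 32) | (self.lo & MASK32), 64)

    def _store(self, acc):
        acc &= MASK64  # wrap to 64 bits, as a hardware accumulator would
        self.hi = to_signed(acc >> 32, 32)
        self.lo = to_signed(acc & MASK32, 32)

    def mult(self, rs, rt):
        self._store(to_signed(rs, 32) * to_signed(rt, 32))

    def madd(self, rs, rt):
        self._store(self.value() + to_signed(rs, 32) * to_signed(rt, 32))

    def msub(self, rs, rt):
        self._store(self.value() - to_signed(rs, 32) * to_signed(rt, 32))

def dot_product(xs, ys):
    """Accumulate sum(x * y) with one MADD per element pair."""
    acc = HiLo()
    for x, y in zip(xs, ys):
        acc.madd(x, y)
    return acc.value()

print(dot_product([1, 2, 3], [4, 5, 6]))
```

A dot product like this is a typical inner loop in the DSP algorithms mentioned above: each iteration needs only one MADD, and the result is read from HI and LO once at the end.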
The DMULT/DMULTU and DDIV/DDIVU instructions
are used to support 64-bit operands.
Exception Logic
The Exception block contains the logic for identifying and
managing exceptions. Exceptions can be caused by a
variety of sources, including boundary cases in data, TLB
misses, external events, or program errors.
Table 1 5Kc Core Integer Multiply/Divide Unit Latencies

Opcode                       Operand Size   Latency (cycles)
MULT/MULTU, MADD/MADDU,      16 bit         1
MSUB/MSUBU, DMULT/DMULTU     32 bit         2
                             64 bit         9
MUL                          16 bit         2
                             32 bit         3
DIV/DIVU, DDIV/DDIVU         8 bit          11
                             16 bit         19
                             24 bit         27
                             32 bit         35
DDIV/DDIVU                   40 bit         43
                             48 bit         51
                             56 bit         59
                             64 bit         67
Table 2 5Kc Core Exception Types

Exception                    Description
Reset                        Assertion of SI_ColdReset signal.
Soft Reset                   Assertion of SI_Reset signal.
DSS                          Debug Single Step.
DINT                         Debug Interrupt.
DDBLImpr                     Debug Data Break on Load Imprecise.
NMI                          Assertion of EB_NMI signal.
Cache Error - Data Access    A cache error occurred on a load or store data reference (imprecise).
Machine Check                TLB write that conflicts with an existing entry.
DBE                          Load or store bus error.
Interrupt                    Assertion of unmasked HW or SW interrupt signal.
Deferred Watch               Deferred Watch.
DIB                          Debug Instruction Hardware Break.