MIPS32™ 4KEp™ Processor Core Datasheet

November 8, 2002

MIPS32™ 4KEp™ Processor Core Datasheet, Revision 02.00

Document Number: MD00113

The MIPS32™ 4KEp™ core from MIPS® Technologies is a member of the MIPS32 4KE™ processor core family. It is a
high-performance, low-power, 32-bit MIPS RISC core designed for custom system-on-silicon applications. The core is
designed for semiconductor manufacturing companies, ASIC developers, and system OEMs who want to rapidly integrate
their own custom logic and peripherals with a high-performance RISC processor. It is highly portable across processes, and
can be easily integrated into full system-on-silicon designs, allowing developers to focus their attention on end-user
products. The 4KEp core is ideally positioned to support new products for emerging segments of the digital consumer,
network, systems, and information management markets, enabling new tailored solutions for embedded applications.

The 4KEp core implements the MIPS32 Release 2 Architecture with the MIPS16e™ ASE, and the 32-bit privileged
resource architecture. The Memory Management Unit (MMU) consists of a simple, Fixed Mapping Translation (FMT)
mechanism for applications that do not require the full capabilities of a Translation Lookaside Buffer- (TLB-) based MMU.

Instruction and data caches are fully configurable from 0 to 64 Kbytes in size. In addition, each cache can be organized as
direct-mapped or 2-way, 3-way, or 4-way set associative. Load and fetch cache misses block only until the critical word
becomes available. The pipeline resumes execution while the remaining words are being written to the cache. Both caches
are virtually indexed and physically tagged so that they can be accessed in the same clock cycle in which the address is translated.
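The relationship between cache size, associativity, and the virtual-address index can be sketched as follows. This is a minimal illustration, not a description of the core's internal logic: it uses the 16-byte line size given in the Features list, and it assumes that each way is a power-of-two size. The function and parameter names are hypothetical.

```python
LINE_BYTES = 16  # cache line size stated in the Features list


def cache_geometry(way_bytes: int, ways: int) -> dict:
    """Derive the set count and address-bit split for one cache.

    way_bytes: size of a single way in bytes (assumed to be a power of two).
    ways: associativity, from 1 (direct-mapped) to 4.
    """
    if ways not in (1, 2, 3, 4):
        raise ValueError("associativity must be 1, 2, 3, or 4")
    if way_bytes < LINE_BYTES or way_bytes & (way_bytes - 1):
        raise ValueError("way size must be a power of two of at least one line")
    sets = way_bytes // LINE_BYTES
    return {
        "total_bytes": way_bytes * ways,
        "sets": sets,
        "offset_bits": LINE_BYTES.bit_length() - 1,  # selects a byte in the line
        "index_bits": sets.bit_length() - 1,         # taken from the virtual address
    }


def split_address(vaddr: int, geometry: dict) -> tuple:
    """Return (set index, byte offset) selected by a virtual address."""
    offset = vaddr & ((1 << geometry["offset_bits"]) - 1)
    index = (vaddr >> geometry["offset_bits"]) & ((1 << geometry["index_bits"]) - 1)
    return index, offset
```

For example, `cache_geometry(4096, 4)` describes a 16-Kbyte, 4-way cache with 256 sets per way. Because the index comes from the virtual address, the set can be selected while the address is still being translated; the physical tag is compared afterward.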

An optional Enhanced JTAG (EJTAG) block allows for single-stepping of the processor as well as instruction and data
virtual address/value breakpoints. Additionally, real-time tracing of instruction program counter, data address, and data
values can be supported.

Figure 1 shows a block diagram of the 4KEp core. The core is divided into required and optional blocks as shown.

Figure 1 4KEp Core Block Diagram

[Block diagram. Labeled blocks: Execution Core (RF/ALU/Shift), MDU, System Coprocessor, MMU (FMT), Cache Controller, I-cache, D-cache, BIU, Power Mgmt, EJTAG (TAP, Trace), CP2, UDI. Labeled interfaces: Thin I/F to On-Chip Bus(es), Off-Chip Debug I/F, Off/On-Chip Trace I/F, On-Chip Coprocessor 2. Legend: Fixed/Required, Optional.]

Features

· 5-stage pipeline

· 32-bit Address and Data Paths

· MIPS32-Compatible Instruction Set
  - Multiply-Accumulate and Multiply-Subtract Instructions (MADD, MADDU, MSUB, MSUBU)
  - Targeted Multiply Instruction (MUL)
  - Zero/One Detect Instructions (CLZ, CLO)
  - Wait Instruction (WAIT)
  - Conditional Move Instructions (MOVZ, MOVN)
  - Prefetch Instruction (PREF)

· MIPS32 Enhanced Architecture (Release 2) Features
  - Vectored interrupts and support for external interrupt controller
  - Programmable exception vector base
  - Atomic interrupt enable/disable
  - GPR shadow registers (optionally, one or three additional shadows can be added to minimize latency for interrupt handlers)
  - Bit field manipulation instructions

· MIPS16e™ Code Compression
  - 16-bit encodings of 32-bit instructions to improve code density
  - Special PC-relative instructions for efficient loading of addresses and constants
  - SAVE & RESTORE macro instructions for setting up and tearing down stack frames within subroutines
  - Improved support for handling 8- and 16-bit datatypes

· Programmable Cache Sizes
  - Individually configurable instruction and data caches
  - Sizes from 0 to 64 KB
  - Direct Mapped, 2-, 3-, or 4-Way Set Associative
  - Loads block only until critical word is available
  - Write-back and write-through support
  - 16-byte cache line size
  - Virtually indexed, physically tagged
  - Cache line locking support
  - Non-blocking prefetches

· Scratchpad RAM Support
  - Can optionally replace 1 way of the I- and/or D-cache with a fast scratchpad RAM
  - Independent external pin interfaces for I- and D-scratchpads
  - 20 index address bits allow access of arrays up to 1 MB
  - Interface allows back-stalling the core

· MIPS32 Privileged Resource Architecture
  - Count/Compare registers for real-time timer interrupts
  - I and D watch registers for SW breakpoints

· Memory Management Unit
  - Simple Fixed Mapping Translation (FMT) mechanism

· Simple Bus Interface Unit (BIU)
  - All I/Os fully registered
  - Separate unidirectional 32-bit address and data buses
  - Two 16-byte collapsing write buffers
  - Designed to allow easy conversion to other bus protocols

· CorExtend™ User Defined Instruction Set Extensions (available in 4KEp Pro™ core)
  - Allows user to define and add instructions to the core at build time
  - Maintains full MIPS32 compatibility
  - Supported by industry standard development tools
  - Single or multi-cycle instructions
  - Separately licensed; a core with this feature is known as the 4KEp Pro™ core

· Multiply/Divide Unit
  - 32 clock latency on multiply
  - 34 clock latency on multiply-accumulate
  - 33-35 clock latency on divide (sign-dependent)

· Coprocessor 2 interface
  - 32-bit interface to an external coprocessor

· Power Control
  - Minimum frequency: 0 MHz
  - Power-down mode (triggered by WAIT instruction)
  - Support for software-controlled clock divider
  - Support for extensive use of local gated clocks

· EJTAG Debug
  - Support for single stepping
  - Virtual instruction and data address/value breakpoints
  - PC and data tracing
  - TAP controller is chainable for multi-CPU debug
  - Cross-CPU breakpoint support

· Testability
  - Full scan design achieves test coverage in excess of 99% (dependent on library and configuration options)
  - Optional memory BIST for internal SRAM arrays

Architecture Overview

The 4KEp core contains both required and optional blocks.
Required blocks are the lightly shaded areas of the block
diagram in Figure 1 and must be implemented to remain
MIPS-compliant. Optional blocks can be added to the
4KEp core based on the needs of the implementation.

The required blocks are as follows:

· Execution Unit

· Multiply/Divide Unit (MDU)

· System Control Coprocessor (CP0)

· Memory Management Unit (MMU)

· Fixed Mapping Translation (FMT)

· Cache Controllers

· Bus Interface Unit (BIU)

· Power Management

Optional blocks include:

· Instruction Cache

· Data Cache

· Scratchpad RAM interface

· Coprocessor 2 interface

· CorExtend™ User Defined Instruction (UDI) support

· MIPS16e support

· Enhanced JTAG (EJTAG) Controller

The section entitled "4KEp Core Required Logic Blocks"
discusses the required blocks. The section entitled
"4KEp Core Optional Logic Blocks" discusses the
optional blocks.

Pipeline Flow

The 4KEp core implements a 5-stage pipeline with
performance similar to the R3000 pipeline. The pipeline
allows the processor to achieve high frequency while
minimizing device complexity, reducing both cost and
power consumption.

The 4KEp core pipeline consists of five stages:

· Instruction (I Stage)

· Execution (E Stage)

· Memory (M Stage)

· Align (A Stage)

· Writeback (W Stage)

The 4KEp core implements a bypass mechanism that
allows the result of an operation to be forwarded directly to
the instruction that needs it without having to write the
result to the register and then read it back.
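The effect of the bypass mechanism on operand selection can be sketched with a small behavioral model. This is an illustration only: the `in_flight` list and the function name are hypothetical, and the sketch says nothing about how the core's bypass multiplexers are actually organized.

```python
def read_operand(reg, register_file, in_flight):
    """Return the value a consuming instruction would see for register `reg`.

    register_file: list of 32 committed register values.
    in_flight: (dest_reg, value) pairs for results that have been computed
        but not yet written back, ordered newest first (hypothetical structure).
    """
    if reg == 0:
        return 0  # MIPS register $0 always reads as zero
    for dest_reg, value in in_flight:
        if dest_reg == reg:
            return value  # forwarded result, no register-file round trip
    return register_file[reg]
```

With `in_flight = [(8, 5)]`, a read of register 8 returns the forwarded value 5 even though the register file still holds the older value.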

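Figure 2 shows a timing diagram of the 4KEp core pipeline.

Figure 2 4KEp Core Pipeline

[Timing diagram. Labels: I-Cache, I-A1, I-A2, I Dec, RegRd, ALU Op, D-Cache, D-AC, Align, RegW, Bypass, MUL, mul, div.]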

I Stage: Instruction Fetch

During the Instruction fetch stage:

· An instruction is fetched from instruction cache.

· MIPS16e instructions are expanded into MIPS32-like instructions.

E Stage: Execution

During the Execution stage:

· Operands are fetched from register file.

· The arithmetic logic unit (ALU) begins the arithmetic or logical operation for register-to-register instructions.

· The ALU calculates the data virtual address for load and store instructions.

· The ALU determines whether the branch condition is true and calculates the virtual branch target address for branch instructions.

· Instruction logic selects an instruction address.

· All multiply and divide operations begin in this stage.

M Stage: Memory Fetch

During the Memory fetch stage:

· The arithmetic ALU operation completes.

· The data cache access and the data virtual-to-physical address translation are performed for load and store instructions.

· Data cache look-up is performed and a hit/miss determination is made.

· A multiply operation stalls the MDU pipeline for 31 clocks in the M stage.

· A multiply-accumulate operation stalls the MDU pipeline for 33 clocks in the M stage.

· A divide operation stalls the MDU pipeline for 32-34 clocks in the M stage.

A Stage: Align

During the Align stage:

· Load data is aligned to its word boundary.

· A multiply/divide operation updates the HI/LO registers.

· A MUL operation makes the result available for writeback. The actual register writeback is performed in the W stage.

W Stage: Writeback

During the Writeback stage:

· For register-to-register or load instructions, the instruction result is written back to the register file.
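The stage sequence described above can be visualized with a short sketch that prints which stage each instruction occupies in each clock. It assumes an ideal, stall-free flow with one instruction entering the I stage per clock; cache misses and MDU stalls are deliberately ignored, and the function name is hypothetical.

```python
STAGES = ["I", "E", "M", "A", "W"]  # stage names from the pipeline description


def pipeline_diagram(instructions):
    """Return one text row per instruction for an ideal, stall-free pipeline."""
    total_cycles = len(instructions) + len(STAGES) - 1
    rows = []
    for issue_cycle, name in enumerate(instructions):
        cells = ["."] * total_cycles
        for offset, stage in enumerate(STAGES):
            cells[issue_cycle + offset] = stage  # one stage per clock
        rows.append(f"{name:<6}" + " ".join(cells))
    return rows


if __name__ == "__main__":
    for row in pipeline_diagram(["addu", "lw", "subu"]):
        print(row)
```

Each row is shifted one clock to the right of the previous one, so three instructions complete in seven clocks under these assumptions.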

4KEp Core Required Logic Blocks

The 4KEp core consists of the following required logic
blocks, shown in Figure 1. These logic blocks are defined
in the following subsections:

· Execution Unit

· Multiply/Divide Unit (MDU)

· System Control Coprocessor (CP0)

· Memory Management Unit (MMU)

· Fixed Mapping Translation (FMT)

· Cache Controller

· Bus Interface Unit (BIU)

· Power Management

Execution Unit

The 4KEp core execution unit implements a load/store
architecture with single-cycle ALU operations (logical,
shift, add, subtract) and an autonomous multiply/divide
unit. The 4KEp core contains thirty-two 32-bit general-purpose
registers used for integer operations and address
calculation. Optionally, one or three additional register file
shadow sets (each containing thirty-two registers) can be
added to minimize context switching overhead during
interrupt/exception processing. The register file consists of
two read ports and one write port and is fully bypassed to
minimize operation latency in the pipeline.

The execution unit includes:

· 32-bit adder used for calculating the data address

· Address unit for calculating the next instruction address

· Logic for branch determination and branch target address calculation

· Load aligner

· Bypass multiplexers used to avoid stalls when executing instruction streams where data-producing instructions are followed closely by consumers of their results

· Leading Zero/One detect unit for implementing the CLZ and CLO instructions

· Arithmetic Logic Unit (ALU) for performing bitwise logical operations

· Shifter & Store Aligner

Multiply/Divide Unit (MDU)

The 4KEp core includes a multiply/divide unit (MDU) that
contains a separate pipeline for multiply and divide
operations. This pipeline operates in parallel with the
integer unit (IU) pipeline and does not stall when the IU
pipeline stalls. This allows the long-running MDU
operations to be partially masked by system stalls and/or
other integer unit instructions.

Multiply and divide operations are implemented with a
simple iterative algorithm that processes 1 bit per clock.
Any attempt to issue a subsequent MDU instruction while a
multiply/divide is still active causes an MDU pipeline stall
until the operation is completed.
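As a rough illustration of a 1-bit-per-clock iterative algorithm, the sketch below performs an unsigned 32 x 32 multiply with one shift-and-add step per multiplier bit. Shift-and-add is an assumption chosen for illustration; the datasheet does not describe the actual hardware algorithm. The 32 iterations are merely consistent with the 32-clock multiply latency in the Features list. All names are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def multu_iterative(rs: int, rt: int):
    """Unsigned 32x32 multiply, one shift-and-add step per multiplier bit.

    Returns (hi, lo, steps). Each loop iteration stands in for one clock.
    """
    multiplicand = rs & MASK32
    multiplier = rt & MASK32
    product = 0
    steps = 0
    for bit in range(32):
        if (multiplier >> bit) & 1:
            product += multiplicand << bit  # add the shifted partial product
        steps += 1
    return (product >> 32) & MASK32, product & MASK32, steps
```

The 64-bit product is returned as two 32-bit halves, mirroring the HI and LO registers that receive the result of a multiply.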

Table 1 lists the latency (number of cycles until a result is
available) for the 4KEp core multiply and divide
instructions. The latencies are listed in terms of pipeline
clocks.

Table 1 4KEp Core Area-Efficient Integer Multiply/Divide Unit Operation Latencies

Opcode                        Operand Sign    Latency
MUL, MULT, MULTU              any             32
MADD, MADDU, MSUB, MSUBU      any             34

The MIPS architecture defines that the results of a multiply
or divide operation be placed in the HI and LO registers.
Using the move-from-HI (MFHI) and move-from-LO
(MFLO) instructions, these values can be transferred to the
general-purpose register file.

In addition to the HI/LO targeted operations, the MIPS32
architecture also defines a multiply instruction, MUL,
which places the least significant 32 bits of the result in the
primary register file instead of the HI/LO register pair.

Two other instructions, multiply-add (MADD) and
multiply-subtract (MSUB), are used to perform the
multiply-accumulate and multiply-subtract operations,
respectively. The MADD instruction multiplies two
numbers and then adds the product to the current contents
of the HI and LO registers. Similarly, the MSUB
instruction multiplies two operands and then subtracts the
product from the HI and LO registers. The MADD and
MSUB operations are commonly used in DSP algorithms.
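The accumulate semantics described above can be sketched with a small model of the HI/LO register pair treated as one 64-bit accumulator. This is a behavioral illustration of the signed variants only (MADDU and MSUBU are not modeled); the class and method names are hypothetical.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF


def to_signed32(value: int) -> int:
    """Interpret the low 32 bits of value as a two's-complement integer."""
    value &= MASK32
    return value - (1 << 32) if value & 0x80000000 else value


class HiLo:
    """HI/LO register pair modeled as a 64-bit accumulator."""

    def __init__(self):
        self.hi = 0
        self.lo = 0

    def _acc(self) -> int:
        return (self.hi << 32) | self.lo

    def _store(self, acc: int) -> None:
        acc &= MASK64  # wrap to 64 bits, two's complement
        self.hi = acc >> 32
        self.lo = acc & MASK32

    def mult(self, rs: int, rt: int) -> None:
        self._store(to_signed32(rs) * to_signed32(rt))

    def madd(self, rs: int, rt: int) -> None:
        self._store(self._acc() + to_signed32(rs) * to_signed32(rt))

    def msub(self, rs: int, rt: int) -> None:
        self._store(self._acc() - to_signed32(rs) * to_signed32(rt))

    def mfhi(self) -> int:
        return self.hi

    def mflo(self) -> int:
        return self.lo


def mul(rs: int, rt: int) -> int:
    """MUL: low 32 bits of the product, returned for a general-purpose register."""
    return (to_signed32(rs) * to_signed32(rt)) & MASK32
```

A dot product, typical of the DSP use mentioned above, would start with `mult` on the first pair of operands, apply `madd` to each remaining pair, and then read the sum with `mflo` and `mfhi`.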

System Control Coprocessor (CP0)

In the MIPS architecture, CP0 is responsible for the virtual-
to-physical address translation and cache protocols, the
exception control system, the processor's diagnostics
capability, the operating modes (kernel, user, and debug),
and whether interrupts are enabled or disabled.
Configuration information, such as cache size and set
associativity, is also available by accessing the CP0
registers, listed in Table 2.
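As an example of reading configuration information from CP0, the sketch below decodes the cache fields of a Config1 register value. The field positions and encodings used here follow the MIPS32 privileged resource architecture as generally documented; they are not stated in this datasheet and should be checked against the core's software user's manual. The function name is hypothetical, and obtaining the register value (for example with an MFC0 instruction) is outside the sketch.

```python
def decode_cache_fields(config1: int) -> dict:
    """Decode I-cache and D-cache geometry from a Config1 value.

    Assumed layout (not taken from this datasheet): IS[24:22], IL[21:19],
    IA[18:16] for the instruction cache; DS[15:13], DL[12:10], DA[9:7] for
    the data cache.
    """

    def one_cache(sets_field: int, line_field: int, assoc_field: int) -> dict:
        if line_field == 0:
            return {"present": False, "bytes": 0}  # line size 0 means no cache
        sets = 64 << sets_field        # sets per way
        line_bytes = 2 << line_field   # bytes per line
        ways = assoc_field + 1         # associativity
        return {
            "present": True,
            "sets": sets,
            "line_bytes": line_bytes,
            "ways": ways,
            "bytes": sets * line_bytes * ways,
        }

    return {
        "icache": one_cache((config1 >> 22) & 7, (config1 >> 19) & 7, (config1 >> 16) & 7),
        "dcache": one_cache((config1 >> 13) & 7, (config1 >> 10) & 7, (config1 >> 7) & 7),
    }
```

Under these assumptions, a value with IS = 2, IL = 3, and IA = 3 decodes to 256 sets of 16-byte lines in 4 ways, that is, a 16-Kbyte instruction cache.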

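Table 1 4KEp Core Area-Efficient Integer Multiply/Divide Unit Operation Latencies (continued)

Opcode                        Operand Sign    Latency
DIVU                          any             33
DIV                           pos/pos         33
                              any/neg         34
                              neg/pos         35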

Table 2 Coprocessor 0 Registers in Numerical Order

Register Number   Register Name    Function
0-6               Reserved         Reserved in the 4KEp core.
7                 HWREna           Enables access via the RDHWR instruction to selected hardware registers.
8                 BadVAddr         Reports the address for the most recent address-related exception.
9                 Count            Processor cycle count.
10                Reserved         Reserved in the 4KEp core.
11                Compare          Timer interrupt control.
12                Status           Processor status and control.
12                IntCtl           Interrupt system status and control.
12                SRSCtl           Shadow register set status and control.
12                SRSMap           Provides mapping from vectored interrupt to a shadow set.
13                Cause            Cause of last general exception.
14                EPC              Program counter at last exception.
15                PRId             Processor identification and revision.
15                EBase            Exception vector base register.
16                Config           Configuration register.
16                Config1          Configuration register 1.
16                Config2          Configuration register 2.
16                Config3          Configuration register 3.
17                LLAddr           Load linked address.
18                WatchLo          Low-order watchpoint address.
19                WatchHi          High-order watchpoint address.
20-22             Reserved         Reserved in the 4KEp core.
23                Debug            Debug control and exception status.
23                TraceControl     PC/Data trace control register.
23                TraceControl2    Additional PC/Data trace control.
23                UserTraceData    User Trace control register.
23                TraceBPC         Trace breakpoint control.
24                DEPC             Program counter at last debug exception.
