Overriding concern: performance
Execution Time = # instructions * CPI * CycleTime
# instructions depends on instruction set and compiler
CPI and clock cycle time depend on implementation of processor
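As a quick illustration of the execution-time equation above, the sketch below multiplies the three factors. The function name and the sample numbers (one million instructions, an 8 ns single-cycle clock, a 2 ns clock with CPI 4) are assumptions chosen for illustration, not values taken from these notes.

```python
def execution_time_ns(instruction_count, cpi, cycle_time_ns):
    """Execution time = # instructions * CPI * cycle time (result in ns)."""
    return instruction_count * cpi * cycle_time_ns


# Hypothetical design A: CPI of 1 with a long 8 ns clock cycle.
design_a = execution_time_ns(1_000_000, 1, 8)
# Hypothetical design B: CPI of 4 with a short 2 ns clock cycle.
design_b = execution_time_ns(1_000_000, 4, 2)
print(design_a, design_b)
```

With these made-up numbers the two designs tie, which shows why CPI and cycle time have to be considered together rather than one at a time.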
Implement subset of MIPS: memory reference, arithmetic-logical and beq/jump
· Understand the general process for designing the datapath (part of processor that does arithmetic) and control (part of processor that commands the datapath, memory, and I/O devices according to program instructions)
· Understand connection between instruction set and implementation
· Understand how implementation choices affect clock rate and CPI
· Identify steps to be taken for each type of instruction
· Consider the hardware requirements for each step
· Connect the hardware with appropriate controls to achieve desired result
Figure 5.1
Working with bits: on/off values
Implement in electronics: high/low
Asserted - signal that is logically high. Assert - make high.
Two types of logic elements: combinational and state
Combinational: outputs depend only on current inputs (ALU)
State/Sequential: has some internal storage (memory, registers). Output depends on both current inputs and internal state.
Clocking methodology determines when signals can be read/written.
Edge-triggered - update only on clock edge
Figure 5.2
Initial implementation - 1 long clock cycle.
Refinement - multiple clock cycles/instruction
First step for every instruction type is to fetch the next instruction to be executed. The program instructions are stored in memory. A register called the program counter (PC) keeps track of the address of the current instruction in memory.
Figure 5.5
ALU - performs operation after values read from registers.
R-format specifies 3 registers (two source, one destination). Register file has two outputs (contents of two specified sources) and four inputs (3 register numbers plus data for write operation). Also has clock signal to control timing of write.
Figure 5.7
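The register file interface described above (two read outputs, three register-number inputs, one write-data input, and a clocked write) might be modeled roughly as follows. The class and method names are hypothetical, and the RegWrite enable is borrowed from the control discussion later in these notes.

```python
class RegisterFile:
    """32 registers with two read ports and one clocked write port."""

    def __init__(self):
        self.regs = [0] * 32

    def read(self, read_reg1, read_reg2):
        # Two outputs: the contents of the two specified source registers.
        return self.regs[read_reg1], self.regs[read_reg2]

    def clock_edge(self, write_reg, write_data, reg_write):
        # The write takes effect only on the clock edge and only when
        # RegWrite is asserted. Register 0 always reads as zero in MIPS.
        if reg_write and write_reg != 0:
            self.regs[write_reg] = write_data & 0xFFFFFFFF


rf = RegisterFile()
rf.clock_edge(write_reg=10, write_data=7, reg_write=True)   # $t2 = 7
rf.clock_edge(write_reg=11, write_data=5, reg_write=True)   # $t3 = 5
a, b = rf.read(10, 11)                                       # read both sources
rf.clock_edge(write_reg=9, write_data=a + b, reg_write=True) # add $t1, $t2, $t3
print(rf.regs[9])
```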
lw $t1, offset($base)
sw $t1, offset($base)
ALU - computes memory address
Data memory - reads/writes specified memory location
Sign extend - needed for 16-bit offset field
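A minimal sketch of the address calculation for lw and sw, assuming 32-bit addresses: the 16-bit offset field is sign-extended and added to the base register value. The helper names are hypothetical.

```python
def sign_extend16(value):
    """Interpret the low 16 bits of value as a signed two's-complement number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def effective_address(base_value, offset_field):
    """Address = base register + sign-extended offset, wrapped to 32 bits."""
    return (base_value + sign_extend16(offset_field)) & 0xFFFFFFFF


print(hex(effective_address(0x1000, 0x0008)))  # positive offset of 8
print(hex(effective_address(0x1000, 0xFFFC)))  # 0xFFFC encodes an offset of -4
```

Without the sign extension, the second case would add 65532 to the base instead of subtracting 4.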
beq $t1, $t2, offset
ALU - compares the two registers; a separate adder computes the branch target address, using PC + 4 and offset
Sign extend - needed for 16-bit offset field. Also need to shift left 2 bits (the offset field counts words, so it must be converted to a byte offset).
Branch control - determines whether branch is actually taken
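The branch pieces listed above can be sketched together: sign-extend the offset, shift it left by 2, add it to PC + 4, and use the comparison result to pick the next PC. Function and parameter names here are hypothetical.

```python
def sign_extend16(value):
    """Interpret the low 16 bits of value as a signed two's-complement number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def next_pc(pc, offset_field, is_beq, reg_a, reg_b):
    """Return the address of the next instruction to fetch."""
    pc_plus_4 = (pc + 4) & 0xFFFFFFFF
    # The offset is in words, so shift left by 2 to get a byte offset.
    target = (pc_plus_4 + (sign_extend16(offset_field) << 2)) & 0xFFFFFFFF
    # Comparison is done by subtracting and checking for a zero result.
    zero = ((reg_a - reg_b) & 0xFFFFFFFF) == 0
    return target if (is_beq and zero) else pc_plus_4


print(hex(next_pc(0x400000, 3, True, 5, 5)))  # registers equal: branch taken
print(hex(next_pc(0x400000, 3, True, 5, 6)))  # registers differ: fall through
```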
Simplest: execute all instructions in one clock cycle
Restriction: no datapath element can be used more than once per instruction. Must duplicate any resources needed more than once. Examples: have separate memory for instructions vs. data. Have adder for updating PC that is separate from the main ALU.
Requirement: allow input to datapath elements to come from different sources. Examples: ALU may add two register values (R-type instruction) or may add a register and a sign-extended portion of the instruction itself (I-type instruction). Value to be written into a register may come from the ALU (R-type result) or memory (load). Solution: use multiplexors with appropriate control signals.
Datapath for memory and arithmetic-logical instructions:
Figure 5.12
Exercise 1 - handout in class
Additional requirements to handle branch instructions:
Need adder for computing branch target address (ALU is used for branch comparison)
Need to select between branch target address and normal sequential address (PC + 4)
Resulting datapath:
Figure 5.13
Selection of registers is determined by simply decoding the instruction and using the appropriate fields to select registers from the register file.
Based on the instruction being executed, the control unit must then determine:
· ALU control - what operation to be performed by the ALU
· Multiplexor controls - what signal to select for each mux
· Write signals needed for state elements (memory, register file)
Five possible functions for ALU: see table on page 353
Strategy: multiple levels of decoding.
1. Main control unit determines whether ALU control is based on the function field of the instruction (for R-type instructions) OR is an add (for memory access instructions) OR is a subtract (for beq instructions). This portion of the control is called ALUOp. Values generated are 00 (for memory access), 01 (for beq) or 10 (use funct field).
2. ALU control unit generates actual signals to ALU, to match table above.
First step in designing ALU control: create a truth table of ALUOp/funct field combinations and the resulting ALU control signals.
Figure 5.14 - original table with all possibilities
Figure 5.15 - revised table with additional don't cares, showing only entries that must be asserted.
Optimizing the truth table and converting to hardware gates is a mechanical process that is best left to a computer program.
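The two-level decoding described above could be sketched as a small lookup. The ALUOp values (00, 01, 10) come from these notes; the 3-bit ALU control codes and the funct encodings are assumptions based on the usual textbook and MIPS values, since the referenced table is not reproduced here.

```python
# Assumed 3-bit ALU control codes (the five ALU functions).
ALU_AND, ALU_OR, ALU_ADD, ALU_SUB, ALU_SLT = 0b000, 0b001, 0b010, 0b110, 0b111

# Assumed funct-field encodings for the R-type instructions in the subset.
FUNCT_TO_ALU = {
    0b100000: ALU_ADD,  # add
    0b100010: ALU_SUB,  # sub
    0b100100: ALU_AND,  # and
    0b100101: ALU_OR,   # or
    0b101010: ALU_SLT,  # slt
}


def alu_control(alu_op, funct):
    """Second-level decoder: ALUOp plus funct field -> ALU control code."""
    if alu_op == 0b00:   # memory access: always add
        return ALU_ADD
    if alu_op == 0b01:   # beq: always subtract
        return ALU_SUB
    if alu_op == 0b10:   # R-type: operation chosen by the funct field
        return FUNCT_TO_ALU[funct]
    raise ValueError("ALUOp value not used by this subset")


print(bin(alu_control(0b10, 0b100010)))  # R-type sub
```

Note that funct is ignored whenever ALUOp is 00 or 01, which is where the don't-care entries in the truth table come from.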
ALUSrc - determines second operand for ALU. Will be either a register (R-type) or the sign-extended immediate field (memory access).
PCSrc - determines what address to use for next instruction (what address to load into PC). Will be either PC+4 (normal operation; next instruction) or PC+4+[sign-extended, left-shifted] immediate field (branch target).
MemtoReg - determines what value is written into the register file. Will be either the output of the ALU (R-type) or data read from memory (load).
RegDst (not in Figure 5.13) - determines what register to write. Will be specified by either the rt field (load) or rd field (R-type).
Most of these decisions can be based solely on the opcode field of the instruction. PCSrc is based on both the opcode AND the result of the compare.
RegWrite - assert for R-type and load instructions
MemRead - assert to read data from memory for load instructions
MemWrite - assert to write data to memory during store instructions
These signals are also based on the opcode field of the instruction.
Figure 5.20
Programmable Logic Array - array of AND gates followed by array of OR gates. Inputs to AND gates are function inputs/inverses. Inputs to OR gates are outputs of ANDs.
Figure C.5
Figure 5.19
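To illustrate the AND-array/OR-array structure described above, here is a minimal software model of a PLA. The data layout (product terms as dictionaries, outputs as lists of product-term indices) and the two-input example are assumptions made for this sketch, not a description of the PLA in the figures.

```python
def pla(inputs, product_terms, output_terms):
    """Evaluate a two-level AND/OR array.

    product_terms: one dict per AND gate, mapping input index -> required bit
                   (1 uses the input, 0 uses its inverse); inputs that are
                   not listed are don't-cares for that gate.
    output_terms:  one list per output, naming the AND gates that feed its OR.
    """
    and_plane = [all(inputs[i] == bit for i, bit in term.items())
                 for term in product_terms]
    return [int(any(and_plane[t] for t in terms)) for terms in output_terms]


# Hypothetical example with inputs (a, b): output 0 = a XOR b, output 1 = a AND b.
products = [{0: 1, 1: 0}, {0: 0, 1: 1}, {0: 1, 1: 1}]
outputs = [[0, 1], [2]]
print(pla([1, 0], products, outputs))
print(pla([1, 1], products, outputs))
```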
Exercise 2 - handout in class
Four steps to execute an R-type instruction (Ex: add $t1, $t2, $t3)
1. Fetch instruction from memory, increment PC.
2. Read source registers from register file
3. ALU performs operation (based on funct field)
4. Result is written into register file
Five steps to execute memory access (I-type) instruction (Ex: lw $t1, offset($t2))
1. Fetch instruction from memory, increment PC.
2. Read base register from register file
3. ALU computes sum of base + sign-extended offset
4. Sum is used as address for data memory
5. Data from memory is written into register file
Four steps to execute branch (I-type). (Ex: beq $t1, $t2, offset)
1. Fetch instruction from memory, increment PC.
2. Read compare registers from register file.
3. ALU performs subtraction. PC+4 is added to sign-extended, left-shifted offset to calculate target branch address.
4. Zero result from ALU is used to decide which result to store in PC.
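The three step lists above can be walked through in a small behavioral sketch. To keep it short, instructions are represented as already-decoded dictionaries rather than 32-bit words; that representation, the `step` function, and the tiny test program are all assumptions for illustration.

```python
def sign_extend16(value):
    """Interpret the low 16 bits of value as a signed two's-complement number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def step(state, imem):
    """Execute one instruction; state holds 'pc', 'regs' (list), 'mem' (dict)."""
    inst = imem[state["pc"]]                      # 1. fetch instruction
    pc_plus_4 = state["pc"] + 4                   #    and increment PC
    regs, mem = state["regs"], state["mem"]
    a, b = regs[inst["rs"]], regs[inst["rt"]]     # 2. read registers
    next_pc = pc_plus_4
    if inst["op"] == "add":
        regs[inst["rd"]] = (a + b) & 0xFFFFFFFF   # 3. ALU op, 4. write result
    elif inst["op"] == "lw":
        addr = (a + sign_extend16(inst["imm"])) & 0xFFFFFFFF  # 3. base + offset
        regs[inst["rt"]] = mem.get(addr, 0)       # 4. read memory, 5. write reg
    elif inst["op"] == "beq":
        zero = ((a - b) & 0xFFFFFFFF) == 0        # 3. subtract to compare
        target = pc_plus_4 + (sign_extend16(inst["imm"]) << 2)
        next_pc = target if zero else pc_plus_4   # 4. Zero selects the new PC
    state["pc"] = next_pc


imem = {
    0:  {"op": "lw",  "rs": 0,  "rt": 10, "imm": 0x100},   # $t2 = mem[0x100]
    4:  {"op": "lw",  "rs": 0,  "rt": 11, "imm": 0x104},   # $t3 = mem[0x104]
    8:  {"op": "add", "rs": 10, "rt": 11, "rd": 9},        # $t1 = $t2 + $t3
    12: {"op": "beq", "rs": 9,  "rt": 9,  "imm": 0xFFFC},  # always taken, back to 0
}
state = {"pc": 0, "regs": [0] * 32, "mem": {0x100: 7, 0x104: 5}}
for _ in range(4):
    step(state, imem)
print(state["regs"][9], state["pc"])
```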
To complete the control function, need to create a truth table for each output.
See table on page 367 and Figure 5.27
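One way to picture that truth table is as one row of control values per opcode. The sketch below assumes the standard MIPS opcode encodings and the usual signal polarities (RegDst 1 selects rd, ALUSrc 1 selects the immediate, MemtoReg 1 selects memory data); neither is spelled out in these notes. The `Branch` signal name is also an assumption: it stands for the opcode-only part of PCSrc, which is then combined with the ALU Zero output.

```python
def row(reg_dst, alu_src, mem_to_reg, reg_write, mem_read, mem_write,
        branch, alu_op):
    """Bundle one truth-table row; None marks a don't-care."""
    return dict(RegDst=reg_dst, ALUSrc=alu_src, MemtoReg=mem_to_reg,
                RegWrite=reg_write, MemRead=mem_read, MemWrite=mem_write,
                Branch=branch, ALUOp=alu_op)


# Keys are opcodes (assumed standard MIPS encodings).
CONTROL = {
    0b000000: row(1,    0, 0,    1, 0, 0, 0, 0b10),  # R-type
    0b100011: row(0,    1, 1,    1, 1, 0, 0, 0b00),  # lw
    0b101011: row(None, 1, None, 0, 0, 1, 0, 0b00),  # sw
    0b000100: row(None, 0, None, 0, 0, 0, 1, 0b01),  # beq
}


def pc_src(opcode, alu_zero):
    """PCSrc depends on both the opcode and the result of the compare."""
    return int(CONTROL[opcode]["Branch"] == 1 and bool(alu_zero))


print(CONTROL[0b100011])
print(pc_src(0b000100, True), pc_src(0b000100, False))
```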
Jumps add another option for the PC: jump address formed by concatenating the upper 4 bits of PC + 4 with the 26-bit field from the instruction (left shifted by 2). Requires another mux and shifter.
Figure 5.29
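A short sketch of how the jump address could be assembled, assuming the upper 4 bits are taken from PC + 4 as in the standard MIPS definition. The function name is hypothetical.

```python
def jump_address(pc, target_field):
    """Combine the top 4 bits of PC + 4 with the 26-bit field shifted left 2."""
    pc_plus_4 = (pc + 4) & 0xFFFFFFFF
    upper = pc_plus_4 & 0xF0000000                # bits 31..28 of PC + 4
    lower = (target_field & 0x03FFFFFF) << 2      # 26-bit field -> 28 bits
    return upper | lower


print(hex(jump_address(0x00400000, 0x0100004)))
```

The bits are placed side by side rather than summed, so no carry can ripple from the instruction field into the upper 4 bits.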
Clock cycle has same length for every instruction. CPI is 1.
BUT, clock cycle is determined by longest possible path, so the clock cycle must be set to the slowest instruction - in this case a load instruction.
ALU and adders - 2 ns
Register file access - 1 ns
Required length of various instructions:
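As a rough sketch of how those lengths could be tallied, the code below sums the latency of each major unit an instruction class uses. The ALU and register file latencies are the ones listed above; the 2 ns memory access time and the per-class breakdown of units are assumptions, since the notes do not state them.

```python
# Latencies in ns. MEMORY is an assumed value; ALU and REGFILE are from the notes.
MEMORY, ALU, REGFILE = 2, 2, 1

# Assumed sequence of major units used by each instruction class.
PATHS = {
    "R-type": [MEMORY, REGFILE, ALU, REGFILE],          # fetch, read, ALU, write
    "load":   [MEMORY, REGFILE, ALU, MEMORY, REGFILE],  # adds a data memory read
    "store":  [MEMORY, REGFILE, ALU, MEMORY],           # no register write
    "branch": [MEMORY, REGFILE, ALU],                   # compare only
    "jump":   [MEMORY],                                 # fetch only
}

lengths = {name: sum(path) for name, path in PATHS.items()}
clock_cycle = max(lengths.values())   # single-cycle clock must fit the slowest
slowest = max(lengths, key=lengths.get)
print(lengths, clock_cycle, slowest)
```

Under these assumptions the load is the longest path, consistent with the statement above that the clock cycle is set by the load instruction.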
In a multicycle implementation, each step in the execution will take 1 clock cycle. Different instructions have different number of steps (and therefore different number of clock cycles). Functional units may be used more than once per instruction. Reduces potentially costly elements such as memory or ALUs, but sometimes requires the addition of storage units to preserve results for use later in an instruction. Registers and multiplexors are smaller and cheaper.
Differences between single- and multi-cycle implementations:
· Single memory can be used for both instructions and data
· There is a single ALU, rather than an ALU and two adders
· Registers are added after every major functional unit. These store data that will be used later in the same instruction. Data that will be used by subsequent instructions is stored into one of the major functional units (register file, PC, memory).
Specific registers added to implementation:
· Instruction Register (IR) and Memory Data Register (MDR) save output of memory.
· A and B registers hold values read from register file.
· ALUOut holds the output of the ALU.
Additional multiplexors:
· ALU Input 1 - may be either a register or the PC
· ALU Input 2 - may be a register, the immediate field (for memory access), the constant 4 (for updating the PC) or the sign-extended shifted offset field (for branches).
Additional control signals:
· PCWrite and PCWriteCond - to write PC unconditionally (in normal operation) and conditionally (during branch)
Figure 5.33
Goal: Balance the amount of work to minimize the clock cycle time
Strategy: Break execution of an instruction into steps, with each step taking one clock cycle. To obtain roughly even step sizes, restrict each step to contain at most one major operation (ALU operation, register file access, memory access). Clock cycle is then as short as the longest of these operations.
NOTE: Register file access has enough overhead to be considered a separate operation, but just storing a result into a single stand-alone register is short enough to be part of a step.
Exercise 4 - handout in class
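To make the role of the added registers (IR, A, B, ALUOut, MDR) more concrete, here is a sketch that steps a load through five one-cycle steps, with at most one major operation per step. The instruction is represented as an already-decoded dictionary and a single Python dict stands in for the shared memory; both are simplifications assumed for this example.

```python
def sign_extend16(value):
    """Interpret the low 16 bits of value as a signed two's-complement number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def run_lw(pc, regs, memory):
    """Step a decoded lw through five cycles; returns (new pc, cycle count)."""
    # Cycle 1: memory access. The fetched instruction is saved in IR,
    # and the ALU is used to compute PC + 4.
    ir = memory[pc]
    pc = pc + 4
    # Cycle 2: register file access. A and B latch the two read outputs.
    a, b = regs[ir["rs"]], regs[ir["rt"]]
    # Cycle 3: ALU operation. The same ALU computes the address into ALUOut.
    alu_out = (a + sign_extend16(ir["imm"])) & 0xFFFFFFFF
    # Cycle 4: memory access. The same memory is read for data, saved in MDR.
    mdr = memory[alu_out]
    # Cycle 5: register file access. MDR is written to the destination.
    regs[ir["rt"]] = mdr
    return pc, 5


regs = [0] * 32
regs[10] = 0x100                                     # base register $t2
memory = {0: {"rs": 10, "rt": 9, "imm": 8},          # lw $t1, 8($t2)
          0x108: 42}                                 # data word
pc, cycles = run_lw(0, regs, memory)
print(regs[9], pc, cycles)
```

The memory and the ALU are each used twice during the one instruction, which is exactly what the single-cycle restriction ruled out.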
Control must specify both the signals required for that step and what step to take next in the sequence.
Two techniques/representations for control: finite state machines and microprogramming. The actual hardware implementation can be synthesized from either of these formats by an automated CAD system (Appendix C).
Finite state machine - set of states and directions on how to change states [state = step]. Next state function maps the current state and inputs to a new state. Each state specifies outputs [control signals] that are asserted during that state. Implementation assumes that all signals not explicitly asserted are deasserted (i.e., not don't cares). Finite state control essentially corresponds to five steps of execution. Each state is represented by a circle. Outputs for that state are listed within the circle. Arcs between states are labeled with conditions.
Figure 5.36
Figure 5.42
Graphical representations (i.e., finite state machines) can be unwieldy with larger, more varied instruction sets. As an alternative, think about establishing a set of rules or instructions for determining what control signals to assert and in what order. These low-level control instructions are called microinstructions. The process of designing the set of control instructions is called microprogramming.
Remember: a microprogram is a symbolic representation of what must happen. The underlying hardware will still be implemented with gates, ROMs or PLAs.
The goal is to develop a format that makes it easy to write and understand the microprogram. It should also be difficult to write inconsistent microinstructions (two different values for same control signal).
Each microinstruction contains a set of fields. Each field specifies a nonoverlapping set of control signals OR is like a directive that provides guidance on how to interpret the microprogram.
Figure 5.45, 5.46
Read Instructions
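The finite state machine view of control described above can be sketched as a table that gives, for each state, the signals asserted and the next-state function. The state names, the grouping of states, and the `IRWrite` signal are assumptions for this sketch, and only read/write enables are listed (mux selects and ALUOp are omitted for brevity).

```python
# state -> (signals asserted in that state, {instruction class: next state})
# "*" means the transition does not depend on the instruction class.
FSM = {
    "fetch":      (["MemRead", "IRWrite", "PCWrite"], {"*": "decode"}),
    "decode":     ([], {"lw": "mem_addr", "sw": "mem_addr", "R": "execute",
                        "beq": "branch", "j": "jump"}),
    "mem_addr":   ([], {"lw": "mem_read", "sw": "mem_write"}),
    "mem_read":   (["MemRead"], {"*": "write_back"}),
    "write_back": (["RegWrite"], {"*": "fetch"}),
    "mem_write":  (["MemWrite"], {"*": "fetch"}),
    "execute":    ([], {"*": "r_complete"}),
    "r_complete": (["RegWrite"], {"*": "fetch"}),
    "branch":     (["PCWriteCond"], {"*": "fetch"}),
    "jump":       (["PCWrite"], {"*": "fetch"}),
}


def next_state(state, inst_class):
    """Next-state function: maps current state and input to a new state."""
    transitions = FSM[state][1]
    return transitions.get(inst_class, transitions.get("*"))


def cycles(inst_class):
    """Count the states visited from fetch until control returns to fetch."""
    state, count = "fetch", 0
    while True:
        count += 1
        state = next_state(state, inst_class)
        if state == "fetch":
            return count


print({c: cycles(c) for c in ["lw", "sw", "R", "beq", "j"]})
```

Any signal not listed for a state is treated as deasserted, matching the convention stated above.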
Assume (based on gcc instruction mix):
loads: 5 cycles, 23%
stores: 4 cycles, 13%
ALU: 4 cycles, 43%
Branches: 3 cycles, 19%
Jumps: 3 cycles, 2%
CPI = 0.23 * 5 + 0.13 * 4 + 0.43 * 4 + 0.19 * 3 + 0.02 * 3 = 4.02
One of the hardest parts of control is to implement exceptions and interrupts - events that change the normal flow of execution. It's much harder to add exception handling to a design later than to plan for it from the start.
External events - I/O device request - interrupt
Invoke operating system, arithmetic overflow, divide by 0, undefined instruction - internal - exception
Hardware malfunction - could be either
Two ways to communicate the reason for an exception: status register (Cause register) has field that indicates reason, or vectored interrupt - address of interrupt service routine (ISR) depends on cause of exception.
MIPS architecture includes EPC (address of instruction that caused interrupt; subtract 4 from PC) and Cause register that records reason.
Figure 5.50
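Returning to the multicycle CPI estimate given earlier in this section, the weighted sum can be checked with a few lines of code. The cycle counts and percentages are the ones listed in the notes; the dictionary layout and variable names are just for illustration.

```python
# instruction class -> (cycles, fraction of instruction mix)
MIX = {
    "loads":    (5, 0.23),
    "stores":   (4, 0.13),
    "ALU":      (4, 0.43),
    "branches": (3, 0.19),
    "jumps":    (3, 0.02),
}

# CPI is the average of the cycle counts, weighted by how often each occurs.
cpi = sum(cycles * fraction for cycles, fraction in MIX.values())
total_fraction = sum(fraction for _, fraction in MIX.values())
print(round(cpi, 2), round(total_fraction, 2))
```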
Diagram labels: Calculate PC; Write PC; Decode; Write IR; Read Registers; Calculate Branch; Write A,B; Subtract; Write ALUOut; Use Zero; Write ALUOut; Write PC Cond (source 01)