ECE 442 Project Report
Computer System and Memory Design
Sashisu Bajracharya
Sukhonthip Rueangvivatanakij
Team 4
Spring 2002
Dr. D. Tabak
Abstract
The purpose of this project is to design a computing system with an instruction set, memory, and cache management. We develop a superscalar system that issues 8 instructions per cycle. The instruction set comprises 55 instructions, which cover general programming needs, and is designed as a register-to-register (Load/Store) architecture similar to RISC systems. Our system supports register direct, immediate, and absolute addressing modes. We include a Bimodal Branch Prediction mechanism, used for speculative execution to support instruction-level parallelism; a Reorder Buffer, which reorders the commit of instructions executed in parallel; and Rename Registers, which remove artificial dependences (WAW and WAR data hazards) to expose full parallelism. We provide multiple functional units, as a program typically needs for 8 instructions at a time, and reservation stations in front of the execution units to hold instructions until all input operands are ready. Further, we design the cache and main memory to the given specifications. The cache, as specified, is set-associative with 4 lines (blocks) per set. We implement cache policies such as random replacement, write-through, and write-allocate in the VHDL models of the cache, main memory, and CPU. We simulate the data transfer between CPU, cache, and main memory, showing the HIT-read, HIT-write, MISS-read, and MISS-write cases, and also simulate the fetching of 8-issue instruction groups and their dispatch to the corresponding units.
Specifications
The specifications that are required for the project are:
System and Instruction Set Design
Memory and Cache Management Design
We specified additional design parameters as suitable for our project:
Description of Design
We have,
Word Size : 64 bits
Register file: 32 registers, so 5 bits are needed to refer to a particular register
Number of instructions: 55 (designed)
Since 55 < 64 = 2^6, 6 bits are all we need for the opcode.
Memory size = 512 Mbytes = 2^29 bytes
So we need 29 bits for an absolute memory address.
We have chosen immediate operands to be 32 bits.
We chose 3 addressing modes:
1. Register Direct
2. Immediate
3. Absolute
Since instructions other than Load/Store do not access memory and operate register to register, we did not need to put addressing-mode bits inside the instruction. Instead, the opcode itself tells whether the instruction involves an absolute address, registers only, or an immediate value.
The instruction formats are shown as follows for different types of instructions:
Instruction Format
Word Size : 64 bits
Register : 32 registers, 2^5 = 32, so 5 bits needed to refer to a register
No. of instr : 55 < 2^6, so a 6-bit opcode
3-operand Register to Register
2-Operand Register to Register
1-Operand
LOAD/STORE
IMMEDIATE
JUMP
BRANCH ON CONDITION
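The field widths given above (a 6-bit opcode and 5-bit register numbers in a 64-bit word) can be sketched as a packing function. This is illustrative only: the exact bit positions are an assumption of this sketch, not taken from the report's format diagrams.

```python
# Illustrative packing of the 3-operand register-to-register format:
# 6-bit opcode followed by three 5-bit register fields in a 64-bit word.
# The bit positions chosen here (opcode at bits 63..58) are an assumption.

def encode_rrr(opcode, rx, ry, rz):
    assert opcode < 2**6 and max(rx, ry, rz) < 2**5
    return (opcode << 58) | (rx << 53) | (ry << 48) | (rz << 43)

def decode_rrr(word):
    return ((word >> 58) & 0x3F,   # opcode
            (word >> 53) & 0x1F,   # Rx
            (word >> 48) & 0x1F,   # Ry
            (word >> 43) & 0x1F)   # Rz
```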
Our instructions are mainly divided into 4 general groups: Arithmetic type, Logic, Control and Data Transfer type. The list of instructions is as follows:
Opcodes, instructions, operations performed, and comments:

Arithmetic
1)  000000  ADDF Rx, Ry, Rz ; Rx = Ry + Rz  (Floating-point addition)
2)  000001  SUBF Rx, Ry, Rz ; Rx = Ry - Rz  (Floating-point subtraction)
3)  000010  MULF Rx, Ry, Rz ; Rx = Ry * Rz  (Floating-point multiplication)
4)  000011  DIVF Rx, Ry, Rz ; Rx = Ry / Rz  (Floating-point division)
5)  000100  ADD Rx, Ry, Rz  ; Rx = Ry + Rz  (Integer addition)
6)  000101  SUB Rx, Ry, Rz  ; Rx = Ry - Rz  (Integer subtraction)
7)  000110  MUL Rx, Ry, Rz  ; Rx = Ry(32) * Rz(32)  (Integer multiply of the lower 32 bits of the sources)
8)  000111  DIV Rx, Ry, Rz  ; Rx = Ry / Rz  (Integer division; Rx gets the quotient)
9)  001000  MOD Rx, Ry, Rz  ; Rx = Ry % Rz  (Rx gets the remainder of the division)
10) 001001  DEC Rx          ; Rx = Rx - 1   (Decrement by 1)
11) 001010  INC Rx          ; Rx = Rx + 1   (Increment by 1)
12) 001011  CLR Rx          ; Rx = 0        (Clear the register to 0)
13) 001100  NEG Rx          ; Rx = 0 - Rx   (Change to 2's complement)
14) 001101  SEW Rx          ; Sign-extend 16 bits to 64 bits
15) 001110  SEL Rx          ; Sign-extend 32 bits to 64 bits
16) 001111  NOP             ; No operation
Logic
17) 010000  AND Rx, Ry, Rz  ; Rx = Ry AND Rz  (Logical AND of each bit)
18) 010001  OR Rx, Ry, Rz   ; Rx = Ry OR Rz   (Logical OR of each bit)
19) 010010  NOT Rx          ; Rx = 1's complement of Rx  (Bits flipped)
20) 010011  XOR Rx, Ry, Rz  ; Rx = Ry XOR Rz  (Logical XOR of each bit)
21) 010100  ROR Rx          ; Rotate right with carry
22) 010101  ROL Rx          ; Rotate left with carry
23) 010110  SHR Rx          ; Shift Rx right with carry
24) 010111  SHL Rx          ; Shift Rx left with carry
25) 011000  SAR Rx          ; Shift arithmetic right with high-order bit copied
26) 011001  TST Rx          ; Sets or clears the 'Z' and 'N' flags
27) 011010  CLC             ; Clear carry flag
28) 011011  CLV             ; Clear overflow flag
29) 011100  SEC             ; Set carry flag
30) 011101  SEV             ; Set overflow flag
Control Instructions
31) 011110  JMP X           ; PC = X  (Jump to absolute address X)
32) 011111  BEQ Rx, Ry, X   ; if Rx - Ry = 0, PC = X   (If Rx = Ry, jump to address X)
33) 100000  BNE Rx, Ry, X   ; if Rx - Ry ≠ 0, PC = X   (If Rx ≠ Ry, jump to address X)
34) 100001  BGT Rx, Ry, X   ; if Rx - Ry > 0, PC = X   (If Rx > Ry, jump to address X)
35) 100010  BLT Rx, Ry, X   ; if Rx - Ry < 0, PC = X   (If Rx < Ry, jump to address X)
36) 100011  BGE Rx, Ry, X   ; if Rx - Ry >= 0, PC = X  (If Rx >= Ry, jump to address X)
37) 100100  BLE Rx, Ry, X   ; if Rx - Ry <= 0, PC = X  (If Rx <= Ry, jump to address X)
38) 100101  JSR X           ; SP = SP - 8, (SP) = PC, PC = X  (Jump to subroutine at address X)
39) 100110  RTS             ; PC = (SP), SP = SP + 8  (Return from subroutine)
40) 100111  RTI             ; Return from interrupt: restores all previously saved registers, SP, and PC
Data Transfer
41) 101000  PUSH Rx         ; SP = SP - 8, (SP) = Rx  (Push Rx contents onto the stack)
42) 101001  POP Rx          ; Rx = (SP), SP = SP + 8  (Pop from the stack into Rx)
43) 101010  LOAD Rx, M1     ; Rx = M[M1]  (Load 8 bytes from memory starting at M1 into Rx)
44) 101011  STORE M1, Rx    ; M[M1] = Rx  (Store 8 bytes to memory starting at M1 from Rx)
45) 101100  MOVE Rx, Ry     ; Rx = Ry     (Copy the contents of Ry to Rx)
46) 101101  LOADI Rx, #Im   ; Rx = #Im    (Load an immediate value into Rx)
47) 101110  EXCHG Rx, Ry    ; (Rx) <=> (Ry)  (Exchange the values of Rx and Ry)
48) 101111  TSR Rx          ; Rx = SP     (Transfer the SP address to Rx)
49) 110000  TRS Rx          ; SP = Rx     (Transfer the Rx value to SP)
50) 110001  LOADB Rx, M1    ; Rx = M[M1]  (Load 1 byte from memory at M1 into Rx)
The Fetch Unit, Instruction Buffer, Branch Prediction Unit, and Instruction Cache are described in detail as follows:
Fetch Unit
The fetch unit forms the first stage of the pipeline and starts the multiple-issue instruction-level parallelism. In our case it fetches 8 instructions at a time from the Instruction Cache, starting at the address it places on the "PC_out" signal. To fetch the instructions, the Fetch Unit puts the address of the first instruction on "PC_out", drives the number of instructions it wants to fetch, and sets 'load' high to notify the Instruction Cache to transfer the set of instructions. The Instruction Cache returns the set on the "instr" bus, which can hold up to 8 instructions. The fetch unit then checks whether any of the instructions is a control (branch) instruction. If it finds a branch instruction, it sends the lower 6 bits of that instruction's address to the Branch Prediction Unit (BPU) and sets the "predict" signal high to notify the BPU. The BPU returns its taken/not-taken prediction through the "take" signal. If every branch in the set of 8 is predicted not taken, the fetch unit issues the whole set to the Instruction Buffer. If a branch is predicted taken, the instructions after that branch in the set of 8 must instead be fetched starting from the branch target address, so the remaining instructions are requested again from the Instruction Cache, which supplies the needed number of instructions to the fetch unit. The fetch unit then transfers the 8 instructions to the Instruction Buffer.
After the set of instructions has been executed, each branch-type instruction returns its taken/not-taken outcome to the Fetch Unit and also to the Reorder Buffer. The fetch unit then notifies the BPU to modify the corresponding branch-prediction entry for the next prediction: it drives the "update" signal high, puts the last 6 bits of the branch address on "branch_addr", and supplies the taken/not-taken outcome on the "taken" signal.
Also, if the branch was mispredicted, a new set of instructions is fetched by the Fetch Unit through the same process as before, and the mispredicted instructions are flushed from the Reorder Buffer and from any Reservation Stations.
Branch Prediction Unit
The branch prediction unit contains 64 two-bit entries that indicate whether to take a branch. The prediction scheme we use is Bimodal Branch Prediction. The 64 entries are indexed by the last 6 bits of the branch instruction's address, so each entry corresponds to a branch instruction at a particular location in the program. The two-bit contents can take 4 values: 00, 01, 10, 11.
The unit acts in two situations: when the "predict" input is high and when the "update" input is high. When "predict" is high, the 6-bit input is read to locate the branch's entry in the table, and the prediction is made from the most significant bit: 00 and 01 predict not to take the branch, while 10 and 11 predict to take it. The predicted bit is returned to the Fetch Unit through the "take" output. When "update" is high, the BPU goes to the entry located by the 6-bit "branch_addr" input; if the "taken" input indicates the branch was actually taken, the entry is incremented, saturating at 11, and if it was not taken, the entry is decremented, saturating at 00. A repeatedly taken branch will thus be predicted taken, and a repeatedly not-taken branch will be predicted not taken.
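The saturating-counter behavior described above can be sketched compactly. This is an illustrative model, not the VHDL implementation; the class and method names are assumptions.

```python
# Minimal sketch of the 2-bit saturating-counter bimodal predictor:
# 64 entries indexed by the low 6 bits of the branch address.

class BimodalPredictor:
    def __init__(self, entries=64):
        self.table = [0] * entries   # all counters start at 00
        self.mask = entries - 1      # low-bit index mask (6 bits for 64 entries)

    def predict(self, branch_addr):
        # MSB of the 2-bit counter decides: 10/11 -> taken, 00/01 -> not taken
        return self.table[branch_addr & self.mask] >= 2

    def update(self, branch_addr, taken):
        i = branch_addr & self.mask
        if taken:
            self.table[i] = min(self.table[i] + 1, 3)  # saturate at 11
        else:
            self.table[i] = max(self.table[i] - 1, 0)  # saturate at 00
```

A branch taken twice in a row moves its counter to 10 and is then predicted taken; one subsequent not-taken outcome only drops it to 01, so a single anomaly does not immediately flip a well-established prediction direction once the counter is saturated.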
Instruction Buffer
The instruction buffer has 16 entries that hold instructions, and it sends the first 8 instructions to the Decode, Rename & Dispatch unit. If there is a stall, it temporarily holds the instructions and stores newly fetched instructions in the last positions.
Decode, Rename & Dispatch Unit
After the set of 8 instructions is issued from the Instruction Buffer, each instruction's opcode is decoded in this unit. Depending on whether the instruction is a control instruction, integer operation, floating-point operation, or data-transfer operation, the dispatch unit sends it to the Reservation Stations of the corresponding execution units. Since we have 4 Branch Units (BU), 8 Integer Units (IU), 4 Floating-Point Units (FPU), and 6 Load/Store Units (LSU), if a set of 8 contains more instructions of a given type than there are execution units of that type, the extra instructions are queued in the Reservation Stations.
The decode, rename and dispatch unit looks up the operands in the Register File and has the register file output one or two operands, depending on the instruction type, to the corresponding fields of the Reservation Stations. While checking the source and destination operands of the 8 instructions at a time, it checks for data dependencies among those instructions and previously pipelined instructions: RAW (Read After Write), WAW (Write After Write), and WAR (Write After Read) hazards. WAW and WAR hazards are removed by assigning a new Rename Register to the offending destination operand, recording the architected-to-renamed mapping in the mapping table, and sending the address of that Rename Register to the Reservation Stations through the Register File.
Moreover, each instruction gets an entry in the Reorder Buffer in issue order. The entry remains there until the instruction has executed, its dependencies are resolved, and it is committed. When multiple instructions execute in parallel, possibly out of order, each result is written to that instruction's entry in the Reorder Buffer.
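The renaming step described above can be sketched as follows. This is an illustrative model, not the VHDL design; the function and register-name conventions are assumptions.

```python
# Sketch of register renaming: every destination write gets a fresh rename
# register, which removes WAW and WAR hazards, while sources read the latest
# mapping so true RAW dependences are preserved.

import itertools

def rename(instructions):
    """instructions: list of (dest, src1, src2) architected register names."""
    mapping = {}                         # architected -> current rename register
    fresh = (f"T{i}" for i in itertools.count())
    renamed = []
    for dest, s1, s2 in instructions:
        r1 = mapping.get(s1, s1)         # sources use the latest mapping (RAW kept)
        r2 = mapping.get(s2, s2)
        t = next(fresh)                  # new rename register for each write
        mapping[dest] = t                # later writes to dest no longer conflict
        renamed.append((t, r1, r2))
    return renamed
```

For example, two back-to-back writes to R1 land in different rename registers, so they can complete in any order and the mapping table decides which value the architected R1 finally receives at commit.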
Reorder Buffer
The reorder buffer has one entry per active instruction, that is, one that has been dispatched for execution but not yet committed. It is implemented as a FIFO queue of the active instructions in issue order. As instructions are dispatched, they are given an entry in the Reorder Buffer. As instructions complete execution, possibly out of order with respect to other instructions, each result value is placed in that instruction's entry wherever it sits in the buffer. When an instruction reaches the head of the FIFO, if it has completed execution and its dependencies are resolved, its entry is removed from the Reorder Buffer and the result value is placed in the corresponding architected register. A typical Reorder Buffer structure is as follows:
Entry | Busy | Instruction    | State        | Destination | Value
1     | No   | Add R1, R5, R6 | Commit       | R1          | 11
2     | Yes  | Sub R7, R5, R1 | Write result | R7          |
3     |      |                |              |             |
4     |      |                |              |             |
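The in-order commit rule at the head of the FIFO can be sketched as below. This is illustrative, assuming a simple entry with a 'done' flag, not the VHDL structure.

```python
# Sketch of in-order commit: pop finished entries from the head of the
# reorder buffer, stopping at the first entry that has not yet completed.

from collections import deque

def commit_ready(rob):
    """Return the entries committed this cycle, removing them from the ROB."""
    committed = []
    while rob and rob[0]["done"]:
        committed.append(rob.popleft())   # head is oldest: commit in issue order
    return committed
```

Note that an instruction that finishes early (out of order) simply waits in its slot; it cannot commit past an older, unfinished instruction ahead of it.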
Reservation Stations
Reservation Stations (RS) provide temporary storage in front of the execution units for instructions that have been dispatched and await execution, and whose operands may not all be ready at dispatch time. The Reservation Station holds the decoded instruction until it is free of all dependencies and the execution unit is free to execute it. When an instruction is dispatched to the Reservation Station, any already available operand values are read from the register file and placed in the station. After that, the RS compares the names of its unavailable operands against the result operands of completing instructions that come back to the RS on the feedback bus. If there is a match, the result value is placed in the reservation station. When an instruction in the RS has all of its operands ready, it is issued to the execution unit if the unit is free. Each individual functional unit has its own Reservation Station to provide an efficient flow of parallelism. The structure of a Reservation Station entry looks as follows:
operation | source1 | data1 | valid1 | source2 | data2 | valid2 | destination
The decoded opcode is stored in the operation field. The two source operands are designated in source1 and source2; the data fields contain the operand values, and the valid bits indicate whether a data value is present in the reservation station. As an instruction completes and produces result data, its destination field is compared with the source fields of the instructions in the reservation station. If it matches, the data is written to the data field and the valid bit is set. When all source operands are ready, the instruction is issued to the execution unit for execution.
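The broadcast-and-capture mechanism just described can be sketched with the same field names as the table above. This is an illustrative model, not the VHDL code.

```python
# Sketch of reservation-station wakeup: when a completing instruction
# broadcasts its destination tag and value, every waiting entry compares
# its invalid source tags and captures the value on a match.

def broadcast(stations, dest_tag, value):
    for e in stations:
        if not e["valid1"] and e["source1"] == dest_tag:
            e["data1"], e["valid1"] = value, True
        if not e["valid2"] and e["source2"] == dest_tag:
            e["data2"], e["valid2"] = value, True

def ready(entry):
    # Both operands present: the entry may issue to a free execution unit.
    return entry["valid1"] and entry["valid2"]
```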
Instruction Execution and Committing
We have 4 general types of execution units (Branch Unit, Integer Unit, Floating-Point Unit, and Load/Store Unit), each with its own Reservation Stations to support instruction-level parallelism. We chose 4 Branch Units, 8 Integer Units, 4 Floating-Point Units, and 6 Load/Store Units, based on our 8-issue superscalar design, the relative average frequency of each instruction type, and cost considerations. These units execute all 55 instructions of our Instruction Set. Integer arithmetic, logic instructions, and some register-to-register data transfers are handled by the Integer Unit; floating-point arithmetic by the Floating-Point Unit; control instructions, including jumps and branches on condition, by the Branch Unit; and Load/Store instructions and other instructions involving data transfer to memory by the Load/Store Unit.
When an instruction finishes execution, its result value is sent to the corresponding entry in the Reorder Buffer, fed back to the Reservation Stations, and, for a store-type instruction, also sent to the Data Cache. A load instruction initiates a transfer from the Data Cache and sends the result back to the Reorder Buffer. When a result is fed back to the Reservation Stations, it is compared against all pending operands waiting for it; on a match, the value is written to the corresponding source-operand field. This forwarding mechanism makes data available to waiting instructions as soon as possible, reducing the instruction stalls that late availability of data would otherwise cause.
When the result of an executed instruction is sent to the Reorder Buffer, the corresponding entry receives the destination value. If the destination is a rename register, the corresponding rename register in the Register File is updated. When the instruction also reaches the first 8 positions of the FIFO Reorder Buffer, the result value is written to the actual architected register corresponding to that rename register, as given by the mapping table. This is the commit stage of the instruction, which happens in order, satisfying the sequential order in which the program is written.
Line (block) size : 64 bytes (2^6 B)
Main memory size : 512 Mbytes (2^29 B)
Number of lines in memory = 2^29 / 2^6 = 2^23 lines
Set = 4 lines/set (2^2 lines)
Page size = 16 KB (2^14 B)
Number of pages = 2^29 / 2^14 = 2^15 pages
Cache size = 512 Kbytes = 2^19 bytes
Number of lines in cache = 2^19 / 2^6 = 2^13 lines
Number of sets in cache = 2^13 / 2^2 = 2^11 sets
Number of memory lines that can map to one set = 2^23 lines / 2^11 sets = 2^12 lines per set
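From these parameters, the 29-bit address decomposes into a 12-bit TAG (bits 28..17), an 11-bit set index (bits 16..6), a 3-bit word offset (bits 5..3), and a 3-bit byte offset (bits 2..0). A quick sketch to check the field arithmetic (illustrative; the function name is an assumption):

```python
# Split a 29-bit address per the geometry above: 12-bit tag, 11-bit set
# index (2^11 sets), 3-bit word offset (8 words of 8 bytes per 64-byte
# line), 3-bit byte offset. 12 + 11 + 3 + 3 = 29 bits.

def split_address(addr):
    tag  = (addr >> 17) & 0xFFF   # bits 28..17
    set_ = (addr >> 6)  & 0x7FF   # bits 16..6
    word = (addr >> 3)  & 0x7     # bits 5..3
    byte = addr & 0x7             # bits 2..0
    return tag, set_, word, byte
```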
The cache has input, output, and bi-directional signals: clock, address, data_CPU, address_M, data_M, load, store, load_to_cache, done_to_CPU, store_M, and ready_M. When the CPU wants to load a value from memory, it puts the 29-bit address on the address signals and sets 'load' high to notify the cache. When the CPU wants to store data to memory, it puts the 29-bit address on the address signals and sets 'store' high. Upon receiving either 'load' or 'store' high, the cache checks whether the addressed memory is available in the cache, proceeding through the steps shown in the following diagram:
The cache determines the set number to which the address belongs from bits 16 downto 6. It then looks in that set and compares each of the 4 entries' TAG values with the TAG in the address (bits 28 downto 17). On a hit, the word addressed by bits 5 downto 3 is accessed; if the instruction was a load (HIT 'load'), the corresponding word is returned to the CPU via the 'data_CPU' signal.
If the instruction was a store on a hit (HIT 'store'), the value of 'data_CPU' is stored in the cache at the corresponding entry. Since we have a write-through policy, the corresponding entry in main memory also has to be modified. So the cache puts the address on the "address_M" signal, puts one line (64 bytes) of data on the "data_M" signal, and sets "store_M" high to notify the main memory. On finding "store_M" high, main memory stores the line passed on "data_M" at the address on "address_M".
However, if none of the 4 entries of the set matches, a "miss" occurs and main memory has to be accessed. On a miss for a CPU load (MISS 'load'), the cache has to fetch the data from main memory: it puts the address on "address_M" and sets "load_to_cache" high to notify the main memory. On receiving this signal, main memory gets the line of data at "address_M", puts it on the "data_M" signal connected to the cache, and sets "ready_M" high. On finding "ready_M" high, the cache gets the line from "data_M", chooses a random entry in the set (as specified by the random replacement policy), replaces it with the newly loaded line of data, and returns the corresponding 64-bit word (selected by bits 5 downto 3) to the CPU through the "data_CPU" signal.
If the miss occurs on a CPU store (MISS 'store'), then since our design uses a write-allocate policy, the missing block (line) is first loaded from main memory through the same process as the load miss in the previous paragraph. The cache then modifies the corresponding word with the new value of "data_CPU" and copies the modified block back to main memory through the same process as a HIT on 'store'.
However, if the needed block (line) is not even in main memory, we have a "page fault". A page of 16 Kbytes is then loaded from secondary memory into main memory, the corresponding block is loaded into the cache, and, depending on whether the instruction is a load or a store, the data is either loaded to the CPU or stored to the cache and main memory. Writing back to secondary memory occurs only when the page is about to be replaced by a newly loaded page.
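The hit/miss handling and policies described above (write-through, write-allocate, random replacement) can be condensed into a small behavioral sketch. This is illustrative only: the 'memory' dictionary and function shape are assumptions standing in for the main-memory interface, not the VHDL model.

```python
# Condensed sketch of one cache-set access with write-through,
# write-allocate, and random replacement.

import random

def access(cache_set, tag, value, is_store, memory, addr):
    """Look up 'tag' in a 4-entry set; return 'HIT' or 'MISS'."""
    for line in cache_set:                        # compare all 4 entries' tags
        if line["valid"] and line["tag"] == tag:
            if is_store:
                line["data"] = value
                memory[addr] = value              # write-through: memory updated too
            return "HIT"
    # Miss: allocate on both load and store (write-allocate), random victim.
    victim = random.randrange(len(cache_set))
    cache_set[victim] = {"valid": True, "tag": tag,
                         "data": memory.get(addr)}  # line fetched from memory
    if is_store:
        cache_set[victim]["data"] = value
        memory[addr] = value                      # write-through after allocate
    return "MISS"
```

This matches the simulated cycle counts in spirit: a hit load touches only the cache, a hit store or miss load adds one memory access, and a miss store adds two (the allocate fill plus the write-through).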
The simulation of memory management and data transfer does the following things:
First, the instruction LOAD R1, $00000000 is sent by the testbench to the CPU, where it should produce a "HIT READ" since the entry is in the cache. The CPU passes the address to the cache, which compares the TAG with that of the address and, on a match, returns the data to the CPU. Completing this instruction takes about 2 clock cycles, since only a cache access is used.
Second, the instruction STORE $00000084, R1 is sent by the testbench to the CPU, where it should result in a "HIT WRITE", as the entry was placed in the cache during initialization before the simulation. However, since our design specification calls for a write-through policy, the cache stores the value both in itself and in main memory. Completing this instruction therefore takes longer, about 13 clock cycles, since both a cache access and one memory access are done.
Third, the instruction LOAD R3, $000000a0 is sent by the testbench to the CPU, where it should result in a "MISS READ", since the entry was not put in the cache at initialization. When the cache checks the TAG, it misses in all 4 lines and requests main memory to transfer the corresponding line to the cache. Completing this instruction takes about 13-14 clock cycles, since one memory access is used.
Fourth, the instruction STORE $00000048, R3 is sent by the testbench to the CPU, where it should produce a "MISS WRITE", since the entry does not exist at initialization. When the cache checks the TAG in the corresponding set, it misses. Since we decided to use a write-allocate policy, the corresponding line is first loaded from memory into the cache, the new word value is then stored at the particular position, and the modified line is stored back to memory. Two memory accesses are thus used, so completion takes about 25 clock cycles, as shown in the simulated waveform.
The fetch unit and decode-dispatch unit of the system were simulated in VHDL, showing 8 instructions at a time being fetched and dispatched, one set after another, to the corresponding units among the 4 categories of multiple units (BU, IU, FPU, LSU). This demonstrates the superscalar pipeline of multiple instructions being fetched and dispatched to the corresponding execution units. The VHDL test shows three sets of 8 instructions being fetched and issued one after another.
The actual instructions that are used for simulation are:
LOAD R1, $00000000F
ADD R3, R5, R10
Load R09, $00000089
ADDF R20, R18, R24
BEQ R3, R10, $000000A0
shl r4
store $000000B8, r21
load r11, $00000056
-- 1 BU, 2 IU, 1 FPU, 4 LSU instructions in above set of 8
AND R5, R8, R10
SUBF R18, R19, R24
LOAD R31, $00000081
ROR r3
BLE R2, R8, $000000A7
store $00000FFF, r17
MULF R11, R5, R120
BGE R2, R8, $000000A7
-- 2 BU, 2 IU, 2 FPU, 2 LSU instructions in above set of 8
ADDF R10, R19, R32
AND R3, R21, R6
SUBF R12, R11, R14
LOAD R31, $00000081
ROL r3
MULF R11, R5, R120
CLR R10
BGE R5, R3, $00000068
-- 1 BU, 3 IU, 3 FPU, 1 LSU instructions in above set of 8
The simulation results are shown in next page:
Overall, the project of designing the system, instruction set, and memory and cache management was completed successfully by our team. We learned to design RISC-type instructions and their formats with full support for instruction-level parallelism. The design helped us explore the workings of a superscalar system and the purposes of components like the Reorder Buffer, Reservation Stations, Branch Prediction, and Rename Registers in delivering instruction-level parallelism with multiple issue. We learned the design of memory management and observed the transfer of data between CPU, cache, and main memory while simulating our model. We also had the opportunity to implement different cache policies, including write-through, write-allocate, and random replacement. In turn, the project gave us design skills and the opportunity to explore various possibilities in building a system and its memory management.