- Reducing Branch Delay. In the MIPS pipeline architecture shown schematically in Figure 5.4, we currently assume that the branch condition is evaluated in Stage 3 of the pipeline (EX). If we move the branch evaluation up one stage and put special circuitry in ID (Decode, Stage 2), then we can evaluate the branch condition for the beq instruction one stage earlier.
- The ISA says that N instructions after a branch/jump are always executed; MIPS has 1 branch delay slot.
- Stall (+ Zap): prevent the PC update; clear the IF/ID pipeline register, because the instruction just fetched might be the wrong one, so convert it to a nop; allow the branch to continue into the EX stage.
- If the branch did modify the PC, fetch and decode take notice and decode the next instruction from the new destination, so on classic MIPS the branch delay slot is only 1 instruction 'big'. (I have no idea whether more complex MIPS CPUs have more stages and more delay slots available. Technically, with a 5-stage pipeline even 5 delayed instructions sounds possible in hardware, but it would probably be very difficult to use in practice and sounds like it would create more problems than it solves.)
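The delay-slot rule described above can be illustrated with a toy interpreter. This is only a sketch under stated assumptions: the three-instruction mini-ISA (`li`, `addi`, `beq`), the register names, and the `run` function are hypothetical and not taken from any real MIPS tool, and a branch placed inside a delay slot is not modelled.

```python
def run(program, delay_slots=1, max_steps=100):
    """Run a toy program; a taken branch redirects the PC only after
    `delay_slots` further instructions have executed (hypothetical model)."""
    regs = {}
    executed = []          # addresses of the instructions that ran, in order
    pc = 0
    slots_left = None      # delay slots still to execute after a taken branch
    target = None
    for _ in range(max_steps):
        if not 0 <= pc < len(program):
            break
        op, *args = program[pc]
        executed.append(pc)
        next_pc = pc + 1
        if slots_left is not None:
            # This instruction sits in a delay slot, so it always executes.
            slots_left -= 1
            if slots_left == 0:
                next_pc = target
                slots_left = None
        if op == "li":
            regs[args[0]] = args[1]
        elif op == "addi":
            regs[args[0]] = regs.get(args[1], 0) + args[2]
        elif op == "beq":
            if regs.get(args[0], 0) == regs.get(args[1], 0):
                if delay_slots == 0:
                    next_pc = args[2]          # no delay slot: jump at once
                else:
                    slots_left, target = delay_slots, args[2]
        pc = next_pc
    return regs, executed


PROGRAM = [
    ("li", "r1", 1),
    ("li", "r2", 1),
    ("beq", "r1", "r2", 5),    # taken branch to address 5
    ("addi", "r3", "r3", 10),  # delay slot: runs even though the branch is taken
    ("addi", "r3", "r3", 100), # skipped when there is exactly one delay slot
    ("addi", "r3", "r3", 1),   # branch target
]

regs, executed = run(PROGRAM, delay_slots=1)
print(executed, regs["r3"])
```

With one delay slot the instruction at address 3 runs before control reaches address 5; calling `run(PROGRAM, delay_slots=0)` skips it instead.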
DO NOT ATTEMPT TO COPY MY FILES IN YOUR PROJECT!
Do not plagiarize this in your course project!!!!
Overview & Introduction
MIPS Delayed Branch
This ties in with the other delay slot issues, such as issue #330 for MIPS, and so should be considered when implementing their fix. I have come across another issue related to the MIPS branch delay problems. It may be that this is just how Unicorn works with regard to delay slots.
This is a repository for the copy of submitted and accepted project files in the 'Computer Organization' course in the School of Computer Science and Engineering (SCSE), Beihang University. All the project files in this repository were completed during the autumn semester of 2018-2019 (1st semester of Grade 2).
This repository contains the following projects:
* Project 3: Single-cycle CPU implemented and simulated in Logisim
* Project 4: Single-cycle CPU implemented and simulated in Verilog (Xilinx ISE and ISim)
* Project 5: 5-stage pipelined CPU implemented and simulated in Verilog (Xilinx ISE and ISim)
- implements hazard control (stall/forward) and the branch delay slot
- supports a MIPS instruction set of 11 instructions
* Project 6 (for HAC Honor College (23rd faculty)): 5-stage pipelined CPU
- implemented and simulated in Verilog (Xilinx ISE and ISim)
- supports interrupt requests and exceptions
* Project 6 (for SCSE (6th faculty)): 5-stage pipelined CPU implemented and simulated in Verilog (Xilinx ISE and ISim)
- supports a MIPS instruction set of 50 instructions
- supports integer multiplication and division
* Project 7: 5-stage pipelined CPU (the combination of the P6 HAC version and the non-HAC version)
- supports interrupt requests and exceptions
- supports a MIPS instruction set of 50 instructions
- supports integer multiplication and division
* Project 8: 5-stage pipelined CPU (FPGA, hardware and software interface)
- supports I/O
-- supports UART transmission (implemented with interrupt requests)
-- supports an 8-digit digital tube
-- supports a user keyboard
- contains 3 MIPS programs that implement a calculator, a UART transmission test, and a counter on the 5-stage CPU
- bit files are generated, loaded and tested on a Xilinx Spartan-6 XC6SLX100 FPGA board, speed level 2, package FGG676
Ben Dugan, Winter 2001
Instructions:
This test has 10 questions, totaling 90 points.
The test is closed book, closed notes, closed calculator, open mind, etc. Remember, don't spend all your time on one question. You should budget roughly one minute per point. Please show your work for partial credit.
Page | Possible | Score |
---|---|---|
Page 2 | 6 | |
Page 3 | 10 | |
Page 4 | 10 | |
Page 5 | 10 | |
Page 6 | 10 | |
Page 7 | 8 | |
Page 8 | 17 | |
Page 9 | 19 | |
Total | 90 | |
Question 1. Data Dependences and Data Hazards (6 points)
Consider the following fragment of a program:

1. Given a 5 stage pipeline that resolves data hazards through stalling, how many cycles would the above sequence take to execute? Assume that the results from an ALU operation are available 3 cycles later, and results from loads are available 2 cycles later. You should ignore the 'startup cost' of the pipeline. In other words, don't count the cycles before the pipeline is at full capacity.
2. Given a 5 stage pipeline that resolves data hazards through forwarding, how many cycles would the above sequence take to execute? Again, ignore the 'startup cost' of the pipeline.
3. Can you reorder the above sequence to further improve the forwarding pipeline? If yes, show the new sequence (you can just show the numbers of the instructions). If no, explain your answer.

Question 2. Control Hazards (10 points)
Consider the following fragment of a program:

Assume that the loop iterates 10 times and that our pipeline has a branch delay of 2 cycles. That is, the branch is resolved at the end of the Execute stage (the third stage). The pipeline uses forwarding to resolve data hazards to the extent possible.

1. Suppose the pipeline resolves branch hazards by always stalling the pipeline for 2 cycles. How many cycles does it take to execute the above fragment? Again, ignore the 'startup cost' of the pipeline.
2. Suppose the pipeline uses a predict-not-taken scheme, where every branch is predicted as not taken. Correct predictions cost nothing, mispredictions cost 2 cycles. How many cycles does it take to execute the above fragment? Again, ignore the 'startup cost' of the pipeline.
3. Suppose the pipeline uses a branch prediction table. The penalty for a misprediction is 2 cycles. How many cycles does it take to execute the above fragment? (Assume mispredictions on the first and last iteration of the loop.)
4. Suppose the pipeline uses delayed branches (with 2 delay slots). Rewrite the code fragment (you can just show the numbers) to take advantage of the delay slots. Insert nops if you can't fill all of the slots. How many cycles does it take to execute the new code?
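The cycle accounting for the first three schemes can be sketched as follows. Since the code fragment itself is not reproduced here, the loop shape is an assumption: a single conditional branch at the bottom of the loop that is taken on every iteration except the last, with no remaining data-hazard stalls. The function name `loop_cycles` and the body length of 5 instructions are hypothetical placeholders, not values from the exam.

```python
def loop_cycles(body_len, iterations, penalty, scheme):
    """Cycles for a loop of `body_len` instructions (branch included),
    ignoring pipeline startup. Assumes one loop-closing branch that is
    taken on every iteration except the last (an assumption)."""
    base = body_len * iterations
    if scheme == "stall":
        # Every branch pays the full branch delay.
        penalised = iterations
    elif scheme == "predict_not_taken":
        # Only taken branches are mispredicted: all but the final one.
        penalised = iterations - 1
    elif scheme == "prediction_table":
        # The question states mispredictions on the first and last iteration.
        penalised = 2
    else:
        raise ValueError(f"unknown scheme: {scheme}")
    return base + penalty * penalised


# Hypothetical 5-instruction loop body, 10 iterations, 2-cycle branch delay.
for scheme in ("stall", "predict_not_taken", "prediction_table"):
    print(scheme, loop_cycles(5, 10, 2, scheme))
```

Substituting the real instruction count of the fragment for the placeholder body length gives the figures the question asks for, provided the assumed loop shape matches.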
Question 4. Machine Organization (10 points)
Correct answers were #1=1, #2=3, #3=4, #4=3, #5=1. Some answers received partial credit.

1. Which of the following best describes the impact of going from a single-cycle implementation to a multi-cycle implementation?
- decrease cycle time, increase CPI
- decrease cycle time
- increase cycle time, increase CPI
- increase cycle time, decrease CPI
- increase CPI
- decrease cycle time, increase instruction count, increase transistor count
- increase cycle time, increase CPI, decrease instruction count
- decrease cycle time, increase transistor count, usually increase CPI
- decrease cycle time, increase transistor count
- increase cycle time, decrease transistor count
- increase cycle time, increase data dependences, increase branch delays
- decrease cycle time, increase data dependences
- decrease cycle time, decrease CPI, increase data dependences
- decrease cycle time, increase data dependences, increase branch delays
- decrease cycle time, decrease data dependences, increase branch delays
- decrease CPI, decrease instruction count, increase data dependences
- decrease CPI, increase instruction count, increase data dependences
- decrease CPI, increase data dependences
- increase CPI, increase data dependences
- increase instruction count, decrease cycle time
- decrease instruction count, minimize loads and stores, improved instruction scheduling
- decrease instruction count, improved cycle time
- decrease instruction count, minimize loads and stores
- increase instruction count, improved cycle time
- increase instruction count, improved instruction scheduling
Question 5. Caching (10 points)
Suppose a benchmark program executes 10,000,000 instructions. On a processor with one level of cache, it has a cache miss rate of 10% and the penalty for a miss is 50 cycles.

1. Suppose 10% of the instructions are stores and 20% are loads. What is the average number of memory references per instruction?
2. What is the average number of stall cycles per instruction?
3. In a perfect world (no cache misses) assume the program has a CPI of 2.0. What is the relative performance of the program in the perfect world versus the real world (with misses)?
4. Suppose we add an L2 cache (a second cache between the L1 cache and the main memory) that has a miss rate of 1%. If a miss occurs in the L1 cache, the L2 cache is checked for a possible hit. Misses in the L1 cache that hit in the L2 cache incur a penalty of only 5 cycles, while misses in both caches cost 50 cycles. How much faster is the new organization?
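One way to check the arithmetic in this question is the small calculation below. It is a sketch under assumptions that the question does not spell out: each instruction is taken to make one instruction-fetch reference in addition to its data references, all references go through the same cache with the same miss rate, and the 1% L2 miss rate is read as a local rate (1% of the accesses that reach L2). All names are hypothetical.

```python
# Figures from the question; the modelling choices are assumptions.
L1_MISS_RATE = 0.10
MEMORY_PENALTY = 50      # cycles for a miss that goes to main memory
LOAD_FRACTION = 0.20
STORE_FRACTION = 0.10
PERFECT_CPI = 2.0
L2_MISS_RATE = 0.01      # assumed local: fraction of L1 misses that miss in L2
L2_HIT_PENALTY = 5       # cycles for an L1 miss that hits in L2


def refs_per_instruction():
    # Assumption: one instruction fetch plus the data references.
    return 1.0 + LOAD_FRACTION + STORE_FRACTION


def stalls_l1_only():
    return refs_per_instruction() * L1_MISS_RATE * MEMORY_PENALTY


def stalls_with_l2():
    l1_misses = refs_per_instruction() * L1_MISS_RATE
    cost_per_l1_miss = ((1 - L2_MISS_RATE) * L2_HIT_PENALTY
                        + L2_MISS_RATE * MEMORY_PENALTY)
    return l1_misses * cost_per_l1_miss


real_cpi = PERFECT_CPI + stalls_l1_only()
l2_cpi = PERFECT_CPI + stalls_with_l2()

print("memory references per instruction:", refs_per_instruction())
print("stall cycles per instruction:", stalls_l1_only())
print("perfect vs real speedup:", real_cpi / PERFECT_CPI)
print("speedup from adding L2:", real_cpi / l2_cpi)
```

Under a different reading (for example, counting only loads and stores as memory references, or treating 1% as a global miss rate) the numbers change, so the assumptions should be stated alongside any answer.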
Question 6. Memory Hierarchy (8 points)
Correct answers were #1=4, #2=1, #3=4, #4=2. Some answers received partial credit.

1. Which of the following best describes the behavior of a write-back, write-around cache on a read miss?
- The block is updated only in memory
- The block is updated only in the cache
- The block is fetched from memory into the cache
- Dirty words in the existing block are written back to memory, and the new block is brought in from memory
- Dirty words in the existing block are written back to memory
- The block is fetched from memory, updated in the cache, and updated in memory.
- The block is updated only in memory.
- The new block is fetched from memory and updated in the cache.
- The block is updated only in the cache.
- Dirty words in the existing block are written back to memory, then the new block is fetched from memory, updated in the cache, and updated in memory.
- Nothing, the miss is handled in hardware.
- The OS loads up the TLB with the correct page table entry, and context switches to another program.
- The program has performed an illegal operation, so it is killed.
- The OS loads up the TLB with the correct page table entry, and restarts the program.
- The OS brings the right page in from disk.
- Nothing, the page fault is handled in hardware.
- The OS initiates a disk I/O to bring in the right page, and context switches to another program.
- The OS loads up the TLB with the correct page table entry, and context switches to another program.
- The OS brings in the right page from disk.
- The OS initiates a disk I/O to bring in the right page, and restarts the faulting program.
Question 7. Cache Organization (5 points)
Suppose we have a cache with block (line) size of 8 words (32 bytes) and that this cache has a total of 1024 (1K) lines.

1. What is the total capacity of the cache (in bytes)?
2. Show how the below 32 bit address is broken into its 3 component parts (tag, index, discard).
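The capacity and address breakdown asked for above can be sketched in a few lines. The sketch assumes a direct-mapped cache (so the index selects one of the 1024 lines) and treats the "discard" field as the byte offset within a block. The function name `split_address` and the sample address `0x12345678` are hypothetical; the address shown on the original exam page is not reproduced here.

```python
def split_address(addr, block_bytes=32, num_lines=1024, addr_bits=32):
    """Split an address into tag / index / discard (block offset) fields.
    Assumes a direct-mapped cache with power-of-two sizes."""
    offset_bits = block_bytes.bit_length() - 1   # log2(32)   = 5
    index_bits = num_lines.bit_length() - 1      # log2(1024) = 10
    tag_bits = addr_bits - index_bits - offset_bits
    return {
        "tag_bits": tag_bits,
        "index_bits": index_bits,
        "offset_bits": offset_bits,
        "tag": addr >> (offset_bits + index_bits),
        "index": (addr >> offset_bits) & (num_lines - 1),
        "offset": addr & (block_bytes - 1),
    }


capacity_bytes = 32 * 1024          # bytes per line * number of lines
fields = split_address(0x12345678)  # hypothetical sample address

print(capacity_bytes)
print(fields)
```

For a set-associative cache the index would instead be sized by the number of sets, which shifts bits from the index into the tag.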
Question 8. Virtual Memory & Paging (12 points)
Suppose we have a page-based virtual memory system with 8KB pages, and that we wish to have a 4GB virtual address space.

1. How many entries are in the page table?
2. If each page table entry is 4 bytes, how large is the page table (in bytes)?
3. Show how the below 32 bit address is broken into its 2 component parts (virtual page number and offset).
4. How can the OS allow process P1 to share a region of memory with process P2 in a page-based VM system?
5. How can the OS guarantee that process P1 doesn't get to read or write memory belonging to process P3 in a page-based VM system?

Question 9. TRUE/FALSE (15 points)
- TRUE / FALSE Paging systems usually use a write-through policy in order to keep the disk coherent with memory.
- TRUE / FALSE TLBs are used to cache data on pages.
- TRUE / FALSE Page faults are generally handled in hardware.
- TRUE / FALSE Higher associativity reduces conflict misses.
- TRUE / FALSE Larger lines (blocks) in a cache may increase conflict misses.
- TRUE / FALSE Larger caches decrease both the overall miss rate and access time.
- TRUE / FALSE An advantage of smaller disks (in diameter) is that the seek time decreases.
- TRUE / FALSE By spinning disks faster we can decrease the rotational latency of a disk I/O.
- TRUE / FALSE The ethernet is an example of an interconnection network that uses centralized arbitration.
- TRUE / FALSE In the MIPS procedure call convention, leaf procedures (those that don't call any other procedures) often don't have to save any registers.
- TRUE / FALSE The VAX implementation is complicated by the many addressing modes provided by the ISA.
- TRUE / FALSE In the BEQ instruction, a branch target may not be further than 2^15 bytes away.
- TRUE / FALSE Compilers for the MIPS will typically place long-lived values into caller-saved registers ($t0-$t9) in order to minimize loads and stores to/from the stack.
- TRUE / FALSE The cycle time of a pipelined machine is determined by the maximum time required to complete a single stage of the pipeline.
- TRUE / FALSE This statement is false.