Instructor: | Prof. Milo Martin |
---|---|
Due: | Wed, November 24th (via blackboard) |
The is the first part of a two-part assignment in which you will explore the effectiveness of out-of-order execution (dynamic scheduling) impact on instruction-level-parallelism. As a fully detailed simulation of such a core is beyond what could be feasibly done for an assignment, the model of the processor core described in this assignment is approximate in many ways (failing to capture many second-order effects) but captures the most important first-order effects of instruction execution.
This assignment uses the same traces as previous assignments (see Trace Format document).
As before, you'll write a program that reads in the trace and simulates different processor core configurations. You're welcome to write the program in whatever language you like; although we are going to ask for you to turn in your source code (via blackboard), to end result of this assignment are the experimental results (the program source code is just a means to the end).
This is the first (and smaller and easier) part of a two-part assignment. Part A is on register renaming (due before Thanksgiving break), and Part B is the dynamic scheduling and experimental part (due after Thanksgiving break). I had originally intended this to be a single large assignment, however, it is more programming that the previous assignments, so I wanted to break it into more manageable pieces.
Warning
This is the smaller part of this two assignments! By the time this one is due, to be on target, you should have already started on the next assignment.
The first step is applying register renaming to the instruction stream. You will need two data structures for this:
The renaming algorithm:
For each valid source register: microop.phys_reg_source[i] = maptable[microop.arch_reg_source[i]] For each valid destination register: microop.phys_reg_to_free[i] = maptable[microop.arch_reg_destination[i]] new_reg = freelist.dequeue() maptable[microop.arch_reg_destination[i]] = new_reg microop.phys_reg_destination[i] = new_reg
At commit:
for each valid destination register: freelist.enqueue(microop.phys_reg_to_free[i])
Handling flags. One wrinkle in dealing with x86 is that we need to handle the flags (condition codes). To avoid WAW and WAR hazards on the flags, you must rename them as well. To do this, we'll treat the flags like just another architectural register by assigning it architectural register 49. Any instruction the reads (or write) the flags is then set to read (or write) architectural register 49. Thus, each micro-op has up to three source registers and up to two destination, all of which must be renamed according to the above algorithm.
To help you debug your simulator, we've provided some renamed traces for you to compare your code with. Even without all the rest of the out-of-order machinery, you can generate a trace of the renamed instruction stream for a given number of physical registers. To generate a rename trace, instead of waiting to commit to free the registers, you can just free the register right after renaming it; the FIFO (first-in, first-out) nature of the queue will ensure the rename trace is the same.
The output rename trace has the following format:
43, r7 -> p49, r5 -> p7 [p5] | MOV LOAD 44, r5 -> p7, r44 -> p58 [p13], r49 -> p50 [p59] | CMP SUB_IMM 45, r49 -> p50 | J JMP_IMM 46, r45 -> p44 [p52] | CMP SAVE_PC
The first column is the micro-op count (the line number from the trace). This is followed by pairs of architectural registers and their mapping to physical registers. The destination registers also include the register that should be freed when they commit in square brackets. If an instruction doesn't have a valid architectural register, it doesn't display anything for that register. The register mappings are followed by the pipe ("|") symbol and then the macro-op and micro-op descriptors.
The renamed instruction trace for the gcc-1K.trace.gz for 60 and 512 registers, respectively:
As this is just the first part of a two-part assignment, there aren't any experimental results to report yet or questions to answer. Please turn in: