Eventually I'd like to get a blog going using GitHub Pages, on which I'll probably post a summary of the development of this project.
Pipeline hazards
After running a really simple program to test my CPU up to a point, I realised that it wasn't quite behaving properly, and it took me a while to figure out that I hadn't accounted for any pipeline hazards... Specifically, the program below results in a data hazard in the pipeline:
addi r10, r10, 1
sw r10, r0, 1
jal r0, 0
Basically, the 5-stage pipeline gets in our way here. The 5th clock cycle is when the addi instruction completes its register WB stage (r10 latches its newly incremented value), but on that same clock cycle the sw instruction finishes its memory access (so the old value of r10 gets stored at 1(r0)). This is a read-after-write data hazard, and it can be fixed in a few ways: NOP padding (bad), pipeline stalling (better, but still bad) and operand forwarding (good!).
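To make the timing concrete, here is a minimal Python sketch of stage occupancy. It assumes the textbook stage order (IF, ID, EX, MEM, WB), one instruction issued per cycle and no stalls; the stage names and the helper functions are illustrative and not taken from the RTL. Under those assumptions, cycle 5 has addi in WB and sw in MEM, and sw would already have read r10 back in cycle 3 while addi was still in EX.

```python
# Textbook five-stage pipeline; stage names are assumed, not taken from the RTL.
STAGES = ["IF", "ID", "EX", "MEM", "WB"]

PROGRAM = ["addi r10, r10, 1", "sw r10, r0, 1", "jal r0, 0"]


def stage_at(issue_cycle, cycle):
    """Return the stage an instruction occupies in `cycle` (1-based), or None."""
    index = cycle - issue_cycle
    return STAGES[index] if 0 <= index < len(STAGES) else None


def occupancy(cycle):
    """Map each instruction to its stage, assuming one issue per cycle and no stalls."""
    return {ins: stage_at(i + 1, cycle) for i, ins in enumerate(PROGRAM)}


if __name__ == "__main__":
    # Print which stage each instruction is in for the first seven cycles.
    for cycle in range(1, 8):
        print(cycle, occupancy(cycle))
```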
I had to do some signal plumbing around the hierarchy in order to support single-cycle operand forwarding, but it was overall pretty straightforward. In the instruction decoder, I pass in a single-cycle-delayed version of rd_reg_offset to track which register is set to be overwritten by the immediately previous instruction (N-1). I then compare this delayed indicator with the current indicators for the source registers. If they're equal, we have a data hazard that needs to be accounted for. In this case, I read directly from the ALU's combinational output (before it's had a chance to get latched, since that would be too late... I need the signal before the clock edge). After some hacking around, the simulation indicates that it's working, though more debugging is required.
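The comparison described above can be sketched as a small behavioural model in Python. This is only an illustration of the idea, not the actual decoder logic: the function and parameter names are hypothetical, and the write-enable qualifier and the r0 exclusion are my assumptions (they are not mentioned in the notes above).

```python
def select_operand(rs_index, regfile_value, prev_rd_index, prev_writes_rd, alu_result):
    """Choose the value for one source register during decode.

    rs_index       -- source register index of the current instruction
    regfile_value  -- value read from the register file (possibly stale)
    prev_rd_index  -- destination index of instruction N-1 (the delayed rd_reg_offset)
    prev_writes_rd -- ASSUMPTION: flag saying N-1 really writes a register
    alu_result     -- combinational ALU output for instruction N-1
    """
    # ASSUMPTION: r0 is hardwired to zero, so it is never forwarded.
    hazard = prev_writes_rd and prev_rd_index == rs_index and rs_index != 0
    return alu_result if hazard else regfile_value


if __name__ == "__main__":
    regs = [0] * 32
    alu_out = regs[10] + 1  # addi r10, r10, 1 sitting in the ALU
    # sw reads r10 while addi has not written back yet.
    print(select_operand(10, regs[10], 10, True, alu_out))
```

With forwarding, the store sees the incremented value even though the register file still holds the old one; a source register that does not match the previous destination falls through to the register file value.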
Moving to Verilator + GTKWave for the development of this project
TODO
Inserting a halt signal
In preparation for instantiating this design in another repo as a submodule (link to CPU hoist environment), I thought it would be useful to have some halting logic that, when active, halts the entire CPU, essentially freezing the pipeline and holding the current state of all the registers. This would also allow some top-level logic to sequence the CPU through individual instructions, so that the CPU can be single-stepped at the instruction level.
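One way to picture the halt behaviour is as an enable on every state register: while halt is active, nothing updates. The Python sketch below models that with a toy pipeline (a program counter and two stage registers). The class names and the three-register structure are hypothetical and far simpler than the real design; it only shows the freezing and single-stepping idea.

```python
class PipelineReg:
    """A clocked register that keeps its value while halt is asserted."""

    def __init__(self, reset_value=None):
        self.q = reset_value

    def tick(self, d, halt):
        # Only latch the new value when the CPU is not halted.
        if not halt:
            self.q = d


class ToyPipeline:
    """Program counter plus two stage registers, all gated by one halt input."""

    def __init__(self):
        self.pc = PipelineReg(0)
        self.fetch_decode = PipelineReg(None)
        self.decode_execute = PipelineReg(None)

    def tick(self, halt):
        # Sample every input first so all registers update on the same edge.
        next_de = self.fetch_decode.q
        next_fd = self.pc.q
        next_pc = self.pc.q + 1
        self.decode_execute.tick(next_de, halt)
        self.fetch_decode.tick(next_fd, halt)
        self.pc.tick(next_pc, halt)

    def state(self):
        return (self.pc.q, self.fetch_decode.q, self.decode_execute.q)


def single_step(pipeline):
    """Release halt for exactly one clock, then freeze again."""
    pipeline.tick(halt=False)
    pipeline.tick(halt=True)
    return pipeline.state()
```

Under these assumptions, single-stepping is just holding halt high by default and dropping it for one clock at a time from the top level.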
Using two different memories
The reason I'm using two different memories is that the pipeline's Fetch and Mem Access stages need to operate simultaneously, and it's more convenient to have two separate memory blocks. Read more about I and D caches (which apparently exist largely for this reason).
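The conflict can be illustrated with a small Python model. It assumes each memory block accepts at most one access per clock cycle (a single-port memory, which is my assumption rather than something stated above); the class and function names are hypothetical. With separate instruction and data memories the fetch and the store both succeed in the same cycle, while sharing one block makes the second access fail.

```python
class SinglePortMemory:
    """Memory that accepts at most one access per clock cycle (assumed constraint)."""

    def __init__(self, size):
        self.data = [0] * size
        self.busy = False

    def begin_cycle(self):
        # The single port becomes available again at the start of each cycle.
        self.busy = False

    def access(self, addr, write_value=None):
        if self.busy:
            raise RuntimeError("second access to the same memory in one cycle")
        self.busy = True
        if write_value is not None:
            self.data[addr] = write_value
        return self.data[addr]


def run_cycle(imem, dmem, pc, store_addr, store_value):
    """One cycle in which Fetch reads an instruction while Mem Access does a store."""
    imem.begin_cycle()
    dmem.begin_cycle()
    instruction = imem.access(pc)
    dmem.access(store_addr, store_value)
    return instruction
```

Passing the same object as both imem and dmem raises the RuntimeError, which is the structural hazard that two memory blocks avoid.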