Rather than having the CPU access memory directly (and therefore stall on every memory operation), add an intermediate block, the cache controller, that fetches a chunk of memory and stores it locally in a fast-access register file, ready for the CPU to use. The CPU is free to write values into the register file, and the cache controller will eventually write them back to memory. The CPU only ever accesses the register file, which acts as a fast-access mirror of part of the RAM. The cache controller should be able to stall the CPU whenever it needs to fetch new content from the main system RAM.
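To make the idea concrete, here is a rough behavioural sketch in Python (a software model, not hardware). It assumes a single block of RAM is mirrored at a time, that a miss costs a fixed number of stall cycles, and that the RAM size is a multiple of the block size. The names `CacheController`, `block_size`, `miss_penalty` and `stall_cycles` are made up for illustration.

```python
class CacheController:
    """Toy model: one block of RAM mirrored in a small register file."""

    def __init__(self, ram, block_size=4, miss_penalty=10):
        self.ram = ram                    # backing store (list of ints)
        self.block_size = block_size      # assumed to divide len(ram) evenly
        self.miss_penalty = miss_penalty  # hypothetical stall length in cycles
        self.regs = [0] * block_size      # fast-access register file
        self.base = None                  # RAM address of regs[0]; None = empty
        self.dirty = False                # True if regs hold unwritten stores
        self.stall_cycles = 0             # total cycles the CPU was stalled

    def _ensure_loaded(self, addr):
        base = addr - addr % self.block_size
        if base == self.base:
            return                        # hit: the CPU is not stalled
        # Miss: stall the CPU, write back dirty data, then fetch the new block.
        self.stall_cycles += self.miss_penalty
        self.flush()
        self.regs = self.ram[base:base + self.block_size]
        self.base = base

    def flush(self):
        """Write the register file back to RAM if the CPU has modified it."""
        if self.dirty:
            self.ram[self.base:self.base + self.block_size] = self.regs
            self.dirty = False

    def read(self, addr):
        self._ensure_loaded(addr)
        return self.regs[addr - self.base]

    def write(self, addr, value):
        self._ensure_loaded(addr)
        self.regs[addr - self.base] = value
        self.dirty = True                 # RAM is updated later, on flush


ram = list(range(16))
cache = CacheController(ram)
cache.write(1, 99)   # miss: stall, fetch addresses 0..3, then store locally
cache.read(2)        # hit: same block, no stall
cache.read(8)        # miss: stall, write block 0..3 back, fetch 8..11
print(ram[1], cache.stall_cycles)
```

The CPU side only ever calls `read` and `write`; the write to address 1 reaches RAM only when the block is evicted, which is the "eventually written into memory" behaviour described above.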
If the cache does its job perfectly (which is unlikely), it will never have to stall the CPU, so each instruction will take only a single clock cycle per pipeline stage. Maybe include some control registers in the cache controller that the CPU can write to in order to specify a range of memory addresses it guarantees to stay within, so that the two work together to make sure the CPU never gets a cache miss.
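One possible way the control-register idea could behave, again as a Python model rather than hardware: the CPU writes a base and a limit, the controller preloads that window once, and every access inside the window is then a guaranteed hit. Everything here is an assumption for illustration, including the names `RangeCache`, `set_range`, `range_base` and `range_limit`, the single up-front `load_penalty`, and the choice to raise an error when the CPU breaks its promise (real hardware would more likely fall back to an ordinary miss).

```python
class RangeCache:
    """Toy model of a cache whose window is set by CPU-written control registers."""

    def __init__(self, ram, capacity=8, load_penalty=10):
        self.ram = ram
        self.capacity = capacity          # size of the register file
        self.load_penalty = load_penalty  # hypothetical cost of one preload
        self.range_base = 0               # control register: first promised address
        self.range_limit = 0              # control register: one past the last
        self.regs = []
        self.stall_cycles = 0

    def set_range(self, base, limit):
        """CPU writes the control registers; the controller preloads the window."""
        if not 0 <= base <= limit <= len(self.ram):
            raise ValueError("range lies outside RAM")
        if limit - base > self.capacity:
            raise ValueError("range does not fit in the register file")
        self.write_back()                 # save the old window first
        self.range_base, self.range_limit = base, limit
        self.regs = self.ram[base:limit]
        self.stall_cycles += self.load_penalty  # the only stall: the preload

    def write_back(self):
        self.ram[self.range_base:self.range_limit] = self.regs

    def _index(self, addr):
        if not self.range_base <= addr < self.range_limit:
            # Assumption: a broken guarantee is reported as an error.
            raise IndexError("CPU left the range it promised to stay within")
        return addr - self.range_base

    def read(self, addr):
        return self.regs[self._index(addr)]

    def write(self, addr, value):
        self.regs[self._index(addr)] = value


ram = list(range(32))
cache = RangeCache(ram)
cache.set_range(8, 16)                    # CPU promises to stay in 8..15
for addr in range(8, 16):
    cache.write(addr, cache.read(addr) * 2)   # all hits, no further stalls
cache.write_back()
print(ram[8:16], cache.stall_cycles)
```

The trade-off in this sketch is that the stall is not eliminated, only moved to one predictable point (the `set_range` call), and the promised range has to fit in the register file.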