Df.fifo has 1 cycle of latency even when the FIFO is empty. We should add a new version that can be configured to have either 0 or 1 cycles of latency when the FIFO is empty. The existing Df.fifo can then simply be hard-configured to the 1-cycle latency. This would let the user trade the longest combinational path against average latency: the 0-cycle option passes data straight from input to output in the same cycle, which lowers average latency but lengthens the path.
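
To make the requested behavior concrete, here is a minimal cycle-level sketch in Python. It is a behavioral model only, not the Df.fifo implementation. The names `FifoModel`, `empty_latency`, `step`, and `latency_when_empty` are hypothetical and chosen for illustration, and the handshake is simplified to one `step` call per clock cycle.

```python
from collections import deque


class FifoModel:
    """Hypothetical cycle-level model of a FIFO with configurable empty latency."""

    def __init__(self, depth, empty_latency):
        if depth < 1:
            raise ValueError("depth must be at least 1")
        if empty_latency not in (0, 1):
            raise ValueError("empty_latency must be 0 or 1")
        self.depth = depth
        self.empty_latency = empty_latency
        self.queue = deque()

    def step(self, data_in=None, ready_out=True):
        """Advance one cycle; return (data_out, accepted).

        data_in is the item offered this cycle (None means no item).
        ready_out says whether the consumer takes the output this cycle.
        """
        # Room is judged from the state at the start of the cycle.
        has_room = len(self.queue) < self.depth

        if self.queue:
            # Stored data always goes first, so ordering is preserved.
            data_out = self.queue[0]
            if ready_out:
                self.queue.popleft()
        elif self.empty_latency == 0 and data_in is not None:
            # Bypass: input is visible on the output in the same cycle.
            data_out = data_in
            if ready_out:
                # Consumed immediately, so it is never stored.
                return data_out, True
        else:
            data_out = None

        accepted = data_in is not None and has_room
        if accepted:
            self.queue.append(data_in)
        return data_out, accepted


def latency_when_empty(empty_latency):
    """Count cycles between offering an item to an empty FIFO and seeing it."""
    fifo = FifoModel(depth=4, empty_latency=empty_latency)
    out, _ = fifo.step(data_in="a")
    cycles = 0
    while out is None:
        out, _ = fifo.step()
        cycles += 1
    return cycles
```

In this model, `latency_when_empty(1)` corresponds to what the existing Df.fifo would be hard-configured to, and `latency_when_empty(0)` to the proposed bypass option. The bypass branch is where the input-to-output combinational path would appear in hardware.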