- The problem is to compute the distance `c` using the Pythagorean theorem: `c = sqrt(a^2 + b^2)`.
- This requires hardware to:
  - Square the values `a` and `b`.
  - Add the squared results.
  - Take the square root of the sum.
- In a modern processor running at, say, 1 GHz, we can only perform a limited amount of logic (around 20 gates deep) within a single clock cycle. If the logic is deeper, it won't complete before the next clock cycle begins, leading to incorrect behavior.
- The computation is too deep to fit in a single clock cycle, so we distribute the operations across multiple clock cycles.
- Breakdown of the pipeline stages:
  - Cycle 1: Square `a` and `b` and store these intermediate results in flip-flops.
  - Cycle 2: Add the squared values and store the result in another set of flip-flops.
  - Cycle 3: Take the square root of the sum. (In reality, calculating the square root might take multiple cycles, but for simplicity, it's assumed to take one cycle here.)
- Each clock cycle advances the computation one step further, with intermediate values being captured in flip-flops to prevent loss of data across cycles.
- In traditional RTL design, you explicitly define the logic operations and the flip-flops that capture intermediate results.
- For example:
  - Cycle 1: The squaring of `a` and `b` is computed, and the results are stored in flip-flops.
  - Cycle 2: The addition of `a^2` and `b^2` is performed, and the sum is stored in flip-flops.
  - Cycle 3: The square root is computed and stored.
- While functional, RTL requires careful management of timing to ensure that each stage captures the correct results at the right clock edge, leading to complex timing management.
- TL-Verilog introduces a higher-level abstraction called pipeline stages. Instead of manually coding flip-flops between stages, the pipeline abstraction in TL-Verilog automatically implies the presence of flip-flops.
- For example, in TL-Verilog:
- You define a pipeline called `calc`.
- The pipeline has multiple stages: stage 1, stage 2, and stage 3.
- In each stage, you specify the logic, and TL-Verilog automatically handles the flip-flops between stages.
- Pipeline abstraction allows you to focus on the operations (e.g., squaring, adding, square rooting) without worrying about the low-level implementation details like flip-flop placement.
- The code in TL-Verilog mirrors the conceptual diagram of a pipeline. You define the pipeline stages, and within those stages, you describe the operations.
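As a rough sketch of what this looks like (signal names and widths are illustrative, and `sqrt` stands in for an instantiated square-root unit rather than a built-in operator), the `calc` pipeline described above might be written as:

```
|calc
   @1
      $aa_sq[31:0] = $aa[15:0] * $aa[15:0];   // square a
      $bb_sq[31:0] = $bb[15:0] * $bb[15:0];   // square b
   @2
      $cc_sq[32:0] = $aa_sq + $bb_sq;         // add the squares
   @3
      $cc[15:0] = sqrt($cc_sq);               // square root (assumed sqrt unit)
```

The staging flip-flops between `@1`, `@2`, and `@3` are implied by the pipeline; none are coded explicitly.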

- Traditional RTL Code: You have to explicitly manage when the squaring, adding, and square-rooting happens, and manually add flip-flops between stages.
- TL-Verilog Code: The stages are defined abstractly, and the flip-flops are implied based on how you structure the stages.
- This leads to code reduction, as TL-Verilog eliminates the need to manually code flip-flops and timing management. This reduction in code improves readability, decreases bugs, and speeds up the design process.

- The main benefit of TL-Verilog's timing abstraction is the separation of function from timing.
- Function: The operations you want to perform (squaring, adding, square rooting) stay the same.
- Timing: The pipeline stages that define when these operations occur (e.g., squaring in stage 1, adding in stage 2) can be adjusted independently without changing the logic of the circuit.
- This abstraction provides the flexibility to change the timing of operations (i.e., adjust the number of pipeline stages) without affecting the overall behavior of the design.
- In traditional RTL, changing the number of cycles in a design would require significant rewiring of the flip-flops and logic, increasing the chance of introducing bugs.
- Design Adaptation: Imagine the signal `a` is located on one corner of the chip, and the result `c` is needed on the opposite corner. In modern silicon, it can take multiple clock cycles (e.g., 25 cycles) for a signal to propagate across the die.
- Timing Flexibility: TL-Verilog allows you to stretch the pipeline by adding stages to account for this signal propagation delay. For example, instead of completing the operation in 3 stages, you might need 5 stages to handle the delay.
- The key point is that the logic stays the same (you are still performing the same squaring, adding, and square-rooting). The only difference is the timing—how long it takes for the operation to complete and how many stages are used.
- With TL-Verilog, changes to pipeline stages guarantee that the behavior of the circuit remains the same. The only risk is that timing might break if you try to consume data before it's been fully produced.
- However, the logic itself is unchanged, and this abstraction reduces the chances of introducing bugs related to timing changes (which are common in traditional RTL design).
- In traditional RTL, changing pipeline timing (e.g., adding extra cycles) requires significant manual effort:
- Rewiring flip-flops.
- Adjusting logic to match the new timing requirements.
- Ensuring the circuit continues to behave correctly after the changes.
- These changes are error-prone and time-consuming, often introducing bugs if not carefully managed.
- TL-Verilog simplifies this process by making timing changes much easier through pipeline abstraction, saving design time and reducing opportunities for mistakes.
- Pipeline logic helps break down deep computations into manageable stages spread over multiple clock cycles.
- TL-Verilog provides a timing abstraction that simplifies the design of pipelines, reducing code and bugs compared to traditional RTL.
- Flexibility: TL-Verilog allows for easy changes in the pipeline staging without affecting the functional behavior of the circuit.
- Timing abstraction separates the function of the logic from the timing, making the design more adaptable to different hardware environments (e.g., signal propagation delays).
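To make the separation of function from timing concrete, here is a hedged sketch of the same pipeline restaged (the stage numbers are illustrative): only the `@` labels change; the assignments themselves are untouched.

```
|calc
   @1
      $aa_sq[31:0] = $aa[15:0] * $aa[15:0];
      $bb_sq[31:0] = $bb[15:0] * $bb[15:0];
   @3                                        // was @2: two extra implied staging flops
      $cc_sq[32:0] = $aa_sq + $bb_sq;
   @5                                        // was @3
      $cc[15:0] = sqrt($cc_sq);
```

The behavior is unchanged; only the latency grows, which is exactly the flexibility needed to absorb cross-die propagation delays.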
- When your clock is running too fast for your logic to execute within a single clock cycle, pipelining is needed.
- Pipelines not only solve timing issues but also provide performance benefits by allowing designs to run at higher clock frequencies.
- Combinational logic between flip-flops limits the clock speed: the longest combinational delay between flip-flops sets the minimum clock period, and therefore the maximum clock frequency.
- Pipelining divides the computation into smaller, manageable tasks spread across multiple stages, which allows the clock to run faster. By shortening the logic path between flip-flops, you can present a new set of inputs every clock cycle.
- While pipelining increases the number of clock cycles (stages) it takes to compute a result (increased latency), it enables higher throughput.
- Throughput improves because a new set of data can be processed every clock cycle. Therefore, more data is handled per second, even though individual results take longer to compute (due to multiple stages).
- In this example, we are computing the distance `c` using the Pythagorean theorem (`c = sqrt(a^2 + b^2)`).
- The logic is distributed across stages. For example:
  - Stage 1: Square the values `a` and `b`.
  - Stage 2: Add the squared results.
  - Stage 3: Compute the square root of the sum.
- Makerchip provides a waveform viewer to visualize the pipeline behavior over time.

- In a pipeline, data is distributed across time, meaning that inputs at an earlier stage impact outputs at a later stage.
- The waveform viewer lets you track how inputs (e.g., `a` and `b`) at different stages of the pipeline correspond to outputs (e.g., `c`) a few clock cycles later.
- Combinational Logic (Single Cycle): In the example, if all computations are done in one cycle (non-pipelined), the logic to compute `c` (squaring `a` and `b`, adding, and taking the square root) occurs in a single clock cycle.
  - In the waveform viewer, you would see that inputs `a = 9` and `b = 12` (represented as `C` in hexadecimal) result in an output `c = F` (hexadecimal) in the same cycle.
- Pipelined Logic: When pipelined, the logic is spread across multiple stages.
- For example, in a 3-stage pipeline, you would see:
  - Stage 1: Inputs `a` and `b`.
  - Stage 2: The intermediate result from squaring `a` and `b`.
  - Stage 3: The final output `c`, two clock cycles later.
- The pipeline adds flip-flops between stages to store intermediate results and propagate data.
- In the waveform, each signal is tagged with the stage at which it was generated (e.g., `@1` for stage 1, `@3` for stage 3).
- For instance:
  - Input `a` in Stage 1 (tagged `@1`) affects the output `c` in Stage 3 (tagged `@3`) two clock cycles later.
- This stage tagging helps you understand how intermediate results progress through the pipeline and when final results appear.
- TL-Verilog simplifies pipeline design by allowing you to focus on the logical operations at each stage without manually managing flip-flops.
- In TL-Verilog, a signal like `$a_squared` in stage 1 automatically gets propagated to later stages (e.g., stage 2) through flip-flops, which are implied by the pipeline structure.
- The timing abstraction in TL-Verilog treats these signals as part of a single pipeline, but under the hood (at the SystemVerilog level), they are separate signals for each stage.
- Retiming adjusts the position of flip-flops in the pipeline to better distribute the logic across stages.
- For example, if `$a_squared` is computed in stage 1 and needs to be used in stage 2, a flip-flop is placed between the stages to hold the result of `$a_squared` and propagate it to the next stage.
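A minimal sketch of this (signal names and widths assumed for illustration): `$a_squared` is produced in `@1` and consumed in `@2`, so TL-Verilog infers one staging flip-flop; at the SystemVerilog level, the tool generates a distinct signal for each stage.

```
|calc
   @1
      $a_squared[7:0] = $a[3:0] * $a[3:0];
      $b_squared[7:0] = $b[3:0] * $b[3:0];
   @2
      // Consuming @1 signals here implies one staging flip-flop per signal.
      $sum[8:0] = $a_squared + $b_squared;
```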
- Beyond pipelining, sequential logic involves feeding back data from later stages to earlier stages (e.g., Fibonacci sequence).
- For instance, if a feedback loop is added to the pipeline (e.g., feeding back the value of `a`), it allows the current stage’s logic to depend on results computed several cycles earlier.
- In this case, delayed versions of signals (e.g., `a_4`, `a_12`) are used, indicating data that has passed through the pipeline for several stages.
- In the Makerchip platform, you can visualize feedback loops and track how signals are staged through multiple flip-flops.
- The feedback loop might refer to the version of a signal from 4, 12, or more cycles earlier, creating complex interactions across different stages.
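In TL-Verilog, such delayed versions are written with the `>>n` staged-reference syntax, which reaches the signal's value from `n` cycles earlier. A hedged sketch (names and widths illustrative):

```
|pipe
   @1
      // $val depends on its own value from 4 cycles earlier (>>4$val),
      // i.e., a value that has passed through 4 implied staging flip-flops.
      $val[31:0] = $reset ? 32'b0 : >>4$val + 1;
```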
- Pipeline diagrams help in visualizing how flip-flops are implied in your design and how data flows across different stages.
- These diagrams are useful for understanding:
- Where logic operations are occurring.
- How flip-flops store and propagate data.
- How the feedback mechanism works, especially in sequential logic.
- Pipelining allows designs to run at higher clock frequencies by dividing computations into smaller tasks across multiple stages.
- The throughput of the design improves with pipelining, even though individual computations take longer (increased latency).
- TL-Verilog simplifies pipeline design by abstracting the timing of operations, reducing the complexity of managing flip-flops.
- The waveform viewer in Makerchip helps correlate inputs and outputs at different stages of the pipeline, allowing designers to track data flow over time.
- Feedback loops enable more complex sequential logic, where later-stage results are fed back into earlier stages for future computations.
- In TL-Verilog, identifiers follow a specific syntax based on their role in the design.
- Pipe Signals: These signals represent values that travel through the pipeline stages and are named using lowercase letters with underscores as delimiters between tokens.
  - Example: A pipe signal would be written as `a_pipe_signal`, with `a` being the base and `pipe_signal` indicating the role and function.
- State Signals: These signals store state values (not part of the current topic) and follow a camelCase or PascalCase naming convention. In TL-Verilog, PascalCase (where each token, including the first, starts with an uppercase letter) is the recommended style for these signals.
  - Example: `StateSignal` or `ComputeValue` would represent state signals.
- Uppercase Identifiers: While not discussed in this session, uppercase identifiers follow a similar convention but are fully capitalized with underscore delimitation, typically used for constants or macros in designs.
  - Example: `CONSTANT_VALUE` or `MAX_DEPTH`.
- Numeric Identifiers:
  - TL-Verilog allows numbers to be included in signal names, but only at the end of a token. For example, `base64` is valid, but identifiers cannot start with a number or have a number standing alone without accompanying text.
- Implicit vs. Explicit Pipelining:
- In TL-Verilog, all logic is inherently in a pipeline, even when you do not explicitly define it. By default, computations are assigned to stage 0 of the pipeline unless you specify otherwise.
- Explicit Pipeline Declaration:
- When explicitly defining pipeline stages, you designate where in the pipeline certain logic is performed.
- Example:
  - Instead of performing all operations in stage 0, you can write `@1` to specify that logic belongs to pipeline stage 1.
  - If an operation takes multiple cycles, you can split it into stages, such as stage 1 for computation and stage 3 for additional checks (like overflow detection).
- Pipeline Depth: This becomes useful for deep pipelines (multi-cycle operations), where different logic blocks occur in different stages, improving the design's performance and clock speed.
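A minimal sketch of explicit staging (signal names and widths are assumed for illustration): the arithmetic sits in stage 1 and a later check in stage 3, with the flip-flops between them implied.

```
|calc
   @1
      $sum[4:0] = $in1[3:0] + $in2[3:0];   // computation in stage 1
   @3
      $overflow = $sum[4];                 // overflow check in stage 3;
                                           // flops between @1 and @3 are implied
```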
- Pipeline Signals:
- Signals that are processed in different stages of the pipeline are propagated through the stages. These signals are stored in flip-flops between stages, and their values are computed in sequence as they move down the pipeline.
- Pipeline Setup:
- The Fibonacci series is computed sequentially, with each term relying on the values of the previous terms.
- The pipeline handles these computations across multiple stages.
- Stage 1: Compute the sum of the last two Fibonacci numbers.
- Stage 2 (optional): Store the result in a register (flip-flop) for the next iteration.
- Simplified Example: Let’s say the Fibonacci relation `F(n) = F(n-1) + F(n-2)` is computed at stage 1. The result is fed into stage 2 for storage and further computation.
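A compact sketch of this in TL-Verilog (the pipeline name is illustrative), where the `>>1` and `>>2` references reach the signal's value from 1 and 2 cycles earlier, so no explicit registers need to be coded:

```
|fib
   @1
      // Seed the series with 1 during reset; otherwise each value is the
      // sum of the two previous values (>>1$num and >>2$num).
      $num[31:0] = $reset ? 1 : (>>1$num + >>2$num);
```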
- Difference from Non-Pipelined Version:
- In the earlier non-pipelined version, all logic was implicitly placed in stage 0. The design lacked the explicit assignment of pipeline stages, making it difficult to manage deep pipelines and more complex computation.
- The pipelined version explicitly declares where each piece of logic occurs (e.g., `@1`, `@2`, etc.), making it more modular and scalable.
- Multi-Stage Error Conditions:
- In pipelines, different stages can encounter various error conditions, such as:
  - Bad Input: In stage 1, an invalid input could trigger an error.
  - Overflow: In stage 3, an arithmetic overflow condition might arise.
  - Divide-by-Zero: In stage 6, a divide-by-zero error might occur during computation.
- Error Signal Aggregation:
- When different error conditions are detected across stages, the pipeline must aggregate these errors into a single output signal indicating an error has occurred.
- Example:
  - Let’s assume that in stages 3 and 6, the pipeline detects `error_stage3` and `error_stage6`, respectively.
  - To manage these conditions, the errors are aggregated using an OR gate. If any of these errors occur, the `error` signal in stage 6 will assert.
- TL-Verilog Code:
- In TL-Verilog, this error detection could be coded as:

      @1
         $err1 = ($bad_input || $illegal_op) ? 1 : 0;
      @3
         $err2 = ($err1 || $overflow) ? 1 : 0;
         // Not: $err2 = (>>2$err1 || $overflow) ? 1 : 0;
         // >>2$err1 would take $err1 from two cycles earlier than the properly
         // staged value (i.e., from a different transaction), so it would be wrong.
      @6
         $err3 = ($divby0 || $err2) ? 1 : 0;
- The error conditions from different stages (`@3`, `@6`) are ORed together in stage 6 to produce the final error signal.
- Using the Waveform Viewer:
  - The waveform viewer in Makerchip helps visualize how these error conditions propagate through the pipeline. When any of the error conditions are detected at their respective stages, the final `error` signal should reflect this, allowing for debugging and validation of the design.
- Example Scenario:
- The provided example circuit handles error conditions across a six-stage pipeline.
- Error conditions are detected in:
- Stage 3: Overflow or illegal operation.
- Stage 6: Divide by zero.
- The final error signal (`error3`) in stage 6 indicates whether any error occurred.
- Logic for Error Conditions:
- The blue circles in the diagram represent OR gates, combining the error conditions.
- The red circles represent input conditions, which are assumed to be predefined in the pipeline.
- The logic could be coded up as follows (TL-Verilog uses plain `=` assignments; the staging between `@3` and `@6` is implied):

      @3
         $error_stage3 = $illegal_operation || $overflow;
      @6
         $error_stage6 = $divide_by_zero;
         $error3 = $error_stage3 || $error_stage6;
- Waveform Check:
- After coding up the logic, use the waveform viewer to inspect the behavior of the error signals.
  - The final error signal (`error3`) should assert whenever any of the error conditions in stages 3 or 6 are triggered.
- Objective: Integrate the calculator circuit you previously developed into a named pipeline (`calc`), specifically in stage 1 of that pipeline. Also, incorporate the counter you developed into the same pipeline, stage 1.
- Once both the calculator and counter are in the same pipeline stage, you can simulate the circuit to verify that both components work as expected in the context of the pipeline.
- When dealing with a high-frequency circuit, you may need to break down computations over multiple clock cycles for better timing closure.
- Calculator Circuit Modifications:
- Split the calculator operation into two stages:
  - Stage 1: Perform the actual arithmetic operation (e.g., addition, subtraction, multiplication, or division).
  - Stage 2: Use a multiplexer to select the correct result based on the selected operation.
- By breaking up the logic into two stages, the circuit can handle higher frequencies, ensuring that the timing of each stage is met.
- Since the circuit now takes two cycles to perform a computation, you must handle the input-output loopback appropriately.
- Modify the design so that:
  - The output is looped back to the input with a two-cycle latency (rather than one cycle).
  - This ensures that the computation’s result is available after two cycles and used as the next input, maintaining continuity in the iterative process.
- Step 1: Output Loopback
- Modify the alignment of the output so that it loops back with a two-cycle latency.
- Initially, the multiplexer is still in the same stage as the arithmetic operations (addition, subtraction, etc.).
- The loopback incorporates two staging flops, which hold intermediate results and recirculate them back to the input after two cycles.
- Step 2: Single-Bit Counter for Cycle Tracking
- You need to track whether the current cycle is a computation cycle or a meaningless cycle (where no valid computation occurs).
- Use a single-bit counter to toggle between 0 and 1:
  - This counter alternates between 0 and 1, keeping track of even and odd cycles.
  - This acts as an oscillator to determine which cycles should perform calculations (even cycles) and which cycles are idle (odd cycles).
- The output of this circuit serves as a valid signal, which is `1` during computation cycles and `0` during meaningless cycles.
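Such a toggling counter can be sketched in one line of TL-Verilog (the pipeline and signal names are illustrative), using a `>>1` reference to the signal's own value from the previous cycle:

```
|calc
   @1
      // Alternates 0, 1, 0, 1, ... after reset; high on computation cycles.
      $valid = $reset ? 1'b0 : ! >>1$valid;
```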
- Step 3: Valid Signal and Reset Logic
- Connect the valid signal along with the reset signal to determine when to drive the output with a zero value:
  - If the system is in reset or the cycle is invalid (odd cycle), the output should be driven to `0`.
  - This ensures that during idle cycles, the circuit produces a zero output, keeping the system stable during non-computation cycles.
- Step 4: Re-Timing the Multiplexer
- Move the multiplexer logic from stage 1 to stage 2, finalizing the two-stage pipeline design.
- By shifting the multiplexer, the selection of the arithmetic operation result happens in stage 2, while the actual operation occurs in stage 1.
- After making these changes, you can verify the circuit’s behavior in the simulation.
- The resulting pipeline will now perform a calculation every other cycle:
- During even cycles (valid cycles), the circuit performs computations.
- During odd cycles (invalid cycles), the inputs are meaningless, and the output is driven to zero.
- In the waveform viewer, you should observe the calculator executing operations on valid cycles and outputting results every other cycle, with zero output during invalid cycles.
- Pipeline Staging: Dividing the computation into two pipeline stages allows the circuit to operate at higher frequencies by splitting the workload across clock cycles.
- Two-Cycle Latency: The output is looped back with a two-cycle delay, reflecting the fact that the computation now takes two cycles to complete.
- Single-Bit Counter: The use of a single-bit counter creates a simple oscillation to track which cycles are valid for computation.
- Valid Signal: The valid signal ensures that computations only happen on designated cycles, and the circuit outputs zero during idle cycles.
- Multiplexer Re-Timing: Moving the multiplexer to the second stage separates the operation from the selection, ensuring that timing constraints are met without overloading a single pipeline stage.
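Putting the steps together, a hedged sketch of the two-stage calculator might look like the following. `$op` (operation select) and `$rand2` (second operand) are assumed external inputs, and all names and widths are illustrative:

```
|calc
   @1
      $val1[31:0] = >>2$out;                       // loop the output back with 2-cycle latency
      $val2[31:0] = {28'b0, $rand2[3:0]};          // assumed external operand
      $sum[31:0]  = $val1 + $val2;                 // all four operations computed in stage 1
      $diff[31:0] = $val1 - $val2;
      $prod[31:0] = $val1 * $val2;
      $quot[31:0] = $val1 / $val2;
      $valid      = $reset ? 1'b0 : ! >>1$valid;   // single-bit cycle tracker
   @2
      // Result selection retimed into stage 2; the output is forced to zero
      // during reset and on invalid (odd) cycles.
      $out[31:0] = ($reset || ! $valid) ? 32'b0 :
                   ($op[1:0] == 2'b00)  ? $sum  :
                   ($op[1:0] == 2'b01)  ? $diff :
                   ($op[1:0] == 2'b10)  ? $prod :
                                          $quot;
```

Note how the `>>2$out` loopback matches the two-cycle latency: the value selected in `@2` re-enters the pipeline as `$val1` two cycles later, on the next valid cycle.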
