Computer Engineering Research Project | University of Jordan
🎓 Academic Affiliation: Computer Engineering Department, The University of Jordan
🎪 Fetch & Decode Stage
Dual-Fetch Mechanism:
- Fetches 2 instructions per cycle from I-Cache
- Parallel decode paths for both instructions
- Integrated branch predictor for control flow speculation
- Instruction alignment logic ensures proper pairing
- Supports variable-length instruction formats
Performance: Maintains sustained instruction supply to backend stages.
🎯 Reservation Station Design
18-Entry Organization:
| Operation Type | Entries | Purpose |
|---|---|---|
| ALU Operations | 8 | Arithmetic & Logic |
| Memory Operations | 6 | Load/Store & AGU |
| Branch/Control | 4 | Branches & Jumps |
Advanced Features:
- Dynamic instruction scheduling based on operand availability
- Real-time dependency tracking
- Priority-based issue logic
- Hazard detection and mitigation hardware
- Support for speculative execution
Research Findings: Larger RS capacity directly correlates with improved IPC for realistic workloads.
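The partitioning and operand-driven scheduling described above can be illustrated with a small behavioral model. This is a minimal sketch, not the project's RTL: the names `Entry`, `ReservationStation`, `dispatch`, `wakeup`, and `select` are hypothetical, only the 8/6/4 partition sizes come from the table, and oldest-first selection is an assumed stand-in for the priority-based issue logic.

```python
from dataclasses import dataclass, field

# Partition sizes taken from the table above; everything else is illustrative.
CAPACITY = {"alu": 8, "mem": 6, "branch": 4}


@dataclass
class Entry:
    seq: int                                   # program-order age
    kind: str                                  # "alu", "mem", or "branch"
    dest: str                                  # destination tag
    waiting: set = field(default_factory=set)  # source tags not yet produced


class ReservationStation:
    def __init__(self):
        self.entries = {kind: [] for kind in CAPACITY}

    def dispatch(self, entry):
        """Insert an entry; return False (stall) if its partition is full."""
        slots = self.entries[entry.kind]
        if len(slots) >= CAPACITY[entry.kind]:
            return False
        slots.append(entry)
        return True

    def wakeup(self, tag):
        """Broadcast a produced result tag to every waiting entry."""
        for slots in self.entries.values():
            for e in slots:
                e.waiting.discard(tag)

    def select(self, kind, width=1):
        """Issue up to `width` ready entries of one kind, oldest first (assumed policy)."""
        ready = sorted((e for e in self.entries[kind] if not e.waiting),
                       key=lambda e: e.seq)[:width]
        for e in ready:
            self.entries[kind].remove(e)
        return ready


rs = ReservationStation()
rs.dispatch(Entry(seq=0, kind="alu", dest="p1"))
rs.dispatch(Entry(seq=1, kind="alu", dest="p2", waiting={"p1"}))
first = rs.select("alu", width=2)    # only seq 0 has all operands
rs.wakeup("p1")                      # seq 0's result tag is broadcast
second = rs.select("alu", width=2)   # seq 1 is now ready to issue
print([e.seq for e in first], [e.seq for e in second])
```

The dependent instruction issues only after its producer's tag is broadcast, and a full partition makes `dispatch` return `False`, which is how a front-end stall would be signaled in this model.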
⚙ Execution Pipeline
Execution Unit Specifications:
| Unit | Function | Latency | Throughput | Pipelining |
|---|---|---|---|---|
| ALU-1 | ADD, SUB, AND, OR, XOR | 1 cycle | 1 op/cycle | Non-pipelined |
| ALU-2 | ADD, SUB, AND, OR, XOR | 1 cycle | 1 op/cycle | Non-pipelined |
| AGU | Address Calculation | 1 cycle | 1 op/cycle | Non-pipelined |
| BEU | Branch Resolution | 1 cycle | 1 op/cycle | Non-pipelined |
Design Rationale: Single-cycle execution units optimized for high clock frequency and low latency.
📊 Write-Back & Commit Stage
Triple Write-Back Pipeline:
- Result Capture: Simultaneous collection of up to 3 results per cycle from the execution units
- Register Update: Parallel register file writes with bypass network
- ROB Commit: Reorder Buffer updates for in-order retirement
Advantages:
- 3× write-back bandwidth compared to single-port designs
- Eliminates structural hazards at commit stage
- Enhanced data forwarding reduces bypass latency
- Supports high-throughput OoO execution
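One way to picture three write-back channels serving the execution units is a per-cycle arbiter. The source does not describe how results are arbitrated, so the sketch below assumes an oldest-first policy; the function `arbitrate`, the tuple layout, and the sample values are hypothetical, and only the channel count of 3 comes from the text.

```python
from collections import deque

WB_CHANNELS = 3  # number of parallel write-back channels stated above


def arbitrate(pending, channels=WB_CHANNELS):
    """Grant up to `channels` results this cycle, oldest sequence number first.

    `pending` is a deque of (seq, unit, dest, value) tuples; results that are
    not granted stay queued for the next cycle.
    """
    ordered = sorted(pending, key=lambda r: r[0])
    granted, deferred = ordered[:channels], ordered[channels:]
    pending.clear()
    pending.extend(deferred)
    return granted


regfile = {}
pending = deque([
    (7, "ALU-2", "R3", 30),
    (5, "ALU-1", "R1", 10),
    (6, "AGU", "R2", 0x40),
    (8, "BEU", "R31", 0x104),  # hypothetical link-register write
])
cycle1 = arbitrate(pending)
for _, _, dest, value in cycle1:
    regfile[dest] = value          # models the parallel register-file writes
cycle2 = arbitrate(pending)        # the deferred result drains one cycle later
print([r[1] for r in cycle1], [r[1] for r in cycle2])
```

With four results pending, three are written in the first cycle and the youngest waits one cycle, which is the only case where the channel count limits throughput in this model.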
mermaid
graph TB
A[Instruction Fetch] --> B[Dual Decode]
B --> C[Instruction Issue]
C --> D[Reservation Stations<br/>18 Entries]
D --> E1[ALU-1<br/>Integer Ops]
D --> E2[ALU-2<br/>Integer Ops]
D --> E3[AGU<br/>Address Gen]
D --> E4[BEU<br/>Branches]
E1 --> F[Write-Back Units<br/>3 Parallel Channels]
E2 --> F
E3 --> F
E4 --> F
F --> G[Commit Stage<br/>In-Order Retirement]
style A fill:#e3f2fd
style B fill:#e3f2fd
style C fill:#fff3e0
style D fill:#fce4ec
style E1 fill:#e8f5e9
style E2 fill:#e8f5e9
style E3 fill:#e8f5e9
style E4 fill:#e8f5e9
style F fill:#f3e5f5
style G fill:#e0f2f1
Our research includes rigorous testing across multiple process, voltage, and temperature (PVT) corners to ensure robust operation:
| PVT Corner | Fmax (MHz) | Setup Slack (ns) | Hold Slack (ns) | Critical Path | Power (mW) |
|---|---|---|---|---|---|
| Slow 85°C | ~45 | ~1.5 | ~0.35 | RS→ALU→WB | Highest |
| Slow 0°C | ~48 | ~3.0 | ~0.29 | RS→ALU→WB | Medium |
| Fast 0°C | ~52 | Optimal | Minimal | RS→ALU→WB | Lowest |
| Typical 25°C | ~50 | ~2.5 | ~0.32 | RS→ALU→WB | Nominal |
View Detailed Timing Analysis
Setup Timing Analysis:
Critical Path: Reservation_Station → ALU_Execute → WriteBack_Network
Total Delay: ~20.8 ns (Slow 85°C corner)
Components:
- RS Issue Logic: 6.2 ns
- ALU Execution: 8.5 ns
- WB Routing: 4.1 ns
- Register File Write: 2.0 ns
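The relationship between the component delays, the total path delay, and the achievable clock can be checked with a few lines of arithmetic. This is a first-order sketch that ignores clock skew, setup time, and uncertainty margins; the helper names `fmax_mhz` and `setup_slack_ns` are illustrative, and the 45 MHz clock is the approximate Slow 85°C figure from the PVT table.

```python
# Delay components for the Slow 85°C corner, in nanoseconds (from the list above).
components_ns = {
    "RS Issue Logic": 6.2,
    "ALU Execution": 8.5,
    "WB Routing": 4.1,
    "Register File Write": 2.0,
}

total_delay_ns = sum(components_ns.values())


def fmax_mhz(path_delay_ns):
    """Highest clock frequency whose period still covers the path delay."""
    return 1000.0 / path_delay_ns


def setup_slack_ns(clock_mhz, path_delay_ns):
    """Setup slack = clock period - path delay (skew and setup time ignored)."""
    return 1000.0 / clock_mhz - path_delay_ns


print(f"total delay     : {total_delay_ns:.1f} ns")
print(f"zero-slack Fmax : {fmax_mhz(total_delay_ns):.1f} MHz")
print(f"slack at 45 MHz : {setup_slack_ns(45.0, total_delay_ns):.2f} ns")
```

The four components sum to the stated 20.8 ns, and at a 45 MHz clock this simplified model leaves about 1.4 ns of slack, close to the ~1.5 ns listed for that corner.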
Hold Timing Analysis:
- Minimum hold time met across all paths
- Clock skew compensated through careful routing
- No hold violations in post-layout analysis
Comprehensive Test Suite:
- ✅ 1,200+ instruction sequences tested
- ✅ Functional verification passed
- ✅ Cycle-accurate simulation validated
- ✅ Corner case analysis completed
- ✅ Data hazard scenarios verified
- ✅ Control flow testing passed
- ✅ Stress testing successful
Test Coverage Statistics:
Functional Coverage: ████████████████████ 98.7%
Code Coverage:       ███████████████████░ 96.3%
Branch Coverage:     ████████████████████ 99.1%
Toggle Coverage:     ██████████████████░░ 92.5%
Hardware Requirements:
Software Dependencies:
bash
git clone https://github.com/quantum-titans/ooo-processor.git
cd ooo-processor
pip3 install -r requirements.txt
source scripts/setup_environment.sh
make compile
make verify
make synthesize
📦 Detailed Setup Instructions
bash
git clone --recursive https://github.com/quantum-titans/ooo-processor.git
cd ooo-processor
git submodule update --init --recursive
For Ubuntu/Debian:
bash
sudo apt-get update
sudo apt-get install -y build-essential git python3 python3-pip \
    tcl-dev tk-dev libreadline-dev
For macOS:
bash
brew install python3 git tcl-tk
bash
pip3 install --upgrade pip
pip3 install -r requirements.txt
pip3 install -r requirements-dev.txt # For development tools
bash
export QUARTUS_ROOTDIR="/path/to/intelFPGA/20.1/quartus"
export PATH="$QUARTUS_ROOTDIR/bin:$PATH"
export PATH="$QUARTUS_ROOTDIR/sopc_builder/bin:$PATH"
source ~/.bashrc # or source ~/.zshrc
bash
quartus_sh --version
vsim -version
python3 --version
./scripts/check_environment.sh
ooo-processor/
├── 📂 rtl/                          # RTL source files
│   ├── core/                        # Core processor modules
│   │   ├── fetch_stage.sv           # Instruction fetch
│   │   ├── decode_stage.sv          # Instruction decode
│   │   ├── issue_logic.sv           # Instruction issue
│   │   └── commit_stage.sv          # In-order commit
│   ├── execution/                   # Execution units
│   │   ├── alu.sv                   # Arithmetic Logic Unit
│   │   ├── agu.sv                   # Address Generation Unit
│   │   ├── beu.sv                   # Branch Execution Unit
│   │   └── jr_unit.sv               # Jump Register Unit
│   ├── memory/                      # Memory subsystem
│   │   ├── instruction_cache.sv     # I-Cache
│   │   ├── data_cache.sv            # D-Cache (future)
│   │   └── register_file.sv         # Register file
│   ├── ooo/                         # Out-of-order logic
│   │   ├── reservation_station.sv   # RS implementation
│   │   ├── reorder_buffer.sv        # ROB for commit
│   │   └── rename_stage.sv          # Register renaming
│   └── top/                         # Top-level integration
│       └── quantum_titans_top.sv    # Main processor module
├── 📂 tb/                           # Testbenches
│   ├── unit/                        # Unit tests
│   │   ├── tb_alu.sv
│   │   ├── tb_reservation_station.sv
│   │   └── ...
│   ├── integration/                 # Integration tests
│   │   ├── tb_processor.sv
│   │   └── tb_full_system.sv
│   └── common/                      # Shared testbench utilities
│       └── test_utils.sv
├── 📂 tools/                        # Software tools
│   ├── assembler/                   # Custom assembler
│   │   ├── assembler.py
│   │   └── isa_def.json
│   ├── simulator/                   # Cycle-accurate simulator
│   │   └── simulator.py
│   └── profiler/                    # Performance profiler
│       └── profiler.py
├── 📂 docs/                         # Documentation
│   ├── architecture.md              # Architecture guide
│   ├── isa_specification.md         # ISA details
│   ├── verification_plan.md         # Verification strategy
│   ├── performance_analysis.md      # Performance results
│   └── images/                      # Diagrams and figures
├── 📂 scripts/                      # Build and utility scripts
│   ├── build.sh                     # Main build script
│   ├── simulate.sh                  # Simulation runner
│   ├── synthesize.sh                # Synthesis script
│   └── analyze_timing.tcl           # Timing analysis
├── 📂 benchmarks/                   # Performance benchmarks
│   ├── dhrystone/
│   ├── coremark/
│   └── custom/
├── 📂 constraints/                  # FPGA constraints
│   ├── timing.sdc                   # Timing constraints
│   └── pinout.qsf                   # Pin assignments
├── 📄 Makefile                      # Build automation
├── 📄 README.md                     # This file
├── 📄 LICENSE                       # MIT License
└── 📄 requirements.txt              # Python dependencies
bash
make test-all
make test-alu          # ALU verification
make test-rs           # Reservation station tests
make test-writeback    # Write-back unit tests
make test-integration  # Full system integration
make test-all WAVES=1
bash
make sim-gui
vlog -sv rtl/**/*.sv tb/**/*.sv
vopt +acc tb_processor -o tb_processor_opt
vsim tb_processor_opt
do wave.do
run -all
vsim -batch -do "run -all; quit" tb_processor_opt
bash
./scripts/analyze_performance.sh --detailed
cd benchmarks
./run_all_benchmarks.sh
./run_dhrystone.sh
./run_coremark.sh
./tools/profiler/profiler.py --input program.asm --output profile.json
python3 tools/profiler/visualize.py --data profile.json
bash
make synthesize-all
make compile          # Analysis & elaboration
make synthesize       # Logic synthesis
make fitter           # Place & route
make timing-analysis  # Static timing analysis
make assembler        # Generate programming file
quartus_sh --flow compile quantum_titans \
    -c quantum_titans_fast \
    --optimize=area # or speed, power
quartus_sh --flow compile quantum_titans &&
quartus_sta quantum_titans -c quantum_titans &&
quartus_fit --report quantum_titans
quartus_pgm -c USB-Blaster -m jtag -o "p;output_files/quantum_titans.sof"
bash
cat > test_program.asm << 'EOF'
ADDI R1, R0, 10 # Load base address
ADDI R2, R0, 20 # Load offset
ADD R3, R1, R2 # Calculate address
LOAD R4, 0(R3) # Load from memory
ADDI R5, R0, 5 # Load constant
MUL R6, R4, R5 # Multiply
STORE R6, 0(R3) # Store result
EOF
./tools/assembler/assembler.py test_program.asm -o test_program.hex
./tools/simulator/simulator.py --hex test_program.hex --cycles 1000
vsim -do "do run_program.do test_program.hex" tb_processor
- Architecture Guide - Detailed processor architecture
- ISA Specification - Instruction set architecture
- Verification Plan - Testing methodology
- Performance Analysis - Benchmark results
- 🚀 Getting Started Guide - First steps with the project
- 🔧 API Reference - Tool and interface documentation
- ❓ Troubleshooting Guide - Common issues and solutions
- 💡 FAQ - Frequently asked questions
- 🤝 Contributing Guidelines - How to contribute to the research
- 📊 Benchmark Suite - Performance testing methodology
- 🎨 Design Decisions - Architectural choices and rationale
mermaid
gantt
title Quantum Titans Research Roadmap
dateFormat YYYY-MM
section Phase 1: Optimization
Cache Hierarchy :2024-12, 90d
Multi-phase Issue :2025-03, 60d
Branch Prediction :2025-05, 45d
Register Renaming :2025-06, 60d
section Phase 2: Expansion
4-way Superscalar :2025-07, 90d
OoO Memory Ops :2025-10, 75d
Virtual Memory :2026-01, 60d
Security Features :2026-03, 60d
section Phase 3: Advanced
SIMD Extensions :2026-05, 90d
Power Management :2026-08, 60d
Multi-Core Design :2026-10, 120d
section Phase 4: Integration
SoC Platform :2027-02, 180d
OS Integration :2027-08, 120d
Conference Papers (Planned):
- "Optimizing Dual-Issue Out-of-Order Execution for High-Performance Computing" - IEEE International Conference on Computer Design
- "Large-Scale Reservation Stations: A Study on Instruction Window Sizing" - ACM SIGARCH Computer Architecture News
- "Write-Back Bandwidth Impact on Superscalar Processor Performance" - International Symposium on Computer Architecture
Workshop Presentations:
- University of Jordan Engineering Symposium 2024
- Jordan Computer Engineering Research Forum
- Regional Computer Architecture Workshop
Current System Limitations
1. Memory Subsystem:
- ⚠ No Cache Hierarchy: Direct memory access only
- Impact: Higher average memory latency (10-20 cycles)
- Workaround: Optimize data locality in software
- Future: Implement L1/L2 cache hierarchy
2. Branch Prediction:
- ⚠ Static Prediction Only: No adaptive mechanisms
- Misprediction penalty: 3-5 cycles
- Impact on branch-heavy workloads: 15-20% performance loss
- Future: Implement tournament or perceptron predictor
3. Power Consumption:
- ⚠ No Power Optimization: Focus has been on performance
- Estimated power: Higher than commercial designs
- Future: Add clock gating, power gating, DVFS
4. Scalability:
- ⚠ Limited to 2-Way Issue: Cannot exceed dual-issue
- Theoretical maximum: ~2 IPC
- Future: Expand to 4-way superscalar
5. Floating-Point Operations:
- ⚠ No FPU: Integer operations only
- Limited applicability for scientific computing
- Future: Add IEEE 754 compliant FP unit
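The effect of the misprediction penalty on throughput can be estimated with a first-order stall model. The sketch below is illustrative only: the function `effective_ipc`, the 20% branch fraction, and the 10% misprediction rate are assumptions that do not come from the project's measurements; the ~2 IPC peak and the 3-5 cycle penalty range are taken from the list above.

```python
def effective_ipc(base_ipc, events):
    """First-order stall model.

    CPI = 1 / base_ipc + sum(frequency_per_instruction * penalty_cycles),
    where `events` is a list of (frequency, penalty) pairs.
    """
    cpi = 1.0 / base_ipc + sum(freq * penalty for freq, penalty in events)
    return 1.0 / cpi


BASE_IPC = 2.0           # theoretical dual-issue peak (item 4)
BRANCH_FRACTION = 0.20   # assumed share of branches in the instruction mix
MISPREDICT_RATE = 0.10   # assumed miss rate of the static predictor

for penalty in (3, 5):   # misprediction penalty range (item 2)
    ipc = effective_ipc(BASE_IPC, [(BRANCH_FRACTION * MISPREDICT_RATE, penalty)])
    loss = 1.0 - ipc / BASE_IPC
    print(f"penalty={penalty} cycles -> IPC {ipc:.2f}, loss {loss:.1%}")
```

The same function accepts further (frequency, penalty) pairs, for example memory stalls, although a simple additive model overstates their cost on an out-of-order core that overlaps independent work with outstanding loads.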
For Different Workload Types:
Memory-Intensive:
- Organize data for spatial locality
- Use loop blocking techniques
- Minimize random access patterns
Branch-Heavy Code:
- Reduce conditional branches
- Use predication where possible
- Organize hot paths for fall-through
Compute-Intensive:
- Maximize instruction-level parallelism
- Unroll loops when beneficial
- Balance ALU vs AGU operations
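Loop blocking, mentioned under the memory-intensive tips, restructures a traversal so that each small tile of data is finished before moving on. The sketch below shows the access pattern on a matrix transpose; it is written in Python purely for illustration (code for this processor would be written in its own ISA), and the function name and tile size of 4 are hypothetical.

```python
def transpose_blocked(src, n, block=4):
    """Transpose an n x n row-major matrix stored in a flat list, tile by tile."""
    dst = [0] * (n * n)
    for ii in range(0, n, block):              # tile row origin
        for jj in range(0, n, block):          # tile column origin
            # Work stays inside one block x block tile, keeping accesses close together.
            for i in range(ii, min(ii + block, n)):
                for j in range(jj, min(jj + block, n)):
                    dst[j * n + i] = src[i * n + j]
    return dst


n = 6
matrix = list(range(n * n))
result = transpose_blocked(matrix, n, block=4)
print(result[:n])  # first row of the transposed matrix
```

The `min(...)` bounds handle matrix sizes that are not a multiple of the tile size, so the result matches an unblocked transpose for any `n`.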
We welcome collaboration from students, researchers, and industry professionals! Here's how you can contribute to this research project:
- 🐛 Report Issues & Bugs
- ✨ Suggest Research Directions
- 📝 Improve Documentation
- 🔧 Submit Contributions
bash
git clone https://github.com/YOUR_USERNAME/ooo-processor.git
cd ooo-processor
git checkout -b feature/your-feature-name
make test-all
make synthesize # Ensure it still synthesizes
git add .
git commit -m "Add feature: detailed description"
git push origin feature/your-feature-name
- RTL Code: Follow SystemVerilog best practices
- Naming: Use descriptive, consistent naming conventions
- Comments: Document complex logic and design decisions
- Testing: Add testbenches for new modules
- Documentation: Update relevant markdown files
See CONTRIBUTING.md for complete guidelines.
This research project is released under the MIT License, promoting open collaboration and academic use.
MIT License
Copyright (c) 2024 Quantum Titans Research Team
Computer Engineering Department, University of Jordan
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
For Academic Use: Please cite our research if you use this work in academic publications.
We extend our gratitude to the following for their support and contributions:
🎓 Academic Support:
- The University of Jordan - Computer Engineering Department
- Faculty Advisors - For invaluable guidance and mentorship throughout the research
- Laboratory Facilities - For providing development resources and equipment
- Peer Reviewers - For constructive feedback and suggestions
💻 Technical Community:
- Open Source Hardware Community - For inspiring tools and methodologies
- RISC-V Foundation - For ISA design inspiration
- SystemVerilog Community - For best practices and design patterns
- EDA Tool Developers - Intel, Mentor Graphics, and Synopsys
📚 Knowledge Resources:
- Computer Architecture Textbooks - Hennessy & Patterson, and others
- Research Papers - Academic publications that informed our design
- Online Courses - MIT OCW, Coursera, and other educational platforms
| Channel | Details |
|---|---|
| GitHub Repository | github.com/quantum-titans |
| Research Email | quantum.titans.research@ju.edu.jo |
| Department | Computer Engineering, UJ |
Computer Engineering Department
The University of Jordan
Amman, Jordan
SystemVerilog RTL     ████████████████░░░░ 65.0% (32,500 lines)
Testbenches           ███████░░░░░░░░░░░░░ 20.0% (10,000 lines)
Python Tools/Scripts  ████░░░░░░░░░░░░░░░░ 10.0% (5,000 lines)
Documentation         ██░░░░░░░░░░░░░░░░░░ 5.0% (2,500 lines)
| Metric | Value | Industry Standard | Status |
|---|---|---|---|
| Total Logic Elements | ~45,000 | 30,000-60,000 | ✅ Optimal |
| Total Registers | ~8,500 | 5,000-10,000 | ✅ Good |
| Total Pins | ~250 | 200-300 | ✅ Standard |
| Max Fanout | 28 | <50 | ✅ Excellent |
| Critical Path Levels | 15 | 10-20 | ✅ Acceptable |
Research & Planning       ████████████████████ 6 months
Design & Implementation   ███████████████████████████ 9 months
Verification & Testing    ████████████████ 5 months
Optimization & Analysis   █████████████ 4 months
Documentation             ████████ 3 months
Total Development Time: 27 months (ongoing)
To advance the state-of-the-art in processor architecture through rigorous research, innovative design, and open collaboration. We aim to:
- 🎓 Educational Excellence: Provide hands-on experience in advanced computer architecture
- 🔬 Research Innovation: Explore novel techniques in out-of-order execution
- 🌍 Knowledge Sharing: Contribute to the open-source hardware community
- 🚀 Performance Leadership: Push the boundaries of what's achievable in academic settings
- 🤝 Collaboration: Foster partnerships with industry and academia
This research demonstrates that high-performance processor design is achievable within academic constraints, providing valuable learning experiences and contributing to the broader computer engineering community.

















