Yousef Moussa

Computer Engineering Co-op Student

Embedded Systems • ASIC Design • Autonomous Robotics

Fourth-year Computer Engineering student at the University of Alberta (3.9 GPA) specializing in real-time embedded systems, digital design, and robotics. Proven track record of delivering measurable results: a completed RISC-V CPU, real-time ROS2 robotic software presented on a world stage, a 25x network performance improvement, Health Canada-approved medical devices, and mission-critical UAV firmware.

32-bit RISC-V CPU

CPU Architect & Digital Designer | Fall and Winter 2025

Designed and implemented two complete 32-bit RISC-V multi-cycle CPUs with FSM control and full RV32I coverage.

RISC-V CPU datapath and controller diagram
VHDL Verilog RISC-V ISA FSM Design Datapath Architecture Vivado FPGA Xilinx Zybo Z7
See project

Tausworthe PRNG

Digital Designer | Fall 2025

32-bit Taus88 PRNG with three parallel LFSRs, XOR combination, and 125 MHz FPGA validation.

Tausworthe PRNG architecture diagram
VHDL FPGA Xilinx Vivado Zybo Z7-10 LFSR RTL Design Cryptography
See project

Secure Element FSM

Security Digital Designer | Fall 2025

Hardware security controller with instant attack override and Moore FSM for synchronized status outputs.

Secure element FSM architecture diagram
VHDL Hardware Security FPGA FSM Design Xilinx Vivado Zybo Z7-20 Attack Detection Cryptographic Control
See project

2-Bit Comparator IC

VLSI Design | Cadence Virtuoso

Full custom 2-bit comparator through schematic-to-layout flow with clean DRC/LVS and post-layout validation.

2-bit comparator schematic in GSCLIB 45nm
Cadence Virtuoso Spectre Simulation GSCLIB 45nm LVS Verification DRC Checking Layout Design Parasitic Extraction Standard Cell Library
Closed Source
See project

ARVP AUV Systems

Software Co-Lead | RoboSub

ROS2 autonomy, localization, and embedded control across three AUV platforms with Teensy hardware.

ARVP AUV pool testing
ROS2 C++ Python Teensy 4.0 Device Drivers FreeRTOS DVL Navigation IMU/AHRS CAN Bus Gazebo Simulation GitLab CI/CD Leadership
Closed Source
See project

UVAD Control Surface Firmware and More

Software Engineer | Jan–Aug 2025

Safety-critical 4ms control loop for CAN servos with UART DMA ingest and hardware timer safeguards. TCP latency improvement for the ground station-to-UAV link. Angle of attack and sideslip sensor firmware. Route upload improvements.

UVAD servo control board Rev B front view
C/C++ STM32 FreeRTOS CAN Bus SPI UART DMA Hardware Timers Real-Time Systems TCP/IP MQTT
Closed Source
See project

Shoulder Rehab Sensor

Engineering Intern & Co-op | 2023–2024

ESP32-based tele-rehab sensor streaming IMU and force data, moving from prototype to Health Canada pathway.

Assembled shoulder rehab sensor PCB
ESP32 ESP-IDF FreeRTOS Arduino BLE GATT BNO055 HX711 I2C PCB Design EasyEDA C/C++
Closed Source
See project

ML-Based Smart Cushion

Machine Learning Engineer | 2023–2024

Pressure-sensing wheelchair cushion with 5-layer neural net achieving 0.99 AUC for seating position classification.

3D pressure visualization for smart cushion
Python PyTorch CUDA Arduino Multiplexers Velostat Sensors Neural Networks Visualization
Closed Source
See project

Unix Shell

Systems Programmer | Fall 2025

Custom Unix shell with pipelines, redirection, job control, and robust parsing.

Custom Unix shell terminal view
C Unix System Calls Process Control Signal Handling File I/O String Parsing
See project

Simulated Filesystem

Systems Programmer | Fall 2025

Unix-like filesystem simulator with block allocation, inode metadata, and multi-level directories.

Filesystem architecture visualization
C File Systems Block I/O inode Management Memory Management Data Structures
See project

MapReduce Framework

Distributed Systems Programmer | Fall 2025

Coordinator/worker MapReduce implementation with IPC, multithreading and more.

MapReduce workflow diagram
C Distributed Systems IPC Parallel Computing Process Management Fault Tolerance
See project

Quacker Platform

Full-Stack Developer | Winter 2024

Twitter-style platform with secure auth, feeds, follows, posts, comments, and SQL schema.

Quacker app login screen
Python Flask PostgreSQL SQL HTML/CSS JavaScript Bootstrap REST API
See project
01.

Digital Design & Computer Architecture

32-bit RISC-V CPU: Multi-Cycle FSM Architecture

CPU Architect & Digital Designer | Fall and Winter 2025

2
CPU Implementations
25+
RISC-V Instructions
~5
Cycles per Instruction

Designed and implemented two complete 32-bit RISC-V multi-cycle CPUs: a VHDL implementation focusing on a core instruction set and a Verilog implementation extending to full RV32I support. Both use FSM-based control with hardware reuse through component sharing across execution cycles. Includes comprehensive testbenches, waveform verification, and validated JALR subroutine execution.

VHDL Verilog Vivado RISC-V ISA FSM Design Datapath Architecture FPGA Xilinx Zybo Z7
Key Achievements: Two multi-cycle FSM CPU implementations in VHDL and Verilog, full RV32I instruction set (25+ instructions), JALR subroutine validation with bit-counter test program. Debugged FSM timing hazards through systematic datapath analysis.

Architecture Overview: VHDL vs Verilog Implementations

VHDL Implementation (Core Instruction Set)

  • Instruction Set: ADD, ANDI, LSR, LW, SW, BEQ, BNE, JALR, NOP, HLT (10 instructions)
  • FSM States: FETCH, DECODE, MEM_ADR, MEM_READ/WB, MEM_W, ALU_WB, BRANCH, JUMP
  • Combined instruction/data memory (von Neumann)
  • 32×32-bit register file with modular VHDL components
  • Strong typing enforces careful signal management
  • Pre-loaded test program validates bit-counter subroutine using JALR
  • Xilinx Vivado project structure with behavioral simulation

Verilog Implementation (Full RV32I)

  • Instruction Set: Complete RV32I base integer ISA (25+ instructions)
  • R-Type: ADD, SUB, AND, OR, XOR, SLL, SRL, SRA, SLT, SLTU
  • I-Type: ADDI, ANDI, ORI, XORI, SLTI, SLTIU, SLLI, SRLI, SRAI, LB, LH, LW, LBU, LHU
  • S-Type: SB, SH, SW
  • B-Type: BEQ, BNE, BLT, BGE, BLTU, BGEU
  • U/J-Type and Jumps: LUI, AUIPC, JAL, JALR (JALR uses the I-type encoding)
  • FSM States: RESET, FETCH, DECODE, MEM_ADR, MEM_READ, JUMP, WRITE_BACK, HALT
  • Same datapath architecture and project structure as VHDL, demonstrating ISA scalability

Shared Architecture: Both implementations use identical multi-cycle design patterns, FSM controller, intermediate registers (IR, A, B, ALU_Reg, Data_Reg), hardware reuse with single ALU for data operations and address calculation, and extend units for immediate sign extension. The Verilog version demonstrates how the architecture scales from core instruction set to comprehensive RV32I support.

CHALLENGE: FSM Timing and Data Hazards

Register Value Timing Issues: Initial JALR implementation failed due to incorrect PC value capture - newer PC values were being used instead of the current instruction value needed for return address calculation (PC+4).

  • Symptom: JALR return addresses calculated incorrectly, causing control flow bugs
  • Root cause: Tapping PC value after flip-flop update instead of before
  • Similar issues at ALU output requiring careful selection between direct ALU output vs ALU register
SOLUTION: Datapath Signal Analysis

Systematic Debugging Methodology: Carefully analyzed which register values (before vs after flip-flops) should be used at each multiplexer and datapath junction.

  • Traced PC value through datapath to identify correct tap point for PC+4 calculation
  • Used timing diagrams to verify register update sequences
  • Applied same analysis to ALU output path - determined when to use direct ALU result vs latched ALU_Reg
  • Validated fixes with comprehensive waveform simulation showing correct JALR behavior
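
The tap-point issue can be illustrated with a small Python model of an edge-triggered register. This is a simplified sketch, not the actual datapath: the `Register` class, the `jalr_return_address` helper, and the example address are all hypothetical, and the model assumes FETCH has already advanced the PC by the time the return address is computed.

```python
class Register:
    """Edge-triggered register: q is the visible output, d the pending input."""

    def __init__(self, value=0):
        self.q = value
        self.d = value

    def clock(self):
        # On the clock edge the pending input becomes the visible output.
        self.q = self.d


def jalr_return_address(tap_after_update):
    """Compute a JALR return address from either side of the PC flip-flop."""
    pc = Register(0x20)      # address of the JALR instruction (arbitrary example)
    old_pc = pc.q            # PC value before the flip-flop updates
    pc.d = pc.q + 4          # FETCH schedules PC <- PC + 4
    pc.clock()               # clock edge: PC now holds the next address
    base = pc.q if tap_after_update else old_pc
    return base + 4          # return-address adder


if __name__ == "__main__":
    print(hex(jalr_return_address(tap_after_update=False)))  # intended PC + 4
    print(hex(jalr_return_address(tap_after_update=True)))   # one instruction too far
```

Tapping after the update skips one instruction on return, which matches the kind of control-flow symptom described above.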
IMPACT: Complete RV32I Multi-Cycle CPU
Successfully designed and verified two multi-cycle CPUs demonstrating ISA scalability, from core 10-instruction VHDL implementation to full 25+ instruction RV32I in Verilog. Architecture demonstrates industry-relevant skills: FSM-based control, hardware resource sharing, register staging, and comprehensive testbench verification. Bit-counter subroutine validates JALR function call/return.

Datapath & Controller Design

Controller FSM States: The multi-cycle controller implements a 10-state FSM orchestrating instruction execution. Every instruction begins with FETCH (load instruction from memory, increment PC) and DECODE (read source registers, calculate branch target). Execution then diverges based on instruction type:

  • Memory Instructions (LW/SW): MEM_ADR (calculate effective address) → MEM_READ or MEM_WRITE → MEM_WB (write back for loads)
  • Arithmetic/Logic (ADD/ANDI/LSR): ALU_WB (execute operation and write back result)
  • Branches (BEQ/BNE): BRANCH (update PC if condition met)
  • Jump and Link (JALR): JUMP (calculate target from register + offset, store return address)
  • Halt (HLT): STUCKED (freeze FSM, prevent further instruction execution)
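
The state paths listed above can be written down as a lookup table. The Python sketch below uses the state names from the list, but it is only a model: it assumes one state per clock cycle and does not reproduce the real controller's transition logic.

```python
# Execute-phase states per instruction, taken from the list above.
EXECUTE_PATHS = {
    "LW": ["MEM_ADR", "MEM_READ", "MEM_WB"],
    "SW": ["MEM_ADR", "MEM_WRITE"],
    "ADD": ["ALU_WB"],
    "ANDI": ["ALU_WB"],
    "LSR": ["ALU_WB"],
    "BEQ": ["BRANCH"],
    "BNE": ["BRANCH"],
    "JALR": ["JUMP"],
    "HLT": ["STUCKED"],  # the FSM stays here; counted once in this model
}


def state_sequence(mnemonic):
    """Return the FSM states visited by one instruction."""
    return ["FETCH", "DECODE"] + EXECUTE_PATHS[mnemonic]


def cycles(mnemonic):
    """Cycle count, assuming one state per clock cycle."""
    return len(state_sequence(mnemonic))


if __name__ == "__main__":
    for op in ("LW", "SW", "ADD", "BEQ", "JALR"):
        print(op, cycles(op), " -> ".join(state_sequence(op)))
```

Under that assumption, loads take the longest path (five states) and ALU, branch, and jump instructions take three.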

Control Signals: Controller generates 9 control signals directing datapath operation: pc_write, reg_write, ir_write, mem_write, adr_src, alu_src_a[1:0], alu_src_b[1:0], alu_ctrl[2:0], result_src[1:0]. These signals configure multiplexers, enable registers, and select ALU operations, with default values minimizing state-specific definitions.

Datapath Components: Register file (32×32-bit), ALU (arithmetic/logic/shift), extend unit (immediate sign extension), combined instruction/data memory, intermediate registers (IR, A, B, ALU_Reg, Data_Reg), and multiplexers for operand selection. Components are reused across cycles - same ALU computes both data operations and memory addresses, same memory serves instructions and data in different cycles.

Verification & Testing

Bit Counter Subroutine: Implemented a test program that counts the set bits in a register, using JALR for function call/return. Program flow:

  1. JALR x12, 44(x0) - Jump to bit counter at address 44, store return address in x12
  2. ANDI x3, x3, 0x00 - Zero accumulator
  3. ANDI x9, x2, 0x01 - Set shift amount to 1
  4. Loop: ANDI x2, x8, 0x01 - Extract LSB
  5. ADD x3, x3, x2 - Accumulate if bit set
  6. LSR x8, x8, x9 - Logical shift right by 1
  7. BNE x8, x0, -12 - Loop while data remains
  8. JALR x2, 0(x12) - Return to caller using saved address
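
A software reference model makes it easier to check the expected result of the loop. The Python sketch below mirrors steps 2 to 7 with the same register names; the function name `count_set_bits` is hypothetical, and the model assumes a 32-bit unsigned input in x8.

```python
def count_set_bits(x8):
    """Reference model of the bit-counter loop (32-bit unsigned input)."""
    x8 &= 0xFFFFFFFF
    x3 = 0                  # ANDI x3, x3, 0x00  (zero the accumulator)
    while True:
        x2 = x8 & 0x01      # ANDI x2, x8, 0x01  (extract LSB)
        x3 += x2            # ADD  x3, x3, x2    (accumulate)
        x8 >>= 1            # LSR  x8, x8, x9    (shift right by 1)
        if x8 == 0:         # BNE  x8, x0, -12   (falls through when x8 is zero)
            break
    return x3


if __name__ == "__main__":
    print(count_set_bits(0b1011))
```

Like the assembly, the loop body always runs at least once, so an input of zero still returns zero.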

Waveform Verification: All instructions validated through Vivado simulation with detailed timing analysis. Verified register updates, memory operations, control signal transitions, and FSM state progressions. Confirmed correct PC management, particularly critical for JALR return address calculation and branch target computation.

02.

Hardware Random Number Generation

Tausworthe Pseudo-Random Number Generator

Digital Designer | Fall 2025

32-bit
Taus88 Algorithm
3 LFSRs
Parallel Generation
125 MHz
Generation Rate

Implemented Tausworthe 88-bit PRNG using three parallel Linear Feedback Shift Registers (LFSRs) with XOR combination for high-quality 32-bit random output. Deployed on Xilinx Zybo Z7 FPGA, achieving full-speed operation at 125 MHz system clock with deterministic, verifiable sequences.

VHDL FPGA Xilinx Vivado Zybo Z7-10 LFSR RTL Design Cryptography
Key Achievements: 3 parallel LFSRs with XOR combination, 125 MHz single-cycle generation, seedable deterministic sequences, FPGA deployment with hardware validation.

Design Architecture

  • Modular VHDL component with parameterizable shifts and masks
  • Three independent LFSR generators with unique configurations (U1, U2, U3)
  • Final output produced by XOR combination of all three streams
  • Seedable design for deterministic sequence generation
  • Parallel architecture enables single-cycle random number generation
  • Verified against software implementation for correctness
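
A software reference of the kind mentioned above might look like the following Python sketch. It uses the shift and mask constants from the published Taus88 generator; whether the hardware uses exactly these parameters is an assumption, and the class and method names are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def _lfsr_step(state, q, shift, mask, p):
    """One Tausworthe component step on a 32-bit state word."""
    b = (((state << q) & MASK32) ^ state) >> shift
    return (((state & mask) << p) & MASK32) ^ b


class Taus88:
    """Three-component combined Tausworthe generator with 32-bit output."""

    def __init__(self, s1, s2, s3):
        s1, s2, s3 = s1 & MASK32, s2 & MASK32, s3 & MASK32
        # Published seed constraints for the three components.
        if s1 < 2 or s2 < 8 or s3 < 16:
            raise ValueError("seeds must satisfy s1 >= 2, s2 >= 8, s3 >= 16")
        self.s1, self.s2, self.s3 = s1, s2, s3

    def next_u32(self):
        # The three components are independent, which is what lets
        # hardware update them in parallel in a single clock cycle.
        self.s1 = _lfsr_step(self.s1, 13, 19, 0xFFFFFFFE, 12)
        self.s2 = _lfsr_step(self.s2, 2, 25, 0xFFFFFFF8, 4)
        self.s3 = _lfsr_step(self.s3, 3, 11, 0xFFFFFFF0, 17)
        return self.s1 ^ self.s2 ^ self.s3


if __name__ == "__main__":
    gen = Taus88(12345, 67890, 424242)  # arbitrary example seeds
    print([hex(gen.next_u32()) for _ in range(4)])
```

Two generators built from the same seeds produce identical sequences, which is the property a hardware-versus-software comparison relies on.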
Hardware Validation

Deployed on Zybo Z7-10 FPGA with 7-segment display and LED indicators. Used 1Hz clock divider to observe sequential bytes of 32-bit output. Verified deterministic behavior by comparing hardware-generated sequences against simulation logs - confirmed exact match across multiple runs with same seed values.

03.

Hardware Security & FSM Design

Secure Element Finite State Machine

Security Digital Designer | Fall 2025

Multiple
Moore FSM
Instant
Attack Detection

Implemented a hardware security controller FSM for cryptographic secure element management. Coordinates system startup, self-test verification, secure channel operations, power management, and high-priority attack response. Moore FSM architecture ensures glitch-free status outputs synchronized to clock edges.

VHDL Hardware Security FPGA FSM Design Xilinx Vivado Zybo Z7-20 Attack Detection Cryptographic Control
Key Achievements: 6-state Moore FSM, single-cycle attack response (8ns @ 125MHz), RGB status indication, 15-second auto-wake timer, hardware-validated security properties.

FSM States & Security Features

System States:

  • OFF (000): Power-down state, entry point after reset
  • STARTUP (Blue/001): Initialization with self-test execution
  • IDLE (Yellow/110): Ready state awaiting secure operations
  • SECURE_CHANNEL (Green/010): Active cryptographic operations
  • SLEEP (White/111): Low-power mode with 15-second auto-wake timer
  • ALARM (Red/100): Security breach detected - highest priority state

Critical Security Mechanisms:

  • Instantaneous Attack Response: Attack_detected signal triggers immediate single-cycle transition to ALARM state
  • Fail-Safe Design: Self-test failure during STARTUP forces ALARM rather than IDLE
  • Visual Status Indicator: RGB LED shows color-coded system state for external monitoring
  • Integrated Timer: 15-second countdown in SLEEP state triggers automatic wake and key refresh
  • LED Flasher Module: Dedicated Mealy FSM flashes ALARM status at 1Hz for high-visibility alert
Verified Security Properties
  • Attack detection latency: Single clock cycle (8ns @ 125MHz)
  • Priority override: Attack signal successfully preempts all concurrent inputs
  • State integrity: No transient glitches observed in Moore FSM outputs
  • Timer accuracy: 15-second sleep period
  • Visual feedback: LED flasher operates at consistent 1Hz without affecting controller timing
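
The priority override and the timing figures can be sketched as a small Python model. The state names and encodings come from the list above; the input names and every transition other than attack-to-ALARM, self-test-failure-to-ALARM, and the SLEEP timer wake are assumptions made for illustration.

```python
from enum import Enum

CLOCK_HZ = 125_000_000
CYCLE_NS = 1e9 / CLOCK_HZ        # clock period in nanoseconds
SLEEP_CYCLES = 15 * CLOCK_HZ     # 15-second auto-wake timer, in clock cycles


class State(Enum):
    OFF = 0b000
    STARTUP = 0b001
    IDLE = 0b110
    SECURE_CHANNEL = 0b010
    SLEEP = 0b111
    ALARM = 0b100


def next_state(state, *, attack=False, power_on=False, selftest_pass=False,
               selftest_fail=False, open_channel=False, close_channel=False,
               sleep_request=False, timer_done=False):
    """Next-state function; input names are hypothetical."""
    if attack:                                   # checked first: preempts all inputs
        return State.ALARM
    if state is State.OFF:
        return State.STARTUP if power_on else State.OFF
    if state is State.STARTUP:
        if selftest_fail:                        # fail-safe: never reach IDLE
            return State.ALARM
        return State.IDLE if selftest_pass else State.STARTUP
    if state is State.IDLE:
        if open_channel:
            return State.SECURE_CHANNEL
        return State.SLEEP if sleep_request else State.IDLE
    if state is State.SECURE_CHANNEL:
        return State.IDLE if close_channel else State.SECURE_CHANNEL
    if state is State.SLEEP:
        return State.IDLE if timer_done else State.SLEEP
    return State.ALARM                           # assumed to latch until reset


if __name__ == "__main__":
    print(next_state(State.SECURE_CHANNEL, attack=True, close_channel=True))
    print(CYCLE_NS, SLEEP_CYCLES, SLEEP_CYCLES.bit_length())
```

Checking the attack input before any per-state logic is what gives it priority in this model, and the cycle count shows the 15-second timer needs a 31-bit counter at 125 MHz.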
04.

VLSI Design: Full Custom IC Flow

2-Bit Digital Comparator IC

VLSI Design | Cadence Virtuoso

45nm
GSCLIB Process
LVS
Clean Match
DRC
0 Violations
Full Flow
Schematic → Layout

Designed and implemented a 2-bit digital comparator through the complete VLSI design flow using Cadence Virtuoso. Project demonstrates proficiency in schematic capture, transistor-level design using standard cells, physical layout with metal routing, design rule checking (DRC), layout vs schematic (LVS) verification, and pre/post-layout simulation analysis using Spectre.

Cadence Virtuoso Spectre Simulation GSCLIB 45nm LVS Verification DRC Checking Layout Design Parasitic Extraction Standard Cell Library
Key Achievements: Complete schematic-to-layout flow, clean LVS verification with exact netlist match, zero DRC violations, successful post-layout simulation validating timing against pre-layout results.

Design Architecture

Circuit Overview:

The 2-bit comparator compares two 2-bit inputs A[1:0] and B[1:0] and signals whether A equals B. The design uses a registered architecture with D flip-flops for input synchronization, followed by XOR gates for bit-wise comparison and an OR gate to combine results, so the combined output is low when the inputs match and high when they differ.

Standard Cell Components (GSCLIB 45nm):

  • DFFQX1: D Flip-Flops (6 total) - Register inputs A[1:0], B[1:0] and output on clock edges
  • XOR2XL: 2-Input XOR Gates (2 total) - Compare corresponding bits (A[1]⊕B[1], A[0]⊕B[0])
  • OR2X1: 2-Input OR Gate - Combine XOR outputs (mismatch if either bit differs)

Signal Flow:

  • Inputs A[1:0] and B[1:0] captured by DFFs on rising clock edge
  • XOR gates produce '1' when corresponding bits differ
  • OR gate outputs '0' only when both XOR outputs are '0' (exact match)
  • Final output registered through another DFF for clean timing
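
The combinational part of the signal flow can be checked with a short Python truth-table model. The flip-flop stages are omitted, and the function name and the derived active-high `match` value are illustrative additions, not part of the schematic.

```python
def comparator(a, b):
    """Return (mismatch, match) for 2-bit inputs a and b."""
    x1 = ((a >> 1) & 1) ^ ((b >> 1) & 1)   # XOR of A[1] and B[1]
    x0 = (a & 1) ^ (b & 1)                 # XOR of A[0] and B[0]
    mismatch = x1 | x0                     # OR gate: high if either bit differs
    return mismatch, mismatch ^ 1


if __name__ == "__main__":
    for a in range(4):
        for b in range(4):
            print(f"A={a:02b} B={b:02b} mismatch={comparator(a, b)[0]}")
```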

Physical Design & Verification

Layout Implementation:

Created physical layout in Cadence Virtuoso Layout Suite XL, placing standard cells and routing metal interconnects across multiple layers. Layout follows design rules for the 45nm process node with proper metal spacing, via placement, and power/ground distribution.

Verification Results:

  • LVS (Layout vs Schematic): Clean match - schematic netlist exactly matches extracted layout netlist with no shorts, opens, or device mismatches
  • DRC (Design Rule Check): 0 violations across 552 design rule checks - validates manufacturability for 45nm process
  • Parasitic Extraction: RC parasitics extracted for accurate post-layout timing analysis

Simulation & Timing Analysis

Pre-Layout vs Post-Layout Comparison:

Performed transient simulations using Spectre to validate functionality and compare timing characteristics before and after layout. Post-layout simulation includes extracted parasitic capacitances and resistances, revealing realistic signal propagation delays.

Simulation Observations:

  • Functional Verification: Both simulations show correct comparator behavior - Match output transitions based on A=B equality
  • Clock Synchronization: All flip-flops trigger cleanly on rising clock edges
  • Post-Layout Effects: Minor propagation delay increase due to RC parasitics, but within timing margins
  • Signal Integrity: Clean digital transitions with no ringing or signal degradation
VLSI Skills Demonstrated
  • Schematic Capture: Standard cell instantiation and hierarchical design in Virtuoso Schematic Editor
  • Layout Design: Cell placement, metal routing, and power distribution in Layout Suite XL
  • Physical Verification: LVS and DRC using Pegasus verification tools
  • Simulation: Pre/post-layout transient analysis using Spectre simulator
  • Design Flow: Complete RTL-to-GDSII methodology for digital IC design
05.

ARVP: Autonomous Underwater Vehicle Systems

Embedded Engineer → Software Co-Lead

University of Alberta ARVP Team

Sept 2023 – Present

RoboSub
International Competition
cm-level
Localization Accuracy
~60
Team Members
2 AUVs
Kenai & Koda

Progressed from embedded systems developer to Software Co-Lead for the Autonomous Robotic Vehicle Project, a student team developing autonomous underwater vehicles competing annually at the U.S. Navy-sponsored RoboSub competition. Work spans hardware design and validation (Teensy 4.0 control boards, sensor integration), ROS2 software architecture (localization, motion planning, control), and technical leadership (sensor evaluation, pool test coordination, cross-functional collaboration).

ROS2 C++/Python Teensy 4.0 Device Drivers FreeRTOS DVL Navigation IMU/AHRS CAN Bus Gazebo Simulation GitLab CI/CD Leadership and Technical Management
Key Achievements: Designed custom Teensy 4.0 control boards (Rev A → Rev B), implemented DVL-based localization achieving cm-level accuracy, developed ROS2 motion planning with visual servoing, and now co-leading software development on the ~60-member team in preparation for RoboSub 2026.

Part 1: Embedded Systems Engineer

Role: Designed and validated embedded control systems for servo actuation, internal environment management, sensor and system integration on autonomous underwater vehicles.

Control Board Development: Design → Test → Iterate

Challenge:

Needed reliable, real-time servo control for actuators and auxiliary systems with CAN bus communication to Hitec smart servos, while maintaining system health monitoring and fault detection capabilities.

Design Process:

  • Requirements Analysis: Collaborated with electrical and mechanical teams to define interface requirements (6 auxiliary servos, CAN bus for smart servos, current monitoring and claw grab detection, environmental monitoring)
  • Component Selection (Rev B): Selected Teensy 4.0 (600 MHz ARM Cortex-M7) for high-speed real-time control, MCP2515 CAN controller with TJA1050 transceiver for robust CAN communication, INA219 for current monitoring, MS561101BA03-50 for pressure and temperature sensing
  • PCB Design: Initial layout focused on getting all required functionality working while minimizing route length and ensuring signal integrity, using a 4-layer board with power and ground planes
  • Firmware Development: Implemented drivers for all peripherals like the INA219, and FreeRTOS tasks to schedule all required operations; managed communication between tasks and peripherals

PCB Design Evolution:

Control Board Rev A - Front

Control Board Rev A - Back

Control Board Rev B - Front

Control Board Rev B - Back

Testing & Validation:

  • Bench Testing: Validated CAN bus communication with servos, verified PWM signal quality on oscilloscope, tested current sensing and power draw with simulated load
  • Pool Testing: Integrated Rev A boards into vehicles for underwater testing, identified signal integrity issues under high-current thruster operation, discovered thermal management concerns during extended missions
  • Failure Analysis: Used oscilloscope to diagnose ground bounce affecting CAN communication during simultaneous thruster activation, thermal imaging revealed hotspots near voltage regulators

Rev B Improvements:

  • Signal Integrity: Redesigned ground plane to eliminate ground bounce, added additional decoupling capacitors near high-current switching components
  • Thermal Management: Optimized component placement to separate high-power and low-power sections, improved copper pour for heat dissipation, added larger thermal vias under voltage regulators
  • Reliability: Added redundant power filtering stages, current sensing for fault detection on all actuators, on-board CAN bus termination resistors, a configurable actuator voltage supply, and improved USB power protection; improved PCB routing to reduce EMI susceptibility; used keyed connectors to prevent mis-mating

Results:

  • Rev B boards successfully deployed in Kenai
  • Achieved reliable operation during 2+ hour pool test sessions with no communication failures
  • 50Hz servo control loop maintained consistent timing under full system load (all thrusters + sensors active)
  • System passed RoboSub 2025 competition validation with zero hardware faults during mission attempts

Part 2: Software Developer / Software Co-Lead

Role: Transitioned to ROS2 software development focused on autonomous navigation in GPS-denied underwater environments. Promoted to Software Co-Lead role, coordinating development across ~60 team members and driving technical decisions on software architecture and sensor selection.

ROS2 Software Architecture

ROS2 Software Architecture used on Kenai, Koda, and Arctos AUVs

Motion Planning & Autonomous Behaviors

Development Approach:

  • Action Server Architecture: Implemented ROS2 action servers for object-relative navigation allowing higher-level mission planner to command complex maneuvers (approach object, orbit target, visual servoing)
  • 3-Phase Navigation: Designed object approach behavior as a rotate → search → approach sequence, enabling robust operation even when objects are initially outside the camera field of view
  • Visual Servoing Control: Developed control law using object centroid error and derivative terms to smoothly approach targets while maintaining them in camera frame
  • Path Planning: Implemented cubic spline interpolation for smooth path following with multiple orientation control modes (tangent-aligned, constant heading, SLERP interpolation)
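
A control law built from centroid error and its derivative can be sketched as a simple PD controller. The Python example below is illustrative only: the class name, gains, normalization, and sign convention are assumptions and do not come from the vehicle's codebase.

```python
from dataclasses import dataclass


@dataclass
class CentroidServo:
    """PD controller on horizontal centroid error (hypothetical gains)."""

    kp: float = 0.8
    kd: float = 0.2
    prev_error: float = 0.0

    def update(self, centroid_px, image_width_px, dt):
        """Return a yaw-rate command; positive means the object is right of center."""
        center = image_width_px / 2.0
        error = (centroid_px - center) / center      # normalized to [-1, 1]
        derivative = (error - self.prev_error) / dt  # finite-difference rate
        self.prev_error = error
        return self.kp * error + self.kd * derivative


if __name__ == "__main__":
    servo = CentroidServo()
    for px in (480, 440, 400, 360, 330):
        print(round(servo.update(px, 640, 0.1), 3))
```

The derivative term damps the approach as the error shrinks, which helps keep the target in frame instead of overshooting it.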

Testing & Validation:

  • Simulation: Used Gazebo with underwater physics to test various motion profiles (straight line, circular paths, barrel roll, object detection based movement).
  • Coordinated weekly pool tests to validate perception, planning, and control integration
  • Developed systematic testing procedures: individual behavior validation → multi-behavior sequences → full autonomous missions
  • Analyzed ROS2 bag files post-test to identify timing issues, control instabilities, and behavior transitions

Results:

  • Successfully demonstrated autonomous object detection and approach during pool testing
  • Completed gate navigation, path following, and object detection interaction tasks autonomously at RoboSub 2025
  • Motion planning system enabled complex maneuvers including barrel roll execution for bonus points

Pool Testing: Kenai AUV

Pool Testing: Arctos AUV

RoboSub 2024 Competition

Software Co-Lead: Technical Leadership

Key Responsibilities:

  • Technical Decision-Making: Drive architecture decisions on software stack (ROS2), sensor selection (DVL model, IMU/AHRS evaluation), development tools (GitLab, Docker, Gazebo)
  • Sensor Evaluation & Research: Lead IMU/AHRS evaluation research comparing tactical-grade options (SBG Ellipse micro and Advanced Navigation Motus) vs DVL internal IMU to improve localization accuracy
  • Pool Test Coordination: Organize weekly pool testing sessions coordinating software, electrical, and mechanical teams. Develop test plans, allocate pool time across subsystems, conduct post-test analysis meetings
  • Cross-Functional Collaboration:
    • Electrical team: Define sensor interface requirements, debug power/communication issues
    • Mechanical team: Coordinate sensor mounting locations and actuator performance and function (e.g., the claw)
    • Software team: Mentor new members through onboarding materials, code reviews, debugging sessions
  • Development Workflow Management: Maintain GitLab repository organization, implement code review processes, establish testing standards before pool deployment
  • Team Growth: Created comprehensive onboarding materials covering ROS2 fundamentals, Git/SSH basics, project structure. Coordinate ~15 software members across multiple software development tracks

RoboSub Competition & Resources

The international RoboSub competition challenges teams to develop autonomous underwater vehicles capable of navigating obstacle courses and completing tasks without human intervention. Tasks include gate navigation, path following, object detection/classification, buoy interaction, and torpedo/dropper delivery - all executed autonomously in a GPS-denied underwater environment.

Competition Performance:

  • RoboSub 2025 and 2024 (San Diego, CA): Top Canadian team, completing multiple tasks autonomously, including gate navigation and torpedo delivery
  • Demonstrated barrel roll maneuver for bonus points using motion planning system
  • Rebuilt 2 full AUVs from the ground up, integrating new sensors and control systems in 2025

06.

UVAD: Autonomous UAV Systems

Servo Support Board: Safety-Critical Real-Time Control

Unmanned Vehicle Applied Dynamics

Software Engineer | Jan 2025 – Aug 2025

~4ms
Complete Control Cycle
4
CAN Servos (Concurrent)
921600
UART Baud Rate (bps)

Designed and implemented multi-threaded FreeRTOS firmware for a safety-critical UAV control surface actuation board. The system receives flight commands via UART at 921600 bps, processes them with CRC validation, and orchestrates four Hitec CAN servos (ruddervators and ailerons) through a shared SPI bus architecture. The complete control loop, from UART reception through servo actuation to metadata return, executes in approximately 4ms with a 5ms safety timeout.

STM32 FreeRTOS CAN Bus SPI UART DMA Hardware Timers Real-Time Systems
Key Achievements: Multi-threaded FreeRTOS firmware with ~4ms control cycle, hardware timer debugging to eliminate false timeouts, shared SPI mutex arbitration for 4 concurrent CAN servos. Deployed on FALCON and Alpine Swift UAV platforms.

System Architecture

Hardware Architecture:

→ Flight Computer (UART TX @ 921600 bps)

→ STM32 UART DMA Receiver

→ Shared SPI Bus (connecting 4 CAN transceivers)

→ 4x MCP2515 CAN Controllers

→ 4x Hitec CAN Servos (2x Ruddervators, 2x Ailerons)

→ STM32 UART DMA Transmitter (metadata back to flight computer)

Data Flow:

  1. UART Reception (921600 bps): Flight computer sends message containing 4 servo angles (multiplied by large integer to create uint32_t values), message type, and CRC checksum
  2. DMA-Triggered Processing: STM32 DMA receives complete message → triggers callback → message placed on received queue → signals decoding thread
  3. CRC Validation: Thread separates payload and CRC, calculates expected CRC, compares with received. Invalid messages discarded
  4. Angle Decoding: Divide uint32_t values to recover original angles for all 4 servos
  5. Servo Object Interface: Each servo has dedicated object. Interface generates 4 CAN messages per servo: angle command, angle request, voltage/torque request (alternating), current torque request
  6. Shared SPI Bus Arbitration: 4 threads (one per CAN transceiver) compete for shared mutex. Messages placed on separate queues, threads acquire mutex and transmit via SPI to respective transceiver
  7. Metadata Collection: CAN transceivers receive responses → set line high → triggers metadata thread → acquires mutex → reads data → decodes using HITEC protocol → builds return UART message
  8. Safety Timeout: Hardware timer (5ms) triggers if complete cycle exceeds deadline. Remaining servos marked Loss of Link (LOL), message sent immediately to flight computer (required 7-8.3ms response window)
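
The CRC validation and angle decoding steps can be sketched in Python. The frame layout, the scale factor, and the use of CRC-32 are all assumptions chosen for illustration; the actual message format and CRC polynomial are not specified here, and the firmware itself is written in C/C++.

```python
import struct
import zlib

ANGLE_SCALE = 1_000_000      # hypothetical "large integer" scale factor
PAYLOAD_FORMAT = "<B4I"      # hypothetical layout: type byte + 4 x uint32
CRC_FORMAT = "<I"            # CRC-32 used as a stand-in checksum
FRAME_SIZE = struct.calcsize(PAYLOAD_FORMAT) + struct.calcsize(CRC_FORMAT)


def encode_frame(msg_type, angles_deg):
    """Build a frame from four non-negative angles in degrees."""
    scaled = [round(angle * ANGLE_SCALE) for angle in angles_deg]
    payload = struct.pack(PAYLOAD_FORMAT, msg_type, *scaled)
    return payload + struct.pack(CRC_FORMAT, zlib.crc32(payload))


def decode_frame(frame):
    """Return (msg_type, angles) or None if the frame is invalid."""
    if len(frame) != FRAME_SIZE:
        return None
    payload, crc_bytes = frame[:-4], frame[-4:]
    (received_crc,) = struct.unpack(CRC_FORMAT, crc_bytes)
    if zlib.crc32(payload) != received_crc:
        return None                          # invalid messages are discarded
    msg_type, *scaled = struct.unpack(PAYLOAD_FORMAT, payload)
    return msg_type, [value / ANGLE_SCALE for value in scaled]


if __name__ == "__main__":
    frame = encode_frame(1, [10.5, 20.25, 0.0, 45.125])
    print(decode_frame(frame))
    corrupted = bytes([frame[0] ^ 0xFF]) + frame[1:]
    print(decode_frame(corrupted))
```

Sending scaled integers keeps the wire format fixed-width and avoids floating-point representation differences between the flight computer and the microcontroller.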

Thread Architecture (FreeRTOS):

  • Message Decode Thread: Validates CRC, decodes angles, interfaces with servo objects
  • 4x CAN Transceiver Threads: Manage SPI transactions with shared mutex arbitration
  • Metadata Processing Thread: Collects servo responses, builds return UART message
  • Hardware Timer ISR: Safety timeout monitoring, LOL detection
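
The shared-bus arbitration pattern can be modeled with Python threads. This is a behavioral sketch only: the real firmware uses FreeRTOS tasks and a FreeRTOS mutex, and the function name, the sentinel convention, and the simulated bus log are hypothetical.

```python
import queue
import threading

NUM_CHANNELS = 4  # one worker per CAN controller, as described above


def run_cycle(messages_per_channel):
    """Send each channel's messages over a simulated shared SPI bus.

    Returns the bus log as (channel, message) tuples in transfer order.
    """
    if len(messages_per_channel) != NUM_CHANNELS:
        raise ValueError("expected one message list per channel")
    spi_mutex = threading.Lock()
    bus_log = []
    queues = [queue.Queue() for _ in range(NUM_CHANNELS)]

    def worker(channel):
        while True:
            message = queues[channel].get()
            if message is None:              # sentinel: no more work this cycle
                return
            with spi_mutex:                  # one transfer on the bus at a time
                bus_log.append((channel, message))

    threads = [threading.Thread(target=worker, args=(ch,))
               for ch in range(NUM_CHANNELS)]
    for thread in threads:
        thread.start()
    for channel, messages in enumerate(messages_per_channel):
        for message in messages:
            queues[channel].put(message)
        queues[channel].put(None)
    for thread in threads:
        thread.join()
    return bus_log


if __name__ == "__main__":
    demo = [[f"servo{c}-cmd{i}" for i in range(2)] for c in range(NUM_CHANNELS)]
    print(run_cycle(demo))
```

Separate queues preserve per-servo message order, while the mutex only serializes access to the bus itself, so the interleaving between servos is left to the scheduler.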
PROBLEM: False Timeout Events
While developing the safety-critical 5ms timeout mechanism, intermittent false timeout events occurred even though all servos were connected and the measured execution time was consistently ~4ms, well within budget. The FreeRTOS software timer exhibited approximately ±1ms jitter under load, causing unreliable timeout detection in a system where timing precision was paramount for flight safety.
SOLUTION: Hardware Timer Migration

Root Cause Analysis:

  • Profiled entire communication pipeline using GPIO toggling and oscilloscope
  • Measured UART reception, SPI transactions, servo response times, message processing
  • Confirmed ~4ms execution time consistently within budget
  • Isolated FreeRTOS software timer - observed ±1ms jitter under system load

Implementation: Replaced FreeRTOS software timer with STM32 hardware timer peripheral, providing deterministic timing independent of system load.
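
The timing margin explains why the jitter mattered. The Python sketch below is a back-of-the-envelope check using the approximate figures quoted above; the function name is hypothetical and the model treats jitter as a worst-case early expiry.

```python
def false_timeout_possible(cycle_ms, timeout_ms, jitter_ms):
    """True if a timer that can fire up to jitter_ms early may expire
    before a cycle of cycle_ms has finished."""
    earliest_expiry_ms = timeout_ms - jitter_ms
    return earliest_expiry_ms <= cycle_ms


if __name__ == "__main__":
    # Software timer: ~4 ms cycle, 5 ms timeout, about 1 ms of jitter.
    print(false_timeout_possible(4.0, 5.0, 1.0))
    # Hardware timer: jitter assumed negligible next to the 1 ms margin.
    print(false_timeout_possible(4.0, 5.0, 0.0))
```

With only about 1ms between the cycle time and the timeout, jitter of the same size consumes the entire margin, so a timer independent of scheduler load is needed.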

IMPACT: Reliable Safety-Critical System
Eliminated all false timeouts and restored reliable communication under full operational conditions. The system now operates deterministically with predictable timing behavior essential for flight safety. This demonstrates proficiency in real-time systems debugging, oscilloscope-based timing analysis, and understanding the critical distinction between software and hardware timing in safety-critical applications.

UAV Platforms

UVAD develops multiple autonomous UAV platforms for defense and commercial applications. The servo support board and control systems developed will be deployed across the fleet, enabling precise flight control for diverse mission profiles.

TCP Network Performance: 25x Latency Improvement

Unmanned Vehicle Applied Dynamics

Software Engineer | Jan 2025 – Aug 2025

25x
Performance Improvement
>5s → ~200ms
Round-Trip Latency
3
Computers Time-Synced

The Alpine Swift UAV provides customers with payload integration capability, enabling custom sensors to communicate with the ground station via TCP. The specification required ~200ms round-trip latency for ground station ↔ UAV communication, but actual measured performance exceeded 5 seconds. Tasked with analyzing and resolving this critical performance bottleneck affecting customer operations.

TCP/IP MQTT Mosquitto Network Analysis SSH C++ Embedded Linux
Key Achievements: Systematic distributed debugging across 3 time-synced computers, identified message queueing bugs through instrumentation, reduced latency from >5s to ~200ms meeting customer specifications.

Systematic Debugging Methodology

System Architecture
Network path: Ground Station Computer → Antenna Computer → RF Antenna → UAV Modem → Flight Computer → Payload
Time Synchronization
Executed parallel SSH commands to synchronize the clocks of all three computers at once. Achieved ~100ms synchronization accuracy, sufficient for locating bottlenecks in a round trip that exceeded 5 seconds.
Instrumentation
Modified applications throughout the network stack to log timestamps at every handoff: message creation → transmission → antenna receipt → antenna transmission → UAV receipt → return path. Sent test messages every 30 seconds for comprehensive timing analysis.
Data Analysis
Plotted time spent in and between each application, revealing which components introduced latency
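A minimal sketch of this kind of per-hop analysis, assuming each test message yields an ordered list of (stage, timestamp) pairs. The stage names follow the handoffs listed above; the timestamps are made-up values for illustration, not measurements from the project.

```python
def hop_latencies(stamps):
    """Return (hop_name, seconds) for each consecutive pair of logged stages."""
    return [
        (f"{a} -> {b}", round(tb - ta, 3))
        for (a, ta), (b, tb) in zip(stamps, stamps[1:])
    ]


def slowest_hop(stamps):
    """Return the hop that contributed the most latency."""
    return max(hop_latencies(stamps), key=lambda hop: hop[1])


# Hypothetical timestamps (seconds) for one test message.
example = [
    ("created", 0.000),
    ("transmitted", 0.020),
    ("antenna_rx", 0.050),
    ("antenna_tx", 4.850),
    ("uav_rx", 4.900),
]
worst = slowest_hop(example)
```

Because the clocks were only synchronized to about 100ms, differences between stages on different computers are meaningful only when they are well above that figure.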
Bug Identification
Found critical bugs:
  • Messages placed on "needs acknowledgement" queue BEFORE actually sending (logged as sent, but not transmitted)
  • Incorrect message priority assignment causing low-priority messages to block high-priority payload data
  • Message processing order issues in queue management
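The first two bugs can be illustrated with a small outbound queue. This is a sketch under assumptions, not the project's code: the class, method names, and the convention that a lower number means higher priority are all hypothetical. It dequeues by priority and records a message as awaiting acknowledgement only after the send call reports success.

```python
import heapq
import itertools


class OutboundQueue:
    """Priority-ordered send queue that tracks acks only for sent messages."""

    def __init__(self, send):
        self._send = send                 # callable(payload) -> True on success
        self._heap = []
        self._seq = itertools.count()     # tie-breaker keeps FIFO order per priority
        self.pending_ack = {}

    def enqueue(self, msg_id, payload, priority):
        """Queue a message; lower priority numbers are sent first (assumption)."""
        heapq.heappush(self._heap, (priority, next(self._seq), msg_id, payload))

    def pump(self):
        """Try to transmit the next message; return its id, or None if nothing was sent."""
        if not self._heap:
            return None
        priority, seq, msg_id, payload = heapq.heappop(self._heap)
        if not self._send(payload):
            # Send failed: put the message back and do NOT mark it as awaiting ack.
            heapq.heappush(self._heap, (priority, seq, msg_id, payload))
            return None
        self.pending_ack[msg_id] = payload
        return msg_id
```

The sequence counter also keeps messages of equal priority in arrival order, which addresses the processing-order concern in the third bullet.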
RESULTS
Reduced round-trip latency from >5 seconds to ~200ms, meeting customer specifications and enabling real-time payload operations. Demonstrated expertise in distributed systems debugging, network protocol analysis, and systematic root cause analysis across complex multi-computer architectures. The MQTT message broker architecture with Mosquitto enabled robust communication between multiple applications across the UAV and ground station systems.

Additional UVAD Projects

Angle of Attack / Sideslip Sensor Board

Developed FreeRTOS-based firmware for an aerodynamic angle measurement system using potentiometer sensors (180° or 360° range) read by an STM32 ADC with ~63,000 counts of resolution. Multi-threaded architecture: ADC reading thread with calibrated offset conversion, terminal interface thread for real-time display and calibration commands, CAN transmission thread for flight computer communication, and command processing thread for remote calibration control. Implemented a zero-point calibration system enabling in-flight angle reference adjustment.
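The offset conversion can be sketched as a linear mapping from ADC counts to degrees. This is an illustration only: it assumes a linear potentiometer, takes 63,000 counts as the full-scale span (from the approximate figure above), and uses hypothetical function and parameter names.

```python
def counts_to_angle(raw, zero_counts, full_scale_counts=63000, range_deg=360.0):
    """Convert a raw ADC reading to degrees relative to the calibrated zero point."""
    return (raw - zero_counts) * range_deg / full_scale_counts


def calibrate_zero(current_raw):
    """Zero-point calibration: the current reading becomes the new 0-degree reference."""
    return current_raw


zero = calibrate_zero(31500)              # vane held at the desired reference
angle = counts_to_angle(36750, zero)      # later reading, relative to that reference
```

Re-running the calibration step at any time shifts the reference without changing the scale factor, which is what allows in-flight adjustment.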

Route Upload System

Implemented waypoint upload system using NATO STANAG 4586 protocol over UDP with acknowledgement verification. Previous implementation sent all waypoints then immediately requested download, causing errors when long routes had incomplete transmission. Designed counter-based verification system to track message IDs based on route size, ensuring all waypoints received before download command sent to vehicle. Eliminated waypoint gap errors and improved upload reliability for complex mission profiles.
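A minimal sketch of the counter-based check, assuming waypoint messages are numbered 0 to route_size - 1. The class and method names are hypothetical, and no STANAG 4586 message fields are modeled here.

```python
class RouteUploadTracker:
    """Tracks acknowledged waypoint messages for one route upload."""

    def __init__(self, route_size):
        self.expected = set(range(route_size))   # assumed ID scheme: 0..route_size-1
        self.acked = set()

    def on_ack(self, msg_id):
        """Record an acknowledgement; IDs outside the route are ignored."""
        if msg_id in self.expected:
            self.acked.add(msg_id)

    def missing(self):
        """Waypoint IDs that still need to be (re)sent."""
        return sorted(self.expected - self.acked)

    def ready_for_download(self):
        """The download command is allowed only once every waypoint is acknowledged."""
        return not self.missing()
```

Gating the download command on ready_for_download() removes the race in the earlier design, where the request could go out while part of a long route was still in flight.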

STM32 FreeRTOS ADC Calibration Systems UDP STANAG 4586 QTCreator
07.

Medical Devices: Health Canada Pathway

Shoulder Rehabilitation Sensor: Prototype to Health Canada

Rehabilitation Robotics Lab

Engineering Intern & Embedded Systems Co-op | May 2023 – Dec 2024

Health Canada Approval Pathway
$100k
Grant Funding Secured
75 Hz
Data Collection Rate
30 Hz
BLE Streaming
3
Sensors Integrated

Two-year journey developing a shoulder rehabilitation sensor from Arduino prototype to a medical device on the Health Canada approval pathway. The system measures shoulder range of motion and applied force during rehabilitation exercises for patients in rural/remote areas who lack access to specialized physical therapy. Initial Arduino prototype demonstrated feasibility and secured $100,000 in grant funding. Production version features custom PCB design, ESP32-based embedded system with FreeRTOS firmware, BNO055 IMU integration, HX711 force sensing, and Bluetooth Low Energy communication.

ESP32 ESP-IDF FreeRTOS Arduino BLE GATT BNO055 IMU HX711 I2C PCB Design EasyEDA C/C++
Key Achievements: Arduino prototype → $100k grant → ESP32 production PCB with FreeRTOS firmware. 75 Hz sensor fusion (BNO055 + HX711), BLE GATT streaming, custom calibration system. Device entered Health Canada approval pathway for rural telehealth deployment.

Development Timeline: Prototype to Production

Phase 1: Arduino Prototype (May - Aug 2023)
Initial proof-of-concept using Arduino C++ and off-the-shelf breakout boards. Demonstrated viability of force measurement for shoulder rehabilitation assessment at University Hospital. Prototype validated clinical use case and user experience with rehabilitation patients.
Grant Funding Success (Aug 2023)
Prototype demonstration secured $100,000 in grant funding for development of production-ready medical device. Funding enabled transition from breadboard prototype to custom PCB design and Health Canada approval pathway.
Phase 2: Production Redesign (May - Dec 2024)
Complete redesign with ESP32-WROOM-32D microcontroller, custom PCB integration, FreeRTOS multi-threaded firmware, BLE communication, and battery management system. Transitioned to ESP-IDF framework.
Calibration & Clinical Testing
Developed calibration procedures using Newton meter in lab (January 2025). Established empirical conversion coefficients for force sensor. Implemented persistent calibration storage and runtime adjustment via BLE commands. Created Standard Operating Procedures (SOPs) for clinical deployment.
Health Canada Approval Pathway
Completed technical documentation detailing firmware architecture, sensor integration, calibration methodology, and safety considerations. Device entered Health Canada approval pathway for clinical deployment in rural rehabilitation settings.

Prototype Development & Lab Work

Hands-on development from breadboard prototype to functional medical device. The initial force meter prototype integrated multiple Arduino-compatible boards, force sensing circuitry, and battery management into a portable enclosure. Extensive lab testing validated force measurement accuracy and established calibration procedures for clinical deployment.

System Architecture

Hardware Platform: ESP32-WROOM-32D with custom PCB integrating:

  • BNO055 9-axis IMU (I2C, 0x28) - orientation, acceleration, gyroscope, magnetometer
  • HX711 24-bit ADC for half-bridge force sensor
  • Battery monitoring system with RGB status LEDs
  • BLE 4.2 radio for wireless data transmission

Firmware Architecture (FreeRTOS):

  • Sensor collection task: Combined IMU + force data at 75 Hz
  • BLE GATT server: Two service profiles for command/control and data streaming
  • Calibration system: Zero-force offset and IMU orientation calibration with persistent storage
  • Battery monitoring task: ADC-based voltage measurement with LED indication
  • Data buffering: 40KB circular buffer for high-throughput data collection

Firmware Architecture (FreeRTOS Multi-threaded)

Main Collection Task

  • Runs at ~75 Hz (13.3ms period)
  • Synchronized IMU + force sensor sampling
  • Microsecond-precision timestamping with ESP32 hardware timer
  • 40KB circular buffer management for high-throughput data
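The circular buffer idea can be sketched in a few lines. This is an illustration with hypothetical names, and the overwrite-oldest policy when the buffer is full is an assumption, not something stated about the firmware.

```python
class RingBuffer:
    """Fixed-capacity buffer that overwrites the oldest entry when full."""

    def __init__(self, capacity):
        self._data = [None] * capacity
        self._capacity = capacity
        self._head = 0      # index where the next item will be written
        self._count = 0     # number of valid items currently stored

    def push(self, item):
        """Store one sample, overwriting the oldest if the buffer is full."""
        self._data[self._head] = item
        self._head = (self._head + 1) % self._capacity
        self._count = min(self._count + 1, self._capacity)

    def snapshot(self):
        """Return the stored samples from oldest to newest."""
        start = (self._head - self._count) % self._capacity
        return [self._data[(start + i) % self._capacity] for i in range(self._count)]


buf = RingBuffer(3)
for sample in (1, 2, 3, 4, 5):
    buf.push(sample)
latest = buf.snapshot()
```

On the device, each entry would hold one timestamped IMU + force sample, and the capacity would follow from the 40KB budget divided by the sample size.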

BLE GATT Server

  • Dual service profiles: Command/Control + Data Stream
  • Characteristic-based communication for device control
  • Event-driven architecture with callback handlers
  • MTU negotiation for optimal BLE throughput

Calibration System

  • Zero-force offset calibration with 20-sample averaging
  • Persistent storage in NVS (non-volatile storage)
  • IMU orientation calibration for wearable positioning
  • Runtime calibration via BLE commands
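A sketch of the zero-force step, assuming the offset is the mean of 20 raw readings taken with no load and that force is linear in the offset-corrected count. The function names and the conversion coefficient are hypothetical; the real coefficients were established empirically.

```python
def zero_force_offset(raw_samples, n=20):
    """Average the first n unloaded readings to obtain the zero-force offset."""
    if len(raw_samples) < n:
        raise ValueError(f"need at least {n} samples, got {len(raw_samples)}")
    return sum(raw_samples[:n]) / n


def counts_to_newtons(raw, offset, newtons_per_count):
    """Apply the stored offset and an empirical scale factor to a raw reading."""
    return (raw - offset) * newtons_per_count


# Hypothetical unloaded readings and scale factor, for illustration only.
offset = zero_force_offset([1000] * 10 + [1002] * 10)
force = counts_to_newtons(1501, offset, newtons_per_count=0.01)
```

The offset is the value that would be written to NVS so it survives a power cycle.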

Battery Monitoring Task

  • ADC-based voltage measurement with averaging filter
  • RGB LED status indication (charging/charged/low battery)
  • Periodic monitoring at 1Hz to optimize power consumption
  • Low-battery detection and shutdown protection

Data Transmission Modes

Collect & Send Mode: Buffers 1000 data points (~17 seconds at 60 Hz) in SPIFFS filesystem, then transmits the complete dataset over BLE. Enables comprehensive data collection without real-time transmission requirements.

Real-Time Streaming Mode: Transmits data continuously at 30 Hz over BLE using JSON format compatible with RRL BLE Dashboard. Initialization message sent once containing device metadata, followed by continuous stream messages with sensor values.

Hardware Development Journey

The device evolved from breadboard prototype to production-ready PCB through multiple iterations. Custom PCB integrates ESP32-WROOM-32D, BNO055 IMU breakout, HX711 ADC, battery management circuitry, and micro-USB charging. Design considerations included I2C pullup resistor placement for reliable sensor communication, power plane layout for stable sensor operation, and compact form factor for wearable medical device application.

IMPACT & DOCUMENTATION
  • Mentored another student in PCB design
  • Authored Standard Operating Procedures (SOPs) for device operation
  • Created comprehensive technical documentation detailing firmware architecture, sensor integration, and calibration procedures
  • Device entered Health Canada approval pathway for clinical deployment in rural rehabilitation settings
08.

Smart Cushion: Seating Analytics

ML-Based Smart Cushion: Wheelchair Seating Position Analysis

Rehabilitation Robotics Lab - Dean's Research Award

Machine Learning Engineer | Sep 2023 – Apr 2024

112,640
Training Data Points
0.99 AUC
Classification Performance
5
Hidden Layers (NN)
4
Seating Positions

Developed ML-powered pressure sensing system for real-time wheelchair seating position classification to prevent pressure sores in long-term wheelchair users. Designed custom velostat-based pressure pad with multiplexer circuit and flexible PCB, collected 112,640 training samples across 4 seating positions, implemented 5-layer neural network in PyTorch with CUDA acceleration, and built 3D force visualization interface in Java using Processing IDE. Achieved 0.99-1.00 AUC classification performance across three position classes, demonstrating viability for clinical deployment.

PyTorch CUDA Neural Networks Arduino Velostat Sensors Multiplexers Java Processing IDE Altium Flexible PCB ROC Analysis Python
Key Achievements: Custom velostat pressure sensor array, 112,640 sample dataset, 5-layer PyTorch neural network achieving 0.99-1.00 AUC. 3D real-time visualization in Java Processing. Dean's Research Award recipient.

Pressure Sensing Technology: Velostat

Velostat (pressure-sensitive conductive material) changes electrical resistance when compressed. With no applied pressure, carbon particles within the material maintain high electrical resistance (low current flow). When pressure is applied, carbon particles move closer together, creating more conductive pathways and dramatically reducing resistance (high current flow). This piezoresistive property enables precise pressure mapping across the cushion surface.

Custom Sensor Array Design: Constructed flexible pressure pad with velostat layer sandwiched between copper electrode arrays. Adhesive layers secure the vinyl substrate while carbon particles within the velostat respond to applied force. Multiple sensing points connected through multiplexer circuit enable spatial pressure distribution measurement without requiring individual ADC channels for each sensor.

Hardware Development

Multiplexer Circuit Design: Implemented analog multiplexer array to read multiple pressure sensors through single Arduino ADC input. Multiplexer addressing controlled via digital GPIO pins, enabling sequential sensor polling. This architecture reduces pin requirements and allows scalable sensor array expansion without ADC channel limitations.
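The addressing scheme can be sketched as follows. This is a host-side illustration with hypothetical names: it assumes a multiplexer with four select pins (16 channels) and shows only how a channel index maps to select-pin levels and how a scan visits every channel.

```python
def mux_select_bits(channel, n_select_pins=4):
    """Return the select-pin levels (least significant bit first) for a channel."""
    if not 0 <= channel < 2 ** n_select_pins:
        raise ValueError(f"channel {channel} out of range")
    return [(channel >> bit) & 1 for bit in range(n_select_pins)]


def scan(read_adc, n_select_pins=4):
    """Poll every channel once; read_adc(bits) stands in for set-pins-then-sample."""
    return [
        read_adc(mux_select_bits(channel, n_select_pins))
        for channel in range(2 ** n_select_pins)
    ]
```

On the Arduino, the read_adc step would write the bits to the GPIO select pins and then sample the single shared analog input.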

Flexible PCB Design (Altium): Designed flexible printed circuit board to interface with velostat sensor array. Flex PCB conforms to cushion curvature while maintaining reliable electrical connections. Layout considerations included trace routing for minimal crosstalk, via placement for mechanical flexibility, and connector positioning for Arduino integration. PCB manufacturing process used polyimide substrate for durability under repeated flexing cycles.

Data Acquisition System: Arduino microcontroller samples all sensors at fixed intervals, formats pressure readings into structured data packets, and transmits via serial connection to host computer for real-time visualization and dataset logging. System captures spatial pressure distribution at sufficient sampling rate for position classification while maintaining stable USB communication.

Machine Learning Pipeline

Dataset Collection: Captured 112,640 labeled samples across 4 distinct seating positions (properly centered, leaning forward, leaning right, sitting on something). Each sample contains pressure readings from all sensor locations, creating high-dimensional feature vectors. Multiple subjects contributed to dataset for generalization across different body types and sitting habits. Data collection protocol ensured balanced class distribution to prevent model bias.

Neural Network Architecture: Implemented feedforward neural network with 5 hidden layers in PyTorch. Input layer dimensionality matches sensor array size, hidden layers apply ReLU activation with progressive dimensionality reduction, output layer uses softmax for 4-class probability distribution. Architecture depth chosen through experimentation to balance model capacity against overfitting risk given dataset size.

Training Methodology: Leveraged CUDA-enabled GPU acceleration for efficient batch gradient descent. Training process monitored validation loss to detect overfitting, employed early stopping when validation performance plateaued. Loss curves demonstrate successful convergence with training and validation losses tracking closely, indicating good generalization without memorization. Final model achieves low loss (~0.3) on both training and validation sets after 12 epochs.
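The early-stopping rule can be sketched independently of the training framework. This is a generic illustration, not the project's training loop: the patience value, the function name, and the loss values are assumptions.

```python
def early_stop_epoch(val_losses, patience=3, min_delta=0.0):
    """Return (stop_epoch, best_epoch) for a sequence of validation losses.

    Training stops once `patience` epochs pass without the validation loss
    improving on the best value by more than `min_delta`.
    """
    best = float("inf")
    best_epoch = -1
    for epoch, loss in enumerate(val_losses):
        if loss < best - min_delta:
            best, best_epoch = loss, epoch
        elif epoch - best_epoch >= patience:
            return epoch, best_epoch
    return len(val_losses) - 1, best_epoch


# Hypothetical validation losses: improvement stalls after epoch 3.
stop, best = early_stop_epoch([1.0, 0.6, 0.4, 0.35, 0.36, 0.37, 0.38, 0.2])
```

In practice the model weights from best_epoch are the ones kept, since later epochs did not improve validation loss.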

RESULTS & CLINICAL IMPACT

Classification Performance: Achieved near-perfect classification for three seating positions (AUC 0.99-1.00), demonstrating system viability for clinical deployment. Class 3 (sitting on something) showed lower but acceptable performance (AUC 0.67) because it was evaluated on data the model had not seen during training, to see how it would perform. This shows that the model needs more training data for this class to improve generalization.
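For reference, a one-vs-rest AUC for a single class can be computed directly from its definition: the probability that a randomly chosen positive sample scores higher than a randomly chosen negative one. The sketch below is a plain-Python illustration with made-up scores, not the evaluation code used in the project.

```python
def roc_auc(labels, scores):
    """AUC for one class: labels are 1 (this class) or 0 (any other class)."""
    pos = [s for label, s in zip(labels, scores) if label == 1]
    neg = [s for label, s in zip(labels, scores) if label == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    # Count positive/negative pairs ranked correctly; ties count as half.
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


perfect = roc_auc([1, 1, 0, 0], [0.9, 0.8, 0.3, 0.1])
partial = roc_auc([1, 0, 1, 0], [0.9, 0.8, 0.3, 0.1])
```

An AUC of 0.5 corresponds to chance-level ranking, which puts the 0.67 for Class 3 in context against the 0.99-1.00 of the other classes.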

3D Visualization System: Developed real-time pressure visualization in Java using Processing IDE. Interface displays pressure distribution as 3D height-mapped surface with color-coded intensity (red indicates high pressure regions). Visualization enables clinicians to quickly assess seating patterns and identify problematic pressure concentrations that could lead to tissue damage.

Clinical Application: System provides immediate feedback for proper wheelchair positioning to prevent pressure sores in long-term wheelchair users. Pressure sores (decubitus ulcers) represent serious health risk for individuals with limited mobility. Early detection and position correction can prevent costly medical interventions and improve quality of life. Grant funding application supported by strong ROC performance metrics.

09.

System Programming & Process Control

Unix Shell Implementation

Systems Programmer | Fall 2025

Process
fork/exec/wait
I/O
Redirection
Pipelines
IPC

Built a fully functional Unix shell from scratch, providing command parsing, process execution, I/O redirection, and pipeline support. Handles both built-in commands and external program execution with proper signal handling for job control.

C Unix System Calls Process Control Signal Handling File I/O String Parsing
Key Achievements: fork/exec/wait process management, I/O redirection and pipelines, signal handling (SIGINT, SIGCHLD), built-in commands, background execution.

Shell Capabilities

Features:

  • Command parsing with argument tokenization and quote handling
  • Process creation and management using fork()/exec()/wait() system calls
  • I/O redirection (>, <, >>) with file descriptor manipulation
  • Pipeline implementation (|) supporting multiple command chaining
  • Built-in commands: cd, pwd, exit, history
  • Background process execution with & operator
  • Signal handling (SIGINT, SIGCHLD) for proper job control
  • Environment variable expansion
Implementation Details

Implemented robust command parser handling quotes, escapes, and special characters. Managed file descriptor lifecycle for complex pipelines with proper cleanup. Handled zombie processes through SIGCHLD signals and proper wait() usage. Developed command history mechanism with efficient storage and retrieval.
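The pipeline and file-descriptor handling can be sketched with the same system calls through Python's os module. The shell itself was written in C, so this is a stand-in for illustration only: the function name is hypothetical, it requires a POSIX system, and it captures the last command's output so the result can be inspected.

```python
import os


def run_pipeline(commands):
    """Run argv lists as cmd1 | cmd2 | ... and return the last command's stdout."""
    if not commands:
        raise ValueError("at least one command is required")
    pids = []
    prev_read = None                        # read end feeding the next command's stdin
    for argv in commands:
        read_end, write_end = os.pipe()     # carries this command's stdout
        pid = os.fork()
        if pid == 0:                        # child process
            try:
                if prev_read is not None:
                    os.dup2(prev_read, 0)   # stdin <- previous command's output
                    os.close(prev_read)
                os.dup2(write_end, 1)       # stdout -> this command's pipe
                os.close(write_end)
                os.close(read_end)
                os.execvp(argv[0], argv)    # replaces the child on success
            finally:
                os._exit(127)               # reached only if exec failed
        pids.append(pid)
        os.close(write_end)                 # parent must close, or readers never see EOF
        if prev_read is not None:
            os.close(prev_read)
        prev_read = read_end
    chunks = []
    while True:
        chunk = os.read(prev_read, 4096)
        if not chunk:
            break
        chunks.append(chunk)
    os.close(prev_read)
    for pid in pids:
        os.waitpid(pid, 0)                  # reap every child to avoid zombies
    return b"".join(chunks)


output = run_pipeline([["printf", "hello"], ["tr", "a-z", "A-Z"]])
```

Closing the write end in the parent after each fork is the step most often missed: if any copy stays open, the downstream command never receives end-of-file.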

10.

Systems Programming

Simulated Unix Filesystem

Systems Programmer | Fall 2025

Block I/O
Disk Simulation
inode
File Metadata
Directory
Path Resolution

Built a complete Unix-like filesystem from scratch in C, implementing block-level storage management, inode-based file representation, and hierarchical directory structures. Simulates virtual disk I/O operations with proper handling of file creation, reading, writing, deletion, and seeking across multi-level directories.

C File Systems Block I/O inode Management Memory Management Data Structures
Key Achievements: Block allocation with bitmap tracking, inode-based metadata, hierarchical directories with path resolution, CRUD operations across multi-level directory structures.

System Architecture

Core Features:

  • Block allocation and free space management with bitmap tracking
  • inode table for file metadata including size, permissions, and block pointers
  • Directory entry management with name-to-inode mapping
  • Path parsing and traversal for both absolute and relative paths
  • Support for create, read, write, delete, and seek operations
  • Multi-level directory support with proper handling of "." and ".." entries
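Bitmap tracking of free blocks can be sketched as below. The filesystem was written in C, so this is an illustration only: the class name and the first-fit allocation policy are assumptions.

```python
class BlockBitmap:
    """One bit per disk block: 1 means allocated, 0 means free."""

    def __init__(self, n_blocks):
        self.n_blocks = n_blocks
        self.bits = bytearray((n_blocks + 7) // 8)

    def is_used(self, block):
        return bool(self.bits[block // 8] & (1 << (block % 8)))

    def allocate(self):
        """Return the lowest-numbered free block (first fit), or -1 if the disk is full."""
        for block in range(self.n_blocks):
            if not self.is_used(block):
                self.bits[block // 8] |= 1 << (block % 8)
                return block
        return -1

    def free(self, block):
        """Mark a block as free again."""
        self.bits[block // 8] &= ~(1 << (block % 8)) & 0xFF
```

An inode's block pointers would then hold the indices returned by allocate().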
Technical Challenges

Implemented efficient block allocation strategies to minimize fragmentation, developed robust path parsing to handle edge cases (., .., /), and designed data structures for optimal memory usage. Ensured filesystem consistency across operations with proper error handling and validation.
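The path edge cases mentioned above can be sketched as a pure string normalization. This illustration uses a hypothetical function name and works on path strings only; the real implementation would look up each component in directory entries and follow inode numbers.

```python
def resolve(cwd, path):
    """Normalize an absolute or relative path against cwd, handling '.', '..' and '//'."""
    parts = [] if path.startswith("/") else [p for p in cwd.split("/") if p]
    for component in path.split("/"):
        if component in ("", "."):
            continue                    # empty segments and '.' change nothing
        if component == "..":
            if parts:
                parts.pop()             # '..' at the root stays at the root
        else:
            parts.append(component)
    return "/" + "/".join(parts)
```

For example, resolving "../docs/./a.txt" from "/home/user" walks up one level and then down into docs.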

11.

Distributed Systems

Distributed MapReduce Framework

Distributed Systems Programmer | Fall 2025

Parallel
Worker Coordination
Distribution
Load Balancing
Fault Tolerant
Worker Recovery

Developed a distributed computing framework inspired by Google's MapReduce paper, enabling parallel processing of large datasets across multiple worker processes. The coordinator manages task assignment, monitors worker health, and handles failures gracefully through task reassignment.

C Distributed Systems IPC Parallel Computing Process Management Fault Tolerance
Key Achievements: Coordinator/worker architecture with IPC, heartbeat-based fault detection, task reassignment on failure, exactly-once execution semantics.

System Components

Architecture:

  • Coordinator process for centralized task management and worker orchestration
  • Worker processes with configurable map/reduce function execution
  • Inter-process communication using sockets and shared memory
  • Task partitioning and intermediate data shuffling mechanisms
  • Fault detection through heartbeat mechanism and task reassignment on worker failure
  • Configurable number of map and reduce workers for scalability
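Task partitioning and shuffling can be sketched with a word-count example. The framework was written in C, so this single-process illustration is a stand-in: the function names are hypothetical, and CRC32 is used here only as a convenient deterministic hash.

```python
import zlib
from collections import defaultdict


def partition(key, n_reduce):
    """Pick the reduce task for a key; the same key always maps to the same task."""
    return zlib.crc32(key.encode("utf-8")) % n_reduce


def map_words(text):
    """Map step: emit (word, 1) for every word."""
    return [(word, 1) for word in text.split()]


def shuffle(pairs, n_reduce):
    """Group intermediate values by key inside each reduce partition."""
    buckets = [defaultdict(list) for _ in range(n_reduce)]
    for key, value in pairs:
        buckets[partition(key, n_reduce)][key].append(value)
    return buckets


def reduce_counts(bucket):
    """Reduce step: sum the values collected for each key."""
    return {key: sum(values) for key, values in bucket.items()}


buckets = shuffle(map_words("a b a"), n_reduce=3)
totals = {}
for bucket in buckets:
    totals.update(reduce_counts(bucket))
```

Because every occurrence of a key lands in the same partition, each reduce worker can produce its final counts without consulting the others.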
Implementation Challenges

Implemented efficient data partitioning strategy to balance workload across workers, developed heartbeat mechanism for failure detection, and designed intermediate file format for map-to-reduce data transfer. Handled race conditions in shared state access and ensured exactly-once task execution semantics.
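Heartbeat-based failure detection and task reassignment can be sketched as follows. This is an illustration with hypothetical names and an explicit `now` argument in place of a real clock; it does not model the IPC layer or the deduplication of completed tasks.

```python
class Coordinator:
    """Tracks worker liveness and returns tasks from dead workers to the pending list."""

    def __init__(self, timeout_s):
        self.timeout_s = timeout_s
        self.last_seen = {}     # worker -> time of last heartbeat
        self.assigned = {}      # task -> worker currently running it
        self.pending = []       # tasks waiting to be (re)assigned

    def heartbeat(self, worker, now):
        self.last_seen[worker] = now

    def assign(self, task, worker):
        self.assigned[task] = worker

    def reap(self, now):
        """Declare silent workers dead and requeue their tasks; return the dead workers."""
        dead = {w for w, t in self.last_seen.items() if now - t > self.timeout_s}
        for task, worker in list(self.assigned.items()):
            if worker in dead:
                del self.assigned[task]
                self.pending.append(task)
        for worker in dead:
            del self.last_seen[worker]
        return sorted(dead)
```

A worker that is merely slow may still finish a task after it has been reassigned, which is why the coordinator also has to ignore duplicate completions.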

12.

Full-Stack Web Development

Quacker - Social Media Platform Python/SQL

Full-Stack Developer & Database Management | Winter 2024

Auth
Secure Sessions
Social
Post & Comment
Follow
User Connections

Built a Twitter-inspired social media platform using Flask backend with PostgreSQL database and responsive HTML/CSS/JavaScript frontend. Implements core social networking features including user profiles, posts (quacks), comments, likes, and follow relationships with proper authentication and database normalization.

Python Flask PostgreSQL SQL HTML/CSS JavaScript Bootstrap REST API
Key Achievements: Flask REST API with PostgreSQL, secure authentication with bcrypt, normalized database schema, social features (posts, comments, follows, likes).

Platform Features & Architecture

Core Functionality:

  • User authentication and session management with secure password hashing
  • Profile creation with customizable bio, avatar, and user information
  • Post creation, editing, and deletion with timestamp tracking
  • Comment system with nested discussions on posts
  • Follow/unfollow mechanics for building social connections
  • Personalized feed showing posts from followed users
  • Search functionality for discovering users and content
  • Like/unlike system for post engagement tracking

Technical Implementation:

  • Backend: Flask application with RESTful API design, SQL ORM for database interactions, Flask-Login for session management
  • Database: PostgreSQL with normalized schema - users, posts, comments, follows, likes tables with proper foreign key relationships
  • Frontend: Responsive design with Bootstrap, AJAX for dynamic content updates without page refreshes
  • Security: Password hashing with bcrypt, CSRF protection, SQL injection prevention through parameterized queries
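The normalized schema and parameterized queries can be sketched with Python's built-in sqlite3 module. SQLite is a stand-in here so the example is self-contained; the project used PostgreSQL. Table and column names are assumptions based on the table list above.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a reduced users/posts/follows schema."""
    db = sqlite3.connect(":memory:")
    db.executescript(
        """
        CREATE TABLE users (
            id INTEGER PRIMARY KEY,
            username TEXT UNIQUE NOT NULL
        );
        CREATE TABLE posts (
            id INTEGER PRIMARY KEY,
            author_id INTEGER NOT NULL REFERENCES users(id),
            body TEXT NOT NULL
        );
        CREATE TABLE follows (
            follower_id INTEGER NOT NULL REFERENCES users(id),
            followee_id INTEGER NOT NULL REFERENCES users(id),
            PRIMARY KEY (follower_id, followee_id)
        );
        """
    )
    return db


def feed_for(db, user_id):
    """Newest-first posts from the accounts that user_id follows."""
    rows = db.execute(
        "SELECT u.username, p.body FROM posts p "
        "JOIN follows f ON f.followee_id = p.author_id "
        "JOIN users u ON u.id = p.author_id "
        "WHERE f.follower_id = ? ORDER BY p.id DESC",
        (user_id,),                       # value is bound, never spliced into the SQL
    )
    return rows.fetchall()


def find_user(db, username):
    """Look up a user by exact name using a bound parameter."""
    return db.execute(
        "SELECT id FROM users WHERE username = ?", (username,)
    ).fetchone()
```

Because the user-supplied value is passed separately from the SQL text, input such as `' OR '1'='1` is treated as a literal username rather than as part of the query.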

Technical Skills

Embedded Systems & Real-Time

STM32 ESP32 FreeRTOS ESP-IDF Arduino Hardware Timers DMA Multi-threading

Communication Protocols

CAN Bus SPI I2C UART BLE MQTT TCP/IP UDP

Programming Languages

C C++ Python Verilog VHDL Assembly JavaScript Bash

Hardware Design & Tools

Altium EasyEDA Vivado LTspice Oscilloscope Logic Analyzer Soldering

Robotics & Autonomy

ROS2 IMU/AHRS Navigation Motion Control Sensor Fusion

Machine Learning & Data

PyTorch CUDA NumPy Pandas Matplotlib

Contact

Let's Build Something Together

Currently seeking co-op and internship opportunities for May 2026 to December 2026 in embedded systems, ASIC design, hardware verification, and autonomous systems. Open to full-time positions starting May 2027.

ymoussa@ualberta.ca

LinkedIn GitHub