🧠 Real-Time Gesture Recognition FPGA Accelerator

This project demonstrates a complete hardware/software co-design workflow to accelerate a Convolutional Neural Network (CNN) for hand gesture recognition on an embedded Xilinx Zynq UltraScale+ MPSoC platform (Avnet UltraZed-EG).

The objective is to offload inference from the CPU (ARM) to the FPGA (PL) to achieve low-latency, energy-efficient inference suitable for embedded applications (robotics, smart devices, HMI).

🚀 Architecture and Workflow

This project doesn't just train a model; it compiles it into hardware. The complete workflow, from Keras to an FPGA bitstream, is as follows:

Keras (Float) Training: A compact CNN is first trained with Keras in float32 to establish an accuracy baseline.
QAT Training (QKeras): The model is converted to QKeras and retrained (or fine-tuned) using Quantization-Aware Training (QAT). This adapts the network weights to low-precision arithmetic (e.g., INT8) that the FPGA can compute efficiently.
.h5 Export: The quantized model is saved in the .h5 format. HLS4ML is capable of reading this file and directly interpreting the QKeras layers to infer the hardware data types (e.g., ap_fixed<8,2>).
HLS Synthesis (HLS4ML): HLS4ML is used to convert the .h5 graph into optimized HLS C++ code. It generates a complete project ready for synthesis.
Hardware Compilation (Vitis HLS): Vitis HLS (called by the HLS4ML build script) synthesizes the C++ into a hardware IP block (RTL - Verilog/VHDL) ready to be imported into a logic design.
System Integration (Vivado): The hardware IP is imported into Vivado and integrated into a Zynq MPSoC Block Design. It is connected to the processor (PS) via an AXI-Lite interface (for control) and to the DDR memory via AXI DMA (for the image pixel stream). A bitstream is then generated.
Deployment (PYNQ): The final application runs on PYNQ (Python on Zynq) on the UltraZed-EG board. The Python code (running on the ARM CPU) handles:
- Video capture via OpenCV.
- Image preprocessing (64x64 resize, normalization).
- Sending the image to the accelerator (PL) via DMA.
- Receiving the results (logits) from the PL.
- Post-processing (Softmax on the CPU) and displaying the recognized gesture.

🚧 Limitations & Future Work

Dataset: This model is trained on SignMNIST. Performance on real-world camera feeds is poor due to the "domain gap".

Roadmap:

Fine-tune the model on a real-world dataset (like HaGRID) to improve camera inference.
Complete the full PYNQ integration (scripts/5_run_inference_pynq.py).
Benchmark power consumption on the UltraZed board.

Name		Name	Last commit message	Last commit date
Latest commit History 29 Commits
IP		IP
data		data
doc		doc
scripts		scripts
src		src
zedboard_gesture_system		zedboard_gesture_system
.gitattributes		.gitattributes
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
clean_dataset.py		clean_dataset.py
data_loader.py		data_loader.py
environment_keras.yml		environment_keras.yml
gesture.bin		gesture.bin
gesture.bit		gesture.bit
gesture.hwh		gesture.hwh
settings.py		settings.py
vitis_hls.log		vitis_hls.log

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

🧠 Real-Time Gesture Recognition FPGA Accelerator

🚀 Architecture and Workflow

🚧 Limitations & Future Work

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

🧠 Real-Time Gesture Recognition FPGA Accelerator

🚀 Architecture and Workflow

🚧 Limitations & Future Work

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages