Research Resources
Curated papers, videos, and tutorials on Vision Transformers and FPGA acceleration.
Introduces shifted window attention for linear complexity, enabling ViT to scale to high-resolution images and dense prediction tasks (a small sketch of window attention follows this list).
Shows ViT can be trained efficiently on ImageNet alone using knowledge distillation, making it practical without massive pre-training datasets.
Comprehensive survey of neural network quantization techniques, including post-training quantization, quantization-aware training, and mixed-precision methods (see the quantization sketch after this list).
Official Vitis HLS documentation covering pragmas, directives, dataflow optimization, and best practices for high-performance HLS design (see the pragma example after this list).
Recent work mapping transformer-based LLMs to FPGAs with custom sparse computation and memory optimization. Highly relevant to our ViT acceleration work.
Directly relevant: proposes an automated framework for ViT acceleration on FPGAs with mixed-precision quantization and hardware-aware NAS.
Practical walkthrough of deploying neural networks on Xilinx FPGAs using Vitis AI, covering quantization, compilation, and board deployment.
Essential survey covering DNN hardware acceleration fundamentals: dataflows, memory hierarchies, and hardware architectures for efficient inference.
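To make the shifted-window idea from the first paper above concrete, here is a minimal C++ sketch of window attention (our own illustration, not the Swin code; single head, no shift, no learned projections). Because each query attends only within its own w-token window, total work grows linearly in the token count; Swin alternates a shifted window partition between layers so information still crosses window borders.

```cpp
#include <cmath>
#include <vector>

using Mat = std::vector<std::vector<float>>; // [token][channel]

// Attention for tokens [begin, begin+w): each query attends only to keys
// and values inside its own window, so total work is O(N * w * d) for N
// tokens instead of O(N^2 * d) for global attention.
void window_attention(const Mat& q, const Mat& k, const Mat& v,
                      Mat& out, int begin, int w, int d) {
    for (int i = begin; i < begin + w; ++i) {
        std::vector<float> score(w);
        float max_s = -1e30f;
        for (int j = 0; j < w; ++j) {              // scores within window only
            float dot = 0.f;
            for (int c = 0; c < d; ++c) dot += q[i][c] * k[begin + j][c];
            score[j] = dot / std::sqrt(static_cast<float>(d));
            if (score[j] > max_s) max_s = score[j];
        }
        float denom = 0.f;
        for (int j = 0; j < w; ++j) {              // numerically stable softmax
            score[j] = std::exp(score[j] - max_s);
            denom += score[j];
        }
        for (int c = 0; c < d; ++c) {              // weighted sum of values
            float acc = 0.f;
            for (int j = 0; j < w; ++j)
                acc += (score[j] / denom) * v[begin + j][c];
            out[i][c] = acc;
        }
    }
}

// Partition the sequence into fixed windows; Swin alternates this with a
// shifted partition on the next layer so information crosses window borders.
void windowed_attention(const Mat& q, const Mat& k, const Mat& v,
                        Mat& out, int n_tokens, int w, int d) {
    for (int begin = 0; begin + w <= n_tokens; begin += w)
        window_attention(q, k, v, out, begin, w, d);
}
```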
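For the quantization survey, a minimal sketch of the simplest scheme it covers, symmetric per-tensor int8 post-training quantization (the names and calibration rule here are illustrative; the survey also treats asymmetric, per-channel, and quantization-aware variants):

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <vector>

// Symmetric per-tensor PTQ: the scale is chosen from the largest |w| seen
// during calibration, mapping [-max_abs, max_abs] onto [-127, 127].
struct QTensor {
    std::vector<int8_t> data;
    float scale; // real_value ≈ scale * int_value
};

QTensor quantize_int8(const std::vector<float>& w) {
    float max_abs = 0.f;
    for (float x : w) max_abs = std::max(max_abs, std::fabs(x));
    QTensor q;
    q.scale = (max_abs > 0.f) ? max_abs / 127.f : 1.f; // avoid divide-by-zero
    q.data.reserve(w.size());
    for (float x : w) {
        int v = static_cast<int>(std::lround(x / q.scale));
        q.data.push_back(static_cast<int8_t>(std::clamp(v, -127, 127)));
    }
    return q;
}

float dequantize(const QTensor& q, size_t i) { return q.scale * q.data[i]; }
```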
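And for the Vitis HLS documentation, a small example of the pragma style it describes, applied to the kind of matmul tile we would offload. PIPELINE, UNROLL, and ARRAY_PARTITION are real Vitis HLS directives; the kernel itself and its sizes are our own sketch, and a real design would also need interface pragmas and timing analysis:

```cpp
// Illustrative Vitis HLS kernel: fixed-size matmul tile with common pragmas.
const int N = 32;

void matmul_tile(const float a[N][N], const float b[N][N], float c[N][N]) {
#pragma HLS ARRAY_PARTITION variable=a dim=2 complete  // parallel k reads of a
#pragma HLS ARRAY_PARTITION variable=b dim=1 complete  // parallel k reads of b
    for (int i = 0; i < N; ++i) {
        for (int j = 0; j < N; ++j) {
#pragma HLS PIPELINE II=1   // aim to start one (i,j) result per cycle
            float acc = 0.f;
            for (int k = 0; k < N; ++k) {
#pragma HLS UNROLL          // fully unroll the dot product
                acc += a[i][k] * b[k][j];
            }
            c[i][j] = acc;
        }
    }
}
```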
Professor Suggested Readings
A CNN accelerator that uses a dataflow pipeline (work done in Professor Betz's group, and the latest in a line of related works). This accelerator style will be harder to make work for a transformer, though; see the dataflow sketch after this list.
The latest version of the NPU (again from Professor Betz's group; see its references for the earlier publications). It is an overlay architecture, meaning a large, highly customized soft processor. This style is more likely to work well for a transformer.
Here's a (fairly advanced) lecture on vision transformers, by Song Han at MIT. He has lots of interesting work on efficient machine learning.
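To make the contrast between the two accelerator styles concrete, here is a minimal sketch of the dataflow-pipeline style from the first suggested reading (our own illustration in Vitis-style HLS, not code from the paper). Each stage is a separate task connected by streams, and DATAFLOW lets the stages run concurrently; this maps naturally onto a fixed CNN layer sequence but is harder to arrange for a transformer's large, data-dependent attention matmuls:

```cpp
#include <hls_stream.h>

// Dataflow-style pipeline: load -> compute -> store execute concurrently,
// streaming one element at a time. Names, sizes, and the compute stage
// are illustrative placeholders.
const int LEN = 1024;

static void load(const float* in, hls::stream<float>& s) {
    for (int i = 0; i < LEN; ++i) {
#pragma HLS PIPELINE II=1
        s.write(in[i]);
    }
}

static void scale_stage(hls::stream<float>& in, hls::stream<float>& out,
                        float k) {
    for (int i = 0; i < LEN; ++i) {
#pragma HLS PIPELINE II=1
        out.write(k * in.read());
    }
}

static void store(hls::stream<float>& s, float* out) {
    for (int i = 0; i < LEN; ++i) {
#pragma HLS PIPELINE II=1
        out[i] = s.read();
    }
}

void pipeline_top(const float* in, float* out, float k) {
#pragma HLS DATAFLOW        // overlap the three tasks as a pipeline
    hls::stream<float> s0("s0"), s1("s1");
    load(in, s0);
    scale_stage(s0, s1, k);
    store(s1, out);
}
```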
Project Title (options)
ViT-FPGA: Hybrid HPS–FPGA Vision Transformer Accelerator on DE1-SoC
Edge Vision Transformer Acceleration on FPGA with ARM HPS Co-Design
TinyViT Hardware Acceleration on DE1-SoC Using Hybrid CPU–FPGA Architecture
A Hybrid FPGA–HPS Architecture for Efficient Vision Transformer Inference

Project Summary
This capstone project implements a lightweight Vision Transformer (ViT / TinyViT) inference accelerator on the Intel DE1-SoC platform, using a hybrid architecture that combines the ARM Cortex-A9 Hard Processor System (HPS) with the FPGA fabric. The system offloads compute-intensive operations, such as the matrix multiplications in attention and MLP blocks, to custom FPGA kernels, while the HPS manages high-level control flow, memory orchestration, and system integration.

Model parameters and intermediate activations are primarily stored in external DDR3 memory on the HPS side, with FPGA-side SDRAM used as a low-latency cache for the acceleration kernels. Data movement between the two memory hierarchies is coordinated via DMA. The design explores system-level challenges in mapping transformer workloads onto heterogeneous hardware, including memory placement, bandwidth constraints, and efficient execution of attention mechanisms.

Input images are acquired via either a USB camera (HPS-managed Linux pipeline) or a GPIO camera module (FPGA direct interface), with the system-level trade-offs of each evaluated. The project aims to demonstrate a scalable hardware–software co-design approach for deploying transformer-based vision models on resource-constrained FPGA platforms, with a focus on TinyViT inference and system integration rather than full training support.
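As a first sketch of the HPS-side control flow described above, the snippet below shows the standard /dev/mem + mmap approach for reaching FPGA registers through the Cyclone V lightweight HPS-to-FPGA bridge (its 0xFF200000 base is documented for the DE1-SoC). The kernel's register map, offsets, and buffer addresses are hypothetical placeholders for our design, not a real IP:

```cpp
#include <cstdint>
#include <cstdio>
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

// Cyclone V HPS: the lightweight HPS-to-FPGA bridge window starts here.
constexpr off_t  LW_BRIDGE_BASE = 0xFF200000;
constexpr size_t LW_BRIDGE_SPAN = 0x00200000;

// Hypothetical register map of our matmul kernel (placeholders only).
constexpr uint32_t REG_CTRL   = 0x00; // bit0 = start
constexpr uint32_t REG_STATUS = 0x04; // bit0 = done
constexpr uint32_t REG_SRC    = 0x08; // DDR3 physical address of input tile
constexpr uint32_t REG_DST    = 0x0C; // DDR3 physical address of output tile

int main() {
    int fd = open("/dev/mem", O_RDWR | O_SYNC);
    if (fd < 0) { perror("open /dev/mem"); return 1; }

    void* base = mmap(nullptr, LW_BRIDGE_SPAN, PROT_READ | PROT_WRITE,
                      MAP_SHARED, fd, LW_BRIDGE_BASE);
    if (base == MAP_FAILED) { perror("mmap"); close(fd); return 1; }

    volatile uint32_t* regs = static_cast<volatile uint32_t*>(base);
    regs[REG_SRC / 4]  = 0x20000000; // placeholder buffer addresses
    regs[REG_DST / 4]  = 0x20100000;
    regs[REG_CTRL / 4] = 1;          // kick off the kernel

    while ((regs[REG_STATUS / 4] & 1) == 0) {} // busy-wait for completion

    munmap(base, LW_BRIDGE_SPAN);
    close(fd);
    return 0;
}
```

A real driver would replace the busy-wait with an interrupt and allocate DMA-coherent buffers instead of hard-coded addresses; this sketch only fixes the control-flow shape of HPS-orchestrated kernel launches.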