Capstone 2026 · 4th Year

Research Resources

Curated papers, videos, and tutorials on Vision Transformers and FPGA acceleration.

Paper
Swin Transformer: Hierarchical Vision Transformer using Shifted Windows
Liu et al. · 2021 · ICCV

Introduces shifted window attention with linear complexity in the number of tokens, enabling ViTs to scale to high-resolution images and dense prediction tasks (a brief complexity sketch follows this entry).

Swin · ViT · Object Detection
22,000 citations
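
A minimal sketch (plain C++, illustrative sizes only, not taken from the paper) of the complexity argument: global self-attention over an H×W token grid scores (HW)² token pairs, while attention restricted to non-overlapping M×M windows scores only HW·M², which grows linearly with the number of tokens for a fixed window size.

```cpp
#include <cstdint>
#include <iostream>

// Rough operation counts for the attention score matrix (QK^T) only.
// Global attention: every token attends to every token -> (H*W)^2 pairs.
// Window attention: tokens attend only within their own M x M window,
// so the cost is (#windows) * (M*M)^2 = (H*W) * M*M pairs.
int main() {
    const std::uint64_t H = 56, W = 56;  // token grid after patch embedding (illustrative)
    const std::uint64_t M = 7;           // window size used for the comparison

    std::uint64_t tokens = H * W;
    std::uint64_t global_pairs = tokens * tokens;              // O((HW)^2)
    std::uint64_t windows = (H / M) * (W / M);
    std::uint64_t window_pairs = windows * (M * M) * (M * M);  // O(HW * M^2)

    std::cout << "global attention pairs:   " << global_pairs << "\n";
    std::cout << "windowed attention pairs: " << window_pairs << "\n";
    return 0;
}
```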
Paper
Training data-efficient image transformers & distillation through attention (DeiT)
Touvron et al. · 2021 · ICML

Shows that ViT can be trained efficiently on ImageNet alone using knowledge distillation, making it practical without massive pre-training datasets (a generic distillation-loss sketch follows this entry).

DeiT · ViT · Distillation
7,800 citations
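
A generic soft-distillation loss sketch, assuming a temperature-scaled KL divergence between teacher and student predictions; this illustrates distillation in general rather than DeiT's specific distillation-token scheme, and the logits and temperature below are made-up values.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <iostream>
#include <vector>

// Temperature-scaled softmax over raw logits.
std::vector<double> softmax(const std::vector<double>& logits, double T) {
    double max_logit = logits[0];
    for (double v : logits) max_logit = std::max(max_logit, v);
    std::vector<double> p(logits.size());
    double sum = 0.0;
    for (std::size_t i = 0; i < logits.size(); ++i) {
        p[i] = std::exp((logits[i] - max_logit) / T);
        sum += p[i];
    }
    for (double& v : p) v /= sum;
    return p;
}

// KL(teacher || student): the soft-label term the student minimizes.
double distillation_loss(const std::vector<double>& teacher_logits,
                         const std::vector<double>& student_logits, double T) {
    std::vector<double> pt = softmax(teacher_logits, T);
    std::vector<double> ps = softmax(student_logits, T);
    double kl = 0.0;
    for (std::size_t i = 0; i < pt.size(); ++i)
        kl += pt[i] * std::log(pt[i] / ps[i]);
    return T * T * kl;  // T^2 keeps gradient magnitudes comparable across temperatures
}

int main() {
    std::vector<double> teacher = {2.0, 0.5, -1.0};  // illustrative logits
    std::vector<double> student = {1.5, 0.8, -0.5};
    std::cout << "soft distillation loss: " << distillation_loss(teacher, student, 3.0) << "\n";
}
```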
Survey
A Survey of Quantization Methods for Efficient Neural Network Inference
Gholami et al. · 2022 · Low-Power Computer Vision

Comprehensive survey of neural network quantization techniques, including post-training quantization, quantization-aware training, and mixed-precision methods (a minimal post-training quantization sketch follows this entry).

Quantization · Efficiency · Survey
2,100 citations
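
A minimal sketch of symmetric post-training quantization for a single tensor, assuming a max-absolute-value scale and an int8 target; real flows add calibration data, per-channel scales, and quantization-aware training. The weight values are arbitrary.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <iostream>
#include <vector>

int main() {
    // Illustrative weights; a real flow would calibrate over the whole tensor/dataset.
    std::vector<float> w = {0.82f, -1.35f, 0.07f, 0.41f, -0.98f, 1.20f};

    // Symmetric scale: map the largest magnitude onto the int8 range [-127, 127].
    float max_abs = 0.0f;
    for (float v : w) max_abs = std::max(max_abs, std::fabs(v));
    float scale = max_abs / 127.0f;

    std::vector<std::int8_t> q(w.size());
    float max_err = 0.0f;
    for (std::size_t i = 0; i < w.size(); ++i) {
        q[i] = static_cast<std::int8_t>(std::lround(w[i] / scale));  // quantize
        float deq = q[i] * scale;                                    // dequantize
        max_err = std::max(max_err, std::fabs(deq - w[i]));
    }
    std::cout << "scale = " << scale << ", max abs error = " << max_err << "\n";
}
```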
Tutorial
Vitis HLS User Guide (UG1399)
AMD/Xilinx · 2024 · AMD Documentation

Official Vitis HLS documentation covering pragmas, directives, dataflow optimization, and best practices for high-performance HLS design (a small pragma example follows this entry).

HLS · Vitis · FPGA
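
A small HLS kernel sketch showing the kind of directives the guide documents (PIPELINE, ARRAY_PARTITION). The function, sizes, and chosen factors are hypothetical, and whether II=1 actually closes depends on the target device and clock.

```cpp
// Hypothetical HLS kernel: y = W * x for a small fixed-size layer.
// Pragmas illustrate UG1399-style directives; sizes are placeholders.
constexpr int N = 64;

void matvec(const float W[N][N], const float x[N], float y[N]) {
#pragma HLS ARRAY_PARTITION variable=x complete dim=1
#pragma HLS ARRAY_PARTITION variable=W complete dim=2
row:
    for (int i = 0; i < N; ++i) {
#pragma HLS PIPELINE II=1
        float acc = 0.0f;
col:
        for (int j = 0; j < N; ++j) {  // fully unrolled under the pipelined row loop
            acc += W[i][j] * x[j];
        }
        y[i] = acc;
    }
}
```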
Paper

Recent work mapping transformer-based LLMs to FPGAs with custom sparse computation and memory optimization. Highly relevant to our ViT acceleration work.

FPGA · Transformer · LLM
85 citations
Paper

Directly relevant: proposes an automated framework for ViT acceleration on FPGAs with mixed-precision quantization and hardware-aware NAS.

ViT · FPGA · Quantization
52 citations
Video

Practical walkthrough of deploying neural networks on Xilinx FPGAs using Vitis AI, covering quantization, compilation, and board deployment.

FPGA · Vitis AI · Xilinx
Survey
Efficient Processing of Deep Neural Networks: A Tutorial and Survey
Sze et al. · 2017 · Proceedings of the IEEE

Essential survey covering the fundamentals of DNN hardware acceleration: dataflows, memory hierarchies, and hardware architectures for efficient inference (a small loop-tiling sketch follows this entry).

Hardware · DNN · Survey
4,200 citations
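
A tiny loop-tiling sketch of the data-reuse reasoning such surveys formalize as dataflows: a slice of the input vector is staged once in a small buffer (standing in for on-chip memory) and reused across every output row before the next slice is fetched. Sizes and the matvec_tiled name are illustrative assumptions.

```cpp
#include <cstring>
#include <iostream>

constexpr int N = 256;
constexpr int TILE = 32;

// Tiled y = W * x: each TILE-sized slice of x is fetched once into a small
// buffer and reused N times, instead of being re-read from large memory
// for every multiply-accumulate.
void matvec_tiled(const float* W /* N x N, row-major */, const float* x, float* y) {
    float x_buf[TILE];                 // stand-in for an on-chip buffer
    for (int i = 0; i < N; ++i) y[i] = 0.0f;

    for (int j0 = 0; j0 < N; j0 += TILE) {
        std::memcpy(x_buf, x + j0, TILE * sizeof(float));  // one fetch per tile...
        for (int i = 0; i < N; ++i) {
            float acc = y[i];
            for (int j = 0; j < TILE; ++j)
                acc += W[i * N + j0 + j] * x_buf[j];        // ...reused across all rows
            y[i] = acc;
        }
    }
}

int main() {
    static float W[N * N], x[N], y[N];
    for (int i = 0; i < N * N; ++i) W[i] = 0.001f * (i % 100);
    for (int j = 0; j < N; ++j) x[j] = 0.01f * j;
    matvec_tiled(W, x, y);
    std::cout << "y[0] = " << y[0] << ", y[N-1] = " << y[N - 1] << "\n";
}
```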
Paper
ViA: A Novel Vision-Transformer Accelerator Based on FPGA
2022 · IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
Paper
ViTA: A Vision Transformer Inference Accelerator for Edge Applications
2023 · IEEE International Symposium on Circuits and Systems (ISCAS)

Professor Suggested Readings

7 items
Prof. Suggested
H2PIPE
2024

A CNN accelerator that uses a dataflow pipeline (developed in Professor Betz's group, and the latest in a line of related works). This accelerator style will be harder to adapt to a transformer, though (a toy task-pipeline sketch follows this entry).

CNN · Accelerator
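
A toy sketch of the task-level pipelining idea behind dataflow-style accelerators, written with Vitis HLS DATAFLOW and hls::stream for concreteness: two placeholder "layer" stages run concurrently and pass activations through an on-chip FIFO instead of round-tripping to external memory. This is only the basic shape of the idea, not H2PIPE's architecture; names and sizes are placeholders.

```cpp
#include <hls_stream.h>

constexpr int N = 1024;

// Toy "layer 1": scales the input and streams it out through a FIFO.
static void stage1(const float* in, hls::stream<float>& s) {
    for (int i = 0; i < N; ++i) {
#pragma HLS PIPELINE II=1
        s.write(in[i] * 2.0f);
    }
}

// Toy "layer 2": reads from the FIFO and applies a ReLU.
static void stage2(hls::stream<float>& s, float* out) {
    for (int i = 0; i < N; ++i) {
#pragma HLS PIPELINE II=1
        float v = s.read();
        out[i] = (v > 0.0f) ? v : 0.0f;
    }
}

// With DATAFLOW, the two stages execute concurrently, which is the basic
// shape of a layer-pipelined accelerator.
void pipelined_top(const float* in, float* out) {
#pragma HLS DATAFLOW
    hls::stream<float> s("inter_layer");
#pragma HLS STREAM variable=s depth=64
    stage1(in, s);
    stage2(s, out);
}
```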
Prof. Suggested

The latest version of the NPU (again from Professor Betz's group; you can find earlier publications in its references). It is an overlay architecture, meaning a large, highly customized soft processor. This style is more likely to work well for a transformer.

NPU · Overlay · Big soft processor
Prof. Suggested

Here's a (fairly advanced) lecture on vision transformers by Song Han at MIT. He has lots of interesting work on efficient machine learning.

Efficient ML · Lecture · MIT
Prof. Suggested
ECE 496 Google Doc
Winnie · 2026

Google Doc for meeting notes and discussions.

Prof. Suggested

An overview of the top level, and stuff to consider.

Prof. Suggested
Profile
Prof. Suggested

Project Title (options)

ViT-FPGA: Hybrid HPS–FPGA Vision Transformer Accelerator on DE1-SoC
Edge Vision Transformer Acceleration on FPGA with ARM HPS Co-Design
TinyViT Hardware Acceleration on DE1-SoC Using Hybrid CPU–FPGA Architecture
A Hybrid FPGA–HPS Architecture for Efficient Vision Transformer Inference

This capstone project implements a lightweight Vision Transformer (ViT / TinyViT) inference accelerator on the Intel DE1-SoC platform, using a hybrid architecture that combines the ARM Cortex-A9 Hard Processor System (HPS) with the FPGA fabric. The system offloads compute-intensive operations, such as the matrix multiplications in the attention and MLP blocks, to custom FPGA kernels, while the HPS manages high-level control flow, memory orchestration, and system integration (a hedged kernel sketch follows this entry). Model parameters and intermediate activations are primarily stored in external DDR3 memory on the HPS side, with FPGA-side SDRAM used as a low-latency cache for the acceleration kernels; data movement between the two memory hierarchies is coordinated via DMA. The design explores system-level challenges in mapping transformer workloads onto heterogeneous hardware, including memory placement, bandwidth constraints, and efficient execution of attention mechanisms. Input images are acquired via either a USB camera (HPS-managed Linux pipeline) or a GPIO camera module (direct FPGA interface), and the system-level trade-offs between the two are evaluated. The project aims to demonstrate a scalable hardware–software co-design approach for deploying transformer-based vision models on resource-constrained FPGA platforms, with a focus on TinyViT inference and system integration rather than full training support.
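
A hedged sketch of one possible offloaded building block: a tiled GEMM kernel that copies operand tiles from external memory into on-chip buffers, multiplies them, and writes the result back. It is written with Vitis HLS-style interface pragmas purely for concreteness; on the DE1-SoC (an Intel device) the interfaces would instead be Avalon/Intel HLS or a hand-written bridge, and the tile size, function name, and pragmas are assumptions rather than the project's actual design.

```cpp
// Hypothetical offloaded GEMM tile kernel: C = A * B for one T x T tile,
// with operands read from external memory (e.g. HPS-side DDR3) and the
// HPS configuring and launching the kernel.
constexpr int T = 32;   // tile size chosen to fit on-chip RAM (illustrative)

void gemm_tile(const float* A, const float* B, float* C) {
#pragma HLS INTERFACE m_axi port=A offset=slave bundle=gmem0
#pragma HLS INTERFACE m_axi port=B offset=slave bundle=gmem1
#pragma HLS INTERFACE m_axi port=C offset=slave bundle=gmem2
#pragma HLS INTERFACE s_axilite port=return

    float a_buf[T][T], b_buf[T][T], c_buf[T][T];

    // Burst-copy operand tiles into on-chip buffers.
    for (int i = 0; i < T; ++i)
        for (int j = 0; j < T; ++j) {
#pragma HLS PIPELINE II=1
            a_buf[i][j] = A[i * T + j];
            b_buf[i][j] = B[i * T + j];
        }

    // Compute one inner product per output element.
    // II=1 on a float accumulation may not close timing; illustrative only.
    for (int i = 0; i < T; ++i)
        for (int j = 0; j < T; ++j) {
            float acc = 0.0f;
            for (int k = 0; k < T; ++k) {
#pragma HLS PIPELINE II=1
                acc += a_buf[i][k] * b_buf[k][j];
            }
            c_buf[i][j] = acc;
        }

    // Write the result tile back to external memory.
    for (int i = 0; i < T; ++i)
        for (int j = 0; j < T; ++j) {
#pragma HLS PIPELINE II=1
            C[i * T + j] = c_buf[i][j];
        }
}
```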