Research Resources
Curated papers, videos, and tutorials on Vision Transformers and FPGA acceleration.
The original Vision Transformer (ViT) paper. Demonstrates that pure transformer architectures can match or exceed CNNs on image classification when trained on large datasets.
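The core move in the ViT paper is treating an image as a sequence of flattened patches fed to a standard transformer. A minimal numpy sketch of the patch-embedding step, assuming ViT-Base defaults (224x224 input, 16x16 patches, 768-d embeddings) and a random projection matrix standing in for the learned one:

```python
import numpy as np

# Assumed sizes: 224x224 RGB image, 16x16 patches, 768-d embeddings (ViT-Base defaults).
H = W = 224; P = 16; C = 3; D = 768
num_patches = (H // P) * (W // P)  # 196 patches

rng = np.random.default_rng(0)
image = rng.standard_normal((H, W, C))

# Split the image into non-overlapping P x P patches and flatten each one.
patches = image.reshape(H // P, P, W // P, P, C).transpose(0, 2, 1, 3, 4)
patches = patches.reshape(num_patches, P * P * C)  # (196, 768)

# Linear projection to the embedding dimension (the learned part; random here).
W_embed = rng.standard_normal((P * P * C, D)) * 0.02
tokens = patches @ W_embed  # one token per patch

print(tokens.shape)  # (196, 768)
```

From here the model prepends a class token, adds position embeddings, and runs a plain transformer encoder.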
Pioneering FPGA framework for binarized neural networks. Key reference for understanding how to map neural network operations to FPGA dataflows.
Shows ViT can be trained efficiently on ImageNet alone using knowledge distillation, making it practical without massive pre-training datasets.
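The distillation objective behind this (DeiT) can be sketched in a few lines. This is an illustrative soft-label KL loss with temperature scaling, not DeiT's exact recipe (DeiT's "hard" variant instead uses the teacher's argmax as an extra label via a distillation token); the logits here are random placeholders:

```python
import numpy as np

def softmax(z, T=1.0):
    # Temperature-scaled, numerically stable softmax.
    z = z / T
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

# Placeholder logits from a CNN teacher and a ViT student on a batch of 4, 10 classes.
rng = np.random.default_rng(0)
teacher_logits = rng.standard_normal((4, 10))
student_logits = rng.standard_normal((4, 10))
T = 3.0  # temperature softens the teacher's distribution

p_t = softmax(teacher_logits, T)
p_s = softmax(student_logits, T)

# KL(teacher || student), scaled by T^2 per Hinton et al.; added to the usual
# cross-entropy against the ground-truth labels during training.
kl_loss = (p_t * (np.log(p_t) - np.log(p_s))).sum(axis=-1).mean() * T * T
print(kl_loss >= 0)
```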
Introduces shifted window attention for linear complexity, enabling ViT to scale to high-resolution images and dense prediction tasks.
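The linear complexity comes from restricting attention to fixed-size local windows and shifting the window grid between layers. A numpy sketch of the window-partition step, assuming Swin-T stage-1 sizes (56x56 feature map, 96 channels, 7x7 windows):

```python
import numpy as np

# Assumed sizes: 56x56 feature map, 96 channels, 7x7 windows (Swin-T stage 1).
H = W = 56; C = 96; M = 7  # M = window size

x = np.zeros((H, W, C))

def window_partition(x, M):
    """Split an (H, W, C) feature map into (num_windows, M*M, C) token groups."""
    H, W, C = x.shape
    x = x.reshape(H // M, M, W // M, M, C).transpose(0, 2, 1, 3, 4)
    return x.reshape(-1, M * M, C)

windows = window_partition(x, M)
print(windows.shape)  # (64, 49, 96): attention runs within each 49-token window

# Cyclically shifting the map by M//2 before partitioning lets the next layer's
# windows straddle the previous layer's boundaries ("shifted window" attention).
shifted = np.roll(x, shift=(-(M // 2), -(M // 2)), axis=(0, 1))
shifted_windows = window_partition(shifted, M)
```

Cost per window is O(M^4 C) regardless of image size, so total attention cost grows linearly with the number of windows rather than quadratically with the token count.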
Essential survey covering DNN hardware acceleration fundamentals: dataflows, memory hierarchies, and hardware architectures for efficient inference.
Comprehensive survey of neural network quantization techniques including post-training quantization, quantization-aware training, and mixed-precision methods.
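As a concrete reference point for the simplest technique in that survey, here is a minimal sketch of symmetric per-tensor post-training quantization to int8 (per-channel scales and asymmetric/affine schemes are common refinements):

```python
import numpy as np

def quantize_symmetric(w, n_bits=8):
    """Symmetric per-tensor post-training quantization to signed integers."""
    qmax = 2 ** (n_bits - 1) - 1           # 127 for int8
    scale = np.abs(w).max() / qmax         # one scale for the whole tensor
    q = np.clip(np.round(w / scale), -qmax, qmax).astype(np.int8)
    return q, scale

rng = np.random.default_rng(0)
w = rng.standard_normal(1000).astype(np.float32)

q, scale = quantize_symmetric(w)
w_hat = q.astype(np.float32) * scale       # dequantize to inspect the error

# With no clipping, the per-element error is bounded by half a quantization step.
print(np.abs(w - w_hat).max() <= scale / 2 + 1e-6)
```

Quantization-aware training and mixed precision, which the survey also covers, trade more training effort for lower error at the same bit width.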
Practical walkthrough of deploying neural networks on Xilinx FPGAs using Vitis AI, covering quantization, compilation, and board deployment.
Official Vitis HLS documentation covering pragmas, directives, dataflow optimization, and best practices for high-performance HLS design.
Recent work mapping transformer-based LLMs to FPGAs with custom sparse computation and memory optimization. Highly relevant to our ViT acceleration work.
MIT course covering efficient deep learning: pruning, quantization, knowledge distillation, and hardware-aware neural architecture search.
Directly relevant: proposes an automated framework for ViT acceleration on FPGAs with mixed-precision quantization and hardware-aware NAS.
Professor Suggested Readings
A CNN accelerator that uses a dataflow pipeline (developed in Professor Betz's group; the latest in a series of related works). This accelerator style will be harder to adapt to a transformer, though.
The latest version of the NPU (also from Professor Betz's group; earlier publications are listed in its references). It is an overlay architecture, meaning a large, highly customized soft processor. This style is more likely to work well for a transformer.
Here's a (fairly advanced) lecture on vision transformers by Song Han at MIT. He has done a lot of interesting work on efficient machine learning.