Capstone 2026 · 4th Year

Research Resources

Curated papers, videos, and tutorials on Vision Transformers and FPGA acceleration.

Paper
An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale
Dosovitskiy et al. · 2021 · ICLR

The original Vision Transformer (ViT) paper. Demonstrates that pure transformer architectures can match or exceed CNNs on image classification when trained on large datasets.

ViT · Transformers · Image Classification
28,000 citations
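As a quick intuition for the ViT input pipeline described above, here is a minimal numpy sketch (names and shapes are illustrative, not taken from the paper's code) of splitting an image into flattened patch tokens:

```python
import numpy as np

def image_to_patches(img, patch=16):
    """Split an (H, W, C) image into flattened (num_patches, patch*patch*C) tokens."""
    h, w, c = img.shape
    assert h % patch == 0 and w % patch == 0
    # Carve the image into a grid of patches, then flatten each patch.
    tokens = (img.reshape(h // patch, patch, w // patch, patch, c)
                 .transpose(0, 2, 1, 3, 4)
                 .reshape(-1, patch * patch * c))
    return tokens

img = np.zeros((224, 224, 3))
tokens = image_to_patches(img)
print(tokens.shape)  # (196, 768): a 14x14 grid of patches, each 16*16*3 = 768 values
```

Each of the 196 tokens is then linearly projected and fed to a standard transformer encoder, which is what makes the architecture hardware-friendly: the whole model reduces to matrix multiplies.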
Paper
FINN: A Framework for Fast, Scalable Binarized Neural Network Inference
Umuroglu et al. · 2017 · FPGA

Pioneering FPGA framework for binarized neural networks. Key reference for understanding how to map neural network operations to FPGA dataflows.

FPGA · BNN · Inference
1,900 citations
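The core trick that makes BNNs map so well to FPGA fabric is replacing multiply-accumulates with XNOR and popcount, which synthesize to cheap LUT logic. A small illustrative sketch (not the framework's actual API):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 64
a = rng.choice([-1, 1], size=n)   # binarized activations
w = rng.choice([-1, 1], size=n)   # binarized weights

# Pack {-1, +1} as bits {0, 1}; XNOR counts positions where the signs match.
a_bits = (a > 0).astype(np.uint8)
w_bits = (w > 0).astype(np.uint8)
matches = int(np.sum(~(a_bits ^ w_bits) & 1))  # popcount of the XNOR result

# matches - mismatches = matches - (n - matches)
dot_xnor = 2 * matches - n

assert dot_xnor == int(np.dot(a, w))  # identical to the +/-1 dot product
```

On hardware the XNOR and popcount operate on packed bit vectors in a single cycle, which is why binarization gives such large area and energy savings over multiplier-based datapaths.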
Paper
Training Data-Efficient Image Transformers & Distillation Through Attention
Touvron et al. · 2021 · ICML

Shows ViT can be trained efficiently on ImageNet alone using knowledge distillation, making it practical without massive pre-training datasets.

DeiT · ViT · Distillation
7,800 citations
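For intuition, here is a generic soft-distillation loss in numpy (Hinton-style temperature-softened KL divergence) — a simplification, not DeiT's exact distillation-token mechanism; names and the temperature value are illustrative:

```python
import numpy as np

def softmax(x, T=1.0):
    z = np.exp((x - x.max()) / T)   # temperature-softened, numerically stable
    return z / z.sum()

def kd_loss(student_logits, teacher_logits, T=3.0):
    """KL(teacher || student) on temperature-softened distributions."""
    p = softmax(teacher_logits, T)
    q = softmax(student_logits, T)
    return float(np.sum(p * (np.log(p) - np.log(q)))) * T * T

teacher       = np.array([4.0, 1.0, -2.0])
student_good  = np.array([3.5, 1.2, -1.8])   # roughly agrees with the teacher
student_bad   = np.array([-2.0, 1.0, 4.0])   # disagrees with the teacher

assert kd_loss(student_good, teacher) < kd_loss(student_bad, teacher)
```

The student minimizes this term alongside the usual cross-entropy on labels, letting a strong CNN teacher substitute for the massive pre-training data ViT otherwise needs.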
Paper
Swin Transformer: Hierarchical Vision Transformer Using Shifted Windows
Liu et al. · 2021 · ICCV

Introduces shifted window attention for linear complexity, enabling ViT to scale to high-resolution images and dense prediction tasks.

Swin · ViT · Object Detection
22,000 citations
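A back-of-envelope cost model (counting query-key score computations) shows why windowed attention is linear in token count; the 7x7 window size follows the Swin paper, everything else here is illustrative:

```python
def global_pairs(n_tokens):
    return n_tokens ** 2                  # full self-attention: quadratic in tokens

def window_pairs(n_tokens, window=7):
    return n_tokens * window * window     # each token attends only within its window

n = 56 * 56                               # tokens in a 56x56 feature map
print(global_pairs(n) // window_pairs(n)) # full attention computes 64x more pairs
```

The shifted-window trick then restores cross-window information flow between successive layers, so the linear cost comes without isolating the windows from each other.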
Survey
Efficient Processing of Deep Neural Networks: A Tutorial and Survey
Sze et al. · 2017 · Proceedings of the IEEE

Essential survey covering DNN hardware acceleration fundamentals: dataflows, memory hierarchies, and hardware architectures for efficient inference.

Hardware · DNN · Survey
4,200 citations
Survey
A Survey of Quantization Methods for Efficient Neural Network Inference
Gholami et al. · 2022 · Low-Power Computer Vision

Comprehensive survey of neural network quantization techniques including post-training quantization, quantization-aware training, and mixed-precision methods.

Quantization · Efficiency · Survey
2,100 citations
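As a concrete example of one scheme the survey covers, here is a minimal post-training quantization sketch: symmetric, per-tensor int8 quantization of a weight tensor. Names are illustrative:

```python
import numpy as np

def quantize_int8(x):
    qmax = 127
    scale = np.abs(x).max() / qmax            # one float scale for the whole tensor
    q = np.clip(np.round(x / scale), -qmax, qmax).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.normal(0.0, 0.5, size=1000).astype(np.float32)
q, scale = quantize_int8(w)

err = np.abs(dequantize(q, scale) - w).max()
assert err <= scale / 2 + 1e-6                # round-trip error within half a step
```

Quantization-aware training and mixed precision, also surveyed, refine this idea: the former simulates the rounding during training, the latter assigns different bit widths per layer — both directly relevant to fitting a ViT into FPGA DSP/BRAM budgets.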
Video

Practical walkthrough of deploying neural networks on Xilinx FPGAs using Vitis AI, covering quantization, compilation, and board deployment.

FPGA · Vitis AI · Xilinx
Tutorial
Vitis HLS User Guide (UG1399)
AMD/Xilinx · 2024 · AMD Documentation

Official Vitis HLS documentation covering pragmas, directives, dataflow optimization, and best practices for high-performance HLS design.

HLS · Vitis · FPGA
Paper

Recent work mapping transformer-based LLMs to FPGAs with custom sparse computation and memory optimization. Highly relevant to our ViT acceleration work.

FPGA · Transformer · LLM
85 citations
Video
MIT 6.5940 TinyML and Efficient Deep Learning Computing
Song Han · 2023 · MIT OpenCourseWare

MIT course covering efficient deep learning: pruning, quantization, knowledge distillation, and hardware-aware neural architecture search.

TinyML · Quantization · Pruning
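Of the techniques the course covers, magnitude pruning is the simplest to sketch: zero out the smallest-magnitude weights until a target sparsity is reached. An illustrative numpy version (function name and threshold choice are ours, not the course's code):

```python
import numpy as np

def magnitude_prune(w, sparsity=0.5):
    """Return a copy of w with the smallest-magnitude `sparsity` fraction zeroed."""
    k = int(sparsity * w.size)
    if k == 0:
        return w.copy()
    threshold = np.sort(np.abs(w).ravel())[k - 1]  # k-th smallest magnitude
    mask = np.abs(w) > threshold
    return w * mask

rng = np.random.default_rng(2)
w = rng.normal(size=(8, 8))
pruned = magnitude_prune(w, sparsity=0.75)
print(np.mean(pruned == 0))  # ~0.75 of the weights are now zero
```

Unstructured sparsity like this is hard for GPUs to exploit but maps naturally to custom FPGA datapaths, which is part of why pruning features in several of the accelerator papers above.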
Paper

Directly relevant: proposes an automated framework for ViT acceleration on FPGAs with mixed-precision quantization and hardware-aware NAS.

ViT · FPGA · Quantization
52 citations

Professor Suggested Readings

Prof. Suggested
H2PIPE
2024

A CNN accelerator built around a dataflow pipeline (developed in Professor Betz's group, the latest in a line of related works). This accelerator style will be harder to adapt to a transformer, though.

CNN · Accelerator
Prof. Suggested

The latest version of the NPU (also from Professor Betz's group; earlier publications appear in its references). It is an overlay architecture, i.e. a large, highly customized soft processor. This style is more likely to work well for a transformer.

NPU · Overlay · Soft Processor
Prof. Suggested

A (fairly advanced) lecture on vision transformers by Song Han at MIT, who has a large body of interesting work on efficient machine learning.

EfficientML · Lecture · MIT