FPGA / RTL

FPGA Neural Network Accelerator

SystemVerilog · Python · UART · INT8 · Basys3 · Artix-7 · Vivado · Live demo

A handwritten digit classifier running directly on an Artix-7 FPGA. Draw a digit below — 784 pixel values stream over UART at 115,200 baud, the hardware runs a quantized INT8 MLP through three layers (784→128→64→10), and the prediction comes back in roughly 78 ms.




BROWSER

A 28×28 canvas captures pixel values 0–127; the digit is centered and scaled MNIST-style before sending

API PROXY

Vercel serverless function forwards the 784 pixel values to the FastAPI bridge

UART 115200

784 raw bytes sent to the Basys3 over serial; 13-byte response returned

INT8 MLP

784→128→64→10 network in SystemVerilog, per-layer right-shift quantization, argmax output
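The browser-side "MNIST-style centered" step can be sketched as a center-of-mass shift on the 28×28 grid. This is an illustrative approximation, not the project's actual canvas code; the function name and the integer-shift approach are assumptions.

```python
def center_by_mass(pixels):
    """Shift a 28x28 grayscale image (row-major list of ints, 0-127)
    so its center of mass lands on the grid center, MNIST-style.
    Hypothetical sketch of the browser-side centering step."""
    total = sum(pixels)
    if total == 0:
        return pixels[:]  # blank canvas: nothing to center
    cy = sum((i // 28) * v for i, v in enumerate(pixels)) / total
    cx = sum((i % 28) * v for i, v in enumerate(pixels)) / total
    dy, dx = round(13.5 - cy), round(13.5 - cx)  # integer shift toward center
    out = [0] * (28 * 28)
    for i, v in enumerate(pixels):
        y, x = i // 28 + dy, i % 28 + dx
        if 0 <= y < 28 and 0 <= x < 28:  # drop pixels shifted off-grid
            out[y * 28 + x] = v
    return out
```

The resulting 784 values are what stream over UART, one byte per pixel.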


Weights are stored in three BRAMs on the Artix-7 (34.5 BRAM36 total), keeping the LUT count low (~4,400 LUTs). Each layer uses a MAC unit with a 24-bit accumulator (Vivado infers DSP slices for the multiplies), followed by a ReLU and a per-layer right shift to keep activations in INT8 range.
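A bit-accurate model of one layer, as the hardware computes it, looks roughly like this. The wrap-to-24-bit step models the accumulator register width; the final saturation to 127 is an assumption about how the design clamps back to INT8.

```python
def int8_layer(x, W, b, shift):
    """One MLP layer as the MAC hardware computes it: INT8 inputs and
    weights, a 24-bit signed accumulator, ReLU, then an arithmetic
    right shift back toward INT8 range. Sketch under stated assumptions."""
    out = []
    for w_row, bias in zip(W, b):
        acc = bias
        for xi, wi in zip(x, w_row):
            acc += xi * wi                         # MAC
        # wrap to a signed 24-bit value, as a 24-bit register would
        acc = ((acc + (1 << 23)) & ((1 << 24) - 1)) - (1 << 23)
        acc = max(acc, 0)                          # ReLU
        acc >>= shift                              # per-layer right shift
        out.append(min(acc, 127))                  # saturate to INT8 max
    return out
```

Chaining three such calls with the 784→128→64→10 weight matrices reproduces the full forward pass.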

The host sends 784 raw bytes over UART at 115,200 baud. The FPGA accumulates them into an input buffer, fires the inference FSM on the final byte, and sends back a 13-byte packet: 0xBB + predicted class (1 byte) + 10 signed logit bytes + 0x66.
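The 13-byte framing described above (0xBB, class byte, ten signed logit bytes, 0x66) can be packed and unpacked with `struct`. Function names are illustrative, not taken from the project.

```python
import struct

START, END = 0xBB, 0x66

def build_response(pred_class, logits):
    """Pack the 13-byte FPGA response: start marker, predicted class,
    10 signed INT8 logits, end marker."""
    assert len(logits) == 10
    return struct.pack(">BB10bB", START, pred_class, *logits, END)

def parse_response(pkt):
    """Validate and unpack a 13-byte response; returns (class, logits)."""
    if len(pkt) != 13 or pkt[0] != START or pkt[-1] != END:
        raise ValueError("bad frame")
    fields = struct.unpack(">BB10bB", pkt)
    return fields[1], list(fields[2:12])
```

The fixed markers make resynchronization after a dropped byte straightforward: the host can discard bytes until it sees 0xBB with 0x66 twelve bytes later.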

The Python bridge on the host deserializes the packet, computes the argmax for verification, and returns JSON to the caller. Round-trip latency including UART is ~78 ms.
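Most of that ~78 ms is serial transfer. A quick budget, assuming standard 8N1 framing (10 bits on the wire per byte):

```python
BAUD = 115_200
BITS_PER_UART_BYTE = 10  # start bit + 8 data bits + stop bit (8N1 assumed)

request_ms = 784 * BITS_PER_UART_BYTE / BAUD * 1000   # ~68 ms for the pixels
response_ms = 13 * BITS_PER_UART_BYTE / BAUD * 1000   # ~1 ms for the reply
serial_ms = request_ms + response_ms                  # ~69 ms on the wire
```

So roughly 69 ms of the round trip is UART, leaving on the order of 9 ms for inference, the bridge, and the HTTP hop.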


Metric | Value
MNIST test accuracy | 10 / 10 spot-checked digits correct
Round-trip latency | ~78 ms (UART + inference + bridge)
BRAM usage | 34.5 BRAM36 (weights)
LUT usage | ~4,400 LUTs
Clock | 100 MHz (Artix-7 XC7A35T)
Weight precision | INT8, 24-bit accumulator