A handwritten digit classifier running directly on an Artix-7 FPGA. Draw a digit below: 784 pixel values stream over UART at 115,200 baud, the hardware runs them through a three-layer quantized INT8 MLP (784→128→64→10), and the prediction comes back in roughly 78 ms.
live demo
how it works
BROWSER
28×28 canvas captures pixel values 0–127; the digit is centered and scaled MNIST-style before sending
API PROXY
Vercel serverless function forwards the 784 pixel values to the FastAPI bridge
UART 115200
784 raw bytes sent to the Basys3 over serial; 13-byte response returned
INT8 MLP
784→128→64→10 network in SystemVerilog, per-layer right-shift quantization, argmax output
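The INT8 MLP stage above can be sketched in integer arithmetic. This is a hedged model, not the SystemVerilog implementation: the weights here are random placeholders and the per-layer shift amount of 7 is an assumed value (the real shifts would be calibrated during quantization). It shows the shape of the computation: integer MAC, ReLU on hidden layers, right-shift requantization, argmax at the end.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical INT8 weights and per-layer right-shift amounts.
# Real values come from the trained, quantized model.
LAYERS = [
    (rng.integers(-128, 128, size=(128, 784), dtype=np.int8), 7),
    (rng.integers(-128, 128, size=(64, 128), dtype=np.int8), 7),
    (rng.integers(-128, 128, size=(10, 64), dtype=np.int8), 7),
]

def infer(pixels):
    """Run the 784->128->64->10 MLP entirely in integer arithmetic."""
    x = pixels.astype(np.int32)               # input activations, 0..127
    for i, (w, shift) in enumerate(LAYERS):
        acc = w.astype(np.int32) @ x          # multiply-accumulate per neuron
        if i < len(LAYERS) - 1:
            acc = np.maximum(acc, 0)          # ReLU on hidden layers only
        x = np.clip(acc >> shift, -128, 127)  # requantize back into INT8 range
    return int(np.argmax(x)), x               # predicted class + raw logits

pred, logits = infer(rng.integers(0, 128, size=784))
```

The final layer skips ReLU so the ten logits returned over UART can be signed, matching the signed logit bytes in the response packet.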
architecture
Weights are stored in three BRAM-backed memories on the Artix-7 (34.5 BRAM36 blocks total), keeping the LUT count low (~4,400). Each layer uses a MAC unit with a 24-bit accumulator, with the multiplies inferred onto DSP slices, followed by a ReLU and a per-layer right shift to stay in INT8 range.
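A quick back-of-envelope check on the BRAM figure, under the assumption that weights are stored one byte each and packed densely into 36 Kbit blocks. The weights alone need at least 24 BRAM36; the reported 34.5 presumably also covers biases, the 784-byte input buffer, and inter-layer activation storage.

```python
import math

# Weight counts for the 784->128->64->10 MLP (INT8, one byte each).
weights = 784 * 128 + 128 * 64 + 64 * 10

# One BRAM36 block on 7-series is 36 Kbit = 4608 bytes.
bytes_per_bram36 = 36 * 1024 // 8

min_brams = math.ceil(weights / bytes_per_bram36)
print(weights, min_brams)  # 109184 weight bytes -> at least 24 BRAM36 blocks
```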
The host sends 784 raw bytes over UART at 115,200 baud. The FPGA accumulates them into an input buffer, fires the inference FSM on the final byte, and sends back a 13-byte packet: 0xBB + predicted class (1 byte) + 10 signed logit bytes + 0x66.
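The 13-byte response framing described above can be sketched as a pair of pure functions. The function names are illustrative, not taken from the project; only the packet layout (0xBB start, class byte, 10 signed logit bytes, 0x66 end) comes from the description.

```python
import struct

START, END = 0xBB, 0x66

def encode_response(pred, logits):
    """Build the 13-byte reply: 0xBB + class + 10 signed logit bytes + 0x66."""
    assert len(logits) == 10
    return bytes([START, pred]) + struct.pack("10b", *logits) + bytes([END])

def decode_response(pkt):
    """Parse and sanity-check a reply packet from the FPGA."""
    if len(pkt) != 13 or pkt[0] != START or pkt[-1] != END:
        raise ValueError("malformed packet")
    logits = list(struct.unpack("10b", pkt[2:12]))  # ten signed INT8 logits
    return pkt[1], logits
```

The start/end markers make it cheap for the bridge to detect a desynchronized serial stream and resynchronize before the next request.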
The Python bridge on the host deserializes the packet, computes the argmax for verification, and returns JSON to the caller. Round-trip latency including UART is ~78 ms.
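The serial transfer alone accounts for most of that ~78 ms. Assuming standard 8N1 framing (10 bits on the wire per byte, an assumption since the framing isn't stated), the arithmetic works out as:

```python
BAUD = 115_200
BITS_PER_BYTE = 10  # 8N1 framing: start bit + 8 data bits + stop bit

request_ms = 784 * BITS_PER_BYTE / BAUD * 1000   # ~68 ms to send the pixels
response_ms = 13 * BITS_PER_BYTE / BAUD * 1000   # ~1.1 ms for the reply

print(f"request {request_ms:.1f} ms, response {response_ms:.1f} ms")
```

So roughly 69 ms of the round trip is UART transfer, leaving only single-digit milliseconds for inference and the bridge overhead.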
results
| Metric | Value |
|---|---|
| Spot-check accuracy | 10/10 sampled MNIST test digits correct |
| Round-trip latency | ~78 ms (UART + inference + bridge) |
| BRAM usage | 34.5 BRAM36 (weights) |
| LUT usage | ~4,400 LUTs |
| Clock | 100 MHz (Artix-7 XC7A35T) |
| Weight precision | INT8, 24-bit accumulator |
links