
Experiments

How we trained both models, from dataset preparation to fine-tuning to evaluation. All training ran on Kaggle's free T4 GPU at $0 cost.

Understanding YOLOv12s

YOLOv12s

Attention-Centric Detection

What it is

YOLOv12 is the latest in the YOLO (You Only Look Once) family of real-time object detectors. The "s" variant has 9.3M parameters: small enough for fast inference, yet large enough for accurate detection. The key innovation is the Area Attention (A²) module, which replaces traditional convolution blocks with attention mechanisms.

Why it's good for signatures

Signatures are thin, elongated stroke patterns on cluttered document backgrounds. The attention mechanism can focus on these fine details better than pure convolution, which tends to average over local patches and miss thin lines.
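To make "attention focuses on fine details" concrete, here is a minimal pure-Python sketch of scaled dot-product attention. This is a toy illustration, not the actual A² module: each query pools values from the positions whose keys it matches, so stroke-like features can reinforce each other across the image instead of being averaged away by a local convolution.

```python
import math

def softmax(xs):
    # Numerically stable softmax: subtract the max before exponentiating.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def attention(query, keys, values):
    """Scaled dot-product attention for a single query vector.

    query: list[float]; keys/values: lists of list[float].
    Returns a weighted sum of `values`, weighted by query-key similarity.
    """
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
              for key in keys]
    weights = softmax(scores)
    return [sum(w * v[i] for w, v in zip(weights, values))
            for i in range(len(values[0]))]

# A query aligned with the first key attends mostly to the first value:
out = attention([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]], [[10.0], [0.0]])
```

A² restricts this computation to image areas to keep it fast, but the core idea is the same: weights are computed per query, not fixed like a convolution kernel.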

YOLOv12s specs:
Parameters: 9.3M
COCO mAP@0.5:0.95: 48.0 (vs YOLOv8s: 44.9)
Inference: 2.61ms on T4 GPU
Architecture: R-ELAN backbone + Area Attention + FlashAttention

Dataset

tech4humans/signature-detection (HuggingFace)

1,980 train / 420 validation / 419 test images

Combined from Tobacco800 (scanned business documents) + Roboflow 100 signatures. Apache 2.0 license.
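YOLO trainers expect labels as normalized center-format text lines. Assuming the dataset's annotations come as pixel-space corner boxes (an assumption; check the dataset card for the actual field layout), a converter might look like:

```python
def to_yolo_line(x_min, y_min, x_max, y_max, img_w, img_h, cls=0):
    """Convert a pixel-space corner box to a YOLO label line.

    YOLO labels are "<class> <x_center> <y_center> <width> <height>",
    all normalized to [0, 1] by the image dimensions.
    """
    xc = (x_min + x_max) / 2 / img_w
    yc = (y_min + y_max) / 2 / img_h
    w = (x_max - x_min) / img_w
    h = (y_max - y_min) / img_h
    return f"{cls} {xc:.6f} {yc:.6f} {w:.6f} {h:.6f}"

# e.g. a 200x50 px signature box at (100, 400) on an 850x1100 scan:
line = to_yolo_line(100, 400, 300, 450, 850, 1100)
```

One such line per detected signature goes into a `.txt` file alongside each image.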

Training Process

Phase 1 — Head warm-up (epochs 1-20):
Freeze: backbone (10 layers)
Train: detection head only
LR: 0.01, Optimizer: AdamW
Phase 2 — Full fine-tuning (epochs 21-100):
Freeze: nothing (all layers trainable)
LR: 0.0001 (100x lower — prevent catastrophic forgetting)
Augmentation: rotation ±10°, shear, brightness ±20%, mosaic, blur
GPU: Kaggle T4 (free), ~3-4 hours total
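The two phases above can be captured as hyperparameter dicts. The keys follow Ultralytics train-settings names (`freeze`, `lr0`, `optimizer`, `degrees`, `shear`, `mosaic`, `hsv_v`); the shear and brightness magnitudes are illustrative assumptions, since the text only names the augmentations:

```python
# Phase 1: head warm-up, epochs 1-20.
PHASE_1 = {
    "epochs": 20,
    "freeze": 10,          # freeze the 10 backbone layers
    "lr0": 0.01,
    "optimizer": "AdamW",
}

# Phase 2: full fine-tuning, epochs 21-100.
PHASE_2 = {
    "epochs": 80,
    "freeze": 0,           # all layers trainable
    "lr0": 0.0001,         # 100x lower to avoid catastrophic forgetting
    "optimizer": "AdamW",
    "degrees": 10.0,       # rotation +/-10 degrees
    "shear": 2.0,          # assumed magnitude
    "hsv_v": 0.2,          # brightness jitter (+/-20%, assumed mapping)
    "mosaic": 1.0,
}

# Each phase would then be run as, e.g.:
# model.train(data="signatures.yaml", **PHASE_1)
```

Splitting warm-up from fine-tuning this way lets the randomly initialized head settle before the pretrained backbone weights are touched.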

Results

mAP@0.5: 0.910
mAP@0.5:0.95: 0.533
Precision: 0.916
Recall: 0.884

Phase 1 → Phase 2 improvement: mAP@0.5 improved from 0.846 to 0.910 (a relative +7.6%), and precision jumped from 0.812 to 0.916 (a relative +12.8%). Trained on RunPod RTX 4090 in ~20 minutes total, cost ~$0.15.
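The percentages above are relative changes, not absolute point gains; a quick sketch confirms the arithmetic:

```python
def rel_change(before, after):
    """Relative change between two metric values, in percent."""
    return (after - before) / before * 100

map_gain = rel_change(0.846, 0.910)        # mAP@0.5: +6.4 points absolute
precision_gain = rel_change(0.812, 0.916)  # precision: +10.4 points absolute
# map_gain ~= 7.6, precision_gain ~= 12.8
```

Worth keeping in mind when comparing runs: a +7.6% relative gain here corresponds to only +6.4 absolute mAP points.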