Evaluation
How we measure detection and verification quality — and where the system fails.
Understanding the Metrics
Detection and verification use different metrics because they solve different problems.
mAP (Mean Average Precision)
Detection
What it measures
mAP combines precision (are the detected boxes actually signatures?) and recall (did we find all signatures?) across different IoU thresholds. mAP@0.5 uses a 50% overlap threshold; mAP@0.5:0.95 averages across 50%-95% thresholds for a stricter evaluation.
Scale
1.0 = perfect detection. > 0.85 is strong for document signatures.
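The IoU threshold behind mAP is easy to make concrete. The sketch below is a generic illustration (not code from this project): a predicted box covering the top half of a ground-truth box scores IoU = 0.5, so it counts as a hit at mAP@0.5 but fails every stricter threshold in mAP@0.5:0.95 — which is exactly why fuzzy signature edges hurt the stricter metric.

```python
def iou(box_a, box_b):
    """Intersection over Union for two boxes given as (x1, y1, x2, y2)."""
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

# Prediction covers only the top half of the ground truth:
# IoU = 50 / (100 + 50 - 50) = 0.5 -> a hit at IoU 0.5, a miss at 0.55+.
print(iou((0, 0, 10, 10), (0, 0, 10, 5)))  # 0.5
```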
EER (Equal Error Rate)
Verification
What it measures
EER is the point where False Accept Rate (FAR) equals False Reject Rate (FRR). FAR = forgers get through. FRR = genuine signers get rejected. Lower EER = better. At the EER threshold, the system makes equal mistakes in both directions.
Scale
0% = perfect. < 8% is strong for offline signature verification.
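Computing EER from similarity scores is a short threshold sweep. This numpy sketch is an illustration under simple assumptions (scores in [0, 1], higher = more likely genuine), not the project's evaluation code:

```python
import numpy as np

def far_frr(genuine_scores, forged_scores, threshold):
    far = float(np.mean(forged_scores >= threshold))  # forgers accepted
    frr = float(np.mean(genuine_scores < threshold))  # genuine signers rejected
    return far, frr

def equal_error_rate(genuine_scores, forged_scores):
    """Sweep thresholds, find where FAR and FRR are closest, report their mean."""
    thresholds = np.linspace(0.0, 1.0, 1001)
    rates = [far_frr(genuine_scores, forged_scores, t) for t in thresholds]
    far, frr = min(rates, key=lambda r: abs(r[0] - r[1]))
    return (far + frr) / 2
```

On toy scores `genuine = [0.9, 0.8, 0.7, 0.3]`, `forged = [0.1, 0.2, 0.4, 0.6]`, the curves cross near 0.45 where one of four pairs is wrong in each direction, giving EER = 25%.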
Detection Results
YOLOv12s fine-tuned on 2,819 document images (two-phase: 20 epochs backbone frozen + 80 epochs full fine-tune). Evaluated on 419 held-out test images.
3 of 4 targets exceeded. mAP@0.5 (0.91), Precision (0.92), and Recall (0.88) all surpass targets. mAP@0.5:0.95 (0.53) is below the 0.60 target — the strict IoU thresholds are hard for signature bounding boxes with fuzzy edges.
Verification Results
SigNet encoder + binary classifier, fine-tuned on CEDAR (signers 1-45, 1,080 genuine + 1,080 forged). Evaluated on signers 46-55 (8,520 pairs: 2,760 genuine, 5,760 forged).
FAR / FRR at Different Thresholds
| Threshold | FAR | FRR | Tradeoff |
|---|---|---|---|
| 0.3 | 25.3% | 14.5% | Lenient — more forgers pass |
| 0.5 | 23.1% | 16.7% | Default |
| 0.7 | 21.0% | 19.3% | Near EER |
| 0.8 | 20.1% | 20.8% | EER point |
Clear separation. Genuine pairs score 0.83 on average, forged pairs score 0.24 — a 0.59 gap. The model reliably distinguishes between authentic and forged signatures.
EER 20.4% is above the 8% target. This is a cross-dataset result — SigNet was pretrained on GPDS (different dataset), evaluated on CEDAR. The baseline EER without fine-tuning was 33.8%, so our training improved it by 40% relative. With same-dataset pretraining or more training data, EER < 10% is achievable.
Production Considerations
Minimum DPI matters
Detection quality degrades below 72 DPI. Phone camera photos work well (> 200 DPI equivalent). Low-quality fax scans may miss signatures entirely.
New signers need at least one reference
The Siamese network needs a reference signature to compare against. For new users, an enrollment step is required. More references improve accuracy (can average embeddings).
Thai vs English performance gap
The model is pretrained on Western signatures (SigNet). Thai signatures have more complex strokes and different writing patterns. Fine-tuning on TSNCRV2018 helps, but English signatures will likely perform better than Thai.
What This PoC Demonstrates
This project is a proof of concept — designed to demonstrate end-to-end ML skills, not to be a production-ready system. Here's what it proves and what it doesn't.
What it proves:
- Can fine-tune object detection (YOLOv12s) on custom data
- Can build Siamese networks for metric learning
- Understands two-phase training and catastrophic forgetting
- Can iterate: triplet → contrastive → BCE (documented honestly)
- Full deployment: training → API → website → Cloud Run
- Total cost: ~$5 GPU + $0 inference (CPU Cloud Run)
What it doesn't:
- EER 20% is above production threshold (<5%)
- Trained on CEDAR only (55 English signers)
- Cross-dataset gap: SigNet pretrained on GPDS, evaluated on CEDAR
- No Thai signature support yet
- Detection trained on scanned docs — may struggle with photos
- Single reference per signer (no enrollment averaging)
Real-World Use Cases
Where signature verification is deployed in production today.
Banks verify customer signatures on checks, loan applications, and account opening forms. Automated verification reduces manual review from minutes to seconds. Production systems achieve <3% EER with proprietary datasets of 10K+ signers.
Law firms and notaries verify signatures on contracts, wills, and power of attorney documents. The system flags suspicious signatures for human review rather than making final decisions.
Government agencies verify signatures on permit applications and tax documents. Insurance companies detect forged signatures on claims — a major source of fraud.
Thai Market Applications
Thai banks process millions of paper-based transactions annually — especially in rural areas where digital signatures haven't fully replaced handwritten ones. Signature verification on withdrawal slips, loan guarantor forms, and check clearing is still largely manual. An automated system could process 10x more documents with the same staff.
The Department of Business Development (DBD) processes company registration documents requiring director signatures. Land offices verify signatures on title deed transfers (“chanote”). These high-value transactions are prime targets for forgery — automated pre-screening could flag suspicious documents before they reach a human reviewer.
Insurance claim fraud is a significant cost in the Thai market. Forged signatures on claim forms, beneficiary changes, and policy cancellations cost the industry billions of baht annually. Automated verification at the intake stage catches forged claims before payout processing.
Thai signatures present unique challenges compared to Western signatures:
- Many Thai people sign with their name in Thai script — more complex stroke patterns than Latin
- Some use a mix of Thai initials + Latin-style flourishes
- Older documents may have signatures degraded by humidity (Thai climate)
- Government forms often use blue ink on colored paper — harder to binarize
- The TSNCRV2018 dataset (included in our training data) specifically addresses Thai signatures
Path to State-of-the-Art
How to go from 20% EER to <5% EER for production deployment.
More signers, more samples
Our biggest limitation: 55 CEDAR signers for training. Production systems use 500-5,000+ signers. More signers = better generalization to unseen writers.
Modern backbone + attention
SigNet is a 2017 AlexNet-style architecture. Modern approaches use Vision Transformers (ViT) or EfficientNet backbones with attention mechanisms for better feature extraction.
Curriculum learning + hard mining
Start with easy pairs (random forgeries), gradually increase difficulty to skilled forgeries. Online hard negative mining selects the most informative training samples each batch. Combined with ArcFace or CosFace margin-based loss for tighter embedding clusters.
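Online hard negative mining can be sketched in a few lines: within a batch, each sample's hardest negative is the nearest embedding with a different signer label. This numpy version is a minimal illustration of the selection step only (the margin loss and curriculum schedule are separate), not this project's training code:

```python
import numpy as np

def hardest_negatives(embeddings, labels):
    """For each sample, return the index of the closest sample with a
    different label -- the hardest negative in the batch."""
    diff = embeddings[:, None, :] - embeddings[None, :, :]
    dist = (diff ** 2).sum(axis=-1)             # pairwise squared distances
    same = labels[:, None] == labels[None, :]   # mask out same-signer pairs
    return np.where(same, np.inf, dist).argmin(axis=1)
```

These indices would then form the negative side of triplet or margin-based (ArcFace/CosFace-style) losses, so each batch trains on its most confusable pairs.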
Multi-reference averaging
Instead of comparing against one reference, enroll 3-5 reference signatures per person. Average their embeddings to create a more robust “template.” This reduces natural variation noise and can improve EER by 20-30% relative.
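Template averaging is straightforward once each reference has been embedded. A minimal sketch (illustrative, assuming unit-normalised embeddings and cosine similarity; the 0.5 threshold is a placeholder, not a tuned value):

```python
import numpy as np

def enroll(reference_embeddings):
    """Average several reference embeddings into one template, re-normalised
    to unit length so cosine similarity stays in [-1, 1]."""
    template = np.mean(reference_embeddings, axis=0)
    return template / np.linalg.norm(template)

def verify(query_embedding, template, threshold=0.5):
    q = query_embedding / np.linalg.norm(query_embedding)
    score = float(q @ template)  # cosine similarity
    return score >= threshold, score
```

Averaging cancels out per-sample quirks (a rushed flourish, a slightly different pen), so the template sits closer to the signer's "true" signature than any single reference does.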
Fine-Tuning for Your Own Data
How to adapt this pipeline to your organization's signatures.
Collect signature samples
Per signer: collect 10-20 genuine signature samples on different days (captures natural variation). For forgery training: have 5+ different people attempt to copy each signer's signature.
Organize into CEDAR-like structure
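A layout like the one below mirrors the public CEDAR release (genuine and forged signatures in sibling folders, files named by signer and sample index). The exact folder and file names here are illustrative; match whatever the training scripts expect:

```text
your_dataset/
├── full_org/                 # genuine signatures
│   ├── original_1_1.png      # signer 1, sample 1
│   ├── original_1_2.png
│   └── ...
└── full_forg/                # skilled forgeries
    ├── forgeries_1_1.png     # forgery of signer 1, attempt 1
    └── ...
```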
Fine-tune from our pretrained weights
Start from our best_siamese.pth (already understands signature patterns) and fine-tune on your custom data. This is much faster than training from scratch.
Expected accuracy by data size
| Training data | Expected EER | GPU time |
|---|---|---|
| 55 signers (our PoC) | ~20% | ~30 min |
| 200 signers (custom) | ~10-12% | ~1 hr |
| 500+ signers (enterprise) | ~5-8% | ~3 hrs |
| 1000+ signers + ViT backbone | <3% | ~8 hrs |