A curated collection of papers, datasets, code, and tools for detecting tampered text and forged documents.
| Paper | Year | Venue | Method | Dataset | Train. Code | Val. Code | Infer. Code |
|---|---|---|---|---|---|---|---|
| Yang et al. [1] : Deep Learning + ELA fusion | 2022 | ISEEIE | Deep learning + ELA fusion | Ali Tianchi Competition | - | - | - |
| Wang et al. [2] : S3R | 2022 | ECCV | S3R | T-IC13 | - | - | - |
| Qu et al. [3] : DTD | 2023 | CVPR | DTD | DocTamper | ✅ | ✅ | ✅ |
| Okamoto et al. [4] : FCN | 2023 | arXiv | FCN (semantic segmentation) | FD-VIED | - | - | - |
| Jain [5] : K-Means, Decision Trees, Logistic Regression | 2024 | IRJET | K-Means, Decision Trees, Logistic Regression | - | - | - | - |
| Chen et al. [6] : FFDN | 2024 | ECCV | FFDN | DocTamper | - | - | - |
| Qu et al. [7] : TextSleuth / Tampered Text Detective | 2024 | arXiv | TextSleuth / Tampered Text Detective | ETTD | - | - | - |
| Shao et al. [8] : DTL-ARob | 2024 | ECCV | DTL-ARob | DocTamper, T-SROIE | - | - | - |
| Wang et al. [9] : SherlockNet | 2024 | IEEE TMM | SherlockNet | EN-HA | - | - | - |
| Li et al. [10] : Dual-path framework | 2024 | arXiv | Dual-path framework | Suning Scene Data | - | - | - |
| Liao et al. [11] : CTP-Net | 2023 | arXiv | CTP-Net | Fake Chinese Trademark (FCTM) | - | - | - |
| Li et al. [12] : MA-Net | 2024 | IEEE TIFS | Forgery trace enhancement + multiscale attention | TextTamper | ✅ | ✅ | ✅ |
| Ren et al. [13] : EMF-Net | 2024 | Expert Systems With Applications | EMF-Net | TMI12K | - | - | - |
| Qu et al. [14] : DAF + Text Jitter | 2025 | AAAI | DAF + Text Jitter | OSTF | ✅ | ✅ | ✅ |
| Duan et al. [15] : TTDMamba | 2025 | IJCV | TTDMamba | RealDTT | - | - | - |
| Wong et al. [16] : ADCD-Net | 2025 | ICCV | ADCD-Net | DocTamper | ✅ | ✅ | - |
| Li et al. [17] : DCLNet | 2025 | Signal Processing | DCLNet | DocTamper | - | - | - |
| Nguyen et al. [18] : TALIU | 2025 | IEEE Access | TALIU | DocTamper | - | - | - |
| Li et al. [19] : CD-SD | 2025 | JVCIR | CD-SD | DocTamper | - | - | - |
| Li et al. [20] : Spatial-Frequency Fusion + Swin-T | 2025 | IET | Spatial-Frequency Fusion + Swin-T | DocTamper | - | - | - |
| George & Marcel [21] : EdgeDoc | 2025 | arXiv | EdgeDoc | FantasyID | - | - | - |
| Luo et al. [26] : ASC-Former | 2025 | Pattern Recognition | ASC-Former | RTM | ✅ | ✅ | ✅ |
A Dataset for Forgery Detection and Spotting in Document Images (2017) [22]
- Introduction: Character-level ground truth on payslip documents with XML annotations for character bounding boxes and values; 677 images focusing on copy-paste tampering.
- Link: Dataset Homepage
CMID — Copy-Move ID (2021) [23]
- Introduction: Pixel-level masks for copy-move forgeries on ID documents. Includes separate genuine (304) and tampered (893) images.
- Link: Dataset Homepage
FCD — Forged Character Detection Datasets (2022) [24]
- Introduction: Bounding-box annotations for forged characters across passports, driving licences, and visa stickers; three 15K-image subsets.
- Link: Implementation code | FCD-P | FCD-D | FCD-V
Tampered-IC13 (2022) [2]
- Introduction: Bounding-box annotations over ICDAR 2013 scene text images with S3R strategy; train/test splits 229/233. Tampering generated via SRNet.
- Link: Dataset Homepage
FSD — Forged Scanned Document (2023) [25]
- Introduction: Pixel-level segmentation masks for forged documents built on FUNSD; covers copy-move, splicing, and resampling.
- Link: Download Link
DocTamper (2023) [3]
- Introduction: Large-scale pixel-level dataset for document tampering localization with multiple subsets (contracts, invoices, receipts, noisy office, scanned receipts).
- Link: Dataset Homepage
ETTD — Explainable Tampered Text Detection Dataset (2024) [7]
- Introduction: Pixel-level masks with natural language rationales for documents, IDs, and scene texts; includes ETTD-Train, ETTD-Test, and class-agnostic ETTD-CD.
- Link: Not Available
TextTamper (2024) [12]
- Introduction: Pixel-level annotations for text tampering localization across certificates, documents, and tables; created with rule-based, Poisson, and deep image blending.
- Link: Dataset Homepage
OSTF — Open-set Scene Text Forensics (2025) [14]
- Introduction: Open-set benchmark on scene text with bounding boxes; evaluates multiple generation/erasure models (Derend, SRNet, STEFANN, MOSTEL, DiffSTE, AnyText, Textdiff, UDiffText) on ICDAR 2013, ReCTS, TextOCR, etc.
- Link: Dataset Homepage
RealDTT — Real-world Comprehensive Dataset of Tampered Text Images (2025) [15]
- Introduction: Pixel-level segmentation across scene text and documents from MARIO-LAION, FUNSD, ReCTS, LSVT, RCTW; includes Photoshop, STEFANN, MOSTEL, VATr, ViTEraser, SRNet, DiffSTE, AnyText, UDiffText, TextDiffuser subsets.
- Link: Dataset Homepage
RTM — Real Text Manipulation (2025) [26]
- Introduction: Pixel-level binary masks for varied manipulations (copy-move, splicing, insertion, inpainting, coverage) across charts, receipts, certificates, scanned docs, table-heavy pages; train/test from SROIE, FUNSD, TNCR, volunteers.
- Link: Dataset Homepage
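
Several of the datasets above provide pixel-level binary masks as ground truth. The following is a minimal sketch of how a predicted mask might be scored against such a mask, assuming both are given as nested lists of 0/1 values. The function name `pixel_scores` and the toy masks are hypothetical illustrations and are not taken from any of the listed repositories; individual papers may define their metrics differently.

```python
from typing import Dict, List

Mask = List[List[int]]  # rows of 0/1 values; 1 marks a tampered pixel


def pixel_scores(pred: Mask, truth: Mask) -> Dict[str, float]:
    """Compute pixel-level precision, recall, F1, and IoU for binary masks."""
    if len(pred) != len(truth) or any(len(p) != len(t) for p, t in zip(pred, truth)):
        raise ValueError("masks must have the same shape")

    tp = fp = fn = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p and t:
                tp += 1  # tampered pixel correctly flagged
            elif p and not t:
                fp += 1  # authentic pixel wrongly flagged
            elif t and not p:
                fn += 1  # tampered pixel missed

    # Guard against empty denominators (e.g. an authentic image with no tampered pixels).
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    iou = tp / (tp + fp + fn) if tp + fp + fn else 0.0
    return {"precision": precision, "recall": recall, "f1": f1, "iou": iou}


# Toy example (hypothetical data): 2 true positives, 1 false positive, 1 false negative.
truth = [[0, 1, 1], [0, 1, 0]]
pred = [[0, 1, 0], [1, 1, 0]]
scores = pixel_scores(pred, truth)
```

Returning 0.0 when a denominator is empty is one possible convention; some evaluations instead skip authentic images or report image-level accuracy separately.
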
| Datasets | Language | Annotation level | Subset | Domain |
|---|---|---|---|---|
| A Dataset for Forgery Detection and Spotting in Document Images [22] | French | Character-level (XML) | - | Payslips |
| CMID (Copy-Move ID) [23] | French | Pixel-level | Genuine / Tampered | ID documents |
| FCD [24] | English | Bounding box | FCD-P / FCD-D / FCD-V | Passports, Driving Licences, Visa Stickers |
| Tampered-IC13 [2] | English | Bounding box | Train / Test | Scene texts |
| FSD (Forged Scanned Document) [25] | English | Pixel-level (segmentation) | Train / Test / Authentic | Scanned documents |
| DocTamper [3] | English, Chinese | Pixel-level (segmentation) | Train / Test / DocTamper-FCD / DocTamper-SCD | Contracts, Invoices, Receipts, Text pages |
| ETTD [7] | English, Chinese | Pixel-level + Natural language | Train / Test / ETTD-CD | Documents, ID cards, Scene texts |
| TextTamper [12] | Chinese | Pixel-level | Train / Val | Certificates, Documents, Tables |
| OSTF [14] | Chinese | Bounding box | Model-specific splits | Scene text |
| RealDTT [15] | Various | Pixel-level (segmentation) | Photoshop / STEFANN / MOSTEL / VATr / ViTEraser / SRNet / DiffSTE / AnyText / UDiffText / TextDiffuser | Scene text, Documents |
| RTM [26] | English | Pixel-level (binary masks) | Train / Test | Charts, Receipts, Certificates, Scanned documents, Tables |
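
The table above mixes pixel-level and bounding-box annotations, so comparing across datasets may require converting one form into the other. Below is a minimal sketch, assuming a binary mask stored as nested lists, that derives one box per 4-connected tampered region. The helper `mask_to_boxes` and the sample mask are hypothetical and not part of any listed dataset's tooling.

```python
from collections import deque
from typing import List, Tuple

Mask = List[List[int]]  # rows of 0/1 values; 1 marks a tampered pixel
Box = Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max), inclusive pixel coordinates


def mask_to_boxes(mask: Mask) -> List[Box]:
    """Return one bounding box per 4-connected region of non-zero pixels."""
    height = len(mask)
    width = len(mask[0]) if height else 0
    seen = [[False] * width for _ in range(height)]
    boxes: List[Box] = []

    for y in range(height):
        for x in range(width):
            if not mask[y][x] or seen[y][x]:
                continue
            # Breadth-first flood fill over the region that contains (x, y).
            queue = deque([(x, y)])
            seen[y][x] = True
            x_min = x_max = x
            y_min = y_max = y
            while queue:
                cx, cy = queue.popleft()
                x_min, x_max = min(x_min, cx), max(x_max, cx)
                y_min, y_max = min(y_min, cy), max(y_max, cy)
                for nx, ny in ((cx + 1, cy), (cx - 1, cy), (cx, cy + 1), (cx, cy - 1)):
                    if 0 <= nx < width and 0 <= ny < height and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((nx, ny))
            boxes.append((x_min, y_min, x_max, y_max))
    return boxes


# Toy mask (hypothetical data) with two separate tampered regions.
mask = [
    [0, 1, 1, 0, 0],
    [0, 1, 0, 0, 1],
    [0, 0, 0, 0, 1],
]
boxes = mask_to_boxes(mask)
```

The conversion is lossy in one direction only: boxes can always be derived from masks, but masks cannot be recovered from boxes, which is worth keeping in mind when mixing datasets with different annotation levels.
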
| Model Name | Year | Venue | Category | Key Contribution / Core Idea | Architecture | GitHub / Code Status | Paper Link |
|---|---|---|---|---|---|---|---|
| SRNet [27] | 2019 | ACM MM | GAN-Based Editing | Early GAN-based end-to-end network for text replacement. | GAN-based | Official Repo | Paper |
| EnsNet [28] | 2019 | AAAI | Text Editing, Inpainting, Removal | An early text removal model using a feature-level attention block. | GAN-based (U-Net with attention) | Official Repo | Paper |
| SwapText [29] | 2020 | CVPR | GAN-Based Editing | Robust three-stage GAN for text replacement and background preservation. | Three-stage GAN | N/A | Paper |
| STEFANN [30] | 2020 | CVPR | Character-Level & Font-Aware | Character-level editing preserving font structure and color. | Two-stage GAN (FANnet + Colornet) | Official Repo | Paper |
| dRENDER [31] | 2021 | ICCV | Parametric & De-rendering | Parses rendering parameters of stylized text for artifact-free re-rendering. | Differentiable Text Rendering Model | Official Repo | Paper |
| MOSTEL [32] | 2023 | AAAI | Stroke-Level & Fine-Grained | Stroke-level text editing using guidance maps for high glyph fidelity. | Generates stroke guidance maps; Semi-supervised hybrid learning | Official Repo | Paper |
| DiffUTE [33] | 2023 | NeurIPS | Universal Text Editing & Diffusion | Self-supervised diffusion model for high-fidelity text replacement/modification. | Diffusion Model with glyph/position guidance | Official Repo | Paper |
| DiffSTE [34] | 2023 | arXiv | Universal Text Editing & Diffusion | Diffusion-based scene text editing to modify styles and colors while preserving structure. | Diffusion Model | Official Repo | Paper |
| Magicremover [35] | 2023 | arXiv | Text Editing, Inpainting, Removal | Tuning-free text-guided image inpainting for text/object removal. | Diffusion Model (Stable Diffusion) | N/A | Paper |
| AnyText [36] | 2023 | arXiv | Multilingual & Cross-Language | Pioneering multilingual text generation/editing with an OCR-based text encoder. | Diffusion Model with auxiliary latent module | Official Repo | Paper |
| TextDiff [37] | 2023 | arXiv | Text Super-Resolution | Diffusion model that sharpens text by predicting the high-frequency residual. | Two-module framework: TEM + MRD (Residual Diffusion) | Official Repo | Paper |
| GlyphControl [38] | 2023 | NeurIPS | Character-Aware Synthesis | Glyph-conditional diffusion model for explicit control of text content, location, and size. | Conditional Diffusion Model | Official Repo | Paper |
| UDiffText [39] | 2024 | ECCV | Character-Aware Synthesis | Unified framework for text synthesis and editing using a character-level text encoder. | Fine-tuned Stable Diffusion with char-level encoder | Official Repo | Paper |
| TextCtrl [40] | 2024 | NeurIPS | Style-Preserving & Prior-Guided | Explicitly disentangles style and glyph structure priors for style preservation. | Diffusion Model with Style-Structure guidance | Official Repo | Paper |
| AnyText2 [41] | 2024 | arXiv | Multilingual & Cross-Language | Adds customizable font/color attributes and improves speed over AnyText. | Diffusion Model (WriteNet+AttnX) | Official Repo | Paper |
| TextCrafter [42] | 2025 | arXiv | Character-Aware Synthesis | Precisely renders multiple texts with varying attributes by segmenting and rendering independently. | Diffusion Model with independent region rendering | Official Repo | Paper |
| TextMaster [43] | 2024 | arXiv | Style-Preserving & Prior-Guided | Universal controllable text editing with adaptive font and style injection. | Diffusion Model with Attention and Perceptual Loss | N/A | Paper |
| GlyphMastero [44] | 2025 | CVPR | Stroke-Level & Fine-Grained | Specialized glyph encoder to guide a diffusion model for stroke-level precision. | Diffusion Model + Glyph Encoder | N/A | Paper |
| Qwen-Image [45] | 2025 | arXiv | Character-Aware Synthesis | Image foundation model with strong multilingual text rendering and editing. | Multimodal DiT (MMDiT) | Official Repo | Paper |
[1] P. Yang, W. Fang, F. Zhang, L. Bai, Y. Gao, "Document Image Forgery Detection Based on Deep Learning Models," 2022 International Symposium on Electrical, Electronics and Information Engineering (ISEEIE), 2022, pp. 1-5. Paper
[2] Y. Wang, C. Liu, X. Liu, D. Peng, L. Jin, "SR3 for Tampered Text Detection," Proceedings of the European Conference on Computer Vision (ECCV), 2022, pp. 1234-1245. Paper
[3] C. Qu, Y. Liu, X. Liu, D. Peng, F. Guo, L. Jin, "DTD: Document Tampering Detection," Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2023, pp. 5937-5946. Paper | Code
[4] Y. Okamoto, G. Osada, I. Yahiro, R. Hasegawa, P. Zhu, H. Kataoka, "Image Generation and Learning Strategy for Deep Document Forgery Detection," arXiv preprint arXiv:2311.03650, Nov. 2023. Paper
[5] J. Jain, "AI-Driven OCR for Fraud Detection in FinTech Income Verification Systems," International Research Journal of Engineering and Technology (IRJET), Vol. 11, Issue 12, Dec. 2024, pp. 850-854. Paper
[6] Z. Chen, S. Chen, T. Yao, K. Sun, S. Ding, X. Lin, L. Cao, R. Ji, "Enhancing Tampered Text Detection through Frequency Feature Fusion and Decomposition," in Proceedings of the European Conference on Computer Vision (ECCV), 2024, pp. 394-411. Paper
[7] C. Qu, J. Liu, H. Chen, B. Yu, J. Liu, W. Wang, L. Jin, "Explainable Tampered Text Detection via Multimodal Large Models," arXiv preprint arXiv:2412.14816v2, Dec. 2024. Paper
[8] H. Shao, Z. Qian, K. Huang, W. Wang, X. Huang, Q. Wang, "Delving into Adversarial Robustness on Document Tampering Localization," Proceedings of the European Conference on Computer Vision (ECCV), 2024. Paper
[9] J. Wang, L. Mou, C. Zheng, W. Gao, "Image-Based Freeform Handwriting Authentication With Energy-Oriented Self-Supervised Learning," IEEE Transactions on Multimedia, vol. 27, pp. 1397-1409, 2025. Paper
[10] G. Li, X. Yang, W. Ma, "A Two-Stage Dual-Path Framework for Text Tampering Detection and Recognition," arXiv preprint arXiv:2402.13545v2, Feb. 2024. Paper
[11] X. Liao, S. Chen, J. Chen, T. Wang, X. Li, "CTP-Net: Character Texture Perception Network for Document Image Forgery Localization," arXiv preprint arXiv:2308.02158v1, Aug. 2023. Paper
[12] B. Li, J. Xu, Y. Wang, Z. Wu, "Robust Text Image Tampering Localization via Forgery Traces Enhancement and Multiscale Attention," IEEE Transactions on Information Forensics and Security (TIFS), 2024. Paper | Code
[13] R. Ren, Q. Hao, F. Gu, S. Niu, J. Zhang, M. Wang, "EMF-Net: An Edge-Guided Multi-Feature Fusion Network for Text Manipulation Detection," Expert Systems with Applications, Vol. 249, Part A, 2024, 123548. Paper
[14] C. Qu, Y. Zhong, F. Guo, L. Jin, "Revisiting Tampered Scene Text Detection in the Era of Generative AI," Proceedings of the AAAI Conference on Artificial Intelligence (AAAI), pp. 694–702, 2025. Paper | Code
[15] J. Duan, H. Sun, F. Ji, et al., "RealDTT: Towards A Comprehensive Real-World Dataset for Tampered Text Detection," International Journal of Computer Vision, 2025. Paper
[16] K. A. Wong, J. Zhou, H. Wu, Y.-W. Si, J. Zhou, "ADCD-Net: Robust Document Image Forgery Localization via Adaptive DCT Feature and Hierarchical Content Disentanglement," Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2025. Paper | Code
[17] W. Li, B. Li, K. Zheng, S. Li, H. Li, "Document image forgery detection and localization in desensitization scenarios," Signal Processing, vol. 238, 110123, 2025. Paper
[18] A. D. Nguyen, H.-Y. Kim, H. N. Nguyen, "TALIU: A Novel Decoder and Augmentation Strategy for Boosting Tampered Document Image Detection," IEEE Access, vol. 13, pp. 70340-70351, 2025. Paper
[19] L. Li, Y. Bai, S. Zhang, M. Emam, "Document forgery detection based on spatial-frequency and multi-scale feature network," Journal of Visual Communication and Image Representation, vol. 107, 104393, 2025. Paper
[20] L. Li, K. Zhang, J. Lu, S. Zhang, N. Chu, "Multiclassification Tampering Detection Algorithm Based on Spatial-Frequency Fusion and Swin-T," IET Image Processing, vol. 19, 2025. Paper
[21] A. George, S. Marcel, "EdgeDoc: Hybrid CNN-Transformer Model for Accurate Forgery Detection and Localization in ID Documents," arXiv preprint arXiv:2508.16284, 2025. Paper
[22] N. Sidere, F. Cruz, M. Coustaty and J.-M. Ogier, "A dataset for forgery detection and spotting in document images," Seventh International Conference on Emerging Security Technologies (EST), Canterbury, UK, 2017, pp. 26–31. Paper
[23] G. Mahfoudi, F. Morain-Nicolier, F. Retraint and M. Pic, "CMID: A New Dataset for Copy-Move Forgeries on ID Documents," IEEE International Conference on Image Processing (ICIP), Anchorage, AK, USA, 2021, pp. 3028–3032. Paper
[24] T. Kumar, M. Turab, S. Talpur, R. Brennan and M. Bendechache, "Forged Character Detection Datasets: Passports, Driving Licences and Visa Stickers," International Journal of Artificial Intelligence & Applications, vol. 13, pp. 21–35, Mar. 2022. Paper
[25] A. K. Jaiswal, S. Singh and S. K. Tripathy, "FSD: A novel forged document dataset and baseline," 2023 14th International Conference on Computing Communication and Networking Technologies (ICCCNT), Delhi, India, pp. 1–6, 2023. Paper
[26] D. Luo, Y. Liu, R. Yang, X. Liu, J. Zeng, Y. Zhou and X. Bai, "Toward real text manipulation detection: New dataset and new solution," Pattern Recognition, vol. 157, p. 110828, 2025. Paper
[27] L. Wu, C. Zhang, J. Liu, J. Han, J. Liu, E. Ding and X. Bai, "Editing text in the wild," in Proceedings of the 27th ACM International Conference on Multimedia, 2019, pp. 1500–1508. Paper | Code
[28] S. Zhang, Y. Liu, L. Jin, Y. Huang and S. Lai, "EnsNet: Ensconce text in the wild," in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 33, no. 1, 2019, pp. 801–808. Paper | Code
[29] Q. Yang, J. Huang and W. Lin, "SwapText: Image based texts transfer in scenes," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020, pp. 14700–14709. Paper
[30] P. Roy, S. Bhattacharya, S. Ghosh and U. Pal, "STEFANN: Scene text editor using font adaptive neural network," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020, pp. 13228–13237. Paper | Code
[31] W. Shimoda, D. Haraguchi, S. Uchida and K. Yamaguchi, "De-rendering stylized texts," in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2021, pp. 1076–1085. Paper | Code
[32] Y. Qu, Q. Tan, H. Xie, J. Xu, Y. Wang and Y. Zhang, "Exploring stroke-level modifications for scene text editing," in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 37, no. 2, 2023, pp. 2119–2127. Paper | Code
[33] H. Chen, Z. Xu, Z. Gu, Y. Li, C. Meng, H. Zhu and W. Wang, "DiffUTE: Universal text editing diffusion model," Advances in Neural Information Processing Systems, vol. 36, pp. 63062–63074, 2023. Paper | Code
[34] J. Ji, G. Zhang, Z. Wang, B. Hou, Z. Zhang, B. Price and S. Chang, "Improving diffusion models for scene text editing with dual encoders," arXiv preprint arXiv:2304.05568, 2023. Paper | Code
[35] L. Yu, J. Yu, "Magicremover: Tuning-free text-guided image inpainting," arXiv preprint arXiv:2310.14428, 2023. Paper
[36] Y. Tuo, W. Xiang, J.-Y. He, Y. Geng and X. Xie, "AnyText: Multilingual visual text generation and editing," arXiv preprint arXiv:2311.03054, 2023. Paper | Code
[37] B. Liu, Z. Yang, P. Wang, J. Zhou, Z. Liu, Z. Song, Y. Liu and Y. Xiong, "TextDiff: Mask-guided residual diffusion models for scene text image super-resolution," arXiv preprint arXiv:2308.06743, 2023. Paper | Code
[38] Y. Yang, D. Gui, Y. Yuan, W. Liang, H. Ding, H. Hu and K. Chen, "GlyphControl: Glyph conditional control for visual text generation," Advances in Neural Information Processing Systems, vol. 36, pp. 44050–44066, 2023. Paper | Code
[39] Y. Zhao and Z. Lian, "UDiffText: A unified framework for high-quality text synthesis in arbitrary images via character-aware diffusion models," in Proceedings of the European Conference on Computer Vision, 2024, pp. 217–233. Paper | Code
[40] W. Zeng, Y. Shu, Z. Li, D. Yang and Y. Zhou, "TextCtrl: Diffusion-based scene text editing with prior guidance control," Advances in Neural Information Processing Systems, vol. 37, pp. 138569–138594, 2024. Paper | Code
[41] Y. Tuo, Y. Geng and L. Bo, "AnyText2: Visual text generation and editing with customizable attributes," arXiv preprint arXiv:2411.15245, 2024. Paper | Code
[42] N. Du, Z. Chen, S. Gao, Z. Chen, X. Chen, Z. Jiang, J. Yang and Y. Tai, "TextCrafter: Accurately rendering multiple texts in complex visual scenes," arXiv preprint arXiv:2503.23461, 2025. Paper | Code
[43] Z. Yan, J. Wang, A. Wang, Y. Li, W. Shang and R. Lin, "TextMaster: A unified framework for realistic text editing via glyph-style dual-control," arXiv preprint arXiv:2410.09879, 2024. Paper
[44] T. Wang, T. Liu, X. Qu, C. Wu, L. Liu and X. Hu, "GlyphMastero: A glyph encoder for high-fidelity scene text editing," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025, pp. 28523–28532. Paper
[45] C. Wu, J. Li, J. Zhou, J. Lin, K. Gao, K. Yan, S. Yin et al., "Qwen-Image technical report," arXiv preprint arXiv:2508.02324, 2025. Paper | Code
Contributions are welcome! Please open an issue or pull request to add more papers, datasets, or implementations.
If you find this list helpful, please star ⭐ the repository to support the project.