Unveiled: A 256M-Parameter Multimodal OCR Tool for Instantly Extracting Document Information

Mar 20, 2025 #Tools

AI-generated summary

SmolDocling is a lightweight, versatile OCR model with only 256M parameters, allowing it to run on standard CPUs and low-end GPUs without requiring high-end resources. It processes documents quickly, taking just 0.35 seconds per page, making it suitable for batch processing. Key features include:
- Comprehensive document OCR capabilities, recognizing titles, text, lists, tables, charts, code, and formulas.
- Diverse element recognition, including layout, code, formulas, charts, and graphics.
- Flexible output formats, supporting exports to Markdown, HTML, and JSON.
- Batch processing support for handling multiple documents simultaneously.

SmolDocling is ideal for tasks like academic paper analysis, contract review, data extraction, and knowledge base construction. It offers a demo on HuggingFace for users to experience its features firsthand. If you're looking for a fast and efficient OCR tool, SmolDocling is highly recommended.

SmolDocling: Lightweight All-in-One Document OCR Model

Mainstream OCR systems today typically rely on large models with 1B+ parameters. I recently came across SmolDocling, a lightweight all-in-one document OCR model with only 256M parameters.
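To put the parameter counts in perspective, here is a rough back-of-the-envelope sketch of how much memory the weights alone would take. It assumes the usual bytes-per-parameter figures for common numeric types and ignores activations and runtime overhead, so treat the numbers as a lower bound rather than a measurement of SmolDocling itself. The function name is made up for illustration.

```python
# Rough weight-memory estimate: parameters x bytes per parameter.
# Covers weights only; activations and runtime overhead are ignored.
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "int8": 1}


def weight_memory_gb(num_params: int, dtype: str = "float16") -> float:
    """Approximate size of the model weights in gigabytes (10^9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


if __name__ == "__main__":
    models = [("256M model", 256_000_000), ("1B model", 1_000_000_000)]
    for name, params in models:
        for dtype in BYTES_PER_PARAM:
            size = weight_memory_gb(params, dtype)
            print(f"{name:10s} {dtype:8s} ~{size:.2f} GB")
```

At 16-bit precision, 256M parameters come to roughly 0.5 GB of weights, versus about 2 GB for a 1B-parameter model, which is consistent with the claim that a model of this size can fit on modest hardware.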

Features of SmolDocling OCR Model

Lightweight and Fast
- Only 256M parameters, so it can run on a CPU or low-end GPU without high-end computing resources.
- Fast OCR: about 0.35 seconds per page, which makes it suitable for batch processing.
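As a quick sanity check on what 0.35 seconds per page means for a batch job, the sketch below estimates total processing time. It takes the per-page figure quoted in this post at face value (actual speed will depend on your hardware) and assumes pages are split evenly across workers; the function name and the `workers` parameter are illustrative, not part of any SmolDocling API.

```python
import math

SECONDS_PER_PAGE = 0.35  # figure quoted in this post; real speed depends on hardware


def estimate_batch_seconds(num_pages: int, workers: int = 1,
                           seconds_per_page: float = SECONDS_PER_PAGE) -> float:
    """Estimate wall-clock time, assuming pages split evenly across workers."""
    if num_pages < 0 or workers < 1:
        raise ValueError("num_pages must be >= 0 and workers must be >= 1")
    # The slowest worker handles ceil(num_pages / workers) pages.
    pages_per_worker = math.ceil(num_pages / workers)
    return pages_per_worker * seconds_per_page


if __name__ == "__main__":
    for pages in (10, 1_000, 100_000):
        minutes = estimate_batch_seconds(pages) / 60
        print(f"{pages:>7} pages on 1 worker: ~{minutes:.1f} min")
```

Under these assumptions, a 1,000-page batch on a single worker comes to roughly 350 seconds, a little under six minutes.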
Core Capabilities
1. Full Document OCR Parsing
  - Intelligent recognition of titles, body text, lists, tables, charts, code, formulas, and more.
  - Suitable for various document types including academic papers, business documents, patents, reports, handwritten documents, etc.
2. Diverse Element Recognition
  - Layout recognition, code recognition, formula recognition, chart and table recognition, graphic classification, etc.
3. Flexible Output Formats
  - Supports export to various formats including Markdown, HTML, JSON, etc.
4. Batch Processing Support
  - Can process multiple documents at once, suitable for large-scale data conversion.
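The batch-and-export workflow described above might be wired together roughly as follows. This is only a sketch of the surrounding plumbing: `parse_document` is a hypothetical stand-in that returns dummy data, not SmolDocling's actual API, and the exporters are deliberately minimal. In practice you would replace the stub with a real call to the model and use the exporters that ship with your OCR toolkit.

```python
import html
import json
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def parse_document(path: Path) -> dict:
    """HYPOTHETICAL stand-in for the OCR step; not SmolDocling's API.

    Returns a tiny structured result so the rest of the pipeline can run.
    """
    return {
        "source": path.name,
        "elements": [
            {"type": "title", "text": path.stem},
            {"type": "text", "text": "(recognized text would go here)"},
        ],
    }


def to_markdown(doc: dict) -> str:
    """Render titles as level-1 headings and everything else as paragraphs."""
    lines = []
    for el in doc["elements"]:
        prefix = "# " if el["type"] == "title" else ""
        lines.append(prefix + el["text"])
    return "\n\n".join(lines) + "\n"


def to_html(doc: dict) -> str:
    """Render titles as <h1> and everything else as <p>, escaping the text."""
    parts = []
    for el in doc["elements"]:
        tag = "h1" if el["type"] == "title" else "p"
        parts.append(f"<{tag}>{html.escape(el['text'])}</{tag}>")
    return "\n".join(parts) + "\n"


def to_json(doc: dict) -> str:
    return json.dumps(doc, ensure_ascii=False, indent=2)


EXPORTERS = {"md": to_markdown, "html": to_html, "json": to_json}


def convert_batch(paths, out_dir, fmt="md", max_workers=4):
    """Parse every input in a thread pool, then write one output file each."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    export = EXPORTERS[fmt]  # raises KeyError for an unsupported format
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order in its results.
        docs = list(pool.map(parse_document, [Path(p) for p in paths]))
    written = []
    for doc in docs:
        target = out_dir / f"{Path(doc['source']).stem}.{fmt}"
        target.write_text(export(doc), encoding="utf-8")
        written.append(target)
    return written
```

Calling `convert_batch(["paper.pdf", "contract.png"], "out", fmt="json")` would write `out/paper.json` and `out/contract.json`. A thread pool is used here only to show the batch shape; whether threads, processes, or batched GPU inference works best depends on how the real model is served.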

Quick Start

There are two ways to try the latest SmolDocling: the online demo below, or the model itself (see the model link at the end of this post).

Online Demo: The official SmolDocling-256M-preview demo is hosted on HuggingFace, so you can try its features directly.
- Demo Link

SmolDocling is a lightweight, fast multimodal OCR model that parses complete documents, including tables, code, formulas, and charts, and is more accurate and efficient than traditional OCR. It takes only about 0.35 seconds per page and can export to several formats, which makes it a good fit for tasks such as paper parsing, contract analysis, data extraction, and knowledge base construction.

If you are looking for a fast and efficient OCR tool, SmolDocling is definitely worth a try!

Model Link: SmolDocling-256M-preview