andrewji8


MinerU - The magic tool that converts PDF into machine-readable format

MinerU


MinerU is an open-source tool that converts PDF documents into machine-readable formats such as Markdown and JSON. Its main features are listed below.

Main Features

  • Remove Redundant Elements: Automatically removes headers, footers, footnotes, and page numbers so that the extracted content stays semantically coherent, while keeping the charts that belong to the body.

  • Multi-Element Extraction: Extracts images, image descriptions, and tables, together with their titles and footnotes, to keep the extracted information complete and accurate.

  • Formula Recognition: Automatically recognizes mathematical formulas, including very long ones, and converts them to LaTeX.

  • Table Recognition: Recognizes tables and converts them to HTML, which can be displayed directly on web pages.

  • Preserve Document Structure: Keeps the original document structure (headings, paragraphs, and lists) when extracting text, so that the output follows the natural human reading order.

  • OCR Support: Automatically detects scanned PDFs and PDFs with garbled text and processes them with OCR, which handles documents in up to 84 languages.

  • Multi-Format Output: Offers several output formats, including Markdown and JSON, so users can pick the one that fits their needs.

  • Multi-Platform Support: Runs on Windows, Linux, and macOS, and can use the CPU alone or GPU and NPU acceleration to speed up conversion.
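
To make the table and formula features above more concrete, here is a minimal Python sketch of how such output might be consumed downstream. It is an illustration under stated assumptions, not MinerU's own API: it assumes the Markdown output embeds tables as `<table>` HTML and wraps display formulas in `$$` delimiters, and the names `SAMPLE`, `TableRows`, `extract_tables`, and `extract_formulas` are hypothetical. Nothing in it calls MinerU itself; it only uses the Python standard library.

```python
import re
from html.parser import HTMLParser

# Hypothetical stand-in for a converted Markdown file (assumed layout:
# display formulas between $$ delimiters, tables as inline HTML).
SAMPLE = """# Results

$$
E = mc^2
$$

<table><tr><th>Model</th><th>Score</th></tr><tr><td>A</td><td>0.91</td></tr></table>
"""


class TableRows(HTMLParser):
    """Collect the cell text of each <table> as a list of rows.

    Nested tables are not handled; this is only a sketch.
    """

    def __init__(self):
        super().__init__()
        self.tables = []   # one list of rows per table seen so far
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables.append([])
        elif tag == "tr":
            self._row = []
        elif tag in ("td", "th"):
            self._cell = []

    def handle_data(self, data):
        # Text outside a table cell (headings, formulas, prose) is ignored.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            if self._row is not None:
                self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self.tables:
                self.tables[-1].append(self._row)
            self._row = None


def extract_tables(markdown):
    """Return every HTML table in the text as a list of rows of cell strings."""
    parser = TableRows()
    parser.feed(markdown)
    parser.close()
    return parser.tables


# Non-greedy match so that several $$...$$ blocks are kept separate.
DISPLAY_MATH = re.compile(r"\$\$(.+?)\$\$", re.DOTALL)


def extract_formulas(markdown):
    """Return the LaTeX source of every $$...$$ block, whitespace-trimmed."""
    return [body.strip() for body in DISPLAY_MATH.findall(markdown)]


if __name__ == "__main__":
    print(extract_tables(SAMPLE))
    print(extract_formulas(SAMPLE))
```

Keeping tables as HTML and formulas as LaTeX means ordinary parsers can recover them from the converted text, as the sketch does with `html.parser` and a regular expression. If the real output uses different delimiters or markup, only `DISPLAY_MATH` and the tag names in `TableRows` would need to change.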

Summary

In summary, MinerU suits users who frequently work with PDF documents: it extracts their content while preserving the document structure, which makes that work more efficient.

Reference Links

[1] MinerU: https://github.com/opendatalab/MinerU

[2] OpenDataLab Demo: https://mineru.net/OpenSourceTools/Extractor?source=github

[3] ModelScope Demo: https://www.modelscope.cn/studios/OpenDataLab/MinerU

[4] HuggingFace Demo: https://huggingface.co/spaces/opendatalab/MinerU
