Surya is a powerful open-source OCR tool that has just been updated with table recognition. It recognizes the rows, columns, and cells of a table, handles rotated tables and complex layouts, and supports over 90 languages. According to the project's benchmarks, Surya outperforms Table Transformer, the current SoTA open-source model, on table recognition. It has over 10,000 stars on GitHub, is free and open-source, and can be used commercially subject to the license terms in the repository.
Core Features
- Table Recognition: The new version of Surya identifies the rows, columns, and cells of a table and recognizes the text inside each cell. This is a real help for anyone who handles large amounts of tabular data.
- Complex Layout Recognition: Beyond tables, Surya recognizes layout elements such as titles and images, and it copes with rotated tables, so it can extract content from documents with complicated layouts.
- Support for Over 90 Languages: Surya supports OCR for over 90 languages, including Chinese, Japanese, Korean, and Arabic. This makes it suitable for international business documents as well as content conversion in localization projects.
- Text Detection and Reading Order: In addition to tables, Surya performs line-level text detection and determines the reading order of the text, so the extracted content is output in the correct sequence.
- Local Operation and API Support: Surya runs locally, which lets developers process sensitive documents offline or handle documents at large scale. It also provides an API, so developers can integrate it into their applications and automate batch processing.
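To make the table recognition feature more concrete, here is a minimal sketch of what a downstream step might look like once rows, columns, and cell text have been recognized. It does not call Surya itself: the `Cell` record and the helpers `cells_to_grid` and `grid_to_csv` are hypothetical names invented for this illustration, and the real output format is documented in the repository.

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class Cell:
    """Hypothetical cell record (not Surya's actual output type)."""
    row: int   # zero-based row index
    col: int   # zero-based column index
    text: str  # recognized text inside the cell


def cells_to_grid(cells: list[Cell]) -> list[list[str]]:
    """Arrange cell records into a rectangular grid, padding gaps with ''."""
    if not cells:
        return []
    n_rows = max(c.row for c in cells) + 1
    n_cols = max(c.col for c in cells) + 1
    grid = [["" for _ in range(n_cols)] for _ in range(n_rows)]
    for c in cells:
        grid[c.row][c.col] = c.text
    return grid


def grid_to_csv(grid: list[list[str]]) -> str:
    """Serialize the grid as CSV text with '\n' line endings."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(grid)
    return buffer.getvalue()


if __name__ == "__main__":
    # Made-up sample cells; the last row is missing its second column.
    sample = [
        Cell(0, 0, "Item"), Cell(0, 1, "Qty"),
        Cell(1, 0, "Pen"), Cell(1, 1, "3"),
        Cell(2, 0, "Ink"),
    ]
    print(grid_to_csv(cells_to_grid(sample)))
```

Padding missing cells with empty strings keeps every row the same length, which is what most spreadsheet and CSV consumers expect.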
GitHub Address: https://github.com/VikParuchuri/surya