Nexdata-AI/500000-Images-Natural-Scenes-and-Documents-OCR-Data

500000-Images-Natural-Scenes-and-Documents-OCR-Data

Description

The dataset consists of 500,000 images for multi-country natural scene and document OCR, covering 20 languages including Traditional Chinese, Japanese, Korean, Indonesian, Malay, Thai, Vietnamese, and Polish. It spans a variety of natural scenes and multiple shooting angles, and can be used for multilingual OCR tasks.

For more details, please refer to the link: https://www.nexdata.ai/datasets/speechrecog/1759?source=Github

Specifications

Data size

500,000 images in total. Each of the 20 languages has 25,000 images: 12,500 natural scene images and 12,500 document images.

Language distribution

Traditional Chinese, Japanese, Korean, Indonesian, Malay, Thai, Vietnamese, French, German, Italian, Portuguese, Russian, Spanish, Arabic, Turkish, Polish, Dutch, Greek, Czech, Filipino (Tagalog)
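
As a quick sanity check on the figures under Data size, the following minimal Python sketch recomputes the totals from the language list and the per-language counts stated above. The constant and function names (`LANGUAGES`, `expected_counts`) are illustrative only and are not part of the dataset or any Nexdata tooling.

```python
# Languages exactly as listed in the specification above.
LANGUAGES = [
    "Traditional Chinese", "Japanese", "Korean", "Indonesian", "Malay",
    "Thai", "Vietnamese", "French", "German", "Italian",
    "Portuguese", "Russian", "Spanish", "Arabic", "Turkish",
    "Polish", "Dutch", "Greek", "Czech", "Filipino (Tagalog)",
]

# Per-language counts stated under "Data size".
NATURAL_SCENE_PER_LANGUAGE = 12_500
DOCUMENT_PER_LANGUAGE = 12_500


def expected_counts():
    """Return the image counts implied by the specification (hypothetical helper)."""
    per_language = NATURAL_SCENE_PER_LANGUAGE + DOCUMENT_PER_LANGUAGE
    n_languages = len(LANGUAGES)
    return {
        "languages": n_languages,
        "per_language": per_language,
        "natural_scene_total": NATURAL_SCENE_PER_LANGUAGE * n_languages,
        "document_total": DOCUMENT_PER_LANGUAGE * n_languages,
        "total": per_language * n_languages,
    }


if __name__ == "__main__":
    for name, value in expected_counts().items():
        print(f"{name}: {value:,}")
```

The same dictionary could serve as a reference when checking that a delivered copy of the data is balanced across languages and across the two image types.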

Collecting environment

Natural scenes: slogans, receipts, posters, warning signs, road signs, food packaging, billboards, station signs, signboards, etc.

Documents: electronic documents, meeting minutes, reports, manuals, user manuals, books, newspapers, teaching materials, etc.

Data diversity

A variety of natural scenes and multiple shooting angles

Device

Cellphone, scanner

Photographic angle

Looking up, looking down, eye level

Accuracy rate

Measured against the collection requirements, the collection accuracy is no lower than 97%.

Licensing Information

Commercial License
