When people hear about OCR labeling, most of them think it is a kind of image annotation. When my colleague was chatting with me two days ago, he said that OCR is a type of image annotation. I asked him why, and he said it was because boxes are drawn on the images. But OCR labeling actually belongs to text annotation. Do you know why? Follow along and I will answer these questions one by one.
1. What is OCR
What is OCR? OCR stands for Optical Character Recognition. It uses optical and computer technology to read text printed or written on paper and convert it into a format that computers can process and people can read. For example, suppose you want to copy a paragraph from a magazine: you cannot copy and paste it, and typing it in yourself is time-consuming. With OCR, you can convert it into editable text.
2. Application scenarios
At present, OCR applications can be divided into four scenarios:
1. Photographed forms and documents
This data is often private. OCR can save it as electronic files, although the current technology still has difficulties with it. For example, during the epidemic, students took classes online and teachers assigned a lot of homework, so many students wrote their homework on paper and sent photos to the teacher for grading. Going through these photos one by one on a computer is time-consuming and troublesome for the teacher. If the photos can be converted directly into text with OCR, that solves a big problem: the homework can be processed in batches and the results returned directly.
2. Digital-native data
This type of data is the most complex and diverse: it varies in fonts, backgrounds, layouts, combinations, and so on. The most representative example is product images on Taobao, which carry product information. The volume of these images is large, and they are updated more often than any other type.
3. Tickets and bills
This type of data covers many everyday scenarios, such as storing tickets, invoices, takeaway orders, and various other kinds of bills.
4. Natural scenes
This is currently the most widely used, most mature, and most commercially valuable scenario. Examples include ID document recognition, bank card recognition, license plate recognition, surveillance camera monitoring, and courier tracking-number recognition.
3. What can data labelers do for OCR?
1. What are the current difficulties of OCR?
1) Irregular content, poor image clarity, background interference, and similar issues.
2) Recognition of traditional (non-simplified) characters, similar-looking characters, rare characters, complex formula symbols, etc.
3) Localization problems: characters that touch each other, unclear line spacing, regions that are hard to mark, and a wide range of character heights.
4) Handwriting is currently the main difficulty, because everyone has their own writing habits and style. Although we can read our own handwriting, machines rarely can.
5) By content, recognition is currently divided into three categories: Chinese characters, English, and Arabic numerals. Digit recognition is the simplest. English recognition only has to handle 26 letters (52 counting both uppercase and lowercase). Chinese is different: there are about 3,700 commonly used characters and 2,278 similar-looking characters, plus traditional and simplified forms, and the recognizer has to cover the whole character set. This is currently the biggest problem.
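To make the gap in item 5 concrete, the short sketch below counts the output classes a character recognizer would have to choose between. The digit and Latin sets come from Python's standard `string` module, and the 3,700 figure is the one quoted above. The size of the basic CJK Unified Ideographs Unicode block (U+4E00 to U+9FFF) is included only to illustrate how large a full Chinese character set gets; it is an assumption for illustration, not the set any particular OCR product uses.

```python
import string

# Classes a recognizer must distinguish, per script.
digit_classes = set(string.digits)          # '0'..'9'
latin_classes = set(string.ascii_letters)   # 'a'..'z' and 'A'..'Z'
common_hanzi = 3700                         # figure quoted in this article

# Basic CJK Unified Ideographs block: code points U+4E00..U+9FFF inclusive.
cjk_basic = [chr(cp) for cp in range(0x4E00, 0x9FFF + 1)]

print(len(digit_classes), len(latin_classes), common_hanzi, len(cjk_basic))
# 10 52 3700 20992
```

Ten or 52 classes is a small classification problem; thousands of visually similar classes is a much harder one, which is why Chinese needs far more labeled data.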
2. OCR recognition process
Layout analysis -> preprocessing -> row and column segmentation -> character recognition -> post-processing (correcting recognition errors)
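The two middle stages of this process, preprocessing and row/column segmentation, can be illustrated with projection profiles, a classic technique for cleanly printed text. This is a minimal sketch, not how any particular OCR engine works. It assumes a grayscale image given as a list of pixel rows (0 = black, 255 = white), dark text on a light background, no skew, and a fixed binarization threshold of 128. Layout analysis, character recognition, and post-processing are left out, and the function names are hypothetical.

```python
from typing import List, Tuple

Image = List[List[int]]      # grayscale rows, 0 = black, 255 = white
Span = Tuple[int, int]       # half-open index range [start, end)


def binarize(img: Image, threshold: int = 128) -> List[List[int]]:
    """Preprocessing: map each pixel to 1 (ink) or 0 (background)."""
    return [[1 if px < threshold else 0 for px in row] for row in img]


def runs(profile: List[int]) -> List[Span]:
    """Return the ranges where the projection profile is non-zero."""
    spans, start = [], None
    for i, value in enumerate(profile):
        if value and start is None:
            start = i
        elif not value and start is not None:
            spans.append((start, i))
            start = None
    if start is not None:
        spans.append((start, len(profile)))
    return spans


def cut_rows(binary: List[List[int]]) -> List[Span]:
    """Row cutting: the horizontal projection counts ink pixels per row."""
    return runs([sum(row) for row in binary])


def cut_columns(binary: List[List[int]], top: int, bottom: int) -> List[Span]:
    """Column cutting inside one text row: ink pixels per column."""
    width = len(binary[0])
    return runs([sum(binary[y][x] for y in range(top, bottom)) for x in range(width)])


def segment(img: Image, threshold: int = 128) -> List[Tuple[int, int, int, int]]:
    """Return character boxes (x0, y0, x1, y1), half-open, row by row.

    Each box spans the full height of its text row, not the tight glyph height.
    """
    binary = binarize(img, threshold)
    boxes = []
    for top, bottom in cut_rows(binary):
        for left, right in cut_columns(binary, top, bottom):
            boxes.append((left, top, right, bottom))
    return boxes


W, B = 255, 0  # white background, black ink
img = [
    [W, W, W, W, W],
    [B, W, B, B, W],
    [W, W, W, W, W],
    [W, B, W, W, W],
]
print(segment(img))  # [(0, 1, 1, 2), (2, 1, 4, 2), (1, 3, 2, 4)]
```

Each resulting box would then be passed to the character recognition stage. Projection profiles fail on touching characters and uneven line spacing, which are exactly the localization problems listed in item 3 above.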
3. What can we do?
From the above, we can see that although many products already use this technology, many technical difficulties remain. Overcoming them requires training machine learning models, and that training needs a large amount of data. Producing that data is our job: collecting, cleaning, and labeling it are all things we can do.
1) Collection: for example, photographing handwriting, billboards, student homework, and various printed materials.
2) Cleaning: removing invalid and noisy data, quick classification, etc.
3) Labeling: drawing boxes, assigning labels, and transcribing text.
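As an illustration of the cleaning step, here is a minimal sketch that drops two easy kinds of invalid data: empty or undersized images, and exact duplicates (detected by hashing the file bytes with SHA-256). The `ImageRecord` type and the 32-pixel minimum side are hypothetical choices for this example; real projects define their own validity rules, and removing noisy data (blur, glare) or near-duplicates takes more than this.

```python
import hashlib
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class ImageRecord:
    """Hypothetical record for one collected image."""
    name: str
    width: int
    height: int
    data: bytes


def clean(records: Iterable[ImageRecord], min_side: int = 32) -> List[ImageRecord]:
    """Keep non-empty images at least min_side pixels on each side, first copy only."""
    seen = set()
    kept = []
    for rec in records:
        if not rec.data or min(rec.width, rec.height) < min_side:
            continue  # invalid: empty file or too small to read
        digest = hashlib.sha256(rec.data).hexdigest()
        if digest in seen:
            continue  # exact duplicate of an image already kept
        seen.add(digest)
        kept.append(rec)
    return kept


records = [
    ImageRecord("a.jpg", 640, 480, b"page-1"),
    ImageRecord("b.jpg", 640, 480, b"page-1"),   # same bytes as a.jpg
    ImageRecord("c.jpg", 20, 20, b"tiny"),       # too small
    ImageRecord("d.jpg", 800, 600, b""),         # empty file
]
kept = clean(records)
print([r.name for r in kept])  # ['a.jpg']
```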
4. Time cost of OCR labeling
How many boxes can an annotator draw in an hour?
4. Labeling rules
Each OCR labeling project has its own rules, but they generally cover the following:
1) Attribute labeling: classify the whole image as valid or invalid data, etc., or label attributes of regions within it.
2) Box requirements: depending on the task, regions are marked with rectangles, polygons, etc.
3) Accuracy requirements: because text usually spans only about 80 to 400 pixels, boxes should fit the text as tightly as possible without cutting into the characters. Exact requirements may differ slightly between projects.
4) Content transcription: depending on the project, the content may be Chinese, English, Arabic numerals, etc., and it generally must be transcribed exactly as it appears.
5) Sequential labeling: many OCR systems read text in sequence, because text is usually continuous and depends on its context. It is therefore best to label regions in the order in which the content is read.
6) Submission format: most labeling is now done online, where we only need to label, save, and submit, but some projects still label offline for data-security reasons. Submit in the format the project requires, such as JSON or TXT.
7) Reminder: save your work frequently while labeling, whether online or offline; otherwise unsaved work can be lost and all that effort wasted. Another important point: get to know the labeling tool and its shortcuts well before you start. As the saying goes, sharpening the axe does not delay the cutting of firewood.
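Rules 1, 2, 4, 5 and 6 can be tied together in one small sketch: a record for a labeled text region (a rectangle or polygon given as corner points, an attribute flag, and the transcription), a helper that sorts regions into reading order, and a JSON export. The field names, the `valid` flag, the `page_001.jpg` file name, and the 10-pixel line tolerance are all hypothetical; every project defines its own schema and ordering rules, so treat this as an outline rather than a required format.

```python
import json
from dataclasses import asdict, dataclass
from typing import List, Tuple

Point = Tuple[int, int]


@dataclass
class TextRegion:
    points: List[Point]   # 4 corners for a rectangle, more for a polygon
    transcription: str    # text exactly as it appears in the image
    valid: bool = True    # attribute: False for illegible or invalid regions

    def top_left(self) -> Point:
        """Top-left corner of the region's axis-aligned bounding box."""
        return (min(x for x, _ in self.points), min(y for _, y in self.points))


def reading_order(regions: List[TextRegion], line_tolerance: int = 10) -> List[TextRegion]:
    """Sort regions top-to-bottom, then left-to-right within a line.

    A region starts a new line when its top edge lies more than
    line_tolerance pixels below the top of the current line.
    """
    lines: List[List[TextRegion]] = []
    for region in sorted(regions, key=lambda r: r.top_left()[1]):
        y = region.top_left()[1]
        if lines and y - lines[-1][0].top_left()[1] <= line_tolerance:
            lines[-1].append(region)
        else:
            lines.append([region])
    return [r for line in lines for r in sorted(line, key=lambda r: r.top_left()[0])]


def to_json(image_name: str, regions: List[TextRegion]) -> str:
    """Serialize one image's labels in reading order (hypothetical schema)."""
    payload = {
        "image": image_name,
        "regions": [asdict(r) for r in reading_order(regions)],
    }
    return json.dumps(payload, ensure_ascii=False, indent=2)


regions = [
    TextRegion([(120, 12), (200, 12), (200, 40), (120, 40)], "World"),
    TextRegion([(10, 60), (90, 60), (90, 88), (10, 88)], "line two"),
    TextRegion([(10, 10), (100, 10), (100, 40), (10, 40)], "Hello"),
]
print([r.transcription for r in reading_order(regions)])
# ['Hello', 'World', 'line two']
print(to_json("page_001.jpg", regions))
```

Sorting by the top edge with a tolerance is a simple heuristic; it breaks down on skewed or multi-column pages, which is one reason projects spell out their own ordering rules.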
That's all for today's sharing. I hope this article has given you a basic understanding of OCR. If you want to learn more about data labeling, please let me know.
This article is reproduced from the CSDN artificial intelligence technology community to share more information. If the source attribution is incorrect or infringes your legal rights, please contact us and we will correct or delete it promptly. Thank you.