Python OCR with OpenCV – Optical Character Recognition
We continue our series on Python and OpenCV operations. In today's article, I will cover how to install and use Tesseract, the library we will use for OCR operations, on a Linux distribution.
In the next article, we will import Tesseract into our Python application and perform OCR operations from our own code.
What is OCR?
OCR (Optical Character Recognition) is the process of detecting and recognizing the characters in an image or in handwriting and converting them to text.
Tesseract Library
Tesseract is one of the most widely used OCR libraries. It was developed by Hewlett-Packard in the 1980s and released as open source in 2005.
Tesseract supports English by default. Language packs are currently available for more than 100 languages.
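To see which language packs are actually present on a machine, Tesseract provides the --list-langs option (this requires Tesseract to be installed, which the next section covers). The following is a minimal sketch rather than part of the original setup: the function names parse_langs and installed_langs are hypothetical, and the code assumes the tesseract binary is on the PATH. On Debian-based systems such as Linux Mint, extra languages are usually shipped as separate packages named tesseract-ocr-<code>, for example tesseract-ocr-tur for Turkish.

```python
import shutil
import subprocess


def parse_langs(text):
    """Extract language codes from the text printed by `tesseract --list-langs`."""
    langs = []
    for line in text.splitlines():
        line = line.strip()
        # Language codes contain no spaces; the header line
        # ("List of available languages (N):") and any warnings do.
        if not line or " " in line:
            continue
        langs.append(line)
    return langs


def installed_langs():
    """Return installed language codes, or [] if tesseract is not on the PATH."""
    if shutil.which("tesseract") is None:
        return []
    result = subprocess.run(
        ["tesseract", "--list-langs"],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )
    # Assumption: depending on the Tesseract version, the list may be
    # printed to stdout or to stderr, so both streams are inspected.
    return parse_langs(result.stdout + "\n" + result.stderr)
```

Calling installed_langs() on a fresh installation would typically return a short list containing eng; the exact contents depend on which packages are installed.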
Tesseract Install (Linux Mint)
sudo apt-get install tesseract-ocr
Check the version after the installation:
tesseract -v
Output:
tesseract 3.04.01
 leptonica-1.73
  libgif 5.1.2 : libjpeg 8d (libjpeg-turbo 1.4.2) : libpng 1.2.54 : libtiff 4.0.6 : zlib 1.2.8 : libwebp 0.4.4 : libopenjp2 2.1.0
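The same version check can be done from Python, which is handy when a script needs to confirm that Tesseract is present before attempting OCR. This is only a sketch under stated assumptions: parse_version and installed_version are hypothetical helper names, and the code assumes the first line of the output has the form "tesseract X.Y.Z" as in the listing above.

```python
import re
import shutil
import subprocess


def parse_version(text):
    """Return the version in `tesseract -v` output as a tuple of ints, or None."""
    match = re.search(r"tesseract\s+v?(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        return None
    return tuple(int(part) for part in match.groups())


def installed_version():
    """Return the installed Tesseract version, or None if it cannot be found."""
    if shutil.which("tesseract") is None:
        return None
    result = subprocess.run(
        ["tesseract", "-v"],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )
    # Assumption: some Tesseract versions print the version banner to
    # stderr instead of stdout, so both streams are searched.
    return parse_version(result.stdout + "\n" + result.stderr)


# Parsing the first lines of the output shown in the article:
print(parse_version("tesseract 3.04.01\n leptonica-1.73"))  # (3, 4, 1)
```

Returning a tuple of integers makes comparisons such as installed_version() >= (4, 0, 0) straightforward once the None case has been handled.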
Command Line Usage:
The tesseract --help command lists the available commands and options.
For example:
tesseract osp.png stdout -l eng
The -l parameter specifies the language pack, given as a three-letter code such as eng for English.
When stdout is given in place of an output file name, the text read from the image file is written to the console.
If we had used the command tesseract aa.png out -l tur instead, Tesseract would have created a file named out.txt in the working directory and written the characters it read into that text file.
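The command-line calls in this section follow the pattern tesseract <image> <output base> -l <language>, and that pattern can be driven from a Python script through the standard subprocess module. The sketch below is one possible way to do it, not the approach used later in this series: build_command and ocr_to_text are hypothetical names, the image file names are the placeholders used above, and the code assumes the tesseract binary is on the PATH. Tesseract language codes are three letters long, such as eng and tur.

```python
import shutil
import subprocess


def build_command(image_path, output_base="stdout", lang="eng"):
    """Build the argument list for: tesseract <image> <output_base> -l <lang>."""
    return ["tesseract", image_path, output_base, "-l", lang]


def ocr_to_text(image_path, lang="eng"):
    """Run tesseract on an image and return the recognized text as a string."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract is not installed or not on the PATH")
    result = subprocess.run(
        build_command(image_path, "stdout", lang),
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
        check=True,  # raise CalledProcessError if tesseract reports an error
    )
    return result.stdout


# Print to the console vs. write to out.txt in the working directory:
print(build_command("osp.png"))
print(build_command("aa.png", "out", "tur"))
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces. With an image available, ocr_to_text("osp.png") would return the recognized text instead of printing it.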