How to use user-words in Tesseract (Java)?

java,ocr,tesseract,tess4j,config-spec
I am using Tesseract for OCR and have added a few additional words to "fin.user-words" (I would like to avoid creating a new word list and replacing tessdata/fin.word-dawg with it). I succeeded in doing this from the command prompt: >tesseract image.png result -l fin TestConfig where TestConfig (Tesseract configuration file...

Preprocessing image for Tesseract OCR with OpenCV

opencv,image-processing,ocr,tesseract
I'm trying to develop an app that uses Tesseract to recognize text from documents photographed with a phone's camera. I'm using OpenCV to preprocess the image for better recognition, applying a Gaussian blur and a threshold method for binarization, but the result is pretty bad. Here is the image...

Tesseract integration into my project on Xcode 6.3, iOS 8.3

ios,ocr,tesseract,objective
I've been digging through the web and Stack Overflow questions, but none of them solves my problem. I'm trying to use Tesseract OCR in my iOS project, but the integration did not go as I expected. I followed the instructions in this blog and did everything, but still I...

Training data for digit recognition on Android with OpenCV

java,android,opencv,ocr
I'm trying to do digit recognition on Android with OpenCV. I'm using the k-Nearest Neighbor (kNN) functions of OpenCV on sample images I found on this blog (basically a single .png per digit + an image with multiple digits for testing purposes). I'm running into some issues when I'm trying...

Which method should theoretically be best?

neural-network,ocr
Say I want to recognize my characters using neural network(s). Let's cut it down to 5 letters, the binary form of image to 16x16, input + 2 layers network, unipolar function inside both layers. Momentum backpropagation is used in the process of learning. Which of the following approaches should give...

Live screen capture of text to OCR implementation ideas? (Image to machine readable, Live)

c#,text,ocr
I am trying to create software (preferably in .NET) where I can set a fixed area on my screen where text appears and then convert that data into machine-readable text using some sort of OCR. I would like to do this on a live basis, meaning when the...

Is there a module for Python that receives (reads) text from a picture?

python,image,module,ocr
In relation to my question: If not for Python, what other programming languages have/acquire this feature?

Python real time screen processing [closed]

python,real-time,ocr
I would like to process my screen in real time. Is there a real-time OCR library for Python that I could use? Or is there some better technology that I could use?...

Alfresco File Storage(alf_data)

java,ocr,alfresco,command-line-tool
I am in a situation where I need to run a command-line tool on a file that has been uploaded to the Alfresco repository. The reason is that I need to perform OCR on that particular file. I know I can use the transformations that Alfresco provides by default, but those transformations do not provide conversion...

Train SVM for car plate recognition in OpenCV C++

c++,opencv,ocr,svm,knn
I'm trying to create a car plate recognition system using OpenCV (C++). I've already seen this example on GitHub, but I want to use an SVM instead of k-nearest neighbours or artificial neural networks. I trained an SVM for only two classes (positive or negative), so how can I train to...

Normalize car plate for OCR in OpenCV C++

c++,opencv,ocr,hough-transform
I'm building a simple OCR car plate recognition system. I'm using HaarCascades to find the car plate, and next I need to normalize this plate before passing it to my OCR module. I'm using floodfill to find the main contours of the car plate, and then I perform a Hough transform to find...

php tesseract with http post no response

php,html,ocr,tesseract
I am developing a PHP page on the web server. It works in the following three steps: get an image uploaded from an HTML form with the POST method; execute tesseract to change the image into text; print the text on the screen; Now...

Getting digits in an order (left to right) after OCR from x,y coordinates

python,ocr
So I wrote an OCR script which grabs an image, performs OCR and returns the x,y coordinates together with each digit; when I plot it I get the underlying image. The x,y coordinates with digits are not predicted sequentially but in the order the contours are detected (almost randomly). Is there a way...
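A minimal sketch of one way to recover reading order, assuming every detection is available as a hypothetical `(x, y, digit)` tuple and that all digits sit on a single line of text: sort the detections by their x coordinate. The tuple layout and the function name are illustrative, not taken from the question.

```python
def digits_left_to_right(detections):
    """Return the detected digits as a string, ordered by ascending x.

    `detections` is an iterable of (x, y, digit) tuples (assumed layout).
    """
    ordered = sorted(detections, key=lambda d: d[0])
    return "".join(str(d[2]) for d in ordered)


# Detections arrive in contour order, not reading order.
detections = [(120, 14, 7), (10, 12, 3), (65, 15, 9)]
print(digits_left_to_right(detections))  # 397
```

For several lines of text this would need an extra step that groups detections by similar y values before sorting each group by x.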

Android Tesseract OCR on Android Studio [closed]

android,eclipse,android-studio,ocr,tesseract
For a while I have been trying to include Tesseract in my Android app in Android Studio (using this tutorial). Since it did not work after many tries (missing allheaders.h), I contacted the creators (blog: Gautam Gupta, OCR: Robert Theis); they told me to try it in Eclipse. Since I...

Next step in image preprocessing for OCR with Tesseract (tess4j)

java,image-processing,ocr,tesseract,tess4j
I've been trying to use Tesseract to identify some digits in a series of images, and after scouring for advice I've made a number of improvements. So far I've attempted the following steps: binarize the image at an appropriate threshold to pick out the numbers; restrict Tesseract to digits only...

Xamarin Tesseract OCR binding for Android

xamarin,monodroid,ocr,tesseract
I would like to use Tesseract OCR in Xamarin.Android and Xamarin.iOS applications. I found the binding for iOS (https://github.com/jherby2k/Xamarin-Tesseract-OCR-iOS-Unified). Is there an equivalent for Android?...

android - using the tess-two library

android,ocr,tesseract,tess-two
I am following this tutorial and managed to build the library just fine. My state now: I take a photo and save it to external storage (here is the directory path) static String directoryPath = Environment.getExternalStorageDirectory().toString() + "/saved_images"; In the directory there are currently only pictures I took in jpg...

Set minimum confidence to ocr in Matlab

matlab,ocr,text-extraction,matlab-cvst,confidence-interval
My MATLAB program for extracting text using the ocr function gives many false positives that have low confidence. Is there any way to set a minimum value for ocrtxt.WordConfidence and ignore all lower values? I want only the ocrtxt.Words or ocrtxt.Text entries with confidence above 0.8 for further processing. ocrtxt = ocr(regionFilteredTextMask); ocrtxt.Text; ...
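The filtering step itself does not depend on MATLAB. As a rough sketch in Python, assuming the recognized words and their confidences are available as two parallel lists (the function name `filter_by_confidence` and the sample values are hypothetical), keeping only the words above the threshold could look like this:

```python
def filter_by_confidence(words, confidences, min_confidence=0.8):
    """Keep the words whose confidence is strictly above min_confidence.

    `words` and `confidences` are assumed to be parallel sequences.
    """
    return [w for w, c in zip(words, confidences) if c > min_confidence]


words = ["cat", "d0g", "tree"]
confidences = [0.95, 0.42, 0.81]
print(filter_by_confidence(words, confidences))  # ['cat', 'tree']
```

The same idea carries over to any OCR result that exposes a per-word confidence array: build a boolean mask from the confidences and index the word list with it.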

OCR multiple images into one PDF

c#,pdf,ocr
Does anyone have experience with running OCR on several images and creating one output PDF file containing the recognized text, using Nicomsoft OCR in C#? It seems it can do it, but my C# code does not produce any PDF for some reason: NSOCRLib.NSOCRClass NsOCR = new NSOCRLib.NSOCRClass(); NsOCR.Engine_InitializeAdvanced(out CfgObj, out OcrObj,...

OCR implementing for multiple languages

android,ocr,tesseract,hindi
I have implemented an OCR application for Android using Tess-two. It runs successfully, though it gives only an 80% result. Now I want to implement the same Android application for other languages such as Hindi, Chinese, French, etc. I tried to edit the code of simple-android-OCR by Gautam Gupta. Please...

Document Storage with Full Text Indexing - PDF

java-ee,pdf,indexing,ocr,full-text-indexing
We have built an application for indexing submitted documents in many formats, spanning across Microsoft Office to text. The issue is that, for pdf, we often resort to converting to Word, then indexing. This is a slow process and problematic especially because it doesn't handle image-based pdfs where an OCR...

Create a searchable PDF automatically from another PDF or image

c#,pdf,ocr,acrobat-sdk
I need to create an application or a script that lets you create a searchable PDF from another PDF or image. I tried to use Adobe Acrobat's SDK, but I did not find a way to use Acrobat's OCR feature in another application. Do you have another solution, or can...

Tesseract OCR in C# Code

c#,ocr,tesseract
I am trying to develop Optical Character Recognition (OCR) for Bangla with Tesseract. I'm at the initial stage now. I found some links about it, but every one of them just points to this Google Code link. What I actually want to know is how I can use Tesseract in my C# code and, before that, how...

Camera Preview and OCR

xamarin,monodroid,android-camera,ocr,tesseract
I am new to Android development and I'm using Xamarin. I am trying to write an application that starts the camera preview and then constantly scans the incoming frames for text (I am using Xamarin.Tesseract from NuGet). In other words, I don't want to make the user take a photo...

OCR with android app

android,ocr
I want to create an app where people can take a photo of any text and the app recognizes the text, copies it and puts it into an editable area. The language of the text shouldn't matter. I just want to automatically recognize the characters. A later feature could be...

Error on SIMPLE ANDROID OCR

android,android-studio,runtime-error,ocr,tess-two
When I tried the Simple Android OCR (https://github.com/GautamGupta/Simple-Android-OCR) in ANDROID STUDIO it gave me a runtime error as shown in the picture. Can anyone help me with this? ![Runtime error occured when tried to run][2] 06-03 12:44:31.904 17051-17051/com.startup.vrvijay.liccamera E/AndroidRuntime﹕ FATAL EXCEPTION: main Process: com.startup.vrvijay.liccamera, PID: 17051 android.util.SuperNotCalledException: Activity {com.startup.vrvijay.liccamera/com.startup.vrvijay.liccamera.MainActivity} did...

Why is a negative image used in preprocessing?

c++,opencv,image-processing,ocr,image-recognition
I've observed that many preprocessing operations (mainly preprocessing for OCR) use a negative image. For example: http://felix.abecassis.me/2011/10/opencv-rotation-deskewing/ http://felix.abecassis.me/2011/09/opencv-detect-skew-angle/ I've also seen it when objects are found using the kNN algorithm. Why are inverted images used? Is that only to show that it is just a preprocessing step? Are there...

Cleaning up an image for OCR with ImageMagick and 'textcleaner'

imagemagick,ocr,tesseract,imagemagick-convert
I have the following image that I'd like to prepare for an OCR with tesseract: The objective is to clean up the image and remove all of the noise. I'm using the textcleaner script that uses ImageMagick with the following parameters: ./textcleaner -g -e normalize -f 30 -o 12 -s...

Searching for a line in a jpg file Python

python,image,line,ocr
I'm interested in using Python to detect a vertical line dividing a scanned page in two. I have a series of these scanned pages, and I need to split them in half along a black line. I know how to split them using ImageMagick, I just need to be able...
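A minimal sketch of one possible approach, assuming the page is already loaded as a grayscale grid (a list of rows of 0-255 values, 0 = black) and that the dividing line is the darkest column overall. The helper name `darkest_column` and the tiny sample page are hypothetical; a real scan would first be read with an imaging library.

```python
def darkest_column(pixels):
    """Return the index of the column with the lowest total brightness.

    `pixels` is a list of rows, each a list of grayscale values (0 = black).
    """
    width = len(pixels[0])
    column_sums = [sum(row[x] for row in pixels) for x in range(width)]
    return min(range(width), key=column_sums.__getitem__)


page = [
    [255, 250, 0, 255, 240],
    [255, 30, 5, 255, 255],
    [250, 255, 0, 200, 255],
]
split_x = darkest_column(page)
left = [row[:split_x] for row in page]       # columns before the line
right = [row[split_x + 1:] for row in page]  # columns after the line
print(split_x)  # 2
```

Summing whole columns keeps the result robust to isolated dark specks, since only a line that runs the full height of the page produces a consistently low total.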

Missing 'strcasestr.cpp' file when compiling Tesseract 3.03 training tools

make,ocr,tesseract,autoconf
I have managed to build Tesseract 3.03 rc1 from source. But when I try to build the training tools, which is the very feature I want from 3.03, I get the following error. It seems there should be a strcasestr.cpp file in the vs2010 folder. But the downloaded source...

OCR - How to train a new Tesseract model?

machine-learning,ocr,tesseract,text-mining
I am using Tesseract to recognize characters from screenshots. But it seems many models are trained for images like the one below. This image is very different from a screenshot. Does anyone know where I can find trained data for screenshots? Or could anyone tell me how to train a model for...

Bing translation error while using tesseract ocr in android for real-time text detection and translation

android,translation,ocr,tesseract
I am using Robert Theis' experimental app (namely, android-ocr) to achieve real-time OCR and translation (using Bing translator.) In class CaptureActivity.java, in function handleOcrContinuousDecode (which is the function for real-time OCR), I have created a TranslateAsycnTask.java object which passes the translated-text to be displayed through the ViewFinderView.java like this: The...

Tesseract character recognition problems in Android (but not on iOS?)

android,ios,ocr,tesseract,tess-two
I've built an application that uses Tesseract (V3.03 rc1) to identify some specific text strings. These are, unfortunately, printed in a custom font that requires that I build my own traineddata file. I've built the application on both iOS (using https://github.com/gali8/Tesseract-OCR-iOS for inspiration) and Android (using https://github.com/rmtheis/tess-two/ for inspiration as...