Making a Simple OCR Android App using Tesseract

This post shows how to build an Android application that extracts text from an image captured by your phone’s camera. We’ll be using Tess Two, a fork of Tesseract Android Tools by Robert Theis. It is based on the Tesseract OCR Engine (mainly maintained by Google) and the Leptonica image processing library.

Recognizing text using your Android phone. Not exactly the end result of this blog post, but what you could achieve.

Note: These instructions are for Android SDK r16 and Android NDK r7, at least for the time being (written at this tree). You also need the proper PATH variables set (see Troubleshooting).

  1. Download or check out the source from this git repository. This project contains tools for compiling the Tesseract, Leptonica, and JPEG libraries for use on Android. It contains an Eclipse Android library project that provides a Java API for accessing natively-compiled Tesseract and Leptonica APIs. You don’t need the eyes-two code.
  2. Build the project with the commands below. The build works on Mac and Linux but not on Windows, so use an Ubuntu VM there. Here, tess-two is the directory inside the tess-two checkout, at the same level as tess-two-test:
    cd <project-directory>/tess-two
    ndk-build
    android update project --path .
    ant release
  3. Now import the project as a library in Eclipse. File -> Import -> Existing Projects into workspace -> tess-two directory. Right click the project, Android Tools -> Fix Project Properties. Right click -> Properties -> Android -> Check Is Library.
  4. Configure your project to use the tess-two project as a library project: Right click your project name -> Properties -> Android -> Library -> Add, and choose tess-two. You’re now ready to OCR any image using the library.
  5. First, we need to get the picture itself. For that, I found some simple code to capture the image here. Once we have the bitmap, performing the OCR is relatively easy. Be sure to correct the rotation and image type first, with something like:
    // _path = path to the image to be OCRed
    ExifInterface exif = new ExifInterface(_path);
    int exifOrientation = exif.getAttributeInt(
            ExifInterface.TAG_ORIENTATION,
            ExifInterface.ORIENTATION_NORMAL);
    
    int rotate = 0;
    
    switch (exifOrientation) {
    case ExifInterface.ORIENTATION_ROTATE_90:
        rotate = 90;
        break;
    case ExifInterface.ORIENTATION_ROTATE_180:
        rotate = 180;
        break;
    case ExifInterface.ORIENTATION_ROTATE_270:
        rotate = 270;
        break;
    }
    
    if (rotate != 0) {
        int w = bitmap.getWidth();
        int h = bitmap.getHeight();
    
        // Setting pre rotate
        Matrix mtx = new Matrix();
        mtx.preRotate(rotate);
    
        // Rotate the bitmap
        bitmap = Bitmap.createBitmap(bitmap, 0, 0, w, h, mtx, false);
    }

    // Convert to ARGB_8888 (required by tess), even when no rotation was needed
    bitmap = bitmap.copy(Bitmap.Config.ARGB_8888, true);
  6. Now we have the image in the bitmap, and we can simply use the TessBaseAPI to run the OCR like:
    TessBaseAPI baseApi = new TessBaseAPI();
    // DATA_PATH = path to the directory that contains the tessdata folder
    // lang = language for which the language data exists, usually "eng"
    baseApi.init(DATA_PATH, lang);
    baseApi.setImage(bitmap);
    String recognizedText = baseApi.getUTF8Text();
    baseApi.end();
  7. Now that you’ve got the OCRed text in the variable recognizedText, you can do pretty much anything with it: translate it, search for it, and so on. P.S. You can support additional languages by adding a preference and downloading the required language data file from here. You might even put the files in the assets folder and copy them to the SD card on start.
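
The copy-on-start idea from step 7 can be sketched as follows. This is only a sketch under assumptions: it relies on Tesseract looking for its data in a tessdata folder under the path passed to init, and the class name LanguageData and method name install are made up for this example. It uses plain java.io streams so it can also be tried off the device.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

final class LanguageData {
    /**
     * Copies a language data stream to dataPath/tessdata/LANG.traineddata
     * unless that file already exists, and returns the target file.
     * The caller is responsible for closing the source stream.
     */
    static File install(InputStream source, File dataPath, String lang) throws IOException {
        File tessdata = new File(dataPath, "tessdata");
        if (!tessdata.isDirectory() && !tessdata.mkdirs()) {
            throw new IOException("Could not create " + tessdata);
        }
        File target = new File(tessdata, lang + ".traineddata");
        if (target.exists()) {
            return target; // already copied on an earlier start
        }
        try (OutputStream out = new FileOutputStream(target)) {
            byte[] buffer = new byte[8192];
            int read;
            while ((read = source.read(buffer)) != -1) {
                out.write(buffer, 0, read);
            }
        }
        return target;
    }
}
```

On a device, the source stream would typically come from the assets folder, for example getAssets().open("tessdata/eng.traineddata") (the asset path is an assumption), and dataPath would be the same directory as DATA_PATH in step 6. Note that this sketch does not clean up a partially written file if the copy fails midway.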

To make things easier to follow, I have uploaded a simple OCR application that uses Tess Two to GitHub, called Simple Android OCR (for beginners). If you want a full-fledged application with a selectable region while capturing the image, translation of the text, preferences, etc., check out Robert Theis’s Android OCR application on GitHub too (for intermediate+)!
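
The orientation mapping from step 5 is easy to get subtly wrong, so it can help to isolate it in a small helper that can be tested off the device. A minimal sketch, assuming the standard EXIF orientation codes (1 = normal, 3 = rotated 180, 6 = rotated 90, 8 = rotated 270), which are the values behind the ExifInterface.ORIENTATION_* constants. The class and method names are made up for this example.

```java
final class ExifRotation {
    // Standard EXIF orientation codes (same values as ExifInterface.ORIENTATION_*)
    static final int NORMAL = 1;
    static final int ROTATE_180 = 3;
    static final int ROTATE_90 = 6;
    static final int ROTATE_270 = 8;

    /** Returns the rotation in degrees to apply to the bitmap, or 0 if none is needed. */
    static int degreesFor(int exifOrientation) {
        switch (exifOrientation) {
            case ROTATE_90:
                return 90;
            case ROTATE_180:
                return 180;
            case ROTATE_270:
                return 270;
            default:
                // NORMAL, undefined, and mirrored orientations are not rotated here
                return 0;
        }
    }
}
```

In the app, the result would replace the switch from step 5, e.g. int rotate = ExifRotation.degreesFor(exifOrientation); followed by the same mtx.preRotate(rotate) call.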

Updated: 3 February 2012

References

  1. Using Tesseract Tools for Android to Create a Basic OCR App by Robert Theis
  2. Simple Android Photo Capture by MakeMachine
  3. tess-two README

Troubleshooting

About updating PATH

You need to update your PATH variable for the commands to work; otherwise you will see a “command not found” error. For the Android SDK, go here and expand the “How to update your PATH” section. For the Android NDK, use the same process to add the android-ndk directory to the PATH variable. Linux users can check Setting proper path variables by Harsha Dura for more information.
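
If you are unsure whether the PATH change took effect, one way to check is to look for the tools in each PATH directory. The following is a small sketch in plain Java, not part of the SDK or NDK; the class name PathCheck and method name find are made up for this example.

```java
import java.io.File;
import java.util.regex.Pattern;

final class PathCheck {
    /**
     * Returns the first file named tool found in the directories listed in
     * pathValue (a PATH-style string), or null if no directory contains it.
     */
    static File find(String tool, String pathValue) {
        if (pathValue == null) {
            return null;
        }
        for (String dir : pathValue.split(Pattern.quote(File.pathSeparator))) {
            if (dir.isEmpty()) {
                continue;
            }
            File candidate = new File(dir, tool);
            if (candidate.isFile()) {
                return candidate;
            }
        }
        return null;
    }
}
```

Calling PathCheck.find("ndk-build", System.getenv("PATH")), and likewise for "android" and "ant", should return a file for each tool once the PATH is set up; a null result corresponds to the “command not found” error.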

Translations

  1. Japanese by datsuns

86 Comments

  1. Hi, I’m a student and I have a school project. I’m going to make an Android OCR app for Korean character recognition. I would like to use the Tesseract Android open source library. I have googled a lot about how to build libjpeg.so, liblept.so and libtess.so by following the Tesseract Android README file. I tried on Cygwin and Ubuntu, but both ended with errors. Cygwin said “no rule to make target”. On Ubuntu I succeeded in building libjpeg.so, but building liblept.so failed with an error saying that android/bitmap.h does not exist. I then checked my leptonica-1.68 folder, and indeed there is no such folder or file. I downloaded bitmap.h from the internet and put it under an android folder, but the problem remains. Can you help me with this? The deadline is 1 month and 30 days away for me to finish my project, and then it will be judgement day.

    Thank You for your attention

    Regards,

    Priska

    • I received the same error and solved it by editing tess-two/jni/Android.mk (as highlighted in step 2) so that the beginning looks something like:

      # NOTE: You must set these variables to their respective source paths before
      # compiling. For example, set LEPTONICA_PATH to the directory containing
      # the Leptonica configure file and source folders. Directories must be
      # root-relative, e.g. TESSERACT_PATH := /home/username/tesseract-3.00
      #
      # To set the variables, you can run the following shell commands:
      # export TESSERACT_PATH=path-to-tesseract
      # export LEPTONICA_PATH=path-to-leptonica
      # export LIBJPEG_PATH=path-to-libjpeg
      #
      # Or you can fill out and uncomment the following definitions:
      # TESSERACT_PATH := path-to-tesseract
      # LEPTONICA_PATH := path-to-leptonica
      # LIBJPEG_PATH := path-to-libjpeg
      
      TESSERACT_PATH := $(call my-dir)/../external/tesseract-3.01
      LEPTONICA_PATH := $(call my-dir)/../external/leptonica-1.68
      LIBJPEG_PATH := $(call my-dir)/../external/libjpeg
      

      You can also try running these in the Terminal before ndk-build in step 3:

      export TESSERACT_PATH=${PWD}/external/tesseract-3.01
      export LEPTONICA_PATH=${PWD}/external/leptonica-1.68
      export LIBJPEG_PATH=${PWD}/external/libjpeg
      

      Hope that it helps. :)

  2. Hello, I made the whole process and the project gives me tess-two errors:

    Requires Android compiler compliance level 5.0 or 6.0. Found '1.4' instead.

    I compiled with JRE 1.5; is this okay?

    Thanks in advance, greetings.

  3. I need your help for the android ocr app I am making as my project. I did all the steps successfully. I then imported your project along with tess-two in eclipse. It shows no errors, but it fails at run time. I really made a lot of error to sort it out. It’ll be great if you can be of any help.Contact me on my email id. I’ll be obliged.

  4. Hey Gautam,

    I tried following the steps listed by you in windows environment (using cygwin). However, I face this error after I do ndk-build:
    make: *** No rule to make target `//cygdrive/f/work/ocr_cc/newf/rmtheis-tess-two-1edb5e2/rmtheis-tess-two-1edb5e2/tess-two/external/leptonica-1.68/src/adaptmap.c', needed by `/cygdrive/f/work/ocr_cc/newf/rmtheis-tess-two-1edb5e2/rmtheis-tess-two-1edb5e2/tess-two/obj/local/armeabi/objs/lept//cygdrive/f/work/ocr_cc/newf/rmtheis-tess-two-1edb5e2/rmtheis-tess-two-1edb5e2/tess-two/external/leptonica-1.68/src/adaptmap.o'. Stop.

    I tried solutions mentioned at http://code.google.com/p/tesseract-android-tools/issues/detail?id=4#c16 but with no luck.
    I strongly suspect that there is a path problem because if you look at the path in the error, it looks like it is a absolute path problem. Can you suggest any work around?

    Thanks,
    Vishwanath

    • That’s the same problem one of my friends was facing. I tried to fix it via TeamViewer but couldn’t. I’d suggest developing on a non-Windows system or contacting Robert Theis, the author of tess-two, for help. :)

    • thanks gautam. switched to linux! :)

  5. Hi,
    Get error during build :
    utilities.cpp:19:28: error: android/bitmap.h: No such file or directory

    any idea ?
    tx
    ps
    the tesseract-android-tools are working for me

    • Have you edited the Android.mk file as explained in comment 6611?

    • Yes, I did all the steps again: clone, uncomment lines (12-18) in Android.mk, and build. I get the same error:

      Compile++ thumb : lept <= utilities.cpp
      tess-two/tess-two/tess-two/jni/com_googlecode_leptonica_android/utilities.cpp:19:28: error: android/bitmap.h: No such file or directory
      make: *** [tess-two/tess-two/tess-two/obj/local/armeabi/objs/lept//tess-two/tess-two/tess-two/jni/com_googlecode_leptonica_android/utilities.o] Error 1

    • Which OS do you use? I just built the latest revision on my Mac without any problems.

    • hi andrzej,

      I had the same problem as yours.

      Did u extract the android-ndk from terminal?
      I also extracted the file from terminal, but some file was missing. So I downloaded and installed the new android-ndk and extracted it by clicking the right mouse instead of using terminal.

      I suggest you to reinstall your android-ndk to the latest version. android/bitmap.h is included in android-ndk.

    • hi,
      @priska, thanks for the suggestion, but the problem was building with NDK r6.
      Now NDK r6b is building without any error, so this problem is solved.

      The other problem was with android update:
      android update project --path .
      Error: The project either has no target set or the target is invalid.
      Please provide a --target to the 'android update' command.

      For anyone having this error: update your Android SDK Tools to r14/r15.

  6. ubuntu 10.04
    I can build tesseract-android-tools without any problems.
    Dont understand what wrong :(
    I am waiting for some reply also here:
    http://code.google.com/p/tesseract-android-tools/issues/detail?id=9
    thanx

  7. make: *** No rule to make target `//home/park/android-ndk-r7/App/tess-two/external/leptonica-1.68/src/adaptmap.c', needed by `obj/local/armeabi/objs/lept//home/park/android-ndk-r7/App/tess-two/external/leptonica-1.68/src/adaptmap.o'. Stop.

    error… help me

  8. Please help i got this error.

    c:\rmtheis-tess-two-1edb5e2\tess-two>c:\MyWork\android-ndk\ndk-build
    Install        : libjpeg.so => libs/armeabi/libjpeg.so
    "Compile thumb : lept <= open_memstream.c
    "Compile thumb : lept <= fopencookie.c
    "Compile thumb : lept <= fmemopen.c
    "Compile++ thumb : lept <= box.cpp
    In file included from jni/com_googlecode_leptonica_android/box.cpp:17:
    jni/com_googlecode_leptonica_android/common.h:22:24: error: allheaders.h: No suc
    h file or directory
    jni/com_googlecode_leptonica_android/box.cpp: In function 'jint Java_com_googlec
    ode_leptonica_android_Box_nativeCreate(JNIEnv*, _jclass*, jint, jint, jint, jint
    )':
    jni/com_googlecode_leptonica_android/box.cpp:27: error: 'BOX' was not declared i
    n this scope
    ...
    make: *** [obj/local/armeabi/objs/lept/box.o] Error 1

    The box.cpp it not correct? any build source work please send to my mail. chao.raksa@gmail.com

    • The build doesn’t work on Windows as far as I know.

    • hmmm i can build on windows it generate those .so file but when test on my phone got error 2 more:
      1. NativeReadBitMap(ReadFile.java:203)
      2. SetImage(TessBaseAPI.Java:311)

      How to solve it problem :D

    • Nope, the fact that the files were created doesn’t necessarily mean the build worked. Give it a try on Linux.

  9. I made it following this post!!! The Simple Android OCR is a really cool app! I am playing it on my Nexus One now. During the whole process, I did not encounter any problem. Built it under Ubuntu 10.04 64bit.

  10. Hi, does anyone have good sources/docs/links on how to work with the implemented functions of Tesseract?
    How to use :
    api.setPageSegMode(mode)
    api.setRectangle(rect)
    api.getTextlines()
    also
    Binarize.otsuAdaptiveThreshold(pix);
    Binarization is important for OCR, but I am getting no result … or I don’t know how to use it…

    • That method returns a Pix object, so you would need to do:

      Pix myThresholdedImage = Binarize.otsuAdaptiveThreshold(pix);

  11. i am getting error of TessbaseApi .and its giving me error that googlecode can not be resolved to a type?

  12. thanks for this superb tutorial Gautam. It worked for me after couple of messy configurations on my Fedora boot. Wish you all the best for your future endeavors.

  13. Is it just me, or is the text recognition really poor using the Tesseract API? I tried this and most of what I got was garbage (picture taken with an 8MP HTC Evo camera, moderately good light conditions, printed text, black on white). Even with rmtheis’s Android OCR app I got very poor results, plus it’s pretty slow. Uploading the same image to Google Docs and using its OCR gave far better results. I’ve been thinking about using the Google Docs API to do the OCR for me, but I’m not sure that’s a good idea, considering it uploads the file as a new Google doc, which is not what I want; I just need to OCR it and then parse the text for key info, not save it. I could set up my own server for cloud OCR, but I see no point in that unless I can get better OCR results. Any suggestions on how to make the recognition better?

  14. hi Gautam.
    Thank you for amazing post !!

    I had tried to build Tesseract-Android, and never succeeded until I found this site!

    and I translated this information into Japanese on my blog.
    http://d.hatena.ne.jp/datsuns/20120105
    (sorry Japanese only…)

    I’m sure this will be good information for other Japanese engineers!!

    so many Thanks !!

  15. Hi, I tried this sample and I am getting dalvikvm: Exception Ljava/lang/UnsatisfiedLinkError; I am using Windows XP, and I compiled the lib files using Cygwin. The application runs up to image capturing; after that I get a force close error. Can you please help me figure out where I am getting stuck? I am new to NDK and OCR integration.

  16. I’ve been looking into getting a live camera preview working in the Android emulator. Currently the Android emulator just gives a black and white chess board animation. ..plz help me ?

    • That’s the intended behavior–the chess board animation simulates a camera view.

      There’s some code online that you can try searching for that connects the video feed from your webcam to the camera input of the Android emulator. It will probably be somewhat challenging to set up, but I’ve heard that it works.

  17. dear Gautam

    i have a problem at step number 3 and error message said “ndk-build:command not found”

    please help me fix this thing

    thank you

    • You need to add the android ndk directory to your PATH variable as noted in the post. :)

    • How do I add the NDK directory to my PATH variable? I didn’t find the note in the post

      sorry im really newbie with this thing

    • hi Gautam

      i have successfully passed the “ndk-build” step and advanced to the “android update project --path .” step but it didn’t work and the error message said “android: command not found”

      what should i do to solve this problem

      thank you

    • me too.
      i passed the “ndk-build” step.
      but when i try the “android update project --path .” step, it didn’t work and the error message said “android: command not found”

    • You need to add /…path…/android-sdk/tools/ to your PATH variables, which contains the android command.

    • I’ve now added About updating PATH section in Troubleshooting. Please check it. :)

  18. hi Gautam

    i have problem again. when i tried to add library as the instruction number 5, there is nothing tess-two library. i think i missed something.

    please help me fix this thing

    thank you Gautam

  19. Sharadchandra Pawar |

    Thank you!!
    it works great !!
    Keep it up…!!

  20. I’m working on a Mac with the latest Android SDK installed and NDK r6, but when I use
    ndk-build I get the following error when it is trying to build the lept lib:

    Invalid attribute name:
    package
    Install : libjpeg.so => libs/armeabi/libjpeg.so
    SharedLibrary : liblept.so
    /Users/viph4367/android-ndk-r6/toolchains/arm-linux-androideabi-4.4.3/prebuilt/darwin-x86/bin/../lib/gcc/arm-linux-androideabi/4.4.3/../../../../arm-linux-androideabi/bin/ld: cannot find -ljnigraphics
    collect2: ld returned 1 exit status
    make: *** [/Users/viph4367/Tesseract/obj/local/armeabi/liblept.so] Error 1

    Any idea, what I may be missing or doing wrong here?

    Vijay

    • Update:

      - downloaded Robert Tess-two
      - and used NDK r7 as he mentioned in his README
      - I still get an error while running ndk-build, but a different one:

      make: *** No rule to make target …/rmtheis-tess-two-0cddf3a/tess-two/jni/../external/leptonica-1.68/src/adaptmap.c’

      Any ideas??

    • Latest Update:

      Got it working. My build term had some issues. I got a new term which solved the issues I was running into.

      Vijay

  21. I tried building using ndk-build but it gives the following error.
    Install : libjpeg.so => libs/armeabi/libjpeg.so
    make: *** No rule to make target `/home/sumit/Downloads/rmtheis-tess-two-0cddf3a/tess-two/jni/com_googlecode_leptonica_android/../..//home/sumit/Downloads/rmtheis-tess-two-0cddf3a/tess-two/external/leptonica-1.68/src/adaptmap.c', needed by `/home/sumit/Downloads/rmtheis-tess-two-0cddf3a/tess-two/obj/local/armeabi/objs/lept//home/sumit/Downloads/rmtheis-tess-two-0cddf3a/tess-two/external/leptonica-1.68/src/adaptmap.o'. Stop.

    I working with Ubuntu

    Thanks in advance!!

    • Be sure to use SDK r16 and NDK r7. Get the latest tess-two.

    • Thanks for your reply Vijay. I have been using SDK r16 and NDK r7. I also got a fresh copy of tess-two. Nothing seems to be working. I also tried with NDK r5. Still no luck!!!

    • I have the same problem as you. I believe they changed the code since I last compiled in early Dec 2011, because I can compile my old code successfully again but the code downloaded from the repository today fails. The path is broken.

      I think maybe tess-two changed some path variables on January 04, 2012 when they tried to integrate the eyes-two code into the project. “com_googlecode_leptonica_android/../..//home/sumit/” and “/lept//home/sumit/” in the error message really look like a path setting problem. (“//” before home)

    • how can i get a working copy of tess-two code?

    • I’m not sure how many people have run into this problem like us. It seems the new release works well for many people.

    • I can’t reproduce this problem–I just pulled from the repository and it built successfully on NDK r7 and Android SDK Tools 16 on Ubuntu 11.04 by following the instructions here.

      Please let us know if you find the source of the problem.

  22. I’ve updated the post to reflect the changes that it now works with SDK r16 and NDK r7. Also, it doesn’t require all those TESSERACT_PATH, LEPTONICA_PATH and LIBJPEG_PATH. :)

  23. I get this problem:

    android update project --path .
    Error: The project either has no target set or the target is invalid.
    Please provide a --target to the 'android update' command.

    When using Android SDK Tools r15. I’m currently upgrading to r16 and will try again.

  24. Thanks for your great post, Gautam.
    It helped me a lot and the examples work great too.

    I have a question.
    Is it possible to make ‘Tesseract’ recognize more than two languages at the same time?
    It seems to be that ‘Tesseract api’ is only initialized with one language even if I copy more language data files to sdcard.

    Any advice will be really appreciated.
    Thanks and have a nice day!!

    • I don’t think that is possible currently. You may want to run the OCR multiple times, using different languages each time and put the strings together. :)
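
      The multi-pass idea can be sketched roughly like this. It is only an illustration: the Pass interface is a hypothetical hook, not part of tess-two, and the class and method names are made up. Each pass is expected to run the OCR for one language and return its text.

```java
import java.util.List;

final class MultiLanguageOcr {
    /** Hypothetical hook: runs one OCR pass for a single language code. */
    interface Pass {
        String recognize(String lang);
    }

    /** Runs one pass per language and joins the non-empty results with newlines. */
    static String recognizeAll(List<String> langs, Pass pass) {
        StringBuilder combined = new StringBuilder();
        for (String lang : langs) {
            String text = pass.recognize(lang);
            if (text == null || text.trim().isEmpty()) {
                continue; // nothing recognized for this language
            }
            if (combined.length() > 0) {
                combined.append('\n');
            }
            combined.append(text.trim());
        }
        return combined.toString();
    }
}
```

      On the device, each pass would wrap the calls from step 6 of the post (init with the given language, setImage, getUTF8Text, end), so the run time grows with the number of languages.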

  25. Thank u so much for ur kind answer Gautam!! At least I’m now aware of what I need to do! Best luck for you:D

  26. Hey Gautam,
    I am working on OCR project and I came across your blog.
    I have been finding some difficulty in building
    android-NDK in MAC OSX LION, I downloaded its latest version r7 as you have mentioned, but when I give the path for ndk-build, it gives error as
    “ERROR: Cannot find ‘make’ program. Please install Cygwin make package or define the GNUMAKE variable to point to it.”
    There are no spaces in my ndk path also, It would be great if you help us out.

    Thanks

  27. 1. I compiled in Cygwin following the README steps.
    2. I imported it in my project (myocr) as a library project.
    3. Now when I try to run the program, my application myocr.apk is installed on the Android emulator, but when it comes to the tess-two library project it displays the error

    “Could not find tess-two.apk!”
    :(
    Please Help

    • Asad, as mentioned earlier in the post and the comments, it does not work on Windows even with Cygwin. Please try using an Ubuntu VM. :)

  28. 1. Is (my android library project) tess-two.apk also install on my android emulator yes or not ?
    If Yes then first time when i set up and run my ocr project this tess-two.apk also install in android, but after 4 or 5 times i dont know what i did wrong in my project settings it is not installing give error Could not find tess-two.apk!”

    2. The Tess-two Project was build using API 3.2 i changed it to android API 2.2 is it ok to change the API level ?
    cuz m using emulator in android API Level 2.2.

      1. I once faced the “Could not find tess-two.apk” error, but do not remember how I resolved it. Probably try reimporting it into eclipse as a library and building/running it once (within eclipse). Also, the app would not show any results on an emulator since the camera shows no text, as noted by rmtheis in earlier comments.
      2. Yes, it is ok to do that.
  29. s@ubuntu:~/Desktop/android-ndk-r7$ ./ndk-build NDK-LOG=1
    Android NDK: Your APP_BUILD_SCRIPT points to an unknown file: home/s/Desktop/tess-two/jni/Android.mk
    /home/s/Desktop/android-ndk-r7/build/core/add-application.mk:133: *** Android NDK: Aborting… . Stop.

    We are using Ubuntu 11.04 on VMware. The Android.mk file is present in the required folder. We have also tried android-ndk-r6b, but the same error persists. Awaiting a solution, please.

    • Does it also fail when you run ndk-build from the “tess/tess-two” project directory (as specified in the instructions)?

  30. Hi Gautam

    i have downloaded the application’s source code that you made. i created a new project then used the code and then run it. i have been successful deploying this project to the android device. i run the application on the device, using camera to complete the action. i captured the picture and then process stopped unexpectedly.

    i found a couple of error messages on eclipse that refer to some of code lines

    1. line 211 at SimpleAndroidOCRActivity.java refers to this line of code :
    TessBaseAPI baseApi = new TessBaseAPI();

    2. line 35 at SimpleAndroidOCRActivity.java refers to this line of code :
    onPhotoTaken();

    3. line 47 at TessBaseAPI.java refers to this line of code :
    System.loadLibrary("lept");

    • Are you sure that you’ve added the Tess library to the application project (step 4)?

    • yes i have…therefore error on TessBaseAPI.java at line 47 appeared

      i think the errors related to TessBaseAPI.java, especially at line 47, it couldn’t load “lept”

  31. i found the same problem….please check this link:

    https://github.com/rmtheis/tess-two/issues/4

    the error messages exactly same as i got

  32. i’ve an issue:
    the program failed at “Before baseApi” LOG.

    W/dalvikvm(11691): Exception Ljava/lang/UnsatisfiedLinkError; thrown while initializing Lcom/googlecode/tesseract/android/TessBaseAPI;

    this is the error.
    how can i solve it?

  33. when i type the command ndk-build it gives following error

    Install : libjpeg.so => libs/armeabi/libjpeg.so
    make: *** No rule to make target `/home/kashif/tess/tess-two/jni/com_googlecode_leptonica_android/../..//home/kashif/tess/tess-two/jni/../external/leptonica-1.68/src/adaptmap.c', needed by `/home/kashif/tess/tess-two/obj/local/armeabi/objs/lept//home/kashif/tess/tess-two/jni/../external/leptonica-1.68/src/adaptmap.o'. Stop.

    and after android update project --path
    and ant release

    its sucessfully build.xml file. and when compile project to add library with simple android ocr then display error in logcat as follow

    02-17 17:42:57.369: E/AndroidRuntime(330): FATAL EXCEPTION: main

    02-17 17:42:57.369: E/AndroidRuntime(330): java.lang.ExceptionInInitializerError

    02-17 17:42:57.369: E/AndroidRuntime(330): Caused by: java.lang.UnsatisfiedLinkError: Couldn’t load lept: findLibrary returned null

    please help me what i do i am totally stuck from 2 and half week.

  34. Just to let everyone know – just built tess on Windows 7 today ;) I didn’t do anything special, only steps described in this article – no errors so far. Tomorrow I will try to configure tess in eclipse.

  35. @Gautam and @rmtheis, I have a suggestion for you. Why not record a video on how to compile these things, from scratch to the end? This project is really important, and many newbies want to try it in their own projects. Could you please do that?

    really appreciate.
