WordPress.org

Plugin Directory

This plugin hasn't been updated in over 2 years. It may no longer be maintained or supported and may have compatibility issues when used with more recent versions of WordPress.

OCR

A plugin for extracting text from attached images using OCR via Tesseract.

What is Tesseract OCR and where do I get it?

Tesseract OCR is an open-source optical character recognition library that the WordPress OCR plugin uses to extract text from images. The library, as well as installation instructions, can be found at http://code.google.com/p/tesseract-ocr/

How do I know if / where I have Tesseract installed on my server?

Linux:

  1. SSH into your server and run the command: which tesseract
  2. If Tesseract is installed and in your shell environment PATH, the terminal should return a path similar to /opt/local/bin/tesseract.
  3. Enter this path in the OCR plugin configuration, reached through the Plugins > OCR link in the WordPress sidebar menu.

What is ImageMagick and where do I get it?

ImageMagick is an open-source, server-side image manipulation library. The WordPress OCR plugin specifically requires its convert utility. The library, as well as installation instructions, can be found at http://www.imagemagick.org

How do I know if / where I have ImageMagick installed on my server?

Linux:

  1. SSH into your server and run the command: which convert
  2. If ImageMagick is installed and in your shell environment PATH, the terminal should return a path similar to /opt/local/bin/convert.
  3. Enter this path in the OCR plugin configuration, reached through the Plugins > OCR link in the WordPress sidebar menu.
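
If Python 3 happens to be available on the server, the same PATH lookup that which performs can be sketched as below. This is only an illustration and not part of the plugin; the helper name find_binary is hypothetical.

```python
import shutil


def find_binary(name):
    """Return the absolute path of an executable found on PATH, or None."""
    # shutil.which searches the PATH environment variable, much like `which`.
    return shutil.which(name)


if __name__ == "__main__":
    # The two tools the OCR plugin needs paths for.
    for tool in ("tesseract", "convert"):
        path = find_binary(tool)
        print(tool, "->", path if path else "not found on PATH")
```

Whatever path is reported for each tool is the value to enter in the plugin configuration.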

Why does OCR require ImageMagick?

Tesseract is only compatible with TIFF images. Therefore, when an image in a web format (JPG, GIF, PNG, etc.) is uploaded, a temporary TIFF image must be created via ImageMagick so that Tesseract can detect the text within the image. This TIFF is discarded once the OCR has been completed.
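
The convert-then-recognize flow described above can be sketched roughly as follows. This is a minimal Python illustration, not the plugin's actual code: the function names build_commands and ocr_image are hypothetical, and it assumes the basic command-line forms "convert input output" and "tesseract image outputbase" (Tesseract writes its result to outputbase.txt).

```python
import os
import subprocess
import tempfile


def build_commands(image_path, tiff_path, output_base,
                   convert_bin="convert", tesseract_bin="tesseract"):
    """Build the two command lines: image -> TIFF, then TIFF -> text."""
    convert_cmd = [convert_bin, image_path, tiff_path]
    # Tesseract appends ".txt" to the output base name itself.
    tesseract_cmd = [tesseract_bin, tiff_path, output_base]
    return convert_cmd, tesseract_cmd


def ocr_image(image_path, convert_bin="convert", tesseract_bin="tesseract"):
    """Run both tools and return the detected text (hypothetical helper)."""
    # The temporary directory, and the TIFF inside it, is deleted on exit,
    # mirroring the "TIFF is discarded" behaviour described above.
    with tempfile.TemporaryDirectory(prefix="ocr_") as workdir:
        tiff_path = os.path.join(workdir, "page.tif")
        output_base = os.path.join(workdir, "page")
        convert_cmd, tesseract_cmd = build_commands(
            image_path, tiff_path, output_base, convert_bin, tesseract_bin)
        subprocess.run(convert_cmd, check=True)
        subprocess.run(tesseract_cmd, check=True)
        with open(output_base + ".txt", encoding="utf-8") as handle:
            return handle.read()
```

Calling ocr_image("scan.png") would require both tools to be installed; build_commands alone can be used to inspect the commands without running anything.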

Where is the detected text stored?

The text detected by the OCR plugin is added to the image as a custom field named ocr_text. See http://codex.wordpress.org/Custom_Fields for instructions on using the ocr_text field in your templates.

Where can I edit the detected text?

The text detected by the OCR plugin is available in a text area labeled 'OCR Text', both in the 'Add an Image' modal while attaching an image to a post and while editing a previously uploaded image under the 'Media' section of your WordPress install.

What is the 'Resize percentage' configuration option?

The OCR plugin is tailored to detecting text in images with ~12pt text at 72dpi. ImageMagick is used to upscale the temporary TIFF images fed to Tesseract, because Tesseract is generally more accurate with larger type, even if it has been upscaled from a smaller source. To disable upscaling, set this option to 100% and no resizing will occur.
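
As a rough illustration of what the percentage means, the Python sketch below computes the scaled dimensions and the matching ImageMagick argument. The function names are hypothetical, and how the plugin actually passes the value to convert is an assumption; -resize with a percentage geometry is a standard ImageMagick option, though its exact rounding of pixel sizes may differ from the arithmetic here.

```python
def scaled_size(width, height, percent):
    """Return the (width, height) of an image scaled by the given percentage."""
    factor = percent / 100
    return round(width * factor), round(height * factor)


def resize_args(percent):
    """Return the extra convert arguments for a resize percentage."""
    # At 100% the image is left untouched, so no resize option is needed.
    if percent == 100:
        return []
    return ["-resize", f"{percent}%"]


if __name__ == "__main__":
    print(scaled_size(400, 300, 200))  # (800, 600)
    print(resize_args(200))            # ['-resize', '200%']
    print(resize_args(100))            # []
```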

Will the OCR plugin work on versions of WordPress other than 2.9?

Possibly. The OCR plugin simply hasn't been tested on any other versions.

Requires: 2.9 or higher
Compatible up to: 2.9.2
Last Updated: 2010-4-8
Downloads: 560

Ratings

0 out of 5 stars
