Humanitext OCR

Humanitext OCR is an Optical Character Recognition (OCR) platform built on Gemini, Google's multimodal AI model. Beyond transcribing characters, it lets users give instructions in natural language to extract specific parts of a document or to format complex, structured information as JSON. It also offers an optional AI-powered auto-correction function that fixes recognition errors common in traditional OCR. From digitizing research materials to streamlining daily data entry, Humanitext OCR is designed to handle a wide range of transcription tasks with high precision and flexibility.

Highlights

  • High-precision text recognition powered by Gemini AI
  • Flexible extraction control with natural language instructions
  • Structured JSON output based on user-defined schemas
  • Optional AI-powered auto-correction to further improve accuracy

What is Humanitext OCR?

Humanitext OCR (Humanitext Optical Character Recognition) transcends the traditional framework of “recognizing characters based on fixed rules.” Instead, it fully leverages the “contextual understanding” and “instruction-following” capabilities of AI.

At the core of the system, the user uploads a PDF or image file and adds free-form instructions in a text box, such as, “This is a XX document. Ignore the header and footer, and extract only the main body of the text.” In the backend, the uploaded image and the instruction text are sent together to the Gemini model, which interprets the instructions much as a human reader would and performs the OCR accordingly.
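The pairing of image and instruction described above can be sketched as a single request body. This is a minimal illustration, not Humanitext OCR's actual backend code, which the text does not describe. The function name `build_ocr_request` is hypothetical, and the dictionary layout is an assumption modeled on the publicly documented Gemini `generateContent` REST request (a list of `parts` mixing text and base64 inline data). The sketch only builds the payload and does not call any API.

```python
import base64
import json


def build_ocr_request(image_bytes: bytes, mime_type: str, instruction: str) -> dict:
    """Bundle one page image and the user's instruction into one request body.

    Hypothetical helper: the layout is assumed from the public Gemini REST
    documentation, not taken from Humanitext OCR itself.
    """
    return {
        "contents": [
            {
                "parts": [
                    # The free-form instruction typed into the text box.
                    {"text": instruction},
                    # The uploaded page, base64-encoded so it fits in JSON.
                    {
                        "inline_data": {
                            "mime_type": mime_type,
                            "data": base64.b64encode(image_bytes).decode("ascii"),
                        }
                    },
                ]
            }
        ]
    }


# Placeholder bytes stand in for a real scanned page.
request_body = build_ocr_request(
    b"\x89PNG placeholder",
    "image/png",
    "Ignore the header and footer, and extract only the main body of the text.",
)
print(json.dumps(request_body, indent=2))
```

Because the instruction travels in the same request as the image, changing the wording of the instruction is enough to change what gets extracted; no per-document code is needed.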

This enables anyone to easily perform tasks that previously required manual correction or custom scripting, such as extracting specific information from documents.

Purpose: To Transform Any Document into Meaningful Data

We are surrounded by documents where text is embedded as images: scanned academic papers, photos of old books, PDF meeting minutes, and more. To make the information in these documents reusable, a highly accurate and flexible OCR is essential.

Humanitext OCR was created to solve this challenge. Its aim is to bring documents back not just as strings of text, but as “meaningful, structured data.”

  • For Researchers and Students: Dramatically streamline tedious data conversion tasks in academic work, such as extracting citations from papers and historical sources, digitizing field notes that include handwritten text, or creating structured data from a bibliography.

  • For Business Professionals: Automate data entry for both structured and unstructured business documents, such as extracting specific items from invoices and receipts in JSON format or pulling only the action items from meeting minutes.

  • For Everyone: Eliminate the hassle of various daily “transcription” tasks, whether it’s digitizing the content of a physical book or converting the information from a photographed poster into text.

Core Functions and How to Use Them

Humanitext OCR achieves high-precision results through a simple, two-step process.

1. File Upload and Configuration

First, upload the PDF (one file at a time) or image files (multiple allowed) you want to process. Then, configure the following settings as needed:

  • Instructions for the LLM: In the text area, enter specific requests regarding the OCR process (e.g., “This text contains a mix of Latin and Greek. Please insert a page number like [p.XX] at the beginning of each page.”).
  • Output Format Selection:
    • Text file: Outputs the extracted results in a free-form text format.
    • JSON file: Outputs structured JSON data. Using a GUI, you define the “keys” and “types” (single, list, nested, etc.) of the information you want to extract, which makes the results much easier to reuse downstream.
  • Auto-Correction: Checking the “Perform auto-correction of OCR by LLM” box will have the AI compare the initial OCR result with the image again and automatically fix any errors. This improves accuracy but roughly doubles the processing time.
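To make the JSON option above more concrete, here is a minimal sketch of how key and type definitions like those entered in the GUI could be turned into a JSON Schema. Everything in it is an assumption for illustration: the field format (`key`, `type`, `fields`), the function names `field_to_schema` and `build_schema`, and the choice to treat every leaf value as a string. The actual internal representation used by Humanitext OCR is not documented here.

```python
import json


def field_to_schema(field: dict) -> dict:
    """Convert one hypothetical GUI field definition into a JSON Schema fragment."""
    kind = field["type"]
    if kind == "single":
        # A single value; assumed to be a string for simplicity.
        return {"type": "string"}
    if kind == "list":
        # A repeated value, e.g. every line item on an invoice.
        return {"type": "array", "items": {"type": "string"}}
    if kind == "nested":
        # A group of sub-fields, handled recursively.
        return build_schema(field["fields"])
    raise ValueError(f"unknown field type: {kind}")


def build_schema(fields: list) -> dict:
    """Build an object schema from a list of field definitions."""
    return {
        "type": "object",
        "properties": {f["key"]: field_to_schema(f) for f in fields},
        "required": [f["key"] for f in fields],
    }


# Example definition for an invoice (illustrative keys only).
invoice_fields = [
    {"key": "invoice_number", "type": "single"},
    {"key": "line_items", "type": "list"},
    {
        "key": "vendor",
        "type": "nested",
        "fields": [
            {"key": "name", "type": "single"},
            {"key": "address", "type": "single"},
        ],
    },
]

schema = build_schema(invoice_fields)
print(json.dumps(schema, indent=2))
```

A schema of this kind gives every output file the same shape, so results from many pages can be loaded into a spreadsheet or database without per-file cleanup.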

2. Test Run and Final Processing

Once configured, processing runs in two stages: a test run on the first page, followed by the full run on everything else.

  • Step 1: Run OCR Test. This processes only the first page of your PDF (or the first image) and shows you a preview of the result. You can use this to verify whether your instructions and settings were appropriate.

  • Step 2: Process the Rest with This Setting. If you are satisfied with the test result, clicking this button will process all remaining pages or files with the same settings. After processing is complete, you can download the individual files or a single ZIP file containing all the results. If the test result is unsatisfactory, the “Redo” button allows you to go back and adjust the settings.
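The test-then-process flow, together with the optional auto-correction pass, can be sketched as follows. This is an illustrative outline under stated assumptions, not the application's real code: `Settings`, `ocr_page`, `run_test`, and `process_rest` are hypothetical names, the model is represented by an injected function `call_model(page, prompt)`, the correction prompt wording is invented, and reusing the test-run result for the first page is an assumption.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical model interface: takes a page and a prompt, returns text.
ModelCall = Callable[[str, str], str]


@dataclass
class Settings:
    instruction: str
    auto_correct: bool = False


def ocr_page(page: str, settings: Settings, call_model: ModelCall) -> str:
    """OCR one page, optionally followed by a second correction pass."""
    text = call_model(page, settings.instruction)
    if settings.auto_correct:
        # Second model call on the same page: this is why auto-correction
        # roughly doubles the processing time.
        prompt = "Compare this OCR result with the image and fix errors:\n" + text
        text = call_model(page, prompt)
    return text


def run_test(pages: List[str], settings: Settings, call_model: ModelCall) -> str:
    """Step 1: process only the first page and return it as a preview."""
    return ocr_page(pages[0], settings, call_model)


def process_rest(pages: List[str], settings: Settings,
                 call_model: ModelCall, preview: str) -> List[str]:
    """Step 2: process the remaining pages with the same settings."""
    return [preview] + [ocr_page(p, settings, call_model) for p in pages[1:]]


# A stand-in model so the sketch runs without any network access.
calls = []


def fake_model(page: str, prompt: str) -> str:
    calls.append(page)
    return f"text of {page}"


pages = ["page-1", "page-2", "page-3"]
settings = Settings(instruction="Extract only the main body.", auto_correct=True)

preview = run_test(pages, settings, fake_model)
results = process_rest(pages, settings, fake_model, preview)
print(results)
print(len(calls), "model calls for", len(pages), "pages")
```

Running the test on a single page first keeps the cost of a badly worded instruction small: only one page is wasted before the settings are adjusted.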


Regarding Bulk File Processing (Batch Processing)

Humanitext OCR also has a more advanced batch processing function capable of handling hundreds or thousands of PDF pages in parallel. While this feature is not publicly available, we offer it on a consultation basis for research institutions or individuals with special requirements, such as large-scale digitization projects. If you are interested, please contact the project team for more information.
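As a rough idea of what processing pages in parallel involves, the sketch below fans pages out to a thread pool and collects the results in page order. It is not the batch feature itself, whose design is not public. The function name `ocr_pages_in_parallel`, the `ocr_one` callable, and the worker count are all assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List


def ocr_pages_in_parallel(pages: List[str],
                          ocr_one: Callable[[str], str],
                          max_workers: int = 8) -> List[str]:
    """Run a per-page OCR function over many pages concurrently."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map returns results in input order, even when
        # individual pages finish out of order.
        return list(pool.map(ocr_one, pages))


# Stand-in for a real per-page OCR call (hypothetical).
def fake_ocr(page: str) -> str:
    return page.upper()


batch = ocr_pages_in_parallel([f"page-{i}" for i in range(1, 6)], fake_ocr, max_workers=3)
print(batch)
```

Threads suit this kind of work because each page spends most of its time waiting on a remote model response rather than using the local CPU.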