Build a Free Text CAPTCHA Solver in PHP Using Tesseract OCR

Solve basic text CAPTCHAs in PHP without using any paid API. Here’s a fully working approach using Tesseract OCR and a simple PHP wrapper.

This guide walks you through everything: installing dependencies, writing the script, preprocessing noisy images, and understanding when this method stops working.

1. What You’ll Need

Tesseract OCR (engine)

On Linux:

sudo apt update
sudo apt install tesseract-ocr -y

On Windows: download a Tesseract installer, run it, and make sure tesseract.exe is in your PATH.

PHP wrapper

composer require thiagoalessio/tesseract_ocr
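
Before writing any OCR code, it can help to confirm that PHP can actually run the Tesseract binary installed above. The following is a minimal sketch of such a check, not part of the wrapper: `tesseractVersion()` is a hypothetical helper name, and it assumes `exec()` is enabled in your PHP configuration.

```php
<?php

// Hypothetical helper (not part of the wrapper): returns the first line of
// `tesseract --version`, or null if the binary cannot be run from PATH.
function tesseractVersion(): ?string
{
    if (!function_exists('exec')) {
        return null; // exec() is unavailable in this PHP configuration
    }

    $output = [];
    $exitCode = 1;
    // 2>&1 merges stderr into stdout so error text is captured as well.
    exec('tesseract --version 2>&1', $output, $exitCode);

    // A non-zero exit code means the shell could not run the binary.
    if ($exitCode !== 0 || $output === []) {
        return null;
    }

    return trim($output[0]);
}

$version = tesseractVersion();
echo $version === null
    ? "Tesseract was not found in PATH\n"
    : "Found: $version\n";
```

If this prints the "not found" message, fix the installation or your PATH before moving on, since the wrapper relies on the same binary.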

2. Minimal OCR script in PHP

<?php

require __DIR__ . '/vendor/autoload.php';

use thiagoalessio\TesseractOCR\TesseractOCR;

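// Load the image and restrict recognition to uppercase letters and digits.
// Tesseract variables are passed to the wrapper through config().
$ocr = new TesseractOCR('captcha.jpg');
$ocr->lang('eng');
$ocr->config('tessedit_char_whitelist', 'ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789');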

$text = $ocr->run();
$cleaned = preg_replace('/[^A-Z0-9]/', '', strtoupper($text));

echo "Recognized text: $cleaned\n";

This works for clean images with no distortion. For anything blurry or noisy, keep reading.
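
Even on clean images, raw OCR output can contain spaces, newlines, or a dropped character. If you know how many characters the CAPTCHA contains, one option is to reject reads of the wrong length instead of submitting them. The sketch below shows the idea; `normalizeCaptchaText()` is a hypothetical helper, and the expected length of 4 is only an assumption for the example.

```php
<?php

// Hypothetical helper: cleans raw OCR output and rejects results whose
// length does not match what the CAPTCHA is known to contain.
function normalizeCaptchaText(string $raw, int $expectedLength): ?string
{
    // Same cleanup as the script above: uppercase, keep only A-Z and 0-9.
    $cleaned = (string) preg_replace('/[^A-Z0-9]/', '', strtoupper($raw));

    // A wrong length usually means a misread, so signal failure with null.
    return strlen($cleaned) === $expectedLength ? $cleaned : null;
}

var_dump(normalizeCaptchaText(" ab 12\n", 4)); // string(4) "AB12"
var_dump(normalizeCaptchaText("AB1", 4));      // NULL
```

A null result is a cheap signal to retry with a preprocessed image rather than trusting a partial read.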

3. Optional: Preprocessing the image

Tesseract isn't magic. Garbage in = garbage out. Preprocess the image with PHP’s GD library:

<?php

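// imagecreatefromjpeg() returns false if the file is missing or not a valid JPEG.
$im = imagecreatefromjpeg('captcha.jpg');
if ($im === false) {
    exit("Could not read captcha.jpg\n");
}
imagefilter($im, IMG_FILTER_GRAYSCALE);      // drop color information
imagefilter($im, IMG_FILTER_CONTRAST, -50);  // negative values increase contrast
imagejpeg($im, 'captcha_prepared.jpg');
imagedestroy($im);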

Then change the OCR input to 'captcha_prepared.jpg'.
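
Grayscale and contrast can still leave mid-gray speckles behind. A common next step is a hard threshold (binarization), which forces every pixel to pure black or pure white. The following is a rough sketch using GD. `thresholdLevel()` and `binarize()` are hypothetical helpers, and the cutoff of 128 is an assumed starting value you would tune per CAPTCHA.

```php
<?php

// Hypothetical helper: maps a 0-255 gray level to black (0) or white (255).
function thresholdLevel(int $gray, int $cutoff = 128): int
{
    return $gray < $cutoff ? 0 : 255;
}

// Hypothetical helper: binarizes a GD image in place, pixel by pixel.
function binarize($im, int $cutoff = 128): void
{
    // imagecolorat() only returns packed RGB values for truecolor images.
    if (!imageistruecolor($im)) {
        imagepalettetotruecolor($im);
    }

    $width = imagesx($im);
    $height = imagesy($im);
    $black = imagecolorallocate($im, 0, 0, 0);
    $white = imagecolorallocate($im, 255, 255, 255);

    for ($y = 0; $y < $height; $y++) {
        for ($x = 0; $x < $width; $x++) {
            $rgb = imagecolorat($im, $x, $y);
            // Average the red, green, and blue channels into one gray level.
            $sum = (($rgb >> 16) & 0xFF) + (($rgb >> 8) & 0xFF) + ($rgb & 0xFF);
            $level = thresholdLevel(intdiv($sum, 3), $cutoff);
            imagesetpixel($im, $x, $y, $level === 0 ? $black : $white);
        }
    }
}
```

To try it, call `binarize($im)` after the two `imagefilter()` calls and before the image is saved. Because JPEG compression is lossy and can reintroduce gray fringes around the letters, saving the result with `imagepng()` instead may be worth testing.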

4. Generate a test CAPTCHA (for development)

Using ImageMagick:

convert -size 150x60 xc:white -font Arial -pointsize 30 -fill black -draw "text 20,40 'AB12'" captcha.jpg
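
If ImageMagick is not available, GD can produce a comparable test image from PHP itself. The sketch below is one way to do it, not a replacement for the command above: `makeTestCaptcha()` is a hypothetical helper, it uses GD's built-in bitmap font instead of Arial, and it assumes GD was built with JPEG support.

```php
<?php

// Hypothetical helper: draws $text in black on white and writes a JPEG.
// Returns true on success. Assumes the GD extension is loaded.
function makeTestCaptcha(string $text, string $path): bool
{
    // GD's built-in font 5 is small (roughly 9x15 pixels per character),
    // so draw on a small canvas and scale up 3x to get larger glyphs.
    $small = imagecreatetruecolor(50, 20);
    $white = imagecolorallocate($small, 255, 255, 255);
    $black = imagecolorallocate($small, 0, 0, 0);
    imagefill($small, 0, 0, $white);
    imagestring($small, 5, 7, 2, $text, $black);

    // Scale to the same 150x60 size used by the ImageMagick command.
    $large = imagescale($small, 150, 60);
    if ($large === false) {
        return false;
    }

    return imagejpeg($large, $path);
}
```

Calling `makeTestCaptcha('AB12', 'captcha.jpg')` writes a 150x60 image that the scripts in this guide can read. The glyphs will look blockier than a TrueType font, so recognition results may differ from the ImageMagick version.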

5. Troubleshooting

Problem              | Cause                               | Solution
---------------------|-------------------------------------|-----------------------------
tesseract: not found | Binary not installed or not in PATH | Install or fix your PATH
Empty result         | Bad quality input                   | Apply grayscale and contrast
File missing         | Wrong filename or missing image     | Double-check the path

6. Full working script with preprocessing

<?php

require __DIR__ . '/vendor/autoload.php';

use thiagoalessio\TesseractOCR\TesseractOCR;

// Step 1: Preprocess image
$source = 'captcha.jpg';
$prepared = 'captcha_prepared.jpg';

$im = imagecreatefromjpeg($source);
if ($im === false) {
    exit("Could not read $source\n");
}
imagefilter($im, IMG_FILTER_GRAYSCALE);
imagefilter($im, IMG_FILTER_CONTRAST, -50);
imagejpeg($im, $prepared);
imagedestroy($im);

// Step 2: Run OCR, restricted to uppercase letters and digits
$ocr = new TesseractOCR($prepared);
$ocr->lang('eng');
$ocr->config('tessedit_char_whitelist', 'ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789');

$text = $ocr->run();
$cleaned = preg_replace('/[^A-Z0-9]/', '', strtoupper($text));

echo "Recognized text: $cleaned\n";
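
A single pass with one contrast setting can misread a character. One possible extension, which goes beyond the script above, is to run the preprocess-and-OCR steps several times with different contrast values and keep the result that appears most often. The sketch below covers only the voting step; `pickMostFrequent()` is a hypothetical helper, and the list of reads is made-up sample data.

```php
<?php

// Hypothetical helper: given several cleaned OCR reads of the same image,
// returns the one that occurs most often, or null if every read was empty.
function pickMostFrequent(array $reads): ?string
{
    // Drop empty strings so a failed read never wins the vote.
    $nonEmpty = array_filter($reads, fn($r) => is_string($r) && $r !== '');
    if ($nonEmpty === []) {
        return null;
    }

    $counts = array_count_values($nonEmpty);
    arsort($counts); // highest count first

    // Numeric strings become integer keys, so cast back to string.
    return (string) array_key_first($counts);
}

// Sample data: imagine these came from contrast values -30, -50, -70, -90.
echo pickMostFrequent(['AB12', 'A812', 'AB12', '']), "\n"; // AB12
```

Each extra pass costs another Tesseract invocation, so this trades speed for a somewhat more stable answer.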

7. When this fails

This method only solves simple alphanumeric image CAPTCHAs. It fails completely on:

  • reCAPTCHA v2/v3
  • hCaptcha
  • Cloudflare Turnstile
  • GeeTest
  • Anything with heavy visual distortion, dense background noise, animation, or JavaScript logic

If you’re dealing with modern, smart CAPTCHAs, you need a real API like SolveCaptcha, which supports all major CAPTCHA systems including reCAPTCHA, hCaptcha, and Turnstile via token injection.

SolveCaptcha offers:

  • Fast response times
  • Support for all major CAPTCHA types
  • Simple API integration in any language

8. Other languages supported

You don’t have to stick with PHP. CAPTCHA solvers such as 2Captcha and SolveCaptcha work via API, so you can use them with:

  • Python
  • Java
  • Node.js
  • Go
  • Ruby
  • C#
  • Shell scripts