Build a Free Text CAPTCHA Solver in PHP Using Tesseract OCR
Solve basic text CAPTCHAs in PHP without using any paid API. Here’s a fully working approach using Tesseract OCR and a simple PHP wrapper.
This guide walks you through everything: installing dependencies, writing the script, preprocessing noisy images, and understanding when this method stops working.
1. What You’ll Need
Tesseract OCR (engine)
On Linux:
sudo apt update
sudo apt install tesseract-ocr -y
On Windows: download the Tesseract installer, run it, and make sure tesseract.exe is in your PATH.
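Before going further, it can help to confirm that PHP actually sees the binary. The following is a minimal sketch, not part of the wrapper library: `findOnPath` is a hypothetical helper name, and it only checks the directories listed in the PATH environment variable.

```php
<?php
// Hypothetical helper: search each PATH directory for an executable.
// Returns the full path, or null when nothing is found.
function findOnPath(string $binary, ?string $pathEnv = null): ?string
{
    $pathEnv = $pathEnv ?? (getenv('PATH') ?: '');
    // On Windows the file is usually named tesseract.exe
    $isWindows = DIRECTORY_SEPARATOR === '\\';
    $names = $isWindows ? [$binary . '.exe', $binary] : [$binary];

    foreach (explode(PATH_SEPARATOR, $pathEnv) as $dir) {
        if ($dir === '') {
            continue;
        }
        foreach ($names as $name) {
            $candidate = rtrim($dir, '/\\') . DIRECTORY_SEPARATOR . $name;
            if (is_file($candidate) && is_executable($candidate)) {
                return $candidate;
            }
        }
    }
    return null;
}

$path = findOnPath('tesseract');
echo $path === null
    ? "tesseract not found on PATH\n"
    : "tesseract found at $path\n";
```

If this reports that the binary is missing, fix the installation or PATH before installing the PHP wrapper.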
PHP wrapper
composer require thiagoalessio/tesseract_ocr
2. Minimal OCR script in PHP
<?php
require __DIR__ . '/vendor/autoload.php';
use thiagoalessio\TesseractOCR\TesseractOCR;
$ocr = new TesseractOCR('captcha.jpg');
$ocr->lang('eng');
// Restrict recognition to uppercase letters and digits
$ocr->allowlist(range('A', 'Z'), range(0, 9));
$text = $ocr->run();
$cleaned = preg_replace('/[^A-Z0-9]/', '', strtoupper($text));
echo "Recognized text: $cleaned\n";
Works fine for clean images with no distortions. For anything blurry or noisy, keep reading.
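The cleanup step at the end of the script matters because raw OCR output often contains spaces, newlines, or stray punctuation. Here is a small sketch that isolates it. The function name `cleanCaptchaText` and the optional `$expectedLength` check are illustrative assumptions, useful only if you know how many characters the CAPTCHA should have.

```php
<?php
// Hypothetical helper: reduce raw OCR output to uppercase letters and digits.
// Returns null when $expectedLength is given and the cleaned text has a
// different length, so the caller can retry or give up.
function cleanCaptchaText(string $raw, ?int $expectedLength = null): ?string
{
    $cleaned = (string) preg_replace('/[^A-Z0-9]/', '', strtoupper($raw));
    if ($expectedLength !== null && strlen($cleaned) !== $expectedLength) {
        return null;
    }
    return $cleaned;
}

echo cleanCaptchaText(" ab 12\n"), "\n";   // prints AB12
var_dump(cleanCaptchaText("A B1", 4));     // NULL: only 3 usable characters
```

A length mismatch is a cheap signal that the recognition went wrong and the image needs preprocessing.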
3. Optional: Preprocessing the image
Tesseract isn't magic. Garbage in = garbage out. Preprocess the image with PHP’s GD library:
<?php
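$im = imagecreatefromjpeg('captcha.jpg');
if ($im === false) {
    exit("Could not read captcha.jpg\n");
}
// Drop color information so only brightness remains
imagefilter($im, IMG_FILTER_GRAYSCALE);
// In GD, negative values increase contrast
imagefilter($im, IMG_FILTER_CONTRAST, -50);
imagejpeg($im, 'captcha_prepared.jpg');
imagedestroy($im);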
Then change the OCR input to 'captcha_prepared.jpg'.
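If grayscale and contrast are not enough, a further step that sometimes helps is binarization: forcing every pixel to pure black or pure white. The sketch below is one possible approach, not a guaranteed fix. The helper names `isDark` and `binarize`, the fixed threshold of 128, and the output name `captcha_prepared.png` are all assumptions for illustration. It expects a truecolor image, which is what `imagecreatefromjpeg` returns.

```php
<?php
// Decide whether a packed 0xRRGGBB pixel is darker than the threshold.
function isDark(int $rgb, int $threshold = 128): bool
{
    $r = ($rgb >> 16) & 0xFF;
    $g = ($rgb >> 8) & 0xFF;
    $b = $rgb & 0xFF;
    // Integer luminance approximation using common 299/587/114 weights
    $luminance = intdiv($r * 299 + $g * 587 + $b * 114, 1000);
    return $luminance < $threshold;
}

// Turn every pixel of a truecolor GD image into pure black or pure white.
function binarize($im, int $threshold = 128): void
{
    $black = imagecolorallocate($im, 0, 0, 0);
    $white = imagecolorallocate($im, 255, 255, 255);
    $width = imagesx($im);
    $height = imagesy($im);
    for ($y = 0; $y < $height; $y++) {
        for ($x = 0; $x < $width; $x++) {
            $color = isDark(imagecolorat($im, $x, $y), $threshold) ? $black : $white;
            imagesetpixel($im, $x, $y, $color);
        }
    }
}

// Usage, guarded so the snippet does nothing if GD or the file is missing.
if (function_exists('imagecreatefromjpeg') && is_file('captcha.jpg')) {
    $im = imagecreatefromjpeg('captcha.jpg');
    if ($im !== false) {
        binarize($im);
        // PNG is lossless, so it does not add new JPEG artifacts
        imagepng($im, 'captcha_prepared.png');
    }
}
```

If you use this variant, point the OCR input at 'captcha_prepared.png' and experiment with the threshold, since the best value depends on the image.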
4. Generate a test CAPTCHA (for development)
Using ImageMagick (on ImageMagick 7, use magick in place of convert):
convert -size 150x60 xc:white -font Arial -pointsize 30 -fill black -draw "text 20,40 'AB12'" captcha.jpg
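If ImageMagick is not installed, a similar test image can be produced with GD alone. This is a rough sketch under a few assumptions: `makeTestCaptcha` is a hypothetical helper, it uses GD's built-in bitmap font (which is small, so the image is drawn at 50x20 and scaled up to 150x60), and the result will look blockier than the ImageMagick version.

```php
<?php
// Hypothetical helper: draw black text on a white background and save as JPEG.
function makeTestCaptcha(string $text, string $file): bool
{
    // Draw small, because GD's built-in font 5 is only about 9x15 pixels
    $small = imagecreatetruecolor(50, 20);
    $white = imagecolorallocate($small, 255, 255, 255);
    $black = imagecolorallocate($small, 0, 0, 0);
    imagefilledrectangle($small, 0, 0, 49, 19, $white);
    imagestring($small, 5, 7, 2, $text, $black);

    // Scale up 3x without smoothing so the glyph edges stay sharp
    $large = imagescale($small, 150, 60, IMG_NEAREST_NEIGHBOUR);
    if ($large === false) {
        return false;
    }
    return imagejpeg($large, $file, 95);
}

// Guarded so the snippet does nothing when the GD extension is missing.
if (function_exists('imagecreatetruecolor')) {
    makeTestCaptcha('AB12', 'captcha.jpg');
}
```

Because you know the expected text, an image like this is handy for checking that the whole pipeline runs before trying real inputs.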
5. Troubleshooting
| Problem | Cause | Solution |
|---|---|---|
| tesseract: not found | Binary not installed or not in PATH | Install Tesseract or fix your PATH |
| Empty result | Poor-quality input | Apply grayscale and contrast |
| File missing | Wrong filename or missing image | Double-check the path |
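The file-related rows in the table can be checked up front, before Tesseract is called at all. The sketch below assumes a hypothetical helper named `diagnoseInput`; it relies only on core PHP functions (`getimagesize` does not need the GD extension).

```php
<?php
// Hypothetical helper: return a description of the problem,
// or null if the input file looks usable.
function diagnoseInput(string $file): ?string
{
    if (!is_file($file)) {
        return "File missing: $file";
    }
    if (!is_readable($file)) {
        return "File not readable: $file";
    }
    // getimagesize() returns false for files it cannot identify as images
    if (@getimagesize($file) === false) {
        return "Not a recognizable image: $file";
    }
    return null;
}

$problem = diagnoseInput('captcha.jpg');
echo $problem ?? 'Input looks fine', "\n";
```

Running a check like this first makes it easier to tell a bad path apart from a genuine OCR failure.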
6. Full working script with preprocessing
<?php
require __DIR__ . '/vendor/autoload.php';
use thiagoalessio\TesseractOCR\TesseractOCR;
// Step 1: Preprocess image
$source = 'captcha.jpg';
$prepared = 'captcha_prepared.jpg';
$im = imagecreatefromjpeg($source);
if ($im === false) {
    exit("Could not read $source\n");
}
imagefilter($im, IMG_FILTER_GRAYSCALE);
imagefilter($im, IMG_FILTER_CONTRAST, -50);
imagejpeg($im, $prepared);
imagedestroy($im);
// Step 2: Run OCR
$ocr = new TesseractOCR($prepared);
$ocr->lang('eng');
// Restrict recognition to uppercase letters and digits
$ocr->allowlist(range('A', 'Z'), range(0, 9));
$text = $ocr->run();
$cleaned = preg_replace('/[^A-Z0-9]/', '', strtoupper($text));
echo "Recognized text: $cleaned\n";
7. When this fails
This method only solves simple alphanumeric image CAPTCHAs with clean, readable text. It fails completely on:
- reCAPTCHA v2/v3
- hCaptcha
- Cloudflare Turnstile
- GeeTest
- Anything with heavy visual distortion, dense background noise, animation, or JavaScript logic
If you’re dealing with modern CAPTCHAs like these, you need a solving API such as SolveCaptcha, which supports the major CAPTCHA systems, including reCAPTCHA, hCaptcha, and Turnstile, via token injection.
It offers:
- Fast response times
- Support for all major CAPTCHA types
- Simple API integration in any language
8. Other languages supported
You don’t have to stick with PHP. CAPTCHA solvers like 2Captcha and SolveCaptcha work via API, so you can use them with:
- Python
- Java
- Node.js
- Go
- Ruby
- C#
- Shell scripts