CAPTCHA Considerations: Handle Bot Detection Like a Pro
CAPTCHAs are the gatekeepers of the web: puzzles designed to separate humans from bots. They're like bouncers at an exclusive club, checking whether you're on the guest list. From simple text puzzles to complex behavioral analysis, CAPTCHAs protect websites from automated access. But with the right approach, patience, and tools, we can handle them ethically and effectively. Let's master the art of CAPTCHA handling!
The CAPTCHA Ecosystem
Think of CAPTCHA handling as a chess game where each move must be carefully calculated. You're not trying to "break" the CAPTCHA but rather solve it legitimately, either manually, through services, or by avoiding it altogether through smart automation practices. It's about finding the right balance between automation efficiency and respecting website security!
Real-World Scenario: The Data Intelligence Platform
You're building a competitive intelligence platform that monitors product listings, prices, and reviews across multiple e-commerce sites. These sites use various CAPTCHA systems to prevent automated access. You need to handle reCAPTCHA, hCaptcha, image puzzles, and behavioral detection while maintaining efficiency and staying within legal boundaries. Let's build a comprehensive CAPTCHA handling system!
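Before the full system, the overall decision flow can be sketched in a few lines. This is only an illustration under stated assumptions: `plan_strategies` and the `PREFERENCES` table are hypothetical names, and the orderings are made up for the example. The one idea it encodes from the text is that avoidance comes first and solving is a fallback.

```python
from typing import Dict, List

# Hypothetical preference table: for each CAPTCHA family named in the
# scenario, the strategies to try in order. The orderings are assumptions.
PREFERENCES: Dict[str, List[str]] = {
    "text": ["avoidance", "ocr", "service", "manual"],
    "image": ["avoidance", "service", "manual"],
    "recaptcha_v2": ["avoidance", "service", "manual"],
    "hcaptcha": ["avoidance", "service", "manual"],
    "behavioral": ["avoidance", "manual"],
}


def plan_strategies(captcha_type: str, has_service_key: bool) -> List[str]:
    """Return the ordered strategies to try for one CAPTCHA type."""
    # Unknown types fall back to avoidance, then a human
    plan = PREFERENCES.get(captcha_type, ["avoidance", "manual"])
    if not has_service_key:
        # Without an API key, paid solving services are not an option
        plan = [s for s in plan if s != "service"]
    return list(plan)


if __name__ == "__main__":
    print(plan_strategies("text", has_service_key=False))
    print(plan_strategies("hcaptcha", has_service_key=True))
```

Keeping the plan as data rather than branching logic makes it easy to adjust per site once you see which strategies actually work there.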
# First, install the required packages (pytesseract also needs the Tesseract OCR binary):
# pip install selenium pillow opencv-python pytesseract requests
# Optional official service clients: pip install 2captcha-python anticaptchaofficial
import base64
import logging
import random
import re
import time
from dataclasses import dataclass, field
from datetime import datetime
from enum import Enum
from io import BytesIO
from typing import Any, Dict, List, Optional, Tuple, Union

import cv2
import numpy as np
import requests
from PIL import Image
from selenium import webdriver
from selenium.webdriver.common.action_chains import ActionChains
from selenium.webdriver.common.by import By

# ==================== CAPTCHA Types ====================


class CaptchaType(Enum):
    """Types of CAPTCHAs."""

    TEXT = "text"
    IMAGE = "image"
    RECAPTCHA_V2 = "recaptcha_v2"
    RECAPTCHA_V3 = "recaptcha_v3"
    HCAPTCHA = "hcaptcha"
    FUNCAPTCHA = "funcaptcha"
    GEETEST = "geetest"
    AUDIO = "audio"
    SLIDER = "slider"
    ROTATION = "rotation"
    PUZZLE = "puzzle"


class SolverStrategy(Enum):
    """CAPTCHA solving strategies."""

    MANUAL = "manual"
    SERVICE = "service"
    OCR = "ocr"
    AUDIO = "audio"
    BEHAVIORAL = "behavioral"
    AVOIDANCE = "avoidance"


@dataclass
class CaptchaChallenge:
    """Represents a CAPTCHA challenge."""

    type: CaptchaType
    site_key: Optional[str] = None
    page_url: Optional[str] = None
    image_url: Optional[str] = None
    image_data: Optional[bytes] = None
    challenge_data: Dict[str, Any] = field(default_factory=dict)
    timestamp: datetime = field(default_factory=datetime.now)
    attempts: int = 0
    solved: bool = False
    solution: Optional[str] = None


# ==================== CAPTCHA Detector ====================


class CaptchaDetector:
    """
    Detect and identify CAPTCHA types on web pages.
    """

    def __init__(self, driver: webdriver.Chrome):
        self.driver = driver
        self.logger = logging.getLogger(__name__)
        # CAPTCHA signatures: entries starting with "//" are XPath selectors,
        # anything else is treated as a JavaScript expression
        self.signatures = {
            CaptchaType.RECAPTCHA_V2: [
                "//iframe[contains(@src, 'recaptcha') and contains(@src, 'anchor')]",
                "//div[@class='g-recaptcha']",
                "//div[contains(@class, 'grecaptcha')]"
            ],
            CaptchaType.RECAPTCHA_V3: [
                "//script[contains(@src, 'recaptcha/api.js?render=')]",
                "grecaptcha.execute"
            ],
            CaptchaType.HCAPTCHA: [
                "//iframe[contains(@src, 'hcaptcha.com/captcha')]",
                "//div[@class='h-captcha']",
                "//div[contains(@class, 'hcaptcha')]"
            ],
            CaptchaType.FUNCAPTCHA: [
                "//div[@id='funcaptcha']",
                "//iframe[contains(@src, 'funcaptcha.com')]"
            ],
            CaptchaType.GEETEST: [
                "//div[contains(@class, 'geetest')]",
                "//script[contains(@src, 'geetest')]"
            ]
        }

    def detect_captcha(self) -> Optional[CaptchaChallenge]:
        """
        Detect CAPTCHA on current page.
        """
        for captcha_type, signatures in self.signatures.items():
            for signature in signatures:
                if self._check_signature(signature):
                    self.logger.info(f"Detected {captcha_type.value} CAPTCHA")
                    return self._extract_challenge(captcha_type)
        # Check for generic CAPTCHA indicators
        if self._check_generic_captcha():
            self.logger.info("Detected generic CAPTCHA")
            return CaptchaChallenge(type=CaptchaType.TEXT)
        return None

    def _check_signature(self, signature: str) -> bool:
        """Check if signature exists on page."""
        try:
            if signature.startswith("//"):
                # XPath selector
                elements = self.driver.find_elements(By.XPATH, signature)
                return len(elements) > 0
            # JavaScript check; if the parent object is missing the script
            # raises, which is treated as "not present" below
            result = self.driver.execute_script(
                f"return typeof {signature} !== 'undefined'"
            )
            return bool(result)
        except Exception:
            return False

    def _check_generic_captcha(self) -> bool:
        """Check for generic CAPTCHA indicators."""
        # Plain substring matching: broad words such as "challenge" can
        # produce false positives on ordinary pages
        indicators = [
            "captcha",
            "security-check",
            "bot-check",
            "human-verification",
            "challenge"
        ]
        page_source = self.driver.page_source.lower()
        return any(indicator in page_source for indicator in indicators)

    def _extract_challenge(self, captcha_type: CaptchaType) -> CaptchaChallenge:
        """Extract CAPTCHA challenge details."""
        challenge = CaptchaChallenge(
            type=captcha_type,
            page_url=self.driver.current_url
        )
        if captcha_type == CaptchaType.RECAPTCHA_V2:
            challenge.site_key = self._extract_recaptcha_sitekey()
        elif captcha_type == CaptchaType.HCAPTCHA:
            challenge.site_key = self._extract_hcaptcha_sitekey()
        return challenge

    def _extract_recaptcha_sitekey(self) -> Optional[str]:
        """Extract reCAPTCHA site key."""
        try:
            # Method 1: From div attribute
            element = self.driver.find_element(By.CLASS_NAME, "g-recaptcha")
            return element.get_attribute("data-sitekey")
        except Exception:
            pass
        try:
            # Method 2: From iframe src
            iframe = self.driver.find_element(By.XPATH, "//iframe[contains(@src, 'recaptcha')]")
            src = iframe.get_attribute("src") or ""
            match = re.search(r'k=([A-Za-z0-9_-]+)', src)
            if match:
                return match.group(1)
        except Exception:
            pass
        return None

    def _extract_hcaptcha_sitekey(self) -> Optional[str]:
        """Extract hCaptcha site key."""
        try:
            element = self.driver.find_element(By.CLASS_NAME, "h-captcha")
            return element.get_attribute("data-sitekey")
        except Exception:
            return None


# ==================== Human-like Behavior Simulator ====================


class HumanBehaviorSimulator:
    """
    Simulate human-like behavior to avoid CAPTCHA triggers.
    """

    def __init__(self, driver: webdriver.Chrome):
        self.driver = driver
        self.actions = ActionChains(driver)
        self.logger = logging.getLogger(__name__)

    def random_delay(self, min_seconds: float = 0.5, max_seconds: float = 2.0):
        """Add random delay between actions."""
        delay = random.uniform(min_seconds, max_seconds)
        time.sleep(delay)

    def human_like_mouse_movement(self, element):
        """Move mouse to element in human-like pattern."""
        # Get element location
        location = element.location
        size = element.size
        # Target point with slight randomization
        target_x = location['x'] + size['width'] / 2 + random.randint(-5, 5)
        target_y = location['y'] + size['height'] / 2 + random.randint(-5, 5)
        # Assumed starting position. move_by_offset is relative to the real
        # pointer position, which Selenium does not expose, so the end point
        # is only approximate.
        current_x = random.randint(0, 100)
        current_y = random.randint(0, 100)
        # Generate bezier curve points
        points = self._generate_bezier_curve(
            (current_x, current_y),
            (target_x, target_y),
            num_points=random.randint(10, 20)
        )
        # Move mouse along curve
        for x, y in points:
            # Pointer offsets are whole pixels; track the rounded position
            # so rounding errors do not accumulate
            next_x, next_y = round(x), round(y)
            self.actions.move_by_offset(next_x - current_x, next_y - current_y)
            current_x, current_y = next_x, next_y
            self.actions.pause(random.uniform(0.01, 0.03))
        self.actions.perform()
        self.actions = ActionChains(self.driver)  # Reset action chain

    def _generate_bezier_curve(self, start: Tuple[float, float],
                               end: Tuple[float, float],
                               num_points: int = 20) -> List[Tuple[float, float]]:
        """Generate points along a bezier curve for natural mouse movement."""
        # Control points for curve
        control1 = (
            start[0] + (end[0] - start[0]) * 0.25 + random.randint(-50, 50),
            start[1] + (end[1] - start[1]) * 0.25 + random.randint(-50, 50)
        )
        control2 = (
            start[0] + (end[0] - start[0]) * 0.75 + random.randint(-50, 50),
            start[1] + (end[1] - start[1]) * 0.75 + random.randint(-50, 50)
        )
        points = []
        for i in range(num_points):
            # Guard against division by zero when num_points is 1
            t = i / max(num_points - 1, 1)
            # Cubic bezier formula
            x = ((1 - t) ** 3 * start[0]
                 + 3 * (1 - t) ** 2 * t * control1[0]
                 + 3 * (1 - t) * t ** 2 * control2[0]
                 + t ** 3 * end[0])
            y = ((1 - t) ** 3 * start[1]
                 + 3 * (1 - t) ** 2 * t * control1[1]
                 + 3 * (1 - t) * t ** 2 * control2[1]
                 + t ** 3 * end[1])
            points.append((x, y))
        return points

    def human_like_typing(self, element, text: str):
        """Type text with human-like speed and rhythm."""
        element.click()
        for char in text:
            element.send_keys(char)
            # Variable typing speed
            if char == ' ':
                delay = random.uniform(0.1, 0.3)
            elif char in '.,!?':
                delay = random.uniform(0.2, 0.4)
            else:
                delay = random.uniform(0.05, 0.2)
            time.sleep(delay)
            # Occasional pauses (thinking)
            if random.random() < 0.1:
                time.sleep(random.uniform(0.5, 1.5))

    def random_scrolling(self):
        """Perform random scrolling actions."""
        scroll_count = random.randint(1, 3)
        for _ in range(scroll_count):
            # Random scroll direction and distance
            direction = random.choice([-1, 1])
            distance = random.randint(100, 500) * direction
            self.driver.execute_script(f"window.scrollBy(0, {distance})")
            time.sleep(random.uniform(0.5, 1.5))

    def simulate_reading(self, duration: Optional[float] = None):
        """Simulate reading behavior on page."""
        if duration is None:
            duration = random.uniform(5, 15)
        start_time = time.time()
        while time.time() - start_time < duration:
            # Small scroll movements
            self.driver.execute_script(
                f"window.scrollBy(0, {random.randint(50, 200)})"
            )
            # Reading pause
            time.sleep(random.uniform(1, 3))
            # Occasional mouse movement
            if random.random() < 0.3:
                x = random.randint(100, 500)
                y = random.randint(100, 500)
                try:
                    self.actions.move_by_offset(x, y).perform()
                except Exception:
                    # Repeated offsets can push the pointer outside the
                    # viewport; skip the move rather than abort
                    pass
                self.actions = ActionChains(self.driver)


# ==================== CAPTCHA Solver Services ====================


class CaptchaSolverService:
    """
    Base class for CAPTCHA solving services.
    """

    def __init__(self, api_key: str):
        self.api_key = api_key
        self.logger = logging.getLogger(__name__)

    def solve(self, challenge: CaptchaChallenge) -> Optional[str]:
        """Solve CAPTCHA challenge."""
        raise NotImplementedError


class TwoCaptchaSolver(CaptchaSolverService):
    """
    2captcha.com solver implementation.
    """

    def __init__(self, api_key: str):
        super().__init__(api_key)
        # Use HTTPS so the API key is not sent in plain text
        self.base_url = "https://2captcha.com"
        self.timeout = 30  # Seconds to wait for each HTTP response

    def solve(self, challenge: CaptchaChallenge) -> Optional[str]:
        """Solve CAPTCHA using 2captcha service."""
        if challenge.type == CaptchaType.RECAPTCHA_V2:
            return self._solve_recaptcha_v2(challenge)
        elif challenge.type == CaptchaType.IMAGE:
            return self._solve_image(challenge)
        elif challenge.type == CaptchaType.HCAPTCHA:
            return self._solve_hcaptcha(challenge)
        else:
            self.logger.warning(f"Unsupported CAPTCHA type: {challenge.type}")
            return None

    def _solve_recaptcha_v2(self, challenge: CaptchaChallenge) -> Optional[str]:
        """Solve reCAPTCHA v2."""
        params = {
            'key': self.api_key,
            'method': 'userrecaptcha',
            'googlekey': challenge.site_key,
            'pageurl': challenge.page_url,
            'json': 1
        }
        request_id = self._submit(params)
        # Poll every 5 seconds, max 5 minutes
        return self._poll_result(request_id, attempts=60, interval=5)

    def _solve_image(self, challenge: CaptchaChallenge) -> Optional[str]:
        """Solve image CAPTCHA."""
        if not challenge.image_data:
            return None
        files = {'file': ('captcha.png', challenge.image_data)}
        data = {
            'key': self.api_key,
            'method': 'post',
            'json': 1
        }
        request_id = self._submit(data, files=files)
        # Poll every 3 seconds, max 1 minute
        return self._poll_result(request_id, attempts=20, interval=3)

    def _solve_hcaptcha(self, challenge: CaptchaChallenge) -> Optional[str]:
        """Solve hCaptcha."""
        params = {
            'key': self.api_key,
            'method': 'hcaptcha',
            'sitekey': challenge.site_key,
            'pageurl': challenge.page_url,
            'json': 1
        }
        request_id = self._submit(params)
        # Poll every 5 seconds, max 5 minutes
        return self._poll_result(request_id, attempts=60, interval=5)

    def _submit(self, data: Dict[str, Any], files: Optional[Dict[str, Any]] = None) -> Optional[str]:
        """Submit a task to in.php and return its request id."""
        try:
            response = requests.post(
                f"{self.base_url}/in.php", data=data, files=files, timeout=self.timeout
            )
            result = response.json()
        except (requests.RequestException, ValueError) as e:
            self.logger.error(f"Failed to submit CAPTCHA: {e}")
            return None
        if result.get('status') != 1:
            self.logger.error(f"Failed to submit CAPTCHA: {result}")
            return None
        return result.get('request')

    def _poll_result(self, request_id: Optional[str], attempts: int, interval: float) -> Optional[str]:
        """Poll res.php until the task is solved, fails, or attempts run out."""
        if not request_id:
            return None
        for _ in range(attempts):
            time.sleep(interval)
            try:
                response = requests.get(
                    f"{self.base_url}/res.php",
                    params={
                        'key': self.api_key,
                        'action': 'get',
                        'id': request_id,
                        'json': 1
                    },
                    timeout=self.timeout
                )
                result = response.json()
            except (requests.RequestException, ValueError) as e:
                self.logger.error(f"Polling failed: {e}")
                return None
            if result.get('status') == 1:
                return result.get('request')
            # 'CAPCHA_NOT_READY' is the service's own spelling; any other
            # message means the task failed, so stop polling
            if result.get('request') != 'CAPCHA_NOT_READY':
                self.logger.error(f"CAPTCHA solving failed: {result}")
                return None
        return None


# ==================== Local CAPTCHA Solvers ====================


class OCRSolver:
    """
    Solve simple text CAPTCHAs using OCR.
    """

    def __init__(self):
        # Imported here so a missing pytesseract install fails early
        import pytesseract  # noqa: F401
        self.logger = logging.getLogger(__name__)

    def solve_text_captcha(self, image: Union[Image.Image, np.ndarray]) -> Optional[str]:
        """Solve text CAPTCHA using OCR."""
        try:
            import pytesseract
            # Preprocess image
            processed = self._preprocess_image(image)
            # OCR: treat the image as a single word (psm 8), digits and capitals only
            text = pytesseract.image_to_string(
                processed,
                config='--psm 8 -c tessedit_char_whitelist=0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ'
            )
            # Clean result
            text = text.strip().replace(' ', '')
            if text:
                self.logger.info(f"OCR result: {text}")
                return text
        except Exception as e:
            self.logger.error(f"OCR failed: {e}")
        return None

    def _preprocess_image(self, image: Union[Image.Image, np.ndarray]) -> np.ndarray:
        """Preprocess image for better OCR results."""
        if isinstance(image, Image.Image):
            # PIL images are RGB(A), not BGR; convert straight to 8-bit grayscale
            gray = np.array(image.convert("L"))
        elif image.ndim == 3:
            # NumPy arrays are assumed to come from OpenCV, which uses BGR order
            gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
        else:
            gray = image
        # Apply thresholding
        _, thresh = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
        # Denoise
        denoised = cv2.medianBlur(thresh, 3)
        # Resize for better OCR
        height, width = denoised.shape
        if height < 50:
            scale = 50 / height
            new_width = int(width * scale)
            denoised = cv2.resize(denoised, (new_width, 50))
        return denoised


class SliderSolver:
    """
    Solve slider CAPTCHAs.
    """

    def __init__(self, driver: webdriver.Chrome):
        self.driver = driver
        self.actions = ActionChains(driver)
        self.logger = logging.getLogger(__name__)

    def solve_slider(self, slider_element, target_position: Optional[int] = None):
        """Solve slider CAPTCHA."""
        try:
            # If no target position, try to detect it
            if target_position is None:
                target_position = self._detect_target_position()
            # Calculate movement (the handle is assumed to start at offset 0)
            current_position = 0
            movement = target_position - current_position
            # Perform human-like drag
            self._human_like_drag(slider_element, movement)
            return True
        except Exception as e:
            self.logger.error(f"Slider solving failed: {e}")
            return False

    def _detect_target_position(self) -> int:
        """Detect target position for slider (puzzle piece matching)."""
        # Placeholder: a real implementation would use image processing to
        # find the gap. This simplified version returns a random position.
        return random.randint(100, 250)

    def _human_like_drag(self, element, distance: int):
        """Perform human-like drag operation."""
        # Click and hold
        self.actions.click_and_hold(element).perform()
        # Move in steps with acceleration/deceleration
        steps = 20
        moved = 0
        for i in range(steps):
            # Acceleration curve. The steps sum to about 1.25x the distance,
            # so the drag overshoots and is corrected after the loop.
            if i < steps / 2:
                step = (distance / steps) * (1 + i / steps)
            else:
                step = (distance / steps) * (2 - i / steps)
            step = round(step)  # Pointer offsets are whole pixels
            self.actions.move_by_offset(step, 0).perform()
            moved += step
            # Small random variations
            if random.random() < 0.3:
                self.actions.move_by_offset(0, random.randint(-2, 2)).perform()
            time.sleep(random.uniform(0.01, 0.03))
        # Adjust final position
        final_adjustment = distance - moved
        if final_adjustment != 0:
            self.actions.move_by_offset(final_adjustment, 0).perform()
        # Release
        time.sleep(random.uniform(0.1, 0.3))
        self.actions.release().perform()


# ==================== CAPTCHA Manager ====================


class CaptchaManager:
    """
    Comprehensive CAPTCHA handling manager.
    """

    def __init__(self, driver: webdriver.Chrome,
                 solver_api_key: Optional[str] = None,
                 strategy: SolverStrategy = SolverStrategy.SERVICE):
        self.driver = driver
        self.strategy = strategy
        self.detector = CaptchaDetector(driver)
        self.behavior_simulator = HumanBehaviorSimulator(driver)
        # Initialize solvers
        self.service_solver = None
        if solver_api_key:
            self.service_solver = TwoCaptchaSolver(solver_api_key)
        self.ocr_solver = OCRSolver()
        self.slider_solver = SliderSolver(driver)
        self.logger = logging.getLogger(__name__)
        # Statistics
        self.stats = {
            'total_captchas': 0,
            'solved': 0,
            'failed': 0,
            'avoided': 0
        }

    def handle_captcha(self, max_attempts: int = 3) -> bool:
        """
        Detect and handle CAPTCHA if present.
        """
        # First, try to avoid triggering CAPTCHA
        if self.strategy == SolverStrategy.AVOIDANCE:
            self.behavior_simulator.simulate_reading()
            time.sleep(2)
        # Check for CAPTCHA
        challenge = self.detector.detect_captcha()
        if not challenge:
            return True  # No CAPTCHA found
        self.stats['total_captchas'] += 1
        self.logger.info(f"CAPTCHA detected: {challenge.type.value}")
        # Try to solve CAPTCHA
        for attempt in range(max_attempts):
            challenge.attempts = attempt + 1
            solution = self._solve_challenge(challenge)
            if solution:
                challenge.solution = solution
                challenge.solved = True
                # Submit solution
                if self._submit_solution(challenge):
                    self.stats['solved'] += 1
                    self.logger.info("CAPTCHA solved successfully")
                    return True
            # Wait before retry
            if attempt < max_attempts - 1:
                time.sleep(random.uniform(2, 5))
        self.stats['failed'] += 1
        self.logger.error(f"Failed to solve CAPTCHA after {max_attempts} attempts")
        return False

    def _solve_challenge(self, challenge: CaptchaChallenge) -> Optional[str]:
        """Solve CAPTCHA challenge based on strategy."""
        if self.strategy == SolverStrategy.SERVICE and self.service_solver:
            return self.service_solver.solve(challenge)
        elif self.strategy == SolverStrategy.OCR:
            if challenge.type == CaptchaType.TEXT:
                # Get CAPTCHA image
                image = self._get_captcha_image()
                if image is not None:
                    return self.ocr_solver.solve_text_captcha(image)
        elif self.strategy == SolverStrategy.BEHAVIORAL:
            # Use human-like behavior to avoid detection
            self._simulate_human_solving()
            return "behavioral_bypass"
        return None

    def _get_captcha_image(self) -> Optional[Image.Image]:
        """Extract CAPTCHA image from page."""
        try:
            # Look for common CAPTCHA image selectors
            selectors = [
                "//img[contains(@id, 'captcha')]",
                "//img[contains(@class, 'captcha')]",
                "//img[contains(@src, 'captcha')]"
            ]
            for selector in selectors:
                try:
                    element = self.driver.find_element(By.XPATH, selector)
                    # <img> elements have no toDataURL() (that is a canvas
                    # method), so take a screenshot of the element instead
                    img_data = element.screenshot_as_png
                    # Convert to PIL Image
                    return Image.open(BytesIO(img_data))
                except Exception:
                    continue
        except Exception as e:
            self.logger.error(f"Failed to extract CAPTCHA image: {e}")
        return None

    def _submit_solution(self, challenge: CaptchaChallenge) -> bool:
        """Submit CAPTCHA solution."""
        try:
            if challenge.type == CaptchaType.RECAPTCHA_V2:
                # Inject reCAPTCHA solution. The token is passed as a script
                # argument instead of being spliced into the JavaScript source.
                self.driver.execute_script(
                    "document.getElementById('g-recaptcha-response').innerHTML = arguments[0];",
                    challenge.solution
                )
                # Trigger callback if exists
                self.driver.execute_script(
                    "const token = arguments[0]; "
                    "if (typeof ___grecaptcha_cfg !== 'undefined') { "
                    "Object.entries(___grecaptcha_cfg.clients).forEach(([key, client]) => { "
                    "if (client.callback) { client.callback(token); } "
                    "}); }",
                    challenge.solution
                )
            elif challenge.type == CaptchaType.TEXT:
                # Find input field and submit
                input_field = self.driver.find_element(
                    By.XPATH,
                    "//input[contains(@name, 'captcha') or contains(@id, 'captcha')]"
                )
                input_field.clear()
                input_field.send_keys(challenge.solution)
                # Find and click submit button
                submit_button = self.driver.find_element(
                    By.XPATH,
                    "//button[@type='submit'] | //input[@type='submit']"
                )
                submit_button.click()
            # Wait for page change or CAPTCHA disappearance
            time.sleep(3)
            # Check if CAPTCHA is gone. A widget can stay in the DOM until its
            # form is submitted, so this check may report a failure even
            # though the token was accepted.
            new_challenge = self.detector.detect_captcha()
            return new_challenge is None
        except Exception as e:
            self.logger.error(f"Failed to submit solution: {e}")
            return False

    def _simulate_human_solving(self):
        """Simulate human solving behavior."""
        # Random mouse movements
        for _ in range(random.randint(3, 7)):
            x = random.randint(100, 500)
            y = random.randint(100, 500)
            try:
                self.behavior_simulator.actions.move_by_offset(x, y).perform()
            except Exception:
                # Repeated offsets can leave the viewport; skip that move
                pass
            self.behavior_simulator.actions = ActionChains(self.driver)
            time.sleep(random.uniform(0.5, 1.5))
        # Random scrolling
        self.behavior_simulator.random_scrolling()
        # Simulate thinking time
        time.sleep(random.uniform(5, 15))

    def get_statistics(self) -> Dict[str, Any]:
        """Get CAPTCHA handling statistics."""
        # Return a copy so callers cannot modify the live counters
        stats = dict(self.stats)
        total = stats['total_captchas']
        stats['success_rate'] = stats['solved'] / total if total > 0 else 0
        return stats


# Example usage
if __name__ == "__main__":
    print("CAPTCHA Handling Examples\n")

    # Example 1: CAPTCHA types
    print("1. CAPTCHA Types:")
    captcha_types = [
        ("Text CAPTCHA", "Simple distorted text"),
        ("reCAPTCHA v2", "Google's 'I'm not a robot' checkbox"),
        ("reCAPTCHA v3", "Invisible score-based detection"),
        ("hCaptcha", "Privacy-focused alternative to reCAPTCHA"),
        ("FunCaptcha", "Game-based puzzles"),
        ("GeeTest", "Slider and puzzle CAPTCHAs"),
        ("Image Selection", "Select all images with..."),
        ("Audio CAPTCHA", "Audio-based challenges")
    ]
    for captcha_type, description in captcha_types:
        print(f"   {captcha_type}: {description}")

    # Example 2: Solving strategies
    print("\n2. Solving Strategies:")
    strategies = [
        ("Service-based", "Use 2captcha, Anti-Captcha, etc."),
        ("OCR", "Optical character recognition for text"),
        ("Audio", "Speech recognition for audio CAPTCHAs"),
        ("Behavioral", "Mimic human behavior to avoid triggers"),
        ("Avoidance", "Prevent CAPTCHA from appearing"),
        ("Manual", "Human intervention when needed")
    ]
    for strategy, description in strategies:
        print(f"   {strategy}: {description}")

    # Example 3: Human behavior simulation
    print("\n3. Human Behavior Simulation:")
    behaviors = [
        "Natural mouse movements (bezier curves)",
        "Variable typing speed with pauses",
        "Random scrolling and reading patterns",
        "Hover over elements before clicking",
        "Add delays between actions",
        "Simulate mistakes and corrections"
    ]
    for behavior in behaviors:
        print(f"   • {behavior}")

    # Example 4: CAPTCHA detection
    print("\n4. CAPTCHA Detection:")
    print("   Detection methods:")
    print("   • Check for iframe sources (recaptcha, hcaptcha)")
    print("   • Look for specific div classes")
    print("   • Search for CAPTCHA-related scripts")
    print("   • Analyze page text for keywords")
    print("   • Monitor for popup dialogs")

    # Example 5: Service integration
    print("\n5. CAPTCHA Solving Services:")
    services = [
        ("2captcha", "$2.99/1000", "Popular, reliable"),
        ("Anti-Captcha", "$2.00/1000", "Good API, fast"),
        ("DeathByCaptcha", "$1.39/1000", "Cheapest option"),
        ("ImageTyperz", "$1.50/1000", "Good for images"),
        ("CapMonster Cloud", "$0.60/1000", "Cloud-based")
    ]
    print("   Service comparison:")
    for service, price, notes in services:
        print(f"   {service}: {price} - {notes}")

    # Example 6: Success rates
    print("\n6. Typical Success Rates:")
    success_rates = [
        ("Text CAPTCHA (OCR)", "60-80%"),
        ("reCAPTCHA v2 (Service)", "85-95%"),
        ("hCaptcha (Service)", "80-90%"),
        ("Image Selection", "70-85%"),
        ("Slider/Puzzle", "75-90%"),
        ("Behavioral Avoidance", "95-99%")
    ]
    for method, rate in success_rates:
        print(f"   {method}: {rate}")

    # Example 7: Cost considerations
    print("\n7. Cost Analysis:")
    print("   For 10,000 CAPTCHAs/month:")
    print("   Service costs: $20-30")
    print("   Time saved: ~83 hours (at roughly 30 seconds per manual solve)")
    print("   Success rate: ~90%")
    print("   ROI depends on your use case")

    # Example 8: Legal and ethical notes
    print("\n8. Legal & Ethical Considerations:")
    considerations = [
        "Always respect Terms of Service",
        "Consider reaching out to website owners",
        "Some sites offer APIs as alternatives",
        "Never use for malicious purposes",
        "CAPTCHAs protect sites from abuse",
        "Respect rate limits even after solving"
    ]
    for consideration in considerations:
        print(f"   • {consideration}")

    # Example 9: Avoidance techniques
    print("\n9. CAPTCHA Avoidance Techniques:")
    techniques = [
        "Use authenticated sessions",
        "Maintain consistent browser fingerprint",
        "Rotate residential proxies",
        "Add realistic delays between requests",
        "Complete user flows naturally",
        "Use headless browser detection bypass",
        "Maintain cookies and local storage"
    ]
    for technique in techniques:
        print(f"   • {technique}")

    # Example 10: Best practices
    print("\n10. Best Practices:")
    best_practices = [
        "Try avoidance before solving",
        "Cache solved CAPTCHAs when possible",
        "Implement retry logic with backoff",
        "Monitor success rates",
        "Use appropriate strategy per site",
        "Have fallback solving methods",
        "Log all CAPTCHA encounters",
        "Combine automation with manual backup",
        "Budget for CAPTCHA solving costs",
        "Optimize for speed vs success rate"
    ]
    for practice in best_practices:
        print(f"   • {practice}")

    print("\nCAPTCHA handling demonstration complete!")
Key Takeaways and Best Practices
- Prevention is Better: Try to avoid triggering CAPTCHAs in the first place.
- Use Human-like Behavior: Natural mouse movements and typing patterns.
- Choose the Right Strategy: Balance between cost, speed, and success rate.
- Have Multiple Fallbacks: Don't rely on a single solving method.
- Monitor Success Rates: Track and optimize your solving strategies.
- Respect the Purpose: CAPTCHAs protect websites from abuse.
- Consider Alternatives: APIs might be available for legitimate use.
- Budget Appropriately: Factor CAPTCHA solving costs into your project.
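The budgeting point can be made concrete with a little arithmetic. The sketch below is illustrative only: `estimate_monthly_cost` is a hypothetical helper, the per-1000 prices passed in are placeholders in the range quoted in the service comparison, and the 30 seconds per manual solve is an assumption rather than a measured figure.

```python
from typing import Dict


def estimate_monthly_cost(captchas: int, price_per_1000: float,
                          seconds_per_manual_solve: float = 30.0) -> Dict[str, float]:
    """Estimate the service cost and the manual time a service replaces."""
    # Services bill per 1000 solved CAPTCHAs
    cost = captchas / 1000 * price_per_1000
    # Time a person would otherwise spend solving the same volume by hand
    hours_saved = captchas * seconds_per_manual_solve / 3600
    return {"cost_usd": round(cost, 2), "hours_saved": round(hours_saved, 1)}


if __name__ == "__main__":
    # 10,000 CAPTCHAs per month at two placeholder price points
    for price in (2.00, 2.99):
        print(price, estimate_monthly_cost(10_000, price))
```

Changing `seconds_per_manual_solve` shows how sensitive the time-saved figure is to that one assumption.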
CAPTCHA Handling Best Practices
Mastering CAPTCHA handling transforms you from someone blocked by bot detection to someone who can navigate it responsibly and effectively. You now understand detection methods, solving strategies, behavioral simulation, and service integration. Whether you're building testing tools, data collectors, or automation systems, these CAPTCHA handling skills ensure your automation can handle real-world challenges!
Pro Tip: CAPTCHAs are like security guards - the best approach is to not look suspicious in the first place. Focus on prevention through human-like behavior: use realistic delays, natural mouse movements, and complete user journeys. When you must solve CAPTCHAs, choose your strategy wisely. Services like 2captcha are reliable but cost money; OCR works for simple text but has lower success rates; behavioral simulation can bypass some detection entirely. Always have multiple strategies - what works today might not work tomorrow as CAPTCHA systems evolve. Remember that CAPTCHAs exist for good reasons - to protect websites from abuse. If you're hitting lots of CAPTCHAs, consider whether there's a legitimate API you could use instead. Most importantly: respect the intent behind CAPTCHAs. They're not obstacles to break but security measures to work with responsibly!
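The advice to keep multiple strategies and watch how well each one performs can be sketched as a small fallback chain. This is a minimal illustration, not part of the system above: `SolverChain` and the two stub solvers are hypothetical, and a real chain would wrap whatever OCR, service, or manual solvers you actually use.

```python
from typing import Callable, Dict, List, Optional, Tuple

# A solver takes raw challenge bytes and returns an answer or None
Solver = Callable[[bytes], Optional[str]]


class SolverChain:
    """Try solvers in order and record how often each one succeeds."""

    def __init__(self, solvers: List[Tuple[str, Solver]]):
        self.solvers = solvers
        self.stats: Dict[str, Dict[str, int]] = {
            name: {"tried": 0, "solved": 0} for name, _ in solvers
        }

    def solve(self, challenge: bytes) -> Optional[str]:
        for name, solver in self.solvers:
            self.stats[name]["tried"] += 1
            try:
                answer = solver(challenge)
            except Exception:
                answer = None  # A crashing solver counts as a failure
            if answer:
                self.stats[name]["solved"] += 1
                return answer
        return None  # Every solver failed; hand over to a human

    def success_rate(self, name: str) -> float:
        counts = self.stats[name]
        return counts["solved"] / counts["tried"] if counts["tried"] else 0.0


def stub_ocr(challenge: bytes) -> Optional[str]:
    """Stand-in for a local OCR solver that fails on this input."""
    return None


def stub_service(challenge: bytes) -> Optional[str]:
    """Stand-in for a paid solving service."""
    return "ABC123"


if __name__ == "__main__":
    chain = SolverChain([("ocr", stub_ocr), ("service", stub_service)])
    print(chain.solve(b"fake-image-bytes"))
    print(chain.success_rate("ocr"), chain.success_rate("service"))
```

Putting the cheapest solver first keeps costs down, and the per-solver success rates tell you when a strategy has stopped working and should be reordered or replaced.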