Changelog
All notable changes to txtcaptcha are documented here. The format follows Keep a Changelog and the project adheres to Semantic Versioning.
[Unreleased]
Changed
- Notebooks moved under docs/notebooks/ and surfaced on the docs site.
0.1.0 - 2026-04-11
Initial public release.
Added
- CRNN+CTC architecture (txtcaptcha.CRNN) with ResNet-style CNN backbone, 2-layer BiLSTM and CTC head. Handles variable-width input and variable label lengths out of the box.
- read_captcha, Captcha, annotate for loading/labeling images.
- CaptchaDataset, transform_image, encode_label, decode_indices, pad_collate for building training pipelines.
- fit_model training loop with validation split, CTC loss and early stopping.
- decrypt() with three decoding modes:
  - greedy CTC (default),
  - fixed-length exact DP via length=N,
  - decode-time masking via mask= (list, regex, or re.Pattern).
- save_model / load_model with automatic CPU fallback when CUDA is unavailable.
- Dataset fetchers: available_datasets, download_dataset.
- Live captcha fetchers via download_captchas CLI and the txtcaptcha.download subpackage (10 Brazilian court / tax sources).
- Hugging Face Hub integration:
  - save_pretrained, from_pretrained, push_to_hub, DEFAULT_REPO_ID,
  - safetensors + config.json + model card layout,
  - decrypt() auto-downloads the unified checkpoint from jtrecenti/txtcaptcha-crnn on first call and caches it.
- Unified pretrained checkpoint published on the Hub at jtrecenti/txtcaptcha-crnn (~89% captcha-level accuracy across ten Brazilian court datasets).
- Notebooks: train_unified_model.ipynb, eval_per_dataset.ipynb, eval_per_dataset_live.ipynb.
- GitHub Actions:
  - CI running pytest on push/PR,
  - PyPI publishing via trusted publishing on GitHub releases,
  - Quarto + quartodoc site deployed to GitHub Pages on pushes to main.
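
Note on the decoding modes listed for 0.1.0: the sketch below illustrates what greedy CTC decoding and decode-time masking generally mean. It is not txtcaptcha's implementation. The function name greedy_ctc_decode, the blank index 0, and the allowed parameter are assumptions made for illustration only, and it uses plain Python lists rather than model tensors.

```python
from typing import Optional, Sequence, Set

BLANK = 0  # assumed index of the CTC blank symbol (illustrative choice)


def greedy_ctc_decode(
    scores: Sequence[Sequence[float]],
    alphabet: str,
    allowed: Optional[Set[str]] = None,
) -> str:
    """Greedy CTC decoding over per-timestep class scores.

    scores[t][k] is the score of class k at timestep t. Class 0 is the
    blank; class k >= 1 maps to alphabet[k - 1]. When `allowed` is given,
    classes whose character is not in it are excluded from the argmax
    (a simple form of decode-time masking).
    """
    out = []
    prev = BLANK
    for row in scores:
        # The blank is always a candidate; characters are filtered by the mask.
        candidates = [
            k for k in range(len(row))
            if k == BLANK or allowed is None or alphabet[k - 1] in allowed
        ]
        best = max(candidates, key=lambda k: row[k])
        # Emit a character only when it is not blank and differs from the
        # class chosen at the previous timestep (collapses repeats).
        if best != BLANK and best != prev:
            out.append(alphabet[best - 1])
        prev = best
    return "".join(out)
```

A blank between two identical predictions is what lets a repeated character such as "aa" survive the collapse step. The fixed-length mode is not covered here because it needs a dynamic program over label length rather than a per-timestep argmax.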