Releases: mindee/doctr

v1.0.0

09 Jul 11:39
7dabbe1

Note: docTR 1.0.0 requires Python >= 3.10

What's Changed

Breaking Change

TensorFlow has been removed as a supported backend. docTR now comes with PyTorch as the default and only deep learning backend.

The installation options torch and tf have been removed. You can now install docTR simply with:

pip install python-doctr

This will install docTR with PyTorch support by default.

Training script filenames have been updated to remove backend-specific extensions. For example:

recognition/train_pytorch.py → recognition/train.py

New features

  • A new crnn_vgg16_bn checkpoint was added
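
A minimal usage sketch, assuming the new checkpoint loads through the standard recognition zoo:

from doctr.models import recognition_predictor

# Downloads and loads the new crnn_vgg16_bn checkpoint
reco_predictor = recognition_predictor("crnn_vgg16_bn", pretrained=True)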

What's Changed

Bug Fixes

  • [bug] Fix viptr onnx export issue by @felixT2K in #1966
  • [Fix] Correct condition for image dilation in orientation estimation by @Razlaw in #1971

Full Changelog: v0.12.0...v1.0.0

v0.12.0

20 Jun 07:43
97d4006

Note: docTR 0.12.0 requires Python >= 3.10
Note: docTR 0.12.0 requires either TensorFlow >= 2.15.0 or PyTorch >= 2.0.0

Warning

TensorFlow Backend Deprecation Notice

Using docTR with TensorFlow as a backend is deprecated and will be removed in the next major release (v1.0.0).
We recommend switching to the PyTorch backend, which is more actively maintained and supports the latest features and models.
Alternatively, you can use OnnxTR, which does not require TensorFlow or PyTorch.

This decision was made based on several considerations:

  • Allows better focus on improving the core library
  • Frees up resources to develop new features faster
  • Enables more targeted optimizations with PyTorch

Warning

This release is the last minor release supporting TensorFlow as a backend

What's changed

New features

  • A new lightweight recognition model, viptr_tiny, was added (see the usage sketch after the snippet below)
  • New built-in dataset added: COCO-Text V2
  • A new custom model loading interface
# NEW: backend-agnostic loading
from doctr.models import vitstr_small

model = vitstr_small(pretrained=False, pretrained_backbone=False)
model.from_pretrained("<PATH_TO>")  # local path or URL to a .pt or .h5 checkpoint

# OLD: loading depended on the backend
# With PyTorch
import torch
reco_params = torch.load("<path_to_pt>", map_location="cpu")
model.load_state_dict(reco_params)
# Or with TensorFlow
model.load_weights("<path_to_h5>")
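
A minimal sketch of using the new viptr_tiny model, assuming it is registered in the standard recognition zoo:

from doctr.models import recognition_predictor

# viptr_tiny as a drop-in recognition architecture
reco_predictor = recognition_predictor("viptr_tiny", pretrained=True)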

What's Changed

New Features

  • [datasets] COCO-Text V2 integration by @sarjil77 in #1888
  • [references] Recognition - Allow built-in datasets usage by @sarjil77 in #1904
  • [Feat] PyTorch - VIP backbone and VIPTR recognition module by @lkosh in #1912

Full Changelog: v0.11.0...v0.12.0

v0.11.0

30 Jan 09:25
1c9ce92

Note: docTR 0.11.0 requires Python >= 3.10
Note: docTR 0.11.0 requires either TensorFlow >= 2.15.0 or PyTorch >= 2.0.0

What's changed

New features

  • Added torch.compile support (PyTorch backend)
  • Improved model training logging
  • Created a small labeling tool designed for docTR (early stage): doctr-labeler

Compile your model

Compiling your PyTorch models with torch.compile optimizes the model by converting it to a graph representation and applying backends that can improve performance.
This process can make inference faster and reduce memory overhead during execution.

Further information can be found in the PyTorch documentation

import torch
from doctr.io import DocumentFile
from doctr.models import (
    ocr_predictor,
    vitstr_small,
    fast_base,
    mobilenet_v3_small_crop_orientation,
    mobilenet_v3_small_page_orientation,
    crop_orientation_predictor,
    page_orientation_predictor
)

# Compile the models
detection_model = torch.compile(
    fast_base(pretrained=True).eval()
)
recognition_model = torch.compile(
    vitstr_small(pretrained=True).eval()
)
crop_orientation_model = torch.compile(
    mobilenet_v3_small_crop_orientation(pretrained=True).eval()
)
page_orientation_model = torch.compile(
    mobilenet_v3_small_page_orientation(pretrained=True).eval()
)

predictor = ocr_predictor(
    detection_model, recognition_model, assume_straight_pages=False
)
# NOTE: Only required for non-straight pages (`assume_straight_pages=False`) and non-disabled orientation classification
# Set the orientation predictors
predictor.crop_orientation_predictor = crop_orientation_predictor(crop_orientation_model)
predictor.page_orientation_predictor = page_orientation_predictor(page_orientation_model)

doc = DocumentFile.from_images(["path/to/your/image.jpg"])
compiled_out = predictor(doc)
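
Note that the first call to a compiled predictor triggers the actual graph compilation, so expect a one-time warm-up cost before the speedup kicks in.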

What's Changed

Full Changelog: v0.10.0...v0.11.0

v0.10.0

21 Oct 08:37
d5dbc73

Note: docTR 0.10.0 requires Python >= 3.9
Note: docTR 0.10.0 requires either TensorFlow >= 2.15.0 or PyTorch >= 2.0.0

What's Changed

Soft Breaking Changes (TensorFlow backend only) 🛠

  • Changed the saving format from /weights to .weights.h5

NOTE: Please update your custom-trained models and models uploaded to the Hugging Face hub; this is the last release supporting manual loading from /weights.

New features

Disable page orientation classification

  • If you deal with documents that contain only small rotations (~ -45 to 45 degrees), you can disable page orientation classification to speed up inference.
  • This will only have an effect with assume_straight_pages=False and/or straighten_pages=True and/or detect_orientation=True.
from doctr.models import ocr_predictor

model = ocr_predictor(pretrained=True, assume_straight_pages=False, disable_page_orientation=True)

Disable crop orientation classification

  • If you deal with documents that contain only horizontal text, you can disable crop orientation classification to speed up inference.
  • This will only have an effect with assume_straight_pages=False and/or straighten_pages=True.
from doctr.models import ocr_predictor

model = ocr_predictor(pretrained=True, assume_straight_pages=False, disable_crop_orientation=True)
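
Both options can also be combined, for example for scans with only small rotations and horizontal text. A minimal sketch:

from doctr.models import ocr_predictor

# Skip both orientation classifiers to speed up inference
model = ocr_predictor(
    pretrained=True,
    assume_straight_pages=False,
    disable_page_orientation=True,
    disable_crop_orientation=True,
)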

Loading custom exported orientation classification models

You can now load your custom-trained orientation models; the following snippet demonstrates how:

from doctr.io import DocumentFile
from doctr.models import ocr_predictor, mobilenet_v3_small_page_orientation, mobilenet_v3_small_crop_orientation
from doctr.models.classification.zoo import crop_orientation_predictor, page_orientation_predictor

custom_page_orientation_model = mobilenet_v3_small_page_orientation("<PATH_TO_CUSTOM_EXPORTED_ONNX_MODEL>")
custom_crop_orientation_model = mobilenet_v3_small_crop_orientation("<PATH_TO_CUSTOM_EXPORTED_ONNX_MODEL>")

predictor = ocr_predictor(pretrained=True, assume_straight_pages=False, detect_orientation=True)

# Overwrite the default orientation models
predictor.crop_orientation_predictor = crop_orientation_predictor(custom_crop_orientation_model)
predictor.page_orientation_predictor = page_orientation_predictor(custom_page_orientation_model)

What's Changed

Full Changelog: v0.9.0...v0.10.0

v0.9.0

08 Aug 13:57
894eafd

Note: docTR 0.9.0 requires Python >= 3.9
Note: docTR 0.9.0 requires either TensorFlow >= 2.11.0 or PyTorch >= 1.12.0.

What's Changed

Soft Breaking Changes 🛠

  • The default detection model changed from db_resnet50 to fast_base.
    NOTE: Can be reverted by passing the detection model: predictor = ocr_predictor(det_arch="db_resnet50", pretrained=True)
  • The default value of resolve_blocks changed from True to False
    NOTE: Can be reverted by passing resolve_blocks=True to the ocr_predictor
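
Shown together, a minimal sketch restoring both previous defaults:

from doctr.models import ocr_predictor

# Revert to the db_resnet50 detector and block resolution
predictor = ocr_predictor(det_arch="db_resnet50", pretrained=True, resolve_blocks=True)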

New features

✨ Installation ✨

We have split docTR into optional parts to make it more lightweight and to exclude components that are not required for inference.
Optional parts are:

  • visualization (to support .show())
  • html support (to support .from_url(...))
  • contribution module
# for TensorFlow without any optional dependencies
pip install "python-doctr[tf]"

# for PyTorch without any optional dependencies
pip install "python-doctr[torch]"

# Installs PyTorch and all available optional parts
pip install "python-doctr[torch,viz,html,contrib]"

✨ ONNX and OnnxTR ✨

We have built a standalone library that provides a super lightweight way to use existing docTR ONNX-exported models or your own custom ones.

Benefits:

  • known docTR interface (ocr_predictor, etc.)
  • no PyTorch or TensorFlow required - built on top of onnxruntime
  • a more lightweight package with lower inference latency and fewer required resources
  • 8-bit quantized models for faster inference on CPU

Give it a try and check it out: OnnxTR
docTR docs: ONNX / OnnxTR
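
A minimal sketch of the OnnxTR interface, assuming it mirrors the docTR predictor API as advertised above (check the OnnxTR repository for the authoritative install and usage):

# Assumed install: pip install onnxtr (see the OnnxTR repo for extras)
from onnxtr.io import DocumentFile
from onnxtr.models import ocr_predictor

doc = DocumentFile.from_pdf("path/to/your/doc.pdf")
model = ocr_predictor()  # default detection + recognition ONNX models
result = model(doc)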

What's Changed

Breaking Changes 🛠

  • [models] Change default model to fast_base - soft breaking change by @felixdittrich92 in #1588
  • [misc] update README & fix mypy & change resolve blocks default by @felixT2K in #1686

Full Changelog: v0.8.1...v0.9.0

v0.8.1

04 Mar 14:50
62d94ff

Note: doctr 0.8.1 requires either TensorFlow >= 2.11.0 or PyTorch >= 1.12.0.

What's Changed

v0.8.0

28 Feb 13:13
67d1087

Note: doctr 0.8.0 requires either TensorFlow >= 2.11.0 or PyTorch >= 1.12.0.

What's Changed

Breaking Changes 🛠

  • db_resnet50_rotation (PyTorch) and linknet_resnet18_rotation (TensorFlow) are removed (all models can handle rotated documents now)
  • .show(doc) changed to .show()

New features

  • All models have pretrained checkpoints now by @odulcy-mindee
  • All detection models were retrained on rotated samples by @odulcy-mindee
  • Improved orientation detection for documents rotated between -90 and 90 degrees by @felixdittrich92
  • Conda deployment job & recipe added by @frgfm
  • Official docTR docker images are added by @odulcy-mindee => docker-images
  • New benchmarks and documentation improvements by @felixdittrich92
  • WildReceipt dataset added by @HamzaGbada
  • EarlyStopping callback added to all training scripts by @SkaarFacee
  • Hook mechanism added to ocr_predictor to manipulate the detection predictions in the middle of the pipeline to fit your needs by @felixdittrich92
from doctr.models import ocr_predictor

class CustomHook:
    def __call__(self, loc_preds):
        # Manipulate the location predictions here
        # 1. The output structure needs to be the same as the input location predictions
        # 2. Be aware that the coordinates are relative and need to be between 0 and 1
        return loc_preds

my_hook = CustomHook()

predictor = ocr_predictor(pretrained=True)
# Add a hook in the middle of the pipeline
predictor.add_hook(my_hook)
# You can also add multiple hooks which will be executed sequentially
for hook in [my_hook, my_hook, my_hook]:
    predictor.add_hook(hook)

What's Changed

Full Changelog: v0.7.0...v0.8.0

v0.7.0

09 Sep 13:23
75bddfc

Note: doctr 0.7.0 requires either TensorFlow >= 2.11.0 or PyTorch >= 1.12.0.
Note: We will release the missing PyTorch checkpoints with 0.7.1

What's Changed

Breaking Changes 🛠

  • We changed the preserve_aspect_ratio parameter to True by default in #1279
    => To restore the old behaviour, you can pass preserve_aspect_ratio=False to the predictor instance (see the sketch below)
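
A minimal sketch restoring the old behaviour:

from doctr.models import ocr_predictor

# Disable aspect ratio preservation to get the pre-0.7.0 behaviour
predictor = ocr_predictor(pretrained=True, preserve_aspect_ratio=False)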

New features

Addition of the KIE predictor

The KIE predictor is more flexible than the OCR predictor, as its detection model can detect multiple classes in a document. For example, you can have a detection model that detects just dates and addresses in a document.

The KIE predictor makes it possible to combine a multi-class detector with a recognition model and to have the whole pipeline already set up for you.

from doctr.io import DocumentFile
from doctr.models import kie_predictor

# Model
model = kie_predictor(det_arch='db_resnet50', reco_arch='crnn_vgg16_bn', pretrained=True)
# PDF
doc = DocumentFile.from_pdf("path/to/your/doc.pdf")
# Analyze
result = model(doc)

predictions = result.pages[0].predictions
for class_name, list_predictions in predictions.items():
    for prediction in list_predictions:
        print(f"Prediction for {class_name}: {prediction}")

The per-page results of the KIE predictor are a dictionary in which each key is a class name and its value is the list of predictions for that class.

What's Changed

Full Changelog: v0.6.0...v0.7.0

v0.6.0

29 Sep 11:51
dcbb21f

Highlights of the release:

Note: doctr 0.6.0 requires either TensorFlow >= 2.9.0 or PyTorch >= 1.8.0.

Full integration with the Hugging Face Hub (docTR meets Hugging Face)

  • Loading from hub:
from doctr.io import DocumentFile
from doctr.models import ocr_predictor, from_hub
image = DocumentFile.from_images(['data/example.jpg'])
# Load a custom detection model from huggingface hub
det_model = from_hub('Felix92/doctr-torch-db-mobilenet-v3-large')
# Load a custom recognition model from huggingface hub
reco_model = from_hub('Felix92/doctr-torch-crnn-mobilenet-v3-large-french')
# You can easily plug these models into the OCR predictor
predictor = ocr_predictor(det_arch=det_model, reco_arch=reco_model)
result = predictor(image)
  • Pushing to the hub:
from doctr.models import recognition, login_to_hub, push_to_hf_hub
login_to_hub()
my_awesome_model = recognition.crnn_mobilenet_v3_large(pretrained=True)
push_to_hf_hub(my_awesome_model, model_name='doctr-crnn-mobilenet-v3-large-french-v1', task='recognition', arch='crnn_mobilenet_v3_large')

Documentation: https://mindee.github.io/doctr/using_doctr/sharing_models.html

Predefined datasets can also be used for the recognition task

from doctr.datasets import CORD
# Crop boxes as-is (crops can be irregular)
train_set = CORD(train=True, download=True, recognition_task=True)
# Crop rotated boxes (always regular)
train_set = CORD(train=True, download=True, use_polygons=True, recognition_task=True)
img, target = train_set[0]

Documentation: https://mindee.github.io/doctr/using_doctr/using_datasets.html

New models (both frameworks)

  • classification: VisionTransformer (ViT)
  • recognition: Vision Transformer for Scene Text Recognition (ViTSTR)
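
A minimal sketch of plugging the new recognition architecture into the predictor zoo, assuming vitstr_small is registered as a reco_arch:

from doctr.models import ocr_predictor

# ViTSTR as the recognition stage of the end-to-end predictor
predictor = ocr_predictor(det_arch='db_resnet50', reco_arch='vitstr_small', pretrained=True)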

Bug fixes in recognition models

  • MASTER and SAR architectures are now operational in both frameworks (TensorFlow and PyTorch)

ONNX support (experimental)

  • All models can now be exported into ONNX format (only TF mobilenet left for 0.7.0)

NOTE: a full production pipeline with ONNX is planned for 0.7.0 (models can only be exported up to the logits, without any post-processing included)
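
A minimal export sketch with plain torch.onnx (the exportable flag and the 32x128 recognition input shape are assumptions based on the docTR documentation):

import torch
from doctr.models import vitstr_small

# exportable=True makes the forward pass return raw logits suitable for export
model = vitstr_small(pretrained=True, exportable=True).eval()
dummy_input = torch.rand((1, 3, 32, 128), dtype=torch.float32)
torch.onnx.export(model, dummy_input, "vitstr_small.onnx", input_names=["input"], output_names=["logits"])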

Further features

  • Our demo is now also PyTorch compatible, thanks to @odulcy-mindee
  • It is now possible to detect the language of the extracted text, thanks to @aminemindee (see the sketch below)
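
A minimal sketch of the language detection option (the exact shape of the language attribute is an assumption):

from doctr.io import DocumentFile
from doctr.models import ocr_predictor

predictor = ocr_predictor(pretrained=True, detect_language=True)
doc = DocumentFile.from_images(['data/example.jpg'])
result = predictor(doc)
# Each page carries the detected language, e.g. {'value': 'fr', 'confidence': ...}
print(result.pages[0].language)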

What's Changed

Miscellaneous

  • [Refactor] commit tags by @felixdittrich92 in #871
  • Update io/pdf.py to new pypdfium2 API by @mara004 in #944
  • docs: Documentation the reason for keras version specifier by @frgfm in #958
  • [datasets] update IC / SROIE / FUNSD / CORD by @felixdittrich92 in #983
  • [datasets] revert whitespace filtering and fix svhn reco by @felixdittrich92 in #987
  • fix: update tensorflow-addons to match tensorflow version by @ianardee in #998
  • move transformers implementation to modules by @felixdittr...

v0.5.1

22 Mar 10:41
9d03085

This minor release includes: improved documentation thanks to @felixdittrich92, bug fixes, rotation support extended to the TensorFlow backend, a switch from PyMuPDF to pypdfium2, and a nice integration with the Hugging Face Hub thanks to @fg-mindee!

Note: doctr 0.5.1 requires either TensorFlow >= 2.4.0 or PyTorch >= 1.8.0.

Highlights

Improvement of the documentation

The documentation has been improved with a new theme and illustrations, and docstrings have been completed and expanded.

Rotated text detection extended to the TensorFlow backend

We provide weights for the linknet_resnet18_rotation model, which has been deeply modified: we implemented a new loss (based on Dice Loss and Focal Loss), we changed the computation of the targets so that polygons are shrunken the same way they are in DBNet (which greatly improves the precision of the segmenter), and we trained the model while preserving the aspect ratio of the images.
All these improvements led to much better results, and the pretrained model is now very robust.

Preserving the aspect ratio in the detection task

You can now choose to preserve the aspect ratio in the detection_predictor:

>>> from doctr.models import detection_predictor
>>> predictor = detection_predictor('db_resnet50_rotation', pretrained=True, assume_straight_pages=False, preserve_aspect_ratio=True)

This option can also be activated in the high level end-to-end predictor:

>>> from doctr.models import ocr_predictor
>>> model = ocr_predictor('linknet_resnet18_rotation', pretrained=True, assume_straight_pages=False, preserve_aspect_ratio=True)

Integration with the Hugging Face Hub

The artefact detection model is now available on the Hugging Face Hub.

In docTR, you can now use the .from_hub() method, so these two snippets are equivalent:

# Pretrained
from doctr.models.obj_detection import fasterrcnn_mobilenet_v3_large_fpn
model = fasterrcnn_mobilenet_v3_large_fpn(pretrained=True)

and:

# HF Hub
from doctr.models.obj_detection.factory import from_hub
model = from_hub("mindee/fasterrcnn_mobilenet_v3_large_fpn")

Breaking changes

Replacing the PyMuPDF dependency with pypdfium2, which is license-compatible

We replaced the PyMuPDF dependency with pypdfium2 due to a license-compatibility issue. As a result, we lose the word and object extraction from source PDFs that was done with PyMuPDF. It wasn't used in any models, so it is not a big issue, but we will work on re-integrating such a feature in the future.

Full Changelog: v0.5.0...v0.5.1