Clone of https://github.com/microsoft/table-transformer with several improvements and box relaxation.
Usage:
conda env create --name tatr --file=environment.yml
conda run --no-capture-output --live-stream -n tatr python -c 'from huggingface_hub import snapshot_download; snapshot_download(repo_id="bsmock/pubtables-1m", repo_type="dataset")'Instead of environment.yml it is also possible to use environment-latest.
The available list of scripts is described below. Note that only those using python need conda, and even those can be easily modified to skip conda if one installs manually the list of dependencies.
- Table Detection (TD)
- make-single-object-dataset.sh: Makes a training dataset out of the single-table images;
- shuffle-detection.sh: Shuffles the list of training images. Prerequisite for relax-detection-only-subset.sh and relax-detection-subset.sh;
- relax-detection-only-subset.sh: Keeps a (randomly selected) table from each training image;
- relax-detection-subset.sh: Keeps a (randomly selected) table from each training image and relaxes the other ones to have no hole border and the full image as the outer hole.
- relax-dataset-detection-px.sh: Contracts (for the hole border) and expands (for the outer border) each training table by 2 pixels. Each of the 4 sides may be independently contracted less and/or expanded less than 2 pixels if needed to keep the table center within the hole border resp. to keep the outer border within the image box.
- relax-dataset-detection-pxs.sh: Contracts (for the hole border) and expands (for the outer border) each training table by 4 pixels for one dataset, and once again by 8 pixels for yet another training dataset. The actual amount of relaxation on each side can be lower if necessary to maintain symmetry between corresponding edges of the hole and outer borders.
- train-detection-px.sh: Trains the model with the dataset created by relax-dataset-detection-px.sh;
- train-detection-midline.sh: Trains the model with the dataset created by relax-dataset-detection-pxs.sh;
- Table Structure Recognition (TSR)
- relax-constraints-inf.sh: Relax the TSR objects in the training dataset while making sure that the same table cell matrix ensues. Can be run in parallel, e.g. one process pe CPU core. Upon restart continues where it left over, with the exception that the tables with at least a spanning cell which is dropped during matrix cell extraction are reprocesses after each restart.
- The precomputed train split with relaxed bounding boxes is available as
train_pxct_infat https://huggingface.co/datasets/aioaneid/relaxed-table-structure.
- The precomputed train split with relaxed bounding boxes is available as
- make-structure-pxct-dataset.sh: Makes the table outer border identical to the original table bounding box. Makes a complete training dataset, including without relaxation those tables with spanning cells which get dropped in the cell matrix extraction step.
- train-structure-pxct.sh: Trains a TSR model on the constrained box relaxation dataset.
- relax-constraints-inf.sh: Relax the TSR objects in the training dataset while making sure that the same table cell matrix ensues. Can be run in parallel, e.g. one process pe CPU core. Upon restart continues where it left over, with the exception that the tables with at least a spanning cell which is dropped during matrix cell extraction are reprocesses after each restart.
Just like with TATR v1.1, the TSR eval should be performed on table images with very little padding as created by create_padded_dataset.py. The val and test splits with tight padding are available at https://huggingface.co/datasets/aioaneid/relaxed-table-structure.
The GriTS evaluation code can be executed in parallel on different batches of images, e.g.:
seed=$((echo 0 ${test_split_name} ${epoch} | sha512sum | awk '{printf "ibase=16; "toupper($1)}' && echo " % 7FFFFF") | bc) &&
conda run --no-capture-output --live-stream -n tatr python src/main.py --data_type structure --config_file src/structure_config.json --data_root_dirs ${d} --table_words_dir ${d}/words --data_root_image_extensions .jpg --data_root_multiplicities 1 --device ${device} --mode ${mode} --test_split_name ${test_split_name} --test_start_offset ${test_start_offset} --test_max_size ${test_max_size} --no-enable_bounds --model_load_path ${f}/model_${epoch}.pth --metrics_save_filepath ${metrics_save_path} --seed ${seed} --torch_num_threads 1The metrics batches for an arbitrary epoch can then be merged together using plots/aggregate_json_grits.py.
The training scripts allow a new --mode option validate which can be executed in a subsequent phase to training.
The code can also be used without box relaxation by specifying --no-enable_bounds when running main.py.
| Model | Cardinality Error | AP | AR |
|---|---|---|---|
| All images, all tables | 0.0018 | 0.9800 | 0.9900 |
| Only images with exactly one object (table or table rotated) | 0.1050 | 0.8700 | 0.8870 |
| All images, one randomly-selected object (table or table rotated) per image | 0.0186 | 0.9770 | 0.9880 |
| All images, all objects counted split by category (table or table rotated), one randomly-selected object per image has a bounding box | 0.0018 | 0.9730 | 0.9880 |
| All images, all objects with hole and outer bounding boxes each relaxed by 2 pixels. TATR v1.1 cropping around the outer border | 0.0018 | 0.9800 | 0.9900 |
| All images, all objects with hole and outer bounding boxes each relaxed symetrically by (up to) 4 pixels. TATR v1.1 cropping around original bounding box | 0.0016 | 0.9790 | 0.9900 |
| All images, all objects with hole and outer bounding boxes each relaxed symetrically by (up to) 8 pixels. TATR v1.1 cropping around original bounding box | 0.0014 | 0.9760 | 0.9850 |
The table below refers to the best results obtained up to and including model_28.pth of:
- "TATR v1.1 with bug fixes": https://huggingface.co/aioaneid/baseline-table-structure;
- "Constrained box relaxation": https://huggingface.co/aioaneid/relaxed-table-structure.
| Model | Tables | AccCon | GriTSCon | GriTSLoc | GriTSTop | Epochs |
|---|---|---|---|---|---|---|
| TATR v1.0 | All | 0.8243 | 0.9850 | 0.9786 | 0.9849 | 20 |
| TATR v1.1 | All | 0.8326 | 0.9855 | 0.9797 | 0.9851 | 28.5 |
| TATR v1.1 with bug fixes | All | 0.8433 | 0.9862 | 0.9806 | 0.9858 | 28 |
| Constrained box relaxation | All | 0.8458 | 0.9866 | 0.9811 | 0.9861 | 28 |
| TATR v1.1 with bug fixes | Simple | 0.9661 | 0.9947 | 0.9934 | 0.9953 | 28 |
| Constrained box relaxation | Simple | 0.9667 | 0.9954 | 0.9941 | 0.9960 | 28 |
| TATR v1.1 with bug fixes | Complex | 0.7324 | 0.9786 | 0.9693 | 0.9774 | 28 |
| Constrained box relaxation | Complex | 0.7363 | 0.9789 | 0.9697 | 0.9773 | 28 |
If you find this work useful, please cite: Aioanei, D. (2025). Relaxed Bounding Boxes for Object Detection. ICCK Journal of Image Analysis and Processing, 1(3), 107–124. https://doi.org/10.62762/JIAP.2025.507329
In BibTeX format:
@article{aioanei2025relaxed,
author = {Aioanei, Daniel},
title = {Relaxed Bounding Boxes for Object Detection},
journal = {ICCK Journal of Image Analysis and Processing},
year = {2025},
volume = {1},
number = {3},
pages = {107--124},
doi = {10.62762/JIAP.2025.507329},
url = {https://doi.org/10.62762/JIAP.2025.507329}
}