    Repositories list

    • uwazi

      Public
      Uwazi is a web-based, open-source solution for building and sharing document collections
      TypeScript
      8829243621 Updated Jan 17, 2026
    • ML-Benchmarks

      Public
      Repository to store all the ML benchmarks
      0000 Updated Jan 15, 2026
    • pdf_information_extraction
      Python
      1508 Updated Jan 9, 2026
    • Trainable Entity Extractor
      Python
      0407 Updated Jan 9, 2026
    • A Docker-powered service for PDF document layout analysis. It segments and classifies the parts of PDF pages, identifying elements such as text, titles, pictures, and tables.
      Python
      1241.1k48 Updated Jan 9, 2026
    • Python API to interact with Uwazi
      Python
      0200 Updated Jan 9, 2026
    • NER-in-docker
      Python
      0407 Updated Nov 24, 2025
    • preserve

      Public
      Preserve is a tool for capturing and saving online digital content. Integrated with Uwazi, Preserve captures content from websites, social media and communication platforms, and archives them with accompanying key metadata to ensure evidentiary value by establishing and demonstrating authenticity and chain of custody.
      TypeScript
      161211 Updated Nov 18, 2025
    • ml-cloud-connector
      Python
      0000 Updated Oct 21, 2025
    • NER-in-uwazi
      Python
      0000 Updated Oct 20, 2025
    • pdf-features
      Python
      0100 Updated Oct 20, 2025
    • queue-processor
      Python
      0000 Updated Oct 2, 2025
    • Python
      0000 Updated Aug 29, 2025
    • pdf-document-layout-analysis-async
      Python
      0105 Updated Aug 18, 2025
    • HTML
      3260 Updated Jun 24, 2025
    • docker-translation-service
      Python
      0006 Updated May 2, 2025
    • TypeScript
      1300 Updated Mar 18, 2025
    • rison

      Public
      JavaScript
      5001 Updated Mar 11, 2025
    • This project aims to extract text from PDF files using the outputs generated by the pdf-document-layout-analysis service. By leveraging the segmentation and classification capabilities of the underlying analysis tool, this project automates the process of text extraction from PDF files.
      Makefile
      43720 Updated Feb 3, 2025
    • This project aims to extract Table of Contents (TOC) information from PDF files using the outputs generated by the pdf-document-layout-analysis service. By leveraging the segmentation and classification capabilities of the underlying analysis tool, this project automates the process of identifying and structuring the document's TOC.
      Makefile
      41920 Updated Feb 3, 2025
    • An HTTP service to OCR PDFs based on a Redis queue.
      Python
      0130 Updated Dec 13, 2024
    • Text selection handling and highlighting
      TypeScript
      0160 Updated Nov 14, 2024
    • An HTTP service to convert documents to PDF based on a Redis queue.
      Python
      0037 Updated Sep 19, 2024
    • Python
      3316 Updated Jul 4, 2024
    • Python
      65114 Updated Jul 4, 2024
    • Python
      21500 Updated Apr 26, 2024
    • 0400 Updated Jul 3, 2023
    • topic-classification

      Public
      Python
      45104 Updated May 25, 2023
    • twitter_crawler

      Public
      Twitter crawler
      Python
      0101 Updated Apr 3, 2023
    • Python
      4313 Updated Dec 27, 2022