This repository contains the source code and materials related to my bachelor's thesis focused on the automatic detection of problems in data. The project explores methods for identifying typical issues in structured datasets, such as missing or invalid values, outliers, anomalies, schema drift, and data drift.
The implementation includes rule-based systems, machine learning algorithms, automated profiling tools, and experiments with large language models (LLMs). The goal is to demonstrate how modern technologies can support the automation of data issue detection in real-world scenarios.
This thesis was written as part of the bachelor's program at the University of Economics, Prague (Vysoká škola ekonomická v Praze).
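
To give a rough idea of what the rule-based side of such detection can look like, here is a minimal sketch using only the Python standard library. It is an illustration, not the code from this repository: the function names (`find_missing`, `iqr_outliers`, `schema_drift`), the IQR multiplier of 1.5, and the choice to treat `None` and empty strings as missing are all assumptions made for the example.

```python
from statistics import quantiles


def find_missing(rows, columns):
    """Return (row_index, column) pairs where a value is absent, None, or an empty string."""
    return [
        (i, col)
        for i, row in enumerate(rows)
        for col in columns
        if row.get(col) in (None, "")
    ]


def iqr_outliers(values, k=1.5):
    """Return values lying more than k * IQR below Q1 or above Q3 (k=1.5 is an assumed default)."""
    q1, _, q3 = quantiles(values, n=4)
    spread = q3 - q1
    low, high = q1 - k * spread, q3 + k * spread
    return [v for v in values if v < low or v > high]


def schema_drift(expected_columns, actual_columns):
    """Compare two column lists and report columns that disappeared or newly appeared."""
    expected, actual = set(expected_columns), set(actual_columns)
    return {
        "missing": sorted(expected - actual),
        "unexpected": sorted(actual - expected),
    }


if __name__ == "__main__":
    # Hypothetical sample data, used only to exercise the checks.
    rows = [{"id": 1, "price": 10.0}, {"id": 2, "price": None}, {"id": 3}]
    print(find_missing(rows, ["id", "price"]))
    print(iqr_outliers([10, 11, 12, 13, 14, 15, 100]))
    print(schema_drift(["id", "price"], ["id", "cost"]))
```

Checks like these are cheap and easy to explain, which makes them a natural baseline against which machine learning and LLM-based approaches can be compared.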