Describe the task
Before implementing the ingestion framework, identify and document every existing data source (mobile apps, portals, legacy systems, APIs). This sub-task covers analyzing each source's data structure, delivery frequency (real-time or batch), and data format. It also includes defining standardized data contracts (schemas) so that ingestion is streamlined and inputs stay consistent.
Key Outcomes:
- Create a complete list of all current and upcoming data producers.
- Classify each producer as a real-time or batch source.
- Document data types, formats, and payload examples.
- Define common interface/contract specifications (e.g., JSON Schema, Avro).
- Identify sources needing transformation before ingestion.
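
To make the outcomes above concrete, here is a minimal sketch of what one inventory entry and a basic contract check might look like. Everything in it is an assumption made for illustration: the `SourceEntry` fields, the `contract_violations` helper, and the example source are hypothetical and not taken from any existing system. A real contract would more likely be expressed in JSON Schema or Avro; the plain type check here only stands in for that.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Any, Dict, List


class DeliveryMode(Enum):
    REAL_TIME = "real-time"
    BATCH = "batch"


@dataclass
class SourceEntry:
    """One row of the source inventory (all field names are illustrative)."""
    name: str
    kind: str                         # e.g. "mobile app", "portal", "legacy system", "API"
    mode: DeliveryMode                # real-time or batch classification
    data_format: str                  # e.g. "JSON", "CSV", "Avro"
    required_fields: Dict[str, type]  # minimal contract: field name -> expected type
    needs_transformation: bool = False
    sample_payload: Dict[str, Any] = field(default_factory=dict)


def contract_violations(entry: SourceEntry, payload: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means the payload conforms."""
    problems = []
    for name, expected in entry.required_fields.items():
        if name not in payload:
            problems.append(f"missing field: {name}")
        elif not isinstance(payload[name], expected):
            problems.append(f"wrong type for {name}: expected {expected.__name__}")
    return problems


# Hypothetical example source, not part of the real inventory.
example_source = SourceEntry(
    name="example-mobile-app",
    kind="mobile app",
    mode=DeliveryMode.REAL_TIME,
    data_format="JSON",
    required_fields={"event_id": str, "timestamp": str, "amount": float},
    sample_payload={
        "event_id": "e-1",
        "timestamp": "2024-01-01T00:00:00Z",
        "amount": 9.5,
    },
)

if __name__ == "__main__":
    # The documented sample payload should satisfy its own contract.
    print(contract_violations(example_source, example_source.sample_payload))
```

Keeping a sample payload next to the contract lets each entry be checked against itself, which is one inexpensive way to catch documentation that has drifted from what a source actually sends.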