@23shivay commented May 8, 2025

Summary

This PR fulfills Task #98: Create Inventory of Data Sources and Define Standard Data Contracts, a sub-task under the broader initiative to design a scalable and decoupled ingestion framework for real-time and batch data.

Key Outcomes

  • Created a complete inventory of current data sources
  • Classified each source as real-time or batch
  • Documented data types, formats, and ingestion frequency
  • Outlined architecture fit and ingestion approach per source
  • Included an initial design of the real-time and batch ingestion pipelines
  • Identified sources requiring transformation (e.g., XML → JSON)
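
To illustrate the kind of transformation flagged above (XML → JSON), here is a minimal sketch using only the Python standard library. It assumes a flat XML record with no attributes or nesting; the function name `xml_record_to_json` and the sample payload are hypothetical and do not come from any source in the inventory.

```python
import json
import xml.etree.ElementTree as ET


def xml_record_to_json(xml_text: str) -> str:
    """Convert a flat XML record into a JSON object string.

    Assumption: every child of the root element is a leaf holding text.
    Attributes and nested elements are ignored in this sketch.
    """
    root = ET.fromstring(xml_text)
    # Each child element becomes one key; values stay as strings.
    record = {child.tag: (child.text or "").strip() for child in root}
    return json.dumps(record, sort_keys=True)


# Hypothetical sample payload, for illustration only.
sample = "<reading><sensor_id>s-17</sensor_id><value>21.5</value></reading>"
print(xml_record_to_json(sample))
```

Values are left as strings on purpose: type coercion is better handled once the data contract for a source is defined.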

Artifacts Included

  • source-summary-table.png: Source-wise frequency, format, and structure
  • architecture-fit-table.png: Architecture design fit for each source
  • architecture-diagram.jpg: Conceptual ingestion architecture for real-time and batch sources

Notes

  • Data formats and transformation points are defined with future schema standardization (e.g., JSON Schema, Avro) in mind.
  • The current scope focuses on documentation and analysis; implementation will follow in subsequent tasks.
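
As a rough sketch of what a standard data contract check could look like once implementation starts, the snippet below validates a record against a list of required fields and types. It is a standard-library stand-in, not JSON Schema or Avro; `FieldSpec`, `SENSOR_CONTRACT`, `contract_violations`, and the field names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldSpec:
    """One field in a hypothetical data contract."""
    name: str
    type_: type
    required: bool = True


# Illustrative contract only; these fields are not taken from the inventory.
SENSOR_CONTRACT = (
    FieldSpec("sensor_id", str),
    FieldSpec("value", float),
    FieldSpec("unit", str, required=False),
)


def contract_violations(record, contract):
    """Return human-readable violations; an empty list means the record conforms."""
    problems = []
    for spec in contract:
        if spec.name not in record:
            if spec.required:
                problems.append(f"missing required field: {spec.name}")
            continue
        value = record[spec.name]
        if not isinstance(value, spec.type_):
            problems.append(
                f"{spec.name}: expected {spec.type_.__name__}, "
                f"got {type(value).__name__}"
            )
    return problems


print(contract_violations({"sensor_id": "s-17", "value": 21.5}, SENSOR_CONTRACT))
print(contract_violations({"value": "21.5"}, SENSOR_CONTRACT))
```

A real pipeline would more likely express the same rules in JSON Schema or Avro and validate with a dedicated library; the point here is only the shape of the check.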

cc @mvadodariya
