docs: Add Data Source Inventory and Ingestion Architecture Analysis (#98) #102
Summary
This PR fulfills Task #98: Create Inventory of Data Sources and Define Standard Data Contracts, a sub-task under the broader initiative to design a scalable and decoupled ingestion framework for real-time and batch data.
Key Outcomes
Created a complete inventory of current data sources
Classified each as real-time or batch
Documented data types, formats, and ingestion frequency
Outlined architecture fit and ingestion approach per source
Included an initial design of the real-time and batch ingestion pipelines
Identified sources requiring transformation (e.g., XML → JSON)
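As an illustration only, one inventory row and the XML → JSON normalization step could be sketched as below. The `SourceEntry` class, its field names, and the `xml_to_json` helper are hypothetical and are not taken from the PR's tables. The conversion is a simple stdlib-only mapping that stores attributes as keys, element text under `#text`, and repeated child tags as lists. It is not the transformation rule the pipelines will actually use.

```python
import json
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Literal


# Hypothetical shape of one inventory row; field names are illustrative only.
@dataclass
class SourceEntry:
    name: str
    mode: Literal["real-time", "batch"]  # classification from the inventory
    data_format: str                     # e.g. "XML", "JSON", "CSV"
    frequency: str                       # e.g. "streaming", "daily"
    needs_transform: bool                # True if the payload must be normalized (e.g. XML -> JSON)


def xml_to_value(elem: ET.Element):
    """Map an XML element to plain text (leaf without attributes) or to a dict."""
    text = (elem.text or "").strip()
    if len(elem) == 0 and not elem.attrib:
        return text
    result: dict = dict(elem.attrib)  # attributes become keys
    if text:
        result["#text"] = text        # keep element text alongside attributes/children
    for child in elem:
        value = xml_to_value(child)
        if child.tag in result:
            # Repeated tags become a list so no sibling is silently overwritten.
            if not isinstance(result[child.tag], list):
                result[child.tag] = [result[child.tag]]
            result[child.tag].append(value)
        else:
            result[child.tag] = value
    return result


def xml_to_json(xml_text: str) -> str:
    """Parse an XML document and serialize it as JSON keyed by the root tag."""
    root = ET.fromstring(xml_text)
    return json.dumps({root.tag: xml_to_value(root)})


if __name__ == "__main__":
    entry = SourceEntry("orders_feed", "batch", "XML", "daily", True)
    if entry.needs_transform and entry.data_format == "XML":
        print(xml_to_json("<order id='42'><item>apple</item><item>pear</item><total>3.50</total></order>"))
        # prints {"order": {"id": "42", "item": ["apple", "pear"], "total": "3.50"}}
```

All values remain strings after conversion. Typing fields such as `total` as numbers would belong in the standard data contract for each source, not in the generic converter.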
Artifacts Included
source-summary-table.png: Source-wise frequency, format, and structure
architecture-fit-table.png: Architecture design fit for each source
architecture-diagram.jpg: Conceptual ingestion architecture for real-time and batch sources
Notes
cc @mvadodariya