Description
name: Good First Issue
about: A beginner-friendly task perfect for first-time contributors
title: '[GOOD FIRST ISSUE] Add Docstrings to DataProduct Class Methods'
labels: 'good first issue, documentation, enhancement'
assignees: ''
Welcome! 👋
This is a beginner-friendly issue perfect for first-time contributors to the Intugle project. We've designed this task to help you get familiar with our codebase while making a meaningful contribution.
Task Description
Add comprehensive docstrings to methods in the DataProduct class located in src/intugle/data_product.py. DataProduct is the second most important user-facing API, and users rely on it to create data products from the semantic layer.
Methods like __init__(), load_all(), plot_graph(), and plot_sources_graph() are missing docstrings entirely.
Why This Matters
The DataProduct class enables users to generate SQL queries and create data products by simply selecting attributes across tables. Good documentation here helps users:
- Understand Capabilities: Know what data products can do
- Generate Queries: Learn how to build queries from the semantic layer
- Visualize Relationships: Use graph plotting features
- Debug Issues: Understand what each method does under the hood
What You'll Learn
- Writing documentation for data transformation APIs
- Understanding SQL generation and ETL concepts
- Documenting query building workflows
- Explaining graph-based join optimization
Step-by-Step Guide
Prerequisites
- Python 3.10+ installed
- Git basics (clone, commit, push, pull request)
- Read our CONTRIBUTING.md guide
Setup Instructions
- Fork and clone the repository
    git clone https://github.com/YOUR_USERNAME/data-tools.git
    cd data-tools
- Create a virtual environment
    python -m venv .venv
    source .venv/bin/activate  # On Windows: .venv\Scripts\activate
- Install dependencies
    pip install -e ".[dev]"
- Create a new branch
    git checkout -b docs/add-data-product-docstrings
Implementation Steps
- Open the file src/intugle/data_product.py
- Add docstring to __init__() method (line 24):
  - Explain that DataProduct loads the semantic model from YAML files
  - Document the models_dir_path parameter
  - Mention what gets initialized (manifest, field_details, links, join optimizer)
  - Add example
- Add docstring to load_all() method (line 104):
  - Explain that it loads all datasets from the manifest
  - Mention this is called automatically during initialization
  - Document any side effects
- Add docstring to plot_graph() method (line 264):
  - Explain what graph it plots (table relationships)
  - Document the graph parameter
  - Mention visualization requirements (matplotlib, graphviz, etc.)
- Add docstring to plot_sources_graph() method (line 267):
  - Explain it visualizes all tables and their relationships
  - Mention difference from plot_graph() (shows all vs specific)
  - Add example
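As a rough illustration of the bullets above, here is how Google-style docstrings for __init__() and load_all() might be laid out. DataProductStub is a hypothetical stand-in, not the real class, and the models_dir_path default shown is an assumption; copy the actual signature and behavior from src/intugle/data_product.py before writing the real docstrings.

```python
class DataProductStub:
    """Hypothetical stand-in for DataProduct, used only to show docstring layout."""

    def __init__(self, models_dir_path=None):
        """Load the semantic model from YAML files.

        Initializes the manifest, field details, links, and join optimizer
        used by the other methods.

        Args:
            models_dir_path: Directory containing the semantic model YAML
                files. The default used here is an assumption; take the
                type and default from the real signature.

        Example:
            >>> dp = DataProduct()
        """
        self.models_dir_path = models_dir_path

    def load_all(self):
        """Load all datasets from the manifest.

        Called automatically during initialization, so users rarely need
        to call it directly.

        Note:
            Describe any side effects here after reading the method body.
        """


# Print one docstring to check how it reads.
print(DataProductStub.__init__.__doc__)
```

The layout (summary line, blank line, Args, Example, Note) is what matters here; the wording should come from what the real methods actually do.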
Files to Modify
- File: src/intugle/data_product.py
- Change: Add comprehensive docstrings to the methods that lack them
- Line(s): 24 (__init__), 104 (load_all), 264 (plot_graph), 267 (plot_sources_graph)
Example Code
def plot_graph(self, graph):
    """
    Plot a specific relationship graph.

    Visualizes table relationships as a network graph, showing tables as nodes
    and foreign key relationships as edges.

    Args:
        graph: NetworkX graph object containing table relationships to visualize.
            Typically obtained from the join optimizer.

    Example:
        >>> dp = DataProduct()
        >>> # Get graph for specific tables
        >>> graph = dp.join.generate_graph(["patients", "claims"])
        >>> dp.plot_graph(graph)

    Note:
        Requires matplotlib and graphviz to be installed for visualization.
        The graph is displayed inline in Jupyter notebooks or saved to a file
        in other contexts.
    """
Testing Your Changes
- Verify docstrings render correctly:
    from intugle import DataProduct
    help(DataProduct)
    help(DataProduct.__init__)
    help(DataProduct.plot_sources_graph)
- Test in a Jupyter notebook:
  - Docstrings should appear as tooltips with Shift+Tab
  - Examples should be copy-pasteable
- Run tests:
    pytest tests/
Submitting Your Work
Please run the following command to automatically fix linting issues before committing:
ruff check --fix .
- Commit your changes
    git add src/intugle/data_product.py
    git commit -m "Add comprehensive docstrings to DataProduct methods"
- Push to your fork
    git push origin docs/add-data-product-docstrings
- Create a Pull Request
  - Go to the original repository
  - Click "Pull Requests" → "New Pull Request"
  - Select your branch
  - Fill out the PR template
  - Reference this issue with "Fixes #ISSUE_NUMBER"
Expected Outcome
The DataProduct class should have clear docstrings for all methods that:
- Explain purpose and use cases
- Document parameters and return values
- Include practical examples
- Mention prerequisites and requirements
Definition of Done
- Docstring added to __init__() method with example
- Docstring added to load_all() method
- Docstring added to plot_graph() method
- Docstring added to plot_sources_graph() method with example
- All docstrings follow Google style
- Docstrings tested with help() function
- Examples are accurate
- Tests passing locally
- Pull request submitted
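One way to spot-check the "Google style" and "document parameters" items is to compare each method's signature against its Args section. This is a minimal sketch, not a full style checker: undocumented_params and the sample function are hypothetical, and the parsing assumes the Args: header sits on its own line, as Google style does.

```python
import inspect
import re


def undocumented_params(func):
    """Return parameter names (excluding self/cls) missing from func's Args section."""
    doc = inspect.getdoc(func) or ""  # getdoc strips the common indentation
    lines = doc.splitlines()
    block = []
    if "Args:" in lines:
        for line in lines[lines.index("Args:") + 1:]:
            if line and not line[0].isspace():
                break  # reached the next top-level section, e.g. "Example:"
            block.append(line)
    args_text = "\n".join(block)

    missing = []
    for name in inspect.signature(func).parameters:
        if name in ("self", "cls"):
            continue
        # An entry looks like "    name: description" or "    name (type): description".
        if not re.search(rf"^\s+{re.escape(name)}\b[^:\n]*:", args_text, re.M):
            missing.append(name)
    return missing


def sample(self, graph, layout="spring"):
    """Plot a graph.

    Args:
        graph: Graph to draw.
    """


print(undocumented_params(sample))  # prints ['layout']
```

Calling it on DataProduct.plot_graph after adding the docstring should return an empty list if the graph parameter is documented.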
Resources
- Project Documentation
- CONTRIBUTING.md
- Example notebooks - See DataProduct usage
Need Help?
Don't hesitate to ask questions! We're here to help you succeed.
- Comment below with your questions
- Join our Discord for real-time support
- Tag maintainers: @raphael-intugle (if specific help needed)
Skills You'll Use
- Python basics
- Git and GitHub
- Technical writing
- Understanding SQL generation concepts
- Graph visualization concepts
Thank you for contributing to Intugle!
Tips for Success:
- Look at notebooks to see how DataProduct is used in practice
- The methods with good docstrings (plan, build, generate_query) are good references
- Focus on explaining WHY users would call each method
- Include practical examples from real use cases
- Have fun! 🎉