[GOOD FIRST ISSUE] Add Docstrings to DataProduct Class Methods #128

@raphael-intugle

name: Good First Issue
about: A beginner-friendly task perfect for first-time contributors
title: '[GOOD FIRST ISSUE] Add Docstrings to DataProduct Class Methods'
labels: 'good first issue, documentation, enhancement'
assignees: ''

Welcome! 👋

This is a beginner-friendly issue perfect for first-time contributors to the Intugle project. We've designed this task to help you get familiar with our codebase while making a meaningful contribution.

Task Description

Add comprehensive docstrings to methods in the DataProduct class located in src/intugle/data_product.py. DataProduct is the second most important user-facing API; users rely on it to create data products from the semantic layer.

Methods like __init__(), load_all(), plot_graph(), and plot_sources_graph() are missing docstrings entirely.

Why This Matters

The DataProduct class enables users to generate SQL queries and create data products by simply selecting attributes across tables. Good documentation here helps users:

  • Understand Capabilities: Know what data products can do
  • Generate Queries: Learn how to build queries from the semantic layer
  • Visualize Relationships: Use graph plotting features
  • Debug Issues: Understand what each method does under the hood

What You'll Learn

  • Writing documentation for data transformation APIs
  • Understanding SQL generation and ETL concepts
  • Documenting query building workflows
  • Explaining graph-based join optimization

Step-by-Step Guide

Prerequisites

  • Python 3.10+ installed
  • Git basics (clone, commit, push, pull request)
  • Read our CONTRIBUTING.md guide

Setup Instructions

  1. Fork and clone the repository

    git clone https://github.com/YOUR_USERNAME/data-tools.git
    cd data-tools
  2. Create a virtual environment

    python -m venv .venv
    source .venv/bin/activate  # On Windows: .venv\Scripts\activate
  3. Install dependencies

    pip install -e ".[dev]"
  4. Create a new branch

    git checkout -b docs/add-data-product-docstrings

Implementation Steps

  1. Open the file src/intugle/data_product.py

  2. Add docstring to __init__() method (line 24):

    • Explain that DataProduct loads the semantic model from YAML files
    • Document the models_dir_path parameter
    • Mention what gets initialized (manifest, field_details, links, join optimizer)
    • Add example
  3. Add docstring to load_all() method (line 104):

    • Explain that it loads all datasets from the manifest
    • Mention this is called automatically during initialization
    • Document any side effects
  4. Add docstring to plot_graph() method (line 264):

    • Explain what graph it plots (table relationships)
    • Document the graph parameter
    • Mention visualization requirements (matplotlib, graphviz, etc.)
  5. Add docstring to plot_sources_graph() method (line 267):

    • Explain it visualizes all tables and their relationships
    • Mention difference from plot_graph() (shows all vs specific)
    • Add example
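
As a rough guide to the shape these docstrings could take, here is a minimal Google-style sketch for __init__() and load_all(). It uses a hypothetical stand-in class (DataProductSketch) so that it runs on its own; the default value of models_dir_path, the datasets attribute, and the method bodies are assumptions made for illustration, so check the real code in src/intugle/data_product.py before copying any wording.

```python
class DataProductSketch:
    """Hypothetical stand-in for DataProduct, used only to show docstring layout."""

    def __init__(self, models_dir_path=None):
        """Load the semantic model from YAML files and prepare query building.

        Args:
            models_dir_path: Directory that contains the semantic model YAML
                files. The default of None is an assumption of this sketch.

        Example:
            >>> dp = DataProductSketch("models/")
            >>> dp.models_dir_path
            'models/'
        """
        self.models_dir_path = models_dir_path
        self.datasets = {}  # hypothetical attribute name
        self.load_all()

    def load_all(self):
        """Load every dataset listed in the manifest.

        Called automatically during initialization, so most users never
        need to call it directly.

        Note:
            Side effect: replaces the contents of ``self.datasets``.
        """
        # Placeholder body; the real method reads from the manifest.
        self.datasets = {}
```

The summary line, Args, Example, and Note sections follow the same layout as the plot_graph() example in this issue, which keeps the four new docstrings consistent with each other.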

Files to Modify

  • File: src/intugle/data_product.py
    • Change: Add comprehensive docstrings to missing methods
    • Line(s): 24 (__init__), 104 (load_all), 264 (plot_graph), 267 (plot_sources_graph)

Example Code

def plot_graph(self, graph):
    """
    Plot a specific relationship graph.
    
    Visualizes table relationships as a network graph, showing tables as nodes
    and foreign key relationships as edges.
    
    Args:
        graph: NetworkX graph object containing table relationships to visualize.
               Typically obtained from the join optimizer.
               
    Example:
        >>> dp = DataProduct()
        >>> # Get graph for specific tables
        >>> graph = dp.join.generate_graph(["patients", "claims"])
        >>> dp.plot_graph(graph)
        
    Note:
        Requires matplotlib and graphviz to be installed for visualization.
        The graph is displayed inline in Jupyter notebooks or saved to a file
        in other contexts.
    """

Testing Your Changes

  1. Verify docstrings render correctly:

    from intugle import DataProduct
    help(DataProduct)
    help(DataProduct.__init__)
    help(DataProduct.plot_sources_graph)
  2. Test in a Jupyter notebook:

    • Docstrings should appear as tooltips with Shift+Tab
    • Examples should be copy-pasteable
  3. Run tests:

    pytest tests/
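
If you would like a quick way to confirm that nothing was missed, the check below is one possible approach. It is a sketch that uses only the standard library: methods_missing_docstrings is a hypothetical helper (not part of Intugle), and the Toy class exists only to make the example self-contained.

```python
import inspect


def methods_missing_docstrings(cls):
    """Return sorted names of methods defined on cls that have no docstring.

    Only functions defined directly on the class are checked. Private
    helpers (leading underscore) are skipped, except for __init__.
    """
    missing = []
    for name, member in vars(cls).items():
        if not inspect.isfunction(member):
            continue
        if name.startswith("_") and name != "__init__":
            continue
        # Read __doc__ directly: inspect.getdoc() would fall back to an
        # inherited docstring and hide a missing one.
        if not (member.__doc__ or "").strip():
            missing.append(name)
    return sorted(missing)


class Toy:
    def __init__(self):
        pass

    def documented(self):
        """Has a docstring."""

    def undocumented(self):
        pass


print(methods_missing_docstrings(Toy))
```

To apply it to this task, import DataProduct as in step 1 and call methods_missing_docstrings(DataProduct); the four methods named in this issue should no longer appear in the result.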

Submitting Your Work

Before committing, run the following command to fix linting issues automatically:

    ruff check --fix .

  1. Commit your changes

    git add src/intugle/data_product.py
    git commit -m "Add comprehensive docstrings to DataProduct methods"
  2. Push to your fork

    git push origin docs/add-data-product-docstrings
  3. Create a Pull Request

    • Go to the original repository
    • Click "Pull Requests" → "New Pull Request"
    • Select your branch
    • Fill out the PR template
    • Reference this issue with "Fixes #128"

Expected Outcome

The DataProduct class should have clear docstrings for all methods that:

  • Explain purpose and use cases
  • Document parameters and return values
  • Include practical examples
  • Mention prerequisites and requirements

Definition of Done

  • Docstring added to __init__() method with example
  • Docstring added to load_all() method
  • Docstring added to plot_graph() method
  • Docstring added to plot_sources_graph() method with example
  • All docstrings follow Google style
  • Docstrings tested with help() function
  • Examples are accurate
  • Tests passing locally
  • Pull request submitted
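
For the "Examples are accurate" item, docstring examples written in the >>> style can in principle be executed with the standard library doctest module. The sketch below shows the idea on a toy function; count_doctest_failures is a hypothetical helper, and real DataProduct examples probably need model files on disk, so they may have to be checked by hand in a notebook instead.

```python
import doctest


def add(a, b):
    """Add two numbers.

    Example:
        >>> add(2, 3)
        5
    """
    return a + b


def count_doctest_failures(obj, name):
    """Run the >>> examples in obj's docstring and return (failed, attempted)."""
    parser = doctest.DocTestParser()
    # Make the object available under its own name inside the examples.
    test = parser.get_doctest(obj.__doc__ or "", {name: obj}, name, None, 0)
    runner = doctest.DocTestRunner(verbose=False)
    results = runner.run(test)
    return results.failed, results.attempted


print(count_doctest_failures(add, "add"))
```

A result with zero failures means every >>> example produced exactly the output shown in the docstring.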

Resources

Need Help?

Don't hesitate to ask questions! We're here to help you succeed.

  • Comment below with your questions
  • Join our Discord for real-time support
  • Tag maintainers: @raphael-intugle (if specific help needed)

Skills You'll Use

  • Python basics
  • Git and GitHub
  • Technical writing
  • Understanding SQL generation concepts
  • Graph visualization concepts

Thank you for contributing to Intugle!

Tips for Success:

  • Look at notebooks to see how DataProduct is used in practice
  • Methods that already have good docstrings (plan, build, generate_query) are useful references
  • Focus on explaining WHY users would call each method
  • Include practical examples from real use cases
  • Have fun! 🎉
