[GOOD FIRST ISSUE] Add Docstrings to Processing Utility Functions #131

@raphael-intugle

Labels: good first issue, documentation, enhancement

Welcome! 👋

This is a beginner-friendly issue perfect for first-time contributors to the Intugle project. We've designed this task to help you get familiar with our codebase while making a meaningful contribution.

Task Description

Add comprehensive docstrings to utility functions in src/intugle/core/utilities/processing.py. Several important functions like string_standardization, compute_stats, adjust_sample, and others need better documentation.

Why This Matters

These utility functions are used throughout the codebase for:

  • Data cleaning and standardization
  • Statistical computations
  • Sample data processing

Good documentation helps developers understand:

  • What each function does
  • What parameters it expects
  • What it returns
  • When to use each function

What You'll Learn

  • Writing clear documentation for utility functions
  • Explaining mathematical operations in plain language
  • Documenting data transformation functions
  • Understanding statistical concepts (mean, variance, skewness, kurtosis)

Step-by-Step Guide

Prerequisites

  • Python 3.10+ installed
  • Git basics (clone, commit, push, pull request)
  • Read our CONTRIBUTING.md guide

Setup Instructions

  1. Fork and clone the repository

    git clone https://github.com/YOUR_USERNAME/data-tools.git
    cd data-tools
  2. Create a virtual environment

    python -m venv .venv
    source .venv/bin/activate  # On Windows: .venv\Scripts\activate
  3. Install dependencies

    pip install -e ".[dev]"
  4. Create a new branch

    git checkout -b docs/add-docstrings-processing-utils

Implementation Steps

  1. Open the file src/intugle/core/utilities/processing.py

  2. Add docstring to remove_ascii() function (line 18):

    • Explain what it does (removes non-ASCII characters)
    • Document parameters and return type
    • Explain use case (data cleaning)
  3. Add docstring to string_standardization() function (line 22):

    • Currently has no docstring
    • Explain the cleaning steps: remove non-ASCII characters, remove special characters, standardize whitespace, etc.
    • Add example showing before/after
  4. Add docstring to compute_stats() function (line 31):

    • Explain what statistics are computed
    • Document the return tuple order
    • Explain the special case when variance is 0
  5. Add docstring to adjust_sample() function (line 54):

    • Explain the sampling strategy
    • Document all parameters and their defaults
    • Explain when samples are augmented vs truncated
  6. Add docstring to character_length_based_stratified_sampling() function (line 175):

    • Explain stratified sampling approach
    • Explain why character length is used for stratification
    • Document parameters
  7. Review the to_high_precision_array() function (line 246):

    • It already has a good docstring, so no change is needed. Use it as a reference for the style and level of detail of your other docstrings
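
As a rough illustration of the level of detail asked for in the steps above, here is a minimal sketch of a documented string-cleaning function. The function `standardize_text_example`, its cleaning steps, and the Google-style section headings are all assumptions made for illustration. It is not the real `string_standardization`, so read the actual code before describing its behavior, and match whatever docstring style `to_high_precision_array()` already uses.

```python
import re


def standardize_text_example(text: str) -> str:
    """Return a cleaned, lowercase version of ``text``.

    Hypothetical stand-in for illustration only; the real
    ``string_standardization`` may apply different steps.

    The cleaning steps, in order:

    1. Drop every non-ASCII character.
    2. Replace each run of non-alphanumeric characters with one space.
    3. Collapse whitespace and trim both ends.
    4. Lowercase the result.

    Args:
        text: The raw string to clean.

    Returns:
        The cleaned string. An input with no ASCII letters or digits
        yields an empty string.

    Example:
        >>> standardize_text_example("  Café -- Total_Sales!! ")
        'caf total sales'
    """
    # Encoding with errors="ignore" silently discards non-ASCII characters.
    ascii_only = text.encode("ascii", errors="ignore").decode("ascii")
    # Anything that is not a letter or digit becomes a separator.
    spaced = re.sub(r"[^A-Za-z0-9]+", " ", ascii_only)
    # split() with no arguments also trims leading and trailing whitespace.
    return " ".join(spaced.split()).lower()
```

The before/after pair in the `Example` section doubles as a doctest, so a reader can confirm the documented behavior instead of taking it on trust.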

Files to Modify

  • File: src/intugle/core/utilities/processing.py
    • Change: Add comprehensive docstrings to utility functions
    • Line(s): 18, 22, 31, 54, 175 (and others as you see fit)

Testing Your Changes

  1. Verify docstrings render correctly:

    from intugle.core.utilities.processing import (
        string_standardization,
        compute_stats,
        adjust_sample
    )
    help(string_standardization)
    help(compute_stats)
    help(adjust_sample)
  2. Check linting:

    ruff check src/intugle/core/utilities/processing.py
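
Beyond calling `help()` on each function, one possible way to confirm that nothing was missed is to list every function in the module that still has no docstring. The helper below, `functions_missing_docstrings`, is a hypothetical name and not part of Intugle; it is a small sketch built only on the standard-library `inspect` module.

```python
import inspect
import types


def functions_missing_docstrings(module: types.ModuleType) -> list[str]:
    """Return the names of functions defined in ``module`` that lack a docstring."""
    missing = []
    # getmembers() returns (name, object) pairs sorted by name.
    for name, func in inspect.getmembers(module, inspect.isfunction):
        # Skip functions that were merely imported into the module.
        if func.__module__ != module.__name__:
            continue
        if not inspect.getdoc(func):
            missing.append(name)
    return missing


# Possible usage inside the project's virtual environment:
#     from intugle.core.utilities import processing
#     print(functions_missing_docstrings(processing))
```

An empty list suggests every function defined in the file has at least some docstring; it says nothing about quality, so still read the `help()` output.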

Submitting Your Work

Please run the following command to automatically fix linting issues before committing: ruff check --fix .

  1. Commit your changes

    git add src/intugle/core/utilities/processing.py
    git commit -m "Add comprehensive docstrings to processing utilities"
  2. Push to your fork

    git push origin docs/add-docstrings-processing-utils
  3. Create a Pull Request

    • Go to the original repository
    • Click "Pull requests" → "New pull request"
    • Select your branch
    • Fill out the PR template
    • Reference this issue with "Fixes #131"

Expected Outcome

All utility functions should have clear docstrings with:

  • Brief description of what the function does
  • Parameter descriptions
  • Return value documentation
  • Practical examples
  • Notes about edge cases or special behavior
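
To show how these five elements might fit together in one docstring, here is a sketch for a statistics helper. `describe_values_example` is a hypothetical function: its return order, its use of population variance, and its handling of zero variance are assumptions chosen for the example, not the documented behavior of `compute_stats`.

```python
def describe_values_example(values: list[float]) -> tuple[float, float, float, float]:
    """Compute the mean, variance, skewness, and kurtosis of ``values``.

    Hypothetical stand-in for illustration only; check the real
    ``compute_stats`` for its actual return order and edge cases.

    Args:
        values: A non-empty list of numbers.

    Returns:
        A tuple ``(mean, variance, skewness, kurtosis)``. Variance is the
        population variance (divides by ``n``). Kurtosis is the non-excess
        form, so a normal distribution scores about 3.

    Raises:
        ValueError: If ``values`` is empty.

    Note:
        When the variance is 0 (all values are equal), skewness and
        kurtosis would require dividing by zero, so both are returned
        as 0.0.

    Example:
        >>> describe_values_example([1.0, 2.0, 3.0, 4.0])
        (2.5, 1.25, 0.0, 1.64)
    """
    if not values:
        raise ValueError("values must not be empty")
    n = len(values)
    mean = sum(values) / n
    deviations = [v - mean for v in values]
    variance = sum(d ** 2 for d in deviations) / n
    if variance == 0:
        # Edge case described in the Note section above.
        return mean, 0.0, 0.0, 0.0
    # Standardized third and fourth central moments.
    skewness = (sum(d ** 3 for d in deviations) / n) / variance ** 1.5
    kurtosis = (sum(d ** 4 for d in deviations) / n) / variance ** 2
    return mean, variance, skewness, kurtosis
```

Stating the tuple order and the zero-variance rule explicitly is the point here: those are exactly the details a caller cannot guess from the function name.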

Definition of Done

  • Docstring added to remove_ascii() function
  • Docstring added to string_standardization() function
  • Docstring added to compute_stats() function
  • Docstring added to adjust_sample() function
  • Docstring added to character_length_based_stratified_sampling() function
  • All docstrings include parameters, returns, and examples
  • Lint check (ruff) and tests passing locally
  • Pull request submitted

Need Help?

Don't hesitate to ask questions! We're here to help you succeed.

  • Comment below with your questions
  • Join our Discord for real-time support
  • Tag maintainers: @raphael-intugle (if specific help needed)

Skills You'll Use

  • Python basics
  • Git and GitHub
  • Technical writing
  • Understanding data processing concepts
  • Explaining algorithms clearly

Thank you for contributing to Intugle!

Tips for Success:

  • Read each function carefully to understand what it does
  • Test the functions in a Python shell to see their behavior
  • Include concrete examples that show real use cases
  • Have fun! 🎉
