Description
name: Good First Issue
about: A beginner-friendly task perfect for first-time contributors
title: '[GOOD FIRST ISSUE] Add Docstrings to Processing Utility Functions'
labels: 'good first issue, documentation, enhancement'
assignees: ''
Welcome! 👋
This is a beginner-friendly issue perfect for first-time contributors to the Intugle project. We've designed this task to help you get familiar with our codebase while making a meaningful contribution.
Task Description
Add comprehensive docstrings to the utility functions in src/intugle/core/utilities/processing.py. Several important functions, such as string_standardization, compute_stats, and adjust_sample, need better documentation.
Why This Matters
These utility functions are used throughout the codebase for:
- Data cleaning and standardization
- Statistical computations
- Sample data processing
Good documentation helps developers understand:
- What each function does
- What parameters it expects
- What it returns
- When to use each function
What You'll Learn
- Writing clear documentation for utility functions
- Explaining mathematical operations in plain language
- Documenting data transformation functions
- Understanding statistical concepts (mean, variance, skewness, kurtosis)
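If the statistical terms above are new to you, here is a minimal sketch of how the four quantities relate to each other. It is only an illustration: `describe_moments` is a hypothetical helper, not the project's compute_stats, and the return order, the use of population (not sample) variance, and the zero-variance behavior are all assumptions. Read the real function to see what it actually does before documenting it.

```python
def describe_moments(values):
    """Return (mean, variance, skewness, excess kurtosis) for a list of numbers.

    Hypothetical helper for illustration only. It is not the project's
    compute_stats and may differ from it in order, definitions, and
    zero-variance handling.
    """
    n = len(values)
    if n == 0:
        raise ValueError("values must not be empty")
    mean = sum(values) / n
    # Central moments: the average of (x - mean) ** k.
    m2 = sum((x - mean) ** 2 for x in values) / n  # population variance
    m3 = sum((x - mean) ** 3 for x in values) / n
    m4 = sum((x - mean) ** 4 for x in values) / n
    if m2 == 0:
        # All values are identical, so skewness and kurtosis would divide
        # by zero. Returning 0.0 is an assumption made for this sketch.
        return mean, 0.0, 0.0, 0.0
    skewness = m3 / m2 ** 1.5          # asymmetry of the distribution
    excess_kurtosis = m4 / m2 ** 2 - 3.0  # tail weight relative to a normal
    return mean, m2, skewness, excess_kurtosis
```

The zero-variance branch is the kind of special case a docstring should call out, since a reader cannot guess it from the signature.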
Step-by-Step Guide
Prerequisites
- Python 3.10+ installed
- Git basics (clone, commit, push, pull request)
- Read our CONTRIBUTING.md guide
Setup Instructions
- Fork and clone the repository
    git clone https://github.com/YOUR_USERNAME/data-tools.git
    cd data-tools
- Create a virtual environment
    python -m venv .venv
    source .venv/bin/activate  # On Windows: .venv\Scripts\activate
- Install dependencies
    pip install -e ".[dev]"
- Create a new branch
    git checkout -b docs/add-docstrings-processing-utils
Implementation Steps
- Open the file src/intugle/core/utilities/processing.py
- Add docstring to remove_ascii() function (line 18):
  - Explain what it does (removes non-ASCII characters)
  - Document parameters and return type
  - Explain use case (data cleaning)
- Add docstring to string_standardization() function (line 22):
  - Currently has no docstring
  - Explain the cleaning steps: remove non-ASCII characters, remove special characters, standardize whitespace, etc.
  - Add an example showing before/after
- Add docstring to compute_stats() function (line 31):
  - Explain what statistics are computed
  - Document the return tuple order
  - Explain the special case when variance is 0
- Add docstring to adjust_sample() function (line 54):
  - Explain the sampling strategy
  - Document all parameters and their defaults
  - Explain when samples are augmented vs truncated
- Add docstring to character_length_based_stratified_sampling() function (line 175):
  - Explain the stratified sampling approach
  - Explain why character length is used for stratification
  - Document parameters
- to_high_precision_array() function (line 246):
  - Already has a good docstring! Use it as a reference for your other docstrings
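Stratified sampling may be the least familiar idea in the steps above, so here is a rough sketch of the general technique with character length as the stratum. `sample_by_length` is a hypothetical helper, not the project's character_length_based_stratified_sampling, and the proportional allocation, the "at least one per stratum" rule, and the `seed` parameter are assumptions. One plausible motivation for stratifying by length is to keep both short codes and long free-text values in the sample, but check the real code before writing that into a docstring.

```python
import random
from collections import defaultdict


def sample_by_length(values, sample_size, seed=0):
    """Pick roughly `sample_size` strings, stratified by character length.

    Hypothetical sketch for illustration; not the project's implementation.
    """
    if sample_size <= 0:
        return []
    if sample_size >= len(values):
        return list(values)
    rng = random.Random(seed)
    # Group values by character length: each distinct length is one stratum.
    strata = defaultdict(list)
    for value in values:
        strata[len(value)].append(value)
    sample = []
    for length in sorted(strata):
        group = strata[length]
        # Proportional share, but at least one item so rare lengths survive.
        share = max(1, round(sample_size * len(group) / len(values)))
        sample.extend(rng.sample(group, min(share, len(group))))
    # Because of rounding, the result can be slightly off the requested size.
    return sample
```

Plain random sampling from 90 one-character values and 10 ten-character values could easily miss the long ones; the stratified version always keeps at least one of each length.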
Files to Modify
- File: src/intugle/core/utilities/processing.py
- Change: Add comprehensive docstrings to utility functions
- Line(s): 18, 22, 31, 54, 175 (and others as you see fit)
Testing Your Changes
- Verify docstrings render correctly:
    from intugle.core.utilities.processing import (
        string_standardization,
        compute_stats,
        adjust_sample,
    )

    help(string_standardization)
    help(compute_stats)
    help(adjust_sample)
- Check linting:
    ruff check src/intugle/core/utilities/processing.py
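If you would rather not call help() on each function by hand, a small script can list the functions that still lack a docstring. This is a sketch using only the standard library; `functions_missing_docstrings` is a hypothetical helper name and is not part of the Intugle codebase.

```python
import inspect


def functions_missing_docstrings(module):
    """Return sorted names of functions defined in `module` that lack a docstring."""
    missing = []
    for name, obj in inspect.getmembers(module, inspect.isfunction):
        # Skip functions that were imported from other modules.
        if obj.__module__ != module.__name__:
            continue
        # inspect.getdoc returns None when no docstring is available.
        if not inspect.getdoc(obj):
            missing.append(name)
    return sorted(missing)
```

Assuming the package is installed in your virtual environment, you could run `import intugle.core.utilities.processing as processing` and then print `functions_missing_docstrings(processing)`. An empty list means every function defined in that file has a docstring.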
Submitting Your Work
Please run the following command to automatically fix linting issues before committing:
ruff check --fix .
- Commit your changes
    git add src/intugle/core/utilities/processing.py
    git commit -m "Add comprehensive docstrings to processing utilities"
- Push to your fork
    git push origin docs/add-docstrings-processing-utils
- Create a Pull Request
  - Go to the original repository
  - Click "Pull Requests" → "New Pull Request"
  - Select your branch
  - Fill out the PR template
  - Reference this issue with "Fixes #ISSUE_NUMBER"
Expected Outcome
All utility functions should have clear docstrings with:
- Brief description of what the function does
- Parameter descriptions
- Return value documentation
- Practical examples
- Notes about edge cases or special behavior
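To make the checklist above concrete, here is one way a finished docstring could look. `clean_text` is a hypothetical example function, not the project's string_standardization, and its cleaning rules are invented for the illustration. The Google-style section names (Args, Returns, Example, Note) are also an assumption; match whatever style to_high_precision_array() already uses in the file.

```python
import re


def clean_text(text: str) -> str:
    """Normalize a raw string into a lowercase, underscore-separated token.

    Hypothetical example function; it is not the project's
    string_standardization and its cleaning rules are assumptions.

    Args:
        text: The raw input string, for example a column name.

    Returns:
        The cleaned string. Empty if the input held no letters or digits.

    Example:
        >>> clean_text("  Customer Name (2024)! ")
        'customer_name_2024'

    Note:
        Non-ASCII characters are dropped rather than transliterated,
        so "café" becomes "caf".
    """
    # Drop characters outside the ASCII range.
    ascii_only = text.encode("ascii", errors="ignore").decode("ascii")
    # Replace every run of non-alphanumeric characters with one underscore.
    collapsed = re.sub(r"[^0-9A-Za-z]+", "_", ascii_only)
    return collapsed.strip("_").lower()
```

The before/after example doubles as a doctest, so it can be checked automatically if the project runs doctests.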
Definition of Done
- Docstring added to remove_ascii() function
- Docstring added to string_standardization() function
- Docstring added to compute_stats() function
- Docstring added to adjust_sample() function
- Docstring added to character_length_based_stratified_sampling() function
- All docstrings include parameters, returns, and examples
- Tests passing locally
- Pull request submitted
Resources
Need Help?
Don't hesitate to ask questions! We're here to help you succeed.
- Comment below with your questions
- Join our Discord for real-time support
- Tag maintainers: @raphael-intugle (if specific help needed)
Skills You'll Use
- Python basics
- Git and GitHub
- Technical writing
- Understanding data processing concepts
- Explaining algorithms clearly
Thank you for contributing to Intugle!
Tips for Success:
- Read each function carefully to understand what it does
- Test the functions in a Python shell to see their behavior
- Include concrete examples that show real use cases
- Have fun! 🎉