Skip to content

Add feedback#7

Open
sabinagio wants to merge 2 commits intosamia-boubaya:mainfrom
sabinagio-teaching:main
Open

Add feedback#7
sabinagio wants to merge 2 commits intosamia-boubaya:mainfrom
sabinagio-teaching:main

Conversation

@sabinagio
Copy link
Copy Markdown

@sabinagio sabinagio commented Nov 8, 2025

Summary by Sourcery

Include reviewer feedback on data cleaning project

Documentation:

  • Add Jupyter notebook with detailed feedback annotations on data cleaning steps
  • Add FEEDBACK.md summarizing positive points and improvement suggestions

@sourcery-ai
Copy link
Copy Markdown

sourcery-ai bot commented Nov 8, 2025

Reviewer's Guide

This PR introduces a comprehensive feedback notebook that walks through the shark‐attacks dataset cleaning pipeline—covering country normalization, sex and age standardization, injury classification, fatality derivation, and date parsing—alongside a FEEDBACK.md file summarizing high-level repo and style suggestions.

Entity relationship diagram for cleaned shark attack dataset

erDiagram
  SHARK_ATTACKS {
    string Country
    string State
    string Location
    string Activity
    string Name
    string Sex
    int Age
    string Injury
    string Type
    string Fatal_YN
    string Time
    string Species
    string Source
    int DD
    string MM
    int YYYY
    int MM_clean
    string Injury_Severity
  }

  SHARK_ATTACKS ||--o{ COUNTRY : has
  SHARK_ATTACKS ||--o{ SEX : has
  SHARK_ATTACKS ||--o{ AGE : has
  SHARK_ATTACKS ||--o{ INJURY : has
  SHARK_ATTACKS ||--o{ TYPE : has
  SHARK_ATTACKS ||--o{ FATAL_YN : has
  SHARK_ATTACKS ||--o{ DATE : has

  COUNTRY {
    string name
  }
  SEX {
    string value
  }
  AGE {
    int value
  }
  INJURY {
    string description
    string severity
  }
  TYPE {
    string classification
  }
  FATAL_YN {
    string value
  }
  DATE {
    int DD
    string MM
    int YYYY
    int MM_clean
  }
Loading

Class diagram for data cleaning pipeline steps

classDiagram
  class SharkAttackRaw {
    +string Date
    +float Year
    +string Type
    +string Country
    +string State
    +string Location
    +string Activity
    +string Name
    +string Sex
    +string Age
    +string Injury
    +string Fatal_YN
    +string Time
    +string Species
    +string Source
  }

  class SharkAttackCleaned {
    +string Country
    +string State
    +string Location
    +string Activity
    +string Name
    +string Sex
    +int Age
    +string Injury
    +string Type
    +string Fatal_YN
    +string Time
    +string Species
    +string Source
    +int DD
    +string MM
    +int YYYY
    +int MM_clean
    +string Injury_Severity
  }

  SharkAttackRaw <|-- SharkAttackCleaned : cleaning

  class CountryCleaner {
    +normalize_country_names()
    +drop_nulls()
    +replace_aliases()
  }
  class SexCleaner {
    +standardize_sex()
    +drop_nulls()
  }
  class AgeCleaner {
    +standardize_age()
    +categorize_age()
    +drop_nulls()
  }
  class InjuryClassifier {
    +classify_injury()
    +classify_severity()
  }
  class FatalityDeriver {
    +derive_fatality()
  }
  class DateParser {
    +parse_date()
    +split_date()
    +drop_invalid()
  }

  SharkAttackRaw --> CountryCleaner : uses
  SharkAttackRaw --> SexCleaner : uses
  SharkAttackRaw --> AgeCleaner : uses
  SharkAttackRaw --> InjuryClassifier : uses
  SharkAttackRaw --> FatalityDeriver : uses
  SharkAttackRaw --> DateParser : uses

  CountryCleaner --> SharkAttackCleaned : outputs
  SexCleaner --> SharkAttackCleaned : outputs
  AgeCleaner --> SharkAttackCleaned : outputs
  InjuryClassifier --> SharkAttackCleaned : outputs
  FatalityDeriver --> SharkAttackCleaned : outputs
  DateParser --> SharkAttackCleaned : outputs
Loading

File-Level Changes

Change Details Files
Add detailed data-cleaning notebook with inline review comments
  • Normalize and standardize Country names, drop nulls, and apply title-case replacements
  • Clean Sex column by uppercasing, replacing variants, and dropping missing values
  • Convert Age values to numeric, compute percentiles, and prepare age ranges
  • Implement classify_injury function to label injury types and save interim CSV
  • Create Injury Severity and derive Fatal Y/N flags, then save cleaned data
  • Parse Date strings into DD, MM, YYYY columns, map month names to integers, and save results
notebooks/data-cleaning-pandas-feedback.ipynb
Add repository-level feedback document
  • Create FEEDBACK.md with positive notes on structure and organization
  • Include actionable improvement suggestions on naming, consistency, and commentary
FEEDBACK.md

Tips and commands

Interacting with Sourcery

  • Trigger a new review: Comment @sourcery-ai review on the pull request.
  • Continue discussions: Reply directly to Sourcery's review comments.
  • Generate a GitHub issue from a review comment: Ask Sourcery to create an
    issue from a review comment by replying to it. You can also reply to a
    review comment with @sourcery-ai issue to create an issue from it.
  • Generate a pull request title: Write @sourcery-ai anywhere in the pull
    request title to generate a title at any time. You can also comment
    @sourcery-ai title on the pull request to (re-)generate the title at any time.
  • Generate a pull request summary: Write @sourcery-ai summary anywhere in
    the pull request body to generate a PR summary at any time exactly where you
    want it. You can also comment @sourcery-ai summary on the pull request to
    (re-)generate the summary at any time.
  • Generate reviewer's guide: Comment @sourcery-ai guide on the pull
    request to (re-)generate the reviewer's guide at any time.
  • Resolve all Sourcery comments: Comment @sourcery-ai resolve on the
    pull request to resolve all Sourcery comments. Useful if you've already
    addressed all the comments and don't want to see them anymore.
  • Dismiss all Sourcery reviews: Comment @sourcery-ai dismiss on the pull
    request to dismiss all existing Sourcery reviews. Especially useful if you
    want to start fresh with a new review - don't forget to comment
    @sourcery-ai review to trigger a new review!

Customizing Your Experience

Access your dashboard to:

  • Enable or disable review features such as the Sourcery-generated pull request
    summary, the reviewer's guide, and others.
  • Change the review language.
  • Add, remove or edit custom review instructions.
  • Adjust other review settings.

Getting Help

Copy link
Copy Markdown

@sourcery-ai sourcery-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey there - I've reviewed your changes - here's some feedback:

  • Extract recurring cleaning logic (e.g. country normalization, month mapping) into reusable utility functions or modules to DRY up the notebook and improve maintainability.
  • Consider breaking the monolithic data‐cleaning notebook into smaller, focused scripts or notebooks so each cleaning step can be reviewed and tested in isolation.
  • Use pandas built-in parsing (like pd.to_datetime) and categorical dtypes instead of custom regex and manual mapping for more robust and concise date and category transformations.
Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- Extract recurring cleaning logic (e.g. country normalization, month mapping) into reusable utility functions or modules to DRY up the notebook and improve maintainability.
- Consider breaking the monolithic data‐cleaning notebook into smaller, focused scripts or notebooks so each cleaning step can be reviewed and tested in isolation.
- Use pandas built-in parsing (like pd.to_datetime) and categorical dtypes instead of custom regex and manual mapping for more robust and concise date and category transformations.

## Individual Comments

### Comment 1
<location> `FEEDBACK.md:10` </location>
<code_context>
+- Also love how organized the Jupyter notebook is :heart-eyes:. That's a work of art!  
+- Like that you used a GDrive link to make sure you all worked with the same raw data but using the same relative path, i.e. `pd.read_csv("../data/raw.csv")` would have achieved the same result!  
+- Really thorough, clean, and well-organized data cleaning, colour me impressed with your performance!! 
+(there is a meme reference which I'm offerring as a well-done, from the kickboxing world: [A Canadian boxer trying to talk smack](https://www.youtube.com/shorts/RdoFCWhiL9g))
+
+### To fix / improve on:
</code_context>

<issue_to_address>
**issue (typo):** Typo: 'offerring' should be 'offering'.

```suggestion
(there is a meme reference which I'm offering as a well-done, from the kickboxing world: [A Canadian boxer trying to talk smack](https://www.youtube.com/shorts/RdoFCWhiL9g))
```
</issue_to_address>

### Comment 2
<location> `FEEDBACK.md:18` </location>
<code_context>
+- I noticed you did describe the data quite often during the cleaning process, but I didn't see you making any comments on what you were seeing. This is rule #1: Whenever you check something, you need to extract an insight from it, whether it's boring/obvious or not! That's your main job as analysts. Of course, now I ask you to do this everytime in writing so you can build the habit and you might stop doing it for simple things at some point, but for now... it's mandatory!
+
+Tips:
+- If you want your bulletpoints to show up in new lines, make sure to add at least 2 spaces after each paragraph
+- You can also write `pd.options.display.max_columns = None`         
+
</code_context>

<issue_to_address>
**issue (typo):** Typo: 'bulletpoints' should be 'bullet points'.

Please update 'bulletpoints' to 'bullet points' for correct spelling.

```suggestion
- If you want your bullet points to show up in new lines, make sure to add at least 2 spaces after each paragraph
```
</issue_to_address>

### Comment 3
<location> `FEEDBACK.md:14` </location>
<code_context>
+
+### To fix / improve on:
+- Stay consistent with the separators you use in titles, choose either "_" or "-"      
+- The file should be `__init___.py` not `init.py`    
+- I noticed you did describe the data quite often during the cleaning process, but I didn't see you making any comments on what you were seeing. This is rule #1: Whenever you check something, you need to extract an insight from it, whether it's boring/obvious or not! That's your main job as analysts. Of course, now I ask you to do this everytime in writing so you can build the habit and you might stop doing it for simple things at some point, but for now... it's mandatory!
+
</code_context>

<issue_to_address>
**issue (typo):** Typo: '__init___.py' should be '__init__.py' (two underscores).

Use '__init__.py' with two underscores before and after 'init' for Python package initialization files.

```suggestion
- The file should be `__init__.py` not `init.py`    
```
</issue_to_address>

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

Comment thread FEEDBACK.md
- Also love how organized the Jupyter notebook is :heart-eyes:. That's a work of art!
- Like that you used a GDrive link to make sure you all worked with the same raw data but using the same relative path, i.e. `pd.read_csv("../data/raw.csv")` would have achieved the same result!
- Really thorough, clean, and well-organized data cleaning, colour me impressed with your performance!!
(there is a meme reference which I'm offerring as a well-done, from the kickboxing world: [A Canadian boxer trying to talk smack](https://www.youtube.com/shorts/RdoFCWhiL9g))
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

issue (typo): Typo: 'offerring' should be 'offering'.

Suggested change
(there is a meme reference which I'm offerring as a well-done, from the kickboxing world: [A Canadian boxer trying to talk smack](https://www.youtube.com/shorts/RdoFCWhiL9g))
(there is a meme reference which I'm offering as a well-done, from the kickboxing world: [A Canadian boxer trying to talk smack](https://www.youtube.com/shorts/RdoFCWhiL9g))

Comment thread FEEDBACK.md
- I noticed you did describe the data quite often during the cleaning process, but I didn't see you making any comments on what you were seeing. This is rule #1: Whenever you check something, you need to extract an insight from it, whether it's boring/obvious or not! That's your main job as analysts. Of course, now I ask you to do this everytime in writing so you can build the habit and you might stop doing it for simple things at some point, but for now... it's mandatory!

Tips:
- If you want your bulletpoints to show up in new lines, make sure to add at least 2 spaces after each paragraph
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

issue (typo): Typo: 'bulletpoints' should be 'bullet points'.

Please update 'bulletpoints' to 'bullet points' for correct spelling.

Suggested change
- If you want your bulletpoints to show up in new lines, make sure to add at least 2 spaces after each paragraph
- If you want your bullet points to show up in new lines, make sure to add at least 2 spaces after each paragraph

Comment thread FEEDBACK.md

### To fix / improve on:
- Stay consistent with the separators you use in titles, choose either "_" or "-"
- The file should be `__init___.py` not `init.py`
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

issue (typo): Typo: 'init_.py' should be 'init.py' (two underscores).

Use 'init.py' with two underscores before and after 'init' for Python package initialization files.

Suggested change
- The file should be `__init___.py` not `init.py`
- The file should be `__init__.py` not `init.py`

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant