Add feedback by sabinagio · Pull Request #7 · samia-boubaya/project_02-shark-attacks

sabinagio · 2025-11-08T15:47:33Z

Summary by Sourcery

Include reviewer feedback on data cleaning project

Documentation:

Add Jupyter notebook with detailed feedback annotations on data cleaning steps
Add FEEDBACK.md summarizing positive points and improvement suggestions

sourcery-ai · 2025-11-08T15:47:40Z

Reviewer's Guide

This PR introduces a comprehensive feedback notebook that walks through the shark‐attacks dataset cleaning pipeline—covering country normalization, sex and age standardization, injury classification, fatality derivation, and date parsing—alongside a FEEDBACK.md file summarizing high-level repo and style suggestions.

Entity relationship diagram for cleaned shark attack dataset

erDiagram
  SHARK_ATTACKS {
    string Country
    string State
    string Location
    string Activity
    string Name
    string Sex
    int Age
    string Injury
    string Type
    string Fatal_YN
    string Time
    string Species
    string Source
    int DD
    string MM
    int YYYY
    int MM_clean
    string Injury_Severity
  }

  SHARK_ATTACKS ||--o{ COUNTRY : has
  SHARK_ATTACKS ||--o{ SEX : has
  SHARK_ATTACKS ||--o{ AGE : has
  SHARK_ATTACKS ||--o{ INJURY : has
  SHARK_ATTACKS ||--o{ TYPE : has
  SHARK_ATTACKS ||--o{ FATAL_YN : has
  SHARK_ATTACKS ||--o{ DATE : has

  COUNTRY {
    string name
  }
  SEX {
    string value
  }
  AGE {
    int value
  }
  INJURY {
    string description
    string severity
  }
  TYPE {
    string classification
  }
  FATAL_YN {
    string value
  }
  DATE {
    int DD
    string MM
    int YYYY
    int MM_clean
  }

Class diagram for data cleaning pipeline steps

classDiagram
  class SharkAttackRaw {
    +string Date
    +float Year
    +string Type
    +string Country
    +string State
    +string Location
    +string Activity
    +string Name
    +string Sex
    +string Age
    +string Injury
    +string Fatal_YN
    +string Time
    +string Species
    +string Source
  }

  class SharkAttackCleaned {
    +string Country
    +string State
    +string Location
    +string Activity
    +string Name
    +string Sex
    +int Age
    +string Injury
    +string Type
    +string Fatal_YN
    +string Time
    +string Species
    +string Source
    +int DD
    +string MM
    +int YYYY
    +int MM_clean
    +string Injury_Severity
  }

  SharkAttackRaw <|-- SharkAttackCleaned : cleaning

  class CountryCleaner {
    +normalize_country_names()
    +drop_nulls()
    +replace_aliases()
  }
  class SexCleaner {
    +standardize_sex()
    +drop_nulls()
  }
  class AgeCleaner {
    +standardize_age()
    +categorize_age()
    +drop_nulls()
  }
  class InjuryClassifier {
    +classify_injury()
    +classify_severity()
  }
  class FatalityDeriver {
    +derive_fatality()
  }
  class DateParser {
    +parse_date()
    +split_date()
    +drop_invalid()
  }

  SharkAttackRaw --> CountryCleaner : uses
  SharkAttackRaw --> SexCleaner : uses
  SharkAttackRaw --> AgeCleaner : uses
  SharkAttackRaw --> InjuryClassifier : uses
  SharkAttackRaw --> FatalityDeriver : uses
  SharkAttackRaw --> DateParser : uses

  CountryCleaner --> SharkAttackCleaned : outputs
  SexCleaner --> SharkAttackCleaned : outputs
  AgeCleaner --> SharkAttackCleaned : outputs
  InjuryClassifier --> SharkAttackCleaned : outputs
  FatalityDeriver --> SharkAttackCleaned : outputs
  DateParser --> SharkAttackCleaned : outputs

File-Level Changes

Change	Details	Files
Add detailed data-cleaning notebook with inline review comments	Normalize and standardize Country names, drop nulls, and apply title-case replacements Clean Sex column by uppercasing, replacing variants, and dropping missing values Convert Age values to numeric, compute percentiles, and prepare age ranges Implement classify_injury function to label injury types and save interim CSV Create Injury Severity and derive Fatal Y/N flags, then save cleaned data Parse Date strings into DD, MM, YYYY columns, map month names to integers, and save results	`notebooks/data-cleaning-pandas-feedback.ipynb`
Add repository-level feedback document	Create FEEDBACK.md with positive notes on structure and organization Include actionable improvement suggestions on naming, consistency, and commentary	`FEEDBACK.md`

Tips and commands

Interacting with Sourcery

Trigger a new review: Comment @sourcery-ai review on the pull request.
Continue discussions: Reply directly to Sourcery's review comments.
Generate a GitHub issue from a review comment: Ask Sourcery to create an
issue from a review comment by replying to it. You can also reply to a
review comment with @sourcery-ai issue to create an issue from it.
Generate a pull request title: Write @sourcery-ai anywhere in the pull
request title to generate a title at any time. You can also comment
@sourcery-ai title on the pull request to (re-)generate the title at any time.
Generate a pull request summary: Write @sourcery-ai summary anywhere in
the pull request body to generate a PR summary at any time exactly where you
want it. You can also comment @sourcery-ai summary on the pull request to
(re-)generate the summary at any time.
Generate reviewer's guide: Comment @sourcery-ai guide on the pull
request to (re-)generate the reviewer's guide at any time.
Resolve all Sourcery comments: Comment @sourcery-ai resolve on the
pull request to resolve all Sourcery comments. Useful if you've already
addressed all the comments and don't want to see them anymore.
Dismiss all Sourcery reviews: Comment @sourcery-ai dismiss on the pull
request to dismiss all existing Sourcery reviews. Especially useful if you
want to start fresh with a new review - don't forget to comment
@sourcery-ai review to trigger a new review!

Customizing Your Experience

Access your dashboard to:

Enable or disable review features such as the Sourcery-generated pull request
summary, the reviewer's guide, and others.
Change the review language.
Add, remove or edit custom review instructions.
Adjust other review settings.

Getting Help

Contact our support team for questions or feedback.
Visit our documentation for detailed guides and information.
Keep in touch with the Sourcery team by following us on X/Twitter, LinkedIn or GitHub.

sourcery-ai

Hey there - I've reviewed your changes - here's some feedback:

Extract recurring cleaning logic (e.g. country normalization, month mapping) into reusable utility functions or modules to DRY up the notebook and improve maintainability.
Consider breaking the monolithic data‐cleaning notebook into smaller, focused scripts or notebooks so each cleaning step can be reviewed and tested in isolation.
Use pandas built-in parsing (like pd.to_datetime) and categorical dtypes instead of custom regex and manual mapping for more robust and concise date and category transformations.

Prompt for AI Agents

Please address the comments from this code review:

## Overall Comments
- Extract recurring cleaning logic (e.g. country normalization, month mapping) into reusable utility functions or modules to DRY up the notebook and improve maintainability.
- Consider breaking the monolithic data‐cleaning notebook into smaller, focused scripts or notebooks so each cleaning step can be reviewed and tested in isolation.
- Use pandas built-in parsing (like pd.to_datetime) and categorical dtypes instead of custom regex and manual mapping for more robust and concise date and category transformations.

## Individual Comments

### Comment 1
<location> `FEEDBACK.md:10` </location>
<code_context>
+- Also love how organized the Jupyter notebook is :heart-eyes:. That's a work of art!  
+- Like that you used a GDrive link to make sure you all worked with the same raw data but using the same relative path, i.e. `pd.read_csv("../data/raw.csv")` would have achieved the same result!  
+- Really thorough, clean, and well-organized data cleaning, colour me impressed with your performance!! 
+(there is a meme reference which I'm offerring as a well-done, from the kickboxing world: [A Canadian boxer trying to talk smack](https://www.youtube.com/shorts/RdoFCWhiL9g))
+
+### To fix / improve on:
</code_context>

<issue_to_address>
**issue (typo):** Typo: 'offerring' should be 'offering'.

```suggestion
(there is a meme reference which I'm offering as a well-done, from the kickboxing world: [A Canadian boxer trying to talk smack](https://www.youtube.com/shorts/RdoFCWhiL9g))
```
</issue_to_address>

### Comment 2
<location> `FEEDBACK.md:18` </location>
<code_context>
+- I noticed you did describe the data quite often during the cleaning process, but I didn't see you making any comments on what you were seeing. This is rule #1: Whenever you check something, you need to extract an insight from it, whether it's boring/obvious or not! That's your main job as analysts. Of course, now I ask you to do this everytime in writing so you can build the habit and you might stop doing it for simple things at some point, but for now... it's mandatory!
+
+Tips:
+- If you want your bulletpoints to show up in new lines, make sure to add at least 2 spaces after each paragraph
+- You can also write `pd.options.display.max_columns = None`         
+
</code_context>

<issue_to_address>
**issue (typo):** Typo: 'bulletpoints' should be 'bullet points'.

Please update 'bulletpoints' to 'bullet points' for correct spelling.

```suggestion
- If you want your bullet points to show up in new lines, make sure to add at least 2 spaces after each paragraph
```
</issue_to_address>

### Comment 3
<location> `FEEDBACK.md:14` </location>
<code_context>
+
+### To fix / improve on:
+- Stay consistent with the separators you use in titles, choose either "_" or "-"      
+- The file should be `__init___.py` not `init.py`    
+- I noticed you did describe the data quite often during the cleaning process, but I didn't see you making any comments on what you were seeing. This is rule #1: Whenever you check something, you need to extract an insight from it, whether it's boring/obvious or not! That's your main job as analysts. Of course, now I ask you to do this everytime in writing so you can build the habit and you might stop doing it for simple things at some point, but for now... it's mandatory!
+
</code_context>

<issue_to_address>
**issue (typo):** Typo: '__init___.py' should be '__init__.py' (two underscores).

Use '__init__.py' with two underscores before and after 'init' for Python package initialization files.

```suggestion
- The file should be `__init__.py` not `init.py`    
```
</issue_to_address>

Sourcery is free for open source - if you like our reviews please consider sharing them ✨

_{Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.}

sourcery-ai · 2025-11-08T15:48:37Z

+- Also love how organized the Jupyter notebook is :heart-eyes:. That's a work of art!  
+- Like that you used a GDrive link to make sure you all worked with the same raw data but using the same relative path, i.e. `pd.read_csv("../data/raw.csv")` would have achieved the same result!  
+- Really thorough, clean, and well-organized data cleaning, colour me impressed with your performance!! 
+(there is a meme reference which I'm offerring as a well-done, from the kickboxing world: [A Canadian boxer trying to talk smack](https://www.youtube.com/shorts/RdoFCWhiL9g))


issue (typo): Typo: 'offerring' should be 'offering'.

Suggested change

(there is a meme reference which I'm offerring as a well-done, from the kickboxing world: [A Canadian boxer trying to talk smack](https://www.youtube.com/shorts/RdoFCWhiL9g))

(there is a meme reference which I'm offering as a well-done, from the kickboxing world: [A Canadian boxer trying to talk smack](https://www.youtube.com/shorts/RdoFCWhiL9g))

sourcery-ai · 2025-11-08T15:48:37Z

+- I noticed you did describe the data quite often during the cleaning process, but I didn't see you making any comments on what you were seeing. This is rule #1: Whenever you check something, you need to extract an insight from it, whether it's boring/obvious or not! That's your main job as analysts. Of course, now I ask you to do this everytime in writing so you can build the habit and you might stop doing it for simple things at some point, but for now... it's mandatory!
+
+Tips:
+- If you want your bulletpoints to show up in new lines, make sure to add at least 2 spaces after each paragraph


issue (typo): Typo: 'bulletpoints' should be 'bullet points'.

Please update 'bulletpoints' to 'bullet points' for correct spelling.

Suggested change

- If you want your bulletpoints to show up in new lines, make sure to add at least 2 spaces after each paragraph

- If you want your bullet points to show up in new lines, make sure to add at least 2 spaces after each paragraph

sourcery-ai · 2025-11-08T15:48:37Z

+
+### To fix / improve on:
+- Stay consistent with the separators you use in titles, choose either "_" or "-"      
+- The file should be `__init___.py` not `init.py`    


issue (typo): Typo: 'init_.py' should be 'init.py' (two underscores).

Use 'init.py' with two underscores before and after 'init' for Python package initialization files.

Suggested change

- The file should be `__init___.py` not `init.py`

- The file should be `__init__.py` not `init.py`

sabinagio added 2 commits November 8, 2025 10:52

save current feedback

7017761

Finish feedback

58719ed

sourcery-ai bot reviewed Nov 8, 2025

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add feedback#7

Add feedback#7
sabinagio wants to merge 2 commits intosamia-boubaya:mainfrom
sabinagio-teaching:main

sabinagio commented Nov 8, 2025 •

edited by sourcery-ai bot

Loading

Uh oh!

sourcery-ai bot commented Nov 8, 2025 •

edited

Loading

Interacting with Sourcery

Customizing Your Experience

Getting Help

Uh oh!

sourcery-ai bot left a comment

Uh oh!

sourcery-ai bot Nov 8, 2025

Uh oh!

sourcery-ai bot Nov 8, 2025

Uh oh!

sourcery-ai bot Nov 8, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

	(there is a meme reference which I'm offerring as a well-done, from the kickboxing world: [A Canadian boxer trying to talk smack](https://www.youtube.com/shorts/RdoFCWhiL9g))
	(there is a meme reference which I'm offering as a well-done, from the kickboxing world: [A Canadian boxer trying to talk smack](https://www.youtube.com/shorts/RdoFCWhiL9g))

	- If you want your bulletpoints to show up in new lines, make sure to add at least 2 spaces after each paragraph
	- If you want your bullet points to show up in new lines, make sure to add at least 2 spaces after each paragraph

	- The file should be `__init___.py` not `init.py`
	- The file should be `__init__.py` not `init.py`

Conversation

sabinagio commented Nov 8, 2025 • edited by sourcery-ai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by Sourcery

Uh oh!

sourcery-ai bot commented Nov 8, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Reviewer's Guide

Entity relationship diagram for cleaned shark attack dataset

Class diagram for data cleaning pipeline steps

File-Level Changes

Interacting with Sourcery

Customizing Your Experience

Getting Help

Uh oh!

sourcery-ai bot left a comment

Choose a reason for hiding this comment

Uh oh!

sourcery-ai bot Nov 8, 2025

Choose a reason for hiding this comment

Uh oh!

sourcery-ai bot Nov 8, 2025

Choose a reason for hiding this comment

Uh oh!

sourcery-ai bot Nov 8, 2025

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

sabinagio commented Nov 8, 2025 •

edited by sourcery-ai bot

Loading

sourcery-ai bot commented Nov 8, 2025 •

edited

Loading