@dnlserrano

not sure if I should be:

  • applying sort=sort_by_count to all these instances of pandas.DataFrame.value_counts
  • repeating the same thing over and over instead of, e.g., abstracting this to a function of sorts

but I trust you'll know what to do, if this is even to be accepted.

I was trying to not sort my bar chart by counts, e.g.

before

Screenshot 2024-12-23 at 22 46 52

after

Screenshot 2024-12-23 at 22 46 11

I was trying this in the context of applying the Monk scale to a set of faces from the UTKFace dataset, so I wanted the Monk scale on the x axis to keep going from 1 to 10. the exercise was to semi-infer the representation of each skin tone in the dataset 🤓
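
a minimal sketch of one way to keep the x axis running from 1 to 10 regardless of counts, assuming the levels live in a column called `mst_level` (the sample values here are made up for illustration): count without sorting, then reindex onto the full scale so that levels with no faces still show up with a count of 0.

```python
import pandas as pd

# Hypothetical sample data; "mst_level" is an assumed column name.
df = pd.DataFrame({"mst_level": [7, 8, 4, 6, 4, 6, 4, 4, 8]})

# Count each level without ordering by frequency.
counts = df["mst_level"].value_counts(sort=False)

# Reindex onto the full 1-10 scale: missing levels get a count of 0
# and the index follows the natural order of the scale.
full_scale = counts.reindex(range(1, 11), fill_value=0)

print(full_scale.tolist())  # [0, 0, 0, 4, 0, 2, 1, 2, 0, 0]
```

this is plain pandas done outside the library, so it says nothing about how the plotting function itself should expose the option.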

maybe this doesn't even make sense or there is a much better way of achieving the exact same thing

anyway, go get them, @gietema! thanks again for such an awesome lib

dnlserrano commented Dec 23, 2024

so I just realised this small fix only worked for me because my dataset had already been ordered by monk scale (the x axis): records 0-447 were monk scale 1, the next block starting at record 448 was monk scale 2, and so on.

what I'd probably want is not only a "sort by count or don't" option but also (or instead) a "sort by x axis natural order or don't" option, if that makes sense 🤔

example of unordered data with this proposed fix:

Screenshot 2024-12-23 at 23 08 54

7 is first because that's how the unordered dataset starts:

❯ head monk_scale_classifications.csv
image_path,mst_level
UTKFace/9_1_2_20161219204347420.jpg.chip.jpg,7
UTKFace/36_0_1_20170117163203851.jpg.chip.jpg,8
UTKFace/86_1_0_20170120225751953.jpg.chip.jpg,4
UTKFace/26_1_0_20170116171048641.jpg.chip.jpg,6
UTKFace/1_1_2_20161219154612988.jpg.chip.jpg,4
UTKFace/52_0_1_20170117161018159.jpg.chip.jpg,6
UTKFace/25_1_0_20170117134403373.jpg.chip.jpg,4
UTKFace/16_0_0_20170104003740977.jpg.chip.jpg,4
UTKFace/27_0_3_20170119210058457.jpg.chip.jpg,8
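
to make the ordering issue concrete, here is a small sketch using only the nine rows printed above. on recent pandas versions `value_counts(sort=False)` keeps the order in which values first appear, which is why 7 ends up first; calling `sort_index()` on the result puts the levels back in their natural order. the variable names are just for illustration.

```python
import io

import pandas as pd

# The rows printed by `head` above, inlined so the snippet is self-contained.
csv_text = """image_path,mst_level
UTKFace/9_1_2_20161219204347420.jpg.chip.jpg,7
UTKFace/36_0_1_20170117163203851.jpg.chip.jpg,8
UTKFace/86_1_0_20170120225751953.jpg.chip.jpg,4
UTKFace/26_1_0_20170116171048641.jpg.chip.jpg,6
UTKFace/1_1_2_20161219154612988.jpg.chip.jpg,4
UTKFace/52_0_1_20170117161018159.jpg.chip.jpg,6
UTKFace/25_1_0_20170117134403373.jpg.chip.jpg,4
UTKFace/16_0_0_20170104003740977.jpg.chip.jpg,4
UTKFace/27_0_3_20170119210058457.jpg.chip.jpg,8
"""
df = pd.read_csv(io.StringIO(csv_text))

# Not sorted by frequency: the index order depends on the row order of the data.
unsorted_counts = df["mst_level"].value_counts(sort=False)

# Sorting the counts by their index restores the natural order of the levels.
by_level = unsorted_counts.sort_index()

print(list(by_level.index))  # [4, 6, 7, 8]
print(by_level.tolist())     # [4, 2, 1, 2]
```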

@dnlserrano

could do it via

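    sort_by: Optional[str] = None,
    sort_by_count: bool = True,
):  # pylint: disable=too-many-arguments,missing-function-docstring,too-many-locals
    # order the rows by the requested column (e.g. the x axis column) before counting
    if sort_by is not None:
        df = df.sort_values(sort_by)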

but again, not sure how these 2 would play with each other; maybe a new mr later on?
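
one possible way for the two options to coexist, sketched as a standalone helper. everything here is hypothetical: `count_values` and the `sort_by_x` flag are names made up for this example and are not part of the library. the assumption is that natural x axis order, when requested, takes precedence over count order, and that it is applied to the counts rather than to the whole dataframe.

```python
import pandas as pd


def count_values(
    df: pd.DataFrame,
    x: str,
    sort_by_count: bool = True,
    sort_by_x: bool = False,
) -> pd.Series:
    """Hypothetical helper that keeps the counting logic in one place."""
    counts = df[x].value_counts(sort=sort_by_count)
    if sort_by_x:
        # Assumed precedence rule: x axis order wins over count order.
        counts = counts.sort_index()
    return counts


df = pd.DataFrame({"mst_level": [7, 8, 4, 6, 4, 6, 4, 4, 8]})

by_count = count_values(df, "mst_level")              # most frequent level first
by_x = count_values(df, "mst_level", sort_by_x=True)  # levels in ascending order

print(list(by_x.index))  # [4, 6, 7, 8]
```

a single helper like this would also avoid repeating the same `value_counts` arguments at every call site, at the cost of one more parameter to document.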

fine to trash this one and brainstorm a better way forward as part of this thread though

sorry for bothering during xmas holidays 🎅 😂
