Releases: gietema/clusterfun

Labelling support!

22 May 23:07

Finally in the clusterfun library: labelling!

This release enables multi-class labelling.

The label tab:

  • Keeps track of what you have labelled, which is handy for e.g. a residual error analysis
  • Allows you to download specific labels of your selection or all labels of your plot
  • Allows you to create new grid views with labelled media
  • Allows you to (de)select a label for all media currently visible

Labels are saved automatically, so they reappear when you load the same plot again.

Labelling can be done with keyboard shortcuts: just hover over a data point and press the number of the label you're interested in.

v0.4.2

19 May 14:44

Adds audio support and a `display` parameter.

Currently, we determine the media type naively by checking if the file extension is included in:

[
    "mp3",
    "wav",
    "aac",
    "ogg",
    "flac",
    "wma",
    "m4a",
    "aiff",
    "midi",
    "ape",
    "wavpack",
    "alac",
    "ac3",
    "opus",
]
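
The extension check described above can be sketched in Python as follows. This is only an illustration: the `AUDIO_EXTENSIONS` set and the `is_audio` helper are hypothetical names, not part of the clusterfun API, and lowercasing the extension before the lookup is an assumption.

```python
from pathlib import Path

# Extensions copied from the list above; the names used here are hypothetical.
AUDIO_EXTENSIONS = {
    "mp3", "wav", "aac", "ogg", "flac", "wma", "m4a",
    "aiff", "midi", "ape", "wavpack", "alac", "ac3", "opus",
}


def is_audio(filepath: str) -> bool:
    """Return True if the file extension is one of the known audio extensions."""
    # Path.suffix includes the leading dot (".mp3"), so strip it before the lookup.
    extension = Path(filepath).suffix.lstrip(".").lower()
    return extension in AUDIO_EXTENSIONS
```

Because only the extension is inspected, a file named `speech.wav` is treated as audio regardless of its actual content, which is what makes the check naive.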

Audio is autoplayed on hover.
When displaying audio, the grid view has no image to show, but I can imagine it being useful to display something there.
That's why all plot types now take a new `display` parameter, which shows the value of one or more columns in the grid view.

A grid that uses `display` can be created like this:

import pandas as pd
import clusterfun as clt

# Load the LibriSpeech test-clean sample data
df = pd.read_parquet("https://raw.githubusercontent.com/gietema/clusterfun-data/main/libri_speech_test_clean.parquet")
print(clt.grid(
    df,
    title="LibriSpeech test-clean",
    media="filepath",
    show=False,
    display="translation",  # column whose value is shown in the grid view
))

A less naive way of checking for audio support can be added later if this is not sufficient.

Grid selection download button

18 May 14:37

The grid now has a download button that lets you download the selected media as a CSV file.

Continuous colour scale

10 May 08:26

Pass `is_categorical=False` together with `color=...` to a bar chart, histogram or scatter plot to get a continuous colour scale.

Storage client

07 May 09:57

This release allows you to create different storage clients, so you can more easily create a version of clusterfun that loads data from different sources.
This should also pave the way for adding GCP support soon.

Thanks to @hyenal for adding this!

v0.2.3

02 May 19:46

Mostly bug fixes in this release:

  • Added Python 3.12 support and dropped 3.8 support
  • Fixed a bug that made it impossible to create plots with two classes
  • The grid page no longer resets to the first page when enlarging (clicking) an image on a second page
  • Slightly more robust filtering

Next up is labelling and saving/downloading selections.

Refactor and grid features release

09 Sep 21:59
0a09bf8

Pre-release

  • Stats feature for the grid view, allowing you to quickly look at plots related to the selection of media.
  • The grid view is now filterable.
  • Refactor of the frontend with Tailwind and Jotai.
  • The library now works from within Jupyter notebooks.