ckanext-dimred adds UMAP, t-SNE and PCA previews to CKAN, projecting high-dimensional CSV/TSV/XLS/XLSX resources into 2D/3D plots so you can inspect structure, clusters, and labels directly from the resource page.

ckanext-dimred

Dimensionality-reduction preview for tabular resources. The extension adds a resource view that projects a tabular resource into a 2D or 3D plot.

Create the view, select a method, and generate a 2D or 3D projection of your data. You can color points by a chosen column and control which columns are used as features.

[Image: UMAP embedding PNG]

How it works

  • Data loading: adapters handle CSV/TSV/XLS/XLSX; row sampling is controlled by ckanext.dimred.max_rows.
  • Feature prep: numeric columns are included; low-cardinality categorical columns are one-hot encoded if enabled; users can pick the feature columns.
  • Dimensionality reduction: choose UMAP, t-SNE, or PCA, with configurable defaults and per-view JSON overrides.
  • Rendering: configurable backend, either interactive Apache ECharts with 3D scatter support (default) or static Matplotlib PNG (2D/3D). Choose the backend per view in the form; the config value is the default. A custom renderer can be plugged in by overriding the bundle/module.
  • API: dimred_get_dimred_preview returns the embedding and metadata (prep info, method params) for programmatic use.
  • Caching: results are cached in Redis by default, so repeat calls with the same settings avoid recomputing the projection (configurable TTL and on/off toggle).
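The caching idea in the last bullet can be illustrated with a small sketch. This is not the extension's code: the key format, the `cache_key` function, and the `TTLCache` class are hypothetical, and a plain in-memory dict stands in for Redis. It only shows why identical settings can map to the same cached projection and how a TTL and an on/off toggle might behave.

```python
import hashlib
import json
import time


def cache_key(resource_id, settings):
    """Build a deterministic key from the resource ID and the view settings."""
    # sort_keys makes the key independent of the order of the settings dict.
    canonical = json.dumps(settings, sort_keys=True, separators=(",", ":"))
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    return f"dimred:{resource_id}:{digest}"  # hypothetical key layout


class TTLCache:
    """In-memory stand-in for a Redis cache with expiry (hypothetical helper)."""

    def __init__(self, ttl_seconds=3600, enabled=True, clock=time.monotonic):
        self.ttl_seconds = ttl_seconds
        self.enabled = enabled
        self._clock = clock
        self._store = {}

    def get_or_compute(self, key, compute):
        # With caching switched off, always recompute.
        if not self.enabled:
            return compute()
        now = self._clock()
        entry = self._store.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]  # still fresh: reuse the stored result
        value = compute()
        self._store[key] = (now + self.ttl_seconds, value)
        return value


calls = []


def expensive_projection():
    # Placeholder for the real dimensionality reduction.
    calls.append(1)
    return [[0.0, 1.0], [1.0, 0.0]]


cache = TTLCache(ttl_seconds=3600)
settings = {"method": "umap", "n_neighbors": 15, "min_dist": 0.1, "n_components": 2}
key = cache_key("<resource-id>", settings)
first = cache.get_or_compute(key, expensive_projection)
second = cache.get_or_compute(key, expensive_projection)  # served from the cache
```

Changing any setting (for example n_components) yields a different key, so the projection is recomputed for the new configuration.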

Usage

  1. Add a tabular resource (CSV/TSV/XLS/XLSX).
  2. Create a new resource view of type dimred_view.
  3. (Optional) Choose a method (UMAP/t-SNE/PCA), pick a Color by column, and select feature columns.
  4. (Optional) Choose the number of output components (2 or 3); defaults come from the method config (e.g., ckanext.dimred.umap.n_components).
  5. (Optional) Pick a render backend (ECharts interactive or Matplotlib PNG); it defaults to the config value.
  6. Save or Preview to see the rendered embedding (interactive or PNG, depending on ckanext.dimred.render_backend), and use "Download embedding (CSV)" to get the coordinates.

API: call dimred_get_dimred_preview with id (the resource ID) and view_id to retrieve the embedding and its metadata.
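As a rough illustration, the action could be called over HTTP through CKAN's standard Action API (POST to /api/3/action/<action name> with a JSON body). The sketch below assumes the action is reachable that way; the site URL, the IDs, and the helper names `build_preview_request` and `fetch_preview` are placeholders, and the exact shape of the returned result is not specified here.

```python
import json
import urllib.request


def build_preview_request(ckan_url, resource_id, view_id, api_token=None):
    """Prepare a POST request for the dimred_get_dimred_preview action."""
    url = f"{ckan_url.rstrip('/')}/api/3/action/dimred_get_dimred_preview"
    body = json.dumps({"id": resource_id, "view_id": view_id}).encode("utf-8")
    headers = {"Content-Type": "application/json"}
    if api_token:
        # CKAN reads API tokens from the Authorization header.
        headers["Authorization"] = api_token
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


def fetch_preview(request, timeout=60):
    """Send the request and return the action result (embedding and metadata)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        payload = json.load(response)
    # Action API responses wrap the data as {"success": ..., "result": ...}.
    if not payload.get("success"):
        raise RuntimeError(payload.get("error"))
    return payload["result"]


# Placeholder site and IDs; nothing is sent until fetch_preview() is called.
request = build_preview_request(
    "https://ckan.example.org/", "<resource-id>", "<view-id>"
)
```

Inside a CKAN process, the same action would normally be invoked through the toolkit's get_action helper rather than over HTTP.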

3D rendering

  • Set n_components to 3 in the form (or method parameters) to get a 3D embedding.
  • Interactive backend (ckanext.dimred.render_backend = echarts) uses ECharts with echarts-gl so you can rotate/zoom/pan the point cloud.
  • Static backend (ckanext.dimred.render_backend = matplotlib) renders a 3D scatter PNG with a fixed view angle; rotation is available only with the interactive backend.
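To give a feel for what the interactive backend has to produce, here is a minimal sketch that turns a 3D embedding into an ECharts option using the echarts-gl scatter3D series. It is an assumption-laden illustration, not the option the extension actually generates: the function name `scatter3d_option`, the axis names, and the grouping by label are all made up for the example.

```python
import json


def scatter3d_option(points, labels=None):
    """Build an ECharts (echarts-gl) option dict for a 3D point cloud.

    points: list of [x, y, z] coordinates.
    labels: optional list of category labels, one per point (for coloring).
    """
    groups = {}
    for index, point in enumerate(points):
        if len(point) != 3:
            raise ValueError("each point needs exactly three coordinates")
        label = labels[index] if labels else "points"
        groups.setdefault(label, []).append(list(point))

    return {
        "tooltip": {},
        "legend": {"data": list(groups)},
        # grid3D plus the three 3D axes define the cartesian3D coordinate system.
        "grid3D": {},
        "xAxis3D": {"type": "value", "name": "dim 1"},
        "yAxis3D": {"type": "value", "name": "dim 2"},
        "zAxis3D": {"type": "value", "name": "dim 3"},
        # One series per label, so each category gets its own color and legend entry.
        "series": [
            {"type": "scatter3D", "name": label, "data": data, "symbolSize": 6}
            for label, data in groups.items()
        ],
    }


option = scatter3d_option(
    [[0.1, 0.2, 0.3], [1.0, 1.1, 1.2], [0.2, 0.1, 0.4]],
    labels=["setosa", "virginica", "setosa"],
)
option_json = json.dumps(option)  # what a page would pass to chart.setOption()
```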

Example

Iris dataset:

rownames  Sepal.Length  Sepal.Width  Petal.Length  Petal.Width  Species
1         5.1           3.5          1.4           0.2          setosa
2         4.9           3.0          1.4           0.2          setosa
...       ...           ...          ...           ...          ...
51        7.0           3.2          4.7           1.4          versicolor
52        6.4           3.2          4.5           1.5          versicolor
...       ...           ...          ...           ...          ...
101       6.3           3.3          6.0           2.5          virginica
102       5.8           2.7          5.1           1.9          virginica
...       ...           ...          ...           ...          ...
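For a table like this, the feature preparation described under "How it works" might look roughly as follows. This is a simplified, pure-Python sketch rather than the extension's implementation: the function `prepare_features` and its parameters are hypothetical, with `enable_categorical` and `max_categories` mirroring the ckanext.dimred.enable_categorical and ckanext.dimred.max_categories_for_ohe settings.

```python
def is_number(value):
    """Return True if the cell can be parsed as a float."""
    try:
        float(value)
        return True
    except (TypeError, ValueError):
        return False


def prepare_features(rows, feature_columns=None, enable_categorical=True,
                     max_categories=30):
    """Turn a list of dict rows into (column_names, numeric_matrix)."""
    columns = feature_columns or list(rows[0].keys())
    names, builders = [], []
    for col in columns:
        values = [row[col] for row in rows]
        if all(is_number(v) for v in values):
            # Numeric column: use the value directly.
            names.append(col)
            builders.append(lambda row, c=col: float(row[c]))
        elif enable_categorical:
            categories = sorted(set(values))
            # Only low-cardinality columns are one-hot encoded; others are dropped.
            if len(categories) <= max_categories:
                for cat in categories:
                    names.append(f"{col}={cat}")
                    builders.append(
                        lambda row, c=col, k=cat: 1.0 if row[c] == k else 0.0
                    )
    matrix = [[build(row) for build in builders] for row in rows]
    return names, matrix


# Three rows taken from the Iris table above (two measurement columns only).
rows = [
    {"Sepal.Length": "5.1", "Sepal.Width": "3.5", "Species": "setosa"},
    {"Sepal.Length": "7.0", "Sepal.Width": "3.2", "Species": "versicolor"},
    {"Sepal.Length": "6.3", "Sepal.Width": "3.3", "Species": "virginica"},
]
names, matrix = prepare_features(rows)
```

In practice Species would more likely be chosen as the Color by column and left out of the features by passing only the measurement columns as feature_columns.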

Creating the dimred view: Method, Feature selection, Color by:

[Image: Creating the dimred view]

Rendered 2D embedding PNG:

[Image: Rendered 2D embedding PNG]

Requirements

Compatibility with core CKAN versions:

CKAN version     Compatible?
2.9 and earlier  no
2.10+            yes

Installation

To install ckanext-dimred:

  1. Activate your CKAN virtual environment, for example:

    . /usr/lib/ckan/default/bin/activate

  2. Clone the source and install it into the virtualenv:

    git clone https://github.com/DataShades/ckanext-dimred.git
    cd ckanext-dimred
    pip install -e .

  3. Add dimred to the ckan.plugins setting in your CKAN config file (by default the config file is located at /etc/ckan/default/ckan.ini).

  4. Restart CKAN. For example, if you've deployed CKAN with Apache on Ubuntu:

    sudo service apache2 reload

Config settings

General defaults:

  • ckanext.dimred.default_method (default: umap)
  • ckanext.dimred.allowed_methods (default: umap tsne pca)
  • ckanext.dimred.max_file_size_mb (default: 50)
  • ckanext.dimred.max_rows (default: 50000)
  • ckanext.dimred.enable_categorical (default: true)
  • ckanext.dimred.max_categories_for_ohe (default: 30)
  • ckanext.dimred.export_enabled (default: true)
  • ckanext.dimred.cache_enabled (default: true)
  • ckanext.dimred.cache_ttl (default: 3600)
  • ckanext.dimred.render_backend (default: echarts; echarts for interactive chart, matplotlib for static PNG)
  • ckanext.dimred.render_asset (optional; override the webassets bundle for the configured render backend)
  • ckanext.dimred.render_module (optional; override the CKAN JS module for the configured render backend)
  • ckanext.dimred.embedding_decimals (default: 3; decimal places to round embedding coordinates before returning/exporting)

UMAP defaults:

  • ckanext.dimred.umap.n_neighbors (default: 15)
  • ckanext.dimred.umap.min_dist (default: 0.1)
  • ckanext.dimred.umap.n_components (default: 2)

t-SNE defaults:

  • ckanext.dimred.tsne.perplexity (default: 30)
  • ckanext.dimred.tsne.n_components (default: 2)

PCA defaults:

  • ckanext.dimred.pca.n_components (default: 2)
  • ckanext.dimred.pca.whiten (default: false)

Example:

ckan.plugins = ... dimred

ckanext.dimred.allowed_methods = umap
ckanext.dimred.max_rows = 10000
ckanext.dimred.enable_categorical = true
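To make the effect of such overrides concrete, the sketch below parses a config fragment like the one above with Python's standard configparser and falls back to the documented defaults for anything not set. It is a standalone illustration only: `effective_settings` is a hypothetical helper, the [app:main] section header is assumed from the usual CKAN config layout, and a running CKAN would read these values from its own config object instead.

```python
import configparser

# Documented defaults for a few of the settings listed above.
DEFAULTS = {
    "ckanext.dimred.default_method": "umap",
    "ckanext.dimred.allowed_methods": "umap tsne pca",
    "ckanext.dimred.max_rows": "50000",
    "ckanext.dimred.enable_categorical": "true",
}

# The example overrides, wrapped in a section so configparser can read them
# (other plugins omitted from ckan.plugins for brevity).
INI_TEXT = """
[app:main]
ckan.plugins = dimred
ckanext.dimred.allowed_methods = umap
ckanext.dimred.max_rows = 10000
ckanext.dimred.enable_categorical = true
"""


def effective_settings(ini_text):
    """Merge the values from the ini text with the documented defaults."""
    # interpolation=None avoids errors on CKAN-style %(here)s placeholders.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(ini_text)
    section = parser["app:main"]
    raw = {key: section.get(key, fallback=default) for key, default in DEFAULTS.items()}
    return {
        "default_method": raw["ckanext.dimred.default_method"],
        "allowed_methods": raw["ckanext.dimred.allowed_methods"].split(),
        "max_rows": int(raw["ckanext.dimred.max_rows"]),
        "enable_categorical": raw["ckanext.dimred.enable_categorical"].lower() == "true",
    }


settings = effective_settings(INI_TEXT)
```

With this example, only UMAP remains selectable and the row limit drops to 10000, while default_method keeps its default value.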

Developer installation

To install ckanext-dimred for development, activate your CKAN virtualenv and do:

git clone https://github.com/DataShades/ckanext-dimred.git
cd ckanext-dimred
pip install -e .
pip install -r dev-requirements.txt

Tests

To run the tests, do:

pytest --ckan-ini=test.ini

License

AGPL
