Suggestion: tree-shakeable transformations #30

@luucvanderzee

While our transformations currently have a pretty decent API, the current class-based piping/chaining approach has one big drawback: the transformations are not tree-shakeable. Because every transformation is a method on the class, bundlers cannot drop the ones you never call, so you always send the code of all transformations to the client.

I think I just came up with an approach to circumvent this problem. Say that we want to do this with the current API:

import DataContainer from '@snlab/florence-datacontainer'

const data = new DataContainer({
  fruit: ['apple', 'apple', 'banana', 'banana'],
  price: [1, 2, 3, 4]
})

const meanPricePerFruit = data
  .groupBy('fruit')
  .summarise({ mean_price: { price: 'mean' } })
  .arrange({ mean_price: 'descending' })

In the new proposed API, this would become

import DataContainer, { groupBy, summarise, arrange } from '@snlab/florence-datacontainer'

const data = new DataContainer({
  fruit: ['apple', 'apple', 'banana', 'banana'],
  price: [1, 2, 3, 4]
})

const meanPricePerFruit = data.pipe(
  groupBy('fruit'),
  summarise({ mean_price: { price: 'mean' } }),
  arrange({ mean_price: 'descending' })
)
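
To make the idea concrete, here is a rough sketch of how `pipe` could work with standalone transformations. Everything in it is an assumption for illustration: `SketchContainer`, `select` and `rename` are hypothetical names, not part of the current package, and the real `DataContainer` would need more than this (for example to handle grouped data).

```javascript
// Hypothetical stand-in for DataContainer, only to illustrate `pipe`.
class SketchContainer {
  constructor (data) {
    this._data = data
  }

  // Apply each transformation in order, feeding the output of one into the next.
  pipe (...transformations) {
    const result = transformations.reduce(
      (data, transform) => transform(data),
      this._data
    )
    return new SketchContainer(result)
  }

  column (name) {
    return this._data[name]
  }
}

// Standalone transformations: each takes its options and returns
// a function from column-oriented data to column-oriented data.
const select = (...columnNames) => data => {
  const selected = {}
  for (const name of columnNames) selected[name] = data[name]
  return selected
}

const rename = mapping => data => {
  const renamed = {}
  for (const name in data) renamed[mapping[name] || name] = data[name]
  return renamed
}

const result = new SketchContainer({
  fruit: ['apple', 'apple', 'banana', 'banana'],
  price: [1, 2, 3, 4]
}).pipe(
  select('price'),
  rename({ price: 'cost' })
)

console.log(result.column('cost'))
```

Since `select` and `rename` are plain named exports in this sketch, a bundler could drop whichever of them is never imported.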

The advantages of this method are

  1. Tree-shakeable transformations, as mentioned above
  2. Easy for users to write and use custom transformations (see below)
  3. Transformations can be used without a DataContainer (if you are using column-oriented data at least)
  4. Cleaner separation of code: tests can focus purely on the transformations
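
As a rough sketch of points 3 and 4: a transformation written this way is just a function, so it can be called directly on a plain column-oriented object and tested without constructing a DataContainer. `sortByColumn` below is a hypothetical example (it is not the package's `arrange`) and assumes a numeric column.

```javascript
// Hypothetical standalone transformation; assumes `columnName` holds numbers.
const sortByColumn = (columnName, order = 'ascending') => data => {
  const column = data[columnName]
  const direction = order === 'descending' ? -1 : 1

  // Compute the row order once, then reorder every column with it.
  const indices = column
    .map((_, i) => i)
    .sort((a, b) => direction * (column[a] - column[b]))

  const sorted = {}
  for (const name in data) {
    sorted[name] = indices.map(i => data[name][i])
  }
  return sorted
}

// Used directly on plain column-oriented data, no DataContainer involved.
const plainData = { fruit: ['apple', 'banana', 'cherry'], price: [2, 3, 1] }
const sortedData = sortByColumn('price', 'descending')(plainData)

console.log(sortedData.fruit) // ['banana', 'apple', 'cherry']
console.log(sortedData.price) // [3, 2, 1]
```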

An example of how a user could write a custom toQuantitative transformation to convert categorical data to quantitative data:

import DataContainer from '@snlab/florence-datacontainer'

const toQuantitative = columnName => {
  return data => {
    const categoricalColumn = data[columnName]
    data[columnName] = categoricalColumn.map(value => parseFloat(value))

    return data
  }
}

const dataContainer = new DataContainer({ 
  amount: [1, 2, 3, 4], 
  price: ['1', '2', '3', '4']
}).pipe(toQuantitative('price'))

console.log(dataContainer.column('price')) // [1, 2, 3, 4]
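
One open design question is whether `pipe` would hand a transformation a copy of the data or the container's own object. Assuming it might be the latter, a custom transformation could avoid modifying its input by returning a new object instead. A minimal sketch (`toQuantitativeCopy` is a hypothetical name):

```javascript
// Hypothetical non-mutating variant: builds a new object rather than
// overwriting the column on the input.
const toQuantitativeCopy = columnName => data => ({
  ...data,
  [columnName]: data[columnName].map(value => parseFloat(value))
})

const input = { amount: [1, 2, 3, 4], price: ['1', '2', '3', '4'] }
const output = toQuantitativeCopy('price')(input)

console.log(output.price) // [1, 2, 3, 4]
console.log(input.price)  // ['1', '2', '3', '4'] (unchanged)
```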

Thoughts?
