A novel solution to produce clean samples of datasets with duplicates following a target group distribution

dbmodena/radler

🍋 RadlER: Deduplicated Sampling On-Demand

[04.08.2025] The refined and documented version of the code will be made available in a few weeks.

In this repository, we provide the code and the datasets for RadlER, a novel solution to produce clean samples from data containing duplicates according to a target distribution.
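To make the task concrete, here is a naive baseline for deduplicated sampling under a target group distribution. It is only an illustrative sketch and does not reproduce RadlER's approach; see the paper below for how RadlER actually works. It assumes that duplicates have already been resolved for the whole dataset (the hypothetical `cluster_of` map from record to entity), that every record in a cluster carries the same group label (the hypothetical `group_of` map), and that the target distribution is expressed as per-group counts. The function name `dedup_sample` is likewise hypothetical.

```python
import random
from collections import defaultdict


def dedup_sample(records, cluster_of, group_of, target_counts, seed=None):
    """Naive baseline (hypothetical, not RadlER): dedup first, then sample.

    records: list of record ids
    cluster_of: dict record id -> entity id (assumed precomputed duplicates)
    group_of: dict record id -> group label (assumed consistent per entity)
    target_counts: dict group label -> number of distinct entities wanted
    """
    rng = random.Random(seed)

    # Keep one representative record per entity, so the sample has no duplicates.
    representatives = {}
    for r in records:
        representatives.setdefault(cluster_of[r], r)

    # Bucket the distinct entities by their group label.
    by_group = defaultdict(list)
    for rep in representatives.values():
        by_group[group_of[rep]].append(rep)

    # Draw the requested number of distinct entities from each group.
    sample = []
    for group, k in target_counts.items():
        pool = by_group.get(group, [])
        if k > len(pool):
            raise ValueError(
                f"only {len(pool)} distinct entities in group {group!r}, need {k}"
            )
        sample.extend(rng.sample(pool, k))
    return sample


if __name__ == "__main__":
    # Toy data: r1/r2 and r4/r5 are duplicates of the same entity.
    records = ["r1", "r2", "r3", "r4", "r5", "r6"]
    cluster_of = {"r1": "e1", "r2": "e1", "r3": "e2",
                  "r4": "e3", "r5": "e3", "r6": "e4"}
    group_of = {"r1": "A", "r2": "A", "r3": "A",
                "r4": "B", "r5": "B", "r6": "B"}
    print(dedup_sample(records, cluster_of, group_of, {"A": 1, "B": 2}, seed=0))
```

Deduplicating the entire dataset up front, as this baseline does, is simple but can be expensive on large dirty data; the sketch is meant only to show what a clean sample following a target distribution looks like.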

You can find all details about deduplicated sampling on-demand with RadlER in our research paper, which will be presented jointly with the related demonstration at VLDB in London (September 1-5, 2025):

@article{radler,
  author    = {Luca {Zecchini} and Vasilis {Efthymiou} and Felix {Naumann} and Giovanni {Simonini}},
  title     = {{Deduplicated Sampling On-Demand}},
  journal   = {{Proceedings of the VLDB Endowment (PVLDB)}},
  volume    = {18},
  number    = {8},
  pages     = {2482--2495},
  year      = {2025},
  doi       = {10.14778/3742728.3742742}
}

You can also take a look at our demonstration:

Watch the video

@article{radler_demo,
  author    = {Luca {Zecchini} and Ziawasch {Abedjan} and Vasilis {Efthymiou} and Giovanni {Simonini}},
  title     = {{RadlER: Deduplicated Sampling On-Demand}},
  journal   = {{Proceedings of the VLDB Endowment (PVLDB)}},
  volume    = {18},
  number    = {12},
  pages     = {5319--5322},
  year      = {2025},
  doi       = {10.14778/3750601.3750661}
}

Languages

  • Python 50.2%
  • Jupyter Notebook 49.8%