DC-BENCH
A standardized benchmark for dataset condensation

Dataset condensation is an emerging technique that aims to learn a tiny dataset capturing the rich information encoded in the original dataset. As the datasets that contemporary machine learning models rely on grow increasingly large, condensation methods have become a prominent direction for accelerating network training and reducing data storage. Although numerous methods have been proposed in this rapidly growing field, evaluating and comparing different condensation methods is non-trivial and remains an open issue.

This work provides the first large-scale standardized benchmark for dataset condensation. It consists of a suite of evaluations that comprehensively reflect the generalizability and effectiveness of condensation methods through the lens of the datasets they generate. The benchmark library, including evaluators, baseline methods, and generated datasets, is open-sourced at DCBench GitHub.

Comprehensive Datasets


Datasets ranging from medium to large scale are provided to better evaluate condensation methods.

Automated Eval Library


We provide a fully automated library for evaluating the performance of condensation methods.

NAS


We follow the standard Neural Architecture Search procedure, using the condensed dataset for model design.

Available Leaderboards

CIFAR10 IPC 1

CIFAR10 IPC 10

CIFAR10 IPC 50

CIFAR100 IPC 1

CIFAR100 IPC 10

CIFAR100 IPC 50

TinyImagenet IPC 1

TinyImagenet IPC 10

TinyImagenet IPC 50
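
To give a sense of scale for the leaderboard settings above, the following is a minimal sketch that computes how many images each condensed dataset contains and what fraction of the original training set that represents. It assumes IPC stands for "images per class", as is conventional in the dataset condensation literature, and it uses the standard class counts and training-set sizes of the three datasets. The names `DATASETS`, `condensed_size`, and `retained_fraction` are hypothetical helpers written for this illustration and are not part of the DC-BENCH library.

```python
# Illustrative only: these names are hypothetical and not part of DC-BENCH.
# Class counts and training-set sizes are the standard figures for each dataset.
# Assumption: IPC means "images per class".
DATASETS = {
    "CIFAR10": {"classes": 10, "train_images": 50_000},
    "CIFAR100": {"classes": 100, "train_images": 50_000},
    "TinyImagenet": {"classes": 200, "train_images": 100_000},
}
IPC_SETTINGS = (1, 10, 50)


def condensed_size(dataset: str, ipc: int) -> int:
    """Total number of synthetic images: one group of `ipc` images per class."""
    return DATASETS[dataset]["classes"] * ipc


def retained_fraction(dataset: str, ipc: int) -> float:
    """Condensed set size as a fraction of the original training set size."""
    return condensed_size(dataset, ipc) / DATASETS[dataset]["train_images"]


if __name__ == "__main__":
    for name in DATASETS:
        for ipc in IPC_SETTINGS:
            size = condensed_size(name, ipc)
            frac = retained_fraction(name, ipc)
            print(f"{name:<13} IPC {ipc:>2}: {size:>6} images ({frac:.2%} of training set)")
```

Under these assumptions, even the largest setting (IPC 50) keeps at most one tenth of the original training images, while IPC 1 on CIFAR10 keeps only ten images in total.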

Citation

Consider citing our whitepaper if you want to reference our leaderboard or benchmark.
@article{cui2022dc,
  title={DC-BENCH: Dataset Condensation Benchmark},
  author={Cui, Justin and Wang, Ruochen and Si, Si and Hsieh, Cho-Jui},
  journal={arXiv preprint arXiv:2207.09639},
  year={2022}
}