# UPCORE: Utility-Preserving Coreset Selection for Balanced Unlearning

This repository contains the code and data for the paper [UPCORE: Utility-Preserving Coreset Selection for Balanced Unlearning](https://arxiv.org/abs/2502.15082) by Vaidehi Patil, Elias Stengel-Eskin, and Mohit Bansal.
Unlearning is an essential technique for ensuring that machine learning models adhere to data privacy regulations (e.g., GDPR, CCPA) and ethical AI practices. However, existing unlearning methods often cause unintended forgetting and degrade model performance. UPCORE (Utility-Preserving Coreset Selection) addresses this by selectively pruning high-variance outliers from the forget set, striking a better balance between deletion effectiveness and model utility.
- Less Collateral Damage: Reduces unintended forgetting by removing outlier points from the forget set.
- Positive Transfer: Points pruned from the forget set are still forgotten via transfer from the retained coreset, so collateral damage shrinks without sacrificing forget performance.
- Method-Agnostic: UPCORE can be applied to any unlearning framework.
- Evaluating Unlearning: Introduces AUC-based metrics that measure the trade-off between deletion effectiveness and model utility across unlearning checkpoints (see the sketch under Results below).
UPCORE is built on top of the TOFU and NPO (Negative Preference Optimization) codebases. It provides methods for effective unlearning in large language models while balancing deletion effectiveness and model utility.
## Setup

To set up UPCORE, follow these steps:

```bash
git clone https://github.com/Vaidehi99/UPCORE.git
cd UPCORE
conda create -n upcore python=3.10
conda activate upcore
pip install -r requirements.txt
pip install flash-attn --no-build-isolation
```
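As a quick sanity check of the environment (this snippet is a convenience check, not part of the repository):

```python
# Verify that PyTorch sees a GPU and that flash-attn built correctly.
import torch
import flash_attn  # raises ImportError if the build failed

assert torch.cuda.is_available(), "the training scripts expect a CUDA GPU"
print(f"torch {torch.__version__}, flash-attn {flash_attn.__version__}, "
      f"device: {torch.cuda.get_device_name(0)}")
```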
## Data

UPCORE supports unlearning on the following datasets:

- Counterfact Topics: `data/Counterfact_topics/`
- TriviaQA Topics: `data/TriviaQA_topics/`
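To peek at a topic directory, something like the following works; the JSON layout here is an assumption, so check the files themselves for the actual schema:

```python
# Hypothetical inspection snippet: assumes the topic files are JSON; the
# actual schema is whatever ships under data/Counterfact_topics/.
import json
from pathlib import Path

for path in sorted(Path("data/Counterfact_topics").glob("*.json"))[:3]:
    with open(path) as f:
        records = json.load(f)
    print(f"{path.name}: {len(records)} records")
```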
## Coreset Selection

To extract the core forget set from a given complete forget set, run:

```bash
python3 scripts/upcore.py
```
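Conceptually, UPCORE prunes high-variance outliers from the forget set before unlearning; the actual selection lives in `scripts/upcore.py`. The sketch below only illustrates this style of outlier pruning with scikit-learn's `IsolationForest`; the `hidden_states` input (per-example model representations) is an assumption, not an interface of this repository:

```python
# Illustrative sketch of outlier pruning, not the repository's implementation.
import numpy as np
from sklearn.ensemble import IsolationForest

def select_coreset(hidden_states: np.ndarray, contamination: float = 0.1) -> np.ndarray:
    """Return indices of forget-set examples kept after pruning outliers.

    hidden_states: (n_examples, hidden_dim) array of model representations
    of the forget-set examples (assumed to be precomputed).
    """
    detector = IsolationForest(contamination=contamination, random_state=0)
    labels = detector.fit_predict(hidden_states)  # +1 = inlier, -1 = outlier
    # Dropping outliers lowers the variance of the retained set, which UPCORE
    # links to less collateral damage on neighboring data.
    return np.where(labels == 1)[0]
```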
## Unlearning

To unlearn using Gradient Ascent, set `forget_loss = switch_unlearn` in `tofu/bash_config.conf`.

To unlearn using Refusal, set `forget_loss = switch_idk` in `tofu/bash_config.conf`.
Run training:

```bash
cd tofu
./loop_run_split_train.sh
```
Run evaluation:

```bash
cd tofu
./loop_run_split_eval.sh
```

Then aggregate results and compute the AUC metrics:

```bash
python3 scripts/gather_results.py
python3 scripts/auc.py
```
To train using NPO, run:

```bash
cd negative-preference-optimization/TOFU
./loop_run_split_train_npo.sh
```

Run evaluation:

```bash
cd negative-preference-optimization/TOFU
./loop_run_split_eval_npo.sh
```

Then aggregate results and compute the AUC metrics:

```bash
python3 scripts/gather_results.py
python3 scripts/auc.py
```
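For reference, NPO is Negative Preference Optimization (Zhang et al., 2024). Below is a minimal sketch of its forget loss, assuming per-sequence log-probabilities under the current and frozen reference models are already computed (the function and tensor names are assumptions, not this repository's API):

```python
# Sketch of the NPO forget loss (Zhang et al., 2024), not this repo's code.
import torch
import torch.nn.functional as F

def npo_forget_loss(logp_theta: torch.Tensor,
                    logp_ref: torch.Tensor,
                    beta: float = 0.1) -> torch.Tensor:
    """NPO loss: (2 / beta) * E[log(1 + (pi_theta / pi_ref) ** beta)].

    logp_theta, logp_ref: per-sequence log-probabilities of forget-set
    completions under the current model and the frozen reference model.
    """
    # log(1 + exp(z)) == softplus(z), with z = beta * (logp_theta - logp_ref)
    return (2.0 / beta) * F.softplus(beta * (logp_theta - logp_ref)).mean()
```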
For the long-form setting, run training:

```bash
cd tofu
./loop_run_split_train_longform.sh
```

Run evaluation:

```bash
cd tofu
./loop_run_split_eval_longform.sh
```

Then aggregate results and compute the AUC metrics:

```bash
python3 scripts/gather_results.py
python3 scripts/auc.py
```
## Results

Results can be obtained using:

```bash
python3 scripts/gather_results.py
python3 scripts/auc.py
```
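`scripts/auc.py` computes the paper's AUC-based trade-off metrics. As a rough illustration of the idea only (the numbers and variable names below are placeholders, not the script's interface), the area under a utility-versus-deletion curve traced across unlearning checkpoints can be computed with the trapezoidal rule:

```python
# Illustrative AUC computation over unlearning checkpoints; all values are
# placeholders, not outputs of scripts/auc.py.
import numpy as np

# One point per checkpoint: x = deletion effectiveness on the forget set,
# y = model utility on held-out data (both hypothetical numbers).
deletion = np.array([0.0, 0.3, 0.6, 0.8, 0.95])
utility = np.array([1.0, 0.97, 0.93, 0.90, 0.85])

# Trapezoidal rule over the curve, sorted by the x-axis: a larger area means
# utility is retained as deletion effectiveness grows.
order = np.argsort(deletion)
x, y = deletion[order], utility[order]
auc = float(np.sum(np.diff(x) * (y[1:] + y[:-1]) / 2))
print(f"trade-off AUC: {auc:.3f}")
```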
## License

This project is licensed under the MIT License.
## Citation

If you find this work useful, please cite:

```bibtex
@misc{patil2025upcoreutilitypreservingcoresetselection,
      title={UPCORE: Utility-Preserving Coreset Selection for Balanced Unlearning},
      author={Vaidehi Patil and Elias Stengel-Eskin and Mohit Bansal},
      year={2025},
      eprint={2502.15082},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2502.15082},
}
```