This repository contains the implementation of the paper Mitigating over-exploration in latent space optimization using LES, by Omer Ronen, Ahmed Imtiaz Humayun, Richard Baraniuk, Randall Balestriero and Bin Yu.
Citation
If you use LES or any of the resources in this repo in your work, please use the following citation:
@misc{ronen2025mitigatingoverexplorationlatentspace,
title={Mitigating over-exploration in latent space optimization using LES},
author={Omer Ronen and Ahmed Imtiaz Humayun and Richard Baraniuk and Randall Balestriero and Bin Yu},
year={2025},
eprint={2406.09657},
archivePrefix={arXiv},
primaryClass={cs.LG},
url={https://arxiv.org/abs/2406.09657},
}
Installation
First, clone the repository:
git clone https://github.com/OmerRonen/les.git
Then, using Anaconda, install the dependencies:
conda env create --file environment.yml
conda activate les
To use the log-expected improvement acquisition function, you will also need to clone and install BoTorch from source:
git clone https://github.com/pytorch/botorch.git
cd botorch
pip install -e .
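After installation, the following snippet (an optional sanity check, not part of the repository) confirms that the environment and the source-installed BoTorch are importable:
# Optional sanity check for the installation.
# The LogExpectedImprovement import assumes a BoTorch version that ships the
# log-EI acquisition family (which is why the source install above is needed).
import torch
import botorch
from botorch.acquisition.analytic import LogExpectedImprovement

print("torch", torch.__version__, "| botorch", botorch.__version__)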
This repository uses the expressions and SMILES datasets, both of which can be downloaded from the repository of the Grammar Variational Autoencoder paper. Specifically, the eq2_grammar_dataset.h5 and 250k_rndm_zinc_drugs_clean.smi files should be downloaded into the data/grammar and data/molecules directories, respectively.
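As a quick check (an optional snippet, not part of the repository), you can verify that the files are in the expected locations:
# Optional check that the datasets sit where the code expects them,
# following the directory layout described above.
from pathlib import Path

expected = [
    Path("data/grammar/eq2_grammar_dataset.h5"),
    Path("data/molecules/250k_rndm_zinc_drugs_clean.smi"),
]
for path in expected:
    print(path, "found" if path.exists() else "missing")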
All the models used in our work can be found in the trained_models directory. The following code loads a pre-trained VAE for the expressions dataset:
# Load a pre-trained GRU VAE (beta = 1) for the expressions dataset
from les.nets.utils import get_vae

dataset = "expressions"
architecture = "gru"
beta = "1"
vae, _ = get_vae(dataset=dataset, architecture=architecture, beta=beta)
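The other pre-trained models are loaded the same way by changing the arguments, for example (a hypothetical combination, assuming the corresponding checkpoint is included in trained_models):
# Hypothetical example: load a different dataset/architecture/beta combination,
# assuming that checkpoint ships in trained_models.
vae_smiles, _ = get_vae(dataset="smiles", architecture="lstm", beta="0.1")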
For replicating the results on the molecular datasets (SELFIES and SMILES), we recommend using a GPU to avoid long running times.
The results in Table 1 can be replicated using:
python -m les.analysis.ood <DATASET> <ARCHITECTURE> <BETA>
where <DATASET> should be replaced with expressions, smiles, or selfies; <ARCHITECTURE> with gru, lstm, or transformer; and <BETA> with 0.05, 0.1, or 1.
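For example, to evaluate the GRU model trained with beta 1 on the expressions dataset, the command would be:
python -m les.analysis.ood expressions gru 1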
The Bayesian optimization results in Section 4 can be replicated with the following command (see les/configs/bayes_opt.yaml for the run configuration):
python -m les.analysis.bo
If you are interested in calculating LES with a given pre-trained generative model, you can use the following code:
import torch

from les.nets.utils import get_vae
from les.utils.les import LES

# Load a pre-trained VAE and wrap it with LES
dataset = "expressions"
architecture = "gru"
beta = "1"
vae, _ = get_vae(dataset=dataset, architecture=architecture, beta=beta)
les = LES(vae)

# Score a batch of latent points
z = torch.randn((5, vae.latent_dim))
les_score = les(z)
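As an illustration (a minimal sketch, not part of the repository), the score can be used to rank a batch of random latent candidates and keep only the highest-scoring ones before decoding or further optimization. The snippet continues from the code above (vae and les) and assumes that higher LES values indicate latent points less prone to over-exploration; check the paper for the exact convention.
# Minimal usage sketch: rank random latent candidates by their LES score and
# keep the top 10. Assumes `les` returns one score per latent point.
import torch

candidates = torch.randn((100, vae.latent_dim))
scores = les(candidates)
top_idx = torch.topk(scores.flatten(), k=10).indices
top_candidates = candidates[top_idx]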
The code is released under the MIT license; see the LICENSE file for details.