This repository contains the implementation of the paper Mitigating over-exploration in latent space optimization using LES, by Omer Ronen, Ahmed Imtiaz Humayun, Richard Baraniuk, Randall Balestriero and Bin Yu.
Citation
If you use LES or any of the resources in this repo in your work, please use the following citation:
@misc{ronen2025mitigatingoverexplorationlatentspace,
title={Mitigating over-exploration in latent space optimization using LES},
author={Omer Ronen and Ahmed Imtiaz Humayun and Richard Baraniuk and Randall Balestriero and Bin Yu},
year={2025},
eprint={2406.09657},
archivePrefix={arXiv},
primaryClass={cs.LG},
url={https://arxiv.org/abs/2406.09657},
}
Installation
First, clone the repository:
git clone https://github.com/OmerRonen/les.git
Then, using Anaconda, install the dependencies:
conda env create --file environment.yml
conda activate les
To use the log-expected improvement acquisition function, you will also need to clone and install BoTorch from source:
git clone https://github.com/pytorch/botorch.git
cd botorch
pip install -e .
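After installation, the following snippet (an optional sanity check, not part of the repository) confirms that the environment and the source-installed BoTorch are importable:
# Optional sanity check for the installation.
# The LogExpectedImprovement import assumes a BoTorch version that ships the
# log-EI acquisition family (which is why the source install above is needed).
import torch
import botorch
from botorch.acquisition.analytic import LogExpectedImprovement

print("torch", torch.__version__, "| botorch", botorch.__version__)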
This repository uses the expressions and SMILES datasets, both of which can be downloaded from the repository of the Grammar Variational Autoencoder paper. Specifically, the eq2_grammar_dataset.h5 and 250k_rndm_zinc_drugs_clean.smi files should be downloaded into the data/grammar and data/molecules directories, respectively.
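As a quick check (an optional snippet, not part of the repository), you can verify that the files are in the expected locations:
# Optional check that the datasets sit where the code expects them,
# following the directory layout described above.
from pathlib import Path

expected = [
    Path("data/grammar/eq2_grammar_dataset.h5"),
    Path("data/molecules/250k_rndm_zinc_drugs_clean.smi"),
]
for path in expected:
    print(path, "found" if path.exists() else "missing")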
All the models used in our work can be found in the trained_models directory. The following code loads a pre-trained VAE for the expressions dataset:
# Load a pre-trained GRU VAE (beta = 1) for the expressions dataset
from les.nets.utils import get_vae

dataset = "expressions"
architecture = "gru"
beta = "1"
vae, _ = get_vae(dataset=dataset, architecture=architecture, beta=beta)
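The other pre-trained models are loaded the same way by changing the arguments, for example (a hypothetical combination, assuming the corresponding checkpoint is included in trained_models):
# Hypothetical example: load a different dataset/architecture/beta combination,
# assuming that checkpoint ships in trained_models.
vae_smiles, _ = get_vae(dataset="smiles", architecture="lstm", beta="0.1")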
For replicating the results on the molecular datasets (SELFIES and SMILES), we recommend using a GPU to avoid long running times.
The results in Table 1 can be replicated using:
python -m les.analysis.ood <DATASET> <ARCHITECTURE> <BETA>
where <DATASET> should be replaced with expressions, smiles, or selfies; <ARCHITECTURE> with gru, lstm, or transformer; and <BETA> with 0.05, 0.1, or 1.
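For example, to evaluate the GRU model trained with beta 1 on the expressions dataset, the command would be:
python -m les.analysis.ood expressions gru 1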
The Bayesian optimization results in Section 4 can be replicated with the following command (see les/configs/bayes_opt.yaml for the run configuration):
python -m les.analysis.bo
If you are interested in calculating LES with a given pre-trained generative model, you can use the following code:
import torch

from les.nets.utils import get_vae
from les.utils.les import LES

# Load a pre-trained VAE and wrap it with LES
dataset = "expressions"
architecture = "gru"
beta = "1"
vae, _ = get_vae(dataset=dataset, architecture=architecture, beta=beta)
les = LES(vae)

# Score a batch of latent points
z = torch.randn((5, vae.latent_dim))
les_score = les(z)
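As an illustration (a minimal sketch, not part of the repository), the score can be used to rank a batch of random latent candidates and keep only the highest-scoring ones before decoding or further optimization. The snippet continues from the code above (vae and les) and assumes that higher LES values indicate latent points less prone to over-exploration; check the paper for the exact convention.
# Minimal usage sketch: rank random latent candidates by their LES score and
# keep the top 10. Assumes `les` returns one score per latent point.
import torch

candidates = torch.randn((100, vae.latent_dim))
scores = les(candidates)
top_idx = torch.topk(scores.flatten(), k=10).indices
top_candidates = candidates[top_idx]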
The code is released under the MIT license; see the LICENSE file for details.