# Mitigating over-exploration in latent space optimization using LES


This repository contains the implementation of the paper [Mitigating over-exploration in latent space optimization using LES](https://arxiv.org/abs/2406.09657), by Omer Ronen, Ahmed Imtiaz Humayun, Richard Baraniuk, Randall Balestriero, and Bin Yu.

## Citation

If you use LES or any of the resources in this repo in your work, please use the following citation:

```bibtex
@misc{ronen2025mitigatingoverexplorationlatentspace,
      title={Mitigating over-exploration in latent space optimization using LES},
      author={Omer Ronen and Ahmed Imtiaz Humayun and Richard Baraniuk and Randall Balestriero and Bin Yu},
      year={2025},
      eprint={2406.09657},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2406.09657},
}
```
## Table of contents

- [Environment setup](#environment-setup)
- [Datasets and models](#datasets-and-models)
- [Replication of results](#replication-of-results)
- [Calculating LES](#calculating-les)
- [License](#license)

## Environment setup

First, clone this repository:

```bash
git clone https://github.com/OmerRonen/les.git
```

Then create and activate the Anaconda environment:

```bash
conda env create --file environment.yml
conda activate les
```

To use the log expected improvement acquisition function, you must manually clone and install BoTorch from source:

```bash
git clone https://github.com/pytorch/botorch.git
cd botorch
pip install -e .
```
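
A quick way to confirm that the source install is active (a suggested check, not part of the original instructions):

```python
# Sanity check: the editable BoTorch install should be importable.
import botorch

print(botorch.__version__)
```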

## Datasets and models

### Datasets

This repository uses the expressions and SMILES datasets, both of which can be downloaded from the repository of the Grammar Variational Autoencoder paper. Specifically, the `eq2_grammar_dataset.h5` and `250k_rndm_zinc_drugs_clean.smi` files should be downloaded into the `data/grammar` and `data/molecules` directories, respectively.
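
A minimal sketch (not part of the repository) for verifying that the files ended up in the locations the code expects:

```python
# Hypothetical sanity check for the dataset layout described above.
from pathlib import Path

expected = [
    Path("data/grammar/eq2_grammar_dataset.h5"),
    Path("data/molecules/250k_rndm_zinc_drugs_clean.smi"),
]
for path in expected:
    assert path.exists(), f"missing dataset file: {path}"
```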

### Models

All the models used in our work can be found in the `trained_models` directory. The following snippet loads a pre-trained VAE for the expressions dataset:

```python
from les.nets.utils import get_vae

# Load a pre-trained GRU VAE for the expressions dataset, trained with beta = 1.
dataset = "expressions"
architecture = "gru"
beta = "1"
vae, _ = get_vae(dataset=dataset, architecture=architecture, beta=beta)
```

## Replication of results

To replicate the results on the molecular datasets (SELFIES and SMILES), we recommend using a GPU to avoid long running times.

### Valid generation

The results in Table 1 can be replicated using:

```bash
python -m les.analysis.ood <DATASET> <ARCHITECTURE> <BETA>
```

where `<DATASET>` is one of `expressions`, `smiles`, or `selfies`; `<ARCHITECTURE>` is one of `gru`, `lstm`, or `transformer`; and `<BETA>` is one of `0.05`, `0.1`, or `1`.
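
For example, `python -m les.analysis.ood expressions gru 1` evaluates the GRU VAE trained on the expressions dataset with beta = 1.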

### Bayesian Optimization

The Bayesian Optimization results in Section 4 can be replicated with (see `les/configs/bayes_opt.yaml` for the run configuration):

```bash
python -m les.analysis.bo
```

## Calculating LES

If you are interested in calculating LES with a given pre-trained generative model, you can use the following code:

```python
import torch

from les.nets.utils import get_vae
from les.utils.les import LES

# Load a pre-trained GRU VAE for the expressions dataset, trained with beta = 1.
dataset = "expressions"
architecture = "gru"
beta = "1"
vae, _ = get_vae(dataset=dataset, architecture=architecture, beta=beta)

# Score a batch of five random latent points.
les = LES(vae)
z = torch.randn((5, vae.latent_dim))
les_score = les(z)
```
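
As a hypothetical follow-up (assuming `les` returns one score per latent point), the scores could be used to screen candidate points before decoding; the thresholding rule below is illustrative, not a recommendation from the paper:

```python
# Illustrative only: keep the latent points whose LES score is above the batch median.
scores = les_score.flatten()
z_keep = z[scores >= scores.median()]
```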

## License

The code is released under the MIT license; see the LICENSE file for details.
