Biocatalysis Assistant is a Python-based chatbot framework designed to automate and streamline bioinformatics and biocatalysis tasks. By integrating advanced language models and a dynamic set of tools, it simplifies complex processes in computational biology and enzyme engineering, making research more efficient and accessible.
- Interactive Chatbot Interface: Engage with the assistant via a user-friendly command-line or web interface.
- Dynamic Tool Integration: Seamlessly run bioinformatics and biocatalysis tools.
- Language Model Integration: Leverage state-of-the-art language models for analysis and decision-making.
- Extensible Architecture: Easily add new tools and functionalities.
- Optimization Capabilities: Optimize enzyme sequences for enhanced catalytic activity.
- Comprehensive Analysis: Perform detailed bioinformatics analyses, including binding site extraction and reaction element analysis.
The Biocatalysis Assistant integrates the following tools for bioinformatics and biocatalysis tasks:
- ExtractBindingSites: Extract binding sites from enzyme sequences.
- GetElementsOfReaction: Analyze and extract elements from biochemical reactions.
- Blastp: Perform BLASTP searches for protein sequence analysis.
- OptimizeEnzymeSequences: Optimize enzyme sequences for improved catalytic activity.
- FindPDBStructure: Search for related protein structures in the PDB database.
- DownloadPDBStructure: Download PDB structure files.
- Mutagenesis: Perform in silico mutagenesis on protein sequences.
- MDSimulation: Run molecular dynamics simulations.
Note: For a complete description of each tool and its usage, refer to tools_description.md.
The Biocatalysis Assistant can be run using Docker, which simplifies the setup process by encapsulating all dependencies in a container. If you use Docker, you can skip the local installation steps below.
docker build -t lmabc .
docker run -it -p 8501:8501 --env-file .env lmabc lmabc-app
- -it: Runs the container interactively.
- -p 8501:8501: Maps port 8501 on your host to the container.
- --env-file .env: Passes environment variables.
To persist data or use local files:
docker run -it -p 8501:8501 --env-file .env --volume ${HOME}/my-cache:/app/.lmabc lmabc lmabc-app
Open your browser and navigate to:
http://localhost:8501
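Before opening the browser, it can be handy to confirm that something is answering on the published port. The following is a minimal sketch using only the Python standard library; the function name app_is_reachable is hypothetical and not part of lmabc, and the check only tells you that an HTTP server responded, not that the app is fully initialised.

```python
import urllib.error
import urllib.request


def app_is_reachable(url="http://localhost:8501", timeout=2.0):
    """Return True if an HTTP GET to `url` receives a non-error response.

    Hypothetical helper for illustration; not part of the lmabc package.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 400
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure, or an HTTP error status.
        return False


print("App reachable:", app_is_reachable())
```

If this prints False right after starting the container, waiting a few seconds and retrying is a reasonable first step, since the app may still be starting up.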
docker run -it --env-file .env lmabc:latest lmabc --help
- Run the Streamlit Web App:
docker run -it -p 8501:8501 --env-file .env lmabc:latest lmabc-app
- Use the CLI:
docker run -it --env-file .env lmabc:latest lmabc
- Run with Local Cache:
docker run -it -p 8501:8501 --env-file .env --volume ${HOME}/my-cache:/app/.lmabc lmabc:latest lmabc-app
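The variants above differ only in the entry point, the published port, and the optional cache mount. As a rough sketch of that pattern, the helper below assembles the same argument lists in Python. The function name docker_run_command and its parameters are hypothetical, introduced only for illustration; the image name, port, and mount path are copied from the commands above.

```python
import shlex


def docker_run_command(entrypoint="lmabc-app", image="lmabc:latest",
                       env_file=".env", port=8501, cache_dir=None):
    """Build the argument list for one of the `docker run` variants.

    Hypothetical helper for illustration; not part of the lmabc package.
    """
    args = ["docker", "run", "-it"]
    if entrypoint == "lmabc-app":
        # Only the Streamlit app needs a published port.
        args += ["-p", f"{port}:{port}"]
    args += ["--env-file", env_file]
    if cache_dir is not None:
        # Mount a host directory at the cache path used in the examples.
        args += ["--volume", f"{cache_dir}:/app/.lmabc"]
    args += [image, entrypoint]
    return args


# Prints the web-app command without executing it.
print(shlex.join(docker_run_command()))
```

The returned list can be passed to subprocess.run if you want to launch the container from a script; the sketch itself only prints the command.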
If you prefer to run the Biocatalysis Assistant locally, follow these steps.
- Python: Version 3.10 or higher. Download Python.
- GROMACS: A molecular dynamics simulation package. Install GROMACS.
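A quick way to check both prerequisites is sketched below, using only the standard library. The helper names are hypothetical, and the executable names gmx and gmx_mpi are an assumption about how GROMACS is typically installed; adjust them if your build uses a different name.

```python
import shutil
import sys

MIN_PYTHON = (3, 10)  # minimum version stated in the prerequisites


def python_ok(version_info=sys.version_info, minimum=MIN_PYTHON):
    """Return True if the interpreter version meets the minimum."""
    return tuple(version_info[:2]) >= minimum


def find_gromacs(candidates=("gmx", "gmx_mpi")):
    """Return the path of the first GROMACS executable on PATH, or None.

    The candidate names are assumptions about a typical install.
    """
    for name in candidates:
        path = shutil.which(name)
        if path:
            return path
    return None


print("Python >= 3.10:", python_ok())
print("GROMACS executable:", find_gromacs() or "not found on PATH")
```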
Clone the repository to your local machine:
git clone git@github.com:GT4SD/lm-assistant-for-biocatalysis.git
cd lm-assistant-for-biocatalysis
- Create a .env file in the project root directory.
- Add your credentials (e.g., for Hugging Face API):
HUGGINGFACEHUB_API_TOKEN=your_key_here
- Important: Ensure .env is added to .gitignore to protect sensitive information.
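To catch a missing or placeholder token before launching the assistant, a small check along these lines may help. It is a simplified sketch: the parser handles only plain KEY=VALUE lines, and its quoting rules may differ from those of Docker's --env-file option or of dotenv libraries. All function names here are hypothetical and not part of lmabc.

```python
from pathlib import Path

REQUIRED_KEYS = ("HUGGINGFACEHUB_API_TOKEN",)  # key named in the .env example


def parse_env_text(text):
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip("'\"")
    return values


def missing_keys(values, required=REQUIRED_KEYS, placeholder="your_key_here"):
    """Return required keys that are absent, empty, or still the placeholder."""
    return [k for k in required if values.get(k, "") in ("", placeholder)]


def check_env_file(path=".env"):
    """Return the missing keys for the file at `path` (all keys if no file)."""
    env_path = Path(path)
    if not env_path.is_file():
        return list(REQUIRED_KEYS)
    return missing_keys(parse_env_text(env_path.read_text(encoding="utf-8")))


print("Missing keys:", check_env_file())
```

An empty list means every required key is present with a non-placeholder value.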
Install dependencies using uv:
pip install uv
uv pip install -e .
Run the setup script to configure molecular dynamics files, RXNAAMapper, enzyme optimization models, and BLAST databases. Use the --blastdb option to specify which BLAST database(s) to download:
- Default: Downloads the Swissprot database.
- Custom: Pass a specific database name (e.g., nr).
- All: Pass all to download a predefined set of databases.
bash tools_setup/setup.sh --blastdb <db_name|all>
To skip the MinIO Client installation, add --skip-mc:
bash tools_setup/setup.sh --skip-mc --blastdb <db_name|all>
- PyMOL: Install via Conda.
- GROMACS: Follow the official installation guide.
To start the Biocatalysis Assistant using the CLI:
- Ensure you have completed the installation process using uv.
- Open your terminal.
- Run the following command:
lmabc --help
To run the Biocatalysis Assistant using the Streamlit web interface:
- Ensure you have completed the installation process using uv.
- Open your terminal.
- Run the following command:
lmabc-app
If you use lmabc in your projects, please cite:
@software{LMABC,
author = {Yves Gaetan Nana Teukam and Francesca Grisoni and Matteo Manica},
month = {10},
title = {The biocatalysis assistant: a language model agent for biocatalysis (lmabc)},
url = {https://github.com/GT4SD/lm-assistant-for-biocatalysis},
version = {main},
year = {2024}
}
The lmabc codebase is under the MIT license. For individual model usage, refer to the licenses of the original packages.
For issues or questions, please open an issue in the GitHub repository.