Backend for running calculations using biochemistry tools in separate Docker containers. It consists of the API, which serves as the entrypoint, and the worker, which runs the calculations inside the Docker containers.
NOTE: Since the worker doesn't run inside a container, it needs to run locally within a Poetry virtual environment. Therefore, you need to be inside a shell (`poetry shell`) or always use the `poetry run` prefix when starting the worker. This also applies to the `alembic` migration utility if you run it from outside the `chemtools_api` container.
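To make the API/worker split concrete, here is a minimal sketch of the worker side, assuming tasks arrive over RabbitMQ and each task names a pre-built tool image; the queue name, message format, and connection details are illustrative assumptions, not the project's actual protocol:

```python
import json

import docker
import pika

docker_client = docker.from_env()


def on_task(channel, method, properties, body):
    # Hypothetical message format: {"image": "...", "args": [...]}.
    task = json.loads(body)
    # Run the tool in its own container and capture its stdout.
    output = docker_client.containers.run(task["image"], task["args"], remove=True)
    print(output.decode())
    channel.basic_ack(delivery_tag=method.delivery_tag)


connection = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
channel = connection.channel()
channel.basic_consume(queue="chemtools_tasks", on_message_callback=on_task)
channel.start_consuming()
```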
- Copy `template.env` to `.env` and adjust the environment variables as needed
- Run `poetry install` to install the project locally for the worker (Poetry >2.0 is required!)
- Run `docker compose up --build` to build the API along with the needed services and start them
- Run `poetry run alembic upgrade head` to migrate the database
- Set up MinIO locally by visiting http://127.0.0.1:9001/access-keys, creating an access key, and updating the `.env` file accordingly. You can skip this step if you store your data in the filesystem (using `FilesystemStorageService`)
- Run `docker compose up` to start the API, PostgreSQL, RabbitMQ, and MinIO
- Run `poetry run python src/worker.py` to start the worker
In order to add a new tool, you need to:

- Build the tool image
- Write a request schema
- Create a new tool class which inherits from `BaseDockerizedTool` (a sketch follows this list)
- Extend `DockerizedToolEnum`, which lists the supported tools
- Create new endpoint(s) for the tool
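A minimal sketch of what a new tool class might look like, assuming the interface suggested by the names above; only `BaseDockerizedTool`, `DockerizedToolEnum`, and `_get_cmd_args` come from this repo, while the import path, the `image_name` attribute, and the enum member are illustrative assumptions:

```python
from src.tools.base import BaseDockerizedTool  # hypothetical import path


class MyTool(BaseDockerizedTool):
    # Name of the pre-built docker image the worker runs for this tool
    # (attribute name is an assumption).
    image_name = "mytool"

    def _get_cmd_args(self, *, input_file: str) -> list[str]:
        # Build the argument list passed to the container entrypoint.
        # Returning a list instead of an interpolated shell string keeps
        # user input out of shell interpretation (see the command
        # injection note in the TODOs below).
        return ["--input", input_file]


# Add a matching member to the enum of supported tools; the member
# name/value below is a guess, mirror the existing members:
# class DockerizedToolEnum(StrEnum):
#     ...
#     mytool = "mytool"
```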
- When you make changes in the database models, you need to create a new Alembic migration: run `poetry run alembic revision --autogenerate -m <rev_name>`
- To apply the changes, run `poetry run alembic upgrade head`
- automatically pull all required tool docker images (which need to be pre-built) on worker startup
- write how to create a new tool
- write about command injection (`_get_cmd_args`)
- gesamt: support for `-s`/`-d` [kinda done]
- gesamt: later finish output parsing for multiple files (more than 3)
- write docstrings [in progress]
- test uploading large files; maybe it will be necessary to upload/download them in chunks (see the first sketch after this list)
- write about the different architecture possibilities regarding the containerized apps: each worker runs only one container vs. one worker runs every container
- caching of fetched files: think about sending HEAD requests to check whether the file on the server has changed; this may be supported only by certain sites. Write about this possible improvement in the thesis. (See the second sketch after this list.)
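Regarding the large-file TODO above: if MinIO is accessed through the `minio` Python client, `fput_object` already splits large uploads into multipart chunks, so explicit chunking may only be needed on the API side. A hedged sketch, with placeholder bucket name and credentials:

```python
from minio import Minio

client = Minio(
    "127.0.0.1:9000",
    access_key="<access-key>",  # placeholder; taken from .env in practice
    secret_key="<secret-key>",  # placeholder
    secure=False,
)
# fput_object performs a multipart upload automatically for large files.
client.fput_object("chemtools", "results/output.pdb", "/tmp/output.pdb")
```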
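For the caching TODO, the HEAD-request idea could look roughly like this; a sketch only, assuming the remote server exposes an `ETag` header (as the TODO notes, many sites will not):

```python
import requests


def is_cached_copy_fresh(url: str, cached_etag: str | None) -> bool:
    # HEAD returns only headers, so the potentially large file body is
    # not downloaded just to learn that nothing changed.
    resp = requests.head(url, allow_redirects=True, timeout=10)
    etag = resp.headers.get("ETag")
    return etag is not None and etag == cached_etag
```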