This repo is just my fork of ashleve/lightning-hydra-template, where I modified some setup parameters so it is ready to go directly after cloning. See the Modified from template section for the full list of changes. Otherwise, the main modifications are the following:
- I set up `hydra-submitit-launcher` for easier usage of SLURM, and add example config setups for clusters (JUWELS; Terrabyte to come).
- I commit to using the Weights & Biases (W&B) logger. Other loggers are still possible to use, but everything is set up by default for W&B.
- I use `wandb_osh` to support offline, real-time logging of my runs on W&B. In this template, setting up `wandb_osh` is as easy as this (see the sketch right after this list):
  - Switch `logger.wandb.offline` to `True`
  - Have the "Farm" running on the login node, i.e., with the command `wandb-osh`
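For illustration, a typical offline-logging session could look like this (the experiment name is just the template's example, adapt it to yours):
# On the login node: start the wandb-osh "Farm" that syncs offline runs to W&B in near real time
wandb-osh
# In your SLURM job (or another shell): launch the run with W&B set to offline
python src/train.py experiment=example logger.wandb.offline=True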
There are two different ways you can set up your new repository: by keeping track of the template, or by starting a fresh new git repo with all the files from the template.
If you plan to host your code on DLR GitLab, make sure that, when you create the new repository, you create it as a "blank project", select <your_user> (not <your_group>) in the Project URL, and uncheck "Initialize repository with a README". Also, use the HTTPS URLs.
In both cases, you first need to clone the template and rename the folder to <your_project_name>:
# Clone the template
git clone https://github.com/CedricLeon/Setup_Lightning_Hydra_template.git
# Rename the folder with your project name
mv Setup_Lightning_Hydra_template/ <your_project_name>/
cd <your_project_name>/
Then you can either delete the remote and the commit history of the template; this is the most straightforward way:
# Reset the git repository
rm -rf .git/
git init --initial-branch=main
# Add your remote
git remote add origin <your_remote_URL>
# Stage and commit all files + set origin main as upstream
git add .
git commit -m "Initial commit"
git push --set-upstream origin main
Or you can keep the remote but rename it to `template` and add a new `origin`.
I describe how to do that below, but you should know that it is just a homemade version of GitHub's template repository feature. It is less clean, but it allows hosting the new repo on a server that isn't GitHub (I didn't find a way to do that using the GitHub template feature). If someone has a cleaner way of doing it, please enlighten me.
# Rename the template remote
git remote rename origin template
# Add your new repository remote. So, yes, you need to create it before
git remote add origin <your_remote_URL>
git remote -v
# Synchronize your (empty new repo) with a rebase to avoid non-fast-forward errors
git pull --rebase origin main
# Push the commit history and all the template files on the new repo (also set the origin/main branch as upstream)
git push --set-upstream origin main
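With this setup, you can later pull updates from the template into your own project. A minimal sketch, assuming the template's default branch is main (expect to resolve a few conflicts):
# Fetch the latest template commits and merge them into your branch
git fetch template
git merge template/main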
# Force python version 3.11 for compatibility reasons (pytorch)
conda create -n <your_env_name> python=3.11
conda activate <your_env_name>
# /!\ install pytorch with GPU support, see https://pytorch.org/get-started/
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
# install requirements
pip install -r requirements.txt
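A quick sanity check that the GPU build of PyTorch was picked up (should print True on a GPU node):
# Verify that PyTorch was installed with CUDA support
python -c "import torch; print(torch.cuda.is_available())"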
If you don't know pre-commit hooks, they do exactly what the name suggests: they keep you from committing stupid typos and perform code linting for you in the background. Check the docs for more details.
So, in case you deleted `.git` after cloning the template, you have to reinstall pre-commit.
It's also a good idea to run it against all files (if you have any) for the first time.
pre-commit install
pre-commit run --all-files
You can test that pre-commit is nicely setup with a dummy commit, or just by committing the changes of the next sections.
Note: if you are using the VSCode commit system, the output logs are redirected to the `OUTPUT/Git` console. Nevertheless, you should still get an error message if you messed something up. Spoiler: the error message is not helpful, but it redirects you towards the git logs.
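If you want a throwaway test for the dummy commit mentioned above, something like this works (the file name is arbitrary, and it assumes the template's default hooks, e.g., trailing-whitespace, are configured):
# Commit a dummy file with trailing whitespace: the hook should reject the commit
# and fix the file, which you can then re-stage and commit (or just delete)
echo "dummy line   " > precommit_test.txt
git add precommit_test.txt
git commit -m "test: trigger pre-commit hooks"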
I have set some project-specific parameters to generic names (e.g., `logger.wandb.project: "lightning-hydra-template"`). Here is a list you should check and replace:
- Documentation: Change the title of this `README.md` (and most likely also delete the crap I wrote 😉)
- W&B: In `configs/logger/wandb.yaml`, `team: "my-wandb-team"` and `project: "lightning-hydra-template"` (see the excerpt a few lines below)
- Submitit (if you plan to use multiruns):
  - In `configs/hydra/launcher/`, change your account settings in the different cluster setups: `account: "your_juwels_project"` (if necessary, also update your favorite `partition`).
  - You can specify the launcher through your command, with the option `hydra.launcher.partition=juwels_single_gpu` for example.
  - Otherwise, in the experiment file, add a configuration for `hydra-submitit-launcher`:
# Just after defaults:
- override /hydra/launcher: juwels_single_gpu # for example
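Coming back to the W&B point above, only the two values discussed need touching; the rest of `configs/logger/wandb.yaml` can stay as it is in the template. An excerpt of the idea:
# configs/logger/wandb.yaml (excerpt)
wandb:
  team: "my-wandb-team"                # replace with your W&B team
  project: "lightning-hydra-template"  # replace with your project name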
As a general comment, I advise running a mock run (/!\ not with `debug=fdr` /!\, it hides most of the config) and having a careful look at your config. @TODO
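One convenient way to take that careful look is Hydra's built-in `--cfg` flag, which prints the composed config and exits without launching anything:
# Print the fully composed job config and exit (no training is started)
python src/train.py experiment=example --cfg job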
You can try running a 10-epoch training of a SimpleDenseNet on the MNIST classification problem to check that everything runs smoothly. If you are already logged in to W&B on your system, you should not need to do anything else for the setup to be complete.
# Run on cpu by default
python src/train.py experiment=example
# If on a cluster, you can open an interactive session and run on gpu
python src/train.py experiment=example trainer=gpu
# Otherwise, you can run in "multirun" mode from the login node
# /!\ Remember to specify the submitit launcher, and if necessary to set the run `offline`, otherwise W&B will crash the run /!\
python src/train.py -m experiment=example trainer=gpu hydra.launcher.partition=develbooster # or logger.wandb.offline=True
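Since `--multirun` is already in play, note that it also sweeps over comma-separated values, submitting one SLURM job per combination. A hypothetical example, assuming the template's default MNIST model:
# Sweep two learning rates in one command: each value becomes its own job
python src/train.py -m experiment=example trainer=gpu logger.wandb.offline=True hydra.launcher.partition=develbooster model.optimizer.lr=0.001,0.0001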
@TODO: refine the usage examples with how I use experiments, etc.
The method I describe below is my preferred way of using this template. Of course, that's only a theory and you are free to organize yourself differently, the repository is very flexible. However, after trying out different setups I often found myself lost, e.g., trying to find out why a parameter kept its old value when I was overriding it. In any case, the Hydra documentation is your best friend. Now that you are warned, here are my best practices.
In short, I recommend always creating runs from an experiment config. This enforces better hierarchy and organization, while having the advantage of grouping "all" modifications in a single file, making modifications easy.
See below an example of running with a chosen experiment configuration from `configs/experiment/`:
python src/train.py experiment=example
From here, you can override minor parameters from the CLI for a quick check or a specific run:
python src/train.py experiment=example trainer.max_epochs=2
Whenever you find yourself running similar commands several times with a high number of overrides, it is probably a good time to create a new `experiment.yaml`.
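For reference, a new experiment file is just another config under `configs/experiment/`. A minimal sketch, loosely mirroring the template's `example.yaml` (names and values are placeholders):
# configs/experiment/my_experiment.yaml
# @package _global_
# run with: python src/train.py experiment=my_experiment

defaults:
  - override /data: mnist
  - override /model: mnist
  - override /trainer: gpu
  - override /logger: wandb

tags: ["mnist", "my_experiment"]
seed: 12345

trainer:
  max_epochs: 10

model:
  optimizer:
    lr: 0.001

data:
  batch_size: 64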
Sometimes you might want to change big parts of your experiment config without redefining a new experiment; in that case, you can override Config Group options. Non-exhaustive examples include estimating results on a different dataset, checking run time on different hardware, or logging to csv because you're a boomer.
# Train on CPU
python src/train.py experiment=example trainer=cpu
# Quickly test another dataset
python src/train.py experiment=example data=kodak
# Change the logger
python src/train.py experiment=example logger=csv
Debugging is an instance of the previous case, where you override the debug package from the CLI. However, it's so common and important that it deserves its own section.
Firstly, whenever you specify `debug`, there won't be any logging or callbacks, and the run will be executed without multithreading on CPU.
The best example is the `fast_dev_run` option of the Lightning Trainer, which will run 1 step of training, validation, and test. This is what I use 99% of the time.
python src/train.py experiment=example debug=fdr
If you still want some logging, or want to debug on GPU, etc., you can always specify that after your debug setup.
python src/train.py experiment=example debug=default trainer=gpu
This section simply lists the major changes I brought to the original template ashleve/lightning-hydra-template. It's also here that I give a big shoutout to ashleve: in addition to the impressive work behind such a repo, he is also on most of the Issues and PRs I came across when I was setting up this fork.
- Add deterministic training support (can be unset from config)
- Add the W&B offline management using `wandb_osh` (automatically adds the Lightning Callback when the run is set offline)
- Redirect logs to subdirectories specific to each experiment (see `task_name`)
- Automate job submission on cluster using `hydra-submitit-launcher` through `--multirun` mode
- Uncomment my favorite logger (wandb) in `environment.yaml` as well as in `requirements.txt`
- Add additional requirements: `hydra-submitit-launcher`, `torchgeo`, `wandb_osh` (Wandb Offline Sync Hook)
- Uncomment `sh` in `requirements.txt` to allow the tests in `test_sweeps.py`
- To execute all tests (requires GPUs): run `pytest` on a compute node (e.g., with an interactive session) to validate `@RunIf(min_gpus=1)` in `test_train.py` (make sure PyTorch is installed with GPU support) => all tests get executed and none skipped.
- I removed the MacOS and Windows deployment tests, as well as most of the different versions of Python tested (reason: save compute resources)
- The tests executed in CI/CD are the `"not slow"` ones, for the same reason mentioned above
- Add a submitit setup for Terrabyte
- Increase test coverage, and provide classic examples to test Lightning DataModules and Modules.
- Upgrade and "automate" the `task_name` parameter generation (see the sketch right after this list):
  - Either by using a specific name parameter in each Config Group option (config file) and `**kwargs` in the corresponding Modules.
  - Or by making it general and global in the "root" config file using the Hydra interpolation system. Not that easy, because it's impossible to interpolate in the Default List, see this stackoverflow.
- Add my Lightning Callback plotting reconstructions/predictions every $N$ epochs
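For the interpolation route above, the idea would roughly be the following (hypothetical sketch; it assumes each data/model Config Group option defines a `name` field, which the template does not do out of the box):
# In the "root" config (e.g., configs/train.yaml): build task_name from other config values
task_name: "${data.name}_${model.name}"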
@TODO: Specify how to add tests and provide examples. But nobody likes testing.
# run all tests
pytest
# run all tests except the ones marked as slow
pytest -k "not slow"