Feature Optional backends functionality #538


Conversation

@mauicv (Contributor) commented Jun 21, 2022

Todo:

  • Document BackendValidator
  • Tox env tests
  • Document BackendValidator tests
  • Update Readme
  • Update docs/notebooks
  • Comments for large file changes

What is this

This work branches off feature/dep-management and will make tensorflow and pytorch optional dependencies of alibi-detect.

The two main objectives are:

1. Make tensorflow and pytorch optional dependencies:

I've combined these into a single PR because they're all quite interdependent, so it was hard to do them separately without writing code to manage functionality implemented in one but not the others.

2. Add BackendValidation for all detectors:

This wasn't such an issue in alibi, as only CFRL implemented multiple backends. In detect, however, there are multiple cases where we have a tensorflow, pytorch or sklearn backend, and on top of this the tensorflow backend sometimes depends on tensorflow-probability as well. Previously this was handled with duplicated code in each detector. I've added a BackendValidator that replaces this and issues more specific error messages for missing optional dependencies.
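
A minimal sketch of the idea (class layout, names and error messages are illustrative, not the exact implementation in this PR):

# Hedged sketch of a BackendValidator: maps each supported backend to the
# optional dependencies it needs and raises informative errors otherwise.
import importlib.util
from typing import Dict, List


class BackendValidator:
    def __init__(self, backend_options: Dict[str, List[str]], construct_name: str):
        # e.g. {'tensorflow': ['tensorflow', 'tensorflow_probability'],
        #       'pytorch': ['torch'], 'sklearn': ['sklearn']}
        self.backend_options = backend_options
        self.construct_name = construct_name

    def verify_backend(self, backend: str) -> None:
        if backend not in self.backend_options:
            raise NotImplementedError(
                f'{backend} backend not implemented for {self.construct_name}.')
        missing = [dep for dep in self.backend_options[backend]
                   if importlib.util.find_spec(dep) is None]
        if missing:
            # The install hint assumes an extras bucket named after the backend.
            raise ImportError(
                f'{self.construct_name} with backend {backend} requires the '
                f'missing optional dependencies {missing}. Try running '
                f'`pip install alibi-detect[{backend}]`.')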


What this PR doesn't include:

  1. The install options in the docs. I think it makes sense to do this in a later PR on the feature branch, for all the optional deps at once, once this branch has been merged in.
  2. The install options in the README.md, for the same reason as 1.
  3. The config-driven work in the release branch. This will be done in a separate PR in order to manage Release/Master branch complexity issues.
  4. License updates. These will be done in a PR on the feature branch.

Notes:

Circular imports:

I'm not sure why this wasn't an issue with alibi, but in alibi-detect we often have the following scenario:
- We import a model from models
- Importing that model needs a function from utils.distance
- Importing this function means the code in utils.__init__ is executed
- This code imports utils.fetching, which in turn imports all the models from models
- We get a circular import error

This issue is introduced by us trying to bundle all the functionality into the __init__ file; the correct approach is further scoping and modularization. The solution to the above is a utils/fetching/__init__.py separate from utils/__init__.py.
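
A stripped-down sketch of the chain and of the fix (module contents here are illustrative; some_distance_fn is a placeholder name):

# --- Before: importing alibi_detect.models triggers the cycle ---
# alibi_detect/models/__init__.py
from alibi_detect.utils.distance import some_distance_fn  # executes utils/__init__.py
# alibi_detect/utils/__init__.py
from alibi_detect.utils import fetching  # runs whenever utils is imported
# alibi_detect/utils/fetching.py
from alibi_detect.models import PixelCNN  # back into models -> circular import

# --- After: fetching gets its own subpackage, decoupled from utils/__init__.py ---
# alibi_detect/utils/fetching/__init__.py
from alibi_detect.utils.fetching.fetching import fetch_detector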

Changing import statements:

This PR results in lots of changes to the notebooks, primarily to reflect the new intended public API. Note that these shouldn't be breaking changes: the objects in question are still available in the old locations. The only difference is that where they need to be protected from ImportErrors, we import them into the module's __init__.py file instead of from the file directly. So old code will still run, but the imported object won't be protected as per import_optional. This might mean code breaks for users running default installs, as they won't have all the relevant dependencies. Any release should include a note that the quick fix is updating the install to pip install alibi-detect[x,y,z] for the relevant x, y, z. The longer fix is updating the import statements to the public API (a sketch of the protection pattern is given after the full list below). Most of the time this won't even be necessary, only where imports have changed, e.g.:

from alibi_detect.utils.pytorch.kernels import DeepKernel

to

from alibi_detect.utils.pytorch import DeepKernel

The full list of changes is as follows:

  1. from alibi_detect.utils.tensorflow.kernels import DeepKernel -> from alibi_detect.utils.tensorflow import DeepKernel
  2. from alibi_detect.utils.tensorflow.prediction import predict_batch -> from alibi_detect.utils.tensorflow import predict_batch
  3. from alibi_detect.utils.pytorch.data import TorchDataset -> from alibi_detect.utils.pytorch import TorchDataset
  4. from alibi_detect.models.pytorch.trainer import trainer -> from alibi_detect.models.pytorch import trainer
  5. from alibi_detect.models.tensorflow.resnet import scale_by_instance -> from alibi_detect.models.tensorflow import scale_by_instance
  6. from alibi_detect.utils.pytorch.kernels import DeepKernel -> from alibi_detect.utils.pytorch import DeepKernel
  7. from alibi_detect.models.tensorflow.autoencoder import eucl_cosim_features -> from alibi_detect.models.tensorflow import eucl_cosim_features
  8. from alibi_detect.models.tensorflow.losses import elbo -> from alibi_detect.models.tensorflow import elbo
  9. from alibi_detect.models import PixelCNN -> from alibi_detect.models.tensorflow import PixelCNN
  10. from alibi_detect.utils.tensorflow.data import TFDataset -> from alibi_detect.utils.tensorflow import TFDataset
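
For reference, a hedged sketch of how one of the promoted names might be protected in its subpackage __init__.py; the pattern matches the import_optional snippet discussed later in this thread, but the module path import_optional is imported from here is an assumption:

# alibi_detect/utils/pytorch/__init__.py -- illustrative sketch.
# import_optional returns the requested objects when the optional dependency
# (torch here) is installed; otherwise it returns placeholders that raise an
# informative ImportError as soon as they are used.
from alibi_detect.utils.missing_optional_dependency import import_optional

DeepKernel, GaussianRBF = import_optional(
    'alibi_detect.utils.pytorch.kernels',
    names=['DeepKernel', 'GaussianRBF']
)

__all__ = ['DeepKernel', 'GaussianRBF']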

Also see this related comment!


@ascillitoe (Contributor) commented Jun 28, 2022

Noting down a few thoughts:

  1. Number of optional deps groups: The number is getting quite large, and might grow further in future. We might want to reassess the motivation behind each group. Specifically, is the numba group needed, and can we merge tensorflow and tensorflow-probability (apologies, separating the last two was largely down to my initial input, which I am now thinking was maybe misguided).
  2. conda optional deps groups: We need to consider what is included (in run) and what is optional (in run_constrained) in our conda recipe. This is made more complex by the need to consider what is included in the recipes of our dependencies, e.g. the transformers recipe includes pytorch, so we cannot have pytorch as optional in our conda recipe. Based on offline discussion with @mauicv, the latest thinking here is that we should keep our conda recipe "fat" for now (i.e. include tensorflow and pytorch), and only have fbprophet as an optional addition here. Looking at how to slim down the conda recipe and speed up the install should be kept as a separate task (Revisit conda recipe #443).

@mauicv (Contributor, Author) commented Jun 28, 2022

w.r.t.

  1. Number of optional deps groups: The number is getting quite large, and might grow further in future. We might want to reassess the motivation behind each group. Specifically, is the numba group needed, and can we merge tensorflow and tensorflow-probability (apologies, separating the last two was largely down to my initial input, which I am now thinking was maybe misguided).

Do you mean having a tensorflow bucket and a tensorflow-probability bucket that also has tensorflow, or just having a single tensorflow bucket which includes tensorflow-probability? I think I'm generally in favour of the former.
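
To make the two options concrete, here is a hedged setup.py sketch (bucket names and version pins are illustrative, not the ones actually merged):

# Option A (the "former"): separate buckets; the tensorflow-probability
# bucket also pulls in tensorflow itself.
extras_require = {
    'tensorflow': ['tensorflow>=2.2.0'],
    'tensorflow-probability': ['tensorflow>=2.2.0',
                               'tensorflow_probability>=0.8'],
}

# Option B (the "latter"): a single bucket that always includes both.
extras_require = {
    'tensorflow': ['tensorflow>=2.2.0', 'tensorflow_probability>=0.8'],
}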

  2. conda optional deps groups: We need to consider what is included (in run) and what is optional (in run_constrained) in our conda recipe. This is made more complex by the need to consider what is included in the recipes of our dependencies, e.g. the transformers recipe includes pytorch, so we cannot have pytorch as optional in our conda recipe. Based on offline discussion with @mauicv, the latest thinking here is that we should keep our conda recipe "fat" for now (i.e. include tensorflow and pytorch), and only have fbprophet as an optional addition here. Looking at how to slim down the conda recipe and speed up the install should be kept as a separate task (Conda install stuck on "Solving environment" #443).

agreed!

@@ -11,7 +47,14 @@
"VAE",
"VAEGMM",
"resnet",
"scale_by_instance",
ascillitoe (Contributor):

Just to double-check: is scale_by_instance promoted from tensorflow.resnet to tensorflow so that we can do the import_optional check in __init__? This makes sense I think, although a minor point is that we would ordinarily want to raise a DeprecationWarning if the old import usage from alibi_detect.models.tensorflow.resnet import scale_by_instance is used. Perhaps not important in this case, as scale_by_instance is not used a lot...

mauicv (Contributor, Author):

Yeah, that's exactly correct. I'll add the relevant DeprecationWarnings.

mauicv (Contributor, Author):

see this related comment!

ascillitoe (Contributor):

Leaving this one "unresolved" for visibility...

@@ -16,5 +40,6 @@
"predict_batch_transformer",
"get_device",
"quantile",
"zero_diag"
"zero_diag",
"TorchDataset"
ascillitoe (Contributor):

General point here: Where we have promoted (public!) imports up to a higher submodule (or just moved things around entirely), IMO we at a minimum need to document the changes properly in the release notes. Better yet, we should raise DeprecationWarnings if the old imports are used.

An example is the old from alibi_detect.utils.saving import save_detector, which has been moved to from alibi_detect.saving import save_detector; calling the former will still import save_detector but raise a DeprecationWarning.
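
A minimal sketch of how such a shim module can work (hypothetical code, assuming the shim simply re-exports and warns; not necessarily the repo's exact implementation):

# alibi_detect/utils/saving.py -- hypothetical deprecation shim: the old
# import path keeps working but warns on import.
import warnings

from alibi_detect.saving import save_detector, load_detector  # noqa: F401

warnings.warn(
    'alibi_detect.utils.saving is deprecated; use alibi_detect.saving instead.',
    DeprecationWarning, stacklevel=2
)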

mauicv (Contributor, Author):

@ascillitoe and I discussed options:

The best way of providing deprecation warnings is to move the functionality in alibi_detect.utils.pytorch.misc to alibi_detect.utils.pytorch._misc. We then update the __init__.py file to import from _misc.py:

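# Re-export the private implementations; import_optional returns the real
# objects if torch is installed, else placeholders that raise on use.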
get_device, quantile, zero_diag = import_optional(
    'alibi_detect.utils.pytorch._misc',
    names=['get_device', 'quantile', 'zero_diag']
)

In the misc.py file we replace the relevant functions with deprecation warnings, similar to the config work (a sketch is given below). I think this makes sense, but the result would be wider inconsistency in what is private and what is public throughout detect. I could try to make this consistent, but that would mean renaming lots of files with the underscore...
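
A minimal sketch of what the misc.py shim could then look like (hypothetical; only get_device shown):

# alibi_detect/utils/pytorch/misc.py -- hypothetical shim. The real
# implementation now lives in _misc.py; the wrapper warns, then delegates.
import warnings

from alibi_detect.utils.pytorch import _misc


def get_device(*args, **kwargs):
    warnings.warn(
        'Importing get_device from alibi_detect.utils.pytorch.misc is '
        'deprecated; import it from alibi_detect.utils.pytorch instead.',
        DeprecationWarning, stacklevel=2
    )
    return _misc.get_device(*args, **kwargs)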

ascillitoe (Contributor):

IMO moving backend-specific files to private (to avoid imports without optional-deps checks) might be a worthwhile exercise in a subsequent PR (to include in the 0.10.0 release). However, personally I would only vote for this change if it doesn't screw up the git history for these files too much, i.e. if we can get the git history to just track the filename change instead of registering it as a complete deletion and rewrite of each file.

@@ -112,7 +112,7 @@
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"display_name": "Python 3",
@ascillitoe (Contributor) commented Jul 5, 2022:

Usual comment w.r.t. notebooks; ideally we would reset these changes to reduce file diffs (same for all the following notebooks).

@ascillitoe (Contributor) left a review comment:

Hey @mauicv, done a first pass. Looking good I think, but I'll hold off on approving until a few queries are resolved.

@arnaudvl mentioned this pull request Jul 6, 2022
import tensorflow_probability as tfp
from tensorflow.keras.losses import kld, categorical_crossentropy
ascillitoe (Contributor):

Uber-pedantic nitpick: we could just remove these changes and the changes to test_losses_tf.py...

@mauicv (Contributor, Author) commented Jul 6, 2022:

I'm not sure I understand?

@@ -123,7 +123,7 @@
"```python\n",
"import tensorflow as tf\n",
"from tensorflow.keras.layers import Conv2D, Flatten, Input\n",
"from alibi_detect.utils.tensorflow.kernels import DeepKernel\n",
"from alibi_detect.utils.tensorflow import DeepKernel\n",
@ascillitoe (Contributor) commented Jul 6, 2022:

The change also needs to be made in the main text: "The DeepKernel class found in either alibi_detect.utils.tensorflow.kernels or alibi_detect.utils.pytorch.kernels".

Might be worth grep-ing the docs for all the import statements that have been changed?

mauicv (Contributor, Author):

Hmm, good idea. I found alibi_detect.utils.pytorch.data.TorchDataset and alibi_detect.utils.tensorflow.data.TFDataset as well. These are referenced in a parameter docstring. Are they intended to be public? Would a user import and use them? They're not used explicitly anywhere within the examples!

mauicv (Contributor, Author):

In offline discussion we agreed that TorchDataset and TFDataset should be public, so I've protected them with import_optional and fixed the private import paths in the docs.

@mauicv merged commit 5047c71 into SeldonIO:feature/dep-management Jul 7, 2022
mauicv added a commit to mauicv/alibi-detect that referenced this pull request Jul 8, 2022
* Add BackendValidator class

* Protect ad, od and cd and other API objects from tensorflow and torch optional dependency import errors

* Update import statements in notebooks
mauicv added a commit to mauicv/alibi-detect that referenced this pull request Jul 25, 2022
* Add BackendValidator class

* Protect ad, od and cd and other API objects from tensorflow and torch optional dependency import errors

* Update import statements in notebooks
mauicv added a commit that referenced this pull request Jul 25, 2022
* Implement base optional dependency functionality

* Feature Optional backends functionality (#538)

* Merge config into opt deps (#553)

* Feature Optional prophet (#541)

* Feature Optional dependencies documentation updates (#542)

* Update licenses (#562)

* Add updates to CHANGELOG for optional dependencies (#563)
@ascillitoe mentioned this pull request Jan 27, 2023