Return the OOF instances for classifier-based drift detectors (ClassifierDrift and SpotTheDiffDrift) (#665)
* add out-of-fold instances to the return dict for classifier detectors
* update docs
* update score return type
* fix typo and mypy error
* extend to list inputs and update score return types
* add Union import
* fix type error
* add changelog
CHANGELOG.md (+1)

@@ -7,6 +7,7 @@
 - **New feature** MMD drift detector has been extended with a [KeOps](https://www.kernel-operations.io/keops/index.html) backend to scale and speed up the detector.
   See the [documentation](https://docs.seldon.io/projects/alibi-detect/en/latest/cd/methods/mmddrift.html) and [example notebook](https://docs.seldon.io/projects/alibi-detect/en/latest/examples/cd_mmd_keops.html) for more info ([#548](https://github.com/SeldonIO/alibi-detect/pull/548)).
 - If a `categories_per_feature` dictionary is not passed to `TabularDrift`, a warning is now raised to inform the user that all features are assumed to be numerical ([#606](https://github.com/SeldonIO/alibi-detect/pull/606)).
+- The `ClassifierDrift` and `SpotTheDiffDrift` detectors can now also return the out-of-fold instances of the reference and test sets. When `train_size` is used to train the detector, this makes it possible to associate the returned prediction probabilities with the correct instances.

 ### Changed

 - Minimum `prophet` version bumped to `1.1.0` (used by `OutlierProphet`). This upgrade removes the dependency on `pystan` as `cmdstanpy` is used instead. This version also comes with pre-built wheels for all major platforms and Python versions, making both installation and testing easier ([#627](https://github.com/SeldonIO/alibi-detect/pull/627)).
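The motivation for the changelog entry above can be shown with a minimal, self-contained sketch (plain Python, no alibi-detect dependency; all names below are hypothetical placeholders, not the detector's internals): with `train_size`, the data is shuffled and split, the classifier is trained on one part, and probabilities are only produced for the held-out (out-of-fold) part, in shuffled order. Without the matching instances, the caller cannot tell which instance each probability belongs to.

```python
import random

# Toy dataset standing in for the combined reference/test instances that a
# classifier-based drift detector scores (hypothetical names throughout).
x = [f"instance_{i}" for i in range(10)]

# With train_size=0.7 the detector shuffles and splits: 7 instances train the
# classifier, the remaining 3 are scored out-of-fold.
random.seed(0)
idx = list(range(len(x)))
random.shuffle(idx)
n_train = int(0.7 * len(x))
train_idx, oof_idx = idx[:n_train], idx[n_train:]

# Probabilities exist only for the held-out fold (placeholder scores here)...
probs_oof = [0.5 for _ in oof_idx]

# ...so returning the out-of-fold instances alongside them is what makes the
# (instance, probability) pairing unambiguous:
x_oof = [x[i] for i in oof_idx]
paired = list(zip(x_oof, probs_oof))

print(len(paired))  # 3: one pair per held-out instance
```

The sketch is only about the bookkeeping: the detector's actual split and classifier are more involved, but the alignment problem it illustrates is the same.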
doc/source/cd/methods/classifierdrift.ipynb (+4 −1)

@@ -139,7 +139,7 @@
 "### Detect Drift\n",
 "\n",
-"We detect data drift by simply calling `predict` on a batch of instances `x`. `return_p_val` equal to *True* will also return the p-value of the test, `return_distance` equal to *True* will return a notion of strength of the drift and `return_probs` equals *True* also returns the out-of-fold classifier model prediction probabilities on the reference and test data (0 = reference data, 1 = test data).\n",
+"We detect data drift by simply calling `predict` on a batch of instances `x`. `return_p_val` equal to *True* will also return the p-value of the test, `return_distance` equal to *True* will return a notion of the strength of the drift and `return_probs` equal to *True* also returns the out-of-fold classifier model prediction probabilities on the reference and test data (0 = reference data, 1 = test data) as well as the associated out-of-fold reference and test instances.\n",
 "\n",
 "The prediction takes the form of a dictionary with `meta` and `data` keys. `meta` contains the detector's metadata while `data` is also a dictionary which contains the actual predictions stored in the following keys:\n",
 "\n",
@@ -155,6 +155,9 @@
 "\n",
 "* `probs_test`: the instance level prediction probability for the test data `x` if `return_probs` is *True*.\n",
 "\n",
+"* `x_ref_oof`: the instances associated with `probs_ref` if `return_probs` equals *True*.\n",
+"\n",
+"* `x_test_oof`: the instances associated with `probs_test` if `return_probs` equals *True*.\n",
doc/source/cd/methods/spotthediffdrift.ipynb (+21 −17; the removed and added code-cell lines appear to differ only in whitespace)

@@ -53,7 +53,7 @@
 "* `initial_diffs`: Array used to initialise the diffs that will be learned. Defaults to Gaussian for each feature with equal variance to that of reference data.\n",
 "\n",
 "* `l1_reg`: Strength of l1 regularisation to apply to the differences.\n",
-"\n",
+"\n",
 "* `binarize_preds`: Whether to test for discrepancy on soft (e.g. probs/logits) model predictions directly with a K-S test or binarise to 0-1 prediction errors and apply a binomial test.\n",
 "\n",
 "* `train_size`: Optional fraction (float between 0 and 1) of the dataset used to train the classifier. The drift is detected on *1 - train_size*. Cannot be used in combination with `n_folds`.\n",
@@ -109,12 +109,12 @@
 "from alibi_detect.cd import SpotTheDiffDrift\n",
 "\n",
 "cd = SpotTheDiffDrift(\n",
-"    x_ref,\n",
-"    backend='pytorch',\n",
-"    p_val=.05,\n",
-"    n_diffs=1,\n",
-"    l1_reg=1e-3,\n",
-"    epochs=10,\n",
+"    x_ref,\n",
+"    backend='pytorch',\n",
+"    p_val=.05,\n",
+"    n_diffs=1,\n",
+"    l1_reg=1e-3,\n",
+"    epochs=10,\n",
 "    batch_size=32\n",
 ")\n",
 "\n",
@@ -143,13 +143,13 @@
 "\n",
 "# instantiate the detector\n",
 "cd = SpotTheDiffDrift(\n",
-"    x_ref,\n",
-"    backend='tensorflow',\n",
-"    p_val=.05,\n",
-"    kernel=kernel,\n",
-"    n_diffs=1,\n",
-"    l1_reg=1e-3,\n",
-"    epochs=10,\n",
+"    x_ref,\n",
+"    backend='tensorflow',\n",
+"    p_val=.05,\n",
+"    kernel=kernel,\n",
+"    n_diffs=1,\n",
+"    l1_reg=1e-3,\n",
+"    epochs=10,\n",
 "    batch_size=32\n",
 ")\n",
 "```"
@@ -161,7 +161,7 @@
 "source": [
 "### Detect Drift\n",
 "\n",
-"We detect data drift by simply calling `predict` on a batch of instances `x`. `return_p_val` equal to *True* will also return the p-value of the test, `return_distance` equal to *True* will return a notion of strength of the drift, `return_probs` equals *True* returns the out-of-fold classifier model prediction probabilities on the reference and test data (0 = reference data, 1 = test data) and `return_kernel` equals *True* will also return the trained kernel.\n",
+"We detect data drift by simply calling `predict` on a batch of instances `x`. `return_p_val` equal to *True* will also return the p-value of the test, `return_distance` equal to *True* will return a notion of the strength of the drift, `return_probs` equal to *True* returns the out-of-fold classifier model prediction probabilities on the reference and test data (0 = reference data, 1 = test data) as well as the associated out-of-fold reference and test instances, and `return_kernel` equal to *True* will also return the trained kernel.\n",
 "\n",
 "The prediction takes the form of a dictionary with `meta` and `data` keys. `meta` contains the detector's metadata while `data` is also a dictionary which contains the actual predictions stored in the following keys:\n",
 "\n",
@@ -181,6 +181,10 @@
 "\n",
 "* `probs_test`: the instance level prediction probability for the test data `x` if `return_probs` is *True*.\n",
 "\n",
+"* `x_ref_oof`: the instances associated with `probs_ref` if `return_probs` equals *True*.\n",
+"\n",
+"* `x_test_oof`: the instances associated with `probs_test` if `return_probs` equals *True*.\n",
+"\n",
 "* `kernel`: The trained kernel if `return_kernel` equals *True*.\n",