[ENH] Implementation of Extended Isolation Forest (EIF) anomaly detector #2679

Akhil-Jasson · 2025-03-23T14:09:40Z

This PR implements the Extended Isolation Forest (EIF) algorithm.

Reference Issues/PRs

Fixes #2113

What does this implement/fix? Explain your changes.

Does your contribution introduce a new dependency? If yes, which one?

Yes, it introduces H2O.ai as a new dependency.

Any other comments?

PR checklist

For all contributions

I've added myself to the list of contributors. Alternatively, you can use the @all-contributors bot to do this for you after the PR has been merged.
The PR title starts with either [ENH], [MNT], [DOC], [BUG], [REF], [DEP] or [GOV] indicating whether the PR topic is related to enhancement, maintenance, documentation, bugs, refactoring, deprecation or governance.

For new estimators and functions

I've added the estimator/function to the online API documentation.
(OPTIONAL) I've added myself as a __maintainer__ at the top of relevant files and want to be contacted regarding its maintenance. Unmaintained files may be removed. This is for the full file, and you should not add yourself if you are just making minor changes or do not want to help maintain its contents.

For developers with write access

(OPTIONAL) I've updated aeon's CODEOWNERS to receive notifications about future changes to these files.

aeon-actions-bot · 2025-03-23T14:10:02Z

Thank you for contributing to `aeon`

I did not find any labels to add based on the title. Please add the [ENH], [MNT], [BUG], [DOC], [REF], [DEP] and/or [GOV] tags to your pull requests titles. For now you can add the labels manually.
I have added the following labels to this PR based on the changes made: [ anomaly detection ]. Feel free to change these if they do not properly represent the PR.

The Checks tab will show the status of our automated tests. You can click on individual test runs in the tab or "Details" in the panel below to see more information if there is a failure.

If our pre-commit code quality check fails, any trivial fixes will automatically be pushed to your PR unless it is a draft.

Don't hesitate to ask questions on the aeon Slack channel if you have any.

PR CI actions

These checkboxes will add labels to enable/disable CI functionality for this PR. This may not take effect immediately, and a new commit may be required to run the new configuration.

Run pre-commit checks for all files
Run mypy typecheck tests
Run all pytest tests and configurations
Run all notebook example tests
Run numba-disabled codecov tests
Stop automatic pre-commit fixes (always disabled for drafts)
Disable numba cache loading
Push an empty commit to re-run CI checks

MatthewMiddlehurst · 2025-03-23T15:26:29Z

Please fill out the template and use the correct title format.

Akhil-Jasson · 2025-03-23T15:43:16Z

I've fixed the title format. Shall I proceed with adding EIF to the online API documentation?

Ramana-Raja · 2025-03-24T18:02:34Z

Hi @Akhil-Jasson,

I just saw your code, and it looks great! That said, I don’t think using H2O is the best approach since Aeon doesn’t rely on it. It might be better to have our own implementation instead.

A few updates to consider:

1. Could you update the section "Does your contribution introduce a new dependency?" and mention H2O there?

2. The test cases seem to be missing—could you add them?

3. Instead of importing the entire H2O module, it’s better to import only what’s needed to keep things lightweight.

MatthewMiddlehurst · 2025-03-24T19:42:43Z

New dependencies should be put in pyproject.toml, otherwise this won't be tested. There are still bits missing from the template.


Akhil-Jasson · 2025-03-30T13:22:03Z

I've added the h2o dependency to pyproject.toml, but I'm encountering errors when running the test files. The test attempts to import aeon.anomaly_detection._eif but fails, indicating that the module doesn’t exist yet.

Is there a step I'm missing for adding new modules to aeon? What could be the possible issue?

MatthewMiddlehurst · 2025-04-02T23:49:24Z

Your import is incorrect.

SebastianSchmidl

H2O looks like a massive package. Do we actually want to include it as a dependency? The issue clearly states that we are looking for an implementation in aeon directly.

I know this is not mentioned in the corresponding issue, but I think it makes sense to work with sliding windows in EIF as well. We can always get the original behavior back by setting the window size to 1.
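A minimal sketch of the sliding-window idea, assuming the series is laid out as `(n_timepoints, n_channels)`. The helper name `make_windows` and its parameters are hypothetical and not taken from the PR; it only illustrates why a window size of 1 reproduces the point-wise input.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def make_windows(X: np.ndarray, window_size: int = 1, stride: int = 1) -> np.ndarray:
    """Turn a (n_timepoints, n_channels) series into flattened sliding windows.

    Hypothetical helper for illustration only.
    """
    # sliding_window_view appends the window axis last, giving
    # (n_windows, n_channels, window_size); the slice applies the stride.
    windows = sliding_window_view(X, window_shape=window_size, axis=0)[::stride]
    # Flatten every window into one feature vector (channel-major order).
    return windows.reshape(windows.shape[0], -1)


X = np.arange(10, dtype=float).reshape(5, 2)

# With window_size=1 every "window" is a single time point, so the
# detector would see exactly the original data.
assert np.array_equal(make_windows(X, window_size=1), X)

# With window_size=3 there are 5 - 3 + 1 = 3 windows of 3 * 2 features each.
print(make_windows(X, window_size=3).shape)
```

Scores computed on such windows still have to be mapped back to individual time points before they are returned to the user.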


SebastianSchmidl · 2025-04-04T14:56:25Z

aeon/anomaly_detection/_eif.py

+
+        return self
+
+    def _predict(self, X) -> np.ndarray:


To make the usage of EIF similar to our other models, we want it to be usable as a semi-supervised (as implemented already) and an unsupervised algorithm. The current implementation of _predict does not allow that.


SebastianSchmidl

The code does not look too complicated, so I think it was a good decision not to include H2O.ai as a dependency!

Thanks for the implementation effort. Please address the issues below and then compare your implementation to the original implementation by Hariri et al. to demonstrate that your/our implementation produces the same results as theirs.

I think this estimator is a suitable candidate for improving the performance with Numba / JITs. But let us first focus on making the implementation correct, and then tackle the performance optimization in another PR.

SebastianSchmidl · 2025-04-25T12:30:23Z

aeon/anomaly_detection/_eif.py

+    axis : int, default=1
+        The time point axis of the input series if it is 2D. If ``axis==0``, it is
+        assumed each column is a time series and each row is a time point. i.e. the
+        shape of the data is ``(n_timepoints, n_channels)``. ``axis==1`` indicates
+        the time series are in rows, i.e. the shape of the data is
+        ``(n_channels, n_timepoints)``.


Different data layouts are handled in the base class. Please do not expose an additional axis-parameter here.

SebastianSchmidl · 2025-04-25T12:31:53Z

aeon/anomaly_detection/_eif.py

+        y : np.ndarray, optional
+            Labels for semi-supervised learning. 0 for normal, 1 for anomalous.
+            If None, unsupervised learning is used.


Semi-supervised means training on normal data only, so just 0 is allowed here. Otherwise, this would be supervised training with normal data and known anomalies. Details: https://www.aeon-toolkit.org/en/stable/examples/anomaly_detection/anomaly_detection.html#Anomaly-Detection-in-aeon
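One way to enforce this is sketched below, assuming `y` is either `None` or an array of 0/1 labels. The helper `check_semi_supervised_labels` is hypothetical and not part of aeon or of this PR.

```python
import numpy as np


def check_semi_supervised_labels(y):
    """Accept y only if it marks every training point as normal (label 0).

    Hypothetical helper for illustration only.
    """
    if y is None:
        # No labels given: the estimator is used in an unsupervised way.
        return None
    y = np.asarray(y)
    if np.any(y != 0):
        # Known anomalies in the training data would make this supervised.
        raise ValueError(
            "Semi-supervised training expects normal data only (all labels 0)."
        )
    return y
```

Calling it with `[0, 0, 0]` returns the labels unchanged, while `[0, 1, 0]` raises a `ValueError`.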

SebastianSchmidl · 2025-04-25T12:32:51Z

aeon/anomaly_detection/_eif.py

+        # Ensure X is 2D
+        if X.ndim == 1:
+            X = X.reshape(-1, 1)


This should not be necessary anymore if you remove the axis-parameter from the class and use a single data layout within the estimator.

SebastianSchmidl · 2025-04-25T12:35:48Z

aeon/anomaly_detection/_eif.py

+        # Calculate anomaly scores
+        scores = self._predict(X)
+
+        # Set threshold
+        self.threshold_ = float(np.percentile(scores, 100 * (1 - self.contamination)))


This looks suspicious. Why do you predict in fit?
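For context, the quoted lines take the cut-off as the `(1 - contamination)` quantile of the scores computed on the training series. A small sketch on toy numbers shows what that expression evaluates to; the `scores` array and the `contamination` value are made up for illustration and are not taken from the PR.

```python
import numpy as np

# Toy scores standing in for the detector output on the training series.
scores = np.arange(100, dtype=float)
contamination = 0.1  # assumed share of anomalous points

# Cut-off at the (1 - contamination) quantile of the training scores.
threshold = float(np.percentile(scores, 100 * (1 - contamination)))

# Points scoring above the cut-off would be flagged as anomalies.
flags = scores > threshold
print(threshold, int(flags.sum()))
```

With these toy values roughly the top 10% of the training points end up above the threshold.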

SebastianSchmidl · 2025-04-25T12:40:42Z

aeon/anomaly_detection/_eif.py

+                scores,
+                window_size=self.window_size,
+                stride=self.stride,
+                n_timepoints=self._n_samples_orig,


This only works if EIF is used in an unsupervised way or if the training and prediction time series have the same length!
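A possible way around this, sketched under the assumption that window scores are averaged back onto time points: take the length from the series being scored instead of a value stored during `fit`. The function `window_scores_to_point_scores` is a hypothetical stand-in, not the helper used in the PR.

```python
import numpy as np


def window_scores_to_point_scores(window_scores, window_size, stride, n_timepoints):
    """Average the window scores over every time point each window covers.

    Hypothetical helper for illustration only.
    """
    sums = np.zeros(n_timepoints)
    counts = np.zeros(n_timepoints)
    for i, score in enumerate(window_scores):
        start = i * stride
        sums[start : start + window_size] += score
        counts[start : start + window_size] += 1
    # Time points not covered by any window keep a score of 0.
    return np.divide(sums, counts, out=np.zeros(n_timepoints), where=counts > 0)


# Series passed to predict; its length may differ from the training series.
X_test = np.zeros((5, 1))
window_scores = np.array([1.0, 2.0, 3.0])  # one score per window of size 3

point_scores = window_scores_to_point_scores(
    window_scores,
    window_size=3,
    stride=1,
    n_timepoints=X_test.shape[0],  # length of the scored series, not of fit data
)
print(point_scores)
```

Because `n_timepoints` comes from `X_test`, the output always has one score per time point of the series being predicted.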

SebastianSchmidl · 2025-04-25T12:44:55Z

aeon/anomaly_detection/_eif.py

+            return 0
+        elif n == 2:
+            return 1
+        return 2 * (np.log(n - 1) + 0.5772156649) - 2 * (n - 1) / n


Where is the magic number from? Please at least add a comment.
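For reference, 0.5772156649 is the Euler-Mascheroni constant. The Isolation Forest normalisation term is c(n) = 2H(n - 1) - 2(n - 1)/n, where the harmonic number H(i) is approximated by ln(i) + 0.5772156649. A sketch of how the function could document this, using NumPy's `np.euler_gamma`; the function name and the `n <= 1` branch are assumptions, since the quoted hunk starts mid-function.

```python
import numpy as np


def average_path_length(n: int) -> float:
    """Average path length c(n) of an unsuccessful binary search tree lookup.

    Used to normalise isolation-tree path lengths for a sample of n points.
    """
    if n <= 1:  # assumed guard; not visible in the quoted hunk
        return 0.0
    if n == 2:
        return 1.0
    # H(i) ~ ln(i) + gamma approximates the i-th harmonic number, where gamma
    # is the Euler-Mascheroni constant (~0.5772156649), i.e. np.euler_gamma.
    harmonic = np.log(n - 1) + np.euler_gamma
    return float(2.0 * harmonic - 2.0 * (n - 1) / n)
```

Using the named constant instead of the literal keeps the formula readable and removes the need to justify the digits.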

SebastianSchmidl · 2025-04-25T12:52:16Z

Tests are also still missing.

Akhil-Jasson · 2025-04-25T14:15:32Z

Previously, the tests were triggered automatically on commit based on the old repo structure. Since then, the structure has changed significantly. I ran the tests locally using the earlier setup, and they passed.

To accommodate the requirements of the test files—such as the need for y in the fit function—I’ve made it an optional parameter. I’ll take a look into the other comments as well.

Given the extent of changes in the repo structure, is it easier to just create a new branch from the updated main branch or is there an easier alternative? Please let me know!

MatthewMiddlehurst · 2025-04-30T21:50:45Z

I would merge main into your branch; the bulk of your code should not conflict. It is mainly that we have refactored the module, so it has all moved around a bit. If you do not know which module to put your implementation in, just put it anywhere for now and we can discuss afterwards where is best.


Akhil-Jasson · 2025-05-01T18:42:03Z

Changes have been made to this branch to align with the newly refactored module structure. I have added the EIF model under outlier_detection.

I'll soon address the feedback to remove the axis hyperparameter and add the necessary comments as requested.

Akhil-Jasson added 2 commits March 23, 2025 13:27

Add Extended Isolation Forest (EIF) anomaly detector

ac77471

Add Extended Isolation Forest (EIF) anomaly detector

7065d21

Akhil-Jasson requested review from SebastianSchmidl and MatthewMiddlehurst as code owners March 23, 2025 14:09

aeon-actions-bot bot added the anomaly detection Anomaly detection package label Mar 23, 2025

Akhil-Jasson added 2 commits March 23, 2025 14:10

Automatic pre-commit fixes

1221a73

Fixed the Line too long issue

b32a354

MatthewMiddlehurst added the enhancement New feature, improvement request or other non-bug code enhancement label Mar 23, 2025

Akhil-Jasson changed the title ~~2113 Implementation of Extended Isolation Forest (EIF) anomaly detector~~ [ENH] Implementation of Extended Isolation Forest (EIF) anomaly detector Mar 23, 2025

Akhil-Jasson added 2 commits March 30, 2025 13:16

Made recommended changes according to the template and added a test f…

5b97aaa

…ile for the EIF implementation.

Automatic pre-commit fixes

ed65d88

SebastianSchmidl requested changes Apr 4, 2025


Akhil-Jasson and others added 7 commits April 16, 2025 16:37

EIF Implementation without large dependency, including semi-supervise…

c059309

…d and unsupervised learning. Included Sliding Window Algorithm as requested.

Automatic pre-commit fixes

0356eb8

Temporarily Removed test file

b397f34

Fixed Data Type Error and Fixed Dimension issue with Input

82757e4

Automatic pre-commit fixes

d5a922e

Fixed Sliding Window and Type Errors

5b4d4e5

Automatic pre-commit fixes

0aaf790

SebastianSchmidl requested changes Apr 25, 2025


Akhil-Jasson added 5 commits May 1, 2025 18:08

Resolving conflicts with the main branch due to the refactoring of th…

b8a725e

…e module

Automatic pre-commit fixes

8f80f69

Merge branch 'main' into my-feature-branch

6f04556

Made Changes according to the refactored module structure

4d16ef6

Automatic pre-commit fixes

9f73608
