Reduce size of scheduler_info #9045

fjetter · 2025-04-16T11:46:32Z

The scheduler_info is used for this view

I am introducing a limit to the number of workers being returned.

I cannot imagine that somebody is using this for actual diagnostics (but it's cute and we should leave it) and I just randomly hard coded this to 5. With this change the total numbers in the upper right corner is still the max and indicates that the below view is only a subset. I'm open to suggestions for making this prettier but that has only low priority.

cc @jacobtomlinson I think you introduced this HTML view and might have an opinion about details or the hard coded number, etc.

fjetter · 2025-04-16T11:49:43Z

distributed/scheduler.py

@@ -1778,6 +1778,7 @@ def __init__(
        self.task_groups = {}
        self.task_prefixes = {}
        self.task_metadata = {}
+        self.total_memory = 0


It's almost overkill to maintain additional state for this but if we have to call identity often I want it to be constant time

Also, I'm surprised we are not tracking this already. Seems like a useful metric for adaptive scaling or smth like that 🤷

fjetter · 2025-04-16T11:50:33Z

distributed/scheduler.py

@@ -4075,16 +4076,22 @@ def _repr_html_(self) -> str:
            tasks=self.tasks,
        )

-    def identity(self) -> dict[str, Any]:
+    def identity(self, n_workers: int = -1) -> dict[str, Any]:


the default of -1 is to maintain backwards compat (no idea who is calling this in the wild). All callers in this code base are setting this explicitly

Copilot

Copilot reviewed 3 out of 4 changed files in this pull request and generated 1 comment.

Files not reviewed (1)

distributed/widgets/templates/scheduler_info.html.j2: Language not supported

Comments suppressed due to low confidence (1)

distributed/client.py:4412

[nitpick] There are varying conventions for n_workers across client calls (e.g. 0 in dashboard_link, 5 in scheduler_info, -1 for full info). Consider clarifying/documenting the intent behind each value to avoid confusion.

def scheduler_info(self, n_workers: int = 5, **kwargs: Any) -> SchedulerInfo:

Copilot · 2025-04-16T11:55:21Z

distributed/client.py

@@ -1298,7 +1298,7 @@ def dashboard_link(self):
        try:
            return self.cluster.dashboard_link
        except AttributeError:
-            scheduler, info = self._get_scheduler_info()
+            scheduler, info = self._get_scheduler_info(n_workers=0)


Using n_workers=0 in this context results in an empty workers list. If the intention is to provide a default view for diagnostics, consider using a value such as -1 (to fetch all) or the same default limit (e.g. 5) used elsewhere.

Suggested change

scheduler, info = self._get_scheduler_info(n_workers=0)

scheduler, info = self._get_scheduler_info(n_workers=-1)

that's intentional since the workers info is not even accessed here

github-actions · 2025-04-16T12:01:48Z

Unit Test Results

See test report for an extended history of previous test failures. This is useful for diagnosing flaky tests.

27 files ±0 27 suites ±0 11h 17m 19s ⏱️ + 1m 9s
4 100 tests ±0 3 980 ✅ - 2 112 💤 ±0 7 ❌ +1 1 🔥 +1
51 401 runs +1 49 085 ✅ ±0 2 308 💤 - 1 7 ❌ +1 1 🔥 +1

For more details on these failures and errors, see this check.

Results for commit 41ce938. ± Comparison against base commit 16aa189.

♻️ This comment has been updated with latest results.

fjetter · 2025-04-16T15:53:54Z

I'll move forward with merging. Test failures appear to be unrelated and I think the benefits of this outweight possible minor UX things. I'm happy to follow up on this if there is feedback but I'd like to get the functional fix into the release today.

fjetter added 5 commits April 16, 2025 13:26

Reduce size of scheduler_info

3bc4ec4

adjust template

41dd231

fix totals

f45e119

introduce total_memory attr

123d7fd

add type annotation

665431f

fjetter commented Apr 16, 2025

View reviewed changes

This was referenced Apr 16, 2025

Scheduler not robust to many connected clients #9043

Closed

Release 2025.4.0 dask/community#418

Closed

fjetter requested review from hendrikmakait and Copilot April 16, 2025 11:54

Copilot AI reviewed Apr 16, 2025

View reviewed changes

fjetter force-pushed the reduce_size_scheduler_info branch from b0e37f3 to 665431f Compare April 16, 2025 12:01

fix test_Scheduler__to_dict

41ce938

fjetter merged commit 48fcf48 into dask:main Apr 16, 2025
25 of 33 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Reduce size of scheduler_info #9045

Reduce size of scheduler_info #9045

fjetter commented Apr 16, 2025

fjetter Apr 16, 2025

fjetter Apr 16, 2025

fjetter Apr 16, 2025

Copilot AI left a comment

Copilot AI Apr 16, 2025

fjetter Apr 16, 2025

github-actions bot commented Apr 16, 2025 •

edited

Loading

fjetter commented Apr 16, 2025

	scheduler, info = self._get_scheduler_info(n_workers=0)
	scheduler, info = self._get_scheduler_info(n_workers=-1)

Reduce size of scheduler_info #9045

Reduce size of scheduler_info #9045

Conversation

fjetter commented Apr 16, 2025

fjetter Apr 16, 2025

Choose a reason for hiding this comment

fjetter Apr 16, 2025

Choose a reason for hiding this comment

fjetter Apr 16, 2025

Choose a reason for hiding this comment

Copilot AI left a comment

Choose a reason for hiding this comment

Copilot AI Apr 16, 2025

Choose a reason for hiding this comment

fjetter Apr 16, 2025

Choose a reason for hiding this comment

github-actions bot commented Apr 16, 2025 • edited Loading

Unit Test Results

fjetter commented Apr 16, 2025

github-actions bot commented Apr 16, 2025 •

edited

Loading