Commit 3f4bd16

Sam Lurye authored and facebook-github-bot committed
Basic pdb debugging for actor mesh (#217)
Summary:
Pull Request resolved: #217

This diff fleshes out suo's work in D75023066 to provide a basic pdb debugging experience for Python actor meshes. The basic flow is:
- Call `init_debugging(python_actor_mesh)` to get a `debug_client`.
- Call `debug_client.enter()` to enter a debugging session.
- Enter `list` to show the set of currently active breakpoints across your mesh.
- Attach to a specific actor by entering `attach`, then run arbitrary pdb commands for the attached actor.
- Enter `detach` to exit back into the outer debug loop, where you can attach to another rank.
- Cast pdb commands to multiple actors at once, either to a comma-separated list of ranks, as in `cast r0,r1,r2 <command>`, or to all actors, as in `cast * <command>`.
- Enter `quit` to exit the debug session and continue.

Reviewed By: suo

Differential Revision: D76086980
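The outer-loop `cast` syntax described above can be illustrated with a small parser. This is a minimal sketch, not Monarch's `DebugClient`: `parse_cast` and `fan_out` are hypothetical helpers, ranks are assumed to be numbered from 0, and the `send` callback stands in for whatever mechanism actually forwards a pdb command to the actor at a given rank.

```python
from typing import Callable, Dict, List, Tuple


def parse_cast(line: str, num_ranks: int) -> Tuple[List[int], str]:
    """Split 'cast <targets> <command>' into (ranks, pdb_command).

    <targets> is '*' for every rank, or a comma-separated list such as 'r0,r1,r2'.
    """
    parts = line.strip().split(maxsplit=2)
    if len(parts) < 3 or parts[0] != "cast":
        raise ValueError(f"expected 'cast <targets> <command>', got {line!r}")
    targets, command = parts[1], parts[2]
    if targets == "*":
        # Broadcast the pdb command to every actor in the mesh.
        return list(range(num_ranks)), command
    ranks: List[int] = []
    for token in targets.split(","):
        # Each target must look like 'r<rank>' with an in-range rank.
        if not (token.startswith("r") and token[1:].isdigit()):
            raise ValueError(f"bad rank token {token!r}")
        rank = int(token[1:])
        if rank >= num_ranks:
            raise ValueError(f"rank {rank} is out of range for {num_ranks} ranks")
        ranks.append(rank)
    return ranks, command


def fan_out(line: str, num_ranks: int, send: Callable[[int, str], str]) -> Dict[int, str]:
    # Hypothetical dispatch: forward the same pdb command to each targeted rank
    # and collect the per-rank replies.
    ranks, command = parse_cast(line, num_ranks)
    return {rank: send(rank, command) for rank in ranks}


print(parse_cast("cast r0,r2 p x", num_ranks=4))  # ([0, 2], 'p x')
```

Keeping parsing separate from dispatch makes the target syntax easy to validate before any command reaches an actor.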
1 parent 42a123a, commit 3f4bd16

4 files changed (+670 additions, -0 deletions)

python/monarch/actor_mesh.py

Lines changed: 19 additions & 0 deletions
```diff
@@ -13,6 +13,7 @@
 import itertools
 import logging
 import random
+import sys
 import traceback
 
 from dataclasses import dataclass
```
```diff
@@ -34,6 +35,7 @@
     ParamSpec,
     Tuple,
     Type,
+    TYPE_CHECKING,
     TypeVar,
 )
 
```
```diff
@@ -54,6 +56,10 @@
 
 from monarch.common.pickle_flatten import flatten, unflatten
 from monarch.common.shape import MeshTrait, NDSlice
+from monarch.pdb_wrapper import remote_breakpointhook
+
+if TYPE_CHECKING:
+    from monarch.debugger import DebugClient
 
 logger = logging.getLogger(__name__)
 
```
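The `TYPE_CHECKING` guard added in this hunk is the standard `typing` idiom for importing a name that is used only in annotations: static checkers such as pyre see the import, but it never runs, so `monarch.debugger` is not imported by `actor_mesh.py` at runtime (plausibly to avoid an import cycle, although the commit does not say so). The sketch below imports from a deliberately nonexistent, hypothetical module, `hypothetical_debugger`, to show that the guarded import is never executed.

```python
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Seen only by static type checkers. At runtime TYPE_CHECKING is False, so this
    # import never executes (hypothetical_debugger does not exist).
    from hypothetical_debugger import DebugClient


def client_type_name(client: "DebugClient") -> str:
    # The quoted annotation is stored as a plain string, so the name DebugClient
    # is never looked up when this function is defined or called.
    return type(client).__name__


print(client_type_name(object()))  # object
```

This mirrors the `client: "DebugClient"` annotation on `_set_debug_client`, which is likewise quoted so it does not need the runtime import.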
```diff
@@ -610,6 +616,19 @@ def _new_with_shape(self, shape: Shape) -> "ActorMeshRef":
             "actor implementations are not meshes, but we can't convince the typechecker of it..."
         )
 
+    @endpoint
+    async def _set_debug_client(self, client: "DebugClient") -> None:
+        point = MonarchContext.get().point
+        # For some reason, using a lambda instead of functools.partial
+        # confuses the pdb wrapper implementation.
+        sys.breakpointhook = functools.partial(  # pyre-ignore
+            remote_breakpointhook,
+            point.rank,
+            point.shape.coordinates(point.rank),
+            MonarchContext.get().mailbox.actor_id,
+            client,
+        )
+
 
 class ActorMeshRef(MeshTrait):
     def __init__(
```
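`_set_debug_client` routes every `breakpoint()` call inside the actor through `remote_breakpointhook`, with the actor's rank, coordinates, actor ID, and debug client bound in advance by `functools.partial`. This works because the built-in `breakpoint()` calls whatever `sys.breakpointhook` is set to at call time, forwarding only the arguments given to `breakpoint()` itself. The sketch below shows that mechanism in isolation; `fake_remote_breakpointhook`, the coordinate dict, and the actor ID string are hypothetical stand-ins (the real hook presumably starts a pdb session tied to the debug client), and the sketch does not reproduce the lambda problem mentioned in the diff's comment.

```python
import functools
import sys
from typing import Any, Dict, List, Tuple

# Hypothetical stand-in for monarch.pdb_wrapper.remote_breakpointhook: instead of
# starting a remote pdb session, it only records which actor hit the breakpoint.
hits: List[Tuple[int, Dict[str, int], str, Any]] = []


def fake_remote_breakpointhook(
    rank: int, coords: Dict[str, int], actor_id: str, client: Any
) -> None:
    hits.append((rank, coords, actor_id, client))


def install_hook(rank: int, coords: Dict[str, int], actor_id: str, client: Any) -> None:
    # Bind the per-actor context up front; breakpoint() later calls the hook
    # with no extra arguments, so the partial supplies everything.
    sys.breakpointhook = functools.partial(
        fake_remote_breakpointhook, rank, coords, actor_id, client
    )


def user_code() -> int:
    x = 41
    breakpoint()  # dispatches to sys.breakpointhook(), i.e. the partial above
    return x + 1


install_hook(3, {"hosts": 0, "gpus": 3}, "mesh[0].actor[3]", "debug-client")
try:
    result = user_code()
finally:
    # Restore the interpreter's default hook so later breakpoint() calls behave normally.
    sys.breakpointhook = sys.__breakpointhook__

print(result, hits)
```

Binding the context once per actor means user code can keep calling plain `breakpoint()` without knowing which rank it runs on.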