Skip to content

feat: establish CircleCI like tmate debugging session #156

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 24 commits into from

Conversation

galargh
Copy link

@galargh galargh commented Mar 9, 2023

This PR adds new inputs with which the action can be used to imitate CircleCI's SSH debugging sessions. It also adds prettier config to the repo but doesn't apply it yet to make the diff easier to review.

CircleCI like tmate session

If you want to set up a tmate session similarily to how CircleCI does it, you can use the following configuration.

You can insert it at the beggining of your pipeline. It will remain dormant until you rerun all jobs with the debug logging enabled.

Then, it will print out the tmate connection string when a session is established but it will not wait for you to connect. Instead, it will proceed with the run and it will wait for 10 minutes after it is complete. If you do not connect within that time, the session will remain open until you disconnect. The disconnection checks will be performed every 10 minutes.

You will be able to connect to the session only if you use registered public SSH key(s).

name: CI
on: [push]
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
    - name: Setup tmate session
      uses: mxschmitt/action-tmate@v3
      if: runner.debug == '1'
      with:
        check-num-clients: true
        limit-access-to-actor: true # requires registered public SSH key(s)
        wait: false
        wait-in-post: true
        wait-interval: '600000' # 10 minutes
    - uses: actions/checkout@v2

Testing

@galargh galargh changed the title feat: establish CircleCI like tmate debugging session #1 feat: establish CircleCI like tmate debugging session Mar 9, 2023
Copy link
Collaborator

@dscho dscho left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really, really dislike how refactoring-happy this PR is. And then prettify-happy. And then undoes (part of?) that prettification. And since its claimed goal is not even about refactoring, nor about prettifying, but about CircleCI-like operation, that means that it mixes too many concerns into the same contribution.

As a consequence, I am reminded of the adage: There's code so simply that it has no obvious bugs. And there's code so complex that it has no obvious bugs. This contribution falls into the latter category.

As a point in favor of this take, note that I already found one regression this PR would introduce if we merged it.

Therefore, I am strongly in favor of separating concerns better and essentially throw all of the refactoring and prettifying out, and only keep the feature that was actually stated as goal: it can easily implemented via a new if/else block that indents part of the existing code and adds the new functionality in the else block. That way, the code will be eminently more reviewable, and confidence can be re-builtin in the correctness of the patch.

About the design of the feature: there is no need for wait-in-post and wait-interval: one does not make sense without the other. In general, it would probably make sense if this PR added but a single new parameter to trigger CircleCI-like operations. Maybe something like block-while-running or something similar.

if (core.getInput("install-dependencies") !== "false") {
core.debug("Installing dependencies")
if (process.platform === "darwin") {
await execShellCommand('brew install tmate');
} else if (process.platform === "win32") {
await execShellCommand('pacman -S --noconfirm tmate');
await execShellCommand('pacman -Sy --noconfirm tmate');
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This introduces a regression for the fix in #151.

@galargh
Copy link
Author

galargh commented Mar 13, 2023

Happy to split refactor/feature into separate PRs but before continuing to work on this contribution, I'd like to learn if the feature is something the maintainers see as valuable to this project.

@dscho
Copy link
Collaborator

dscho commented Mar 21, 2023

I'd like to learn if the feature is something the maintainers see as valuable to this project.

I use a similar feature in CirrusCI whenever debugging Git problems on FreeBSD.

So I think with some careful design, this can be a highly valuable feature. One thing I would love to see improved is the user interface: I would like there to be one new input with a highly descriptive name (and yes, I agree, naming is hard).

Another thing I would love to see is a change in design when it comes to the post-action: instead of waiting for a fixed time-out and then exiting, it would be much better to wait for the tmate session to be ended by the user (e.g. via Ctrl+D in the connected shell).

Speaking of the post-action: we need to define better how we intend to handle job states other than success: the job could have failed, and it could have been canceled.

If you really like the CircleCI feature where the terminal times out and force-disconnects, that is a feature that could be implemented independently of the feature to send the session into the background.

@galargh
Copy link
Author

galargh commented Mar 24, 2023

Another thing I would love to see is a change in design when it comes to the post-action: instead of waiting for a fixed time-out and then exiting, it would be much better to wait for the tmate session to be ended by the user (e.g. via Ctrl+D in the connected shell).

This assumption is not correct. See:

action-tmate/src/helpers.js

Lines 115 to 144 in 880f83b

export async function waitUntilDebuggingSessionExit() {
const [tmateSSH, tmateWeb] = await getTmateConnectionStrings()
core.debug('Entering main loop')
while (true) {
showTmateConnectionStrings(tmateSSH, tmateWeb)
if (continueFileExists()) {
core.info(
'Exiting debugging session because the continue file was created'
)
break
}
if (didTmateQuit()) {
core.info("Exiting debugging session 'tmate' quit")
break
}
await sleep(parseInt(core.getInput('wait-interval')))
if (
core.getInput('check-num-clients') !== 'false' &&
!(await doesTmateHaveConnectedClients())
) {
core.info("Exiting debugging session because 'tmate' has no clients")
break
}
}
}
The session doesn't quit if there are any connected clients. This behaviour is in line with how the action works now.

@dscho
Copy link
Collaborator

dscho commented Mar 27, 2023

The session doesn't quit if there are any connected clients.

Indeed. And that might be because the run finished so fast that the user did not even have a chance to connect yet.

Don't get me wrong: I can see a potential point in timing out when users don't even bother to connect. But I see an even bigger point in being nice to users who did connect and do not want their session getting terminated while they're in the middle of debugging something.

@galargh
Copy link
Author

galargh commented Mar 27, 2023

Don't get me wrong: I can see a potential point in timing out when users don't even bother to connect. But I see an even bigger point in being nice to users who did connect and do not want their session getting terminated while they're in the middle of debugging something.

That's exactly what this option is for -

action-tmate/action.yml

Lines 39 to 40 in 880f83b

check-num-clients:
description: 'Whether or not to exit if number of connected clients is 0'

@dscho
Copy link
Collaborator

dscho commented Mar 28, 2023

Don't get me wrong: I can see a potential point in timing out when users don't even bother to connect. But I see an even bigger point in being nice to users who did connect and do not want their session getting terminated while they're in the middle of debugging something.

That's exactly what this option is for -

action-tmate/action.yml

Lines 39 to 40 in 880f83b

check-num-clients:
description: 'Whether or not to exit if number of connected clients is 0'

I understand that. But the design of this PR makes the Action complicated to use: too many configuration options, too unintuitive defaults.

Ideally, what would happen is that there was an automated time-out waiting for the user to connect where the Action would stop with failure when that time-out is reached. The time-out would be automatically canceled as soon as the user connects. This would be the default, no need to specify any configuration. The time-out would have a sensible default, again, no need to specify anything.

@galargh
Copy link
Author

galargh commented Mar 28, 2023

Ideally, what would happen is that there was an automated time-out waiting for the user to connect where the Action would stop with failure when that time-out is reached. The time-out would be automatically canceled as soon as the user connects. This would be the default, no need to specify any configuration. The time-out would have a sensible default, again, no need to specify anything.

I certainly wouldn't want the tmate action to fail my workflow on connection time-out 😨

Either way, following the discussion, I feel that the disconnect between the original proposal and the suggested changes is pretty substantial. It would be quite an undertaking to reach an "approvable" state which, sadly, I cannot afford at the moment. Hence, I'm going to close the PR for now.

Thank you for your engagement 🙇

@galargh galargh closed this Mar 28, 2023
@dscho
Copy link
Collaborator

dscho commented Mar 31, 2023

Ideally, what would happen is that there was an automated time-out waiting for the user to connect where the Action would stop with failure when that time-out is reached. The time-out would be automatically canceled as soon as the user connects. This would be the default, no need to specify any configuration. The time-out would have a sensible default, again, no need to specify anything.

I certainly wouldn't want the tmate action to fail my workflow on connection time-out 😨

@galargh You would want the action-tmate "post" action to block up to six hours after the workflow completes, waiting for a user who does not connect?

@galargh
Copy link
Author

galargh commented Mar 31, 2023

Ideally, what would happen is that there was an automated time-out waiting for the user to connect where the Action would stop with failure when that time-out is reached. The time-out would be automatically canceled as soon as the user connects. This would be the default, no need to specify any configuration. The time-out would have a sensible default, again, no need to specify anything.

I certainly wouldn't want the tmate action to fail my workflow on connection time-out 😨

@galargh You would want the action-tmate "post" action to block up to six hours after the workflow completes, waiting for a user who does not connect?

In most cases, I would want the action to exit gracefully when it reaches the configured timeout without failing my workflow.

@dscho
Copy link
Collaborator

dscho commented Mar 31, 2023

Oh right, that would be nicer.

@dscho
Copy link
Collaborator

dscho commented May 17, 2023

Gotta say, I am finding myself liking this "detached" mode. I implemented it "manually" yesterday, and it came in pretty handy.

@dscho
Copy link
Collaborator

dscho commented May 18, 2023

FWIW I offer my proposed solution over here: #162

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants