BYOC: Support template build for team clusters #766

Open
wants to merge 38 commits into
base: main
from

Conversation

sitole
Member

@sitole sitole commented Jun 13, 2025

Implements an abstraction for template managers in the API to support edge clusters.

@sitole sitole self-assigned this Jun 13, 2025
@sitole sitole added the feature New feature label Jun 13, 2025

linear bot commented Jun 13, 2025

@sitole sitole changed the title from "Template management compatible with edge clusters e2b 2473" to "BYOC: Template management abstraction for calling remote clusters" Jun 13, 2025
@sitole sitole force-pushed the edge-service-core-impl branch from 345856a to e58cdf7 Compare June 13, 2025 17:39
@sitole sitole force-pushed the template-management-compatible-with-edge-clusters-e2b-2473 branch from d3c8031 to 36016c6 Compare June 13, 2025 18:13
@sitole sitole requested a review from Copilot June 13, 2025 19:09
Contributor

@Copilot Copilot AI left a comment

Pull Request Overview

This PR introduces support for building templates on remote clusters by abstracting build placement and integrating a cluster edge pool.

  • Added database migrations and queries to track clusters and build nodes.
  • Defined a BuildPlacement interface with local and clustered implementations.
  • Updated API handlers and cache to propagate cluster and node IDs.
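
The placement split described above can be pictured with a small sketch. It is not the PR's code: the interface is modeled on the PlacementLogsProvider quoted in this review (with Go-style ID initialisms), while `localLogs`, `clusterLogs`, and `selectLogsProvider` are hypothetical names, and the cluster ID is a plain `*string` here instead of the `*uuid.UUID` the PR uses, so that the example needs only the standard library.

```go
package main

import (
	"context"
	"fmt"
)

// PlacementLogsProvider is modeled on the interface quoted in this review.
type PlacementLogsProvider interface {
	GetLogs(ctx context.Context, buildID string, templateID string, offset *int32) ([]string, error)
}

// localLogs is a hypothetical provider for builds placed on the local cluster.
type localLogs struct{}

func (localLogs) GetLogs(ctx context.Context, buildID string, templateID string, offset *int32) ([]string, error) {
	return []string{"local logs for build " + buildID}, nil
}

// clusterLogs is a hypothetical provider for builds placed on a team cluster node.
type clusterLogs struct {
	clusterID string
	nodeID    string
}

func (c clusterLogs) GetLogs(ctx context.Context, buildID string, templateID string, offset *int32) ([]string, error) {
	return []string{"logs for build " + buildID + " from node " + c.nodeID}, nil
}

// selectLogsProvider falls back to the local provider when either ID is
// missing, which follows the nil check in the getBuilderClient snippet
// quoted in this review.
func selectLogsProvider(clusterID *string, nodeID *string) PlacementLogsProvider {
	if clusterID == nil || nodeID == nil {
		return localLogs{}
	}
	return clusterLogs{clusterID: *clusterID, nodeID: *nodeID}
}

func main() {
	ctx := context.Background()
	cluster, node := "cluster-a", "node-1"
	providers := []PlacementLogsProvider{
		selectLogsProvider(nil, nil),
		selectLogsProvider(&cluster, &node),
	}
	for _, p := range providers {
		logs, err := p.GetLogs(ctx, "build-1", "template-1", nil)
		fmt.Println(logs, err)
	}
}
```

Callers only see the interface, so handlers can pass cluster and node IDs through without knowing where the build actually ran.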

Reviewed Changes

Copilot reviewed 65 out of 65 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| packages/db/migrations/20250611172307_clusgter_node_for_builds.sql | Adds cluster_node_id column to track build node assignment |
| packages/api/internal/template-manager/placement_local.go | Implements local build placement and log fetching via Loki |
| packages/api/internal/template-manager/template_manager.go | Central TemplateManager wiring for placement, creation, deletion |
| packages/api/internal/template-manager/template_manager_test.go | Tests for polling status adjusted for new client fields |

Comments suppressed due to low confidence (1)

packages/api/internal/template-manager/template_manager_test.go:118

  • The test initializer still uses statusClient, but PollBuildStatus no longer has that field. Update the test to construct with the new fields (placement, templateManagerClient, etc.).
c := &PollBuildStatus{

@sitole sitole force-pushed the template-management-compatible-with-edge-clusters-e2b-2473 branch 2 times, most recently from 3a0b224 to 1c8c32e Compare June 13, 2025 23:03
@sitole sitole force-pushed the edge-service-core-impl branch 2 times, most recently from 0051d27 to eba9e11 Compare June 16, 2025 22:37
Base automatically changed from edge-service-core-impl to main June 17, 2025 17:25
@sitole sitole force-pushed the template-management-compatible-with-edge-clusters-e2b-2473 branch 3 times, most recently from 3e0648a to 3875fc4 Compare June 17, 2025 21:26
@sitole sitole changed the base branch from main to api-edge-clusters-support June 17, 2025 21:27
@sitole sitole force-pushed the api-edge-clusters-support branch from 1de1bfd to 1de2706 Compare June 17, 2025 21:47
@sitole sitole force-pushed the template-management-compatible-with-edge-clusters-e2b-2473 branch 2 times, most recently from 7db10cf to afc1b03 Compare June 17, 2025 22:00
@sitole
Member Author

sitole commented Jun 19, 2025

Still need to implement the migrations and schema changes that were removed from #795.

@sitole sitole force-pushed the api-edge-clusters-support branch from 25f2f66 to f534515 Compare June 19, 2025 19:46
@sitole sitole force-pushed the template-management-compatible-with-edge-clusters-e2b-2473 branch 2 times, most recently from a908f7c to 1fb8a74 Compare June 19, 2025 21:15
@sitole sitole changed the title from "BYOC: Template management abstraction for calling remote clusters" to "BYOC: Support template build for team clusters" Jun 19, 2025
@sitole sitole changed the base branch from api-edge-clusters-support to main June 19, 2025 21:21
@sitole sitole force-pushed the template-management-compatible-with-edge-clusters-e2b-2473 branch 2 times, most recently from 7f1e73b to d963648 Compare June 21, 2025 00:01
@sitole sitole force-pushed the template-management-compatible-with-edge-clusters-e2b-2473 branch from e95b23d to 09032a9 Compare June 26, 2025 17:25
Contributor

@dobrac dobrac left a comment

I would mainly unify the syncing logic: it currently exists in two places that do basically the same thing.

@sitole sitole requested a review from dobrac June 27, 2025 02:23
logs = append(logs, line["message"].(string))
}
}
l, err := a.templateManager.GetLogs(ctx, buildUUID, templateID, team.ClusterID, buildInfo.ClusterNodeId, params.LogsOffset)
Contributor

Suggested change
l, err := a.templateManager.GetLogs(ctx, buildUUID, templateID, team.ClusterID, buildInfo.ClusterNodeId, params.LogsOffset)
l, err := a.templateManager.GetLogs(ctx, buildUUID, templateID, team.ClusterID, buildInfo.ClusterNodeID, params.LogsOffset)

)

type PlacementLogsProvider interface {
GetLogs(ctx context.Context, buildId string, templateId string, offset *int32) ([]string, error)
Contributor

Suggested change
GetLogs(ctx context.Context, buildId string, templateId string, offset *int32) ([]string, error)
GetLogs(ctx context.Context, templateID string, buildID string, offset *int32) ([]string, error)

Comment on lines +53 to +56
TemplateId string

ClusterId *uuid.UUID
ClusterNodeId *string
Contributor

Suggested change
TemplateId string
ClusterId *uuid.UUID
ClusterNodeId *string
TemplateID string
ClusterID *uuid.UUID
ClusterNodeID *string


localClient: client,
localClientMutex: sync.RWMutex{},
localClientStatus: infogrpc.ServiceInfoStatus_OrchestratorHealthy,
Contributor

Is it always healthy when you add it?

}
func (tm *TemplateManager) getBuilderClient(clusterID *uuid.UUID, nodeID *string, placement bool) (*grpclient.GRPCClient, metadata.MD, PlacementLogsProvider, error) {
if clusterID == nil || nodeID == nil {
tm.localClientMutex.RLock()
Contributor

Maybe put the RUnlock in a defer here?
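
To illustrate the suggestion, here is a minimal sketch of the deferred-unlock pattern. The `builderRegistry` type and its fields are hypothetical stand-ins for the TemplateManager fields guarded by `localClientMutex`, and the string client is a placeholder for the real gRPC client.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// builderRegistry is a hypothetical stand-in for the mutex-guarded
// fields on TemplateManager.
type builderRegistry struct {
	mu          sync.RWMutex
	localClient string // placeholder for the real gRPC client
	healthy     bool
}

// getLocalClient takes the read lock and releases it with defer, so every
// return path, including the early error return, unlocks exactly once.
func (r *builderRegistry) getLocalClient() (string, error) {
	r.mu.RLock()
	defer r.mu.RUnlock()

	if !r.healthy {
		return "", errors.New("local template manager is not healthy")
	}
	return r.localClient, nil
}

// setHealthy takes the write lock. It would block forever if a reader had
// returned without releasing its read lock.
func (r *builderRegistry) setHealthy(h bool) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.healthy = h
}

func main() {
	r := &builderRegistry{localClient: "local-builder"}
	if _, err := r.getLocalClient(); err != nil {
		fmt.Println("error:", err)
	}
	r.setHealthy(true)
	c, err := r.getLocalClient()
	fmt.Println("client:", c, "err:", err)
}
```

The defer costs little here and removes the need to repeat RUnlock before each return as the function grows more branches.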

err = utils.UnwrapGRPCError(err)
if err != nil {
zap.L().Error("Failed to get health status of template manager", zap.Error(err))
tm.setLocalClientStatus(orchestratorinfo.ServiceInfoStatus_OrchestratorDraining)
Contributor

Draining or unhealthy?
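
One possible reading of this question, sketched under assumptions: a failed health check says nothing about whether the node is draining, so it maps to unhealthy, and draining is only used when the service reports it. The `status` enum, `errCheckFailed`, and `nextStatus` below are hypothetical; the PR itself uses the ServiceInfoStatus values from its generated gRPC package.

```go
package main

import (
	"errors"
	"fmt"
)

// status is a hypothetical local enum standing in for ServiceInfoStatus.
// The zero value is unhealthy, so a freshly added client is not treated
// as healthy before its first successful check.
type status int

const (
	statusUnhealthy status = iota
	statusHealthy
	statusDraining
)

func (s status) String() string {
	switch s {
	case statusHealthy:
		return "healthy"
	case statusDraining:
		return "draining"
	default:
		return "unhealthy"
	}
}

// errCheckFailed is a hypothetical error for a failed health request.
var errCheckFailed = errors.New("health check failed")

// nextStatus maps one health-check outcome to a status. reported is what
// the service said about itself; err is a failure of the check itself.
func nextStatus(reported status, err error) status {
	if err != nil {
		// The check failed, so nothing is known about the node.
		return statusUnhealthy
	}
	return reported
}

func main() {
	fmt.Println(nextStatus(statusHealthy, nil))
	fmt.Println(nextStatus(statusDraining, nil))
	fmt.Println(nextStatus(statusHealthy, errCheckFailed))
}
```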

@@ -0,0 +1,123 @@
package synchronization
Contributor

nice ❤️

return p, nil
}

func (p *Pool) syncBackground() {
Contributor

It should probably be named something like start. The function doesn't (and shouldn't) know whether it's going to be run in the background or not.

Store: poolSynchronizationStore{pool: p},
}

synchronize.SyncInBackground(p.close, poolSyncInterval, poolSyncTimeout, true)
Contributor

Same for this name: SyncInBackground actually runs in the foreground.

return nil
}

func (s *Synchronize[SourceItem, SourceKey, PoolItem]) SyncInBackground(cancel chan struct{}, syncInterval time.Duration, syncRoundTimeout time.Duration, runInitialSync bool) {
Contributor

I don't like passing the cancel channel here. I would prefer a separate method to close/stop the loop, so that the cancel channel becomes an internal property of the Synchronize type.
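
A rough sketch of what this could look like, assuming a simplified, non-generic type: `Synchronizer`, `NewSynchronizer`, `Start`, `Close`, and `Wait` are hypothetical names, and `syncFn` stands in for the real store-driven sync round. The stop channel is owned by the type, and `Start` blocks so the caller decides whether to run it in a goroutine.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// Synchronizer is a hypothetical, simplified stand-in for the PR's generic
// Synchronize type.
type Synchronizer struct {
	syncFn   func()
	interval time.Duration
	stop     chan struct{} // owned by the type, never passed in by callers
	stopOnce sync.Once
	done     chan struct{}
}

func NewSynchronizer(interval time.Duration, syncFn func()) *Synchronizer {
	return &Synchronizer{
		syncFn:   syncFn,
		interval: interval,
		stop:     make(chan struct{}),
		done:     make(chan struct{}),
	}
}

// Start blocks and runs sync rounds until Close is called. It must be
// called at most once; the caller chooses whether to use a goroutine.
func (s *Synchronizer) Start(runInitialSync bool) {
	defer close(s.done)
	if runInitialSync {
		s.syncFn()
	}
	ticker := time.NewTicker(s.interval)
	defer ticker.Stop()
	for {
		select {
		case <-s.stop:
			return
		case <-ticker.C:
			s.syncFn()
		}
	}
}

// Close signals the loop to stop. It is safe to call more than once.
func (s *Synchronizer) Close() {
	s.stopOnce.Do(func() { close(s.stop) })
}

// Wait blocks until a running Start call has returned.
func (s *Synchronizer) Wait() {
	<-s.done
}

func main() {
	rounds := 0
	s := NewSynchronizer(10*time.Millisecond, func() { rounds++ })
	go s.Start(true)
	time.Sleep(35 * time.Millisecond)
	s.Close()
	s.Wait()
	fmt.Println("sync rounds:", rounds)
}
```

With this shape the pool would only call `go s.Start(true)` and later `s.Close()`, which also addresses the naming comments above, since nothing in the method names claims to run in the background.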
