dfdaemon is connecting to an incorrect scheduler domain name and IP address. #4040

Open
zsksy123 opened this issue May 7, 2025 · 0 comments
zsksy123 commented May 7, 2025

Version Information

helm list
NAME     	NAMESPACE	REVISION	UPDATED                             	STATUS  	CHART           	APP VERSION
dragonfly	dragonfly	29      	2025-05-06 17:43:30.005192 +0800 CST	deployed	dragonfly-1.1.32	2.1.31

bug
dfdaemon tries to connect to the scheduler through an FQDN that does not resolve.

kubectl  logs -f dragonfly-dfdaemon-zwgnd|grep -i error

2025-05-07T05:33:21.097Z	WARN	config/dynconfig_manager.go:132	scheduler host address dragonfly-scheduler-2.scheduler.dragonfly.svc.wlcb.in.openbayes.com:8002 is unreachable: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp: lookup dragonfly-scheduler-2.scheduler.dragonfly.svc.wlcb.in.openbayes.com on 10.97.0.10:53: no such host"

As far as I know, only a headless Service gives each pod its own DNS name in the following format:
<pod-name>.<service-name>.<namespace>.svc.cluster.local
However, the scheduler Service is not headless by default. It is a regular ClusterIP Service:

kubectl get svc|grep dragonfly-scheduler
dragonfly-scheduler           ClusterIP   10.97.104.238   <none>        8002/TCP             350d
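To make the naming issue easier to inspect, here is a minimal Go sketch that splits a hostname into the parts of the per-pod pattern above. `PodDNSName` and `ParsePodDNSName` are hypothetical names made up for this illustration and are not part of Dragonfly. The sketch assumes the cluster domain is everything after the `svc` label.

```go
package main

import (
	"fmt"
	"strings"
)

// PodDNSName holds the parts of a per-pod DNS name of the form
// <pod-name>.<service-name>.<namespace>.svc.<cluster-domain>.
// Hypothetical helper type for illustration only.
type PodDNSName struct {
	Pod           string
	Service       string
	Namespace     string
	ClusterDomain string
}

// ParsePodDNSName splits host into its parts. It returns an error when the
// fourth label is not "svc" or when no cluster domain follows it.
func ParsePodDNSName(host string) (PodDNSName, error) {
	labels := strings.Split(strings.TrimSuffix(host, "."), ".")
	if len(labels) < 5 || labels[3] != "svc" {
		return PodDNSName{}, fmt.Errorf("%q is not a per-pod service DNS name", host)
	}
	return PodDNSName{
		Pod:           labels[0],
		Service:       labels[1],
		Namespace:     labels[2],
		ClusterDomain: strings.Join(labels[4:], "."),
	}, nil
}

func main() {
	// Hostname copied from the dfdaemon warning above.
	host := "dragonfly-scheduler-2.scheduler.dragonfly.svc.wlcb.in.openbayes.com"
	name, err := ParsePodDNSName(host)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Printf("pod=%s service=%s namespace=%s domain=%s\n",
		name.Pod, name.Service, name.Namespace, name.ClusterDomain)
}
```

For the hostname in the warning, the service label comes out as `scheduler`, while the Service listed by `kubectl get svc` is named `dragonfly-scheduler`.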

The IP address is also incorrect: the scheduler pod's IP is 10.96.6.213, but dfdaemon reports 10.96.6.243.

kubectl  get pod -o wide|grep dragonfly-scheduler-0
dragonfly-scheduler-0                1/1     Running   0             3m16s   10.96.6.213     titan-v1   <none>           <none>
2025-05-07T06:49:08.280Z	WARN	config/dynconfig_manager.go:142	scheduler 10.96.6.243 dragonfly-scheduler-0.scheduler.dragonfly.svc.bj.in.openbayes.com 8002 has not reachable addresses
2025-05-07T06:49:38.279Z	WARN	config/dynconfig_manager.go:142	scheduler 10.96.6.243 dragonfly-scheduler-0.scheduler.dragonfly.svc.bj.in.openbayes.com 8002 has not reachable addresses
d7y.io/dragonfly/v2/client/config.(*dynconfigManager).GetResolveSchedulerAddrs
	/go/src/d7y.io/dragonfly/v2/client/config/dynconfig_manager.go:142
d7y.io/dragonfly/v2/pkg/resolver.(*SchedulerResolver).ResolveNow
	/go/src/d7y.io/dragonfly/v2/pkg/resolver/scheduler_resolver.go:82
d7y.io/dragonfly/v2/pkg/resolver.(*SchedulerResolver).OnNotify
	/go/src/d7y.io/dragonfly/v2/pkg/resolver/scheduler_resolver.go:109
d7y.io/dragonfly/v2/client/config.(*dynconfigManager).Notify
	/go/src/d7y.io/dragonfly/v2/client/config/dynconfig_manager.go:242
d7y.io/dragonfly/v2/client/config.(*dynconfigManager).Serve
	/go/src/d7y.io/dragonfly/v2/client/config/dynconfig_manager.go:268
d7y.io/dragonfly/v2/client/daemon.(*clientDaemon).Serve.func10
	/go/src/d7y.io/dragonfly/v2/client/daemon/daemon.go:744
golang.org/x/sync/errgroup.(*Group).Go.func1
	/go/pkg/mod/golang.org/x/[email protected]/errgroup/errgroup.go:78
2025-05-07T06:49:38.279Z	ERROR	grpclog/grpclog.go:55	[scheduler_resolver]resolve addresses error can not found available scheduler addresses
google.golang.org/grpc/internal/grpclog.ErrorDepth
	/go/pkg/mod/google.golang.org/[email protected]/internal/grpclog/grpclog.go:55
google.golang.org/grpc/grpclog.(*componentData).ErrorDepth
	/go/pkg/mod/google.golang.org/[email protected]/grpclog/component.go:46
google.golang.org/grpc/grpclog.(*componentData).Errorf
	/go/pkg/mod/google.golang.org/[email protected]/grpclog/component.go:79
d7y.io/dragonfly/v2/pkg/resolver.(*SchedulerResolver).ResolveNow
	/go/src/d7y.io/dragonfly/v2/pkg/resolver/scheduler_resolver.go:84
d7y.io/dragonfly/v2/pkg/resolver.(*SchedulerResolver).OnNotify
	/go/src/d7y.io/dragonfly/v2/pkg/resolver/scheduler_resolver.go:109
d7y.io/dragonfly/v2/client/config.(*dynconfigManager).Notify
	/go/src/d7y.io/dragonfly/v2/client/config/dynconfig_manager.go:242
d7y.io/dragonfly/v2/client/config.(*dynconfigManager).Serve
	/go/src/d7y.io/dragonfly/v2/client/config/dynconfig_manager.go:268
d7y.io/dragonfly/v2/client/daemon.(*clientDaemon).Serve.func10
	/go/src/d7y.io/dragonfly/v2/client/daemon/daemon.go:744
golang.org/x/sync/errgroup.(*Group).Go.func1
	/go/pkg/mod/golang.org/x/[email protected]/errgroup/errgroup.go:78
zsksy123 added the bug label May 7, 2025
zsksy123 changed the title from "The dfdaemon component has an incorrect connection to the FQDN of the scheduler." to "dfdaemon is connecting to an incorrect scheduler domain name and IP address." May 7, 2025