
[Bug]: External transforms cannot be instantiated with LOOPBACK mode. #34594


Open
robertwb opened this issue Apr 9, 2025 · 0 comments · May be fixed by #34678



robertwb commented Apr 9, 2025

What happened?

This is especially painful because the default Beam Java runner (the DirectRunner) cannot run cross-language transforms, so users are forced to use Docker environments when running on local runners such as Prism or the Python Universal Local Runner.

To reproduce, try running any of the examples at https://github.com/apache/beam/blob/master/sdks/java/extensions/python/src/main/java/org/apache/beam/sdk/extensions/python/PythonExternalTransform.java with the Prism runner. E.g.

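import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.extensions.python.PythonExternalTransform;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.options.PortablePipelineOptions;
import org.apache.beam.sdk.transforms.Create;
import org.apache.beam.sdk.values.KV;
import org.apache.beam.sdk.values.PCollection;

PortablePipelineOptions options = PipelineOptionsFactory.create().as(PortablePipelineOptions.class);
options.setDefaultEnvironmentType("LOOPBACK");
Pipeline p = Pipeline.create(options);
// Expanding the external transform fails here, before the pipeline is ever run.
p.apply(Create.of(KV.of("A", "x"), KV.of("A", "y"), KV.of("B", "z")))
    .apply(PythonExternalTransform
        .<PCollection<KV<String, String>>, PCollection<KV<String, Iterable<String>>>>from(
            "apache_beam.GroupByKey"));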

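results in the following exception while the external transform is being expanded, i.e. during pipeline construction rather than at run time: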

java.lang.IllegalArgumentException: External service address must not be empty (set it using '--environmentOptions=external_service_address=...'?).
	at org.apache.beam.sdk.util.construction.Environments.createExternalEnvironment(Environments.java:224)
	at org.apache.beam.sdk.util.construction.Environments.createOrGetDefaultEnvironment(Environments.java:193)
	at org.apache.beam.sdk.util.construction.SdkComponents.create(SdkComponents.java:109)
	at org.apache.beam.sdk.util.construction.External$ExpandableTransform.expand(External.java:221)
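To make the failure mode concrete, here is a deliberately simplified, hypothetical model of the eager path visible in the stack trace: a LOOPBACK default environment is resolved as an external environment, and its service address is checked at construction time, before any loopback worker service exists. The names `EagerLoopbackModel`, `ExternalEnvironment` and `resolveDefaultEnvironment` are illustrative assumptions, not Beam's actual `Environments` code.

```java
import java.util.Map;

// Hypothetical, simplified model of the eager resolution path in the stack
// trace. Illustrative only; this is NOT Beam's Environments implementation.
public class EagerLoopbackModel {

  // Minimal stand-in for an external (worker pool) environment description.
  static final class ExternalEnvironment {
    final String serviceAddress;

    ExternalEnvironment(String serviceAddress) {
      this.serviceAddress = serviceAddress;
    }
  }

  // LOOPBACK is treated as an external environment whose address must already
  // be known when this method runs, i.e. at pipeline construction time.
  static ExternalEnvironment resolveDefaultEnvironment(
      String environmentType, Map<String, String> environmentOptions) {
    if (!"LOOPBACK".equals(environmentType)) {
      throw new UnsupportedOperationException("only LOOPBACK is modeled here");
    }
    String address = environmentOptions.getOrDefault("external_service_address", "");
    if (address.isEmpty()) {
      throw new IllegalArgumentException("External service address must not be empty");
    }
    return new ExternalEnvironment(address);
  }

  public static void main(String[] args) {
    // In this model, no loopback worker service has been started while the
    // cross-language transform is expanded, so no address has been set.
    try {
      resolveDefaultEnvironment("LOOPBACK", Map.of());
      System.out.println("resolved");
    } catch (IllegalArgumentException e) {
      System.out.println("construction-time failure: " + e.getMessage());
    }
  }
}
```

Running `main` prints the model's version of the error, because the address check happens before anything could have supplied an address.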

We should allow creating a placeholder environment of this type and binding it late to the loopback environment once that environment has been created. (Possibly we should do this for all environments: create placeholders during construction and only bind them when the pipeline is run.)
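The proposed late binding could look roughly like the sketch below. It is a minimal, dependency-free illustration under stated assumptions, not Beam code: `LateBoundEnvironments`, `PlaceholderEnvironment`, `loopbackPlaceholder` and `bindAll` are hypothetical names, and the address `localhost:50000` is a made-up example. Construction only records a placeholder; the concrete address is filled in once the loopback worker service is available.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

// Hypothetical sketch of late-bound environments. None of these names are Beam APIs.
public class LateBoundEnvironments {

  /** A placeholder created at construction time; its address is filled in later. */
  static final class PlaceholderEnvironment {
    final String id;
    private String boundAddress; // null until bind() is called

    PlaceholderEnvironment(String id) {
      this.id = id;
    }

    void bind(String address) {
      if (boundAddress != null) {
        throw new IllegalStateException("environment " + id + " is already bound");
      }
      boundAddress = Objects.requireNonNull(address);
    }

    String address() {
      if (boundAddress == null) {
        throw new IllegalStateException("environment " + id + " has not been bound yet");
      }
      return boundAddress;
    }
  }

  /** Placeholders handed out during construction that still await an address. */
  private final List<PlaceholderEnvironment> pending = new ArrayList<>();

  /** Construction time: succeeds without knowing any service address. */
  PlaceholderEnvironment loopbackPlaceholder(String id) {
    PlaceholderEnvironment env = new PlaceholderEnvironment(id);
    pending.add(env);
    return env;
  }

  /** Run time: called once the (hypothetical) loopback worker service is listening. */
  void bindAll(String loopbackAddress) {
    for (PlaceholderEnvironment env : pending) {
      env.bind(loopbackAddress);
    }
    pending.clear();
  }

  public static void main(String[] args) {
    LateBoundEnvironments envs = new LateBoundEnvironments();
    // Construction: a transform can reference the environment before it exists.
    PlaceholderEnvironment javaEnv = envs.loopbackPlaceholder("java-loopback");
    // Run: the worker service has started, so the placeholder can be bound.
    envs.bindAll("localhost:50000");
    System.out.println(javaEnv.id + " -> " + javaEnv.address());
  }
}
```

Running `main` prints `java-loopback -> localhost:50000`. Keeping all placeholders in one registry means binding happens in a single place, which is also what would let the same mechanism extend to every environment type, as the parenthetical above suggests. Writing the bound address back into the serialized pipeline before submission is not modeled here.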

Issue Priority

Priority: 2 (default / most bugs should be filed as P2)

Issue Components

  • Component: Python SDK
  • Component: Java SDK
  • Component: Go SDK
  • Component: Typescript SDK
  • Component: IO connector
  • Component: Beam YAML
  • Component: Beam examples
  • Component: Beam playground
  • Component: Beam katas
  • Component: Website
  • Component: Infrastructure
  • Component: Spark Runner
  • Component: Flink Runner
  • Component: Samza Runner
  • Component: Twister2 Runner
  • Component: Hazelcast Jet Runner
  • Component: Google Cloud Dataflow Runner
liferoad linked a pull request on Apr 20, 2025 that will close this issue