Commit 9dd1f68

Merge pull request #33907 Expand yaml provider documentation.
2 parents 7451a7e + ea8c560 commit 9dd1f68

1 file changed: +41 -10 lines changed

website/www/site/content/en/documentation/sdks/yaml-providers.md

@@ -30,10 +30,26 @@ vend catalogues of schema transforms.
 
 ## Java
 
-For example, you could build a jar that vends a
+Exposing a transform in Java that can be used in a YAML pipeline consists of
+four main steps:
+
+1. Defining the transformation itself as a
+   [PTransform](https://beam.apache.org/documentation/programming-guide/#composite-transforms)
+   that consumes and produces zero or more [schema'd PCollections](https://beam.apache.org/documentation/programming-guide/#creating-schemas).
+2. Exposing this transform via a
+   [SchemaTransformProvider](https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/schemas/transforms/SchemaTransformProvider.html)
+   which provides an identifier used to refer to this transform later as well
+   as metadata like a human-readable description and its configuration parameters.
+3. Building a Jar that contains these classes and vends them via the
+   [Service Loader](https://github.com/Polber/beam-yaml-xlang/blob/95abf0864e313232a89f3c9e57b950d0fb478979/src/main/java/org/example/ToUpperCaseTransformProvider.java#L30)
+   infrastructure.
+4. Writing a [provider specification](https://beam.apache.org/documentation/sdks/yaml/#providers)
+   that tells Beam YAML where to find this jar and what it contains.
+
+If the transform is already exposed as a
 [cross language transform](https://beam.apache.org/documentation/sdks/python-multi-language-pipelines/)
 or [schema transform](https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/schemas/transforms/SchemaTransformProvider.html)
-and then use it in a transform as follows
+then steps 1-3 have been done for you. One then uses this transform as follows:
 
 ```
 pipeline:
@@ -56,13 +72,14 @@ pipeline:
 providers:
   - type: javaJar
     config:
-       jar: /path/or/url/to/myExpansionService.jar
+      jar: /path/or/url/to/myExpansionService.jar
     transforms:
-       MyCustomTransform: "urn:registered:in:expansion:service"
+      MyCustomTransform: "urn:registered:in:expansion:service"
 ```
 
-A full example of how to build a java provider can be found
-[here](https://github.com/apache/beam-starter-java-provider).
+We provide a
+[full cloneable example of how to build a java provider](https://github.com/apache/beam-starter-java-provider)
+that can be used to get started.
 
 ## Python
 
@@ -72,13 +89,27 @@ Arbitrary Python transforms can be provided as well, using the syntax
 providers:
   - type: pythonPackage
     config:
-       packages:
-           - my_pypi_package>=version
-           - /path/to/local/package.zip
+      packages:
+        - my_pypi_package>=version
+        - /path/to/local/package.zip
     transforms:
-       MyCustomTransform: "pkg.module.PTransformClassOrCallable"
+      MyCustomTransform: "pkg.module.PTransformClassOrCallable"
 ```
 
+which can then be used as
+
+```
+- type: MyCustomTransform
+  config:
+    num: 3
+    arg: whatever
+```
+
+This will cause the dependencies to be installed before the transform is
+imported (via its given fully qualified name) and instantiated
+with the config values passed as keyword arguments (e.g. in this case
+`pkg.module.PTransformClassOrCallable(num=3, arg="whatever")`).
+
 We offer a [python provider starter project](https://github.com/apache/beam-starter-python-provider)
 that serves as a complete example for how to do this.
 
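
The Python paragraph added in this commit describes importing a transform by its fully qualified name and calling it with the `config` values as keyword arguments. The sketch below illustrates that general mechanism using only the standard library. It is not Beam's implementation: the helper names `resolve_qualified_name` and `instantiate` are hypothetical, and `builtins.dict` stands in for `pkg.module.PTransformClassOrCallable` so the example runs without any extra packages.

```python
import importlib
from typing import Any, Callable, Mapping


def resolve_qualified_name(qualified_name: str) -> Callable[..., Any]:
    """Import the longest importable module prefix, then walk the remaining attributes.

    Hypothetical helper for illustration; Beam YAML's own lookup may differ.
    """
    parts = qualified_name.split(".")
    for split in range(len(parts) - 1, 0, -1):
        module_name = ".".join(parts[:split])
        try:
            obj: Any = importlib.import_module(module_name)
        except ModuleNotFoundError:
            # This prefix is not a module; try a shorter one.
            continue
        for attr in parts[split:]:
            obj = getattr(obj, attr)
        return obj
    raise ImportError(f"Cannot resolve {qualified_name!r}")


def instantiate(qualified_name: str, config: Mapping[str, Any]) -> Any:
    """Call the resolved class or callable with config entries as keyword arguments."""
    constructor = resolve_qualified_name(qualified_name)
    return constructor(**config)


# Stand-in for: pkg.module.PTransformClassOrCallable(num=3, arg="whatever")
result = instantiate("builtins.dict", {"num": 3, "arg": "whatever"})
print(result)  # {'num': 3, 'arg': 'whatever'}
```

Under this reading, each key in the YAML `config` block must match a parameter name accepted by the target class or callable; a misspelled key would surface as a `TypeError` at instantiation time in this sketch.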