@@ -30,10 +30,26 @@ vend catalogues of schema transforms.
## Java

- For example, you could build a jar that vends a
+ Exposing a Java transform so that it can be used in a YAML pipeline consists of
+ four main steps:
+
+ 1. Defining the transformation itself as a
+ [PTransform](https://beam.apache.org/documentation/programming-guide/#composite-transforms)
+ that consumes and produces zero or more [schema'd PCollections](https://beam.apache.org/documentation/programming-guide/#creating-schemas).
+ 2. Exposing this transform via a
+ [SchemaTransformProvider](https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/schemas/transforms/SchemaTransformProvider.html),
+ which provides an identifier used to refer to this transform later, as well
+ as metadata such as a human-readable description and its configuration parameters.
+ 3. Building a jar that contains these classes and vends them via the
+ [Service Loader](https://github.com/Polber/beam-yaml-xlang/blob/95abf0864e313232a89f3c9e57b950d0fb478979/src/main/java/org/example/ToUpperCaseTransformProvider.java#L30)
+ infrastructure.
+ 4. Writing a [provider specification](https://beam.apache.org/documentation/sdks/yaml/#providers)
+ that tells Beam YAML where to find this jar and what it contains.
+
+ If the transform is already exposed as a
[cross language transform](https://beam.apache.org/documentation/sdks/python-multi-language-pipelines/)
or [schema transform](https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/schemas/transforms/SchemaTransformProvider.html)
- and then use it in a transform as follows
+ then steps 1-3 have already been done for you, and you can use the transform as follows:

```
pipeline:
@@ -56,13 +72,14 @@ pipeline:
providers:
- type: javaJar
config:
- jar: /path/or/url/to/myExpansionService.jar
+ jar: /path/or/url/to/myExpansionService.jar
transforms:
- MyCustomTransform: "urn:registered:in:expansion:service"
+ MyCustomTransform: "urn:registered:in:expansion:service"
```

- A full example of how to build a java provider can be found
- [here](https://github.com/apache/beam-starter-java-provider).
+ We provide a
+ [full cloneable example of how to build a Java provider](https://github.com/apache/beam-starter-java-provider)
+ that can be used to get started.

## Python
@@ -72,13 +89,27 @@ Arbitrary Python transforms can be provided as well, using the syntax
providers:
- type: pythonPackage
config:
- packages:
- - my_pypi_package>=version
- - /path/to/local/package.zip
+ packages:
+ - my_pypi_package>=version
+ - /path/to/local/package.zip
transforms:
- MyCustomTransform: "pkg.module.PTransformClassOrCallable"
+ MyCustomTransform: "pkg.module.PTransformClassOrCallable"
```

+ which can then be used as
+
+ ```
+ - type: MyCustomTransform
+   config:
+     num: 3
+     arg: whatever
+ ```
+
+ This causes the listed dependencies to be installed before the transform is
+ imported (via its given fully qualified name) and instantiated
+ with the config values passed as keyword arguments (in this case,
+ `pkg.module.PTransformClassOrCallable(num=3, arg="whatever")`).
+
We offer a [Python provider starter project](https://github.com/apache/beam-starter-python-provider)
that serves as a complete example of how to do this.
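The import-and-instantiate step described for `pythonPackage` providers can be sketched in plain Python. This is a minimal, hypothetical illustration and not Beam YAML's actual implementation: the helper name `instantiate_transform` is made up, installing the listed packages is not modelled, and `types.SimpleNamespace` stands in for `pkg.module.PTransformClassOrCallable` because it simply records the keyword arguments it receives.

```python
import importlib
from typing import Any, Dict


def instantiate_transform(fully_qualified_name: str, config: Dict[str, Any]) -> Any:
    """Hypothetical helper: resolve a dotted name and call it with config as kwargs.

    Illustrates the documented behaviour only; Beam YAML's real provider code
    (including installing the listed packages first) is not reproduced here.
    """
    # Split "pkg.module.PTransformClassOrCallable" into module path and attribute name.
    module_name, _, attr_name = fully_qualified_name.rpartition(".")
    if not module_name:
        raise ValueError(f"expected a fully qualified name, got {fully_qualified_name!r}")
    # Import the module and look up the class or callable it exposes.
    module = importlib.import_module(module_name)
    factory = getattr(module, attr_name)
    # The YAML `config` mapping becomes keyword arguments, e.g. num=3, arg="whatever".
    return factory(**config)


if __name__ == "__main__":
    # types.SimpleNamespace stands in for pkg.module.PTransformClassOrCallable.
    transform = instantiate_transform("types.SimpleNamespace", {"num": 3, "arg": "whatever"})
    print(transform.num, transform.arg)  # prints: 3 whatever
```

Because only the dotted path is used for the lookup, the same mechanism works whether the named object is a `PTransform` subclass or a function that returns one, which is what the `PTransformClassOrCallable` placeholder suggests.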