Also see the tools-devteam repository `Pull Request #20 <https://github.com/galaxyproject/tools-devteam/pull/20>`__ modifying the cufflinks suite of tools for collection-compatible reductions.

Processing Identifiers
-------------------------------

As mentioned previously, sample identifiers are preserved through mapping
steps. During reduction steps one will likely want to use these for
reporting, comparisons, etc. When using these multiple ``data`` parameters,
the dataset objects expose a field called ``element_identifier``. When these
parameters are used with individual datasets, this simply defaults to the
dataset's name, but when they are used with collections this field will be the
``element_identifier`` (i.e. the preserved sample name).

For instance, imagine merging a collection of tabular datasets into a single
table with a new column indicating the sample name each row was derived from,
using a small fictitious program called ``merge_rows``.

::


.. note:: Here we are rewriting the element identifiers to ensure everything is safe to
   put on the command line. In the future, collections will not be able to contain
   keys that are potentially harmful, and this won't be necessary.
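
As an illustration only, the fragment below sketches how a command template might consume a ``multiple="true"`` ``data`` parameter and pass each sanitized ``element_identifier`` to ``merge_rows``. The ``--to`` and ``--input`` flags are hypothetical (the program itself is fictitious). The character whitelist in the ``re.sub`` call is also an assumption and is not Galaxy's own sanitization rule.

.. code-block:: xml

    <command><![CDATA[
    #import re
    merge_rows --to '$output'
    #for $input in $inputs
        ## Hypothetical whitelist: replace anything outside A-Z, a-z, 0-9, "_", "." and "-" with "_".
        #set $safe_id = re.sub('[^A-Za-z0-9_.-]', '_', str($input.element_identifier))
        --input '$safe_id' '$input'
    #end for
    ]]></command>
    <inputs>
        <param type="data" name="inputs" label="Input tabular files" format="tabular" multiple="true" />
    </inputs>
    <outputs>
        <data name="output" format="tabular" />
    </outputs>
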

More on ``data_collection`` parameters
----------------------------------------------

The above three cases (users mapping over single tools, consuming pairs, and
consuming lists using ``multiple`` ``data`` parameters) are hopefully the most
common ways for a tool author to consume collections, but the
``data_collection`` parameter type allows one to handle more cases than just
these.
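
For example, a ``data_collection`` parameter can request a nested collection type. The sketch below assumes a ``list:paired`` input and a hypothetical ``aligner`` program with hypothetical ``--out`` and ``--pair`` flags. It iterates over the outer list and reads each pair's ``forward`` and ``reverse`` datasets. The iteration and attribute access on the collection wrapper are assumptions about the template API, not a documented contract.

.. code-block:: xml

    <inputs>
        <param type="data_collection" collection_type="list:paired" name="input1" label="Input list of pairs" format="fastqsanger" />
    </inputs>
    <command><![CDATA[
    aligner --out '$output'
    #for $pair in $input1
        ## Each element of the outer list is itself a paired collection.
        --pair '$pair.forward' '$pair.reverse'
    #end for
    ]]></command>
    <outputs>
        <data name="output" format="sam" />
    </outputs>
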

implicitly "mapped over" to produce collections as described above - but there
are a variety of situations for which this idiom is insufficient.

Progressively more complex syntax elements exist for increasingly complex
scenarios. Broadly speaking, the three scenarios covered are when the tool
produces...

1. a collection with a static number of elements (mostly for ``paired``
   collections, but if a tool does fixed binning it might make sense to create
   a list this way as well)
2. a ``list`` with the same number of elements as an input list
   (this would be a common pattern for normalization applications, for
   instance).
3. a ``list`` where the number of elements is not knowable until the job is
   complete.

1. Static Element Count
-----------------------------------------------

For this first case, the tool can declare standard ``data`` elements
below an output ``collection`` element in the ``outputs`` tag of the tool definition.
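
Below is a minimal sketch of such an output definition. It assumes an input parameter named ``input1`` and a tool that writes its second file to ``reverse.txt`` in the working directory. Both names are illustrative, and the fragment is not necessarily identical to ``test/functional/tools/collection_creates_pair.xml``.

.. code-block:: xml

    <outputs>
        <collection name="paired_output" type="paired" label="Split pair">
            <!-- Each child data element becomes one element of the pair. -->
            <data name="forward" format="txt" />
            <data name="reverse" format_source="input1" from_work_dir="reverse.txt" />
        </collection>
    </outputs>
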

Templates (e.g. the ``command`` tag) can then reference ``$forward`` and ``$reverse``, or whatever
``name`` the corresponding ``data`` elements are given, as demonstrated
in ``test/functional/tools/collection_creates_pair.xml``.

The tool should describe the collection type via the ``type`` attribute on the ``collection`` element.
Data elements can define ``format``, ``format_source``, ``metadata_source``, ``from_work_dir``, and ``name``.

The above syntax would also work for the corner case of static lists.
For paired collections specifically, however, the type plugin system
knows how to prototype a pair, so the following even easier (though less configurable) syntax works.
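
As a sketch, the type-based declaration needs no child ``data`` elements. This example assumes an input named ``input1`` whose format the pair copies through ``format_source``. That input name and the label are illustrative.

.. code-block:: xml

    <outputs>
        <!-- No child data elements: the paired type supplies the forward and reverse elements. -->
        <collection name="paired_output" type="paired" label="Split pair" format_source="input1" />
    </outputs>
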

In this case the command template could then just reference ``${paired_output.forward}``
and ``${paired_output.reverse}`` as demonstrated in ``test/functional/tools/collection_creates_pair_from_type.xml``.

2. Computable Element Count
-----------------------------------------------

For the second case, where the structure of the output is based on the structure of an
input, a ``structured_like`` attribute can be defined on the ``collection`` tag.
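
For example, the declaration might look like the sketch below. It assumes a list input named ``input1`` and tabular outputs; both are illustrative choices.

.. code-block:: xml

    <inputs>
        <param type="data_collection" collection_type="list" name="input1" label="Input list" />
    </inputs>
    <outputs>
        <!-- One output element is created per element of input1, with the same identifiers. -->
        <collection name="list_output" type="list" label="Normalized list" structured_like="input1" format="tabular" />
    </outputs>
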

Templates can then loop over ``input1`` or ``list_output`` when building up command-line
expressions. See ``test/functional/tools/collection_creates_list.xml`` for an example.
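
Below is a sketch of such a loop. It assumes a hypothetical ``normalize`` program that reads one file and writes its result to standard output. The ``keys()`` call and ``[$key]`` indexing on the collection wrappers are assumptions about the template API.

.. code-block:: xml

    <command><![CDATA[
    #for $key in $list_output.keys()
        ## Pair each input element with the output element sharing its identifier.
        normalize '$input1[$key]' > '$list_output[$key]' &&
    #end for
    ## The final "true" closes the trailing "&&" left by the loop.
    true
    ]]></command>
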

``format``, ``format_source``, and ``metadata_source`` can be defined for such collections if the
format and metadata are fixed or based on a single input dataset. If instead the format or metadata
depends on the formats of the collection it is structured like, ``inherit_format="true"`` and/or
``inherit_metadata="true"`` should be used. These handle corner cases where there are,
for instance, subtle format or metadata differences between the elements of the incoming list.
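
In contrast to a fixed ``format``, a structured-like collection can copy each element's format and metadata from the matching element of ``input1``. It might be declared as in the sketch below; the name and label are illustrative.

.. code-block:: xml

    <outputs>
        <!-- Each output element inherits format and metadata from its counterpart in input1. -->
        <collection name="list_output" type="list" label="Normalized list" structured_like="input1"
                    inherit_format="true" inherit_metadata="true" />
    </outputs>
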

3. Dynamic Element Count
-----------------------------------------------

The third and most general case is when the number of elements in a list cannot be determined
until runtime, for instance when splitting up files by various dynamic criteria.

In this case a collection may define one or more ``discover_datasets`` elements. As an example of
one such tool, which splits a tabular file into multiple tabular files based on the first
column, see ``test/functional/tools/collection_split_on_column.xml``. It includes the following output definition:

.. code-block:: xml

    <collection name="split_output" type="list" label="Table split on first column">