Commit 606cc5c

Merge pull request #498 from galaxyproject/docs-revamp
[WIP] revamp the docs
2 parents 60f7ef2 + 0cd8611 commit 606cc5c

28 files changed: +825 -671 lines changed

README.rst

+174-138
Large diffs are not rendered by default.

docs/.planemo.yml

+3-3
Original file line number | Diff line number | Diff line change
@@ -1,8 +1,8 @@
1-
## Specify a default galaxy_root for test and server commands here.
1+
## Specify a default galaxy_root for the `test` and `serve` commands here.
22
#galaxy_root: /home/user/galaxy
33

4-
## Specify github credentials for publishing gists links (e.g. with
5-
## the share_test command).
4+
## Specify github credentials for publishing gist links (e.g. with
5+
## the `share_test` command).
66
#github:
77
# username: <username>
88
# password: <password>

docs/_writing_clusters.rst

+6-4
@@ -13,15 +13,17 @@ should be used.
1313
For example, the StringTie (tool available `here
1414
<https://github.com/galaxyproject/tools-iuc/blob/master/tools/stringtie/stringtie.xml>`__)
1515
binary ``stringtie`` can take an argument ``-p`` that allows specification
16-
of the number of threads to be used. The Galaxy tool sets this up as follows::
16+
of the number of threads to be used. The Galaxy tool sets this up as follows
17+
18+
::
1719

1820
stringtie "$input_bam" -o "$output_gtf" -p "\${GALAXY_SLOTS:-1}" ...
1921

2022
Here we use ``\${GALAXY_SLOTS:-Z}`` instead of a fixed value (Z being an
2123
integer representing a default value in non-Galaxy contexts). The
2224
backslash is needed because this value is interpreted at runtime as an
2325
environment variable - not during command building time as a templated
24-
value. Now server administrators can configure how many processes the
26+
value. Now server administrators can configure how many processes the
2527
tool should be allowed to use.
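
As a rough illustration of the ``${GALAXY_SLOTS:-1}`` fallback, the Python sketch below mimics the shell expansion. The helper name ``galaxy_slots`` and its ``environ`` argument are hypothetical and not part of Galaxy or Planemo - only the ``GALAXY_SLOTS`` variable name comes from the text above.

```python
import os


def galaxy_slots(default=1, environ=None):
    """Mimic the shell expansion ${GALAXY_SLOTS:-default} (illustrative helper)."""
    env = os.environ if environ is None else environ
    # The shell's `:-` form falls back when the variable is unset *or* empty.
    value = env.get("GALAXY_SLOTS") or str(default)
    return int(value)


# Outside Galaxy the variable is usually absent, so the default applies.
print(galaxy_slots(environ={}))
# Inside a Galaxy job the administrator-configured value wins.
print(galaxy_slots(environ={"GALAXY_SLOTS": "4"}))
```

The fallback is resolved when the job runs, which is why the tool escapes the dollar sign instead of letting the template engine expand it while building the command.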
2628

2729
For information on how server administrators can configure this value for
@@ -49,8 +51,8 @@ with the following commands.
4951

5052
::
5153

52-
planemo test --job_config_file ~/planemo_job_conf.xml .
53-
planemo serve --job_config_file ~/planemo_job_conf.xml .
54+
$ planemo test --job_config_file ~/planemo_job_conf.xml .
55+
$ planemo serve --job_config_file ~/planemo_job_conf.xml .
5456

5557
For general information on configuring Galaxy to communicate with clusters
5658
check out `this page

docs/_writing_collections.rst

+73-56
@@ -1,51 +1,55 @@
11
Collections
22
==============================
33

4-
Galaxy has the concept of dataset collections to group together and operate
5-
over them as single units with tools and in workflows.
4+
Galaxy has a concept of dataset collections to group together datasets and operate
5+
over them as a single unit.
66

7-
Galaxy collections are hierarchical and composed from two simple collection
7+
Galaxy collections are hierarchical and composed from two collection
88
types - ``list`` and ``paired``.
99

10-
A ``list`` is a simple a collection of datasets (or other collections) where
11-
each element has an ``identifier``. Unlike Galaxy dataset names which are
12-
transformed throughout complex analyses - the ``identifier`` is generally
13-
perserved and can be used for concepts such ``sample`` name that one wants to
14-
perserve the sample name in earlier mapping steps of a workflow and use it
15-
during reduction steps and reporting later in workflows.
10+
* A **list** is a collection of datasets (or other collections) where
11+
each element has an ``identifier``. Unlike Galaxy dataset names which are
12+
transformed throughout complex analyses - the ``identifier`` is generally
13+
preserved and can be used for concepts such as the ``sample`` name that one wants to
14+
preserve in the earlier mapping steps of a workflow and use
15+
during reduction steps and reporting later.
1616

17-
The ``paired`` collection type is much simpler and more specific to sequencing
18-
applications. Each ``paired`` collection consists of a ``forward`` and
19-
``reverse`` dataset.
17+
* The **paired** collection type is much simpler and more specific to sequencing
18+
applications. Each ``paired`` collection consists of a ``forward`` and
19+
``reverse`` dataset.
2020

21-
.. note:: Read more about creating and managing collections on the `Galaxy Wiki <https://wiki.galaxyproject.org/Histories#Dataset_Collections>`__.
21+
.. note:: Read more about creating and managing collections on the
22+
`Galaxy Wiki <https://wiki.galaxyproject.org/Histories#Dataset_Collections>`__.
2223

2324
Composite types include for instance the ``list:paired`` collection type -
2425
which represents a list of dataset pairs. In this case, instead of each
2526
dataset having a list identifier, each pair of datasets does.
2627

2728
-------------------------------
28-
Consuming Collctions
29+
Consuming Collections
2930
-------------------------------
3031

31-
Many Galaxy tools can in conjuction with collections used without
32-
modification. Galaxy users can take a collection and `map over` any tool that
32+
Many Galaxy tools can be used without modification in conjunction with collections.
33+
Galaxy users can take a collection and ``map over`` any tool that
3334
consumes individual datasets. For instance, early in typical bioinformatics
3435
workflows you may have steps that filter raw data, convert to standard
3536
formats, perform QC on individual files - users can take lists, pairs, or
3637
lists of paired datasets and map over such tools that consume individual
3738
files. Galaxy will then run the tool once for each dataset in the collection
3839
and for each output of that tool Galaxy will rebuild a new collection with the
39-
same `identifier` structure (so sample name or forward/reverse structure is
40+
same ``identifier`` structure (so sample name or forward/reverse structure is
4041
preserved).
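
The mapping behaviour described above can be pictured with a small conceptual model. This is only a sketch, not how Galaxy implements it - representing a ``list:paired`` collection as nested dictionaries, and the ``map_over`` and ``fake_filter`` names, are assumptions made for illustration.

```python
def map_over(collection, tool):
    """Apply `tool` to every dataset while keeping the identifier structure."""
    if isinstance(collection, dict):
        # A collection: recurse into each element under its identifier.
        return {
            identifier: map_over(element, tool)
            for identifier, element in collection.items()
        }
    # A single dataset: run the tool on it.
    return tool(collection)


def fake_filter(path):
    """Stand-in for a tool that consumes one dataset and emits one output."""
    return path.replace(".fastq", ".filtered.fastq")


# A list:paired collection modelled as {sample: {forward, reverse}}.
samples = {
    "sample1": {"forward": "s1_R1.fastq", "reverse": "s1_R2.fastq"},
    "sample2": {"forward": "s2_R1.fastq", "reverse": "s2_R2.fastq"},
}

filtered = map_over(samples, fake_filter)
print(filtered["sample1"]["forward"])  # s1_R1.filtered.fastq
```

The output keeps the same sample names and forward/reverse keys as the input, which is the property later reduction steps rely on.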
4142

4243
Tools can also consume collections if they must or should process multiple
43-
files at once. We will discuss three cases - consuming pairs of datasets,
44-
consuming lists, and consuming arbitrary collections.
44+
files at once. We will discuss three cases:
4545

46-
.. warning:: If you find yourself consuming a collection of files and calling
47-
the underlying application multiple times within the tool command block, you
48-
are likely doing something wrong. Just process and pair or a single dataset
46+
* consuming pairs of datasets
47+
* consuming lists
48+
* consuming arbitrary collections.
49+
50+
.. note:: If you find yourself consuming a collection of files and calling
51+
the underlying application multiple times within the tool command block, you
52+
are likely doing something wrong. Just process a pair or a single dataset
4953
and allow the user to map over the collection.
5054

5155
Processing Pairs
@@ -57,7 +61,7 @@ allow users to either supply paired collections or two individual datasets.
5761
Furthermore, many tools which process pairs of datasets can also process
5862
single datasets. The following ``conditional`` captures this idiom.
5963

60-
::
64+
.. code-block:: xml
6165
6266
<conditional name="fastq_input">
6367
<param name="fastq_input_selector" type="select" label="Single or Paired-end reads" help="Select between paired and single end data">
@@ -79,10 +83,10 @@ single datasets. The following ``conditional`` captures this idiom.
7983
</conditional>
8084
8185
This introduces a new ``param`` type - ``data_collection``. The optional
82-
attribute ``collection_type`` can be specified to specify which kinds of
86+
attribute ``collection_type`` can specify which kinds of
8387
collections are appropriate for this input. Additional ``data`` attributes
84-
such as ``format`` can be specified to further restrict valid collections.
85-
Here we specified that both items of the paired collection must be of datatype
88+
such as ``format`` can further restrict valid collections.
89+
Here we defined that both items of the paired collection must be of datatype
8690
``fastqsanger``.
8791

8892
In Galaxy's ``command`` block, the individual datasets can be accessed using
@@ -105,11 +109,11 @@ The ``data_collection`` parameter type can specify a ``collection_type`` or
105109
consume lists as a tool author. Parameters of type ``data`` can include a
106110
``multiple="True"`` attribute to allow many datasets to be selected
107111
simultaneously. While the default UI will then have Galaxy users pick
108-
individual datsets, they can easily substitute a collections the tool can
109-
process both as individual datasets. This has the benefit of allowing tools to
112+
individual datasets, they can choose a collection instead, as the tool can
113+
process both. This has the benefit of allowing tools to
110114
process either individual datasets or collections.
111115

112-
::
116+
.. code-block:: xml
113117
114118
<param type="data" name="inputs" label="Input BAM(s)" format="bam" multiple="true" />
115119
@@ -147,21 +151,20 @@ Some example tools which consume multiple datasets (including lists) include:
147151

148152
Also see the tools-devteam repository `Pull Request #20 <https://github.com/galaxyproject/tools-devteam/pull/20>`__ modifying the cufflinks suite of tools for collection compatible reductions.
149153

150-
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
151-
Identifiers
152-
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
154+
Processing Identifiers
155+
-------------------------------
153156

154157
As mentioned previously, sample identifiers are preserved through mapping
155158
steps; during reduction steps one will likely want to use these - for
156-
reporting, comparisons, etc.... When using these multiple ``data`` parameters
159+
reporting, comparisons, etc. When using these multiple ``data`` parameters
157160
the dataset objects expose a field called ``element_identifier``. When these
158161
parameters are used with individual datasets - this will just default to being
159162
the dataset's name, but when used with collections this parameter will be the
160-
element_identifier (i.e. the preserved sample name).
163+
``element_identifier`` (i.e. the preserved sample name).
161164

162165
For instance, imagine merging a collection of tabular datasets into a single
163166
table with a new column indicating the sample name the corresponding rows were
164-
derived from using a little ficitious program called ``merge_rows``.
167+
derived from using a little fictitious program called ``merge_rows``.
165168

166169
::
167170

@@ -181,14 +184,14 @@ Some example tools which utilize ``element_identifier`` include:
181184
182185
.. note:: Here we are rewriting the element identifiers to assure everything is safe to
183186
put on the command-line. In the future collections will not be able to contain
184-
keys are potentially harmful and this won't be nessecary.
187+
keys that are potentially harmful and this won't be necessary.
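
One way to do such a rewrite is to replace every character outside a conservative whitelist. The sketch below is an assumption about what "safe" could mean here (letters, digits, underscores and dashes); the ``safe_identifier`` name is hypothetical and the exact expression used by any given tool may differ.

```python
import re


def safe_identifier(identifier):
    """Replace anything that is not a word character or dash with '_'."""
    return re.sub(r"[^\w\-]", "_", identifier)


# Spaces and shell metacharacters are neutralised, safe names pass through.
print(safe_identifier("my sample.1"))  # my_sample_1
print(safe_identifier("ok-name_2"))    # ok-name_2
```

Quoting the resulting value on the command line is still advisable; the rewrite only narrows what can appear inside the quotes.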
185188

186189
More on ``data_collection`` parameters
187190
----------------------------------------------
188191

189192
The above three cases (users mapping over single tools, consuming pairs, and
190193
consuming lists using `multiple` ``data`` parameters) are hopefully the most
191-
common ways to consume collections as a tool author - but the
194+
common ways to consume collections for a tool author - but the
192195
``data_collection`` parameter type allows one to handle more cases than just
193196
these.
194197

@@ -240,66 +243,80 @@ implicitly "mapped over" to produce collections as described above - but there
240243
are a variety of situations for which this idiom is insufficient.
241244

242245
Progressively more complex syntax elements exist for the increasingly complex
243-
scenarios. Broadly speaking - the three scenarios covered are the tool
246+
scenarios. Broadly speaking - the three scenarios covered are when the tool
244247
produces...
245248

246249
1. a collection with a static number of elements (mostly for ``paired``
247-
collections, but if a tool does say fixed binning it might make sense to create a list this way as well)
250+
collections, but if a tool performs fixed binning it might make sense to create a list this way as well)
248251
2. a ``list`` with the same number of elements as an input list
249-
(this would be a common pattern for normalization applications for
252+
(this would be a common pattern for normalization applications for
250253
instance).
251254
3. a ``list`` where the number of elements is not knowable until the job is
252255
complete.
253256

254257
1. Static Element Count
255258
-----------------------------------------------
256259

257-
For this first case - the tool can simply declare standard data elements
260+
For this first case - the tool can declare standard data elements
258261
below an output collection element in the outputs tag of the tool definition.
259262

260-
::
263+
.. code-block:: xml
261264
262265
<collection name="paired_output" type="paired" label="Split Pair">
263266
<data name="forward" format="txt" />
264267
<data name="reverse" format_source="input1" from_work_dir="reverse.txt" />
265268
</collection>
266269
267270
268-
Templates (e.g. the ``command`` tag) can then reference ``$forward`` and ``$reverse`` or whatever ``name`` the corresponding ``data`` elements are given.
269-
- as demonstrated in ``test/functional/tools/collection_creates_pair.xml``.
271+
Templates (e.g. the ``command`` tag) can then reference ``$forward`` and ``$reverse`` or whatever
272+
``name`` the corresponding ``data`` elements are given as demonstrated
273+
in ``test/functional/tools/collection_creates_pair.xml``.
270274

271-
The tool should describe the collection type via the type attribute on the collection element. Data elements can define ``format``, ``format_source``, ``metadata_source``, ``from_work_dir``, and ``name``.
275+
The tool should describe the collection type via the type attribute on the collection element.
276+
Data elements can define ``format``, ``format_source``, ``metadata_source``, ``from_work_dir``, and ``name``.
272277

273-
The above syntax would also work for the corner case of static lists. For paired collections specifically however, the type plugin system now knows how to prototype a pair so the following even easier (though less configurable) syntax works.
278+
The above syntax would also work for the corner case of static lists.
279+
For paired collections specifically however, the type plugin system now
280+
knows how to prototype a pair so the following even easier (though less configurable) syntax works.
274281

275-
::
282+
.. code-block:: xml
276283
277284
<collection name="paired_output" type="paired" label="Split Pair" format_source="input1">
278285
</collection>
279286
280-
In this case the command template could then just reference ``${paried_output.forward}`` and ``${paired_output.reverse}`` as demonstrated in ``test/functional/tools/collection_creates_pair_from_type.xml``.
287+
In this case the command template could then just reference ``${paired_output.forward}``
288+
and ``${paired_output.reverse}`` as demonstrated in ``test/functional/tools/collection_creates_pair_from_type.xml``.
281289

282290
2. Computable Element Count
283291
-----------------------------------------------
284292

285-
For the second case - where the structure of the output is based on the structure of an input - a structured_like attribute can be defined on the collection tag.
293+
For the second case - where the structure of the output is based on the structure of an
294+
input - a ``structured_like`` attribute can be defined on the collection tag.
286295

287-
::
296+
.. code-block:: xml
288297
289298
<collection name="list_output" type="list" label="Duplicate List" structured_like="input1" inherit_format="true">
290299
291-
Templates can then loop over ``input1`` or ``list_output`` when buliding up command-line expressions. See ``test/functional/tools/collection_creates_list.xml`` for an example.
300+
Templates can then loop over ``input1`` or ``list_output`` when building up command-line
301+
expressions. See ``test/functional/tools/collection_creates_list.xml`` for an example.
292302

293-
``format``, ``format_source``, and ``metadata_source`` can be defined for such collections if the format and metadata are fixed or based on a single input dataset. If instead the format or metadata depends on the formats of the collection it is structured like - ``inherit_format="true"`` and/or ``inherit_metadata="true"`` should be used instead - which will handle corner cases where there are for instance subtle format or metadata differences between the elements of the incoming list.
303+
``format``, ``format_source``, and ``metadata_source`` can be defined for such collections if the
304+
format and metadata are fixed or based on a single input dataset. If instead the format or metadata
305+
depends on the formats of the collection it is structured like - ``inherit_format="true"`` and/or
306+
``inherit_metadata="true"`` should be used instead - which will handle corner cases where there are
307+
for instance subtle format or metadata differences between the elements of the incoming list.
294308

295309
3. Dynamic Element Count
296310
-----------------------------------------------
297311

298-
The third and most general case is when the number of elements in a list cannot be determined until runtime. For instance, when splitting up files by various dynamic criteria.
312+
The third and most general case is when the number of elements in a list cannot be determined
313+
until runtime, for instance when splitting up files by various dynamic criteria.
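
Conceptually, a job with a dynamic element count writes its files first, and the list elements are derived from the file names afterwards. The following is a minimal sketch of that idea, assuming element identifiers and formats are taken from a ``<name>.<ext>`` file name; the regular expression and the ``group_discovered`` helper are illustrative assumptions, not Galaxy's actual implementation.

```python
import re

# Assumed naming convention for this sketch: "<name>.<ext>".
NAME_AND_EXT = re.compile(r"(?P<name>.*)\.(?P<ext>[^.]+)$")


def group_discovered(filenames):
    """Map each matching file name to an (identifier -> extension) entry."""
    elements = {}
    for filename in filenames:
        match = NAME_AND_EXT.match(filename)
        if match:
            # The identifier is everything before the last dot.
            elements[match.group("name")] = match.group("ext")
    return elements


# Files without an extension are ignored because they do not match.
print(group_discovered(["chr1.tabular", "chr2.tabular", "README"]))
```

Because the element names come from whatever the job happened to write, the size of the resulting list is only known once the job has finished.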
299314

300-
In this case a collection may define one of more discover_dataset elements. As an example of one such tool that splits a tabular file out into multiple tabular files based on the first column see ``test/functional/tools/collection_split_on_column.xml`` - which includes the following output definition:
315+
In this case a collection may define one or more ``discover_datasets`` elements. As an example of
316+
one such tool that splits a tabular file out into multiple tabular files based on the first
317+
column see ``test/functional/tools/collection_split_on_column.xml`` - which includes the following output definition:
301318

302-
::
319+
.. code-block:: xml
303320
304321
<collection name="split_output" type="list" label="Table split on first column">
305322
<discover_datasets pattern="__name_and_ext__" directory="outputs" />
@@ -309,7 +326,7 @@ Nested Collections
309326
-----------------------------------------------
310327

311328
Galaxy `Pull Request #538 <https://github.com/galaxyproject/galaxy/pull/538>`__
312-
implemented the ability to define nested output collections. See the pull
329+
implemented the ability to define nested output collections. See the pull
313330
request and included example tools for more details.
314331

315332
----------------------

docs/_writing_conclusion.rst

+1
@@ -4,3 +4,4 @@ More Information
44

55
* `Galaxy's Tool XML Syntax <https://wiki.galaxyproject.org/Admin/Tools/ToolConfigSyntax>`_
66
* `Big List of Tool Development Resources <https://wiki.galaxyproject.org/Develop/ResourcesTools>`_
7+
* `Cheetah templating <http://www.cheetahtemplate.org/docs/users_guide_html/>`_

docs/_writing_cwl_intro.rst

+8-8
@@ -14,7 +14,7 @@ start by doing that.
1414

1515
The ``tool_init`` command can take various complex arguments - but the three
1616
most basic ones are shown above: ``--cwl``, ``--id`` and ``--name``. The ``--cwl``
17-
flag simply tells Planemo to generate a Common Workflow Language tool. ``--id`` is
17+
flag tells Planemo to generate a Common Workflow Language tool. ``--id`` is
1818
a short identifier for this tool and it should be unique across all tools.
1919
``--name`` is a short, human-readable name for the tool - it corresponds
2020
to the ``label`` attribute in the CWL tool document.
@@ -93,7 +93,7 @@ In addition to the actual tool file, a test file will be generated
9393
using the example command and provided test data. The file contents are as
9494
follows:
9595

96-
.. literalinclude:: writing/seqtk_seq_tests_v3.yml
96+
.. literalinclude:: writing/seqtk_seq_v3_tests.yml
9797
:language: yaml
9898

9999
This file is a planemo-specific artifact. This file may contain 1 or more
@@ -103,16 +103,16 @@ the example command to build just one test.
103103
Each test consists of a few parts:
104104

105105
- ``doc`` - this attribute provides a short description for the test.
106-
- ``job`` - this can be the path to a CWL job description or a job
107-
description embedded right in the test (``tool_init`` builds the latter).
106+
- ``job`` - this can be the path to a CWL job description or a job
107+
description embedded right in the test (``tool_init`` builds the latter).
108108
- ``outputs`` - this section describes the expected output for a test. Each
109109
output ID of the tool or workflow under test can appear as a key. The
110110
example above just describes expected specific output file contents exactly
111111
but many more expectations can be described.
112112

113-
The tests described in this file can be run using the planemo ``test`` (or
114-
simply ``t``) command on the original file. By default, planemo will run tool
115-
tests with Galaxy but we can also specify the use of ``cwltool`` (the
113+
The tests described in this file can be run using the ``planemo t`` command
114+
on the original file. By default, planemo will run tool
115+
tests with Galaxy but we can also specify the use of ``cwltool`` (the
116116
reference implementation of CWL) which will be quicker and more robust
117117
while Galaxy support for the CWL is still in development.
118118

@@ -130,7 +130,7 @@ using the ``serve`` (or just ``s``) command.
130130
...
131131
serving on http://127.0.0.1:9090
132132

133-
Open up http://127.0.0.1:9090 in a web browser to view your new
133+
Open up http://127.0.0.1:9090 in a web browser to view your new
134134
tool.
135135

136136
For more information on the Common Workflow Language check out the Draft 3
