Merge pull request #41 from pkgw/pipeline-fixes

Some pipeline fixes for @astrodavid10
Peter Williams 2020-12-08 22:18:28 -05:00 committed by GitHub
Parents cd1bf576a2 483b1021a7
Commit f083551de0
No known key found for this signature
GPG key ID: 4AEE18F83AFDEB23
9 changed files with 119 additions and 34 deletions

View file

@ -78,7 +78,7 @@ and [PyPI](https://pypi.org/project/toasty/#history).
- [pytest] to run the test suite
- [PyYAML]
- [tqdm]
- [wwt_data_formats]
- [wwt_data_formats] >= 0.7
[astropy]: https://www.astropy.org/
[azure-storage-blob]: https://github.com/Azure/azure-sdk-for-python/tree/master/sdk/storage/azure-storage-blob

View file

@ -10,10 +10,12 @@ Image
.. autosummary::
~Image.default_format
~Image.dtype
~Image.height
~Image.mode
~Image.shape
~Image.wcs
~Image.width
.. rubric:: Methods Summary
@ -32,10 +34,12 @@ Image
.. rubric:: Attributes Documentation
.. autoattribute:: default_format
.. autoattribute:: dtype
.. autoattribute:: height
.. autoattribute:: mode
.. autoattribute:: shape
.. autoattribute:: wcs
.. autoattribute:: width
.. rubric:: Methods Documentation

View file

@ -12,6 +12,7 @@ ImageMode
~ImageMode.F16x3
~ImageMode.F32
~ImageMode.F64
~ImageMode.RGB
~ImageMode.RGBA
@ -26,6 +27,7 @@ ImageMode
.. autoattribute:: F16x3
.. autoattribute:: F32
.. autoattribute:: F64
.. autoattribute:: RGB
.. autoattribute:: RGBA

View file

@ -16,7 +16,10 @@ Usage
toasty pipeline approve [--workdir=WORKDIR] {IMAGE-IDs...}
The ``IMAGE-IDs`` argument specifies one or more images by their unique
identifiers.
identifiers. You can specify exact IDs, or `glob patterns`_ as processed by the
Python ``fnmatch`` module. See examples below.
.. _glob patterns: https://docs.python.org/3/library/fnmatch.html#module-fnmatch
The ``WORKDIR`` argument optionally specifies the location of the pipeline
workspace directory. The default is the current directory.
@ -26,12 +29,18 @@ Example
=======
Before approving an image, it should be validated. First, check the astrometry
with the help of ``wwtdatatool`` command:
with the help of the ``wwtdatatool`` command. To check a group of images all at once,
it can be convenient to merge the individual image files into a temporary index:
.. code-block:: shell
wwtdatatool serve processed/noao0201b/
[open up http://localhost:8080/index.wtml in the webclient, review]
wwtdatatool wtml merge processed/*/index_rel.wtml processed/index_rel.wtml
wwtdatatool preview processed/index_rel.wtml
(Change the forward slashes to backslashes if you're using Windows.) The first
command merges the individual image WTMLs into a new file,
``processed/index_rel.wtml``. The second command opens up this combined file in
the WWT webclient, running an internal webserver to make the data available.
Next, get a metadata report and check for any issues:
@ -39,14 +48,27 @@ Next, get a metadata report and check for any issues:
wwtdatatool wtml report processed/noao0201b/index_rel.wtml
If everything is OK, the image may be approved:
If everything is OK, you can mark the image as approved:
.. code-block:: shell
toasty pipeline approve noao0201b
You can use `glob patterns`_ to match image names. For instance,
.. code-block:: shell
toasty pipeline approve "vla*20" "?vlba"
will match every processed image whose identifier begins with ``vla`` and ends
with ``20``, as well as those whose names are exactly five characters long and end
with ``vlba``. You generally must make sure to encase glob arguments in
quotation marks, as shown above, to prevent your shell from attempting to
process them before Toasty gets a chance to.
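The matching rules can be checked directly with the Python ``fnmatch`` module. The following is a minimal sketch, not Toasty's own code; the image identifiers are hypothetical and chosen only to show which names the two patterns accept. Note that ``?`` stands for exactly one character, so ``vlba`` on its own is not matched by ``?vlba``.

```python
from fnmatch import fnmatch

# Hypothetical image identifiers, invented for illustration only.
image_ids = ["vla2020", "vla-m87-20", "avlba", "xvlba", "vlba", "noao0201b"]

# The two patterns from the example command above.
patterns = ["vla*20", "?vlba"]

# Keep every identifier that matches at least one pattern.
matched = sorted(i for i in image_ids if any(fnmatch(i, p) for p in patterns))

print(matched)  # ['avlba', 'vla-m87-20', 'vla2020', 'xvlba']
```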
After approval of a batch of images, the next step is to :ref:`cli-pipeline-publish`.
Notes
=====

View file

@ -16,7 +16,10 @@ Usage
toasty pipeline fetch [--workdir=WORKDIR] {IMAGE-IDs...}
The ``IMAGE-IDs`` argument specifies one or more images by their unique
identifiers.
identifiers. You can specify exact IDs, or `glob patterns`_ as processed by the
Python ``fnmatch`` module. See examples below.
.. _glob patterns: https://docs.python.org/3/library/fnmatch.html#module-fnmatch
The ``WORKDIR`` argument optionally specifies the location of the pipeline
workspace directory. The default is the current directory.
@ -34,15 +37,37 @@ Fetch two images:
After fetching, the next step is to :ref:`cli-pipeline-process-todos`.
Example
=======
You can use `glob patterns`_ to match candidate names. For instance,
.. code-block:: shell
toasty pipeline fetch "rubin-*" "soar?"
will match every candidate whose name begins with ``rubin-``, as well as those
whose names are exactly five letters long and start with ``soar``. You generally
must make sure to encase glob arguments in quotation marks, as shown above, to
prevent your shell from attempting to process them before Toasty gets a chance
to.
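To see why the quotation marks matter, the sketch below imitates what a POSIX shell does with an unquoted glob: it expands the pattern against the files in the current directory, which usually have nothing to do with candidate names. This is a rough simulation under stated assumptions (default bash behavior, where a pattern that matches nothing is passed through unchanged); the helper ``simulate_shell_expansion`` and the file name are hypothetical.

```python
import fnmatch
import os
import tempfile


def simulate_shell_expansion(pattern, cwd):
    """Roughly mimic how a shell treats an unquoted glob argument."""
    matches = sorted(fnmatch.filter(os.listdir(cwd), pattern))
    # Assumption: default bash settings, where a pattern that matches
    # nothing is handed to the program unchanged.
    return matches if matches else [pattern]


with tempfile.TemporaryDirectory() as cwd:
    # A stray file in the current directory that happens to match.
    open(os.path.join(cwd, "rubin-notes.txt"), "w").close()

    # Unquoted: the shell replaces the pattern before Toasty sees it.
    unquoted = simulate_shell_expansion("rubin-*", cwd)

# Quoted: the pattern reaches Toasty intact.
quoted = ["rubin-*"]

print(unquoted)  # ['rubin-notes.txt']
print(quoted)    # ['rubin-*']
```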
Notes
=====
Candidate names may be found by looking at the filenames contained in the
``candidates`` subdirectory of your workspace.
For each candidate that is successfully fetched, a sub-subdirectory is created
in the ``cache_todo`` subdirectory with a name corresponding to the unique
candidate ID.
During the fetch process, the candidates are analyzed. Some of them may be
deemed “not actionable” — a common reason being that an image may not have
sufficient astrometric information attached for it to be placed on the sky as
WWT requires. Such candidates will be discarded, with their information files
moved into the ``rejects`` subdirectory.
For each candidate that is successfully fetched and validated, a
sub-subdirectory is created in the ``cache_todo`` subdirectory with a name
corresponding to the unique candidate ID.
See Also

View file

@ -39,13 +39,13 @@ command-line program.
Configuration
=============
The root of the *destionation* data repository should contain a configuration
The root of the *destination* data repository should contain a configuration
file named ``toasty-pipeline-config.yaml``. Once a pipeline workflow is set up,
you shouldn't need to worry about this file. But to get a new pipeline going,
you need to create it and then place it in your data destination.
As implied, this file contains structured data in the `YAML
<https://yaml.org/>`_ format. An example is:
This file contains structured data in the `YAML <https://yaml.org/>`_ format. An
example is:
.. code-block:: YAML
@ -72,7 +72,7 @@ Djangoplicity Data Source
Currently, the only functional ``source_type`` is ``djangoplicity``, which
downloads and parses an imagery feed from a website powered by the
`Djangoplicity <https://github.com/djangoplicity/djangoplicity>`_ gallery
system. An example is the `ESO Hubble gallery
system. An example is the `ESA Hubble gallery
<https://spacetelescope.org/images/>`_.
When using the ``djangoplicity`` data source, the ``toasty-pipeline-config.yaml``

View file

@ -78,7 +78,7 @@ setup_args = dict(
'pillow>=7.0',
'PyYAML>=5.0',
'tqdm>=4.0',
'wwt_data_formats>=0.2.0',
'wwt_data_formats>=0.7.0',
],
extras_require = {

View file

@ -12,6 +12,8 @@ pipeline_impl
'''.split()
import argparse
from fnmatch import fnmatch
import glob
import os.path
import sys
@ -19,6 +21,34 @@ from ..cli import die, warn
from . import NotActionableError
def evaluate_imageid_args(searchdir, args):
"""
Figure out which image-ID's to process.
"""
matched_ids = set()
globs_todo = set()
for arg in args:
if glob.has_magic(arg):
globs_todo.add(arg)
else:
# If an ID is explicitly (non-globbily) added, always add it to the
# list, without checking if it exists in `searchdir`. We could check
# for it in searchdir now, but we'll have to check later anyway, so
# we don't bother.
matched_ids.add(arg)
if len(globs_todo):
for filename in os.listdir(searchdir):
for g in globs_todo:
if fnmatch(filename, g):
matched_ids.add(filename)
break
return sorted(matched_ids)
# The "approve" subcommand
def approve_setup_parser(parser):
@ -31,8 +61,8 @@ def approve_setup_parser(parser):
parser.add_argument(
'cand_ids',
nargs = '+',
metavar = 'CAND-ID',
help = 'Name(s) of candidate(s) to approve and prepare for processing'
metavar = 'IMAGE-ID',
help = 'Name(s) of image(s) to approve for publication (globs accepted)'
)
@ -51,7 +81,7 @@ def approve_impl(settings):
proc_dir = mgr._ensure_dir('processed')
app_dir = mgr._ensure_dir('approved')
for cid in settings.cand_ids:
for cid in evaluate_imageid_args(proc_dir, settings.cand_ids):
if not os.path.isdir(os.path.join(proc_dir, cid)):
die(f'no such processed candidate ID {cid!r}')
@ -90,7 +120,7 @@ def fetch_setup_parser(parser):
'cand_ids',
nargs = '+',
metavar = 'CAND-ID',
help = 'Name(s) of candidate(s) to fetch and prepare for processing'
help = 'Name(s) of candidate(s) to fetch and prepare for processing (globs accepted)'
)
@ -102,25 +132,27 @@ def fetch_impl(settings):
rej_dir = mgr._ensure_dir('rejects')
src = mgr.get_image_source()
for cid in settings.cand_ids:
for cid in evaluate_imageid_args(cand_dir, settings.cand_ids):
# Funky structure here is to try to ensure that cdata is closed in case
# a NotActionable happens, so that we can move the directory on Windows.
try:
cdata = open(os.path.join(cand_dir, cid), 'rb')
except FileNotFoundError:
die(f'no such candidate ID {cid!r}')
try:
cdata = open(os.path.join(cand_dir, cid), 'rb')
except FileNotFoundError:
die(f'no such candidate ID {cid!r}')
print(f'fetching {cid} ... ', end='')
sys.stdout.flush()
try:
cachedir = mgr._ensure_dir('cache_todo', cid)
src.fetch_candidate(cid, cdata, cachedir)
print('done')
try:
print(f'fetching {cid} ... ', end='')
sys.stdout.flush()
cachedir = mgr._ensure_dir('cache_todo', cid)
src.fetch_candidate(cid, cdata, cachedir)
print('done')
finally:
cdata.close()
except NotActionableError:
print('not usable')
os.rename(os.path.join(cand_dir, cid), os.path.join(rej_dir, cid))
os.rmdir(cachedir)
finally:
cdata.close()
# The "init" subcommand

View file

@ -89,7 +89,7 @@ class TestPipeline(object):
args = [
'pipeline', 'fetch',
'--workdir', self.work_path('work'),
'fake_test1',
'fake_test1', '*nomatchisok*',
]
cli.entrypoint(args)
@ -102,7 +102,7 @@ class TestPipeline(object):
args = [
'pipeline', 'approve',
'--workdir', self.work_path('work'),
'fake_test1',
'fake_test1', 'fake_test?',
]
cli.entrypoint(args)
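The two test changes above exercise the new argument handling: a glob that matches nothing is not an error, and an ID named both explicitly and through a glob is processed once. The following standalone sketch reproduces that behavior outside of Toasty. It is an approximation of ``evaluate_imageid_args``, not the project's code: the name ``evaluate_ids``, the regular expression used to detect glob characters, and the ``fake_test2`` entry are assumptions made for illustration.

```python
import os
import re
import tempfile
from fnmatch import fnmatch

# Stand-in for the glob-detection check; an assumption for this sketch.
_MAGIC = re.compile(r"[*?\[]")


def evaluate_ids(searchdir, args):
    """Return the sorted, de-duplicated IDs selected by `args`."""
    matched = set()
    globs = set()

    for arg in args:
        if _MAGIC.search(arg):
            globs.add(arg)
        else:
            # Explicit IDs are kept without checking that they exist;
            # the caller is expected to check later.
            matched.add(arg)

    if globs:
        for name in os.listdir(searchdir):
            if any(fnmatch(name, g) for g in globs):
                matched.add(name)

    return sorted(matched)


with tempfile.TemporaryDirectory() as d:
    for name in ("fake_test1", "fake_test2"):
        open(os.path.join(d, name), "w").close()

    # Mirrors the fetch test: the glob matches nothing, which is fine.
    fetch_like = evaluate_ids(d, ["fake_test1", "*nomatchisok*"])

    # Mirrors the approve test: the explicit ID also matches the glob.
    approve_like = evaluate_ids(d, ["fake_test1", "fake_test?"])

print(fetch_like)    # ['fake_test1']
print(approve_like)  # ['fake_test1', 'fake_test2']
```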