Merge pull request #41 from pkgw/pipeline-fixes
Some pipeline fixes for @astrodavid10
Commit
f083551de0
|
@ -78,7 +78,7 @@ and [PyPI](https://pypi.org/project/toasty/#history).
|
|||
- [pytest] to run the test suite
|
||||
- [PyYAML]
|
||||
- [tqdm]
|
||||
- [wwt_data_formats]
|
||||
- [wwt_data_formats] >= 0.7
|
||||
|
||||
[astropy]: https://www.astropy.org/
|
||||
[azure-storage-blob]: https://github.com/Azure/azure-sdk-for-python/tree/master/sdk/storage/azure-storage-blob
|
||||
|
|
|
@ -10,10 +10,12 @@ Image
|
|||
|
||||
.. autosummary::
|
||||
|
||||
~Image.default_format
|
||||
~Image.dtype
|
||||
~Image.height
|
||||
~Image.mode
|
||||
~Image.shape
|
||||
~Image.wcs
|
||||
~Image.width
|
||||
|
||||
.. rubric:: Methods Summary
|
||||
|
@ -32,10 +34,12 @@ Image
|
|||
|
||||
.. rubric:: Attributes Documentation
|
||||
|
||||
.. autoattribute:: default_format
|
||||
.. autoattribute:: dtype
|
||||
.. autoattribute:: height
|
||||
.. autoattribute:: mode
|
||||
.. autoattribute:: shape
|
||||
.. autoattribute:: wcs
|
||||
.. autoattribute:: width
|
||||
|
||||
.. rubric:: Methods Documentation
|
||||
|
|
|
@ -12,6 +12,7 @@ ImageMode
|
|||
|
||||
~ImageMode.F16x3
|
||||
~ImageMode.F32
|
||||
~ImageMode.F64
|
||||
~ImageMode.RGB
|
||||
~ImageMode.RGBA
|
||||
|
||||
|
@ -26,6 +27,7 @@ ImageMode
|
|||
|
||||
.. autoattribute:: F16x3
|
||||
.. autoattribute:: F32
|
||||
.. autoattribute:: F64
|
||||
.. autoattribute:: RGB
|
||||
.. autoattribute:: RGBA
|
||||
|
||||
|
|
|
@ -16,7 +16,10 @@ Usage
|
|||
toasty pipeline approve [--workdir=WORKDIR] {IMAGE-IDs...}
|
||||
|
||||
The ``IMAGE-IDs`` argument specifies one or more images by their unique
|
||||
identifiers.
|
||||
identifiers. You can specify exact IDs, or `glob patterns`_ as processed by the
|
||||
Python ``fnmatch`` module. See examples below.
|
||||
|
||||
.. _glob patterns: https://docs.python.org/3/library/fnmatch.html#module-fnmatch
|
||||
|
||||
The ``WORKDIR`` argument optionally specifies the location of the pipeline
|
||||
workspace directory. The default is the current directory.
|
||||
|
@ -26,12 +29,18 @@ Example
|
|||
=======
|
||||
|
||||
Before approving an image, it should be validated. First, check the astrometry
|
||||
with the help of ``wwtdatatool`` command:
|
||||
with the help of the ``wwtdatatool`` command. To check a group of images all at once,
|
||||
it can be convenient to merge the individual image files into a temporary index:
|
||||
|
||||
.. code-block:: shell
|
||||
|
||||
wwtdatatool serve processed/noao0201b/
|
||||
[open up http://localhost:8080/index.wtml in the webclient, review]
|
||||
wwtdatatool wtml merge processed/*/index_rel.wtml processed/index_rel.wtml
|
||||
wwtdatatool preview processed/index_rel.wtml
|
||||
|
||||
(Change the forward slashes to backslashes if you’re using Windows.) The first
|
||||
command merges the individual image WTMLs into a new file,
|
||||
``processed/index_rel.wtml``. The second command opens up this combined file in
|
||||
the WWT webclient, running an internal webserver to make the data available.
|
||||
|
||||
Next, get a metadata report and check for any issues:
|
||||
|
||||
|
@ -39,14 +48,27 @@ Next, get a metadata report and check for any issues:
|
|||
|
||||
wwtdatatool wtml report processed/noao0201b/index_rel.wtml
|
||||
|
||||
If everything is OK, the image may be approved:
|
||||
If everything is OK, you can mark the image as approved:
|
||||
|
||||
.. code-block:: shell
|
||||
|
||||
toasty pipeline approve noao0201b
|
||||
|
||||
You can use `glob patterns`_ to match image names. For instance,
|
||||
|
||||
.. code-block:: shell
|
||||
|
||||
toasty pipeline approve "vla*20" "?vlba"
|
||||
|
||||
will match every processed image whose identifier begins with ``vla`` and ends
|
||||
with ``20``, as well as those whose names are exactly five characters long and end
|
||||
with ``vlba``. You generally must make sure to encase glob arguments in
|
||||
quotation marks, as shown above, to prevent your shell from attempting to
|
||||
process them before Toasty gets a chance to.
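
As a rough illustration of how these patterns behave, the sketch below applies Python's ``fnmatch`` to a handful of image identifiers. The identifiers are invented for this example and are not taken from any real workspace; ``match_ids`` is a hypothetical helper, not part of Toasty.

```python
from fnmatch import fnmatch

# Hypothetical image identifiers, invented for this illustration.
image_ids = ["vla2020", "vla-m87-20", "vla2021", "evlba", "vlba", "xxvlba"]
patterns = ["vla*20", "?vlba"]


def match_ids(ids, pats):
    """Return the sorted IDs that match at least one pattern."""
    return sorted(i for i in ids if any(fnmatch(i, p) for p in pats))


matched = match_ids(image_ids, patterns)
print(matched)
```

Because ``?`` matches exactly one character, ``?vlba`` accepts ``evlba`` but rejects both ``vlba`` and ``xxvlba``. Note that ``fnmatch`` normalizes case on platforms with case-insensitive filenames, so results for mixed-case names can differ between operating systems.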
|
||||
|
||||
After approval of a batch of images, the next step is to :ref:`cli-pipeline-publish`.
|
||||
|
||||
|
||||
Notes
|
||||
=====
|
||||
|
||||
|
|
|
@ -16,7 +16,10 @@ Usage
|
|||
toasty pipeline fetch [--workdir=WORKDIR] {IMAGE-IDs...}
|
||||
|
||||
The ``IMAGE-IDs`` argument specifies one or more images by their unique
|
||||
identifiers.
|
||||
identifiers. You can specify exact IDs, or `glob patterns`_ as processed by the
|
||||
Python ``fnmatch`` module. See examples below.
|
||||
|
||||
.. _glob patterns: https://docs.python.org/3/library/fnmatch.html#module-fnmatch
|
||||
|
||||
The ``WORKDIR`` argument optionally specifies the location of the pipeline
|
||||
workspace directory. The default is the current directory.
|
||||
|
@ -34,15 +37,37 @@ Fetch two images:
|
|||
After fetching, the next step is to :ref:`cli-pipeline-process-todos`.
|
||||
|
||||
|
||||
Example
|
||||
=======
|
||||
|
||||
You can use `glob patterns`_ to match candidate names. For instance,
|
||||
|
||||
.. code-block:: shell
|
||||
|
||||
toasty pipeline fetch "rubin-*" "soar?"
|
||||
|
||||
will match every candidate whose name begins with ``rubin-``, as well as those
|
||||
whose names are exactly five letters long and start with ``soar``. You generally
|
||||
must make sure to encase glob arguments in quotation marks, as shown above, to
|
||||
prevent your shell from attempting to process them before Toasty gets a chance
|
||||
to.
|
||||
|
||||
|
||||
Notes
|
||||
=====
|
||||
|
||||
Candidate names may be found by looking at the filenames contained in the
|
||||
``candidates`` subdirectory of your workspace.
|
||||
|
||||
For each candidate that is successfully fetched, a sub-subdirectory is created
|
||||
in the ``cache_todo`` subdirectory with a name corresponding to the unique
|
||||
candidate ID.
|
||||
During the fetch process, the candidates are analyzed. Some of them may be
|
||||
deemed “not actionable” — a common reason being that an image may not have
|
||||
sufficient astrometric information attached for it to be placed on the sky as
|
||||
WWT requires. Such candidates will be discarded, with their information files
|
||||
moved into the ``rejects`` subdirectory.
|
||||
|
||||
For each candidate that is successfully fetched and validated, a
|
||||
sub-subdirectory is created in the ``cache_todo`` subdirectory with a name
|
||||
corresponding to the unique candidate ID.
|
||||
|
||||
|
||||
See Also
|
||||
|
|
|
@ -39,13 +39,13 @@ command-line program.
|
|||
Configuration
|
||||
=============
|
||||
|
||||
The root of the *destionation* data repository should contain a configuration
|
||||
The root of the *destination* data repository should contain a configuration
|
||||
file named ``toasty-pipeline-config.yaml``. Once a pipeline workflow is set up,
|
||||
you shouldn’t need to worry about this file. But to get a new pipeline going,
|
||||
you need to create it and then place it in your data destination.
|
||||
|
||||
As implied, this file contains structured data in the `YAML
|
||||
<https://yaml.org/>`_ format. An example is:
|
||||
This file contains structured data in the `YAML <https://yaml.org/>`_ format. An
|
||||
example is:
|
||||
|
||||
.. code-block:: YAML
|
||||
|
||||
|
@ -72,7 +72,7 @@ Djangoplicity Data Source
|
|||
Currently, the only functional ``source_type`` is ``djangoplicity``, which
|
||||
downloads and parses an imagery feed from a website powered by the
|
||||
`Djangoplicity <https://github.com/djangoplicity/djangoplicity>`_ gallery
|
||||
system. An example is the `ESO Hubble gallery
|
||||
system. An example is the `ESA Hubble gallery
|
||||
<https://spacetelescope.org/images/>`_.
|
||||
|
||||
When using the ``djangoplicity`` data source, the ``toasty-pipeline-config.yaml``
|
||||
|
|
2
setup.py
|
@ -78,7 +78,7 @@ setup_args = dict(
|
|||
'pillow>=7.0',
|
||||
'PyYAML>=5.0',
|
||||
'tqdm>=4.0',
|
||||
'wwt_data_formats>=0.2.0',
|
||||
'wwt_data_formats>=0.7.0',
|
||||
],
|
||||
|
||||
extras_require = {
|
||||
|
|
|
@ -12,6 +12,8 @@ pipeline_impl
|
|||
'''.split()
|
||||
|
||||
import argparse
|
||||
from fnmatch import fnmatch
|
||||
import glob
|
||||
import os.path
|
||||
import sys
|
||||
|
||||
|
@ -19,6 +21,34 @@ from ..cli import die, warn
|
|||
from . import NotActionableError
|
||||
|
||||
|
||||
def evaluate_imageid_args(searchdir, args):
    """
    Figure out which image IDs to process.
    """

    matched_ids = set()
    globs_todo = set()

    for arg in args:
        if glob.has_magic(arg):
            globs_todo.add(arg)
        else:
            # If an ID is explicitly (non-globbily) added, always add it to the
            # list, without checking if it exists in `searchdir`. We could check
            # for it in searchdir now, but we'll have to check later anyway, so
            # we don't bother.
            matched_ids.add(arg)

    if len(globs_todo):
        for filename in os.listdir(searchdir):
            for g in globs_todo:
                if fnmatch(filename, g):
                    matched_ids.add(filename)
                    break

    return sorted(matched_ids)
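
To see how the helper treats explicit IDs versus glob arguments, here is a minimal standalone sketch. It re-implements the function outside of Toasty so that it can run on its own; the directory names and the ``demo`` function are invented for illustration. It relies on ``glob.has_magic``, which exists in CPython's ``glob`` module but is not part of its documented API.

```python
import glob
import os
import tempfile
from fnmatch import fnmatch


def evaluate_imageid_args(searchdir, args):
    """Standalone copy of the helper in the diff, for experimentation."""
    matched_ids = set()
    globs_todo = set()

    for arg in args:
        if glob.has_magic(arg):  # undocumented helper in CPython's glob module
            globs_todo.add(arg)
        else:
            # Explicit IDs are kept even if nothing by that name exists.
            matched_ids.add(arg)

    if globs_todo:
        for filename in os.listdir(searchdir):
            if any(fnmatch(filename, g) for g in globs_todo):
                matched_ids.add(filename)

    return sorted(matched_ids)


def demo():
    """Build a throwaway directory tree and evaluate a mix of arguments."""
    with tempfile.TemporaryDirectory() as searchdir:
        for name in ("fake_test1", "fake_test2", "other"):
            os.mkdir(os.path.join(searchdir, name))
        return evaluate_imageid_args(
            searchdir, ["missing_id", "fake_test?", "*nomatchisok*"]
        )


result = demo()
print(result)
```

The explicit ``missing_id`` survives even though no such entry exists, so the caller still has to check for it later, while a glob that matches nothing (``*nomatchisok*``) silently contributes no IDs. This mirrors the arguments added to the test suite in this commit.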
|
||||
|
||||
|
||||
# The "approve" subcommand
|
||||
|
||||
def approve_setup_parser(parser):
|
||||
|
@ -31,8 +61,8 @@ def approve_setup_parser(parser):
|
|||
parser.add_argument(
|
||||
'cand_ids',
|
||||
nargs = '+',
|
||||
metavar = 'CAND-ID',
|
||||
help = 'Name(s) of candidate(s) to approve and prepare for processing'
|
||||
metavar = 'IMAGE-ID',
|
||||
help = 'Name(s) of image(s) to approve for publication (globs accepted)'
|
||||
)
|
||||
|
||||
|
||||
|
@ -51,7 +81,7 @@ def approve_impl(settings):
|
|||
proc_dir = mgr._ensure_dir('processed')
|
||||
app_dir = mgr._ensure_dir('approved')
|
||||
|
||||
for cid in settings.cand_ids:
|
||||
for cid in evaluate_imageid_args(proc_dir, settings.cand_ids):
|
||||
if not os.path.isdir(os.path.join(proc_dir, cid)):
|
||||
die(f'no such processed candidate ID {cid!r}')
|
||||
|
||||
|
@ -90,7 +120,7 @@ def fetch_setup_parser(parser):
|
|||
'cand_ids',
|
||||
nargs = '+',
|
||||
metavar = 'CAND-ID',
|
||||
help = 'Name(s) of candidate(s) to fetch and prepare for processing'
|
||||
help = 'Name(s) of candidate(s) to fetch and prepare for processing (globs accepted)'
|
||||
)
|
||||
|
||||
|
||||
|
@ -102,25 +132,27 @@ def fetch_impl(settings):
|
|||
rej_dir = mgr._ensure_dir('rejects')
|
||||
src = mgr.get_image_source()
|
||||
|
||||
for cid in settings.cand_ids:
|
||||
for cid in evaluate_imageid_args(cand_dir, settings.cand_ids):
|
||||
# Funky structure here is to try to ensure that cdata is closed in case
|
||||
# a NotActionable happens, so that we can move the directory on Windows.
|
||||
try:
|
||||
cdata = open(os.path.join(cand_dir, cid), 'rb')
|
||||
except FileNotFoundError:
|
||||
die(f'no such candidate ID {cid!r}')
|
||||
try:
|
||||
cdata = open(os.path.join(cand_dir, cid), 'rb')
|
||||
except FileNotFoundError:
|
||||
die(f'no such candidate ID {cid!r}')
|
||||
|
||||
print(f'fetching {cid} ... ', end='')
|
||||
sys.stdout.flush()
|
||||
|
||||
try:
|
||||
cachedir = mgr._ensure_dir('cache_todo', cid)
|
||||
src.fetch_candidate(cid, cdata, cachedir)
|
||||
print('done')
|
||||
try:
|
||||
print(f'fetching {cid} ... ', end='')
|
||||
sys.stdout.flush()
|
||||
cachedir = mgr._ensure_dir('cache_todo', cid)
|
||||
src.fetch_candidate(cid, cdata, cachedir)
|
||||
print('done')
|
||||
finally:
|
||||
cdata.close()
|
||||
except NotActionableError:
|
||||
print('not usable')
|
||||
os.rename(os.path.join(cand_dir, cid), os.path.join(rej_dir, cid))
|
||||
os.rmdir(cachedir)
|
||||
finally:
|
||||
cdata.close()
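
The restructured control flow can be sketched in isolation as follows. This is a simplified illustration under stated assumptions, not the Toasty implementation: ``fetch_one``, ``process``, and ``demo`` are hypothetical names, and ``NotActionableError`` is redefined locally as a stand-in for the exception imported in the diff. The point is that the inner ``finally`` closes the file before the outer ``except`` runs, which matters on Windows, where an open file generally cannot be renamed.

```python
import os
import tempfile


class NotActionableError(Exception):
    """Local stand-in for the exception imported in the diff."""


def fetch_one(cand_dir, rej_dir, cid, process):
    """Process one candidate; move it to rej_dir if it is not actionable.

    `process` is a hypothetical callable that receives the open file object.
    Returns True on success and False if the candidate was rejected.
    """
    try:
        cdata = open(os.path.join(cand_dir, cid), 'rb')
        try:
            process(cdata)
        finally:
            # Runs before the outer `except` handler, so the handle is
            # already closed by the time the file is renamed.
            cdata.close()
    except NotActionableError:
        os.rename(os.path.join(cand_dir, cid), os.path.join(rej_dir, cid))
        return False
    return True


def demo():
    """Reject a single made-up candidate and report where it ended up."""
    def reject(fileobj):
        raise NotActionableError('no astrometry')

    with tempfile.TemporaryDirectory() as work:
        cand_dir = os.path.join(work, 'candidates')
        rej_dir = os.path.join(work, 'rejects')
        os.mkdir(cand_dir)
        os.mkdir(rej_dir)
        with open(os.path.join(cand_dir, 'cand1'), 'wb') as f:
            f.write(b'data')
        ok = fetch_one(cand_dir, rej_dir, 'cand1', reject)
        return ok, os.listdir(cand_dir), os.listdir(rej_dir)


outcome = demo()
print(outcome)
```

In the previous layout, ``finally`` was attached to the same ``try`` as the ``except`` clause, so the rename was attempted while the file was still open.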
|
||||
|
||||
|
||||
# The "init" subcommand
|
||||
|
|
|
@ -89,7 +89,7 @@ class TestPipeline(object):
|
|||
args = [
|
||||
'pipeline', 'fetch',
|
||||
'--workdir', self.work_path('work'),
|
||||
'fake_test1',
|
||||
'fake_test1', '*nomatchisok*',
|
||||
]
|
||||
cli.entrypoint(args)
|
||||
|
||||
|
@ -102,7 +102,7 @@ class TestPipeline(object):
|
|||
args = [
|
||||
'pipeline', 'approve',
|
||||
'--workdir', self.work_path('work'),
|
||||
'fake_test1',
|
||||
'fake_test1', 'fake_test?',
|
||||
]
|
||||
cli.entrypoint(args)
|
||||