Merge pull request #41 from pkgw/pipeline-fixes

Some pipeline fixes for @astrodavid10
2020-12-08 22:18:28 -05:00 · 2020-12-08 22:18:28 -05:00 · f083551de0
--- a/README.md
+++ b/README.md
@@ -78,7 +78,7 @@ and [PyPI](https://pypi.org/project/toasty/#history).
 - [pytest] to run the test suite
 - [PyYAML]
 - [tqdm]
- [wwt_data_formats]
+- [wwt_data_formats] >= 0.7

 [astropy]: https://www.astropy.org/
 [azure-storage-blob]: https://github.com/Azure/azure-sdk-for-python/tree/master/sdk/storage/azure-storage-blob
--- a/docs/api/toasty.image.Image.rst
+++ b/docs/api/toasty.image.Image.rst
@@ -10,10 +10,12 @@ Image

   .. autosummary::

+      ~Image.default_format
      ~Image.dtype
      ~Image.height
      ~Image.mode
      ~Image.shape
+      ~Image.wcs
      ~Image.width

   .. rubric:: Methods Summary
@@ -32,10 +34,12 @@ Image

   .. rubric:: Attributes Documentation

+   .. autoattribute:: default_format
   .. autoattribute:: dtype
   .. autoattribute:: height
   .. autoattribute:: mode
   .. autoattribute:: shape
+   .. autoattribute:: wcs
   .. autoattribute:: width

   .. rubric:: Methods Documentation
--- a/docs/api/toasty.image.ImageMode.rst
+++ b/docs/api/toasty.image.ImageMode.rst
@@ -12,6 +12,7 @@ ImageMode

      ~ImageMode.F16x3
      ~ImageMode.F32
+      ~ImageMode.F64
      ~ImageMode.RGB
      ~ImageMode.RGBA

@@ -26,6 +27,7 @@ ImageMode

   .. autoattribute:: F16x3
   .. autoattribute:: F32
+   .. autoattribute:: F64
   .. autoattribute:: RGB
   .. autoattribute:: RGBA

--- a/docs/cli/pipeline-approve.rst
+++ b/docs/cli/pipeline-approve.rst
@@ -16,7 +16,10 @@ Usage
   toasty pipeline approve [--workdir=WORKDIR] {IMAGE-IDs...}

 The ``IMAGE-IDs`` argument specifies one or more images by their unique
-identifiers.
+identifiers. You can specify exact IDs, or `glob patterns`_ as processed by the
+Python ``fnmatch`` module. See examples below.
+
+.. _glob patterns: https://docs.python.org/3/library/fnmatch.html#module-fnmatch

 The ``WORKDIR`` argument optionally specifies the location of the pipeline
 workspace directory. The default is the current directory.
@@ -26,12 +29,18 @@ Example
 =======

 Before approving an image, it should be validated. First, check the astrometry
-with the help of ``wwtdatatool`` command:
+with the help of the ``wwtdatatool`` command. To check a group of images all at once,
+it can be convenient to merge the individual image files into a temporary index:

 .. code-block:: shell

-   wwtdatatool serve processed/noao0201b/
-   [open up http://localhost:8080/index.wtml in the webclient, review]
+   wwtdatatool wtml merge processed/*/index_rel.wtml processed/index_rel.wtml
+   wwtdatatool preview processed/index_rel.wtml
+
+(Change the forward slashes to backslashes if you’re using Windows.) The first
+command merges the individual image WTMLs into a new file,
+``processed/index_rel.wtml``. The second command opens up this combined file in
+the WWT webclient, running an internal webserver to make the data available.

 Next, get a metadata report and check for any issues:

@@ -39,14 +48,27 @@ Next, get a metadata report and check for any issues:

   wwtdatatool wtml report processed/noao0201b/index_rel.wtml

-If everything is OK, the image may be approved:
+If everything is OK, you can mark the image as approved:

 .. code-block:: shell

   toasty pipeline approve noao0201b

+You can use `glob patterns`_ to match image names. For instance,
+
+.. code-block:: shell
+
+   toasty pipeline approve "vla*20" "?vlba"
+
+will match every processed image whose identifier begins with ``vla`` and ends
+with ``20``, as well as those whose names are exactly five characters long and end
+with ``vlba``. You generally must make sure to encase glob arguments in
+quotation marks, as shown above, to prevent your shell from attempting to
+process them before Toasty gets a chance to.
+
 After approval of a batch of images, the next step is to :ref:`cli-pipeline-publish`.

+
 Notes
 =====

--- a/docs/cli/pipeline-fetch.rst
+++ b/docs/cli/pipeline-fetch.rst
@@ -16,7 +16,10 @@ Usage
   toasty pipeline fetch [--workdir=WORKDIR] {IMAGE-IDs...}

 The ``IMAGE-IDs`` argument specifies one or more images by their unique
-identifiers.
+identifiers. You can specify exact IDs, or `glob patterns`_ as processed by the
+Python ``fnmatch`` module. See examples below.
+
+.. _glob patterns: https://docs.python.org/3/library/fnmatch.html#module-fnmatch

 The ``WORKDIR`` argument optionally specifies the location of the pipeline
 workspace directory. The default is the current directory.
@@ -34,15 +37,37 @@ Fetch two images:
 After fetching, the next step is to :ref:`cli-pipeline-process-todos`.


+Example
+=======
+
+You can use `glob patterns`_ to match candidate names. For instance,
+
+.. code-block:: shell
+
+   toasty pipeline fetch "rubin-*" "soar?"
+
+will match every candidate whose name begins with ``rubin-``, as well as those
+whose names are exactly five letters long and start with ``soar``. You generally
+must make sure to encase glob arguments in quotation marks, as shown above, to
+prevent your shell from attempting to process them before Toasty gets a chance
+to.
+
+
 Notes
 =====

 Candidate names may be found by looking at the filenames contained in the
 ``candidates`` subdirectory of your workspace.

-For each candidate that is successfully fetched, a sub-subdirectory is created
-in the ``cache_todo`` subdirectory with a name corresponding to the unique
-candidate ID.
+During the fetch process, the candidates are analyzed. Some of them may be
+deemed “not actionable” — a common reason being that an image may not have
+sufficient astrometric information attached for it to be placed on the sky as
+WWT requires. Such candidates will be discarded, with their information files
+moved into the ``rejects`` subdirectory.
+
+For each candidate that is successfully fetched and validated, a
+sub-subdirectory is created in the ``cache_todo`` subdirectory with a name
+corresponding to the unique candidate ID.


 See Also
--- a/docs/pipeline.rst
+++ b/docs/pipeline.rst
@@ -39,13 +39,13 @@ command-line program.
 Configuration
 =============

-The root of the *destionation* data repository should contain a configuration
+The root of the *destination* data repository should contain a configuration
 file named ``toasty-pipeline-config.yaml``. Once a pipeline workflow is set up,
 you shouldn’t need to worry about this file. But to get a new pipeline going,
 you need to create it and then place it in your data destination.

-As implied, this file contains structured data in the `YAML
-<https://yaml.org/>`_ format. An example is:
+This file contains structured data in the `YAML <https://yaml.org/>`_ format. An
+example is:

 .. code-block:: YAML

@@ -72,7 +72,7 @@ Djangoplicity Data Source
 Currently, the only functional ``source_type`` is ``djangoplicity``, which
 downloads and parses an imagery feed from a website powered by the
 `Djangoplicity <https://github.com/djangoplicity/djangoplicity>`_ gallery
-system. An example is the `ESO Hubble gallery
+system. An example is the `ESA Hubble gallery
 <https://spacetelescope.org/images/>`_.

 When using the ``djangoplicity`` data source, the ``toasty-pipeline-config.yaml``
--- a/setup.py
+++ b/setup.py
@@ -78,7 +78,7 @@ setup_args = dict(
        'pillow>=7.0',
        'PyYAML>=5.0',
        'tqdm>=4.0',
-        'wwt_data_formats>=0.2.0',
+        'wwt_data_formats>=0.7.0',
    ],

    extras_require = {
--- a/toasty/pipeline/cli.py
+++ b/toasty/pipeline/cli.py
@@ -12,6 +12,8 @@ pipeline_impl
 '''.split()

 import argparse
+from fnmatch import fnmatch
+import glob
 import os.path
 import sys

@@ -19,6 +21,34 @@ from ..cli import die, warn
 from . import NotActionableError


+def evaluate_imageid_args(searchdir, args):
+    """
+    Figure out which image IDs to process.
+    """
+
+    matched_ids = set()
+    globs_todo = set()
+
+    for arg in args:
+        if glob.has_magic(arg):
+            globs_todo.add(arg)
+        else:
+            # If an ID is explicitly (non-globbily) added, always add it to the
+            # list, without checking if it exists in `searchdir`. We could check
+            # for it in searchdir now, but we'll have to check later anyway, so
+            # we don't bother.
+            matched_ids.add(arg)
+
+    if len(globs_todo):
+        for filename in os.listdir(searchdir):
+            for g in globs_todo:
+                if fnmatch(filename, g):
+                    matched_ids.add(filename)
+                    break
+
+    return sorted(matched_ids)
+
+
 # The "approve" subcommand

 def approve_setup_parser(parser):
@@ -31,8 +61,8 @@ def approve_setup_parser(parser):
    parser.add_argument(
        'cand_ids',
        nargs = '+',
-        metavar = 'CAND-ID',
-        help = 'Name(s) of candidate(s) to approve and prepare for processing'
+        metavar = 'IMAGE-ID',
+        help = 'Name(s) of image(s) to approve for publication (globs accepted)'
    )


@@ -51,7 +81,7 @@ def approve_impl(settings):
    proc_dir = mgr._ensure_dir('processed')
    app_dir = mgr._ensure_dir('approved')

-    for cid in settings.cand_ids:
+    for cid in evaluate_imageid_args(proc_dir, settings.cand_ids):
        if not os.path.isdir(os.path.join(proc_dir, cid)):
            die(f'no such processed candidate ID {cid!r}')

@@ -90,7 +120,7 @@ def fetch_setup_parser(parser):
        'cand_ids',
        nargs = '+',
        metavar = 'CAND-ID',
-        help = 'Name(s) of candidate(s) to fetch and prepare for processing'
+        help = 'Name(s) of candidate(s) to fetch and prepare for processing (globs accepted)'
    )


@@ -102,25 +132,27 @@ def fetch_impl(settings):
    rej_dir = mgr._ensure_dir('rejects')
    src = mgr.get_image_source()

-    for cid in settings.cand_ids:
+    for cid in evaluate_imageid_args(cand_dir, settings.cand_ids):
+        # Funky structure here is to try to ensure that cdata is closed in case
+        # a NotActionable happens, so that we can move the directory on Windows.
        try:
-            cdata = open(os.path.join(cand_dir, cid), 'rb')
-        except FileNotFoundError:
-            die(f'no such candidate ID {cid!r}')
+            try:
+                cdata = open(os.path.join(cand_dir, cid), 'rb')
+            except FileNotFoundError:
+                die(f'no such candidate ID {cid!r}')

-        print(f'fetching {cid} ... ', end='')
-        sys.stdout.flush()
-
-        try:
-            cachedir = mgr._ensure_dir('cache_todo', cid)
-            src.fetch_candidate(cid, cdata, cachedir)
-            print('done')
+            try:
+                print(f'fetching {cid} ... ', end='')
+                sys.stdout.flush()
+                cachedir = mgr._ensure_dir('cache_todo', cid)
+                src.fetch_candidate(cid, cdata, cachedir)
+                print('done')
+            finally:
+                cdata.close()
        except NotActionableError:
            print('not usable')
            os.rename(os.path.join(cand_dir, cid), os.path.join(rej_dir, cid))
            os.rmdir(cachedir)
-        finally:
-            cdata.close()


 # The "init" subcommand
--- a/toasty/tests/test_pipeline.py
+++ b/toasty/tests/test_pipeline.py
@@ -89,7 +89,7 @@ class TestPipeline(object):
        args = [
            'pipeline', 'fetch',
            '--workdir', self.work_path('work'),
-            'fake_test1',
+            'fake_test1', '*nomatchisok*',
        ]
        cli.entrypoint(args)

@@ -102,7 +102,7 @@ class TestPipeline(object):
        args = [
            'pipeline', 'approve',
            '--workdir', self.work_path('work'),
-            'fake_test1',
+            'fake_test1', 'fake_test?',
        ]
        cli.entrypoint(args)
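
The glob handling added in `toasty/pipeline/cli.py` can be tried out in isolation. The following is a minimal standalone sketch, not part of the patch or its test suite: it reproduces the logic of `evaluate_imageid_args` from the diff above and runs it against a temporary directory. The entry names (`vla-2020`, `xvlba`, and so on) are made up for illustration, and the sketch relies on `glob.has_magic`, an undocumented standard-library helper that the patch itself uses.

```python
import glob
import os
import tempfile
from fnmatch import fnmatch


def evaluate_imageid_args(searchdir, args):
    """Expand a mix of exact IDs and glob patterns into a sorted ID list."""
    matched_ids = set()
    globs_todo = set()

    for arg in args:
        if glob.has_magic(arg):
            # Contains *, ? or [ and is treated as a pattern.
            globs_todo.add(arg)
        else:
            # Exact IDs are kept even if nothing by that name exists yet;
            # the caller is expected to report missing ones later.
            matched_ids.add(arg)

    if globs_todo:
        for filename in os.listdir(searchdir):
            if any(fnmatch(filename, g) for g in globs_todo):
                matched_ids.add(filename)

    return sorted(matched_ids)


# Hypothetical workspace contents, used only for this demonstration.
with tempfile.TemporaryDirectory() as searchdir:
    for name in ["noao0201b", "vla-2020", "xvlba", "vlba"]:
        open(os.path.join(searchdir, name), "w").close()

    result = evaluate_imageid_args(
        searchdir,
        ["noao0201b", "vla*20", "?vlba", "*nomatchisok*", "missing"],
    )

print(result)
```

Two behaviors are worth noting. A glob that matches nothing (`*nomatchisok*`) contributes no IDs and raises no error, which is what the updated `fetch` test exercises. An exact ID that does not exist (`missing`) is still returned, so the "no such candidate" check in the calling code remains responsible for rejecting it. The four-character name `vlba` is not matched by `?vlba`, because `?` consumes exactly one character.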