ogc-api-fast-features
OGC Features API implementation built in Python using FastAPI, compliant with OGC API - Features - Part 1: Core, and inspired by pygeoapi.
ogc-api-fast-features (`oaff`) currently supports a single data source type - PostgreSQL/PostGIS - with wider support for additional data sources planned for a future release.
PostgreSQL/PostGIS
During startup oaff interrogates one or more PostGIS connections to discover all compatible tables and derives metadata for those tables. Tables can be explicitly included or excluded during this process, but by default all tables with exactly one spatial column (GEOMETRY or GEOGRAPHY type) and exactly one primary key column are made available.
Derived metadata for discovered tables can optionally be overridden or extended via configuration.
Multiple PostgreSQL/PostGIS data sources are supported under a single deployment and all available collections will be aggregated into a unified list.
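The discovery rule above can be sketched as follows; the column-metadata shape and function name are illustrative, not oaff's actual internals:

```python
# Hypothetical sketch of oaff's table-discovery rule: expose only tables with
# exactly one spatial column and exactly one primary key column.

SPATIAL_TYPES = {"GEOMETRY", "GEOGRAPHY"}

def is_servable(columns):
    """columns: [{"name": str, "type": str, "primary_key": bool}, ...]"""
    spatial = [c for c in columns if c["type"].upper() in SPATIAL_TYPES]
    pks = [c for c in columns if c["primary_key"]]
    return len(spatial) == 1 and len(pks) == 1

tables = {
    "public.roads": [
        {"name": "id", "type": "INTEGER", "primary_key": True},
        {"name": "geom", "type": "GEOMETRY", "primary_key": False},
    ],
    "public.audit_log": [  # no spatial column -> excluded from discovery
        {"name": "id", "type": "INTEGER", "primary_key": True},
        {"name": "note", "type": "TEXT", "primary_key": False},
    ],
}
servable = [name for name, cols in tables.items() if is_servable(cols)]
print(servable)  # ['public.roads']
```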
Demo
For a simple demo, clone this repo and execute `scripts/server && scripts/demo_data`. This creates a Docker Compose stack and loads some demo data to demonstrate functionality. View at http://localhost:8008
Non-Demo
This section describes configuration requirements for serving data from a non-demo data source.
Environment Variable Prefix
oaff is designed in two parts: an API interface responsible for receiving requests and returning responses ("frontend"), and a business logic component ("backend"). Each has requirements and defaults regarding environment variables.
By default all API environment variables must be prefixed with "API_", e.g. `API_LOG_LEVEL`, and all business logic environment variables must be prefixed with "APP_", e.g. `APP_DATA_SOURCE_TYPES`; however, the prefixes are themselves configurable via environment variables. Prefixing is intended to avoid naming collisions, in which other software in the same environment requires an environment variable of the same name but with a different value. To change the default prefix set `API_ENV_VAR_PREFIX` or `APP_ENV_VAR_PREFIX`. For example, if `APP_ENV_VAR_PREFIX=ABC_` then set `ABC_DATA_SOURCE_TYPES` instead of `APP_DATA_SOURCE_TYPES`. All references to environment variables within this document assume the default prefixes.
Data Source Types
`APP_DATA_SOURCE_TYPES` is a comma-separated list of the data source types that should be read by an oaff deployment. As only PostgreSQL/PostGIS is currently supported this must be set as `APP_DATA_SOURCE_TYPES=postgresql`
PostgreSQL/PostGIS
Data Source Naming
oaff supports multiple PostgreSQL/PostGIS data sources within the same API instance and this is managed with source naming. If only using a single data source, naming is not necessary and the following environment variables may be configured:
- `APP_POSTGRESQL_PROFILE` (mandatory, must be set to `stac_hybrid`, see below for details)
- `APP_POSTGRESQL_HOST` (optional, defaults to "localhost")
- `APP_POSTGRESQL_PORT` (optional, defaults to "5432")
- `APP_POSTGRESQL_USER` (optional, defaults to "postgres")
- `APP_POSTGRESQL_PASSWORD` (optional, defaults to "postgres")
- `APP_POSTGRESQL_DBNAME` (optional, defaults to "postgres")
If using multiple data sources, provide unique names in a comma-separated list, e.g. `APP_POSTGRESQL_SOURCE_NAMES=name1,name2`, and append each name to the relevant environment variables. For example:
- `APP_POSTGRESQL_HOST_name1` (optional, defaults to "localhost")
- `APP_POSTGRESQL_HOST_name2` (optional, defaults to "localhost")
- etc.
Hereafter, references to environment variables that can be suffixed with a data source's name will be presented in the format `APP_ENV_VAR_NAME[_name]` to indicate that the name suffix is optional.
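Resolution of prefixed, optionally name-suffixed variables might look like this sketch; the helper and its logic are assumptions, and only the variable names come from this document:

```python
import os
from typing import Optional

# Hypothetical resolution of oaff-style settings: a configurable prefix
# (APP_ENV_VAR_PREFIX) plus an optional data source name suffix.

def get_setting(name: str, default: str, source_name: Optional[str] = None) -> str:
    prefix = os.environ.get("APP_ENV_VAR_PREFIX", "APP_")
    key = f"{prefix}{name}" + (f"_{source_name}" if source_name else "")
    return os.environ.get(key, default)

os.environ["APP_POSTGRESQL_HOST_name1"] = "db1.internal"
print(get_setting("POSTGRESQL_HOST", "localhost", "name1"))  # db1.internal
print(get_setting("POSTGRESQL_HOST", "localhost", "name2"))  # localhost (fallback)
```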
Profiles
The PostgreSQL/PostGIS data source supports the concept of data source profiles, intended to support different strategies for identifying source data within a database. Only a single profile, `stac_hybrid`, currently exists and the profile capability may be considered over-engineering. There are currently no plans to add further profiles. At this time the environment variable `APP_POSTGRESQL_PROFILE[_name]` must be set to `stac_hybrid`.
MAC (Manage as Collections)
By default a PostgreSQL/PostGIS data source is expected to support a schema called "oaff" and a table within that schema called "collections". Alembic migrations run on API start to create that schema and table if they do not already exist. Some users may not want oaff to modify their database, or may not want to configure oaff with a database user account that has the necessary privileges. The expectation of `oaff.collections` existing, and the Alembic functionality to create it, can be disabled by setting `APP_POSTGRESQL_MAC[_name]=0`. With MAC disabled you will not be able to override or extend derived metadata (see below).
If you have reservations around configuring oaff with a database user account that has elevated privileges you can simply start the container once using a privileged account, let it upgrade the database to its expected version, and thereafter run oaff using a non-privileged account. Write access is only needed when a new release includes schema changes and this is not expected to happen on a regular basis.
Temporal Fields
During data interrogation (see above) oaff identifies temporal fields in supported tables and derives metadata to support temporal filtering in data requests. TIMESTAMP WITH TIME ZONE, TIMESTAMP WITHOUT TIME ZONE, and DATE types are supported. Each temporal field is identified individually, and is therefore assumed to represent a single point in time, and this influences how temporal data requests are handled. If an API caller provides a temporal range each temporal field is evaluated on whether its single point in time falls within that range. If an API caller provides a single point in time each temporal field is evaluated on whether it equals that moment.
Sometimes multiple temporal fields in a table may actually represent the beginning and end of a time range, and this affects how a temporal data request should be evaluated. If an API caller provides a temporal range each range within the data should be evaluated by intersection, and if the API caller provides a single point in time each range within the data should be evaluated on whether it contains that moment.
If a table contains temporal ranges it must be configured via `oaff.collections` to indicate a relationship between temporal fields. See below for details.
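The filtering semantics described above can be sketched as follows; this is a hypothetical illustration of the behaviour, not oaff's code:

```python
from datetime import datetime, timezone

# Sketch of the temporal filter semantics: an instant field matches a query
# range by containment and a query instant by equality; a [start, end] range
# field matches a query range by intersection and a query instant by containment.

def instant_matches(value, query_start, query_end):
    if query_start == query_end:  # caller supplied a single point in time
        return value == query_start
    return query_start <= value <= query_end

def range_matches(start, end, query_start, query_end):
    if query_start == query_end:  # single point in time: is it inside the range?
        return start <= query_start <= end
    return start <= query_end and query_start <= end  # intervals intersect

def t(s):
    return datetime.fromisoformat(s).replace(tzinfo=timezone.utc)

print(instant_matches(t("2020-06-01"), t("2020-01-01"), t("2020-12-31")))  # True
print(range_matches(t("2020-01-01"), t("2020-03-01"), t("2020-02-01"), t("2020-02-01")))  # True
print(range_matches(t("2020-01-01"), t("2020-03-01"), t("2020-04-01"), t("2020-05-01")))  # False
```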
Default Time Zone
If temporal data is stored as TIMESTAMP WITH TIME ZONE, PostgreSQL stores the data in UTC and transparently converts it to any time zone requested by an API caller. However, data stored as TIMESTAMP WITHOUT TIME ZONE or DATE has no associated time zone, and both the database and oaff are ignorant of its intended context. It is not possible in Python (nor wise in general) to compare temporal data in UTC - as provided by API callers issuing temporal data requests - with temporal data that is not time zone aware. Temporal data must be associated with a time zone, so oaff assigns a default. The default can be configured via `APP_POSTGRESQL_DEFAULT_TZ[_name]` and defaults to "UTC". This value must be a string compatible with pytz. Execute `pip install pytz && python -c "import os; import pytz; print(os.linesep.join(pytz.all_timezones))"` to see available time zones in pytz.
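To illustrate why the default zone matters, this sketch localizes a naive stored value before comparing it in UTC. It uses the stdlib `zoneinfo` for portability; oaff itself expects pytz-compatible zone names, which use the same IANA identifiers:

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

# A TIMESTAMP WITHOUT TIME ZONE value carries no zone; before it can be
# compared with a UTC query instant it must be attached to the configured
# default zone (illustrated here, not oaff's actual code).

default_tz = ZoneInfo("Australia/Sydney")  # e.g. from APP_POSTGRESQL_DEFAULT_TZ
naive = datetime(2021, 1, 1, 12, 0, 0)     # value as stored, no zone attached
aware = naive.replace(tzinfo=default_tz)   # interpret it in the default zone
as_utc = aware.astimezone(timezone.utc)    # Sydney is UTC+11 in January (DST)
print(as_utc.isoformat())  # 2021-01-01T01:00:00+00:00
```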
Supported Temporal Types
In addition to TIMESTAMP WITH TIME ZONE, TIMESTAMP WITHOUT TIME ZONE, and DATE, PostgreSQL also supports a number of range-like temporal data types. These types are not currently supported but support may be added at a later date.
Metadata and oaff.collections
Derived metadata on any supported table can be overridden with an entry in `oaff.collections`:
| Column | Nullable | Purpose | Example |
|---|---|---|---|
| id | False | Overrides the generated collection ID. Generated IDs are stable SHA-256 hashes generated from some source parameters | 'unique-collection-id' |
| title | False | Overrides table name as default collection title | 'Unique Collection Title' |
| description | True | Overrides empty description field, which cannot be derived during data interrogation | 'I provide descriptive prose about this dataset' |
| keywords | True | Provides zero or more keywords for collection tagging in support of SEO and possible future functionality (PostgreSQL array format, not JSON/JSONB) | '{"keyword 1","keyword 2"}' |
| license | True | Provides any licensing conditions attached to the collection | 'Example license text' |
| providers | True | Provides information about zero or more entities associated with the collection, e.g. who created it, who processed it, etc. JSON format, where "url" and "name" properties are mandatory but the "roles" property is optional | '[{"url": "https://domain/path", "name": "Data provider name", "roles": ["producer", "maintainer"]}]' |
| extent | True | Overrides derived spatial and/or temporal extents. JSON format. Schema matches the STAC spec's extent schema except "spatial" and "temporal" properties are optional and omitting either will retain that part of the derived extent | '{"spatial": {"bbox": [[-1, -1, 1, 1]]}, "temporal": {"interval": [["1955-11-05T01:35:00Z", "1985-10-26T01:36:00Z"]]}}' |
| temporal | True | Overrides derived temporal column definitions, optionally linking two temporal columns into a range. See above for details. JSON format, supporting two "types" | '[{"type": "range", "start_field": "starts_at", "end_field": "ends_at"}, {"type": "instant", "field": "created"}]' |
| schema_name | False | References the schema containing the table | 'public' |
| table_name | False | References the table | 'great_scott' |
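As a hypothetical illustration of the column formats above, the following builds the JSON-valued fields in Python; the exact row schema is defined by oaff's migrations, and all values here are examples only:

```python
import json

# Illustrative values for an oaff.collections row, mirroring the table above.
# Note that "keywords" is a PostgreSQL array literal, while "providers",
# "extent", and "temporal" are JSON.

row = {
    "id": "unique-collection-id",
    "title": "Unique Collection Title",
    "description": "I provide descriptive prose about this dataset",
    "keywords": '{"keyword 1","keyword 2"}',  # PostgreSQL array format, not JSON
    "providers": json.dumps([
        {"url": "https://domain/path", "name": "Data provider name",
         "roles": ["producer", "maintainer"]},
    ]),
    "extent": json.dumps({
        "spatial": {"bbox": [[-1, -1, 1, 1]]},
        "temporal": {"interval": [["1955-11-05T01:35:00Z", "1985-10-26T01:36:00Z"]]},
    }),
    "temporal": json.dumps([
        {"type": "range", "start_field": "starts_at", "end_field": "ends_at"},
        {"type": "instant", "field": "created"},
    ]),
    "schema_name": "public",
    "table_name": "great_scott",
}
print(json.loads(row["extent"])["spatial"]["bbox"][0])  # [-1, -1, 1, 1]
```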
Whitelisting/Blacklisting
During data interrogation oaff will identify all tables that can be served via its API interface. This may not always be desirable, for example if the database contains a mix of sensitive and non-sensitive data. Whitelisting permits an explicit list of tables to be exposed and blacklisting prevents an explicit list of tables from being exposed. Whitelist and blacklist are mutually exclusive; if both are provided, both will be ignored. Whitelisted or blacklisted table names must be fully-qualified - i.e. `schema_name.table_name` - and multiple tables can be specified in a comma-separated list, e.g.

`APP_POSTGRESQL_LAYER_WHITELIST[_name]=public.table1,public.table2`

`APP_POSTGRESQL_LAYER_BLACKLIST[_name]=public.table1,public.table2`
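A sketch of how such a list might be parsed into (schema, table) pairs; the parsing logic is an assumption, and only the example values come from this document:

```python
# Hypothetical parser for whitelist/blacklist values: comma-separated,
# fully-qualified schema_name.table_name entries.

def parse_layer_list(value):
    pairs = []
    for entry in value.split(","):
        schema, _, table = entry.strip().partition(".")
        if not table:
            raise ValueError(f"table name must be fully-qualified: {entry!r}")
        pairs.append((schema, table))
    return pairs

print(parse_layer_list("public.table1,public.table2"))
# [('public', 'table1'), ('public', 'table2')]
```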
CITE Compliance
Follow the instructions here to execute CITE compliance tests against oaff. If executing tests in a Docker container against an API instance in a separate Docker container you may need to reference a special hostname. For example, to execute using Docker on macOS:

- Execute `scripts/server && scripts/demo_data` to start the API containers
- Execute `docker run --rm -p 8081:8080 ogccite/ets-ogcapi-features10` to start the CITE testing container
- Navigate browser to http://localhost:8081/teamengine
- Login with `ogctest`/`ogctest`
- Start a new test session
- Provide http://docker.for.mac.localhost:8008 for the test URL
  - Note the lack of a trailing slash. There appears to be an error in the CITE tests that results in invalid request paths (e.g. `///conformance`) if the test URL ends in a trailing slash
Developing
The following provides information for developers looking to maintain or extend oaff. Development requires Python 3.8+.
Commands
- `scripts/setup`: configure local dev env (Python virtual env recommended)
- `scripts/server`: build and start API containers
- `scripts/demo_data`: load test/demo data for development and refresh API configuration
- `scripts/stop`: stop API containers
- `scripts/update`: rebuild containers
- `scripts/test`: execute tests
- `scripts/cibuild`: execute same build & test process as CI build (executes tests, no Docker build cache)
- `scripts/console`: enter console for named container
- `scripts/logs`: follow logs for named container
- `scripts/update_i18n`: extract and compile translations using Babel
- `scripts/format`: format code using isort and Black
- `scripts/debug_test_start`: start a test database instance to support debugging tests (see below)
- `scripts/debug_test_stop`: stop test database instance
Architecture
Pre-commit hooks
Git hooks are used to run formatting (isort, black), linting (flake8), and type-checking (mypy). The pre-commit scripts can be installed by running `scripts/setup` or simply `pre-commit install`. Thereafter, the scripts defined in `.pre-commit-config.yaml` will be run on the files included in a commit. Specific tool configurations are included in `pyproject.toml`.
FastAPI
oaff's frontend uses FastAPI but the backend is not tightly coupled to that framework. Frontend routers interpret path and query parameters and build request objects using request classes defined in the backend. A single `delegate` method passes request objects to the backend and awaits a response. Responses provide either data or error detail and an appropriate API response is constructed from whichever is returned. During the frontend's startup sequence a `FrontendConfiguration` object is passed to the backend, providing key frontend properties that need to be known by the backend. Examples:
- when the backend builds a data response to an API request it needs to know the root path from which the API is served to correctly build any links in the response
- when the backend renders an HTML-templated response it needs to know the hosted location of JavaScript and CSS assets
The backend should have no knowledge of frontend behaviour, including request parameter names, beyond what is explicitly provided via `FrontendConfiguration`.
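The delegate pattern described above can be sketched as follows; apart from `delegate` and `FrontendConfiguration`, all class and function names are illustrative:

```python
import asyncio
from dataclasses import dataclass

# Minimal sketch of the frontend/backend split: routers build request objects,
# a single delegate passes them to the backend, and the backend uses frontend
# properties (here, root_path) supplied at startup to build response links.

@dataclass
class CollectionsRequest:      # built by a frontend router from path/query params
    root_path: str

@dataclass
class FrontendConfiguration:   # passed frontend -> backend during startup
    root_path: str
    asset_url: str

async def delegate(request, backend_handler):
    """Single entry point: pass a request object to the backend, await a result."""
    return await backend_handler(request)

async def handle_collections(request: CollectionsRequest):
    # the backend needs the API root path to build correct links
    return {"links": [{"href": f"{request.root_path}/collections", "rel": "self"}]}

result = asyncio.run(delegate(CollectionsRequest(root_path="/api"), handle_collections))
print(result["links"][0]["href"])  # /api/collections
```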
Thanks in part to FastAPI's use of async, oaff is end-to-end async when responding to API calls, including async connections to PostgreSQL/PostGIS. This should improve its ability to support higher concurrent loads, but benchmarking is required to establish a quantitative baseline and comparison with other OGC API - Features implementations.
Pygeofilter
oaff depends on pygeofilter to translate spatial and temporal data request parameters into an abstract query structure, and then from that abstract structure into PostgreSQL-compatible SQLAlchemy query objects. In Part 1 (Core) of the OGC API - Features specification only basic spatial and temporal filters are required, and pygeofilter is able to support those requirements. pygeofilter also has developing support for Simple CQL as described in OGC API - Features - Part 3: Filtering and the Common Query Language (CQL), and when oaff extends to CQL support pygeofilter is expected to provide much of that functionality.
Data Sources
During startup the data interrogation phase creates, for each of the configured data sources, an object whose class extends the `DataSource` base class. Each `DataSource` is required to implement a number of methods including `get_layers`. In order to support additional data source types, new `DataSource` sub-classes will be required.
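A minimal sketch of that extension point; apart from `DataSource` and `get_layers`, the names and the `Layer` shape here are assumptions:

```python
from abc import ABC, abstractmethod
from typing import List

# Sketch of the DataSource extension point: new source types implement the
# abstract interface, and interrogation calls get_layers on each instance.

class Layer:  # hypothetical simplified layer record
    def __init__(self, name: str):
        self.name = name

class DataSource(ABC):
    @abstractmethod
    def get_layers(self) -> List[Layer]:
        """Return the layers discovered during data interrogation."""

class CsvDataSource(DataSource):  # hypothetical new data source type
    def get_layers(self) -> List[Layer]:
        return [Layer("points_of_interest")]

print([layer.name for layer in CsvDataSource().get_layers()])  # ['points_of_interest']
```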
Re/configuration
Layer objects represent the layers discovered during the data interrogation phase and are stored in memory. If data sources are modified, or environment variables that affect data interrogation are modified, interrogation must be re-run to acknowledge those changes.
The simplest way to re-run data interrogation is to restart the application, however this may not always be desirable. The frontend provides an endpoint `POST /control/reconfigure` that is only accessible from certain request origins; origins are configurable via an environment variable. In theory an administrator could give themselves - or a machine acting on their behalf - access to this endpoint and initiate a reconfiguration following data changes, though this strategy has not been explored extensively. Attempts to access `POST /control/reconfigure` from a non-permitted origin will result in a 404 response.
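The origin gate might behave like this sketch; the 404-on-denial behaviour comes from this document, while the check itself and the origin values are assumptions:

```python
# Hypothetical origin-gating for POST /control/reconfigure: permitted origins
# trigger re-interrogation, everything else sees a 404 (endpoint hidden).

ALLOWED_ORIGINS = {"127.0.0.1", "10.0.0.5"}  # e.g. parsed from an env var

def reconfigure(request_origin):
    """Return an HTTP status code for the reconfigure attempt."""
    if request_origin not in ALLOWED_ORIGINS:
        return 404  # non-permitted origins get a 404, not a 403
    # ... re-run data interrogation here ...
    return 200

print(reconfigure("127.0.0.1"), reconfigure("203.0.113.9"))  # 200 404
```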
Feature [Set] Provider
oaff currently supports HTML and JSON/GeoJSON output encodings for metadata (e.g. `/collections`) and data (e.g. `/collections/{collection_id}/items`) requests. Future development efforts are expected to add additional output encodings such as GeoPackage or FlatGeobuf.
Given the goal of supporting multiple data source types, and multiple output encoding formats, one possible strategy is to use a dependency such as GDAL OGR for data handling. Data can be translated from a wide range of source formats to OGR Layer/Feature objects and from those objects to a wide range of output formats. However there are several scenarios where such a translation will be inefficient. For example, data requested from a GeoJSON source format in a GeoJSON output encoding should not need to be translated via OGR as this unnecessary translation could add considerable overhead to the user's request. Similarly, some source formats may provide the ability to generate a data response in a range of output encodings and thereby offer a more efficient approach to data retrieval. Finally, for many output formats OGR requires a file path to which it can write data. Writing data to disk, and then reading it back to populate the API response, could add considerable overhead to request processing times.
The `FeatureProvider` and `FeatureSetProvider` base classes support the ability of the API to acknowledge, and take advantage of, nuances in the relationships between source and target formats. `PostgresqlFeatureProvider` and `PostgresqlFeatureSetProvider` utilise PostgreSQL/PostGIS's ability to generate results in GeoJSON format and thereby avoid the need for object marshalling and JSON serialization within the API.
As additional support is added for other source formats and output encodings it may be necessary to marshall from a source format into a common format using the OGR API, however that does not warrant a common marshalling strategy across all output encodings and any inefficiencies should be limited to those source/output pairs that require marshalling.
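The dispatch argument above can be illustrated with a simple registry keyed by (source format, output encoding) pairs; all names and return values here are illustrative only:

```python
# Sketch of per-pair provider dispatch: pairs with a native fast path (e.g.
# PostgreSQL emitting GeoJSON directly) bypass generic OGR-style marshalling.

def via_postgresql_geojson(query):
    return "SELECT ST_AsGeoJSON(...)"  # database emits GeoJSON directly

def via_generic_marshalling(source_format, output_format):
    return f"marshal {source_format} -> intermediate -> {output_format}"

FAST_PATHS = {("postgresql", "geojson"): via_postgresql_geojson}

def provider_for(source_format, output_format):
    fast = FAST_PATHS.get((source_format, output_format))
    if fast is not None:
        return fast  # efficient source/output pairing
    return lambda query: via_generic_marshalling(source_format, output_format)

print(provider_for("postgresql", "geojson")("q"))  # SELECT ST_AsGeoJSON(...)
print(provider_for("shapefile", "gpkg")("q"))      # marshal shapefile -> intermediate -> gpkg
```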
Debugging
Debug API in Visual Studio Code
The following launch.json config can be used to debug the API (http://localhost:8123).
A database instance must be running on localhost:5432.
Execute `docker-compose up -d postgres` to establish a database instance if one is not already running. Run `scripts/demo_data` to create data if required.
A debug instance of the API can execute alongside a running instance of the main Docker Compose stack.
```json
{
    "version": "0.2.0",
    "configurations": [
        {
            "name": "FastAPI",
            "type": "python",
            "request": "launch",
            "module": "oaff.fastapi.api.main",
            "cwd": "${workspaceFolder}",
            "console": "integratedTerminal",
            "justMyCode": false,
            "args": ["8123"],
            "env": {
                "API_LOG_LEVEL": "debug",
                "APP_DATA_SOURCE_TYPES": "postgresql",
                "APP_POSTGRESQL_SOURCE_NAMES": "stac",
                "APP_POSTGRESQL_PROFILE_stac": "stac_hybrid"
            }
        }
    ]
}
```
Debug Tests in Visual Studio Code
The following launch.json config can be used to debug tests. A database instance must be running on localhost:2345. Execute `scripts/debug_test_start` to establish a test database instance. Remove the test database instance with `scripts/debug_test_stop`. To debug a single test add `, "-k", "<<test name>>"` to the `args` array.
The test database instance and tests can execute alongside a running instance of the main Docker Compose stack.
```json
{
    "version": "0.2.0",
    "configurations": [
        {
            "name": "Pytest",
            "type": "python",
            "request": "launch",
            "module": "pytest",
            "args": ["oaff", "--maxfail=1"],
            "env": {
                "APP_DATA_SOURCE_TYPES": "postgresql",
                "APP_POSTGRESQL_SOURCE_NAMES": "stac",
                "APP_POSTGRESQL_HOST_stac": "localhost",
                "APP_POSTGRESQL_PROFILE_stac": "stac_hybrid",
                "APP_POSTGRESQL_PORT_stac": "2345"
            }
        }
    ]
}
```
Contributing
This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.opensource.microsoft.com.
When you submit a pull request, a CLA bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., status check, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.
This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact opencode@microsoft.com with any additional questions or comments.
Trademarks
This project may contain trademarks or logos for projects, products, or services. Authorized use of Microsoft trademarks or logos is subject to and must follow Microsoft's Trademark & Brand Guidelines. Use of Microsoft trademarks or logos in modified versions of this project must not cause confusion or imply Microsoft sponsorship. Any use of third-party trademarks or logos are subject to those third-party's policies.