Configuration

Once you have installed pygeoapi, it's time to set up a configuration. pygeoapi's runtime configuration is defined in a YAML file, which is referenced via the PYGEOAPI_CONFIG environment variable. You can name the file whatever you wish; typical filenames end with .yml.

Note

A sample configuration can always be found in the pygeoapi GitHub repository.

pygeoapi configuration contains the following core sections:

  • server: server-wide settings
  • pubsub: Publish-Subscribe settings (optional)
  • logging: logging configuration
  • metadata: server-wide metadata (contact, licensing, etc.)
  • resources: dataset collections, processes and stac-collections offered by the server

The full configuration schema with descriptions of all available properties can be found here.

Note

Standard YAML mechanisms can be used (anchors, references, etc.) for reuse and compactness.

Configuration directives and reference are described below via annotated examples.

Reference

server

The server section provides directives on binding and high level tuning.

For more information related to API design rules (the api_rules property in the example below) see :ref:`API Design Rules`.

server:
  bind:
      host: 0.0.0.0  # listening address for incoming connections
      port: 5000  # listening port for incoming connections
  url: http://localhost:5000/  # url of server
  icon: https://example.org/favicon.ico  # favicon / shortcut icon for default HTML template customization
  logo: https://example.org/logo.png  # logo/banner for default HTML template customization
  mimetype: application/json; charset=UTF-8  # default MIME type
  encoding: utf-8  # default server encoding
  language: en-US  # default server language
  locale_dir: /path/to/translations
  ogc_schemas_location: /opt/schemas.opengis.net  # optional local copy of https://schemas.opengis.net
  gzip: false  # default server config to gzip/compress responses to requests with gzip in the Accept-Encoding header
  cors: true  # boolean on whether server should support CORS
  pretty_print: true  # whether JSON responses should be pretty-printed
  admin: false  # whether to enable the Admin API

  # server limits on number of items to return.
  # overridable when redefined in resource level configuration
  limits:
      default_items: 50
      max_items: 1000
      max_distance_x: 25
      max_distance_y: 25
      max_distance_units: m
      on_exceed: throttle  # throttle or error (default=throttle)

  # configuration to specify directory tree for HTML page templates
  # omit this to use the default pygeoapi templates
  # overridable when redefined in resource level configuration
  templates:
    # recommended to use absolute paths
    path: /path/to/jinja2/templates/folder # path to templates folder containing the Jinja2 template HTML files
    static: /path/to/static/folder # path to static folder containing css, js, images and other static files referenced by the template

  # leaflet map setup for HTML template rendering
  map:
      url: https://tile.openstreetmap.org/{z}/{x}/{y}.png
      attribution: '&copy; <a href="https://openstreetmap.org/copyright">OpenStreetMap contributors</a>'

  # optional OGC API - Processes asynchronous job management configuration
  manager:
      name: TinyDB  # plugin name (see pygeoapi.plugin for supported process_manager's)
      connection: /tmp/pygeoapi-process-manager.db  # connection info to store jobs (e.g. filepath)
      output_dir: /tmp/  # temporary file area for storing job results (files)

  # optional API design rules to which pygeoapi should adhere
  api_rules:
      api_version: 1.2.3  # omit to use pygeoapi's software version
      strict_slashes: true  # trailing slashes will not be allowed and result in a 404
      url_prefix: 'v{api_major}'  # adds a /v1 prefix to all URL paths
      version_header: X-API-Version  # add a response header of this name with the API version

pubsub

The pubsub section provides directives for enabling publication of CloudEvent messages on item-based transactions.

pubsub:
    name: MQTT
    broker:
        url: mqtt://localhost:1883
        channel: my/service/topic
.. seealso::
   :ref:`pubsub` for more information on Publish-Subscribe capabilities


logging

The logging section provides directives for logging messages which are useful for debugging.

logging:
    level: ERROR  # the logging level (see https://docs.python.org/3/library/logging.html#logging-levels)
    logfile: /path/to/pygeoapi.log  # the full file path to the logfile
    logformat:  # example for milliseconds:'[%(asctime)s.%(msecs)03d] {%(pathname)s:%(lineno)d} %(levelname)s - %(message)s'
    dateformat:  # example for milliseconds:'%Y-%m-%dT%H:%M:%S'

Note

If level is defined and logfile is undefined, logging messages are output to the server's stdout.

logging.rotation

The rotation block supports rotation of disk log files. The file given in logfile is opened and used as the stream for logging.

logging:
    logfile: /path/to/pygeoapi.log  # the full file path to the logfile
    rotation:
        mode:  # [time|size]
        when:  # [s|m|h|d|w0-w6|midnight]
        interval:
        max_bytes:
        backup_count:

Note

The rotation block is optional and only needs to be defined when rotation is required. The mode can be either size or time. For RotatingFileHandler, set mode to size and define max_bytes and backup_count.

For TimedRotatingFileHandler, set mode to time and define when, interval and backup_count.
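
The mapping between mode and the two handler classes can be sketched with Python's standard library. This is only an illustration of the rule described in the note, not pygeoapi's own code: the build_rotation_handler helper and the fallback values are assumptions made for the example.

```python
from logging.handlers import RotatingFileHandler, TimedRotatingFileHandler


def build_rotation_handler(logfile, rotation):
    """Return a rotating log handler for a `logging.rotation`-style mapping.

    Hypothetical helper: the keys mirror the configuration block above.
    """
    mode = rotation.get('mode')
    backup_count = rotation.get('backup_count', 0)

    if mode == 'size':
        # size-based rotation: roll over once the file reaches max_bytes
        return RotatingFileHandler(
            logfile,
            maxBytes=rotation.get('max_bytes', 0),
            backupCount=backup_count,
            delay=True,  # do not open the file until the first record
        )
    if mode == 'time':
        # time-based rotation: roll over every `interval` units of `when`
        return TimedRotatingFileHandler(
            logfile,
            when=rotation.get('when', 'h'),
            interval=rotation.get('interval', 1),
            backupCount=backup_count,
            delay=True,
        )
    raise ValueError(f'unsupported rotation mode: {mode!r}')


size_handler = build_rotation_handler(
    'pygeoapi.log',
    {'mode': 'size', 'max_bytes': 1048576, 'backup_count': 5})
time_handler = build_rotation_handler(
    'pygeoapi.log',
    {'mode': 'time', 'when': 'midnight', 'interval': 1, 'backup_count': 7})
```

Either handler can then be attached to a logger with addHandler in the usual way.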

metadata

The metadata section provides settings for overall service metadata and description.

metadata:
    identification:
        title: pygeoapi default instance  # the title of the service
        description: pygeoapi provides an API to geospatial data  # some descriptive text about the service
        keywords:  # list of keywords about the service
            - geospatial
            - data
            - api
        keywords_type: theme  # keyword type as per the ISO 19115 MD_KeywordTypeCode codelist. Accepted values are discipline, temporal, place, theme, stratum
        terms_of_service: https://creativecommons.org/licenses/by/4.0/  # terms of service
        url: https://example.org  # informative URL about the service
    license:  # licensing details
        name: CC-BY 4.0 license
        url: https://creativecommons.org/licenses/by/4.0/
    provider:  # service provider details
        name: Organization Name
        url: https://pygeoapi.io
    contact:  # service contact details
        name: Lastname, Firstname
        position: Position Title
        address: Mailing Address
        city: City
        stateorprovince: Administrative Area
        postalcode: Zip or Postal Code
        country: Country
        phone: +xx-xxx-xxx-xxxx
        fax: +xx-xxx-xxx-xxxx
        email: you@example.org
        url: Contact URL
        hours: Mo-Fr 08:00-17:00
        instructions: During hours of service. Off on weekends.
        role: pointOfContact

resources

The resources section lists 1 or more dataset collections to be published by the server. The key of the resource name is the advertised collection identifier.

The resource.type property is required. Allowed types are:

  • collection
  • process
  • stac-collection

The providers block is a list of 1..n providers used to operate on the data. Each provider requires a type property. Allowed types are:

  • feature
  • coverage
  • tile

A collection's default provider can be qualified with default: true in the provider configuration. If default is not included, the first provider is assumed to be the default.
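
The default-provider rule can be expressed in a few lines of Python. This is a minimal sketch of the rule as stated above, not pygeoapi's implementation; the get_default_provider function and the ExampleTiles provider name are hypothetical.

```python
def get_default_provider(providers):
    """Return the provider flagged `default: true`, else the first one."""
    if not providers:
        raise ValueError('a collection needs at least one provider')
    for provider in providers:
        if provider.get('default', False):
            return provider
    # no explicit default: fall back to the first provider in the list
    return providers[0]


providers = [
    {'type': 'feature', 'name': 'CSV'},
    {'type': 'tile', 'name': 'ExampleTiles', 'default': True},  # hypothetical
]
print(get_default_provider(providers)['name'])  # ExampleTiles
```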

resources:
    obs:
        type: collection  # REQUIRED (collection, process, or stac-collection)
        visibility: default  # OPTIONAL
        title: Observations  # title of dataset
        description: My cool observations  # abstract of dataset
        keywords:  # list of related keywords
            - observations
            - monitoring
        linked-data:  # linked data configuration (see Linked Data section)
            context:
                - datetime: https://schema.org/DateTime
                - vocab: https://example.com/vocab#
                  stn_id: "vocab:stn_id"
                  value: "vocab:value"
        links: # list of 1..n related links
            - type: text/csv  # MIME type
              rel: canonical  # link relations per https://www.iana.org/assignments/link-relations/link-relations.xhtml
              title: data  # title
              href: https://github.com/mapserver/mapserver/blob/branch-7-0/msautotest/wxs/data/obs.csv  # URL
              hreflang: en-US  # language
        extents:  # spatial and temporal extents
            spatial:  # required
                bbox: [-180,-90,180,90]  # list of minx, miny, maxx, maxy
                crs: http://www.opengis.net/def/crs/OGC/1.3/CRS84  # CRS
            temporal:  # optional
                begin: 2000-10-30T18:24:39Z  # start datetime in RFC3339
                end: 2007-10-30T08:57:29Z  # end datetime in RFC3339
                trs: http://www.opengis.net/def/uom/ISO-8601/0/Gregorian  # TRS
                resolution: P1D  # ISO 8601 duration
                default: 2000-10-30T18:24:39Z  # default time
            # additional extents can be added as desired (1..n)
            foo:
                url: https://example.org/def  # required URL of the extent
                range: [0, 10] # required overall range/extent
                units: °C  # optional units
                values: [0, 2, 5, 5, 10]  # optional, enumeration of values
        providers:  # list of 1..n required connections information
            - type: feature  # underlying data geospatial type. Allowed values are: feature, coverage, record, tile, edr
              name: CSV  # required: plugin name or import path. See Plugins section for more information.
              data: tests/data/obs.csv  # required: the data filesystem path or URL, depending on plugin setup
              id_field: id  # required for vector data, the field corresponding to the ID

              # optional fields
              uri_field: uri  # field corresponding to the Uniform Resource Identifier (see Linked Data section)
              time_field: datetimestamp  # field corresponding to the temporal property of the dataset
              title_field: foo  # field of which property to display as title/label on HTML pages
              default: true  # if not specified, the first provider definition is considered the default
              properties:  # if specified, return only the following properties, in order
                  - stn_id
                  - value
              format:  # default format
                  name: GeoJSON  # required: format name
                  mimetype: application/json  # required: format mimetype
              options:  # optional options to pass to provider (i.e. GDAL creation)
                  option_name: option_value
              include_extra_query_parameters: false  # include extra query parameters that are not part of the collection properties (default: false)
              # editable transactions: DO NOT ACTIVATE unless you have setup access control beyond pygeoapi
              editable: true  # optional: if backend is writable, default is false
              # coordinate reference systems (CRS) section is optional
              # default CRSs are http://www.opengis.net/def/crs/OGC/1.3/CRS84 (coordinates without height)
              # and http://www.opengis.net/def/crs/OGC/1.3/CRS84h (coordinates with ellipsoidal height)
              crs:  # supported coordinate reference systems (CRS) for 'crs' and 'bbox-crs' query parameters
                  - http://www.opengis.net/def/crs/EPSG/0/28992
                  - http://www.opengis.net/def/crs/OGC/1.3/CRS84
                  - http://www.opengis.net/def/crs/EPSG/0/4326
              storage_crs: http://www.opengis.net/def/crs/OGC/1.3/CRS84  # optional CRS in which data is stored, default: as 'crs' field
              storage_crs_coordinate_epoch: 2017.23  # optional, if storage_crs is a dynamic coordinate reference system
              always_xy: false  # optional should CRS respect axis ordering
        formatters:  # list of 1..n formatter definitions
            - name: path.to.formatter  # Python path of formatter definition
              attachment: true  # whether or not to provide as an attachment or normal response
              geom: false  # whether or not to include geometry

    hello-world:  # name of process
        type: process  # REQUIRED (collection, process, or stac-collection)
        processor:
            name: HelloWorld  # Python path of process definition
.. seealso::
   `Linked Data`_ for optionally configuring linked data datasets

.. seealso::
   :ref:`plugins` for more information on plugins

Using environment variables

pygeoapi configuration supports using system environment variables, which can be helpful, for example, when deploying into twelve-factor environments.

Below is an example of how to integrate system environment variables in pygeoapi.

server:
    bind:
        host: ${MY_HOST}
        port: ${MY_PORT}

Multiple environment variables are supported as follows:

data: ${MY_HOST}:${MY_PORT}

It is also possible to define a default value for a variable in case it does not exist in the environment using a syntax like: value: ${ENV_VAR:-the default}

server:
    bind:
        host: ${MY_HOST:-localhost}
        port: ${MY_PORT:-5000}
metadata:
    identification:
        title:
            en: This is pygeoapi host ${MY_HOST} and port ${MY_PORT:-5000}, nice to meet you!
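
To make the substitution rules concrete, the following Python sketch expands ${VAR} and ${VAR:-default} placeholders in a string. It illustrates the syntax only and is not pygeoapi's loader: the expand_env function, its regular expression, and the choice to raise KeyError for an undefined variable without a default are all assumptions.

```python
import os
import re

# matches ${NAME} and ${NAME:-default}
ENV_PATTERN = re.compile(
    r'\$\{(?P<name>[A-Za-z_][A-Za-z0-9_]*)(?::-(?P<default>[^}]*))?\}')


def expand_env(value, env=None):
    """Replace environment variable placeholders in `value`."""
    env = os.environ if env is None else env

    def replace(match):
        name = match.group('name')
        default = match.group('default')
        if name in env:
            return env[name]
        if default is not None:
            return default
        # assumption: an undefined variable without a default is an error
        raise KeyError(f'undefined environment variable: {name}')

    return ENV_PATTERN.sub(replace, value)


print(expand_env('${MY_HOST:-localhost}:${MY_PORT:-5000}', env={}))
# localhost:5000
print(expand_env('${MY_HOST}:${MY_PORT:-5000}', env={'MY_HOST': 'example.org'}))
# example.org:5000
```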

Adding links to collections

You can add any type of link to a resource of type collection. pygeoapi does not enforce anything here, as long as the link has a type, rel, and href parameter. The type parameter defines the MIME type (Content-Type) of the linked resource. The rel parameter tells something about what kind of link it is. You could set this to license to add a data license link, or to describedBy if you wish to add a schema definition, for example.

It's also possible to add (bulk) download links to a collection. These links should have their rel parameter set to enclosure and must have a length parameter that defines the content length (byte size) of the file. If you know the content length and it never changes, you can set this and pygeoapi will return the enclosure link(s) as-is.

However, the downloadable resource may be subject to change (e.g. it may grow in size over time). In that case, you can omit the length and pygeoapi will figure out the actual Content-Length header by issuing a HEAD request on the given URL (href parameter). Furthermore, if it notices that the defined type (MIME type) of the link does not match the actual Content-Type in the response headers, it will automatically update the type accordingly. Note that type is a mandatory link parameter though, so you must always set it.

So for example, you could define a download link like so:

links:
  - type: application/octet-stream  # must have some MIME type
    rel: enclosure
    title: download link
    href: https://myserver.com/data/file.zip  # URL

And pygeoapi will turn that into:

{
  "links": [
    {
      "type": "application/zip",
      "rel": "enclosure",
      "title": "download link",
      "href": "https://myserver.com/data/file.zip",
      "length": 46435
    }
  ]
}

Note how the MIME type was updated to match the actual Content-Type and that the length was set according to the Content-Length header.

Note

If the length parameter is omitted and pygeoapi was not able to verify the Content-Length within 1 second and/or within 1 URL redirect, the enclosure link will not be included in the response. This means that if you want to be sure that the link is always included, you will have to set a length.
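
The enclosure handling described in this section can be summarised as a small decision function. The sketch below is an illustration under assumptions, not pygeoapi's code: resolve_enclosure is a hypothetical name, and head_headers stands in for the response headers of the HEAD request (None when the request failed or timed out), so no network access is involved.

```python
def resolve_enclosure(link, head_headers):
    """Return an updated copy of `link`, or None if it should be dropped."""
    if link.get('rel') != 'enclosure' or 'length' in link:
        # non-enclosure links and links with a fixed length are kept as-is
        return dict(link)
    if head_headers is None or 'Content-Length' not in head_headers:
        # the length could not be verified: omit the link from the response
        return None

    resolved = dict(link)
    resolved['length'] = int(head_headers['Content-Length'])
    content_type = head_headers.get('Content-Type')
    if content_type and content_type != link['type']:
        # the configured MIME type is corrected to the actual Content-Type
        resolved['type'] = content_type
    return resolved


link = {
    'type': 'application/octet-stream',
    'rel': 'enclosure',
    'title': 'download link',
    'href': 'https://myserver.com/data/file.zip',
}
headers = {'Content-Type': 'application/zip', 'Content-Length': '46435'}
print(resolve_enclosure(link, headers))
```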

Publishing hidden resources

pygeoapi allows for publishing resources without advertising them explicitly via its collections and OpenAPI endpoints. The resource is available if the client knows the name of the resource a priori.

To hide a resource, set its visibility property to hidden. For example, consider the following resource:

resources:
     foo:
         title: my hidden resource
         visibility: hidden

Examples:

curl https://example.org/collections  # resource foo is not advertised
curl https://example.org/openapi  # resource foo is not advertised
curl https://example.org/collections/foo  # user can access resource normally
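
Conceptually, hidden resources are filtered out of listings while direct lookups by name still succeed. A minimal Python sketch of that distinction, assuming a hypothetical advertised_resources helper and a second hypothetical resource named bar:

```python
def advertised_resources(resources):
    """Return only the resources that should appear in listings."""
    return {
        name: resource
        for name, resource in resources.items()
        if resource.get('visibility', 'default') != 'hidden'
    }


resources = {
    'foo': {'title': 'my hidden resource', 'visibility': 'hidden'},
    'bar': {'title': 'a regular resource'},  # hypothetical second resource
}

print(list(advertised_resources(resources)))  # ['bar']
print('foo' in resources)  # True: still reachable by name
```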

API Design Rules

Some pygeoapi setups may wish to adhere to specific API design rules that apply within an organization. The api_rules object in the server section of the configuration can be used for this purpose.

Note that the entire api_rules object is optional. No rules will be applied if the object is omitted.

The following properties can be set:

api_version

If specified, this property is a string that defines the semantic version number of the API. Note that this number should reflect the state of the API data model (request and response object structure, API endpoints, etc.) and does not necessarily correspond to the software version of pygeoapi. For example, the software could have been completely rewritten (which changes the software version number), but the API data model might still be the same as before.

Unfortunately, pygeoapi currently does not offer a way to keep track of the API version. This means that you need to set (and maintain) your own version here or leave it empty or unset. In the latter case, the software version of pygeoapi will be used instead.

strict_slashes

Some API rules state that trailing slashes at the end of a URL are not allowed if they point to a specific resource item. In that case, you may wish to set this property to true. Doing so will result in a 404 Not Found if a user adds a / to the end of a URL. If omitted or false (default), it does not matter whether the user omits or adds the / to the end of the URL.

url_prefix

Set this property to include a prefix in the URL path (e.g. https://base.com/<my_prefix>/endpoint). Note that you do not need to include slashes (either at the start or the end) here: they will be added automatically.

If you wish to include the API version number (depending on the api_version property) in the prefix, you can use the following variables:

  • {api_version}: full semantic version number
  • {api_major}: major version number
  • {api_minor}: minor version number
  • {api_build}: build number

For example, if the API version is 1.2.3, then a URL prefix template of v{api_major} will result in v1 as the actual prefix.
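
The template expansion can be reproduced with Python string formatting. This is a sketch of the substitution only: render_url_prefix is a hypothetical function, and padding missing version parts with empty strings is an assumption of the example.

```python
def render_url_prefix(template, api_version):
    """Expand the version variables in a url_prefix template."""
    # pad so that versions with fewer than three parts still unpack
    major, minor, build = (api_version.split('.') + ['', '', ''])[:3]
    prefix = template.format(
        api_version=api_version,
        api_major=major,
        api_minor=minor,
        api_build=build,
    )
    # leading and trailing slashes are not needed in the configured value
    return prefix.strip('/')


print(render_url_prefix('v{api_major}', '1.2.3'))  # v1
print(render_url_prefix('/api/{api_version}/', '1.2.3'))  # api/1.2.3
```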

version_header

Set this property to add a header to each pygeoapi response that includes the semantic API version (see api_version). If omitted, no header will be added. Common names for this header are API-Version or X-API-Version. Note that pygeoapi already adds an X-Powered-By header by default that includes the software version number.

Hierarchical collections

Collections defined in the resources section are identified by the resource key, which is also the advertised collection identifier. For example, given the following:

resources:
  lakes:
    ...

The resulting collection will be made available at http://localhost:5000/collections/lakes

All collections are published by default to http://localhost:5000/collections. To enable hierarchical collections, provide the hierarchy in the resource key. Given the following:

resources:
  naturalearth/lakes:
    ...

The resulting collection will then be made available at http://localhost:5000/collections/naturalearth/lakes

Note

This functionality may change in the future given how hierarchical collection extension specifications evolve at OGC.

Note

Collection grouping is not available. This means that while URLs such as http://localhost:5000/collections/naturalearth/lakes function as expected, URLs such as http://localhost:5000/collections/naturalearth will not provide aggregate collection listing or querying. This functionality is also to be determined based on the evolution of hierarchical collection extension specifications at OGC.

Selective properties in feature and record providers

Providers defined in the providers section of a feature/record collection definition can support selective properties to return only a subset of the schema attributes. This allows you to specialise the behavior of queryables and of the GeoJSON properties returned in the payload.

For example, given the lakes collection above, a restriction on the schema properties returned by its provider can be defined with the following:

resources:
  lakes:
    ...
    providers:
      - type: feature
        name: ...
        data:
          ...
        properties:
          - name

Examples:

curl https://example.org/collections/lakes/queryables  # only the name definition is returned
curl https://example.org/collections/lakes/items  # only the name attribute is returned in properties
curl https://example.org/collections/lakes/items/{item_id}  # only the name attribute is returned in properties
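
The effect on a single GeoJSON feature can be sketched as follows. This illustrates the filtering behaviour only; select_properties is a hypothetical helper and the sample feature values are made up for the example.

```python
def select_properties(feature, properties):
    """Return a copy of `feature` limited to the configured properties."""
    if not properties:
        # no restriction configured: return the feature unchanged
        return feature
    selected = {
        name: feature['properties'][name]
        for name in properties  # keeps the configured order
        if name in feature['properties']
    }
    return {**feature, 'properties': selected}


feature = {
    'type': 'Feature',
    'id': 1,
    'geometry': None,
    'properties': {'scalerank': 0, 'name': 'Example Lake', 'admin': None},
}
print(select_properties(feature, ['name'])['properties'])
# {'name': 'Example Lake'}
```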

Limiting data responses

pygeoapi defines a limits configuration parameter that will allow a user to define default and maximum limits for multiple data types. This parameter is defined at the server level (server.limits) with the ability to override at resource level (resources[*].limits). An example of this setting is shown below:

limits:
    default_items: 10  # applies to vector data
    max_items: 500  # applies to vector data
    max_distance_x: 123  # applies to all datasets
    max_distance_y: 456 # applies to all datasets
    max_distance_units: m  # as per UCUM https://ucum.org/ucum#section-Tables-of-Terminal-Symbols
    on_exceed: error  # one of error, throttle

The limits setting is applied as follows:

  • can be defined at both the server and resources levels, with resource limits overriding server wide limits settings
  • on_exceed can be set to error or throttle (default). If a client specified limit exceeds those set by the server:
      - when set to error, an exception is returned
      - when set to throttle the maximum data allowed by the collection/server/provider is returned

Vector data (features, records)

  • when a limit is not specified by the client, limits.default_items is used to set the result set size
  • when a limit is specified by the client, the minimum of the limit parameter and limits.max_items is calculated to set the result set size
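
The rules for vector data, together with the resource-over-server override and on_exceed, can be combined into one function. This is a sketch of the behaviour described above, not pygeoapi's implementation; effective_limit is a hypothetical name and ValueError stands in for whatever exception the server actually returns.

```python
def effective_limit(requested, server_limits, resource_limits=None):
    """Compute the result set size for a vector data request."""
    # resource-level settings override server-wide settings
    limits = {**server_limits, **(resource_limits or {})}

    if requested is None:
        return limits['default_items']
    if requested > limits['max_items']:
        if limits.get('on_exceed', 'throttle') == 'error':
            raise ValueError('requested limit exceeds max_items')
        # throttle: cap at the maximum allowed
        return limits['max_items']
    return requested


server = {'default_items': 50, 'max_items': 1000}
print(effective_limit(None, server))  # 50
print(effective_limit(5000, server))  # 1000
print(effective_limit(200, server, {'max_items': 100}))  # 100
```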

Raster data (coverages, environmental data retrieval)

  • when a bbox or spatial subset is specified by the client, limits.max_distance_x, limits.max_distance_y and limits.max_distance_units are used to determine whether a request has asked for more data than the collection is configured to provide and respond accordingly (via on_exceed)

Linked Data

JSON-LD support

pygeoapi supports structured metadata about a deployed instance, and is also capable of presenting data as structured data. JSON-LD equivalents are available for each HTML page, and are embedded as data blocks within the corresponding page for search engine optimisation (SEO). Tools such as the Google Structured Data Testing Tool can be used to check the structured representations.

The metadata for an instance is determined by the content of the metadata section of the configuration. This metadata is included automatically, and is sufficient for inclusion in major indices of datasets, including the Google Dataset Search.

For collections, at the level of item, the default JSON-LD representation adds:

  • An @id for the item, which is the URL for that item. If uri_field is specified, it is used, otherwise the URL is to its HTML representation in pygeoapi.
  • Separate GeoSPARQL/WKT and schema.org/geo versions of the geometry. schema.org/geo only supports point, line, and polygon geometries. Multipart lines are merged into a single line. The rest of the multipart geometries are reduced and transformed into a polygon via unary union or convex hull transform.
  • @context for the GeoSPARQL and schema geometries.
  • The unpacked properties block into the main body of the item.

For collections, at the level of items, the default JSON-LD representation adds:

  • A schema.org itemList of the @id and @type of each feature in the collection.

The optional configuration options for collections, at the level of an item or items, are:

  • If uri_field is specified, JSON-LD will be updated such that the @id has the value of uri_field for each item in a collection

Note

While this is enough to provide valid RDF (as GeoJSON-LD), it does not allow the properties of your items to be unambiguously interpretable.

pygeoapi currently allows for the extension of the @context to allow properties to be aliased to terms from vocabularies. This is done by adding a context section to the configuration of a dataset.

The default pygeoapi configuration includes an example for the obs sample dataset:

linked-data:
  context:
    - datetime: https://schema.org/DateTime
    - vocab: https://example.com/vocab#
      stn_id: "vocab:stn_id"
      value: "vocab:value"

This is a non-existent vocabulary included only to illustrate the expected data structure within the configuration. In particular, the links for the stn_id and value properties do not resolve. We can extend this example to one with terms defined by schema.org:

linked-data:
  context:
    - schema: https://schema.org/
      stn_id: schema:identifier
      datetime:
          "@id": schema:observationDate
          "@type": schema:DateTime
      value:
          "@id": schema:value
          "@type": schema:Number

Now that this has been elaborated, the benefit of a structured data representation becomes clearer. What was once an unexplained property called datetime in the source CSV can now be expanded to https://schema.org/observationDate, thereby eliminating ambiguity and enhancing interoperability. Its type is also expressed as https://schema.org/DateTime.
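
How a prefixed term such as schema:observationDate resolves to a full IRI can be shown with a short Python sketch. This is a deliberately simplified illustration of prefix expansion, not a JSON-LD processor and not part of pygeoapi; expand_term is a hypothetical helper.

```python
def expand_term(term, prefixes):
    """Expand a `prefix:suffix` term using a mapping of prefix to base IRI."""
    prefix, sep, suffix = term.partition(':')
    # absolute IRIs such as https://... are left untouched
    if sep and prefix in prefixes and not suffix.startswith('//'):
        return prefixes[prefix] + suffix
    return term


prefixes = {'schema': 'https://schema.org/'}
print(expand_term('schema:observationDate', prefixes))
# https://schema.org/observationDate
print(expand_term('schema:DateTime', prefixes))
# https://schema.org/DateTime
```

A real JSON-LD processor applies a fuller expansion algorithm, so this sketch only covers the simple prefix case used in the example context.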

This example demonstrates how to use this feature with a CSV data provider, using included sample data. The implementation of JSON-LD structured data is available for any data provider but is currently limited to defining a @context. Relationships between items can be expressed, but this depends on such relationships being expressed by the dataset provider, not pygeoapi.

An example of a data provider that includes relationships between items is the SensorThings API provider. SensorThings API, by default, has relationships between entities within its data model. Setting the intralink field of the SensorThings provider to true sets pygeoapi to represent the relationship between configured entities as intra-pygeoapi links or URIs. This relationship can further be maintained in the JSON-LD structured data using the appropriate @context with the sosa/ssn ontology. For example:

Things:
  linked-data:
    context:
      - sosa: "http://www.w3.org/ns/sosa/"
        ssn: "http://www.w3.org/ns/ssn/"
        Datastreams: sosa:ObservationCollection

Datastreams:
  linked-data:
    context:
      - sosa: "http://www.w3.org/ns/sosa/"
        ssn: "http://www.w3.org/ns/ssn/"
        Observations: sosa:hasMember
        Thing: sosa:hasFeatureOfInterest

Observations:
  linked-data:
    context:
      - sosa: "http://www.w3.org/ns/sosa/"
        ssn: "http://www.w3.org/ns/ssn/"
        Datastream: sosa:isMemberOf

Sometimes, the JSON-LD desired for an individual feature in a collection is more complicated than can be achieved by aliasing properties using a context. In this case, it is possible to implement a custom Jinja2 template. GeoJSON-LD is rendered using the Jinja2 templates defined in collections/items/item.jsonld and collections/items/index.jsonld. A pygeoapi collection requiring custom GeoJSON-LD can overwrite these templates using dataset level templating. To learn more about Jinja2 templates, see :ref:`html-templating`.

linked-data:
  context:
    - datetime: https://schema.org/DateTime

Validating the configuration

To ensure your configuration is valid, pygeoapi provides a validation utility that can be run as follows:

pygeoapi config validate -c /path/to/my-pygeoapi-config.yml

Summary

At this point, you have the configuration ready to administer the server.