
Introduction

Welcome to the Resource Watch API Developer Documentation.

Who is this for?

This section covers the behind-the-scenes details of the RW API that are relevant for developers trying to build their own RW API microservice. If you are looking for instructions on how to use the RW API to power your applications, the RW API Documentation is probably what you are looking for.

The developer documentation is aimed at software developers who are familiar with the RW API from a user perspective, and want to extend or modify the functionality of the API. From a technical point of view, this section assumes you are familiar with some technologies, protocols and patterns that are used on the RW API, such as:

This guide also assumes you are comfortable with programming in general. To keep these docs simple, and as most of the RW API source code is written in nodejs, that is the language we'll use for examples or when presenting specific tools and libraries. However, while we recommend using Nodejs, you may use different tools and/or languages when developing your microservices.

If any of these concepts are new or unfamiliar, we suggest using your favourite search engine to learn more about them before proceeding.

API Architecture

This chapter covers the basic architectural details of the API. If you are new to RW API development, you should start here, as key concepts explained here will be needed as you go more hands-on with the API code and/or infrastructure.

Microservice architecture

The RW API is built using a microservices architecture using Control Tower as the gateway application.

In this configuration, Control Tower (CT) offers gateway and core functionality, like routing or user management, while user-facing functionality is provided by a set of microservices that communicate with each other and the external world through Control Tower.

Internal communication between CT and the microservices is done through HTTP requests, and as such each microservice is built on top of a web server. These different web servers create a network within the API itself, which we will refer to throughout the rest of the documentation when we mention the "internal network" or "internal requests". By contrast, an "external request" is a request from/to the world wide web, and "external network" basically means "the internet".

Control Tower itself acts both as an API in its own right - with endpoints for management and monitoring - and as a catch-all proxy that forwards requests to the functional endpoints.

Microservice dependency graphs

Microservice dependency graph

The graph above illustrates the dependencies between different microservices as of July 2020. Most dependencies are based on endpoint calls: an arrow pointing from query to dataset means that the query microservice makes a call to one of the endpoints implemented by the dataset microservice. The exceptions to this rule are doc-orchestrator, doc-executor and doc-writer, which instead depend on each other via RabbitMQ messages.

Microservice with no dependencies

The microservices above do not depend on any of the other microservices.

It's worth noting that most microservices depend on Control Tower and the functionality it implicitly offers - for example, Control Tower will automatically pass the details of the user to the microservice, should the original HTTP request come with a valid JSON Web Token.

Data layer dependencies

Data layer dependencies

The graph above illustrates the different data layer elements present on the RW API, and the microservices or sites that depend on each of these.

Lifecycle of a request

All incoming requests to the API are handled by CT that, among other things, does the following:
- Checks if there's a microservice capable of handling that request.
- Checks if authentication data is required and/or is present.
- Automatically filters out anonymous requests to authenticated endpoints.

We'll go into more detail about these processes in the next sections.

Control Tower matches each incoming external request to a microservice by comparing its URI and request method. It then generates a new HTTP request to that microservice and waits for a response - which is used as the response to the original external request.
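
To make this more concrete, here's a hedged sketch of that flow for the dataset listing endpoint covered later in this chapter, using the same placeholder URLs used elsewhere in these docs:

# External request, issued by an API client to the public API
curl -X GET https://<api public URL>/v1/dataset

# Internal request Control Tower generates and dispatches to the dataset microservice
curl -X GET http://<microservice URL>/api/v1/dataset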

Microservices can make requests to each other via Control Tower. They also have unrestricted access to the public internet, so 3rd party services can be accessed as they normally would be. The API infrastructure also has other resources, like databases (MongoDB, ElasticSearch, Postgres) or publish-subscribe queues (RabbitMQ), which can be accessed. We'll cover those in more detail in a separate section.

Control Tower

This chapter covers Control Tower in depth - what it does and how it does it.

Overview

Control Tower is essentially a mix of 3 main concepts:
- An API proxy/router
- A lightweight management API
- A plugin system to extend functionality and its own API.

We'll cover those 3 topics individually in this section.

API proxy/router

Control Tower's most basic functionality is accepting external requests to the API, "forwarding" them to the corresponding microservices, and returning whatever response the microservices produce.

Once an external request is received, Control Tower uses the HTTP request method and the URI of the request to match it to one of the known endpoints exposed internally by one of the microservices. Note that all external requests are handled exclusively by Control Tower, and microservices are not able to directly receive requests from outside the API's internal network (they can, however, make external requests).

This matching process can have one of two results:

Handling a matching request

Should a match be found for the external request's URI and method, a new, internal request is generated and dispatched to the corresponding microservice. The original external request will be blocked until the microservice replies to the internal request - and the internal request's reply will be used as the reply to the external request as soon as the corresponding microservice returns it.

Handling a non-matching request

No URI and method match

At its most basic, the external request is matched by URI and HTTP method. Should no combination of URI and method be found, Control Tower will reply to the external request with a 404 HTTP code.

Authenticated

Microservice endpoints can be marked as requiring authentication. If a matching request is received, but no user authentication data is provided in the request, Control Tower will reject the request with a 401 HTTP code.

Application key

Similar to what happens with authenticated endpoints, microservice endpoints can also require an application to be provided in the request. If that requirement is not fulfilled, Control Tower will reject the request with a 401 HTTP code.

Filter error

Registered endpoints can also specify filters - custom requirements that must be met by the request in order for a request match to be successful. For example, endpoints associated with dataset operations use a common URI and HTTP method, but will be handled by different microservices depending on the type of dataset being used - this type of behaviour can be implemented using the filter functionality. In some scenarios, even if all the previous conditions are met, filters may rule out a given match, in which case a 401 HTTP code will be returned.

Microservice registration process

The matching process described above is carried out by Control Tower based on endpoints declared by each microservice. In this section, we'll take a detailed look at the process through which microservices can declare their available endpoints to Control Tower.

Overview

Here's a graphical overview of the requests exchanged between CT and a microservice:

Control Tower Request Microservice
<=== POST /v1/microservice <===
{"name":"microservice name", "url": "http://microservice-url.com", "active": true }
===> Reply ===>
HTTP OK
===> GET /api/info ===>
<=== Reply <===
{ /* JSON with microservice details */ }
(every 30 seconds)
===> GET /api/ping ===>
<=== Reply <===
pong

The registration process is started by the microservice. It announces its name, internal URL and active state to Control Tower. This tentatively registers the microservice in Control Tower's database, and triggers the next step of the process.
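
As a hedged example (placeholder URLs; the registration path is the one listed in the Management API section below), that initial announcement could be made with a request like this:

curl -X POST \
  http://<CT URL>/api/v1/microservice \
  -H 'Content-Type: application/json' \
  -d '{
    "name": "dataset",
    "url": "http://<microservice URL>",
    "active": true
}'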

Immediately after receiving the initial request from the microservice, Control Tower uses the provided URL to reach the microservice. This is used not only to load the endpoint data, but also to ensure that Control Tower is able to reach the microservice at the provided URL - if that does not happen, the registration process is aborted on Control Tower's side, and the microservice data dropped. When it receives this request, the microservice should reply with a JSON array of supported endpoints. We'll dive deeper into the structure of that reply in a moment.

The last step of the process is Control Tower processing that JSON entity, and storing its data in its database. From this point on, the microservice is registered and is able to receive user requests through Control Tower.

Control Tower will, every 30 seconds, emit a ping request to the microservice, which must reply to it to confirm to Control Tower that it's still functional. Should the microservice fail to reply to a ping request, Control Tower will assume its failure, and de-register the microservice and associated endpoints. Should this happen, it's up to the microservice to re-register itself on Control Tower, to be able to continue accepting requests.
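
As a minimal, hedged sketch of the microservice side of this exchange (the route paths follow the diagram above, and the endpoint declaration is the one discussed in the next section - your microservice's actual paths and framework setup may differ), a nodejs/Koa implementation could look like this:

// Minimal Koa sketch of the two routes Control Tower calls after registration.
const Koa = require('koa');
const Router = require('koa-router');

const app = new Koa();
const router = new Router();

// Returns the JSON endpoint declaration covered in the next section
router.get('/api/info', (ctx) => {
    ctx.body = {
        name: 'dataset',
        tags: ['dataset'],
        endpoints: [{
            path: '/v1/dataset',
            method: 'GET',
            redirect: { method: 'GET', path: '/api/v1/dataset' }
        }],
        swagger: {}
    };
});

// Liveness check polled by Control Tower every 30 seconds
router.get('/api/ping', (ctx) => {
    ctx.body = 'pong';
});

app.use(router.routes());
app.listen(3000);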

Minimal configuration

During the registration process, each microservice is responsible for informing Control Tower of the endpoints it supports, whether they require authentication, application info, etc. This is done through a JSON object that has the following basic structure:

{
    "name": "dataset",
    "tags": ["dataset"],
    "endpoints": [
        {
            "path": "/v1/dataset",
            "method": "GET",
            "redirect": {
                "method": "GET",
                "path": "/api/v1/dataset"
            }
        }
    ],
    "swagger": {}
}

Breaking it down bit by bit:

Within the endpoints array, the expected object structure is the following:

Taking the example above, that reply would be provided by the dataset microservice while registering on Control Tower. It would tell CT that it has a single public endpoint, available at GET <api public URL>/v1/dataset. When receiving that external request, the redirect portion of the endpoint configuration tells CT to issue a GET request to <microservice URL>/api/v1/dataset. It then follows the process previously described to handle the reply from the microservice and return its content to the user.

Advanced configuration

The example from the previous section covers the bare minimum a microservice needs to provide to Control Tower in order to register an endpoint. However, as discussed before, Control Tower can also provide support for more advanced features, like authentication and application data filtering. The JSON endpoint snippet below shows how these optional parameters can be used to configure said functionality.

{
    "name": "query-document",
    "tags": ["query-document"],
    "endpoints": [
        {
            "path": "/v1/query/:dataset",
            "method": "GET",
            "binary": true,
            "authenticated": true,
            "applicationRequired": true,
            "redirect": {
                "method": "POST",
                "path": "/api/v1/document/query/:dataset"
            },
            "filters": [{
                "name": "dataset",
                "path": "/v1/dataset/:dataset",
                "method": "GET",
                "params": {
                    "dataset": "dataset"
                },
                "compare": {
                    "data": {
                        "attributes": {
                            "connectorType": "document"
                        }
                    }
                }
            }]
        }
    ]
}

Within the endpoints array, you'll notice a few changes. The path property now includes a URI which contains :dataset in it - this notation tells Control Tower to expect a variable value in that part of the URI (delimited by /) and to refer to that value as dataset in other parts of the process.

The redirect.path references the same :dataset value - meaning the incoming value from the external request will be passed on to the internal request that Control Tower generates for the microservice.

You'll also notice new fields that were not present before:

The filtering section has the most complex structure, so let's analyse it with a real-world example. The above example would match a request like GET <api external URL>/v1/query/<dataset id>. Once that request is received by Control Tower, it is tentatively matched to this endpoint, pending the validation of the filter section.

To process the filter, Control Tower will issue a request of type filters.method (GET, in this case) to the URI in filters.path. As that URI contains a :dataset variable, it will use the filters.params object to map that value to the dataset value from the external request. This internal request is then issued, and the process briefly paused until a response is returned.

Once the response is returned, the filters.compare object comes into play: Control Tower checks if the response body matches the value set in the filters.compare object. In this particular example, the resulting comparison would be something like response.body.data.attributes.connectorType == "document". Should this evaluate to true, the filter is considered to have matched, and this endpoint is used to dispatch the internal request. Otherwise, this endpoint is discarded from the list of possible matches for this external request.

The data loaded as part of the filtering process is then passed on to the microservice that will handle the call, either as query arguments (DELETE or GET requests) or as part of the body (PATCH, POST and PUT requests). The query param/body property is named after the filter's name value. In our example, the request issued to the microservice that matches the filter would be of type POST, so its body will contain a dataset property containing the response from querying the filter endpoint - /v1/dataset/:dataset.
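
As a hedged illustration (the response fields shown are abbreviated, and only the connectorType attribute is taken from the example above - the rest is hypothetical), the body of the internal POST request for a matching document dataset might look like this:

{
    "dataset": {
        "data": {
            "id": "<dataset id>",
            "type": "dataset",
            "attributes": {
                "connectorType": "document"
            }
        }
    }
}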

It's worth noting at this stage that there's no restriction regarding the uniqueness of internal endpoints - two microservices may support the same endpoint, and use filters to differentiate between them based on the underlying data. The example above illustrates how a microservice can support the /v1/query/:dataset external endpoint, but only for datasets of type document. A different microservice may support the same endpoint, but with a different filter value (for example carto) and offer the same external functionality with a completely different underlying implementation.

Should multiple endpoints match an external request, one of them is chosen and used - there are no guarantees in terms of which is picked, so while this scenario does not produce an error, you probably want to avoid it.

Management API

In the previous section, we discussed how microservices can register their endpoints on Control Tower, exposing their functionality to the outside world. That registration process uses part of Control Tower's own API, which we'll discuss in finer detail in this section.

Microservice management endpoints

The microservice registration endpoint is one of 4 endpoints that exist around microservice management:

GET /microservice/

This endpoint shows a list of registered microservices and their corresponding endpoints.

curl -X GET \
  http://<CT URL>/api/v1/microservice \
  -H 'Authorization: Bearer <your user token>' \
  -H 'Content-Type: application/json'
[
    {
        "infoStatus": {
            "numRetries": 0,
            "error": null,
            "lastCheck": "2018-12-05T07:33:30.244Z"
        },
        "pathInfo": "/info",
        "pathLive": "/ping",
        "status": "active",
        "cache": [],
        "uncache": [],
        "tags": [
            "dataset"
        ],
        "_id": "5aa66766aee7ae846a419c0c",
        "name": "Dataset",
        "url": "http://dataset.default.svc.cluster.local:3000",
        "version": 1,
        "endpoints": [
            {
                "redirect": {
                    "method": "GET",
                    "path": "/api/v1/dataset"
                },
                "path": "/v1/dataset",
                "method": "GET"
            }
        ],
        "updatedAt": "2018-11-23T14:27:10.957Z",
        "swagger": "{}"
    }
]

GET /microservice/status

Lists information about the operational status of each microservice - like errors detected by Control Tower when trying to contact the microservice, or the number of retries attempted.

curl -X GET \
  http://<CT URL>/api/v1/microservice/status \
  -H 'Authorization: Bearer <your user token>' \
  -H 'Content-Type: application/json'
[
    {
        "infoStatus": {
            "numRetries": 0,
            "error": null,
            "lastCheck": "2018-12-05T07:36:30.199Z"
        },
        "status": "active",
        "name": "Dataset"
    }
]

POST /microservice/

This is the endpoint used by microservices to register on Control Tower. You can find a detailed analysis of its syntax in the previous section.

DELETE /microservice/:id

This endpoint is used to unregister a microservice's endpoints from Control Tower. Control Tower does not actually delete the microservice information, nor does it immediately remove the endpoints associated with it. Instead, it iterates over all endpoints associated with the microservice being unregistered and flags them for deletion - the actual deletion is carried out by a cron task that runs in the background. Until that moment, the microservice and associated endpoints will continue to be available, and external requests to those endpoints will be matched and handled as they were before. However, you will notice that endpoints scheduled for deletion will have a toDelete value of true - more on this in the next section.

curl -X DELETE \
  http://<CT URL>/api/v1/microservice/<microservice id> \
  -H 'Authorization: Bearer <your user token>' \
  -H 'Content-Type: application/json'
{
    "infoStatus": {
        "numRetries": 0
    },
    "pathInfo": "/info",
    "pathLive": "/ping",
    "status": "active",
    "cache": [],
    "uncache": [],
    "tags": [
        "dataset",
        "dataset",
        "dataset"
    ],
    "_id": "5c0782831b0bf92a37a754e2",
    "name": "Dataset",
    "url": "http://127.0.0.1:3001",
    "version": 1,
    "updatedAt": "2018-12-05T07:49:36.754Z",
    "endpoints": [
        {
            "redirect": {
                "method": "GET",
                "path": "/api/v1/dataset"
            },
            "path": "/v1/dataset",
            "method": "GET"
        }
    ],
    "swagger": "{}"
}

Endpoint management endpoints

GET /endpoint

This endpoint lists all microservice endpoints known to Control Tower. Note that it does not include endpoints offered by Control Tower itself or by any of its plugins.

curl -X GET \
  http://<CT URL>/api/v1/endpoint \
  -H 'Authorization: Bearer <your user token>' \
  -H 'Content-Type: application/json'
[
    {
        "pathKeys": [],
        "authenticated": false,
        "applicationRequired": false,
        "binary": false,
        "cache": [],
        "uncache": [],
        "toDelete": true,
        "_id": "5c0784c88dcce0323abe705d",
        "path": "/v1/dataset",
        "method": "GET",
        "pathRegex": {},
        "redirect": [
            {
                "filters": null,
                "_id": "5c0784c88dcce0323abe705e",
                "method": "GET",
                "path": "/api/v1/dataset",
                "url": "http://127.0.0.1:3001"
            }
        ],
        "version": 1
    }
]

DELETE /endpoint/purge-all

This endpoint purges the complete HTTP cache for all microservices. It does not support any kind of parametrization, so it's not possible to use it to clear only parts of the cache. As such, we recommend not using this endpoint unless you are certain of its consequences, as it will have a noticeable impact on end-user perceived performance.

curl -X DELETE \
  http://<CT URL>/api/v1/endpoint/purge-all \
  -H 'Authorization: Bearer <your user token>' \
  -H 'Content-Type: application/json'

Documentation management endpoints

GET /doc/swagger

Generates a complete Swagger JSON file documenting all API endpoints. This file is compiled by Control Tower based on the Swagger files provided by each microservice. As such, the Swagger details for a given endpoint will only be as good as the information provided by the microservice itself.

curl -X GET \
  http://<CT URL>/api/v1/doc/swagger \
  -H 'Authorization: Bearer <your user token>' \
  -H 'Content-Type: application/json'
{
    "swagger": "2.0",
    "info": {
        "title": "Control Tower",
        "description": "Control Tower - API",
        "version": "1.0.0"
    },
    "host": "tower.dev:9000",
    "schemes": [
        "http"
    ],
    "produces": [
        "application/vnd.api+json",
        "application/json"
    ],
    "paths": {
        "/api/v1/doc/swagger": {
            "get": {
                "description": "Return swagger files of registered microservices",
                "operationId": "getSwagger",
                "tags": [
                    "ControlTower"
                ],
                "produces": [
                    "application/json",
                    "application/vnd.api+json"
                ],
                "responses": {
                    "200": {
                        "description": "Swagger json"
                    },
                    "500": {
                        "description": "Unexpected error",
                        "schema": {
                            "$ref": "#/definitions/Errors"
                        }
                    }
                }
            }
        }
    },
    "definitions": {
        "Errors": {
            "type": "object",
            "description": "Errors",
            "properties": {
                "errors": {
                    "type": "array",
                    "items": {
                        "$ref": "#/definitions/Error"
                    }
                }
            }
        },
        "Error": {
            "type": "object",
            "properties": {
                "id": {
                    "type": "integer",
                    "format": "int32",
                    "description": "A unique identifier for this particular occurrence of the problem."
                },
                "links": {
                    "type": "object",
                    "description": "A links object",
                    "properties": {
                        "about": {
                            "type": "string",
                            "description": "A link that leads to further details about this particular occurrence of the problem."
                        }
                    }
                },
                "status": {
                    "type": "string",
                    "description": "The HTTP status code applicable to this problem, expressed as a string value"
                },
                "code": {
                    "type": "string",
                    "description": "An application-specific error code, expressed as a string value"
                },
                "title": {
                    "type": "string",
                    "description": "A short, human-readable summary of the problem that SHOULD NOT change from occurrence to occurrence of the problem, except for purposes of localization."
                },
                "detail": {
                    "type": "string",
                    "description": "A human-readable explanation specific to this occurrence of the problem. Like title, this field's value can be localized"
                },
                "source": {
                    "type": "object",
                    "description": "An object containing references to the source of the error, optionally including any of the following members",
                    "properties": {
                        "pointer": {
                            "type": "string",
                            "description": "A JSON Pointer [RFC6901] to the associated entity in the request document"
                        },
                        "parameter": {
                            "type": "string",
                            "description": "A string indicating which URI query parameter caused the error."
                        }
                    }
                },
                "meta": {
                    "type": "object",
                    "description": "A meta object containing non-standard meta-information about the error."
                }
            }
        }
    }
}

Plugin management endpoints

Control Tower has a plugin system of its own, which we'll cover in detail in the next section. As part of that system, it exposes a few API endpoints to support certain actions.

GET /plugin

Lists all currently enabled plugins, along with their configuration.

curl -X GET \
  http://<CT URL>/api/v1/plugin \
  -H 'Authorization: Bearer <your user token>' \
  -H 'Content-Type: application/json'
[
    {
        "active": true,
        "_id": "5bfd440834d5076bb4609f9f",
        "name": "manageErrors",
        "description": "Manage Errors",
        "mainFile": "plugins/manageErrors",
        "config": {
            "jsonAPIErrors": true
        }
    }
]

PATCH /plugin/:id

Updates the settings of a given plugin.

curl -X PATCH \
  http://<CT URL>/api/v1/plugin/:pluginId \
  -H 'Authorization: Bearer <your user token>' \
  -H 'Content-Type: application/json' \
  -d '{
    "config": {
        "jsonAPIErrors": false
    }
}'
{
    "_id": "5bfd440834d5076bb4609f9f",
    "name": "manageErrors",
    "description": "Manage Errors",
    "mainFile": "plugins/manageErrors",
    "config": {
        "jsonAPIErrors": false
    }
}

Control Tower plugins

Control Tower provides basic API management functionality, but it also has a set of plugins that allow decoupling non-core functionality from the core code base. In this section we'll briefly cover the functional aspect of each plugin, without focusing too much on the underlying implementation, which will be covered separately.

Time Request

This plugin measures the time elapsed between the external request being received and the reply being dispatched back to the external API client. It adds the computed value as an X-Response-Time response header.

Manage errors

This plugin intercepts replies that represent errors and formats them properly.

CORS

This plugin adds the necessary headers to support CORS.

Invalidate cache endpoints

Varnish cache integration plugin that invalidates cache entries.

Response formatter

Handles response formats other than the default JSON, setting headers and formatting the response body according to the requested content type. Currently only supports XML.

Statistics

Collects and stores statistics about the API usage.

MongoDB sessions

Adds support for storing session data on MongoDB.

Oauth plugin

User management and authentication plugin. Supports email+password based registration, as well as Facebook, Twitter, Apple and Google OAuth-based authentication.

Redis cache

Redis-based caching for Control Tower.

Application key authorization

Application key handling.

Fastly cache

Integrates HTTP caching using Fastly.

Read-only mode

Rejected requests will receive an HTTP 503 status code and the following message:

"API under maintenance, please try again later."

Configuration of white and black lists:

{
    // This blacklist prevents GETting widgets
    "blacklist": ["/api/v1/widget"],

    // This whitelist allows POST/PATCH/DELETE calls for datasets and layers
    "whitelist": ["/api/v1/dataset", "/api/v1/layer"]
}

This plugin activates a read-only mode that passes through all calls to GET endpoints and rejects calls to POST/PATCH/PUT/DELETE endpoints. For increased flexibility, the plugin supports a blacklist and a whitelist (see the example configuration above): calls to blacklisted paths are rejected even if they are GET requests, while whitelisted paths keep accepting POST/PATCH/PUT/DELETE calls despite read-only mode being active.

Note: please make sure you know what you're doing when activating this plugin, since it will severely restrict the usage of the API.

Control Tower plugin development

This chapter covers Control Tower plugin development basics. It aims to provide a basic understanding of how existing Control Tower plugins work, and to give you the foundations to develop your own plugins.

Implementing functionality

Control Tower and its plugins are implemented using Nodejs and Koa. Be sure to familiarize yourself with the key concepts behind this framework before exploring this section, as it assumes you are comfortable with concepts like route declaration or koa middleware.

Existing plugin functionality for Control Tower falls within one of two categories: middleware that intercepts requests as they are processed by Control Tower, or new endpoints that are added to Control Tower's API. Some plugins combine the two approaches to implement their functionality.

Middleware

Middleware-based functionality consists of intercepting the external request and/or response and either gathering data about them (for logging or statistics) or modifying them (for example, for caching). This type of functionality can be implemented using koa's middleware API, as Control Tower itself does not modify or extend that in any way.
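
As a minimal, hedged sketch (the exported function signature is an assumption about the plugin interface - only the Koa middleware usage itself is standard), a middleware-style plugin in the spirit of the Time Request plugin described earlier could look like this:

// Hypothetical middleware-based plugin: measures elapsed time per request
// and exposes it as a response header, using plain Koa middleware.
function middleware(app, plugin, generalConfig) {
    app.use(async (ctx, next) => {
        const start = Date.now();
        await next();
        // Add the elapsed time as a response header
        ctx.set('X-Response-Time', `${Date.now() - start}ms`);
    });
}

module.exports = { middleware };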

Endpoint creation

Plugins can also extend Control Tower's endpoint list by registering new endpoints of their own. An example of this is the Control Tower Oauth plugin that exposes a number of endpoints to support actions like registering, logging in, recovering password, etc. Like with the middleware approach, this can be implemented by relying on koa principles, and Control Tower does not modify this behavior.

Data storage

Control Tower uses a Mongo database to store data like known microservices or endpoints. Plugins can also access this database and create their own collections, should they wish to. Control Tower and existing plugins rely on Mongoose to that effect, and you'll find examples of its usage throughout Control Tower and its plugins' code.
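
As a hedged example (the model name and fields are hypothetical - check an existing plugin for the real thing), defining a plugin-owned collection with Mongoose could look like this:

const mongoose = require('mongoose');

// Hypothetical schema for a collection a plugin might manage on its own
const StatisticSchema = new mongoose.Schema({
    sourcePath: { type: String, required: true },
    responseTime: { type: Number },
    createdAt: { type: Date, default: Date.now }
});

module.exports = mongoose.model('Statistic', StatisticSchema);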

Plugin bootstrap and config

During its first initialization, Control Tower loads plugin settings from a configuration file. It includes a list of plugins to be initialized, whether or not they should be active, and their default configuration. On subsequent executions, this information is loaded from the database instead, so any additional changes you may want to make should be done on the database, and not in this file.

An important part of this file, and of the corresponding database entries, is plugin configuration. This data is stored within the plugins MongoDB collection managed by Control Tower, but it's made available to plugins as they are initialized and run.

Microservice development guide

In this chapter, we'll cover additional details that you, as a RW API developer, should keep in mind when developing your microservice. We'll focus not only on the technical requirements you need to meet for your microservice to communicate with the remaining RW API internal components, but also discuss the policy surrounding development for the RW API, as a way to achieve a certain degree of consistency across a naturally heterogeneous microservice-based system.

Microservice overview

As described in the API Architecture section, microservices are small web applications that expose a REST API through a web server. This means that microservices can be built using any programming language, just as long as it supports HTTP communication. In practical terms, most of this API's core microservices are built using nodejs, with Python and Rails being distant 2nd and 3rd respectively. New microservices being developed should respect this hierarchy when it comes to choosing a development language, and the adoption of a different stack should be validated with the remaining development team members beforehand.

In this whole section, we will use code examples from the Dataset microservice, which is built using nodejs. We will discuss the general principles, which should apply to all implementations, as well as implementation details, which may apply to your scenario if you are also using nodejs, or that may not apply if you are using something else.

Development lifecycle

As a developer of the RW API, your ultimate goal is to make an improvement to the API source code and push it to the production environment. Of course, this is an overly simplistic description of a complex process, and the goal of the next section is to dive deeper into the steps you need to take to achieve that. Breaking this down into a more detailed list, these are the high-level steps you'll need to take in order to contribute to the RW API:

In the next sections we'll dive deeper into the details of each step.

Setting up a development environment

In this section, we'll cover the details of how you can configure your operating system to be used as a development environment for the Dataset microservice, which is built using nodejs. These instructions will apply, without major changes, to all other nodejs-based microservices. For microservices based on Python or Rails, and when using Docker, you should also be able to use these instructions. Native execution for Python and Rails microservices is done using equivalent commands, which we'll outline as we go.

Note that these instructions aim at giving you the details about what's specific to the RW API, and it's not a step-by-step list of commands you can copy-paste. For example, we will not cover the details of how to install dependencies - that's something best answered by that particular piece of software's documentation page for your operating system, which you can easily find with your favourite search engine.

Also, when applying these instruction to different microservices, be sure to review their respective README.md file for a comprehensive list of dependencies you'll need, or other specific details about its setup process.

Execution - native vs Docker

All microservices can be executed in two ways: natively or using Docker. If you are not familiar with Docker, we suggest briefly learning about what it does before proceeding. In a nutshell, it simplifies setup and execution, at the expense of a performance hit that varies depending on your operating system. Here are a few key points you should consider when deciding between executing natively or using Docker:

Using native execution

Getting the code

The first step will be getting the source code from Github to your computer using the Git CLI (or equivalent).

git clone https://github.com/resource-watch/dataset.git 

Or, if you prefer, you can use:

git clone git@github.com:resource-watch/dataset.git

Installing dependencies

In the source code you just downloaded, you'll find a README.md file with detailed instruction for the microservice, including dependencies you'll need to install.

For all Node.js microservices, you'll need to install Node.js and Yarn. Rather than installing Node.js from the official website, we recommend using nvm, which allows you to easily install and manage different Node.js versions on your computer, since different microservices may require different versions of Node.js to run.

# Install Node.js v12 for the dataset microservice
nvm install 12

# Switch to the v12 installation
nvm use 12

Once you've installed a version manager like nvm, you need to check which version of the language to install. For Node.js microservices, the package.json file typically has an engines value which will tell you which version(s) of Node.js are supported. Another place where you'll find this info (which also works for other languages) is the content of the Dockerfile (typically in the first line) - in the dataset microservice, for example, FROM node:12-alpine means this microservice runs on Node.js v12.

# To install dependencies, navigate to the directory where you cloned the microservice and run:
yarn

Yarn is a package manager for Node.js applications (a spiritual equivalent to pip for Python or Bundler for Ruby). Once it's installed, be sure to use it to install the necessary libraries (see the command above).

The microservice's README may specify additional dependencies you need to install. MongoDB, for example, is a common dependency of many RW API microservices, with applications like Postgres, Redis, RabbitMQ or Open Distro for Elasticsearch also being required on certain microservices. If a version number is not identified on the README.md file, the docker-compose-test.yml file may help. image: mongo:3.4 means this microservice depends on MongoDB v3.4.
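
If you'd rather not install a dependency like MongoDB natively, a hedged alternative (container name and port mapping are illustrative) is running just that dependency with Docker, using the image tag from the example above:

docker run -d --name dataset-mongo -p 27017:27017 mongo:3.4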

Besides these dependencies, microservices may also depend on the Control Tower gateway, and other microservices:

If your endpoint does not rely on other microservices, and you don't rely on (or can spoof) the user data provided by Control Tower, you can set the CT_REGISTER_MODE environment variable to a value other than auto to disable the automatic registration on startup, thus removing the dependency on Control Tower. However, this is not recommended, as using Control Tower makes your development environment resemble the production setup, thus potentially highlighting issues you may otherwise miss.

To set up Control Tower, follow these same instructions, as the process is the same as for any nodejs microservice.

A note on dependencies

Due to a recent infrastructure migration, some README files may mention old dependencies that have since been replaced with newer equivalents. Here are the old dependencies you may find, and their newer equivalent:

Configuration

With the dependencies set up, it's time to configure the microservice. This is done using environment variables (env vars) which you can define in multiple ways, depending on your OS, way of executing the code (e.g. many IDEs have a "Run" feature that allow configuring environment variables using a GUI) and personal preference. For this tutorial, and going forward, we'll assume you'll run the code from a terminal and specify the environment variables inline.

NODE_ENV=production CT_REGISTER_MODE=none <more variables> <your command>

To find out more about which env vars you can/need to specify, refer to the microservice's README.md file, as it typically documents the main variables available to you. Nodejs-based microservices will often have a full list in the config/custom-environment-variables.json file. The docker-compose-test.yml and docker-compose-develop.yml files contain usages of said variables, and may be helpful if you are looking for an example or an undocumented variable.

As a rule of thumb, env vars configure things like database addresses and credentials, 3rd party services (for example, an AWS S3 bucket URL or AWS access credentials), or the Control Tower URL (only necessary if you decide to use it).

Starting the microservice

# Starting a Node.js microservice:
yarn start

# Node.js using inline environment variables:
NODE_ENV=production CT_REGISTER_MODE=none <your other environment variables> yarn start

# Starting a Python microservice may look something like this:
python main.py

# Rails-based microservices can rely on the traditional Rails CLI:
rails server

Once you have determined the values you'll need to run your microservice with the desired configuration, you should have everything ready to run it. For nodejs based microservices like Dataset, you can do this by running yarn start. For other languages, the startup command will be different (see the examples above).

You can also review the entrypoint.sh file content, under the start or develop sections, as it will contain the command you need to execute to run the code natively.

The application should output useful information, like database connection status and HTTP port. Depending on your configuration for the Control Tower auto registration, it will also output the result of that process. Overall, if no error message is produced, the microservice should be up and running, and available at the port specified by its output.

Running the tests

# Running tests for a Node.js microservice:
yarn test

# Node.js with environment variables:
NODE_ENV=test CT_REGISTER_MODE=none <your other environment variables> yarn test

# Python:
exec pytest <test folder>

# Ruby:
bundle exec rspec spec

Most microservices (hopefully all, in the future) come with tests included. Running these tests can help you identify issues with your code changes, and they are required for any new modifications merged into the RW API. It's recommended that you run tests locally before pushing changes to Github.

Tests sometimes mock certain dependencies, like external 3rd party services, but often require an actually running database, just as native execution would (think MongoDB or Postgres). Check docker-compose-test.yml for whatever services it runs besides the microservice - those are the dependencies you'll need to have up and running to run the tests natively. Control Tower is not required to run the tests.

Test execution requires roughly the same env vars as running the actual microservice. For microservices that rely on a database, make sure you are not using the same database as you do for development purposes - tests assume database isolation, and will delete preexisting data.

The examples above show how to run tests for microservices in different languages. You can also review the entrypoint.sh file content, under the test section, which will contain the exact command you need to execute.

Common errors and pitfalls

Using Docker

Getting the code

The first step will be getting the source code from Github to your computer using the Git CLI (or equivalent).

git clone https://github.com/resource-watch/dataset.git 

Or, if you prefer, you can use:

git clone git@github.com:resource-watch/dataset.git

Installing dependencies

As we mentioned before, if you decide to use Docker, your only dependency will be Docker itself (and docker-compose, which comes included). Depending on your OS, Docker installation instructions will differ, but your favourite web search engine will hopefully point you in the right direction.

When you run Docker, it will automatically fetch the necessary dependencies and run them for you. However, if you are not using Linux, you may have to fine-tune some settings so that dependencies like MongoDB can communicate with your microservice - we'll review this in detail in a bit.

Note that Docker will not fetch nor run Control Tower for you - if you want to execute your microservice in integration with Control Tower, you'll have to set it up manually. Alternatively, set the CT_REGISTER_MODE environment variable to any value other than auto.

Configuration

Configuration for Docker based execution is done using environment variables (env vars) passed to the Docker runtime using a special dev.env file. Some microservices will include a dev.env.sample or equivalent that you can copy-paste and use as a starting point when configuring your environment.

To find out more about which env vars you can/need to specify, refer to the microservice's README.md file, as it typically documents the main variables available to you. Nodejs-based microservices will often have a full list in the config/custom-environment-variables.json file. The docker-compose-test.yml and docker-compose-develop.yml files contain usages of said variables, and may be helpful if you are looking for an example or an undocumented variable.

As a rule of thumb, env vars configure things like database addresses and credentials, 3rd party services (for example, an AWS S3 bucket URL or AWS access credentials), or the Control Tower URL (only necessary if you decide to use it). Your docker-compose file may already have predefined values for some of these, in which case do not overwrite them unless you are certain of what you're doing.

Docker networking works differently on Linux vs other operating systems, and you need to keep this in mind when specifying values for things like MongoDB or Control Tower addresses. Under Linux, Docker containers and the host operating system share the same network host, so you can use localhost, for example, when telling a dockerized Dataset microservice where it can reach Control Tower (running natively or in a Docker container). Under other operating systems, however, Docker containers run on a different network host, so you should instead use your local network IP - using localhost will not reach your expected target.
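
As a hedged illustration (the variable name and port are assumptions - check your microservice's README and docker-compose files for the real ones), the same setting might look like this in your dev.env file, depending on your OS:

# Linux: containers share the host's network, so localhost works
CT_URL=http://localhost:9000

# macOS/Windows: containers run on a separate network host, so use your machine's local network IP
CT_URL=http://192.168.0.5:9000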

Starting the microservice

For convenience, most microservices include a unix-based script that runs the Docker command that starts your microservice, along with the dependencies covered by Docker. The file name will vary from microservice to microservice, and the argument may also vary, but it's usually something along the lines of:

./dataset.sh develop

Mac users' mileage may vary with these scripts, and Windows users will need to manually open these files and reproduce the included logic in Windows-compatible syntax - don't worry, they are pretty simple and easy to understand.

Docker will take a few minutes to run, especially during the first execution, but once it's up and running, you should see the HTTP address where your microservice is available in the output printed to the console.

Running the tests

Running tests under Docker is similar to running the actual microservice. The easiest way to do so, for unix-based OSs is using the included .sh helper file:

./dataset.sh test

Common errors and pitfalls

CI/CD

The RW API uses multiple tools in its CI and CD pipelines. All microservices that compose the RW API use a common set of tools:

We assume, at this point, that you're already familiar with Github and its core functionality, like branches and pull requests (PRs). If that's not the case, use your favourite search engine to learn more about those concepts.

Each microservice lives in a separate Github repository, most of which have Travis and Code Climate integrations configured. Whenever a pull request is created, both tools will be triggered automatically - Travis will run the tests included in the code and notify the PR author of the result, while Code Climate builds on top of that, monitoring and reporting code coverage. The behavior of both tools is controlled by a single .travis.yml file you'll find in the root of each microservice's code base, and you can learn about it on each tool's documentation page. You can see the results directly on the PR page.

When you want to submit a change to the code of one of the microservices, you should:

At this stage, even if your tests pass locally, they may fail when executed in Travis. We recommend running them again if this happens, to see if a hiccup occurred. If that's not the case, look into the Travis logs to learn more. Unfortunately, the reasons for these failures are diverse: they can be related to env vars defined inside the .travis.yml file, missing or incorrectly configured dependencies, differences in packages between your local environment and Travis', etc. At the time of writing, and by default (this can be overridden), Travis uses Ubuntu and is configured to use native execution when running tests, so using that same approach locally may get you closer to the source of the problem you're experiencing. Travis' output log will usually help you identify what's happening, and get you closer to a solution.

Once reviewed by a peer, your changes will be merged and will be ready for deployment to one of the live environments.

Currently, the RW API has 3 different environments:

Each microservice repository has a branch matching the name of each of these 3 environments, and changes will always go from a feature branch to dev, then to staging, and finally to production. To push your changes across the different environments, you should:

Depending on the scale of the changes you're making, it's recommended to use git tags with semantic versioning. Also be sure to update the CHANGELOG.md accordingly, as well as the package.json or equivalent files if they reference a version number.

Changes being pushed to either production or staging should be announced in advance in the general channel of the WRI API Slack (contact Ethan Roday if you're not in that Slack workspace). Specifically, for changes going to production, that notice period should be at least 14 days, during which said changes should be available on staging for testing by everyone. In rare cases, if a hotfix is needed to fix a breaking change in production, the 14-day lead time can be circumvented, but an announcement must still be made.

It's also best practice to announce the changes you're about to deploy before doing so, so that other developers of RW API applications can be on the lookout for regressions, and can quickly get in touch with you should any undesired behavior change be detected.

Each of the referred environments lives on a separate Kubernetes cluster (hosted with AWS EKS), and deployment is done using individual Jenkins instances:

All 3 instances have similar overall configuration, but different microservices may deploy differently depending on the behavior coded into the Jenkinsfile that's part of their source code - for example, some WRI sites are also deployed using this approach, but opt to deploy both staging and production versions to the production cluster, and may not be in the staging or dev Jenkins. However, the majority of services follow the convention of a single branch per Jenkins instance, with the branch name matching the name of the respective environment.

The list of jobs you find on each Jenkins instance will match the list of services deployed on that environment. In the details of each job, you should find a branch named after the environment, which corresponds to the Github branch with the same name (some services may still have the old approach, with develop for dev and staging, and master for production). You may also find other branches, or a different branch structure, depending on the service itself - again, the Jenkinsfile configuration is king here, and you should refer to it to better understand what is the desired behavior per branch. In some cases, old branches will be listed on Jenkins but should be ignored.

Deployments need to be triggered manually, on a per-microservice and per-branch basis. Once a deployment starts, Jenkins will run the Jenkinsfile content - it is, after all, a script - and perform the actions contained in it. While it's up to the maintainer of each microservice to modify this script, more often than not it will run the tests included in the microservice, using Docker, and if these pass, push the newly generated Docker image to Docker Hub. It will then update the respective Kubernetes cluster with the content of the matching subfolder inside the k8s folder of the microservice, plus the k8s/service folder if one exists. The last step is to deploy the recently pushed Docker image from Docker Hub to the cluster, which will cause Kubernetes to progressively replace running instances of the old version of the service with ones based on the new version.

A couple of important notes here:

While it's rare, tests run by Jenkins at this stage may also fail, preventing your deployment. In these cases, refer to the Jenkins build log for details - most of the time the failure can be reproduced locally by running your tests using Docker. If your Jenkins log mentions issues related to disk capacity or network address assignment problems, please reach out to someone with access to the Jenkins VMs and ask for a docker system prune.

Infrastructure configuration

While the workflow above will cover most of the changes you'll do as an RW API developer - changes to the code that powers the API - from time to time you'll need to adjust the actual infrastructure on which the API runs. This section covers what you need to know to be able to manage the infrastructure.

Infrastructure as code using Terraform

Each of the 3 RW API environments lives on a separate AWS account. To ease maintenance, the infrastructure configuration is shared by all 3 environments, and is maintained using a common project built with Terraform, an infrastructure-as-code tool. If you are not familiar with Terraform, we recommend learning about it before proceeding.

Structure-wise, the 3 RW API environments are mostly equal, with the differences between them being the following:
- Scale and redundancy: the production environment has more, and more capable, hardware, to account for higher user load and to provide redundancy on key services.
- Sites: due to its stability-oriented purpose, the production environment also hosts the sites for some WRI-related projects, which run in dedicated EKS node groups and do not exist on the dev or staging clusters.
- Availability: being a development-only resource, the dev environment does not necessarily need to be available 24/7, and it may be intentionally unavailable as a cost-saving measure - we call this hibernation.

Due to the structure of the RW API infrastructure, the final architecture is defined by 2 Terraform projects:

The Kubernetes Terraform project relies on the resources provisioned by the AWS Terraform project (which is why they can't be merged into a single one), so be sure that they are applied in that order.

While the Kubernetes Terraform project contains an increasingly large portion of the overall Kubernetes configuration, there are some additional Kubernetes elements provisioned outside of it.

RW API hibernation

As mentioned above, to save costs on the dev environment, its functionality may be turned off at times when it's not needed - we call this "hibernation". The goal is to have a way to dial down resources in times when they are not needed (which we anticipate will be most of the time), while also giving RW API developers a simple and easy way to restore full functionality when it's needed.

This can be achieved by modifying the hibernate boolean variable in the Terraform dev variables file and applying the changes (Github Actions will do this automatically on push/merge to the dev branch). Setting this value to true will cause the dev RW API to go into hibernation and become unavailable, while false restores its functionality. Keep in mind that both the hibernation and restoration processes take a few minutes, so we recommend the company of your favourite beverage while you carry out these steps.
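
As a hedged sketch (the file path is an assumption - only the hibernate variable name is taken from the description above), the change would look something like this:

# In the Terraform dev variables file
hibernate = true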

One important note here: while this is meant to be used with the dev environment only, there is no failsafe mechanism preventing the staging or production environments from being put into hibernation as well. When modifying the Terraform variables file, be sure you are editing the correct file, otherwise you may accidentally cause staging or production unavailability.

Access to infrastructure resources

For management or debugging purposes, you may need to access the infrastructure resources. Depending on what you want to achieve, there are multiple paths forward. However, for all of them, a common requirement is an AWS account with adequate permissions. The exact permissions will depend on what you're trying to achieve. The AWS IAM permission mechanism is too powerful and complex to cover here, so be prepared to see a few "permission denied" responses from time to time, and to discuss with your fellow RW API developers which permissions you are missing to access a given resource.

Infrastructure details

Infrastructure details are accessible in multiple ways, depending on exactly what you're looking for.

If you are looking for a high-level piece of information (like "how many CPUs are we running?"), you may use the AWS Console directly, as it provides a simple UI for a lot of information. Alternatively, investigating the Terraform files is a good way to learn what services are configured overall, without having to browse every page of the AWS Console or worry that you may be looking in the wrong AWS region.

Finally, for very low level details, AWS has a CLI tool that may expose information not available through the channels mentioned above.

In all scenarios, if you are looking to permanently modify the infrastructure, keep in mind that the Terraform projects are king here, and any change made using either the AWS Console or AWS CLI that is not persisted to Terraform should be considered ephemeral, as it may be overwritten at any time without prior warning. You may, however, modify the infrastructure using the AWS Console or AWS CLI as a means of experimentation, before persisting your final changes in Terraform.

Infrastructure access

Infrastructure access is often needed as a way to access things like Kubernetes, database dumps, system status, etc. It's not an end in itself, but rather a necessary step to achieve other goals. To configure your infrastructure access, you'll need two elements.

The first is a working, configured AWS CLI installation. The AWS CLI tool has comprehensive documentation, which should also cover the install steps for your particular operating system. To configure it you'll also need the AWS account covered in the previous section.

The second element you'll need is access to the bastion host. If you are not familiar with bastion hosts, we recommend reading about them before proceeding but, in a nutshell, a bastion host works as a single point of entry into key parts of the infrastructure, which are otherwise inaccessible from the public internet. To contact a service running in the infrastructure from the outside world, you create an SSH tunnel that proxies traffic to that service through the bastion host, thus bypassing this restriction. For this to work, you need SSH access to the bastion host, which a fellow RW API developer may grant you.

ssh -N -L <local port>:<target service address>:<target service port> <bastion host user>@<bastion host address>

To create an SSH tunnel under a unix-based system, you'll need to run a command like the example here.

Database access

Access to databases (to extract a dump for testing, for example) depends on how said database service is configured. At the time of writing, some database services run as AWS managed services, while others live inside the Kubernetes cluster, as Kubernetes services.

For database services provided by AWS managed services, the only necessary steps are the ones covered previously in the Infrastructure access section. After that, you should be able to reach the host of the database service, using the connection details provided by the service itself. You may also need authentication details for a specific service, which you may find either in the Terraform configuration, the Kubernetes secrets files or AWS secret storage.

For access to database services running as a Kubernetes service, you'll need Kubernetes access (which we will cover next). Once you have that configured, you should configure a Kubernetes port forward to map said service to a port of your local host. Access credentials are typically available on the Kubernetes secrets files.
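
As a sketch, assuming your kubectl is already configured (see the next section), a port forward to a database service would look something like this (the service name, ports and namespace are placeholders):

kubectl port-forward service/<database service name> <local port>:<database port> --namespace <namespace>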

Kubernetes access

The RW API runs in an AWS EKS Kubernetes cluster, which can be accessed using the kubectl command line tool; you should install it on your computer. You also need the elements previously covered in the Infrastructure access section, so be sure that your AWS CLI is installed and configured, and that you have a way to communicate with the infrastructure's inner elements.

To configure kubectl, you will need some details that are specific to the Kubernetes cluster you're trying to access. Said details are available as the output of the terraform apply command that's executed by Github Actions for the AWS Terraform project. Be mindful that, amongst those details, is the URL through which kubectl should contact the Kubernetes control plane. Given that you are accessing the cluster through an SSH tunnel, you should tunnel traffic to that URL through the bastion host, and point kubectl at your local end of the tunnel.

ssh -N -L 4433:<EKS URL>:443 <bastion host user>@<bastion host URL>

Here's an example of how you could create said SSH tunnel:

Log access

Logs for the whole infrastructure are centralized in AWS Cloudwatch. If you find it more convenient, you can use kubectl to access logs for a particular pod or container, but you'll also find that output on AWS Cloudwatch.
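
For reference, fetching and following the logs of a given pod with kubectl looks something like this (the pod name and namespace are placeholders):

kubectl logs --follow <pod name> --namespace <namespace>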

Certain AWS managed services' logs will only be available on Cloudwatch, so we encourage you to learn how to navigate it.

Testing your changes

With your code live on one of the clusters, you should now proceed to testing it. The type of tests you should run varies greatly with the nature of the changes you made, so common sense and industry best practices apply here.

If you are implementing a new endpoint and it's mission critical to the RW API or one of the applications it powers, you may want to add an API smoke test to ensure that any issue affecting its availability is detected and reported. Refer to that section of the docs for more details.

Microservice internal architecture - nodejs

Nodejs microservices are based on the Koa framework for nodejs. To understand the following code snippets, we assume you are familiar with the basics of the framework, like how routes are declared and handled, what middleware is, and how it works. You should also be somewhat familiar with tools like npm, mongo and mongoose, Jenkins CI, docker and docker-compose.

Anatomy of a (nodejs) microservice

In this section, we'll use the dataset microservice as an example, but these concepts should apply to most, if not all, nodejs microservices.

Since we are interested in the microservice's functional bits, we'll analyse the app folder content in more detail. It's worth mentioning that, depending on how you run the microservice, the respective docker compose files may contain relevant information and configuration, as do the files inside the config folder.

The app folder contains the following structure:

The grunt file includes several task definitions that may be useful during day-to-day development. However, grunt is semi-deprecated (it's still needed, don't remove it) in the sense that it's recommended to define useful tasks in the package.json file instead - those tasks will, in turn, call grunt tasks.
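
As an illustration of this convention, a package.json could expose tasks like the following, which simply delegate to grunt (the task names are hypothetical and vary between microservices):

{
    "scripts": {
        "test": "grunt test",
        "watch": "grunt watch"
    }
}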

Inside the app/src folder you'll find the following structure. The folders below will commonly be found on all microservices, unless stated otherwise:

Adding a new endpoint

In this section we'll cover how you can add a new endpoint with new functionality to an existing microservice. The aim is not to be a comprehensive guide to cover all cases, but more of a quick entry point into day-to-day actions you may want to perform, which should be complemented by your own learning of how a microservice works - remember that all microservices, despite being structurally similar, have their own custom code and functionality.

To add a new endpoint, here's the short tasklist you have to tackle:

Register your route in koa

Route registration is done using the koa-router library, and can be done in the app/src/routes/api/v1/dataset.router.js file, usually at the bottom of it:


// router object declaration, usually at the top of the file
const router = new Router({
    prefix: '/dataset',
});

// routes declaration, usually at the bottom of the file
router.get('/', DatasetRouter.getAll);
router.post('/find-by-ids', DatasetRouter.findByIds);
router.post('/', validationMiddleware, authorizationMiddleware, authorizationBigQuery, DatasetRouter.create);
// router.post('/', validationMiddleware, authorizationMiddleware, authorizationBigQuery, authorizationSubscribable, DatasetRouter.create);
router.post('/upload', validationMiddleware, authorizationMiddleware, DatasetRouter.upload);
router.post('/:dataset/flush', authorizationMiddleware, DatasetRouter.flushDataset);
router.post('/:dataset/recover', authorizationRecover, DatasetRouter.recover);

router.get('/:dataset', DatasetRouter.get);
router.get('/:dataset/verification', DatasetRouter.verification);
router.patch('/:dataset', validationMiddleware, authorizationMiddleware, DatasetRouter.update);
router.delete('/:dataset', authorizationMiddleware, DatasetRouter.delete);
router.post('/:dataset/clone', validationMiddleware, authorizationMiddleware, DatasetRouter.clone);

In here you'll find the already existing routes. As you can see from the rather explicit syntax, you need to call the method that matches the desired HTTP verb on the router object, and pass it a variable number of arguments - more on this in the next section. One thing to keep in mind is that all the routes in a file are typically prefixed, as defined in the router object declaration.
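
As an example, registering a hypothetical new endpoint would add one more line following the same pattern (the handler and middleware referenced here are placeholders you would implement yourself):

// hypothetical new route: POST /dataset/:dataset/archive
router.post('/:dataset/archive', authorizationMiddleware, DatasetRouter.archive);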

Control Tower integration

While they could technically work as standalone applications, microservices are built from the ground up to work through Control Tower. As such, not only do they lack built-in functionality provided by Control Tower itself (for example, user management), they also need to handle their own integration with Control Tower. Control Tower provides integration libraries for certain languages and frameworks, which you can use to ease development:

- nodejs package for Koa
- Python module for Flask
- Rails engine

These libraries provide 2 basic features that we'll cover in detail in this chapter. You can also use them as reference in case you want to implement a microservice in a different programming language or framework.

We'll use the nodejs library as reference and example in the following sections, as it's the most commonly used language in this API. Other libraries provide the same underlying functionality, but may have different ways to operate. Refer to each library's specific documentation for more details.

Registering on Control Tower

The first feature provided by these libraries, and that a microservice must perform, is registering on Control Tower. Most of the details of this process can be found on Control Tower's documentation, which you should read at this point if you haven't already.

// dataset microservice registration example

const ctRegisterMicroservice = require('ct-register-microservice-node');
const Koa = require('koa');
const logger = require('logger');
const config = require('config');

const app = new Koa();

const server = app.listen(process.env.PORT, () => {
    ctRegisterMicroservice.register({
        info: require('../microservice/register.json'),
        swagger: require('../microservice/public-swagger.json'),
        mode: (process.env.CT_REGISTER_MODE && process.env.CT_REGISTER_MODE === 'auto') ? ctRegisterMicroservice.MODE_AUTOREGISTER : ctRegisterMicroservice.MODE_NORMAL,
        framework: ctRegisterMicroservice.KOA2,
        app,
        logger,
        name: config.get('service.name'),
        ctUrl: process.env.CT_URL,
        url: process.env.LOCAL_URL,
        token: process.env.CT_TOKEN,
        active: true
    }).then(() => {
    }, (error) => {
        logger.error(error);
        process.exit(1);
    });
});

Covering the arguments in detail:

This registration call usually takes place right after the microservice's start process has ended, and the corresponding web server is available. Keep in mind that the call above will trigger an HTTP request to Control Tower, which in turn will call the microservice's web server - so make sure the microservice's web server is up and running when you attempt to register it.

Requests to other microservices

Besides contacting Control Tower to register themselves, microservices also need to contact Control Tower to make requests to other microservices.

// Microservice call to another microservice's endpoint
const ctRegisterMicroservice = require('ct-register-microservice-node');
const tags = ['tag1', 'tag2'];

ctRegisterMicroservice.requestToMicroservice({
    uri: `/graph/dataset/${id}/associate`,
    method: 'POST',
    json: true,
    body: {
        tags
    }
});

In this example, the dataset microservice makes a call to the /graph/dataset/<id>/associate endpoint to tag a dataset with the given tag list. This endpoint is implemented by the Graph microservice, but the request is actually handled by Control Tower. Taking a deeper look at the code that implements the call above, we learn a few things:

// Implementation of call to another microservice

requestToMicroservice(config) {
    logger.info('Adding authentication header ');
    try {
        let version = '';
        if (process.env.API_VERSION) {
            version = `/${process.env.API_VERSION}`;
        }
        if (config.version === false) {
            version = '';
        }
        config.headers = Object.assign(config.headers || {}, { authentication: this.options.token });
        if (config.application) {
            config.headers.app_key = JSON.stringify({ application: config.application });
        }
        config.uri = this.options.ctUrl + version + config.uri;
        return requestPromise(config);
    } catch (err) {
        logger.error('Error to doing request', err);
        throw err;
    }

}

As explained above, although the call is ultimately to another microservice, the request is sent to Control Tower, which is then responsible for issuing another internal request to the destination microservice, getting the reply from that call and passing it on to the microservice that initiated the process.

Another thing you'll notice is that this call depends on preexisting configuration stored in the this.options property. These configuration options are stored within the object during the Control Tower registration process, meaning you should not attempt to make a call to another microservice unless you have previously registered your microservice on Control Tower. Keep in mind that this is a restriction of this particular integration library, and not of Control Tower itself - a different implementation could do internal requests to microservices through Control Tower without being registered as a microservice.

In some scenarios, while developing, it's not practical to run, on your development computer, all the microservices your logic depends on. The Writing end-to-end tests section has some details about writing tests for your code, including how you can mock such calls, so you don't have to run the actual dependencies.

Docker

When deployed in a production environment, microservices will run in a Docker container. As a microservice developer, you should include in your microservice the necessary configuration to run your application inside a container. This is done using a Dockerfile, and you can use the Dataset microservice's Dockerfile as an example of what one of these files looks like for a nodejs based microservice.

It's worth noting that these containers are set up in a way that allows using them to run either the microservice itself or its tests. This will be useful further ahead when we review the testing approach you should use when writing microservices.

Data layer

Many microservices require the ability to store data to perform their function. The RW API has several data storage tools available to you, in case you need to store information to run your service.

Warning: microservices run on ephemeral containers managed by Kubernetes, and often in multiple parallel instances, so do not rely on storing data on the filesystem, unless you know there's something like a Kubernetes' persistent volume to back it up.

When accessing these tools, there are a few things you should keep in mind:

Currently, the following data storage tools are available on the RW API cluster:

MongoDB v3.6

MongoDB is the most frequently used data storage tool, as it supports schema-less document storage, thus making it easy to set up and run. When using MongoDB, be sure to give your collection a unique name, to avoid conflicts.
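
Here's a minimal mongoose sketch of this idea, explicitly naming the backing collection (the model and collection names are illustrative):

const mongoose = require('mongoose');

const WidgetSchema = new mongoose.Schema({
    name: { type: String, required: true, trim: true },
});

// passing an explicit, prefixed collection name helps avoid clashes with other microservices
module.exports = mongoose.model('Widget', WidgetSchema, 'my_microservice_widgets');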

To see an example of how to use MongoDB on a real-world microservice, check out the Dataset microservice.

Postgres v9.6

Use Postgres if your application needs a relational database. Unlike other data storage tools, access to Postgres is granted to individual microservices on a per-database basis.

To see an example of how to use Postgres on a real-world microservice, check out the Resource watch manager microservice (written in Ruby on Rails).

AWS Elasticsearch Service v7.7

Use AWS Elasticsearch Service (powered by Open Distro for Elasticsearch) for search optimization or heterogeneous data storage with quick access.

To see an example of how to use Elasticsearch on a real-world microservice, check out the Document dataset adapter microservice.

Redis v5.0

Redis is an in-memory data storage tool, and can also be used as a pub-sub messaging tool.

You can learn how to use Redis in your applications by looking at the code of the Subscriptions microservice.
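
As a rough sketch of the pub-sub usage, using the callback-style API of the node redis client (the channel name and payload are illustrative):

const redis = require('redis');

const publisher = redis.createClient(process.env.REDIS_URL);
const subscriber = redis.createClient(process.env.REDIS_URL);

// the subscriber receives every message published to the channel it subscribed to
subscriber.on('message', (channel, message) => {
    console.log(`Received message on ${channel}: ${message}`);
});
subscriber.subscribe('my-microservice-events');

// elsewhere in the code, the publisher broadcasts a message to all subscribers
publisher.publish('my-microservice-events', JSON.stringify({ type: 'ping' }));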

Neo4J v2.0

Neo4J is a graph database used by the Graph microservice to build complex associations between different RW API resources.

RabbitMQ v3.7

RabbitMQ is a message broker service, which is particularly useful when handling long, asynchronous operations. You can see an example of its usage on the Document microservice - Executor submodule code base.

Cloud services

Some microservices have data storage needs that are not covered by the applications described here (for example, file storage). In those scenarios, it's common to use cloud services (like AWS S3, for example), but do reach out to the broader RW API development team before implementing your solution.

HTTP caching

The RW API has a system-wide HTTP cache that you may use to cache your requests, improving scalability and response times. This cache is based on Fastly, and you can browse its documentation if you are looking for a specific detail on its behavior. For most common use cases, you just need to keep in mind the following:

These headers are then intercepted by the Control Tower Fastly integration plugin which uses the Fastly API (through an integration nodejs library) to carry out the corresponding actions.
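
As a purely illustrative sketch, a Koa handler could set such headers along these lines - the exact header names and values expected by the Fastly integration are an assumption here, so confirm them in the plugin's documentation:

// hypothetical example - confirm the exact header names with the Control Tower Fastly plugin
router.get('/widget/:widget', async (ctx) => {
    ctx.set('cache', `widget widget-${ctx.params.widget}`); // tag the response so it can be cached and later purged
    ctx.body = await WidgetService.get(ctx.params.widget); // WidgetService is a placeholder
});

router.delete('/widget/:widget', async (ctx) => {
    ctx.set('uncache', `widget widget-${ctx.params.widget}`); // request invalidation of cached responses with these tags
    ctx.body = await WidgetService.delete(ctx.params.widget);
});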

Logging

An important part of microservice operation is logging events as requests are processed. Many errors are only triggered during staging and production server execution, and without proper logging, there is no way to identify how they can be reproduced, so they can then be fixed.

Common development languages often come with either built-in or 3rd party logging libraries that make logging easier to handle. Current nodejs microservices use Bunyan to manage logs, which eases managing log destinations (stdout, file, etc) and log levels. Other libraries, for nodejs and other languages, offer similar functionality.

For microservice staging and production logs, the output channels should be stdout and stderr, the standard output streams you'll find on most OSs. When live, these will seamlessly integrate with the infrastructure to which microservices are deployed, and will allow for cluster-wide logging.

const logger = require('logger');

logger.info('Validating Dataset Update');

The example above logs that the validation process for input data associated with a dataset update has started. You'll notice that the info() function is called - this sets the logging level for this message. While different logging tools implement different strategies to differentiate logs, most microservices use these 4 levels:
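
The exact naming varies slightly per logging library, but a typical illustration of the four levels, using Bunyan-style calls, would be:

logger.debug('Detailed internal state, mostly useful when developing locally');
logger.info('Validating Dataset Update');
logger.warn('Something unexpected happened, but the application can recover');
logger.error('Something failed and a RW API developer should look into it');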

A common issue some developers have concerns logging errors. It's not uncommon to find microservices where all types of errors generate an error log entry. However, this actually produces a lot of noise, and makes it harder to debug. Consider the following two scenarios when attempting to load a dataset by id:

Both cases are, indeed, errors. However, the first one is not an application error - the microservice behaved as it should. In this scenario, logging this event should not involve an error level event, as nothing unexpected, from the application's point of view, happened: a user asked for something that does not exist, and the microservice handled that as it should.

In the second case, however, something really unexpected did happen - the microservice could not contact the database. This is an application level error, as we assume that our databases are always available to microservices. This is an example scenario where an error logging line should be generated. Or, putting it another way, only use error-level logging for situations that a RW API developer should look into.

Another best practice we recommend for log management is using an application-wide configuration value to define the logging level. This proves extremely useful when you switch from your local development environment (where you may prefer the debug logging level for maximum detail) to production (where warn or error may be more reasonable).

When using Bunyan, logging levels are set per stream. Many microservices integrate the Config library at this stage, allowing you to have different values for production, staging or other environments. Config also allows you to override selected values with an environment variable, typically LOGGER_LEVEL, which you may use, for example, to temporarily override the logging level on a particular environment without changing the predefined default values.
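
Here's a minimal sketch of this pattern, assuming a Config key named logger.level (the key and logger names are illustrative):

const bunyan = require('bunyan');
const config = require('config');

// the level comes from the environment-specific config file, and can be overridden
// at runtime via an environment variable (typically LOGGER_LEVEL) if the config
// project maps it in custom-environment-variables.json
const logger = bunyan.createLogger({
    name: 'dataset-microservice',
    streams: [{
        level: config.get('logger.level'),
        stream: process.stdout,
    }],
});

module.exports = logger;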

If you want to access your logging output for a microservice that's already deployed on either staging or production, you'll need access to the Kubernetes logging UI or CLI.

Testing

Testing code is important. And, as the developer of a RW API microservice, it's your responsibility to ensure that your code is bug free and easily extendable in the future. That means it should ship with a set of tests that can ensure, now and in the future, that it does what it's supposed to do. And the best way to do that is through testing.

If you are developing a new microservice or endpoint, it's expected that you provide a complete test suite for your code. In many cases, existing microservices will be a valuable source of examples you can copy and adapt to your needs. On occasion, you'll need to make changes to endpoints that are not yet covered by tests. In those scenarios, we ask that you add at least the tests needed to cover your modification. If you are feeling generous, and want to add tests that cover the endpoint's full functionality, you'll have our gratitude - test coverage for the RW API's endpoints is a work in progress, and not all endpoints have been reached just yet.

Writing end-to-end tests

Most microservices rely, to varying degrees, on end-to-end tests. In the context of an HTTP based microservice, this means that tests are responsible for issuing an HTTP request to a running instance of your microservice, getting the response and validating its content. Tests should also handle things like mocking resources and isolation from outside world - we'll get to these in a moment.

Example of a test from the dataset microservice

it('Create a JSON dataset with data in the body should be successful', async () => {
    const timestamp = new Date();
    const dataset = {
        name: `JSON Dataset - ${timestamp.getTime()}`,
        application: ['forest-atlas', 'rw'],
        applicationConfig: {
            'forest-atlas': {
                foo: 'bar',
            },
            rw: {
                foo: 'bar',
            }
        },
        connectorType: 'document',
        env: 'production',
        provider: 'json',
        dataPath: 'data',
        dataLastUpdated: timestamp.toISOString(),
        data: {
            data: [
                {
                    a: 1,
                    b: 2
                },
                {
                    a: 2,
                    b: 1
                },
            ]
        }
    };

    nock(process.env.CT_URL)
        .post('/v1/doc-datasets/json', (request) => {
            request.should.have.property('connector').and.be.an('object');
            const requestDataset = request.connector;

            requestDataset.should.have.property('name').and.equal(dataset.name);
            requestDataset.should.have.property('connectorType').and.equal(dataset.connectorType);
            requestDataset.should.have.property('application').and.eql(dataset.application);
            requestDataset.should.have.property('data').and.deep.equal(dataset.data);
            requestDataset.should.have.property('sources').and.eql([]);
            requestDataset.should.not.have.property('connectorUrl');

            return true;
        })
        .reply(200, {
            status: 200,
            detail: 'Ok'
        });

    const response = await requester.post(`/api/v1/dataset`).send({
        dataset,
        loggedUser: USERS.ADMIN
    });
    const createdDataset = deserializeDataset(response);

    response.status.should.equal(200);
    response.body.should.have.property('data').and.be.an('object');
    createdDataset.should.have.property('name').and.equal(`JSON Dataset - ${timestamp.getTime()}`);
    createdDataset.should.have.property('connectorType').and.equal('document');
    createdDataset.should.have.property('provider').and.equal('json');
    createdDataset.should.have.property('connectorUrl').and.equal(null);
    createdDataset.should.have.property('tableName');
    createdDataset.should.have.property('userId').and.equal(USERS.ADMIN.id);
    createdDataset.should.have.property('status').and.equal('pending');
    createdDataset.should.have.property('overwrite').and.equal(false);
    createdDataset.should.have.property('applicationConfig').and.deep.equal(dataset.applicationConfig);
    createdDataset.should.have.property('dataLastUpdated').and.equal(timestamp.toISOString());
    createdDataset.legend.should.be.an.instanceOf(Object);
    createdDataset.clonedHost.should.be.an.instanceOf(Object);
});

Current nodejs based microservices rely on Chai and Mocha as testing libraries, and this code example shows one of the tests that validate the dataset creation process. The code block is relatively large, but the logic is simple:

Different microservices and endpoints will have different requirements when it comes to testing, but the great majority of endpoints can be tested using simple variations of these steps. There are some additional considerations you should take into account when testing:

Test coverage metrics

While not required, most microservices use code coverage tools to evaluate how much of your code base is actually being checked when the test suite is executed. Nodejs based microservices frequently use NYC and Istanbul for this purpose, in case you are looking for a recommendation.

Running your tests using docker compose

The previous section covers an example of what a test looks like. Depending on your microservice technology stack, you have different ways of running your tests - in the case of the Dataset microservice, tests are executed using yarn.

However, to standardise test execution, you should also create a docker compose file that runs your tests (and their dependencies). This docker compose configuration should use the existing docker file set up previously, unless that's not possible.

Here's an example of one of these files. These will be particularly useful down the line, but also convenient for running tests locally.

For convenience, microservices commonly have a one line CLI command that allows running tests using the docker compose configuration you provide. These are particularly useful for other developers to run your tests without having to manually set up the associated dependencies.
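
As a hypothetical example of such a command, using a docker compose file dedicated to tests (the file name will vary between microservices):

docker-compose -f docker-compose-test.yml up --build --abort-on-container-exit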

CI/CD, Travis and Code Climate

Assuming you are hosting your microservice code on a service like Github, then you may benefit from its integration with CI/CD tools. There are multiple options in this space, and they mostly offer the same core functionality, but our preference so far has been to use Travis. In a nutshell, you can configure Travis to run your tests every time you push a new commit to a Github pull request. Tests will run on Travis' servers, and if they fail, you will get a message on your pull request warning you about this.

For full details on Travis and its features, how to configure it, what alternatives are there, and their pros and cons, please refer to your favourite search engine. If you just want the basic, "it just works" configuration, this file from the Dataset microservice will have most of what you'll need.

Apart from running your tests, Travis also integrates with a service called Code Climate which analyses your code and looks for potentially problematic bits and suggests you fix them. More often than not, we just rely on another functionality offered by Code Climate - code coverage. This allows you to easily monitor how your pull request influences code coverage - for example, you can set up an alarm that warns you in case your pull request decreases your code coverage, which may indicate that you added more code than you tested.

Most microservices will display a test status and code coverage badges on their README, as a way to display if the tests are passing, and a rough estimation of how broad the test coverage is.

Smoke testing

Besides the test tools covered above, which are used to validate that your code changes work as designed, there is also a smoke test tool in place, which periodically issues a request to selected RW API endpoints and validates that the response matches an expected preconfigured value. These tests are not used to validate functionality, but rather availability - if something goes wrong, a human is notified that the RW API is not working as it should, and that this is potentially affecting users.

If you believe your microservice implements a mission-critical endpoint that would merit one of these tests, please reach out to the RW API team.

Deploying your microservice

Jenkins

Microservice deployment to the Kubernetes clusters is done using Jenkins. The actual deployment process is configurable using a Jenkinsfile script written in Groovy. Most microservices use the same file, as the logic in it is flexible enough to accommodate most scenarios.

In a nutshell, this Jenkinsfile will:

A note on branches: an old branching scheme you may still find on some microservices relied on master + develop branches, but it's gradually being replaced by a scheme that uses dev, staging and production. All repos use one scheme or the other, but not both simultaneously, and the Jenkinsfile will reflect that.

At the beginning of each deploy process, you may also see a confirmation input that, if accepted, will redeploy the kubernetes configuration contained in the microservice code repository to the respective kubernetes cluster: develop branch to the staging cluster, master branch to the production cluster.

One thing worth noting is that the docker images generated using this method are publicly available on dockerhub. Be careful not to store any sensitive data in them, as it will be available to anyone.

Getting access to Jenkins

Each environment (dev, staging, production) has its own Jenkins server:

If you need an account in one of these environments (for example to approve a deployment in production), contact ethan.roday@wri.org.

Kubernetes configuration

Most microservices have a Kubernetes configuration folder, typically containing 3 folders:

Note that these settings are only applied if you opt in to them, by interacting with the input request that is displayed on Jenkins at the very beginning of the deployment process.

Documentation

README

Here are some 'do' and 'do not' you should take into account when writing the README.md for your microservice.

Do:

Do not:

Overall, the README should be targeted at developers that may need to run, test and debug your code.

Functional documentation

Documentation describing the business logic implemented by your microservice should go in the RW API reference documentation page. The documentation is available on this Github repository and its README includes instructions on how to use it and contribute.

Documentation is a key component of a successful API, so when altering public-facing behavior on the RW API, you must update the documentation accordingly, so that RW API users out there can be aware of the changes you made.

Code styling

As a way to help RW API developers collaborate, most microservices include a linter tool and ruleset to promote, as much as possible, a common set of rules for the way code is structured.

For microservices written in nodejs, this is achieved using Eslint with this configuration file.

For ruby-based microservices, you can use Rubocop along with this configuration file.

Most microservices will also include a .editorconfig file - you can learn more about it here.

Endpoints

This section of the documentation refers to endpoints that can only be used for the purposes of development. These endpoints can only be called by other microservices via Control Tower.

Finding users by ids

To retrieve the information of multiple users by ids, use the /auth/user/find-by-ids endpoint.

This endpoint requires authentication, and can only be called from another microservice.

# retrieve info for multiple users with the given ids
curl -X POST https://api.resourcewatch.org/auth/user/find-by-ids \
-H "Authorization: Bearer <your-token>" \
-H "Content-Type: application/json"  -d \
'{
    "ids": [
        "0706f055b929453eb1547392123ae99e",
        "0c630aeb81464fcca9bebe5adcb731c8",
    ]
}'

Example response:

    {
        "data": [
            {
                "provider": "local",
                "role": "USER",
                "_id": "0706f055b929453eb1547392123ae99e",
                "email": "example@user.com",
                "createdAt": "2016-08-22T11:48:51.163Z",
                "extraUserData": {
                    "apps": [
                        "rw",
                        "gfw"
                    ]
                },
                "updatedAt": "2019-12-18T15:59:57.333Z"
            },
            {
                "provider": "local",
                "role": "ADMIN",
                "_id": "0c630aeb81464fcca9bebe5adcb731c8",
                "email": "example2@user.com",
                "createdAt": "2016-08-22T11:48:51.163Z",
                "extraUserData": {
                    "apps": [
                        "rw",
                        "gfw",
                        "prep",
                        "aqueduct",
                        "forest-atlas",
                        "data4sdgs",
                        "gfw-climate",
                        "gfw-pro",
                        "ghg-gdp"
                    ]
                },
                "updatedAt": "2019-12-18T15:59:57.333Z"
            }
        ]
    }

Microservice reference

This document should give developers a bird's eye view of existing microservices, their status and resources, organized by namespace.

Core

Name URL Travis Status Code Coverage
arcgis Github Build Status Test Coverage
bigquery Github Build Status Test Coverage
carto Github Build Status Test Coverage
control-tower Github Build Status Test Coverage
converter Github Build Status Test Coverage
dataset Github Build Status Test Coverage
doc-executor Github Build Status Test Coverage
doc-orchestrator Github Build Status Test Coverage
doc-writer Github Build Status Test Coverage
document Github Build Status Test Coverage
fires summary Github Build Status Test Coverage
gee Github Build Status Test Coverage
gee-tiles Github Build Status Test Coverage
geostore Github Build Status Test Coverage
graph-client Github Build Status Test Coverage
layer Github Build Status Test Coverage
metadata Github Build Status Test Coverage
mail Github Build Status Test Coverage
query Github Build Status Test Coverage
rw-lp Github
task-async Github
vocabulary Github Build Status Test Coverage
webshot Github Build Status Test Coverage
widget Github Build Status Test Coverage

GFW

Name URL Travis Status Code Coverage
analysis-gee Github Build Status Test Coverage
arcgis-proxy Github Build Status Test Coverage
area Github Build Status Test Coverage
forest-change Github Build Status Test Coverage
gfw-forma Github Build Status Test Coverage
gfw-guira Github Build Status Test Coverage
gfw-ogr Github Build Status Test Coverage
gfw-prodes Github Build Status Test Coverage
gfw-umd Github Build Status Test Coverage
gfw-user Github Build Status Test Coverage
gs-pro-config Github Build Status Test Coverage
glad-analysis-athena Github Build Status Test Coverage
high-res Github Build Status Test Coverage
imazon Github Build Status Test Coverage
quicc Github Build Status Test Coverage
story Github Build Status Test Coverage
subscriptions Github Build Status Test Coverage
true-color-tiles Github Build Status Test Coverage
viirs-fires Github Build Status Test Coverage

FW

Name URL Travis Status Code Coverage
forest-watcher-api Github Build Status Test Coverage
forms Github Build Status Test Coverage
fw-alerts Github Build Status Test Coverage
fw-contextual-layers Github Build Status Test Coverage
fw-teams Github Build Status Test Coverage

Aqueduct

Name URL Travis Status Code Coverage
aqueduct-analysis Github Build Status Test Coverage

PREP

Name URL Travis Status Code Coverage
nexgddp Github Build Status Test Coverage
prep-api Github Build Status Test Coverage
prep-app Github
prep-manager Github
proxy Github Build Status Test Coverage

Climate Watch

Name URL Travis Status Code Coverage
Climate Watch Flagship Github
Climate Watch India Platform Github
Climate Watch Indonesia Platform Github
Climate Watch South Africa Platform Github
Climate Watch: Emissions Scenario Portal Github

RW

Name URL Travis Status Code Coverage
resource-watch-manager Github Build Status Test Coverage

API Smoke Tests

This chapter covers the existing API Smoke Tests, including instructions on how to manage existing tests and create new ones.

The API Smoke Tests are implemented using Canaries provided by AWS Synthetics (docs here).

Template for smoke tests

Template for an AWS Synthetics Canary

const synthetics = require('Synthetics');
const log = require('SyntheticsLogger');
const AWS = require('aws-sdk');
const https = require('https');
const http = require('http');

const apiCanaryBlueprint = async function () {

  const verifyRequest = async function (requestOption, body = null) {
    return new Promise((resolve, reject) => {
      // Prep request
      log.info("Making request with options: " + JSON.stringify(requestOption));
      let req = (requestOption.port === 443) ? https.request(requestOption) : http.request(requestOption);

      // POST body data
      if (body) { req.write(JSON.stringify(body)); }

      // Handle response
      req.on('response', (res) => {
        log.info(`Status Code: ${res.statusCode}`)

        // Assert the status code returned
        if (res.statusCode !== 200) {
          reject("Failed: " + requestOption.path + " with status code " + res.statusCode);
        }

        // Grab body chunks and piece returned body together
        let body = '';
        res.on('data', (chunk) => { body += chunk.toString(); });

        // Resolve providing the returned body
        res.on('end', () => resolve(JSON.parse(body)));
      });

      // Reject on error
      req.on('error', (error) => reject(error));
      req.end();
    });
  }

  // Build request options
  let requestOptions = {
    hostname: "api.resourcewatch.org",
    method: "GET",
    path: "/v1/dataset",
    port: 443,
    headers: {
      'User-Agent': synthetics.getCanaryUserAgentString(),
      'Content-Type': 'application/json',
    },
  };

  // Find and use secret for auth token
  const secretsManager = new AWS.SecretsManager();
  await secretsManager.getSecretValue({ SecretId: "gfw-api/token" }, function(err, data) {
    if (err) log.info(err, err.stack);
    log.info(data);
    requestOptions.headers['Authorization'] = "Bearer " + JSON.parse(data["SecretString"])["token"];
  }).promise();

  // Find and use secret for hostname
  await secretsManager.getSecretValue({ SecretId: "wri-api/smoke-tests-host" }, function(err, data) {
    if (err) log.info(err, err.stack);
    log.info(data);
    requestOptions.hostname = JSON.parse(data["SecretString"])["smoke-tests-host"];
  }).promise();

  const body = await verifyRequest(requestOptions);
  const id = body.data[0].id;

  // Change needed request options
  requestOptions.method = "GET";
  requestOptions.path = "/v1/dataset/"+id;

  // Make second request
  await verifyRequest(requestOptions);
};

exports.handler = async () => {
  return await apiCanaryBlueprint();
};

New tests should be based on the template displayed on the side, in order to take advantage of the configurations already in place.

Tests can execute multiple requests, but please minimize the number of interactions with databases to avoid creating junk data (for this reason, smoke testing POST, PATCH and DELETE endpoints is not recommended).

Another thing to notice is the usage of AWS secrets for storing a token to execute the request (gfw-api/token), as well as the hostname where the test will be executed (wri-api/smoke-tests-host).

The template on the side executes a GET request to /v1/dataset, grabs the first ID in the response data and executes a second GET request to the /v1/dataset/:id endpoint.

The test will pass if there are no exceptions thrown or promise rejections during the execution of the test. For the example on the side, the test will fail if any of the requests performed returns a status code that is not 200.

Things to pay attention to

Use a user to run the tests

Please ensure that all tests are run using a token for a user which was specifically created for running the tests. Also, it goes without saying, please don't share either the token or the credentials for the user running the tests with anyone.

Always configure alarms for the smoke tests

Smoke tests by default are created without an associated alarm. When managing or creating smoke tests, please ensure that each test has a unique alarm associated to it.

Also, please ensure that the created alarm has an action defined to notify someone in case of failure of a test.

Running smoke tests locally

Step 5 (before):

exports.handler = async () => {
  return await apiCanaryBlueprint();
};

Step 5 (after):

apiCanaryBlueprint();

In order to run smoke tests on your local machine for testing the script, some modifications need to be done:

  1. Copy the smoke test script into a file on your local machine (in this case, we're going to assume the file is named index.js).
  2. Comment out any references to the Synthetics NPM package, which is only available for internal usage in the canary script.
  3. Replace all log.info references (or any other method of the log package) with console.log and comment out the usage of the SyntheticsLogger NPM package.
  4. Comment out references to the usage of AWS secrets and to the aws-sdk NPM package.
  5. Replace the last lines of the script (see on the side).

After these changes, you should be able to run the script locally using node index.js. Remember that any exception or error thrown will cause the test to fail, otherwise the test will be considered a pass. If you want to explicitly fail the test if some assertion is not valid, you can throw a new Error with a message for debugging.

Before updating the script once again in AWS Synthetics, don't forget to revert ALL the changes (just follow the steps in the reverse order).

Query transformations

While the WRI API aims to make the query interface as broad and transparent as possible, some of the querying options described below will not be available for specific dataset providers, depending on this API's implementation or limitations on the actual data provider's side.

In addition to provider-specific limitations, every SQL query is transformed by the sql2json microservice, also maintained as an NPM package. There is a first conversion from SQL to JSON, and then from JSON to a SQL syntax that is compatible with the Open Distro for Elasticsearch SQL syntax.

You can read more about the limitations of using SQL with Elasticsearch here.

Microservices

A list of information related to the microservices

List all registered microservices

To obtain a list of all the registered microservices:

curl -X GET https://api.resourcewatch.org/api/v1/microservice \
-H "Authorization: Bearer <your-token>"

Example response:

[
    {
        "infoStatus": {
            "numRetries": 0,
            "error": null,
            "lastCheck": "2019-02-04T14:05:30.748Z"
        },
        "pathInfo": "/info",
        "pathLive": "/ping",
        "status": "active",
        "cache": [],
        "uncache": [],
        "tags": [
            "dataset"

        ],
        "_id": "id",
        "name": "Dataset",
        "url": "http://dataset.default.svc.cluster.local:3000",
        "version": 1,
        "endpoints": [
            {
                "redirect": {
                    "method": "GET",
                    "path": "/api/v1/dataset"
                },
                "path": "/v1/dataset",
                "method": "GET"
            },
            {
                "redirect": {
                    "method": "POST",
                    "path": "/api/v1/dataset/find-by-ids"
                },
                "path": "/v1/dataset/find-by-ids",
                "method": "POST"
            }
        ],
        "updatedAt": "2019-01-24T13:04:46.728Z",
        "swagger": "{}"
    },
    {
        "infoStatus": {
            "numRetries": 0,
            "error": null,
            "lastCheck": "2019-02-04T14:05:30.778Z"
        },
        "pathInfo": "/info",
        "pathLive": "/ping",
        "status": "active",
        "cache": [
            "layer"
        ],
        "uncache": [
            "layer",
            "dataset"
        ],
        "tags": [
            "layer"
        ],
        "_id": "5aa667d1aee7ae16fb419c23",
        "name": "Layer",
        "url": "http://layer.default.svc.cluster.local:6000",
        "version": 1,
        "endpoints": [
            {
                "redirect": {
                    "method": "GET",
                    "path": "/api/v1/layer"
                },
                "path": "/v1/layer",
                "method": "GET"
            },
            {
                "redirect": {
                    "method": "POST",
                    "path": "/api/v1/dataset/:dataset/layer"
                },
                "path": "/v1/dataset/:dataset/layer",
                "method": "POST"
            },
            {
                "redirect": {
                    "method": "GET",
                    "path": "/api/v1/dataset/:dataset/layer"
                },
                "path": "/v1/dataset/:dataset/layer",
                "method": "GET"
            }
        ],
        "updatedAt": "2018-11-08T12:07:38.014Z",
        "swagger": "{}"
    }
]

Filters

The microservice list provided by the endpoint can be filtered with the following attributes:

Filter Description Accepted values
status Status of the microservice pending, active or error
url Internal URL of the microservice within the cluster String

Filtering by status

curl -X GET https://api.resourcewatch.org/api/v1/microservice?status=active \
-H "Authorization: Bearer <your-token>"

Get a microservice by id

To obtain the details of a single microservice, use:

curl -X GET https://api.resourcewatch.org/api/v1/microservice/5aa667d1aee7ae16fb419c23 \
-H "Authorization: Bearer <your-token>"

Example response:

{
  "data": {
    "id": "5aa667d1aee7ae16fb419c23",
    "infoStatus": {
        "numRetries": 0,
        "error": null,
        "lastCheck": "2019-02-04T14:05:30.778Z"
    },
    "pathInfo": "/info",
    "pathLive": "/ping",
    "status": "active",
    "cache": [
        "layer"
    ],
    "uncache": [
        "layer",
        "dataset"
    ],
    "tags": [
        "layer"
    ],
    "name": "Layer",
    "url": "http://layer.default.svc.cluster.local:6000",
    "version": 1,
    "endpoints": [
        {
            "redirect": {
                "method": "GET",
                "path": "/api/v1/layer"
            },
            "path": "/v1/layer",
            "method": "GET"
        },
        {
            "redirect": {
                "method": "POST",
                "path": "/api/v1/dataset/:dataset/layer"
            },
            "path": "/v1/dataset/:dataset/layer",
            "method": "POST"
        },
        {
            "redirect": {
                "method": "GET",
                "path": "/api/v1/dataset/:dataset/layer"
            },
            "path": "/v1/dataset/:dataset/layer",
            "method": "GET"
        }
    ],
    "updatedAt": "2018-11-08T12:07:38.014Z",
    "swagger": "{}"
   }
}

Delete microservice

To remove a microservice:

curl -X DELETE https://api.resourcewatch.org/api/v1/microservice/:id \
-H "Authorization: Bearer <your-token>"

This will delete the microservice and its associated endpoints from the gateway's database. It does not remove the actual running microservice application instance, which may re-register and become available once again.

Areas v2 Notification Emails

Areas v2 services rely on email notifications to update users about the status of their areas. Specifically, when creating an area, updating an area, or when an ADMIN updates multiple areas by their geostore ids:

Interacting with Sparkpost for building email templates

Emails are sent using the Sparkpost API. For the emails to be sent, there must exist templates in Sparkpost ready to be sent, taking into account the different languages supported by the Areas service:

For the email sent to users when the Area of Interest is ready to be viewed, there should exist the following email templates on Sparkpost:

For the email sent to users when the Area of Interest is being generated, there should exist the following email templates on Sparkpost:

In order to build your templates on Sparkpost, you need to have access to WRI's Sparkpost account - for that, please reach out to a member of WRI in order to be granted access.

When building the actual templates, you can use variable interpolation to customize the emails sent taking into account the area that is being processed/has been processed. While building the dashboard-pending-* or dashboard-complete-* emails, the following variables are provided and can be used in the construction of the email body:

Subscriptions

When communicating with the Subscriptions microservice from other microservices, you have access to special actions that are not available when using the public API. This section concerns subscriptions endpoints that offer special functionality when handling requests from other microservices.

Creating a subscription for another user

Creating a subscription for user with ID 123 - only works when called by other MS!

curl -X POST https://api.resourcewatch.org/v1/subscriptions \
-H "Authorization: Bearer <your-token>" \
-H "Content-Type: application/json"  -d \
 '{
    "name": "<name>",
    "datasets": ["<dataset>"],
    "params": { "geostore": "35a6d982388ee5c4e141c2bceac3fb72" },
    "datasetsQuery": [
        {
            "id": ":subscription_dataset_id",
            "type": "test_subscription",
            "threshold": 1
        }
    ],
    "application": "rw",
    "language": "en",
    "env": <environment>,
    "resource": { "type": "EMAIL", "content": "email@address.com" },
    "userId": "123"
}'

You can create a subscription for another user by providing the user id in the body of the request.

This can only be done when performing requests from another microservice.

Field Description Type Required
userId Id of the owner of the subscription - if not provided, it's set as the id of the user in the token. String No

Updating a subscription for another user

If the request comes from another microservice, then it is possible to modify subscriptions belonging to other users. Otherwise, you can only modify subscriptions if you are the owner of the subscription.

The following fields are available to be provided when modifying a subscription:

Field Description Type Required
userId Check here for more info String No

Finding subscriptions by ids

curl -X POST https://api.resourcewatch.org/v1/subscriptions/find-by-ids \
-H "Authorization: Bearer <your-token>"
-H "Content-Type: application/json"  -d \
 '{ "ids": ["5e4d273dce77c53768bc24f9"] }'

Example response:


{
    "data": [
        {
            "type": "subscription",
            "id": "5e4d273dce77c53768bc24f9",
            "attributes": {
                "createdAt": "2020-02-19T12:17:01.176Z",
                "userId": "5e2f0eaf9de40a6c87dd9b7d",
                "resource": {
                    "type": "EMAIL",
                    "content": "henrique.pacheco@vizzuality.com"
                },
                "datasets": [
                    "20cc5eca-8c63-4c41-8e8e-134dcf1e6d76"
                ],
                "params": {},
                "confirmed": false,
                "language": "en",
                "datasetsQuery": [
                    {
                        "threshold": 1,
                        "lastSentDate": "2020-02-19T12:17:01.175Z",
                        "historical": [],
                        "id": "20cc5eca-8c63-4c41-8e8e-134dcf1e6d76",
                        "type": "COUNT"
                    }
                ],
                "env": "production"
            }
        }
    ]
}

You can find a set of subscriptions given their ids using the following endpoint.

Finding subscriptions for a given user

curl -X POST https://api.resourcewatch.org/v1/subscriptions/user/5e2f0eaf9de40a6c87dd9b7d \
-H "Authorization: Bearer <your-token>"

Example response:


{
    "data": [
        {
            "type": "subscription",
            "id": "5e4d273dce77c53768bc24f9",
            "attributes": {
                "createdAt": "2020-02-19T12:17:01.176Z",
                "userId": "5e2f0eaf9de40a6c87dd9b7d",
                "resource": {
                    "type": "EMAIL",
                    "content": "henrique.pacheco@vizzuality.com"
                },
                "datasets": [
                    "20cc5eca-8c63-4c41-8e8e-134dcf1e6d76"
                ],
                "params": {},
                "confirmed": false,
                "language": "en",
                "datasetsQuery": [
                    {
                        "threshold": 1,
                        "lastSentDate": "2020-02-19T12:17:01.175Z",
                        "historical": [],
                        "id": "20cc5eca-8c63-4c41-8e8e-134dcf1e6d76",
                        "type": "COUNT"
                    }
                ],
                "env": "production"
            }
        }
    ]
}

You can find all the subscriptions associated with a given user id using the following endpoint.

This endpoint supports the following optional query parameters as filters:

Field Description Type
application Application to which the subscription is associated. Read more about the application field here. String
env Environment to which the subscription is associated. Read more about this field in the Environments concept section. String

Finding all subscriptions

curl -X GET https://api.resourcewatch.org/v1/subscriptions/find-all \
-H "Authorization: Bearer <your-token>"

Example response:

{
    "data": [
        {
            "type": "subscription",
            "id": "57bc7f9bb67c5da7720babc3",
            "attributes": {
                "name": null,
                "createdAt": "2019-10-09T06:17:54.098Z",
                "userId": "57bc2631f077ce98007988f9",
                "resource": {
                    "type": "EMAIL",
                    "content": "your.email@resourcewatch.org"
                },
                "datasets": [
                    "umd-loss-gain"
                ],
                "params": {
                    "geostore": "d3015d189631c8e2acddda9a547260c4"
                },
                "confirmed": true,
                "language": "en",
                "datasetsQuery": [],
                "env": "production"
            }
        }
    ],
    "links": {
        "self": "https://api.resourcewatch.org/v1/subscriptions/find-all?page[number]=1&page[size]=10",
        "first": "https://api.resourcewatch.org/v1/subscriptions/find-all?page[number]=1&page[size]=10",
        "last": "https://api.resourcewatch.org/v1/subscriptions/find-all?page[number]=1&page[size]=10",
        "prev": "https://api.resourcewatch.org/v1/subscriptions/find-all?page[number]=1&page[size]=10",
        "next": "https://api.resourcewatch.org/v1/subscriptions/find-all?page[number]=1&page[size]=10"
    },
    "meta": {
        "total-pages": 1,
        "total-items": 1,
        "size": 10
    }
}

You can find all the subscriptions using the following endpoint.

This endpoint supports the following optional query parameters as filters:

Field | Description | Type | Example
application | Application to which the subscription is associated. Read more about the application field here. | String | 'rw'
env | Environment to which the subscription is associated. Read more about this field in the Environments concept section. | String | 'production'
updatedAtSince | Filter returned subscriptions by the updatedAt date being after the date provided. Should be a valid ISO date string. | String | '2020-03-25T09:16:22.068Z'
updatedAtUntil | Filter returned subscriptions by the updatedAt date being before the date provided. Should be a valid ISO date string. | String | '2020-03-25T09:16:22.068Z'
page[size] | The number of elements per page. The maximum allowed value is 100 and the default value is 10. | Number | 10
page[number] | The page to fetch. Defaults to 1. | Number | 1
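
For example, the filters and pagination parameters above are combined as regular query string parameters. The request below is a hypothetical illustration (the filter values are placeholders) that fetches the second page of production subscriptions updated after a given date - note the -g flag, which prevents curl from interpreting the square brackets in the pagination parameters:

curl -g -X GET "https://api.resourcewatch.org/v1/subscriptions/find-all?env=production&updatedAtSince=2020-03-25T09:16:22.068Z&page[size]=100&page[number]=2" \
-H "Authorization: Bearer <your-token>"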

User Management

When communicating with the Authorization microservice from other microservices, you have access to additional endpoints that are not available when using the public API. This section details these endpoints.

Finding users by ids

curl -X POST https://api.resourcewatch.org/auth/user/find-by-ids \
-H "Authorization: Bearer <your-token>" \
-H "Content-Type: application/json" -d \
'{ "ids": ["5e4d273dce77c53768bc24f9"] }'

Example response:


{
    "data": [
        {
            "id": "5e4d273dce77c53768bc24f9",
            "_id": "5e4d273dce77c53768bc24f9",
            "email": "your@email.com",
            "name": "",
            "createdAt": "2021-03-24T09:19:25.000Z",
            "updatedAt": "2021-03-26T09:54:08.000Z",
            "role": "USER",
            "provider": "local",
            "extraUserData": { "apps": ["gfw"] }
        }
    ]
}

You can find a set of users given their ids using the following endpoint. The ids of the users to find should be provided in the ids field of the request body.

Please keep in mind that, under the hood, user management relies on Okta - for this reason, this endpoint depends on Okta's user search functionalities to find users by ids, and thus inherits Okta's limitations. Okta limits user search to a maximum of 200 users per request, so in practice, this means we can only fetch pages of 200 users at a time. If you try to find, for instance, 400 users by ids, 2 requests will need to be made to Okta to fulfill your request, and as such, the performance of this endpoint might be degraded.

Due to these limitations, we advise resorting to this endpoint only when you have no other valid alternative for finding users. Even then, you might run into slow response times or, ultimately, not receive the expected results when calling this endpoint.

Finding user ids by role

Request structure to find user ids by role:

curl -X GET https://api.resourcewatch.org/auth/user/ids/:role \
-H "Authorization: Bearer <your-token>"

Example request to find user ids of ADMIN users:

curl -X GET https://api.resourcewatch.org/auth/user/ids/ADMIN \
-H "Authorization: Bearer <your-token>"

Example response:


{
    "data": [
        "5e4d273dce77c53768bc24f9",
        "5e4d273dce77c53768bc24f8",
        "5e4d273dce77c53768bc24f7",
        "5e4d273dce77c53768bc24f6",
        "5e4d273dce77c53768bc24f5",
        "5e4d273dce77c53768bc24f4",
        "5e4d273dce77c53768bc24f3"
    ]
}

You can find the ids of the users with a given role using the following endpoint. Valid roles include "USER", "MANAGER" and "ADMIN". The data field of the response contains an array with the ids of the users matching the provided role.

Please keep in mind that, under the hood, user management relies on Okta - for this reason, this endpoint depends on Okta's user search functionalities to find users by role, and thus inherits Okta's limitations. Okta limits user search to a maximum of 200 users per request, so in practice, this means we can only fetch pages of 200 users at a time. If you try to find, for instance, the ids of all users with the "USER" role (by far the most numerous role), many requests will have to be made to Okta to fulfill your request. As such, the performance of this endpoint might be degraded.

Due to these limitations, we advise resorting to this endpoint only when you have no other valid alternative for finding users. Even then, you might run into slow response times or, ultimately, not receive the expected results when calling this endpoint.

Also, please note that other endpoints may rely on this one to fulfill their requests - this is the case of sorting or filtering datasets/widgets/layers by user role, for instance. As such, the performance of those endpoints may also be affected if this endpoint's performance degrades.

Graph

The interaction with some of the graph endpoints is restricted to other RW API services - the following sections describe these endpoints. Keep in mind user-facing graph endpoints are described in detail in the graph endpoint documentation. The graph concept docs might also be a useful resource for learning what the RW API graph is and what it has to offer you.

Creating dataset graph nodes

POST request to create a dataset graph node:

curl -X POST https://api.resourcewatch.org/v1/graph/dataset/:id \
-H "Authorization: Bearer <your-token>" \
-H "Content-Type: application/json"

This endpoint creates a graph node for the dataset with id provided in the URL path.

This endpoint is automatically called on dataset creation, so you don't need to manually do it yourself after you create a dataset. In order to ensure that API users cannot manually create graph nodes for datasets, this endpoint requires authentication from a RW API service, meaning that normal API users won't be able to call this endpoint successfully. If, as an API user and using your user's token, you try to create a graph node for a dataset, you will receive a response with HTTP status code 403 Forbidden.

Errors for creating dataset graph nodes

Error code | Error message | Description
401 | Unauthorized | No authorization token provided.
403 | Not authorized | You are trying to call this endpoint without being identified as a RW API service.

Creating widget graph nodes

POST request to create a widget graph node:

curl -X POST https://api.resourcewatch.org/v1/graph/widget/:idDataset/:idWidget \
-H "Authorization: Bearer <your-token>" \
-H "Content-Type: application/json"

This endpoint creates a graph node for the widget with id provided in the URL path. It also creates a graph edge, connecting the newly created widget graph node to the graph node for the dataset associated with this widget.

This endpoint is automatically called on widget creation, so you don't need to manually do it yourself after you create a widget. In order to ensure that API users cannot manually create graph nodes for widgets, this endpoint requires authentication from a RW API service, meaning that normal API users won't be able to call this endpoint successfully. If, as an API user and using your user's token, you try to create a graph node for a widget, you will receive a response with HTTP status code 403 Forbidden.
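
For clarity, here is a hypothetical example (the ids are placeholders) of creating the graph node for a widget belonging to a given dataset - note that the dataset id comes first in the URL path, followed by the widget id:

curl -X POST https://api.resourcewatch.org/v1/graph/widget/<dataset-id>/<widget-id> \
-H "Authorization: Bearer <your-token>" \
-H "Content-Type: application/json"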

Errors for creating widget graph nodes

Error code | Error message | Description
401 | Unauthorized | No authorization token provided.
403 | Not authorized | You are trying to call this endpoint without being identified as a RW API service.
404 | Dataset not found | No graph node for the dataset with id provided was found.

Creating layer graph nodes

POST request to create a layer graph node:

curl -X POST https://api.resourcewatch.org/v1/graph/layer/:idDataset/:idLayer \
-H "Authorization: Bearer <your-token>" \
-H "Content-Type: application/json"

This endpoint creates a graph node for the layer with id provided in the URL path. It also creates a graph edge, connecting the newly created layer graph node to the graph node for the dataset associated with this layer.

This endpoint is automatically called on layer creation, so you don't need to manually do it yourself after you create a layer. In order to ensure that API users cannot manually create graph nodes for layers, this endpoint requires authentication from a RW API service, meaning that normal API users won't be able to call this endpoint successfully. If, as an API user and using your user's token, you try to create a graph node for a layer, you will receive a response with HTTP status code 403 Forbidden.

Errors for creating layer graph nodes

Error code | Error message | Description
401 | Unauthorized | No authorization token provided.
403 | Not authorized | You are trying to call this endpoint without being identified as a RW API service.
404 | Dataset not found | No graph node for the dataset with id provided was found.

Creating metadata graph nodes

POST request to create a metadata graph node:

curl -X POST https://api.resourcewatch.org/v1/graph/metadata/:resourceType/:idResource/:idMetadata \
-H "Authorization: Bearer <your-token>" \
-H "Content-Type: application/json"

This endpoint creates a graph node for the metadata with id provided in the URL path. As you might have come across in the Metadata endpoint documentation, metadata is always associated with either a dataset, layer, or widget. So, when creating a graph node for a metadata entry, you must also provide the resource type (dataset, layer, or widget) and its corresponding id.

Calling this endpoint will also create a graph edge connecting the newly created metadata graph node to the graph node for the resource (dataset, layer, or widget) associated with it.

This endpoint is automatically called on metadata creation, so you don't need to manually do it yourself after you create a metadata entry. In order to ensure that API users cannot manually create graph nodes for metadata entries, this endpoint requires authentication from a RW API service, meaning that normal API users won't be able to call this endpoint successfully. If, as an API user and using your user's token, you try to create a graph node for a metadata entry, you will receive a response with HTTP status code 403 Forbidden.
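
As an illustration, a hypothetical request (the ids are placeholders) creating the graph node for a metadata entry attached to a dataset would replace :resourceType with the literal value dataset, followed by the dataset id and the metadata id:

curl -X POST https://api.resourcewatch.org/v1/graph/metadata/dataset/<dataset-id>/<metadata-id> \
-H "Authorization: Bearer <your-token>" \
-H "Content-Type: application/json"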

Errors for creating metadata graph nodes

Error code | Error message | Description
401 | Unauthorized | No authorization token provided.
403 | Not authorized | You are trying to call this endpoint without being identified as a RW API service.
404 | Resource {:resourceType} and id ${:idResource} not found | No graph node for the resource with id provided was found.

Deleting dataset graph nodes

DELETE request to remove a dataset graph node:

curl -X DELETE https://api.resourcewatch.org/v1/graph/dataset/:id \
-H "Authorization: Bearer <your-token>" \
-H "Content-Type: application/json"

This endpoint deletes the graph node for the dataset with id provided in the URL path.

This endpoint is automatically called on dataset deletion, so you don't need to manually do it yourself after you delete a dataset. In order to ensure that API users cannot manually delete graph nodes for datasets, this endpoint requires authentication from a RW API service, meaning that normal API users won't be able to call this endpoint successfully. If, as an API user and using your user's token, you try to delete a graph node for a dataset, you will receive a response with HTTP status code 403 Forbidden.

Errors for deleting dataset graph nodes

Error code | Error message | Description
401 | Unauthorized | No authorization token provided.
403 | Not authorized | You are trying to call this endpoint without being identified as a RW API service.

Deleting widget graph nodes

DELETE request to remove a widget graph node:

curl -X DELETE https://api.resourcewatch.org/v1/graph/widget/:id \
-H "Authorization: Bearer <your-token>" \
-H "Content-Type: application/json"

This endpoint deletes the graph node for the widget with id provided in the URL path.

This endpoint is automatically called on widget deletion, so you don't need to manually do it yourself after you delete a widget. In order to ensure that API users cannot manually delete graph nodes for widgets, this endpoint requires authentication from a RW API service, meaning that normal API users won't be able to call this endpoint successfully. If, as an API user and using your user's token, you try to delete a graph node for a widget, you will receive a response with HTTP status code 403 Forbidden.

Errors for deleting widget graph nodes

Error code | Error message | Description
401 | Unauthorized | No authorization token provided.
403 | Not authorized | You are trying to call this endpoint without being identified as a RW API service.

Deleting layer graph nodes

DELETE request to remove a layer graph node:

curl -X DELETE https://api.resourcewatch.org/v1/graph/layer/:id \
-H "Authorization: Bearer <your-token>" \
-H "Content-Type: application/json"

This endpoint deletes the graph node for the layer with id provided in the URL path.

This endpoint is automatically called on layer deletion, so you don't need to manually do it yourself after you delete a layer. In order to ensure that API users cannot manually delete graph nodes for layers, this endpoint requires authentication from a RW API service, meaning that normal API users won't be able to call this endpoint successfully. If, as an API user and using your user's token, you try to delete a graph node for a layer, you will receive a response with HTTP status code 403 Forbidden.

Errors for deleting layer graph nodes

Error code | Error message | Description
401 | Unauthorized | No authorization token provided.
403 | Not authorized | You are trying to call this endpoint without being identified as a RW API service.

Deleting metadata graph nodes

DELETE request to remove a metadata graph node:

curl -X DELETE https://api.resourcewatch.org/v1/graph/metadata/:id \
-H "Authorization: Bearer <your-token>" \
-H "Content-Type: application/json"

This endpoint deletes the graph node for the metadata with id provided in the URL path.

This endpoint is automatically called on metadata deletion, so you don't need to manually do it yourself after you delete a metadata entry. In order to ensure that API users cannot manually delete graph nodes for metadata entries, this endpoint requires authentication from a RW API service, meaning that normal API users won't be able to call this endpoint successfully. If, as an API user and using your user's token, you try to delete a graph node for a metadata entry, you will receive a response with HTTP status code 403 Forbidden.

Errors for deleting metadata graph nodes

Error code | Error message | Description
401 | Unauthorized | No authorization token provided.
403 | Not authorized | You are trying to call this endpoint without being identified as a RW API service.

Associating concepts to graph nodes

POST request to associate concepts to a graph node:

curl -X POST https://api.resourcewatch.org/v1/graph/:resourceType/:idResource/associate \
-H "Authorization: Bearer <your-token>" \
-H "Content-Type: application/json" \
-d '{
  "tags": ["health", "society"]
}'

This endpoint creates a graph edge, representative of the relationship between the resource identified in the URL path and the concepts provided in the tags field of the request body.

This endpoint is automatically called when you associate the vocabulary "knowledge_graph" to a resource, so you don't need to manually do it yourself. In order to ensure that API users cannot manually associate concepts with graph nodes, this endpoint requires authentication from a RW API service, meaning that normal API users won't be able to call this endpoint successfully. If, as an API user and using your user's token, you try to call this endpoint, you will receive a response with HTTP status code 403 Forbidden.

Errors for associating concepts with graph nodes

Error code | Error message | Description
401 | Unauthorized | No authorization token provided.
403 | Not authorized | You are trying to call this endpoint without being identified as a RW API service.
404 | Resource {:resourceType} and id ${:idResource} not found | No graph node for the resource with id provided was found.

Updating concepts associated with graph nodes

PUT request to update the concepts associated to a graph node:

curl -X PUT https://api.resourcewatch.org/v1/graph/:resourceType/:idResource/associate \
-H "Authorization: Bearer <your-token>" \
-H "Content-Type: application/json" \
-d '{
  "tags": ["health", "society"],
  "application": "rw"
}'

This endpoint updates the graph edge associated with the resource identified in the URL path. Existing concepts are deleted and replaced with the ones provided in the tags field of the request body.

This endpoint is automatically called when you associate the vocabulary "knowledge_graph" to a resource, so you don't need to manually do it yourself. In order to ensure that API users cannot manually update the concepts associated with graph nodes, this endpoint requires authentication from a RW API service, meaning that normal API users won't be able to call this endpoint successfully. If, as an API user and using your user's token, you try to call this endpoint, you will receive a response with HTTP status code 403 Forbidden.

Errors for updating concepts associated with graph nodes

Error code | Error message | Description
401 | Unauthorized | No authorization token provided.
403 | Not authorized | You are trying to call this endpoint without being identified as a RW API service.
404 | Resource {:resourceType} and id ${:idResource} not found | No graph node for the resource with id provided was found.

Deleting concepts associated with graph nodes

DELETE request to remove concepts associated to a graph node:

curl -X DELETE https://api.resourcewatch.org/v1/graph/:resourceType/:idResource/associate \
-H "Authorization: Bearer <your-token>" \
-H "Content-Type: application/json"

This endpoint deletes the graph edge associated with the resource identified in the URL path.

This endpoint is automatically called when you delete the vocabulary "knowledge_graph" from a resource, so you don't need to manually do it yourself. In order to ensure that API users cannot manually delete the concepts associated with graph nodes, this endpoint requires authentication from a RW API service, meaning that normal API users won't be able to call this endpoint successfully. If, as an API user and using your user's token, you try to call this endpoint, you will receive a response with HTTP status code 403 Forbidden.

Query parameters

Specifying the application of the resource to be deleted:

curl -X DELETE https://api.resourcewatch.org/v1/graph/:resourceType/:idResource/associate?application=gfw

You can use the query parameter application to specify the application of the graph edge to be deleted by this request. You can find out more information about this field here.

Errors for deleting concepts associated with graph nodes

Error code | Error message | Description
401 | Unauthorized | No authorization token provided.
403 | Not authorized | You are trying to call this endpoint without being identified as a RW API service.
404 | Resource {:resourceType} and id ${:idResource} not found | No graph node for the resource with id provided was found.

Creating favorite relationships between users and graph nodes

POST request to create favorite relationship between user and graph node:

curl -X POST https://api.resourcewatch.org/v1/graph/favourite/:resourceType/:idResource/:userId \
-H "Authorization: Bearer <your-token>" \
-H "Content-Type: application/json" \
-d '{ "application": "rw" }'

This endpoint creates a graph edge representative of a favorite relationship between the resource identified in the URL path and the user id also identified in the URL path.

This endpoint is automatically called when you call vocabulary's create favorite endpoint, so you don't need to manually do it yourself. In order to ensure that API users cannot manually create these favorite relationships in the graph, this endpoint requires authentication from a RW API service, meaning that normal API users won't be able to call this endpoint successfully. If, as an API user and using your user's token, you try to call this endpoint, you will receive a response with HTTP status code 403 Forbidden.

Errors for creating favorite relationships between users and graph nodes

Error code | Error message | Description
401 | Unauthorized | No authorization token provided.
403 | Not authorized | You are trying to call this endpoint without being identified as a RW API service.
404 | Resource {:resourceType} and id ${:idResource} not found | No graph node for the resource with id provided was found.

Deleting favorite relationships between users and graph nodes

DELETE request to remove favorite relationship between user and graph node:

curl -X DELETE https://api.resourcewatch.org/v1/graph/favourite/:resourceType/:idResource/:userId \
-H "Authorization: Bearer <your-token>" \
-H "Content-Type: application/json"

This endpoint deletes the graph edge representative of a favorite relationship between the resource identified in the URL path and the user id also identified in the URL path.

This endpoint is automatically called when you call vocabulary's delete favorite endpoint, so you don't need to manually do it yourself. In order to ensure that API users cannot manually delete these favorite relationships in the graph, this endpoint requires authentication from a RW API service, meaning that normal API users won't be able to call this endpoint successfully. If, as an API user and using your user's token, you try to call this endpoint, you will receive a response with HTTP status code 403 Forbidden.

Query parameters

Specifying the application of the favorite relationship to be deleted:

curl -X DELETE https://api.resourcewatch.org/v1/graph/favourite/:resourceType/:idResource/:userId?application=gfw

You can use the query parameter application to specify the application of the graph edge to be deleted by this request. You can find out more information about this field here.

Errors for deleting favorite relationships between users and graph nodes

Error code | Error message | Description
401 | Unauthorized | No authorization token provided.
403 | Not authorized | You are trying to call this endpoint without being identified as a RW API service.
404 | Resource {:resourceType} and id ${:idResource} not found | No graph node for the resource with id provided was found.

Forest Watcher Contextual layers

Updating the Tree Cover Loss data

The FW Contextual layers microservice serves tree cover loss (TCL) data in tile format through one of its endpoints. The underlying dataset used to create these tiles is updated regularly - typically once a year - through a process that requires minor modifications to the configuration of the service. Below, we'll outline the steps to carry out this update.

Update the Tree Cover Loss layer URL

The first step is updating the source code of the Contextual layers microservice - specifically this line - and setting the URL of the updated dataset containing the tree cover loss data. This updated dataset will be used as a replacement for the previous one. In the more common scenario of the yearly updates to the TCL data, this means that the new dataset should have the data for all previously existing years, plus the newly added year.

Once you've done this, be sure to follow the Microservice development guide to test and deploy your changes to all relevant live environments.
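
Once the new URL is deployed, you may want to confirm that the tile endpoint serves the updated data. A hypothetical sanity check (the year range and tile coordinates are placeholders, following the URL pattern used in the layer example below) could be a simple HEAD request against one of the tiles:

curl -I "https://api.resourcewatch.org/contextual-layer/loss-layer/2018/2019/6/32/30.png" \
-H "Authorization: Bearer <your-token>"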

Creating/updating a FW contextual layer

At this stage, the FW Contextual layers microservice is already capable of serving the updated data through the tile endpoint. However, it also has a built-in layer listing mechanism, which you may want to update for the convenience of end users. You can learn more about it in the FW Contextual layers endpoint documentation section.

Example request to add a new contextual layer

curl -X POST https://api.resourcewatch.org/v1/contextual-layer \
-H "Authorization: Bearer <your-token>" \
-H "Content-Type: application/json" -d \
'{
    "isPublic": true,
    "name": "Tree Cover Loss (2019)",
    "url": "https://api.resourcewatch.org/contextual-layer/loss-layer/2018/2019/{z}/{x}/{y}.png",
    "enabled": true
}'

In the scenario where you are updating the TCL layer to add a new year of data, you typically want to add a new contextual layer to the list of existing ones, so it shows up in future listings of layers used by client applications (and thus becomes available to end users through client UIs). To do so, you should use the Create contextual layer endpoint with a payload similar to the example above, which illustrates how you would add new data for the year 2019 - be sure to adjust the layer name and URL to match your needs.