Resource Watch API Concepts
This section of the documentation defines the key concepts of the RW API and how they interact with each other. It's divided into three sections:
Common Behaviors details how the RW API approaches standard API behaviors like caching and pagination.
Common Concepts describes several API-specific conceptual entities that are used across multiple endpoints.
Core Resources is a set of high-level descriptions of the key resources that the RW API manages. It's useful to review the concept docs for a given resource before diving into the corresponding section in the reference documentation.
Common Behaviors
This section covers common API behaviors implemented by various endpoints related to errors, caching, sorting, filtering, and pagination.
Errors
The following error codes are used across the API. Each endpoint also documents resource-specific error codes and associated messages in the reference docs.
Error Code | Meaning |
---|---|
400 | Bad Request -- Your request is incomplete or contains errors. |
401 | Unauthorized -- Your JWT is missing or out of date. |
403 | Forbidden -- You do not have permission to access this resource. |
404 | Not Found -- The resource requested could not be found. |
409 | Conflict -- The resource requested is currently locked and cannot be edited. |
422 | Unprocessable -- The request cannot be processed. Certain required fields may be missing or incorrect. |
500 | Internal Server Error -- The server encountered an error while processing your request. Try again later. |
Caching
HTTP caching is a technique that stores a copy of a given resource and serves it back when requested. When a cache has a requested resource in its store (also called a cache hit), it intercepts the request and returns its copy instead of re-computing it from the originating server. If the requested resource is not yet stored in the cache (also called a cache miss), the request is forwarded to the server responsible for handling it, and the computed response is stored in the cache to serve future requests. This achieves several goals: it eases the load on the server, which no longer needs to serve every request itself, and it improves performance, since transmitting the cached resource back takes less time. You can read more about HTTP caching in the Mozilla Developer Network docs on HTTP caching.
The RW API has a server-side, system-wide HTTP cache that may be used to cache your requests. Keep in mind that, in the context of the RW API, not all endpoints have caching enabled for them. You'll find a list below with the services which rely on caching in the RW API. If, as a developer, you'd like your endpoints to benefit from caching, you need to explicitly implement it. Please refer to the developer docs on HTTP caching for more details on how to implement caching for your API resources.
The default cache time to live (TTL) applied to responses stored in the RW API's cache is 3 days, but specific endpoints may specify a different expiration time for their responses. Only responses to GET requests with specific response codes (such as 200, 203, 204, or 302) are considered for caching. Also, for security reasons, authentication, authorization, or user related information contained in the HTTP request headers is never stored in cache, and neither are responses to authenticated GET requests.
3rd party HTTP caching
Keep in mind that, besides the RW API cache, there might be other HTTP caches between your application and the RW API servers. These caches might be public (e.g. your ISP's cache, or your local network's) or private (your browser's cache), and one or many may exist between you and the RW API infrastructure (which includes the RW API cache described here). The information detailed below describes the behavior of the RW API cache only, and illustrates how you, as a RW API user, would perceive it, were there no other caches at play. However, that may not always be true, and you may experience different behavior caused by these other caches. If you suspect this may be the case, you should:
- Disable any local cache you may have (for example, if you are using a browser to issue requests, you may need to explicitly disable the browser's built-in cache).
- Use HTTPS to bypass public HTTP caches.
HTTPS and caching
As you may or may not know, HTTPS - the secure variant of the HTTP protocol - establishes a secure, encrypted connection between the client (you) and the server (the RW API). This architecture means that traditional public HTTP caches cannot be used, and are thus bypassed. However, the RW API HTTP cache operates within the RW API infrastructure, meaning it will still be used to cache and serve responses, even if you use an HTTPS connection.
How to detect a cached response
Example cURL command with detailed header information:
curl -svo /dev/null https://api.resourcewatch.org/v1/dataset \
-H "x-api-key: <your-api-key>"
Example response of the command above, including a MISS x-cache header:
< HTTP/2 200
< content-type: application/json; charset=utf-8
< server: nginx
< cache: dataset
< x-response-time: 37 ms
< accept-ranges: bytes
< date: Tue, 29 Dec 2020 15:44:18 GMT
< via: 1.1 varnish
< age: 0
< x-served-by: cache-mad22045-MAD
< x-cache: MISS
< x-cache-hits: 0
< x-timer: S1609256659.546595,VS0,VE426
< vary: Origin, Accept-Encoding
< content-length: 11555
Example response of the command above, including a HIT x-cache header:
< HTTP/2 200
< content-type: application/json; charset=utf-8
< server: nginx
< cache: dataset
< x-response-time: 37 ms
< accept-ranges: bytes
< date: Tue, 29 Dec 2020 15:44:26 GMT
< via: 1.1 varnish
< age: 7
< x-served-by: cache-mad22039-MAD
< x-cache: HIT
< x-cache-hits: 1
< x-timer: S1609256666.390657,VS0,VE0
< vary: Origin, Accept-Encoding
< content-length: 11555
One of the most important things you should know about caching is how to detect whether or not you are receiving a cached response. To do this, inspect the headers of the RW API's responses, looking for an x-cache header. If the response does not contain this header, it was not cached by the RW API system-wide cache. If it does contain the x-cache header, it will have one of two values:
- MISS, which means the resource you're trying to GET was not found in the cache, so a fresh response was served;
- HIT, which means the resource you're trying to GET was found in the cache and the cached response was served.
You can read more about this and other cache-related headers used by the RW API in this link.
Keep in mind that 3rd party caches might be present between your application and the RW API servers which can modify these headers.
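If you want to quickly check the cache status from the command line, a minimal approach (assuming a Unix-like shell with curl and grep available) is to reuse the command shown above and filter the response headers for the x-cache entry:

curl -svo /dev/null https://api.resourcewatch.org/v1/dataset \
-H "x-api-key: <your-api-key>" 2>&1 | grep -i "x-cache"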
Cache invalidation
One of the common hassles of caching is cache invalidation - how to tell a caching tool that a certain response it has stored is no longer valid, and needs to be recomputed.
The RW API handles this internally and automatically for you. It has a built-in system that is able to invalidate specific cached responses, following a request that affects the output of said responses. This mechanism is rather complex, but you, as the RW API user don't really need to worry about it - you just need to be aware that the RW API cache will be invalidated automatically, so that you always get the correct, up to date information for your requests, even if they had been previously cached.
Keep in mind that 3rd party caches might be present between your application and the RW API servers, and their content may not be invalidated immediately.
Which services rely on caching
Sorting and Filtering
Sorting
Example request sorting by a single condition:
curl -X GET https://api.resourcewatch.org/v1/dataset?sort=name \
-H "x-api-key: <your-api-key>"
Example request sorting by multiple conditions:
curl -X GET https://api.resourcewatch.org/v1/dataset?sort=name,description \
-H "x-api-key: <your-api-key>"
Example request sorting by multiple conditions, descending and ascending:
curl -X GET https://api.resourcewatch.org/v1/dataset?sort=-name,+description \
-H "x-api-key: <your-api-key>"
As a rule of thumb, you can sort RW API resources using the sort query parameter. Usually, sorting can be performed using any field from the resource schema, so be sure to check each resource's model reference to find out which fields can be used for sorting. Sorting by nested model fields is not generally supported, but may be implemented for particular resources. In some exceptional cases, you can also sort by fields that are not present in the resource model (e.g., when fetching datasets, you can sort by user.name and user.role to sort datasets by the name or role of the dataset's owner) - be sure to check each resource's documentation to find out which additional sorting criteria are available.
Multiple sorting criteria can be used, separated by commas. You can also specify the sorting order by prepending each criterion with either - for descending order or + for ascending order. By default, ascending order is assumed.
Keep in mind that it’s up to each individual RW API service (dataset, widget, layer, etc) to define and implement the sorting mechanisms. Because of this, the examples above may not be true for all cases. Refer to the documentation of each resource and endpoint for more details on sorting.
Which services comply with these guidelines
The following endpoints adhere to the Sorting conventions defined above:
- Get v2 areas endpoint
- Get areas endpoint
- Get collections endpoint
- Get dashboards endpoint
- Get datasets endpoint
- Get layers endpoint
- Get metadata endpoint
- Get widgets endpoint
Filtering
Example request filtering using a single condition:
curl -X GET https://api.resourcewatch.org/v1/dataset?name=viirs \
-H "x-api-key: <your-api-key>"
Example request filtering using multiple conditions:
curl -X GET "https://api.resourcewatch.org/v1/dataset?name=birds&provider=cartodb" \
-H "x-api-key: <your-api-key>"
Example request filtering by an array field using , as the OR multi-value separator:
curl -X GET https://api.resourcewatch.org/v1/dataset?application=rw,gfw \
-H "x-api-key: <your-api-key>"
Example request filtering by an array field using @ as the AND multi-value separator:
curl -X GET https://api.resourcewatch.org/v1/dataset?application=rw@gfw \
-H "x-api-key: <your-api-key>"
Like in the case of sorting, most RW API resources allow filtering the returned results of list endpoints using query parameters. As a rule of thumb, you can use the API resource's fields as query parameter filters, as shown in the examples on the side. You can also combine different query parameters into a complex and logic filter. Note that you can achieve a logical or by passing a regular expression with two disjoint options, like this: ?name=<substr_a>|<substr_b>.
For string type fields, the filter you pass will be interpreted as a regular expression, not as a simple substring filter. This gives you greater flexibility in your search capabilities. However, it means that, if you intend to search by substring, you must escape any regex special characters in the string.
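For instance, the logical or described above could look like the following request, which matches datasets whose name contains either "forest" or "water" (the field values here are purely illustrative):

curl -X GET "https://api.resourcewatch.org/v1/dataset?name=forest|water" \
-H "x-api-key: <your-api-key>"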
Array fields (like the application field present in some of the API resources - read more about the application field) support more complex types of filtering. In such cases, you can use , as an or multi-value separator, or @ as a multi-value, exact match separator.
Object fields expect a boolean value when filtering, where true matches a non-empty object and false matches an empty object. Support for filtering by nested object fields varies across API resources, so be sure to check the documentation of the API endpoint for more detailed information.
Again, as in the case of sorting, keep in mind that it’s up to each individual RW API service (dataset, widget, layer, etc) to define and implement the filtering mechanisms. Because of this, the examples above may not be true for all cases. Refer to the documentation of each resource and endpoint for more details on filtering and the available fields to use as query parameter filters.
Which services comply with these guidelines
The following endpoints adhere to the Filtering conventions defined above:
Pagination
Example request where default pagination is applied, returning one page of 10 elements (1st - 10th elements):
curl -X GET https://api.resourcewatch.org/v1/dataset \
-H "x-api-key: <your-api-key>"
Example request fetching the 3rd page of 10 elements (21st - 30th elements):
curl -X GET https://api.resourcewatch.org/v1/dataset?page[number]=3 \
-H "x-api-key: <your-api-key>"
Example request fetching the 5th page of 20 elements (81st - 100th elements):
curl -X GET "https://api.resourcewatch.org/v1/dataset?page[number]=5&page[size]=20" \
-H "x-api-key: <your-api-key>"
Many times, when you're calling the RW API's list endpoints, there will be a lot of results to return. Without pagination, a simple search could return hundreds or even thousands of elements, causing excessive network traffic. For that reason, many services list their resources as pages, to make sure that responses are easier to handle and that services remain scalable. Most paginated results have a built-in default limit of 10 elements per page, but we recommend you always explicitly set the page size to ensure you know how many results per page you'll get.
The pagination strategy used across the RW API relies on two query parameters:
Field | Description | Type | Default |
---|---|---|---|
page[size] | The number of elements per page. Values above 100 are not officially supported. | Number | 10 |
page[number] | The page number. | Number | 1 |
Keep in mind that, to work predictably, you must always specify sorting criteria when fetching paginated results. If sorting criteria are not provided, the overall order of the elements might change between requests. Pagination will still work, but the actual content of the pages might show missing and/or duplicate elements. Refer to the general sorting guidelines and the sorting section for the RW API resource you're loading for details on the sorting options available for that resource type.
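As an illustration, the request below combines the documented sort and pagination parameters to fetch the second page of 20 datasets, sorted by name, so that repeated requests return a stable ordering:

curl -X GET "https://api.resourcewatch.org/v1/dataset?sort=name&page[number]=2&page[size]=20" \
-H "x-api-key: <your-api-key>"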
Once again, keep in mind that it’s up to each individual RW API service (dataset, widget, layer, etc) to define and implement the pagination strategy. Because of this, the examples above may not be true for all cases. Refer to the documentation of each resource and endpoint for more details on how to correctly paginate your list requests.
Structure of a paginated response
Example request where default pagination is applied:
curl -X GET https://api.resourcewatch.org/v1/dataset \
-H "x-api-key: <your-api-key>"
Example paginated response:
{
"data": [
{...},
{...},
{...},
{...},
{...},
{...},
{...},
{...},
{...},
{...}
],
"links": {
"self": "http://api.resourcewatch.org/v1/dataset?page[number]=1&page[size]=10",
"first": "http://api.resourcewatch.org/v1/dataset?page[number]=1&page[size]=10",
"last": "http://api.resourcewatch.org/v1/dataset?page[number]=99&page[size]=10",
"prev": "http://api.resourcewatch.org/v1/dataset?page[number]=1&page[size]=10",
"next": "http://api.resourcewatch.org/v1/dataset?page[number]=2&page[size]=10"
},
"meta": {
"size": 10,
"total-pages": 99,
"total-items": 990
}
}
Paginated responses return a JSON object containing 3 data structures:
- data is an array containing the actual list of elements which results from applying the pagination criteria specified in the page[number] and page[size] query parameters;
- links is a helper object that provides shortcut URLs for commonly used pages, using the same criteria applied in the initial request:
  - self contains the URL for the current page;
  - first contains the URL for the first page;
  - last contains the URL for the last page;
  - prev contains the URL for the previous page;
  - next contains the URL for the next page;
- meta is an object containing information about the total amount of elements in the resource you are listing:
  - size reflects the value used in the page[size] query parameter (or the default size of 10 if not provided);
  - total-pages contains the total number of pages, assuming the page size specified in the page[size] query parameter;
  - total-items contains the total number of items.
Which services comply with these guidelines
The following endpoints adhere to the pagination conventions defined above:
- Areas service
- Areas v2 service
- Collections service
- Dashboards service
- Datasets service
- Layers service
- Tasks service
- Widgets service
Common Concepts
Several important concepts cut across many different API resources. Knowing about these concepts is fundamental for a better understanding of how to interact with the RW API.
Applications
As you may have noticed while reading these docs, different applications and websites rely on the RW API as the principal source for their data. While navigating through the catalog of available datasets, you will find some datasets used by the Resource Watch website, others used by Global Forest Watch. In many cases, applications even share the same datasets!
To ensure the correct separation of content across the different applications that use the RW API, you will come across a field named application in many of the API's resources (such as datasets, layers, widgets, and others). Using this field, the RW API allows users to namespace every single resource, so that it's associated only with the applications that use it.
Existing applications
Currently, the following applications are using the API as the principal source for their data:
- the Resource Watch website, where the application field takes the value rw;
- the Global Forest Watch website, where the application field takes the value gfw;
- the Partnership for Resilience and Preparedness website, where the application field takes the value prep;
- the Forest Atlases websites for different countries of the world also rely on the RW API - in the case of these websites, the application field takes the value forest-atlas;
- the Forest Watcher mobile application, where the application field takes the value fw.
If you would like to see your application added to the list of applications supported by the RW API, please contact us.
Best practices for the application field
Fetching datasets for the Resource Watch application
curl -X GET https://api.resourcewatch.org/v1/dataset?application=rw \
-H "x-api-key: <your-api-key>"
Fetching datasets for the Global Forest Watch application
curl -X GET https://api.resourcewatch.org/v1/dataset?application=gfw \
-H "x-api-key: <your-api-key>"
This section describes some best practices when using the application field. Please keep in mind that, since it is up to each RW API service to implement how this field is used, there might be some differences in the usage of this field between RW API services. Refer to each RW API resource or endpoint's documentation for more details on each specific case.
As a rule of thumb, the application field is an array of strings, required when creating an RW API resource that uses it. You can edit the application values by editing the RW API resource you are managing. You can then use the application field as a query parameter filter to fetch content specific to a given application (check the examples on the side of using the application field when fetching RW API datasets).
RW API users also use the application field and can be associated with multiple applications. In this case, the application field is used to determine which applications a user manages (access management). As you'll be able to understand from reading General notes on RW API users, each user's application values are used to determine whether a given user can administrate an RW API resource. Typically, to manipulate a given RW API resource, that resource and the user account must have at least one overlapping value in the application field.
Which services comply with these guidelines
Below you can find a list of RW API resources that use the application field:
- Areas v1
- Areas v2
- Collections
- Dashboards
- Dataset
- Graph
- Layer
- Metadata
- Subscriptions
- Topics
- Users
- Vocabulary
- Widgets
Environments
Certain RW API resources, like datasets, layers, or widgets, use the concept of environment, or env, as a way to help you manage your data's lifecycle. The main goal of environments is to give you an easy way to separate data that is ready to be used in production-grade interactions from data that is still being improved on.
When you create a new resource, like a dataset, it's given the production env value by default. Similarly, if you list resources, there's an implicit default filter that only returns resources whose env value is production. This illustrates two key concepts of environments:
- By default, when you create data on the RW API, it assumes it's in a production-ready state.
- By default, when you list resources from the RW API, it assumes you want only to see production-ready data.
However, you may want to modify this behavior. For example, let's say you want to create a new widget on the RW API and experiment with different configuration options without displaying it publicly. To achieve this, you can set a different environment on your widget - for example, test. Or you may want to deploy a staging version of your application that also relies on the same RW API but displays a different set of resources. You can set those resources to use the staging environment and have your application load only that environment, or load both production and staging resources simultaneously. You can also list resources in all environments by using the special all query parameter value.
Keep in mind that production (set by default) and all (reserved for listing resources from all environments, and not supported as an actual environment value) are the only "special" values for the environment field. Apart from these, the environment can take any value you want, without having any undesired side-effects.
Resources that use environment can also be updated with a new environment value, so you can use it to control how your data is displayed. Refer to the documentation of each resource to learn how you can achieve this.
It's worth pointing out that endpoints that address a resource by id typically don't filter by environment - mostly only listing endpoints have different behavior depending on the requested environment value. Also worth noting is that this behavior may differ from resource to resource, and you should always refer to each endpoint's documentation for more details.
In a nutshell, this means that:
- When creating one of these resources, it will be set with the production environment, unless specified otherwise by you.
- When listing these resources, the list will only show elements with the production environment, unless you explicitly filter by a different value. Filtering by all shows resources for all environments.
- You may update a resource's environment using the corresponding update endpoint (there are special cases to this, please refer to the specific resource documentation).
- Endpoints that address a resource by id, like fetching by id, updating or deleting, do not filter by environment.
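As an illustration of the listing behavior above, the requests below (using datasets as an example, and assuming the env query parameter described in this section) would list only staging resources, or resources from all environments:

curl -X GET https://api.resourcewatch.org/v1/dataset?env=staging \
-H "x-api-key: <your-api-key>"

curl -X GET https://api.resourcewatch.org/v1/dataset?env=all \
-H "x-api-key: <your-api-key>"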
Which services support environments
The following services/resources comply with the above guidelines:
- Dataset
- Graph
- Layer
- Subscriptions
- Widgets
- Areas v2
- Collections
- Topic
- Dashboard
- FAQ, Tools, Static pages and Partners (not documented yet, part of the Resource Watch manager microservice)
Including related resources with environments
Some of the resources above link to each other (for example, datasets may have layers and widgets associated with them), and have a convenient includes optional query parameter that allows you to load the associated resources in a single API call. When fetching a list of these resources and their associated (included) resources while using an environment filter, the resource list will be filtered by environment, but the included resources will not - so, for example, you may get a list of datasets filtered by the production environment, but it may include layers and/or widgets from other environments.
In these situations, if you want to also filter the included resources by the same env value, you can provide the optional filterIncludesByEnv=true query parameter.
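For example, a request along these lines (the includes value is illustrative) would filter both the listed datasets and their included layers by the same environment:

curl -X GET "https://api.resourcewatch.org/v1/dataset?includes=layer&env=staging&filterIncludesByEnv=true" \
-H "x-api-key: <your-api-key>"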
References to resources with environments
The Vocabulary and Graph services fundamentally serve to create associations around resources, either between themselves or with tags. At their core, these services do not use the concept of environment, but the results of some of their endpoints are lists of resources like datasets, layers or widgets, which do have an environment.
When using these endpoints, you can provide an optional env query parameter filter that will be applied to the list of environment-aware resources returned by those endpoints. Note that, unlike listing resources using their "native" endpoints, these endpoints do not default to filtering the result set by the production environment - by default, they will not apply any env filtering, and will return resources from all environments.
User roles
RW API users have a role associated with them, defined in the role field of each user. You can check your own role by consulting your user information using the GET /users/me endpoint, or by getting a JSON Web Token and decoding its information. The role of the user is defined as a string, and it can take one of the following values:
- USER
- MANAGER
- ADMIN
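A minimal sketch of checking your own role, assuming the GET /users/me endpoint mentioned above is served relative to the API base URL and accepts your JWT as a bearer token (see the user management reference documentation for the exact details):

curl -X GET https://api.resourcewatch.org/users/me \
-H "Authorization: Bearer <your-jwt-token>" \
-H "x-api-key: <your-api-key>"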
Role-based access control
Typical hierarchy for roles:
USER (least privileges) < MANAGER < ADMIN (most privileges)
The role field is usually used across the RW API for controlling access to API resources. While not required nor enforced, user roles are typically used hierarchically, with USER being the role with the least privileges and ADMIN the one with the most. A common pattern you'll find on some services is:
- USER accounts can read data (usually all data, or just data owned by the user, depending on any privacy or security concerns in the service in question), but can only create new resources, not edit or delete them;
- MANAGER accounts can perform all of the USER actions, complemented with editing or deleting resources owned by them;
- ADMIN accounts can do all of the above, even for resources created by other users.
Role-based access control is usually combined with the list of applications associated with the user: typically, in order to manipulate a given resource, that resource and the user account must have at least one overlapping application value. Read more about the application field and which services use it in the Applications concept documentation.
Keep in mind that it’s up to each individual RW API service (dataset, widget, layer, etc) to define how they restrict or allow actions based on these or other factors, so the examples above may not be true for all cases. Refer to the documentation of each resource and endpoint for more details on restrictions they may have regarding user accounts and their properties.
How to change the role of a user
Changing the role of users is restricted to ADMIN users, so if you intend to upgrade your user role to MANAGER or ADMIN, please get in touch with one of the ADMIN users and request the change. If you are already an ADMIN user and you intend to change the role of another user, you can do so using the PATCH /users/:id endpoint.
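A hedged sketch of such a request, again assuming the endpoint path above is served relative to the API base URL and that the new role is sent as a JSON field (the exact payload format is an assumption here - see the user management reference documentation):

curl -X PATCH https://api.resourcewatch.org/users/<user-id> \
-H "Authorization: Bearer <your-jwt-token>" \
-H "x-api-key: <your-api-key>" \
-H "Content-Type: application/json" \
-d '{"role": "MANAGER"}'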
Which services comply with these guidelines
The following endpoints adhere to the user role conventions defined above:
Core Resources
This section provides high-level, conceptual descriptions of the core resources managed by the RW API. For details on how to interact with these resources, see the Reference documentation.
Dataset
One of the Resource Watch API's (RW API) goals is to provide a common interface for interacting with data hosted in different sources and formats. A Dataset is the RW API's way of providing users with access to data, while trying, as much as possible, to abstract and standardise operations that would otherwise be left for the user to figure out. It's one of the cornerstones that many other API features build upon - and those features can help you get even more out of your data!
Example: A Resource Watch API dataset can represent data in a JSON file, hosted on Carto, Global Forest Watch or Google Earth Engine, to name a few. However, when accessing that data, you don't have to learn 4 different technologies - the Resource Watch API gives you a single, unified query interface.
On top of datasets, the RW API offers multiple resources that allow you to access data in multiple formats. These will be covered later in full detail, but as an example, here are some ways in which you can access and use datasets:
- the RW API allows you to create Widgets, graphic representations of data, which can be made interactive to meet your custom needs;
- if your data is georeferenced, you can use the Layers service to display data on informative maps;
- you can create Subscriptions associated with datasets, and be notified via email of significant updates;
- you can build your own Queries using a SQL-like syntax, and use the data to build your own custom visualizations.
- the Metadata service offers you a way to provide additional details about your data, like multi-language descriptions that will allow you to reach a broader audience
- the Geostore service allows you to access or save geometries, offering lots of useful tools when handling georeferenced datasets.
Dataset providers
Each dataset has a provider (json/carto/GEE/...) that must be specified on creation - it's this value that tells the RW API how to handle different data providers and formats, so you don't have to. Below you'll find a list of the different providers supported:
Carto
Carto is an open, powerful, and intuitive map platform for discovering and predicting key insights underlying the location data in our world.
Global Forest Watch
Global Forest Watch (GFW) is an online platform that provides data and tools for monitoring forests. By harnessing cutting-edge technology, GFW allows anyone to access near real-time information about where and how forests are changing around the world.
ArcGIS feature layer
ArcGIS server is a complete, cloud-based mapping platform. You can learn more about ArcGIS here.
Google Earth Engine
Google Earth Engine combines a multi-petabyte catalog of satellite imagery and geospatial datasets with planetary-scale analysis capabilities and makes it available for scientists, researchers, and developers to detect changes, map trends, and quantify differences on the Earth’s surface.
Web Map Services
The WMS connector provides access to data served through the OGC WMS protocol standard.
Rasdaman (Raster Data Manager)
Rasdaman is a database with capabilities for storage, manipulation and retrieval of multidimensional arrays.
NEX-GDDP
The NASA Earth Exchange Global Daily Downscaled Projections (NEX-GDDP) dataset is comprised of downscaled climate scenarios for the globe that are derived from the General Circulation Model (GCM) runs conducted under the Coupled Model Intercomparison Project Phase 5 (CMIP5) and across two of the four greenhouse gas emissions scenarios known as Representative Concentration Pathways (RCPs).
Note: While you may find and use existing datasets of this type, creation of new NEX-GDDP based datasets is restricted to specific users.
BigQuery
BigQuery is a serverless, highly scalable, and cost-effective cloud data warehouse designed to help you make informed decisions quickly, so you can transform your business with ease.
Note: While you may find and use existing datasets of this type, creation of new BigQuery based datasets is restricted to specific users.
Loca
LOCA, which stands for Localized Constructed Analogs, is a technique for downscaling climate model projections of the future climate.
Note: While you may find and use existing datasets of this type, creation of new LOCA based datasets is restricted to specific users.
Comma-Separated Values (CSV)
Data provided in the form of a Comma-Separated Values (CSV) document.
Tab-Separated Values (TSV)
Data provided in the form of a Tab-Separated Values (TSV) document.
JavaScript Object Notation (JSON)
Data provided in the form of a JSON document.
Extensible Markup Language (XML)
Data provided in the form of an XML document.
Dataset connector type
Each dataset provider has an associated connector type, which you can determine using the table below.
Connector type | Providers |
---|---|
document | csv, json, tsv, xml |
rest | cartodb, gfw, featureservice, gee, bigquery, rasdaman, nexgddp, loca |
wms | wms |
The connector type reflects an important aspect of a dataset: where the actual data is kept, and how it is accessed:
- document connector type: a dataset that uses this connector type has its data hosted on the RW API database. For example, datasets with provider json are based on user provided JSON files, and have the document connector type. This means that, on dataset creation, the content of the provided JSON files is copied onto an internal RW API database, and future queries will get their data from that database - the actual JSON files are not used after the creation process is done.
- rest connector type: a dataset that uses this connector type proxies the underlying service specified as provider. For example, a dataset that uses the provider cartodb has the connector type rest. This means that queries to that dataset will cause the API to query the Carto URL provided, and pass the result to the user. Apart from temporary caches, the actual data is never kept on the RW API itself. The underlying Carto table needs to exist and be accessible for the RW API dataset to work.
- wms connector type: this connector type is used only for datasets that use the wms provider. The RW API does not access the data for these datasets.
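To make the provider and connector type concepts more concrete, here is a hedged sketch of what creating a document-type dataset from a hosted JSON file might look like - the exact set of required fields, field names, and authentication details are covered in the dataset reference documentation, so treat this payload as illustrative only:

curl -X POST https://api.resourcewatch.org/v1/dataset \
-H "Authorization: Bearer <your-jwt-token>" \
-H "x-api-key: <your-api-key>" \
-H "Content-Type: application/json" \
-d '{
  "name": "Example JSON dataset",
  "connectorType": "document",
  "provider": "json",
  "application": ["rw"],
  "connectorUrl": "https://example.com/data.json"
}'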
Query
In the previous section, we covered the concept of a RW API dataset which, in simple terms, is a way to tell the RW API that your data exists, and where. While cataloging datasets in a public repository is useful, making that data easily accessible is one of the main goals of the RW API. This is where queries come in.
In the context of the RW API, a dataset query is very similar to what an SQL query is to a database - it's a specially crafted statement that allows you to express what data you want, from which dataset, and with which structure. The RW API will use that query to get you the data you need, in the format you asked for, so that it's easy to use in the context of your applications. While it doesn't comply (nor does it attempt to) with any of the formal SQL specifications, RW API queries use a SQL-like syntax that will be very familiar to anyone who has worked with a relational database in the past. If that's not your case, there are many tutorials out there that will help you learn the basics in no time.
Using this common, SQL-based syntax, RW API queries allow you to query its datasets using a common API and syntax, no matter where the actual underlying data is hosted. Querying a carto dataset is the same as querying a JSON document or a BigQuery table. This is one of the main goals of the RW API, and one of the most valuable features we offer you, the end user - using a single tool and language, you can quickly query data from a wide range of sources, on a broad set of topics.
The /query reference documentation goes into more detail on how you can submit your queries to the API, the specifics of more advanced operations and the detailed limitations of querying the RW API but, for now, there are 3 high-level ideas that you should keep in mind:
All queries should be SELECT queries
Querying in the RW API is meant to be used only to read data, not to manipulate it. If you have used SQL before, you know it can be used to modify data, but, as a rule of thumb, that's not the approach used in the RW API. If you'd like to modify the data of a dataset, you should use the dataset update endpoints instead. The only exception to this is for deleting data from document-based datasets - check here for more details.
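For illustration, a read-only query could look like the sketch below - the dataset ID and table name are placeholders, and the full query endpoint syntax is covered in the /query reference documentation:

curl -X GET -G "https://api.resourcewatch.org/v1/query/<dataset-id>" \
-H "x-api-key: <your-api-key>" \
--data-urlencode "sql=SELECT * FROM <table-name> LIMIT 10"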
Not all SQL constructs are supported
If you've used SQL in the past, you know how powerful (and complex) it can be. Things like nested queries or joins can be hard to use and even harder to maintain, even without the added complexity of an environment where multiple data providers coexist. That's why the RW API limits its support to basic SQL syntax, so we can focus on delivering a tool that's simple and easy to use for most users. The supported SQL syntax reference section goes into more detail on what's supported and what's not, and will help you understand the specifics of what you can achieve with RW API queries.
Some operations will depend on the provider of the dataset you're querying
Our goal is to provide a common querying interface across all datasets, independent of their provider. While we believe we've achieved this in most cases, RW API queries can only be as powerful as the underlying provider (and their own APIs) allows them to be. There are cases in which a given SQL construct or function is supported for one provider, but not for another. The supported SQL syntax reference docs have more details on these limitations, per provider.
Layer
Apart from the possibility of fetching data from multiple data sources using a common interface, the RW API provides you with features to help you visualize such data. A layer is one of the resources that you can use to build custom visualizations for your data, and layers play a very important role when visualizing datasets that contain geospatial data.
A layer is a visual specification of how the data of a dataset can be rendered and styled on a map. It stores data that will help your applications render dataset data as a map layer, using tools like Leaflet, Mapbox GL JS or Layer Manager. In this sense, it's important to emphasise that a layer object does not store the actual data to be represented - it instead stores the information that those rendering tools will use to render your data. Rephrasing it: the data should come from the dataset (through queries), while a layer provides the information about how that data is meant to be rendered by the corresponding rendering tool.
In the layer endpoint documentation, we'll go into more detail on how you can manage layers, as well as the existing limitations that you might run into when using layers. However, for now, there are 3 high-level ideas that you should keep in mind:
The layer does not interact with dataset data
Layers store the visual configurations needed for rendering and styling a given dataset's data. However, they do not interact with the dataset data in any way. It is your responsibility, as a RW API user, to build the appropriate queries to retrieve the data that you want to display (read more about datasets or how you can query datasets to obtain their data). Remember, use layers as a complement to dataset queries, not a replacement.
Most layer fields are free-form
Many of the fields in the layer object are free-form, meaning the RW API does not apply any restriction to whatever is saved in those fields. This is great, because it allows for a very high level of flexibility for users, but has the downside of making it harder to document how to use layers. We tried our best to provide clear and consistent specifications for all the fields of a layer, but keep in mind that this high level of flexibility makes it harder to deliver a concrete description.
Different applications and rendering tools use layers in different ways
At the moment, most applications using the layer endpoints have adapted layers to tailor fit their needs, taking advantage of the flexibility that free-form fields bring. This means that, when navigating through the catalog of available layers, you might run into very different ways of implementing and using layers. Once again, we will try our best to cover the different ways of using and managing layers, but keep in mind that there is currently no standard way of using layers.
As layers are typically managed within the realm of a RW API based application, they typically contain data that covers the needs of that application's specific rendering tool. This again means that you may find varying structure in the data contained in a layer.
Widget
You have fetched your data from your dataset using queries. And you know how to build custom visualizations for geospatial data using layers. Yet, sometimes you want to build a plain graphical representation of your data - whether it is geo-referenced or not: this is when widgets come in handy.
A widget is a visual specification of how to style and render the data of a dataset (think pie, bar or line charts).
As with layers, each widget has a single dataset associated with it, and a dataset can be associated with many widgets. You can represent the same data in different ways by creating different widgets for the same dataset. The same widget can store independent configuration values for each RW API based application. It can also contain the required configuration for rendering the same visualization using different rendering tools.
However, this association between widgets and datasets is only for organizational purposes. As such, like in the case of layers, the widget itself does not interact with the dataset data. You can either use the widget's queryUrl field to store the query to get the widget's data, or store it inside the free-form widgetConfig object. In any of these cases, it is your responsibility as an API user to query the data that will be used for rendering the widget.
In the widget endpoint documentation, you can get into more detail on how you can manage widgets.
Widget configuration using Vega grammar
As in the case of layers (where many fields are free-form), the RW API does not apply any restriction to the data saved in the widget configuration field (widgetConfig). This allows for a very high level of flexibility for users, but has the downside of making it harder to document how to use widgets.
However, the standard approach used by existing widgets for building widget configuration objects is Vega grammar. Vega is a visualization grammar, a declarative format for creating interactive visualization designs. You can then use a chart library that supports Vega grammar syntax (such as d3.js) to render the widget and get your graphical representation.
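As a rough, hedged illustration of what a Vega-based widgetConfig might contain, here is a minimal standalone Vega v5 bar chart specification with inline sample data - real widgets in the catalog may wrap their specification differently, target other Vega versions, or fetch their data through the widget's query instead of inline values:

{
  "$schema": "https://vega.github.io/schema/vega/v5.json",
  "width": 400,
  "height": 200,
  "data": [
    {
      "name": "table",
      "values": [
        {"category": "A", "amount": 28},
        {"category": "B", "amount": 55},
        {"category": "C", "amount": 43}
      ]
    }
  ],
  "scales": [
    {"name": "xscale", "type": "band", "domain": {"data": "table", "field": "category"}, "range": "width", "padding": 0.1},
    {"name": "yscale", "type": "linear", "domain": {"data": "table", "field": "amount"}, "range": "height", "nice": true}
  ],
  "axes": [
    {"orient": "bottom", "scale": "xscale"},
    {"orient": "left", "scale": "yscale"}
  ],
  "marks": [
    {
      "type": "rect",
      "from": {"data": "table"},
      "encode": {
        "enter": {
          "x": {"scale": "xscale", "field": "category"},
          "width": {"scale": "xscale", "band": 1},
          "y": {"scale": "yscale", "field": "amount"},
          "y2": {"scale": "yscale", "value": 0}
        }
      }
    }
  ]
}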
Please note that, in any case, you can use whatever configuration format you wish, since the widgetConfig field is free-form.
Metadata
Helping users understand the context of data, so they can find it and make informed decisions about its suitability, is another important goal of the Resource Watch API (RW API). For example, a user may be looking to answer questions like "Does this dataset provide information about tree cover in my region?", "What are the physical units?", "How was this measured?", and "How should I provide proper attribution to the data provider?".
Answers to these questions can be stored in a resource's metadata. By definition, metadata is always associated with another entity. In the context of the RW API, metadata objects will always contain information about either a dataset, a layer or a widget.
Content-wise, the RW API aims to provide a good balance between structure and flexibility with its metadata service, providing a group of common fields you'll find on all metadata elements, as well as giving API users the tools to create metadata that meets the individual needs of their applications. Through a mix of both types of fields, these are some of the pieces of information you should aim to provide when specifying the metadata for one of your resources:
- Title/Name – Name given to the resource.
- Description – A description of the resource.
- Author – The entities or persons that produced the data.
- Source – Where the original data was sourced from, and how to access said source.
- Contact information – A way to contact the author.
- Schema – Description of the data structure, like column names, types, descriptions, etc.
- License – Information about the rights held in and over the resource.
As we'll see in further detail when covering the metadata endpoints, a RW API metadata object has a few structured but optional fields to capture some of the general details described above, while also allowing you to specify your own custom fields for ease of extension and flexibility, should you want to provide additional levels of detail about your resources. It also gives you the tools to specify only a subset of the suggested elements above, should you decide to use that approach.
Besides being associated with a single RW API resource (dataset, widget, or layer), metadata objects must also identify the language in which they are written, as well as the application they are associated with. This allows a single resource, say a dataset, to have its metadata translated into different languages, and adjusted to meet the needs of different applications. We'll go into further detail on these when we cover the metadata endpoints in their dedicated section.
Vocabularies and tags
Vocabularies and tags are the two central pillars of the RW API's tagging mechanism, so we'll be covering them together. These concepts can seem abstract at times, so it's important that you understand the concepts we'll describe here, before moving on to the documentation for the actual endpoints.
Tags
Let's start with an example: you are browsing some old files on your computer, and you run into your MP3 collection from when you were a teenager. You re-discover some great tunes from way back when, but you also rediscover how unorganized you were, so you decide to curate your music collection a bit, for old times' sake. So, as you listen to those "dusty" files, you start adding keywords to each file, like "rock", "slow one" or "1996", that describe each song and help you structure and organize your collection. You are adding tags to your files.
Like in our example, a RW API tag is a simple word or a concept, that is used to describe a resource. In the RW API, a resource can be a dataset, a widget, or a layer (MP3 files currently not supported, sorry). While a tag can be whatever you want, they are most useful if they capture simple concepts that apply to multiple resources. So, while "rock", "slow one" and "1996" are three good tags for your music collection, a tag like "slow rock song from 1996" is probably too specific, and wouldn't work well as a tag - we'll see why in a moment.
Now that you have your tags in place, your music collection is starting to feel a lot more organized. You notice you can quickly group similar songs, and create playlists like "pop songs" or "95s punk", and you can even use tags to find songs that match your current mood, that you didn't remember you had in your collection. Tags are really powerful to discover resources you didn't know existed - whether it's "pop" songs, "deforestation" datasets, or "social inequality" layers.
RW API tags are shared by all users, meaning any user can discover your resources if they search by the tags you assigned to them. Similarly, you can use tags to discover resources created and tagged by other users, so it adds discoverability value both ways.
However, in order to reach their full potential, tags need to be combined with another concept: vocabularies.
Vocabulary
Let's go back to your music collection. As you realize tags are a powerful tool to curate your collection, you start adding more tags to your songs. Soon you realize that every file now has many many tags, and it's starting to become complicated to make sense of each of the 20 tags you've now added to your songs - you're no longer sure if the "1996" refers to the year it was recorded, released, or remastered. You play songs tagged with "brainstorm" and start thinking about a way to solve this problem.
You realize the problem you're facing could be solved by grouping tags in a way that identifies what they refer to. So, for example, you could have tag groups called "year of release", "instruments played" or "genre". Each song could use as many or as few tags from each tag group, and tags with the same name but belonging to different groups would effectively mean and be different things. When associating a tag with a song, you would need to identify which tag group that tag belongs to, to make it work. Finally, you realize that "tag groups" is a poor name, and decide to call it something else - vocabulary.
As in our example above, RW API vocabularies are a way to group multiple tags in a way that makes it easier to organize them, while also giving them more meaning. When tagging a RW API resource, not only do you need to specify the tag values but also the vocabulary to which each tag belongs - giving context to your tags and making them more powerful and easier to understand. Like with tags, vocabularies are also shared by all RW API users, which benefits everyone in terms of discoverability.
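As a very rough sketch of how this might look in practice, tagging a dataset with a couple of tags under a given vocabulary could be something like the request below - note that the endpoint path and payload shown here are assumptions for illustration only, and the actual endpoints are described in the vocabulary reference documentation:

curl -X POST https://api.resourcewatch.org/v1/dataset/<dataset-id>/vocabulary/<vocabulary-name> \
-H "Authorization: Bearer <your-jwt-token>" \
-H "x-api-key: <your-api-key>" \
-H "Content-Type: application/json" \
-d '{"application": "rw", "tags": ["deforestation", "forest"]}'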
Geostore
Allowing users to interact with data in the context of geographic data structures, such as the boundaries of countries or the location of power plants, is an important goal of the Resource Watch API (RW API). For example, a user may be looking to answer questions like "How much tree cover is there in my region?" or "How many power plants are situated on the coast?".
Both of these questions imply defining the (geographical) boundaries of the question. In the case of regions, these are often expressed as bounding polygons; however, as seen in the examples above, geographic structures may also represent lines (such as the coastline) or points (the locations of power plants). All of these are efficiently represented using the vector data model.
The Resource Watch API allows users to define custom vector data structures in GeoJSON format. It allows the storage of simple geographical features, and, being an extension of JSON, its format is familiar to API users. In the context of the RW API, these custom geographical structures are called geostores.
GeoJSON object
If you are not familiar with GeoJSON, this article gives a quick overview; for the brave, you can also check out the specification. Being an open standard, most GIS software supports exporting GeoJSON - for example, QGIS. If you are creating your own GeoJSON objects, the JSON Schema docs describing the different GeoJSON object types can be useful.
Assuming you are now familiar with the basic structure of GeoJSON, next we will highlight some important considerations about how Geostore treats GeoJSON objects.
- GeoJSON objects are always stored and returned as a FeatureCollection; when creating a geostore using a Feature or Geometry, it is always converted to a FeatureCollection.
- When creating a geostore from a FeatureCollection, only the first Feature is stored; all other features are discarded with no warning.
- Geostore retains all Feature properties.
- All geometry types are accepted, except GeometryCollection.
- GeoJSON only supports one geographic coordinate reference system (CRS), using the World Geodetic System 1984 (WGS 84) datum, with longitude and latitude units of decimal degrees. Other CRSs are not supported. Geostore does not check for the validity of the coordinate CRS.
- During creation, the GeoJSON geometry is checked for geometric validity and, if required, an attempt is made to repair the geometry (using ST_MakeValid).
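For illustration, creating a geostore from a GeoJSON geometry could look something like the request below - the exact endpoint path and payload format are described in the Geostore reference documentation, so treat this as a sketch:

curl -X POST https://api.resourcewatch.org/v1/geostore \
-H "x-api-key: <your-api-key>" \
-H "Content-Type: application/json" \
-d '{
  "geojson": {
    "type": "Point",
    "coordinates": [-43.2, -22.9]
  }
}'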
What are geostores used for?
Geostores are a core part of much of the analysis and data provided by the API, and the primary way of defining geographical boundaries in the RW API.
Using geostores, you can, for example, create areas of interest and get notified about deforestation and fire alerts in geographical areas of your interest. Using the GFW website, you can even draw a custom shape of your interest. This shape gets translated into a geostore, which you can then use to subscribe to deforestation and fire alerts.
Additionally, you can query certain RW API datasets providing the ID of a geostore as a query parameter. This will restrict the returned data to the points that fall inside the geostore with the provided ID, so you can directly query data specific to a geographic boundary identified by that geostore.
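Assuming the dataset supports spatial filtering, such a request could look like the sketch below, where the geostore ID is passed as an additional query parameter (the geostore parameter name is an assumption here - check the documentation of the dataset you are querying):

curl -X GET -G "https://api.resourcewatch.org/v1/query/<dataset-id>" \
-H "x-api-key: <your-api-key>" \
--data-urlencode "sql=SELECT * FROM <table-name> LIMIT 10" \
--data-urlencode "geostore=<geostore-id>"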
The Geostore API also aims to provide a curated list of shapes for ISO countries, GADM administrative regions, and protected areas from the WDPA. These endpoints are used by other services in the API as the source of truth for geographic representations.
Area of Interest
Areas of interest are custom polygons representative of geographical regions that API users may have a particular interest in. Users can define their own custom areas of interest or select from predefined areas supported by the API, and can subscribe to deforestation alerts (GLAD alerts) or fire alerts (VIIRS alerts) that happen inside those areas. Areas of interest can be managed using the endpoints provided by the Areas service, while the subscription logic of notifying API users is handled by the Subscriptions service. Your areas of interest can also be managed by accessing your MyGFW account on the Global Forest Watch website.
Email and webhook notifications
While creating an area, you have the option to subscribe to deforestation alerts (GLAD alerts), fire alerts (VIIRS fire alerts) and/or a monthly summary of both GLAD and VIIRS alerts in the area of interest defined. Each notification also has the option of being received by the API user as an email or a POST to a webhook URL.
- GLAD notifications are sent daily, starting after 10:00 AM (EST - Eastern Time).
- VIIRS notifications are sent daily, starting after 7:00 AM (EST - Eastern Time).
- Monthly summary notifications are sent monthly, starting after 11:00 AM (EST - Eastern Time) of the first day of every month.
Different ways of defining areas of interest
Areas of interest can be defined in one of four ways. Keep in mind that all of these different methods of creating areas can also be used while logged in with your account on the Global Forest Watch website:
- by referencing a country, one of its regions, or a subregion within a region - countries are identified by their ISO 3166-1 alpha-3 code, and regions and subregions are identified by their respective GADM id, which can be obtained from the GADM dataset.
- by referencing a specific protected area by the id of that area in the World Database on Protected Areas (WDPA).
- by referencing the ID and type of a land use area provided by different datasets - currently, the following land use datasets are supported:
- mining for mining areas.
- logging for Congo Basin logging roads.
- oilpalm for palm oil plantations.
- fiber for wood fiber plantations.
- by creating a specific geostore using the geostore endpoint, and using its ID.
Read more on how to create areas using the different methods in the Areas v2 endpoints section.
Graph
As you have read in the Dataset concept section, one of the main goals of the RW API is to provide a common interface for interacting with data provided by a variety of services. However, given the number of datasets currently hosted in the RW API (2506 at the time of writing), it can be hard to find exactly what dataset you are looking for. It can even be overwhelming to navigate through the datasets list, due to the number of datasets available and the wide range of topics each dataset relates to.
The RW API provides you with different ways to search for the information you are looking for: from searching datasets by keyword and exploring dataset metadata, to categorization through the use of vocabularies and tags. However, none of these options is optimal when it comes to finding similar datasets, related to the ones you find relevant. This is where the Graph service comes to the rescue.
Before jumping to the details of the RW API Graph service, you should be familiar with what a graph is. There are many resources available online for this purpose (e.g. the Wikipedia entry on graph), but the main concept you must keep in mind is that a graph is a data structure consisting of a set of nodes and a set of edges. A node represents an abstract entity, and an edge is a connection between two nodes, representative of a relationship between those two nodes.
In the context of the RW API's Graph service, nodes typically represent RW API resources (such as datasets, widgets, or layers), users, or concepts that describe those resources.
Edges define relationships of different types between the different types of graph nodes. Relationships can be detailed and specific (for instance, defining a favorite relationship between a dataset and user), but they can also be more generic, establishing a connection between a resource and a concept.
Using the Graph service, you will be able to explore RW API's datasets from a concept perspective. You will be able to:
- find datasets related to broad concepts such as solar_energy or water_stress;
- find datasets related to specific properties of the dataset's data, like vector or raster;
- find datasets related to a given dataset;
- infer concepts from a given list of concepts.
The RW API Graph service enables you to build powerful applications, where you can easily and more intuitively navigate through your datasets. It gives you the possibility of focusing on the topics and concepts each dataset is related to, building UIs geared towards navigation by similarity, as opposed to a simple dataset list. A great example of the usage of the Graph service is Resource Watch's Explore page.
Head to the graph endpoint documentation for more details on how you can leverage concepts and relationships to enhance your datasets.