Metadata-Version: 2.4
Name: disqover-api
Version: 1.19.0
Summary: DISQOVER API client
Project-URL: Homepage, https://www.ontoforce.com
Author-email: ONTOFORCE <backend@ontoforce.com>
License: MIT
License-File: LICENSE
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Programming Language :: Python :: 3.14
Requires-Python: <3.15,>=3.10
Requires-Dist: python-dateutil~=2.8
Requires-Dist: pyyaml~=6.0
Requires-Dist: requests~=2.27
Requires-Dist: typing-extensions>=4.7.1
Requires-Dist: urllib3~=1.26
Provides-Extra: test
Requires-Dist: pytest; extra == 'test'
Description-Content-Type: text/markdown

# Python API library for DISQOVER

> Version 1.19.0<br />
> Compatible with DISQOVER 7.15 - 7.20

## Introduction

This library provides a convenient wrapper around the API offered by the DISQOVER platform. It strives to provide a
comprehensive, easy-to-use Python interface that enables advanced automation workflows.

The library provides abstractions of all user-facing DISQOVER concepts in a uniform manner, and hides all technical
details of the REST API that DISQOVER offers. This library remains backwards compatible with older versions of DISQOVER,
which decouples your scripts from changes that might occur on the DISQOVER API side. We therefore
recommend always updating this library to the latest available version, so that you obtain all features and
bug fixes.

## Installation

You can use pip to install this library:

```bash
pip install disqover-api --extra-index-url=https://pypi.ontoforce.com
```


## Tutorial: the basics

> In this tutorial, you will learn how to provide your credentials, start a DISQOVER session, and get some basic
> info from the DISQOVER platform.

Open an interactive Python session. Start by importing the necessary classes:

```python
from disqover import InlineCredentialsProvider, DisqoverSession
```

The first class is used to store your username and password, as well as the URL of the host on which DISQOVER is
running. You simply provide these arguments to the credentials provider's constructor:

```python
cred = InlineCredentialsProvider('http://localhost:80',
                                 'your.name@company.com',
                                 'your-pass')
```

Great! Now we will actually connect to DISQOVER and log in with your account. To this effect, we create a *session*,
which is as simple as:

```python
session = DisqoverSession(cred)
```

It is advised to use this session as a *context manager* ([see official Python documentation](https://docs.python.org/3/reference/compound_stmts.html#with)),
which means that you put all code using the DISQOVER session within a `with` statement:

```python
with DisqoverSession(cred) as session:
    ...  # your code here
```

This way, the session is started and ended behind the scenes in a clean way. When the session is started, your user
will be used to log in to DISQOVER (and accept the license agreement if it is your first login). Note that you don't need
to use a context manager, but if you don't, you need to **explicitly start** the session (and stop it in a similar way):

```python
session.start()
```
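
The relation between the two styles can be illustrated without a DISQOVER server. The sketch below uses a hypothetical stand-in class, `RecordingSession`, that only records calls; it is not part of this library, and the `stop()` method name is an assumption based on the sentence above. It shows why the `with` form is the safer choice: the session is ended even when the code inside raises an exception, whereas the explicit form needs its own `try`/`finally`.

```python
class RecordingSession:
    """Hypothetical stand-in for a session; it only records what happens."""

    def __init__(self):
        self.events = []

    def start(self):
        self.events.append("start")

    def stop(self):  # assumed name, mirrors start()
        self.events.append("stop")

    def __enter__(self):
        self.start()
        return self

    def __exit__(self, exc_type, exc, tb):
        self.stop()
        return False  # do not swallow exceptions


# Style 1: context manager, cleanup happens automatically
managed = RecordingSession()
try:
    with managed:
        raise RuntimeError("something went wrong")
except RuntimeError:
    pass

# Style 2: explicit start, cleanup must be guarded with try/finally
manual = RecordingSession()
manual.start()
try:
    manual.events.append("work")
finally:
    manual.stop()

print(managed.events)
print(manual.events)
```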

As a sanity check, we will use the simple `ApplicationInfoAPI` to tell us something more about the DISQOVER installation.
Import this API like this:

```python
from disqover import ApplicationInfoAPI
```

Try to get the DISQOVER version in the following way:

```python
api = ApplicationInfoAPI(session)
info = api.get_info()
print(info.version)
```

You can also do a quick sanity check and verify that the various DISQOVER services are running well:

```python
api = ApplicationInfoAPI(session)
for service in api.sanity_check():
    print(service.name, service.status)
```

You should see something like this:

```
Indexed data OK
User database OK
Data Ingestion Engine OK
Caching OK
Job Queue OK
License agreement OK
Authorization Service OK
Remote Data Subscription service OK
```

If the systems are OK, you can move on to the rest of this tutorial. If not, contact your system administrator.

## Tutorial: data queries

### Inspecting the configuration

> In this tutorial, you will learn how to access the Data Query API and use it to obtain the DISQOVER configuration.
> You need an active `DisqoverSession` to start with this tutorial.

Import the `DataQueryAPI` class from the library and create an object from it, providing it with the DISQOVER session
as an argument.

```python
from disqover import DataQueryAPI

api = DataQueryAPI(session)
```

You can get a lot of information about the DISQOVER data configuration ("Which types are present and which properties do
they have?") and data ontology ("Which sources provide data and how are they linked?") right from the Data Query API:

```python
configuration = api.configuration
ontology = api.hierarchy
```

The first object is a `DataConfiguration` object. You can get a list of canonical types (`.canonical_types`),
type links (`.type_links`) and sub-types (`.sub_types`) or you can look for specific types if you want:

```python
for ct in configuration.canonical_types:
    print(ct.uri)

my_canonical_type = configuration.get_canonical_type('my_canonical_type_uri')
my_type_link = configuration.get_type_link('my_type_link_uri')
my_sub_type = configuration.get_sub_type('my_sub_type_uri')
```

You can easily obtain the label and description of the Canonical Type by using the `label` and `description` attributes.
A Canonical Type also contains information about:
 - **properties**: the content of instances that is displayed in the instance details view, or in the instance table
 - **facets**: contain data that can be used to show distributions and for filtering the results list
 - **attributes**: a unified concept for literal data (no links) that has both property and facet characteristics
 - **directed relations**: a unified concept for links that has both property and facet characteristics

You can obtain a list of each of these concepts by using the `.properties`, `.facets`, `.attributes` and
`.directed_relations` attributes of the Canonical Type object. You can also request one specific concept by using
`.get_property`, `.get_facet`, `.get_attribute` or `.get_directed_relation`.

All concepts have a URI, a label and a description. Properties and facets also have a `data_format` attribute,
which gives an indication of the type of data that is stored within them (e.g. `literal|content|text` or `literal|date|iso`).
If a property and a facet contain the same data, you can also jump from one concept to the other by using the
`equivalent_property` or `equivalent_facet` attributes. When there is no equivalency, the value will be `None`.

When a property or facet is derived from an attribute or a directed relation, you can also use the `equivalent_attribute`
and `equivalent_directed_relation` attributes to jump to the corresponding objects. When a property or facet was
defined explicitly, the value of these attributes will be `None`.

You can also inspect the attributes of a Canonical Type. Be aware that your DISQOVER installation must have been set up
to use unified attributes with the proper feature flag and the appropriate configuration in the Data Ingestion pipeline.

First of all, an attribute has a `data_type`. This data type is an enum object that describes the type of content of the
attribute, including categorical strings, text, integer numbers, dates, etc. Depending on the data type, an attribute
can have different capabilities. The `capabilities` attribute of an `Attribute` object is a set of enum values, and the complete list
of possible values is RETRIEVABLE, FILTERABLE, FACETABLE, SEARCHABLE, and SORTABLE. The `multi_valued` attribute of the
object indicates whether the attribute can contain multiple values per instance.
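
Because `capabilities` is a set, ordinary set operations are enough to test what an attribute supports. The sketch below uses a stand-in `Capability` enum and a plain dictionary instead of real `Attribute` objects; both are assumptions made for illustration, and the enum class exposed by the library may be named differently.

```python
from enum import Enum, auto


class Capability(Enum):
    """Stand-in enum listing the capability names from the text above."""
    RETRIEVABLE = auto()
    FILTERABLE = auto()
    FACETABLE = auto()
    SEARCHABLE = auto()
    SORTABLE = auto()


# Hypothetical attributes: URI -> set of capabilities
attributes = {
    "title_attr_uri": {Capability.RETRIEVABLE, Capability.SEARCHABLE},
    "year_attr_uri": {Capability.RETRIEVABLE, Capability.FILTERABLE, Capability.SORTABLE},
    "status_attr_uri": {Capability.FILTERABLE, Capability.FACETABLE},
}

required = {Capability.FILTERABLE, Capability.SORTABLE}

# `required <= caps` is True when every required capability is present
usable = [uri for uri, caps in attributes.items() if required <= caps]
print(usable)
```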

Also try exploring the Type Links and Sub Types in the configuration, if present.

The Data Ontology contains the data sources that contribute to the DISQOVER data. Use `ontology.data_sources` to explore
them or `ontology.get_data_source` to get a specific Data Source by URI.

A Data Source has some interesting attributes. Beside the label and description, you can also get the number of
instances that have a contribution from it (`.instance_count`) and also the number of instances per Canonical Type
(`.instance_counts_per_canonical_type`) as a dictionary. The contributions of a data source to certain properties,
facets or links can be obtained with `.contributions`.

### Retrieving instances

Now we will construct a query and inspect the result from the DISQOVER API. Create a simple query that looks for
publications matching a certain text like this:

```python
from disqover import MatchText, BelongsToCanonicalType

publication_uri = 'http://ns.ontoforce.com/ontologies/integration_ontology#Publication'

query = MatchText('behaviour therapy') & BelongsToCanonicalType(publication_uri)
```

The `query` object just represents the search path being taken; no data has been requested yet. In the example above,
the URI of the Canonical Type has been used. You can find this URI by exploring the 'DATA' page on DISQOVER,
as well as all facet and property URIs. In this library, each Canonical Type, Property, Facet, etc. is represented by
an object. The `BelongsToCanonicalType` also works when given a Canonical Type object, such as in the example below:

```python
def get_ct_by_label(label):
    for ct in api.configuration.canonical_types:
        if ct.label == label:
            return ct
    return None

publication_ct = get_ct_by_label('Publication')
query = MatchText('behaviour therapy') & BelongsToCanonicalType(publication_ct)
```

To obtain all instances that fulfill our criteria, you would use the `get_instances` method of the API:

```python
api.get_instances(query)
```

> Strings used to create a `MatchText` filter are interpreted just like text entered in the DISQOVER UI.
> This means you can also use wildcards (`*`, `?`), the logical operators AND and OR, and parentheses.
> The contents of the string are parsed and interpreted by Solr. Solr splits the text into words, so the words do not have to appear in the order in which they were entered to match a result.
> If you do want to keep certain words together, you have to enclose them in double quotes.
>
> The easiest way to do this in Python is by defining your string with single quotes:
> ```python
> text = '"behaviour therapy"'
> ```
> Alternatively, you can define the string with double quotes and escape the inner quotes:
> ```python
> text = "\"behaviour therapy\""
> ```
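
If you build search strings from user input, a small helper keeps the quoting in one place. This is a minimal sketch: `as_phrase` is a hypothetical helper, not part of the library, and the backslash-escaping of embedded double quotes is an assumption about how the query parser treats them.

```python
def as_phrase(text: str) -> str:
    """Wrap text in double quotes so the words are searched as one phrase.

    Embedded double quotes are backslash-escaped; this escaping rule is an
    assumption about the query parser, not documented behaviour.
    """
    escaped = text.replace('"', '\\"')
    return f'"{escaped}"'


print(as_phrase("behaviour therapy"))
```

The returned string could then be passed on, for example as `MatchText(as_phrase("behaviour therapy"))`.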


The result of this method call can be iterated over, as:

```python
for instance in api.get_instances(query):
    ...  # process instance
```

During the loop, DISQOVER will be queried repeatedly in the background. Subsequent pages of 1000 instances will be
requested, until all resulting instances for the query have been retrieved. If for any reason you want to change
the size of the pages it uses internally, you can do so by using the `get_iterator` method on the `get_instances` result
(although the result will be exactly the same):

```python
for instance in api.get_instances(query).get_iterator(page_size=500):
    ...  # process instance
```
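
To see why the page size changes the number of requests but not the outcome, consider the following sketch of a paging iterator. It is not the library's implementation: `iterate_pages` and `fake_fetch` are hypothetical names, and the "stop at the first short page" rule is an assumption about how the end of the result is detected.

```python
from typing import Callable, Iterator, List


def iterate_pages(fetch_page: Callable[[int, int], List[str]],
                  page_size: int = 1000) -> Iterator[str]:
    """Yield items one by one, requesting a new page only when needed."""
    offset = 0
    while True:
        page = fetch_page(offset, page_size)
        yield from page
        if len(page) < page_size:  # short page: nothing left on the server
            return
        offset += page_size


# Hypothetical backend: 2500 fake instance URIs, with a request log
DATA = [f"instance_{i}" for i in range(2500)]
requests_made = []


def fake_fetch(offset: int, limit: int) -> List[str]:
    requests_made.append((offset, limit))
    return DATA[offset:offset + limit]


default_result = list(iterate_pages(fake_fetch))
default_requests = len(requests_made)

requests_made.clear()
small_result = list(iterate_pages(fake_fetch, page_size=500))
small_requests = len(requests_made)

print(default_requests, small_requests, default_result == small_result)
```

Smaller pages mean more round trips with less data per response; the items that come out of the loop are identical.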

When you want to know the number of instances that match your query upfront, you can easily request it as follows:

```python
count = api.get_instances(query).get_count()
```

You can request the first page of instances like this:

```python
page = api.get_instances(query).get_first_page()
```

Here, as well, the page size can be adjusted with the `page_size=` parameter. The default page size is 100.

Let's look at an individual instance (of class `Instance`). It has a URI (`.uri`), a label (`.label`) and synonyms (`.synonyms`).
You can also get the alternative URIs (`.alternative_uris`) and the URIs of the data sources that contribute to the instance
(`.data_source_uris`).

By default, the instance objects will not hold any more information than the attributes mentioned above. If you want to
include property and/or facet values for each instance, you can do so by specifying the `properties` and `facets` parameters
of the `get_instances` method:

```python
for instance in api.get_instances(query, properties=[a, b], facets=[c, d]):
   ...
```

The properties/facets can be either URIs or the actual `Property` or `Facet` objects (from the configuration). You
can request properties and facets independently of each other.

The property and facet values for each instance are respectively stored in the `properties` and `facets` attributes on
the `Instance` objects, as dictionaries.

The keys of `instance.properties` are property URIs, and for each key you will find a list of `PropertyValue` objects.
Each such object has a label and a value. If you want to visualize the properties of an instance, you could write:

```python
for prop_uri, values in instance.properties.items():
    print(f'{prop_uri}:')
    for value in values:
        print(' - ' + value.label)
```

And you will see something like this:

```
http://ns.ontoforce.com/2013/my_type/prop/prop_a:
 - Value 1
 - Value 2
http://ns.ontoforce.com/2013/my_type/prop/prop_b:
 - Value A
 - Value B
```

Besides a `label`, each property value also has a `value` (for literal properties, this is equal to the label,
but for link properties, the value contains the URI of the linked instances), and also a single `data_source_uri`.
This means that, if a property value for an instance is provided by multiple data sources, you will find the same
value multiple times in your result but each time with a different `data_source_uri`.
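
When the repeated values get in the way, you can group them by value and keep the contributing data sources as a set. The sketch below uses a stand-in dataclass, `FakePropertyValue`, that only mirrors the three attributes described above; it is an illustration, not the library's `PropertyValue` class.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class FakePropertyValue:
    """Stand-in with the three attributes described above."""
    label: str
    value: str
    data_source_uri: str


def sources_per_value(values):
    """Map each distinct (label, value) pair to the data sources providing it."""
    grouped = defaultdict(set)
    for pv in values:
        grouped[(pv.label, pv.value)].add(pv.data_source_uri)
    return dict(grouped)


values = [
    FakePropertyValue("Value 1", "Value 1", "source_a_uri"),
    FakePropertyValue("Value 1", "Value 1", "source_b_uri"),
    FakePropertyValue("Value 2", "Value 2", "source_a_uri"),
]

for (label, _), sources in sources_per_value(values).items():
    print(label, sorted(sources))
```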

The facets dictionary (`instance.facets`) on each instance has a similar structure. The keys are the facet URIs, and
the values are a list of `InstanceFacetValue` objects for each facet. The attributes of a particular facet value
depend on the type of facet:

   * The values of categorical facets contain a `label` and a `query_value` attribute.
   * The values of numerical facets have a single attribute, `value`.
   * Coordinate facets give values with a `longitude` and `latitude` attribute.
   * Other facet types (hierarchical, location trees) have values just like the categorical facets.


You can form much more advanced queries, for example the following query that will help you find persons that are
linked to PubMed publications:

```python
from disqover import BelongsToCanonicalType, FacetEqual, LinksTo

query = (
    BelongsToCanonicalType("http://ns.ontoforce.com/ontologies/integration_ontology#Person")
    & LinksTo(
        FacetEqual("dataset", "http://ns.ontoforce.com/datasets/pubmed")
        & BelongsToCanonicalType("http://ns.ontoforce.com/ontologies/integration_ontology#Publication")
    )
)
```

Below is a table listing all possible query elements that can be used. Each query element is created as an instance
of a specific class. Both the standard and alternative class names can be imported as `from disqover import x`.

[//]: # (begin-landscape)

| Class name               | Alternative names       | Description                                                                                                       | Example usage                                                      | Negatable              |
| ------------------------ | ----------------------- | ----------------------------------------------------------------------------------------------------------------- | ------------------------------------------------------------------ | ---------------------: |
| `FacetEqual`             |                         | filter on instances for which a facet has a specific value                                                        | `FacetEqual("color_facet_uri", "blue")`                            |                      Y |
| `BelongsToCanonicalType` |                         | filter on instances belonging to a specific Canonical Type                                                        | `BelongsToCanonicalType("city_ct_uri")`                            |                      Y |
| `LinksToCanonicalType`   | `LinkedToCanonicalType` | filter on instances that have any link to a specific Canonical Type                                               | `LinksToCanonicalType("disease_ct_uri")`                           |                      Y |
| `FromDataSource`         |                         | filter on instances that have a contribution from a given data source                                             | `FromDataSource("my_ds_uri")`                                      |                      Y |
| `FromLocalIngestion`     |                         | filter on instances that do not originate from an RDS package, but from local source data                         | `FromLocalIngestion()`                                             |                      Y |
| `FromRDSServer`          |                         | filter on instances that originate from an RDS subscription to a specific server                                  | `FromRDSServer("subscription_name")`                               |                      Y |
| `InTree`                 |                         | filter on instances for which the value(s) of a hierarchical facet fall under a certain parent node               | `InTree("tree_facet_uri", "parent_uri")`                           |                      Y |
| `FacetMissing`           |                         | filter on instances that have no value for a given facet                                                          | `FacetMissing("classification_facet_uri")`                         | N (see `FacetPresent`) |
| `FacetPresent`           |                         | filter on instances that have a value for a given facet                                                           | `FacetPresent("title_facet_uri")`                                  | N (see `FacetMissing`) |
| `InstanceURIs`           |                         | search for specific instances by their URI                                                                        | `InstanceURIs("instance_uri1", "instance_uri2")`                   |                      Y |
| `InCollection`           |                         | filter on instances that are part of a specific instance collection                                               | `InCollection("collection_name")`                                  |                      N |
| `MatchText`              |                         | text search, as in the search bar on DISQOVER. Searches in label fields or all properties, depending on parameters | `MatchText("malaria", only_label=True, only_semantic=True)`        |                      Y |
| `MatchPropertyText`      |                         | text search within a specific property, or multiple properties at once                                            | `MatchPropertyText("covid", ["keyword_prop_uri", "title_prop_uri"])` |                      Y |
| `InRange`                |                         | filter on instances for which a (numerical / date) facet has values within a specified range                      | `InRange("num_facet_uri", 5, 10, exclude_end=True)`                |                      Y |
| `InProximity`            |                         | filter on instances for which a coordinate facet has values within a certain distance around a central coordinate | `InProximity("coord_facet_uri", (20.1, -30.6), 125.4)`             |                      Y |
| `AtLocation`             | `LocationEqual`         | filter on instances that correspond to a certain location in a location facet                                     | `AtLocation("loc_facet_uri", "Paris", 1)`                          |                      Y |
| `LinksTo`                | `LinkedTo`              | search for instances that are linked to the result of a given query                                               | `LinksTo(query)`  (*)                                              |                      Y |
| `RelatesTo`              | `RelatedTo`             | search for instances that are related through some Relation to the result of a given query                    | `RelatesTo(query, "relation_uri", RelatesTo.Direction.DIRECT)` (*) |                      Y |
| `PluginSearch`           |                         | use an instance search plugin                                                                                     | `PluginSearch("identifier_search", {"search_field": "CA209-551"})` |                      N |

(*) `query` is another query, for example `BelongsToCanonicalType("disease_ct_uri") & MatchText("covid")`.

[//]: # (end-landscape)

Multiple query elements can be combined using the logic operators `&` (**AND**) or `|` (**OR**). Arbitrary combinations
of multiple **AND** or **OR** statements can be constructed with the proper use of parentheses.

Most query elements can also be negated (see the last column of the table above), meaning that you can actually filter
for the instances that do **not** match with the specified filter parameters. To negate a query element,
simply add `negate=True` when you construct it. Alternatively, you can also use the `~` operator, such as:

```python
color_is_not_blue = ~FacetEqual("color_facet_uri", "blue")
```
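
When mixing these operators, keep Python's precedence rules in mind: `~` is applied first, then `&`, then `|`. The sketch below makes the grouping visible with a hypothetical stand-in class, `Term`, that only builds a readable string. It is not how the library represents queries; it only demonstrates how Python groups the operators.

```python
class Term:
    """Hypothetical stand-in for a query element; it only builds a readable string."""

    def __init__(self, text):
        self.text = text

    def __and__(self, other):
        return Term(f"({self.text} AND {other.text})")

    def __or__(self, other):
        return Term(f"({self.text} OR {other.text})")

    def __invert__(self):
        return Term(f"NOT {self.text}")


a, b, c = Term("a"), Term("b"), Term("c")

print((a | b & c).text)    # & binds tighter than |
print(((a | b) & c).text)  # parentheses change the grouping
print((~a & b).text)       # ~ applies to a only
print((~(a & b)).text)     # ~ applies to the whole group
```

When in doubt, add explicit parentheses around every sub-expression.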

Besides requesting instances, you can also obtain sub-instances for an instance (`api.query_subinstances`),
get the number of links to all canonical types from a certain query (`api.query_links`), or obtain all synonyms
for a given search term (`api.query_synonyms`). You can also obtain facet values, which we briefly explain in the topic
below.

The result of a synonyms query is a list of `Concept` objects, which could be thought of as minimal representations of
instances in DISQOVER. Each concept has `.uris`, a `.label`, a list of `.canonical_type_uris`, and a dictionary
of `.synonyms` which maps synonym strings to `Synonym` objects that contain a `score`, among other things.
The argument to `query_synonyms` is simply a text search, for example:

```python
concepts = api.query_synonyms("diabetes")
```

As mentioned, the `query_links` method can *only* be used if you are just interested in the *number* of links to other
types that a certain instance or set of instances has. You can also specify a Type Link, in which case you can
get the count per relation (e.g. for your list of person URIs, and type link 'PublicationToPerson',
'is first author' provides 100 linked publications, and 'is last author' provides 50). Ignoring type links,
you get the number of links to other canonical types like this:

```python
for ct_uri, count in api.query_links(instance_uri).counts.items():
   print(f"there are {count} links to {ct_uri}")
```

To get the relation type counts:

```python
for relation_uri, count in api.query_links(instance_uri, link_type_uri=type_link_uri, destination_ct_uri=target_ct_uri).relations.items():
   print(f"there are {count} relations of the kind {relation_uri} to {target_ct_uri}")
```

As you see, the `LinksResult` object that is returned from the `api.query_links` method contains two dictionaries,
`counts` (with Canonical Type URIs as keys) and `relations` (with relation URIs as keys), and which of these contains
the desired information depends on the arguments to the function.

### Advanced instance querying

#### Using server-side cursor for querying

Note: the following functionality is deprecated for API v1.0 (Ruby query engine) and fully removed for API v1.1
(Python Data Query Service).

Instances can also be retrieved using a 'server-side cursor', which prevents sending the same query to the DISQOVER server
when retrieving multiple pages. This can result in a performance improvement and less processing on the client-side, but does
come with limitations (see below).

To enable a 'server-side cursor', use the following code:

```python
instances = api.get_instances(query, server_side_cursor=True, data_location='local')  # or data_location='remote'
```

The `data_location` is required and must be set to 'local' to only retrieve instances from the local DISQOVER deployment
or 'remote' to retrieve data from the DISQOVER federation server. Executing queries with `server_side_cursor=True` that
query both the local and the remote federation server is not supported and hence must be done in separate requests.
Note that even with separate requests for local/remote, the performance will be higher.

The `query` format remains the same as described above and the returned `instances` can also be processed in the same way
(e.g. you can loop over them, get a result page, ...).

### Getting facet values

From any query that you have built, you can use `api.query_facet` to obtain the different possible facet values and their counts.
To do this, you will need to create a `FacetParameters` object. You first need to determine/decide whether you will
be looking for numerical facet values, categorical facet values ('facets that can be represented as a list of
possible values'), or facet values organized in a hierarchical structure (a 'tree').
For either one of these options, you will need a different `FacetParameters` object:

```python
from disqover import NumericalFacetParameters, CategoricalFacetParameters, HierarchicalFacetParameters
```

Use the `NumericalFacetParameters` class as follows:

```python
parameters = NumericalFacetParameters('numerical_facet_uri', step_size=10, interval=(13.5,225))
```

or

```python
parameters = NumericalFacetParameters('numerical_facet_uri', bin_count=15, interval=(13.5,225))
```
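
The two variants describe the same histogram in different ways: either you fix the bin width (`step_size`) or the number of bins (`bin_count`). The arithmetic below is a rough sketch of that relation in plain Python. How the server actually aligns bin edges is not specified in this document, so the equal-width split and the clipped last bin are assumptions, and both helper functions are hypothetical.

```python
import math


def bins_from_count(interval, bin_count):
    """Split the interval into bin_count bins of equal width."""
    start, stop = interval
    width = (stop - start) / bin_count
    return [(start + i * width, start + (i + 1) * width) for i in range(bin_count)]


def bins_from_step(interval, step_size):
    """Split the interval into bins of width step_size; the last bin is clipped."""
    start, stop = interval
    n = math.ceil((stop - start) / step_size)
    return [(start + i * step_size, min(start + (i + 1) * step_size, stop)) for i in range(n)]


by_count = bins_from_count((13.5, 225), bin_count=15)
by_step = bins_from_step((13.5, 225), step_size=10)

print(len(by_count), by_count[0])
print(len(by_step), by_step[-1])
```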

The `CategoricalFacetParameters` takes the following parameters:

 * `uri`: the URI of a facet
 * `limit`: the number of facet values in the slice
 * `offset`: the offset for the slice
 * `order_by_label`: `True` or `False`
 * `sort_desc`: `True` or `False`

The `HierarchicalFacetParameters` class takes all of the above parameters, but additionally accepts a `level` and `parent_uri`.
Note that you can use a `CategoricalFacetParameters` object even if the underlying facet is actually organized in a
hierarchical way. In other words, for hierarchical facets, you can choose to request the values in the hierarchy with
actual instances attached (the 'leaves') using `CategoricalFacetParameters`, or to get the values organized as a nested
structure using `HierarchicalFacetParameters`. For categorical facets, you always have to create a
`CategoricalFacetParameters` object, and likewise for numerical facets you will need a `NumericalFacetParameters` object.

With your facet parameters object and a query, you can use the following method of the Data Query API to request
your result:

```python
result = api.query_facet(parameters, query)
```

A facet result contains the following properties:

 * `uri`: URI of the facet
 * `count`: number of instances that have a value for this facet
 * `missing_count`: number of instances that do not have a value for this facet
 * `distinct_count`: number of facet values
 * `values`: the values

Each facet value has a `value`, a `label`, and a `count`. Hierarchical facet values also have a `direct_count`, an
`is_leaf` flag, and a set of `ancestor_uris`. Numerical facet values represent data bins, so they have a `lower` and
`upper` value. A numerical facet result also has, apart from the properties mentioned above, a `stats` attribute.
It contains the `start` and `stop` value of the result, the overall `data_min` and `data_max` values and a total `count`.

Besides querying a single facet, you can also query multiple facets at once: we call this a *nested* facet query.
To this effect, use the `query_nested_facets` method, now specifying a list of `FacetParameters` objects. So:

```python
result = api.query_nested_facets([params1, params2, params3], query)
```

You obtain a `NestedFacetsResult` object, which has the following attributes:

   * `count`: the number of instances matching the query
   * `facet_count`: the number of requested facets
   * `values`: a `NestedFacetValues` object

The `NestedFacetValues` object has:

   * `facet`: the `Facet` object for the first specified facet (`params1`), as from the configuration
   * `values`: the corresponding values, as a list of `NestedFacetValue` objects

Each value in this top-level list of facet values contains:

   * `value`: a `CategoricalFacetValue`, `HierarchicalFacetValue`, or `HistogramBin` (depending on the facet type / parameters).
     See the paragraph about `query_facet` (above) for details.
   * `count`: the number of instances matching this facet value
   * `nested_values`: the deeper-level `NestedFacetValues` object, corresponding to `params2`.

In turn, the nested `NestedFacetValues` object has a `facet` object (Facet for `params2`) and `values`. The nesting
goes on until the deepest requested level.
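
A recursive walk is a natural way to turn this nested structure into flat rows. The sketch below uses two stand-in dataclasses that only mirror the attribute names described above (`facet`, `values`, `value`, `count`, `nested_values`), with plain strings in place of the real facet and value objects. Whether the deepest level carries `None` or an empty object in `nested_values` is an assumption, so the sketch handles both.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class FakeNestedFacetValue:
    value: str                      # the real object holds a facet value object here
    count: int
    nested_values: Optional["FakeNestedFacetValues"] = None


@dataclass
class FakeNestedFacetValues:
    facet: str                      # the real object holds a Facet here
    values: List[FakeNestedFacetValue] = field(default_factory=list)


def flatten(nested, path=()):
    """Yield (path, count) for every deepest-level combination of facet values."""
    for item in nested.values:
        current = path + ((nested.facet, item.value),)
        deeper = item.nested_values
        if deeper is None or not deeper.values:
            yield current, item.count
        else:
            yield from flatten(deeper, current)


tree = FakeNestedFacetValues("country", [
    FakeNestedFacetValue("BE", 30, FakeNestedFacetValues("year", [
        FakeNestedFacetValue("2023", 10),
        FakeNestedFacetValue("2024", 20),
    ])),
    FakeNestedFacetValue("NL", 5, FakeNestedFacetValues("year", [
        FakeNestedFacetValue("2024", 5),
    ])),
])

rows = list(flatten(tree))
for path, count in rows:
    print(path, count)
```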

## Tutorial: decoding a query from a URL

> If you are familiar with the DISQOVER UI, you have probably noticed that the URL of the search page changes as you filter down, and that you can share this URL with others to share your query. This tutorial explains how you can use the API to decode the URL of the search page and work with the corresponding query, so that you can share your queries with others, or save them for later use.

### Parsing a URL

First, to get some insight into how the URL encoding works, we will parse a URL from the DISQOVER search page. You can copy the URL of your search page and paste it in the code below:

```python
from disqover.data_query.queries import parse_query_from_url
url = 'https://my-disqover-instance.com/search?query=your_query_in_url_format'
query = parse_query_from_url(url)
# Note: _serialize() is an internal method that returns the JSON representation used in the URL.
json_query = query._serialize()
print(f"Query from URL: {json_query}\n")
```
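
If you only want to look at the raw, still-encoded query string before handing the URL to the library, the Python standard library is sufficient. This sketch assumes the query is carried in a URL parameter named `query`, as in the placeholder URL above; a real DISQOVER URL may be laid out differently, and the DISQOVER-specific encoding of the value is not decoded here.

```python
from urllib.parse import parse_qs, urlsplit

url = 'https://my-disqover-instance.com/search?query=your_query_in_url_format'

parts = urlsplit(url)
params = parse_qs(parts.query)      # values are percent-decoded lists

print(parts.path)
print(params['query'][0])
```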

### Get instances from a URL

If you are only interested in retrieving instances from a URL, you can directly use the `get_instances_from_disqover_url` method of the `DataQueryAPI`:

```python
from disqover import DataQueryAPI
api = DataQueryAPI(session)
instances = api.get_instances_from_disqover_url(url)
```

## Tutorial: data ingestion

> If you are a data scientist, and you are familiar with Data Ingestion in DISQOVER, you can also use the API to create,
> manage and run pipelines.

### Inspecting and running pipelines

To start, create an instance of the Data Ingestion API class:

```python
from disqover import DataIngestionAPI

api = DataIngestionAPI(session)
```

If you have the GUID of a specific pipeline, you can obtain it simply like this:

```python
pipeline = api.pipelines['guid']
```

Otherwise, if you want to process multiple pipelines or are interested in viewing their properties,
you can iterate over the pipelines:

```python
for pipeline in api.pipelines:
    ...  # process pipeline
```

Like querying instances, the iteration will request pages of pipelines in the background, and the size of the pages can
be adjusted by using the `get_iterator` method. The pipelines iterator method also accepts a `sort_by` and a
`direction` parameter. Use them like this:

```python
for pipeline in api.pipelines.get_iterator(page_size=50, sort_by='created_by', direction='desc'):
    ...  # process pipeline
```

The different sorting options are:

   * `name`: alphabetically on pipeline name
   * `created_by`: alphabetically by pipeline creator (email)
   * `last_run`: chronologically on last run date
   * `run_by`: alphabetically by the user that triggered the last pipeline run

The options for `direction` are `asc` (ascending order, the default) or `desc` (descending order).

Each pipeline object has the following properties:

 * `guid`: the unique identifier of the pipeline
 * `name`: the name given by the pipeline builder
 * `description`: a description given by the pipeline builder
 * `updated_at`: the `datetime` object representing the last update time
 * `created_by`: a `User` object, the person who created the pipeline
 * `runs`: a `PipelineRuns` object, which is iterable (see below)

You can easily give the pipeline a new name or description, as below. The values are immediately updated
(there is no need for a `save`).

```python
pipeline.name = 'Pipeline New Name'
pipeline.description = 'This describes my pipeline'
```

By iterating over the `runs`, you can inspect the different executions of the pipeline over time. As with
queried instances and pipelines, you can do:

```python
for run in pipeline.runs:
   ... inspect run ...
```

The same principles of iteration also apply (using `get_iterator`). You can also obtain a specific run
by its unique ID (the GUID):

```python
pipeline_run = pipeline.runs['guid']
```

In many cases, you probably want to obtain the last run. Therefore, we offer the `get_last` method, next to `get_all` and `get_slice`:

```python
all_runs = pipeline.runs.get_all()
some_runs = pipeline.runs.get_slice()
last_run = pipeline.runs.get_last()
```

A run (`PipelineRun` object) has the following properties:

 * `status`: current status of the pipeline run (makes most sense to check for the last run)
 * `current_step`: current step (same remark)
 * `start`: start time, a `datetime` object
 * `stop`: end time (if finished), a `datetime` object
 * `run_options`: all run parameters, as `RunOptions` object
 * `incremental_info`: information about updated components and/or data sources, and whether the run is incremental (an `IncrementalInfo` object)

The status can have one of the following values (`from disqover import PipelineStatus`):

 * `PipelineStatus.NOT_RUN`: This pipeline has not been run
 * `PipelineStatus.RUNNING`: This pipeline is still running
 * `PipelineStatus.FINISHED`: The run is finished
 * `PipelineStatus.FAILED`: The run has failed

You can check the number of components that have been executed in the run with `.executed_component_count`, get the
status of individual components with the `.get_guid_to_component_status()` method, and request the run logs with
`.get_logs()`.
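
As a small illustration of working with the run properties listed above, the helper below derives how long a run took from its `start` and `stop` attributes. This is only a sketch: `run_duration` is a hypothetical helper that is not part of the library, and it assumes that `stop` is `None` as long as the run has not finished.

```python
from datetime import timedelta
from typing import Optional


def run_duration(run) -> Optional[timedelta]:
    """Return how long a run took, or None if it has no stop time yet.

    `run` is expected to expose `start` and `stop` as `datetime` objects,
    like the `PipelineRun` properties described above. Treating a missing
    stop time as `None` is an assumption of this sketch.
    """
    if run.start is None or run.stop is None:
        return None
    return run.stop - run.start
```

For example, `run_duration(pipeline.runs.get_last())` would give the duration of the most recent run, if it has finished.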

On the pipeline object, you can also request the currently running component at any time, like this:

```python
component = pipeline.get_executing_component()
```

You can verify a whole pipeline with `.verify_components()`, or verify an individual component with
`component.verify()`. Classes and predicates can be inspected with `pipeline.get_classes_and_predicates()`,
and the data sources used with `pipeline.get_data_sources()`.

Starting a new pipeline run is straightforward:

```python
job = pipeline.start_run(run_options)
```

The run options must be specified by creating a `RunOptions` object:

```python
from disqover import RunOptions

run_options = RunOptions(option_a=...,
                         option_b=...)
```

We will not list all possible options here, but instead let the class signature lead you. There are a few shortcuts
for common use cases:

```python
run_options = RunOptions.full()
run_options = RunOptions.outdated()
run_options = RunOptions.differential()
run_options = RunOptions.incremental()
```

You are probably familiar with these *Run modes* from the Data Ingestion section of the DISQOVER platform, where these
different modes are described in more detail.

When you launch a pipeline run, you get a `RunPipelineJob` object, which you can use to keep track of the run status.
A convenient feature is that you can block the execution of your script until the pipeline run has finished:

```python
job.wait(interval=10)
```

The interval (the time span between subsequent checks to the Data Ingestion Engine for the run status) is specified in
seconds, and the default value is 50.

You can of course also stop a run, simply with `job.stop()`.
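
The steps above (start a run, wait for it, inspect the outcome) can be combined into one small function. The sketch below only uses calls described in this tutorial (`start_run`, `wait` and `runs.get_last`); the helper `run_and_wait` itself is hypothetical, and it assumes that the last run returned after waiting is the one that was just started.

```python
def run_and_wait(pipeline, run_options, interval=10):
    """Start a pipeline run, block until it ends, and return the last run.

    `pipeline` is a pipeline object and `run_options` a `RunOptions`
    object, as described above. This helper is not part of the library.
    """
    job = pipeline.start_run(run_options)
    job.wait(interval=interval)  # seconds between two status checks
    return pipeline.runs.get_last()
```

A typical call would be `last_run = run_and_wait(pipeline, RunOptions.full())`, after which you can compare `last_run.status` with `PipelineStatus.FINISHED` or `PipelineStatus.FAILED`.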

### Importing and exporting pipelines

To create a new pipeline from an exported YAML file, simply use:

```python
pipeline = api.pipelines.create(file_path=pipeline_path)
```

The pipeline path must be specified as a `Path` (`from pathlib import Path`) object. You can give the pipeline a name
by passing the `name` argument, and likewise for a description. If you don't specify a file path, you create an empty pipeline.

An existing pipeline can be easily exported:

```python
pipeline.export_to_file(file_path)
```

Optionally, you can pass the GUIDs of components to export (the default behaviour is to export **all** components).

You can also import components from a YAML into an existing pipeline. To this effect, use:

```python
pipeline.import_from_file(file_path)
```

Here, too, you can specify component GUIDs.


### Inspecting a pipeline

There are different attributes and methods on a `Pipeline` object that allow you to get more insight into what a pipeline
is composed of. To obtain an overview of the different Classes in the pipeline and their corresponding predicates, you
can use the `pipeline.get_classes_and_predicates()` method. It returns a dictionary that maps each Class name to a
`ClassInfo` object that contains attributes such as `resource_count`, `created_by_component`, and `predicates`.
The latter is a list of `Predicate` objects that have a `name`, `type_identifier` and a `contributing_components` attribute.
To get a list of the data sources contributing to the pipeline data, you can use the `pipeline.get_data_sources()` method,
returning a list of `PipelineDataSource` objects, containing information such as URI, label, description,
modification date, and others.
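
To give an idea of how the result of `get_classes_and_predicates()` can be processed, the sketch below condenses it into a plain dictionary. It relies only on the attributes named above (`resource_count`, `predicates` and the `name` of each predicate); the function `summarize_classes` is a hypothetical helper, not part of the library.

```python
def summarize_classes(pipeline):
    """Map each Class name to its resource count and predicate names."""
    summary = {}
    for class_name, class_info in pipeline.get_classes_and_predicates().items():
        summary[class_name] = {
            "resource_count": class_info.resource_count,
            "predicates": [predicate.name for predicate in class_info.predicates],
        }
    return summary
```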

To inspect the components, a `Pipeline` object has the `component_count`, `component_guids` and `components` attributes.
There is also a `get_components_of_type` method that allows you to obtain all `Component` objects for a specific
component type. For example:

```python
for component in pipeline.get_components_of_type(component_type):
    print(f"Found component '{component.name}' with GUID {component.guid}")
```

The different component types can be found through the `component_types` attribute of the `DataIngestionAPI`:

```python
for component_type in api.component_types:
    print(f"Component type '{component_type.name}' with identifier {component_type.identifier}: {component_type.description}")
```

There is a method that gives you the type of importer component that corresponds to a certain file type:

```python
component_type = api.component_types.get_importer(FileType.XML)
```

To obtain information about the segments in a pipeline, use the `segment_count`, `segment_guids` or `segments` attributes.
An individual component or segment in a pipeline can be retrieved as follows:

```python
my_component = pipeline.get_component('guid')
my_segment = pipeline.get_segment('guid')
```

A pipeline component can have one or multiple predecessor components. Using the `component.predecessors` attribute, you
get a tuple of other `Component` objects.

To inspect or adapt the URI templates used for a pipeline, you can use the `get_uri_templates` method and obtain a
modifiable `UriTemplates` object:

```python
templates = pipeline.get_uri_templates()
print(templates.namespace)
print(templates.canonical_type)
print(templates.facet)
...
```

### Modifying a pipeline

It is possible to adapt or even build a complete pipeline programmatically using the classes and methods offered by
this library. A component can be easily removed from a pipeline as follows (and equivalently for segments):

```python
pipeline.remove_component(component_guid)  # or a component object
```

The options of a particular component can be obtained as a list of tuples, where each tuple consists of an option ID and
the option's value:

```python
component = pipeline.get_component(component_guid)
for option_id, option_value in component.get_option_values():
    print(f'{option_id}: {option_value}')
```

Alternatively, you can get an individual option as follows: `value = component.get_option_value(option_id)`.
To change an option value, use `component.set_option_value(option_id, new_value)` or `component.set_option_values({id1: value1, id2: value2})`
if you want to change multiple settings at once.

As in the DISQOVER user interface, in the case of an importer component you can have (most of) the options be filled in
as the result of a scan of the source files. The `DataIngestionAPI` has a `source_data` attribute that allows you
to inspect the contents of the source data directories. You can for example iterate over the `api.source_data.contents`,
`api.source_data.directories` or `api.source_data.files`, or get a particular item by using the `[]` operator:

```python
my_file = api.source_data.directories['my_data_source'].directories['new_data'].files['a.csv']
```

A `SourceFile` object provides a `scan` method that suggests importer options based on the file's content:

```python
result = my_file.scan(FileType.CSV, predicate_prefix='my_type:', max_seconds=20)  # default is 10 seconds
```

The result is a `ScanResult` object that contains an `import_options` attribute with the suggested option values.
In the case of CSV (`FileType.CSV`), Identifier Block (`FileType.CHUNK_ID`) and Separator Block (`FileType.CHUNK_SEP`)
files, the scan result can be applied directly to set the options of an import component:

```python
importer.set_options_from_scan_result(result)
```

In the case of JSON (`FileType.JSON`), XML (`FileType.XML`), RDF (`FileType.RDF`) or Excel (`FileType.EXCEL`) files,
the scan result can contain different proposals for option values from which you have to choose. In this case, the result
object is an instance of the `UnresolvedScanResult` class. By checking the `is_resolved` attribute, you can make sure
that you convert the result into a usable `ScanResult` object. The following code snippet, for example, prompts you to choose
one of the proposed option values:

```python
if not scan_result.is_resolved:  # or equivalently isinstance(scan_result, UnresolvedScanResult)
    choice_combinations = tuple(scan_result.possible_proposal_choices)
    print("Possible proposal choices:")
    for index, choice_combo in enumerate(choice_combinations):
        print(f"[{index}] {choice_combo}")
    chosen_combo_index = int(input("Make a choice (enter a number): "))
    chosen_combo = choice_combinations[chosen_combo_index]
    scan_result = scan_result.resolve(chosen_combo)

# Now it's possible to use the scan result, like:
importer.set_options_from_scan_result(scan_result)
```
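
In an automated script there is nobody to answer a prompt. A non-interactive variant could simply take the first proposal, as in the sketch below. It uses the same attributes and methods as the snippet above (`is_resolved`, `possible_proposal_choices`, `resolve`); the helper `resolve_with_first_choice` is hypothetical, and whether the first proposal is a sensible choice depends on your data.

```python
def resolve_with_first_choice(scan_result):
    """Return a resolved scan result, picking the first proposal if needed."""
    if scan_result.is_resolved:
        return scan_result
    choices = tuple(scan_result.possible_proposal_choices)
    if not choices:
        raise ValueError("The scan result offers no proposals to choose from")
    return scan_result.resolve(choices[0])
```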

Besides adapting the options of existing components, you can also add entirely new components to a pipeline. To do this,
you use the `add_component` method of a pipeline as follows:

```python
new_component = pipeline.add_component(component_type, name,
                                       options={id1: value1, id2: value2},
                                       predecessors=[comp_a, comp_b])
```

The predecessors of a component can also be set (or modified) later, by using the following methods:

```python
component.add_predecessor(comp_a)
component.remove_predecessor(comp_b)
```
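
When you build a pipeline programmatically, you often want a simple linear chain in which each component feeds the next. A minimal sketch, using only the `add_predecessor` method shown above (the helper `chain_components` is hypothetical and not part of the library):

```python
def chain_components(components):
    """Make each component the predecessor of the next one in the sequence."""
    components = list(components)
    for predecessor, successor in zip(components, components[1:]):
        successor.add_predecessor(predecessor)
    return components
```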

A segment is added by using `new_segment = pipeline.add_segment(name, description)`. Components are added to or removed from
a segment by using `segment.add_component(component)` or `segment.remove_component(component)` respectively.


## Tutorial: user management

> In this tutorial, you will learn how to manage users and security.

### Creating users

You can create multiple users in one go by supplying a list of email addresses, along with a license agreement to use.
To view the list of licenses, use:

```python
from disqover import LicenseManagementAPI

api = LicenseManagementAPI(session)
for license in api.license_agreements:
    print(license.identifier, license.name)
```

You can also obtain the license agreement for your own user:

```python
my_license = api.get_license_for_current_user()
```

To create one or multiple users with a specific license, use:

```python
from disqover import UserManagementAPI

api = UserManagementAPI(session)

emails = ['user1@test.org', 'user2@test.org']
result = api.users.create(license, emails)
```

The result will be a list of `(user, password)` tuples (a `User` object and a string), allowing you to take note of
the generated passwords to provide to your end users. Alternatively, you can set up user accounts without generating
passwords, where the users get an activation email. The usage is identical:

```python
users = api.users.create_with_activation(license, emails)
```

The only difference is that the result is a list of `User` objects, instead of a list of tuples.
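
If you create users with `api.users.create`, you will probably want to store the generated passwords somewhere before handing them over. The sketch below turns the list of `(user, password)` tuples into CSV text, using the standard `csv` module and the `email` attribute of each `User`. The function `credentials_to_csv` is a hypothetical helper; keep in mind that the output contains plain-text passwords.

```python
import csv
import io


def credentials_to_csv(result) -> str:
    """Render `(user, password)` tuples as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["email", "password"])
    for user, password in result:
        writer.writerow([user.email, password])
    return buffer.getvalue()
```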

### Retrieving users

The `User` object for the current user (you) can be retrieved with:

```python
user = Users(session).current_user()
```

To iterate over the users in DISQOVER, use:

```python
for user in api.users.get_iterator(sort_by='email', filter={'include_disabled': True}):
   ... process user ...
```

The same iteration principles apply as presented before. The possible values for sorting are:

  * `email`
  * `disabled`
  * `is_community`
  * `created_at`
  * `last_sign_in_at`
  * `license`

All arguments supplied in the example are optional.

The most important attributes of a user (`User` object) are:

  * `email`: the user email address
  * `created_at`: the time the user was created, as a `datetime` object
  * `updated_at`: the last moment a property of this user was changed, as a `datetime` object
  * `last_sign_in_at`: the time of last login, as a `datetime` object
  * `license`: the `LicenseAgreement` object
  * `data_group_tags`: tags used for data restriction
  * `user_groups`: the user groups the user belongs to, as a list of `UserGroup` objects

It is possible to change a user's license:

```python
user.assign_license(my_license)
```

Or for multiple users at once:

```python
api.users.assign_license(my_license, [user1, user2, ...])
```

You can manage machine-to-machine ("M2M") users in a similar way. For iteration:

```python
for m2m_user in api.m2m_users:
   ... process user ...
```

Obtaining a specific user:

```python
user = api.m2m_users[user_id]
```

An `M2MUser` object has the following attributes:

  * `created_by`: the user that created the M2M user, as a `User` object
  * `created_at`: the time of creation, as a `datetime` object
  * `last_usage_at`: the time of last login with this M2M user, as a `datetime` object
  * `valid_until`: the expiration date of the M2M user, as a `datetime` object
  * `user_groups`: the user groups attached to the M2M user, as a list of `UserGroup` objects


### Security

To allow a user to access extra functionality, they need permission to perform that operation. To get an object that
represents the access rights of a user, use:

```python
access = user.get_access()
```

The resulting object, a `UserAccess` object, specifies:

 * `user_groups`: the list of `UserGroup` objects the user belongs to.
 * `data_group_tags`: the associated `data_group_tags`, which reflect the data access rights.
 * `criticality`: the `User`'s overall criticality level, a `CriticalityLevel` object.

The user access is defined by the groups a user is attached to. The `UserAccess` object has a method `has_permission`,
which when given a `UserPermission` object, returns either `True` or `False`, depending on whether the user has that
permission.

A `UserPermission` is an object that represents the right to perform specific operations in DISQOVER. You can view the
set of permissions by iterating over `api.permissions`:

```python
from disqover import UserManagementAPI

api = UserManagementAPI(session)
for permission in api.permissions:
    print(permission)
```

A `UserPermission` object has an identifier, a name, a description, a default value (`True`/`False`), a
criticality level, and a `section` name.

You can also get a specific permission by its identifier:

```python
etl_admin = api.permissions['ETL.PIPELINE.ADMIN']
```

With this object, you can then check whether a user has that permission:

```python
user.get_access().has_permission(etl_admin)
```
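
To check several permissions in one go, you can wrap `has_permission` in a small function. The sketch below returns the permissions that a user does not have; it uses only `get_access` and `has_permission` as described above, and `missing_permissions` is a hypothetical helper name.

```python
def missing_permissions(user, required_permissions):
    """Return the permissions from `required_permissions` that the user lacks."""
    access = user.get_access()  # fetch the access object only once
    return [
        permission
        for permission in required_permissions
        if not access.has_permission(permission)
    ]
```

An empty list as result means the user has all the permissions you asked about.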

A user gets its permissions from the user groups it belongs to.

User groups are separate objects, which can be iterated like so:

```python
for group in api.user_groups:
   print(group.name)
```

You can use `get_iterator` and use `sort_by` and `direction` (`asc` or `desc`). The sorting options are:

  * `name`
  * `description`
  * `updated_at`

If your current user has the necessary rights to create/update user groups, you can create one like this:

```python
group = api.user_groups.create(
     name='test',
     description='test descr',
     identifier='test_group',
     permissions=[etl_admin]
)
```

As indicated, `permissions` is a list of `UserPermission` objects that define what users that belong to this group
are allowed to do.

It is also possible to update a group's permissions:

```python
group.set_permission(can_create_users)
group.remove_permission(etl_admin)
```

Now we can assign the `UserGroup` to the `User`:

```python
user.assign_user_groups([user_creator_group])
```

A `UserGroup` can be removed like this:

```python
UserGroups(session).remove(user_creator_group, force=True)
```

This will forcibly remove the group, even if there are still users that are members of the group.
If you don't specify `force=True`, and there are users attached to the group, you will get an error.
The `remove` method of user groups will never remove users.

### External groups

If your DISQOVER setup uses SAML, you can administer the mapping of user groups to the groups defined in your IdP.
To get the list of existing `ExternalGroup`s known to DISQOVER, use the iteration mechanism you are now familiar with:

```python
for external_group in api.external_groups:
    print(external_group.name)
```

You can create a new `ExternalGroup` like this:

```python
external_group = api.external_groups.create(name='idp_group')
```

It's possible to append or replace the mapped `ExternalGroup`s of a `UserGroup`:

```python
user_creator_group.append_external_groups([external_group])

# or

user_creator_group.replace_external_groups([external_group])
```

### Security events

If your user has the rights, you can retrieve information about specific actions of users in the DISQOVER platform.
You can filter on user, operation and on criticality level.

An example call to retrieve the actions performed by a malicious user, sorted by criticality, is below:

```python
from disqover import SecurityEventAPI

api = SecurityEventAPI(session)

for event in api.events.get_iterator(sort_by='criticality', filter={'user_id': malicious_user.email}):
    print(event.operation, event.criticality)
```

Possible sorting methods are `timestamp` and `criticality`. The filter can be a combination of:

  * `user_id`: an email address
  * `operation`: an operation id
  * `criticality`: only show events with the given criticality


## Stored data

The stored data service is used by DISQOVER to manage resources such as graphs and spaces, instance/table/chart views,
custom start pages, saved explorations, custom chart plugins, and more.

### Resource types

To get a list of all resource types, use:

```python
from disqover import StoredDataAPI

api = StoredDataAPI(session)
for resource_type in api.resource_types:
    print(resource_type)
```

Each resource type has a `name`, a `label` and a `description`. The resource types are set by DISQOVER; you cannot
create, modify or delete resource types.

There can be multiple versions for a particular resource type. If there is more than one version, this means that we
have at some point required the structure of the resource data to change in accordance with new requirements. Use
`resource_type.get_versions()` to get a list of the available versions for a resource type. This is a list of
`StoredDataResourceTypeVersion` objects, which refer back to their `resource_type`, and also have a `version_name` and
an `obsoleted` attribute. If a resource type version is obsoleted, new resources should not be created with this version.
It is customary to use the latest version of a resource type when creating a new resource for that type. Note that
DISQOVER will automatically update existing resources to the latest version.
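
To find out which versions are still suitable for new resources, you can filter on the `obsoleted` attribute. The sketch below uses only `get_versions()`, `version_name` and `obsoleted` as described above; `usable_version_names` is a hypothetical helper, and it makes no assumption about the order in which the versions are returned.

```python
def usable_version_names(resource_type):
    """Return the version names of a resource type that are not obsoleted."""
    return [
        version.version_name
        for version in resource_type.get_versions()
        if not version.obsoleted
    ]
```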

### Creating resources

Creating a new resource is done through the `api.resources.create` method. This method needs a resource type,
the resource type version name, and a label. It also takes non-required parameters: a parameter to set the permission
rules, a description, the resource data, resource annotations, and the parameters for potential sub-resources.

The code to create a new saved exploration for the Canonical Type "Clinical Study" with a particular table view and
chart view, viewable by all users, would look as follows:

```python
from disqover import ResourcePermissionsRule, StoredDataResourceData

saved_search = api.resource_types["saved_search"]

annotations = {
    "canonicalTypeURI": "http://ns.disqover.com/canonical_type/clinical_study",
    "type": "saved_search"
}
test_data = StoredDataResourceData.from_json({
    'activeView': 'explore',
    'chartViewResourceID': 'b83c7cf1-5286-43a4-8cb1-d77d345b7e33',
    'jsonFilter': '{"tag":"kFB5I2","op":"and","filters":[{"tag":"ct-kFB5I2","facet":"type","op":"equals","value":"cfv:http://ns.disqover.com/canonical_type/clinical_study"},{"op":"and","filters":[{"op":"include_all"}],"tag":"dacn8N"}]}',
    'tableViewResourceID': 'e8a39480-2475-4d3a-80eb-23b2b3543fa7',
})

rule = ResourcePermissionsRule.create_for_all()

new = api.resources.create(
    saved_search,
    "1",
    "My own Clinical Study exploration",
    permission_rules=[rule],
    description="Created via the API library",
    data=test_data,
    annotations=annotations,
)
```

The returned object is of type `StoredDataResource`. This object has the following attributes:

 * `label`: a human-readable label for the resource
 * `description`: a short description
 * `created_at`: a `datetime.datetime` object representing the creation time
 * `created_by`: the `User` that created the resource
 * `modified_at`: a `datetime.datetime` object representing the last modification time
 * `modified_by`: the `User` that last modified the resource
 * `resource_type`: the `StoredDataResourceType` of the resource
 * `version_name`: the version name of the resource type
 * `revision_guid`: a unique identifier for this resource revision
 * `data`: a `StoredDataResourceData` object, representing the resource data
 * `annotations`: a dictionary-like object of type `StoredDataResourceAnnotations`, holding the resource annotations
 * `permission_rules`: a `StoredDataResourcePermissionRules` object, allowing you to list and manage the resource permission rules
 * `sub_resources`: a `StoredDataSubResources` object, allowing you to manage sub-resources

The `label`, `description`, `version_name` and `data` attributes can be set directly; the other attributes are read-only.
The `annotations` and `permission_rules` objects, however, have methods to modify their contents.

### Iterating over resources

Iterating over all resources that are visible to the current user is simply done like this:

```python
for resource in api.resources:
    ... process resource ...
```

In most cases, you'll want to get the resources of a particular type. You can do this by using `get_iterator` and
specifying the resource type as a filter:

```python
from disqover.stored_data.models.filters import ResourceTypeFilter

for resource in api.resources.get_iterator(
        filter=ResourceTypeFilter('instance_view')
):
     ...
```

Besides the `ResourceTypeFilter`, the `filters` module also contains a `CreatedByFilter`, a `ModifiedByFilter`,
a `CreatedBeforeFilter`, a `CreatedAfterFilter`, a `ModifiedBeforeFilter`, a `ModifiedAfterFilter`, a `ResourceParentFilter`,
and an `AnnotationEqualsFilter`. You can also combine filters using the `AndFilter` and `OrFilter`.

> Note: if you use the `ResourceParentFilter`, you are, by definition, interested in resources that are sub-resources
> of another resource. However, by default, the iteration over resources only returns top-level resources
> (i.e. resources that are not sub-resources of another resource). You can override this behaviour by adding
> `include_subresources=True` to the `get_iterator` method. In that case, the iteration will also return sub-resources,
> and you can use the `ResourceParentFilter` to filter on specific parent resources. Note that the `include_subresources`
> parameter can also be used without a `ResourceParentFilter`, in which case the iteration will return all resources,
> both top-level and sub-resources.

Below is an example of how to use the `AnnotationEqualsFilter` to get all resources that have an annotation with
key "type" and value "canonical_type":

```python
from disqover.stored_data.models.filters import AnnotationEqualsFilter

for resource in api.resources.get_iterator(
     filters=AnnotationEqualsFilter("type", "canonical_type"),
):
    ...
```

You can also add the parameter `negate=True` if you want to filter on resources that do *not* have the value
'canonical_type' for the annotation with key 'type'. Other filters (`CreatedByFilter`, `ModifiedByFilter`,
`ResourceTypeFilter`, `ResourceParentFilter`) also have this parameter.

Sorting the resources is possible, by passing the `sort_by` and `direction` parameters (the latter defaults to
ascending) to the `get_iterator` method. The `sort_by` parameter can have the following values:

 * `"created_at"`
 * `"created_by"`
 * `"modified_at"`
 * `"modified_by"`
 * `"type_name"`

### Getting and setting resource data

The `resource.data` value is an object of type `StoredDataResourceData`, which holds a binary representation of the
resource data. If applicable, you can have it interpreted as JSON and get the
corresponding Python object (dictionary or list):

```python
json_data = resource.data.as_json()
```

To set or change the data of a resource, simply pass a new `StoredDataResourceData` object as `resource.data = ...`.
You can create such an object from JSON data like this:

```python
from disqover import StoredDataResourceData

data = StoredDataResourceData.from_json({"my_key": "my_value"})
```

### Getting and setting resource annotations

Resource annotations are key-value pairs that can be attached to a resource to provide additional metadata. As shown
above, you can use annotations for filtering resources. To get all annotations of a resource, access the
`resource.annotations` attribute. The result is an object of type `StoredDataResourceAnnotations`. This object behaves
as a dictionary, meaning you can iterate over the keys (`for key in resource.annotations:`), get the value for a
specific key (`value = resource.annotations['my_key']`), and check for the existence of a key
(`if 'my_key' in resource.annotations:`). There is also a `get` method that takes a default value as the second
argument, that is returned if the key does not exist. An `update` method can be used to set a new value for multiple
(either new or existing) keys at once. You can remove multiple keys by using the `remove` method and passing the
different keys as positional arguments.


### Getting and setting resource permission rules

Whether resources can be viewed, modified or deleted by specific sets of users is determined by something called
"permission rules". One or multiple permission rules may be assigned to a resource. Each permission rule defines a set
of users and a set of permissions that apply to that set of users. An example of a permission rule could be that
**all users** can **read** the resource. Another permission rule could dictate that **only users in the 'admin'
user group** can **modify** the resource. Alternatively, a **single user** could be given **ownership** of the resource,
meaning that they can also delete it. A permission rule can also be based on DISQOVER user permissions, e.g. any user
with the permission `TEMPLATES.ADMIN` may be allowed to modify the resource.

To get the permission rules of a resource, use `resource.permission_rules`. The result is a
`StoredDataResourcePermissionRules` object. This object behaves a bit like a list, meaning it has a length
(`len(resource.permission_rules)`) and is iterable:

```python
for rule in resource.permission_rules:
    ...
```

Each rule has the following attributes:

 * `user_specification`: a `UserSpecification` object
 * `permissions`: a `ResourcePermissionSet` object
 * `info`: a `ResourcePermissionRuleInfo` object, containing additional info such as when the rule was created and by whom

A `UserSpecification` object has a `type` and a `value` attribute. The `type` is any of the following enum values:

 * `UserSpecificationType.ALL_USERS`: the rule applies to all users
 * `UserSpecificationType.SINGLE_USER`: the rule applies to a single user
 * `UserSpecificationType.USER_GROUP`: the rule applies to all users of a specific user group
 * `UserSpecificationType.PERMISSION`: the rule applies to all users who have a specific user permission

The `value` attribute depends on the `type`. In the case of `ALL_USERS`, the value is `None`. In the case of
`SINGLE_USER`, the value is a `User` object. In the case of `USER_GROUP`, the value is a `UserGroup` object. In the case
of `PERMISSION`, the value is a `UserPermission` object.

The `ResourcePermissionSet` object has five boolean attributes:

 * `read`: the users can see the resource
 * `write`: the users can modify the resource
 * `collaborator`: the users can change read, write, and collaborator permissions for others or themselves
 * `sharing`: the users can change read permissions for others or themselves
 * `owner`: owners can delete resources and also change any permission for everyone
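
To get a compact overview of what a rule allows, you can collect the names of the flags that are set. The sketch below reads the five boolean attributes listed above with `getattr`; `PERMISSION_NAMES` and `granted_permissions` are hypothetical names introduced for this example.

```python
PERMISSION_NAMES = ("read", "write", "collaborator", "sharing", "owner")


def granted_permissions(permission_set):
    """Return the names of the flags that are set on a permission set."""
    return [name for name in PERMISSION_NAMES if getattr(permission_set, name)]
```

While iterating over `resource.permission_rules`, you could then print `rule.user_specification.type` together with `granted_permissions(rule.permissions)` for each rule.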


Setting a permission rule is done through the `set` method of the `resource.permission_rules` object. It takes a
`ResourcePermissionsRule` object as input. The easiest way to create such an object is through its `create` method.
You give it a user specification value as the first argument, and you can specify the specific permissions as
boolean keyword arguments. For example:

```python
resource.permission_rules.set(ResourcePermissionsRule.create(a_user_object, read=True, write=True))
```

The rule above will grant read and write access to one specific user. By default, the permission parameters have
a value of `False`, so you only need to specify the ones you want to set to `True`. To create a rule for all users,
use a value of `None`. Alternatively, you can also use `ResourcePermissionsRule.create_for_all()`:

```python
resource.permission_rules.set(ResourcePermissionsRule.create_for_all(read=True))
```

A `ResourcePermissionsRule` has a `describe` method, which returns a human-readable string describing the rule. For
example, printing the description of a rule that gives all access to the resource owner yields (`print(rule.describe())`):

```
The user 'xyz@company.com' owns the resource, can read, can write, is collaborator and can share
```

### Default resources

For some resource types, it is useful to set default resources. For example, there can be multiple chart views for
a particular Canonical Type, but one can be set as the system default. Defaults can also be defined for a specific
user (or for a user group, or users with a specific permission). Defaults can also be differentiated by different
so-called "specifiers", for example the URI of the Canonical Type for which the chart view is the default.

All default resources for a particular resource type can be obtained as follows:

```python
resource_type = api.resource_types['chart_view']

for default in resource_type.get_all_defaults():
     ...
```

Each default is a `DefaultStoredDataResource` object, which has the following attributes:

 * `resource`: the corresponding `StoredDataResource` object
 * `specifier`: a string, for example a Canonical Type URI
 * `user_specification`: a `UserSpecification` object, indicating to whom this default applies

To get the default that applies to the current user, use the `get_my_default` method. You can optionally pass a
specifier as an argument. To get the default for a specific user specification, use the
`get_specific_default` method.
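
As a minimal sketch that relies only on `get_all_defaults()` and the `specifier` attribute described above, the
helper below groups the defaults of one resource type by specifier. The name `group_defaults_by_specifier` is
hypothetical and not part of the library:

```python
from collections import defaultdict


def group_defaults_by_specifier(resource_type):
    """Return {specifier: [default, ...]} for one resource type.

    Hypothetical helper, not part of the library. It only assumes that
    `get_all_defaults()` is iterable and that each item has a `specifier`.
    """
    grouped = defaultdict(list)
    for default in resource_type.get_all_defaults():
        grouped[default.specifier].append(default)
    return dict(grouped)
```

Calling it as `group_defaults_by_specifier(api.resource_types['chart_view'])` would give one entry per specifier,
each holding the defaults defined for the different user specifications.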

### Resource aliases

Besides a unique identifier, the GUID, a resource can also be identified by an alias. An alias is a string (possibly
a user-friendly name) that is unique within a specific resource type. It does not belong to a resource, but can be
re-assigned to another resource at any given time.

To view and configure resource aliases, you should go through the `aliases` attribute of a specific
`StoredDataResourceType`. This attribute is a `StoredDataResourceAliases` object, which supports:

 * iteration over the aliases defined for the resource type
 * getting a specific alias (`aliases[alias_name]`)
 * creating a new alias with the `create` method
 * removing an alias with the `remove` method.

The result of iterating over or getting an alias is a `StoredDataResourceAlias` object, which has a `name`, a
`type_name` and a reference to the `resource`. If you want the aliases for a specific resource, use a filter:

```python
for resource_alias in resource_type.aliases.get_iterator(filters={"resource": resource}):
    ...
```

You can also use a resource GUID (a string) as the filter value.

The `create` method takes a name for the alias and a `StoredDataResource` object (or a GUID representing one).
The `remove` method takes an alias object (`StoredDataResourceAlias`) or an alias name (string).

You can also change the resource that a particular alias points to, by simply modifying the `resource` attribute.
For example:

```python
alias.resource = my_other_resource
```

Here, `alias` is a `StoredDataResourceAlias` object and `my_other_resource` is a `StoredDataResource` object.
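
The operations above can be combined into a small "create or re-point" helper. This is only a sketch: the name
`point_alias_at` is hypothetical, and the function relies solely on iteration, the `name` and `resource`
attributes, and the `create` method as described in this section:

```python
def point_alias_at(aliases, alias_name, resource):
    """Make `alias_name` refer to `resource`, creating the alias if needed.

    Hypothetical helper, not part of the library. `aliases` is expected
    to behave like `resource_type.aliases`.
    """
    for alias in aliases:
        if alias.name == alias_name:
            alias.resource = resource  # re-assign the existing alias
            return
    aliases.create(alias_name, resource)  # the alias did not exist yet
```

Iterating over all aliases keeps the sketch independent of how `aliases[alias_name]` behaves for an unknown name,
which is not described here.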

### Sub-resources

Some resources can have sub-resources, and some resource types require them.
Sub-resources are a convenient way to make sure that certain resources have the same permissions as their parent,
and are deleted when the parent resource is deleted.

As already mentioned above, a `StoredDataResource` object has a `sub_resources` attribute, which is an object
of type `StoredDataSubResources`. Iterating over this object gives you all sub-resources of the parent resource:

```python
for sub_resource in resource.sub_resources:
    ...  # process sub_resource
```

A sub-resource is of the same type as a regular resource (`StoredDataResource`), so it has the same methods and
attributes.
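
Because some resource types limit how many sub-resources of a given type a parent may have, it can be useful to
check what is already present. The sketch below uses the `resource_type` filter key that `resource.sub_resources`
supports (see the overview table under "Advanced topics"). The helper name `has_sub_resource_of_type` is
hypothetical:

```python
def has_sub_resource_of_type(resource, type_name):
    """Return True if `resource` has at least one sub-resource of `type_name`.

    Hypothetical helper, not part of the library. It assumes that
    `get_iterator` on `sub_resources` accepts the `resource_type` filter key.
    """
    iterator = resource.sub_resources.get_iterator(filters={"resource_type": type_name})
    for _ in iterator:
        return True  # one match is enough
    return False
```

For example, `has_sub_resource_of_type(resource, "table_view")` can guard a call that would otherwise try to add a
second table view to an exploration.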

Sub-resources can be (or must be, depending on the resource type) created at the same time as the parent resource, by
passing a list of sub-resource parameters (objects of type `SubResourceParameters`) as the `sub_resources` argument
of the `create` method on `api.resources`. For example, an exploration (`saved_search`) resource with a specific
table view and chart view as sub-resources would be created as follows (see the "Creating resources" section above
for the other arguments):

```python
from disqover import SubResourceParameters

my_charts = SubResourceParameters(
    resource_type=api.resource_types["chart_view"],
    version_name="3",
    label="My ad-hoc chart view",
    description="Created with the API library",
    data=...,
    annotations={"canonicalTypeURI": "http://ns.disqover.com/canonical_type/clinical_study"},
)

my_table = SubResourceParameters(
    resource_type=api.resource_types["table_view"],
    version_name="2",
    label="My ad-hoc table view",
    description="Created with the API library",
    data=...,
    annotations={"canonicalTypeURI": "http://ns.disqover.com/canonical_type/clinical_study"},
)

api.resources.create(
    api.resource_types["saved_search"],
    "1",
    "My own Clinical Study exploration",
    permission_rules=[...],
    description="Created via the API library",
    data=...,
    annotations=...,
    sub_resources=[
        my_charts,
        my_table,
    ],
)
```

The exact form of the content of the `data` attributes for the charts and table views is beyond the scope of this
documentation. This content is represented by an object of type `StoredDataResourceData`, which can be constructed
from JSON data using the `from_json` method as demonstrated above.

To create a new resource as a sub-resource of an existing resource, simply use the `create` method:

```python
new_sub_resource = resource.sub_resources.create(
    api.resource_types["table_view"],
    "1",
    "My ad-hoc table view",
    description="Created with the API library",
    data=...,
    annotations={...},
)
```

Note that you cannot just add any resource as a sub-resource of another resource. A resource of type
"saved_search" can for example have at most one "table_view" and one "chart_view" sub-resource.

### Migrating resources

You can migrate resources (including their data, annotations, ...) as follows:

```python
from disqover import migrate_resources
from disqover.constants import ImportConflictResolutionMethod

source_session = DisqoverSession(...)  # the disqover environment to export resources from
destination_session = DisqoverSession(...)  # the disqover environment to import into
migrate_resources(source_session, destination_session, on_conflict=ImportConflictResolutionMethod.ERROR)
```

If you cannot reach both environments from the same host, you can export the resources to a (compressed)
file. Please note that this code is prone to breaking as migrations are further expanded to include more features.
Whenever possible, prefer the `migrate_resources` function.

```python
from disqover import StoredDataAPI
import json
import gzip

source_session = DisqoverSession(...)  # the disqover environment to export resources from
resources = StoredDataAPI(source_session).export_resources()
with gzip.open('resources.json.gz', 'wt', encoding='UTF-8') as f:
    json.dump(resources, f)
```

Now, you can move the file and import it:

```python
from disqover import StoredDataAPI
from disqover.constants import ImportConflictResolutionMethod
import json
import gzip

destination_session = DisqoverSession(...)  # the disqover environment to import into
with gzip.open('resources.json.gz', 'rt', encoding='UTF-8') as f:
    resources = json.load(f)
StoredDataAPI(destination_session).import_resources(resources_export=resources, on_conflict=ImportConflictResolutionMethod.ERROR)
```

If you try to migrate a resource that already exists in the destination session, an error is raised when you use the
default conflict resolution method, `ImportConflictResolutionMethod.ERROR`. Use the `on_conflict` parameter to
specify how conflicting resources should be handled.
The `ImportConflictResolutionMethod` enum has the following options:
- `ImportConflictResolutionMethod.ERROR` is the default behaviour, and raises an error when a conflict occurs
- `ImportConflictResolutionMethod.OVERWRITE` identifies what changed, and overwrites the conflicting attributes in the
destination session
- `ImportConflictResolutionMethod.KEEP_EXISTING` ignores conflicts. Conflicting resources remain unchanged in
the destination session.
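
To make the difference between the three options concrete, here is a toy model that uses plain dictionaries
instead of DISQOVER resources. It is not the library's implementation: `ToyConflictMode` and `toy_import` are
hypothetical names, and the model assumes that a conflict simply means "the same key exists on both sides":

```python
from enum import Enum


class ToyConflictMode(Enum):
    """Stand-in for ImportConflictResolutionMethod, for illustration only."""
    ERROR = "error"
    OVERWRITE = "overwrite"
    KEEP_EXISTING = "keep_existing"


def toy_import(destination, incoming, on_conflict=ToyConflictMode.ERROR):
    """Merge `incoming` into a copy of `destination`.

    Both arguments map a resource key to a dict of attributes.
    The original `destination` is left unmodified.
    """
    merged = {key: dict(attrs) for key, attrs in destination.items()}
    for key, attrs in incoming.items():
        if key not in merged:
            merged[key] = dict(attrs)  # no conflict: plain import
        elif on_conflict is ToyConflictMode.ERROR:
            raise ValueError(f"resource {key!r} already exists")
        elif on_conflict is ToyConflictMode.OVERWRITE:
            merged[key].update(attrs)  # incoming attribute values win
        # KEEP_EXISTING: the existing resource stays as it is
    return merged
```

In this model, `OVERWRITE` replaces only the attributes present in the incoming resource, while `KEEP_EXISTING`
still imports resources that do not exist yet in the destination.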

## Advanced topics

### Iteration, pagination, sorting and filtering

In the tutorials above, we have already demonstrated the principle of 'iteration', where you can retrieve different
entities in DISQOVER and process (or inspect) them in a loop. We provide a uniform method for this iteration
irrespective of whether you are requesting users, pipelines, instances, RDS subscriptions, stored data resources, etc.
This means you can use `for`-loops (either with `get_iterator` to specify additional parameters or without) to iterate
over any of these types of entities.

In the table below we provide an overview of the different parts of the API that involve collections
of entities in DISQOVER. When the `get_iterator` method accepts a `sort_by` parameter, you see the
different possible values in this table. In some cases, filtering is supported, as indicated in the last column.
Filters are specified as `filters={"key1": value_1, "key2": value_2}`.

Besides the `sort_by`, `filters` and `direction` parameters (whose availability depends on the part of the API), the
`get_iterator` method also accepts the `page_size` parameter for every API. Use it to define the number of
entities the API library requests from DISQOVER per page; the value must be a positive integer.
The `get_first_page` method also takes the `page_size` parameter. For example, the following code
returns a page with the first 20 entities:

```python
page = api.something.get_first_page(20)
```

The default page size is always 100.
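
If you only need a limited number of entities, you can combine `get_iterator` with `itertools.islice` from the
standard library. The helper below is a sketch: the name `take` is hypothetical, and it assumes that the object
returned by `get_iterator` fetches pages on demand while you iterate:

```python
from itertools import islice


def take(collection, count, page_size=100):
    """Return at most `count` entities from `collection` as a list.

    Hypothetical helper, not part of the library. `collection` is any
    object with a `get_iterator` method that accepts `page_size`.
    """
    return list(islice(collection.get_iterator(page_size=page_size), count))
```

Choosing a `page_size` close to `count` avoids requesting many more entities than you will actually use.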

| API                         | Iterable attribute (or method)     | Sortable by                                                                          | Supports filtering |
|-----------------------------|------------------------------------|--------------------------------------------------------------------------------------|--------------------|
| `DataQueryAPI`              | `api.get_instances(...)`           | Property URI                                                                         | No                 |
| `DataIngestionAPI`          | `api.pipelines`                    | `name`, `created_by`, `last_run`, `run_by`                                           | No                 |
|                             | --> `pipeline.runs`                | <no sorting>                                                                         | No                 |
|                             | `api.component_types`              | <no sorting>                                                                         | No                 |
|                             | `api.source_data.contents`         | <no sorting>                                                                         | No                 |
|                             | `api.source_data.directories`      | <no sorting>                                                                         | No                 |
|                             | `api.source_data.files`            | <no sorting>                                                                         | No                 |
| `LicenseManagementAPI`      | `api.license_agreements`           | <no sorting>                                                                         | No                 |
| `UserManagementAPI`         | `api.users`                        | `email`, `disabled`, `is_community`, `created_at`, `last_sign_in_at`, `license`      | Yes (*)            |
|                             | `api.m2m_users`                    | <no sorting>                                                                         | No                 |
|                             | `api.user_groups`                  | `name`, `description`, `updated_at`                                                  | No                 |
|                             | `api.external_groups`              | `name`                                                                               | No                 |
|                             | `api.permissions`                  | <no sorting>                                                                         | No                 |
| `SecurityEventAPI`          | `api.events`                       | `timestamp`, `criticality`                                                           | Yes (**)           |
| `InstanceCollectionsAPI`    | `api.instance_collections`         | `name`, `createdBy`                                                                  | No                 |
|                             | --> `collection.instances`         | `label`, `addedOn`, `addedBy`                                                        | No                 |
|                             | --> `collection.removed_instances` | `label`, `removedOn`                                                                 | No                 |
| `ResourcesAPI`              | `api.resources`                    | <undocumented>                                                                       | Yes (***)          |
| `TasksAPI`                  | `api.tasks`                        | <undocumented>                                                                       | Yes (**)           |
|                             | `api.schedules`                    | <undocumented>                                                                       | Yes (**)           |
| `EventsAPI`                 | `api.events`                       | <undocumented>                                                                       | Yes (**)           |
|                             | `api.event_subscriptions`          | <undocumented>                                                                       | Yes (**)           |
|                             | `api.event_types`                  | <undocumented>                                                                       | Yes (**)           |
|                             | `api.event_targets`                | <undocumented>                                                                       | Yes (**)           |
| `UserViewsAPI`              | `api.user_views`                   | <no sorting>                                                                         | Yes (****)         |
| `PluginsAPI`                | `api.plugins`                      | <no sorting>                                                                         | No                 |
| `RemoteDataSubscriptionAPI` | `api.subscriptions`                | <no sorting>                                                                         | No                 |
|                             | `api.published_data_sets`          | <no sorting>                                                                         | No                 |
| `SystemAPI`                 | `api.settings`                     | <no sorting>                                                                         | No                 |
| `StoredDataAPI`             | `api.resource_types`               | <no sorting>                                                                         | No                 |
|                             | --> `resource_type.aliases`        | <no sorting>                                                                         | Yes (*****)        |
|                             | `api.resources`                    | `type_name`, `created_by`, `modified_by`, `created_at`, `modified_at`                | Yes (******)       |
|                             | --> `resource.sub_resources`       | <no sorting>                                                                         | Yes (*******)      |

(*) possible filter keys:
 * `email`: a string
 * `include_disabled`: a boolean value
 * `is_community`: a boolean value

(**) undocumented

(***) filtering is required. Possible keys:
 * `filter`: undocumented
 * `include_content`: a boolean value
 * `mode`: `all`, `current` or `global`

(****) possible filter keys:
 * `include_groups`: a boolean value

(*****) possible filter keys:

 * `resource`: the resource object or a resource GUID

(******) filtering is performed with the filter classes in the `disqover.stored_data.models.filters` module.

(*******) possible filter keys:
 * `resource_type`: resource type name or `StoredDataResourceType` object
 * `annotations`: a dictionary (see example in documentation of stored data API above)
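
Since a mistyped filter key is easy to overlook, you may want to validate filters before passing them on. The
sketch below hard-codes only the filter keys documented in the footnotes above; `DOCUMENTED_FILTER_KEYS` and
`checked_filters` are hypothetical names, not part of the library:

```python
# Filter keys taken from the footnotes above; undocumented ones are left out.
DOCUMENTED_FILTER_KEYS = {
    "users": {"email", "include_disabled", "is_community"},
    "user_views": {"include_groups"},
    "aliases": {"resource"},
    "sub_resources": {"resource_type", "annotations"},
}


def checked_filters(collection_name, **filters):
    """Return `filters` as a dict after rejecting undocumented keys."""
    allowed = DOCUMENTED_FILTER_KEYS[collection_name]
    unknown = sorted(set(filters) - allowed)
    if unknown:
        raise ValueError(f"unknown filter key(s) for {collection_name}: {unknown}")
    return filters
```

The result can be passed directly to an iterator, for example
`api.users.get_iterator(filters=checked_filters("users", include_disabled=True))`.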
