CPS core Get Data Node Inconsistencies

Recent development of the CPS Core APIs surfaced a few inconsistencies in the Get Data Node API. This wiki analyzes these inconsistencies, compares the behavior with that of the Query API, and proposes some possible solutions.

Issues & Decisions

Notes

Decisions

15 May 2023

As per the discussion with Lukasz Rajewski, the following decisions were made:
- Adding custom logic to retrieve all the list items appears to be the most appropriate way to make the Get Data Node API consistent with the Query API.
- Using a special character, the asterisk (*), does not address the inconsistency, because the xpath in the request should be the same as the one used with the Query API.
  - Example:
    /v2/dataspaces/my-dataspace/anchors/multipleTop/node?xpath=/bookstore/categories&descendants=3
- Redirecting the request to the Query API does not appear to be feasible.

15 May 2023

More research is needed on the first proposed approach, and it needs to be documented in detail.

Inconsistent response when retrieving data nodes.

When a get operation is performed on the root node xpath (/), it returns an array containing all the data nodes under the root node. But when the same operation is performed on a list data node, no array is returned; instead the response is 404 Not Found.

Requesting data under root node xPath

/v2/dataspaces/my-dataspace/anchors/multipleTop/node?xpath=/&descendants=3

Response

[
    {
        "stores:bookstore": {
            "store-name": "My Bookstore",
            "store-owner": "James",
            "categories": [
                {
                    "name": "Test book",
                    "price": 100,
                    "stock": false,
                    "book-category": "SciFi"
                }
            ]
        }
    },
    {
        "stores:electronics-store": {
            "store-name": "My Electronics store",
            "category": [
                {
                    "status": true,
                    "address": "India",
                    "store-type": "electronics"
                }
            ]
        }
    }
]

Requesting entire List using Get Data Node API

/v2/dataspaces/my-dataspace/anchors/multipleTop/node?xpath=/bookstore/categories&descendants=3

Response

{
    "status": "404 NOT_FOUND",
    "message": "DataNode not found",
    "details": "DataNode with xpath /bookstore/categories was not found for anchor multipleTop and dataspace my-dataspace."
}

Similarly, when comparing the Get Data Node API with the Query API, it was observed that a query on a list data node returns all the list items, whereas the same request through the Get Data Node API returns 404 Not Found.

Requesting an entire List using Query API

Requesting entire List using Query API and xPath

/v2/dataspaces/my-dataspace/anchors/multipleTop/nodes/query?cps-path=/bookstore/categories&descendants=3

Response by Query API

[
    {
        "stores:categories": {
            "name": "Cook book",
            "price": 500,
            "stock": true,
            "book-category": "Cooking"
        }
    },
    {
        "stores:categories": {
            "name": "Test book",
            "price": 100,
            "stock": false,
            "book-category": "SciFi"
        }
    }
]

Reason for the inconsistency

Looking at how the data is stored in the CPS DB, we can see that a list data node is not stored as a whole under a single xpath referring to the list.

Instead, each individual list item is stored in the CPS DB under its own unique xpath, which consists of the name of the list data node and the corresponding "key" leaf or "key" attribute.

Hence the entire list currently cannot be retrieved by the xpath of the list data node; only the individual list items can be retrieved.

"xpath"	"attributes"
"/bookstore"	"{""store-name"": ""My Bookstore"", ""store-owner"": ""James""}"
"/bookstore/categories[@book-category='SciFi']"	"{""name"": ""Test book"", ""price"": 100, ""stock"": false, ""book-category"": ""SciFi""}"
"/bookstore/categories[@book-category='Cooking']"	"{""name"": ""Cook book"", ""price"": 500, ""stock"": true, ""book-category"": ""Cooking""}"
"/electronics-store"	"{""store-name"": ""My Electronics store""}"
"/electronics-store/category[@store-type='electronics']"	"{""status"": true, ""address"": ""India"", ""store-type"": ""electronics""}"

In the above table we can see that the individual list items of the list "categories" are stored under their unique xpaths and that no xpath such as /bookstore/categories exists in the DB.
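The storage behavior described above can be illustrated with a minimal Python sketch. The dictionary is only a stand-in for the table shown above (it is not how CPS accesses the DB), and get_exact is a hypothetical helper that mimics an exact-match lookup on the xpath column.

```python
# Stand-in for the fragment table: one entry per stored xpath (rows copied from the table above).
FRAGMENTS = {
    "/bookstore": {"store-name": "My Bookstore", "store-owner": "James"},
    "/bookstore/categories[@book-category='SciFi']": {
        "name": "Test book", "price": 100, "stock": False, "book-category": "SciFi"},
    "/bookstore/categories[@book-category='Cooking']": {
        "name": "Cook book", "price": 500, "stock": True, "book-category": "Cooking"},
    "/electronics-store": {"store-name": "My Electronics store"},
    "/electronics-store/category[@store-type='electronics']": {
        "status": True, "address": "India", "store-type": "electronics"},
}


def get_exact(xpath):
    """Hypothetical exact-match lookup; returns None when no fragment has this xpath."""
    return FRAGMENTS.get(xpath)


# A single list item has its own xpath, so it is found.
list_item = get_exact("/bookstore/categories[@book-category='SciFi']")
# The list itself has no xpath of its own, so nothing is found.
whole_list = get_exact("/bookstore/categories")
print(list_item is not None, whole_list)
```

The list item is found while the list xpath yields nothing, which corresponds to the 404 Not Found response seen earlier.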

Possible Solutions

Implement logic to retrieve all data nodes when a list is requested

One approach could be to implement logic that retrieves all the list items when a list data node is requested, similar to the approach taken in the Query API.

This would require custom logic in the Get Data Node API which fetches all the list items when the xpath of a list data node is sent in the get request. This would make the Get Data Node API consistent with the Query API.

A sample request in this case would look something like:

/v2/dataspaces/my-dataspace/anchors/multipleTop/node?xpath=/bookstore/categories&descendants=3

Using a special character in the xPath for Get Data Node API

By using a special character in the attribute part of the xpath, we can implement logic to retrieve the entire list data node.

For instance, to retrieve the entire list we can use an asterisk symbol in the attribute part of the xpath. Logic can then be implemented so that, if the asterisk is present in the attribute part of the xpath, CPS returns the entire list data node.

Sample Request

/v2/dataspaces/my-dataspace/anchors/multipleTop/node?xpath=/bookstore/categories[@book-category=*]&descendants=3

In the above request our xpath looks like this:

xPath with asterisk symbol as attribute

/bookstore/categories[@book-category=*]

Expected Response

[
    {
        "stores:categories": {
            "name": "Cook book",
            "price": 500,
            "stock": true,
            "book-category": "Cooking"
        }
    },
    {
        "stores:categories": {
            "name": "Test book",
            "price": 100,
            "stock": false,
            "book-category": "SciFi"
        }
    }
]
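To make the asterisk proposal more concrete, the following is a minimal Python sketch of how such an xpath could be recognized. It is an illustration only and not part of CPS: the function name split_wildcard and the regular expression are assumptions, and the sketch only handles a single wildcard predicate at the end of the xpath.

```python
import re

# Assumed pattern: any xpath that ends in a predicate of the form [@<key>=*].
WILDCARD_PREDICATE = re.compile(r"^(?P<list_xpath>.+)\[@(?P<key>[\w-]+)=\*\]$")


def split_wildcard(xpath):
    """Return (list xpath, key name) for a wildcard xpath, or None for any other xpath."""
    match = WILDCARD_PREDICATE.match(xpath)
    if match is None:
        return None
    return match["list_xpath"], match["key"]


wildcard = split_wildcard("/bookstore/categories[@book-category=*]")
single_item = split_wildcard("/bookstore/categories[@book-category='SciFi']")
print(wildcard, single_item)
```

When the function returns a list xpath, the service would fetch every item of that list; any other xpath would be handled as it is today.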

Redirect request to the Query API

Another proposed approach is to redirect the request to the Query API when a list data node is requested.

Note: The problem with this approach is identifying, from the xpath provided in the get request alone, whether the requested data node is a list or not. Redirecting the request to the Query API was also found not to be feasible.

Accepted Solution

By modifying the SQL query that retrieves the data nodes, we can fetch all the list items which share the same list xpath.

With the present query, when retrieving a list item we specify its absolute xpath, because that is the only data node stored with that xpath. Ex: /bookstore/categories[@code="1"]

When we try to retrieve the entire list, we get a DataNodeNotFound exception because there is no data node in the database with the xpath representing the list. Ex: /bookstore/categories

"id"	"xpath"	"attributes"	"parent_id"
200	"/bookstore/categories[@book-category='Kids']"	"{""name"": ""Kids book"", ""price"": 5000, ""stock"": true, ""book-category"": ""Kids""}"	198
199	"/bookstore/categories[@book-category='SciFi']"	"{""name"": ""Test book"", ""price"": 1000, ""stock"": false, ""book-category"": ""SciFi""}"	198
198	"/bookstore"	"{""store-name"": ""My Bookstore"", ""store-owner"": ""ABC""}"	null

Old query

WHERE anchor_id = :anchorId AND xpath = ANY (:xpaths)

From the above snapshot of the fragment table it can be seen that only the individual list items are stored in the CPS DB and that there is no data node representing the entire list. The existing query looks only for fragments whose xpath exactly matches the requested xpath, so it finds nothing for the xpath of the list.

The updated query no longer relies only on an exact match for an xpath that represents a list (Ex: /bookstore/categories). In addition to the exact match, it looks for all the fragments whose xpath starts with the list xpath followed by an opening square bracket "[", with the rest of the xpath (the key attribute) matched by the SQL wildcard "%".

Updated Query

WHERE anchor_id = :anchorId
    AND (xpath = ANY (:xpaths) OR xpath LIKE ANY (array(SELECT concat(t, '[%') FROM unnest(:xpaths) as t)))

The relevant part of the query is the condition "(xpath = ANY (:xpaths) OR xpath LIKE ANY (array(SELECT concat(t, '[%') FROM unnest(:xpaths) as t)))".

For a given list xpath (Ex: /bookstore/categories) the condition effectively becomes:

(xpath = '/bookstore/categories' OR xpath LIKE '/bookstore/categories[%')

This returns the following 2 fragments:

"id"	"xpath"	"attributes"	"parent_id"
200	"/bookstore/categories[@book-category='Kids']"	"{""name"": ""Kids book"", ""price"": 5000, ""stock"": true, ""book-category"": ""Kids""}"	198
199	"/bookstore/categories[@book-category='SciFi']"	"{""name"": ""Test book"", ""price"": 1000, ""stock"": false, ""book-category"": ""SciFi""}"	198
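The effect of the updated condition can be tried out with a small self-contained sketch. It uses Python with an in-memory SQLite database instead of the actual CPS database, so ANY, array and unnest are replaced by one "exact OR LIKE" clause per xpath. The function name find_fragments and the anchor_id value 1 are made up for the illustration, and the attributes column is left out for brevity.

```python
import sqlite3

# Rows copied from the fragment table snapshot above: (id, xpath, parent_id).
ROWS = [
    (200, "/bookstore/categories[@book-category='Kids']", 198),
    (199, "/bookstore/categories[@book-category='SciFi']", 198),
    (198, "/bookstore", None),
]


def find_fragments(conn, anchor_id, xpaths):
    """Return (id, xpath) rows whose xpath matches exactly or starts with '<xpath>['."""
    if not xpaths:
        return []
    # SQLite has no ANY/unnest, so one (exact OR LIKE) clause is generated per xpath.
    clause = " OR ".join("(xpath = ? OR xpath LIKE (? || '[%'))" for _ in xpaths)
    params = [anchor_id]
    for xpath in xpaths:
        params += [xpath, xpath]
    sql = f"SELECT id, xpath FROM fragment WHERE anchor_id = ? AND ({clause}) ORDER BY id"
    return conn.execute(sql, params).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE fragment (id INTEGER PRIMARY KEY, anchor_id INTEGER, xpath TEXT, parent_id INTEGER)"
)
# anchor_id = 1 is an arbitrary value used only in this sketch.
conn.executemany(
    "INSERT INTO fragment (id, xpath, parent_id, anchor_id) VALUES (?, ?, ?, 1)", ROWS
)

list_items = find_fragments(conn, 1, ["/bookstore/categories"])
parent_only = find_fragments(conn, 1, ["/bookstore"])
print(list_items)
print(parent_only)
```

Requesting /bookstore/categories yields the two list items, while /bookstore still yields only the parent fragment, because the pattern '/bookstore[%' does not match the xpaths of its children. Note that in a LIKE pattern the characters % and _ act as wildcards, so an xpath containing them would match more loosely than an exact comparison.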

Impact on response body

The updated query returns the list items as separate fragments when the entire list is requested, which changes the response body compared to what is returned today.

Currently, in order to retrieve the entire list we have to request the parent data node. So, in the above example, for the list named "categories" we have to request the parent node "bookstore". The xpath in the request becomes "/bookstore" and we get the following response:

Current Response

[
    {
        "stores:bookstore": {
            "categories": [
                {
                    "name": "Test book",
                    "price": 1000,
                    "stock": false,
                    "book-category": "SciFi"
                },
                {
                    "name": "Kids book",
                    "price": 5000,
                    "stock": true,
                    "book-category": "Kids"
                }
            ],
            "store-name": "My Bookstore",
            "store-owner": "ABC"
        }
    }
]

Here the response contains the parent node "bookstore" with the list "categories" as its child, which holds the list items identified by the keys "SciFi" and "Kids".

But when we request the entire list directly using the updated query, we get 2 data nodes (representing "SciFi" and "Kids") instead of one parent node ("bookstore") with 2 children ("SciFi" and "Kids").

These 2 individual data nodes, when converted to JSON, result in the following response:

Response when retrieving whole list

[
    {
        "stores:categories": {
            "name": "Test book",
            "price": 1000,
            "stock": false,
            "book-category": "SciFi"
        }
    },
    {
        "stores:categories": {
            "name": "Kids book updated",
            "price": 5000,
            "stock": true,
            "book-category": "Kids"
        }
    }
]

The behavior is consistent with the existing functionality of Query API as well:

Response with Query API

[
    {
        "stores:categories": {
            "name": "Test book",
            "price": 1000,
            "stock": false,
            "book-category": "SciFi"
        }
    },
    {
        "stores:categories": {
            "name": "Kids book updated",
            "price": 5000,
            "stock": true,
            "book-category": "Kids"
        }
    }
]
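The shape of the two responses above, where each list item appears as its own top-level object, can be sketched in Python as follows. This is not the CPS implementation: the helper names node_name and to_response are hypothetical, the module prefix "stores" is taken from the responses in this document, and the predicate handling assumes key values without square brackets or slashes.

```python
import json
import re

MODULE_PREFIX = "stores"  # assumed from the responses shown in this document


def node_name(xpath):
    """Return the last path segment without its trailing [@key='value'] predicate."""
    without_predicate = re.sub(r"\[[^\[\]]*\]$", "", xpath)
    return without_predicate.rsplit("/", 1)[-1]


def to_response(fragments):
    """Wrap every fragment in its own object keyed by '<module>:<node name>'."""
    return [{f"{MODULE_PREFIX}:{node_name(xpath)}": attributes}
            for xpath, attributes in fragments]


# The two list-item fragments from the fragment table snapshot.
fragments = [
    ("/bookstore/categories[@book-category='SciFi']",
     {"name": "Test book", "price": 1000, "stock": False, "book-category": "SciFi"}),
    ("/bookstore/categories[@book-category='Kids']",
     {"name": "Kids book", "price": 5000, "stock": True, "book-category": "Kids"}),
]

response = to_response(fragments)
print(json.dumps(response, indent=4))
```

Because every list item is a separate data node, each one gets its own "stores:categories" wrapper instead of being nested under a single parent object.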