Page Comparison

Table of Contents

...

The test timings used to generate the charts in this document is available as a spreadsheet: CPS-1646 CpsPath queries using path-component lookup.xlsx

Query one device out of many

...

In this case, a query that matches a single device node is executed.
Case Query one out of many using descendant cps path Query one out of many using absolute cps path
Query //openroadm-device[@device-id="C201-7-1A-14"] /openroadm-devices/openroadm-device[@device-id="C201-7-1A-19"]
Comparison graph
Image Modified
Graph detail
Time complexity of
existing solution
O(N) O(N)
Time complexity of
proposed solution
O(N)
Near-constant, O(1)
Comments
The use of LIKE 'openroadm-device[%' in the query to match candidates
means that all fragments need to be checked, e.g. for 3000 devices,
3000 x 86 ~= 250k fragments need to be scanned.
The use of the absolute path narrows the search space to only direct
children of /openroadm-devices - for 3000 devices, we only need to scan
3000 fragments instead of 250k. (Technically this is still O(N), but linear on
the number of items in the list, not the number of fragments in the DB.)
As seen in the graphs, query performance for current master branch is linear on the size of the database, while the PoC implementation is constant time (independent of database size).
Note: I have not illustrated all different fetchDescendantOptions, as it has only minor impact in the case of fetching 1 device node.

Query all devices

...

using descendant cps path

In this case, a query that matches all device nodes using a descendant cps path is executed.

Fetch descendants	Omit Descendants	Direct Descendants	All Descendants
Query	//openroadm-device[@ne-state="inservice"]	//openroadm-device[@ne-state="inservice"]	//openroadm-device[@ne-state="inservice"]
Comparison graph
Graph detail
Time complexity of existing solution	Unclear, as there is very high variance. Probably O(N).	O(N²)	O(N²)
Time complexity of proposed solution	O(N)	O(N)	O(N)

Query all

...

devices using absolute cps path

In this case, a query that matches many device nodes using an absolute cps path is executed.

Fetch descendants	Omit Descendants	Direct Descendants	All Descendants
Query	/openroadm-devices/openroadm-device[@status="success"]	/openroadm-devices/openroadm-device[@status="success"]	/openroadm-devices/openroadm-device[@status="success"]
Comparison graph
Graph detail
Time complexity of existing solution	Unclear, as there is very high variance. Likely O(N).	O(N²)	O(N²)
Time complexity of proposed solution	O(N)	O(N)	O(N)

Possible improvements

Cps Path Query capabilities can be extended

Presently, Cps Path queries limit leaf-condition, text-condition, etc. to the final path component. The proposed solution allows for future improvement where leaf conditions could be applied to any or all path components.

...

Optimization: Index-only lookup when list key is used in leaf condition

Given the following CPS path query:

/bookstore/categories[@code='1']/books[@title='Matilda']

...

In this case, the bookstore model specifies that the 'title' leaf is the key for 'books', and thus the xpath is encoded in the database as: /bookstore/categories[

...

@code='

...

1']/books[@title='Matilda']

...

We can optimize for this case, by checking if xpath_component = "books[@title='Matilda']

...

Presently, only case 1 will return data (because 'code' is the list key for 'categories'), and cases 2 and 3 return nothing. Given the proposed solution uses an SQL sub-query for each path-component, the query generator can be adapted to support cases 2 and 3, with no expected performance impact.

Other operations can be accelerated

The same algorithm to improve query performance could also be used to improve performance of other operations, such as GET, UPDATE, and DELETE data nodes. However, some of these existing operations have "plural" implementations taking Collections as parameters, such as getDataNodesForMultipleXpaths. Additional investigation is needed for these cases.

Optimization: Index-only lookup when list key is used in leaf condition

Given the following CPS path query:

/bookstore/categories[@code='1']/books[@title='Matilda']

In this case, the bookstore model specifies that the 'title' leaf is the key for 'books', and thus the xpath is encoded in the database as: /bookstore/categories[@code='1']/books[@title='Matilda']

We can optimize for this case, by checking if xpath_component = "books[@title='Matilda']", thus avoiding the need to examine the attributes field in the fragment table (thus the entire operation involves only database indexes).

In the PoC implementation, the following is a snippet from the SQL generated from the ", thus avoiding the need to examine the attributes field in the fragment table (thus the entire operation involves only database indexes).

In the PoC implementation, the following is a snippet from the SQL generated from the CPS path query:

Code Block

language	bash

        AND (xpath_component = 'books' OR xpath_component LIKE 'books[%') 
        AND (attributes @> '{"title":"Matilda"}')

...

categories[@code='13Matilda

id	parent_id	anchor_id	xpath	xpath_component	list_name
1	NULL	3	/bookstore	bookstore	NULL
2	11	3	/bookstore/categories[@code='1']	categories[@code='1']	categories
3	2	3	/bookstore/categories[@code='1']	/books[@title='Matilda']	books[@title='Matilda']	categoriesbooks
4	2	3	/bookstore/categories[@code='1']/books[@title='	The Gruffalo']	books[@title='The Gruffalo']	books

Our query could be modified to replace the LIKE with an indexed look up:

AND (xpath_component = 'books' OR list_name = '

...

books'

...

This would give rise to constant time look up in such cases. If this solution is not implemented, users can work around this problem wrapping lists in container nodes, with the container being included in the descendant query, e.g. instead of //books[@title='

...

Matilda']

...

, change the model and use //books/book[@title='

...

Our query could be modified to replace the LIKE with an indexed look up:

AND (xpath_component = 'books' OR list_name = 'books')

This would give rise to constant time look up in such cases. If this solution is not implemented, users can work around this problem wrapping lists in container nodes, with the container being included in the descendant query, e.g. instead of //Matilda'] as a query. This would then have O(1) performance.

Note: if we wish to avoid creating additional database fields, using list/container name field would offer the best balance of performance, e.g.

id	parent_id	anchor_id	xpath	container
1	NULL	3	/bookstore	bookstore
2	1	3	/bookstore/categories[@code='1']	categories
3	2	3	/bookstore/categories[@code='1']/books[@title='Matilda']

...

	books
4	2	3	/bookstore/categories[@code='1']/books[@title='

...

Note: if we wish to avoid creating additional database fields, using list/container name field would offer the best balance of performance, e.g.

3

id	parent_id	anchor_id	xpath	container
1	NULL	3	/bookstore	bookstore
2	1	The Gruffalo']	books

In this case, attributes for any list element path component would need to be looked up (so it would no longer be index-only). My suspicion is that this would have better general performance than using xpath_component alone, but testing is needed to verify.

Cps Path Query capabilities can be extended

Presently, Cps Path queries limit leaf-condition, text-condition, etc. to the final path component. The proposed solution allows for future improvement where leaf conditions could be applied to any or all path components.

For example, given these queries:

/bookstore/categories[@code='1']

...

/books[@title='Matilda']
/bookstore/categories[

...

@name='

...

SciFi']/books[@title='Matilda']

...

/bookstore/categories

...

/books[@title='

...

Matilda']

...

Presently, only case 1 will return data (because 'code' is the list key for 'categories'), and cases 2 and 3 return nothing. Given the proposed solution uses an SQL sub-query for each path-component, the query generator can be adapted to support cases 2 and 3, with no expected performance impact.

Other operations can be accelerated

The same algorithm to improve query performance could also be used to improve performance of other operations, such as GET, UPDATE, and DELETE data nodes. However, some of these existing operations have "plural" implementations taking Collections as parameters, such as getDataNodesForMultipleXpaths. Additional investigation is needed for these cases.

Work Breakdown for Implementation

...

Versions Compared

Old Version 87

New Version 88

Key

Query one device out of many

Query all devices

using descendant cps path

Direct Descendants

All Descendants

Query all

devices using absolute cps path

Possible improvements

Cps Path Query capabilities can be extended

Optimization: Index-only lookup when list key is used in leaf condition

Other operations can be accelerated

Optimization: Index-only lookup when list key is used in leaf condition

Cps Path Query capabilities can be extended

Other operations can be accelerated

Work Breakdown for Implementation

Case	Query one out of many using descendant cps path	Query one out of many using absolute cps path
Query	//openroadm-device[@device-id="C201-7-1A-14"]	/openroadm-devices/openroadm-device[@device-id="C201-7-1A-19"]
Comparison graph		Image Modified
Graph detail
Time complexity of existing solution	O(N)	O(N)
Time complexity of proposed solution	O(N)	Near-constant, O(1)
Comments	The use of LIKE 'openroadm-device[%' in the query to match candidates means that all fragments need to be checked, e.g. for 3000 devices, 3000 x 86 ~= 250k fragments need to be scanned.	The use of the absolute path narrows the search space to only direct children of /openroadm-devices - for 3000 devices, we only need to scan 3000 fragments instead of 250k. (Technically this is still O(N), but linear on the number of items in the list, not the number of fragments in the DB.)