Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

Table of Contents

...

Code Block
languagesql
SELECT
    * 
FROM
    fragment 
WHERE
    parent_id =IN (
        SELECT
            id 
        FROM
            fragment 
        WHERE
            xpath_component = 'categories[@code=''1'']'
            AND parent_id =IN (
                SELECT
                    id 
                FROM
                    fragment 
                WHERE
                    xpath_component = 'bookstore'
                    AND anchor_id = 3
                    AND parent_id IS NULL
            )
        ) 
        AND (
            xpath_component = 'books'
            OR xpath_component LIKE 'books[%'
        ) 
        AND (
            attributes @> '{"price":15}'
        )

...

The test timings used to generate the charts in this document is available as a spreadsheet: CPS-1646 CpsPath queries using path-component lookup.xlsx

Query one device out of many

...

In this case, a query that matches a single device node is executed.

CaseQuery one out of many using descendant cps pathQuery one out of many using absolute cps path
Query//openroadm-device[@device-id="C201-7-1A-14"]/openroadm-devices/openroadm-device[@device-id="C201-7-1A-19"]
Comparison graph

Image Modified

Graph detail

Time complexity of
existing solution

O(N)O(N)
Time complexity of
proposed solution

O(N)

Near-constant, O(1)

As seen in the graphs, query performance for current master branch is linear on the size of the database, while the PoC implementation is constant time (independent of database size).

Note: I have not illustrated all different fetchDescendantOptions, as it has only minor impact in the case of fetching 1 device node.

Query all devices (N out of N) using descendant cps path

In this case, a query that matches all device nodes using a descendant cps path is executed.

...

Direct Descendants

...

All Descendants

...

Image Removed

...

Image Removed

...

Image Removed

...

Image Removed

...

Image Removed

...

Image Removed

...

Time complexity of
existing solution

...

Query all devices (N out of N) using absolute cps path

In this case, a query that matches many device nodes using an absolute cps path is executed.

Image Removed
Fetch descendantsOmit DescendantsDirect DescendantsAll Descendants
Query/openroadm-devices/openroadm-device[@status="success"]/openroadm-devices/openroadm-device[@status="success"]/openroadm-devices/openroadm-device[@status="success"]
Comparison graph

Image Removed

Image Removed

Image Removed

Graph detail

Image Removed

Image Removed

Comments

The use of LIKE 'openroadm-device[%' in the query to match candidates
means that all fragments need to be checked, e.g. for 3000 devices,
3000 x 86 ~= 250k fragments need to be scanned.

The use of the absolute path narrows the search space to only direct
children of /openroadm-devices - for 3000 devices, we only need to scan
3000 fragments instead of 250k. (Technically this is still O(N), but linear on
the number of items in the list, not the number of fragments in the DB.)

As seen in the graphs, query performance for current master branch is linear on the size of the database, while the PoC implementation is constant time (independent of database size).

Note: I have not illustrated all different fetchDescendantOptions, as it has only minor impact in the case of fetching 1 device node.

Query all devices using descendant cps path

In this case, a query that matches all device nodes using a descendant cps path is executed.

Fetch descendantsOmit Descendants

Direct Descendants

All Descendants

Query//openroadm-device[@ne-state="inservice"]//openroadm-device[@ne-state="inservice"]//openroadm-device[@ne-state="inservice"]
Comparison graph

Image Added

Image Added

Image Added

Graph detail

Image Added

Image Added

Image Added

Time complexity of
existing solution

Unclear, as there is very high variance. Likely Probably O(N).O(N2)O(N2)
Time complexity of
proposed solution
O(N)O(N)O(N)

...

Cps Path Query capabilities can be extended

Presently, Cps Path queries limit leaf-condition, text-condition, etc. to the final path component. The proposed solution allows for future improvement where leaf conditions could be applied to any or all path components.

For example, given these queries:

  1. /bookstore/categories[@code='1']/books[@title='Matilda']
  2. /bookstore/categories[@name='SciFi']/books[@title='Matilda']
  3. /bookstore/categories/books[@title='Matilda']

Presently, only case 1 will return data (because 'code' is the list key for 'categories'), and cases 2 and 3 return nothing. Given the proposed solution uses an SQL sub-query for each path-component, the query generator can be adapted to support cases 2 and 3, with no expected performance impact.

Other operations can be accelerated

...

Query all devices using absolute cps path

In this case, a query that matches many device nodes using an absolute cps path is executed.

Fetch descendantsOmit DescendantsDirect DescendantsAll Descendants
Query/openroadm-devices/openroadm-device[@status="success"]/openroadm-devices/openroadm-device[@status="success"]/openroadm-devices/openroadm-device[@status="success"]
Comparison graph

Image Added

Image Added

Image Added

Graph detail

Image Added

Image Added

Image Added

Time complexity of
existing solution

Unclear, as there is very high variance. Likely O(N).O(N2)O(N2)
Time complexity of
proposed solution
O(N)O(N)O(N)

Possible improvements

Optimization: Index-only lookup when list key is used in leaf condition

...

This would allow for an index-only lookup of the node, without needing the LIKE comparison or the attribute check.

Optimization: Faster look-ups for lists and list elements

...

Given the bookstore model where books are stored in a list, and given a path query //books[@price=15], using proposed solution, this would use a LIKE 'books[' to find potential matches.

...

The use of the LIKE operator in this case would result in O(N) time, as every fragment may need to be compared against the LIKE pattern. There is a solution which may push this case toward constant O(1) performance, namely adding the list name as an indexed column to the fragments table - note only list elements need this field populated, container nodes can set this field to NULL:

idparent_idanchor_idxpathxpath_componentlist_name
1NULL3/bookstorebookstoreNULL
213/bookstore/categories[@code='1']categories[@code='1']categories
323/bookstore/categories[@code='1']/books[@title='Matilda']books[@title='Matilda']books
423/bookstore/categories[@code='1']/books[@title='The Gruffalo']books[@title=
'The Gruffalo']books

Our query could be modified to replace the LIKE with an indexed look up:

AND (xpath_component = 'books' OR list_name = 'books')

This would give rise to constant time look up in such cases.

Work Breakdown for Implementation

...

'The Gruffalo']books

Our query could be modified to replace the LIKE with an indexed look up:

AND (xpath_component = 'books' OR list_name = 'books')

This would give rise to constant time look up in such cases. If this solution is not implemented, users can work around this problem wrapping lists in container nodes, with the container being included in the descendant query, e.g. instead of //books[@title='Matilda'], change the model and use //books/book[@title='Matilda'] as a query. This would then have O(1) performance.

Note: if we wish to avoid creating additional database fields, using list/container name field would offer the best balance of performance, e.g.

idparent_idanchor_idxpathcontainer
1NULL3/bookstorebookstore
213/bookstore/categories[@code='1']categories
323/bookstore/categories[@code='1']/books[@title='Matilda']books
423/bookstore/categories[@code='1']/books[@title='The Gruffalo']books

In this case, attributes for any list element path component would need to be looked up (so it would no longer be index-only). My suspicion is that this would have better general performance than using xpath_component alone, but testing is needed to verify.

Cps Path Query capabilities can be extended

Presently, Cps Path queries limit leaf-condition, text-condition, etc. to the final path component. The proposed solution allows for future improvement where leaf conditions could be applied to any or all path components.

For example, given these queries:

  1. /bookstore/categories[@code='1']/books[@title='Matilda']
  2. /bookstore/categories[@name='SciFi']/books[@title='Matilda']
  3. /bookstore/categories/books[@title='Matilda']

Presently, only case 1 will return data (because 'code' is the list key for 'categories'), and cases 2 and 3 return nothing. Given the proposed solution uses an SQL sub-query for each path-component, the query generator can be adapted to support cases 2 and 3, with little expected performance impact. Note this feature would require storing the container/list name as an indexed field, as described in the previous section.

Other operations can be accelerated

The same algorithm to improve query performance could also be used to improve performance of other operations, such as GET, UPDATE, and DELETE data nodes. However, some of these existing operations have "plural" implementations taking Collections as parameters, such as getDataNodesForMultipleXpaths. Additional investigation is needed for these cases.

Work Breakdown for Implementation

Beside the work done in the PoC implementation, there is additional work for this change to be production-ready.  The main algorithm is mostly complete in the PoC (all integration tests are passing for the PoC). The existing PoC code can thus be refactored to make it production ready.

Fetch descendants

The proposed solution can broadly be broken into two major changes:

  1. Cps Path query using path-component look-up
  2. Using recursive SQL to fetch descendants of nodes returned from 1.

I proposed that the recursive SQL to fetch descendants be implemented first as an independent change, as it move worst-case complexity from O(N2) to O(N). By contrast, the query using path-component look-up will improve best-case performance from O(N) to O(1).

Cps Path Parser changes

The PoC uses String.split for parsing path components, which means paths containing '/' in the leaf condition are not parsed correctly, such as //books[@title='GNU/Linux']. CpsPathBuilder and CpsPathQuery classes from cps-path-parser module will need to be updated to provide the individual path components (in normalized form).

...