Table of Contents |
---|
...
The current implementation of Cps Path queries relies on regular expressions in the generated SQL queries.
SQL query using regular expression on the full path
For example, given this Cps Path:
...
A new solution is being proposed where each path component of the Cps Path is looked up in constant time, thus addressing performance issues.
Objective
...
The objective is to achieve the maximum best possible performance for queries.
- For a query that will return 1 out of N items, the best possible time complexity is constant, O(1).
- For a query that will return all N items, the best possible time complexity is linear, O(N).
Implementation Proposal
An additional fragment table field is needed to store path components
The new approach involves adding a column to the Fragment table, storing the last path component (called xpath_component here). The new column is indexed, to allow constant-time lookups.
id | parent_id | anchor_id | xpath | xpath_component |
---|---|---|---|---|
1 | NULL | 3 | /bookstore | bookstore |
2 | 1 | 3 | /bookstore/categories[@code='1'] | categories[@code='1'] |
3 | 2 | 3 | /bookstore/categories[@code='1']/books[@title='Matilda'] | books[@title='Matilda'] |
4 | 2 | 3 | /bookstore/categories[@code='1']/books[@title='The Gruffalo'] | books[@title='The Gruffalo'] |
SQL query for path-component lookup
For example, given this CPS Path:
...
In addition, a new algorithm for fetching descendants is implemented.
Fetching descendants using Recursive SQL query
...
The proposed solution will use recursive SQL to fetch descendants of fragment returned from the CPS path query:
...
Existing solution | Proposed solution |
---|---|
Because the existing solution uses a regex, all nodes in anchor 3 need to be examined to see if they match the regex. | Because the proposed solution uses sub-queries to look up each path component, only relevant nodes are examined. |
|
|
If the number of nodes is doubled, the number of xpaths checked against the regex is also doubled (linear complexity). | If the number of nodes is doubled, the same number of nodes is still scanned in this case (constant complexity). |
Proof of Concept
A proof of concept (poc) implementation was developed so that performance could be compared against existing Cps Path query algorithms.
...
The test timings used to generate the charts in this document is available as a spreadsheet: CPS-1646 CpsPath queries using path-component lookup.xlsx
Detailed Performance Comparison
Query one device out of many (1 out of N)
...
Case | Query one out of many using descendant cps path | Query one out of many using absolute cps path |
---|---|---|
Query | //openroadm-device[@device-id="C201-7-1A-14"] | /openroadm-devices/openroadm-device[@device-id="C201-7-1A-19"] |
Comparison graph | ||
Graph detail | ||
Time complexity of | O(N) | O(N) |
Time complexity of proposed solution | O(N) | O(1) |
...
Fetch descendants | Omit Descendants | Direct Descendants | All Descendants |
---|---|---|---|
Query | /openroadm-devices/openroadm-device[@status="success"] | /openroadm-devices/openroadm-device[@status="success"] | /openroadm-devices/openroadm-device[@status="success"] |
Comparison graph | |||
Graph detail | |||
Time complexity of | Unclear, as there is very high variance. Likely O(N). | O(N2) | O(N2) |
Time complexity of proposed solution | O(N) | O(N) | O(N) |
Possible improvements
...
Cps Path Query capabilities can be extended
...