Summary
This is an assessment of current performance of CPS operations, as of August 2023.
...
- NCMP CM-handle de-registration moved from quadratic to linear time complexity (constant de-registration rate). De-registering 20,000 CM-handles is now thousands of times faster.
- CPS Path Queries have moved from worst-case quadratic to linear time complexity. Queries that took days previously take minutes or seconds now.
- Memory usage during read, query, and update operations has been decreased by more than 90% (10 times less memory).
- Not only has performance increased, but also reliability and robustness. A single batch operation can now handle tens of thousands of elements, where it would previously have failed.
...
A varying number of Open ROADM device nodes will be stored using CpsDataService::saveData.
Created device nodes | 1 | 100 | 200 | 300 | 400 | 500 | 600 | 700 | 800 | 900 | 1,000
---|---|---|---|---|---|---|---|---|---|---|---|
Fragments created | 86 | 8,600 | 17,200 | 25,800 | 34,400 | 43,000 | 51,600 | 60,200 | 68,800 | 77,400 | 86,000
Time (seconds) | 0.30 | 2.36 | 4.36 | 7.15 | 9.76 | 11.50 | 14.77 | 18.43 | 19.79 | 22.16 | 26.54
Fragments per second | 287 | 3,644 | 3,945 | 3,608 | 3,525 | 3,739 | 3,494 | 3,266 | 3,477 | 3,493 | 3,240
Graph of time taken to store device nodes
Observations
- Storing data nodes has linear time complexity (as expected).
- Raw performance is roughly 3,500 fragments per second for the given test setup.
- Performance can be improved by enabling JDBC write batching (CPS-1795).
- There are edge cases with exponential complexity (e.g. adding books to the bookstore model).
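As a sanity check, the fragments-per-second row can be recomputed from the raw numbers in the table above; if storage is linear, throughput should stay roughly constant as the batch grows. A minimal illustrative Python sketch (not part of the CPS test suite):

```python
# Illustrative check of the storing-data-nodes results above (not CPS code).
# Each Open ROADM device node expands to 86 fragments; linear time complexity
# means fragments/second should stay roughly constant as batch size grows.

created_devices = [1, 100, 200, 300, 400, 500, 600, 700, 800, 900, 1000]
time_seconds = [0.30, 2.36, 4.36, 7.15, 9.76, 11.50, 14.77, 18.43, 19.79, 22.16, 26.54]

FRAGMENTS_PER_DEVICE = 86

rates = [d * FRAGMENTS_PER_DEVICE / t for d, t in zip(created_devices, time_seconds)]

# Ignore the single-device data point (dominated by fixed per-request overhead).
steady = rates[1:]
print(f"throughput range: {min(steady):.0f} to {max(steady):.0f} fragments/second")
```

The spread stays within roughly 3,200 to 3,900 fragments per second across a 10x range of batch sizes, which is what a linear-time operation looks like.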
...
The current database schema does not allow Hibernate to use JDBC write batching. There is work in progress to address this.
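For reference, JDBC write batching in a Spring/Hibernate application is normally enabled through configuration; a sketch, assuming Spring Boot style property names (the actual CPS change is tracked under CPS-1795):

```properties
# Illustrative Hibernate settings (assumption: Spring Boot property naming).
# JDBC write batching groups many INSERTs into a single database round trip.
spring.jpa.properties.hibernate.jdbc.batch_size=100
spring.jpa.properties.hibernate.order_inserts=true
spring.jpa.properties.hibernate.order_updates=true
```

Note that Hibernate silently skips batching for entities whose IDs use IDENTITY generation, which is a common schema-level blocker of the kind described above.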
...
There is work in progress to address this behavior using a custom algorithm using JDBC. The use of a graph database would implicitly fix such issues.
Updating data nodes
In this scenario, 1,000 Open ROADM device nodes are already defined. A number of these existing data nodes will be updated using CpsDataService::updateDataNodeAndDescendants.
...
Time (seconds)
...
69.46
...
Observations
- Updating data nodes has linear time complexity (as expected).
- Raw performance is roughly 600 fragments per second for the given model and test setup.
- Updating data nodes is 5 times slower than storing data nodes.
Commentary
This is the weak spot of all write operations. A custom algorithm comparing existing data against updated data (delta algorithm) may be required to improve performance here. This could also indicate that Hibernate is not being used effectively.
TODO mention json alphabetical sorting
See https://vladmihalcea.com/the-best-way-to-map-a-onetomany-association-with-jpa-and-hibernate/ for an explanation of why this occurs. In short, a unidirectional OneToMany mapping suffers from poor performance due to the order in which Hibernate writes out the entities. Switching to either a unidirectional ManyToOne mapping, or a bidirectional OneToMany/ManyToOne mapping, could solve the issue using pure Hibernate, but these options come with drawbacks. (To do: more details on this.)
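The delta idea mentioned above amounts to diffing existing data against updated data and writing back only what changed. A hypothetical sketch in Python (illustrative only, not the CPS implementation; all names are invented):

```python
# Hypothetical sketch of a delta algorithm: compare existing data against
# updated data and report only the leaves that actually changed. Not CPS code.

def leaf_delta(existing: dict, updated: dict) -> dict:
    """Return only the leaves whose values were added or changed."""
    return {
        key: value
        for key, value in updated.items()
        if existing.get(key) != value
    }

existing = {"device-id": "C201-7-1A-1", "status": "success", "ne-state": "inservice"}
updated = {"device-id": "C201-7-1A-1", "status": "fail", "ne-state": "jeopardy"}

changes = leaf_delta(existing, updated)
print(changes)  # only 'status' and 'ne-state' differ
```

With such a delta in hand, unchanged fragments need never be rewritten, which is where the potential speed-up over `updateDataNodeAndDescendants` would come from.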
Updating data nodes
NOTE: I have removed the previous test results here, as the test analysis was flawed. Analysis is ongoing as per CPS-1674.
Performance tests to support new analysis are provisionally available at: https://gerrit.nordix.org/c/onap/cps/+/19086
Updating data leaves
In this scenario, 1,000 Open ROADM device nodes are already defined. The data leaves of a number of these existing data nodes will be updated using CpsDataService::updateNodeLeaves.
...
```json
{ 'openroadm-device': [ {'device-id':'C201-7-1A-1', 'status':'fail', 'ne-state':'jeopardy'} ] }
```
Test Results
...
Updated device nodes | 1 | 100 | 200 | 300 | 400 | 500 | 600 | 700 | 800 | 900 | 1,000
---|---|---|---|---|---|---|---|---|---|---|---|
Fragments updated | 1 | 100 | 200 | 300 | 400 | 500 | 600 | 700 | 800 | 900 | 1,000
Time (seconds) | 0.20 | 0.27 | 0.28 | 0.28 | 0.32 | 0.38 | 0.39 | 0.47 | 0.49 | 0.52 | 0.56
Fragments per second | 5 | 370 | 714 | 1,071 | 1,250 | 1,316 | 1,539 | 1,489 | 1,633 | 1,731 | 1,786
Graph of updating data leaves of device nodes
Observations
- Updating data leaves has linear time complexity.
- Raw performance is around 1,500 fragments per second.
- This is very fast compared to updating whole data nodes, and should be preferred where possible.
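The size of that advantage follows from the two tables: updating a whole device node rewrites all 86 of its fragments, while updating its data leaves touches only the single fragment holding the leaves. A back-of-envelope comparison using only the rates quoted in this document (illustrative Python):

```python
# Back-of-envelope comparison using the rates reported in this document.
# Whole-node update rewrites all 86 fragments of a device node; a leaf
# update touches only the 1 fragment that holds the leaves.

FRAGMENTS_PER_DEVICE = 86
whole_node_fragments_per_sec = 600    # CpsDataService::updateDataNodeAndDescendants
leaf_update_fragments_per_sec = 1500  # CpsDataService::updateNodeLeaves

whole_node_devices_per_sec = whole_node_fragments_per_sec / FRAGMENTS_PER_DEVICE
leaf_update_devices_per_sec = leaf_update_fragments_per_sec  # 1 fragment per device

speedup = leaf_update_devices_per_sec / whole_node_devices_per_sec
print(f"{speedup:.0f}x more devices per second via leaf updates")
```

So measured per device rather than per fragment, leaf updates come out roughly two orders of magnitude faster for this model.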
...
Deleting data nodes
In this scenario, 300 Open ROADM device nodes are already defined. A number of these data nodes will be deleted using CpsDataService::deleteDataNodes. The types of nodes will be varied, for example container nodes, list elements, or whole lists.
Test results
N = | 50 | 100 | 150 | 200 | 250 | 300 | Example xpath |
---|---|---|---|---|---|---|---|
Delete top-level container node | - | - | - | - | - | 0.63 | /openroadm-devices |
Batch delete N/300 container nodes | 0.15 | 0.26 | 0.38 | 0.45 | 0.55 | 0.69 | /openroadm-devices/openroadm-device[@device-id='C201-7-1A-10']/org-openroadm-device |
Batch delete N/300 list elements | 0.13 | 0.25 | 0.34 | 0.45 | 0.55 | 0.67 | /openroadm-devices/openroadm-device[@device-id='C201-7-1A-49'] |
Batch delete N/300 whole lists | 0.51 | 1.05 | 1.40 | 1.85 | 2.13 | 2.56 | /openroadm-devices/openroadm-device[@device-id='C201-7-1A-293']/org-openroadm-device/degree |
Try batch delete N/300 non-existing | 0.25 | 0.54 | 0.67 | 0.95 | 1.15 | 1.32 | /path/to/non-existing/node[@id='27'] |
...
- Delete performance is linear on the amount of data being deleted (as expected).
- Raw performance of deleting container nodes is around 35,000 fragments per second (so data nodes can be deleted around 10x faster than they can be created).
- Deleting lists is much slower than deleting the parent container of the list (this can be improved).
- Of note, attempting to delete non-existing data nodes takes longer than actually deleting the equivalent number of nodes with descendants - it is a slow operation.
...
Reading the top-level container node with no descendants:
Total device nodes | 500 | 1,000 | 1,500 | 2,000 | 2,500 | 3,000
---|---|---|---|---|---|---|
Fragments read | 1 | 1 | 1 | 1 | 1 | 1
Time (milliseconds) | 47 | 52 | 48 | 56 | 48 | 47
The above data clearly indicates constant time.
Reading the top-level container node with all descendants:
Total device nodes | 500 | 1,000 | 1,500 | 2,000 | 2,500 | 3,000
---|---|---|---|---|---|---|
Fragments read | 43,000 | 86,000 | 129,000 | 172,000 | 215,000 | 258,000
Time (seconds) | 0.42 | 1.19 | 1.54 | 2.16 | 2.53 | 2.67
Fragments per second | 102,381 | 72,269 | 83,766 | 79,630 | 85,657 | 96,629
Graph of time taken to read top-level container node with all descendants
Observations
- Reading a single top-level container node with no descendants has constant time (as expected).
- Reading a single top-level container node with all descendants has linear time (as expected).
- Raw performance of reading with all descendants is roughly 100,000 fragments per second.
...
This test uses CpsDataService::getDataNodesForMultipleXpaths with all descendants to retrieve a varying number of Open ROADM device nodes.
Test results
Total device nodes | 500 | 1,000 | 1,500 | 2,000 | 2,500 | 3,000 |
---|---|---|---|---|---|---|
Time (seconds) | 0.61 | 1.15 | 1.52 | 2.14 | 2.96 | 3.97 |
Special case: attempting to read multiple non-existing data nodes
In this case, we attempt to read many non-existing data nodes:
...
The above case appears to be constant time; strictly it must be at least linear in the number of xpaths requested, but the time is negligible in practice.
Total device nodes | 500 | 1,000 | 1,500 | 2,000 | 2,500 | 3,000 |
---|---|---|---|---|---|---|
Fragments read | 43,000 | 86,000 | 129,000 | 172,000 | 215,000 | 258,000 |
Time (seconds) | 0.61 | 1.15 | 1.52 | 2.14 | 2.96 | 3.97 |
Observations
- Reading many data nodes with all descendants has linear time (as expected).
- Attempting to read many non-existing data nodes takes negligible time.
- Raw performance of reading many with all descendants is roughly 80,000 fragments per second.
...
Recently, functionality was added to enable reading whole lists (CPS-1696). Here we compare performance of reading a container node containing a list, versus reading the list (with all descendants).
Total device nodes | 500 | 1,000 | 1,500 | 2,000 | 2,500 | 3,000 | xpath
---|---|---|---|---|---|---|---|
Reading container (seconds) | 0.386 | 0.712 | 1.529 | 2.667 | 1.759 | 3.112 | /openroadm-devices
Reading list (seconds) | 0.585 | 1.335 | 2.036 | 2.860 | 2.769 | 3.949 | /openroadm-devices/openroadm-device
...
CM-handle registration
Release | Date | CM-handles | 100 | 500 | 1,000 | 2,000 | 5,000 | 10,000 | 20,000 | 40,000 | Comments
---|---|---|---|---|---|---|---|---|---|---|---|
Kohn | Oct 2022 | Time taken | 8 sec | 16 sec | 19 sec | 33 sec | 1 min | 3 min | ERROR | ERROR | Error due to DB running out of shared memory
London | May 2023 | Time taken | 6 sec | 7 sec | 12 sec | 22 sec | 1 min | 3 min | 10 min | 32 min |
current | Aug 2023 | Time taken | 6 sec | 7 sec | 10 sec | 16 sec | 31 sec | 57 sec | 71 sec | 108 sec |
...
CM-handle de-registration
Release | Date | CM-handles removed | 100 | 500 | 1,000 | 2,000 | 5,000 | 10,000 | 20,000 | 40,000 | Comments
---|---|---|---|---|---|---|---|---|---|---|---|
Kohn | Oct 2022 | Time taken | 7 sec | 2 min | 7 min | 25 min | 2.5 hour | est: 10 hour | est: 2 days | est: 7 days | Some values estimated due to time constraints
London | May 2023 | Time taken | < 1 sec | 2 sec | 3 sec | 5 sec | 17 sec | 37 sec | 85 sec | ERROR | Error due to 32K limit
current | Aug 2023 | Time taken | < 1 sec | 1 sec | 3 sec | 4 sec | 14 sec | 23 sec | 65 sec | 2 min |
Current release has exactly linear performance for CM-handle de-registration (on a single thread):
CM-handles removed | Time (sec) | CM-handles/sec |
---|---|---|
500 | 1.5 | 327 |
1,000 | 2.6 | 377 |
5,000 | 13.2 | 377 |
10,000 | 25.9 | 385 |
20,000 | 56.1 | 356 |
Commentary
CM-handle de-registration is a synchronous operation. Since the above tests send a single large batch, the operation is performed in a single thread, and the linear performance reflects this. If the operation were broken into many smaller requests, the expectation is that it would complete many times faster - but this has not been tested.
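The untested suggestion above can be sketched as follows; the chunk size, thread count, and worker function are hypothetical, and this is illustrative Python rather than NCMP code:

```python
# Illustrative sketch: splitting one large synchronous de-registration batch
# into smaller chunks processed in parallel. Chunk size, worker count, and
# the worker function are assumptions; this is not NCMP code.
from concurrent.futures import ThreadPoolExecutor

def chunked(items, size):
    """Yield successive fixed-size chunks of a list."""
    for start in range(0, len(items), size):
        yield items[start:start + size]

def deregister_chunk(cm_handle_ids):
    # Placeholder for one de-registration request. At the measured rate of
    # roughly 370 CM-handles/sec per thread, a chunk of 1,000 would take
    # about 3 seconds, so chunks could overlap across threads.
    return len(cm_handle_ids)

cm_handles = [f"cm-handle-{i}" for i in range(20_000)]

with ThreadPoolExecutor(max_workers=8) as pool:
    removed = sum(pool.map(deregister_chunk, chunked(cm_handles, 1_000)))

print(removed)  # 20000
```

Whether the database would sustain the single-thread rate across concurrent requests is exactly the open question noted above.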