CPS Core Performance
Test Environment
TODO: Laptop specs
The performance tests are written in Groovy (a JVM language). As all CPS Core operations are synchronous, the results here reflect single-threaded operation only.
Test data
The test data complies with the Open ROADM YANG model. Specifically, openroadm-device nodes consisting of 86 fragments each are created. For example, a test that creates 1000 device nodes will result in 86,000 fragments in the database. Some tests use up to 3000 device nodes (258,000 fragments, equivalent to around 20,000 CM-handles in NCMP), with four anchors replicating the data, meaning that the system has been tested with up to 1 million fragments. Not all results are displayed on this page; the full results are included in the attached spreadsheet.
Storing data nodes
A varying number of Open ROADM device nodes will be stored using CpsDataService::saveData.
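For illustration, here is a minimal Groovy sketch of such a call (not the actual test code). It assumes an injected CpsDataService instance and a (dataspace, anchor, parent xpath, JSON, observed timestamp) parameter pattern; the dataspace and anchor names are invented for this example and the JSON payload is heavily truncated compared to the real 86-fragment device tree.

import java.time.OffsetDateTime

// 'cpsDataService' is assumed to be an injected CpsDataService bean.
// Illustrative, heavily truncated payload; the real test payload contains the full device tree.
def deviceJson = '{ "openroadm-device": [ { "device-id": "C201-7-1A-1" } ] }'
cpsDataService.saveData('openroadm-dataspace', 'openroadm-anchor', '/openroadm-devices', deviceJson, OffsetDateTime.now())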
Created device nodes | 1 | 100 | 200 | 300 | 400 | 500 | 600 | 700 | 800 | 900 | 1000 |
---|---|---|---|---|---|---|---|---|---|---|---|
Time (seconds) | 0.295 | 2.36 | 4.36 | 7.15 | 9.76 | 11.50 | 14.77 | 18.43 | 19.79 | 22.16 | 26.54 |
Observations
- Storing data nodes has linear time complexity (as expected).
- Raw performance is roughly 3000 fragments per second for the given test setup.
- Performance can be improved by enabling write batching (CPS-1795).
- There are edge cases with exponential complexity.
Updating data nodes
In this scenario, 1000 Open ROADM device nodes are already defined. A number of these existing data nodes will be updated using CpsDataService::updateDataNodeAndDescendants.
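A hedged Groovy sketch of such an update, using the same assumed dataspace/anchor names and injected CpsDataService bean as the storing example above (the exact overload may differ by release):

import java.time.OffsetDateTime

// Replaces an existing device node and all of its descendants with the supplied JSON.
def updatedDeviceJson = '{ "openroadm-device": [ { "device-id": "C201-7-1A-1" } ] }'  // truncated for illustration
cpsDataService.updateDataNodeAndDescendants('openroadm-dataspace', 'openroadm-anchor', '/openroadm-devices', updatedDeviceJson, OffsetDateTime.now())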
Updated device nodes | 1 | 100 | 200 | 300 | 400 | 500 | 600 | 700 | 800 | 900 | 1000 |
---|---|---|---|---|---|---|---|---|---|---|---|
Time (seconds) | 0.215 | 12.79 | 28.38 | 44.23 | 51.55 | 69.46 | 85.67 | 95.02 | 109.16 | 117.00 | 131.15 |
Observations
- Updating data nodes has linear time complexity (as expected).
- Raw performance is roughly 600 fragments per second for the given model and test setup.
- Updating data nodes is 5 times slower than storing data nodes.
Updating data leaves
In this scenario, 1000 Open ROADM device nodes are already defined. The data leaves of a number of these existing data nodes will be updated using CpsDataService::updateNodeLeaves.
Example JSON payload for updating data leaves for one device:
{ "openroadm-device": [ { "device-id": "C201-7-1A-1", "status": "fail", "ne-state": "jeopardy" } ] }
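A hedged Groovy sketch of the corresponding call, using the payload above and the same assumed dataspace/anchor names and injected bean as the earlier examples:

import java.time.OffsetDateTime

// Only the supplied leaves are updated; descendants of the device node are left untouched.
def leavesJson = '{ "openroadm-device": [ { "device-id": "C201-7-1A-1", "status": "fail", "ne-state": "jeopardy" } ] }'
cpsDataService.updateNodeLeaves('openroadm-dataspace', 'openroadm-anchor', '/openroadm-devices', leavesJson, OffsetDateTime.now())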
Test results
Updated device nodes | 1 | 100 | 200 | 300 | 400 | 500 | 600 | 700 | 800 | 900 | 1000 |
---|---|---|---|---|---|---|---|---|---|---|---|
Time (seconds) | 0.201 | 0.266 | 0.276 | 0.280 | 0.317 | 0.379 | 0.385 | 0.465 | 0.485 | 0.520 | 0.561 |
Observations
- Updating data leaves has linear time complexity.
- Raw performance is about 3000 fragments per second.
- This is very fast compared to updating whole data nodes. I recommend that NCMP use this API for updating CM-handle state.
Deleting data nodes
In this scenario, 300 Open ROADM device nodes are already defined. A number of these data nodes will be deleted using CpsDataService::deleteDataNodes. The types of nodes will be varied, for example, deleting container nodes, list elements, or whole lists.
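A hedged Groovy sketch of a batch delete, assuming an overload of CpsDataService::deleteDataNodes that accepts a collection of xpaths (dataspace/anchor names and device ids are illustrative):

import java.time.OffsetDateTime

// Batch delete of two list elements; the tests vary the xpaths to target containers, list elements or whole lists.
def xpathsToDelete = [
    "/openroadm-devices/openroadm-device[@device-id='C201-7-1A-49']",
    "/openroadm-devices/openroadm-device[@device-id='C201-7-1A-50']"
]
cpsDataService.deleteDataNodes('openroadm-dataspace', 'openroadm-anchor', xpathsToDelete, OffsetDateTime.now())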
Test results
Operation (times in seconds) | N=50 | N=100 | N=150 | N=200 | N=250 | N=300 | Example xpath |
---|---|---|---|---|---|---|---|
Delete top-level container node | - | - | - | - | - | 0.630 | /openroadm-devices |
Batch delete N/300 container nodes | 0.150 | 0.261 | 0.377 | 0.453 | 0.553 | 0.686 | /openroadm-devices/openroadm-device[@device-id='C201-7-1A-10']/org-openroadm-device |
Batch delete N/300 list elements | 0.132 | 0.248 | 0.338 | 0.449 | 0.545 | 0.670 | /openroadm-devices/openroadm-device[@device-id='C201-7-1A-49'] |
Batch delete N/300 whole lists | 0.509 | 1.054 | 1.401 | 1.848 | 2.134 | 2.555 | /openroadm-devices/openroadm-device[@device-id='C201-7-1A-293']/org-openroadm-device/degree |
Try batch delete N/300 non-existing | 0.250 | 0.535 | 0.667 | 0.951 | 1.145 | 1.318 | /path/to/non-existing/node[@id='27'] |
Observations
- Delete performance is linear in the amount of data being deleted (as expected).
- Raw performance of deleting containers or list elements is around 40,000 fragments per second, so data nodes can be deleted around 10 times faster than they can be created.
- Deleting whole lists is much slower than deleting the parent container of the list (this can easily be improved; see the suggested improvement below).
- Of note, attempting to delete non-existing data nodes takes longer than actually deleting the equivalent number of existing nodes with descendants - it is a slow operation.
Suggested improvement: For whole list deletion, add a condition to the WHERE clause in the SQL for deleting lists, to narrow the search space to children of the parent. For example:
SELECT * FROM fragment WHERE (existing conditions) AND parent_id = (SELECT id FROM fragment WHERE xpath = '/parent-xpath')
This should narrow the performance gap in this case.
Reading data nodes
In these tests, a varying number of Open ROADM devices are created and retrieved.
Reading top-level container node
In this test, CpsDataService::getDataNodes is used to retrieve the top-level container node.
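A hedged Groovy sketch of both reads, assuming a FetchDescendantsOption parameter with OMIT_DESCENDANTS / INCLUDE_ALL_DESCENDANTS constants (names as used in recent CPS releases; dataspace/anchor names are illustrative as before):

// Read only the top-level container node itself.
def containerOnly = cpsDataService.getDataNodes('openroadm-dataspace', 'openroadm-anchor', '/openroadm-devices', FetchDescendantsOption.OMIT_DESCENDANTS)
// Read the top-level container node together with all of its descendants.
def wholeTree = cpsDataService.getDataNodes('openroadm-dataspace', 'openroadm-anchor', '/openroadm-devices', FetchDescendantsOption.INCLUDE_ALL_DESCENDANTS)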
Test results
Reading the top-level container node with no descendants:
Total device nodes | 500 | 1000 | 1500 | 2000 | 2500 | 3000 |
---|---|---|---|---|---|---|
Time (milliseconds) | 47 | 52 | 48 | 56 | 48 | 47 |
The above data clearly indicates constant time.
Reading the top-level container node with all descendants:
Total device nodes | 500 | 1000 | 1500 | 2000 | 2500 | 3000 |
---|---|---|---|---|---|---|
Time (seconds) | 0.423 | 1.189 | 1.536 | 2.159 | 2.526 | 2.696 |
Observations
- Reading a single top-level container node with no descendants has constant time (as expected).
- Reading a single top-level container node with all descendants has linear time (as expected).
- Raw performance of reading with all descendants is roughly 100,000 fragments per second.
Reading data nodes for multiple xpaths
This test uses CpsDataService::getDataNodesForMultipleXpaths with all descendants to retrieve a varying number of Open ROADM device nodes.
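A hedged Groovy sketch of such a bulk read, with the same assumptions as the earlier examples (the device-id pattern follows the examples elsewhere on this page):

// Build xpaths for 1000 device list elements and read them with all descendants in one call.
def xpaths = (1..1000).collect { "/openroadm-devices/openroadm-device[@device-id='C201-7-1A-$it']" }
def dataNodes = cpsDataService.getDataNodesForMultipleXpaths('openroadm-dataspace', 'openroadm-anchor', xpaths, FetchDescendantsOption.INCLUDE_ALL_DESCENDANTS)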
Test results
Total device nodes | 500 | 1000 | 1500 | 2000 | 2500 | 3000 |
---|---|---|---|---|---|---|
Time (seconds) | 0.619 | 1.151 | 1.522 | 2.136 | 2.957 | 3.965 |
Special case: attempting to read multiple non-existing data nodes
In this case, we attempt to read many non-existing data nodes:
Total device nodes | 500 | 1000 | 1500 | 2000 | 2500 | 3000 |
---|---|---|---|---|---|---|
Time (milliseconds) | 10 | 10 | 9 | 9 | 7 | 8 |
The above case appears to be constant time, but theoretically it must be at least linear in the number of xpaths looked up - we'll call it negligible time.
Observations
- Reading many data nodes with all descendants has linear time (as expected).
- Attempting to read many non-existing data nodes takes negligible time.
- Raw performance of reading many with all descendants is roughly 80,000 fragments per second.
Additional test cases: Reading container node versus list
Recently, functionality was added to enable reading whole lists (CPS-1696). Here we compare performance of reading a container node containing a list, versus reading the list (with all descendants).
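A hedged Groovy sketch of the two reads being compared, assuming whole-list reads go through the same getDataNodes call with the list xpath (other assumptions as in the earlier examples):

// Reading the parent container node versus reading the whole list, both with all descendants.
def containerRead = cpsDataService.getDataNodes('openroadm-dataspace', 'openroadm-anchor', '/openroadm-devices', FetchDescendantsOption.INCLUDE_ALL_DESCENDANTS)
def listRead = cpsDataService.getDataNodes('openroadm-dataspace', 'openroadm-anchor', '/openroadm-devices/openroadm-device', FetchDescendantsOption.INCLUDE_ALL_DESCENDANTS)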
In the table below, times are in seconds and the numeric columns give the total number of device nodes.
Operation | 500 | 1000 | 1500 | 2000 | 2500 | 3000 | xpath |
---|---|---|---|---|---|---|---|
Reading container | 0.386 | 0.712 | 1.529 | 2.667 | 1.759 | 3.112 | /openroadm-devices |
Reading list | 0.585 | 1.335 | 2.036 | 2.860 | 2.769 | 3.949 | /openroadm-devices/openroadm-device |
As can be seen, reading the list is slower than reading the parent container node that contains the list.
Suggested improvement: Add a condition to the WHERE clause in the SQL for reading lists, to narrow the search space to children of the parent. For example:
SELECT * FROM fragment WHERE (existing conditions) AND parent_id = (SELECT id FROM fragment WHERE xpath = '/parent-xpath')
This should narrow the performance gap in this case.
CPS Path Queries
TODO
Comparison of NCMP Performance across versions
CM-handle registration
Release | Date | CM-handles | 100 | 500 | 1000 | 2000 | 5000 | 10000 | 20000 | 40000 | Comments |
---|---|---|---|---|---|---|---|---|---|---|---|
Kohn | Oct 2022 | Time | 8 sec | 16 sec | 17 sec | 33 sec | 1 min | 3 min | ERROR | ERROR | Error due to DB running out of shared memory |
London | May 2023 | Time | 6 sec | 7 sec | 12 sec | 22 sec | 1 min | 2 min | 10 min | 32 min | |
current | Aug 2023 | Time | 6 sec | 7 sec | 10 sec | 16 sec | 31 sec | 57 sec | 71 sec | 108 sec | |
CM-handle registration is multi-threaded, so performance may appear to scale better than linear, until the CPU cores are maxed out.
CPU usage never reached 100% during the tests of the current version.
CM-handle deregistration
Release | Date | CM-handles | 100 | 500 | 1000 | 2000 | 5000 | 10000 | 20000 | 40000 | Comments |
---|---|---|---|---|---|---|---|---|---|---|---|
Kohn | Oct 2022 | Time | 7 sec | 2 min | 7 min | 25 min | 2.5 hours | est. 10 hours | est. 2 days | est. 7 days | Some values estimated due to time constraints |
London | May 2023 | Time | < 1 sec | 2 sec | 3 sec | 5 sec | 17 sec | 37 sec | 85 sec | ERROR | Error due to 32K limit |
current | Aug 2023 | Time | < 1 sec | 1 sec | 3 sec | 4 sec | 14 sec | 23 sec | 65 sec | 2 min | |
The current release has near-linear performance for CM-handle deregistration (on a single thread):
Removed | Time (sec) | CM-handles/sec |
---|---|---|
500 | 1.53 | 327 |
1000 | 2.65 | 377 |
5000 | 13.26 | 377 |
10000 | 25.93 | 385 |
20000 | 56.15 | 356 |