CPS Core Performance

Test Environment

TODO: Laptop specs

The performance tests are written in Groovy (a JVM language). Because all CPS Core operations are synchronous, the results here reflect single-threaded operation only.

Test data

The test data complies with the Open ROADM YANG model. Specifically, openroadm-device nodes consisting of 86 fragments each are created. For example, a test that creates 1000 device nodes results in 86,000 fragments in the database. Some tests use up to 3000 device nodes (258,000 fragments, equivalent to around 20,000 CM-handles in NCMP), with four anchors replicating the data, meaning that the system has been tested with up to about 1 million fragments. Not all results are shown on this page; the full set is included in the attached spreadsheet.
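The fragment counts quoted above follow directly from the 86 fragments per device node. A minimal sketch of the arithmetic (the function name is only illustrative and is not part of CPS):

```python
# Figure taken from the test-data description: each openroadm-device node has 86 fragments.
FRAGMENTS_PER_DEVICE = 86


def total_fragments(device_nodes: int, anchors: int = 1) -> int:
    """Fragments stored for the given number of device nodes, replicated over `anchors` anchors."""
    return device_nodes * FRAGMENTS_PER_DEVICE * anchors


print(total_fragments(1000))             # 86000
print(total_fragments(3000))             # 258000
print(total_fragments(3000, anchors=4))  # 1032000
```

The last figure is the "about 1 million fragments" upper bound of the tests.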

Storing data nodes

A varying number of Open ROADM device nodes will be stored using CpsDataService::saveData.

Created device nodes	1	100	200	300	400	500	600	700	800	900	1000
Time (seconds)	0.295	2.36	4.36	7.15	9.76	11.50	14.77	18.43	19.79	22.16	26.54

Observations

Storing data nodes has linear time complexity (as expected).
Raw performance is roughly 3000 fragments per second for the given test setup (86,000 fragments in 26.54 seconds).
Performance can be improved by enabling write batching (CPS-1795).
There are edge cases with exponential complexity.
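As a rough cross-check of the linearity and throughput observations, the table values can be fitted with an ordinary least-squares line. This is only a sketch over the numbers published on this page; the helper function is illustrative and not part of the test suite.

```python
FRAGMENTS_PER_DEVICE = 86

# Values copied from the "Storing data nodes" table.
devices = [1, 100, 200, 300, 400, 500, 600, 700, 800, 900, 1000]
seconds = [0.295, 2.36, 4.36, 7.15, 9.76, 11.50, 14.77, 18.43, 19.79, 22.16, 26.54]


def least_squares(xs, ys):
    """Return (slope, intercept) of the ordinary least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


slope, intercept = least_squares(devices, seconds)

# Overall rate at the largest data point, and marginal rate implied by the fitted slope.
overall_rate = devices[-1] * FRAGMENTS_PER_DEVICE / seconds[-1]
marginal_rate = FRAGMENTS_PER_DEVICE / slope

print(f"seconds per device node (slope): {slope:.4f}")
print(f"overall rate at 1000 device nodes: {overall_rate:.0f} fragments/s")
print(f"marginal rate from the fit: {marginal_rate:.0f} fragments/s")
```

Both rates land in the low thousands of fragments per second, in line with the "roughly 3000" figure.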

Updating data nodes

In this scenario, 1000 Open ROADM device nodes are already defined. A number of these existing data nodes will be updated using CpsDataService::updateDataNodeAndDescendants.

Updated device nodes	1	100	200	300	400	500	600	700	800	900	1000
Time (seconds)	0.215	12.79	28.38	44.23	51.55	69.46	85.67	95.02	109.16	117.00	131.15

Observations

Updating data nodes has linear time complexity (as expected).
Raw performance is roughly 600 fragments per second for the given model and test setup.
Updating data nodes is about 5 times slower than storing data nodes (131.15 versus 26.54 seconds for 1000 device nodes).

Updating data leaves

In this scenario, 1000 Open ROADM device nodes are already defined. The data leaves of a number of these existing data nodes will be updated using CpsDataService::updateNodeLeaves.

Example JSON payload for updating data leaves for one device:

{
    "openroadm-device": [
        {"device-id": "C201-7-1A-1", "status": "fail", "ne-state": "jeopardy"}
    ]
}
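For a test that updates many devices in one call, a payload of the same shape can be generated programmatically. The sketch below assumes that device IDs follow the pattern C201-7-1A-<n>, as the example xpaths on this page suggest; the function name is hypothetical and this is not the actual test code.

```python
import json


def leaves_update_payload(device_count: int) -> str:
    """Build a JSON payload updating 'status' and 'ne-state' for the first `device_count` devices.

    Assumption: device IDs are numbered C201-7-1A-1, C201-7-1A-2, and so on.
    """
    devices = [
        {"device-id": f"C201-7-1A-{n}", "status": "fail", "ne-state": "jeopardy"}
        for n in range(1, device_count + 1)
    ]
    return json.dumps({"openroadm-device": devices})


payload = leaves_update_payload(3)
print(payload)
```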

Test results

Updated device nodes	1	100	200	300	400	500	600	700	800	900	1000
Time (seconds)	0.201	0.266	0.276	0.280	0.317	0.379	0.385	0.465	0.485	0.520	0.561

Observations

Updating data leaves has linear time complexity, on top of a fixed overhead of roughly 0.2 seconds per call.
Raw performance is about 3000 fragments per second once that fixed overhead is excluded.
This is very fast compared to updating whole data nodes. I recommend that NCMP use this API for updating CM-handle state.
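The comparison between the three write operations can be made concrete with the figures for 1000 device nodes. This is a sketch over the published table values only; treating the single-device time as the fixed per-call overhead is an assumption.

```python
# Times in seconds, copied from the tables on this page.
store_nodes_1000 = 26.54       # CpsDataService::saveData, 1000 device nodes
update_nodes_1000 = 131.15     # CpsDataService::updateDataNodeAndDescendants, 1000 device nodes
update_leaves_1 = 0.201        # CpsDataService::updateNodeLeaves, 1 device node
update_leaves_1000 = 0.561     # CpsDataService::updateNodeLeaves, 1000 device nodes

update_vs_store = update_nodes_1000 / store_nodes_1000
nodes_vs_leaves = update_nodes_1000 / update_leaves_1000

# Assumption: the time for one device approximates the fixed overhead of a call,
# so the remaining time is spent on the other 999 device nodes.
marginal_leaf_rate = (1000 - 1) / (update_leaves_1000 - update_leaves_1)

print(f"update nodes vs store nodes: {update_vs_store:.1f}x slower")
print(f"update nodes vs update leaves: {nodes_vs_leaves:.0f}x slower")
print(f"marginal leaf-update rate: {marginal_leaf_rate:.0f} fragments/s")
```

Updating only the leaves is more than two orders of magnitude faster than updating whole device nodes in this setup, which is the basis for the recommendation above.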

Deleting data nodes

In this scenario, 300 Open ROADM device nodes are already defined. A number of these data nodes will be deleted using CpsDataService::deleteDataNodes. The types of nodes will be varied, for example, deleting container nodes, list elements, or whole lists. All times in the table below are in seconds.

Test results

N	50	100	150	200	250	300	Example xpath
Delete top-level container node	-	-	-	-	-	0.630	/openroadm-devices
Batch delete N/300 container nodes	0.150	0.261	0.377	0.453	0.553	0.686	/openroadm-devices/openroadm-device[@device-id='C201-7-1A-10']/org-openroadm-device
Batch delete N/300 list elements	0.132	0.248	0.338	0.449	0.545	0.670	/openroadm-devices/openroadm-device[@device-id='C201-7-1A-49']
Batch delete N/300 whole lists	0.509	1.054	1.401	1.848	2.134	2.555	/openroadm-devices/openroadm-device[@device-id='C201-7-1A-293']/org-openroadm-device/degree
Try batch delete N/300 non-existing	0.250	0.535	0.667	0.951	1.145	1.318	/path/to/non-existing/node[@id='27']

Observations

Delete performance is linear in the amount of data being deleted (as expected).
Raw performance of deleting containers or list elements is around 40,000 fragments per second. (So data nodes can be deleted around 10x faster than they can be created.)
Deleting whole lists is much slower than deleting the parent container of the list (this can be easily improved).
Of note, attempting to delete non-existing data nodes takes longer than actually deleting the equivalent number of nodes with descendants; it is a slow operation.

Suggested improvement: For whole list deletion, add a condition to the WHERE clause in the SQL for deleting lists, to narrow the search space to children of the parent. For example:

SELECT * FROM fragment WHERE (existing conditions)
  AND parent_id = (SELECT id FROM fragment WHERE xpath = '/parent-xpath')

This should narrow the performance gap in this case.
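The suggested extra condition can be tried out on a toy table. The sketch below uses an in-memory SQLite database with a hypothetical three-column fragment table and a LIKE pattern standing in for "(existing conditions)"; the real CPS schema, indexes and query are not shown on this page and may differ. It only checks that the narrowed query returns the same rows. Whether it is faster depends on the indexes available in the real database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, simplified schema: the real fragment table has more columns.
conn.execute("CREATE TABLE fragment (id INTEGER PRIMARY KEY, parent_id INTEGER, xpath TEXT)")
conn.execute("CREATE INDEX fragment_parent_idx ON fragment (parent_id)")

# Two parents, each with three list elements.
conn.execute("INSERT INTO fragment (id, parent_id, xpath) VALUES (1, NULL, '/parent-xpath')")
conn.execute("INSERT INTO fragment (id, parent_id, xpath) VALUES (2, NULL, '/unrelated')")
children = []
for i in range(1, 4):
    children.append((1, f"/parent-xpath/item[@id='{i}']"))
    children.append((2, f"/unrelated/item[@id='{i}']"))
conn.executemany("INSERT INTO fragment (parent_id, xpath) VALUES (?, ?)", children)

# Assumption: stand-in for the "(existing conditions)" that select the list elements.
EXISTING_CONDITIONS = "xpath LIKE '/parent-xpath/item[%'"

existing_sql = f"SELECT xpath FROM fragment WHERE {EXISTING_CONDITIONS} ORDER BY xpath"
narrowed_sql = (
    f"SELECT xpath FROM fragment WHERE {EXISTING_CONDITIONS} "
    "AND parent_id = (SELECT id FROM fragment WHERE xpath = '/parent-xpath') "
    "ORDER BY xpath"
)

existing_rows = conn.execute(existing_sql).fetchall()
narrowed_rows = conn.execute(narrowed_sql).fetchall()

print(existing_rows == narrowed_rows)  # True
print(len(narrowed_rows))              # 3
```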

Reading data nodes

In these tests, a varying number of Open ROADM devices are created and retrieved.

Reading top-level container node

In this test, CpsDataService::getDataNodes is used to retrieve the top-level container node.

Test results

Reading the top-level container node with no descendants:

Total device nodes	500	1000	1500	2000	2500	3000
Time (milliseconds)	47	52	48	56	48	47

The above data clearly indicates constant time.

Reading the top-level container node with all descendants:

Total device nodes	500	1000	1500	2000	2500	3000
Time (seconds)	0.423	1.189	1.536	2.159	2.526	2.696

Observations

Reading a single top-level container node with no descendants has constant time (as expected).
Reading a single top-level container node with all descendants has linear time (as expected).
Raw performance of reading with all descendants is roughly 100,000 fragments per second.

Reading data nodes for multiple xpaths

This test uses CpsDataService::getDataNodesForMultipleXpaths with all descendants to retrieve a varying number of Open ROADM device nodes.

Test results

Total device nodes	500	1000	1500	2000	2500	3000
Time (seconds)	0.619	1.151	1.522	2.136	2.957	3.965

Special case: attempting to read multiple non-existing data nodes

In this case, we attempt to read many non-existing data nodes:

Total device nodes	500	1000	1500	2000	2500	3000
Time (milliseconds)	10	10	9	9	7	8

The above case appears to be constant time, but theoretically must be linear - we'll call it negligible time.

Observations

Reading many data nodes with all descendants has linear time (as expected).
Attempting to read many non-existing data nodes takes negligible time.
Raw performance of reading many with all descendants is roughly 80,000 fragments per second.

Additional test cases: Reading container node versus list

Recently, functionality was added to enable reading whole lists (CPS-1696). Here we compare the performance of reading a container node containing a list with that of reading the list itself (with all descendants). Times are in seconds.

Total device nodes	500	1000	1500	2000	2500	3000	xpath
Reading container	0.386	0.712	1.529	2.667	1.759	3.112	/openroadm-devices
Reading list	0.585	1.335	2.036	2.860	2.769	3.949	/openroadm-devices/openroadm-device

As can be seen, it is slower reading the list than reading the parent node containing the list.

Suggested improvement: Add a condition to the WHERE clause in the SQL for reading lists, to narrow the search space to children of the parent. For example:

SELECT * FROM fragment WHERE (existing conditions)
  AND parent_id = (SELECT id FROM fragment WHERE xpath = '/parent-xpath')

This should narrow the performance gap in this case.

CPS Path Queries

TODO

Comparison of NCMP Performance across versions

CM-handle registration

Release	Date	CmHandles	100	500	1000	2000	5000	10000	20000	40000	Comments
Kohn	Oct 2022	Time	8 sec	16 sec	17 sec	33 sec	1 min	3 min	ERROR	ERROR	Error due to DB running out of shared memory
London	May 2023	Time	6 sec	7 sec	12 sec	22 sec	1 min	2 min	10 min	32 min
current	Aug 2023	Time	6 sec	7 sec	10 sec	16 sec	31 sec	57 sec	71 sec	108 sec

CM-handle registration is multi-threaded, so performance may appear to scale better than linear, until the CPU cores are maxed out.
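The better-than-linear appearance shows up when the registration times for the current release are converted to throughput. A small sketch using the rounded times from the table (so the rates are approximate):

```python
# "current" (Aug 2023) row of the CM-handle registration table, times in seconds.
cm_handles = [100, 500, 1000, 2000, 5000, 10000, 20000, 40000]
seconds = [6, 7, 10, 16, 31, 57, 71, 108]

rates = [n / t for n, t in zip(cm_handles, seconds)]

for n, rate in zip(cm_handles, rates):
    print(f"{n:>6} CM-handles: {rate:6.1f} CM-handles/s")
```

Throughput rises with every step in batch size, from under 20 CM-handles per second at 100 CM-handles to roughly 370 per second at 40,000, which is consistent with the worker threads not yet being saturated.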

As can be seen below, CPU usage never reached 100% during the tests of the current version.

CM-handle deregistration

Release	Date	CmHandles	100	500	1000	2000	5000	10000	20000	40000	Comments
Kohn	Oct 2022	Time	7 sec	2 min	7 min	25 min	2.5 hours	est: 10 hours	est: 2 days	est: 7 days	Some values estimated due to time constraints
London	May 2023	Time	< 1 sec	2 sec	3 sec	5 sec	17 sec	37 sec	85 sec	ERROR	Error due to 32K limit
current	Aug 2023	Time	< 1 sec	1 sec	3 sec	4 sec	14 sec	23 sec	65 sec	2 min

The current release has approximately linear performance for CM-handle deregistration (on a single thread), at roughly 330 to 385 CM-handles per second:

Removed	Time (sec)	CM-handles/sec
500	1.53	327
1000	2.65	377
5000	13.26	377
10000	25.93	385
20000	56.15	356
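The rate column can be recomputed from the first two columns to check how close to linear the behaviour is. A minimal sketch using only the values in the table above:

```python
# Values copied from the deregistration table above.
removed = [500, 1000, 5000, 10000, 20000]
seconds = [1.53, 2.65, 13.26, 25.93, 56.15]

rates = [n / t for n, t in zip(removed, seconds)]
mean_rate = sum(rates) / len(rates)

# Relative spread between the fastest and slowest rate; 0 would mean perfectly linear.
spread = (max(rates) - min(rates)) / mean_rate

for n, rate in zip(removed, rates):
    print(f"{n:>6} removed: {rate:.1f} CM-handles/s")
print(f"mean rate: {mean_rate:.0f} CM-handles/s, relative spread: {spread:.0%}")
```

The rates stay within a band of under 20% around the mean across a 40-fold increase in the number of CM-handles removed.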

Montreal Read/Write Performance
