CPS-2478: Module Sync Inefficiencies
References
Issues & Decisions
No. | Issue | Notes | Decision |
---|---|---|---|
1 | Calls to DB for modules (check existing Tag) | Could easily cache the Module Set Tag in memory to reduce this | @Toine Siebelink Nov 18, 2024 Implemented as part of https://gerrit.onap.org/r/c/cps/+/139344 |
2 | First batch (on each thread) calls DMI for the same Tag | Use the cache from #1, or store the first cm handle in the DB immediately instead of as part of the batch | @Toine Siebelink Nov 18, 2024 PoC-ed as part of https://gerrit.onap.org/r/c/cps/+/139344 but then replaced with a distributed Hazelcast Set instead |
3 | Store new schema set for each cm handle (instead of Tag) | Use the schema set concept (in CPS Core) to store each new Module Set Tag only once. This seems the correct usage of the schema set concept and will have the greatest performance benefit. However, it is a more costly and difficult solution because the NCMP code was developed assuming that each cm handle's schema set name is the same as its id. | @Toine Siebelink Nov 18, 2024 Not considered as part of this User Story. Create a new Technical Debt Jira instead: CPS-2506: Use ModuleSetTag for SchemaSet Names during NCMP registration (Closed) |
Analysis
A small Spock and Groovy integration test has been created to sync a few hundred cm handles using multiple threads. See https://gerrit.onap.org/r/c/cps/+/139344
Test Setup
Parameter | Value | Notes |
---|---|---|
Cm Handles | 500 | |
Module Set Tags | 2 | 250 CM Handles each |
Worker Threads (parallelism) | 2 | |
Environment | Windows 11, 13th Gen Intel(R) Core(TM) i9-13900H 2.60 GHz | |
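The setup above can be pictured with a minimal Java sketch. It is not the Spock test from the Gerrit change: the class name, the `tag-N` naming and the batch size of 100 (borrowed from the DMI delay note further down this page) are assumptions for illustration only, and the "sync" step just counts cm handles per Module Set Tag.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical harness mirroring the test parameters; not the actual CPS test code. */
class SyncTestSetupSketch {
    static final int CM_HANDLES = 500;
    static final int MODULE_SET_TAGS = 2;
    static final int WORKER_THREADS = 2;
    static final int BATCH_SIZE = 100; // assumption, see lead-in

    /** Splits the cm handles into batches and processes them on a fixed pool of worker threads. */
    static Map<String, Integer> syncAll() {
        Map<String, Integer> handledPerTag = new ConcurrentHashMap<>();
        ExecutorService workers = Executors.newFixedThreadPool(WORKER_THREADS);
        try {
            List<Future<?>> batches = new ArrayList<>();
            for (int start = 0; start < CM_HANDLES; start += BATCH_SIZE) {
                final int from = start;
                final int to = Math.min(start + BATCH_SIZE, CM_HANDLES);
                batches.add(workers.submit(() -> {
                    for (int i = from; i < to; i++) {
                        // Alternate tags so each tag ends up with an equal share of cm handles.
                        String moduleSetTag = "tag-" + (i % MODULE_SET_TAGS);
                        handledPerTag.merge(moduleSetTag, 1, Integer::sum);
                    }
                }));
            }
            for (Future<?> batch : batches) {
                batch.get(); // wait for every batch to finish
            }
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("Sync sketch failed", e);
        } finally {
            workers.shutdown();
        }
        return handledPerTag;
    }

    public static void main(String[] args) {
        System.out.println(syncAll());
    }
}
```

With these parameters each of the two tags is assigned 250 cm handles, matching the Notes column of the table.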
Registration: Measurements Before & After PoC
Method | Before (avg. 4 runs): # Calls | Before: Time Spent (ms) | Before: % | After (avg. 6 runs): # Calls | After: Time Spent (ms) | After: % | Notes (improvements) |
---|---|---|---|---|---|---|---|
query module references | 500 | 1,017 | 7% | 2 | 5 | 0% | Used a ‘privateModuleSetCache’ map to store the required data locally on each thread. The data is discarded when the thread finishes, but this eliminates the vast majority of DB calls. |
get modules from DMI | 100-200 | 1,326 | 9% | 2 | 13 | 0% | Used a Hazelcast distributed Set, ‘moduleSetTagsBeingProcessed’, to prevent multiple threads/instances from attempting to process the same new tag. |
store schema set | 500 | 10,449 | 73% | 500 | 5,156 | 86% | 2 x faster, probably due to less contention with read queries |
update states | 5+ | 1,429 | 10% | 5+ | 833 | 14% | 1.7 x faster |
Total | | 14,221 | | | 6,006 | | > 2 x faster! |
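The two mechanisms named in the Notes column can be sketched as follows. This is a simplified illustration under stated assumptions, not the code from the Gerrit change: `FakeBackend`, `Worker` and `resolveModules` are hypothetical names, the DB and DMI are replaced by an in-memory stub that counts calls, and a local concurrent `Set` stands in for the Hazelcast distributed Set.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical sketch of a per-thread cache plus a shared "being processed" set. */
class ModuleSetSyncSketch {

    /** In-memory stand-in for the DB and DMI; counts calls so the effect of caching is visible. */
    static final class FakeBackend {
        final Map<String, List<String>> storedModulesByTag = new ConcurrentHashMap<>();
        int dbQueries = 0;
        int dmiCalls = 0;

        synchronized Optional<List<String>> queryModuleReferences(String moduleSetTag) {
            dbQueries++;
            return Optional.ofNullable(storedModulesByTag.get(moduleSetTag));
        }

        synchronized List<String> getModulesFromDmi(String moduleSetTag) {
            dmiCalls++;
            return List.of(moduleSetTag + "-module-a", moduleSetTag + "-module-b");
        }
    }

    /** One instance per worker thread; its private cache is discarded together with the worker. */
    static final class Worker {
        private final Map<String, List<String>> privateModuleSetCache = new HashMap<>();
        private final Set<String> moduleSetTagsBeingProcessed; // shared between all workers
        private final FakeBackend backend;

        Worker(Set<String> moduleSetTagsBeingProcessed, FakeBackend backend) {
            this.moduleSetTagsBeingProcessed = moduleSetTagsBeingProcessed;
            this.backend = backend;
        }

        /** Returns the modules for a tag, or empty when another worker is already fetching them. */
        Optional<List<String>> resolveModules(String moduleSetTag) {
            List<String> cached = privateModuleSetCache.get(moduleSetTag);
            if (cached != null) {
                return Optional.of(cached); // no DB call needed
            }
            Optional<List<String>> stored = backend.queryModuleReferences(moduleSetTag);
            if (stored.isPresent()) {
                privateModuleSetCache.put(moduleSetTag, stored.get());
                return stored;
            }
            // Set.add returns false if the tag is already present, i.e. someone else owns it.
            if (!moduleSetTagsBeingProcessed.add(moduleSetTag)) {
                return Optional.empty(); // caller should retry this cm handle later
            }
            try {
                List<String> modules = backend.getModulesFromDmi(moduleSetTag);
                backend.storedModulesByTag.put(moduleSetTag, modules);
                privateModuleSetCache.put(moduleSetTag, modules);
                return Optional.of(modules);
            } finally {
                moduleSetTagsBeingProcessed.remove(moduleSetTag);
            }
        }
    }

    public static void main(String[] args) {
        FakeBackend backend = new FakeBackend();
        Worker worker = new Worker(ConcurrentHashMap.newKeySet(), backend);
        for (int i = 0; i < 500; i++) {
            worker.resolveModules("tag-" + (i % 2));
        }
        System.out.println(backend.dbQueries + " DB queries, " + backend.dmiCalls + " DMI calls");
    }
}
```

In this sketch a single worker resolving 500 cm handles across two tags makes one DB query and one DMI call per tag. How the real implementation handles a cm handle whose tag is owned by another thread or instance is not covered here; returning an empty result for a later retry is only one possible choice.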
Registration: Extrapolated Results for 20,000 Nodes and DMI Delay
The figures below are calculated by scaling the measured times from 500 to 20,000 cm handles (a factor of 40) and adding fixed delays for DMI requests.
Method | Before: Time Spent (ms) | Before: % | After: Time Spent (ms) | After: % | Notes |
---|---|---|---|---|---|
query module references | 40,670 | 7% | 5 | 0% | |
get modules from DMI | 54,050 | 9% | 2,667 | 1% | Add a 200 ms delay for the first 10 batches of 100 |
store schema set | 417,960 | 73% | 206,220 | 85% | |
update states | 57,170 | 10% | 33,313 | 14% | |
Total | 569,850 | ~9m30s | 242,205 | ~4m3s | Need to add 2 minutes for the initial delay: ~6m, i.e. ~55 CM Handles/sec |
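The arithmetic behind the extrapolation can be checked with a few lines of Java. The helper names are hypothetical, and the scaling factor (20,000 / 500) is inferred from the figures rather than stated explicitly on this page; the published values appear to be based on unrounded averages, so not every row reproduces exactly from the rounded measurements.

```java
/** Hypothetical helpers for re-checking the extrapolated registration figures. */
class RegistrationExtrapolation {

    /** Scales a time measured for sampleSize cm handles up to targetSize cm handles. */
    static long scaleMillis(long measuredMillis, int sampleSize, int targetSize) {
        return measuredMillis * targetSize / sampleSize;
    }

    /** Formats milliseconds as minutes and seconds, rounded to the nearest second. */
    static String toMinutesAndSeconds(long millis) {
        long totalSeconds = Math.round(millis / 1000.0);
        return (totalSeconds / 60) + "m" + (totalSeconds % 60) + "s";
    }

    /** Throughput in cm handles per second for a given total duration. */
    static double cmHandlesPerSecond(int cmHandles, long millis) {
        return cmHandles * 1000.0 / millis;
    }

    public static void main(String[] args) {
        // 'store schema set' before the PoC: 10,449 ms for 500 cm handles.
        System.out.println(scaleMillis(10_449, 500, 20_000));
        // Total before the PoC.
        System.out.println(toMinutesAndSeconds(569_850));
        // Total after the PoC plus the 2 minute initial delay.
        long afterWithInitialDelay = 242_205 + 120_000;
        System.out.println(cmHandlesPerSecond(20_000, afterWithInitialDelay));
    }
}
```

The last calculation gives roughly 55 cm handles per second, in line with the note in the Total row.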
Registration: K6 Historical and Current Results, Detailed Analysis (Excel)
De-Registration: Test Measurements With and Without Orphan Removal
During de-registration the system removes ‘orphaned’ yang resources, i.e. modules that are no longer in use, after each batch using an expensive query.
Instrumentation showed that a very high percentage of time was spent in this method when >20K cm handles were added to the system. This is because of the exponential growth of the relations between modules (schema sets) and yang resources.
The idea of this experiment/PoC is to no longer do this deletion for each batch during de-registration but to do it on a much less frequent basis, e.g. at system start-up. In practice a module will rarely become orphaned, and even if it does there is no harm in that data remaining in the DB until the next restart of the system.
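The difference between the two strategies can be illustrated with a small in-memory model. This is only a sketch: the class and method names are hypothetical, and the real implementation works on database relations rather than Java collections.

```java
import java.util.Collection;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical in-memory model of schema sets and the yang resources they reference. */
class OrphanCleanupSketch {
    private final Map<String, Set<String>> resourcesBySchemaSet = new HashMap<>();
    private final Set<String> yangResources = new HashSet<>();
    int orphanScans = 0; // counts how often the expensive scan runs

    void addSchemaSet(String schemaSetName, Set<String> resources) {
        resourcesBySchemaSet.put(schemaSetName, new HashSet<>(resources));
        yangResources.addAll(resources);
    }

    /** Deletes a batch of schema sets; the orphan scan is optional per batch. */
    void deleteSchemaSets(Collection<String> schemaSetNames, boolean removeOrphansNow) {
        schemaSetNames.forEach(resourcesBySchemaSet::remove);
        if (removeOrphansNow) {
            removeOrphanedYangResources();
        }
    }

    /** Expensive step: finds all resources still referenced and drops the rest. */
    int removeOrphanedYangResources() {
        orphanScans++;
        Set<String> inUse = new HashSet<>();
        resourcesBySchemaSet.values().forEach(inUse::addAll);
        int before = yangResources.size();
        yangResources.retainAll(inUse);
        return before - yangResources.size();
    }

    int yangResourceCount() {
        return yangResources.size();
    }

    public static void main(String[] args) {
        OrphanCleanupSketch store = new OrphanCleanupSketch();
        store.addSchemaSet("ch-1", Set.of("a.yang", "b.yang"));
        store.addSchemaSet("ch-2", Set.of("b.yang", "c.yang"));
        // De-register in batches without scanning for orphans.
        store.deleteSchemaSets(List.of("ch-1"), false);
        store.deleteSchemaSets(List.of("ch-2"), false);
        // A single scan later on, e.g. at start-up, removes everything that is no longer referenced.
        System.out.println("removed " + store.removeOrphanedYangResources() + " orphaned resources");
    }
}
```

The deferred variant runs the expensive scan once instead of once per batch, at the cost of orphaned resources staying in the store until that scan happens.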
To see the impact of this change I simply removed the relevant call in the de-registration algorithm (during schema set deletion). The effect was very small (within the margin of error) for the original test sample size of 500, so I temporarily increased the sample size to 20,000 cm handles and recorded the following data:
Before: Milliseconds | Before: CM Handles/Sec | After (without orphan removal): Milliseconds | After: CM Handles/Sec | Notes (improvements) |
---|---|---|---|---|
144,720 | 138 | 54,225 | 369 | |
151,189 | 132 | 57,061 | 351 | |
141,082 | 142 | 53,981 | 371 | |
Average | 137 | Average | 363 | > 2.6 x faster! |
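The throughput and speed-up figures in the table can be re-derived from the raw timings. The sketch below assumes the averages were taken over the run times in milliseconds before converting to cm handles per second; the helper names are hypothetical.

```java
import java.util.stream.LongStream;

/** Hypothetical helpers for re-checking the de-registration figures. */
class DeregistrationThroughput {

    /** Average duration of the given runs in milliseconds. */
    static double averageMillis(long... runMillis) {
        return LongStream.of(runMillis).average().orElse(Double.NaN);
    }

    /** Throughput in cm handles per second, rounded to the nearest whole number. */
    static long cmHandlesPerSecond(int cmHandles, double millis) {
        return Math.round(cmHandles * 1000.0 / millis);
    }

    /** How many times faster the 'after' duration is compared to the 'before' duration. */
    static double speedUp(double beforeMillis, double afterMillis) {
        return beforeMillis / afterMillis;
    }

    public static void main(String[] args) {
        double before = averageMillis(144_720, 151_189, 141_082);
        double after = averageMillis(54_225, 57_061, 53_981);
        System.out.println(cmHandlesPerSecond(20_000, before) + " cm handles/sec before");
        System.out.println(cmHandlesPerSecond(20_000, after) + " cm handles/sec after");
        System.out.println("speed-up factor: " + speedUp(before, after));
    }
}
```

Under that assumption the averages come out at 137 and 363 cm handles per second, with a speed-up factor slightly above 2.6.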