CPS-1126 : CmHandle creation performance degradation
References:
CPS-1126: CmHandle creation performance degradationClosed
Problem Statement
With previous versions we were able to register 10k cmhandles (in 4 threads) in 12 minutes.
With the current CPS master we sent the 10k cmHandles around 10 minutes to NCMP, so it is not faster than before. What I see is that NCMP starts getting modules in a background thread (seems to be one thread) and handling one cmhandle takes more seconds.
Currently in 12 minutes only 150 cmHandles got available with ready state in NCMP, because getting modules and changing the state is quite slow this way.
Expectation/ goal: cm-handle registration + module sync → 25 cm handles / sec (see CPS Interface Performance Requirements Overview)
Improvements Overview
Test Environment | Secs Taken for ModuleSyncWatchdog (Cm Handle moves ADVISED->READY) | Secs taken for | Commit id | Parrealel Request to DMI | Main improvement | Comments |
---|---|---|---|---|---|---|
Ericsson Testing cluster (with Model adapter that is implementation of dmi-plugin) Total 1458 cm Handles | 6,998 → 0.20/sec | 2,278 → 0.64/sec | 1410509e33142c0c79ff0e111c63abc47f5936d3 https://gerrit.onap.org/r/gitweb?p=cps.git;a=commit;h=1410509e33142c0c79ff0e111c63abc47f5936d3 | N/A (1) | N/A | Baseline as per the bug |
600 → 2.43/sec | 70 → 20.82/sec | c3596fe3a85381668598c75a1ffee4bf973a8518 | N/A (1) |
Note : Root cause : We have 3 watchdogs (2 @Scheduled into module sync and 1 into data sync) that was blocking each other as all @Scheduled methods share a single thread by default. ) | ||
540 → 2.70 /sec | 88(?!) → 16.57/sec | 6e97bc6ee9f1cb2722aafc42325918e98da5cff2 | N/A (1) |
| ||
10,000 → 4.17 / sec 10K in 40 mins | N/A | 0e7970a1ec12eac0f45fa016f98148da7a2266a8 | 10 | 1. Introduced timeout for connect and reading for RestTemplate while connecting dmi-plugin. 2. Modified data structure of moduleSyncSemaphore to hold YangModelCmHandle referance (cm handle id) as key of map 3. Working of Modulewatchdog is modified with parallelism and broken into async tasks. 4. ModuleSyncSemaphoreMap is now populated with advised cm handles and processed by Modulewatchdog by batch size (Dynamic batch size at cm registration). 5. Introduled parallelism level for async tasks that are suppiled. 6. Includes cm handle registration preformance code change. 7. Performance Improvement: Postgres Pool Configuration 8. Performance Improvement: Async Notification Pool Configuration. 9. Performance Improvement: Save/update batch of CM-Handle States in one operation. 10. Performance Improvement: Updating (many) child datanodes/datanode tree into one SQL operation 11. Performance Improvement: Native query to more efficiently get all yangResourceIds. 12. Performance Improvement: Detaching fragment entities early. 13. Performance Improvement: Decouple Temporal DB Notification from main thread. Note : It includes bottleneck of in my recursive method (@Joe K is working on it to improve. org.onap.cps.spi.impl.CpsDataPersistenceServiceImpl#toDataNode) | ||
Local Development Machine with CSIT docker deployment (with dmi-plugin that is implemented by CPS team) | 97+296 = 393 | 30 → 48.6 / sec | 5 | Removed YangModelCmHandle object as key of map and stores only the reference that is cm handle id. | Note :
(org.onap.cps.spi.impl.CpsDataPersistenceServiceImpl#toDataNode)@Joe K is working on it to improve. 2. dmi-plugin is able to handle only 5-8 cm handle per seconds concurrently. | |
125+378 = 503 | 33 → 45.45 / sec | 10 | ||||
246 +356 = 602 | 31 → 48.39 / sec | 30 | ||||
Local Development Machine wit h dmi-plugin-perf-stub | 10,000 10K in 35 mins | Not measured | 0e7970a1ec12eac0f45fa016f98148da7a2266a8 | 10 |
|
|
Test Configuration
Use Case
# | Configuration | Description |
---|---|---|
1 | Number of CPS Instance | 1 |
2 | Total number of threads to send each batch (Register cm handle) | 4 |
3 | Batch size | 100 |
4 | To register 1458 cm handles | Used to send 15 batchs (100 cm handles in each batch) |
CPS Parameters
# | Parameter | Value |
---|---|---|
1 | timers: advised-modules-sync: sleep-time-ms: | 30000 |
2 | timers: locked-modules-sync: sleep-time-ms: | 300000 |
3 | timers: cm-handle-data-sync: sleep-time-ms: | 30000 |
4 | data-sync: cache: enabled: | false |
Improvement Details
Performance matrix of improved methods that are called by ModuleSyncWatchdog or Cm handle registration API
Test Environment | Method | Sequential processing time | Parallel processing time(Task Parallelism Level = 30) | Number of DB calls reduced by parallel / batch processing | ||||
---|---|---|---|---|---|---|---|---|
Best Case | Worst Case | Average | Best Case | Worst Case | Average | |||
Local Development Machine with CSIT docker deployment (with dmi-plugin that is implemented by CPS team) | org.onap.cps.ncmp.api.impl.event.lcm.LcmEventsCmHandleStateHandlerImpl#registerNewCmHandle | |||||||
org.onap.cps.ncmp.api.inventory.InventoryPersistence#saveCmHandleStates | ||||||||
org.onap.cps.spi.impl.CpsDataPersistenceServiceImpl#addChildDataNodes | ||||||||
org.onap.cps.spi.impl.CpsDataPersistenceServiceImpl#replaceDataNodeTree | ||||||||
org.onap.cps.spi.repository.FragmentRepositoryCpsPathQueryImpl#findByAnchorAndCpsPath | ||||||||
org.onap.cps.spi.repository.SchemaSetYangResourceRepositoryImpl#insertSchemaSetIdYangResourceId | ||||||||
org.onap.cps.spi.repository.YangResourceNativeRepository#getResourceIdsByModuleNameAndRevision |
# | Operations | Average Time elapsed (in ~ ms) | |||
---|---|---|---|---|---|
1 | To save cm handle state with id | Batch #.Iteration # | Time elapsed (in ~ ms) | ||
1.1 | 293 | ||||
1.2 | 84 | ||||
2.1 | 14 | ||||
2.2 | 8 | ||||
Average Time elapsed per cm handle (in ~ ms) | 53 | ||||
2 | To synchronize, create schema set and anchor with id | Batch #.Iteration # | Time elapsed (in ~ ms) | ||
1.1 | 38393 | ||||
1.2 | 599 | ||||
2.1 | 237 | ||||
2.2 | 122 | ||||
Average Time elapsed per cm handle (in ~ ms) | 0.483 | ||||
3 | Delete schema set If exists with id | Batch #.Iteration # | Time elapsed (in ~ ms) | ||
1.1 | 17 | ||||
1.2 | 2 | ||||
2.1 | 2 | ||||
2.2 | 2 | ||||
Average Time elapsed per cm handle (in ~ ms) | 11.5 | ||||
4 | Get cm handle state with id | Batch #.Iteration # | Time elapsed (in ~ ms) | ||
1.1 | 83 | ||||
1.2 | 7 | ||||
2.1 | 10 | ||||
2.2 | 12 | ||||
Average Time elapsed per cm handle (in ~ ms) | 56 | ||||
5 | To get yang models | Batch # | Batch Size | Time elapsed (in ~ ms) | Time elapsed per cm handle (in ~ ms) |
1 | 601 | 63107 | 105.00 | ||
2 | 857 | 37337 | 43.56 | ||
Average Time elapsed per iteration (in ~ ms) | 74.28 | ||||
6 | To shuffle cm handles | Batch # | Batch Size | Time elapsed (in ~ ms) | |
1 | 601 | 0 | |||
2 | 857 | 0 | |||
Average Time elapsed per iteration (in ~ ms) | 0 | ||||
7 | Get / fetch cm handles by state ADVISED | Batch # | Batch Size | Time elapsed (in ~ ms) | Time elapsed per cm handle (in ~ ms) |
1 | 601 | 19404 | 32.28 | ||
2 | 857 | 3485 | 4.06 | ||
Average Time elapsed per iteration (in ~ ms) | 18.17 |