Honolulu stability test evolution
History
Before Frankfurt
Until Frankfurt, there were two tests:
stability test: vFW (then vFWCL) running continuously for 72 hours (https://docs.onap.org/en/elalto/submodules/integration.git/docs/integration-s3p.html?highlight=stability)
resiliency test: destroy some pods and check that the vFW use case still works (only up to El Alto)
In Frankfurt, we also considered the stability of the installation through the daily chains (https://docs.onap.org/projects/onap-integration/en/frankfurt/integration-s3p.html#integration-s3p)
Guilin
The stability tests considered for the release were:
1 week stability test based on basic_vm
1 day HC verification
Daily CI Guilin installation chain
See https://docs.onap.org/projects/onap-integration/en/guilin/integration-s3p.html#integration-s3p
Evolution for Honolulu
In Honolulu, we would like to revisit stability/resiliency testing by introducing automated tests in the weekly CI chain.
The goal is to run tests over a week to verify the resiliency and the stability of the solution during the development life cycle.
Definition of the KPIs
What do we want to test, and with which figures? Number of onboardings/instantiations? Test duration? Parallelism?
As a first step, we estimate our needs as follows:
10 parallel service onboardings: 10 simultaneous module uploads in ONAP
50 parallel instantiations: 50 simultaneous creations of services declared in ONAP
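Both KPIs come down to launching N identical operations at the same time and recording, for each one, whether it succeeded and how long it took, which matches the success rate and duration criteria used in the results. A minimal sketch using a thread pool follows; `run_in_parallel` and `success_rate` are hypothetical helpers, and `operation` is a placeholder for a real onboarding or instantiation call (for example one made through pythonsdk), which is not shown here.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def run_in_parallel(operation, nb_parallel):
    """Run operation(index) nb_parallel times concurrently (hypothetical helper).

    Returns a list of (index, success, duration_in_seconds) tuples ordered by index.
    """
    def timed(index):
        start = time.monotonic()
        try:
            operation(index)
            success = True
        except Exception:  # any exception counts as a failed onboarding/instantiation
            success = False
        return index, success, time.monotonic() - start

    # One worker per operation so that all of them start at (roughly) the same time.
    with ThreadPoolExecutor(max_workers=nb_parallel) as pool:
        return list(pool.map(timed, range(nb_parallel)))


def success_rate(results):
    """Percentage of successful operations in a result list."""
    return 100.0 * sum(1 for _, ok, _ in results if ok) / len(results)
```

For the first KPI, `run_in_parallel(onboard, 10)` would be called with a real `onboard(index)` function; for the second, `nb_parallel` would be 50.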
Additional KPIs could be considered:
number of simultaneous loop creations/instantiations
number of DMaaP messages
number of event messages
The question has been raised at the community level, especially among the service providers operating ONAP in production.
Testcase 1: Parallel onboarding tests
Description
The goal of this test is to create several services in parallel in SDC.
We expect this number to remain low in real operations, because onboarding corresponds to the upload of a new service model, which does not happen frequently.
Environment
Tests were executed from 07/01/2021 to 13/01/2021 on a Guilin lab, reusing basic_vm with different service names (which means that all the SDC objects, i.e. VSP, VF and services, are recreated each time).
Two series were run several times:
5 simultaneous onboardings
10 simultaneous onboardings
The main component exercised by this test is SDC (plus AAI).
The reporting page can be described as follows:
The service is named basic_onboard_<random string>. The random string ensures that the onboarding mechanism is actually executed each time: with an already used name, pythonsdk would retrieve the service already onboarded instead of creating a new one.
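A name of this form can be generated as follows. This is a minimal sketch: the length and alphabet of the random suffix are assumptions (the exact format used by the test is not specified), and `unique_service_name` is a hypothetical helper.

```python
import secrets
import string

PREFIX = "basic_onboard_"
# Assumed suffix alphabet: lowercase letters and digits.
SUFFIX_ALPHABET = string.ascii_lowercase + string.digits


def unique_service_name(length=8):
    """Return basic_onboard_<random string> so that each run onboards a new service."""
    suffix = "".join(secrets.choice(SUFFIX_ALPHABET) for _ in range(length))
    return PREFIX + suffix
```

With 8 characters from a 36-symbol alphabet, the chance of two runs reusing the same name (and thus silently retrieving an existing service) is negligible.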
During the test, we monitored the ONAP cluster resources through Prometheus/Grafana:
Common Cassandra resource consumption:
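The same kind of figure can be retrieved programmatically through the Prometheus HTTP API (instant queries on `/api/v1/query`). The sketch below only builds the request URL; the PromQL expression is an assumption, not the query behind the dashboards above: it supposes that ONAP runs in the `onap` namespace and that Prometheus scrapes the cAdvisor metric `container_cpu_usage_seconds_total`, whose pod label is `pod` on recent Kubernetes versions (`pod_name` on older ones).

```python
import urllib.parse

# Hypothetical PromQL: CPU cores used by all Cassandra pods, averaged over 5 minutes.
# Assumes the "onap" namespace and cAdvisor metrics labelled with "pod".
CASSANDRA_CPU_QUERY = (
    'sum(rate(container_cpu_usage_seconds_total'
    '{namespace="onap", pod=~".*cassandra.*"}[5m]))'
)


def instant_query_url(prometheus_base_url, promql):
    """Build the URL of a Prometheus instant query (GET /api/v1/query?query=...)."""
    base = prometheus_base_url.rstrip("/")
    return f"{base}/api/v1/query?{urllib.parse.urlencode({'query': promql})}"
```

The resulting URL can be fetched with any HTTP client; Prometheus answers with a JSON document whose `data.result` field holds the samples.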
Results
Durations are given in MM:SS format.
5 parallel onboardings (10 series)
Criterion \ Series | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | Global |
---|---|---|---|---|---|---|---|---|---|---|---|
Success rate (%) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 |
Min duration | 27:39 | 10:24 | 10:15 | 10:26 | 11:18 | 07:42 | 07:54 | 08:05 | 08:35 | 08:20 | 07:42 |
Max duration | 27:43 | 10:36 | 10:17 | 10:27 | 11:22 | 07:53 | 08:00 | 08:19 | 08:42 | 08:44 | 27:43 |
Average duration | 27:41 | 27:40 | 10:16 | 10:27 | 11:20 | 07:48 | 07:58 | 08:12 | 08:39 | 08:42 | 11:09 |
Median duration | 27:41 | 10:26 | 10:16 | 10:27 | 11:19 | 07:49 | 07:59 | 08:13 | 08:40 | 08:38 | 09:30 |
Comments/Errors | / | / | / | / | / | / | / | / | / | / | / |
Evolution of the average duration in seconds over time for series of 5.
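The per-series criteria above (min, max, average, median) can be derived from the individual onboarding durations. The sketch below assumes those raw durations are available as MM:SS strings; the helper names are hypothetical. It also includes a simple sanity check: in any row, the average and the median must lie between the min and the max.

```python
import statistics


def to_seconds(mmss):
    """Convert an 'MM:SS' string to a number of seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)


def to_mmss(total_seconds):
    """Convert seconds (int or float) to an 'MM:SS' string, rounded to the second."""
    total = round(total_seconds)
    return f"{total // 60:02d}:{total % 60:02d}"


def summarize(durations):
    """Compute the table criteria for one series of MM:SS durations."""
    secs = [to_seconds(d) for d in durations]
    return {
        "min": to_mmss(min(secs)),
        "max": to_mmss(max(secs)),
        "average": to_mmss(statistics.mean(secs)),
        "median": to_mmss(statistics.median(secs)),
    }


def check_row(min_d, max_d, average_d, median_d):
    """True if average and median both fall within [min, max]."""
    low, high = to_seconds(min_d), to_seconds(max_d)
    return low <= to_seconds(average_d) <= high and low <= to_seconds(median_d) <= high
```

For example, `summarize(["07:42", "08:20", "10:00"])` gives a min of 07:42, a max of 10:00, an average of 08:41 and a median of 08:20.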
10 parallel onboardings (5 series)
Criterion \ Series | 1 | 2 | 3 | 4 | 5 | Global |
---|---|---|---|---|---|---|
Success rate (%) | 100 | 100 | 100 | 100 | 100 | 100 |
Min duration | 16:04 | 15:24 | 16:32 | 19:40 | 19:07 | 15:24 |
Max duration | 16:22 | 17:10 | 17:36 | 20:01 | 19:50 | 20:01 |
Average duration | 16:15 | 16:51 | 17:23 | 19:52 | 19:46 | 18:00 |
Median duration | 16:20 | 17:08 | 17:33 | 19:53 | 19:38 | 17:33 |
Comments/Errors | / | / | / | / | / | / |
Evolution of the average duration in seconds over time for series of 10.
Evolution of test durations over the campaign for series of 5 (red/first circle) and 10 (green/second circle).
Conclusions
ONAP Guilin is able to support 10 parallel onboardings, which matches our expectations.
We may also observe that:
The number of previously onboarded services has no visible impact on the onboarding duration. The number of created resources grows linearly: in series 10, the services of the 9 previous series had already been created. We could have expected the onboarding duration to increase linearly as well, because the client used for the test lists the services several times, and the more services SDC holds, the longer this list is. SDC resources therefore grow continuously, since they cannot be deleted, but this has no direct impact on the onboarding duration. The duration does not evolve linearly and seems to depend mainly on the cluster status.
The more parallel processing we have, the slower the onboarding is: duration = f(number of parallel onboardings) seems almost linear.
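To check the "almost linear" trend more rigorously once more parallelism levels have been measured (for example, a hypothetical 20-parallel series in addition to 5 and 10), the average duration can be fitted against the number of parallel onboardings with ordinary least squares. This is a minimal sketch; `linear_fit` and its three-level minimum are our own choices: with only two levels, any two points lie exactly on a line, so the fit would prove nothing.

```python
def linear_fit(xs, ys):
    """Ordinary least squares y = slope * x + intercept.

    xs: numbers of parallel onboardings, ys: average durations in seconds.
    Returns (slope, intercept, r_squared).
    """
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    n = len(xs)
    if n < 3:
        # Two points always fit a line perfectly, so require at least three levels.
        raise ValueError("need at least 3 parallelism levels for a meaningful fit")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    r_squared = 1 - ss_res / ss_tot if ss_tot else 1.0
    return slope, intercept, r_squared
```

An r_squared close to 1 over three or more levels would support the linear trend; the slope then estimates the extra seconds added by each additional parallel onboarding.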