Honolulu stability test evolution

History

Before Frankfurt

Until Frankfurt there were 2 tests

In Frankfurt we also considered the stability of the installation through the Daily chains (https://docs.onap.org/projects/onap-integration/en/frankfurt/integration-s3p.html#integration-s3p)

Guilin

The stability tests considered for the release were:

  • 1 week stability test based on basic_vm

  • 1 day HC verification

  • Daily CI Guilin installation chain

See https://docs.onap.org/projects/onap-integration/en/guilin/integration-s3p.html#integration-s3p

Evolution for Honolulu

In Honolulu we would like to revisit the stability/resiliency testing part by introducing automated tests on the CI weekly chain.

It means we want to execute tests over a week to verify the resiliency and the stability of the solution during the development life cycle.

Definition of the KPIs

What do we want to test, and which figures? Number of onboardings / instantiations? Test duration?

As a first step, we estimate our needs at:

  • 10 parallel service onboardings - 10 simultaneous module uploads in ONAP

  • 50 parallel instantiations - 50 simultaneous creations of services declared in ONAP

We could imagine additional KPIs:

  • number of simultaneous loop creations/instantiations

  • number of DMaaP messages

  • number of event messages

The question has been raised at the community level, especially among the service providers operating ONAP in production.

Testcase 1: Parallel onboarding tests

Description

The goal of this test is to create in parallel several services in the SDC.

We estimate that, in real operations, this number is not very high, because it corresponds to the upload of a new service model, which does not occur frequently.

Environment

Tests were executed from 07/01/2021 to 13/01/2021 on a Guilin lab, reusing the basic_vm use case with different service names (which means that all the SDC objects, i.e. VSP, VF and services, are recreated each time).

Two series were run several times:

  • 5 simultaneous onboardings

  • 10 simultaneous onboardings

The main component used for this test is the SDC (+AAI).
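The way such a series can be driven is sketched below. This is only an illustration under assumptions: `onboard_service` is a hypothetical placeholder (it sleeps instead of calling SDC), and the thread pool is one possible way to start the onboardings simultaneously, not necessarily what the test client does.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from statistics import mean, median


def onboard_service(name: str) -> float:
    """Hypothetical placeholder for one full onboarding (VSP, VF, service).

    Returns the duration of the onboarding in seconds.
    """
    start = time.monotonic()
    time.sleep(0.01)  # stand-in for the real SDC calls
    return time.monotonic() - start


def run_series(parallel: int) -> dict:
    """Start `parallel` onboardings at the same time and summarise the durations."""
    # Placeholder names; the real test uses a random suffix per service.
    names = [f"basic_onboard_{index}" for index in range(parallel)]
    with ThreadPoolExecutor(max_workers=parallel) as pool:
        durations = list(pool.map(onboard_service, names))
    return {
        "count": len(durations),
        "min": min(durations),
        "max": max(durations),
        "average": mean(durations),
        "median": median(durations),
    }


if __name__ == "__main__":
    for parallel in (5, 10):
        print(parallel, run_series(parallel))
```

The summary keys mirror the criteria reported in the result tables (min, max, average and median duration).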

The reporting page can be described as follows:



The name of the service is basic_onboard_<Random string>. The random string is needed to ensure that the full onboarding mechanism is executed each time (with an already used name, pythonsdk would retrieve the service already onboarded instead of creating a new one).
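A minimal sketch of how such a name could be generated is shown below. The helper name, the suffix length and the alphabet are assumptions made for illustration; only the `basic_onboard_` prefix comes from the test.

```python
import secrets
import string


def random_service_name(prefix: str = "basic_onboard_", length: int = 6) -> str:
    """Build a service name from the prefix and a random suffix.

    The suffix length (6) and the alphabet (lowercase letters and digits)
    are illustrative assumptions, not values taken from the test.
    """
    alphabet = string.ascii_lowercase + string.digits
    suffix = "".join(secrets.choice(alphabet) for _ in range(length))
    return prefix + suffix


if __name__ == "__main__":
    print(random_service_name())
```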



During the test we monitor the ONAP cluster resources through Prometheus/Grafana:





Common Cassandra resource consumption:

Results

Data format is MM:SS

5 parallel onboarding (10 series)

Criteria \ Series | 1     | 2     | 3     | 4     | 5     | 6     | 7     | 8     | 9     | 10    | Global
Success rate (%)  | 100   | 100   | 100   | 100   | 100   | 100   | 100   | 100   | 100   | 100   | 100
Min duration      | 27:39 | 10:24 | 10:15 | 10:26 | 11:18 | 07:42 | 07:54 | 08:05 | 08:35 | 08:20 | 07:42
Max duration      | 27:43 | 10:36 | 10:17 | 10:27 | 11:22 | 07:53 | 08:00 | 08:19 | 08:42 | 08:44 | 27:43
Average duration  | 27:41 | 27:40 | 10:16 | 10:27 | 11:20 | 07:48 | 07:58 | 08:12 | 08:39 | 08:42 | 11:09
Median duration   | 27:41 | 10:26 | 10:16 | 10:27 | 11:19 | 07:49 | 07:59 | 08:13 | 08:40 | 08:38 | 09:30
Comments/Errors   | /     | /     | /     | /     | /     | /     | /     | /     | /     | /     | /



Evolution of the average duration in seconds over time for series of 5.
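Since the tables report durations as MM:SS while this graph is in seconds, the conversion can be sketched as follows. The helper names are hypothetical; the values are the average durations reported for the 10 series of 5 parallel onboardings, copied as reported.

```python
def mmss_to_seconds(value: str) -> int:
    """Convert a duration written as MM:SS into a number of seconds."""
    minutes, seconds = value.split(":")
    return int(minutes) * 60 + int(seconds)


def seconds_to_mmss(total: int) -> str:
    """Convert a number of seconds back into the MM:SS format."""
    return f"{total // 60:02d}:{total % 60:02d}"


# Average duration per series (series of 5 parallel onboardings), as reported.
AVERAGES_SERIES_OF_5 = [
    "27:41", "27:40", "10:16", "10:27", "11:20",
    "07:48", "07:58", "08:12", "08:39", "08:42",
]

averages_in_seconds = [mmss_to_seconds(value) for value in AVERAGES_SERIES_OF_5]

if __name__ == "__main__":
    for series, seconds in enumerate(averages_in_seconds, start=1):
        print(f"series {series}: {seconds} s")
```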





10 parallel onboarding (5 series)

Criteria \ Series | 1     | 2     | 3     | 4     | 5     | Global
Success rate (%)  | 100   | 100   | 100   | 100   | 100   | 100
Min duration      | 16:04 | 15:24 | 16:32 | 19:40 | 19:07 | 15:24
Max duration      | 16:22 | 17:10 | 17:36 | 20:01 | 19:50 | 20:01
Average duration  | 16:15 | 16:51 | 17:23 | 19:52 | 19:46 | 18:00
Median duration   | 16:20 | 17:08 | 17:33 | 19:53 | 19:38 | 17:33
Comments/Errors   | /     | /     | /     | /     | /     | /



Evolution of the average duration in seconds over time for series of 10.

Evolution of test durations over the campaign for series of 5 (red/first circle) and 10 (green/second circle).



Conclusions

ONAP Guilin is able to support 10 parallel onboardings, which is what we expected.



We may also observe that:

  1. The number of previously onboarded services has no impact on the onboarding duration. The creation of resources is cumulative: by series 10, the services of the 9 previous series have already been created. We could have expected a linear increase of the onboarding duration, because the client used for the test lists the services several times, so the more services there are in SDC, the longer the list. The SDC resources therefore increase continuously, because we cannot delete them, but this has no direct impact on the onboarding duration. The duration does not evolve linearly and seems to depend mainly on the cluster status.

  2. The more parallel processing we have, the slower the onboarding is. duration = f(number of parallel onboardings) seems almost linear.
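The second observation can be checked roughly against the Global columns of the two result tables. The sketch below is only an illustration: the helper names are hypothetical, and it relies on just two measurement points (5 and 10 parallel onboardings), so it cannot prove linearity. A ratio close to 2 would mean that the duration is proportional to the number of parallel onboardings.

```python
def mmss_to_seconds(value: str) -> int:
    """Convert a duration written as MM:SS into a number of seconds."""
    minutes, seconds = value.split(":")
    return int(minutes) * 60 + int(seconds)


# Global average and median durations, as reported in the result tables.
GLOBAL_DURATIONS = {
    5: {"average": "11:09", "median": "09:30"},
    10: {"average": "18:00", "median": "17:33"},
}


def duration_ratio(metric: str) -> float:
    """Ratio between the 10-parallel and the 5-parallel duration for a metric."""
    ten = mmss_to_seconds(GLOBAL_DURATIONS[10][metric])
    five = mmss_to_seconds(GLOBAL_DURATIONS[5][metric])
    return ten / five


if __name__ == "__main__":
    for metric in ("average", "median"):
        print(f"{metric}: x{duration_ratio(metric):.2f}")
```

With the reported figures the ratio is about 1.61 on the averages and about 1.85 on the medians. The median is less affected by the slow first series of 5, which inflates the global average for that campaign.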