The intent of the 72 hour stability test is not to exhaustively test all functions but to run a steady load against the system and look for issues like memory leaks that are not found in the short-duration install and functional testing during the development cycle.
This page will collect notes on the 72 hour stability test run for Frankfurt.
See El Alto Stability Run Notes for comparison to previous runs.
Summary of Results
WORK IN PROGRESS
Setup
The integration-longevity tenant in the Intel/Windriver environment was used for the 72 hour tests.
The onap-ci job for "Project windriver-longevity-release-manual" was used for the deployment, with the OOM branch set to frankfurt and the Integration branch set to master. Integration master was used so we could catch the latest updates to integration scripts and VNF heat templates.
The Jenkins job needs a couple of updates for each release:
- Set the integration branch to 'origin/master'
- Modify the parameters to deploy.sh to specify "-i master" and "-o frankfurt" to get integration master and OOM frankfurt clones onto the NFS server (see the sketch below).
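For reference, a minimal sketch of the resulting deploy step (the wrapper location is an assumption; only the -i and -o flags come from the job parameters above):

```bash
# Hypothetical invocation of the deploy wrapper from the Jenkins job:
# -i selects the Integration branch, -o selects the OOM branch cloned on the NFS server
./deploy.sh -i master -o frankfurt
```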
The path for robot logs on dockerdata-nfs changed in Frankfurt: /dev-robot/ becomes /dev/robot/.
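For example, on the NFS server (the /dockerdata-nfs mount point comes from the note above; the exact subdirectory layout is an assumption):

```bash
# El Alto and earlier
ls /dockerdata-nfs/dev-robot/
# Frankfurt
ls /dockerdata-nfs/dev/robot/
```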
The stability tests used robot container image 1.6.1-STAGING-20200519T201214Z
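One way to confirm which image is running (assuming the robot deployment is named dev-robot, consistent with the pod names later on this page):

```bash
# Print the image of the first container in the robot deployment
kubectl -n onap get deployment dev-robot \
  -o jsonpath='{.spec.template.spec.containers[0].image}'
```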
Robot container updates:
API_TYPE was set to GRA_API since we have deprecated VNF_API.
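API_TYPE is an ordinary Robot Framework variable, so the override can also be passed on the command line; a sketch, assuming a direct robot invocation rather than the container's wrapper scripts:

```bash
# -v overrides a Robot Framework variable, -i selects tests by tag;
# "." is the suite directory for this illustration
robot -v API_TYPE:GRA_API -i stability72hr .
```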
Shakedown consists of creating some temporary tags (stability72hrvLB, stability72hrvVG, stability72hrVFWCL) to make sure each sub-test ran successfully (including cleanup) in the environment before the Jenkins job started with the higher level testsuite tag stability72hr that covers all three test types (see the example below).
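For example, one sub-test at a time can be shaken down by tag (a sketch, assuming the standard ete-k8s.sh wrapper from oom/kubernetes/robot):

```bash
# Run only the vLB stability sub-test, including its cleanup, in the onap namespace
./ete-k8s.sh onap stability72hrvLB
```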
Clean out the old build history using a Jenkins script console script (Manage Jenkins → Script Console):
```groovy
// Delete all stored builds for the stability job and reset its build counter
def jobName = "windriver-longevity-stability72hr"
def job = Jenkins.instance.getItem(jobName)
job.getBuilds().each { it.delete() }   // remove each archived build
job.nextBuildNumber = 1                // restart numbering at #1
job.save()                             // persist the new build number
```
appc.properties was updated to apply the fix for DMaaP message processing so that it calls http://localhost:8181 for the streams update.
VNF Orchestration Tests
This test uses the onap-ci job "Project windriver-longevity-stability72hr" to automatically onboard, distribute, and instantiate the ONAP open source test VNFs vLB, vVG, and vFWCL.
The scripts run validation tests after the install.
The scripts then delete the VNFs and clean up the environment for the next run.
The script tests AAF, DMaaP, SDC, VID, AAI, SO, SDNC, and APPC with the open source VNFs.
There was a problem with the robot scripts for vLB where they were not finding the base_lb.yaml file in the artifacts due to a change in the structure. A two line change was made to the VNF orchestration script to look for the 'heat3' key to resolve the issue. A Jira (INT-1598) was created to track the changes to the robot scripts.
These tests started at Jenkins job #1.
Each test run generates over 500 MB of Robot Framework log data.
Each test run also runs the kubectl top nodes command to see CPU and memory utilization across the k8s cluster.
We periodically run the kubectl top pods command as well to check on the pods using the most memory and CPU; the exact commands are shown below.
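These are the same commands captured in the status rows of the table below; the pod view sorts on the third (memory) column and keeps the top 20 entries:

```bash
# CPU/memory utilization per worker node in the k8s cluster
kubectl top nodes

# Top 20 pods in the onap namespace, sorted numerically by the memory column
kubectl -n onap top pod | sort -rn -k3 | head -20
```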
http://10.12.6.182:8080/jenkins/job/windriver-longevity-stability72hr/
Test # | Comment | Message |
---|---|---|
k8s utilization Wed May 20 18:45:15 UTC 2020 | Memory: | |
#1 | TOOLING Startup issues - modified the customer uuid to shorten the string in the tooling, since it looked like robot selenium was having trouble "seeing" the string in the drop-down. | vDNS: NoSuchElementException: Message: Could not locate element with visible text: ETE_Customer_aaaf3926-d765-4c47-93b9-857e674d2d01 vVG: NoSuchElementException: Message: Could not locate element with visible text: ETE_Customer_08f8a099-3e2b-480f-8153-5b4173d9394a vFW: Succeeded |
#4 | ENV ${vnf} = vFWCLvPKG The Robot heatbridge run after the deployment failed trying to find the stack in OpenStack, which usually means OpenStack was slow in deploying the VNF. Heatbridge had succeeded for the vFWCLvSNK inside the same service instantiation. | Keyword 'Get Deployed Stack' failed after retrying for 10 minutes. The last error was: KeyError: 'stack' |
#13 | ENV ${vnf} = vFWCLvPKG The Robot heatbridge run after the deployment failed trying to find the stack in OpenStack, which usually means OpenStack was slow in deploying the VNF. Heatbridge had succeeded for the vFWCLvSNK inside the same service instantiation. | Keyword 'Get Deployed Stack' failed after retrying for 10 minutes. The last error was: KeyError: 'stack' |
#14 | TOOLING or ENV The vDNS and vVG robot scripts couldn't find elements in the GUI drop-downs. Likely transient networking issues. vFW succeeded, and all three are in the test run (vDNS, vVG, vFW in that order). | vDNS: Keyword 'Wait For Model' failed after retrying for 3 minutes. The last error was: Element 'xpath=//tr[td/span/text() = 'vLB 2020-05-20 13-06-03']/td/button[contains(text(),'Deploy')]' not visible after 1 minute. vVG: NoSuchElementException: Message: Could not locate element with visible text: ETE_Customer_9f739343-cbc7-4ee4-8697-ea52f06e7796 vFW Succeeded |
#15 | TOOLING Virtual Volume Group - Failure of robot selenium to find the customer in the search window. Timing issue. | NoSuchElementException: Message: Could not locate element with visible text: ETE_Customer_26e85655-1f44-4e7e-8cd2-e9fab290af01 |
#17 | ENV or TOOLING Failure of robot selenium at the second VNF in the service package. Robot likely needs tuning of the wait for the module name to appear in the drop-down under transient conditions. | Element 'xpath=//div[contains(.,'Ete_vFWCLvPKG_f716b1bd_1')]/div/button[contains(.,'Add VF-Module')]' did not appear in 1 minute. |
#18 | ENV K8s worker node problem. kubectl top nodes listed k8s-04 as unknown. k8s-04 is on 10.12.6.0, which could be a contributing factor; .0 and .32 addresses in Windriver show suspect behavior. The worker going down caused a set of containers to be restarted, which is the right behavior from a k8s standpoint. The test could not run while the robot container was down. | 12:00:25 Instantiate Virtual DNS GRA command terminated with exit code 137 12:22:22 + retval=137 12:22:22 ++ echo 'kubectl exec -n onap dev-robot-56c5b65dd-dkks4 -- ls -1t /share/logs | grep stability72hr | head -1' 12:22:22 ++ ssh -i /var/lib/jenkins/.ssh/onap_key ubuntu@10.12.5.205 sudo su 12:22:25 error: unable to upgrade connection: container not found ("robot") |
#19 #20 | TOOLING K8s restarted the robot pod, so the manual fixes to vnf_orchestration_test_template for the heat3 parsing issues were lost. Reapplied the manual fixes so parsing SDC artifacts to find the base_vlb resource succeeded again. | Unable to find catalog resource for vLB base_vlb' |
#32 | TOOLING Robot script did not find the subscriber name in the search results. Likely a timing issue: robot is too fast, looking for the JSON data in the drop-down before it is fully loaded. | Create Service Instance → vid_interface . Click On Element When Visible //select[@prompt='Select Subscriber Name' |
#35 | ENV vDNS instantiate failed at the OpenStack stage. Potentially slowed OpenStack caused SO to resubmit a request that subsequently became a duplicate from the OpenStack perspective. Looks like a functional bug in the SO-to-OpenStack interaction triggered by the environment, not stability related. | CREATE failed: Conflict: resources.vlb_0_onap_private_port_0: IP address 10.0.211.24 already allocated in subnet be057760-1ffa-4827-a6df-75d355c4d45a\nNeutron server returns request_ids: ['req-ca6e5f39-7462-47c6-aaa8-9653783828cb'] |
#37 | ENV vVG and vFW failed on VID screen errors looking for data items. Investigation shows that the aai-traversal pod restarted. Looks like slow networking caused the pod to be redeployed, but that is not conclusive. Initially SO and VID failed health check until aai-traversal was up; then both passed health check. | |
Thu May 21 12:33:45 UTC 2020 | Memory: root@long-nfs:/home/ubuntu# kubectl -n onap top pod | sort -rn -k3 | head -20 | |
#38 | ENV vDNS - Timeout waiting for the model to be visible via the Deploy button in VID. vVG and vFW succeeded. Transient slowness, since the 2nd and 3rd VNFs succeeded. | Keyword 'Wait For Model' failed after retrying for 3 minutes. The last error was: TypeError: object of type 'NoneType' has no len() |
#47 | TOOLING vDNS - Selenium error seeing the Subscriber Name. vVG and vFW worked. Transient. | vid_interface . Click On Element When Visible //select[@prompt='Select Subscriber Name'] StaleElementReferenceException: Message: stale element reference: element is not attached to the page document |
Fri May 22 03:41:11 UTC 2020 | root@long-nfs:/home/ubuntu# kubectl -n onap top pod | sort -nr -k 3 | head -20 | |
#53 | ENV vDNS instantiate failed at the OpenStack stage. Potentially slowed OpenStack caused SO to resubmit a request that subsequently became a duplicate from the OpenStack perspective. Looks like a functional bug in the SO-to-OpenStack interaction triggered by the environment, not stability related. vVG and vFW succeeded in the same test. | STATUS: Received vfModuleException from VnfAdapter: category='INTERNAL' message='Exception during create VF org.onap.so.openstack.utils.StackCreationException: Stack Creation Failed Openstack Status: CREATE_FAILED Status Reason: Resource CREATE failed: Conflict: resources.vlb_0_onap_private_port_0: IP address 10.0.250.24 already allocated in subnet be057760-1ffa-4827-a6df-75d355c4d45a\nNeutron server returns request_ids |
Fri May 22 09:35:28 UTC 2020 | root@long-nfs:/home/ubuntu# kubectl -n onap top pod | sort -nr -k 3 | head -20 root@long-nfs:/home/ubuntu# kubectl -n onap top nodes | |
#58 | ENV ODL cluster communication error on vFW preload. This type of error is usually associated with network latency issues between nodes. The Akka configuration should be evaluated to loosen the timeout settings for public cloud or other slow environments. Discuss with Dan. A GET to https://{{sdnc_ssl_port}}/restconf/config/VNF-API:preload-vnfs/ succeeds. | O Get Request using : alias=sdnc, uri=/restconf/config/VNF-API:preload-vnfs/vnf-preload-list/Vfmodule_Ete_vFWCLvFWSNK_e401f06d_0/VfwclVfwsnkA143de8bE20f..base_vfw..module-0, headers={'X-FromAppId': 'robot-ete', 'X-TransactionId': '922f999d-2444-4bcd-b5ad-60fbf553735d', 'Content-Type': 'application/json', 'Accept': 'application/json'} json=None 04:36:17.031 INFO Received response from [sdnc]: {"errors":{"error":[{"error-type":"application","error-tag":"operation-failed","error-message":"Error executeRead ReadData for path /(org:onap:sdnctl:vnf?revision=2015-07-20)preload-vnfs/vnf-preload-list/vnf-preload-list[{(org:onap:sdnctl:vnf?revision=2015-07-20)vnf-type=VfwclVfwsnkA143de8bE20f..base_vfw..module-0, (org:onap:sdnctl:vnf?revision=2015-07-20)vnf-name=Vfmodule_Ete_vFWCLvFWSNK_e401f06d_0}]","error-info":"Shard member-2-shard-default-config currently has no leader. Try again later."}]}} https://{{sdnc_ssl_port}}/jolokia/read/org.opendaylight.controller:type=DistributedOperationalDatastore,Category=ShardManager,name=shard-manager-operational |
Interim Status on VNF Orchestration
Notice the improved test duration after the automated K8s node reconfiguration moved loads off k8s-04.
We will run final numbers at the end of the test, but most of the problems appear to be environment and tooling issues.
Closed Loop Tests
This test uses the onap-ci job "Project windriver-longevity-vfwclosedloop".
The test uses the robot test script "demo-k8s.sh vfwclosedloop". The script sets the number of streams on the vPacketGenerator to 10, waits for the control loop to bring the 10 set streams back down to 5 streams, then sets the streams to 1 and again waits for the return to 5 streams.
A successful run exercises the loop from the VNF through DCAE, DMaaP, Policy, AAI, AAF, and APPC (the packet generator interface involved is sketched below).
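For reference, the stream count the loop manipulates is held in the packet generator's sample-plugin configuration. A hedged sketch of reading it (port 8183, the sample-plugin path, and the admin credentials follow the usual ONAP vFW demo packet generator and are assumptions here; PKG_IP is defined in the Jenkins job below):

```bash
# Inspect the currently enabled pg-streams on the packet generator (assumed API)
curl -u admin:admin \
  http://${PKG_IP}:8183/restconf/config/sample-plugin:sample-plugin/pg-streams
```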
In the Jenkins job:
Modify the NFS_IP and PKG_IP parameters to point to the current NFS server and packet generator in the tenant; a hypothetical invocation using these values follows below.
NFS_IP=10.12.5.205
PKG_IP=10.12.5.247
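A minimal sketch of what the job runs with these parameters (the ssh key, user, and script path are assumptions patterned on the stability job console log earlier on this page; the vfwclosedloop argument is the standard demo-k8s.sh usage):

```bash
# Run one closed-loop check from the NFS/control host
ssh -i ~/.ssh/onap_key ubuntu@${NFS_IP} \
  "cd oom/kubernetes/robot && ./demo-k8s.sh onap vfwclosedloop ${PKG_IP}"
```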
Initially the policy in the TCA Key Value store was not in sync with Policy due to the Demo VNF instantiation issue.
Since consul-server-ui is not enabled by default, we had to edit the service to expose consul-server-ui as a NodePort (see the kubectl sketch below) and then go to the UI page to edit the ControlLoop vFW policy to use the same model-invariant-id that was used with the instantiate so the A&AI query would succeed.
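One way to do the NodePort exposure (the service name comes from the text; the onap namespace and the patch approach are assumptions):

```bash
# Switch the consul-server-ui service to a NodePort, then look up the assigned port
kubectl -n onap patch svc consul-server-ui -p '{"spec": {"type": "NodePort"}}'
kubectl -n onap get svc consul-server-ui
```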
http://10.12.5.185:32512/ui/#/dc1/kv/dcae-tca-analytics/edit (note: the NodePort was ephemeral)
closedLoopControlName was edited in two places (for the Hi and Low thresholds) to specify "ControlLoop-vFirewall-cdf42e53-b49b-4d9f-a621-fa9521111615"; "cdf42e53-b49b-4d9f-a621-fa9521111615" was the new, matching model-invariant-id.
The tests start with #1
http://10.12.6.182:8080/jenkins/job/windriver-longevity-vfwclosedloop/
Test # | Comment | Message |
---|---|---|
0-20 | No errors | |
21-40 | No errors | |
Interim Status on Closed Loop Testing (~30% through the stability run)