The intent of the 72-hour stability test is not to exhaustively test all functions, but to run a steady load against the system and look for issues, such as memory leaks, that are not found in the short-duration install and functional testing during the development cycle.
This page collects notes on the 72-hour stability test run for El Alto.
Setup
The integration-longevity tenant in the Intel/Windriver environment was used for the 72-hour tests.
The onap-ci job for "Project windriver-longevity-release-manual" was used for the deployment with the OOM and Integration branches set to elalto.
The deployment was fairly clean, but there was an environment issue that required a few pods to be recycled with a normal kubectl delete pod due to what looked like a network blip during the install.
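As a reference, a minimal sketch of the pod recycle, assuming the standard onap namespace; the pod name shown is illustrative, and any Deployment/StatefulSet-managed pod is recreated automatically after the delete:

```bash
# List pods in the onap namespace and spot any that are not healthy
kubectl -n onap get pods | grep -Ev 'Running|Completed'

# Delete the affected pod; its controller (Deployment/StatefulSet) recreates it
# (the pod name below is illustrative only)
kubectl -n onap delete pod dev-so-so-bpmn-infra-0

# Confirm the replacement pod comes back up
kubectl -n onap get pods | grep so-bpmn-infra
```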
We also hit the environment dhcp bug where the VMs get an external dhcp address from a network other than openstack's dhcp. The symptom is not being able to log into the external IP of the VM.
This is resolved by a force reboot of the VM from the horizon portal, but unfortunately this prevents the installation of the demo VNF config files, so the VM install script has to be re-run from inside the VM.
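For completeness, a possible CLI equivalent of the horizon force-reboot workaround; the server name, login user, and install script path below are illustrative, not taken from this run:

```bash
# CLI equivalent of the horizon "force reboot" (server name is illustrative)
openstack server reboot --hard Ete_vPKG_example_0

# After the VM is reachable again, re-run its install script from inside the VM
# (script name/path is illustrative; use the script delivered by the heat template)
ssh ubuntu@<vm-external-ip> 'sudo bash /opt/vnf_install.sh'
```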
Changes were made to the testsuite robot scripts for the instantiateDemoVFWCL robot flows to fix the customer name/stack name generation so that it matches the jenkins job setup for closed loop.
These were a side effect of the El Alto refactoring for the Python 2.7/3 migration that had not been detected in the previous test cases due to the unique naming requirements in the jenkins jobs.
Shakedown consisted of creating temporary tags (stability72hrvLB, stability72hrvVG, stability72hrVFWCL) to make sure each sub-test ran successfully (including cleanup) in the environment before the jenkins job started with the higher level testsuite tag stability72hr, which covers all three test types.
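As a sketch of the shakedown runs, assuming the standard OOM robot wrapper script in oom/kubernetes/robot and the onap namespace:

```bash
# Shake down one temporary tag at a time
./ete-k8s.sh onap stability72hrvLB
./ete-k8s.sh onap stability72hrvVG
./ete-k8s.sh onap stability72hrVFWCL

# Umbrella tag used by the jenkins job once each sub-test passes
./ete-k8s.sh onap stability72hr
```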
During shakedown of the environment we exceeded the keypair quota again (a recurring problem when testing in this environment, since the keypair delete is not run after the VMs are deleted).
We used the horizon portal, via the common admin tenant, to delete keypairs from a large set of the previous robot test runs to free up quota space. This should be sufficient for the duration of the tests, but we will delete keypairs during the run if needed.
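The keypair cleanup could also be scripted with the OpenStack CLI; a sketch assuming tenant credentials are sourced and that the robot-generated keypairs use the ONAP-NF style naming seen later in the test log:

```bash
# Check how many keypairs are consuming quota
openstack keypair list

# Remove keypairs left behind by earlier robot runs
# (the ONAP-NF name filter is illustrative; adjust to the names seen in the list)
for kp in $(openstack keypair list -f value -c Name | grep 'ONAP-NF'); do
  openstack keypair delete "$kp"
done
```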
VNF Orchestration Tests
This test uses the onap-ci job "Project windriver-longevity-stability72hr" to automatically onboard, distribute and instantiate the ONAP open source test VNFs vLB, vVG and vFWCL.
The scripts run validation tests after the install.
The scripts then delete the VNFs and clean up the environment for the next run.
The script tests AAF, DMaaP, SDC, VID, AAI, SO, SDNC, APPC with the open source VNFs.
These tests started with jenkins job #243 on October 12 at 1:00 PM EST.
Each test run generates over 500 MB of test data through the robot framework.
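Log growth is tracked in the status rows of the table below; a sketch of the disk checks, assuming the dockerdata-nfs paths shown in those rows:

```bash
# On the NFS host: overall usage of the share that backs the robot log volume
df -h /dockerdata-nfs

# Size of the accumulated robot logs for this deployment
du -sh /dockerdata-nfs/dev-robot/robot/logs
```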
Test # | Comment | Message |
---|---|---|
Test start #243 1 PM Oct 12 | | |
245 | Validate vServer in the testsuite HeatBridge test needed to wait for the AAI index update. Wrapped this step in Wait Until Keyword Succeeds. | post response: {"requestError":{"serviceException":{"messageId":"SVC3001","text":"Resource not found for %1 using id %2 (msg=%3) (ec=%4)","variables":["POST Search","getNamedQueryResponse","Node Not Found:No Node of type vserver found for properties","ERR.5.4.6114"]}}} |
260 | Received failure response from so {"request":{"requestId":"79264729-04ab-4738-a27d-29013c59218c","startTime":"Sun, 13 Oct 2019 09:38:20 GMT","finishTime":"Sun, 13 Oct 2019 09:39:14 GMT","requestScope":"vfModule","requestType":"createInstance","requestDetails":{"modelInfo":{"modelCustomizationName":"VfwclVfwsnk0f6a8e47E64e..base_vfw..module-0","modelInvariantId":"e994097b-6285-49e1-a87c-76ba6e0371ab","modelType":"vfModule","modelName":"VfwclVfwsnk0f6a8e47E64e..base_vfw..module-0","modelVersion":"1","modelCustomizationUuid":"6ce786ef-31e8-4f00-bdb4-1c66f54eaffd","modelVersionId":"72f56293-fbf2-49fa-bb13-1df8f5f88548","modelCustomizationId":"6ce786ef-31e8-4f00-bdb4-1c66f54eaffd","modelUuid":"72f56293-fbf2-49fa-bb13-1df8f5f88548","modelInvariantUuid":"e994097b-6285-49e1-a87c-76ba6e0371ab","modelInstanceName":"VfwclVfwsnk0f6a8e47E64e..base_vfw..module-0"},"requestInfo":{"source":"VID","instanceName":"Vfmodule_Ete_vFWCLvFWSNK_031aaae1_0","suppressRollback":false,"requestorId":"demo"},"relatedInstanceList":[{"relatedInstance":{"instanceId":"fc4a3aac-e15e-4cf2-b85c-93eee3cdf3cc","modelInfo":{"modelInvariantId":"ed6ca1d8-cf38-455b-bb0a-75ae84d51715","modelType":"service","modelName":"vFWCL 2019-10-13 09:29:","modelVersion":"1.0","modelVersionId":"1c3dece0-945e-4f38-b5d2-f1d3fe7579e1","modelUuid":"1c3dece0-945e-4f38-b5d2-f1d3fe7579e1","modelInvariantUuid":"ed6ca1d8-cf38-455b-bb0a-75ae84d51715"}}},{"relatedInstance":{"instanceId":"d4cc80c3-367c-4de2-8dd2-52904466b60a","modelInfo":{"modelCustomizationName":"vFWCL_vFWSNK 0f6a8e47-e64e 0","modelInvariantId":"dcbe3ca3-b9c3-4042-a06f-5ad83f1be089","modelType":"vnf","modelName":"vFWCL_vFWSNK 0f6a8e47-e64e","modelVersion":"1.0","modelCustomizationUuid":"9eaff9be-ac20-4872-9804-7bd45515a351","modelVersionId":"2de4b9dd-b6d6-4822-92c0-670c9329557f","modelCustomizationId":"9eaff9be-ac20-4872-9804-7bd45515a351","modelUuid":"2de4b9dd-b6d6-4822-92c0-670c9329557f","modelInvariantUuid":"dcbe3ca3-b9c3-4042-a06f-5ad83f1be089","modelInstanceName":"vFWCL_vFWSNK 0f6a8e47-e64e 0"}}}],"cloudConfiguration":{"tenantId":"28481f6939614cfd83e6767a0e039bcc","cloudOwner":"CloudOwner","lcpCloudRegionId":"RegionOne"},"requestParameters":{"usePreload":true,"testApi":"VNF_API"}},"instanceReferences":{"serviceInstanceId":"fc4a3aac-e15e-4cf2-b85c-93eee3cdf3cc","vnfInstanceId":"d4cc80c3-367c-4de2-8dd2-52904466b60a","vfModuleInstanceName":"Vfmodule_Ete_vFWCLvFWSNK_031aaae1_0","requestorId":"demo"},"requestStatus":{"requestState":"FAILED","statusMessage":"STATUS: Received vfModuleException from VnfAdapter: category='INTERNAL' message='Exception during create VF org.onap.so.openstack.utils.StackCreationException: Stack Creation Failed Openstack Status: CREATE_FAILED Status Reason: Resource CREATE failed: Conflict: resources.vsn_0_onap_private_port_0: IP address 10.0.235.102 already allocated in subnet 4ed99c09-aed6-4eca-8f94-48357ab4e5d1\nNeutron server returns request_ids: ['req-f60a93ff-ecbf-4c5e-b149-8ebdf64e38f2'] , Rollback of Stack Creation completed with status: DELETE_COMPLETE Status Reason: Stack DELETE completed successfully' rolledBack='true'","percentProgress":100,"timestamp":"Sun, 13 Oct 2019 09:39:14 GMT"}}} | |
Test Status: #261 7 AM Oct 13 | No leftover VMs or Stacks after delete. Docker-data-nfs at 21% of available capacity (robot container: 10.0.0.4:/dockerdata-nfs/dev-robot/robot/logs 162420736 33509376 128894976 21% /share/logs). 17 keypairs under the demo account. Environment spot check when tests are not running looks okay (kubectl -n onap top nodes; spot-check commands are sketched after this table). | |
Test Status: #267 12:00 PM Oct 13 | /dev/vda1 162420480 36636868 125767228 23% /. No leftover VMs or Stacks from previous runs. RegionOne_ONAP-NF_20191013T150300143Z_olc-key_PlYL style keypairs added in the morning. Up to 27 keypairs. | |
268 | Same signature as #260 | ,"requestStatus":{"requestState":"FAILED","statusMessage":"STATUS: Received vfModuleException from VnfAdapter: category='INTERNAL' message='Exception during create VF org.onap.so.openstack.utils.StackCreationException: Stack Creation Failed Openstack Status: CREATE_FAILED Status Reason: Resource CREATE failed: Conflict: resources.vdns_0_onap_private_port_0: IP address 10.0.236.25 already allocated in subnet 4ed99c09-aed6-4eca-8f94-48357ab4e5d1\nNeutron server returns request_ids: ['req-2a9b2ed0-0502-4377-b209-59f36a27d8dd'] , Rollback of Stack Creation completed with status: DELETE_COMPLETE Status Reason: Stack DELETE completed successfully' |
Test Status #271 4:00 PM Oct 13 | /dev/vda1 162420480 40495064 121909032 25% /. No leftover VMs or Stacks. Up to 33 keypairs. root@long-nfs:/home/ubuntu# kubectl -n onap top nodes | |
276 | Same signature as #260 | questParameters":{"usePreload":true,"testApi":"VNF_API"}},"instanceReferences":{"serviceInstanceId":"7374c399-e4af-4cc8-81b3-cb0ff810ac7c","vnfInstanceId":"06888576-bd1b-4b30-b27f-3b61a0898bee","vfModuleInstanceName":"Vfmodule_Ete_vFWCLvPKG_3cd57462_1","requestorId":"demo"},"requestStatus":{"requestState":"FAILED","statusMessage":"STATUS: Received vfModuleException from VnfAdapter: category='INTERNAL' message='Exception during create VF org.onap.so.openstack.utils.StackCreationException: Stack Creation Failed Openstack Status: CREATE_FAILED Status Reason: Resource CREATE failed: Conflict: resources.vpg_0_onap_private_port_0: IP address 10.0.158.103 already allocated in subnet 4ed99c09-aed6-4eca-8f94-48357ab4e5d1\nNeutron server returns request_ids: ['req-3e4fd376-8698-4211-95a1-eb1312a71c28'] , Rollback of Stack Creation completed with status: DELETE_COMPLETE Status Reason: Stack DELETE completed successfully' rolledBack='true'","percentProgress":100,"timestamp":"Mon, 14 Oct 2019 01:35:58 GMT"}} |
Test Status #276 10:00 PM Oct 13 | No stranded VMs or Stacks. Robot log storage up to 28% usage: 10.0.0.4:/dockerdata-nfs/dev-robot/robot/logs 162420736 44839936 117564416 28% /share/logs. root@long-nfs:/home/ubuntu# kubectl -n onap top nodes | |
281 | Same signature as #260 | vFWCL Heat address 10.0.241.102 already allocated |
282 | Same signature as #260 | vFWCL Heat vfmodule duplicate |
285 | Same signature as #260 | vFWCL Duplicate IP address 10.0.187.102 |
286 | Same signature as #260 | Potential cause is a conflict with the vfwclosedloop vnf in the preload / test tool data: 10.0.251.101, 102 and 103 are used by vfwclosedloop but not excluded from the test data for the instantiate tests. Still need to look at IP address removal in openstack to see if IP address aging is affecting the tests. 2019-10-14T10:35:21.610Z||org.onap.so.adapters.vnf.VnfAdapterRest - Create VfModule enter inside VnfAdapterRest: {"createVfModuleRequest":{"messageId":"94679b5b-a360-4f78-a9c3-d097e1b2ec25-1571049321389","skipAAI":true,"notificationUrl":"http://so-bpmn-infra.onap:8081/mso/WorkflowMessage/VNFAResponse/94679b5b-a360-4f78-a9c3-d097e1b2ec25-1571049321389", .... -------------------------------------- ... 2019-10-14T10:35:22.015Z|94679b5b-a360-4f78-a9c3-d097e1b2ec25|org.onap.so.openstack.utils.MsoHeatUtils - queryHeatStack - stack not found: Vfmodule_Ete_vFWCLvFWSNK_b0857107_0 Looks like openstack does not respond with status correctly after a vfModule create. vFWCL Duplicate IP address 10.0.251.103 |
Test Status: 8 AM Oct 14 | /dev/vda1 162420480 52463904 109940192 33% /. root@long-nfs:/home/ubuntu# kubectl -n onap top nodes | |
288 | Same problem as #260 | vVG: Stack Vfmodule_Ete_vVG_b6aa2967_0 already exists in Tenant. Duplicate stack name instead of duplicate IP address, but same problem. |
289 | Heatbridge Validation | AAI query on the reverse heat bridge. Testsuite should wrap this in Wait Until Keyword Succeeds instead of just the query - cassandra replication delay. |
290 | Same problem as #260 | vFWCL: vfmodule name duplicate |
Test Status 1 PM Oct 14 | /dev/vda1 162420480 56555456 105848640 35% /. No stranded VMs or Stacks. 36 keypairs (during active instantiate phase). root@long-nfs:/tmp# kubectl -n onap top nodes | |
292 | Same problem as #260 | vFWCL: IP address 10.0.227.101 already allocated in subnet. Latest analysis indicates it may be a problem with the shared mariadb-galera server.cnf not providing the right locking to Camunda under load. |
Test Status 5 PM Oct 14 | /dev/vda1 162420480 59795972 102608124 37% /. No stranded VMs or Stacks. root@long-nfs:/home/ubuntu# kubectl -n onap top nodes | |
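The periodic environment spot checks recorded in the status rows above reduce to a few commands; a sketch, assuming OpenStack credentials for the tenant are sourced and kubectl is run from the long-nfs host:

```bash
# Root disk usage on the long-nfs host (robot logs land under /dockerdata-nfs)
df /

# Kubernetes node load for the onap deployment
kubectl -n onap top nodes

# Look for stranded VMs, stacks and keypairs left behind by failed runs
openstack server list
openstack stack list
openstack keypair list
```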
Closed Loop Tests
This test uses the onap-ci job "Project windriver-longevity-vfwclosedloop".
The test uses the robot test script "demo-k8s.sh vfwclosedloop". The script sets the number of streams on the vPacket Generator to 10, waits for the control loop to change the 10 streams back to 5 streams, then sets the streams to 1 and again waits for 5 streams.
Success tests the loop from the VNF through DCAE, DMaaP, Policy, AAI, AAF and APPC.
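A sketch of driving the same closed loop check manually, assuming the standard OOM demo wrapper script and that the vPacket Generator host IP is passed as the final argument:

```bash
# From oom/kubernetes/robot: run one closed loop cycle against the vFWCL packet generator
# (<pgn-host-ip> is the OpenStack IP of the vPacket Generator VM)
./demo-k8s.sh onap vfwclosedloop <pgn-host-ip>
```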
The tests started with #1595 on October 12 at 4:00 PM EST.
Test # | Comment | Message |
---|---|---|
Test Start #1595 4 PM Oct 12 | | |
Test Status: #1610 7 AM Oct 13 | No issues. No failed tests | |
Test Status: #1615 12 PM Oct 13 | No issues. No failed tests | |
Test Status #1620 5 PM Oct 13 | No issues. No failed tests | |
Test Status #1625 10 PM Oct 13 | No issues. No failed tests | |
Test Status #1635 8 AM Oct 14 | No issues. No failed tests | |
Test Status #1640 1 PM Oct 14 | No issues. No failed tests | |
Test Status #1644 5 PM Oct 14 | No issues. No failed tests. Comparing El Alto to Dublin, the average loop response time is shorter for El Alto (2:17 minutes) than for Dublin (3:19 minutes). This is likely because the TCA polling interval was reduced in El Alto to speed up the loop, in recognition that the VES reporter default configuration in the ONAP test VNFs is set aggressively to emit status every 10 seconds. | |
Summary
To be completed after the test run