Introduction

CSIT ("Continuous System and Integration Testing", although it could also be read as "Component-Specific Integration Testing") cases in ONAP typically verify the functionality of one or a few ONAP components in limited test environments (suitable for lightweight installation in a local development environment and in Jenkins) and often rely on simulators. Currently these tests are executed once per day on images from the master branch (as well as on maintenance branches of selected previous releases) or whenever the test cases themselves are committed for review. This means that under ONAP's current automated verification procedure, new bugs that the existing CSIT cases could catch are not discovered until they have already been merged to master. In addition, failures in the daily CSIT Jenkins jobs do not send notifications or raise tickets or alarms, so the teams get no automated feedback, and failures can go unnoticed (let alone acted upon) for days or, in the worst cases, weeks.

...

  • Component code and CSIT cases are in different repos, which means that non-backward-compatible changes lead to chicken-and-egg problems where neither the test case nor the implementation modifications can pass their respective review verifications without one of them already being merged, or without resorting to currently unexplored Gerrit patch dependency tricks
  • Many CSIT cases take a long time to execute, prolonging review verification feedback significantly and most likely causing bottlenecks in Jenkins leading to even further delays in day-to-day development
  • Current CSIT tools do not fully support building and testing of local images
  • Review-specific verification of images would significantly increase the space used in Nexus and could lead to capacity problems if not considered properly
  • CSIT tests might not be stable enough in all cases to rely on in continuous development yet - we can't have random failures delaying or preventing merges and we have to be able to rely on the results
  • Every committer would have to be able to at least execute and troubleshoot Robot test cases in order to commit any changes - do the teams currently have sufficient competence?
  • Related to the required competence: executing CSIT cases in local development environments currently requires some effort - there are both some generic and project-specific peculiarities that are not sufficiently well documented
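
One way to gauge whether a CSIT case is stable enough to rely on would be to run it repeatedly against the same commit and flag any test whose outcome varies. A minimal sketch, assuming results are available as (commit, test name, passed) records; this record format and the function name are hypothetical and not part of any existing CSIT tooling:

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def find_flaky(results: Iterable[Tuple[str, str, bool]]) -> List[str]:
    """Return names of tests that both passed and failed on the same commit.

    Each result is a (commit_id, test_name, passed) tuple (hypothetical format).
    """
    outcomes: Dict[Tuple[str, str], Set[bool]] = defaultdict(set)
    for commit, test, passed in results:
        outcomes[(commit, test)].add(passed)
    # A test is flaky if, for at least one commit, both outcomes were observed.
    flaky = {test for (_, test), seen in outcomes.items() if len(seen) == 2}
    return sorted(flaky)


# Illustrative data only: two runs of two tests on one unchanged commit.
runs = [
    ("abc123", "health-check", True),
    ("abc123", "health-check", False),
    ("abc123", "create-service", True),
    ("abc123", "create-service", True),
]
print(find_flaky(runs))
```

Tests flagged this way are candidates for stabilization before their votes are allowed to block merges.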

...

  • The most important thing needed to incorporate CSIT cases into code review verifications smoothly is to either put them under the same repo or force the use of Gerrit patch dependency functionality, so that any code commit that changes some already verified functionality (or introduces new functionality that requires new tests) can (and must) also bring the related CSIT changes.
    • This would mean that the current repository structures regarding CSIT would have to be heavily redesigned and only common, generic parts would remain under the existing csit repo
    • If there are CSIT cases that are responsible for verifying functionalities in multiple different repositories at the same time, the redesign would become complicated (such projects should probably reconsider their repo structures, or alternatively the scope of the CSIT cases in question would have to be reconsidered)
  • CSIT execution and environment setup should be enhanced to be able to build and use images coming from the review branch
    • If the images from each new review patch are uploaded to Nexus, we would be using multiple times more space there than we do now (and if we do not, test env deployment would need some more redesign)
  • Execution times of the CSIT tests should be shortened to the absolute minimum 
    • A major part of this is closely related to product maturity in general: optimizing image sizes (which affect the download time from Nexus) and container startup times could help a lot, especially in CSIT cases where a significant amount of time is spent setting up the test environment
      • Building a new image and uploading it to Nexus for every new review patch would make this optimization even more crucial
    • Reducing all kinds of test-specific retries and waiting times to the absolute minimum, but no further
      • Note that the other side of this coin is test stability in different environments with varying resources, so special care also needs to be taken to ensure that timeouts are not over-optimized
    • In the long run this requirement is also at direct odds with the requirement that every integration-level bug correction and new feature should be verified in CSIT (if you didn't have this requirement before, you have it now) - eventually the total mass of things that need to be covered in CSIT will (or at least should) grow too big to be executed in every review verification
      • A streamlined set of critical CSIT tests executed in review verification may have to be separated from the full regression set that would be executed only on merged code
  • CSIT tests should be reliable and working
    • non-problematic code changes would always pass on the first attempt without needing to retry them a couple of times just in case they failed for some unrelated random causes
    • people are less likely to just ignore and override negative CSIT votes if they know failing CSIT is an indication of a real problem in the commit 
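
The split between a streamlined review set and a full regression set could be driven by test tags. A minimal sketch, assuming Robot Framework's `--include` option is used to select tagged tests; the tag name `review-critical`, the stage names and the suite path are hypothetical and not existing ONAP conventions:

```python
from typing import List

# Hypothetical tag name; no tagging convention is defined in this document.
REVIEW_TAG = "review-critical"


def robot_args(stage: str, suite_path: str) -> List[str]:
    """Build a Robot Framework command line for the given pipeline stage."""
    if stage == "review":
        # --include restricts the run to tests carrying the given tag.
        return ["robot", "--include", REVIEW_TAG, suite_path]
    if stage == "merge":
        # No tag filter: the full regression set runs on merged code.
        return ["robot", suite_path]
    raise ValueError(f"unknown stage: {stage!r}")


print(" ".join(robot_args("review", "tests/")))
print(" ".join(robot_args("merge", "tests/")))
```

Keeping the selection in one place like this would make it easy to see which tests gate a review and which only run after merge.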

...