El-Alto Retrospective
Inputs sorted by category
Release Process
The self-release process had a few hiccups but seems better now → what would we like to fix/improve based on those hiccups?
@Dan Timoney suggests that new processes should be better tested before being rolled out to the community. Engage projects to help with testing
Gerrit/Jenkins/Jira/Nexus stability issues are still occurring sporadically → link to LFN Infrastructure Initiative (next generation of toolchain) and KaaS Update (TSC Call 12/12). Is there anything else?
"one size fits all" relman JIRA tickets. Should we have different tickets for different projects? Extra effort worthwhile?
→ Classify the nature of projects and have RELMAN tickets per category: DEV projects (code impact) such as Policy, AAI, etc.; NO-DEV projects such as doc and vnfrqs; and finally testing/deployment projects: Integration, VVP, VNFSDK, OOM
→ We invite PTLs to review the Frankfurt milestones for additional suggestion(s) - Frankfurt Deliverables by Milestone
@Pamela Dragosh says that tasks need to be culled and clarified, with better documentation on how to complete them
Form a small working group. Meet over 3-4 weeks to review and recommend changes/updates. @Andy Mayer says to coordinate with subcommittees to avoid distorting the intent of the tasks.
Should we track "planned" vs. "actuals" so that we have more data on where we have schedule problems? This would provide a source of data for retrospectives
→ OK - let's track from M1, since this is the milestone representing the Community Commitments
→ Continue to raise your risks as soon as you have identified them so we can explore if we can mitigate them with additional support - Frankfurt Risks
Anything else?
Product Creation
The Job Deployer for OOM (and now SO) was a great CI improvement. There was a noticeable impact on reviewing merges when the Job Deployer was offline due to lab issues.
→ Any recommendation/improvement suggestions?
Addition of Azure public cloud resources is helping with the verify job load.
Need to continue adding more projects to the CI pipeline, with tests targeted at each specific project (e.g., instantiateVFWCL, ./vcpe.py infra, rescustservice for SO)
→ Agreed - current activities:
#1 OOM Gating (Integration Team)
#2 Introduction of ONAP "Use Case/Requirement" Integration Lead - first experimentation with Frankfurt
#3 Continue to meet the Test Coverage Target (Project Team)
#4 Automate your CSIT and/or pairwise testing (Project Team)
#5 Anything else?
Testing Process
Adding Orange and Ericsson Labs to installation testing was a good change. More labs coming on board for this.
Vijay - need more clarity on how participants may use the labs
→ Ericsson Lab details (not an open lab - results shared with community)?
Still some issues due to lab capabilities; it seems to be better, but some slowness still occurs and is hard to troubleshoot (the infrastructure still has a significant impact)
→ Re-activate/Review the OpenLab Subcommittee? (@Morgan Richomme says this is already initiated)
→ To be revisited based on Dev/Integration WindRiver needs considering the KaaS initiative
→ Suggestion to review the lab strategy, i.e., Orange for the Integration Team; WindRiver for Dev?
CSIT refactoring provided more clarity (some tests were running on very old versions and in some cases had not been maintained since Casablanca); moreover, the teams were not notified in case of errors (changed in Frankfurt) - no action
Still room for improvement
Robot healthchecks do not have the same level of maturity from one component to another - some of them still PASS even when the component is clearly not working as expected (they just check that a web server is answering without really exercising the component's features). Good examples should be promoted as best practices (a minimal sketch of the difference is shown below).
@Morgan Richomme says this was discussed during integration meeting and will be the subject of a session at DDF in Prague.
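As a rough illustration of the point above (not ONAP code), a minimal Python sketch of the difference between a shallow healthcheck and one that exercises a real component feature; the base URL and API paths are hypothetical.

```python
import requests

BASE = "https://component.example.svc:8443"  # hypothetical component endpoint

def shallow_healthcheck():
    """Only verifies that a web server answers - this can PASS even when the
    component's features are clearly broken."""
    resp = requests.get(f"{BASE}/healthcheck", timeout=10)
    return resp.status_code == 200

def functional_healthcheck():
    """Exercises a real feature: create a resource and read it back, so a PASS
    means the component actually works (endpoint names are illustrative)."""
    created = requests.post(f"{BASE}/api/v1/items", json={"name": "probe"}, timeout=10)
    if created.status_code != 201:
        return False
    item_id = created.json()["id"]
    fetched = requests.get(f"{BASE}/api/v1/items/{item_id}", timeout=10)
    return fetched.status_code == 200 and fetched.json().get("name") == "probe"

if __name__ == "__main__":
    print("shallow:", shallow_healthcheck())
    print("functional:", functional_healthcheck())
```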
CSIT tests are more functional tests (which is good); integration tests in the target deployment (using OOM) should become possible by extending the gating to the different components (but resources are needed)
@Morgan Richomme - project functional testing should be done at project scope and should not rely on integration
Still lots of manual processing to deal with the use cases - no programmatic way to verify all the release use cases on any ONAP solution
@Morgan Richomme pair-wise testing is primarily manual. Making progress in each release on automation.
Hard to get a good view of the real coverage in terms of APIs/components - for instance, the daily chain mainly used VNF-API, and there are no end-to-end automated tests dealing with Policy/DCAE
Need more consideration of test platforms, e.g., which version of OpenStack, k8s, etc. EUAG input? Something for consideration by TSC?
Should there be a new process based on a combination of security recommendations reviewed with the PTLs (including Integration) and TSC approval?
Need automated test result tracking for use cases (partially done in El Alto through the first PoC leveraging xtesting - a test DB was used and collected results from the E/// and Orange labs; see the sketch below)
→ Any particular plan/target for Frankfurt ?
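To illustrate the programmatic result collection mentioned above, a hedged Python sketch that pushes one use-case result to a TestAPI-style results database; the endpoint URL and field names are assumptions, not the actual ONAP test DB schema.

```python
import datetime
import requests

# Hypothetical test results DB endpoint (TestAPI-style); URL and fields are illustrative.
RESULT_API = "http://testresults.example.org/api/v1/results"

def push_result(case_name, project, pod, criteria, details):
    """Push one test result so use-case runs from any lab land in a common DB."""
    payload = {
        "project_name": project,
        "case_name": case_name,
        "pod_name": pod,                      # which lab ran the test
        "version": "frankfurt",
        "start_date": datetime.datetime.utcnow().isoformat(),
        "criteria": criteria,                 # e.g. "PASS" or "FAIL"
        "details": details,                   # free-form result payload
    }
    resp = requests.post(RESULT_API, json=payload, timeout=30)
    resp.raise_for_status()
    return resp.json()

if __name__ == "__main__":
    push_result("basic_vm", "integration", "orange-lab-1", "PASS", {"duration_s": 420})
```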
ONAP Project Oversight
Jira was an improvement over wiki pages for milestone tracking, but it still seems onerous for the PTLs
See "one size fits all" comment under release process. We can have different tickets for different projects, but this is much more work to support.
See feedback provided above
Planning failed to take holidays into consideration
noted
Others
PTL meetings seem more like tracking calls. Might want to consider a PTL committee that would run the PTL meetings.
The meeting format changed in October. Is it better now?
SSL Rocket.Chat seemed to work for folks - need to consider moving this to a supported solution.
Rocket chat private server ACL issue
onapci.org access to Jenkins is conflicting with Rocket.Chat since they share the same gateway
IP ACLs that blocked some high-volume downloads of logs from Jenkins also blocked access to Rocket.Chat for some proxies
→ Shall we use Slack for Frankfurt?
====================================================================================================================================================
Have we fixed anything captured during the Dublin Retrospective?
Release Process
Nexus -> DockerHub migration - DockerHub with fat (v2) manifest support might help with image content visibility, but what is the current state of that? PoC with Policy after 6 months? (See the sketch below.)
https://lf-onap.atlassian.net/wiki/display/DW/Migration+to+DockerHub https://lists.onap.org/g/onap-discuss/message/17999
Work is in progress, tracked outside the ONAP release cycle - there was a previous dependency on Global JJB
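For reference, a small Python sketch of what fat (v2) manifest support makes visible: querying Docker Hub's registry API for the manifest list shows the per-architecture entries and digests. The repository name is illustrative, and the snippet only assumes anonymous pull access.

```python
import requests

# Illustrative image name; any public Docker Hub repository works the same way.
REPO = "onap/policy-pap"
TAG = "latest"

# Docker Hub requires a short-lived pull token even for anonymous reads.
token = requests.get(
    "https://auth.docker.io/token",
    params={"service": "registry.docker.io", "scope": f"repository:{REPO}:pull"},
    timeout=30,
).json()["token"]

# Asking for the manifest *list* (the "fat" v2 manifest) exposes the per-architecture
# entries and digests, which is the image content visibility discussed above.
resp = requests.get(
    f"https://registry-1.docker.io/v2/{REPO}/manifests/{TAG}",
    headers={
        "Authorization": f"Bearer {token}",
        "Accept": "application/vnd.docker.distribution.manifest.list.v2+json",
    },
    timeout=30,
)
resp.raise_for_status()
for entry in resp.json().get("manifests", []):
    print(entry["platform"]["architecture"], entry["digest"])
```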
Docker image optimisation - does the CIA team have sufficient support for this? Different projects are doing it differently
https://lf-onap.atlassian.net/wiki/display/DW/ONAP+Normative+container+base+images https://lf-onap.atlassian.net/wiki/display/DW/CIA+Dublin+Release+Planning https://lf-onap.atlassian.net/wiki/display/DW/Container+Image+Minimization+Guidelines
Shall we follow up with the CIA Team to understand their 2020 goals?
Need DevOps CI/CD to get more end-to-end visibility from a developer perspective
72 hr (longevity) stability runs should be reviewed by the PTLs - monthly review in the PTL meeting
Is it something the Integration Team will support starting with Frankfurt?
Product Creation
The OOM verify job was very helpful - finding defects before they were merged into the charts
SECCOM: the TTL for test certificates doesn't impact security - it could probably be longer than a year (see the sketch below)
→ Any input from SECCOM?
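As an illustration of the point above, a minimal sketch using the Python cryptography library to issue a self-signed test certificate with a roughly two-year TTL; the subject name and file paths are placeholders, and this is not a SECCOM-mandated procedure.

```python
import datetime
from cryptography import x509
from cryptography.hazmat.primitives import hashes, serialization
from cryptography.hazmat.primitives.asymmetric import rsa
from cryptography.x509.oid import NameOID

# Hypothetical subject name for a lab-only test certificate.
name = x509.Name([x509.NameAttribute(NameOID.COMMON_NAME, "onap-test.example.org")])
key = rsa.generate_private_key(public_exponent=65537, key_size=2048)

now = datetime.datetime.utcnow()
cert = (
    x509.CertificateBuilder()
    .subject_name(name)
    .issuer_name(name)                       # self-signed: issuer == subject
    .public_key(key.public_key())
    .serial_number(x509.random_serial_number())
    .not_valid_before(now)
    .not_valid_after(now + datetime.timedelta(days=730))  # ~2 years instead of 1
    .sign(key, hashes.SHA256())
)

with open("test-cert.pem", "wb") as f:
    f.write(cert.public_bytes(serialization.Encoding.PEM))
with open("test-key.pem", "wb") as f:
    f.write(key.private_bytes(
        serialization.Encoding.PEM,
        serialization.PrivateFormat.TraditionalOpenSSL,
        serialization.NoEncryption(),
    ))
```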
AAF underestimated the difficulty of changing the service locator
Infrastructure changes need to start very early in the release
Plan for backward compatibility up front
→ to be checked with AAF Team
The addition of OJSI Jira tickets has been useful to create more focus on security issues - vulnerabilities are easier to track
Nexus vulnerability analysis is still very difficult - need some innovation
To be checked with SECCOM
Testing Process
No overall platform team - not enough people have a picture of the complete platform
→ The OOM gating and the ONAP "Use Case/Requirement" Integration Lead might bring some added value. Anything else?
Offline installation is finding new issues - it could be done earlier in the release to find issues sooner
→ Agreed - who will perform these offline installations? Can we automate them, or shall we replace this with OOM Gating?
Need to do installation testing in places other than just WindRiver to catch environment assumptions
Diverse lab test envs (Orange) was helpful
Many versions of OpenStack
→ See previous feedback from El-Alto retrospective
Project teams engaging in the integration process accelerated integration - Zoom shared debug sessions (much faster than Jira)
Starting integration testing early helps resolve issues earlier than at the tail end of the release
The RC0 attempt to integrate and test was not successful - the lack of CI/CD and a large set of changes integrated at once negatively impacted system stability
→ Is this something the PTLs will invest in as part of Frankfurt, since limited improvement was made in El-Alto?
ONAP Project Oversight
A PoC should not interfere with the ongoing ONAP release content/timeline (i.e., Dublin). A PoC should not hold up the following release (i.e., El-Alto)
Retrospectives best done in person
Suggestion: set up 1 hour per ONAP DDF event?
Delay in Dublin - issues not found until the very end
Testing started earlier in the release
Functional deployments exposed more defects and happen late in the release
Intersystem testing should start sooner
Instantiate vFW as a step in the CI/CD process - fully automated testing and very stable
Requirements came too late in the process
Need to begin the testing of Functional requirements earlier
Teams need to be transparent about the actual status of the milestones - example: projects marked as code complete were not complete
Delay generated by the management of the Casablanca Maintenance Release
Lack of available lab resources
Need to start with the Keystone v3 install in El Alto - installation jobs will be changed (authentication; see the sketch below)
→ Who can tell us the current status? Is there no requirement as part of Frankfurt?
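For context, a minimal sketch of what a Keystone v3 login looks like for installation jobs, assuming the keystoneauth1 library; the endpoint, credentials, and domain names are illustrative. The main change from v2 is the /v3 auth URL and the explicit user/project domain scoping.

```python
from keystoneauth1 import session
from keystoneauth1.identity import v3

auth = v3.Password(
    auth_url="https://openstack.example.org:5000/v3",  # v3 endpoint, not /v2.0
    username="onap-ci",
    password="changeme",
    project_name="onap-integration",
    user_domain_name="Default",      # v3 adds domain scoping, which v2 jobs never set
    project_domain_name="Default",
)
sess = session.Session(auth=auth)

# Any OpenStack client can reuse this session; fetching a token is a simple smoke test.
print("token:", sess.get_token()[:16], "...")
```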