Carrier Grade Discussion

72-hour stability run to ensure containers remain stable over the full duration
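
A strawman for how such a run could be scripted is below; the endpoint URLs and the /healthz path are placeholders, not actual ONAP component addresses:

```python
# Minimal soak-test sketch: poll each container's health endpoint for the
# full 72-hour run and record any failures. ENDPOINTS and /healthz are
# illustrative assumptions, not real ONAP component URLs.
import time
import requests

ENDPOINTS = ["http://so:8080/healthz", "http://sdnc:8282/healthz"]  # hypothetical
RUN_SECONDS = 72 * 3600
INTERVAL = 60  # seconds between probes

failures = []
deadline = time.monotonic() + RUN_SECONDS
while time.monotonic() < deadline:
    for url in ENDPOINTS:
        try:
            if requests.get(url, timeout=5).status_code != 200:
                failures.append((time.time(), url, "bad status"))
        except requests.RequestException as exc:
            failures.append((time.time(), url, str(exc)))
    time.sleep(INTERVAL)

print(f"{len(failures)} health-check failures over the 72h run")
```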

key KPIs regarding component targets and performance

independent upgrade of components 

Secure: no known threats above level x; alignment with OASIS security standards

secure microservices and AAA (authentication, authorization, accounting)

Ensure E2E scenarios are running under load, then kill a single VM/container and observe continued operation of that function
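
One way this check could be scripted against a Kubernetes deployment is sketched below; the pod, namespace, and service names are placeholders:

```python
# Sketch of a kill-one-instance-under-load check: drive load in a thread,
# delete a single pod with kubectl, and verify the error rate stays near zero.
# Pod, namespace, and service names are placeholders.
import subprocess
import threading
import time
import requests

results = {"ok": 0, "fail": 0}
stop = False

def drive_load(url):
    while not stop:
        try:
            ok = requests.get(url, timeout=2).status_code == 200
        except requests.RequestException:
            ok = False
        results["ok" if ok else "fail"] += 1

t = threading.Thread(target=drive_load, args=("http://my-service/api/ping",))
t.start()
time.sleep(30)  # establish a baseline under load
subprocess.run(["kubectl", "delete", "pod", "my-component-0", "-n", "onap"], check=True)
time.sleep(120)  # observe behaviour while the platform recovers
stop = True
t.join()

error_rate = results["fail"] / max(1, results["ok"] + results["fail"])
print(f"error rate during failover: {error_rate:.2%}")
```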

Operator metrics? 

Separate VNF availability from the platform's 5x9s target. Baseline versus ??
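
For context, the downtime budget implied by an availability target is simple arithmetic:

```python
# Allowed downtime per year for a given number of nines of availability.
MINUTES_PER_YEAR = 365 * 24 * 60

for nines in (3, 4, 5):
    availability = 1 - 10 ** -nines
    downtime_min = (1 - availability) * MINUTES_PER_YEAR
    print(f"{availability:.5%} -> {downtime_min:.2f} minutes downtime/year")
```

Five nines works out to roughly 5.26 minutes of downtime per year, which is why the baseline question matters.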

Support for new functional use cases

VM recovery / data replication 

Logging of diagnostics.

data plane vs. control plane VNF metrics

failure recovery of the platform

(discussion of challenges with timelines and holidays)

AI: Mazin to take to the board the level of participation from member companies.

how do we reduce the footprint for stability? Are we ready to sacrifice features?

Detection times in the closed-loop lifecycle

Can we please consider code commenting as one of the key criteria?
In a carrier-grade environment, it becomes tedious when code is not commented (e.g., when a P1 defect occurs).

VM recovery / data replication should include a test of failover/recovery from site failure; replication is one of the enablers, but the test is larger in scope

modeling normalization

high-priority R2 list - services will be determined by carriers

scalability - not all components scale the same way. 

capabilities that the carrier would like to deploy: what impact does that have on the suite of use cases?

(discussion of 5G use cases)

latency of the control loop (design time and run time) is a concern; we need latency constraints.
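
A minimal sketch of how a run-time latency constraint could be checked against loop event timestamps; the 30-second budget and the stage names are illustrative, not agreed numbers:

```python
# Sketch: measure detection-to-action latency of a closed-loop event and
# check it against a budget. Stage timestamps would come from the loop's
# own event records; the values here are illustrative.
from datetime import datetime, timedelta

LATENCY_BUDGET = timedelta(seconds=30)  # assumed constraint, not an ONAP number

loop_event = {
    "fault_detected": datetime(2017, 11, 1, 10, 0, 0),
    "policy_decision": datetime(2017, 11, 1, 10, 0, 12),
    "action_completed": datetime(2017, 11, 1, 10, 0, 25),
}

latency = loop_event["action_completed"] - loop_event["fault_detected"]
print(f"closed-loop latency: {latency.total_seconds()}s "
      f"({'within' if latency <= LATENCY_BUDGET else 'exceeds'} budget)")
```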

what about failure recovery of any platform component, and how do we restore the state of individual connections?

We need to move away from measuring service availability in terms of the fraction of *time* a given service is running to the fraction of service *requests/operations/sessions* that are delivered successfully. This is the prerequisite for DevOps and agile service delivery.
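
A small sketch makes the difference concrete: the same incident can score very differently under the two definitions (all numbers illustrative):

```python
# Time-based vs request-based availability for the same one-hour window.
# All numbers are illustrative.
window_seconds = 3600
outage_seconds = 36          # service process down for 36s
total_requests = 100_000
failed_requests = 250        # includes errors while nominally "up"

time_availability = 1 - outage_seconds / window_seconds
request_availability = 1 - failed_requests / total_requests

print(f"time-based:    {time_availability:.4%}")     # 99.0000%
print(f"request-based: {request_availability:.4%}")  # 99.7500%
```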

add stability of APIs and internal interfaces to the list for carrier grade

we should have the concept of a unique transaction ID / correlation ID on each component from a traceability perspective, so we can investigate and troubleshoot call flows even under load; it also improves operability
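
A minimal sketch of the propagation pattern, assuming a hypothetical X-TransactionID header (the header name is a common convention, not a mandated one):

```python
# Sketch of correlation-ID handling at a component boundary: reuse the
# caller's ID if present, otherwise mint one, and attach it to every log
# line and downstream call. Header name is an assumption.
import logging
import uuid

HEADER = "X-TransactionID"
log = logging.getLogger("component")
logging.basicConfig(level=logging.INFO, format="%(message)s")

def handle_request(headers: dict) -> dict:
    txn_id = headers.get(HEADER) or str(uuid.uuid4())
    log.info("txn=%s received request", txn_id)
    # ... do work, then propagate the same ID to any downstream component
    downstream_headers = {HEADER: txn_id}
    log.info("txn=%s calling downstream with %s", txn_id, downstream_headers)
    return {HEADER: txn_id}

handle_request({})                   # no incoming ID: one is minted
handle_request({HEADER: "abc-123"})  # incoming ID is preserved
```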

We have to narrow down the redundancy schemes to be supported by each component of the ONAP platform.

Redundancy schemes used by carrier-grade systems include 1+1, N-way active, and M+N active-standby.
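
As a rough illustration of what these schemes buy, here is a back-of-the-envelope availability model assuming independent instance failures; the per-instance availability figure is an assumed number:

```python
# Sketch: steady-state availability of an M+N active-standby pool, assuming
# independent instance failures. The service survives as long as at least
# M of the M+N instances are up (a simplification ignoring failover time).
from math import comb

def pool_availability(m: int, n: int, a: float) -> float:
    total = m + n
    # probability that at least m instances are up
    return sum(comb(total, k) * a**k * (1 - a)**(total - k)
               for k in range(m, total + 1))

a = 0.999  # assumed per-instance availability
print(f"1+1: {pool_availability(1, 1, a):.7f}")
print(f"2+1: {pool_availability(2, 1, a):.7f}")
print(f"3+0: {pool_availability(3, 0, a):.7f}")
```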

For the Beijing R2 release, we should set targets for supporting redundancy schemes, which will eventually lead to higher availability and make the platform carrier grade.

I would like to suggest that the ONAP community consider operations and management functionality; it will benefit SPs in managing their networks. For example,
1. ONAP should be able to collect alarms/events at different levels, correlate them, and display them to the SP's network operators (see the sketch after this list).
2. ONAP should be able to persistently store and manage various kinds of instances, for example virtual and physical resources such as bare-metal servers, physical routers, etc.
3. ONAP should consider how to improve troubleshooting of the ONAP platform itself.
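
A minimal correlation sketch for point 1, grouping raw events by resource within a time window; the field names and data are assumptions:

```python
# Group raw alarms by the resource they refer to within a short time window,
# so an operator sees one correlated incident instead of a flood of events.
from collections import defaultdict

WINDOW = 60  # seconds; assumed correlation window

alarms = [
    {"ts": 100, "resource": "vFW-1", "event": "link-down"},
    {"ts": 105, "resource": "vFW-1", "event": "packet-loss"},
    {"ts": 400, "resource": "vDNS-2", "event": "cpu-high"},
]

incidents = defaultdict(list)
for alarm in sorted(alarms, key=lambda a: a["ts"]):
    key = (alarm["resource"], alarm["ts"] // WINDOW)
    incidents[key].append(alarm["event"])

for (resource, _), events in incidents.items():
    print(f"{resource}: correlated events {events}")
```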

  • Modular architecture where one can mix and match ONAP components with other open source/commercial options

  • Flexible deployment options where selected ONAP components can be deployed on a regional and/or global basis and coordinated with each other

  • Scalability and availability should be a completely configurable option for each SP, for each module and for the platform as a whole, not stipulated to be any specific number. The only requirement should be that each ONAP component is built as a cloud application (e.g., state is maintained externally to the application); it is then easy to support any kind of scaling (e.g., horizontal) and availability (1+1, N:1, ...) through mechanisms external to the application, such as a stateless load balancer (see the sketch below).
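
A minimal sketch of what "state maintained externally" means in practice, using Redis as an illustrative external store; the host name and session schema are assumptions:

```python
# Sketch: the component keeps no session state in memory, so any replica
# behind a load balancer can serve any request. Redis is used here purely
# as an illustrative external store.
import json
import redis  # pip install redis; assumes a reachable Redis instance

store = redis.Redis(host="state-store", port=6379)

def save_session(session_id: str, state: dict) -> None:
    store.set(f"session:{session_id}", json.dumps(state))

def load_session(session_id: str) -> dict:
    raw = store.get(f"session:{session_id}")
    return json.loads(raw) if raw else {}

# Any replica can run this: the state lives in Redis, not in the process.
save_session("42", {"step": "instantiating", "vnf": "vFW"})
print(load_session("42"))
```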

VNF lifecycle and availability need to be handled through ONAP to get ONAP's best added value, using closed-loop control and lifecycle orchestration.
Therefore, we definitely need RESILIENCY, especially concerning the closed-loop modules (DCAE, Policy, controllers, ...).
On top of resiliency, we need to be able to operate ONAP itself efficiently, and to operate VNF services through ONAP efficiently.
To sum up: resiliency, ONAP operability, and VNF operability through ONAP are key.