Control Loop Operation and Improvements.
This page has been created as part of the work activities for POLICY-1367 under POLICY-1068. Limitations on the management and operation of Control Loops are outlined and potential improvements are described and opened for discussion.
This page also relates to the limitations described in The ONAP Policy Framework: Architectural Improvements for Dublin, and is intended to be compatible with the long-term Policy architectural view described in The ONAP Policy Framework.
For the sake of discussion, the basic design artifacts involved in a Control Loop are described next. These are:
Operational Policies.
Rules and Support Libraries.
I. Basic Design Artifacts
I.A. Operational Policies
An Operational Policy is the highest level of abstraction when designing a Control Loop. Operational Policies are written in YAML and conform to a predefined grammar. This policy type describes the runtime behavior of a Control Loop, including interactions with other ONAP components to realize the policy, failure treatments, and chaining with other policies.
A sample Operational Policy that realizes the vCPE flow can be found here.
The Operational Policies are simple and human readable. They can be composed directly by operations personnel with control loop domain knowledge, or with guidance from a GUI, as CLAMP does.
I.B. Rules and Support Libraries
The underlying technologies that enable runtime execution of Operational Policies, and therefore Control Loops, are rule-based (specifically Drools rules) with supporting Java libraries.
The rules and support libraries require more sophisticated development skills, as they must support parallel execution of Operational Policies, interaction with other components (actors), and success and failure treatments.
The code base of the rules and support libraries is much more static than the Operational Policies, which are dynamic in nature. Currently, the Policy development team blesses the rules and support libraries each release to support the ONAP Use Cases. This code base typically does not change (unless bugs are identified) until the next release, when new or enhanced use cases are to be supported.
An example of a rules file can be seen here and the support libraries can be found here.
II. Deployment of Control Loops
In Casablanca, the official mechanism to deploy control loops requires that the rules template be uploaded to the PAP prior to creating and pushing policies (see sample script). The rules template is not a valid rules file (drl) but rather a parameterized version, where the parameters to be expanded are contained within ${}.
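The parameter expansion can be illustrated with a minimal sketch. The template fragment, the parameter names (`policyName`, `closedLoopControlName`), and the `expand` helper are hypothetical and chosen only for illustration; this is not the BRMSGW implementation. The sketch substitutes only `${name}` markers, so Drools variable bindings such as `$params` pass through untouched.

```python
import re

# Hypothetical template fragment; a real template is a complete rules file.
TEMPLATE = '''rule "${policyName}.SETUP"
    when
        not( Params( getClosedLoopControlName() == "${closedLoopControlName}" ) )
    then
        Params $params = new Params();
        $params.setClosedLoopControlName("${closedLoopControlName}");
        insert($params);
end
'''

# Matches ${name} only; a bare $name (a Drools binding) is left alone.
PLACEHOLDER = re.compile(r"\$\{(\w+)\}")


def expand(template, params):
    """Replace every ${name} with params[name]; fail on unknown names."""
    def substitute(match):
        name = match.group(1)
        if name not in params:
            raise KeyError(f"no value for template parameter {name!r}")
        return params[name]
    return PLACEHOLDER.sub(substitute, template)


drl = expand(TEMPLATE, {
    "policyName": "demo",
    "closedLoopControlName": "ControlLoop-demo-1",
})
print(drl)
```

Failing on an unknown parameter, rather than leaving the marker in place, is a deliberate choice in this sketch: an unexpanded `${}` would otherwise surface only later, when the rules are compiled.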
For the purpose of discussion, the high-level operation between the BRMS Gateway (BRMSGW) and Drools PDP (PDP-D) is outlined below (the flow may include some steps performed by the PDP-X and PAP).
1. A parameterized drl template must be uploaded through the PDP-X via the policyEngineImport API.
2. Upon a request to create and deploy a new Control Loop, a new PAP/PDP-X BRMS Policy is created and pushed through the PDP-X. The createPolicy operation contains the name of the control loop, the encoded Operational Policy, PAP relevant information, and the parameterized template. A new instance of the rules supporting the control loop is instantiated as a drl file by performing parameter expansion on the template from #1.
3. A new or existing maven project in the BRMSGW file system is created or updated to incorporate the new rules (drl) file (from #2) that realizes the particular control loop. To determine the next version of the rules artifact that will contain the rules for this control loop, the BRMSGW may crawl the policy nexus maven repository.
4. The updated maven project from #3 is packaged into a jar artifact and deployed into the nexus maven repository.
5. The BRMSGW notifies the PDP-D via DMaaP that a new rules artifact version is available with maven coordinates x.y.z in the Policy nexus maven repository.
6. The PDP-D fetches the new artifact, which contains the newly generated Control Loop rules, from nexus, along with any dependencies.
7. The PDP-D compiles the rules.
8. The PDP-D updates the controller to the new set of rules, which enables runtime support for the new Control Loop while preserving the current set of facts in the working memory.
These steps are performed every time a new Control Loop is added, modified, or deleted.
These operations are resource and time intensive and can take well over several minutes.
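The steps above leave open how the next artifact version is chosen when the repository is crawled. One plausible reading, sketched here under the assumption that versions are plain `x.y.z` strings and that the patch number is incremented, is to take the highest version already published and bump it. The `next_version` helper and the sample version list are hypothetical; the actual BRMSGW logic may differ.

```python
def parse(version):
    """Turn 'x.y.z' into a tuple of ints so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def next_version(published, initial="1.0.0"):
    """Return the version after the highest published one (patch bump).

    Assumption for illustration only: the patch component is incremented.
    """
    if not published:
        return initial
    major, minor, patch = max(parse(v) for v in published)
    return f"{major}.{minor}.{patch + 1}"


# Numeric comparison matters: as strings, "1.0.9" would sort above "1.0.10".
candidate = next_version(["1.0.9", "1.0.10", "1.0.2"])
print(candidate)
```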
III. Current Limitations
The current architecture imposes limitations at both design time and runtime.
III.A. Design Time Limitations
The set of support libraries (I.B) is constrained, as any change to it requires modifications to the BRMSGW code (most notably dependency.json). The dependency.json is a single file that describes a single set of dependencies for all control loop use cases. Maintaining this file has proven problematic, as there is tight coupling between the control loop application and the Policy release system. The designer must work closely with the BRMS development team to ensure that all dependencies are included a priori for a given release.
The rule execution environment description, embedded in the kmodule.xml, is also fixed to cloud operation mode in the BRMSGW when the rules artifact is put together. Event streaming and multiple sessions are not supported.
The designer is restricted to working with one drl file. This file must be parameterized with configuration parameters to be useful in a runtime environment. The parameterized drl is called a template. The parameters in the template must be identified a priori during a release development cycle, and coordinated changes are required from at least two ONAP teams, CLAMP and Policy.
It is worth noting that the time-phased deliveries do not prevent errors from being introduced into the design phase output artifacts, such as rule collisions, missing dependencies, or invalid syntax.
III.B. Runtime Limitations
III.B.1. Management Limitations
Management of Operational Policies is dispersed and loosely coupled:
While CLAMP can manage Operational Policies, uploading the drl template is an isolated step in the Policy subsystem, performed via the policyEngineImport API. Inspecting the template requires looking at the database.
Operational policies are encoded and embedded in BRMS Policies.
The PDP-Ds do not conform to the grouping architecture of PDP-Xs, challenging the visibility of mappings from Operational Policies to PDP-Ds.
III.B.2. Runtime Limitations
The upgrade operation in the PDP-D is heavily used. As Operational Policies (Control Loops) are added, updated, or deleted, a new rules artifact is generated following steps 1-8 from Section II. As seen, each such modification is time and resource intensive, and it is known to cause instabilities during the upgrade periods, which can take on the order of minutes. Unpredictable interactions have also been observed between the processing of incoming traffic and the upgrade itself.
Invalid rules artifacts deployed by the BRMSGW (see III.A) will cause system disruptions for all control loops running in the same session.
IV. Experimentation
The aim is to alleviate the shortcomings of the current approach described above. The main observation is that Templates and Operational Policies have their own separate and well-defined lifecycles:
Templates change rarely, and require specialized knowledge of Drools, Policy, and Java.
Operational Policies change often. They are the workhorse of the PDP-D, and designing them should not require such specialized knowledge.
Decoupling the two lifecycles allows us to design a system where upgrades (and their harmful implications) are reduced to few or none.
The controller-casablanca and feature-controlloop-management in policy/drools-applications experiment with these concepts. Once the casablanca controller is installed, dynamic addition or deletion of a Control Loop does not require steps 1-8 from Section II. Instead, an Operational Policy fact is injected into the working memory, in a single step. This will be the common mode of operation.
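The idea can be sketched with a toy model. The class below is not the Drools or PDP-D API; `ToyController`, `add_control_loop`, `remove_control_loop`, and the rule and policy names are hypothetical. It only illustrates that the rule set is compiled once, at install time, and that control loops come and go as facts.

```python
class ToyController:
    """Toy stand-in for a controller session: one fixed rule set, many facts."""

    def __init__(self, rules):
        self.rules = tuple(rules)  # shared rules, fixed at install time
        self.compilations = 1      # the rule set is compiled once, at install
        self.facts = {}            # control loop name -> operational policy

    def add_control_loop(self, name, operational_policy):
        # Adding a loop inserts a fact; no new artifact, no recompilation.
        self.facts[name] = operational_policy

    def remove_control_loop(self, name):
        # Removing a loop retracts its fact; the rules stay in place.
        self.facts.pop(name, None)


controller = ToyController(rules=["setup", "on-event", "on-timeout"])
controller.add_control_loop("loop-a", {"trigger": "restart"})
controller.add_control_loop("loop-b", {"trigger": "migrate"})
controller.remove_control_loop("loop-a")

print(sorted(controller.facts), controller.compilations)
```

However many loops are added or removed, `compilations` stays at 1 and `rules` is never touched, which is the property the experiment is after.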
Under this new paradigm, if there were 100 Control Loops, there would still be a single drl with 20 rules, plus 100 facts, one for each Control Loop (Operational Policy). Compare this with the existing default mode of operation, where there would be 100 drls (each of which had to be upgraded and compiled one by one), 2000 rules at play, and 100 facts. More importantly, with the new approach there would be no upgrades or rule compilations, and therefore no service disruption.
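The comparison is simple arithmetic, reproduced here as a sketch. The figures of 100 Control Loops and 20 rules per drl come from the text; the function names are hypothetical.

```python
def per_loop_mode(control_loops, rules_per_drl):
    """Default mode: every control loop gets its own drl."""
    return {
        "drls": control_loops,
        "rules": control_loops * rules_per_drl,
        "facts": control_loops,
    }


def shared_rules_mode(control_loops, rules_per_drl):
    """Experimental mode: one shared drl, one fact per control loop."""
    return {"drls": 1, "rules": rules_per_drl, "facts": control_loops}


default = per_loop_mode(100, 20)
shared = shared_rules_mode(100, 20)
print(default)  # {'drls': 100, 'rules': 2000, 'facts': 100}
print(shared)   # {'drls': 1, 'rules': 20, 'facts': 100}
```

In the shared mode only the number of facts grows with the number of Control Loops; the drl and rule counts stay constant.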
Upgrades of the drl and support libraries will become the exception rather than the norm. They still need to be supported, but without the restrictions described in III.A. The concept of a parameterized template goes away: a designer works directly with drl, and the blessing of a designer artifact of type template must follow a different certification path rather than going through the BRMSGW (TODO: more details of the process to be provided from architecture, and proper procedures to make it available).