TOSCA Defined Control Loop: PoC Architecture and Design
This page describes the architecture and design used for the TOSCA Defined Control Loop PoCs executed in the Guilin and Honolulu releases. Please see TOSCA Defined Control Loops: Architecture and Design for the architecture and design being used in the Istanbul release.
The idea of using control loops to automatically (or autonomously) perform network management has been the subject of much research in the Network Management research community. See this paper for some background. However, it is only with the advent of ONAP that we have a platform that supports control loops for network management. Before ONAP, Control Loops were implemented by hard-coding components together and hard-coding logic into components. ONAP has taken a step towards automatic implementation of Control Loops by allowing parameterization of Control Loops, on the premise that a Control Loop uses a set of analytic, policy, and control components connected together in set ways.
The goal of the work is to extend and enhance the current ONAP Control Loop support to provide a complete open-source framework for Control Loops. This will enhance the current support to provide TOSCA based Control Loop definition and development, commissioning and run-time management. The participants that comprise a Control Loop and the metadata needed to link the participants together to create a Control Loop are specified in a standardized way using the OASIS TOSCA modelling language. The TOSCA description is then used to commission, instantiate, and manage the Control Loops in the run time system.
We consider Control Loops at Design Time and Run Time.
At Design Time, there are two capabilities to be supported:
- Participant Metadata Definition. This capability allows external users and systems (such as SDC or DCAE-MOD) to define participants that can take part in a control loop and to define the metadata that can be used and configured on a participant when it is taking part in a control loop. The post condition of an execution of this capability is that a participant is defined in the Control Loop Design Time Catalogue together with sets of metadata that can be used with this participant in control loops.
- Control Loop Composition. This capability allows users and other systems to create a control loop type by connecting a chain of participants together from the participants that are available in the Control Loop Design Time Catalogue. In an execution of this capability, a user will define the control loop chain, define the connections between participants, will select the correct metadata sets for each participant, and will define the overall control loop metadata. The post condition for an execution of this capability is a Control Loop definition in TOSCA stored in the Control Loop Design Time Catalogue.
At Run Time, the following capabilities are to be supported:
- Participant Registration. This capability allows participants to register and deregister with CLAMP. The post condition for an execution of this capability is that a participant is available for participation in a control loop.
- Control Loop Commissioning. This capability allows version-controlled Control Loops to be taken from the Control Loop Design Time Catalogue and placed in the Control Loop Run Time Inventory. It also allows configuration of parameters that apply to the Control Loop Type, that is, parameters that will apply to all control loop instances. Further, it allows control loop types to be commissioned on participants. Data that applies to all instances of a control loop type on a participant is sent to that participant. The participant can then take whatever actions it needs to take to support the control loop type in question. The post condition for an execution of this capability is that the Control Loop definition is in the Control Loop Run Time Inventory and all participants in this control loop type are commissioned, that is, they are prepared to run instances of this control loop type.
- Control Loop Instantiation. This capability allows an instance of a control loop to be created. The control loop definition is read from the Control Loop Run Time Inventory and values are assigned to the parameters defined for the control loop in the same manner as the existing CLAMP client does. A control loop that has been created but has not yet been sent to participants is in state UNINITIALIZED. The control loop instance parameters can be revised and updated as often as the user requires. Once the user is happy with the parameters, the control loop instance is sent to participants and the control loop instance elements on each participant are started by participants using the control loop metadata. Once the control loop is instantiated on each participant, the Control Loop instance is set to state PASSIVE in the Control Loop Run Time Inventory. The user can now order the participants to change the state of the control loop to RUNNING. Each participant begins accepting and processing control loop events and the control loop is set to state RUNNING in the Control Loop Run Time Inventory. The post condition for an execution of this capability is that the Control Loop instance is running on participants and is processing events.
- Control Loop Monitoring. This capability allows control loops to be monitored. Users can check the status of control loop instances and the status of each participant in a control loop instance. Control loop participants report their overall status and domain status periodically to CLAMP. CLAMP aggregates these status reports into an aggregated control loop instance status record, which is available for monitoring. The post condition for an execution of this capability is that control loop instances are being monitored.
Once a control loop definition has been commissioned, instances of the control loop can be created, updated, and deleted. The system manages the lifecycle of control loops following the state transition diagram below.
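The lifecycle can be sketched as a small transition table. This is a minimal illustration only: the state names are the ones listed in the Participant Message handling section of this page, while the rule that only adjacent stable states can be ordered directly, and the function name `order_state`, are assumptions made for the example.

```python
from enum import Enum


class State(Enum):
    UNINITIALIZED = "UNINITIALIZED"
    UNINITIALIZED2PASSIVE = "UNINITIALIZED2PASSIVE"
    PASSIVE = "PASSIVE"
    PASSIVE2RUNNING = "PASSIVE2RUNNING"
    RUNNING = "RUNNING"
    RUNNING2PASSIVE = "RUNNING2PASSIVE"
    PASSIVE2UNINITIALIZED = "PASSIVE2UNINITIALIZED"


# (current stable state, ordered state) -> transitional state entered first.
# Assumption: only adjacent stable states can be ordered directly.
TRANSITIONS = {
    (State.UNINITIALIZED, State.PASSIVE): State.UNINITIALIZED2PASSIVE,
    (State.PASSIVE, State.RUNNING): State.PASSIVE2RUNNING,
    (State.RUNNING, State.PASSIVE): State.RUNNING2PASSIVE,
    (State.PASSIVE, State.UNINITIALIZED): State.PASSIVE2UNINITIALIZED,
}


def order_state(current: State, ordered: State) -> list:
    """Return the states passed through, ending in the ordered state."""
    if current == ordered:
        return [current]
    if (current, ordered) not in TRANSITIONS:
        raise ValueError(f"cannot go from {current.name} to {ordered.name}")
    return [TRANSITIONS[(current, ordered)], ordered]
```

Under these assumptions, moving a new instance to RUNNING takes two orders: first to PASSIVE, then to RUNNING.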
1: Overall Architecture
The diagram below shows an overview of the architecture of TOSCA based Control Loop management.
The Design Time Catalogue contains the metadata primitives and control loop definition primitives for composition of control loops. As shown in the figure above, the Design Time component provides a system where Control Loops can be designed and defined in metadata. This means that a Control Loop can have any arbitrary structure and the Control Loop developers can use whatever analytic, policy, or control participants they like to implement their Control Loop. At composition time, the user parameterises the Control Loop and stores it in the design time catalogue. This catalogue contains the primitive metadata for any participants that can be used to compose a Control Loop. A Control Loop SDK is used to compose a Control Loop by aggregating the metadata for the participants chosen to be used in a Control Loop and by constructing the references between the participants.
Composed Control Loops are commissioned on the run time part of the system, where they are stored in the run time inventory and are available for instantiation.
When a user wishes to instantiate a Control Loop, they set values for the parameters of the Control Loop. Once the parameterization has been carried out, the Control Loop is instantiated, with the metadata and whatever other artifacts are required being passed to the participants in the Control Loop. At runtime, the Control Loop can be monitored and analysed. It can also be updated as required and can be deleted when it is no longer needed.
The Control Loop Runtime Management will use ONAP services for non-functional aspects such as inventory, topology and data delivery.
1.1: Class Diagrams
1.1.1 Design Time
1.1.2 Runtime
1.2: ERD
2: Control Loop Modelling
Joseph O'Leary to pad out this section
The node_templates make up the loop itself.
An application can be a DCAE microservice, an operational policy, or any other application, as long as it can be modelled and the targeted ecosystem has a participant client waiting for the event distributions from CLAMP via DMaaP Message Router.
2.1: Control Loop TOSCA file definition
2.1.1 Control Loop Component Definition
A Control Loop Component definition describes a component that can take part in a control loop and that is implemented at run time by a participant. The control loop component definition is fully dynamic: as long as the participant that the definition relates to understands it, the definition can contain anything. However, we have designed a generic base control loop component attribute that can act as a good starting point.
2.1.2 Loop Definition
The loop definition is explicit in the node_templates within the topology_template: a Control Loop node template is specified, and any node template specified in the Control Loop node template is part of the control loop managed by CLAMP.
The example below does not explicitly include any ordering. Ordering of control loop execution is to be considered in the future, which would likely lead to changes to this definition.
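The membership rule described above can be illustrated with a small Python sketch that treats the parsed TOSCA as nested dictionaries. The type names (`example.ControlLoop`, `example.ControlLoopElement`), the node template names, and the `elements` property are hypothetical placeholders, not the names used by the actual control loop model.

```python
# Hypothetical type name; the real TOSCA types are defined by the control loop model.
CONTROL_LOOP_TYPE = "example.ControlLoop"

service_template = {
    "topology_template": {
        "node_templates": {
            "example.PolicyElement": {"type": "example.ControlLoopElement"},
            "example.DcaeElement": {"type": "example.ControlLoopElement"},
            "example.UnrelatedNode": {"type": "example.Other"},
            "example.SampleControlLoop": {
                "type": CONTROL_LOOP_TYPE,
                # Assumed property listing the member node templates.
                "properties": {
                    "elements": ["example.PolicyElement", "example.DcaeElement"],
                },
            },
        }
    }
}


def control_loop_elements(template: dict) -> dict:
    """Map each control loop node template to the node templates it lists."""
    nodes = template["topology_template"]["node_templates"]
    loops = {}
    for name, node in nodes.items():
        if node.get("type") != CONTROL_LOOP_TYPE:
            continue
        elements = node.get("properties", {}).get("elements", [])
        missing = [e for e in elements if e not in nodes]
        if missing:
            raise ValueError(f"{name} references undefined node templates: {missing}")
        loops[name] = list(elements)
    return loops
```

Node templates that are not listed by a Control Loop node template, such as `example.UnrelatedNode` here, are not part of any control loop. The list order carries no execution ordering.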
2.1.3 Example of two Control Loop instances
2.2: Modelling from TOSCA to Commissioned Data in Run Time Inventory
2.3: Modelling from TOSCA to Instance Data Run Time Inventory
2.4: Swagger REST APIs for Control Loop
ControlLoop Runtime Swagger REST APIs:
ControlLoop_Runtime_Swagger_API.yml
Participant Swagger REST APIs:
3: APIs and Sequence Diagrams
3.1: Commissioning
Ajay Deep Singh to pad out this section
This section defines Commissioning/CRUD Operations that can be performed on ControlLoops.
A client, in this case CLAMP, can perform CRUD operations on ControlLoops or can commission ControlLoops from the DesignTime Catalogue to the RunTime Inventory Database.
The DesignTime Catalogue and the RunTime Inventory Database store ControlLoop definitions. CRUD operations on the databases are supported by REST endpoints such as Get, Delete, and Create, which allow a particular ControlLoop to be selected and addressed. The sequence diagram below shows how a client application (CLAMP) can initiate REST calls to perform the different operations on the database.
The API_Gateway Service interacts with the DesignTime and RunTime databases and is responsible for responding with a success or failure status for each operation.
The commissioning of a ControlLoop definition from the DesignTime Catalogue to the RunTime Inventory Database can be achieved using the commissioning REST endpoint. When a REST request is initiated from a client (CLAMP), the API_Gateway Service takes care of fetching the ControlLoop metadata from DesignTime and creating it in the RunTime Inventory Database. The Commission API ControlLoop sequence diagram shows the flow.
In future, the commissioning REST endpoint might be updated to push ControlLoops not only to the RunTime Database but also to the participants involved in the ControlLoop.
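As a rough illustration of the commissioning step, the sketch below models the DesignTime Catalogue and RunTime Inventory as in-memory dictionaries keyed by name and version. The function name, the key layout, and the shape of the status response are assumptions for the example and do not describe the actual REST API.

```python
import copy


def commission(design_catalogue: dict, runtime_inventory: dict,
               name: str, version: str) -> dict:
    """Copy one ControlLoop definition from design time to run time.

    Returns a status dictionary, mirroring the success or failure
    response that the API_Gateway Service is described as giving.
    """
    key = (name, version)
    if key not in design_catalogue:
        return {"status": "FAILURE", "reason": "not found in design time catalogue"}
    if key in runtime_inventory:
        return {"status": "FAILURE", "reason": "already commissioned"}
    # Deep copy so later design time edits do not alter commissioned data.
    runtime_inventory[key] = copy.deepcopy(design_catalogue[key])
    return {"status": "SUCCESS"}
```

Keying on both name and version reflects the statement that commissioned Control Loops are version controlled.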
3.1.1: Commissioning REST API
3.1.2: Commissioning Sequence Diagrams
GET, DELETE, CREATE API ControlLoop Sequence Diagram
Commission API ControlLoop Sequence Diagram
3.2: Instantiation
Robertas Rimkus to pad out this section
This section covers instantiation of a commissioned control loop. A client, in this case CLAMP (potentially DCAE-MOD and others in the future), renders the commissioned control loops so that a particular control loop can be selected for instantiation. The user then provides the configuration needed to instantiate the selected control loop, which is sent on to the CL_Instance_Control Service. The service distributes the configuration to a DMaaP topic. Participants (agents) pull the event containing the configuration, pick out their control loop components, and start or set up those components. The CL_Instance_Control Service waits for a response from all participants involved in the instantiation of the control loop reporting the state of instantiation. If all responses are successful, the service stores the CL Instance LCM (Life Cycle Management) data in the runtime database and sends a message back to the client reporting the successful instantiation. If a response is not received, a timeout occurs, which results in a teardown event being sent to DMaaP. The participants receive the event and either tear down the components that were instantiated or check that they failed to instantiate in the first place, and then send a Teardown ACK back to the CL_Instance_Control Service. In that case no CL Instance LCM data is stored, and a message indicating the failure to instantiate the CL, together with the error, is sent back to the client (CLAMP).
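The decision the CL_Instance_Control Service makes while waiting for participants can be sketched as a pure function. This is an illustrative sketch only; the function name, the three outcome labels, and the use of elapsed seconds are assumptions rather than part of the design.

```python
def evaluate_instantiation(expected: set, acked: set,
                           elapsed_s: float, timeout_s: float) -> str:
    """Decide the next step while waiting for instantiation responses.

    expected: participants involved in the control loop
    acked:    participants that have reported successful instantiation
    """
    if expected <= acked:
        # All participants responded: store the LCM data, report success to the client.
        return "SUCCESS"
    if elapsed_s >= timeout_s:
        # A response is missing after the timeout: send a teardown event
        # to DMaaP, store no LCM data, and report the failure to the client.
        return "TEARDOWN"
    return "WAIT"
```

For example, with participants `{"policy", "dcae"}` and only `"policy"` acknowledged, the function returns `"WAIT"` until the timeout is reached and `"TEARDOWN"` afterwards.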
3.2.1: Instantiation REST API
3.2.2: Instantiation Sequence Diagrams
3.2.3: Instantiation DMaaP API
The initial thought is for an event to be sent from CL_Instance_Control onto DMaaP for participants to consume. The event would go onto an output topic that the participants poll or subscribe to.
e.g. URL: https://{{ONAPIP}}:{{DMaaPPort}}/events/CL_INSTANCE_CONTROL_OUTPUT
*The preferred solution is to send TOSCA in the body. This means we could reuse the parsing code that is already present and provide it to the participant.
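A minimal sketch of building the topic URL and an event body that carries the TOSCA text follows. The JSON field names (`controlLoopName`, `toscaServiceTemplate`), the host, and the port value are hypothetical; only the URL pattern and the topic name come from this page. The sketch builds the strings and does not make any network call.

```python
import json


def topic_url(host: str, port: int, topic: str) -> str:
    """Fill in the https://{{ONAPIP}}:{{DMaaPPort}}/events/<topic> pattern."""
    return f"https://{host}:{port}/events/{topic}"


def build_instantiation_event(control_loop_name: str, tosca_text: str) -> str:
    """Wrap the TOSCA text in a JSON event body (field names are assumed)."""
    return json.dumps({
        "controlLoopName": control_loop_name,
        "toscaServiceTemplate": tosca_text,
    })


url = topic_url("onap.example.org", 3905, "CL_INSTANCE_CONTROL_OUTPUT")
body = build_instantiation_event("SampleLoop", "tosca_definitions_version: tosca_simple_yaml_1_3\n")
```

Because the body carries the TOSCA text unchanged, a participant can hand it straight to the existing TOSCA parsing code.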
3.2.4: Instantiation Participant API
*Suggestion was to put JAVA API code in this section for the participant talking to DMaaP. TBD
3.3: Monitoring
Monitoring in this case refers to monitoring the data that the participants provide to DMaaP. Participants send events to DMaaP, which are pulled by the CL_Supervision_Service into the runtime database. The monitoring service provides APIs that present the statistics data from the runtime database to the Monitoring GUI. The data provided should include a reference ID for the control loops that are instantiated on the participant, as well as the applications that have been instantiated as part of each control loop on that participant. The data should also include the time that each application started, its state (running/terminated), and any other critical information that would help to determine the health of an instantiated control loop and its components. The idea is for the participant to send events at a regular interval, similar to a health check, in order to provide consistent monitoring.
3.3.1: Monitoring REST API
3.3.2: Monitoring Sequence Diagrams
3.3.3: Monitoring DMaaP API
Participants will send an event containing monitoring data to a DMaaP topic at a set interval after the participant has received an event to instantiate a control loop.
e.g. URL: https://{{ONAPIP}}:{{DMaaPPort}}/events/CL_MONITORING_SERVICE_INPUT
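The monitoring data described in this section could be modelled roughly as follows. The class and field names are hypothetical; they only mirror the items listed above (a control loop reference ID, the applications instantiated for it, each application's start time and state). The `is_healthy` rule is likewise an assumption for illustration.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class ApplicationStatus:
    name: str
    started_at: str  # ISO 8601 timestamp of when the application started
    state: str       # "running" or "terminated"


@dataclass
class MonitoringEvent:
    participant_id: str
    control_loop_id: str
    applications: list = field(default_factory=list)

    def to_json(self) -> str:
        """Serialise the event, including nested application entries."""
        return json.dumps(asdict(self), sort_keys=True)


def is_healthy(event: MonitoringEvent) -> bool:
    """Assumed rule: healthy when at least one application is reported and all are running."""
    return bool(event.applications) and all(
        app.state == "running" for app in event.applications
    )
```

A participant would build one such event per control loop at each reporting interval and publish the JSON to the monitoring topic.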
3.3.4: Monitoring Participant API
Presume similar thinking to Instantiation Participant API
*Suggestion was to put JAVA API code in this section for the participant talking to DMaaP. TBD
3.4: Supervision
Supervision is responsible for ensuring that
- control loops are established once their initiation has been ordered
- control loops are running correctly once their initiation is completed
- control loops are correctly removed once their removal has been ordered
3.4.1: Supervision Sequence Diagrams
3.4.2: Supervision APIs to other components
4: Design
4.1: Server Side
4.1.1 Database Schema and JPA
4.1.2: TOSCA Processing
4.1.3: Instance Control
4.1.4: Execution Monitoring
4.2: Participant Side
A participant is a component that acts as a bridge between the runtime and components such as the Policy Framework, DCAE, or a Kubernetes cluster.
It listens to DMaaP to receive messages from the runtime and performs operations towards control loop components.
Every participant has two parts: a Participant-Intermediary and a Participant-Impl.
The Participant-Intermediary is a common component that listens to DMaaP and acts on the messages. The Participant-Impl handles the logic towards the control loop element.
4.2.1: Participant Message handling
A participant handles four types of messages:
1. Participant State Change: This message handles the state of a participant. The runtime can order a participant to change state.
ParticipantState can be set to one of the following:
- UNKNOWN: Control Loop execution is unknown.
- PASSIVE: Control Loop execution is always rejected.
- SAFE: Control Loop execution proceeds, but changes to domain state or context are not carried out. The participant returns an indication that it is running in SAFE mode together with the action it would have performed if it were operating in ACTIVE mode.
- TEST: Control Loop execution proceeds and changes to domain and state are carried out in a test environment. The participant returns an indication that it is running in TEST mode together with the action it has performed on the test environment.
- ACTIVE: Control Loop execution is executed in the live environment by the participant.
- TERMINATED: Control Loop execution is terminated and not available.
2. Control Loop Update: This message creates the control loop elements and brings them from UNINITIALIZED to PASSIVE state.
The ControlLoopUpdate message contains the full ToscaServiceTemplate describing all components participating in a control loop. The service template acts as the template from which control loops are created.
When the participant-intermediary receives this message, it triggers creation of policy types and policies in the Policy Framework by the Policy participant, and deployment of DCAE by the DCAE participant.
3. Control Loop State Change: This message is used to order a state change in a control loop element.
The runtime can order one of the following states:
- UNINITIALIZED: The control loop or control loop element should become uninitialized on participants; it should not exist on participants.
- PASSIVE: The control loop or control loop element should be initialized on the participants and be passive, that is, it is not handling control loop messages yet.
- RUNNING: The control loop or control loop element should be running and executing control loops.
Once any of the above states is ordered, the control loop element transitions into one of the following states:
- UNINITIALIZED: The control loop or control loop element is not initialized on participants; it does not exist on participants.
- UNINITIALIZED2PASSIVE: The control loop or control loop element is changing from uninitialized to passive; it is being initialized onto participants.
- PASSIVE: The control loop or control loop element is initialized on the participants but is passive, that is, it is not handling control loop messages yet.
- PASSIVE2RUNNING: The control loop or control loop element is changing from passive to running; the participants are preparing to execute control loops.
- RUNNING: The control loop or control loop element is running and is executing control loops.
- RUNNING2PASSIVE: The control loop or control loop element is completing execution of current control loops but will not start running any more control loops and will become passive.
- PASSIVE2UNINITIALIZED: The control loop or control loop element is changing from passive to uninitialized; the control loop is being removed from participants.
4. Participant Healthcheck: This message is used to learn the health status of a participant.
In response to any of the above messages, the participant returns a Participant Status message holding the respective response.
The runtime receives the Participant Status message and either stores the relevant information in the database or performs the respective actions.
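The split between the common Participant-Intermediary and the Participant-Impl, and the handling of the four message types, can be sketched as below. The message type strings, the dictionary keys, and the method names are hypothetical and chosen only for illustration; the real message formats are not defined on this page.

```python
from abc import ABC, abstractmethod


class ParticipantImpl(ABC):
    """Logic towards the control loop elements (one subclass per participant type)."""

    @abstractmethod
    def update_control_loop(self, tosca_service_template: dict) -> None: ...

    @abstractmethod
    def change_element_state(self, ordered_state: str) -> None: ...


class ParticipantIntermediary:
    """Common code: receives a message, dispatches it, returns a status message."""

    def __init__(self, impl: ParticipantImpl):
        self.impl = impl
        self.participant_state = "UNKNOWN"

    def handle(self, message: dict) -> dict:
        kind = message["type"]
        if kind == "PARTICIPANT_STATE_CHANGE":
            self.participant_state = message["state"]
        elif kind == "CONTROL_LOOP_UPDATE":
            self.impl.update_control_loop(message["toscaServiceTemplate"])
        elif kind == "CONTROL_LOOP_STATE_CHANGE":
            self.impl.change_element_state(message["orderedState"])
        elif kind != "PARTICIPANT_HEALTHCHECK":
            raise ValueError(f"unknown message type: {kind}")
        # Every handled message is answered with a Participant Status message.
        return {
            "type": "PARTICIPANT_STATUS",
            "inResponseTo": kind,
            "participantState": self.participant_state,
        }


class RecordingImpl(ParticipantImpl):
    """Test double that records what the intermediary asked it to do."""

    def __init__(self):
        self.calls = []

    def update_control_loop(self, tosca_service_template: dict) -> None:
        self.calls.append(("update", tosca_service_template))

    def change_element_state(self, ordered_state: str) -> None:
        self.calls.append(("state", ordered_state))
```

Keeping the dispatch in the intermediary means a new participant type only needs to supply the two implementation methods.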
4.2.2: Policy Participant Agent
The Policy participant receives messages through the participant-intermediary common code and handles them by invoking REST APIs towards the Policy Framework.
For example, when a ControlLoopUpdate message is received by the Policy participant, it contains the full ToscaServiceTemplate describing all components participating in a control loop. When the control loop element state changes from UNINITIALIZED to PASSIVE, the Policy participant triggers creation of policy types and policies in the Policy Framework.
When the state changes from PASSIVE to UNINITIALIZED, the Policy participant deletes the policies and policy types by invoking REST APIs towards the Policy Framework.
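The behaviour described for the Policy participant can be sketched with a stand-in client that records calls instead of invoking the Policy Framework REST APIs. All class names, method names, and template keys here are hypothetical. The ordering (policy types created before policies, policies deleted before policy types) is an assumption based on policies depending on their types.

```python
class FakePolicyClient:
    """Stand-in for REST calls to the Policy Framework; it only records calls."""

    def __init__(self):
        self.calls = []

    def create(self, kind: str, name: str) -> None:
        self.calls.append(("create", kind, name))

    def delete(self, kind: str, name: str) -> None:
        self.calls.append(("delete", kind, name))


class PolicyParticipant:
    def __init__(self, client: FakePolicyClient):
        self.client = client
        self.template = {"policy_types": [], "policies": []}
        self.state = "UNINITIALIZED"

    def on_update(self, template: dict) -> None:
        """Remember the template delivered by the ControlLoopUpdate message."""
        self.template = template

    def on_state_change(self, ordered: str) -> None:
        change = (self.state, ordered)
        if change == ("UNINITIALIZED", "PASSIVE"):
            for name in self.template["policy_types"]:
                self.client.create("policy_type", name)
            for name in self.template["policies"]:
                self.client.create("policy", name)
        elif change == ("PASSIVE", "UNINITIALIZED"):
            for name in self.template["policies"]:
                self.client.delete("policy", name)
            for name in self.template["policy_types"]:
                self.client.delete("policy_type", name)
        self.state = ordered
```

Swapping `FakePolicyClient` for a real REST client would leave the state handling unchanged.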
4.2.4: DCAE Participant Agent
The DCAE participant receives messages through the participant-intermediary common code and handles them by invoking CLAMP DCAE methods, which internally work towards DCAE.
For example, when a ControlLoopUpdate message is received by the DCAE participant, it contains the full ToscaServiceTemplate describing all components participating in a control loop. When the control loop element state changes from UNINITIALIZED to PASSIVE, the DCAE participant triggers deployment of DCAE.
When the state changes from PASSIVE to UNINITIALIZED, the DCAE participant undeploys DCAE by invoking methods towards CLAMP.
4.2.5: Kubernetes Participant Agent
The Kubernetes participant receives messages through the participant-intermediary common code and handles them by invoking the Kubernetes Open API.
For example, when a ControlLoopUpdate message is received by the Kubernetes participant and the control loop element state changes from UNINITIALIZED to PASSIVE, the Kubernetes participant invokes the Kubernetes Open API and passes the Helm charts to the cluster.
4.3: Client Side
4.3.1: Client SDK: Composition of Control Loop Tosca
4.3.2: Client User Interface
4.4 Other Considerations
4.4.1 Upgrade
Performing a hot upgrade of the Control Loop at run time, as well as handling an upgrade of the software in one or more of the participants in a Control Loop, is a particularly challenging issue because upgrading must handle the following cases without tearing down the Control Loop:
- Upgrade and changes of the configuration data of participants
- Addition or removal of participants in a Control Loop
- Upgrade of software in one or more participants in a Control Loop
- Maintenance of compatibility between participants when an update of more than one participant must be done together to ensure compatibility, for example, when a protocol being used by two participants to communicate is upgraded
4.4.2 Scalability
The system is designed to be inherently scalable. The control loop runtime server is stateless: all state is preserved in the run time inventory in the database. When the user requests a control loop operation (such as an instantiation, activation, passivation, or uninitialization), the server broadcasts the request to participants over DMaaP and saves details of the request to the database. The server does not directly wait for responses to requests.
When a request is broadcast on DMaaP, the request is asynchronously picked up by participants of the types required for the control loop instance, and those participants manage the life cycle of their control loop elements. Periodically, each participant reports back over DMaaP on the status of operations it has picked up for the control loop elements it controls, together with statistics on the control loop elements. On reception of these participant messages, the server stores this information in its database.
The server periodically runs a supervision function, which checks the status of all existing control loop instances and the status of outstanding requests. It builds a picture of the current status of each control loop instance from the reports on the elements of the control loop instances. Once the server has a full picture, it checks that the control loop instance is in the correct state as requested by the user of the system. If the control loop is not in the correct state, the supervision function can initiate actions such as performing retries on operations or issuing alarms or notifications on control loop instances.
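One pass of such a supervision function might look like the sketch below. The record layout (`id`, `ordered_state`, `element_states`, `retries`), the rule that an instance is in a state only when all its elements report that state, and the retry limit are all assumptions made for illustration.

```python
from typing import Optional


def overall_state(element_states: list) -> Optional[str]:
    """Return the common state if all elements agree, else None (mixed or no reports)."""
    unique = set(element_states)
    return unique.pop() if len(unique) == 1 else None


def supervise(instances: list, max_retries: int = 3) -> list:
    """Compare each instance's reported state with the ordered state.

    Returns (instance id, action) pairs for instances that need attention:
    "RETRY" while retries remain, "ALARM" once the assumed limit is reached.
    """
    actions = []
    for inst in instances:
        if overall_state(inst["element_states"]) == inst["ordered_state"]:
            continue
        action = "RETRY" if inst["retries"] < max_retries else "ALARM"
        actions.append((inst["id"], action))
    return actions
```

Because the function reads only stored records and returns actions, any runtime server instance can run it against the shared database.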
This approach makes it easy to scale control loop LCM. As control loop instance counts increase, more than one runtime server can be deployed and REST/supervision operations on control loop instances can run in parallel. The number of participants can scale because an asynchronous broadcast mechanism is used for server-participant communication and there is no direct connection or communication channel between participants and runtime servers. Participant state, control loop instance state, and control loop element state are held in the database, so any runtime server can handle operations for any participant. Many participants of a particular type can be deployed, and participant instances can load balance control loop element instances for different control loops of many types across themselves using a mechanism such as a Kubernetes cluster.
5: Goals
5.1: MVP
5.2: ControlLoop in Tosca LCM Istanbul Jiras
- Design Time
- Support design of multiple control loops*
- Support design of individual control loop component**
- Support composition of control loops**
- Runtime
- Participant registration and participant deregistration
- Support commissioning of control loops
- Ingestion with artifact references*
- Ingestion with artifact embedded**
- Support instantiation of control loop
- Support instantiation of control loop TOSCA to DMaaP MR*
- Support instantiation of config for the control loop*
- Support monitoring of control loops
- Receive control loop heartbeat events (heartbeat starts when component of control loop is running)*
- Support supervision of control loops
- Periodically check monitored data, and update state of control loop*
- Participants
- Agent library*
- Reference(test) participant*
- CDS participant*
- DCAE participant*
- Policy participant*
- Demo**
- Throwaway Monitoring/Control GUI