The current ACM state machine works, but it is inconsistent in the way it handles error states and failed transitions. A composition and its elements can get "stuck" in transition states.
We need to
- Specify what the current state machine is for both compositions and elements and describe what the state machine for both should be
- Specify what the behaviour of the runtime and participants should be in each state
- Specify what the behaviour of the runtime and participants should be in transitions
Specifically we need to clarify:
- State of the composition elements
- State of the overall composition is derived from the composition element states
- Admin state/Running state
- When all the elements are fully up and configured, they go to state Passive; when all elements are in Passive, the full composition goes to Passive
- Error states: Are they parallel states or part of the same state?
- There should be “it didn’t work” states like “Passive-Error” or “Run_Error” (names to be decided later)
- Describe what the “Running” state means and what the participant should do in Passive->Running and Running->Passive transitions.
- If a K8S service crashes, how do we feed that back? Through a state such as Running_Error. Currently the state of the pod is checked only during startup, not periodically. There should be supervision.
State Machine for Automation Compositions
Current State Machine
- Composition in UNINITIALIZED state: all elements of a composition are in UNINITIALIZED state, all applications are not deployed and policy types are not deployed and not present in Api.
- User triggers to move the composition from UNINITIALIZED to PASSIVE: runtime-acm moves elements from UNINITIALIZED state to UNINITIALIZED_TO_PASSIVE.
- Element in UNINITIALIZED_TO_PASSIVE:
- participant-ks8: deploys applications
- participant-policy: creates policy types in Api and deploys them with Pap.
- participant-http: configures applications.
- Element in PASSIVE state:
- participant-ks8: applications are deployed.
- participant-policy: policy types are created in Api and deployed with Pap.
- participant-http: applications are configured.
- Composition in PASSIVE state: all elements are in PASSIVE state, all applications are deployed and configured.
- User triggers to move the composition from PASSIVE to UNINITIALIZED: runtime-acm moves elements from PASSIVE state to PASSIVE_TO_UNINITIALIZED.
- Element in PASSIVE_TO_UNINITIALIZED:
- participant-ks8: undeploys applications
- participant-policy: undeploys policy types with Pap and deletes them in Api.
- participant-http: does nothing
- Element in UNINITIALIZED state:
- participant-ks8: applications are undeployed.
- participant-policy: policy types are not deployed and not present in Api.
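The current behaviour described above can be summarised as a small transition table. The following is a minimal sketch in Python, not the runtime-acm implementation; the names `State`, `USER_TRIGGERS`, `ON_SUCCESS`, `request`, and `participant_done` are hypothetical. It illustrates why an element can get stuck: a transitional state has exactly one exit, which is taken only when the participant reports success.

```python
from enum import Enum


class State(Enum):
    UNINITIALIZED = "UNINITIALIZED"
    UNINITIALIZED_TO_PASSIVE = "UNINITIALIZED_TO_PASSIVE"
    PASSIVE = "PASSIVE"
    PASSIVE_TO_UNINITIALIZED = "PASSIVE_TO_UNINITIALIZED"


# (current stable state, requested state) -> transitional state entered.
USER_TRIGGERS = {
    (State.UNINITIALIZED, State.PASSIVE): State.UNINITIALIZED_TO_PASSIVE,
    (State.PASSIVE, State.UNINITIALIZED): State.PASSIVE_TO_UNINITIALIZED,
}

# Transitional state -> stable state reached when the participant succeeds.
ON_SUCCESS = {
    State.UNINITIALIZED_TO_PASSIVE: State.PASSIVE,
    State.PASSIVE_TO_UNINITIALIZED: State.UNINITIALIZED,
}


def request(current: State, target: State) -> State:
    """Return the transitional state for a user request, or raise if not allowed."""
    try:
        return USER_TRIGGERS[(current, target)]
    except KeyError:
        raise ValueError(f"cannot move from {current.name} to {target.name}") from None


def participant_done(current: State, success: bool) -> State:
    """Apply the participant's result to an element in a transitional state."""
    if success:
        return ON_SUCCESS[current]
    # No error state exists in the current machine, so the element stays in transition.
    return current
```

In this sketch a failed deployment leaves the element in UNINITIALIZED_TO_PASSIVE indefinitely, which is the "stuck" behaviour the proposals below aim to remove.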
Proposed State Machine
State Machine for Automation Composition Elements
Current State Machine
TBC
Proposed State Machine
- Composition in UNINITIALIZED state: all elements of the composition are in UNINITIALIZED state, all applications are not deployed and policy types are not deployed and not present in Api.
- User triggers to move the composition from UNINITIALIZED to PASSIVE: runtime-acm moves elements from UNINITIALIZED state to UNINITIALIZED_TO_PASSIVE.
- Element in UNINITIALIZED_TO_PASSIVE:
- participant-ks8: deploys applications
- participant-policy: creates policy types in Api and deploys them with Pap.
- participant-http: checks if applications are healthy.
- Element in UNINITIALIZED_TO_PASSIVE_ERROR state: the participant got an error during deployment.
- Composition in UNINITIALIZED_TO_PASSIVE_ERROR state: at least one element is in UNINITIALIZED_TO_PASSIVE_ERROR state.
- User can re-try UNINITIALIZED_TO_PASSIVE.
- User can go back to UNINITIALIZED.
- Element in PASSIVE state:
- participant-ks8: applications are deployed.
- participant-policy: policy types are created in Api and deployed with Pap.
- participant-http: applications are healthy but not configured yet.
- Composition in PASSIVE state: all elements are moved to PASSIVE, all applications are deployed but not configured.
- User triggers to move the composition from PASSIVE to RUNNING: runtime-acm moves elements from PASSIVE state to PASSIVE_TO_RUNNING.
- Element in PASSIVE_TO_RUNNING state:
- participant-ks8: does nothing (maybe checks if applications are running).
- participant-policy: does nothing (maybe checks if policy types are running).
- participant-http: configures applications.
- Element in PASSIVE_TO_RUNNING_ERROR state: the participant got an error during configuration.
- Composition in PASSIVE_TO_RUNNING_ERROR state: at least one element is in PASSIVE_TO_RUNNING_ERROR state.
- Element in RUNNING state:
- participant-ks8: applications are deployed (periodically checks if applications are running).
- participant-policy: policy types are created in Api and deployed with Pap.
- participant-http: applications are healthy and configured (periodically checks if applications are healthy).
- Composition in RUNNING state: all elements of the composition are in RUNNING state, all applications are running.
- Element in RUN_ERROR state: the participant got an error during the running state (it periodically checks if applications are running).
- Composition in RUN_ERROR state: at least one element is in RUN_ERROR state.
- User could decide to move the composition from RUN_ERROR to PASSIVE state.
- The application has been restarted by Kubernetes: the participant detects that the application is running and moves the element from RUN_ERROR to RUNNING.
- User triggers to move the composition from RUNNING to PASSIVE: runtime-acm moves elements from RUNNING state to RUNNING_TO_PASSIVE.
- Element in RUNNING_TO_PASSIVE:
- participant-ks8: does nothing
- participant-policy: does nothing
- participant-http: removes configuration
- Element in RUNNING_TO_PASSIVE_ERROR state: the participant got an error while removing configuration.
- Composition in RUNNING_TO_PASSIVE_ERROR state: at least one element is in RUNNING_TO_PASSIVE_ERROR state.
- User triggers to move the composition from PASSIVE state to UNINITIALIZED: runtime-acm moves elements from PASSIVE state to PASSIVE_TO_UNINITIALIZED.
- Element in PASSIVE_TO_UNINITIALIZED:
- participant-ks8: undeploys applications
- participant-policy: undeploys policy types with Pap and deletes them in Api.
- participant-http: does nothing
- Element in PASSIVE_TO_UNINITIALIZED_ERROR state: the participant got an error during undeployment.
- Composition in PASSIVE_TO_UNINITIALIZED_ERROR state: at least one element is in PASSIVE_TO_UNINITIALIZED_ERROR state.
- Element in UNINITIALIZED state:
- participant-ks8: applications are undeployed.
- participant-policy: policy types are not deployed and not present in Api.
- In any Error status the User can re-try the operation.
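The rules in this list can be sketched as two small functions: one that resolves an element's transition into a stable or error state, and one that derives the composition state from its element states. This is an illustrative Python sketch only; `TARGET`, `element_result`, `retry`, and `composition_state` are hypothetical names. The handling of mixed element states (some elements still in transition, others already finished) is an assumption, since the list above does not specify it. RUN_ERROR is not a failed transition, so `retry` deliberately rejects it.

```python
from typing import Iterable

# Transitional state -> stable state reached on success.
TARGET = {
    "UNINITIALIZED_TO_PASSIVE": "PASSIVE",
    "PASSIVE_TO_RUNNING": "RUNNING",
    "RUNNING_TO_PASSIVE": "PASSIVE",
    "PASSIVE_TO_UNINITIALIZED": "UNINITIALIZED",
}
ERROR_SUFFIX = "_ERROR"


def element_result(state: str, success: bool) -> str:
    """State of an element once its participant finishes a transition."""
    if state not in TARGET:
        raise ValueError(f"{state} is not a transitional state")
    return TARGET[state] if success else state + ERROR_SUFFIX


def retry(error_state: str) -> str:
    """Re-enter the transition that failed, as when the user retries."""
    transition = error_state[: -len(ERROR_SUFFIX)]
    if not error_state.endswith(ERROR_SUFFIX) or transition not in TARGET:
        raise ValueError(f"{error_state} is not a failed transition")
    return transition


def composition_state(element_states: Iterable[str]) -> str:
    """Derive the composition state from the states of its elements."""
    states = set(element_states)
    if not states:
        raise ValueError("a composition needs at least one element")
    errors = sorted(s for s in states if s.endswith(ERROR_SUFFIX))
    if errors:
        return errors[0]  # at least one element is in an error state
    if len(states) == 1:
        return next(iter(states))  # all elements agree
    transitional = sorted(s for s in states if s in TARGET)
    if len(transitional) == 1:
        # Assumption: the composition stays in transition until every element finishes.
        return transitional[0]
    raise ValueError(f"inconsistent element states: {sorted(states)}")
```

For example, a composition whose elements are in PASSIVE and PASSIVE_TO_RUNNING_ERROR is reported as PASSIVE_TO_RUNNING_ERROR, and `retry` on that state puts the failed element back into PASSIVE_TO_RUNNING.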
Note:
With this solution, the User can move from RUNNING to PASSIVE, update the service template related to the configuration (participant-http) while the applications are still up, and afterwards move from PASSIVE to RUNNING.
Second Proposed State Machine
- Composition in UNINITIALIZED state: all elements of the composition are in UNINITIALIZED state, all applications are not deployed and policy types are not deployed and not present in Api.
- User triggers to move the composition from UNINITIALIZED to PASSIVE: runtime-acm moves elements from UNINITIALIZED state to UNINITIALIZED_TO_PASSIVE.
- Element in UNINITIALIZED_TO_PASSIVE:
- participant-ks8: deploys applications
- participant-policy: creates policy types in Api and deploys them with Pap.
- participant-http: configures applications.
- Element in UNINITIALIZED_TO_PASSIVE_ERROR state: the participant got an error during deployment.
- Composition in UNINITIALIZED_TO_PASSIVE_ERROR state: at least one element is in UNINITIALIZED_TO_PASSIVE_ERROR state.
- User can re-try UNINITIALIZED_TO_PASSIVE.
- User can go back to UNINITIALIZED.
- Element in PASSIVE state:
- participant-ks8: applications are deployed.
- participant-policy: policy types are created in Api and deployed with Pap.
- participant-http: applications are configured.
- Composition in PASSIVE state: all elements are moved to PASSIVE, all applications are deployed and configured. Then runtime-acm automatically moves the composition from PASSIVE to RUNNING: it moves elements from PASSIVE state to PASSIVE_TO_RUNNING.
- Element in PASSIVE_TO_RUNNING state:
- participant-ks8: starts monitoring if applications are running.
- participant-policy: does nothing (maybe starts monitoring if policy types are running).
- participant-http: starts monitoring if applications are healthy.
- Element in PASSIVE_TO_RUNNING_ERROR state: the participant got an error during configuration.
- Composition in PASSIVE_TO_RUNNING_ERROR state: at least one element is in PASSIVE_TO_RUNNING_ERROR state.
- Element in RUNNING state:
- participant-ks8: monitors whether applications are running.
- participant-policy: does nothing (maybe monitors whether policy types are running).
- participant-http: monitors whether applications are healthy.
- Composition in RUNNING state: all elements of the composition are in RUNNING state, all applications are running.
- Element in RUN_ERROR state: the participant got an error during the running state (it periodically checks if applications are running).
- Composition in RUN_ERROR state: at least one element is in RUN_ERROR state.
- User could decide to move the composition from RUN_ERROR to PASSIVE state.
- The application has been restarted by Kubernetes: the participant detects that the application is running and moves the element from RUN_ERROR to RUNNING.
- User triggers to move the composition from RUNNING to PASSIVE: runtime-acm moves elements from RUNNING state to RUNNING_TO_PASSIVE.
- Element in RUNNING_TO_PASSIVE:
- participant-ks8: stops monitoring
- participant-policy: stops monitoring
- participant-http: stops monitoring
- User triggers to move the composition from PASSIVE state to UNINITIALIZED: runtime-acm moves elements from PASSIVE state to PASSIVE_TO_UNINITIALIZED.
- Element in PASSIVE_TO_UNINITIALIZED:
- participant-ks8: undeploys applications
- participant-policy: undeploys policy types with Pap and deletes them in Api.
- participant-http: does nothing
- Element in PASSIVE_TO_UNINITIALIZED_ERROR state: the participant got an error during undeployment.
- Composition in PASSIVE_TO_UNINITIALIZED_ERROR state: at least one element is in PASSIVE_TO_UNINITIALIZED_ERROR state.
- Element in UNINITIALIZED state:
- participant-ks8: applications are undeployed.
- participant-policy: policy types are not deployed and not present in Api.
- In any Error status the User can re-try the operation.
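Two behaviours distinguish this second proposal: the automatic move from PASSIVE to RUNNING, and monitoring that drives elements between RUNNING and RUN_ERROR. The Python sketch below illustrates both under stated assumptions; `supervise`, `auto_promote`, and the `is_healthy` callback are hypothetical names, and how a participant actually probes an application is not specified in this document.

```python
from typing import Callable, List


def supervise(state: str, is_healthy: Callable[[], bool]) -> str:
    """One supervision tick for a single element.

    Monitoring is only active while the element is in service, so any
    state other than RUNNING or RUN_ERROR is returned unchanged.
    """
    if state not in ("RUNNING", "RUN_ERROR"):
        return state
    # A healthy probe clears RUN_ERROR; an unhealthy probe raises it.
    return "RUNNING" if is_healthy() else "RUN_ERROR"


def auto_promote(element_states: List[str]) -> List[str]:
    """Start PASSIVE_TO_RUNNING without user input once every element is PASSIVE."""
    if element_states and all(s == "PASSIVE" for s in element_states):
        return ["PASSIVE_TO_RUNNING"] * len(element_states)
    return list(element_states)
```

Calling `supervise` periodically covers the case where Kubernetes restarts an application: the next healthy probe moves the element from RUN_ERROR back to RUNNING without user action.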
Note:
With this solution, the User can move from RUNNING to PASSIVE, update the service template related to the configuration (participant-http) while the applications are still up, and afterwards move from PASSIVE to RUNNING.
ACM Element States in Participants
This section describes the state handling in ACM Elements in Participants.
The following states are the only states in participants:
Then, a Control Loop Element can be running a number of operations, each of which has an operational state:
Operational State | From State | To State | Description |
---|---|---|---|
No_Operation | None | None | No operation in progress |
Initializing | Uninitialized | Passive | Triggered by ACM Runtime to prepare an ACM Element for operation |
Uninitializing | Passive | Uninitialized | Triggered by ACM Runtime to bring an ACM Element out of operation |
Activating | Passive | Running | Triggered by the Participant to bring an ACM element into service |
Passivating | Running | Passive | Triggered by the Participant to bring an ACM element out of service |
A Control Loop Element has a status indicator
Status Indicator | Description |
---|---|
OK | The ACM Element is stable in its current state |
Information | The ACM Element is stable in its current state, and there is information available on the last operation on this ACM element |
Warning | The ACM Element has a warning on its current state, and there is a warning on the last operation on this ACM element |
Error | The ACM Element has an error on its current state, and there is an error message on the last operation on this ACM element |
Each participant also records a log of all operations that occur, recording the information below:
Timestamp | Operational State | From State | To State | Status Before | Status After | Message |
---|---|---|---|---|---|---|
<..time..> | Initializing | UNINITIALIZED | PASSIVE | OK | OK | |
<..time..> | Activating | PASSIVE | RUNNING | OK | OK | |
<..time..> | Passivating | RUNNING | PASSIVE | OK | OK | |
<..time..> | Uninitializing | PASSIVE | UNINITIALIZED | OK | OK | |
<..time..> | Initializing | UNINITIALIZED | UNINITIALIZED | OK | ERROR | Error Message |
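The operational states, status indicator, and log columns above can be combined into one record type. The following Python sketch is one possible shape, not the participant implementation; `Status`, `OPERATIONS`, `LogEntry`, and `record` are hypothetical names. It assumes, following the last row of the log table, that a failed operation leaves the element in its "from" state with status ERROR.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from enum import Enum
from typing import List


class Status(Enum):
    OK = "OK"
    INFORMATION = "INFORMATION"
    WARNING = "WARNING"
    ERROR = "ERROR"


# Operational state -> (from state, to state), as in the operational state table.
OPERATIONS = {
    "Initializing": ("UNINITIALIZED", "PASSIVE"),
    "Uninitializing": ("PASSIVE", "UNINITIALIZED"),
    "Activating": ("PASSIVE", "RUNNING"),
    "Passivating": ("RUNNING", "PASSIVE"),
}


@dataclass(frozen=True)
class LogEntry:
    timestamp: datetime
    operational_state: str
    from_state: str
    to_state: str
    status_before: Status
    status_after: Status
    message: str = ""


def record(log: List[LogEntry], operation: str, status_before: Status,
           error: str = "") -> LogEntry:
    """Append one log row for a finished operation and return it."""
    from_state, to_state = OPERATIONS[operation]
    if error:
        # Assumption: on failure the element stays in its "from" state.
        to_state, status_after = from_state, Status.ERROR
    else:
        status_after = Status.OK
    entry = LogEntry(datetime.now(timezone.utc), operation, from_state,
                     to_state, status_before, status_after, error)
    log.append(entry)
    return entry
```

Calling `record(log, "Initializing", Status.OK, error="Error Message")` produces a row equivalent to the last line of the log table.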
@startuml
[*] --> UNINITIALIZED: Created by\nACM Runtime
UNINITIALIZED --> PASSIVE: Initializing (Success)
UNINITIALIZED --> UNINITIALIZED_ERROR: Initializing (Error)
UNINITIALIZED_ERROR --> UNINITIALIZED: Uninitializing (Clear Error)
PASSIVE --> UNINITIALIZED: Uninitializing (Success)
PASSIVE --> PASSIVE_ERROR: Uninitializing (Error)
PASSIVE_ERROR --> UNINITIALIZED: Uninitializing (Success)
PASSIVE_ERROR --> UNINITIALIZED_ERROR: Uninitializing (Error)
PASSIVE_ERROR --> PASSIVE: Initializing (Clear Error)
PASSIVE --> RUNNING: Activating (Success)
PASSIVE --> PASSIVE_ERROR: Activating (Error)
RUNNING --> PASSIVE: Passivating (Success)
RUNNING --> PASSIVE_ERROR: Passivating (Error)
RUNNING --> RUNNING_ERROR: RuntimeError (Error)
RUNNING_ERROR --> RUNNING: ClearRuntimeError
RUNNING_ERROR --> PASSIVE: Passivating (Success)
RUNNING_ERROR --> PASSIVE_ERROR: Passivating (Error)
@enduml
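The diagram above can also be written as a lookup table, which makes it possible to check mechanically that no state is a dead end. The Python sketch below is illustrative only; the event names are normalised spellings of the diagram labels, and `TRANSITIONS`, `next_state`, and `can_reach` are hypothetical names. Each key is (state, operation, succeeded).

```python
from collections import deque

TRANSITIONS = {
    ("UNINITIALIZED", "Initializing", True): "PASSIVE",
    ("UNINITIALIZED", "Initializing", False): "UNINITIALIZED_ERROR",
    ("UNINITIALIZED_ERROR", "Uninitializing", True): "UNINITIALIZED",  # clear error
    ("PASSIVE", "Uninitializing", True): "UNINITIALIZED",
    ("PASSIVE", "Uninitializing", False): "PASSIVE_ERROR",
    ("PASSIVE_ERROR", "Uninitializing", True): "UNINITIALIZED",
    ("PASSIVE_ERROR", "Uninitializing", False): "UNINITIALIZED_ERROR",
    ("PASSIVE_ERROR", "Initializing", True): "PASSIVE",  # clear error
    ("PASSIVE", "Activating", True): "RUNNING",
    ("PASSIVE", "Activating", False): "PASSIVE_ERROR",
    ("RUNNING", "Passivating", True): "PASSIVE",
    ("RUNNING", "Passivating", False): "PASSIVE_ERROR",
    ("RUNNING", "RuntimeError", False): "RUNNING_ERROR",
    ("RUNNING_ERROR", "ClearRuntimeError", True): "RUNNING",
    ("RUNNING_ERROR", "Passivating", True): "PASSIVE",
    ("RUNNING_ERROR", "Passivating", False): "PASSIVE_ERROR",
}

# Every state that appears as a source or a destination in the table.
STATES = {src for (src, _event, _ok) in TRANSITIONS} | set(TRANSITIONS.values())


def next_state(state: str, event: str, success: bool) -> str:
    """Look up the state reached by an operation and its outcome."""
    try:
        return TRANSITIONS[(state, event, success)]
    except KeyError:
        raise ValueError(f"no transition for {event} from {state}") from None


def can_reach(start: str, goal: str) -> bool:
    """Breadth-first search over the table: is goal reachable from start?"""
    seen, queue = {start}, deque([start])
    while queue:
        state = queue.popleft()
        if state == goal:
            return True
        for (src, _event, _ok), dst in TRANSITIONS.items():
            if src == state and dst not in seen:
                seen.add(dst)
                queue.append(dst)
    return False
```

With this table, `all(can_reach(s, "UNINITIALIZED") for s in STATES)` holds, so every state, including the error states, has a path back to UNINITIALIZED.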