The current ACM state machine works, but it is inconsistent in the way it handles error states and failed transitions. A composition and its elements can get "stuck" in transition states.
We need to
- Specify the current state machine for both compositions and elements, and describe what the state machine for both should be
- Specify what the behaviour of the runtime and participants should be in each state
- Specify what the behaviour of the runtime and participants should be in transitions
Specifically we need to clarify:
- State of the composition elements
- State of the overall composition is derived from the composition element states
- Admin state/Running state
- When all the elements are fully up and configured, they go to state Passive; when all elements are in Passive, the full composition goes to Passive
- Error states: are they parallel states or part of the same state?
- There should be “it didn’t work” states like “Passive-Error” or “Run_Error” (names to be decided later)
- Describe what the “Running” state means and what the participant should do in Passive->Running and Running->Passive transitions.
- Say a K8S service crashes: how do we feed that back? Running_Error. The state of the POD is only checked during startup, not periodically. There should be supervision.
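The rule that the composition state is derived from its element states could be sketched as below. This is a minimal sketch under assumptions: the state names are taken from the participant state table later on this page, and returning `None` when the elements disagree (meaning "the composition is still in transition") is a hypothetical placeholder, since the rule for mixed element states is still to be decided.

```python
from enum import Enum
from typing import Iterable, Optional


class State(Enum):
    UNINITIALIZED = "UNINITIALIZED"
    UNINITIALIZED_ERROR = "UNINITIALIZED_ERROR"
    PASSIVE = "PASSIVE"
    PASSIVE_ERROR = "PASSIVE_ERROR"
    RUNNING = "RUNNING"
    RUNNING_ERROR = "RUNNING_ERROR"


def composition_state(element_states: Iterable[State]) -> Optional[State]:
    """Derive the overall composition state from its element states.

    Returns the common state when every element is in the same state.
    Returns None (hypothetical "in transition" marker) when the elements
    disagree or when the composition has no elements.
    """
    states = set(element_states)
    if len(states) == 1:
        return states.pop()
    # Elements disagree: the composition has not settled in any state yet.
    return None
```

For example, a composition whose elements are all `State.PASSIVE` is reported as `State.PASSIVE`, while one with a mix of `State.PASSIVE` and `State.RUNNING` elements is reported as `None`.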
Discuss priming and how it relates to this state machine.
What do PASSIVE and RUNNING mean?
UNINITIALIZED: The participant knows about the AC Element, and its values and definition are in the participant, but it has not started it yet. For example, the K8S participant knows the Helm charts to use and the parameter values for configuration, but it has not yet called Helm to start the microservice.
PASSIVE: The participant has started the AC Element. For example, in the K8S participant, Helm has started the microservice, and the microservice is up and running but is not doing anything yet.
RUNNING: The participant has decided that the AC Element is doing "Application" work. The RUNNING state is controlled by the participant.
For each state, indicate what is expected of the participants.
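As one example of a per-state participant expectation, the supervision asked for above (feeding a crashed K8S service back as a runtime error instead of only checking the pod at startup) might look like the sketch below for an element in RUNNING. This is a hypothetical sketch: `probe` stands for any health check the participant chooses (for instance, whether the microservice's pod is still ready), and the function names and polling approach are assumptions, not an existing participant API.

```python
import time
from typing import Callable


def supervise_once(state: str, healthy: bool) -> str:
    """Apply one supervision check to an element's state.

    Assumption: only RUNNING and RUNNING_ERROR are supervised;
    any other state is returned unchanged.
    """
    if state == "RUNNING" and not healthy:
        return "RUNNING_ERROR"  # RuntimeError: participant flags the fault
    if state == "RUNNING_ERROR" and healthy:
        return "RUNNING"  # ClearRuntimeError: the fault has cleared
    return state


def supervise(probe: Callable[[], bool], state: str,
              interval_s: float, rounds: int) -> str:
    """Call the hypothetical `probe` every `interval_s` seconds, `rounds` times."""
    for _ in range(rounds):
        state = supervise_once(state, probe())
        time.sleep(interval_s)
    return state
```

A real participant would run this loop for as long as the element is RUNNING and report each state change to the ACM Runtime; the bounded `rounds` parameter here only keeps the sketch testable.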
ACM Element States in Participants
This section describes how ACM Element states are handled in participants.
The following states are the only states in participants: UNINITIALIZED, UNINITIALIZED_ERROR, PASSIVE, PASSIVE_ERROR, RUNNING, and RUNNING_ERROR.
In addition, an ACM Element can be running a number of operations, each of which has an operational state:
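Operational State | From State | To State | Result | Triggered By | Description |
---|---|---|---|---|---|
No_Operation | None | None | N/A | None | No operation in progress |
Initialize | UNINITIALIZED | PASSIVE | Success | ACM Runtime | Makes an ACM Element ready for operation |
Initialize | UNINITIALIZED | UNINITIALIZED_ERROR | Fail | ACM Runtime | Makes an ACM Element ready for operation |
Uninitialize | PASSIVE | UNINITIALIZED | Success | ACM Runtime | Takes an ACM Element out of operation |
Uninitialize | PASSIVE | PASSIVE_ERROR | Fail | ACM Runtime | Takes an ACM Element out of operation |
Uninitialize | PASSIVE_ERROR | UNINITIALIZED | Success | ACM Runtime | Takes an ACM Element out of operation |
Uninitialize | PASSIVE_ERROR | PASSIVE_ERROR | Fail | ACM Runtime | Takes an ACM Element out of operation |
UninitializeReset | UNINITIALIZED_ERROR | UNINITIALIZED | Success | ACM Runtime | Clear an error on an ACM Element that is uninitialized |
UninitializeReset | UNINITIALIZED_ERROR | UNINITIALIZED_ERROR | Fail | ACM Runtime | Clear an error on an ACM Element that is uninitialized |
Activate | PASSIVE | RUNNING | Success | Participant | Bring an ACM Element into service |
Activate | PASSIVE | PASSIVE_ERROR | Fail | Participant | Bring an ACM Element into service |
PassiveReset | PASSIVE_ERROR | PASSIVE | Success | ACM Runtime | Clear an error on an ACM Element that is passive |
PassiveReset | PASSIVE_ERROR | PASSIVE_ERROR | Fail | ACM Runtime | Clear an error on an ACM Element that is passive |
ForceUninitialize | PASSIVE_ERROR | UNINITIALIZED_ERROR | N/A | ACM Runtime | Force an ACM Element out of operation |
Passivate | RUNNING | PASSIVE | Success | ACM Runtime, Participant | Bring an ACM Element out of service |
Passivate | RUNNING | RUNNING_ERROR | Fail | ACM Runtime, Participant | Bring an ACM Element out of service |
Passivate | RUNNING_ERROR | PASSIVE | Success | ACM Runtime, Participant | Bring an ACM Element out of service |
Passivate | RUNNING_ERROR | RUNNING_ERROR | Fail | ACM Runtime, Participant | Bring an ACM Element out of service |
RuntimeError | RUNNING | RUNNING_ERROR | N/A | Participant | Participant flags a runtime error |
ClearRuntimeError | RUNNING_ERROR | RUNNING | N/A | Participant | Participant clears a runtime error flag |
ForcePassive | RUNNING_ERROR | PASSIVE_ERROR | N/A | ACM Runtime | Force an ACM Element out of the running state |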
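The operational state table can be encoded directly as data and checked by a single function, as in the sketch below. This is a minimal sketch under assumptions: states, operations and triggers are plain strings copied from the table, the reset operation is spelled `UninitializeReset`, the four Passivate rows are read as Success, Fail, Success, Fail, and `apply_operation` is a hypothetical name rather than part of the ACM Runtime or participant code.

```python
from typing import Dict, FrozenSet, Tuple

RUNTIME, PARTICIPANT = "ACM Runtime", "Participant"

# Operations with a Success/Fail result: (operation, from state, succeeded) -> to state.
OUTCOMES: Dict[Tuple[str, str, bool], str] = {
    ("Initialize", "UNINITIALIZED", True): "PASSIVE",
    ("Initialize", "UNINITIALIZED", False): "UNINITIALIZED_ERROR",
    ("Uninitialize", "PASSIVE", True): "UNINITIALIZED",
    ("Uninitialize", "PASSIVE", False): "PASSIVE_ERROR",
    ("Uninitialize", "PASSIVE_ERROR", True): "UNINITIALIZED",
    ("Uninitialize", "PASSIVE_ERROR", False): "PASSIVE_ERROR",
    ("UninitializeReset", "UNINITIALIZED_ERROR", True): "UNINITIALIZED",
    ("UninitializeReset", "UNINITIALIZED_ERROR", False): "UNINITIALIZED_ERROR",
    ("Activate", "PASSIVE", True): "RUNNING",
    ("Activate", "PASSIVE", False): "PASSIVE_ERROR",
    ("PassiveReset", "PASSIVE_ERROR", True): "PASSIVE",
    ("PassiveReset", "PASSIVE_ERROR", False): "PASSIVE_ERROR",
    ("Passivate", "RUNNING", True): "PASSIVE",
    ("Passivate", "RUNNING", False): "RUNNING_ERROR",
    ("Passivate", "RUNNING_ERROR", True): "PASSIVE",
    ("Passivate", "RUNNING_ERROR", False): "RUNNING_ERROR",
}

# Operations whose result is N/A: (operation, from state) -> to state.
FORCED: Dict[Tuple[str, str], str] = {
    ("ForceUninitialize", "PASSIVE_ERROR"): "UNINITIALIZED_ERROR",
    ("RuntimeError", "RUNNING"): "RUNNING_ERROR",
    ("ClearRuntimeError", "RUNNING_ERROR"): "RUNNING",
    ("ForcePassive", "RUNNING_ERROR"): "PASSIVE_ERROR",
}

# Who may trigger each operation.
TRIGGERS: Dict[str, FrozenSet[str]] = {
    "Initialize": frozenset({RUNTIME}),
    "Uninitialize": frozenset({RUNTIME}),
    "UninitializeReset": frozenset({RUNTIME}),
    "Activate": frozenset({PARTICIPANT}),
    "PassiveReset": frozenset({RUNTIME}),
    "ForceUninitialize": frozenset({RUNTIME}),
    "Passivate": frozenset({RUNTIME, PARTICIPANT}),
    "RuntimeError": frozenset({PARTICIPANT}),
    "ClearRuntimeError": frozenset({PARTICIPANT}),
    "ForcePassive": frozenset({RUNTIME}),
}


def apply_operation(operation: str, state: str, trigger: str,
                    succeeded: bool = True) -> str:
    """Return the element's next state, or raise ValueError if not allowed."""
    if trigger not in TRIGGERS.get(operation, frozenset()):
        raise ValueError(f"{trigger} may not trigger {operation}")
    if (operation, state) in FORCED:
        # Result is N/A for these operations, so `succeeded` is ignored.
        return FORCED[(operation, state)]
    key = (operation, state, succeeded)
    if key not in OUTCOMES:
        raise ValueError(f"{operation} is not defined from {state}")
    return OUTCOMES[key]
```

For example, a failed activation, `apply_operation("Activate", "PASSIVE", "Participant", succeeded=False)`, yields `"PASSIVE_ERROR"`. Keeping the transitions as data rather than branching code means any (operation, state) pair missing from the table is rejected explicitly instead of leaving the element in an undefined state.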
Each ACM Element has a status indicator:
Status Indicator | Description |
---|---|
OK | The ACM Element is stable in its current state |
Information | The ACM Element is stable in its current state, and there is information available on the last operation on this ACM element |
Warning | The ACM Element has a warning on its current state, and there is a warning on the last operation on this ACM element |
Error | The ACM Element has an error on its current state, and there is an error message on the last operation on this ACM element |
Each participant also keeps a log of all operations that occur, recording the following information:
Timestamp | Operational State | From State | To State | Status Before | Status After | Message |
---|---|---|---|---|---|---|
<..time..> | Initializing | UNINITIALIZED | PASSIVE | OK | OK | |
<..time..> | Activating | PASSIVE | RUNNING | OK | OK | |
<..time..> | Passivating | RUNNING | PASSIVE | OK | OK | |
<..time..> | Uninitializing | PASSIVE | UNINITIALIZED | OK | OK | |
<..time..> | Initializing | UNINITIALIZED | UNINITIALIZED | OK | ERROR | Error Message |
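A participant-side operation log with the columns above could be kept as in the sketch below. This is a hypothetical sketch: the class names, the in-memory list used for storage and the use of UTC timestamps are assumptions, since this page does not say where or how participants persist the log.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import List


@dataclass(frozen=True)
class OperationLogEntry:
    """One row of the operation log, in the same column order as the table."""
    timestamp: datetime
    operation: str
    from_state: str
    to_state: str
    status_before: str
    status_after: str
    message: str = ""

    def as_row(self) -> str:
        """Render the entry as a pipe-separated table row."""
        return " | ".join([
            self.timestamp.isoformat(), self.operation, self.from_state,
            self.to_state, self.status_before, self.status_after, self.message,
        ]) + " |"


class OperationLog:
    """Append-only, in-memory operation log (storage is an assumption)."""

    def __init__(self) -> None:
        self._entries: List[OperationLogEntry] = []

    def record(self, operation: str, from_state: str, to_state: str,
               status_before: str, status_after: str,
               message: str = "") -> OperationLogEntry:
        """Timestamp an operation in UTC and append it to the log."""
        entry = OperationLogEntry(datetime.now(timezone.utc), operation,
                                  from_state, to_state, status_before,
                                  status_after, message)
        self._entries.append(entry)
        return entry

    def entries(self) -> List[OperationLogEntry]:
        """Return a copy so callers cannot rewrite the history."""
        return list(self._entries)
```

Recording the first and last rows of the example log would be `log.record("Initializing", "UNINITIALIZED", "PASSIVE", "OK", "OK")` followed by `log.record("Initializing", "UNINITIALIZED", "UNINITIALIZED", "OK", "ERROR", "Error Message")`.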