...
Assumptions
# | Assumption | Notes |
---|---|---|
1 | When a DMI restarts, all cm-handles related to that DMI will be considered to have trust-level COMPLETE | Temporary assumption until the 'Audit' function has been implemented |
Issues and Decisions
# | Issue | Notes | Decision | |
---|---|---|---|---|
1 | How fast should CPS (and DB) be able to process max heart beat failures? | Is 60K really realistic? If ENM goes down, we should get a notification for each node. Do we? | PoC has shown 60 seconds is reasonable | |
2 | Restart of NCMP | Should/can this be handled? | As of now, no such case is being considered and there is no re-registration flow. | |
3 | Does the DMI Plugin provide NCMP with a health check URL during registration, or do we just rely on the default one provided with Spring Boot actuator? | Document the contract. It's just the interface that matters, not the implementation. | Spring Boot actuator interface | |
4 | Look for the dmi data service (dmiDataPlugin) for the healthcheck | | | |
Description
- Define scenarios which cause a CM Handle to go stale.
- Implement changes to support tracking of CM Handle Freshness/Staleness.
What might trigger a cmHandle to go to STALE?
- dmi plugin identifies that the device is no longer contactable.
- dmi plugin identifies that an underlying device manager managing the device (node) is out of sync with the device itself.
...
# | Interface | Requirement | Additional Information | Sign-off |
---|---|---|---|---|
1 | CPS-NCMP-E-05 | The 'trustlevel' is visible on all REST methods that currently include the 'cm handle state' | existing endpoints | |
2 | CPS-NCMP-E-05 | CM Handles can be queried (filter condition) on 'trustlevel' | using a new 'trustLevel' condition (cannot use cpsPath condition) | |
3 | CPS-NCMP-I-01 | During registration, the DMI plugin can report the initial trustlevel. If the trustlevel is not 'complete', it should be considered a 'Trustlevel change' (see req. 5) | The initial trust level is backward compatible: if it is not set, we assume the trustlevel is 'complete'. For a new cm-handle where the trustlevel is 'complete', this is NOT considered a change and no notifications should be sent | |
4 | CPS-NCMP-E-05 | Once a DMI (plugin) is detected to be down, the trust-level for all affected CM Handles should be set to 'NONE'. This will also lead to many notifications as per req. 5 | This might lead to a high number (20K) of notifications (need to discuss capabilities) | |
5 | CPS-NCMP-E-05.e | An NCMP notification shall be sent when the trustlevel changes | Notifications are sent externally via Kafka. Many small or bulk: agreed on many notifications, one for each cm-handle | |
6 | CPS-NCMP-I-01.e | The DMI plugin can report the current trustLevel of a single cm handle id | i.e. the DMI can tell NCMP the trustLevel is 'NONE' when a node heartbeat failure is detected and 'COMPLETE' once it is restored. Again, this should lead to notifications on the external interface as per req. 5 | |
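Requirement 3 can be illustrated with a minimal sketch. The function names below are hypothetical and not part of NCMP; the sketch only assumes the two trust levels this epic introduces and the backward-compatible default described in the table.

```python
from enum import Enum
from typing import Optional


class TrustLevel(Enum):
    NONE = "NONE"
    COMPLETE = "COMPLETE"


def initial_trust_level(reported: Optional[str]) -> TrustLevel:
    """Backward-compatible default: a missing trust level is treated as COMPLETE."""
    return TrustLevel.COMPLETE if reported is None else TrustLevel(reported)


def needs_notification_on_registration(reported: Optional[str]) -> bool:
    """A new cm-handle only counts as a 'trust level change' when it is not COMPLETE."""
    return initial_trust_level(reported) is not TrustLevel.COMPLETE
```

With this reading, a registration that omits the trust level or reports 'COMPLETE' produces no notification, while a registration reporting 'NONE' does.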
Error Handling
# | Error Scenario | Expected behavior | Sign-off |
---|---|---|---|
1 | NCMP restart (all instances) | To be discussed; not sure if it can/should be handled. TrustLevels should be 'NONE' and need to be restored using an audit-request (not in scope) | |
Characteristics
# | Parameter | Expectation | Notes | Sign-off |
---|---|---|---|---|
1 | dmi-down detection speed | 30 seconds | | |
2 | device heartbeat frequency (message emitted by DMI plugin for each device) | 60 seconds | | |
3 | maximum supported devices (by NCMP) | 60,000 | Given #2 and #3, this means NCMP needs to process 60,000 messages per minute! | |
4 | maximum number of cm-handles reported down by DMI in one request and/or per minute | 30,000 / minute | a peak can be processed within 60 seconds | |
5 | processing time of all trustLevel updates for DMI-down and/or peak load by DMI | 1 second | | |
6 | If trustLevel is incorporated into the searches endpoints, their speed should not be impacted | | | |
...
- This epic will only introduce trustLevel NONE and COMPLETE. PARTIAL and POOR may be added later as below.
- Re-registration, i.e. resolving trustLevel degradation, is not in scope of this epic.
- NCMP will not send notifications on trustLevel changes for external consumers.
High Level Interactions
[Drawio diagram: High Level Interactions]
Interface | Name | Trigger | Description | Type | Endpoint or Topic | Schema |
---|---|---|---|---|---|---|
1 | HealthCheck | 30 second interval (configurable) | NCMP is to perform a health check against each of the DMI Plugins | REST | http://<dmiPluginServiceName>/manage/health This endpoint will be the standard health check endpoint provided by Spring Boot actuator. We don't store it anywhere. We just document it for now. | |
2 | CMHandle trust level change | The trust level of a CMHandle managed by a DMI Plugin has changed | data contains {trustLevel: ENUM}; the event id in the Kafka header is the cmhandle id | Kafka | Kafka topic: dmi-device-heartbeat | <cloudEvents-header> id : <cmhandleId> type : org.onap.cm.events.trustlevel-notification data : {
3 | CMHandle Query API with trustLevel Query Condition | Client Request | CmHandle is to be returned based on the values in the CMHandle Trust Map | REST |
| { |
Managing
...
TrustLevel
DMI Plugins
- NCMP checks the health of every DMI Plugin at interface 1 every 30 seconds, using the DMI Trust Map.
- If a DMI Plugin goes down, that DMI Plugin's trust level is updated to NONE in the DMI Trust Map.
- The CM handles corresponding to that DMI Plugin should be set to NONE.
- If a DMI Plugin comes back up, the trust level is set back to COMPLETE for that DMI Plugin only.
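The watchdog behaviour listed above could be sketched as follows. This is an illustration only, not the NCMP implementation: the function name, the plain dictionaries standing in for the DMI Trust Map and CM Handle Trust Map, and the injected `is_healthy` probe are all assumptions. In a real deployment the probe would call the health check endpoint of interface 1.

```python
from typing import Callable, Dict, Iterable

NONE, COMPLETE = "NONE", "COMPLETE"


def run_health_check_cycle(
    dmi_trust_map: Dict[str, str],
    cm_handle_trust_map: Dict[str, str],
    cm_handles_by_dmi: Dict[str, Iterable[str]],
    is_healthy: Callable[[str], bool],
) -> None:
    """One watchdog pass over every known DMI plugin (intended to run every 30 seconds)."""
    for dmi_name in list(dmi_trust_map):
        if is_healthy(dmi_name):
            # DMI is (back) up: only the DMI entry is restored, cm handles are left untouched.
            dmi_trust_map[dmi_name] = COMPLETE
        else:
            dmi_trust_map[dmi_name] = NONE
            # All cm handles managed by the failed DMI lose their trust level.
            for cm_handle_id in cm_handles_by_dmi.get(dmi_name, ()):
                cm_handle_trust_map[cm_handle_id] = NONE
```

Passing the probe in as a parameter keeps the cycle free of network calls, so the 30 second schedule and the HTTP client can be chosen separately.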
More details of health check URL can be accessed via:
CPS-1857 Document watchdog job impl. with health check URL
...
CMHandle Heartbeat
- It is the responsibility of the DMI Plugins to update NCMP about the heartbeat of CMHandles.
- Through interface 2, DMI Plugins will provide a Kafka event when the trustworthiness state of a CMHandle changes.
- NCMP receives this event and updates the CM Handle Trust Map accordingly.
- NCMP needs to be able to handle a throughput of 60,000 state changes per minute for 2 instances.
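A minimal sketch of how one such event might be applied to the CM Handle Trust Map is shown below. The function name and the dictionary used as the map are hypothetical, and treating an unknown cm handle as COMPLETE is an assumption based on the backward-compatible default in the requirements. Kafka consumption itself is left out.

```python
from typing import Dict, Optional, Tuple

VALID_TRUST_LEVELS = {"NONE", "COMPLETE"}


def apply_trust_level_event(
    cm_handle_trust_map: Dict[str, str], event_id: str, data: dict
) -> Optional[Tuple[str, str, str]]:
    """Apply one event; return (cm_handle_id, old, new) if the level changed, else None."""
    new_level = data["trustLevel"]
    if new_level not in VALID_TRUST_LEVELS:
        raise ValueError(f"unsupported trustLevel: {new_level}")
    # Assumption: a cm handle not yet in the map is treated as COMPLETE.
    old_level = cm_handle_trust_map.get(event_id, "COMPLETE")
    cm_handle_trust_map[event_id] = new_level
    return (event_id, old_level, new_level) if old_level != new_level else None
```

Returning the change only when the level actually differs lets the caller decide whether a notification is due, without re-reading the map.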
...
The body of the request will be in the format below:
Search Trust Level Request Body
```json
{
  "cmHandleQueryParameters": [
    {
      "conditionName": "cmHandleWithTrustLevel",
      "conditionParameters": [
        {"trustLevel": "COMPLETE"}
      ]
    }
  ]
}
```
Two endpoints will be subject to this query:
http://<host>:<port>/ncmp/v1/ch/id-searches
http://<host>:<port>/v1/ch/searches
- Interface 3
- NCMP will first check the trust level query parameters to determine which trust level (NONE, COMPLETE) is being searched.
- If the target trust level is NONE
  - The cm handles stored in the CM Handle Trust Map having NONE will be returned.
- If the target trust level is COMPLETE
  - If the DMI managing the CMHandle is marked as untrustworthy, NONE is returned.
  - The cm handles stored in the CM Handle Trust Map having COMPLETE will be returned.
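The query steps above can be sketched as follows. This is one possible reading, not the NCMP implementation: the function names and the dictionaries standing in for the two trust maps are hypothetical, and the sketch assumes that a cm handle whose DMI is NONE is itself reported as NONE.

```python
from typing import Dict, List


def extract_target_trust_level(request_body: dict) -> str:
    """Find the trustLevel value of the 'cmHandleWithTrustLevel' condition."""
    for condition in request_body["cmHandleQueryParameters"]:
        if condition["conditionName"] == "cmHandleWithTrustLevel":
            for parameter in condition["conditionParameters"]:
                if "trustLevel" in parameter:
                    return parameter["trustLevel"]
    raise ValueError("no cmHandleWithTrustLevel condition in request")


def query_cm_handles_by_trust_level(
    cm_handle_trust_map: Dict[str, str],
    dmi_trust_map: Dict[str, str],
    dmi_by_cm_handle: Dict[str, str],
    request_body: dict,
) -> List[str]:
    """Return the ids of cm handles whose effective trust level matches the request."""
    target = extract_target_trust_level(request_body)
    matches = []
    for cm_handle_id, own_level in cm_handle_trust_map.items():
        # Assumption: an unknown DMI is treated as COMPLETE.
        dmi_level = dmi_trust_map.get(dmi_by_cm_handle.get(cm_handle_id), "COMPLETE")
        # A cm handle is only COMPLETE when both it and its DMI are COMPLETE.
        effective = "NONE" if "NONE" in (own_level, dmi_level) else "COMPLETE"
        if effective == target:
            matches.append(cm_handle_id)
    return sorted(matches)
```

The request body passed in has the same shape as the search request body shown earlier; the result is sorted only to make the output deterministic.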