Related Jira(s)
- CPS-1415Getting issue details... STATUS
- CPS-1638Getting issue details... STATUS
Assumptions
Assumption | Notes | |
---|---|---|
1 |
Issues and Decisions
# | Issue | Notes | Decision |
---|---|---|---|
1 | How fast should CPS (and DB) be able to process max heart beat failures? | is 60K really realistic if ENM goes down we should get a notification for each node do we ?! | PoC has shown 60 seconds is reasonable |
2 | Restart of NCMP | Should/Can this be handled? | As of now, there is no such case is being considered. |
3 | Does DMI Plugin provide NCMP with a health check URL during registration? Either, just rely on the default one provided with Spring boot actuator? | Document the contract. Its just the interface that matters and not the implementation. | Spring boot actuator interface |
Description
- Define scenarios which cause a CM Handle to go stale.
- Implement changes to support tracking of CM Handle Freshness/Staleness.
What might trigger a cmHandle to go to STALE?
- dmi plugin identifies that the device is no longer contactable.
- dmi plugin identifies that an underlying device manager managing the device (node) is out of sync with the device itself.
Requirements
Functional
# | Interface | Requirement | Additional Information | Sign-off |
---|---|---|---|---|
1 | CPS-NCMP-E-05 | The 'trustlevel' is visible on all REST methods that currently include the 'cm handle state' | existing endpoints |
|
2 | CPS-NCMP-E-05 | CM Handles can be queried (filter condition) on 'trustlevel' | using a new 'trustLevel' condition (cannot use cpsPath condition) |
|
3 | CPS-NCMP-I-01 | During registration, DMI plugin can report initial trustlevel. If the state is not 'complete', it should be considered as 'Trustlevel change' (See req 5) | Initial trust level will be backward compatible if not set, we assume trustlevel is 'complete' For a new cm-handle where the trustlevel is 'complete' this is NOT considered a chance and no notifications should be sent |
|
4 | CPS-NCMP-E-05 | Once DMI (plugin) is detected to be down the trust-level for all affected CM Handles should be set to be 'NONE'. This wil also lead to many notifcations as per req. #5 | this might lead to a high level (20K) of notifications (need to discuss capabilities) |
|
5 | CPS-NCMP-E-05.e | NCMP notification shall be sent when the trustlevel changes | Notification be sent externally based on Kafka many small or bulk: Agreed Many notifications, one for each cm-handle |
|
6 | CPS-NCMP-I-01.e | It shall be possible to report any trustlevel of one CM Handle DMI plugin can report the current trustLevel of a single cm handle id | i.e. the DMI can tell NCMP the trustLevel is 'NONE' when a node heartbeat failure is detected and 'COMPLETE' once it is restored. Again this should lead to notifications on the external interface as per req #5 |
|
Error Handling
# | Error Scenario | Expected behavior | Sign-off |
---|---|---|---|
1 | NCMP restart (all instances) | To be discussed, not sure if it can/should be handled TrustLevels should be 'NONE' and need to be restored using an audit-request (not in scope) |
Characteristics
# | Parameter | Expectation | Notes | Sign-off |
---|---|---|---|---|
1 | dmi-down detection speed | 30 seconds | ||
2 | device heartbeat frequency (message emitted by DMI plugin for each device) | 60 seconds | ||
3 | maximum supported devices (by NCMP) | 60,0000 | Given #2 and #3 this means NCMP needs to process 60,000 message / minute! | |
4 | maximum number of cm-handles down report by DMI in one request and/or per minute | 30,000 / minute | a peak can be processed within 60 seconds | |
5 | processing of all trustLevel time for DMI-Down and/or peak load by DMI | 1 second | ||
6 | If we incorporate into searches endpoints the speed should not be impacted |
Out-of-Scope
- This epic will only introduce trustLevel NONE and COMPLETE. PARTIAL and POOR may be added later as below.
- Re-registration i.e. resolving trutsLevel degradation is not in scope of this epic
High Level Interactions
Interface | Name | Trigger | Description | Type | Endpoint or Topic | Schema |
---|---|---|---|---|---|---|
1 | HealthCheck | 30 second interval (configurable) | NCMP is to perform a health check against each of the DMI Plugins | REST | http://<dmiPluginServiceName>/manage/health This endpoint will be the standard heath check endpoint provided by spring boot actuator. We don't store it anywhere. We just document it for now. | |
2 | CMHandle trust level change | A CMHandle managed by DMI Plugin's trust level has changed | data contains {trustLevel: ENUM} event id is cmhandle id in kafka header | Kafka | kafka topic: dmi-device-heartbeat | <cloudEvents-header> id : <cmhandleId> type : org.onap.cm.events.trustlevel-notification data : { |
3 | CMHandle Query API with trustLevel Query Condition | Client Request | CmHandle is to be returned based on the values in above CMHandle Trust Map | REST |
| { |
Managing TrustLevel
DMI Plugins
- NCMP is checking every DMI Plugin for health at interface 1 every 30 seconds using the DMI Trust Map
- IF a DMI Plugin goes down, that DMI Plugin's trust level is updated to NONE in the DMI Trust Map
- The CM handles corresponding to DMI should be set to NONE.
- IF a DMI Plugin comes back up, Trust level is set back to COMPLETE for that DMI plugin only.
More details of health check URL can be accessed via:
CPS-1857 Document watchdog job impl. with health check URL
CMHandle Heartbeat
- It is the responsibility of the DMI Plugins to update NCMP about the heartbeat of CMHandle.
- Through interface 2, DMI Plugins will provide a Kafka event on the changing of trustworthiness state of a CMHandle.
- NCMP receives this event and updates the CM Handle Trust Map accordingly
- Needs to be able to handle a throughput of 60,000 State changes per minute for 2 instances
Reading Trust Level
Body of request will be in the format as below:
There are two end points will be subject to query:Search Trust Level Request Body{ "cmHandleQueryParameters": [ { "conditionName": "cmHandleWithTrustLevel", "conditionParameters": [ {"trustLevel": "COMPLETE"} ] } ] }
http://<host>:<port>/ncmp/v1/ch/id-searches
http://<host>:<port>/v1/ch/searches- Interface 3
- NCMP will first check trust level query parameters to determine which trust level (NONE, COMPLETE) is being searched.
- if the target trust level is NONE
- The cm handles stored in CM Handle Trust Map having NONE will be returned.
- if the target trust level is COMPLETE
- The cm handles stored in CM Handle Trust Map having COMPLETE will be returned.
- if the target trust level is NONE