In the current implementation, ACM supports multi-participant with same supported element Type but different participantId, so they need different properties file.
In order to support replica, it needs to support multi-participant using same properties file.
Note:
- In a scenario of high number of compositions, if participant is restarting it will be slow-down the restarting action: AC-runtime will send a message for each composition primed and instance deployed to the participant.
To avoid the restarting action, participant needs a database support; - In a scenario where a participant is stuck in deploying, the instance will be in TIMEOUT and the user can take action like deploy again or undeploy. In that scenario the intermediary-participant has to receive the next message, kill the thread that is stuck in deploying and create a new thread.
Solution
Add dynamic participantId support and add database support,
Changes in Participant:
- UUID participantId will be generated in memory instead to fetch it in properties file.
- cosumerGroup will be empty (kafka configuration): any intermediary-participant will have unique Kafka queue, so they will receive same message that will be filtered by participantId.
- Add client support for no-sql database.
- Add no-sql database or mock for Unit Tests.
- Refactor CacheProvider to support insert/update, intermediary-participant can still use the cache in memory.
- Any new/change composition and instance will be saved in database.
- Refactor Participants that are using own cache in memory (Policy Participant saves policy and policy type in memory)
Changes in ACM-runtime:
- When participant go OFF_LINE:
- if there are compositions connected to that participant, ACM-runtime will find other ON_LINE participant with same supported element type;
- if other ON_LINE participant is present it will change the connection with all compositions and instance;
- after that, it will execute restart for all compositions and instances to the ON_LINE participant.
- When receive a participant REGISTER:
- it will check if there are compositions connected to a OFF_LINE participant with same supported element type;
- if there are, it will change the connection with all compositions and instances to that new registered participant;
- after that it will execute restart for all compositions and instances changed.
- Refactor restarting scenario to apply the restarting only for compositions and instances in transition
Changes in docker/Kubernetes environment
- Refactor CSIT to support no-sql database
- Refactor performance and stability test to support no-sql database
- Refactor OOM to support no-sql database
Database in Ericsson ADP marketplace
The no-sql database could be one that is already in ADP marketplace,
Distributed Coordinator ED (Etcd): Distributed systems use etcd as a consistent key-value store for configuration management, service discovery, and coordinating distributed work. Many organizations use etcd to implement production systems such as container schedulers, service discovery services, and distributed data storage.
- Document Database PG (PostgreSql)
- Key Value Database AG (Apache Geode): Apache Geode provides a database-like consistency model, reliable transaction processing and a shared-nothing architecture to maintain very low latency performance with high concurrency processing.(https://geode.apache.org/docs/guide/114/getting_started/intro_to_clients.html) or (https://docs.spring.io/spring-boot-data-geode-build/1.7.5/reference/html5/).
- Key Value Database RD (Redis): RDB is NOT good if you need to minimize the chance of data loss in case Redis stops working.
- Wide Column Database CD (Apache Cassandra)
- Search Engine (Open Search): It supports Rest Api, https://opensearch.org/docs/latest/clients/java/