Add Kafka Health Check
There are some scenario where participant when starting, is not able to connect properly with Kafka:
Participant start before Kafka is up and running
Participant start after Kafka is up and running, but the topics are not available yet
Due the failing during the initial configuration, that create a chain of errors, and the participant application is crashing.
In the scenario that the Kafka topic are created from an external code configured into the ACM-Runtime chart, the ACM-Runtime will be not affected.
For the solution, we can use the library “org.apache.kafka.clients.admin.AdminClient
that is an Admin Kafka Client. The functionalities that we are interested are: check if Kafka Nodes are available and fetch the topics configured in Kafka.
Currently the implementation is base with an abstract layer of infrastructure, and the connection to Kafka is implemented in policy/common. So policy/common will be the best place to implement the Admin “Kafka health check” and “fetch Kafka topics“.
The health check will be implemented using a new specific abstract layer, the class “TopicParameters
" can be used to fetch properties from properties file, for the admin connection to Kafka.
Add a validation that topics should be already created into Kafka, means that in all docker/Kubernetes CITS tests, has to be present the script that create those topics.
When the auto.create.topics.enable is true in Kafka properties, the topic can be created after a message is sent, so in this case the script that create topics is not necessary. But if the participant is the first to send registration message, a validation a that point it will be an issue. A solution is to add a property to enable the topics validation in Participant (“topicValidation
“).
In the example below, there is clampAdminTopics that contains new Kafka Admin Client properties and validation property:
broker:
server: kafka:9092
infrastructure: NOOP
fetchTimeout: 15000
participant:
intermediaryParameters:
topics:
operationTopic: policy-acruntime-participant
syncTopic: acm-ppnt-sync
reportingTimeIntervalMs: 120000
description: Participant Description
participantId: 101c62b3-8918-41b9-a747-d21eb79c6c01
topicValidation: true
clampAutomationCompositionTopics:
topicSources:
- topic: ${participant.intermediaryParameters.topics.operationTopic}
servers:
- ${broker.server}
topicCommInfrastructure: ${broker.infrastructure}
fetchTimeout: ${broker.fetchTimeout}
- topic: ${participant.intermediaryParameters.topics.syncTopic}
servers:
- ${broker.server}
topicCommInfrastructure: ${broker.infrastructure}
fetchTimeout: ${broker.fetchTimeout}
topicSinks:
- topic: ${participant.intermediaryParameters.topics.operationTopic}
servers:
- ${broker.server}
topicCommInfrastructure: ${broker.infrastructure}
clampAdminTopics:
servers:
- ${broker.server}
topicCommInfrastructure: ${broker.infrastructure}
fetchTimeout: ${broker.fetchTimeout}
for backward compatibility, clampAdminTopics could be implemented as optional (not mandatory), so Kafka Health check will be optional.
The code below shows the abstraction layer for topic heath check:
public class TopicHealthCheckFactory {
/**
* Get Topic HealthCheck.
*
* @param param TopicParameters
* @return TopicHealthCheck
*/
public TopicHealthCheck getTopicHealthCheck(TopicParameters param) {
return switch (Topic.CommInfrastructure.valueOf(param.getTopicCommInfrastructure().toUpperCase())) {
case KAFKA -> new KafkaHealthCheck(param);
case NOOP -> new NoopHealthCheck();
default -> null;
};
}
}
The Kafka configuration implemented into the “IntermediaryActivator“ constructor class, has to be move in a method, so the class can be created with no issue related to Kafka connection.
@Component
public class IntermediaryActivator extends ServiceManagerContainer implements Closeable {
------------------------------------
------------------------------------
------------------------------------
public IntermediaryActivator() {
msgDispatcher = new MessageTypeDispatcher(MSG_TYPE_NAMES);
syncMsgDispatcher = new MessageTypeDispatcher(MSG_TYPE_NAMES);
}
public <T> void config(ParticipantParameters parameters,
List<Publisher> publishers, List<Listener<T>> listeners) {
// topics, initialization
------------------------------------
------------------------------------
------------------------------------
}
}
Details of implementation
In policy/common will be implemented the two infrastructures NOOP for testing and KAFKA.
In participant intermediary will be implemented the business logic:
if Kafka is not UP yet, and health check fail, the application just waits and try again later using “
fetchTimeout
“ property for the delayif Kafka is UP, and health check is OK, the application start the Kafka configuration of publishers and listeners.
If clampAdminTopics is not defined into the properties, NOOP will be used as infrastructure, so backward compatibility will be preserved.