Call Tracing in A1 Policy Management: Open Telemetry/Micrometer Tracing Investigation
CCSDK-4010: Enable REST tracing in A1-PMS (Closed)
TL;DR: Tracing has been added to the A1 Policy Management Service. It is disabled by default and can be enabled in two ways:
A) System Property
Change the flag otel.sdk.disabled to false in the application.yaml (New Delhi)
otel:
  sdk:
    disabled: ${ONAP_SDK_DISABLED:false}
    south: ${ONAP_TRACING_SOUTHBOUND:true}
  instrumentation:
    spring-webflux:
      enabled: true
otel.sdk.disabled: enables or disables tracing altogether
otel.sdk.south: if ONAP_SDK_DISABLED is false, enables or disables southbound tracing
otel.instrumentation.spring-webflux.enabled: if ONAP_SDK_DISABLED is false, enables or disables northbound tracing
B) Environment Variable
Set the following environment variables; this way you don't need to change the application.yaml and rebuild the Docker image:
ONAP_SDK_DISABLED=false
ONAP_TRACING_SOUTHBOUND=true
OTEL_INSTRUMENTATION_SPRING_WEBFLUX_ENABLED=true
ONAP_SDK_DISABLED: enables or disables tracing altogether
ONAP_TRACING_SOUTHBOUND: if ONAP_SDK_DISABLED is false, enables or disables southbound tracing
OTEL_INSTRUMENTATION_SPRING_WEBFLUX_ENABLED: if ONAP_SDK_DISABLED is false, enables or disables northbound tracing
Possible Combinations
So we can have the following combinations:
Tracing | Northbound | Southbound | Flags |
---|---|---|---|
Disabled | Disabled | Disabled | ONAP_SDK_DISABLED=true |
Enabled | Enabled | Enabled | ONAP_SDK_DISABLED=false; ONAP_TRACING_SOUTHBOUND=true; OTEL_INSTRUMENTATION_SPRING_WEBFLUX_ENABLED=true |
Enabled | Enabled | Disabled | ONAP_SDK_DISABLED=false; ONAP_TRACING_SOUTHBOUND=false; OTEL_INSTRUMENTATION_SPRING_WEBFLUX_ENABLED=true |
Enabled | Disabled | Enabled | ONAP_SDK_DISABLED=false; ONAP_TRACING_SOUTHBOUND=true; OTEL_INSTRUMENTATION_SPRING_WEBFLUX_ENABLED=false |
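The way the three flags combine can be sketched in plain Java. This is only an illustration of the precedence described above, not the service's actual code: the class and method names (TracingFlags, Mode, resolve) are hypothetical, and the defaults are assumed from the text (tracing off by default, both directions on once tracing is enabled).

```java
import java.util.Map;

/** Illustrative only: resolves which tracing directions are active for a set of environment variables. */
final class TracingFlags {

    /** Overall tracing state plus the two directions. */
    static final class Mode {
        final boolean tracing;
        final boolean northbound;
        final boolean southbound;

        Mode(boolean tracing, boolean northbound, boolean southbound) {
            this.tracing = tracing;
            this.northbound = northbound;
            this.southbound = southbound;
        }

        @Override
        public String toString() {
            return "tracing=" + tracing + ", northbound=" + northbound + ", southbound=" + southbound;
        }
    }

    static Mode resolve(Map<String, String> env) {
        // Assumed default: tracing stays off unless ONAP_SDK_DISABLED is explicitly false.
        boolean sdkDisabled = Boolean.parseBoolean(env.getOrDefault("ONAP_SDK_DISABLED", "true"));
        if (sdkDisabled) {
            // The master switch wins: the direction flags are ignored.
            return new Mode(false, false, false);
        }
        boolean south = Boolean.parseBoolean(env.getOrDefault("ONAP_TRACING_SOUTHBOUND", "true"));
        boolean north = Boolean.parseBoolean(
                env.getOrDefault("OTEL_INSTRUMENTATION_SPRING_WEBFLUX_ENABLED", "true"));
        return new Mode(true, north, south);
    }

    public static void main(String[] args) {
        // Prints the mode that the current process environment would select.
        System.out.println(resolve(System.getenv()));
    }
}
```

The point of the sketch is that ONAP_SDK_DISABLED is checked first, so the northbound and southbound flags only matter when it is false.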
Tracing Test
a) A docker compose file with a1pms, a1-osc-simulator, and jaeger, which acts as collector and exporter. Note: onap/ccsdk-oran-a1policymanagementservice:1.7.0-SNAPSHOT is built locally by running "mvn clean install"; you can use the nexus hosted image instead by changing the prefix.
version: '3.7'
services:
  a1_policy_management:
    container_name: a1-pms
    image: onap/ccsdk-oran-a1policymanagementservice:1.7.0-SNAPSHOT
    ports:
      - "8433:8433"
      - "8081:8081"
    volumes:
      - ./application_configuration.json.nosdnc:/opt/app/policy-agent/data/application_configuration.json:ro
    networks:
      - jaeger-example
    depends_on:
      - jaeger
    environment:
      - ONAP_SDK_DISABLED=false
      - ONAP_TRACING_SOUTHBOUND=true
      - OTEL_INSTRUMENTATION_SPRING_WEBFLUX_ENABLED=true
      - ONAP_OTEL_SAMPLER_JAEGER_REMOTE_ENDPOINT=http://jaeger:14250
      - ONAP_OTEL_EXPORTER_ENDPOINT=http://jaeger:4317
      - ONAP_OTEL_EXPORTER_PROTOCOL=grpc
      - ONAP_OTEL_EXPORTER_OTLP_TRACES_PROTOCOL=grpc
  a1-sim-OSC:
    image: "nexus3.o-ran-sc.org:10002/o-ran-sc/a1-simulator:2.1.0"
    container_name: a1-sim-OSC
    ports:
      - "30001:8085"
      - "30002:8185"
    environment:
      - A1_VERSION=OSC_2.1.0
      - REMOTE_HOSTS_LOGGING=1
      - ALLOW_HTTP=true
    networks:
      - jaeger-example
  jaeger:
    image: jaegertracing/all-in-one:latest
    container_name: jaeger
    ports:
      - "16686:16686"
      - "14250:14250"
      - "14268:14268"
      - "4317:4317"
      - "4318:4318"
    environment:
      - JAEGER_DISABLED=true
      - LOG_LEVEL=debug
      - COLLECTOR_OTLP_ENABLED=true
    networks:
      - jaeger-example
networks:
  jaeger-example:
    driver: bridge
b) The application_configuration.json.nosdnc in the same folder
c) Creating a PolicyType in the simulator
d) Creating a policy in a1-pms once the policy type is successfully registered: curl http://localhost:8081/a1-policy/v2/policy-types should respond with {"policytype_ids":[1]}
e) Load the Jaeger UI at http://localhost:16686/ and select the a1-pms traces; a sample of the last call would be:
Steps Taken and Challenges:
Adding Telemetry to a1policymanagementservice: The application uses the WebClient from Spring WebFlux to forward requests received on the northbound interface to a southbound interface (for the latter, an A1-OSC simulator has been used).
The OpenTelemetry documentation provides a bean that mutates the default WebClient builder to add tracing filters.
In our case, the AsyncRestClient manually builds a WebClient for every asynchronous request.
The challenge was to add the tracing filters to this non-Spring class.
1. Adding OpenTelemetry Bean
@Bean
public OpenTelemetry openTelemetry() {
    return AutoConfiguredOpenTelemetrySdk.initialize().getOpenTelemetrySdk();
}
This introduced a circular dependency: openTelemetryConfig defined in URL [jar:file:/opt/app/policy-agent/a1-policy-management-service.jar!/BOOT-INF/classes!/org/onap/ccsdk/oran/a1policymanagementservice/configuration/OpenTelemetryConfig.class
2. Adding the filters directly into AsyncRestClient rather than into a builder bean. By default, however, AutoConfiguredOpenTelemetrySdk uses parameters such as localhost:4317 for the gRPC export, so we opted to build the exporter beans from the application.yaml parameters.
AsyncRestClient.java
...
OpenTelemetry openTelemetry = AutoConfiguredOpenTelemetrySdk.initialize().getOpenTelemetrySdk();
var webfluxTelemetry = SpringWebfluxTelemetry.builder(openTelemetry).build();
return WebClient.builder()
    // ...
    .filters(webfluxTelemetry::addClientTracingFilter)
    .build();
3. Context provider class to get the ApplicationContext in non-Spring components
import org.springframework.beans.BeansException;
import org.springframework.context.ApplicationContext;
import org.springframework.context.ApplicationContextAware;
import org.springframework.stereotype.Component;

@Component
public class ApplicationContextProvider implements ApplicationContextAware {
    private static ApplicationContext context;

    @Override
    public void setApplicationContext(ApplicationContext applicationContext) throws BeansException {
        context = applicationContext;
    }

    public static ApplicationContext getApplicationContext() {
        return context;
    }
}
Then, in the non-Spring class, use var context = ApplicationContextProvider.getApplicationContext().getBean(OtelConfig.class); and add the tracing filters if tracing is enabled.
4. The ApplicationContextProvider class was removed because it can cause issues in different environments. In rare cases the context was still null at start-up (if the dependent classes were initialized first). So the approach changed to wrapping the AsyncWebClient build function in a @Service that receives the SpringWebfluxTelemetry bean via @Autowired(required = false), to cover the case where telemetry is disabled and the bean is not created.
5. Used opentelemetry-springboot-starter: we noticed that more information gets traced automatically when this dependency is enabled, so we control it in the application.yaml under the otel properties.
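The idea behind step 4 (a factory that works whether or not the tracing collaborator exists) can be modelled without Spring. The sketch below is a dependency-free stand-in, not the project's code: ClientFactorySketch and buildFilterChain are hypothetical names, the list of filter names stands in for the WebClient filter chain, and the nullable UnaryOperator plays the role of the SpringWebfluxTelemetry bean injected with @Autowired(required = false).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.UnaryOperator;

/** Dependency-free model of a client factory whose tracing collaborator may be absent. */
final class ClientFactorySketch {

    // Stand-in for the optionally injected telemetry bean; null when tracing is disabled.
    private final UnaryOperator<List<String>> tracingFilters;

    ClientFactorySketch(UnaryOperator<List<String>> tracingFiltersOrNull) {
        this.tracingFilters = tracingFiltersOrNull;
    }

    /** Returns the filter chain that a built client would use. */
    List<String> buildFilterChain() {
        // "logging" is a hypothetical filter that is always present.
        List<String> filters = new ArrayList<>(List.of("logging"));
        // Tracing filters are appended only if the collaborator was provided.
        return tracingFilters == null ? filters : tracingFilters.apply(filters);
    }
}
```

Because the null check lives in one place, callers never need to know whether tracing is switched on, which is what removes the start-up ordering problem of the static context lookup.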
NOTES:
1. Using the ObservationRegistryCustomizer would still track manual /actuator calls, but it was kept in to keep the unit tests running.
It's worth mentioning that if using the spring-boot auto configuration:
<dependency>
    <groupId>io.opentelemetry.instrumentation</groupId>
    <artifactId>opentelemetry-spring-boot-starter</artifactId>
</dependency>
You can follow the below steps:
https://opentelemetry.io/docs/zero-code/java/spring-boot-starter/sdk-configuration/#exclude-actuator-endpoints-from-tracing
2. To retrieve multiple spans and enable automatic context propagation to the ThreadLocals used by Flux and Mono operators, we call the following, only if tracing is enabled:
Hooks.enableAutomaticContextPropagation();
https://docs.micrometer.io/context-propagation/reference/index.html
<dependency>
    <groupId>io.micrometer</groupId>
    <artifactId>context-propagation</artifactId>
</dependency>
3. When disabling telemetry, micrometer-tracing-bridge-otel would still try to export spans, so we decided to use one flag to rule them both (Micrometer and OpenTelemetry).
The flag controlling it is:
management:
  tracing:
    enabled: true
Example of polluted logs when disabling only opentelemetry beans:
4. Flags to enable/disable northbound or southbound interfaces
Since we use the OpenTelemetry Spring Boot starter library, we can use its flags to enable or disable instrumentation libraries.
https://opentelemetry.io/docs/zero-code/java/agent/configuration/#suppressing-specific-agent-instrumentation
OTEL_INSTRUMENTATION_SPRING_WEBFLUX_ENABLED: according to the documentation, this flag enables (true) or disables (false) the automatic Spring instrumentation. We keep a separate manual flag, ONAP_TRACING_SOUTHBOUND, for the AsyncRestClient requests made to the southbound.
System property: otel.instrumentation.[name].enabled
Environment variable: OTEL_INSTRUMENTATION_[NAME]_ENABLED
Note: When using OpenTelemetry environment variables (everything starting with OTEL), dashes (-) should be converted to underscores (_). For example, to suppress traces from the spring-webflux library, set OTEL_INSTRUMENTATION_SPRING_WEBFLUX_ENABLED to false.
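The naming rule above (upper-case the system property, then replace dots and dashes with underscores) is mechanical, so it can be checked with a few lines of Java. The helper below is a hypothetical illustration, not part of OpenTelemetry or of a1-pms.

```java
import java.util.Locale;

/** Illustrative helper: derives the environment variable name for an otel.* system property. */
final class OtelEnvNames {

    /** Upper-cases the property name and replaces '.' and '-' with '_'. */
    static String toEnvVar(String systemProperty) {
        return systemProperty
                .toUpperCase(Locale.ROOT) // locale-independent upper-casing
                .replace('.', '_')
                .replace('-', '_');
    }

    public static void main(String[] args) {
        System.out.println(toEnvVar("otel.instrumentation.spring-webflux.enabled"));
    }
}
```

For the property used in this page, the helper yields OTEL_INSTRUMENTATION_SPRING_WEBFLUX_ENABLED.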
Full Tracing:
Only Southbound Tracing Output:
Only Northbound Tracing Output: