/
CPS Instrumentation Overview

CPS Instrumentation Overview

Type

What’s this?

Scope

How to use it?

Usefulness

Type

What’s this?

Scope

How to use it?

Usefulness

Spring AOP (Aspect-Oriented Programming

  • Intercept method calls within a specific package

  • Measures execution time and log relevant details

  • Pure Java

  • Dependency

    • aspectjrt

    • spring-boot-starter-aop

  • Annotation and set up

    • @Aspect @Component @Slf4j public class CpsLoggingAspectService {
      • sets up spring AOP

    • private static final String CPS_PACKAGE_NAME = "org.onap.cps"; private static final String ALL_CPS_METHODS = "execution(* " + CPS_PACKAGE_NAME + "..*(..)))"; private static final String METHOD_RETURNING_SENSITIVE_DATA = "AuthPassword"; private static final String SENSITIVE_DATA_MASK = "***********"; @Around(ALL_CPS_METHODS) @SneakyThrows public Object interceptMethodCall(final ProceedingJoinPoint proceedingJoinPoint) {...}
      • line 5 expression tells spring to intercept all cps methods in the given package

        • uses stopwatch to record start and end time of method execution

      • the method above also states to mask all logging data from methods that contains ‘AuthPassword’ (line 3) from the method signature name

        • this method can be modified to tell what to do before and after a method runs

    • in logs, sample result:

  • all public methods in the packages configured in set up

    • no annotation needed

  1. Default logging level of “logging.level.org.onap.cps” is set to “INFO”.

    image-20241111-025751.png

    to change this, set line 194 (logging.level.org.onap.cps to DEBUG)

  2. Changing logging level dynamically

    1. see Change logging level

  • can control easily which packages and/or methods to run by changing the constants

  • no repetitive code for debugging all method performances to see when methods are executed, their parameters, return values etc.

  • note that there might be degradation in performance when enabled

  • “@Around"/@Before/@After annotation takes other expressions to specify to Spring which methods you want to monitor , can be based on their name, parameters, or return type…examples:

    • @Around("execution(* org.thirdparty.library.*.*(..))") - only intercepts methods of the given library

    • @Around("execution(*org.onap.cps.*.*(org.onap.TrustLevel))") - only intercepts methods with TrustLevel as first parameter

MicroMeter

  • allows to collect metrics with use of annotation

  • integrates with Spring boot Actuator

  • TimedAspect (micrometer class) uses AOP

  • Annotation and set up

    • MicroMeterConfig.java

    • wherever annotation “@Timed” is used, timing metrics is automatically applied to that method (org/onap/cps/config/MicroMeterConfig.java)

      • our methods are annotated in the following way:

        • @Timed(value = "cps.rest.admin.controller.schemaset.create", description = "Time taken to create schemaset from controller")
          • specified given “value” will be the name of the metric

          • description is the details of the given metric

  • all methods with @Timed annotation

    cps-ncmp-rest @Timed(value = "cps.ncmp.controller.get", description = "Time taken to get resource data from datastore") @Timed(value = "cps.ncmp.inventory.controller.update", cps-ncmp-service @Timed(value = "cps.ncmp.dmi.get", @Timed(value = "cps.ncmp.inventory.persistence.schemaset.delete", @Timed(value = "cps.ncmp.inventory.persistence.schemaset.delete.batch", @Timed(value = "cps.ncmp.inventory.persistence.datanode.get", @Timed(value = "cps.ncmp.inventory.persistence.datanode.get", @Timed(value = "cps.ncmp.cmhandle.state.update.batch", cps-rest @Timed(value = "cps.rest.admin.controller.schemaset.create", @Timed(value = "cps.data.controller.datanode.get.v1", @Timed(value = "cps.data.controller.datanode.get.v2", @Timed(value = "cps.data.controller.get.delta", @Timed(value = "cps.data.controller.datanode.query.v1", @Timed(value = "cps.data.controller.datanode.query.v2", @Timed(value = "cps.data.controller.datanode.query.across.anchors", cps-ri @Timed(value = "cps.data.persistence.service.datanode.query", @Timed(value = "cps.data.persistence.service.datanode.query.anchors", @Timed(value = "cps.data.persistence.service.datanode.get", @Timed(value = "cps.data.persistence.service.datanode.batch.get", cps-service @Timed(value = "cps.data.service.datanode.root.save", @Timed(value = "cps.data.service.datanode.child.save", @Timed(value = "cps.data.service.list.element.save", @Timed(value = "cps.data.service.datanode.get", @Timed(value = "cps.data.service.datanode.batch.get", @Timed(value = "cps.data.service.datanode.leaves.update", @Timed(value = "cps.data.service.datanode.leaves.descendants.leaves.update", @Timed(value = "cps.data.service.get.delta", @Timed(value = "cps.data.service.get.deltaBetweenAnchorAndPayload", @Timed(value = "cps.data.service.datanode.descendants.update", @Timed(value = "cps.data.service.datanode.descendants.batch.update", @Timed(value = "cps.data.service.list.update", @Timed(value = "cps.data.service.list.batch.update", @Timed(value = "cps.data.service.datanode.delete", @Timed(value = "cps.data.service.datanode.batch.delete", @Timed(value = "cps.data.service.datanode.delete.anchor", @Timed(value = "cps.data.service.datanode.delete.anchor.batch", @Timed(value = "cps.data.service.list.delete", @Timed(value = "cps.module.service.schemaset.create", @Timed(value = "cps.module.service.module.reference.query", @Timed(value = "cps.data.service.datanode.query", @Timed(value = "cps.dataupdate.events.publish", description = "Time taken to publish Data Update event") @Timed(value = "cps.utils.yangparser.nodedata.with.parent.parse", @Timed(value = "cps.utils.yangparser.nodedata.with.parent.with.yangResourceMap.parse", @Timed(value = "cps.yangtextschemasourceset.build", @Timed(value = "cps.yang.schemasourceset.build", description = "Time taken to build a ODL yang Model")

 

 

  • all @Bean annotated methods returning Gauge

    • public Gauge advisedCmHandles(final MeterRegistry meterRegistry) {} public Gauge readyCmHandles(final MeterRegistry meterRegistry) {} public Gauge lockedCmHandles(final MeterRegistry meterRegistry) {} public Gauge deletingCmHandles(final MeterRegistry meterRegistry) {} public Gauge deletedCmHandles(final MeterRegistry meterRegistry) {}
  • other annotations that can be added

    • @Counted

  • enabled by default

  • collected metrics are accessible via endpoint of actuator/prometheus

    • see application.yml - management.endpoint.web.exposure.include: info,health,loggers,prometheus

    • http://localhost:8080/actuator/prometheus

  • Grafana -

    • uses prometheus readings to visualise

    • see cps docker-compose

      • includes custom dashboards such as LCM states

  • what can we get?

    • metrics: counters, gauges, timers

      • right now have timer and gauges but counters can be added to the configuration

    • HTTP request metrics response times, request counts etc

    • JVM metrics (memory usage,garbage collection, thread count, class loading stats)

    • Datasource metrics (connection pool stats)

    • System metrics (disk space, CPU usage etc)

    • { "names": [ "application.ready.time", "application.started.time", "cache.eviction.weight", "cache.evictions", "cache.gets", "cache.puts", "cache.size", "cps.data.persistence.service.datanode.batch.get", "cps.data.persistence.service.datanode.get", "cps.data.persistence.service.datanode.query", "cps.data.service.datanode.batch.get", "cps.data.service.datanode.child.save", "cps.data.service.datanode.descendants.batch.update", "cps.data.service.datanode.get", "cps.data.service.datanode.query", "cps.data.service.list.element.save", "cps.dataupdate.events.publish", "cps.module.service.schemaset.create", "cps.ncmp.cmhandle.state.update.batch", "cps.ncmp.inventory.controller.update", "cps.ncmp.lcm.events.publish", "cps.utils.yangparser.nodedata.with.parent.parse", "cps.yangschema.cache.gauge", "disk.free", "disk.total", "executor.active", "executor.completed", "executor.pool.core", "executor.pool.max", "executor.pool.size", "executor.queue.remaining", "executor.queued", "hikaricp.connections", "hikaricp.connections.acquire", "hikaricp.connections.active", "hikaricp.connections.creation", "hikaricp.connections.idle", "hikaricp.connections.max", "hikaricp.connections.min", "hikaricp.connections.pending", "hikaricp.connections.timeout", "hikaricp.connections.usage", "http.client.requests", "http.client.requests.active", "http.server.requests", "http.server.requests.active", "jdbc.connections.active", "jdbc.connections.idle", "jdbc.connections.max", "jdbc.connections.min", "jetty.connections.bytes.in", "jetty.connections.bytes.out", "jetty.connections.current", "jetty.connections.max", "jetty.connections.messages.in", "jetty.connections.messages.out", "jetty.connections.request", "jetty.threads.busy", "jetty.threads.config.max", "jetty.threads.config.min", "jetty.threads.current", "jetty.threads.idle", "jetty.threads.jobs", "jvm.buffer.count", "jvm.buffer.memory.used", "jvm.buffer.total.capacity", "jvm.classes.loaded", "jvm.classes.unloaded", "jvm.compilation.time", "jvm.gc.live.data.size", "jvm.gc.max.data.size", "jvm.gc.memory.allocated", "jvm.gc.memory.promoted", "jvm.gc.overhead", "jvm.gc.pause", "jvm.info", "jvm.memory.committed", "jvm.memory.max", "jvm.memory.usage.after.gc", "jvm.memory.used", "jvm.threads.daemon", "jvm.threads.live", "jvm.threads.peak", "jvm.threads.started", "jvm.threads.states", "logback.events", "process.cpu.usage", "process.files.max", "process.files.open", "process.start.time", "process.uptime", "spring.data.repository.invocations", "spring.kafka.listener", "spring.kafka.template", "spring.kafka.template.active", "system.cpu.count", "system.cpu.usage", "system.load.average.1m" ] }
  • sample actuator/prometheus output

    • image-20241111-053418.png

       

  • additional sample /actuator/metrics/cps.module.service.schemaset.create output

    • add metric to actuator endpoint exposure -management.endpoint.web.exposure.include: info,health,loggers,prometheus,metric

    • image-20241111-060630.png

OpenTelemetry

end-to-end tracing (tracing requests across services)

OpenTelemetry - open-source set of APIs, libraries, agents, and instrumentation to monitor application. Allows to collect metrics, logs, and traces

Jaeger - open-source distributed tracing system used with OpenTelemetry to visualise traces and understand flow of requests

** we use OpenTelemetry to send trace data to Jaeger and view it in Jaeger UI

 

  • Annotation and set up

  • OpenTelemetryConfig.java

    • @Value("${spring.application.name:cps-application}") private String serviceId; @Value("${cps.tracing.exporter.endpoint:http://onap-otel-collector:4317}") private String tracingExporterEndpointUrl; @Value("${cps.tracing.sampler.jaeger_remote.endpoint:http://onap-otel-collector:14250}") private String jaegerRemoteSamplerUrl; @Value("${cps.tracing.excluded-observation-names:tasks.scheduled.execution}") private String excludedObservationNamesAsCsv;
      • serviceId : states the application to be monitored

      • tracingExporterEndpointUrl : (OpenTelemetry collector endpoint) this is where traced data by openTelemetry will be exported

      • jaegerRemoteSamplerUrl : (remote sampler endpoint) this controls the sampling trace behaviour

    • @Bean @ConditionalOnExpression( "${cps.tracing.enabled} && 'grpc'.equals('${cps.tracing.exporter.protocol}')") public OtlpGrpcSpanExporter createOtlpExporterGrpc() { return OtlpGrpcSpanExporter.builder().setEndpoint(tracingExporterEndpointUrl).build(); } @Bean @ConditionalOnExpression( "${cps.tracing.enabled} && 'http'.equals('${cps.tracing.exporter.protocol}')") public OtlpHttpSpanExporter createOtlpExporterHttp() { return OtlpHttpSpanExporter.builder().setEndpoint(tracingExporterEndpointUrl).build(); }
      • **OTLP is the protocol openTelemetry uses to send traces and metrics. In this case we have two ways HTTP and grpc. NOTE: we are by default using ‘grpc’

    • @Bean @ConditionalOnProperty("cps.tracing.enabled") public JaegerRemoteSampler createJaegerRemoteSampler() { return JaegerRemoteSampler.builder() .setEndpoint(jaegerRemoteSamplerUrl) .setPollingInterval(Duration.ofSeconds(JAEGER_REMOTE_SAMPLER_POLLING_INTERVAL_IN_SECONDS)) .setInitialSampler(Sampler.alwaysOff()) .setServiceName(serviceId) .build(); }
      • creates the JaegerSampler bean specifying

        • sampling happens every 30 seconds

          • int JAEGER_REMOTE_SAMPLER_POLLING_INTERVAL_IN_SECONDS = 30;
    • @Bean @ConditionalOnProperty("cps.tracing.enabled") public ObservationRegistryCustomizer<ObservationRegistry> skipActuatorEndpointsFromObservation() { final PathMatcher pathMatcher = new AntPathMatcher("/"); return registry -> registry.observationConfig().observationPredicate(observationPredicate(pathMatcher)); } private ObservationPredicate observationPredicate(final PathMatcher pathMatcher) { return (observationName, context) -> { if (context instanceof ServerRequestObservationContext observationContext) { return !pathMatcher.match("/actuator/**", observationContext.getCarrier().getRequestURI()); } else { return !excludedObservationNames.contains(observationName); } }; }
      • the ObservationRegistry specifies how traces and metrics are handled in the application

        • in our code above, it is specified that any endpoints with “/actuator” is excluded from traces

 

 

  • service interactions

  • excluded-observation-names: ${ONAP_EXCLUDED_OBSERVATION_NAMES:tasks.scheduled.execution}

  • disabled by default

  • run CPS via docker-compose file with ‘tracing’ profile

    • set ‘ONAP_TRACING_ENABLED’ to true for cps-and-ncmp service

      • image-20241113-104243.png

         

  • access tracing through accessing http://localhost:16686 in browser

    • see docker-compose notes

** Trace shows the journey of the request flow. It is made up of spans which shows each operations

** Each span has metadata: id ,start and end time, operation name (i.e. HTTP GET), status code, response time etc..


What gets traced?

  • HTTP Requests (Inbound)

    • when requests comes in , trace is generated. The span for this could have http method, url path, status code, time taken to process the request etc…

  • Internal Rest Calls (Outbound)

    • when cps makes an outbound http call to another service , trace is created. span can include url of the called service,http method, status code, duration of the call

  • Method Calls within Your Application (Service Layer)

    • public

  • Asynchronous Operations

    • @Async annotated operations

  • External System Calls

    • Kafka

  • Error Handling (Exceptions)

~Database Queries


  • effect on performance

  • can compare in UI two traces for analysis

    • image-20241112-113540.png

  • can use to identify where failures occur in the service chain

    • example:

      • after registering cmhandles for dm1, i tried to get all cmhandles for that dmi and got 200 and the correct results

      • OpenTelemetry though shows where an exception occured in the background

        image-20241112-113804.png

 

Related content