High Volume VES Collector

General information

Purpose

The goal of the collector is to support high-volume data. It uses plain TCP connections secured with SSL/TLS. Connections are stream-based (as opposed to request-based) and long-running. The payload is binary-encoded (currently with Google Protocol Buffers). HV-VES connects directly to DMaaP's Kafka. All these decisions were made to support high-volume data with minimal latency.

Implementation details

Technology stack

Project Reactor is used as the backbone of the internal architecture.
Netty is used by means of the reactor-netty library.
We are using Kotlin so that we can write very concise code with great interoperability with existing Java libraries.
Types defined in the Λrrow library are also used when they improve the readability or overall cleanliness of the code.

Rules

Do not block. Use non-blocking libraries. Do not use block* Reactor calls inside the core of the application.
Pay attention to memory usage.
Do not decode the payload, as it can be of considerable size. The goal is to route the event to the appropriate Kafka topic. The routing logic should be based only on the VES Common Header.
All application logic should be defined in the hv-collector-core module and tested at the component level by tests defined in hv-collector-ct. The core module should have a clean interface (defined in the boundary package: api and adapters).
Use the Either functional data type when designing failure cases inside the main Flux. Using exceptions is a bit like using goto, and it adds a performance penalty: collecting a stack trace can be costly, and we usually do not need it in such cases. RuntimeExceptions should be treated as application bugs and fixed.
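
The two rules about routing and failure handling can be illustrated together. The following is only a minimal sketch, not the collector's actual code. It assumes a hand-written Either stand-in (the project itself relies on the Λrrow types), and every other name in it (CommonHeader, VesMessage, RoutedMessage, RoutingError, routes, route, describe) as well as the domain and topic values are hypothetical and chosen for illustration. The payload is carried as an opaque byte array and is never decoded, and a missing route is returned as a value instead of being thrown.

```kotlin
// Minimal stand-in for an Either type (assumption: the real code uses the Λrrow library).
sealed class Either<out L, out R> {
    data class Left<out L>(val value: L) : Either<L, Nothing>()
    data class Right<out R>(val value: R) : Either<Nothing, R>()
}

// Hypothetical, already-parsed common header: only the field needed for routing.
data class CommonHeader(val domain: String)

// Hypothetical incoming message: the payload stays an opaque byte array.
class VesMessage(val header: CommonHeader, val rawPayload: ByteArray)

// Hypothetical routing result: the target topic plus the untouched payload.
class RoutedMessage(val topic: String, val rawPayload: ByteArray)

// Hypothetical failure description, returned as a value rather than thrown.
data class RoutingError(val reason: String)

// Hypothetical domain-to-topic table; real routing would come from configuration.
val routes: Map<String, String> = mapOf(
    "perf3gpp" to "HV_VES_PERF3GPP",
    "heartbeat" to "HV_VES_HEARTBEAT"
)

// Chooses the topic by looking at the header only; the payload is passed through as-is.
fun route(message: VesMessage): Either<RoutingError, RoutedMessage> {
    val topic = routes[message.header.domain]
    return if (topic == null)
        Either.Left(RoutingError("No route for domain '${message.header.domain}'"))
    else
        Either.Right(RoutedMessage(topic, message.rawPayload))
}

// Both outcomes are handled explicitly; no stack trace is collected for the failure case.
fun describe(result: Either<RoutingError, RoutedMessage>): String = when (result) {
    is Either.Left -> "dropped: ${result.value.reason}"
    is Either.Right -> "sent ${result.value.rawPayload.size} bytes to ${result.value.topic}"
}

fun main() {
    val payload = ByteArray(1024) // stands in for an undecoded binary payload
    println(describe(route(VesMessage(CommonHeader("perf3gpp"), payload))))
    println(describe(route(VesMessage(CommonHeader("unknown"), payload))))
}
```

Inside a Flux, a function shaped like route could be applied with map, so that an unroutable message becomes a Left value flowing downstream rather than an error signal that terminates the stream.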