ONAP Logging PoC 2022/2023
- 1 Logging PoC Work
- 1.1 Overview
- 1.1.1 Background
- 1.1.2 Initial PoC
- 1.1.2.1 High Level Architecture
- 1.1.2.2 Fluent Bit Pipeline
- 1.1.3 Notes on Security
- 1.2 Development Notes
- 1.2.1 Pre-requisites
- 1.2.2 Initial PoC Configuration
- 1.2.2.1 Configuring Fluent Bit
- 1.2.2.2 Starting Logging Pods
- 1.2.2.3 Viewing Collected Logs
- 1.2.2.3.1 Elasticsearch
- 1.2.2.3.2 Kibana
- 1.2.3 Security - Namespace Isolation Through Kyverno
- 1.2.3.1 Kyverno and ClusterPolicies
- 1.2.3.2 Starting Logging Pods
Logging PoC Work
Overview
Background
The logging PoC was carried out to replace parts of the old logging architecture with a revised solution. Reference information on the background and architecture can be found here: https://lf-onap.atlassian.net/wiki/pages/viewpage.action?pageId=16472429
This page covers what was done in the PoC (as of when this page was created/updated), and provides some development notes on reproducing / running the PoC code.
The initial logging PoC was demoed to Security Subcommittee here: https://lf-onap.atlassian.net/wiki/display/DW/2022-05-24+Security+Subcommittee+Meeting+Notes
Initial PoC
High Level Architecture
Below is a high-level view of the planned reference logging architecture that shows applications running inside containers on a couple of Kubernetes nodes.
We are introducing Fluent Bit for log collection (and parsing/enriching/forwarding). Fluent Bit is deployed as a DaemonSet, so one instance runs on each node. Fluentd was not included in the PoC work; Elasticsearch and Kibana were existing logging components in ONAP.
Fluent Bit Pipeline
Below is the Fluent Bit pipeline and how we set it up for the PoC.
We’re using the Fluent Bit tail input plugin to read the logs for each component.
We enabled the Fluent Bit JSON parser for parsing log information.
Note: the parsing relies on the standardization of the application log files; otherwise the logs will still be collected, but will not be specifically parsed. The log collection PoC was ongoing before the standardized logs were available, so additional work may still be required to standardize application logs and to update the parser configuration here.
We’re using the Fluent Bit Kubernetes filter to enrich the log files with additional Kubernetes metadata that is not included in the application log files, such as the container image names and tags, the digests or hashcodes of the containers, and the Docker IDs of the running containers.
We are then using the Elasticsearch output plugin to send the log files to the existing Elasticsearch component in ONAP.
We are then able to query Elasticsearch directly via curl requests to its REST API, or query and visualize the logs via the existing Kibana frontend.
Notes on Security
During the PoC, it was found that application log files are stored in a location that requires Fluent Bit to have root access in order to read the logs. This was the case whether Fluent Bit was run as a DaemonSet as planned, or injected as a sidecar into containers – which would have also required each application to be modified to direct its log files to an alternative location, with other knock-on effects as well. Changing all of the applications was neither feasible nor desirable, since any future change would again require updating every application. It was also not desired that Fluent Bit, or any other ONAP application, be allowed to run with root permissions.
During Security Subcommittee discussions, it was proposed to try isolating Fluent Bit from other ONAP components using namespace isolation; some of this work is included in the development notes below.
Brief summary/rationale – cluster logging via node agents is an industry standard and the most feasible engineering option for logging in ONAP clusters. However, as mentioned, this requires the node-level agent to have privileges on the Kubernetes worker nodes. To minimize security risks, privileges should be scoped to namespaces, with best practices then applied within the privileged namespaces:
ONAP application services to run in a namespace with privileges and security contexts restricted via the Pod Security Standards Restricted profile. No workloads in this namespace to be permitted to run with privileges or as root within the container.
ONAP logging services to run in a separate namespace which allows the Pod Security Standards Privileged profile. Workloads within this namespace to be permitted to run with privileges. To mitigate the risk of such a namespace, use the Kubernetes Admission Controller, via Kyverno policies, to enforce strict admission requirements on workloads in this namespace.
Note that it is worth locking down the ONAP side of this equation too, i.e., ensuring that the ONAP namespace is restricted via Pod Security Policies and Standards and does not, e.g., allow the pods or workloads within the ONAP namespace to run with privileges, or as root.
For more information on the security conversations, see SECCOM meeting notes for January/February 2023: https://lf-onap.atlassian.net/wiki/display/DW/2023+Security+Subcommittee+Meeting+Notes
Discussions should be continued with the security subcommittee if continuing from this PoC work.
Development Notes
Pre-requisites
The PoC was done within OOM, and this guide assumes you are familiar with that project. The PoC code can be found in the Nordix repo here: https://gerrit.nordix.org/c/onap/oom/+/13370
The guide also assumes you have a Kubernetes cluster setup with enough resources to run the main ONAP components, and the logging components. ONAP documentation can be found among the ONAP wiki pages here: https://lf-onap.atlassian.net, or within the ONAP docs here: https://docs.onap.org/en/latest/
Note: The PoC was done prior to the London Release of ONAP and made use of some logging components (Elasticsearch and Kibana) that existed within ONAP but were removed in the London release. The information here assumes the use of those components, but you should also be able to modify the details should you wish to use alternative logging components.
Initial PoC Configuration
For the initial PoC, Fluent Bit was added into OOM in the same folder structure as the existing logging components. This section provides details on getting it up and running within that folder structure, and lets you see the full logging flow from log collection through Elasticsearch to visualization in Kibana. Some of the configurations within this section may need to be updated to work with the revised plan for security – see the Security – Namespace Isolation Through Kyverno section.
You can check out the code for the initial PoC using the following command:
git fetch https://gerrit.nordix.org/onap/oom refs/changes/70/13370/7 && git checkout FETCH_HEAD
Configuring Fluent Bit
The Fluent Bit version is set in the kubernetes/log/components/log-fluentbit/values.yaml file using the image: <value> tag. It is currently set to a “-debug” version, as this allows you to kubectl exec into the Fluent Bit pod for debugging purposes.
The main configuration for Fluent Bit is done in the kubernetes/log/components/log-fluentbit/templates/configmap.yaml file. Below you’ll find an outline of some of the primary configuration settings. For more detailed information, the official Fluent Bit documentation can be found here: https://docs.fluentbit.io/manual.
The input-kubernetes.conf section allows you to, among other things (a sketch follows this list):
set the input filter (here it is set to tail the logs)
set the path location to where the log files are to be collected from (here set to /var/log/containers/*.log)
exclude unwanted logs, e.g., system logs, etc
set the parser to use on the log files
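As an illustration, a minimal tail input section could look like the following sketch (the Exclude_Path pattern, buffer limit, and refresh interval are assumptions for illustration, not values taken from the PoC configuration):
[INPUT]
    Name              tail
    Tag               kube.*
    Path              /var/log/containers/*.log
    Exclude_Path      /var/log/containers/*_kube-system_*.log
    Parser            docker
    Mem_Buf_Limit     5MB
    Refresh_Interval  10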
The filter-kubernetes.conf section applies the Kubernetes filter which adds Kubernetes metadata to the processed logs.
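A minimal version of that filter section might look like this sketch (the Merge_Log and Keep_Log settings are illustrative assumptions):
[FILTER]
    Name      kubernetes
    Match     kube.*
    Kube_URL  https://kubernetes.default.svc:443
    Merge_Log On
    Keep_Log  Off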
The parsers.conf section contains several parser options. The one being used in the PoC is the one with name “docker” and format “json”.
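For reference, a JSON parser entry with that name typically looks like the following (these are the common defaults from the Fluent Bit documentation, not necessarily the exact PoC values):
[PARSER]
    Name        docker
    Format      json
    Time_Key    time
    Time_Format %Y-%m-%dT%H:%M:%S.%L
    Time_Keep   On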
The output-elasticsearch.conf section sets the details for outputting to the Elasticsearch service (a sketch follows this list), including setting:
the Host – the name of the service and the namespace it’s running in – here log-es.onap
the port to use – here 9200
the “Logstash_Prefix”, which will be used as the index identifier in Elasticsearch and Kibana
the “Type fluentd” setting, which was a workaround for the existing version of Elasticsearch in ONAP; this can likely be changed/removed for later versions of Elasticsearch
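Putting the listed settings together, the output section could look like this sketch (Logstash_Format and the prefix value are assumptions based on the index names used later in this guide):
[OUTPUT]
    Name            es
    Match           *
    Host            log-es.onap
    Port            9200
    Logstash_Format On
    Logstash_Prefix fluent-bit-temp
    Type            fluentd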
Starting Logging Pods
Compile logging charts:
cd ~/oom/kubernetes
make log SKIP_LINT=TRUE;
Put the built files into the required location (a workaround so you don’t have to make all of ONAP – confirm versions and directory locations before doing this):
tar -C ~/.local/share/helm/plugins/deploy/cache/onap-subcharts -xvzf ~/oom/kubernetes/dist/packages/log-12.0.0.tgz
If you previously had logging pods running, remove them, e.g.:
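A hypothetical example using the OOM helm undeploy plugin (the release name dev-log is an assumption – use the name from your own deployment):
helm undeploy dev-log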
If you don’t know the name to use for the above command, you can get a list of available components using:
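For example, with the standard Helm CLI:
helm ls -A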
Before starting up new logging pods, confirm that logging pods are gone:
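For example (assuming the onap namespace):
kubectl get pods -n onap | grep log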
Startup logging pods (can change/replace names and files depending on requirements of your specific deployment):
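A hypothetical invocation using the OOM deploy plugin (the release name, chart reference, and enable flag are assumptions – adjust for your deployment):
helm deploy dev-log local/onap --namespace onap --set log.enabled=true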
It may take some time for the logging components to startup, particularly Kibana. Check their progress / status:
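For example, watching the pods come up (namespace assumed):
kubectl get pods -n onap -w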
Viewing Collected Logs
For detailed information on Elasticsearch and Kibana, please refer to official documentation here: https://www.elastic.co/guide/index.html
Elasticsearch
You can check some basic information about the logs by sending curl requests to Elasticsearch.
From inside another pod on the cluster, curls can be sent to the Elasticsearch service at:
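http://log-es.onap:9200 (i.e., the Host and Port configured in the Fluent Bit output section above)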
Some sample curls that can be sent (the commands below are illustrative sketches; adjust the host, port, and index names to match your deployment):
Get indices to see that the Fluent Bit index(es) have been created:
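curl "http://log-es.onap:9200/_cat/indices?v"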
Get indexes matching the name “fluent-bit-temp*” and pipe the output to a JSON pretty-printer (jq is assumed below):
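curl -s "http://log-es.onap:9200/fluent-bit-temp*" | jq .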
Get settings:
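curl "http://log-es.onap:9200/_settings?pretty"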
Delete index with a specific name (e.g., “fluent-bit-temp-2023.05.26”):
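curl -X DELETE "http://log-es.onap:9200/fluent-bit-temp-2023.05.26"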
Delete all indexes:
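curl -X DELETE "http://log-es.onap:9200/_all"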
Alternatively, from outside of pods, similar curls may be sent to Elasticsearch service at:
http://<controller node ip>:<log-es elasticsearch external nodeport port>
e.g., something like: http://172.15.2.77:30254
Note: At this stage, while the logging components are running, manual cleanup of logs stored in Elasticsearch is required – they can be cleaned up periodically using the above delete index commands. Alternatively, they might be located in the /dockerdata-nfs/onap/log folder (or possibly the /dockerdata-nfs/dev/log folder).
Kibana
Kibana provides a frontend GUI for Elasticsearch. You can access it in your web browser at:
http://<controller node ip>:<kibana service nodeport>
e.g., something like: http://172.15.2.77:30253
To view logs in Kibana, the first thing you need to do is set up an index in Kibana. Detailed instructions for this are outside the scope of this guide, but you can set up indexes in the Management section of Kibana, e.g., you could set up an index for “fluent-bit-temp-2023.*”, which would match logs with index names “fluent-bit-temp” for the year 2023.
The search capabilities of the Kibana GUI are also outside the scope of this guide, but you can get an idea by watching the brief demo given during the SECCOM meeting here: https://lf-onap.atlassian.net/wiki/display/DW/2022-05-24+Security+Subcommittee+Meeting+Notes – note that the demo used the version of Kibana that existed in ONAP at the time; Kibana has likely received several updates since then.
Security - Namespace Isolation Through Kyverno
For context, see above Notes on Security. Due to time constraints, only a small amount of PoC work has been done on this area.
You can check out the code for this section using the following command:
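A hypothetical checkout command following the same pattern as for the initial PoC (the patchset number is a placeholder – check the Gerrit change for the latest patchset):
git fetch https://gerrit.nordix.org/onap/oom refs/changes/70/13370/<patchset> && git checkout FETCH_HEAD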
Note – this is a later patchset than the one used in the initial PoC instructions above. The fluent-bit log folder was moved to another location to separate it from the other logging components and to allow it to be started more easily in a different namespace. The paths changed:
From: kubernetes/log/components/log-fluentbit/*
To: kubernetes/log-collection/components/log-fluentbit/*
Below are some initial instructions that may help you get started, but significant further research and work should be done here. Kyverno is a policy engine designed specifically for Kubernetes and runs as a dynamic admission controller. Documentation for Kyverno can be found here: https://kyverno.io/docs/
Kyverno and ClusterPolicies
Get and install Kyverno on your cluster:
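For example, installing via Helm as described in the Kyverno documentation (pin a chart version as appropriate for your cluster):
helm repo add kyverno https://kyverno.github.io/kyverno/
helm repo update
helm install kyverno kyverno/kyverno -n kyverno --create-namespace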
The included ClusterPolicy – kubernetes/temp-kyverno/onap-validate-logging.yaml – provides a basic example of validating a Pod before allowing it to run in a specific namespace, here the “onap-logging” namespace. This sample prevents pods from starting with non-standard startup commands, and only allows images matching a specific pattern to start up. The ClusterPolicy can be applied from within its containing folder using this command:
kubectl apply -f onap-validate-logging.yaml
With this cluster policy applied, only pods using an image with pattern "nexus3.onap.org:10001/fluent/fluent-bit:*" will be allowed to startup in the onap-logging namespace.
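For illustration, the image-restriction part of such a policy could look like the sketch below (this is an assumed shape for a Kyverno validate policy, not the actual contents of onap-validate-logging.yaml, and the startup-command rule is omitted):
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: onap-validate-logging
spec:
  validationFailureAction: Enforce
  rules:
    - name: restrict-logging-images
      match:
        any:
          - resources:
              kinds:
                - Pod
              namespaces:
                - onap-logging
      validate:
        message: "Only the approved Fluent Bit image may run in onap-logging."
        pattern:
          spec:
            containers:
              - image: "nexus3.onap.org:10001/fluent/fluent-bit:*"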
Starting Logging Pods
The existing logging components Elasticsearch and Kibana can still be started/stopped as described in the Initial PoC Configuration – Starting Logging Pods section.
The Fluent Bit logging is started up slightly differently, as follows:
Compile logging charts:
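Assuming the moved chart builds with a make target matching its folder name (log-collection is an inference from the new path, not confirmed from the patchset):
cd ~/oom/kubernetes
make log-collection SKIP_LINT=TRUE;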
Put the built files into the required location (a workaround so you don’t have to make all of ONAP – confirm versions and directory locations before doing this):
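Following the same pattern as before (the package name and version are assumptions – check ~/oom/kubernetes/dist/packages for the actual file):
tar -C ~/.local/share/helm/plugins/deploy/cache/onap-subcharts -xvzf ~/oom/kubernetes/dist/packages/log-collection-12.0.0.tgz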
If you previously had logging pods running, remove them, e.g.:
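A hypothetical example (release name assumed):
helm undeploy dev-log-collection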
If you don’t know the name to use for the above command, you can get a list of available components using:
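For example, as before:
helm ls -A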
Before starting up new logging pods, confirm that logging pods are gone:
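For example (the onap-logging namespace matches the Kyverno setup above):
kubectl get pods -n onap-logging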
Startup logging pods (can change/replace names and files depending on requirements of your specific deployment):
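A hypothetical invocation (the release name, chart reference, namespace, and enable flag are all assumptions – adjust for your deployment):
helm deploy dev-log-collection local/onap --namespace onap-logging --set log-collection.enabled=true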
It may take some time for the logging components to startup. Check their progress / status:
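For example:
kubectl get pods -n onap-logging -w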
In this configuration, Fluent Bit is still running as a DaemonSet on each node and has privileged access, so it is able to read and collect the application log files on the nodes. However, it is the only pod allowed to be installed into this “onap-logging” namespace, and pods running in the main “onap” namespace can still be restricted to non-privileged access – refer to Overview – Notes on Security.