This document relates to investigative work being carried out on the Jira ticket POLICY-3809. This work specification is in response to requirements set out by IDUN for integrating the policy framework kubernetes pods / helm charts into their system. The general requirements of the investigation are below:
- How to create a Kubernetes environment that can be spun up and made available on demand on suitable K8S infrastructure.
- How suitable test suites could be developed to verify the functional requirements below.
- How such test suites could be implemented using "Contract Testing".
Functional Requirements Detail
Note that many of the features below are available in Postgres. In the verification environment, we want to verify that the Policy Framework continues to work in the following scenarios:
- Synchronization and Load Balancing
- Failover
- Backup and Restore
In addition the environment should:
- Support measurement of Performance Lag
- Use secure communication towards the Database
- Verify that auditing of database operations is working
Database Considerations
Database servers can work together to allow a second server to take over quickly if the primary server fails (high availability), or to allow several computers to serve the same data (load balancing). Ideally, database servers could work together seamlessly. Web servers serving static web pages can be combined quite easily by merely load-balancing web requests to multiple machines - this is very common in Kubernetes environments. In fact, read-only database servers can be combined relatively easily too. Unfortunately, most database servers have a read/write mix of requests, and read/write servers are much harder to combine. This is because though read-only data needs to be placed on each server only once, a write to any server has to be propagated to all servers so that future read requests to those servers return consistent results.
This synchronization problem is the fundamental difficulty for servers working together. Because there is no single solution that eliminates the impact of the sync problem for all use cases, there are multiple solutions. Each solution addresses this problem in a different way, and minimizes its impact for a specific workload.
Some solutions deal with synchronization by allowing only one server to modify the data. Servers that can modify data are called read/write, master or primary servers. Servers that track changes in the master are called standby or slave servers. A standby server that cannot be connected to until it is promoted to a master server is called a warm standby server, and one that can accept connections and serves read-only queries is called a hot standby server.
Some solutions are synchronous, meaning that a data-modifying transaction is not considered committed until all servers have committed the transaction. This guarantees that a failover will not lose any data and that all load-balanced servers will return consistent results no matter which server is queried. In contrast, asynchronous solutions allow some delay between the time of a commit and its propagation to the other servers, opening the possibility that some transactions might be lost in the switch to a backup server, and that load balanced servers might return slightly stale results. Asynchronous communication is used when synchronous would be too slow.
Performance must be considered in any choice. There is usually a trade-off between functionality and performance. For example, a fully synchronous solution over a slow network might cut performance by more than half, while an asynchronous one might have a minimal performance impact.
While it is easy in Kubernetes/Helm to create a deployment for database servers that has several replicas and can autoscale, Kubernetes persists each database's data in volumes, using PersistentVolumes and PersistentVolumeClaims. These resources ensure that, despite the ephemeral nature of the pods running in the cluster, the data is retained if the database server pods fail. In OOM deployments, this is done using the hostPath volume type: data is stored on the actual VM where the pods are running. However, this strategy does not take into consideration the functional requirements set out in this investigation. The remaining sub-sections of this section outline existing solutions for the different database requirements, i.e. Load Balancing, Synchronization, Failover, and Backup and Restore.
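As a rough illustration of the hostPath approach described above, the sketch below pairs a PersistentVolume backed by a directory on the node with a PersistentVolumeClaim that a database pod could mount. All names, sizes, the storage class and the path are assumptions made for illustration; they are not taken from the OOM charts.

```yaml
# Hypothetical names, sizes and path; not taken from the OOM charts.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: policy-db-pv
spec:
  capacity:
    storage: 2Gi
  accessModes:
    - ReadWriteOnce
  # Keep the data on the node even after the claim is deleted.
  persistentVolumeReclaimPolicy: Retain
  storageClassName: manual
  hostPath:
    path: /data/policy-db   # directory on the VM where the pod runs
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: policy-db-pvc
spec:
  storageClassName: manual
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 2Gi
```

Because the data lives on a single node, a claim like this survives a pod restart but offers no protection if the node itself is lost, which is why it does not cover the failover and backup requirements.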
Synchronization
As outlined above, there are two general methods used to approach the synchronization problem in databases:
- One replica (the master) is responsible for write operations. The others (the slaves) can only read.
- Synchronous transactions, where data is not considered committed until it has been committed to all replicas. Replication can instead be done asynchronously, but this increases the risk of data loss and stale reads.
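For Postgres, the choice between the synchronous and asynchronous behaviour described above is largely a matter of server configuration. The sketch below shows how such settings might be carried in a ConfigMap. The ConfigMap name, file name and standby names are hypothetical, and how the file is mounted into the server depends on the chart being used.

```yaml
# Illustrative only: the ConfigMap name, file name and standby names are hypothetical.
apiVersion: v1
kind: ConfigMap
metadata:
  name: postgres-primary-conf
data:
  primary.conf: |
    # Allow standbys to stream WAL from this server.
    wal_level = replica
    max_wal_senders = 5
    # Synchronous: a commit waits until both listed standbys have
    # flushed the commit record to disk.
    synchronous_commit = on
    synchronous_standby_names = 'FIRST 2 (standby1, standby2)'
    # Asynchronous alternative: an empty list means commits do not wait.
    # synchronous_standby_names = ''
```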
Master-Slave Example 1: MariaDB/MySQL
There is an example of the first method here: https://kubernetes.io/docs/tasks/run-application/run-replicated-stateful-application/#deploy-mysql. This method uses a MySQL database, but the procedure for Postgres should be much the same. A ConfigMap is created containing two different configurations: one for the primary database server, which handles writes, and one for the other (slave) servers.
    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: mysql
      labels:
        app: mysql
    data:
      primary.cnf: |
        # Apply this config only on the primary.
        [mysqld]
        log-bin
      replica.cnf: |
        # Apply this config only on replicas.
        [mysqld]
        super-read-only
Two services are defined:
- One responsible for reads. It load balances connections across all replicas.
- One responsible for writes. This is a "headless" service. It allows other pods to address a specific database replica, i.e. the primary one (mysql-0.mysql).
    # Headless service for stable DNS entries of StatefulSet members.
    apiVersion: v1
    kind: Service
    metadata:
      name: mysql
      labels:
        app: mysql
    spec:
      ports:
      - name: mysql
        port: 3306
      clusterIP: None
      selector:
        app: mysql
    ---
    # Client service for connecting to any MySQL instance for reads.
    # For writes, you must instead connect to the primary: mysql-0.mysql.
    apiVersion: v1
    kind: Service
    metadata:
      name: mysql-read
      labels:
        app: mysql
    spec:
      ports:
      - name: mysql
        port: 3306
      selector:
        app: mysql
Finally, a StatefulSet is created that contains:
- initContainers, whose main purpose is to populate the main container with the correct configuration depending on whether it is a primary or a slave.
- A main container, run with 3 replicas.
- An xtrabackup container, used to create a database backup. This free tool can also be used to create backups of MariaDB.
Full details on the use case are available in the Kubernetes tutorial linked above.
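To make the role of the init container more concrete, the following is a condensed sketch of the StatefulSet, adapted from the Kubernetes tutorial. It assumes the mysql ConfigMap and headless service shown earlier. The tutorial's data-cloning init container and xtrabackup sidecar are left out for brevity, so this fragment only illustrates how the primary and replica configurations are selected; it is not a complete replication setup.

```yaml
# Condensed from the Kubernetes tutorial; the data-cloning init container and
# the xtrabackup sidecar are omitted, so this alone does not replicate data.
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: mysql
spec:
  serviceName: mysql          # the headless service defined earlier
  replicas: 3
  selector:
    matchLabels:
      app: mysql
  template:
    metadata:
      labels:
        app: mysql
    spec:
      initContainers:
        - name: init-mysql
          image: mysql:5.7
          command:
            - bash
            - "-c"
            - |
              set -ex
              # The pod ordinal (mysql-0, mysql-1, ...) decides the role.
              [[ $HOSTNAME =~ -([0-9]+)$ ]] || exit 1
              ordinal=${BASH_REMATCH[1]}
              # Give each server a unique, non-zero server-id.
              echo [mysqld] > /mnt/conf.d/server-id.cnf
              echo server-id=$((100 + $ordinal)) >> /mnt/conf.d/server-id.cnf
              # Ordinal 0 becomes the primary; all others become replicas.
              if [[ $ordinal -eq 0 ]]; then
                cp /mnt/config-map/primary.cnf /mnt/conf.d/
              else
                cp /mnt/config-map/replica.cnf /mnt/conf.d/
              fi
          volumeMounts:
            - name: conf
              mountPath: /mnt/conf.d
            - name: config-map
              mountPath: /mnt/config-map
      containers:
        - name: mysql
          image: mysql:5.7
          env:
            - name: MYSQL_ALLOW_EMPTY_PASSWORD
              value: "1"
          ports:
            - name: mysql
              containerPort: 3306
          volumeMounts:
            - name: data
              mountPath: /var/lib/mysql
              subPath: mysql
            - name: conf
              mountPath: /etc/mysql/conf.d
      volumes:
        - name: conf
          emptyDir: {}
        - name: config-map
          configMap:
            name: mysql
  volumeClaimTemplates:
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 10Gi
```

Because a StatefulSet gives each pod a stable ordinal, the same pod template can serve both roles, and the write endpoint mysql-0.mysql always resolves to the primary.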
Master-Slave Example 2: Postgres - Bitnami - To be continued...
Investigated Testing Approaches
This section outlines some commonly used testing approaches, as well as some unique or less common ones.
Chart Tests
Chart tests are built into Helm; details can be found here: https://helm.sh/docs/topics/chart_tests/. The task of a chart test is to verify that a chart works as expected once it is installed. Each Helm chart has a templates directory, and test files live under it (by convention in templates/tests/). A test file contains the YAML definition of a Kubernetes Job, annotated with the helm.sh/hook: test hook so that Helm recognises it as a test. A Job in Kubernetes is a resource that creates one or more Pods to carry out a specific task and tracks them until they complete. In the test, the Job runs a container with a specified command and is considered a success if the container exits successfully (exit 0).
Examples:
- Validate that your configuration from the values.yaml file was properly injected.
- Make sure your username and password work correctly
- Make sure an incorrect username and password does not work
- Assert that your services are up and correctly load balancing
- Test successful connection to a database using a specified secret
The simplicity of specifying tests in this way is a major advantage. Tests can then simply be run with a "helm test" command.
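As a minimal sketch of what such a test might look like, the template below checks that a database connection succeeds using a password taken from a secret. The service name policy-db, the secret policy-db-secret, the user, the database name and the image tag are all hypothetical and would need to match the chart under test.

```yaml
# templates/tests/db-connection-test.yaml
# Hypothetical service, secret, user and database names.
apiVersion: batch/v1
kind: Job
metadata:
  name: "{{ .Release.Name }}-db-connection-test"
  annotations:
    # Marks this resource as a test to be run by "helm test".
    "helm.sh/hook": test
spec:
  backoffLimit: 0              # fail the test on the first failed attempt
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: db-connection-test
          image: postgres:15   # used only for its psql client
          env:
            - name: PGPASSWORD
              valueFrom:
                secretKeyRef:
                  name: policy-db-secret
                  key: password
          command:
            - sh
            - -c
            # psql exits non-zero if the connection or the query fails.
            - psql -h policy-db -U policy_user -d policyadmin -c 'SELECT 1'
```

After installing the chart, running helm test with the release name executes the Job and reports success or failure based on the container's exit code.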
Helm Unit Test Plugin
There is an open source project on GitHub for unit testing charts: https://github.com/quintush/helm-unittest. It is designed as a Helm plugin, so it can be installed easily. The plugin allows tests to be defined in YAML that verify the rendered templates of a chart, without deploying anything. It is very simple to operate. You define a tests/ directory under your chart, e.g. YOUR_CHART/tests/deployment_test.yaml. An example test suite is shown below:
    suite: test deployment
    templates:
      - deployment.yaml
    tests:
      - it: should work
        set:
          image.tag: latest
        asserts:
          - isKind:
              of: Deployment
          - matchRegex:
              path: metadata.name
              pattern: -my-chart$
          - equal:
              path: spec.template.spec.containers[0].image
              value: nginx:latest
The test asserts three things: that the template renders a Deployment, that the resource name ends in -my-chart, and that the container image is nginx:latest. A simple CLI command is then used to run the test:
helm unittest $YOUR_CHART
Although this library is useful, it does not actually serve to test the functionality of the chart, only the specification.
Octopus
The Kyma project is a cloud native application runtime that uses Kubernetes, helm and a number of other components. They used helm tests extensively and appreciated how easy the tests were to specify. However, they did find some shortcomings:
- Running the whole suite of integration tests took a long time, so they needed an easy way of selecting tests they wanted to run.
- The number of flaky tests increased, and they wanted to ensure they are automatically rerun.
- They needed a way of verifying the tests' stability and detecting flaky tests.
- They wanted to run tests concurrently to reduce the overall testing time.
For these reasons, Kyma developed their own tool called Octopus and it tackles all of the issues above: https://github.com/kyma-incubator/octopus/blob/master/README.md
When developing tests with Octopus, the tester defines two files:
- TestDefinition file: Defines a test for a single component or a cross-component scenario. We can see in the example below that the custom TestDefinition resource is used to define a Pod with a specified image for the container and a simple command is carried out. This is not dissimilar to the way that helm test defines tests for the charts.
    apiVersion: testing.kyma-project.io/v1alpha1
    kind: TestDefinition
    metadata:
      labels:
        component: service-catalog
      name: test-example
    spec:
      template:
        spec:
          containers:
            - name: test
              image: alpine:latest
              command:
                - "pwd"
- ClusterTestSuite file: Defines which tests to run on the cluster and how to run them. The example below runs only tests with the "service-catalog" label. It specifies how many times each test should be executed and how many times a failed test should be retried. The concurrency setting defines the maximum number of tests that may run at the same time.
    apiVersion: testing.kyma-project.io/v1alpha1
    kind: ClusterTestSuite
    metadata:
      labels:
        controller-tools.k8s.io: "1.0"
      name: testsuite-selected-by-labels
    spec:
      count: 1
      maxRetries: 1
      concurrency: 2
      selectors:
        matchLabelExpressions:
          - component=service-catalog
Although this project seems to make some improvement on the helm chart tests, it is unclear how mature the project is. Documentation details how to define the specified files and how to use kubectl CLI to execute a test - https://github.com/kyma-incubator/octopus/blob/master/docs/tutorial.md
Terratest
This testing framework is a Go library developed by Gruntwork, originally for testing Terraform code, but it seems it can be used for Helm charts independently of Terraform. All it requires is a Kubernetes cluster, a Helm installation and the Go language. A very simple example of how the tests are created is found here: https://github.com/gruntwork-io/terratest-helm-testing-example. Tests are specified in Go and can include the instructions for deploying a chart. An example outlined here: https://blog.gruntwork.io/automated-testing-for-kubernetes-and-helm-charts-using-terratest-a4ddc4e67344 shows how the tests can be specified for 2 different scenarios: template testing and integration testing.
- Template Testing: This is used to catch syntax or logical issues in your defined Helm charts. The example in the blog post points to an example Helm chart directory and then sets the image value in the chart. It renders the template without actually deploying the pod and then confirms that the rendered template has the correct image set. After the test is run, the output displays the rendered template and whether the test was successful. These tests are very quick because they do not involve deploying any pods.
This is fine as long as you don't want to test any functionality that depends on your chart being up-and-running.
- Integration Testing: These tests deploy the rendered template from above onto an actual Kubernetes cluster, so the inputs needed to create the actual pods must be provided in the testing script. The test installs the chart with Helm and, once the test is finished, uninstalls it. The blog post linked above includes an example.
Although the use of the go language is appealing for this kind of testing, there are some drawbacks when using this method when compared to others.
- Terratest was built to test Terraform code. It will work independently, but there could be some "gotchas" here that may result in requiring some Terraform features.
- Terratest seems to do much the same thing as Chart Tests and one could argue that it is easier to use Chart Tests.
- Terratest does not provide the concurrency options that are present in Octopus.
Kubetest
Another tool that is being used for writing integration tests in Helm/Kubernetes is Kubetest - https://kubetest.readthedocs.io/en/latest/. This is a Python-based pytest plugin that aims to make it easier to write tests on Kubernetes, even allowing us to automate tests for the Kubernetes infrastructure, networking and disaster recovery. It has many interesting features:
- Simple API for common cluster interactions.
- Uses the Kubernetes Python client as the backend, allowing more complex cluster control for actions not covered by the kubetest API.
- Load Kubernetes manifest YAMLs into their Kubernetes models.
- Each test is run in its own namespace and the namespace is created and deleted automatically.
- Detailed logging to help debug error cases.
- Wait functions for object readiness, deletion, and test conditions.
- Allows you to search container logs for expected log output.
- RBAC permissions can be set at a test-case granularity using pytest markers.
Although the documentation only gives examples for testing kubernetes directly, there are examples of it being used for helm too - https://github.com/omerlh/helm-chart-tests-demo.
Writing the tests looks straightforward and the documentation is good. However, the solution seems similar to Chart Tests and also does not support running concurrent integration tests.