Distributed Analytics Test Status (Dublin)



No

Test Case

Progress

Blocking Issue

1.

Collection package

90%



1a.

Test collection package using Helm and ensure all pods and services come up fine

100%



1b.

Test collection package using Helm template and kubectl and ensure all pods and services come up fine

100%



1c.

Ensure Prometheus is up and running and the collectd endpoints are registered and metrics are scraped successfully.

100%



1d.

Ensure Prometheus is up and running and the cAdvisor endpoints are registered and metrics are scraped successfully.

100%



1e.

Ensure Prometheus is up and running and the Node Exporter endpoints are registered and metrics are scraped successfully.

100%



1f.

Soft multi-tenancy support for the collection package by bringing up two packages in two different namespaces within the same cluster

100%



1g.

Ensure Prometheus endpoints are visible in the Grafana visualization package by enabling the Prometheus URL and importing the dashboard

50%



1h.

Test Day-0 configuration

100%



1i.

Test Day-2 configuration

75%

The Prometheus remote_write Day-2 configuration does not work as intended. It needs a combination of kubectl apply and kubectl patch.

2.

Data Lake

60%



2a.

Make sure M3DB charts are running successfully and the pods/services are up

100%



2b.

Make sure HDFS charts are running successfully and the pods/services are up

100%



2c.

Prometheus integration with M3DB through remote_write. Ensure metrics can be queried from the M3DB coordinator service

100%



2d.

M3DB service export to visualization package

???

El-Alto???

2e.

Test Day-0 configuration

100%



2f.

Test Day-2 configuration

N/A

M3DB Day-2 configuration support (El-Alto)

HDFS Day-2 configuration support (El-Alto)

2g.

HDFS reachability testing with sample Spark application

100%

Part of Spark training sample application.

2h.

Test using Helm template and kubectl

0%



3.

Messaging

75%



3a.

Pre-requisite testing. Test Kafka operator pods are up and running.

100%



3b.

Test Day-0 configuration

100%



3c.

Test Day-2 configuration. Test Kafka broker instances are coming up.

100%



3d.

Test Day-2 configuration. Test Kafka topics are created.

100%



3e.

Soft multi-tenancy support for the messaging package by bringing up two packages in two different namespaces within the same cluster

100%

Note: this still needs to be achieved with a single Kafka operator. Ideally, we will create a cluster-wide operator wherever possible and have it monitor all namespaces.

3f.

Remote write to Kafka from Prometheus

50%

The basic Kafka adapter works. The adapter still needs to be made robust. (El-Alto)

3g.

HDFS Writer - Kafka to HDFS. Verify platform metrics are written to HDFS.

???

El-Alto???

4. 

Training

50%



4a.

Sample application with Spark and HDFS

100%



4b.

Sample application with Horovod + TensorFlow

100%



4c.

Submitting a Spark job using the Spark operator in Python/Java

100%



4d.

Submitting a Horovod job using the Helm chart

100%



4e.

Test Collection to Training to model repository workflow





4f.







5.

Model Repository





5a.

Save TensorFlow/Keras model to Minio repository optimized for inference.

50%

TensorFlow 1.13 support uses deprecated libraries.

TensorFlow 2.0 is an alpha version. Distributed Analytics doesn't support it yet.

5b.

Save Horovod-based TensorFlow/Keras model to Minio repository optimized for inference

0%

Need to remove Horovod layers.

5c.

Save scikit-learn model and optimize for inference

0%

El-Alto???

5d.

Save PyTorch model and optimize for inference

0%

El-Alto???

6. 

Inference





6a.

Inference package should bring up a TensorFlow Serving container based on the model and bucket name. Ensure TF Serving operator works.

100%



6b.

Model serving for PyTorch, scikit-learn and other models

0%



6c.

Test Inference application for MNIST models for various frameworks

0%



7.

End to End Distributed Analytics stack composition

20%



7a.

Ingress support for exporting remote services across edge sites

50%

Only Minio is enabled at this time. Other services need to create appropriate K8s Ingress objects.

7b.

MetalLB support for ingress gateway

50%

Tested in Layer-2 mode. BGP mode is not tested.

7c. 

Ensure reachability between services using External DNS, MetalLB and Ingress path-based routing.

0%



7d.

Training end to end

0%



7e.

Inference based on trained model

0%
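
Test cases 1c, 1d and 1e all come down to the same check: the expected scrape targets are registered in Prometheus and report healthy. The sketch below shows one way that check could be automated against the JSON returned by the Prometheus HTTP API endpoint /api/v1/targets. The job names (collectd, cadvisor, node-exporter) and the helper names are assumptions made for illustration; the real job names depend on the scrape configuration shipped with the collection package.

```python
# Assumed job names; replace with the job_name values from the
# collection package's Prometheus scrape configuration.
EXPECTED_JOBS = ("collectd", "cadvisor", "node-exporter")


def summarize_targets(targets_response, expected_jobs=EXPECTED_JOBS):
    """Count healthy and unhealthy active targets per expected job.

    targets_response is the decoded JSON body of GET /api/v1/targets.
    """
    counts = {job: {"up": 0, "not_up": 0} for job in expected_jobs}
    active = targets_response.get("data", {}).get("activeTargets", [])
    for target in active:
        job = target.get("labels", {}).get("job")
        if job in counts:
            key = "up" if target.get("health") == "up" else "not_up"
            counts[job][key] += 1
    return counts


def failing_jobs(targets_response, expected_jobs=EXPECTED_JOBS):
    """Return expected jobs with no healthy target or any unhealthy target."""
    counts = summarize_targets(targets_response, expected_jobs)
    return sorted(
        job for job, c in counts.items() if c["up"] == 0 or c["not_up"] > 0
    )


if __name__ == "__main__":
    # Hand-written sample payload, not output captured from a real cluster.
    sample = {
        "status": "success",
        "data": {
            "activeTargets": [
                {"labels": {"job": "collectd"}, "health": "up"},
                {"labels": {"job": "cadvisor"}, "health": "down"},
            ]
        },
    }
    print(failing_jobs(sample))
```

In a live test, the payload would come from the Prometheus service of the namespace under test (for example, fetched with urllib.request and decoded with json.load), and an empty result from failing_jobs would correspond to a passing check for that namespace. Running it once per namespace would also cover the multi-tenancy case in 1f.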