Distributed Analytics Test Status (Dublin)
No | Test Case | Progress | Blocking Issue |
---|---|---|---|
1. | Collection package | 90% | |
1a. | Test collection package using Helm and ensure all pods and services come up fine | 100% | |
1b. | Test collection package using Helm Template and Kubectl and ensure all pods and services come up fine | 100% | |
1c. | Ensure Prometheus is up and running and the collectd endpoints are registered and metrics are scraped successfully. | 100% | |
1d. | Ensure Prometheus is up and running and the cadvisor endpoints are registered and metrics are scraped successfully. | 100% | |
1e. | Ensure Prometheus is up and running and the Node Exporter endpoints are registered and metrics are scraped successfully. | 100% | |
1f. | Soft multi-tenancy support for collection package by bringing up 2 packages in 2 different namespaces within the same cluster | 100% | |
1g. | Ensure Prometheus endpoints are visible in the Grafana visualization package by enabling the Prometheus URL and importing the dashboard | 50% | |
1h. | Test Day-0 configuration | 100% | |
1i. | Test Day-2 configuration | 75% | Prometheus remote_write Day-2 configuration is not working as intended. Need to use a combination of kubectl apply and kubectl patch. |
2. | Data Lake | 60% | |
2a. | Make sure M3DB charts are running successfully and the pods/services are up | 100% | |
2b. | Make sure HDFS charts are running successfully and the pods/services are up | 100% | |
2c. | Prometheus integration with M3DB via remote_write. Ensure metrics can be queried from the M3DB coordinator service | 100% | |
2d. | M3DB service export to visualization package | ??? | El-Alto??? |
2e. | Test Day-0 configuration | 100% | |
2f. | Test Day-2 configuration | N/A | M3DB Day-2 configuration support (El-Alto); HDFS Day-2 configuration support (El-Alto) |
2g. | HDFS reachability testing with sample Spark application | 100% | Part of Spark training sample application. |
2h. | Test using Helm template and Kubectl | 0% | |
3. | Messaging | 75% | |
3a. | Pre-requisite testing. Test Kafka operator pods are up and running. | 100% | |
3b. | Test Day-0 configuration | 100% | |
3c. | Test Day-2 configuration. Test Kafka broker instances are coming up. | 100% | |
3d. | Test Day-2 configuration. Test Kafka topics are created. | 100% | |
3e. | Soft multi-tenancy support for messaging package by bringing up 2 packages in 2 different namespaces within the same cluster | 100% | Note: this still needs to be achieved with a single Kafka operator. Ideally we will create a cluster-wide operator wherever possible and monitor all namespaces. |
3f. | Remote write to Kafka from Prometheus | 50% | Basic Kafka adapter works. The adapter needs to be made more robust. (El-Alto) |
3g. | HDFS Writer - Kafka to HDFS. Verify platform metrics are written to HDFS. | ??? | El-Alto??? |
4. | Training | 50% | |
4a. | Sample application with Spark and HDFS | 100% | |
4b. | Sample application with Horovod + Tensorflow | 100% | |
4c. | Submitting a spark job using Spark operator in Python/Java | 100% | |
4d. | Submitting a Horovod job using the helm chart | 100% | |
4e. | Test Collection to Training to model repository workflow | ||
4f. | |||
5. | Model Repository | ||
5a. | Save TensorFlow/Keras model to Minio repository optimized for inference. | 50% | TensorFlow 1.13 support uses deprecated libraries. TensorFlow 2.0 is still an alpha version, and Distributed Analytics doesn't support it yet. |
5b. | Save Horovod based Tensorflow/Keras model to Minio repository optimized for inference | 0% | Need to remove Horovod layers. |
5c. | Save scikit-learn model and optimize for inference | 0% | El-Alto??? |
5d. | Save PyTorch model and optimize for inference | 0% | El-Alto??? |
6. | Inference | ||
6a. | Inference package should bring up a TensorFlow Serving container based on the model and bucket name. Ensure TF Serving operator works. | 100% | |
6b. | Model serving for PyTorch, scikit-learn and other models | 0% | |
6c. | Test Inference application for MNIST models for various frameworks | 0% | |
7. | End to End Distributed Analytics stack composition | 20% | |
7a. | Ingress support for exporting remote services across edge sites | 50% | Only Minio is enabled at this time. Other services need to create appropriate K8s Ingress objects. |
7b. | MetalLB support for ingress gateway | 50% | Tested in Layer-2 mode. BGP mode is not tested. |
7c. | Ensure reachability between services using External DNS, MetalLB and Ingress path-based routing. | 0% | |
7d. | Training end to end | 0% | |
7e. | Inference based on trained model | 0% | |
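
The Prometheus checks in rows 1c to 1e (endpoints registered and scraped successfully) could be automated against the Prometheus HTTP API. The following is a minimal sketch, not the procedure used for the results above. It assumes the scrape jobs are named `collectd`, `cadvisor` and `node-exporter` (hypothetical names; the actual job names depend on the chart configuration) and that the JSON returned by `/api/v1/targets` has already been fetched and parsed. The sample response is hand-written for illustration.

```python
def unhealthy_jobs(targets_response, expected_jobs):
    """Return {job: reason} for every expected job that has no registered
    target or has at least one target whose health is not "up".

    targets_response is the parsed JSON body of Prometheus GET /api/v1/targets.
    """
    active = targets_response.get("data", {}).get("activeTargets", [])
    problems = {}
    for job in expected_jobs:
        job_targets = [t for t in active if t.get("labels", {}).get("job") == job]
        if not job_targets:
            problems[job] = "no targets registered"
            continue
        down = [t.get("scrapeUrl", "?") for t in job_targets if t.get("health") != "up"]
        if down:
            problems[job] = "down: " + ", ".join(down)
    return problems


# Hand-written sample response (illustrative only, not real test output).
SAMPLE = {
    "status": "success",
    "data": {
        "activeTargets": [
            {"labels": {"job": "collectd"}, "health": "up",
             "scrapeUrl": "http://10.0.0.1:9103/metrics"},
            {"labels": {"job": "cadvisor"}, "health": "down",
             "scrapeUrl": "http://10.0.0.2:8080/metrics"},
        ]
    },
}

# Job names below are assumptions, not taken from the charts.
RESULT = unhealthy_jobs(SAMPLE, ["collectd", "cadvisor", "node-exporter"])
print(RESULT)
```

To run this against a live cluster, the parsed response of the Prometheus server's `/api/v1/targets` endpoint would be passed in place of `SAMPLE`; an empty result means every expected job has at least one target and all of its targets report `up`.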