...
This project provides a systematic way to ingest DMaaP data into permanent storage in real time, together with analytics tools and applications built on that data. DataLake's goals are:
- Provide a systematic way to ingest DMaaP data in real time into Couchbase, a distributed document-oriented database, and Druid, a data store designed for low-latency OLAP analytics.
- Serve as a common, easily accessible document store for other ONAP components as well.
- Provide data-access APIs and ways for ONAP components and external systems (e.g. OSS/BSS) to consume the data.
- Provide sophisticated, ready-to-use interactive analytics GUI tools built on the data. Custom analytics applications are also built on the data, and their results are exposed via REST API.
Architecture
The data stores and their associated tools are infrastructure external to ONAP. They are either installed once at the outset or provided by existing infrastructure. Because custom settings and applications are deployed to them, they are effectively integral parts of DataLake.
Scope
Data Sources
...
Provide an admin REST API for configuration and topic management. For each topic, the configuration specifies which data stores it is exported to (Couchbase and Druid are supported initially) and its TTL (Time To Live) in those stores. We will support more distributed databases in the future if needed.
Provide an admin GUI to manage the dispatcher, making use of the above admin REST API. It also manages the analytics tools and applications.
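The per-topic settings described above (target data stores plus a TTL) could be modeled roughly as follows. This is only a minimal sketch: the class `TopicConfig`, its field names, the helper `targets_for`, and the convention that a TTL of 0 means "keep forever" are all assumptions made for illustration, not part of the proposed admin API.

```python
from dataclasses import dataclass, field

# Stores named in the proposal as initially supported.
SUPPORTED_STORES = {"couchbase", "druid"}


@dataclass
class TopicConfig:
    """Hypothetical per-topic settings; field names are illustrative only."""
    name: str
    stores: set = field(default_factory=set)  # data stores the topic is exported to
    ttl_days: int = 0  # assumed convention: 0 means "keep forever"

    def validate(self) -> None:
        """Reject stores that are not supported and negative TTLs."""
        unknown = self.stores - SUPPORTED_STORES
        if unknown:
            raise ValueError(f"unsupported stores: {sorted(unknown)}")
        if self.ttl_days < 0:
            raise ValueError("ttl_days must be non-negative")


def targets_for(topic: str, configs: dict) -> set:
    """Return the stores a message on `topic` should be written to.

    Topics without a configuration entry are not exported anywhere.
    """
    cfg = configs.get(topic)
    return set(cfg.stores) if cfg else set()
```

A dispatcher could look up `targets_for(topic, configs)` for each incoming message and write only to the returned stores.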
Document Store
Monitor selected topics, pull the data in real time, and insert it into Couchbase, with one table per topic, named after the topic.
Data in JSON, XML, or YAML format is automatically converted into the store's native schema. We may support additional formats. Data in any other format is stored as a single string.
Provide a REST API for data queries; applications can also access the data through the native API.
Couchbase supports running Spark directly on it, which allows complex analytics tools to be built. We will develop Spark analytics applications if needed.
- Other ONAP components can take advantage of this to store their operational data. If we need to run heavy analytics jobs on historical data, we should separate the operational data from the historical data. Otherwise, thanks to Couchbase's scalability, the two can coexist.
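The format handling described in this section can be sketched as below. It is an illustrative outline under stated assumptions, not the project's implementation: the function names and the `"data"` key used for the string fallback are hypothetical, XML attributes are ignored, and YAML is left out because the Python standard library has no YAML parser.

```python
import json
import xml.etree.ElementTree as ET


def xml_to_dict(elem):
    """Convert an XML element into nested dicts; repeated tags become lists.

    Attributes are ignored in this sketch.
    """
    children = list(elem)
    if not children:
        return elem.text.strip() if elem.text else ""
    result = {}
    for child in children:
        value = xml_to_dict(child)
        if child.tag in result:
            existing = result[child.tag]
            if not isinstance(existing, list):
                result[child.tag] = [existing]
            result[child.tag].append(value)
        else:
            result[child.tag] = value
    return result


def to_document(topic, payload):
    """Return (table_name, document) for one message.

    The table name equals the topic name, as described above.
    """
    text = payload.strip()
    try:
        parsed = json.loads(text)
        if isinstance(parsed, dict):
            return topic, parsed
    except ValueError:  # json.JSONDecodeError is a ValueError subclass
        pass
    if text.startswith("<"):
        try:
            root = ET.fromstring(text)
            return topic, {root.tag: xml_to_dict(root)}
        except ET.ParseError:
            pass
    # Fallback: unrecognized data is kept as a single string.
    # The "data" key is a hypothetical choice for this sketch.
    return topic, {"data": payload}
```

Trying JSON first and falling back to a plain string keeps ingestion from failing on malformed messages, at the cost of storing them unparsed.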
OLAP Store
Monitor selected topics, pull the data in real time, and insert it into Druid, with one datasource per topic, named after the topic.
Extract the dimensions and metrics from JSON files and pre-configure Druid settings for each datasource; the settings are customizable through a web interface.
Integrate Apache Superset for data exploration and visualization, and provide pre-built interactive dashboards.
Integrate Grafana for time series analytics.
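One way to picture the extraction of dimensions and metrics from JSON samples is the heuristic below. It is a sketch under assumptions, not the project's algorithm: numeric fields are proposed as metrics, everything else as dimensions, and nested objects are flattened with dotted names. The function names are hypothetical, and the output is a plain dictionary rather than an actual Druid ingestion spec.

```python
def flatten(record, prefix=""):
    """Flatten nested dicts into a single level with dotted field names."""
    flat = {}
    for key, value in record.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=f"{name}."))
        else:
            flat[name] = value
    return flat


def suggest_schema(samples):
    """Propose dimensions and metrics from a list of sample JSON objects."""
    dimensions, metrics = set(), set()
    for record in samples:
        for name, value in flatten(record).items():
            # bool is a subclass of int in Python, so exclude it explicitly.
            if isinstance(value, (int, float)) and not isinstance(value, bool):
                metrics.add(name)
            else:
                dimensions.add(name)
    # A field seen as both numeric and non-numeric is treated as a dimension.
    metrics -= dimensions
    return {"dimensions": sorted(dimensions), "metrics": sorted(metrics)}
```

A suggestion like this would serve only as a starting point, since the text above says the per-datasource settings remain customizable through a web interface.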
...
- link to seed code (if applicable)
- Vendor Neutral
- Yes
- Meets Board policy (including IPR)
- This proposal was presented at the 2018-10-29 Dublin Architecture Planning F2F Meeting, and a JIRA ticket has been created.
Use the above information to create a key project facts section on your project page
...