Attachments (including meeting recording)
...
Title | Responsible | Status | Last discussed
---|---|---|---
1. Shared Cassandra Database | Roger Maitland | In Progress (Yellow) | 14th March 2019

Integration Blockers - regarding OOM-1652 and related Jira tickets:
- Is this something AAI team needs to be aware of? Or is it OK for OOM team to just switch it around?
- Will this introduce unexpected dependencies between AAI, AAF, OOF, Portal and SDC that will create difficulties for upgrade/downgrade/backup/restore/maintenance/schema change?
- Is it going to exacerbate the performance problems already noted?
A number of Gerrit review issues were raised: https://gerrit.onap.org/r/#/c/79425/
The "rolling upgrade" change has been combined with the "shared cassandra" change.
The "shared cassandra" change has been combined with the "AAF shared cassandra" change, which means it's also combined with the "AAI shared cassandra" change.
This sounds like a recipe for disaster.
18th Mar: New patchsets to address our concerns https://gerrit.onap.org/r/#/c/82418/
***
New requirement to have our CSIT suite run as part of the OOM test environment in WindRiver
Long term goal: one robot test case per endpoint that can run as part of e2e test
AAI R4 Integration Sanity Test Plans
See also Contributing To AAI Best Practise
4th March: Jira ticket raised.
13th March: Success! First commit merged, more to follow!
20th March: Second commit merged: https://gerrit.onap.org/r/#/c/82623/
***
31st Jan 2019:
The schema-service is ready. Currently it provides file-sharing capabilities in terms of schema/edgerule files.
In order for GraphGraph to take advantage of the schema parsing/processing in schema-service additional abstractions have to be implemented on top of the crude file2string functionality currently in schema-service.
Open questions:
- What schema abstractions are planned in schema-service?
- When will those abstractions be implemented?
- Venkata Harish Kajur will ask Manisha Aggarwal if the current functionality of the schema-service is the final version for Dublin and if there will be further enhancements in later releases.
GraphGraph needs the following functionality:
Venkata Harish Kajur and Manisha Aggarwal: what is missing in schema-service and needed by GraphGraph is the following (a rough client sketch follows this list):
- rest call to get available schemas
- list of all schema nodes/items (like vserver, tenant, p-interfaces..) for example on a REST path /schemas/{schema}/nodes
- all relevant attributes of a given node/item for example on REST path /schemas/{schema}/nodes/{node}
- edges/relationships with their attributes between schema nodes/items (for example on REST path /schemas/{schema}/edges where you specify a "from" "to" schema items as query params)
- subgraph of the schema, where you specify 1. initial (root) items/node (like tenant or vserver) 2. schema version and 3. number of parent/cousin/child hops from the initial item/node
- all paths in a given schema graph between 2 items/nodes (like vserver and tenant) for a given schema version
- edges in the schema graph should be composed of edges in the schema file + edges created from the edgerules file
- edges should contain basic attributes when delivered via the subgraph call (like parent/child relationship and important properties from edgerules) and have additional (or all) attributes when queried via the /schemas/{schema}/edges REST endpoint.
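To make the wish list above concrete, here is a hedged sketch of what a GraphGraph-side client for those proposed endpoints could look like. None of these endpoints exist in schema-service today; the base URL, port and credentials are placeholders.

```python
# Hypothetical sketch only: these schema-service endpoints are the *proposed*
# API from the list above and do not exist yet; host, port and auth are made up.
import requests

BASE = "https://aai-schema-service:8452/aai/schema-service/v1"  # assumed address
AUTH = ("aai", "aai")                                           # placeholder credentials

def list_schemas():
    # proposed: GET /schemas -> available schema versions
    return requests.get(f"{BASE}/schemas", auth=AUTH, verify=False).json()

def list_nodes(schema):
    # proposed: GET /schemas/{schema}/nodes -> node types (vserver, tenant, ...)
    return requests.get(f"{BASE}/schemas/{schema}/nodes", auth=AUTH, verify=False).json()

def node_details(schema, node):
    # proposed: GET /schemas/{schema}/nodes/{node} -> attributes of one node type
    return requests.get(f"{BASE}/schemas/{schema}/nodes/{node}", auth=AUTH, verify=False).json()

def edges(schema, frm, to):
    # proposed: GET /schemas/{schema}/edges?from=...&to=... -> edge rules (schema file + edgerules)
    return requests.get(f"{BASE}/schemas/{schema}/edges",
                        params={"from": frm, "to": to}, auth=AUTH, verify=False).json()
```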
20th Mar 2019:
Open questions for schema-service:
- what is the current implemented functionality?
- what are the business use-cases in ONAP for schema-service? Description of functionality in relation to other services/projects is needed. In other words who needs it and why?
- if no business use-cases can be formulated we should consider removing schema-service from A&AI and replacing it with standard file-sharing mechanisms.
21st Mar 2019:
Based on William Reehil's comments (https://lf-onap.atlassian.net/wiki/display/DW/AAI+Schema+Service?focusedCommentId=16325457): what is "our future proposed functionality"?
***
Guangrong Fu mentioned AAI in Baseline Measurements based on Testing Results:
- Cache the AAI data and refresh them periodically so that Holmes won't have to make an HTTP call to AAI every time it tries to correlate one alarm to another.
The problem for caching is how to know when to update the cached data. Even though the access time may be fast for Holmes, the risk is using out-of-date data, so the correlations will be wrong anyway. Also, duplicating the AAI data outside of AAI is probably a bad architectural decision. Making AAI faster for these use cases would be better.
Has there been a performance analysis of where the time is spent? Could it help to use ElasticSearch (e.g. as in sparky)? Should Holmes have a batch interface to get more AAI data in fewer calls? Or a better correlation API that results in fewer calls?
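For illustration only, a minimal sketch of the periodic-refresh idea proposed above, assuming a hypothetical fetch function rather than any real Holmes or AAI API; the ttl value is exactly the staleness window the note above worries about.

```python
# Minimal sketch of the "cache AAI data and refresh periodically" idea above.
# fetch_from_aai() is a hypothetical placeholder, not a Holmes or AAI API.
import time

class TtlCache:
    def __init__(self, fetch_from_aai, ttl_seconds=60):
        self._fetch = fetch_from_aai
        self._ttl = ttl_seconds
        self._store = {}  # key -> (expiry_time, value)

    def get(self, key):
        expiry, value = self._store.get(key, (0, None))
        if time.monotonic() >= expiry:          # stale or missing -> refresh
            value = self._fetch(key)            # one HTTP call instead of one per alarm
            self._store[key] = (time.monotonic() + self._ttl, value)
        return value
```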
31st Oct: https://lists.onap.org/g/onap-discuss/topic/27805753
1st Nov:
- Guangrong Fu will try custom queries for the queries that took too long to return
- The hardware (mainly storage) influences the query speed - need to find out what hardware the speed test was conducted on (Guangrong Fu will provide HW specs)
Jira: HOLMES-186
Would the AAI Cacher help here?
5th Mar: Guangrong Fu
Hi,
Sorry for my late response. It took me a long time to set up AAI in my own env. For Item 10, here's some information:
Main APIs invoked in Holmes for different use cases:
VoLTE
- Getting the VM query URL via: /search/nodes-query?search-node-type=vserver&filter=vserver-name:EQUALS: - once
- Getting VM info via: the URL returned by the query above - once
- Getting the VNF data via: network/generic-vnfs/generic-vnf - once
CCVPN
- Updating terminal point via: /network/pnfs/pnf/{pnfName}/p-interfaces/p-interface/nodeId-{pnfName}-ltpId-{ifName} - once
- Getting logical links via: /network/pnfs/pnf/{pnfName}/p-interfaces/p-interface/nodeId-{pnfName}-ltpId-{ifName} - 3 times
- Getting VPN binding info via: /network/pnfs/pnf/{pnfName}/p-interfaces/p-interface/nodeId-{pnfName}-ltpId-{ifName} - once
- Getting connectivity info via: /network/vpn-bindings/vpn-binding/{vpnId} - once
- Getting service instance info via: /network/connectivities/connectivity/{connectivityId} - once
Performance
We set up an AAI env on a VM (8 cores, 16GB memory, 160GB storage) following the guidance https://wiki.onap.org/display/DW/How+to+Docker+setup+on+Single+VM+HEAT+Deployment and tried to run a VNF query using "/aai/v11/cloud-infrastructure/cloud-regions/cloud-region/example-cloud-owner-val-45051/example-cloud-region-id-val-56689/tenants/tenant/example-tenant-id-val-51834/vservers/vserver/example-vserver-id-val-51834" (which is returned by "/search/nodes-query?search-node-type=vserver&filter=vserver-name:EQUALS:") for 1000 times. It took ~95ms per query. Also, we tried to query a VNF for 1000 times via "/aai/v11/network/generic-vnfs/generic-vnf/example-vnf-id-val-92494" and the average time is ~86ms.
From the result, we know that even for a single request, the time cost reaches around 100ms. Let alone there will be several requests sent to AAI when an alarm is processed by Holmes. Taking CCVPN for example, for each alarm, there are up to 7 requests made. That means it'll take around 600-700 ms for Holmes to interact with AAI. In case of alarm storms, it is hard for AAI to support such intensive queries.
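For reference, a rough sketch of how a comparable 1000-query timing run could be scripted; the AAI host, credentials and example VNF URL are placeholders taken from the text above, not a definitive test harness.

```python
# Rough reproduction sketch of the 1000-query timing test described above.
# Host, credentials and the example vnf URL are placeholders/assumptions.
import time
import requests
requests.packages.urllib3.disable_warnings()   # AAI typically uses a self-signed cert

AAI = "https://aai.onap:8443"                  # assumed AAI service address
AUTH = ("AAI", "AAI")                          # placeholder credentials
HEADERS = {"X-FromAppId": "perf-test", "X-TransactionId": "perf-1", "Accept": "application/json"}
URL = AAI + "/aai/v11/network/generic-vnfs/generic-vnf/example-vnf-id-val-92494"

durations = []
for _ in range(1000):
    start = time.perf_counter()
    requests.get(URL, auth=AUTH, headers=HEADERS, verify=False)
    durations.append(time.perf_counter() - start)

print("average per query: %.1f ms" % (1000 * sum(durations) / len(durations)))
```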
6th March: Guangrong Fu
In my opinion, the performance of AAI queries is not only impacted by the computation inside AAI, but also impacted by the HTTP request itself.
I've done another test. I tried to send requests to the health check API (which does nothing but return immediately after it receives a request ) of Holmes. The average time cost is also ~ 70ms. So it seems to be a problem with the time cost caused by setting up and releasing HTTP connections.
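If connection setup/teardown really dominates, reusing a keep-alive connection should show it. A hedged sketch of such a comparison (host, endpoint and credentials are placeholders):

```python
# Sketch: compare per-request connections vs. a reused keep-alive session.
# Host, path and credentials are placeholders, not Holmes configuration.
import time
import requests
requests.packages.urllib3.disable_warnings()

URL = "https://aai.onap:8443/aai/util/echo"   # any cheap endpoint works for this comparison
AUTH = ("AAI", "AAI")

def timed(call, n=100):
    start = time.perf_counter()
    for _ in range(n):
        call()
    return 1000 * (time.perf_counter() - start) / n  # ms per call

fresh = timed(lambda: requests.get(URL, auth=AUTH, verify=False))   # new TCP/TLS each time
session = requests.Session()                                        # keep-alive reuse
reused = timed(lambda: session.get(URL, auth=AUTH, verify=False))

print(f"new connection: {fresh:.1f} ms/call, reused session: {reused:.1f} ms/call")
```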
6th March: Keong
Regarding these queries:
- Getting logical links via: /network/pnfs/pnf/{pnfName}/p-interfaces/p-interface/nodeId-{pnfName}-ltpId-{ifName} - 3 times
- Getting VPN binding info via: /network/pnfs/pnf/{pnfName}/p-interfaces/p-interface/nodeId-{pnfName}-ltpId-{ifName} - once
What depth is used on these GET calls? If they default to depth=0, then perhaps some improvement can be made by using "depth=1" or "depth=2"? Fewer calls returning more data could improve overall performance.
Same could be achieved by changing to Nodes query, e.g.
GET /aai/v14/nodes/p-interfaces?interface-name=nodeId-{pnfName}-ltpId-{ifName}
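A hedged sketch of the two alternatives suggested here, i.e. one deeper GET on the pnf versus a nodes query on p-interfaces; host, credentials and names are placeholders, not Holmes configuration.

```python
# Sketch of the two alternatives suggested above: a deeper GET on the pnf,
# and a nodes query on p-interfaces. Host/credentials/names are placeholders.
import requests

AAI = "https://aai.onap:8443"
AUTH = ("AAI", "AAI")
HEADERS = {"X-FromAppId": "holmes-test", "X-TransactionId": "1", "Accept": "application/json"}
pnf_name, if_name = "example-pnf", "example-if"

# Option 1: one call with depth, returning the pnf together with its p-interfaces
deep = requests.get(
    f"{AAI}/aai/v14/network/pnfs/pnf/{pnf_name}",
    params={"depth": 2},
    auth=AUTH, headers=HEADERS, verify=False).json()

# Option 2: nodes query addressing the p-interface directly by interface-name
node = requests.get(
    f"{AAI}/aai/v14/nodes/p-interfaces",
    params={"interface-name": f"nodeId-{pnf_name}-ltpId-{if_name}"},
    auth=AUTH, headers=HEADERS, verify=False).json()
```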
Question1: Can the Bulk API be used with GET calls? Documentation does not show any examples of GET actions. https://onap.readthedocs.io/en/casablanca/submodules/aai/aai-common.git/docs/AAI%20REST%20API%20Documentation/bulkApi.html
Question2: Would it help to have the Holmes pod co-located with the AAI haproxy and AAI resources pods? Reduced network latency could improve overall performance.
Guangrong: Holmes is actually deployed by DCAE. I'm not sure whether your proposal is feasible. What's more, the performance data I got was based on the fact that Holmes and AAI were deployed on the same VM, sharing the same docker env.
***
Potentially breaking change: changing the PNF entity key from pnf-name to pnf-id.
See also:
- Proposal to Change AAI PNF Entity to use PNF-ID as key
- AAI R4 Use Case and Functional Requirements Impacts
- Suggestion in comment Re: AAI REST API Documentation - Dublin
Questions:
- how to minimise impact of the transition from pnf-name as unique to pnf-id as unique key?
- would the v14 URL be different from the v15 URL? would both paths be equally supported for GET/PUT/etc?
- what forwards-compatibility or backwards-compatibility will be supported?
- how to migrate forwards or backwards database versions, ONAP versions, etc, across this transition?
- who is going to implement it? Test it?
- what is the impact of this not going ahead?
- William LaMont will check for existing migration utility that handles this use case (changing the key from one existing attribute to another)
- James Forsyth will socialize the breaking change on the PNF in the next PTL call so that clients can prepare to do a search for ?pnf-name=${pnf-name} instead of /pnfs/pnf/${pnf-name} (see the sketch below), and to handle the PUT operation differently. Added to the PTL agenda: PTL 2019-02-19
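For illustration, a sketch of the client-side change being socialized; the v15 paths, the query-by-name form and the response shape are assumptions about the proposal, not a released API.

```python
# Illustration of the client-side change described above; v15 paths and the
# pnf-id key are the proposal under discussion, not a released API.
import requests

AAI = "https://aai.onap:8443"
AUTH = ("AAI", "AAI")
HEADERS = {"X-FromAppId": "client", "X-TransactionId": "1", "Accept": "application/json"}
pnf_name = "example-pnf-name"

# Today (pnf-name is the key): direct lookup by name
old_style = requests.get(f"{AAI}/aai/v14/network/pnfs/pnf/{pnf_name}",
                         auth=AUTH, headers=HEADERS, verify=False)

# After the change (pnf-id is the key): find the pnf by name via a query parameter,
# then address it by pnf-id for subsequent GET/PUT calls.
found = requests.get(f"{AAI}/aai/v15/network/pnfs",
                     params={"pnf-name": pnf_name},
                     auth=AUTH, headers=HEADERS, verify=False).json()
pnf_id = found["pnf"][0]["pnf-id"]     # assumed response shape, for illustration only
new_style = requests.get(f"{AAI}/aai/v15/network/pnfs/pnf/{pnf_id}",
                         auth=AUTH, headers=HEADERS, verify=False)
```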
***
Raised a JIRA ticket.
Also added to Contributing To AAI Best Practise
Please test and review.
***
Is there a guide describing the error messages and error codes? How are new error states (message + code) added?
- William LaMont will send James Forsyth the output of a script that formats the error.properties file to make a wiki page and readthedocs (a rough sketch of such a formatter follows below)
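A rough sketch of what such a formatter could look like, assuming a simple code=message layout in error.properties; this is not the actual script William will send.

```python
# Rough sketch of a formatter like the one mentioned above: turn an
# error.properties file into a markdown table for the wiki/readthedocs.
# The property layout (code=message) is an assumption about the file format.
import sys

def properties_to_markdown(path):
    rows = []
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue                       # skip blanks and comments
            code, message = line.split("=", 1)
            rows.append((code.strip(), message.strip()))
    out = ["Code | Message", "---|---"]
    out += [f"{code} | {message}" for code, message in rows]
    return "\n".join(out)

if __name__ == "__main__":
    print(properties_to_markdown(sys.argv[1] if len(sys.argv) > 1 else "error.properties"))
```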
***
See also what Dénes Németh wrote in a related Jira ticket.
I think it would be good to answer what the meaning of the field is (a collection of PEMs of the CA, xor a URL).
Questions:
1. Is AAI intended to strictly prescribe how the fields are used and what contents are in the values?
2. Or does AAI simply reflect the wishes of all the client projects that use it to store and retrieve data?
Even if (1) is true, AAI is not really in any position to enforce how clients use the data, so really (2) is always true and we need to consult the original producers of the data and the ultimate consumers of the data to document their intended meanings.
How do we push to have documentation on the purpose and meaning of the fields in AAI?
Where does all this documentation go?
Should the documentation be backed up by validation code?
See also discussion about AAI in 2018-11-28 ExtAPI Meeting notes
29th Nov: Started on new wiki page AAI Schema Producer-Consumer Pairings
***
Looking at AAI usage in OOF - HPA guide for integration testing by Dileep Ranganathan, wondering whether there is a better way to bootstrap AAI test data?
Generating AAI data
Note: Required only if the Multicloud has no real cloud-regions and HPA discovery cannot happen.
If Multicloud team has data for creating the Cloud-region and doesn't have the HPA, then please update the existing data with the flavors with HPA.
- Import the postman collection CASABLANCA_AAI_postman.json
- To add/remove HPA Capabilities edit the flavors section in the body of PUT Cloud-Region{x}
- Once all the necessary data is in place, use Postman to add the complex and cloud regions in the order specified below
(snip: screenshot of specific sequence)
- Use the GET requests to verify the data.
(snip: screenshot of specific sequence)
Similarly, Scott Seabolt and J / Joss Armstrong wrote for APPC Sample A&AI Data Setup for vLB/vDNS for APPC Consumption and Script to load vLB into AAI:
The below put_vLB.sh script can be used to submit the vLB data to A&AI in order to run ConfigScaleOut use case. This script and referenced JSON files are used on an AAI instance where the cloud-region and tenant are already defined.
Similarly:
Jira: TEST-133, INT-705 - vCPE Use Case Tutorial: Design and Deploy based on ONAP
One for VIM: How-To: Register a VIM/Cloud Instance to ONAP
Potential issues:
- fragility of static import data file w.r.t. schema changes and version upgrades for each ONAP release?
- how "common" is this knowledge, i.e. what to load, where to get it, who else should be using it, etc?
- should it be automated/scripted, rather than manual steps to bootstrap? (see the sketch after this list)
- should it be a simulator program or test harness, rather than a static data file?
- should it reside within AAI CI/CD jobs for maintenance and upgrade of schema versions?
- who maintains the data itself? Is there a "data repository" which can be delegated to other teams, e.g. like documentation repository links in git?
- how many other teams have similar private stashes of AAI bootstrap data?
- does it need to be published at a stable URL to avoid linkrot?
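As an illustration of the "automated/scripted" option above, a hedged sketch of bootstrapping a cloud-region via the AAI REST API instead of manual Postman steps; the owner/region ids and payload fields are placeholders and must match the schema version actually deployed.

```python
# Hedged sketch of scripted bootstrap instead of manual Postman steps.
# The cloud-owner/region ids and payload fields are illustrative placeholders;
# a real payload must match the AAI schema version in use.
import requests

AAI = "https://aai.onap:8443"
AUTH = ("AAI", "AAI")
HEADERS = {"X-FromAppId": "bootstrap", "X-TransactionId": "1", "Content-Type": "application/json"}

cloud_owner, cloud_region_id = "example-cloud-owner", "example-region-1"
payload = {
    "cloud-owner": cloud_owner,
    "cloud-region-id": cloud_region_id,
    "cloud-type": "openstack",             # illustrative values only
    "cloud-region-version": "titanium_cloud",
}

resp = requests.put(
    f"{AAI}/aai/v14/cloud-infrastructure/cloud-regions/cloud-region/{cloud_owner}/{cloud_region_id}",
    json=payload, auth=AUTH, headers=HEADERS, verify=False)
resp.raise_for_status()

# Verify with a GET, as the guide above suggests
print(requests.get(
    f"{AAI}/aai/v14/cloud-infrastructure/cloud-regions/cloud-region/{cloud_owner}/{cloud_region_id}",
    params={"depth": 0}, auth=AUTH, headers=HEADERS, verify=False).json())
```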
***
Under OOF Homing and Allocation Service (HAS) section, Dileep Ranganathan wrote about Project Specific enhancements:
Optimize - AAI cache
- Use MUSIC or any other alternative in memory caching like Redis etc?
- Optimize flavor retrieval from A&AI and Cache the information if necessary
Similarly to the "AAI too slow for Holmes" item below, this introduction of extra caching of AAI data is a worrisome development and sad indictment of the performance of the system architecture.
What can we do about this?
Would the AAI Cacher help here?
Bin Yang and Lianhao Lu wrote in a related Jira ticket about HPA telemetry data collection and making it persistent in A&AI, from which OOF can leverage during its decision making process, and:
1. Multi-cloud to collect the data from time-series data services like Prometheus (http://prometheus.io) or openstack Gnocchi, and push them to A&AI based on the data recording & aggregation rules.
and
The reason why we propose here is that VES mechanism doesn't store the telemetry data into A&AI. And OOF now can only get those kind of data from A&AI.
Some concerns:
- how much additional load will this place on AAI?
- will AAI cope with this load?
- is AAI suitable for "time-series data"?
- is "telemetry data" considered to be "active & available inventory"?
- should OOF access the telemetry/time-series data via other means (not AAI)?
- AAI API latency (4~6 seconds per request as benchmarked in the CMCC lab) could be a problem
***
Comments on Orchestration Scenarios related to AAI:
Viswanath Kumar Skand Priya / kspviswa said:
Thank you Ranny Haiby & Fernando Oliveira . I agree partly, but I still have following queries.
- I agree & acknowledge that atleast for a foreseeable future, we would need a way to specify the VNFM / NFVO as part of "Design Decision", which I believe can be reflected as part of VNFD/NSD ( using some special attribute ) or as part of internal Model that SDC might build before distributing the same. SO can then use this hint to select relevant actors. My only question is, why this has to be maintained in AAI which is exclusively for runtime record? All AAI cares about is what is running in the network irrespective of how that got orchestrated. Isn't it ?
On a broader note, I would like to understand what's the original intent of AAI ( atleast in ECOMP world ) ? Are we simply assuming that, just because AAI has "available inventory" in its name, we are expecting it to keep track of cloud inventory realtime ? Because our entire story ( including the new G-FPS proposal ) is based on this assumption. Can anyone from AAI team or ATT clarify on this ?
Because AFAIK, AAI neither has the schema to host such available inventory, nor the MC has the pub/sub or polling mechanism ( today ) to refresh the cloud inventory inplace. Ofcourse those can be scoped for further releases, but my original question is, was that the original intent behind AAI or are we now including it in the scope?
and Fernando Oliveira replied:
For the first question: I think that A&AI needs to maintain the VNF instance ↔ VNFM instance and the NS instance ↔ NFVO instance relationship for subsequent life cycle operations, i.e. a scale or heal operation. The path would be something like Event (VNF Instance, Busy) → DCAE (policy for VNF instance) → Policy Evaluation (VNF instance, Scale-out) → SO (VNF instance, Scale-out) → A&AI (find VNFM instance for the VNF instance) → SO (VNF instance, VNFM instance, Scale-out) → SOL003 Adapter (VNFM instance, VNF instance, Scale-out) → VNFM instance (VNF Instance, Scale-out).
As I understand, ESR has "esr-vnfm-list", which has an "esr-vnfm", which has "esr-system-info-list", which has "esr-system-info", which has a "relationship-list" that can contain relationships to "generic-vnf" and other AAI objects.
The "generic-vnf" object also contains "self-link", "ipv4OamAddress", "ipv4OamGatewayAddress", etc, which links the AAI object back to its "source-of-truth" external-system.
Is there some new data, new schema or new API that is required on top of this?
Fernando Oliveira; Apologies for my lack of knowledge, but a few comments:
- For the VNF/VF ↔ VNFM case, I think that there needs to be a reference from a VNF/VF instance record to the specific instance of the VNFM that was used to deploy the VNF/VF. If there is already such a reference from the VNF/VF through the ESR to the specific item on the esr-vnfm-list, then I think that would be sufficient. If not, I think that would be a new requirement.
- For the Service ↔ NFVO case, Is there an equivalent NFVO/Orchestrator list in the ESR? The esr-nfvo-list would need the same set of info as the VNFM case. If the esr-nfvo-list does exist, I think that there needs to be a reference from the Service Instance record to the specific NFVO instance that deployed the Service. Is there such a reference? If not, I think that would be a new requirement.
Bo Lv can comment more on the current ESR capabilities, but I believe there are only 3 kinds of systems so far: EMS, VNFM and third-party SDNC.
ESR could be extended to handle VNFO as another kind of system.
Fernando Oliveira : I created JIRA stories:
Jira: ONAPARC-388, ONAPARC-389, ONAPARC-390
for various parts of the scenario.
Is this item related to your question for Support ETSI NFV-SOL 005 (Os-Ma-Nfvo ref point) between SO & VF-C/NFVO?
Keong Lim, it is related to the question.
***
Is there a way to do a partial match? Regex?
PUT /aai/v13/query?format=raw
```
{"gremlin": "g.V().has('aai-ts', org.janusgraph.core.attribute.Text.textContains('URL'))"}
```
- Invite Arul Nambi and CT Paterson to next week's dev call to talk about how sparky/elastic does partial/range matches
***
Reduction in size is mostly on the aai-common image, as that is based on Ubuntu.
2/7 - Move the base image to be a part of ONAP Build, maybe aai-common repo
- Venkata Harish Kajur will create a Jira for it in Dublin Release
Move the aai-common Dockerfile RUN steps into the resources, traversal, graphadmin, cacher and schema-service microservices
***
AAF will generate certificates to be used by the containers at startup; AAI services should use the run-time generated certs instead of the ones that are in the repos or OOM charts.
In Dublin the services will mount a volume with certificates. This is on the roadmap for Dublin as a feature.
- Is this for all services and/or HAProxy?
- Where are the certificates coming from (OOM/gerrit/generated by AAF)?
- James Forsyth will ask Jonathan Gatham when the certificate init image is going to be available in ONAP and whether it is documented
***
FREEMAN, BRIAN D asked on Re: Backup and Restore Solution: ONAP-OOM:
what would be the approach to backup an entire ONAP instance, particularly SDC, AAI, SDNC data? would it be a script with all the references to the helm deploy releases or something that does a helm list and then for each entry does the ark backup?
What is the AAI strategy for backup and restore?
What is the overall ONAP strategy for backup and restore?
Should it be unified with the data migration strategy as per "Hbase to Cassandra migration" on 2018-11-14 AAI Meeting Notes?
- James Forsyth will raise the topic of having backup and restore functionality in ONAP - whether it is feasible, whether it is on the roadmap and what other PTLs think
Jimmy didn't directly raise the topic but there was movement - Keong Lim asked "if istio service mesh is a no-go, is there a replacement for secure onap communications? is backup/restore/upgradability included in s3p?"
Michael O'Brien replied that a reference tool set for backup and restore was introduced in Casablanca: Backup and Restore Solution: ONAP-OOM
Mike Elliott said he would look at Brian's question, AAI will provide support as needed.
***
Michael O'Brien has documented performance issues in aai-cassandra:
- Cloud Native Deployment (search for LOG-376 to find the specific references to AAI)
- https://jira.onap.org/browse/LOG-376?focusedCommentId=29358&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-29358
hector has discovered that the stress test jar (liveness probe?) in aai-cassandra is hammering the cpu/ram/hd on the vm that aai is on - this breaks the etcd cluster (not the latency/network issues we suspected that may cause pod rescheduling)
Is there something that should be tweaked in AAI config? Or documentation on the recommended setup to run the VM?
I'll come to the next AAI meet (conflicts with the POMBA meet).
20190108 work continues to find the cause - I see 7 vCore spikes on cassandra as well as a saturated logstash on that particular vm - we are no longer a DaemonSet (13 instances on a 13+1 cluster) - I will reduce the current ReplicaSet from 5 to 2 or 1 until I can label the nodes and/or find out what is causing ls to saturate - Prudence Au and Sanjay Agraharam mentioned cassandra - I have seen cs high on several "top" sessions - will post screen caps - bottom line is correlation - I have a 2nd cluster where I can just run aai,dmaap and log
LOG Meeting Minutes 2019-01-15
- ask Michael O'Brien about the performance problems - whether they persist and what exactly the problem is.
- Venkata Harish Kajur will inform Michael about the schema performance fix - he should test with the Casablanca maintenance release.
On hold for 3 weeks (end of January) - if no performance issues are reported by then, this agenda item will be closed
***
Mike Elliott wrote in OOM Meeting Notes - 2018-12-5
f. AAI team wanted to get notified of AAI Cassandra issues automatically
i. Can we setup a Nagios or equivalent to monitor both rancher/k8 and the applications for rancher/k8 issues ?
Keep an eye out for new issues!
Question: Keong Lim - should this be part of a larger A&AI monitoring and failure prevention initiative?
***
Modelling team having Service Instance thoughts by Chesla Wechsler, which will affect AAI schema.
Also referred from comments on ONAP R4+ Service Modeling Discussion Calls
9)“vhn-portal-url”?“Bandwidth”,"QoS","SLA",etc, attribtutes that not all the services need but still need to be stored in certain service instance: stored as a schemaless field on the service-instance vertex (Chesla will follow up) (my concerns: according to the call, is that ok if we set a "global-type of service" and a "customized-type of service", then mapped it with internal descriptor, and A&AI's model only stores global type in service instance's schema, but stores the customer-faced attributes of service in a schemaless way? Chesla Wechsler Kevin Scaggs Andy Mayer)
See also Modeling 2018-11-13
The service-instance already uses a "metadata" relationship, which can store an arbitrary list of key-value pairs, but perhaps AAI should extend the use of the "properties" element, which is also an arbitrary list of name-value pairs or the "extra-properties" element, which is also an arbitrary list of name-value pairs.
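For illustration, a sketch of what storing such name-value pairs under the existing metadata element might look like; the metadatum/metaname/metaval field names are an assumption to be checked against the current AAI schema.

```python
# Hedged illustration of the existing "metadata" approach mentioned above:
# arbitrary name/value pairs attached to a service-instance. The metadatum/
# metaname/metaval field names are assumptions to be checked against the schema.
service_instance = {
    "service-instance-id": "example-instance-1",     # placeholder id
    "metadata": {
        "metadatum": [
            {"metaname": "vhn-portal-url", "metaval": "https://example.invalid/portal"},
            {"metaname": "bandwidth",      "metaval": "10Mbps"},
        ]
    },
}
```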
15th Nov: Having seen Chesla's presentation, it should be called "Model-driven schema" rather than "schemaless" behaviour, since the idea is that the changes are controlled by SDC modelling. Seems aligned to the eventual goal in AAI Schema Service Use Case Proposals and AAI Schema Service.
***
There are 2 types of logging in the services
- one read from EELFManager
- the other Logger log = Logger.getLogger( ...
Is that correct? Shouldn't there be just 1 type?
1st Nov:
After Casablanca release investigate logging guidelines and figure out what library to use in order to unify logging within A&AI
26th Nov: See also ONAP Application Logging Specification - Post Dublin
29th Nov: how does this fit with the related logging Jira ticket?
***
Disable unused web services
(see also Helm chart requested values)
Could we disable unused (i.e. not integrated) A&AI web services, so that the deployment is faster and the resource footprint is smaller? e.g. Champ (any other ws?)
Motivation: Decrease the resource footprint for A&AI (ONAP) deployments
Idea: we could support 2 different deployments 1. full (normal) deployment and 2. barebones deployment. The point of the "barebone" deployment would be to deploy only the essential services necessary for proper functioning of A&AI (leaving out services like cacher, sparky, graphadmin, having 1 cassandra node instead of 3 or 5 etc).
In order to reduce hardware/cloud costs (mainly the memory footprint) it could be beneficial to support a minimalistic A&AI deployment.
1st Nov:
Venkata Harish Kajur and a former user (account deleted) - investigate how to disable/enable charts in A&AI so we can create a core group of pods which handles the use-cases and then an extended group with all the services. Consider a group of unused/unintegrated services (like Champ). Consider other possible groups (like GUI?)
- James Forsyth will create a JIRA ticket to define the list of AAI subprojects and create the categories (essential, full "experience") for the OOM deployment: AAI-2025
***
- Who is responsible for the project?
- What is the roadmap for the project?
- Who will do the integration?
***
Technical solution to either decommission the proxy or make design changes to AAF to enable client side certificates.
After the VF2F we will know if this is a requirement in Dublin. We will discuss after that date.
Question raised: MSB - would client authentication be supported?
15th Dec: https://lf-onap.atlassian.net/wiki/display/DW/Pluggable+Security#PluggableSecurity-7.10Identifiedandsupportedpatternsandfeatures
- James Forsyth will create a task for encryption of communication between A&AI services and Cassandra
- Tian Lee and Steve Blimkie to report on whether the Amdocs-managed A&AI microservices support the criteria from the Dublin S3P requirements
***
Need to replace the named queries currently in use by client systems such as POLICY and VID (and others?) with custom queries, toward the retirement of the named-query API in Dublin
- Coordination with the ROBOT team needed for data population
- Coordinate with each team (POLICY, VID...) to have the specific data for each named-query
Christopher Shang is leading the effort to deprecate named queries in e-Comp.
Next steps: get data from teams in order to prepare for testing the change from named queries to custom queries.
See also AAI Named Queries
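For reference, a hedged sketch of what a custom-query call looks like from a client's point of view; the query name, start node and host/credentials are placeholders, and the exact body format should be checked against the AAI custom query documentation.

```python
# Hedged sketch of calling an AAI custom query (the replacement for named
# queries). The query name, start node and host/credentials are placeholders.
import requests

AAI = "https://aai.onap:8443"
AUTH = ("AAI", "AAI")
HEADERS = {"X-FromAppId": "query-client", "X-TransactionId": "1", "Content-Type": "application/json"}

body = {
    "start": ["network/generic-vnfs/generic-vnf/example-vnf-id"],  # placeholder start node
    "query": "query/example-custom-query",                          # placeholder stored query name
}
resp = requests.put(f"{AAI}/aai/v14/query", params={"format": "resource"},
                    json=body, auth=AUTH, headers=HEADERS, verify=False)
print(resp.status_code, resp.json())
```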
- Keong Lim will ask the teams about the data mentioned above.
- James Forsyth will remind Christopher to create a LF account
- James Forsyth will ask for a Fitness test which tests the correctness of the conversion from named-query to custom-query
- James Forsyth will try to find out if any documentation can be provided from ECOMP regarding the named-queries (how they work and how to read the named-query files)