LOG Meeting Minutes 2019-02-19

Meeting at 1100 EST Tue - https://zoom.us/j/519971638

https://lists.onap.org/g/onap-discuss/topics

http://onap-integration.eastus.cloudapp.azure.com:3000/group/onap-integration

https://jira.onap.org/secure/RapidBoard.jspa?rapidView=143&view=planning.nodetail&epics=visible

Agenda

  • Michael has started a parallel full-time ONAP-related DevOps position - we should discuss the impact on the logging/POMBA project
    • Notice to onap: https://lists.onap.org/g/onap-discuss/topic/michael_reduced_availability/29918628?p=,,,20,0,0,0::recentpostdate%2Fsticky,,,20,2,0,29918628

      Team,

         I have taken on a full time DevOps kubernetes based role last week directly related to ONAP that may cause less focus on public in the short term and includes 20% travel.

         We can discuss this in the affected meetings.

         I have more than a couple pending mails to answer – sorry for the de-focus the last 2 weeks.

         I am still working out the details of working privately and publicly – as I was previously 100% public – the team I work with on DevOps is very open to the idea of continuing the LOG and CD work – as there is also opportunity for up-sourcing as both sides are ONAP focused and the role is in support of production ONAP deployment.

         So just a heads up

    • Any committer can run for the PTL role - it is a short 2-phase process: 3 days for nominations on onap-discuss, then 3 days for committers to vote
  • Prep for M2 this Thursday - Logging Dublin M2 Deliverables for Functionality Freeze Milestone Checklist
  • TSC-25 ramping down - CD work
  • Continue coding changes for spec - Casablanca spec is implemented in Dublin
    • For example, any change to the spec now lands in El Alto
    • coding: portal/sdk - see ongoing portal work in PORTAL-348
    • still need to look at SO and AAI log libraries for comparison
  • Answer pending questions/mails
  • Review opentracing/zipkin LOG-104
  • Lorraine A. Welch: System.out (standard out) - review in terms of not using a hardcoded file appender, and in terms of syslogs
  • Discussion: DCAE logs
    • log formatting questions - the 4 types of logs per microservice
    • Discuss: Acumos single log discussion - MARKERs to replace log file name key - optional for ONAP for now - TODO verify our spec

Attendees 

Prudence Au

Lorraine A. Welch

Luke Parker

Items

  • Plan for 
    • current implementation
    • Future spec for el-alto for VNFs below
  • this week...
    • logging work
      • Dev environment back up - merging existing library - using vid-app-common as a template for usage of org.onap.portal.sdk
        • <epsdk.version>2.4.0</epsdk.version>
      • prepping for splitting repos - 1 per component - will need 8+
      • working with dmaap on charts and filebeat
      • pylog issues for vfc are transient LF issues - posting response (with multicloud)
      • release notes
      • scorecard for S3P
      • dcaegen2 work in DCAEGEN2-1166 under https://gerrit.onap.org/r/#/c/77910/
      • questions on logging format on onap-discuss, including hv-ves, that I need to address: https://lists.onap.org/g/onap-discuss/message/14997?p=,,,20,0,0,0::Created,,log,20,2,20,29162034
        • Discussion on VNF logs (CLAMP) @Sanjay - with Alok Gupta
        • vnf behaviors on top of vnf events
        • dmaap TCA like events - look into capturing these
        • ?add our own log tracing when VNFs react to events - another tracing EPIC we should look at
          • both for VES and non-VES format
          • There is a gap in tracing VNF behaviour - via 5G RAN
          • cloud infrastructure logs - vm, k8s and cloud service logs (beyond the vm level) - CNI cloud-native plugins example
          • need to think more about combining the logs - currently we just capture them
          • Provide log requirements to VNF onboarding team.
          • Spondon Dey - feeding in to CL, policy - scaling behaviour - onap to drive more
    • infrastructure work
      • Helm ownership
      • CI/CD - going with Orange MQ robot for oom merges
      • Perf and mostly crashloop avoidance 
      • Deploy changes for RHEL7.6
      • Deploy order work
      • ARM A1 testing of new containers from dockerhub on AWS
        • 80g/vm images - reducing footprint, standard alpine java image, ARM/i64 compat
      • Nodeports for dmaap
      • Datalake (now part of DCAE) does not yet affect us but it will
    • ONS April conference prep work
    • The rest of our backlog is still in progress - M2 is coming up on the 14th
  • last week....
  • get committed resources for the next 2 months M2 to RC0 - so we can state what is in and out of the Dublin release
    9 weeks to April 4th
    • M2 - functionality freeze - 21 Feb
      M4 - April 4th 
      I have taken the liberty of adding some names - feel free to add your availability or edit this section - we will paraphrase it in the M1 report - Logging Dublin M1 Release Planning
    • Michael O'Brien - 50% direct Logging work - really 40% dev/devops + 10% PTL/TSC/Project - the rest = related ONAP, CD, Doc, OOM, conference/customer
      • Todo lowered?
    • Prudence Au - doing half of the PTL work, template, meets, reviews - especially POMBA with James MacNider on reviews - representing on most Thu POMBA meets
    • Avdhut Kholkar - thank you for all the commit reviews
    • Luke Parker - co-PTL and reference code
    • Sanjay - TODO: % of work on the project
  • Meeting at 1200 EST today on ARM docker images (affecting LOG images as we need to get the ARM layer into the image - wrap the dockerhub versions)
    LOG-331
  • Stop using "latest" for any image - lock down the version tag for testing stability - see our use of busybox

    LOG-949
  • Good news: We passed M1 last Thu
  • Dublin scope finalized for M1
    Release Planning#DublinReleaseCalendar
    Logging Dublin Scope
  • New work for dublin
    • Assist in 5G edge work via OOM/AWS work - meet is at 1100 EST Wed with Ramki Krishnan's team
    • plus metric capture via Prometheus - LOG-911
      LOG-707
  • Review/consolidate JIRAs
  • opentracing - will try to get in by April - an LF project
  • priority list
    • infrastructure - filebeat sidecars (before DaemonSet refactor) - see Log Streaming Compliance and API
    • format - via library - portal/sdk - minimal retrofit for markers/mdcs - LOG-600
    • all s3p - security, perf (aai-log-3**) - scaling - run with 1 logstash
    • Logstash used to be a DaemonSet - however, it is filebeat that needs to be a DaemonSet (1 container per vm) instead of individual sidecars - get story
    • Additional tools - get POC for each - determine which goes to production level
      • prometheus - requires coordination with oom and multicloud
      • log checker - pending
      • opentracing - us - LOG-104
      • search guard - us
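
The "stop using latest" item above could be checked mechanically. The following is a minimal sketch in Python, assuming image references are available as plain strings (for example, collected from values.yaml files). The function name unpinned_images and the sample references are hypothetical and not part of any ONAP tooling.

```python
def unpinned_images(image_refs):
    """Return the image references that use 'latest' or carry no tag at all."""
    flagged = []
    for ref in image_refs:
        # A digest-pinned reference (name@sha256:...) is treated as locked down.
        if "@" in ref:
            continue
        # Only look for the tag after the last '/', because a ':' earlier
        # in the reference is a registry port, not a tag.
        last_segment = ref.rsplit("/", 1)[-1]
        if ":" not in last_segment:
            flagged.append(ref)  # no tag: the runtime would default to 'latest'
            continue
        tag = last_segment.rsplit(":", 1)[1]
        if tag == "latest":
            flagged.append(ref)
    return flagged


if __name__ == "__main__":
    # Hypothetical sample references, for illustration only.
    sample = [
        "busybox",
        "busybox:latest",
        "busybox:1.30.1",
        "registry.example.org:10001/onap/example",
        "registry.example.org:10001/onap/example:1.2.3",
    ]
    for ref in unpinned_images(sample):
        print("unpinned image:", ref)
```

A check like this could run in CI before a chart change is merged, so that an untagged or "latest" image is caught before it affects testing stability.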


PRI | Title | Responsible | Status (OPEN, DONE) | In Dublin | Last Worked on | Start | Notes

Security Vulnerability template | Ongoing | IN | 20190122

M1 template

DONE

IN | 20190124 | 20190122

ONS NA 2019 April Talk proposal

DONE

IN
20190122

pending 20190208 decision

LOG-947


Use manifest generation over raw oom values.yaml docker image tag names

DONE

IN | 20190124 | 20190117

pending documentation in RTD

Team,

    In the TSC it was decided to treat the diff between oom and the manifest by always running the manifest generated yaml in your deployments – you will not need to do this for master work – just for Casablanca and RC0-2 work

 

Working out the details in

https://jira.onap.org/browse/LOG-929

 https://onap.readthedocs.io/en/casablanca/submodules/integration.git/docs/index.html?highlight=manifest

/michael


S3P Logging compliance TSC/PTL

IN PROGRESS

IN | 20190115 | 20190114

El-Alto 1.4 logging spec change - plan only


todo merge with Dave's below

IN PROGRESS

IN | 20190122


Dublin Scope Planning


DONE

IN | 20190124

LOG-707

Logging Dublin M1 Release Planning


RTD documentation

OPEN

IN | 20180129
Attending Thu 1130-1230 meets

restart log4j format and files

example

IN PROGRESS

IN | 201901115 | 20190108

https://gerrit.onap.org/r/#/c/62405/

for LOG-630 and LOG-178

Log Streaming Compliance and API#DeploymentProfiles


Work with portal/sdk library | Michael O'Brien

IN PROGRESS

IN | 20190129 | 20190115

Update: 20190129 - Existing eclipse environ for the RI being retrofitted

At the pom stage bringing in the jar via

portal/sdk in use by aai, dmaap, sdk, vid (vid link into so maybe?)

<groupId>org.onap.portal.sdk</groupId>

epic LOG-600

Jira PORTAL-348

review investigation in

Logging Developer Guide

Log Streaming Compliance and API#ExistingLibraryResearch

Luke Parker discussion

need to use the portal library in an initiating project for tx processing

working likely with the SO team - via the work we are doing for them in https://gerrit.onap.org/r/#/c/69947/

(check the original spec - ODL specific - check appc/sdnc use of ccsdk)





New Committers

OPEN


20190115

We have room for 2-5 committers and will be reviewing the list

Logging Enhancements Project Proposal#KeyProjectFacts

add your details to

Logging Committer Promotion Requests

20190129 status - waiting on contributor documentation from each contributor


OPNFV/ONAP Paris

DONE


20190108

https://ddfplugfest19.sched.com/ Tue-Thu

Clover Gambia on prior https://zoom.us/j/115579117 - 7 hours ahead

https://ddfplugfest19.sched.com/event/K1Gy/opnfv-clover-utilizing-cloud-native-technologies-for-nfv


Security badging

IN PROGRESS

IN | 20190129
Need to restart this

Security Vulnerabilities

IN PROGRESS

IN | 20190129
lower - but for M4

s3p Secure https endpoints

LOG + POMBA

for djhunt

OPEN

IN

Discussion on whether we need to lock down the nodeport exposed ports

Can key off POMBA work already done

todo: get s3p page


Format compliance - working with AAI team

+ perf

IN PROGRESS

IN | 20190115 | 20181101

(+) 20190115 - Casablanca cherry-pick in queue: logstash 5 to 3 and 1

https://gerrit.onap.org/r/#/c/75702/

(+) 20190109
from aai team
https://lf-onap.atlassian.net/wiki/display/DW/2019-01-17+AAI+Developers+Meeting+Open+Agenda
"hector has discovered that the stress test jar (liveness probe?) in aai-cassandra is hammering the cpu/ram/hd on the vm that aai is on - this breaks the etcd cluster (not the latency/network issues we suspected that may cause pod rescheduling) "


#6 on 2018-12-20 AAI Developers Meeting around LOG-376

Discussion with @Sanjay Agraharam and [~pau2882] on checking how cassandra is running on the vm - whether debug levels are on should be verified

use labelling to split aai-cs and ls - no DaemonSet

Michael O'Brien to reduce core count for ls to 1 from 3

LOG-915

edited 2019-01-10 AAI Developers Meeting

for the 10th


AAI team - 2 types of logging AOP/non-AOP


OPEN

IN
20181101 | #22 on 2018-12-20 AAI Developers Meeting

Logging requests from Vendors

OPEN




LOG-877

LOG-876

#15,19 and 37 on SP priorities for Dublin


LOG Streaming compliance

IN PROGRESS

IN

Log Streaming Compliance and API

LOG-487

LOG-852

and

PTL 2019-01-14


opentracing via

https://opentracing.io/



IN (planning/POC for sure) | 20190123

@Sanjay

discuss integration - out of band processing - LOG-104

see zipkin arch https://zipkin.io/pages/architecture.html

possibly tie both as a client of es ?

Tie in to ONS NA 2019 April demo booth for LF

https://lists.onap.org/g/onap-discuss/message/15066?p=,,,20,0,0,0::Created,,opentracing,20,2,0,22460823


discussion - remove


20190108

discuss tick/tock logging spec behaviour - Casablanca implemented in Dublin, Dublin implemented in El Alto


Log Checker
OUT | 20190109
Mike to review with Horace

Search Guard

OPEN

Maybe | 20180109

LOG-494


spec changes for Dublin
IN (planning) | 20190109 | 20190109

Dublin spec changes for El Alto

environment name

release name

check mail for reply Michael O'Brien

Prudence Au: proposal to rename the log file itself per release, i.e. 3.0.0-ONAP - will discuss next week



Cluster logging behaviour

S3P


IN

server name in clustered environments - I will add the details and the Jira right after this meet



LOG ELK stack indexing/dashboards

with Prometheus below


OPEN

IN | 20190123


Casablanca 3.0.1 work until 10th Jan

Including POMBA

DONE
20190122 | 20190113

LOG-913

revert Jira for data-router off TSC-92

pending merge of https://gerrit.onap.org/r/#/c/75999/


LOG openlab tenant devops

cluster creation/testing

Done pending vFW | IN

We have 2 clusters, a 1+4 and a 1+13, used for testing deployments and running the vFW

Logging DevOps Infrastructure


Wiki edits, RTD review

OPEN

IN

Requiring Updates, Merges or Marked Deprecated

Metric Streaming and Prometheus

IN PROGRESS

IN | 20181207

LOG-911 - experimental chart on http://secure.solar:30000/graph

LOG-773

LOG-861

work with Vaibhav Chopra

OOM-1504

@Sanjay - note the prom chart assumes a k8s environment - what about bare metal


Finish SO filebeat additions

IN PROGRESS

IN | 20181207

SO-1110

https://gerrit.onap.org/r/#/c/69947/


Finish LOG common charts

OPEN

OUT to El Alto | 20190123 | 20181207

James MacNider - bring in Prianka's common ELK charts and use them in Clamp, LOG, SDC, POMBA

https://gerrit.onap.org/r/#/c/64767/

OOM-1276

revert to El Alto under LOG-936



Team Members Thank you and review

OPEN

IN

del

Review last 4 weeks since 

LOG Meeting Minutes 2018-12-05


IN PROGRESS





TSC/PTL meet actions

IN PROGRESS





OOM transfer chart ownership to teams
LOG is part of poc

IN PROGRESS

IN | 20190107

Starting - will have a training session - will send out any meetings to onap-discuss

We may have the same symlink repo folder like we do for doc

Last discussed TSC 20180109


OOM Deployment priority
base platform includes LOG

IN PROGRESS

OUT

todo review with Mike Elliott

20190123 | 20181207

Q2) priority of system level containers like the ELK stack - OOM has a common services JIRA - DMaaP, AAF - TODO get JIRA - make sure log is in this!

There is a cd.sh retrofit that sequences the pods in order for deployment stability - this will be phased out when tiered deployment comes in

LOG-898

LOG-326

https://gerrit.onap.org/r/#/c/75422/ via ONAP Development#WorkingwithJSONPath

DCAEGEN2-1067



k8s manifest or oom values.yaml
for docker tags - truth

IN PROGRESS


20190123

TSC-86

TODO: paste TSC review - manifest is truth

RTD doc link to run the script to get the yaml override


Nexus3 routing slowdown

DONE


20181222

TSC-79

20181217-22

LOG-898


LOG compliance diagram/exercise

Michael O'Brien

@Sanjay Agraharam

IN PROGRESS


20181205

Log Streaming Compliance and API

part of prometheus work now

Sanjay - diagram FB must be split between using AOP and AOP+spec compliant - only this one should be green










Ongoing






CI/CD pipeline TSC poc

IN PROGRESS


20180107


Notes