Roll call: who's there and emergencies

anarcat, gaba, and hiro present; weasel and linus couldn't make it; no news from qbi.

What has everyone been up to

anarcat

  • followup with cymru (#29397)
  • moved OONI.tpo out of TPO infrastructure (now hosted at Netlify) and closed some related accounts (#31718), which involved documenting how to retire a static component
  • identified that we need to work on onboarding/offboarding procedures (#32519) and especially "what happens to email when people leave" (#32558)
  • new caching service tweaks, now at an 88% hit ratio, which will hopefully bring costs down to $300/month in November! see the shiny graphs
  • worked more on Nginx status dashboards to ensure we have good response latency and rates in the caching system
  • reconfirmed mailing list problems as related to DMARC, can we fix this now? (#29770)
  • wrote a Postfix mail log parser (in lnav) to diagnose email issues in the mail server
  • helped with the deployment of a ZNC bouncer for IRC users (#32532) along with fixes to the "mosh" configuration
  • getting started on the new email service project, reconfirmed the "Goals" section with vegas
  • lots of work on puppet cleanup and refactoring
  • NMU'd upstream ganeti installer fix, proposed stable update
  • build-arm-* box retirement and ipsec config cleanup
  • fixed prometheus/ipsec reliability issues (#31916, it was ipsec!)

Hiro

  • Some work on donate.tpo with giant rabbit
  • Updates and debugging on dip.tp.o
  • Security updates and reboots
  • Work on the websites
  • Git maintenance
  • Decommissioning Getulum
  • Started running the website meeting and coordinating dev portal for december

linus

Some coordination work around Nextcloud.

weasel

Nothing to report.

What we're up to next

anarcat

New:

  • varnish -> nginx conversion? (#32462)
  • review cipher suites? (#32351)
  • release our custom installer for public review? (#31239)
  • publish our puppet source code (#29387)

Continued/stalled:

  • followup on SVN shutdown, only corp missing (#17202)
  • audit of the other installers for ping/ACL issue (#31781)
  • followup with email services improvements (#30608)
  • send root@ emails to RT (#31242)
  • continue prometheus module merges

Hiro

  • Clean up websites bugs
  • needrestart automation (#31957)
  • CRM upgrades coordination for january? (#32198)
  • translation move (#31784)

linus

Will try to follow up with Nextcloud again.

weasel

Nothing to report.

Winter holidays

Who's online when in December? Can we look at continuity during that merry time?

hiro will be online during the holidays. anarcat will be moderately online until January, but will take a week offline sometime in early January; exact dates to be clarified.

Need to clarify how much support we provide, see #31243 for the discussion.

prometheus server resize

Can I double the size of the prometheus server to cover for extra disk space? See #31244 for the larger project.

This will raise the cost from 4.90EUR to 8.90EUR. Everyone agreed to go ahead, and anarcat updated the budget to reflect the new expense.

Other discussions

Blog status? Anarcat got a quote back and will bring it up at the next vegas meeting.

Next meeting

Unclear. January 6th is a holiday in Europe ("the day of the kings"), so we might postpone until January 13th. We are considering having shorter, weekly meetings.

Update: was held on 2020-01-13.

Metrics of the month

  • hosts in Puppet: 76, LDAP: 79, Prometheus exporters: 123
  • number of apache servers monitored: 32, hits per second: 195
  • number of nginx servers: 109, hits per second: 1, hit ratio: 0.88
  • number of self-hosted nameservers: 5, mail servers: 10
  • pending upgrades: 0, reboots: 0
  • average load: 0.62, memory available: 334.59 GiB/957.91 GiB, running processes: 414
  • bytes sent: 176.80 MB/s, received: 118.35 MB/s
  • planned buster upgrades completion date: 2020-05-01

Now also available as the main Grafana dashboard. Head to https://grafana.torproject.org/, change the time period to 30 days, and wait a while for results to render.

The Nginx cache ratio stats are not (yet?) in the main dashboard. Upgrade prediction graph still lives at https://help.torproject.org/tsa/howto/upgrades/ but the prediction script has been rewritten and moved to GitLab.
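
Until the cache ratio lands in the main dashboard, a figure like the 0.88 hit ratio above can be approximated from access logs. The following is only a minimal sketch, not how the numbers in these minutes were produced: it assumes a hypothetical log format in which each line ends with the value of nginx's $upstream_cache_status variable (HIT, MISS, EXPIRED, and so on), and the function name and sample lines are made up for illustration.

```python
from collections import Counter


def cache_hit_ratio(log_lines):
    """Return the fraction of cached requests answered with a HIT.

    Assumption: the last whitespace-separated field of each line is the
    value of nginx's $upstream_cache_status, or "-" when the request
    did not go through the cache at all.
    """
    counts = Counter()
    for line in log_lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        status = fields[-1].upper()
        if status == "-":
            continue  # request bypassed the cache entirely
        counts[status] += 1
    total = sum(counts.values())
    if total == 0:
        return None  # no cacheable requests seen
    return counts["HIT"] / total


# Hypothetical sample: 88 hits and 12 misses.
sample = ["GET /a 200 HIT"] * 88 + ["GET /b 200 MISS"] * 12
print(cache_hit_ratio(sample))
```

Here every status other than HIT (MISS, EXPIRED, STALE, ...) counts against the ratio; whether that matches the definition used in the graphs is an open question.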