Roll call: who's there and emergencies

anarcat, gaba, hiro, and linus present.

What has everyone been up to

hiro

  • migrate gitlab-01 to a new VM (gitlab-02) and use the omnibus package instead of ansible (#32949)
  • automate upgrades (#31957)
  • anti-censorship monitoring (external prometheus setup assistance) (#31159)
  • blog migration planning and setting up expectations

anarcat

https://trac.torproject.org/projects/tor/query?owner=anarcat&status=closed&changetime=Feb+3%2C+2020..Mar+6%2C+2020&col=id&col=summary&col=status&col=type&col=priority&col=milestone&col=component&order=priority

AKA:

Major work:

  • retire textile #31686
  • new gnt-fsn node (fsn-node-04) #33081
  • fsn-node-03 disk problems #33098
  • fix up /etc/aliases with puppet #32283
  • decommission storm / bracteata on February 11, 2020 #32390
  • review the puppet bootstrapping process #32914
  • ferm: convert BASE_SSH_ALLOWED rules into puppet exported rules #33143
  • decommission savii #33441
  • decommission build-x86-07 #33442
  • adopt puppetlabs apt module #33277
  • provision a VM for the new exit scanner #33362
  • started work on unifolium decom #33085
  • improved installer process (reduced the number of steps by half)
  • audited nagios puppet module to work towards puppetization (#32901)

Routine tasks:

  • Add aliases to apache config on check-01 #33536
  • New RT queue and alias iff@tpo #33138
  • migrate sysadmin roadmap in trac wiki #33141
  • Please update karsten's new PGP subkey #33261
  • Please no longer delegate onionperf-dev.torproject.net zone to AWS #33308
  • Please update GPG key for irl #33492
  • peer feedback work
  • taxes form wrangling
  • puppet patch reviews
  • znc irc bouncer debugging #33483
  • CiviCRM mail rate expansion monitoring #33189
  • mail delivery problems #33413
  • meta-policy process adopted
  • package installs (#33295)
  • RT root noises (#33314)
  • debian packaging and bugtracking
  • SVN discussion
  • contacted various teams to follow up on buster upgrades (translation #33110 and metrics #33111) - see also progress followup
  • nc.riseup.net retirement coordination #32391

qbi

  • created several new trac components (for new sponsors)
  • disabled components (moved to archive)
  • changed mailing list settings on request of moderators

What we're up to next

I suggest we move this to the systematic roadmap / ticket review instead in the future, but that can be discussed in the roadmap review section below.

For now:

anarcat

  • unifolium retirement (cupani, polyanthum, omeiense still to migrate)
  • chase cymru and replace moly?
  • retire kvm3
  • new ganeti node

hiro

  • retire gitlab-01
  • TPA-RFC-2: define how users get support, what's an emergency and what is supported (#31243)
  • Migrating the blog to a static website with lektor. Make a test with discourse as comment platform.

Roadmap review

We keep using this system for March:

https://trac.torproject.org/projects/tor/wiki/org/teams/SysadminTeam

Many things have been rescheduled to March and April because we ran out of time to do what we wanted. In particular, the libvirt/kvm migrations are taking more time than expected.

Policies review

TPA-RFC-1: policy; marked as adopted

TPA-RFC-2: support; hiro to write up a draft.

TPA-RFC-3: tools; to be brainstormed here

The goal of the new RFC is to define which tools we use in TPA. This does not concern service admins, at least not in the short term, but only sysadmin stuff. "Tools", in this context, are programs we use to implement a "service". For example, the "mailing list" service is run by the "mailman" tool (but could be implemented with another). Similarly, the "web cache proxy" service is implemented by varnish and haproxy, but those are being phased out in favor of Nginx.

Another goal is to limit the number of tools team members should know to be functional in the team, and formalize past decisions (like "we use debian").

We particularly discussed the idea of introducing Fabric as an "ad-hoc changes tool" to automate host installation, retirement, and reboots. It's already in use to automate libvirt/ganeti migrations and is serving us well there.
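To give an idea of what such an "ad-hoc changes tool" might look like for reboots, here is a minimal sketch. It is not the actual TPA Fabric code: the function names (reboot_if_required, reboot_fleet), the /var/run/reboot-required check, and the one-minute delay are all assumptions made for illustration. The only Fabric interface it relies on is Connection.run() with the warn and hide arguments, so any object offering the same method can stand in for a real connection.

```python
"""Illustrative sketch only; names and checks are hypothetical, not TPA's code."""


def reboot_if_required(conn, delay_minutes=1):
    """Schedule a reboot on one host if a reboot flag file is present.

    `conn` is expected to behave like fabric.Connection: run(cmd, ...)
    returns a result object with an `ok` attribute.
    """
    # Assumption: the host creates /var/run/reboot-required when needed.
    # warn=True makes a non-zero exit status return ok=False, not raise.
    check = conn.run("test -e /var/run/reboot-required", warn=True, hide=True)
    if not check.ok:
        return False
    # Assumption: the connection has enough privileges to call shutdown.
    conn.run(f"shutdown -r +{delay_minutes} 'scheduled reboot'")
    return True


def reboot_fleet(hosts, connect):
    """Visit hosts one at a time; return those where a reboot was scheduled.

    `connect` is a callable mapping a hostname to a connection object.
    """
    return [host for host in hosts if reboot_if_required(connect(host))]
```

With Fabric installed, `connect` could be something like `lambda host: Connection(host, user="root")`, with hostnames taken from whatever inventory is at hand. Passing the connection factory in as a parameter keeps the logic testable without SSH access.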

Other discussions

A live demo of the Fabric code was performed some time after the meeting and no one raised objections to the new project.

Next meeting

Not discussed, but should be on April 6th, 2020.

Metrics of the month

  • hosts in Puppet: 77, LDAP: 81, Prometheus exporters: 124
  • number of apache servers monitored: 31, hits per second: 148
  • number of nginx servers: 2, hits per second: 2, hit ratio: 0.89
  • number of self-hosted nameservers: 6, mail servers: 10
  • pending upgrades: 174, reboots: 0
  • average load: 0.63, memory available: 308.91 GiB/1017.79 GiB, running processes: 411
  • bytes sent: 169.04 MB/s, received: 101.53 MB/s
  • planned buster upgrades completion date: 2020-06-24