Wed @ 6:30 & 7:15 is the monthly general "Stl Linux Unix Users Group"
meeting.
https://www.meetup.com/saint-louis-unix-users-group/events/311956042/
Learn how to replace deprecated "ifconfig, route, & netstat" commands with
the modern "ip & ss" commands. Clearer output, better troubleshooting, full
IPv6 support, and future-proof networking.
*"ip: Because Real Admins Don't Use ifconfig Anymore"*
*MAIN Topic:* Evolving "ifconfig & route" to "ip route"
Speaker: Lee Lammert <https://www.sluug.org/bio/Lee_Lammert>
The above MAIN is at ~7:15pm. It is immediately preceded at
~7:00pm by Announcements & short Q&A, which is immediately preceded at
6:30pm by
*BASIC Tutorial:* Troubleshooting with a Round Robin DB by Grant Taylor
<https://www.sluug.org/bio/Grant_Taylor>
- For the last week or so I've been spending a lot of time recording
statistics in Round Robin Databases (RRD files) and generating graphs of
things as I try to troubleshoot some problems with the Nagios server at
work.
- This morning I spent some time writing some Perl to parse the
output of `netstat --all --numeric --tcp --udp` to calculate the number
of connections in various states, both inbound and outbound.
- I have something like 27 different data points that I'm graphing.
- As has been the case with three other (sets of) graphs that I've
created for this, I found myself asking new questions within a few
minutes of looking at the graphs.
- I first started collecting, recording, and graphing the number of
Nagios tests that were in the `Unknown` (3) state.
- That turned out to be quite enlightening.
- We were seeing periodic rapid swings from a few hundred to nearly
two thousand tests in the `Unknown` (3) state.
- The visual nature of the graph showed that the number of tests in
the `Unknown` (3) state was ebbing and flowing.
- It also showed some atypical / jittery / not-smooth spikes as
opposed to the ramp-up / ramp-down that one might expect.
- As in, the count would nearly triple in a matter of a few minutes and
then taper back down a few minutes later.
- It turns out that Gearman / gearmand -- which we're using behind
Nagios for ${QUESTIONABLE_REASONS} -- was running out of file
descriptors.
- Once I raised the max file handles in the systemd unit file, things
were much smoother.
- So I started collecting, recording, and graphing the number of open
file descriptors.
- That too was quite telling.
- Gearman / gearmand was averaging about 1200 open file descriptors,
ranging from a few hundred to upwards of 1800.
- While working on the file descriptor problem I noticed that
connections to Gearman / gearmand would block until there was a free
file descriptor to accept the connection.
- Remember, each TCP connection uses a file descriptor.
- Knowing that slow connections to Gearman / gearmand were a symptom, I
started collecting, recording, and graphing connection latency. That graph
didn't show much of a symptom to chase. But it did provide a LOT of data:
I now know that the vast majority of the time, connections to Gearman
/ gearmand take < 20 ms.
- So I have statistics / data to back monitoring -> alerting thresholds.
- I noticed that the number of open file descriptors drops from
nominally 1200 to below 500 during times that we're having problems.
- So, this morning I wrote about 210 lines of Perl to parse the
output of `netstat --all --numeric --tcp --udp` to collect and record
connection data.
- Graphing of said connection data yielded questions in a matter of
minutes. Not the least of which is that the vast majority of the
connections are inbound to Gearman / gearmand, not outbound like I thought
they were.
- I determined inbound vs outbound by looking at the listening
sockets in the `netstat` output. I defined listening as the remote IP
being 0.0.0.0 (IPv4) or :: (IPv6) and the remote port being `*`.
- Knowing what is listening allowed me to compare the local IP & port
that each connection uses with the list of listeners.
- If the local IP & port is a listener, then the connection is an
inbound connection.
- If the local IP & port isn't a listener, then the connection is an
outbound connection.
- So now I have more data and am effectively waiting for a problem
event, so that I can analyze the data collected during the problematic
time-frame after the fact.
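
As an illustration only (this is not the Perl from the talk), the listener rule described above might be sketched in Python roughly as follows. The sample `netstat` text, the addresses, and the ports are made up, and the function names are hypothetical. One assumption goes beyond the description: a listener bound to 0.0.0.0 or :: is treated as matching any local IP on that port.

```python
from collections import Counter

# Made-up sample in the column layout of `netstat --all --numeric --tcp --udp`.
SAMPLE = """\
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address           Foreign Address         State
tcp        0      0 0.0.0.0:4730            0.0.0.0:*               LISTEN
tcp        0      0 10.0.0.5:4730           10.0.0.9:51514          ESTABLISHED
tcp        0      0 10.0.0.5:4730           10.0.0.8:40022          TIME_WAIT
tcp        0      0 10.0.0.5:39000          10.0.0.7:5432           ESTABLISHED
tcp6       0      0 :::4730                 :::*                    LISTEN
udp        0      0 0.0.0.0:68              0.0.0.0:*
"""

WILDCARDS = ("0.0.0.0", "::")


def split_addr(addr):
    """Split 'host:port' on the last colon, so ':::4730' gives ('::', '4730')."""
    host, _, port = addr.rpartition(":")
    return host, port


def parse(text):
    """Return (proto, local, remote, state) tuples for tcp/udp lines only."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 5 or not fields[0].startswith(("tcp", "udp")):
            continue  # skip the banner and the column header
        state = fields[5] if len(fields) > 5 else ""  # UDP rows have no state
        rows.append((fields[0], fields[3], fields[4], state))
    return rows


def classify(rows):
    """Count connections by (direction, state) using the listener rule."""
    wildcard_ports = set()  # (proto, port) listening on every address
    exact = set()           # (proto, host, port) listening on one address
    conns = []
    for proto, local, remote, state in rows:
        kind = proto[:3]  # fold tcp6/udp6 into tcp/udp
        lhost, lport = split_addr(local)
        rhost, rport = split_addr(remote)
        if rhost in WILDCARDS and rport == "*":
            # Listener: remote IP is 0.0.0.0 or :: and remote port is '*'.
            if lhost in WILDCARDS:
                wildcard_ports.add((kind, lport))
            else:
                exact.add((kind, lhost, lport))
        else:
            conns.append((kind, lhost, lport, state))
    counts = Counter()
    for kind, lhost, lport, state in conns:
        inbound = (kind, lport) in wildcard_ports or (kind, lhost, lport) in exact
        counts[("inbound" if inbound else "outbound", state or "-")] += 1
    return counts


if __name__ == "__main__":
    for (direction, state), n in sorted(classify(parse(SAMPLE)).items()):
        print(direction, state, n)
```

Listeners are collected in a first pass because a connection can appear in the output before the listener it belongs to. The resulting per-state counts are the kind of data points that could then be recorded in an RRD file.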
The above are both Wed 12 Nov 2025.
EVERY MONTH: http://www.sluug.org/
The URL link to Zoom or Jitsi connection instructions for this meeting is
posted earlier on the day of the meeting, at the above home page. It is the
link called "linked here".