What do you use to monitor your NON-SWARM Docker machine?

jherazob@beehaw.org · 1 year ago

What do you use to monitor your NON-SWARM Docker machine?

wjs018@lemmy.ml · 1 year ago

I will be keeping an eye on this thread to see what other people do, but what I have done in the past is to use a couple of different health-checking strategies.

- For web-accessible services I am running, I usually run something like Uptime Kuma or Gatus on a different box to make sure those web endpoints are available and performant. Lately I have been really digging how Gatus can check not just the response header but also latency and certificate validity.
- For the host machine, you can set up custom alerts within netdata for stuff like CPU utilization and memory with custom thresholds. The only other solution I have used for this in the past is setting up alerts through my VPS provider (if it is a VPS, that is).
  - On really low-spec machines I have had trouble with netdata, though, so I don’t have a good solution in those cases. Instead, I have resorted to just using dashdot as a PWA so that I can check it easily on my phone when I am on the go. I am interested to see if there are less demanding options.
- For some custom services in the past that run on set schedules, I have used healthchecks.io (which you can selfhost) to send alerts in case they don’t run for some reason.
- As for the containers being restarted, I actually don’t have experience with that, so I am interested to see what others have done.
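
For anyone wondering what the status, latency, and certificate checks mentioned above boil down to, here is a minimal Python sketch using only the standard library. It is not how Gatus is implemented; the function names and the thresholds (500 ms, 14 days) are assumptions picked purely for illustration.

```python
import socket
import ssl
import time
import urllib.request


def cert_days_left(host, port=443, timeout=5.0):
    """Return whole days until the TLS certificate served by host expires."""
    ctx = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with ctx.wrap_socket(sock, server_hostname=host) as tls:
            not_after = tls.getpeercert()["notAfter"]
    expires = ssl.cert_time_to_seconds(not_after)  # seconds since the epoch
    return int((expires - time.time()) // 86400)


def probe(url, timeout=5.0):
    """Fetch url once and return (status_code, latency_ms).

    Note: urlopen raises HTTPError for 4xx/5xx responses, so callers
    should treat an exception as a failed check.
    """
    start = time.monotonic()
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        status = resp.status
    return status, (time.monotonic() - start) * 1000.0


def evaluate(status, latency_ms, days_left,
             max_latency_ms=500.0, min_cert_days=14):
    """Return a list of problems; an empty list means the check passed.

    The thresholds are illustrative assumptions, not recommendations.
    """
    problems = []
    if status != 200:
        problems.append(f"unexpected status {status}")
    if latency_ms > max_latency_ms:
        problems.append(f"slow response: {latency_ms:.0f} ms")
    if days_left < min_cert_days:
        problems.append(f"certificate expires in {days_left} days")
    return problems


# Example wiring (hypothetical host, needs network access):
#   status, latency = probe("https://example.com/")
#   print(evaluate(status, latency, cert_days_left("example.com")))
```

Keeping `evaluate` separate from the network calls makes the thresholds easy to tweak and test without touching a real endpoint.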

Lupec@lemm.ee · 1 year ago

Gatus sounds pretty cool, I’ll definitely give it a closer look later. Maybe it’s the push I needed to go ahead and look into proper observability as a whole, log ingestion and whatnot. My homelab setup is sorely lacking in that department if I’m being honest lol

Toribor@corndog.social · edit-2 1 year ago

Uptime Kuma for web monitoring.

I’m experimenting with both Zabbix and Netdata to see which one I want to keep for monitoring resources on my hosts.

I use healthchecks.io to monitor backup scripts and cronjobs.
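
A rough sketch of how a backup script or cronjob might be wrapped so that it reports to healthchecks.io: ping the check's URL with `/start` before the job, then ping the plain URL on success or the `/fail` suffix on failure. The UUID in `PING_URL` is a placeholder, a self-hosted instance would use its own base URL, and the helper names are made up for this example.

```python
import subprocess
import urllib.request

# Placeholder: replace with the ping URL of your own check.
PING_URL = "https://hc-ping.com/your-uuid-here"


def http_ping(url, timeout=10.0):
    """Send one GET to the ping endpoint, ignoring network errors."""
    try:
        urllib.request.urlopen(url, timeout=timeout).close()
    except OSError:
        pass  # a monitoring hiccup should never break the job itself


def run_monitored(command, ping_url=PING_URL, ping=http_ping):
    """Run command and signal start, success, or failure to the check.

    command is an argument list as accepted by subprocess.run.
    Returns the command's exit code.
    """
    ping(ping_url + "/start")
    result = subprocess.run(command)
    if result.returncode == 0:
        ping(ping_url)            # success
    else:
        ping(ping_url + "/fail")  # explicit failure signal
    return result.returncode


# Example (hypothetical script path):
#   run_monitored(["/usr/local/bin/backup.sh"])
```

If the job never starts at all, no ping arrives and the check goes late on its own schedule, which is the case that plain exit-code handling would miss.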

I’m using Autoheal to restart containers that are in an unhealthy state. For some containers this means I need to write my own health check. I mostly did this to resolve a rare issue where Plex would lock up but it’s helped in other scenarios too.
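
As an illustration of what writing your own health check can look like, here is a minimal Python sketch (not the actual check used for Plex). Docker treats exit code 0 from a health check command as healthy and 1 as unhealthy, so the script only has to map "did the service answer in time" onto those codes. The default URL and the function names are assumptions for this example.

```python
import sys
import urllib.request

# Hypothetical endpoint: point this at something cheap your service exposes.
DEFAULT_URL = "http://localhost:8080/health"


def is_healthy(url, timeout=5.0, opener=urllib.request.urlopen):
    """Return True if url answers with a 2xx status within timeout seconds."""
    try:
        with opener(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except OSError:
        # Covers refused connections, timeouts, and HTTP error responses.
        return False


def main(argv, check=is_healthy):
    """Map the check result to Docker's convention: 0 healthy, 1 unhealthy."""
    url = argv[1] if len(argv) > 1 else DEFAULT_URL
    return 0 if check(url) else 1


# When saved as a standalone script, finish with:
#   sys.exit(main(sys.argv))
```

Assuming the image ships Python and the script is copied to a path such as `/healthcheck.py` (a made-up location), it could be wired up in the Dockerfile with something like `HEALTHCHECK CMD python /healthcheck.py`. The timeout matters most for the lock-up case: a hung service often still accepts connections but never replies.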

fubarx@lemmy.ml · 1 year ago

Have started experimenting with OpenTelemetry (https://opentelemetry.io/docs/what-is-opentelemetry/) to add observability to different parts of the stack running inside a Docker container.

Haven’t gotten far enough to recommend anything specific, but there is a big ecosystem of open source collectors and analytics tools out there.