[scale-infra] web to Centos8
Phil Dibowitz
phil at ipom.com
Wed Jan 31 06:08:49 UTC 2024
Oh and I missed the obvious one, the huge spike in user CPU at the same
time.
Anyway, this dashboard is locked at a timeframe that shows the spike,
the outage, and the recovery for anyone wanting to see:
https://app.datadoghq.com/dash/host/14232198117?refresh_mode=paused&view=spans&from_ts=1706629500000&to_ts=1706636700000&live=false
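
For anyone who wants to double-check the window that link is locked to: the from_ts and to_ts parameters look like epoch milliseconds. Here is a minimal Python sketch, assuming that reading is right. It uses a fixed UTC-8 offset for PST, since the outage was in January, and the function name is made up for illustration.

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import parse_qs, urlparse

# Fixed offset for Pacific Standard Time (valid for a January date).
PST = timezone(timedelta(hours=-8), "PST")

URL = (
    "https://app.datadoghq.com/dash/host/14232198117"
    "?refresh_mode=paused&view=spans"
    "&from_ts=1706629500000&to_ts=1706636700000&live=false"
)


def dashboard_window(url, tz=PST):
    """Return (start, end) datetimes for the from_ts/to_ts query parameters.

    Assumes both parameters are epoch timestamps in milliseconds.
    """
    query = parse_qs(urlparse(url).query)

    def to_local(key):
        millis = int(query[key][0])
        utc = datetime.fromtimestamp(millis / 1000, tz=timezone.utc)
        return utc.astimezone(tz)

    return to_local("from_ts"), to_local("to_ts")


start, end = dashboard_window(URL)
print(start.strftime("%Y-%m-%d %H:%M %Z"), "to", end.strftime("%H:%M %Z"))
```

This prints "2024-01-30 07:45 PST to 09:45 PST", which brackets the 7:52 load spike and the 9:39 recovery described in the quoted message.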
On 1/30/24 21:58, Phil Dibowitz wrote:
> Er, you're off by two hours: 8:10am - 9:39am PT, best I can tell.
>
> I just got home from work, but I did some digging around.
>
> I don't think it's apache using too much memory. I don't think it's
> memory related at all.
>
> Here are the useful things I've found:
>
> * Load average SPIKED a little while before the outage, at around 7:52.
> It came back down before 8, though.
>
> * We had a spike in network traffic at exactly the same time. Only
> 7MiB/s, but way bigger than our usual traffic.
>
> * There was a spike in memory usage at the same time, but the system was
> nowhere near out of memory.
>
> * The last datapoint DD got from Apache was at 7:54.
>
> * Odd thing that I don't think is related but is funny: the amount of
> memory journald is using drops significantly when apache gets killed,
> from 215MB to 160MB. At the beginning of the outage it was ~188MB. I
> just restarted it and it's at 22MB. It was the top user of RSS memory
> on the box until I restarted it.
>
> Based on the fact that every OTHER process on the system was reporting
> into DD, but apache was not, it seems pretty likely that every thread in
> apache is tied up on _something_ - either talking to a DB, a bad client
> (seems unlikely since we have Cloudflare in front of us), or something
> else.
>
> That's all I got for now.
>
>
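
One way to test the "every thread is tied up" theory against data we already collect is to tally the worker states in a mod_status scoreboard line. Here is a minimal Python sketch. It assumes apache_status.log holds the machine-readable mod_status output (the "?auto" view, which includes a "Scoreboard:" line), and that is only a guess about the log's contents. The function names and the sample text are made up for illustration.

```python
from collections import Counter

# Scoreboard key as documented for Apache mod_status.
STATES = {
    "_": "waiting for connection",
    "S": "starting up",
    "R": "reading request",
    "W": "sending reply",
    "K": "keepalive (read)",
    "D": "DNS lookup",
    "C": "closing connection",
    "L": "logging",
    "G": "gracefully finishing",
    "I": "idle cleanup of worker",
    ".": "open slot with no current process",
}


def find_scoreboard(status_text):
    """Return the scoreboard string from mod_status '?auto' text, or None."""
    for line in status_text.splitlines():
        if line.startswith("Scoreboard:"):
            return line.split(":", 1)[1].strip()
    return None


def summarize_scoreboard(scoreboard):
    """Count busy, idle, and open worker slots in a scoreboard string."""
    counts = Counter(scoreboard.strip())
    idle = counts["_"]
    open_slots = counts["."]
    busy = sum(counts.values()) - idle - open_slots
    by_state = {STATES.get(ch, "unknown " + repr(ch)): n
                for ch, n in counts.items()}
    return {
        "busy": busy,
        "idle": idle,
        "open": open_slots,
        # No idle workers while some are busy suggests saturation.
        "saturated": idle == 0 and busy > 0,
        "by_state": by_state,
    }


# Made-up sample, not taken from the real log.
sample = "BusyWorkers: 8\nIdleWorkers: 0\nScoreboard: WWWWKWWR....\n"
print(summarize_scoreboard(find_scoreboard(sample)))
```

If the entries just before 7:54 show no idle workers and nearly everything in "sending reply" or "reading request", that would support the tied-up-threads theory. A scoreboard that is mostly idle would point somewhere else.
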
> On 1/30/24 12:04, Ilan Rabinovitch wrote:
>> Looks like we had another outage for about an hour today. 10am -
>> 11:36am PT.
>>
>> On Mon, Jan 8, 2024 at 9:53 AM Phil Dibowitz <phil at ipom.com> wrote:
>>
>> No, I woke up, saw the alert, bounced httpd, and had to run out the
>> door.
>>
>> It's a reasonable guess that it's the same memory issue as before.
>> We're currently bouncing apache in cron every other hour, which seems
>> to mostly keep us up, but occasionally not, so my guess is that's
>> RIGHT at the threshold.
>>
>> I don't have the bandwidth to go spelunking - the httpd.conf
>> adjustments I made in... November or whatever that was seemed to
>> help, but obviously they didn't solve everything. I still suspect
>> that Drupal is causing connections to be held open somewhere, but
>> that's pure speculation.
>>
>> We have the data, between the apache_status.log and DD, to figure it
>> out. I'd just need time to do it, and I don't have the cycles right
>> now.
>>
>> On 1/7/24 17:11, Ilan Rabinovitch wrote:
>> > Looks like we had another outage today. Any idea what's up?
>> >
>> > On Mon, Dec 25, 2023 at 1:17 AM Ilan Rabinovitch
>> > <ilan at linuxfests.org> wrote:
>> >
>> > Looks like another outage about 2 hours ago.
>> >
>> > On Sun, Dec 24, 2023 at 8:45 PM Ilan Rabinovitch
>> > <ilan at linuxfests.org> wrote:
>> >
>> > Website was down again... looks like for about an hour. I've just
>> > restarted httpd.
>> >
>> > On Sat, Oct 21, 2023 at 5:00 PM Phil Dibowitz <phil at ipom.com>
>> > wrote:
>> >
>> > Logs look good.
>> >
>> > https://github.com/socallinuxexpo/scale-chef/pull/302
>> > drops restarts to every 2 hours. Merged.
>> >
>> > On 10/21/23 13:47, Phil Dibowitz wrote:
>> > > I dropped it to less often a few days ago, I haven't yet looked at
>> > > the logs to see what the status of httpd is at those times. My plan
>> > > is to do that, then drop it to less often, and rinse and repeat.
>> > >
>> > > Sorry new job is keeping me super busy.
>> > >
>> > > On 10/21/23 08:25, Ilan Rabinovitch wrote:
>> > >> Are we able to remove the cron job that's restarting httpd?
>> > >>
>> > >> On Sun, Oct 8, 2023 at 12:48 AM Phil Dibowitz <phil at ipom.com>
>> > >> wrote:
>> > >>>
>> > >>> On 10/7/23 14:30, Phillip Smith wrote:
>> > >>>> Yes, let's schedule something. I'm available tomorrow morning and
>> > >>>> Tuesday evening.
>> > >>>
>> > >>> What can I provide you in the meantime? I thought I saw an email
>> > >>> from you saying you wanted Datadog access, but I can't seem to
>> > >>> find it.
>> > >>>
>> > >>> Davide may have time Tuesday, I will have time Wed evening. I'm out
>> > >>> of town until Monday and have plans Mon/Tues evening.
>> > >>>
>> > >>> - Phil
>> > >>>
>> > >>>
>> > >>> _______________________________________________
>> > >>> scale-infra mailing list
>> > >>> scale-infra at lists.linuxfests.org
>> <mailto:scale-infra at lists.linuxfests.org>
>> > <mailto:scale-infra at lists.linuxfests.org
>> <mailto:scale-infra at lists.linuxfests.org>>
>> > >>>
>> > https://lists.linuxfests.org/cgi-bin/mailman/listinfo/scale-infra
>> <https://lists.linuxfests.org/cgi-bin/mailman/listinfo/scale-infra>
>> <https://lists.linuxfests.org/cgi-bin/mailman/listinfo/scale-infra
>> <https://lists.linuxfests.org/cgi-bin/mailman/listinfo/scale-infra>>
>> > >
>> >
>> > --
>> > Phil Dibowitz phil at ipom.com
>> > Open Source software and tech docs Insanity Palace of Metallica
>> > http://www.phildev.net/ http://www.ipom.com/
>> >
>> > "Be who you are and say what you feel, because those who mind don't
>> > matter and those who matter don't mind."
>> > - Dr. Seuss
>> >
>> >
>>
>>
>>
>
--
Phil Dibowitz phil at ipom.com
Open Source software and tech docs Insanity Palace of Metallica
http://www.phildev.net/ http://www.ipom.com/
"Be who you are and say what you feel, because those who mind don't
matter and those who matter don't mind."
- Dr. Seuss