On Mon, Jun 4, 2018 at 5:55 PM, Iznogoud <iznogoud at nobelware.com> wrote:
>> If it's a power supply failing or some component overheating, you are
>> unlikely to be able to find that in any standard logfile.
>> Good luck with your forensics.
>
> Randy nailed it. This is most likely what is going on, especially if it just
> started, and after no significant configuration change or software upgrade.
>
> We have servers that drop out of the compute cluster and come back up. We do
> not know why...
>
> I will take this opportunity to say that if you are running a server, you want:
>
> 1. a UPS that is monitored by the system itself
> 2. a solid and robust backup scheme
> 3. some level of driver RAID
> 4. some external notification that the server is having issues
> 5. (I recommend) no automatic reboot on failure
>

Greetings

The issue is, imo anyway, misbehaving software.
I have a UPS and am working on connecting the much larger one that I also have.
What are you recommending for backup? I have an early-model Blu-ray writer,
and even 25 GB of storage doesn't go very far!
Drives are on RAID-10.
External notification would be nice, but I don't want even more 'noises
in the night', so that won't likely be happening.
I may just reset this sometime, but I don't reboot the server often, so it
may have to wait for a bit.

Thanks for the ideas all.

Dee