It’s been a long time since my last update, and in that time I was finally able to test the memory in the VM host machine. I was mainly waiting for time to elapse so that I could judge stability after acting on the results of my investigation. Per the image above, I located a faulty DIMM, which ultimately was the root cause of my stability issues, not Proxmox itself as originally assumed.
It’s really nice having a server and a desktop that share the same type of DDR3 memory: I was able to swap RAM between the two machines and keep my VMs going while I tested the RAM on the desktop machine. Testing in groups of 4 DIMMs at a time, I came across a failing group and quickly isolated the fault to a single DIMM. After re-installing 7 of the 8 DIMMs into the server, I brought the Proxmox box back up on 28GB of RAM, with plans to replace the 8th stick at some point.
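The narrowing-down described above is essentially a halving search: test a group, and keep whichever half still contains the fault. Here is a minimal Python sketch of that idea, assuming exactly one bad DIMM. The names `isolate_faulty`, `fake_memtest`, and the slot labels are hypothetical, and the stand-in test function takes the place of an actual memory test run on real hardware.

```python
from typing import Callable, List, Sequence


def isolate_faulty(dimms: Sequence[str],
                   passes: Callable[[Sequence[str]], bool]) -> str:
    """Halve the suspect set until a single DIMM remains.

    Assumes exactly one DIMM in `dimms` is faulty and that `passes(group)`
    returns True only when every DIMM in the group is good.
    """
    suspects: List[str] = list(dimms)
    while len(suspects) > 1:
        half = suspects[: len(suspects) // 2]
        # If the first half tests clean, the fault must be in the rest.
        suspects = suspects[len(half):] if passes(half) else half
    return suspects[0]


# Hypothetical stand-in for a real memory test: pretend DIMM6 is the bad one.
BAD = "DIMM6"


def fake_memtest(group: Sequence[str]) -> bool:
    return BAD not in group


slots = [f"DIMM{i}" for i in range(1, 9)]  # 8 sticks, tested 4 at a time first
found = isolate_faulty(slots, fake_memtest)
print(found)
```

With 8 sticks this takes three test runs (4, then 2, then 1 DIMM), which matters when each memory test pass can take hours.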
The memory usage at this time generally runs in the 8-12GB range on boot up with peaks around 16-20GB after some period of sustained usage. This usage is for 8 KVM based VMs, 3 OpenVZ containers, and a few extra services running for internal usage.
As my luck usually goes, I’ve now run into issues twice while trying to confirm stability. 19 days into the first boot after re-installing all the confirmed good DIMMs, my rack lost power, and it took a reset at the breaker box to fix. Three days ago, at 21 days of uptime, I had a second power loss. Wouldn’t you know it… it was the exact same thing. At least this time I was able to confirm exactly what caused it: a space heater in one of the bedrooms, which happens to be on the same 15A circuit, was turned on and overloaded it.
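A rough power budget shows why a heater and a rack don't share a 15A circuit well. This is a back-of-the-envelope sketch in Python, not a measurement: the 120V mains voltage, the 1500W heater rating, and the 600W rack draw are all assumptions for illustration (only the 15A figure comes from the post), and a real breaker trips along a time-current curve rather than at a hard cutoff.

```python
VOLTS = 120.0          # assumption: typical North American mains voltage
BREAKER_AMPS = 15.0    # from the post
HEATER_WATTS = 1500.0  # assumption: common rating for a portable space heater

# Power available on the circuit before the breaker's rated current is exceeded.
capacity_watts = VOLTS * BREAKER_AMPS
headroom_watts = capacity_watts - HEATER_WATTS


def fits_with_heater(rack_watts: float) -> bool:
    """True if the rack plus the heater stays within the breaker rating."""
    return rack_watts + HEATER_WATTS <= capacity_watts


rack_watts = 600.0  # hypothetical rack draw, not measured
print(f"capacity: {capacity_watts:.0f} W, headroom with heater on: {headroom_watts:.0f} W")
print("rack fits alongside heater:", fits_with_heater(rack_watts))
```

Under these assumptions the heater alone leaves only a few hundred watts of headroom, so almost any multi-server rack on the same circuit pushes it over.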
Immediately I offloaded web services back to a VM and powered down one of my 1850s to relieve some stress on that 15A circuit, and I will most likely do the same for the 1850 powering the databases. While fun to run and manage, the PE1850s draw some crazy power, make lots of noise, and put out lots of heat. Still, after all my talk about needing some UPSes, it’s obvious that it’s finally time to buy some and stop being so lazy with my equipment!