Dell 2970 – Crashes / Reboots

Error on screen:

HyperTransport error caused a system reset Embedded I/O Bridge Device 2

SEL will show:

Chipset Err: Critical Event Sensor PCI Err (BUS 0 Device 7 Function 0) was asserted

There is a discussion about it here (with no real solution):
http://en.community.dell.com/support-forums/servers/f/946/t/19281276.aspx

I replaced both power supplied with known good units from a Dell 2950. No effect.

My SEL was filled with these errors and, obviously, my server was rebooting frequently.

I noticed NO similarities or triggers. OS did not matter, CPU load did not matter, disk activity did not matter, etc etc.

 

THE SOLUTION THAT FIXED THIS ISSUE WAS A REPLACEMENT MOTHERBOARD FROM DELL.   THE ORIGINAL DESIGN IS CURSED.

VMware Virtual Machine Hosting