Channel: openSUSE Forums

ECC memory error question

Hi all: I have a big Opteron server with 256 GB of Kingston ECC DDR3 DRAM. Is there a hardware wizard out there who can decode this for me?

Code:

Message from syslogd@OS121-TY3 at Jun 10 06:40:08 ...
 kernel:[1961523.572779] [Hardware Error]: CPU:36      MC4_STATUS[-|CE|MiscV|-|AddrV|CECC]: 0x9c08400008080a13

Message from syslogd@OS121-TY3 at Jun 10 06:40:08 ...
 kernel:[1961523.572791] [Hardware Error]:      MC4_ADDR: 0x0000003186466d90

Message from syslogd@OS121-TY3 at Jun 10 06:40:08 ...
 kernel:[1961523.572795] [Hardware Error]: Northbridge Error (node 6): DRAM ECC error detected on the NB.
                                                                                                                           
Message from syslogd@OS121-TY3 at Jun 10 06:40:08 ...
 kernel:[1961523.572815] [Hardware Error]: cache level: L3/GEN, mem/io: MEM, mem-tx: RD, part-proc: RES (no timeout)

I *think* this means it caught and fixed an error, but I'm not sure. (My only reason for thinking so is that it is supposedly "error-correcting" memory...) :| It logs an error like this roughly once every few weeks when the machine is running near full capacity (no swapping, about 90% of the available CPUs each running at 100%, 40% memory utilization).

THANK YOU!!:)

EDIT: Trusty internet... http://superuser.com/questions/50226...s-from-syslogd - I guess my main question is: shouldn't it actually *say* that the error was corrected?
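To see where the "CE" in the first log line comes from, here is a minimal Python sketch that decodes the MC4_STATUS value by hand. It assumes the AMD family 10h/15h MCi_STATUS layout that the Linux EDAC mce_amd decoder uses (bit 63 VAL, 62 OVER, 61 UC, 60 EN, 59 MISCV, 58 ADDRV, 57 PCC, 46 CECC, 45 UECC, bits 20:16 extended error code, bits 15:0 error code). Those bit positions should be checked against AMD's BKDG for the exact CPU family, and the helper name `decode_mc4_status` is hypothetical.

```python
# Minimal sketch: decode an AMD MCi_STATUS value by hand.
# ASSUMPTION: bit positions follow the AMD family 10h/15h layout used by the
# Linux EDAC mce_amd decoder; check AMD's BKDG for your exact CPU family.

STATUS_FLAGS = {
    "VAL": 63,    # register contents are valid
    "OVER": 62,   # an earlier error was overwritten
    "UC": 61,     # uncorrected error (clear means corrected, printed as "CE")
    "EN": 60,     # error reporting was enabled
    "MISCV": 59,  # MCi_MISC holds valid data
    "ADDRV": 58,  # MCi_ADDR holds a valid address
    "PCC": 57,    # processor context corrupt
    "CECC": 46,   # correctable ECC error
    "UECC": 45,   # uncorrectable ECC error
}

# Labels for the bus-error fields, mirroring the strings in the kernel log.
LL = ["RESV", "L1", "L2", "L3/GEN"]                                  # cache level
II = ["MEM", "RESV", "IO", "GEN"]                                    # memory or I/O
RRRR = ["GEN", "RD", "WR", "DRD", "DWR", "IRD", "PRF", "EV", "SNP"]  # transaction
PP = ["SRC", "RES", "OBS", "GEN"]                                    # participating processor


def decode_mc4_status(status: int) -> dict:
    """Split a 64-bit MCi_STATUS value into flags and error-code fields (hypothetical helper)."""
    decoded = {name: bool((status >> bit) & 1) for name, bit in STATUS_FLAGS.items()}
    decoded["corrected"] = not decoded["UC"]
    decoded["ext_error_code"] = (status >> 16) & 0x1F  # bits 20:16 for the northbridge bank
    ec = status & 0xFFFF                               # bits 15:0, the error code
    decoded["error_code"] = ec
    if (ec & 0xF800) == 0x0800:                        # bus-error pattern 0000 1PPT RRRR IILL
        rrrr = (ec >> 4) & 0xF
        decoded["bus"] = {
            "cache_level": LL[ec & 0x3],
            "mem_io": II[(ec >> 2) & 0x3],
            "mem_tx": RRRR[rrrr] if rrrr < len(RRRR) else "RESV",
            "timed_out": bool((ec >> 8) & 1),
            "part_proc": PP[(ec >> 9) & 0x3],
        }
    return decoded


if __name__ == "__main__":
    info = decode_mc4_status(0x9C08400008080A13)
    print("corrected:", info["corrected"])  # True when UC is clear, i.e. the kernel's "CE"
    print("CECC:", info["CECC"], "UECC:", info["UECC"])
    print("extended error code:", hex(info["ext_error_code"]))
    print("bus fields:", info["bus"])
```

Under that assumed layout, 0x9c08400008080a13 has UC clear (so the kernel prints "CE", meaning a corrected error), CECC set, extended error code 0x8 (which the kernel reported as a DRAM ECC error on the NB), and bus fields L3/GEN, MEM, RD, RES with no timeout, matching the decoded log lines.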
