Signal 11 while compiling the kernel
This FAQ describes what the possible causes are for an effect that
bothers lots of people lately. Namely that a linux(*)-kernel (or any
other large package for that matter) compile crashes with a "signal
11". The cause can be software or (most likely) hardware. Read on to
find out more.
(*) Of course nothing is Linux specific. If your hardware is flaky,
Linux, Windows 3.1, FreeBSD, Windows NT and NextStep will all crash.
If you are not reading this at
http://www.BitWizard.nl/sig11/, that's where
you can find the most recent version.
For those of you who prefer reading this in French, the French
translation can be found at
http://www.linux-france.com/article/sig11fr/.
Email me at
R.E.Wolff@BitWizard.nl if you find any spelling errors,
worthwhile additions or with an "it also happened to me" story. (Note
that I reject some suggested additions on my belief that it is
technical nonsense). I would appreciate it if you put "sig11" or
something like that in the subject.
The Sig11 FAQ
QUESTION
My kernel compile crashes with
gcc: Internal compiler error: program cc1 got fatal signal 11
What is wrong with the compiler? Which version of the compiler do I
need? Is there something wrong with the kernel?
ANSWER
Most likely there is nothing wrong with your installation, your
compiler or kernel. It very likely has something to do with your
hardware. There are a variety of subsystems that can be wrong, and
there is a variety of ways to fix it. Read on, and you'll find out
more. There are two exceptions to this "rule". You could be running
low on virtual memory, or you could be installing Red Hat 5. There is
more about this near the end.
QUESTION
Ok it may not be the software, How do I know for sure?
ANSWER
First lets make sure it is the hardware that is causing your
trouble. When the "make" stops, simply type "make" again. If it
compiles a few more files before stopping, it must be hardware that is
causing you troubles. If it immediately stops again (i.e. scans a few
directories with "nothing to be done for xxxx" before bombing at exactly
the same place), try
dd if=/dev/HARD_DISK of=/dev/null bs=1024k count=MEGS
Change HARD_DISK to "hda" to the name of your harddisk (e.g. hda or
sda. Or use "df ."). Change the MEGS to the number of megabytes of
main memory that you have. This will cause the first several
megabytes of your harddisk to be read from disk, forcing the C source
files and the gcc binary to be reread from disk the next time you run
it. Now type make again. If it still stops in the same place I'm
starting to wonder if you're reading the right FAQ, as it is starting
to look like a software problem after all.... Take a peek at the "what
are the other possibilities" question..... If without this "dd"
command the compiler keeps on stopping at the same place, but moves to
another place after you use the "dd" you definitely have a disk->ram
transfer problem.
QUESTION
What does it really mean?
ANSWER
Well, the compiler accessed memory outside its memory range. If this
happens on working hardware it's a programming error inside the
compiler. That's why it says "internal compiler error". However when
the hardware occasionally flips a bit, gcc uses so many pointers,
that it is likely to end up accessing something outside of its addressing
range. (random addresses are mostly outside your addressing range, as
not very many people have a significant part of 4G as main memory... :-)
It seems that nowadays, everybody with "signal 11" problems gets
directed to this page. If you're developping your own software or have
software that hasn't been debugged quite enough, "signal 11" (or
segmentation fault) is still a strong hint that there is something
wrong with the program. Only when you can cause a "known working"
program like "gcc" to crash on a dataset (e.g. the Linux-kernel) that
has also been well-tested, then it becomes a hint that there is
something wrong with your hardware.
QUESTION
Ok. I may have a hardware problem what is it?
ANSWER
If it happens to be the hardware it can be:
- Main memory. Your main memory might be getting an occasional bit wrong.
If this happens on the "writes", you won't see any parity errors. There are
several ways to fix it:
- The memory speed might be too slow. Increase the number of
wait states in the BIOS.
This could be caused by the
AMIBIOSs autoconfig option: it may only know about 486s running
upto 80 MHz, whereas you currently buy 100 MHz versions. -- Pat
V.
- The memory speed might be too slow. Get faster DRAM SIMMs. For
example current ASUS motherboards require 60 ns DRAM if you have
a 100, or 133 MHz processor (Take a look in your motherboard's
manual). I've heard reports that 70 ns also works, reliability
problems like random sig11's belong to the possibilities.... (I
wouldn't take the risk) -- Andrew Eskilsson
(mpt95aes@pt.hk-r.se)
- There is a bad chip on one of the SIMMs. If you own more than 1
bank of memory you might be able to pull SIMMs and see if the
problem goes away. Be careful for STATIC!!!
- We handled a hard one here the last week. It turned out that ALL
4 16Mb SIMMs were broken in that they dropped a bit around once
per hour. This was sufficient to crash the machine in about a
day, or crash a kernel compile in about an hour. A new set of
SIMMs works perfectly. It took a long while to diagnose this
one, because all 4 of the SIMMs were affected equally, so
leaving half of the memory out didn't change things.
Mark
Kettner (kettner@cat.et.tudelft.nl) reports that his system was
capable of running my memory test for 2300 times faultlessly, but
then detected around 10 errors. It then continued detecting no
faults for a few hundred runs again..... In his case running
kernel compiles was a much more efficient way of detecting the
health of the system (in the most stable configuration the
system could compile around 14 kernels before going bzurk). His
solution was to "trade in" the old memory for a so called
"memory upgrade". The shopkeeper then "tests" in their memory
tester, which OKs the memory. He then got a good discount on the
new memory :-).
- It seems that some 30-72 pin converters can cause memory errors.
(It hasn't been proven whether the 4 SIMMS in the converter had
gone bad, or if the SIMM converter was at fault. The SIMMS had
been functioning perfectly for years before they were moved into
the converter....) -- Naresh Sharma
(n.sharma@is.twi.tudelft.nl). Paul Gortmaker
(paul.gortmaker@anu.edu.au) adds that the SIMM converters should
have at least 4 bypass capacitors to keep the power supply of the
SIMMs clean.
- If the refresh of the DRAM isn't functioning properly, the DRAMs
will slowly lose their information. Some (486) motherboards stop
refreshing correctly when you turn on "hidden refresh". There
seems to be a program called "dram" around that can also mess up
your refresh to cause sig11 problems. -- Hank Barta
(hank@pswin.chi.il.us), Ron Tapia (tapia@nmia.com)
- The number of waitstates could be too low. Increase the number of
waiststates in the BIOS for a fix. The Intel Endeavour board
doesn't allow you to increase the memory waitstates. This can
supposedly be fixed by flashing a MR BIOS into the motherboard.
-- David Halls (david.halls@cl.cam.ac.uk)
- Cache memory. Your cache memory might be getting an occasional
bit wrong. Caches are usually not equipped with parity. You can
diagnose that this is the case by turning off the cache in the
BIOS. If the problem goes away it is probably the cache. There
are several ways to fix it:
- The cache memory speed might be too slow. Increase the number
of wait states in the BIOS.
- The cache memory speed might be too slow. Get faster SRAM
chips.
- There is a bad chip in your cache. It is unlikely that you can
swap chips as easily as with SIMMs. Be careful for STATIC!!!
-- Joseph Barone (barone@mntr02.psf.ge.com)
- The cache might be set to "write back" while there is a bug in the
write back implementation of your chipset. The motherboard where
this happened was a "MV020 486VL3H" (with 20M RAM)
-- Scott Brumbaugh (scottb@borris.beachnet.com) (Mail address
doesn't work. Scott: Get back at me with a valid return address)
- The motherboard may require a jumper to switch between Cache On A
Stick and the old-fashioned dip chip cache. (JP16 on Rev 2.4 ASUS
P/I-P55TP4XE motherboards)
- Disk transfers. A block coming from disk might incur an occasional
bit error.
- If you have this problem, you are most likely to have to do the
"dd" command to "move" the problem from one place to the
next....
- Some IDE harddisks cannot handle the "irq_unmasking" option.
This may only show under load. And it could show as a sig11.
- Do you have a kalok 31xx? Throw it in the garbage. (or sell it
to a DOS user)
- SCSI? Termination? A short bus might still work (unreliably
that is) with bad termination. A long bus might get errors
anyway. Can you turn on parity on the host and the DISK?
- The CPU itself. Some batches of processors have a much higher
percentage of them that happen to be "bad". A few years ago:
original Intel-Pentium-120's. Today AMD K6/2-300's (1998, produced
in weeks 34 through 39!). If this turns out to be the problem,
you're entitled
to a new processor. Go and exchange it where you bought it.
(Forget about those P120's, it's not worth the trouble... ;-)
-- Guillaume Cottenceau (gcottenc@ens.insa-rennes.fr).
- The CPU itself. Some batches of K6 processors simply have a
design bug. Read
http://www.multimania.com/poulot/k6bug.html and then make sure
you get your K6 exchanged. -- Rongen (rongen@istar.ca).
- Overclocking. Some vendors (or private people) think it is
possible to overclock some CPUs. Some of them may work others
don't. You might want to try turning off turbo (note that most
pentium motherboards no longer support a non-turbo mode) and see
if the problem goes away. Check the speed of your CPU compared
(printed on it, carefully remove the fan if necessary) with what
the motherboard jumpers or BIOS settings say.... It seems that
even Intel may make mistakes in this area. I now have several
reliable reports that official pentium would sig11 at their rated
speed, but not at a lower speed. As for some speeds the
motherboard is only stressed HARDER for a slower processor speed,
(120 MHz-> motherboard runs at 60MHz, 100MHz-> motherboard runs at
66MHz), I think it is unlikely that this has anything to do with
the motherboard. Moreover a new 120MHz processor is now
functioning correctly. -- Samuel Ramac (sramac@vnet.ibm.com).
This is not unique to Intel or any of its competitors.
- CPU temperature. A high speed processor might overheat without the
correct heat sink. This can also be caused by a failing fan. (My
personal '486 has a fan that takes a few minutes to get up to
speed. It probably will never really FAIL because it's now
decommisioned :-). The CPU can become erratic if "pushed" by
compiling a kernel. This problem becomes worse if you disable
"HALT" on the LILO command line. Linux tries to power-down the CPU
by executing the "halt" instruction when the system is idle. This
preserves power, and therefore the CPU temperature drops when the
system is idle. You therefore might not notice this problem when
simply editing, and it might only surface after hours of CPU
intensive jobs when the ambient temp is high. If you have a
Pentium with Fdiv bug, it is advisable to trade it in at Intel.
They will send you a new one that preconfigured with an official
Intel-approved FAN. Also note that most normal glues are very bad
thermal conductors. There is special thermal glue available that
should be used when a fan needs to be glued to a CPU. -- Arno
Griffioen (arno@ixe.net), -- W. Paul Mills (wpmills@midusa.net) --
Alan Wind (wind@imada.ou.dk)
Intel says that the allowable temperature ranges for the
outside of your CPU is:
0 to +85 C: Intel486 SX, Intel486 DX, IntelDX2, IntelDX4 processor
0 to +95 C: IntelDX2, IntelDX4 OverDrive® processors
0 to +80 C: 60 MHz Pentium® processor
0 to +70 C: 66 to 166 MHz Pentium processor
For information on how to measure this and some confirmation of what
I say here, see:
http://pentium.intel.com/procs/support/faqs/iarcfaq.htm
(Especially questions Q5, Q6 and Q12. The document is getting
slightly outdated, but it is still very accurate. It seems the
questions move around a bit every now and then as well.)
- CPU voltage. Some motherboards allow you to select the CPU
voltage. Some motherboards badly document the jumper settings that
manage this. It seems that a 5V processor might still work most of
the time at 3.3 volts..... -- Karl Heyes
(krheyes@comp.brad.ac.uk)
- RAM voltage. It seems that vendors are preparing for 3.3V RAM
now. Most memory is now 3.3V. (but be careful if you have a board
capable of setting the RAM voltage: 3.3v RAM will break at 5V.....)
(Having heard little about this, I think the switch must be automatic.)
- Local bus overloading. At 25 MHz you're allowed to have 3
VesaLocalBus (VLB) cards, At 33MHz only two, at 40MHz only one and
guess what at 50MHz NONE! (i.e. you are allowed to run your system
with a 50MHz local bus, but then you're not allowed to use any VLB
cards). Some systems start acting flaky when you overload the
VLB. Even when your VLB isn't overloaded (over the limits stated
above), the system may lose a few nanoseconds of margin by adding
an extra VLB card, so you might need to add a cache wait state or
something after you've added a new VLB card.... -- Richard
Postgate (postgate@cafe.net)
- Power management. Some laptops (and nowadays also "green" pc's)
have power management features. These might interfere with
Linux. One feature might save a memory image to HD and restore the
RAM when you press a key. This sounds like fun, but Linux device
drivers don't expect that the hardware has been turned off between
two acesses. Some may recover, but others not. Try turning it off,
or enabeling "APM support" in your kernel. -- Elizabeth Ayer
(eca23@cam.ac.uk)
- Dust buildup. Some dust might conduct a bit and create a weak
short. It might increase capacitances somewhere, and degrade timing
characteristics. It might impede thermal flow, and lead to overheating
components. I recommend that every year or so, it is a good idea to
open up your computer, and vacuum the inside. Tip: Those
cotton-on-a-stick thingies help prodding the dust out of inaccessbile
spots... -- Craig Graham (c_graham@hinge.mistral.co.uk)
- The CPU itself. Several people are reporting that they have found
nothing to blame except the CPU. This could also have been an
incompatibility between the CPU and the motherboard. A wave of
reports concerning Intel CPUs has passed (Feb '97). A new wave of
reports is coming in that are blaming Cyrix/IBM 6x86
CPUs. Although it could indeed be the CPU, it could also be that
your motherboard is incompatible with your CPU. At least I've seen
a motherboard manual mention that it isn't compatible with older
6x86's. My own experience is that these devices aren't bad at all,
and on a kernel compile I benchmarked a P166+ to be equivalent
with a P155 (1.3 times faster than a P120).
The Memory hole. Many modern motherboards allow you to use old
ISA video cards with one or two megabytes of linear frame buffer.
To achieve this, they have to map out the memory just below
16Mb. Nobody actually ever used this feature, but if you turn
the memory hole (or LFB support in some BIOSes) on, your
machine will certainly be flaky..... -- Paul Connolly
(pconnolly@macdux.com.au)
QUESTION
RAM timing problems? I fiddled with the bios settings more than a
month ago. I've compiled numerous kernels in the mean time and nothing
went wrong. It can't be the RAM timing. Right?
ANSWER
Wrong. Do you think that the RAM manufacturers have a machine that
makes 60ns RAMs and another one that makes 70ns RAMs? Of course not!
They make a bunch, and then test them. Some meet the specs for 60 ns,
others don't. Those might be 61 ns if the manufacturer would have to
put a number to it. In that case it is quite likely that it works
in your computer when for example the temperature is below 40 degrees
centigrade (chips become slower when the temp rises. That's why some
supercomputers need so much cooling).
However "the coming of summer" or a long compile job may push the
temperature inside your computer over the "limit".
-- Philippe Troin (ptroin@compass-da.com)
QUESTION
I got suckered into not buying ECC memory because it was slightly
cheaper. I feel like a fool. I should have bought the more expensive
ECC memory. Right?
ANSWER
Buying the more expensive ECC memory and motherboards protects you
against a certain type of errors: Those that occur randomly by passing
alpha particles.
Because most people can reproduce "signal 11" problems within half an
hour using "gcc" but cannot reproduce them by memory testing for hours
in a row, that proves to me that it is not simply a random alpha
particle flipping a bit. That would get noticed by the memory test
too. This means that something else is going on.
I have the impression that most sig11 problems are caused by timing
errors on the CPU <-> cache <-> memory path. ECC on your main memory
doesn't help you in that case.
When should you buy ECC? a) When you feel you need it. b) When you
have LOTS of RAM. (Why not a cut-off number? Because the cut-off
changes with time, just like "LOTS".) Some people feel very strong
about everybody using ECC memory. I refer them to reason "a)".
QUESTION
Memory problems? My BIOS tests my memory and tells me its ok. I have
this fancy DOS program that tells me my memory is OK. Can't be memory
right?
ANSWER
Wrong. The memory test in the BIOS is utterly useless. It may even
occasionally OK more memory than really is available, let alone test
whether it is good or not.
A friend of mine used to have a 640k PC (yeah, this was a long time
ago) which had a single 64kbit chip instead of a 256kbit chip in the
second 256k bank. This means that he effectively had 320k working
memory. Sometimes the BIOS would test 384k as "OK". Anyway, only
certain applications would fail. It was very hard to diagnose the
actual problem....
Most memory problems only occur under special circumstances. Those
circumstances are hardly ever known. gcc Seems to exercise them. Some
memory tests, especially BIOS memory tests, don't. I'm no longer
working on creating a floppy with a linux kernel and a good memory
tester on it. Forget about bugging me about it......
The reason is that a memory test causes the CPU to execute just a few
instructions, and the memory access patterns tend to be very
regular. Under these circumstances only a very small subset of the
memories breaks down. If you're studying Electrical Engineering and
are interested in memory testing, a masters thesis could be to figure
out what's going on. There are computer manufacturers that would want
to sponsor such a project with some hardware that clients claim to be
unreliable, but doesn't fail the production tests......
QUESTION
Does it only happen when I compile a kernel?
ANSWER
Nope. There is no way your hardware can know that you are compiling a
kernel. It just so happens that a kernel compile is very tough on
your hardware, so it just happens a lot when you are compiling a
kernel. Compiling other large packages like gcc or glibc also often
trigger the sig11.
- People have seen "random" crashes for example while installing
using the slackware installation script.... --
dhn@pluto.njcc.com
- Others get "general protection errors" from the kernel (with
the crashdump). These are usually in /var/adm/messages.
-- fox@graphics.cs.nyu.edu
QUESTION
Nothing crashes on NT, Windows 95, OS/2 or DOS. It must be something
Linux specific.
ANSWER
First of all, Linux stresses your hardware more than all of the above.
Some OSes like the Microsoft ones named above crash in unpredictable
ways anyway. Nobody is going to call Microsoft and say "hey, my
windows box crashed today". If you do anyway, they will tell you that
you, the user, made an error (see
the interview with Bill
Gates in a German magazine....) and that since it works now, you
should shut up.
Those OSes are also somewhat more "predictable" than Linux. This means
that Excel might always be loaded in the exact same memory area.
Therefore when the bit-error occurs, it is always excel that gets
it. Excel will crash. Or excel will crash another application. Anyway,
it will seem to be a single application that fails, and not related to
memory.
What I am sure of is that a cleanly installed Linux system should be
able to compile the kernel without any errors. Certainly no sig-11
ones. (** Exception: Red Hat 5.0 with a Cyrix processor. See
elsewhere. **)
Really Linux and gcc stress your hardware more than other OSes. If you
need a non-linux thingy that stresses your hardware to the point
of crashing, you can try winstone. -- Jonathan Bright (bright@informix.com)
QUESTION
Is it always signal 11?
ANSWER
Nope. Other signals like four, six and seven also occur occasionally.
Signal 11 is most common though.
As long as memory is getting corrupted, anything can happen. I'd
expect bad binaries to occur much more often than they really
do. Anyway, it seems that the odds are heavily biased towards gcc
getting a signal 11. Also seen:
- free_one_pmd: bad directory entry 00000008
- EXT2-fs warning (device 08:14): ext_2_free_blocks bit already
cleared for block 127916
- Internal error: bad swap device
- Trying to free nonexistent swap-page
- kfree of non-kmalloced memory ...
- scsi0: REQ before WAIT DISCONNECT IID
- Unable to handle kernel NULL pointer dereference at virtual
address c0000004
- put_page: page already exists 00000046
invalid operand: 0000
- Whee.. inode changed from under us. Tell Linus
- crc error -- System halted (During the uncompress of the Linux kernel)
- Segmentation fault
- "unable to resolve symbol"
- make [1]: *** [sub_dirs] Error 139
make: *** [linuxsubdirs] Error 1
- The X Window system can terminate with a "caught signal xx"
The first few ones are cases where the kernel "suspects" a
kernel-programming-error that is actually caused by the bad memory.
The last few point to application programs that end up with the
trouble.
-- S.G.de Marinis (trance@interseg.it)
-- Dirk Nachtmann (nachtman@kogs.informatik.uni-hamburg.de)
QUESTION
What do I do?
ANSWER
Here are some things to try when you want to find out what is wrong...
note: Some of these will significantly slow your computer down. These
things are intended to get your computer to function properly and allow
you to narrow down what's wrong with it. With this information you
can for example try to get the faulty component replaced by your vendor.
The hardest part is that most people will be able to do all of the
above except borrowing memory from someone else, and it doesn't make a
difference. This makes it likely that it really is the RAM. Currently
RAM is the most pricy part of a PC, so you rather not have this
conclusion, but I'm sorry, I get lots of reactions that in the end
turn out to be the RAM. However don't despair just yet: your RAM may
not be completely wasted: you can always try to trade it in for different
or more RAM.
QUESTION
I had my RAMs tested in a RAM-tester device, and they are OK. Can't be the
RAM right?
ANSWER
Wrong. It seems that the errors that are currently occuring in RAMS are
not detectable by RAM-testers. It might be that your motherboard is
accessing the RAMs in dubious ways or otherwise messing up the RAM
while it is in YOUR computer. The advantage is that you can sell your
RAM to someone who still has confidence in his RAM-tester......
QUESTION
What are other possibilities?
ANSWER
Others have noted the following possibilities:
- The compiler and libc included in Red Hat 5.0 have an odd
interaction with the Cyrix processor. It crashes the compiler,
This is VERY odd. I would think that the only way
that this can be the case is whent he Cyrix has a bug that has
gone undetected all this time, and reliably gets triggered
when THAT gcc compiles the Linux kernel. Anyway, if you just want
compile a kernel, you should get a new compiler and/or libc from
the Red Hat website. (start at the homepage, and click errata).
- The Red Hat 5.x install program has a problem on some machines.
It crashes during the install. To me this clearly looks like a bug in
the install program. I'm not sure why there isn't more fuss about this.
I've seen this once, and got the machine to install by installing
a minimal base system, and then adding packages to the installed
system. Another solution is to NOT use Red Hat, but Debian or
SuSE. Someone suggested that the machine might be out-of-memory
when this happens. If you have this problem, could you tell me
how much RAM and SWAP you have/configured? (-- REW and many others)
- Compiling a 2.0.x kernel with a 2.8.x gcc or any egcs doesn't work.
There are a few bugs in the kernel that don't show up because
gcc 2.7.x does a lowsy job optimizing it. gcc 2.8.x and egcs just
dump some of the code because we didn't tell it not to. Anyway,
you usually get a kernel that seems to work but has funny bugs.
For example X may crash with a signal 11. Oh, and before you
ask, no it's not going to be fixed. Don't bother Alan or Linus
about this OK? -- Hans Peter Verne (h.p.verne@kjemi.uio.no)
- The pentium-optimizing-gcc (the one with the version number ending
in "p") fails with the default options on certain source files
like floppy.c in the kernel. The "triggers" are in the kernel, libc and
in gcc itself. This is easily diagnosed as "not a hardware
problem" because it always happens in the same place. You can
either disable some optimizations (try -fno-unroll-loops first) or
use another gcc. -- Evan Cheng (evan@top.cis.syr.edu)
(In other words: gcc 2.7.2p crashes with sig11 on floppy.c .
Workaround-1: Use plain gcc. Workaround-2: Manually compile
floppy.c with "-O" instead of "-O2". )
- A badly misconfigured gcc -- some parts from one version, some
from another. After a few weeks I ended up re-installing from
scratch to get everything right. -- Richard H. Derr III
(rhd@Mars.mcs.com).
- Gcc or the resulting application may terminate with sig11 when a
program is linked against the SCO libraries (which come with
iBCS). This occurs on some applications that have -L/lib in their
LDFLAGS....
- When compiling a kernel with an ELF compiler, but configured for
a.out (or the other way around, I forgot) you will get a signal 11
on the first call to "ld". This is easily identified as a software
problem, as it always occurs on the FIRST call to "ld" during the
build. -- REW
- An Ethernet card together with a badly configured PCI BIOS. If
your (ISA) Ethernet card has an aperture on the ISA bus, you might
need to configure it somewhere in the BIOS setup screens.
Otherwise the hardware would look on the PCI bus for the shared
memory area. As the ISA card can't react to the requests on the
PCI bus, you are reading empty "air". This can result in
segmentation faults and kernel crashes. -- REW
- Corrupted swap partition. Tony Nugent (T.Nugent@sct.gu.edu.au)
reports he used to have this problem and solved it by an mkswap on
his swap partition. (Don't forget to type "sync" before doing
anything else after an mkswap. -- Louis J. LaBash Jr.
(lou@minuet.siue.edu))
- NE2000 card. Some cheap Ne2000 cards might mess up the system. --
Danny ter Haar (dth@cistron.nl) I personally might have had
similar problems, as my mail server crashed hard every now and
then (once a day). It now seems that 1.2.13 and lots of the 1.3.x
kernels have this bug. I haven't seen it in 1.3.48. Probably got
fixed somewhere in the meantime.... -- REW
- Power supply? No I don't think so. A modern heavy system with two
or three harddisk, both SCSI and IDE will not exceed 120 Watts or
so. If you have loads of old harddisks and old expansion cards
the power requirements will be higher, but still it is very hard
to reach the limits of the power supply. Of course some people
manage to find loads of old full-size harddisks and install them
into their big-tower. You can indeed overload a powersupply that
way. -- Greg Nicholson (greg@job.cba.ua.edu)
A faulty power supply CAN of course deliver marginal power, which
causes all of the malfunctioning that you read about in this file....
-- Thorsten Kuehnemann (thorsten@actis.de)
- An inconsistent ext2fs. Some circumstances can cause the kernel
code of the ext2 file system to result in Signal 11 for Gcc.
-- Morten Welinder (terra@diku.dk)
- CMOS battery. Even if you set the BIOS as you want it, it could be
changing back to "bad" settings under your nose if the CMOS battery is
bad. -- Heonmin Lim (coco@me.umn.edu)
- No or too little swap space. Gcc doesn't gracefully handle the
"out of memory" condition. -- Paul Brannan (brannanp@musc.edu)
- Incompatible libraries. When you have a symlink from "libc.so.5"
pointing to "libc.so.6", some applications will bomb with sig11.
-- Piete Brooks (piete.brooks@cl.cam.ac.uk).
QUESTION
I don't believe this. To whom has this happened?
ANSWER
Well for one it happened to me personally. But you don't have to
believe me. It also happened to:
- Johnny Stephens (icjps@asuvm.inre.asu.edu)
- Dejan Ilic (d92dejil@und.ida.liu.se)
- Rick Tessner (rick@myra.com)
- David Fox (fox@graphics.cs.nyu.edu)
- Darren White (dwhite@baker.cnw.com) (L2 cache)
- Patrick J. Volkerding (volkerdi@mhd1.moorhead.msus.edu)
- Jeff Coy Jr. (jcoy@gray.cscwc.pima.edu) (Temp problems)
- Michael Blandford (mikey@azalea.lanl.gov) (Temp problems: CPU fan failed)
- Alex Butcher (Alex.Butcher@bristol.ac.uk) (Memory waitstates)
- Richard Postgate (postgate@cafe.net) (VLB loading)
- Bert Meijs (L.Meijs@et.tudelft.nl) (bad SIMMs)
- J. Van Stonecypher (scypher@cs.fsu.edu)
- Mark Kettner (kettner@cat.et.tudelft.nl) (bad SIMMs)
- Naresh Sharma (n.sharma@is.twi.tudelft.nl) (30->72 converter)
- Rick Lim (ricklim@freenet.vancouver.bc.ca) (Bad cache)
- Scott Brumbaugh (scottb@borris.beachnet.com)
- Paul Gortmaker (paul.gortmaker@anu.edu.au)
- Mike Tayter (tayter@ncats.newaygo.mi.us) (Something with the cache)
- Benni ??? (benni@informatik.uni-frankfurt.de) (VLB Overloading)
- Oliver Schoett (os@sdm.de) (Cache jumper)
- Morten Welinder (terra@diku.dk)
- Warwick Harvey (warwick@cs.mu.oz.au) (bit error in cache)
- Hank Barta (hank@pswin.chi.il.us)
- Jeffrey J. Radice (jjr@zilker.net) (Ram voltage)
- Samuel Ramac (sramac@vnet.ibm.com) (CPU tops out)
- Andrew Eskilsson (mpt95aes@pt.hk-r.se) (DRAM speed)
- W. Paul Mills (wpmills@midusa.net) (CPU fan disconnected from CPU)
- Joseph Barone (barone@mntr02.psf.ge.com) (Bad cache)
- Philippe Troin (ptroin@compass-da.com) (delayed RAM timing trouble)
- Koen D'Hondt (koen@dutlhs1.lr.tudelft.nl) (more kernel error messages)
- Bill Faust (faust@pobox.com) (cache problem)
- Tim Middlekoop (mtim@lab.housing.fsu.edu) (CPU temp: fan installed)
- Andrew R. Cook (andy@anchtk.chm.anl.gov) (bad cache)
- Allan Wind (wind@imada.ou.dk) (P66 overheating)
- Michael Tuschik (mt2@irz.inf.tu-dresden.de) (gcc2.7.2p victim)
- R.C.H. Li (chli@en.polyu.edu.hk) (Overclocking: ok for months...)
- Florin (florin@monet.telebyte.nl) (Overclocked CPU by vendor)
- Dale J March (dmarch@pcocd2.intel.com) (CPU overheating on laptop)
- Markus Schulte (markus@dom.de) (Bad RAM)
- Mark Davis (mark_d_davis@usa.pipeline.com) (Bad P120?)
- Josep Lladonosa i Capell (jllado@arrakis.es) (PCI options overoptimization)
- Emilio Federici (mc9995@mclink.it) (P120 overheating)
- Conor McCarthy (conormc@cclana.ucd.ie) (Bad SIMM)
- Matthias Petofalvi (mpetofal@ulb.ac.be) ("Simmverter" problem)
- Jonathan Christopher Mckinney (jono@tamu.edu) (gcc2.7.2p victim)
- Greg Nicholson (greg@job.cba.ua.edu) (many old disks)
- Ismo Peltonen (iap@bigbang.hut.fi) (irq_unmasking)
- Daniel Pancamo (pancamo@infocom.net) (70ns instead of 60 ns RAM)
- David Halls (david.halls@cl.cam.ac.uk)
- Mark Zusman (marklz@pointer.israel.net) (Bad motherboard)
- Elizabeth Ayer (eca23@cam.ac.uk) (Power management features)
- Thorsten Kuehnemann (thorsten@actis.de)
-
- (Email me with your story, you might get to be mentioned here... :-)
---- Update: I like to hear what happened to you. This will allow me to
guess what happens most, and keep this file as accurate as possible.
However I now have around 500 different Email addresses of people who've
had sig-11 problems. I don't think that it is useful to keep on adding
"random" people's names on this list. What do YOU think?
I'm interested in new stories. If you have a problem and are unsure
about what it is, it may help to
Email me at R.E.Wolff@BitWizard.nl
. My curiosity will usually drive me to answering your questions until
you find what the problem is..... (on the other hand, I do get pissed when
your problem is clearly described above :-)
This page is hosted by www.BitWizard.nl