View Issue Details
ID | Project | Category | View Status | Date Submitted | Last Update |
---|---|---|---|---|---|
0000709 | LDMud 3.3 | LPC Compiler/Preprocessor | public | 2009-12-20 21:05 | 2009-12-22 09:15 |
Reporter | Wildcat | Assigned To | Gnomi | ||
Priority | normal | Severity | crash | Reproducibility | always |
Status | resolved | Resolution | fixed | ||
Platform | x86_64 | OS | CentOS | OS Version | 5.3 |
Product Version | 3.3.718 | ||||
Target Version | 3.3.720 | Fixed in Version | 3.3.720 | ||
Summary | 0000709: Repeatable crash in 64 bit build of 3.3.718 | ||||
Description | Greetings, I just upgraded from an ancient driver to the most recent driver and I'm getting a repeatable crash loading a specific NPC. As I try to narrow it down the stack changes: it's always a null string dereference during the cloning, but the objects in the path can change. In this example it was /bin/damage/mino.
#0 0x0000000000450cf5 in mstr_mem_size (s=0x0) at mstrings.h:126
#1 0x00000000004517cf in ref_mstring (s=0x0) at mstrings.h:187
#2 0x00000000004588e0 in eval_instruction (first_instruction=0x24a41ca "b\001\002cq", initial_sp=0x7801f0) at interpret.c:8443
#3 0x000000000046ea66 in apply_low (fun=0xbafbc8, ob=0x10dfd68, num_arg=1, b_ign_prot=true, allowRefs=false) at interpret.c:17096
#4 0x000000000046ebf1 in int_apply (fun=0xbafbc8, ob=0x10dfd68, num_arg=1, b_ign_prot=true, b_use_default=true) at interpret.c:17174
#5 0x000000000046f046 in sapply_int (fun=0xbafbc8, ob=0x10dfd68, num_arg=1, b_find_static=true, b_use_default=true) at interpret.c:17335
#6 0x0000000000495002 in reset_object (ob=0x10dfd68, arg=5) at object.c:899
#7 0x00000000004dffa0 in load_object (lname=0x7d0860 "bin/damage/mino", create_super=false, depth=0, isMasterObj=false, chain=0x0) at simulate.c:2120
#8 0x00000000004e0842 in lookfor_object (str=0x108e950, bLoad=true) at simulate.c:2388
#9 0x00000000004e473e in f_load_object (sp=0x7801a0) at simulate.c:4449
#10 0x0000000000457eb5 in eval_instruction (first_instruction=0xbfd3b2 "b\001\001\037", initial_sp=0x780190) at interpret.c:8175
#11 0x00000000004679b7 in eval_instruction (first_instruction=0x19777da "?\002\001\020,\00375l\001\031?\0035\n", initial_sp=0x780100) at interpret.c:14943
#12 0x000000000046ea66 in apply_low (fun=0xbdb118, ob=0x250ea58, num_arg=1, b_ign_prot=false, allowRefs=false) at interpret.c:17096
#13 0x000000000046ebf1 in int_apply (fun=0xbdb118, ob=0x250ea58, num_arg=1, b_ign_prot=false, b_use_default=true) at interpret.c:17174
#14 0x000000000046a57e in eval_instruction (first_instruction=0x7fff99e5e0f0 "?\030E", initial_sp=0x780100) at interpret.c:16444
#15 0x0000000000470df5 in int_call_lambda (lsvp=0x7800d0, num_arg=3, allowRefs=false, external=true) at interpret.c:18354
#16 0x0000000000474c65 in v_apply (sp=0x780100, num_arg=4) at interpret.c:20588
#17 0x0000000000458696 in eval_instruction (first_instruction=0xbffa2c "c\037", initial_sp=0x7800b0) at interpret.c:8374
#18 0x00000000004dc8c4 in catch_instruction (flags=0, offset=12, i_sp=0x8159b0, i_pc=0xbffa2c "c\037", i_fp=0x780070, reserve_cost=2000, i_context=0x0) at simulate.c:449
#19 0x000000000045a66b in eval_instruction (first_instruction=0xbffa22 "b\002\002?\003Yc ", initial_sp=0x7800a0) at interpret.c:9593
#20 0x000000000047048f in int_call_lambda (lsvp=0x780050, num_arg=2, allowRefs=false, external=true) at interpret.c:18075
#21 0x00000000004e5f63 in v_limited (sp=0x780080, num_arg=4) at simulate.c:5228
#22 0x0000000000458696 in eval_instruction (first_instruction=0xbffa82 "b\002\001\002\b\016?\206\001", initial_sp=0x780030) at interpret.c:8374
#23 0x00000000004679b7 in eval_instruction (first_instruction=0xb66d22 "b\002\006?\a", initial_sp=0x77fff0) at interpret.c:14943
#24 0x000000000046e646 in apply_low (fun=0xb88010, ob=0xbadb60, num_arg=2, b_ign_prot=false, allowRefs=false) at interpret.c:16983
#25 0x000000000046ebf1 in int_apply (fun=0xb88010, ob=0xbadb60, num_arg=2, b_ign_prot=false, b_use_default=true) at interpret.c:17174
#26 0x000000000046a57e in eval_instruction (first_instruction=0x12285a2 "b\001\003?\a", initial_sp=0x77ff40) at interpret.c:16444
#27 0x000000000046e646 in apply_low (fun=0xc02e08, ob=0x11e7aa8, num_arg=1, b_ign_prot=false, allowRefs=false) at interpret.c:16983
#28 0x000000000046ebf1 in int_apply (fun=0xc02e08, ob=0x11e7aa8, num_arg=1, b_ign_prot=false, b_use_default=true) at interpret.c:17174
#29 0x000000000046f046 in sapply_int (fun=0xc02e08, ob=0x11e7aa8, num_arg=1, b_find_static=false, b_use_default=true) at interpret.c:17335
#30 0x0000000000408968 in parse_command (buff=0x7fff99e62e30 "clone drguard", from_efun=false) at actions.c:1068
#31 0x0000000000409282 in execute_command (str=0x7fff99e62e30 "clone drguard", ob=0x11e7aa8) at actions.c:1269
#32 0x00000000004119f7 in backend () at backend.c:677
#33 0x00000000004819ef in main (argc=2, argv=0x7fff99e64888) at main.c:673
This only occurs if the driver is compiled on a 64 bit machine. I have a CentOS 5.3 i386 machine I do development on locally which doesn't exhibit the problem, but a CentOS 5.3 x64 machine does. Since it's so reproducible I can pretty much do whatever is needed to try to debug it more. | ||||
Tags | No tags attached. | ||||
Attached Files | bug709.diff (1,528 bytes)
Index: trunk/src/version.sh
===================================================================
--- trunk/src/version.sh (Revision 2618)
+++ trunk/src/version.sh (Arbeitskopie)
@@ -17,7 +17,7 @@
 # A timestamp, to be used by bumpversion and other scripts.
 # It can be used, for example, to 'touch' this file on every build, thus
 # forcing revision control systems to add it on every checkin automatically.
-version_stamp="2009-05-30 12:00:00"
+version_stamp="Di 22. Dez 02:25:01 CET 2009"

 # The version number information
 version_micro=719
Index: trunk/src/prolang.y
===================================================================
--- trunk/src/prolang.y (Revision 2618)
+++ trunk/src/prolang.y (Arbeitskopie)
@@ -12997,7 +12997,12 @@
         }

         CURRENT_PROGRAM_SIZE--;
-        last_expression--;
+
+        /* If last_expression lies within the program area
+         * that was moved one bytecode adjust it accordingly.
+         */
+        if(last_expression > $<function_call_head>2.start)
+            last_expression--;
     }

     argument_level--;
@@ -13268,6 +13273,12 @@
         }

         CURRENT_PROGRAM_SIZE--;
+
+        /* If last_expression lies within the program area
+         * that was moved one bytecode adjust it accordingly.
+         */
+        if(last_expression > $<function_call_head>4.start)
+            last_expression--;
     }

     argument_level--;
| ||||
|
Always nice to have a reproducible problem. ;-) Could you supply us with the executable, a core dump and the source code of the crashing program? Using a driver compiled with -O0 and -ggdb3 would be best. (If the package gets big, we might exchange it using ftp or the like.) |
|
Ok, on second thought: since it is not a single program that exhibits this behaviour, it may be a better idea to look at your mudlib. Do you have a public (minimal) mudlib that triggers the crash and that we might use to reproduce it in our development environment? Additionally, could you please add config.h, machine.h, the Makefile and the output of 'gcc -dumpmachine' and 'gcc -dumpspecs' as well? I use a driver compiled for x86_64 as well, which usually does not crash. So I think there has to be some significant difference either in your build environment or in your mudlib. |
|
I've tarred up ldmud, a core dump, the output of dumpmachine and dumpspecs, config.h, machine.h and the Makefile into a tarball that can be found at http://www.thebigwave.net/709/709.tar.gz Unfortunately the mudlib is extremely large and custom. It started life as a stock LP 2.4.5 in 1990 and has gone through different drivers along the way while still being in compat mode. I'm pretty sure there aren't that many like it left around. The specific program that I found crashing, 'drguard.c', hasn't been modified since '01, when I think we were on an Amylaar driver. I notice that it even used: string sChat; sChat = allocate(2); sChat[0] = "String"; sChat[1] = "String"; type nomenclature, however changing that to string* sChat += does not avert the crash. Is there an easy way to dump all programs that are being compiled? I could perhaps form a minimal mudlib if I can easily track down what's being loaded, but there are several layers involved. Given how reproducible it is (I just run a test driver/lib on another port and crash it all the time there), I can do whatever debugging you need done, as I'm a professional game developer in my other life with experience shipping Linux based MMOs. |
|
Thanks for the data. Sometimes muds have a small, public version of their core lib without any secret objects and data; we could have checked whether it is affected as well and used it as a starting point for a test case. (Also, if we consider the possibility that programs are mis-compiled, it helps to know what the compiled bytecode should look like.) However, the driver writes a list of all compiled programs to stdout if you start it with the command line option -c. Are you sure you are using 3.3.718? I checked out 718 from our repository, but the line numbers in interpret.c don't match: line 8443 is the opening { in CASE(F_FLOAT);. Are there any modifications/patches in use? |
|
Ok, Gnomi told me that this seems to be 3.3.719. Then another thing: could you try 3.3.718 (and maybe even older releases) as well? If this bug was introduced recently, we may be able to narrow the problem down to one release. If you manage to assemble a test case, I could use git bisect to find the exact revision, which would be even better. We could then trace the compilation as well. Gnomi had a first look at the bytecode, which seems to be wrong. He needs the instrs.h and the source of a crashing program. |
|
Yes it is 719, I'll try on 718 and a smattering of older releases. I'm also putting together a public mudlib with as much trimmed out as possible. One quick question: is there an option for 'deterministic random' I can easily turn on? The case has the NPC spawn a random race, but I can trim out a few hundred files if it's always the same race. Actually, an interesting point/question: I'm going to add the NPC in question to the end of the autoload list and see if it crashes on startup with it listed there; this would reduce the set of files needed dramatically.
Another random thing: while you can get all of the info from config.h, I figured I'd mention the configure line I run with:
./configure --prefix=/usr/users --enable-compat-mode --enable-erq=xerq --with-erq-debug=0 --with-read-file-max-size=300000 --with-master-name=obj/master --with-max-array-size=0 --with-max-mapping-size=0 --with-max-mapping-keys=0 --with-max-players=100 --with-max-cost=5000000 --with-hard-malloc-limit=0 --enable-use-mysql --enable-use-mccp --enable-use-pcre=builtin --enable-use-xml=xml2 --enable-use-tls --with-portno=2777 LDFLAGS=-L/usr/lib64/mysql
The driver is also stock 719 other than the configure line listed above, and ldd is returning:
linux-vdso.so.1 => (0x00007fff7bffe000)
/lib64/rtkaio/librt.so.1 (0x00007f1b73cae000)
libnsl.so.1 => /lib64/libnsl.so.1 (0x000000396b000000)
libm.so.6 => /lib64/libm.so.6 (0x0000003969c00000)
libcrypt.so.1 => /lib64/libcrypt.so.1 (0x000000396b400000)
libssl.so.6 => /lib64/libssl.so.6 (0x000000325e200000)
libcrypto.so.6 => /lib64/libcrypto.so.6 (0x000000325ce00000)
libmysqlclient.so.15 => /usr/lib64/mysql/libmysqlclient.so.15 (0x000000325e600000)
libxml2.so.2 => /usr/lib64/libxml2.so.2 (0x0000003971000000)
libz.so.1 => /usr/lib64/libz.so.1 (0x000000396a000000)
libc.so.6 => /lib64/libc.so.6 (0x0000003969000000)
libpthread.so.0 => /lib64/libpthread.so.0 (0x0000003969800000)
/lib64/ld-linux-x86-64.so.2 (0x0000003968c00000)
libgssapi_krb5.so.2 => /usr/lib64/libgssapi_krb5.so.2 (0x000000325d200000)
libkrb5.so.3 => /usr/lib64/libkrb5.so.3 (0x000000325da00000)
libcom_err.so.2 => /lib64/libcom_err.so.2 (0x000000396e400000)
libk5crypto.so.3 => /usr/lib64/libk5crypto.so.3 (0x000000325de00000)
libdl.so.2 => /lib64/libdl.so.2 (0x0000003969400000)
libkrb5support.so.0 => /usr/lib64/libkrb5support.so.0 (0x000000325d600000)
libkeyutils.so.1 => /lib64/libkeyutils.so.1 (0x000000396b800000)
libresolv.so.2 => /lib64/libresolv.so.2 (0x000000396bc00000)
libselinux.so.1 => /lib64/libselinux.so.1 (0x000000325ca00000)
libsepol.so.1 => /lib64/libsepol.so.1 (0x000000396a800000)
in case that comes up at all.
Ok, adding it to the startup file repros the crash 100% of the time, and I just hardcoded the race to always be the same, which is also reproing 100% of the time. Test mudlib should be forthcoming...
Ok, at http://www.thebigwave.net/crashlib.tgz is a minimal mudlib that reproduces it. You'll need to start the driver with -D PUBLIC_MUDLIB (used in master, quest, and commandd I believe) and it should start up and crash instantly given the aforementioned executable. The instrs.h that was used to compile the debug version given before is at http://www.thebigwave.net/instrs.h
I think that's pretty much everything you could ask for/need; otherwise I'll be happy to supply more information. |
|
I tried the minimal mudlib and it complains about missing files in /obj/simul_efuns/ (included by /obj/simul_efun.c). |
|
I was afraid of that. There are some cases where includes are used and -c doesn't catch them; I'll quickly grep the mudlib for includes and add anything that's not in a sys directory... I did a single pass through it and updated the lib at http://www.thebigwave.net/crashlib.tgz , I'll go and start a VM to verify it as well... |
|
Ok, I verified that the crashlib.tgz that's there now (size 213927, date stamp Dec 21 14:33) boots without an error up to the point of the crash (while using -D PUBLIC_MUDLIB). |
|
Thank you very much for the test case. I can reproduce it on my system (and on a different platform) and have isolated the revision which introduced the error. |
|
I attached a patch that fixes this case. Unfortunately it doesn't seem to apply to 0000683 as well. The problem was that the compiler adjusted last_expression wrongly after it had moved some parts of the program, so that last_expression might point to an argument instead of an instruction (and when that argument happened to be 0x14 (F_NUMBER) and the distance between last_expression and CURRENT_PROGRAM_SIZE happened to be 9, the compiler did some optimizations it should not have done). |
|
Bugfix committed as r2809. |
Date Modified | Username | Field | Change |
---|---|---|---|
2009-12-20 21:05 | Wildcat | New Issue | |
2009-12-21 02:59 | zesstra | Project | LDMud => LDMud 3.3 |
2009-12-21 03:07 | zesstra | Note Added: 0001653 | |
2009-12-21 03:07 | zesstra | Status | new => acknowledged |
2009-12-21 03:07 | zesstra | OS | => CentOS |
2009-12-21 03:07 | zesstra | OS Version | => 5.3 |
2009-12-21 03:07 | zesstra | Platform | => x86_64 |
2009-12-21 03:07 | zesstra | Target Version | => 3.3.720 |
2009-12-21 03:38 | zesstra | Note Added: 0001654 | |
2009-12-21 11:55 | Wildcat | Note Added: 0001655 | |
2009-12-21 14:11 | zesstra | Note Added: 0001656 | |
2009-12-21 15:00 | zesstra | Note Added: 0001657 | |
2009-12-21 15:59 | Wildcat | Note Added: 0001658 | |
2009-12-21 16:14 | Gnomi | Note Added: 0001659 | |
2009-12-21 16:27 | Wildcat | Note Added: 0001660 | |
2009-12-21 16:39 | Wildcat | Note Added: 0001661 | |
2009-12-21 17:01 | zesstra | Note Added: 0001662 | |
2009-12-21 17:01 | zesstra | Status | acknowledged => confirmed |
2009-12-21 19:40 | Gnomi | File Added: bug709.diff | |
2009-12-21 19:45 | Gnomi | Note Added: 0001663 | |
2009-12-22 09:15 | Gnomi | Note Added: 0001664 | |
2009-12-22 09:15 | Gnomi | Status | confirmed => resolved |
2009-12-22 09:15 | Gnomi | Fixed in Version | => 3.3.720 |
2009-12-22 09:15 | Gnomi | Resolution | open => fixed |
2009-12-22 09:15 | Gnomi | Assigned To | => Gnomi |
2009-12-22 09:17 | Gnomi | Relationship added | has duplicate 0000708 |