View Issue Details
ID | Project | Category | View Status | Date Submitted | Last Update |
---|---|---|---|---|---|
0000574 | LDMud 3.3 | Runtime | public | 2008-09-17 15:01 | 2009-10-29 03:21 |
Reporter | wedsall | Assigned To | |||
Priority | normal | Severity | crash | Reproducibility | random |
Status | closed | Resolution | unable to reproduce | ||
Product Version | 3.3.717 | ||||
Summary | 0000574: random crashes possibly memory related | ||||
Description | Mud crashes randomly -- seems to happen near the beginning of the boot. A reboot of the server seemed to clear up the mess after 3 consecutive crashes all within 15 minutes to 5 hours. I captured the log files from 2 crashes.
Here is the stderr from crash 1:
2008.09.14 03:01:38 write socket (compressed): wrote 1460, should be 1024.
2008.09.14 03:01:44 write socket (compressed): wrote 2920, should be 108.
2008.09.14 03:02:04 write socket: wrote 267, should be 1024.
2008.09.14 03:02:04 write socket: wrote 872, should be 1024.
2008.09.14 03:02:04 write socket: wrote 266, should be 1024.
2008.09.14 03:02:04 write socket: wrote 873, should be 1024.
2008.09.14 03:02:04 write socket: wrote 873, should be 1024.
2008.09.14 03:02:04 write socket: wrote 873, should be 1024.
2008.09.14 03:02:04 write socket: wrote 873, should be 1024.
2008.09.14 03:02:05 write socket: wrote 436, should be 1023.
2008.09.14 03:02:05 write socket: wrote 873, should be 1024.
2008.09.14 03:02:05 write socket: wrote 873, should be 1024.
2008.09.14 03:02:05 write socket: wrote 873, should be 1024.
2008.09.14 03:02:05 write socket: wrote 873, should be 1024.
2008.09.14 03:02:05 write socket: wrote 873, should be 1024.
2008.09.14 03:02:05 write socket: wrote 872, should be 1024.
2008.09.14 03:02:05 write socket: wrote 873, should be 1024.
2008.09.14 03:02:05 write socket: wrote 873, should be 1024.
2008.09.14 03:02:05 write socket: wrote 1005, should be 1024.
2008.09.14 03:02:05 write socket: wrote 873, should be 1024.
2008.09.14 03:02:06 comm: write EWOULDBLOCK. Message discarded.
2008.09.14 03:02:16 write socket: wrote 569, should be 1024.
2008.09.14 03:02:16 write socket: wrote 721, should be 1024.
2008.09.14 03:02:16 write socket: wrote 873, should be 1024.
2008.09.14 03:02:16 write socket: wrote 721, should be 1024.
..
[xerq] read: Success
2008.09.14 05:09:31 [xerq] Demon exiting.
Here is some stdout from crash 1: 2008.09.14 02:38:08 Error in master_ob->valid_read() 2008.09.14 02:38:08 eval_cost too big 2100022 2008.09.14 02:38:08 Caught error: Too long evaluation. Execution aborted. .. I believe the above was my fault, and I repaired. 2008.09.14 03:37:40 ... execution continues. 2008.09.14 03:37:40 MCCP-DEBUG: 'obj/player#991' mccp ended ]/ / / / / / / / ] ^ ^ ^ ^ ^ ^ ^ ] / / / / / / / / / / / / ] ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ] / / / / / / / / / / / / / ] ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ]/ / / / / / / / / / / / ] ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ] / / / / / / / / / / / / / / ]^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ] / / / / / / / / / / ] ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ]/ / / / / / / / / / ] ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ]/ / / / / / / / / ] ^ ^ ^ ^ ^ ^ ^ ^ ] / / / / / / ] ^ ^ ^ ^ ^ ^ ] / / / / / / / ] ^ ^ ^ ^ ^ ^ ^ ] / / / / / / / / / / ]^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ] / / / / / / / / ]^ ^ ^ ^ ^ ^ ^ ^ ^ ] / / / / / / / / / ] ^ ^ ^ ^ ^ ^ ^ ^ ^ ]/ / / / / / / / / / / ] ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ]/ / / / / / / / ] ^ ^ ^ ^ ^ ^ ^ ] / / / / / / / / / / / / ] ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ] / / / / / / / / / / / / / ] ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ]/ / / / / / / / / / / / ] ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ]/ / / / / / / / ] ^ ^ ^ ^ ^ ^ ^ ] / / / / / / / / / / / / ] ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ] / / / / / / / / / / / / / ] ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ 2008.09.14 03:45:20 Caught error: Bad arg 1 to call_other(): got 'number', expec ted 'string/array/object'. ..This rain looking message came through twice :) 2008.09.14 04:47:56 Caught error: Bad arg 1 to call_other(): got 'number', expec ted 'string/array/object'. 2008.09.14 04:47:56 ... execution continues. 2008.09.14 04:47:56 Caught error: Bad arg 1 to call_other(): got 'number', expec ted 'string/array/object'. 2008.09.14 04:47:56 ... execution continues. 2008.09.14 04:47:56 Caught error: Bad arg 1 to call_other(): got 'number', expec ted 'string/array/object'. 2008.09.14 04:47:56 ... execution continues. 
2008.09.14 04:47:56 Caught error: Bad arg 1 to call_other(): got 'number', expected 'string/array/object'.
2008.09.14 04:47:56 ... execution continues.
2008.09.14 04:47:56 Caught error: Bad arg 1 to call_other(): got 'number', expected 'string/array/object'.
2008.09.14 04:47:56 ... execution continues.
2008.09.14 04:47:56 Caught error: Bad arg 1 to call_other(): got 'number', expected 'string/array/object'.
2008.09.14 04:47:56 ... execution continues.
2008.09.14 04:55:29 Caught error: Bad arg 1 to call_other(): got 'number', expected 'string/array/object'.
2008.09.14 04:55:29 ... execution continues.
2008.09.14 04:55:51 Caught error: Bad arg 1 to call_other(): got 'number', expected 'string/array/object'.
2008.09.14 04:55:51 ... execution continues.
2008.09.14 04:56:25 Caught error: Bad arg 1 to call_other(): got 'number', expected 'string/array/object'.
2008.09.14 04:56:25 ... execution continues.
2008.09.14 04:57:57 Caught error: Bad arg 1 to call_other(): got 'number', expected 'string/array/object'.
2008.09.14 04:57:57 ... execution continues.
2008.09.14 04:58:06 Caught error: Bad arg 1 to call_other(): got 'number', expected 'string/array/object'.
2008.09.14 04:58:06 ... execution continues.
2008.09.14 04:59:11 MCCP-DEBUG: 'secure/login/login#3815' mccp started (86)
2008.09.14 04:59:14 Caught error: Bad arg 1 to call_other(): got 'number', expected 'string/array/object'.
2008.09.14 04:59:14 ... execution continues.
2008.09.14 04:59:14 Caught error: Bad arg 1 to call_other(): got 'number', expected 'string/array/object'.
2008.09.14 04:59:14 ... execution continues.
2008.09.14 04:59:14 Caught error: Bad arg 1 to call_other(): got 'number', expected 'string/array/object'.
2008.09.14 04:59:14 ... execution continues.
2008.09.14 04:59:14 Caught error: Bad arg 1 to call_other(): got 'number', expected 'string/array/object'.
2008.09.14 04:59:14 ... execution continues.
2008.09.14 04:59:14 Caught error: Bad arg 1 to call_other(): got 'number', expected 'string/array/object'.
2008.09.14 04:59:14 ... execution continues.
2008.09.14 04:59:14 Caught error: Bad arg 1 to call_other(): got 'number', expected 'string/array/object'.
2008.09.14 04:59:14 ... execution con
Nothing much in crash 1's debug log. Here is the debug log from crash 2:
51') line 1209
' weapon_hit' in 'guilds/devil/objs/soulharvester.c' ('guilds/devil/objs/soulharvester#8151') line 266
' master_hit' in ' obj/monster.c' ('domains/areas/varrak/tree/monsters/pixie#9405') line 1359
' master_hit' in ' obj/living.c' ('domains/areas/varrak/tree/monsters/pixie#9405') line 1650
' CATCH' in ('domains/areas/varrak/tree/monsters/pixie#9405')
' store_damage' in 'obj/daemon/damage_d.c' (' obj/daemon/damage_d') line 81
2008.09.14 20:44:01 ... execution continues.
2008.09.14 20:44:30 mem_free: block 0xe4f6be8 magic match failed for slab e4f5600: size 4294967280, expected 2360aab8, found e4f6e88
2008.09.14 20:44:30 Dump of the call chain:
No program to trace.
Here is the stdout from crash 2: 8fa410a: 18 2 clit (2: 3) 8fa410c: 46 / (3: 4) 8fa410d: 50 < (2: 3) 8fa410e: 107 10 branch_when_zero (1: 2) 8fa4110: 31 0 local (0: 1) 8fa4112: 18 2 clit (1: 2) 8fa4114: 46 / (2: 3) 8fa4115: 119 166 push_identifier_lvalue (1: 2) 8fa4117: 79 (void)+= (2: 3) 8fa4118: 106 branch (0: 1) line 211 8fa411f: 8 166 identifier (0: 1) line 213 8fa4121: 14 35000 number (1: 2) 8fa4126: 48 > (2: 3) 8fa4127: 107 branch_when_zero (1: 2) 8fa4131: 98 save_arg_frame (0: 1) line 216 8fa4132: 10 9 cstring0 (1: 2) 8fa4134: 16 const1 (2: 3) 8fa4135: 415 7 call_out (3: 4) 8fa4137: 15 const0 (1: 2) 8fa4138: 99 restore_arg_frame (2: 3) 8fa4139: 93 pop_value (1: 2) 8fa413a: 25 return0 (0: 1) line 217 secure/simul_efun secure/simul_efun.c line 773 8d80f6e: 97 770 clear_locals (0: 4) line 773 8d80f71: 31 1 local (0: 4) line 776 8d80f73: 107 112 branch_when_zero (1: 5) 8d80f75: 8 8 identifier (0: 4) line 777 8d80f77: 31 0 local (1: 5) 8d80f79: 124 1 push_local_variable_lvalue (2: 6) 8d80f7b: 38 --x (3: 7) 8d80f7c: 62 index (3: 7) 8d80f7d: 62 index (2: 6) 8d80f7e: 124 4 push_local_variable_lvalue (1: 5) 8d80f80: 41 = (2: 6) 8d80f81: 198 pointerp (1: 5) 8d80f82: 39 9 && (1: 5) 8d80f84: 98 save_arg_frame (0: 4) 8d80f85: 31 4 local (1: 5) 8d80f87: 15 const0 (2: 6) 8d80f88: 446 38 member (3: 7) 8d80f8a: 99 restore_arg_frame (2: 6) 8d80f8b: 15 const0 (1: 5) 8d80f8c: 49 >= (2: 6) 8d80f8d: 107 branch_when_zero (1: 5) 8d80fca: 31 4 local (0: 4) line 785 8d80fcc: 108 branch_when_non_zero (1: 5) 8d80fd8: 98 save_arg_frame (0: 4) line 787 8d80fd9: 10 236 cstring0 (1: 5) 8d80fdb: 16 const1 (2: 6) 8d80fdc: 31 0 local (3: 7) 8d80fde: 31 1 local (4: 8) 8d80fe0: 415 7 call_out (5: 9) 8d80fe2: 15 const0 (1: 5) 8d80fe3: 99 restore_arg_frame (2: 6) 8d80fe4: 93 pop_value (1: 5) 8d80fe5: 25 return0 (0: 4) line 789 domains/areas/movalia/rooms/people4 <lambda ?> line 0 8d59853: 98 save_arg_frame (0: -1) line 0 8d59854: 173 0 lambda_cconstant (1: 0) 8d59856: 172 previous_object0 (2: 1) 
8d59857: 207 this_object (3: 2) 8d59858: 430 funcall (4: 3) secure/master secure/master.c line 373 8d54e16: 31 0 local (0: 4) line 373 8d54e18: 107 38 branch_when_zero (1: 5) 8d54e1a: 97 258 clear_locals (0: 4) line 375 8d54e1d: 98 save_arg_frame (0: 4) 8d54e1e: 15 const0 (1: 5) 8d54e1f: 185 no_warn_deprecated (2: 6) 8d54e20: 22 61628 closure (2: 6) line 376 8d54e25: 31 1 local (3: 7) 8d54e27: 10 50 cstring0 (4: 8) 8d54e29: 16 const1 (5: 9) 8d54e2a: 167 4 aggregate (6: 10) 8d54e2d: 393 50 unbound_lambda (3: 7) 8d54e2f: 31 0 local (2: 6) 8d54e31: 413 5 bind_lambda (3: 7) 8d54e33: 99 restore_arg_frame (2: 6) 8d54e34: 124 2 push_local_variable_lvalue (1: 5) 8d54e36: 42 (void)= (2: 6) 8d54e37: 98 save_arg_frame (0: 4) line 377 8d54e38: 31 2 local (1: 5) 8d54e3a: 430 funcall (2: 6) domains/areas/movalia/rooms/people4 <lambda ?> line 0 a8a004b: 98 save_arg_frame (0: 6) line 0 a8a004c: 173 0 lambda_cconstant (1: 7) a8a004e: 173 1 lambda_cconstant (2: 8) a8a0050: 16 const1 (3: 9) a8a0051: 188 call_other (4: 10) domains/areas/movalia/rooms/people4 domains/areas/movalia/rooms/people4.c line 6 a4dfa9a: 98 save_arg_frame (0: 10) line 6 a4dfa9b: 31 0 local (1: 11) a4dfa9d: 112 call_inherited (2: 12) domains/areas/movalia/rooms/people4 room/room.c line 160 8eedfae: 31 0 local (0: 12) line 160 8eedfb0: 107 1 branch_when_zero (1: 13) 8eedfb2: 25 return0 (0: 12) domains/areas/movalia/rooms/people4 domains/areas/movalia/rooms/people4.c line 6 a4dfaa2: 99 restore_arg_frame (2: 12) line 6 a4dfaa3: 93 pop_value (1: 11) a4dfaa4: 31 0 local (0: 10) line 7 a4dfaa6: 107 1 branch_when_zero (1: 11) a4dfaa8: 25 return0 (0: 10) domains/areas/movalia/rooms/people4 <lambda ?> line 0 a8a0052: 99 restore_arg_frame (2: 8) line 0 a8a0053: 24 return (1: 7) secure/master secure/master.c line 377 8d54e3c: 99 restore_arg_frame (2: 6) line 377 8d54e3d: 93 pop_value (1: 5) 8d54e3e: 106 branch (0: 4) line 379 8d54e49: 31 1 local (0: 4) line 382 8d54e4b: 108 6401 branch_when_non_zero (1: 5) 8d54e4e: 98 
save_arg_frame (0: 4) line 385 8d54e4f: 31 1 local (1: 5) 8d54e51: 10 51 cstring0 (2: 6) 8d54e53: 10 52 cstring0 (3: 7) 8d54e55: 226 15 time (4: 8) 8d54e57: 188 call_other (5: 9) 8d54e58: 99 restore_arg_frame (2: 6) 8d54e59: 93 pop_value (1: 5) 8d54e5a: 98 save_arg_frame (0: 4) line 386 8d54e5b: 31 1 local (1: 5) 8d54e5d: 10 53 cstring0 (2: 6) 8d54e5f: 10 54 cstring0 (3: 7) 8d54e61: 188 call_other (4: 8) 8d54e62: 99 restore_arg_frame (2: 6) 8d54e63: 40 5 || (1: 5) 8d54e65: 14 900 number (0: 4) 8d54e6a: 24 return (1: 5) domains/areas/movalia/rooms/people4 <lambda ?> line 0 8d5985a: 99 restore_arg_frame (2: 1) line 0 8d5985b: 24 return (1: 0) guilds/demoniser/rooms/ds2 <lambda ?> line 0 8d59853: 98 save_arg_frame (0: -1) 8d59854: 173 0 lambda_cconstant (1: 0) 8d59856: 172 previous_object0 (2: 1) 8d59857: 207 this_object (3: 2) 8d59858: 430 funcall (4: 3) secure/master secure/master.c line 373 8d54e16: 31 0 local (0: 4) line 373 8d54e18: 107 38 branch_when_zero (1: 5) 8d54e1a: 97 258 clear_locals (0: 4) line 375 8d54e1d: 98 save_arg_frame (0: 4) 8d54e1e: 15 const0 (1: 5) 8d54e1f: 185 no_warn_deprecated (2: 6) 8d54e20: 22 61628 closure (2: 6) line 376 8d54e25: 31 1 local (3: 7) 8d54e27: 10 50 cstring0 (4: 8) 8d54e29: 16 const1 (5: 9) 8d54e2a: 167 4 aggregate (6: 10) 8d54e2d: 393 50 unbound_lambda (3: 7) 8d54e2f: 31 0 local (2: 6) 8d54e31: 413 5 bind_lambda (3: 7) 8d54e33: 99 restore_arg_frame (2: 6) 8d54e34: 124 2 push_local_variable_lvalue (1: 5) 8d54e36: 42 (void)= (2: 6) 8d54e37: 98 save_arg_frame (0: 4) line 377 8d54e38: 31 2 local (1: 5) 8d54e3a: 430 funcall (2: 6) guilds/demoniser/rooms/ds2 <lambda ?> line 0 a8a004b: 98 save_arg_frame (0: 6) line 0 a8a004c: 173 0 lambda_cconstant (1: 7) a8a004e: 173 1 lambda_cconstant (2: 8) a8a0050: 16 const1 (3: 9) a8a0051: 188 call_other (4: 10) guilds/demoniser/rooms/ds2 guilds/demoniser/rooms/ds2.c line 20 10e5c096: 8 23 identifier (0: 10) line 20 10e5c098: 108 branch_when_non_zero (1: 11) 10e5c0a6: 98 save_arg_frame (0: 10) 
line 25 10e5c0a7: 31 0 local (1: 11) 10e5c0a9: 112 call_inherited (2: 12) guilds/demoniser/rooms/ds2 guilds/demoniser/rooms/shadowroom.c line 15 110077ea: 98 save_arg_frame (0: 12) line 15 110077eb: 31 0 local (1: 13) 110077ed: 112 call_inherited (2: 14) guilds/demoniser/rooms/ds2 room/room.c line 160 8eedfae: 31 0 local (0: 14) line 160 8eedfb0: 107 1 branch_when_zero (1: 15) 8eedfb2: 25 return0 (0: 14) guilds/demoniser/rooms/ds2 guilds/demoniser/rooms/shadowroom.c line 15 110077f2: 99 restore_arg_frame (2: 14) line 15 110077f3: 93 pop_value (1: 13) 110077f4: 31 0 local (0: 12) line 16 110077f6: 107 1 branch_when_zero (1: 13) 110077f8: 25 return0 (0: 12) guilds/demoniser/rooms/ds2 guilds/demoniser/rooms/ds2.c line 25 10e5c0ae: 99 restore_arg_frame (2: 12) line 25 10e5c0af: 93 pop_value (1: 11) 10e5c0b0: 31 0 local (0: 10) line 26 10e5c0b2: 107 1 branch_when_zero (1: 11) 10e5c0b4: 25 return0 (0: 10) guilds/demoniser/rooms/ds2 <lambda ?> line 0 a8a0052: 99 restore_arg_frame (2: 8) line 0 a8a0053: 24 return (1: 7) secure/master secure/master.c line 377 8d54e3c: 99 restore_arg_frame (2: 6) line 377 8d54e3d: 93 pop_value (1: 5) 8d54e3e: 106 branch (0: 4) line 379 8d54e49: 31 1 local (0: 4) line 382 8d54e4b: 108 6401 branch_when_non_zero (1: 5) 8d54e4e: 98 save_arg_frame (0: 4) line 385 8d54e4f: 31 1 local (1: 5) 8d54e51: 10 51 cstring0 (2: 6) 8d54e53: 10 52 cstring0 (3: 7) 8d54e55: 226 15 time (4: 8) 8d54e57: 188 call_other (5: 9) 8d54e58: 99 restore_arg_frame (2: 6) 8d54e59: 93 pop_value (1: 5) 8d54e5a: 98 save_arg_frame (0: 4) line 386 8d54e5b: 31 1 local (1: 5) 8d54e5d: 10 53 cstring0 (2: 6) 8d54e5f: 10 54 cstring0 (3: 7) 8d54e61: 188 call_other (4: 8) 8d54e62: 99 restore_arg_frame (2: 6) 8d54e63: 40 5 || (1: 5) 8d54e65: 14 900 number (0: 4) 8d54e6a: 24 return (1: 5) guilds/demoniser/rooms/ds2 <lambda ?> line 0 8d5985a: 99 restore_arg_frame (2: 1) line 0 8d5985b: 24 return (1: 0) a6b7071: 124 2 42 98 208 10 48 98 No program to trace. 
2008.09.14 20:44:30 LDMud aborting on fatal error.
and finally the stderr from crash 2:
2008.09.14 05:10:01 [xerq] XERQ Aug 15 2006: Path 'erq', debuglevel 0
2008.09.14 05:10:01 [xerq] Demon started
2008.09.14 05:10:04 Failed to load file: 'players/parisboy/tokyo/shoquest'.
2008.09.14 05:10:04 Failed to load file: 'players/tiberius/quest_object'.
2008.09.14 06:26:17 write socket: wrote 417, should be 1024.
2008.09.14 06:32:47 Failed to load file: 'players/rapier/objects/cball'.
2008.09.14 07:35:21 write socket: wrote 79, should be 681.
2008.09.14 07:44:09 Failed to load file: 'players/undertaker/items/bluegem'.
2008.09.14 07:44:09 Failed to load file: 'players/undertaker/items/bluegem'.
2008.09.14 08:05:38 write socket: wrote 418, should be 1024.
2008.09.14 08:10:42 write socket: wrote 417, should be 1024.
2008.09.14 08:14:11 Failed to load file: 'players/undertaker/items/bluegem'.
2008.09.14 08:17:48 write socket: wrote 416, should be 1024.
2008.09.14 09:08:44 write socket: wrote 275, should be 579.
2008.09.14 09:09:20 write socket: wrote 416, should be 1024.
2008.09.14 16:08:20 obj/living/soul.c line 1552: syntax error before ' ob = find'.
2008.09.14 16:08:20 obj/living/soul.c line 1553: Bad assignment: illegal lhs (target) before end of line.
2008.09.14 16:08:20 Error in loading object: 'obj/living/soul'.
2008.09.14 20:35:50 guilds/demoniser/rooms/quest/indoorquestroom.c line 51: Warning: casting a value to its own type: int before ' 2)'.
2008.09.14 20:44:30 mem_free: block 0xe4f6be8 magic match failed for slab e4f5600: size 4294967280, expected 2360aab8, found e4f6e88
[xerq] read: Success
2008.09.14 20:44:31 [xerq] Demon exiting. | ||||
Additional Information | > Another crash, this time with different log data.
Ok, then right at the beginning: Did you get a core dump? ;-)
> Ldmud debug log (with some mud paths cut out):
> 2008.09.14 20:44:30 mem_free: block 0xe4f6be8 magic match failed for
> slab e4f5600: size 4294967280, expected 2360aab8, found e4f6e88
> 2008.09.14 20:44:30 Dump of the call chain:
> No program to trace.
Ok, there are several possibilities here I think. One is that the allocator tried to free a slab whose memory was (at least partially) corrupted/overwritten by someone else. Each slab has a magic value at the beginning to detect exactly that kind of problem. The size information of the block was corrupted as well. Another one would be that someone called xfree() for a pointer which doesn't point to the beginning of a memory block previously allocated by xalloc(). The memory allocator then calls fatal(), which usually dumps the LPC stack trace, but apparently there was none to print, which suggests that the driver was in the backend cycle and not executing some LPC program, I guess. fatal() also calls dump_core(), which takes care of dumping the core if allowed.
OK, these are the last LPC instructions that were executed before the crash, but unfortunately they don't have to be related, as nobody knows so far when this memory block was corrupted. It seems that some lambda from guilds/demoniser/rooms/ds2 was executed last; you may have a look at that, but I don't really expect that you will find something. As one possible cause for the crash is memory corruption, I advise you to enable --enable-malloc-trace and --enable-malloc-lpc-trace. That will consume some additional memory but may give some more hints. But I don't see a realistic chance of solving the issue without a core dump, preferably one written by a driver built without any optimization.
> I'm starting to think my machine has some bad hardware.. maybe memory?
That would be a possibility as well, yes.
Besides a genuine bug in the driver. ;-) I really think you should file a bug at the bug tracker and attach the important part of your log files, your config.h and maybe core dumps, executable, and additional information about your system (architecture, OS) there. I guessed the memory allocator in use, but that doesn't have to be correct.
Lars Duening wrote:
> I don't think this is a case of bad memory - the values are too meaningful.
[...]
> Putting this all together, I think we have a classic case of invalid
> memory access here: somehow the control field before the block got
> decremented by 2. When the allocator calculated the address of the slab,
> it didn't get the actual slab header, but instead an address 2 words
> into the slab header. Thus the slab->prev pointer (pointing to 0xe4f6e88)
> was mistaken as the magic word, and slab->next (being 0) was mistaken as
> the slab's size. | ||||
Tags | No tags attached. | ||||
|
Complete Mail from Lars just FTR:
> 2008.09.14 20:44:30 mem_free: block 0xe4f6be8 magic match failed for slab e4f5600: size 4294967280, expected 2360aab8, found e4f6e88
I don't think this is a case of bad memory - the values are too meaningful.
Taken literally, this message means that a block was freed in a slab not suited for its size (e.g. a 4-Byte block in a slab for 8-Byte blocks). But the found magic value is peculiar - it's a value which looks like a valid memory address (the slab is at 0xe4f5600, the block at 0xe4f6be8, the magic value is 0xe4f6e88).
The size (which is the block's size, but read from the slab header) is 4294967280 or 0xFFFFFFF0. This is not the exact value listed in the slab, but essentially (slab->size-4) * 4; with this formula the size listed in the slab must have been 0.
The allocator finds the slab for a given block by taking the control word before the block and extracting the offset to the slab start from it, with the offset given in word_t's.
Putting this all together, I think we have a classic case of invalid memory access here: somehow the control field before the block got decremented by 2. When the allocator calculated the address of the slab, it didn't get the actual slab header, but instead an address 2 words into the slab header. Thus the slab->prev pointer (pointing to 0xe4f6e88) was mistaken as the magic word, and slab->next (being 0) was mistaken as the slab's size.
However, to further debug this problem, you need at minimum a good coredump; and having MALLOC_TRACE and MALLOC_LPC_TRACE enabled wouldn't hurt either. |
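For readers who want to check the arithmetic in the mail above, here is a small toy model in Python. It is only a sketch and not the driver's code: the header layout (magic, size, prev, next as consecutive words) is inferred from the mail's description, `reported_size` and `read_header` are hypothetical helper names, and the size value 64 is a made-up placeholder. Only the expected magic, the found value and next being 0 come from the log.

```python
MASK32 = 0xFFFFFFFF  # assumption: 32-bit words, as the 0xFFFFFFF0 size suggests

def reported_size(size_field: int) -> int:
    """Size as printed in the log, using the mail's formula (size - 4) * 4,
    wrapped to an unsigned 32-bit word."""
    return ((size_field - 4) * 4) & MASK32

# Hypothetical slab header as a list of words: magic, size, prev, next.
# 64 is a placeholder; the real size field is not known from the report.
EXPECTED_MAGIC = 0x2360AAB8
slab_header = [EXPECTED_MAGIC, 64, 0xE4F6E88, 0]

def read_header(words, word_offset_error=0):
    """Return (magic, size) as seen when the computed slab start is
    off by the given number of words."""
    start = word_offset_error
    return words[start], words[start + 1]

# Correct offset: the magic word matches.
magic, size = read_header(slab_header)

# Control word decremented by 2: the header is read 2 words too far in,
# so prev is taken for the magic word and next for the size.
bad_magic, bad_size = read_header(slab_header, 2)

print(hex(bad_magic), reported_size(bad_size))
```

With these assumptions the script prints `0xe4f6e88 4294967280`, which are the "found" and "size" values from the mem_free message.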
|
I am setting this to 'feedback' state until William gets a core dump or other additional information usable for tracking this down. Seems we have to wait until then. |
|
I doubt that you'll receive a core dump here, after this long time... |
|
Sorry for the long delay. I think we talked offline at some point. I replaced the server with better, newer hardware and this resolved the problem. I believe the issue was bad server memory; however, memtest did not report the memory as bad. At least, something was wrong with the old server: memory, motherboard, etc. |
|
Like Lars I doubt that it was bad memory (alone?). As Lars said, the values were too meaningful. But since we can't proceed without a core dump and this problem did not occur again for some reason, I close this as 'unable to reproduce'. If it surfaces again, please tell me to re-open. |
Date Modified | Username | Field | Change |
---|---|---|---|
2008-09-17 15:01 | wedsall | New Issue | |
2008-09-17 16:09 | zesstra | Note Added: 0000786 | |
2008-09-21 11:24 | zesstra | Note Added: 0000788 | |
2008-09-21 11:24 | zesstra | Status | new => feedback |
2009-10-28 18:38 | Coogan | Note Added: 0001569 | |
2009-10-28 18:58 | wedsall | Note Added: 0001570 | |
2009-10-29 03:21 | zesstra | Note Added: 0001571 | |
2009-10-29 03:21 | zesstra | Status | feedback => closed |
2009-10-29 03:21 | zesstra | Resolution | open => unable to reproduce |