View Issue Details
ID | Project | Category | View Status | Date Submitted | Last Update |
---|---|---|---|---|---|
0000677 | LDMud 3.2 | Runtime | public | 2009-09-19 13:06 | 2009-10-06 01:32 |
Reporter | Coogan | Assigned To | zesstra | ||
Priority | normal | Severity | crash | Reproducibility | always |
Status | resolved | Resolution | fixed | ||
Product Version | 3.2.16 | ||||
Summary | 0000677: starting an SSL-secured connection crashes the driver | ||||
Description | Running on Debian stable and compiled with SSL-support (openssl), starting an SSL-connection on the mud-SSL-port crashes the driver, dumping a core file. coogan@mud:/home/tubmud/mudlib$ telnet-ssl -z ssl mud.tubmud.de 7682 Trying 85.214.44.4... SSL_connect: Success coogan@mud:/home/tubmud/mudlib$ LDMUD-3.2.14 works fine. The handshake code looks a bit different to 3.2.14, so I think this is a regression. | ||||
Additional Information | tubmud@mud:~/mudlib$ gdb ../mudbin/driver-3.2.16 core.3917 GNU gdb 6.8-debian Copyright (C) 2008 Free Software Foundation, Inc. License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html> This is free software: you are free to change and redistribute it. There is NO WARRANTY, to the extent permitted by law. Type "show copying" and "show warranty" for details. This GDB was configured as "i486-linux-gnu"... warning: core file may not match specified executable file. Reading symbols from /lib/libnsl.so.1...done. Loaded symbols for /lib/libnsl.so.1 Reading symbols from /lib/libm.so.6...done. Loaded symbols for /lib/libm.so.6 Reading symbols from /lib/libcrypt.so.1...done. Loaded symbols for /lib/libcrypt.so.1 Reading symbols from /usr/lib/i686/cmov/libssl.so.0.9.8...done. Loaded symbols for /usr/lib/i686/cmov/libssl.so.0.9.8 Reading symbols from /usr/lib/i686/cmov/libcrypto.so.0.9.8...done. Loaded symbols for /usr/lib/i686/cmov/libcrypto.so.0.9.8 Reading symbols from /usr/lib/libz.so.1...done. Loaded symbols for /usr/lib/libz.so.1 Reading symbols from /lib/libc.so.6...done. Loaded symbols for /lib/libc.so.6 Reading symbols from /lib/ld-linux.so.2...done. Loaded symbols for /lib/ld-linux.so.2 Reading symbols from /lib/libdl.so.2...done. Loaded symbols for /lib/libdl.so.2 Reading symbols from /lib/libnss_files.so.2...done. Loaded symbols for /lib/libnss_files.so.2 Core was generated by `/home/tubmud/mudbin/driver -Mkernel/master -DTUBMUD=1 -D__START_TIME__="_S_T_20'. Program terminated with signal 11, Segmentation fault. [New process 3917] #0 0x00a4782c in sk_value () from /usr/lib/i686/cmov/libcrypto.so.0.9.8 (gdb) bt #0 0x00a4782c in sk_value () from /usr/lib/i686/cmov/libcrypto.so.0.9.8 0000001 0x00fa08b8 in ssl3_accept () from /usr/lib/i686/cmov/libssl.so.0.9.8 0000002 0x00fb8d7a in SSL_accept () from /usr/lib/i686/cmov/libssl.so.0.9.8 0000003 0x00faae70 in ssl23_get_client_hello () from /usr/lib/i686/cmov/libssl.so.0.9.8 0000004 0x00fab768 in ssl23_accept () from /usr/lib/i686/cmov/libssl.so.0.9.8 0000005 0x00fb8713 in SSL_do_handshake () from /usr/lib/i686/cmov/libssl.so.0.9.8 0000006 0x080bf38d in tls_continue_handshake (ip=0xa8ad054) at pkg-tls.c:774 0000007 0x08063977 in get_message (buff=0xbfcd3f38 "") at comm.c:2896 0000008 0x08054f81 in backend () at backend.c:537 0000009 0x080a3805 in main (argc=14, argv=0xbfcd58c4) at main.c:517 (gdb) | ||||
Tags | No tags attached. | ||||
Attached Files | 0001-Register-our-own-allocator-functions-with-openssl.patch (1,724 bytes)
From c03f18dbcbd52da259f14c1f77879573b5f56cec Mon Sep 17 00:00:00 2001 From: zesstra <zesstra@zesstra.de> Date: Sun, 4 Oct 2009 22:30:39 +0200 Subject: [PATCH] Register our own allocator functions with openssl. Use CRYPTO_set_mem_functions() to register pointers to our own allocator in openssl. Otherwise, openssl may use the system allocator and memory get used twice. (#677) (pkg-tls.c) --- src/pkg-tls.c | 31 +++++++++++++++++++++++++++++++ 1 files changed, 31 insertions(+), 0 deletions(-) diff --git a/src/pkg-tls.c b/src/pkg-tls.c index 543152b..3c9b446 100644 --- a/src/pkg-tls.c +++ b/src/pkg-tls.c @@ -257,6 +257,34 @@ tls_verify_callback(int preverify_ok, X509_STORE_CTX *ctx) /*-------------------------------------------------------------------------*/ #ifdef HAS_OPENSSL + +static void * +tls_pkg_malloc (size_t size) +/* + * Wrapper function for using our own allocator in openssl. + */ +{ + return pxalloc(size); +} + +static void +tls_pkg_free (void * ptr) +/* + * Wrapper function for using our own allocator in openssl. + */ +{ + pfree(ptr); +} + +static void * +tls_pkg_realloc (void * ptr, size_t size) +/* + * Wrapper function for using our own allocator in openssl. + */ +{ + return prexalloc(ptr, size); +} + void tls_verify_init (void) @@ -415,6 +443,9 @@ tls_global_init (void) debug_message("%s TLS: (OpenSSL) Keyfile '%s', Certfile '%s'\n" , time_stamp(), keyfile, certfile); + // first register our own allocator function before calling any Openssl function. + CRYPTO_set_mem_functions(tls_pkg_malloc, tls_pkg_realloc, tls_pkg_free); + SSL_load_error_strings(); ERR_load_BIO_strings(); if (!SSL_library_init()) -- 1.6.4.3 | ||||
|
Just a spontaneous shot in the dark: something like this (segfaults in used libraries) may happen, if our memory allocator does not replace the system allocator reliably, openssl uses (partially) the system allocator and the two interfere. It seems unlikely in this particular case, but it is not much effort: You may just try --enable-malloc-sbrk=no and see if that changes anything. |
|
I just saw that there are references in the net to CRYPTO_malloc_init(), CRYPTO_set_mem_functions(), CRYPTO_set_mem_ex_functions(), CRYPTO_set_locked_mem_functions(), CRYPTO_set_locked_mem_ex_functions(). So there is a way to tell openssl to use a custom memory allocator. Which we use but don't tell openssl. Unfortunately - and quite typical for openssl - documentation seems to be missing. Or better: use the source, luke... *argl* |
|
I had a look at Coogans Coredump. Relevant stacktrace with data: (gdb) bt full #0 sk_value (st=0x84e5060, i=4) at stack.c:310 No locals. 0000001 0x00e54888 in ssl3_accept (s=0x84e5060) at s3_srvr.c:393 buf = <value optimized out> l = <value optimized out> Time = 1254609144 cb = (void (*)(const SSL *, int, int)) 0 num1 = <value optimized out> ret = <value optimized out> state = 8544 skip = <value optimized out> 0000002 0x00e6cd4a in SSL_accept (s=0x84e5060) at ssl_lib.c:850 No locals. 0000003 0x00e5ee40 in ssl23_get_client_hello (s=0x84e5060) at s23_srvr.c:568 buf_space = "\200d\001\003\001\000K\000\000\000\020" p = (unsigned char *) 0x84e9b91 "" d = (unsigned char *) 0x84da0cd "" i = <value optimized out> csl = 75 cl = <value optimized out> n = 100 j = <value optimized out> type = 2 0000004 0x00e5f738 in ssl23_accept (s=0x84e5060) at s23_srvr.c:203 buf = (BUF_MEM *) 0x84d7ae8 Time = 1254609144 cb = (void (*)(const SSL *, int, int)) 0 ret = <value optimized out> state = 139350112 0000005 0x00e6c6e3 in SSL_do_handshake (s=0x84e5060) at ssl_lib.c:2085 ret = 0 0000005 0x00e6c6e3 in SSL_do_handshake (s=0x84e5060) at ssl_lib.c:2085 ret = 0 0000006 0x080dd7df in tls_continue_handshake (ip=0x84d9400) at pkg-tls.c:774 n = 139350112 ret = 1 0000007 0x080ddd38 in f_tls_init_connection (sp=0x8124e38, num_arg=3) at pkg-tls.c:1012 session = (SSL *) 0x84e5060 argp = (svalue_t *) 0x8124e38 ret = 0 obj = (object_t *) 0x84b3f84 ip = (interactive_t *) 0x84d9400 char *sk_value(const STACK *st, int i) { if(!st || (i < 0) || (i >= st->num)) return NULL; return st->data[i]; } It crashes upon st->data[i]. st is: $3 = {num = 769, data = 0x2000, sorted = 15213344, num_alloc = 139295540, comp = 0x84d7be8} Clearly, 0x2000 is not mapped into the process memory, and the values for num_alloc and sorted look rather strange, num probably as well. The address 0x84d7be8 should be a function pointer to an openssl function. But as far as I can see, they are mapped somewhere around 0x00e.... and 0x84d7be8 is part of the heap. Note that the given address for the st is 0x84e5060 which is the address of the SSL context/session. And that is: (gdb) print *s $1 = {version = 769, type = 8192, method = 0xe82320, rbio = 0x84d7b34, wbio = 0x84d7be8, bbio = 0x84d7be8, rwstate = 1, in_handshake = 2, ... line 393 in s3_srvr.c is ret=ssl3_send_certificate_request(s); That one actually iterates over a stack in the session and calls sk_value (although it is called differently, these openssl people love playing tricks with defines). But strange enough, I would expect ssl3_send_certificate_request(s) to be in the stacktrace... The stack is extracted from the SSL context by SSL_get_client_CA_list(). That one should return s->ctx->client_CA which is: (gdb) print s->ctx->client_CA $4 = (STACK *) 0x832fc58 (gdb) print *(s->ctx->client_CA) $5 = {num = 1, data = 0x0, sorted = 0, num_alloc = 1886220131, comp = 0x8007461} That would make more sense, but it is obviously not the address given to sk_value(). So... I don't know what openssl exactly does here... BTW: I assume, you use 0.9.8g-15+lenny5? |
|
I checked this afternoon with tubmuds settings and tubmuds public mudlib on MacOS 10.6 and Debian Lenny. Debian Lenny: crash as Coogan described (openssl-0.9.8g-15+lenny5) MacOS x86_64: driver not working at all MacOS i386: no crash, TLS works fine. (openssl-0.9.8) |
|
The stack used by the sk_value() which segfaults is tls_session->ctx->client_CA, which is als context->client_CA in pkg-tls.c, created during tls_global_init() by SSL_CTX_new(). After creation it is (gdb) print *context->client_CA $20 = {num = 0, data = 0x931abf8, sorted = 0, num_alloc = 4, comp = 0} When we get a new ssl session in f_tls_init_connection(), that context contains already the wrong data: SSL * session = SSL_new(context); // line 978 (gdb) print *session->ctx->client_CA $13 = {num = 30, data = 0x20003, sorted = 170242664, num_alloc = 2, comp = 0x28} So somewhere between tls_global_init() and f_tls_init_connection this context gets somehow changed. I tried to find out more by using gdb's watchpoints, but that seems to slow it down far to much. :-( |
|
Mhmm: Hardware watchpoint 9: ((struct stack_st *) 154250276)->num Old value = 150 New value = 978082211 0xb7db5032 in BN_lshift (r=0x9322a78, a=0x9322a78, n=320) at bn_shift.c:151 1: *(struct stack_st *) 154250276 = {num = 978082211, data = 0x35b5d4c7, sorted = -557861449, num_alloc = 1591931666, comp = 0} (gdb) list 146 f=a->d; 147 t=r->d; 148 t[a->top+nw]=0; 149 if (lb == 0) 150 for (i=a->top-1; i>=0; i--) 151 t[nw+i]=f[i]; 152 else 153 for (i=a->top-1; i>=0; i--) 154 { 155 l=f[i]; (gdb) bt #0 0xb7db5032 in BN_lshift (r=0x9322a78, a=0x9322a78, n=320) at bn_shift.c:151 0000001 0xb7dd5dc8 in DSA_generate_parameters_ex (ret=0x931a1cc, bits=1024, seed_in=0x0, seed_len=<value optimized out>, counter_ret=0x0, h_ret=0x0, cb=0xbf9685f4) at dsa_gen.c:236 0000002 0xb7dd7b43 in DSA_generate_parameters (bits=1024, seed_in=0xbf968644 ";-) :-( :-) :-( $\223\021\b$\223\021\b9\223\021\b$\223\021\bN\223\021\b\003", seed_len=20, counter_ret=0x0, h_ret=0x0, callback=0, cb_arg=0x0) at dsa_depr.c:99 0000003 0x080df54b in set_dhe1024 () at pkg-tls.c:130 0000004 0x080dfcec in tls_global_init () at pkg-tls.c:500 0000005 0x080b6a62 in main (argc=9, argv=0xbf9699d4) at main.c:390 (gdb) print *a $39 = {d = 0x931abf8, top = 5, dmax = 16, neg = 0, flags = 0} And then another: Hardware watchpoint 9: ((struct stack_st *) 154250276)->num Old value = 978082211 New value = 0 0x0804f4fa in _allocate_array (n=6, file=0x8107020 "interpret.c::allocate_uninit_array", line=12370) at array.c:271 271 *svp++ = const0; 1: *(struct stack_st *) 154250276 = {num = 0, data = 0x35b5d4c7, sorted = -557861449, num_alloc = 1591931666, comp = 0x931abdc} (gdb) bt #0 0x0804f4fa in _allocate_array (n=6, file=0x8107020 "interpret.c::allocate_uninit_array", line=12370) at array.c:271 0000001 0x080948a0 in eval_instruction (first_instruction=0x94618cf "R\001\001\033", initial_sp=0x8129a50) at interpret.c:12370 0000002 0x080a8cbf in apply_low (fun=0x92aa750 "inaugurate_master", ob=0x932afec, num_arg=1, b_ign_prot=1) at interpret.c:21684 0000003 0x080a8f15 in sapply_int (fun=0x92aa750 "inaugurate_master", ob=0x932afec, num_arg=1, b_find_static=1) at interpret.c:21796 0000004 0x080a94bd in apply_master_ob (fun=0x92aa750 "inaugurate_master", num_arg=1, external=1) at interpret.c:22084 0000005 0x080b6e12 in main (argc=9, argv=0xbf9699d4) at main.c:483 And then it goes on and on and on with changing st->num, before finally transfer_svalue_no_free_spc() writes the 'final' data into it. So somehow that memory range gets just used twice, because openssl gets it from a different allocator than our own. That is confirmed by undefining SBRK_OK in machine.h before make. Sooooo.... Workaround for Coogan: disable SBRK_OK. Unfortunately, in 3.2 that does not work with --enable-malloc-sbrk=no which I mentioned earlier, because that does not exist anymore. Und obviously, our detection whether it is safe to use SBRK_OK and replace the system allocator is again not working. I guess, the more libraries we optionally use, the more severe our problems with sbrk and replacing the system allocator get. I suggest to change our defaults in 3.3. and 3.5 (but that is a different issue). |
|
Coogan: please apply the attached patch and try on your system. I believe it will solve the immediate problem. At least it does on my test system. BTW: Even the OpenSSL developers are sceptical, that all memory allocation and deallocation in openssl takes place by using the functions supplied by the user with CRYPTO_set_mem_functions(). (The function is even undocumented.) So I believe, there may be more issues like this. The only proper and robust fix would be a) not to use SBRK_OK and replace the memory allocators or b) use mmap() for allocating memory. b) is only possibly in 3.3 and 3.5 and a) increases memory allocator overhead because our allocator then uses the system allocator for getting memory. It is also not that easy, because it can't be configured by configure in 3.2 (but in 3.3 and 3.5). |
|
OK. For some reason, my idea yesterday was just plain wrong. Don't know why my test didn't crash, maybe I used the wrong binary. :-( So, next gdb session today: The client_CA stack is at 0x94594b4. I checked now all allocations and deallocations and finally found: Breakpoint 18, sfree (ptr=0x94594b4) at smalloc.c:1147 1147 if (bsize > SMALL_BLOCK_MAX) (gdb) print context->client_CA $27 = (STACK *) 0x94594b4 (gdb) bt #0 sfree (ptr=0x94594b4) at smalloc.c:1147 0000001 0x080f73d4 in xfree (ptr=0x94594b4) at smalloc.c:1245 0000002 0x080f88cd in pfree (p=0x94594b4) at smalloc.c:2891 0000003 0x080f89e2 in afree (p=0x94594b4) at smalloc.c:2987 0000004 0x080f988f in free (ptr=0x94594b4) at smalloc.c:3441 0000005 0xb7dc1cca in CRYPTO_free (str=0x94594b4) at mem.c:378 0000006 0xb7e2d925 in sk_free (st=0x94594b4) at stack.c:298 0000007 0xb7e2d979 in sk_pop_free (st=0x94594b4, func=0xb7e48880 <X509_NAME_free>) at stack.c:291 0000008 0xb7f01006 in SSL_CTX_set_client_CA_list (ctx=0x944febc, name_list=0x94594b4) at ssl_cert.c:549 0000009 0x080dd216 in tls_verify_init () at pkg-tls.c:395 0000010 0x080dd4c7 in tls_global_init () at pkg-tls.c:498 0000011 0x080b47ce in main (argc=9, argv=0xbf8a8914) at main.c:390 Ok... OpenSSL deliberately frees that stack. static void set_client_CA_list(STACK_OF(X509_NAME) **ca_list,STACK_OF(X509_NAME) *name_list) { if (*ca_list != NULL) sk_X509_NAME_pop_free(*ca_list,X509_NAME_free); *ca_list=name_list; } Yeah, it frees the old content before storing the new pointer. Now the problem: We do in tls_verify_init(): stack = SSL_CTX_get_client_CA_list(context); if (trustdirectory != NULL) { SSL_add_dir_cert_subjects_to_stack(stack, trustdirectory); } if (stack != NULL) { SSL_CTX_set_client_CA_list(context, stack); } So we set the old stack again. Now openssl frees the stack and stores the pointer to the stack it just free'd. No wonder that that stack is overwritten all the time. ;-) |
|
I duplicated the fix for this issue in 3.3.x. This should now be fixed in r2755. |
|
Please be aware, that this is a report against LDMud-3.2. A fix for 3.5 and/or 3.3 is nice but does not help me here. As this seems to be a regression since 3.2.14, it should also get fixed in the 3.2 trunk. |
|
The last note was a little ambiguous. Zesstra took a fix for 3.3 and applied it to 3.2. So this bug is now fixed in the 3.2 trunk. |
Date Modified | Username | Field | Change |
---|---|---|---|
2009-09-19 13:06 | Coogan | New Issue | |
2009-09-19 13:55 | zesstra | Note Added: 0001279 | |
2009-10-03 17:23 | zesstra | Note Added: 0001444 | |
2009-10-04 03:22 | zesstra | Note Added: 0001445 | |
2009-10-04 11:01 | zesstra | Note Added: 0001448 | |
2009-10-04 11:01 | zesstra | Status | new => confirmed |
2009-10-04 13:03 | zesstra | Note Added: 0001449 | |
2009-10-04 14:21 | zesstra | Note Added: 0001450 | |
2009-10-04 14:36 | zesstra | File Added: 0001-Register-our-own-allocator-functions-with-openssl.patch | |
2009-10-04 14:41 | zesstra | Note Added: 0001451 | |
2009-10-04 14:43 | zesstra | Status | confirmed => feedback |
2009-10-04 16:02 | zesstra | Relationship added | related to 0000687 |
2009-10-05 14:27 | zesstra | Note Added: 0001469 | |
2009-10-05 14:27 | zesstra | Assigned To | => zesstra |
2009-10-05 14:27 | zesstra | Status | feedback => assigned |
2009-10-05 14:36 | zesstra | Relationship added | duplicate of 0000583 |
2009-10-05 15:01 | zesstra | Note Added: 0001470 | |
2009-10-05 15:01 | zesstra | Status | assigned => resolved |
2009-10-05 15:01 | zesstra | Resolution | open => fixed |
2009-10-06 01:10 | Coogan | Note Added: 0001484 | |
2009-10-06 01:10 | Coogan | Status | resolved => feedback |
2009-10-06 01:10 | Coogan | Resolution | fixed => reopened |
2009-10-06 01:32 | Gnomi | Note Added: 0001485 | |
2009-10-06 01:32 | Gnomi | Status | feedback => resolved |
2009-10-06 01:32 | Gnomi | Resolution | reopened => fixed |