View Issue Details
ID | Project | Category | View Status | Date Submitted | Last Update |
---|---|---|---|---|---|
0000117 | LDMud 3.3 | Efuns | public | 2004-08-24 03:11 | 2011-09-21 11:27 |
Reporter | warp | Assigned To | zesstra | ||
Priority | normal | Severity | feature | Reproducibility | always |
Status | closed | Resolution | no change required | ||
Target Version | 3.3.721 | ||||
Summary | 0000117: Lossy mode for convert_charset() | ||||
Description | Would it be possible to extend convert_charset() so that it optionally runs in a lossy mode, where characters that are not convertible to the target charset are replaced by "?" or a specifiable string instead of the function aborting completely? Otherwise we cannot use the function for e.g. UTF-8 to ISO-8859-1 conversions of text entered by users (e.g. say). | ||||
Tags | No tags attached. | ||||
|
The tricky part is to find out how many of the input characters mess up the conversion. Various modes are imaginable: convert_charset(in, from_cs, to_cs, 1): if the conversion aborts on an unexpected sequence, the first input character is removed from in, a '?' is appended to the result, and the conversion begins again. Repeat ad nauseam. convert_charset(in, from_cs, to_cs, fun): the function fun() receives the remaining in-string as argument and returns an array consisting of ({ "string to add to the result", "new remaining in-string" }). |
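The first mode described above can be sketched directly on top of iconv(). This is a minimal illustration, not the LDMud implementation: lossy_convert() and all of its parameters are hypothetical names, "character" is taken to mean a single input byte, and resetting the descriptor after each failure is an assumption about what "the conversion begins again" means.

```c
#include <errno.h>
#include <iconv.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of LDMud): convert in_len bytes of `in`
 * from charset `from` to charset `to`, writing a NUL-terminated result
 * into `out`. Each input byte that iconv() rejects is skipped and `repl`
 * is appended in its place.
 * Returns 0 on success, -1 on setup failure or if `out` is too small. */
int lossy_convert(const char *in, size_t in_len,
                  const char *from, const char *to,
                  const char *repl, char *out, size_t out_size)
{
    if (out_size == 0)
        return -1;

    iconv_t cd = iconv_open(to, from);
    if (cd == (iconv_t)-1)
        return -1;

    char  *p_in     = (char *)in;      /* iconv() wants a non-const pointer */
    size_t in_left  = in_len;
    char  *p_out    = out;
    size_t out_left = out_size - 1;    /* keep one byte for the NUL */
    size_t repl_len = strlen(repl);
    int    rc       = 0;

    while (in_left > 0) {
        if (iconv(cd, &p_in, &in_left, &p_out, &out_left) != (size_t)-1)
            break;                     /* all remaining input was converted */

        /* Only invalid (EILSEQ) or incomplete (EINVAL) sequences are
         * recoverable; anything else, e.g. E2BIG, ends the conversion. */
        if ((errno != EILSEQ && errno != EINVAL) || repl_len > out_left) {
            rc = -1;
            break;
        }

        /* Append the replacement, drop one input byte, start again. */
        memcpy(p_out, repl, repl_len);
        p_out    += repl_len;
        out_left -= repl_len;
        p_in++;
        in_left--;
        iconv(cd, NULL, NULL, NULL, NULL);  /* back to the initial state */
    }

    *p_out = '\0';
    iconv_close(cd);
    return rc;
}
```

Because only one byte is dropped per failure, a multi-byte character that is valid in the source charset but has no equivalent in the target may produce several replacement strings. Skipping a whole character instead would require knowledge of the source encoding.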
|
We had this problem with IRC users switching between latin1 and utf8 charsets. Our fix is to return -1 instead of throwing an exception and to handle the problem at the 'mudlib' level. Brutal diff of strfuns.c (adjust the return type in func_spec likewise)... see next entry (edit does not seem to like attaching files)

--- projects/psyc/ldmud/3-3/src/strfuns.c 2004-04-28 05:57:59.000000000 +0200
+++ 3-3/src/strfuns.c 2004-10-14 21:49:11.000000000 +0200
@@ -483,22 +483,42 @@
     if (errno == EILSEQ) {
+#if 0
         error("convert_charset(): Invalid character sequence at index %ld\n", (long)(pIn - get_txt(in_str)));
         /* NOTREACHED */
+#endif
+        free_string_svalue(sp--);
+        free_string_svalue(sp--);
+        free_string_svalue(sp);
+
+        put_number(sp, -1);
         return sp;
     }
     if (errno == EINVAL) {
+#if 0
         error("convert_charset(): Incomplete character sequence at index %ld\n", (long)(pIn - get_txt(in_str)));
         /* NOTREACHED */
+#endif
+        free_string_svalue(sp--);
+        free_string_svalue(sp--);
+        free_string_svalue(sp);
+
+        put_number(sp, -1);
         return sp;
     }
-
+#if 0
     error("convert_charset(): Error %d at index %ld\n" , errno, (long)(pIn - get_txt(in_str)) );
     /* NOTREACHED */
+#endif
+    free_string_svalue(sp--);
+    free_string_svalue(sp--);
+    free_string_svalue(sp);
+
+    put_number(sp, -1);
     return sp;
   } /* if (rc < 0) */
 } /* while (in_left) */

edited on: 10-14-04 15:05 |
|
Just wondering about the status of this enhancement request. Are you waiting for feedback, did you forget it, or did you decide not to change the behaviour? Regarding the modes you suggested: both are OK, but the first one would already solve my problem, while probably being simpler to implement. |
|
Note: looking at the implementation of the iconv program (iconv_prog.c) that comes with glibc may be helpful, as it has a -c ('omit invalid characters from output') switch. |
|
You can use convert_charset(your_string, "UTF-8", "ISO-8859-1//TRANSLIT") instead; iconv will replace unconvertible characters with "?" or something else, depending on the iconv implementation. |
|
Hi. I have tried //TRANSLIT and it didn't help at all. Sigh. :( Now I'm using catch(), but since these failures happen rather often, it is a costly solution. I'm using catch() with the nolog flag. Does it skip line number calculation in that case? |
|
Out of curiosity: is it still the case that appending "//TRANSLIT" does not work? If so, on which platforms/libiconv versions? At least on my system, it does work. Another possibility might be "//IGNORE". |
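For checking a given platform, the suffix approach can be tried outside the driver with a few lines of C. This is a sketch under assumptions: convert_with_suffix() is a hypothetical helper, and whether "//TRANSLIT" or "//IGNORE" is accepted, and what gets substituted for unconvertible characters, depends on the iconv implementation, as the notes above point out.

```c
#include <iconv.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: open a converter whose target charset name carries
 * an iconv suffix such as "//TRANSLIT" or "//IGNORE", then convert the
 * NUL-terminated string `in` in a single iconv() call.
 * Returns 0 on success, -1 if the converter cannot be opened or the
 * conversion reports an error. */
int convert_with_suffix(const char *in, const char *from, const char *to,
                        const char *suffix, char *out, size_t out_size)
{
    char target[128];
    int n = snprintf(target, sizeof target, "%s%s", to, suffix);
    if (n < 0 || (size_t)n >= sizeof target || out_size == 0)
        return -1;

    iconv_t cd = iconv_open(target, from);
    if (cd == (iconv_t)-1)
        return -1;   /* charset or suffix unknown to this iconv */

    char  *p_in     = (char *)in;      /* iconv() wants a non-const pointer */
    size_t in_left  = strlen(in);
    char  *p_out    = out;
    size_t out_left = out_size - 1;    /* keep one byte for the NUL */

    size_t rc = iconv(cd, &p_in, &in_left, &p_out, &out_left);
    *p_out = '\0';                     /* terminate whatever was converted */
    iconv_close(cd);
    return (rc == (size_t)-1) ? -1 : 0;
}
```

Calling it once with "//TRANSLIT" and once with "//IGNORE" on a string containing a character outside ISO-8859-1 shows what the local iconv does in each case. Even on failure, `out` holds the part converted before the error.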
|
Since there was no other feedback: I believe, with current iconv() the desired effect can be achieved with //TRANSLIT and/or //IGNORE and we therefore don't need to change anything. If not, please re-open or tell me. |
Date Modified | Username | Field | Change |
---|---|---|---|
2004-08-24 03:11 | warp | New Issue | |
2004-09-20 20:46 | | Note Added: 0000170 | |
2004-10-14 12:54 | fippo | Note Added: 0000202 | |
2004-10-14 13:01 | fippo | Note Edited: 0000202 | |
2004-10-14 13:05 | fippo | Note Edited: 0000202 | |
2005-04-01 04:13 | warp | Note Added: 0000358 | |
2005-05-04 06:54 | fippo | Note Added: 0000361 | |
2005-06-26 15:48 | szalicil | Note Added: 0000380 | |
2006-03-06 18:40 | lynx | Note Added: 0000492 | |
2011-02-19 19:52 | zesstra | Note Added: 0001999 | |
2011-02-19 19:52 | zesstra | Assigned To | => zesstra |
2011-02-19 19:52 | zesstra | Status | new => feedback |
2011-02-23 22:02 | zesstra | Target Version | => 3.3.721 |
2011-09-21 11:27 | zesstra | Note Added: 0002061 | |
2011-09-21 11:27 | zesstra | Status | feedback => closed |
2011-09-21 11:27 | zesstra | Resolution | open => no change required |