In utf8_unicode_inplace_ex(), we have the following code:
    c = *utf;
    /* If first byte begins with binary 0 it is single byte encoding */
    if ((c & 0x80) == 0) {
        /* single byte unicode (7 bit ASCII equivilent) has no validation */
        count++;
        if (count <= len) {
            if (c == 0) *data = x2c(&c);
            else *data++ = c;
        }
    }
The line if (c == 0) *data = x2c(&c); is wrong.
utf is the input string.
If the input is "x", we copy the character, which is correct.
If the input is "\x00\xA0", we call x2c(), which expects a hexadecimal string as input (like "20" for a space), not a pointer to a single NUL byte.
Furthermore, the output pointer (data) is not advanced, so the character is overwritten on the next loop iteration.
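To illustrate the mismatch, here is a minimal sketch of a decoder that takes two hexadecimal digit characters, as described above. The name hex_pair_to_byte is hypothetical and this is not ModSecurity's actual x2c(); it only assumes the same "two hex digits in, one byte out" contract.

```c
/* Hypothetical stand-in (not ModSecurity's x2c): decode two hex digit
 * characters, e.g. "20", into a single byte. */
static unsigned char hex_pair_to_byte(const unsigned char *pair)
{
    unsigned char value;

    /* High nibble: letters are folded to upper case via & 0xdf. */
    value = (unsigned char)((pair[0] >= 'A') ? ((pair[0] & 0xdf) - 'A' + 10)
                                             : (pair[0] - '0'));
    value = (unsigned char)(value * 16);

    /* Low nibble: pair[1] is always read, so the caller must supply two bytes. */
    value = (unsigned char)(value + ((pair[1] >= 'A') ? ((pair[1] & 0xdf) - 'A' + 10)
                                                      : (pair[1] - '0')));
    return value;
}
```

With such a contract, passing the address of a single unsigned char that holds 0 is not a valid call: the first byte is not a hex digit, and the function reads pair[1], which lies beyond the one-byte object.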
The correct code is:
    if (c == 0) {
        sprintf(data, "%%u%04x", utf[1]);
        count += 4;
        data += 6;
        i++;
        *changed = 1;
    }
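As a standalone illustration of the "%%u%04x" format used above, the sketch below writes one %uXXXX escape into a caller-supplied buffer. The helper name write_percent_u and its bounds-checked signature are assumptions for this example, not part of the existing code; it uses snprintf instead of sprintf so that the size check is explicit.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical helper: write "%uXXXX" (lower-case hex, zero padded) for the
 * low 16 bits of code_point into out. Returns the number of characters
 * written, excluding the terminating NUL (always 6 on success), or 0 if
 * the buffer is too small. */
static size_t write_percent_u(char *out, size_t out_size, unsigned int code_point)
{
    int n = snprintf(out, out_size, "%%u%04x", code_point & 0xFFFFu);

    /* snprintf returns the length it wanted to write; reject truncation. */
    if (n < 0 || (size_t)n >= out_size) {
        return 0;
    }
    return (size_t)n;
}
```

For the byte 0xA0 this produces the six characters %u00a0 plus a terminating NUL, so the output pointer has to advance by 6 and the destination needs room for 7 bytes at that position.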
Remarks:
Using "%04x" replaces a lot of unnecessary code used in other parts of the code.
I didn't add an if (count <= len) check after count += 4; because the buffer cannot overflow if we also change len = input_len * 4 + 1; to len = input_len * 6 + 1; (4 bytes -> %u1234). The check (and the variable count) could then be removed everywhere.
After filling data, several checks (/* invalid UTF-8 character number range (RFC 3629) */, /* check for overlong */, ...) are followed by this code: count++; *data++ = c; What's the point? This copies the character after the ones we have already written. This looks wrong to me. Is there any reason to keep this code?
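The buffer-size remark above can be sketched as a small helper that computes input_len * 6 + 1 and refuses to wrap around. The name escaped_buffer_len and the overflow guard are assumptions added for illustration; the existing code computes len inline.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: compute the proposed buffer size, input_len * 6 + 1
 * (six output characters per input byte plus the terminating NUL).
 * Returns 1 and stores the size in *out_len, or returns 0 if the
 * multiplication would overflow size_t. */
static int escaped_buffer_len(size_t input_len, size_t *out_len)
{
    if (input_len > (SIZE_MAX - 1) / 6) {
        return 0;
    }
    *out_len = input_len * 6 + 1;
    return 1;
}
```

For the two-byte input "\x00\xA0" this gives 13 bytes, which is more than the 7 bytes needed for %u00a0 and its terminator.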