Incorrect utf8toUnicode transformation for 00xx #3129

marcstern · 2024-04-23T11:48:34Z

In utf8_unicode_inplace_ex(), we have the following code:

c = *utf;
/* If first byte begins with binary 0 it is single byte encoding */
if ((c & 0x80) == 0) {
    /* single byte unicode (7 bit ASCII equivilent) has no validation */
    count++;
    if (count <= len) {
        if (c == 0) *data = x2c(&c);
        else *data++ = c;
    }
}

The line if (c == 0) *data = x2c(&c); is wrong.
utf is the input string.
If the input is "x", we copy the character, which is correct.
If the input is "\x00\xA0", we call x2c(), which expects a hexadecimal string as input (like "20" for a space), not a raw byte.
Furthermore, the output pointer (data) is not advanced, so the character is overwritten on the next loop iteration.
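To illustrate why passing a raw byte to a hex decoder goes wrong, here is a minimal sketch. It assumes x2c() behaves like the conventional two-hex-digit decoder; hex_pair_to_byte is a hypothetical stand-in written for this example, not the ModSecurity function itself.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a two-hex-digit decoder (assumed behaviour of
 * x2c). It always reads TWO bytes: what[0] and what[1]. */
static unsigned char hex_pair_to_byte(const unsigned char *what)
{
    unsigned char digit;

    assert(what != NULL);
    /* High nibble: letters are upper-cased with & 0xdf, digits offset from '0'. */
    digit = (unsigned char)(what[0] >= 'A' ? ((what[0] & 0xdf) - 'A') + 10
                                           : (what[0] - '0'));
    digit = (unsigned char)(digit * 16);
    /* Low nibble, same conversion on the second byte. */
    digit = (unsigned char)(digit + (what[1] >= 'A' ? ((what[1] & 0xdf) - 'A') + 10
                                                    : (what[1] - '0')));
    return digit;
}
```

With the text "20" this returns 0x20 (a space). With the raw bytes 0x00 0xA0 it returns a value unrelated to the input. In the quoted code the situation looks worse still: if c is a single unsigned char, as c = *utf suggests, then x2c(&c) reads a second byte located past c.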
The correct code is

if (c == 0) {
    sprintf(data, "%%u%04x", utf[1]);
    count += 4;
    data += 6;
    i++;
    *changed = 1;
}
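
As a self-contained illustration of the proposed branch, the sketch below handles only the single-byte case discussed here. The function name, its signature, and the explicit bounds checks are assumptions made for this example; multi-byte UTF-8 handling is out of scope, so bytes >= 0x80 are simply copied.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper, not ModSecurity's function. Copies ASCII bytes and
 * rewrites a NUL byte plus the byte after it as "%u00xx".
 * Returns the number of bytes written (excluding the terminating NUL),
 * or -1 if the output buffer is too small. */
static int encode_single_byte_branch(const unsigned char *in, size_t in_len,
                                     char *out, size_t out_size, int *changed)
{
    size_t o = 0;

    assert(in != NULL && out != NULL && changed != NULL);
    if (out_size == 0) return -1;
    *changed = 0;

    for (size_t i = 0; i < in_len; i++) {
        unsigned char c = in[i];

        if (c == 0) {
            /* Assumption for this sketch: a trailing NUL with no following
             * byte is emitted as %u0000. */
            int has_next = (i + 1 < in_len);
            unsigned int low = has_next ? in[i + 1] : 0u;

            if (out_size - o < 7) return -1;      /* 6 chars + NUL */
            snprintf(out + o, out_size - o, "%%u%04x", low);
            o += 6;
            if (has_next) i++;                    /* consume the second byte */
            *changed = 1;
        } else {
            if (out_size - o < 2) return -1;      /* 1 char + NUL */
            out[o++] = (char)c;
        }
    }
    out[o] = '\0';
    return (int)o;
}
```

Called on the three bytes 'x', 0x00, 0xA0, this writes "x%u00a0" and sets *changed to 1. Unlike the snippet above, it checks the remaining space on every write and guards the read of the byte after the NUL.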

Remarks:

Using "%04x" replaces a lot of unnecessary code used in other parts of the code.
I didn't add a check if (count <= len) after count += 4; because the buffer cannot overflow if we also fix the allocation: len = input_len * 6 + 1; instead of len = input_len * 4 + 1; (4 bytes -> %u1234). The check (and the variable count) could thus be removed everywhere.
After filling data, several checks (/* invalid UTF-8 character number range (RFC 3629) */, /* check for overlong */, ...) run this code: count++; *data++ = c; What's the point? This copies the character after the ones we already copied, which looks wrong to me. Is there any reason to keep this code?
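
The sizing argument in the second remark can be sketched as follows. The helper name and the overflow guard are assumptions added for this example; the factor 6 comes from the length of one "%uXXXX" escape.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { ESCAPE_LEN = 6 };  /* '%', 'u', and four hex digits */

/* One escape such as "%u1234" is six characters long, not four. */
static_assert(sizeof("%u1234") - 1 == ESCAPE_LEN, "escape is 6 characters");

/* Hypothetical helper: upper bound on the output size if every input byte
 * expanded to a full escape, plus one byte for the terminating NUL.
 * Returns 0 if the multiplication would overflow size_t. */
static size_t worst_case_output_len(size_t input_len)
{
    if (input_len > (SIZE_MAX - 1) / ESCAPE_LEN) return 0;
    return input_len * ESCAPE_LEN + 1;
}
```

For a 2-byte input this gives 13. That is a deliberately generous bound: it allows one full escape per input byte, even though the branch above consumes two input bytes per escape.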


marcstern added the 2.x Related to ModSecurity version 2.x label Apr 23, 2024
