text-to-binary module returns invalid results for non-ASCII text #1082

Open
lionel-rowe opened this issue May 14, 2024 · 1 comment · May be fixed by #1085
Labels: bug, triage


lionel-rowe commented May 14, 2024

Describe the bug

On https://it-tools.tech/text-to-binary, supplying non-ASCII text gives invalid results.

To be fair, the tool's title is "Text to ASCII binary", but its description, "Convert text to its ASCII binary representation", implies that "ASCII" applies to the binary output (i.e. "0" and "1" are both ASCII characters), not to the text input.

What happened?

Example: converting 文 (U+6587)

Expected

Either one of these:

  1. Preferred:
    • The tool gives the UTF-8 representation of 文, with each byte zero-padded to 8 bits (11100110 10010110 10000111)
    • Converting 11100110 10010110 10000111 back from binary -> text gives 文
  2. Alternative:
    • The tool simply rejects non-ASCII input
    • The tool also rejects binary -> text input if any of its bytes aren't ASCII (currently, it just converts them as if they were code points)
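If the alternative (ASCII-only) behaviour were chosen, both directions could validate their input up front. The following is a minimal sketch, assuming the hypothetical helper names `isAscii`, `asciiTextToBin` and `asciiBinToText` (none of these are it-tools functions):

```javascript
// Hypothetical helpers sketching the "reject non-ASCII" alternative.

// True if every character in `text` is in the 7-bit ASCII range (0x00-0x7F)
function isAscii(text) {
    return /^[\x00-\x7F]*$/.test(text)
}

function asciiTextToBin(text) {
    if (!isAscii(text)) {
        throw new RangeError('Input contains non-ASCII characters')
    }
    // Every ASCII character fits in one byte, so pad each to 8 bits
    return [...text]
        .map((c) => c.charCodeAt(0).toString(2).padStart(8, '0'))
        .join(' ')
}

function asciiBinToText(bin) {
    const bits = bin.replace(/\s+/g, '')
    if (/[^01]/.test(bits) || bits.length % 8 !== 0) {
        throw new RangeError('Binary must consist of whole 8-bit bytes')
    }
    const bytes = bits.match(/.{8}/g) ?? []
    return bytes
        .map((byte) => {
            const value = parseInt(byte, 2)
            // Bytes >= 0x80 are not ASCII: refuse them instead of
            // reinterpreting them as code points
            if (value > 0x7f) {
                throw new RangeError(`Byte ${byte} is not ASCII`)
            }
            return String.fromCharCode(value)
        })
        .join('')
}
```

With these, `asciiTextToBin('Hi')` gives `01001000 01101001`, while both `asciiTextToBin('文')` and `asciiBinToText('11100110 10010110 10000111')` throw a `RangeError`.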

Actual

  • Converting text -> binary gives 110010110000111 (i.e. 25991, the code point of 文, U+6587, as an unpadded 15-bit binary string)
  • Converting 110010110000111 back from binary -> text gives the error "Binary should be a valid ASCII binary string with multiples of 8 bits".
  • Converting 11100110 10010110 10000111 (the correct UTF-8 representation of 文) from binary -> text gives æ followed by two invisible control characters (U+0096 and U+0087), because each byte is treated as a code point
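The three results above can be reproduced in a browser console or in Node. This sketch uses only the standard `TextEncoder`, `codePointAt` and `String.fromCodePoint`; it illustrates the difference between a code point's binary form and its UTF-8 bytes, and is not taken from it-tools' source:

```javascript
// Illustrative only: reproduces the reported behaviour for 文 (U+6587).
const ch = '文'
const cp = ch.codePointAt(0)
console.log(cp) // 25991

// What the tool currently outputs: the code point in binary (15 bits, unpadded)
const codePointBin = cp.toString(2)
console.log(codePointBin) // 110010110000111

// What UTF-8 actually stores: three bytes, each shown as 8 bits
const utf8Bin = [...new TextEncoder().encode(ch)]
    .map((b) => b.toString(2).padStart(8, '0'))
    .join(' ')
console.log(utf8Bin) // 11100110 10010110 10000111

// The same bytes built by hand: the 16-bit code point split 4 + 6 + 6 bits
// and prefixed with 1110, 10, 10 (valid only for U+0800..U+FFFF)
const manual = [0xe0 | (cp >> 12), 0x80 | ((cp >> 6) & 0x3f), 0x80 | (cp & 0x3f)]
console.log(manual.map((b) => b.toString(16)).join(' ')) // e6 96 87

// What binary -> text currently does: each byte becomes a code point,
// giving U+00E6 (æ) plus the invisible C1 controls U+0096 and U+0087
const asCodePoints = String.fromCodePoint(...manual)
console.log(asCodePoints === '\u00e6\u0096\u0087') // true
```

The hand-built line only covers three-byte UTF-8 (U+0800 to U+FFFF); `TextEncoder` handles every range, which is why it is the simpler choice for the tool.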

System information

Win 11, Chrome Version 124.0.6367.158 (Official Build) (64-bit)

Where did you encounter the bug?

Public app (it-tools.tech)

lionel-rowe added the bug and triage labels on May 14, 2024.

lionel-rowe (Author) commented:

An implementation could look something like this:

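```javascript
// Minimal assert helper so the snippet runs in browsers and Node alike
function assert(condition, message = 'Assertion failed') {
    if (!condition) throw new Error(message)
}

textToUtf8Bin('文')
// '11100110 10010110 10000111'
utf8BinToText('11100110 10010110 10000111')
// '文'

// Encode text as UTF-8 and render each byte as 8 zero-padded bits
function textToUtf8Bin(text) {
    return [...new TextEncoder().encode(text)]
        .map((x) => x.toString(2).padStart(8, '0'))
        .join(' ')
}

// Strip whitespace, split the bits into bytes, then decode the bytes as UTF-8
function utf8BinToText(bin) {
    bin = bin.replaceAll(/\s+/g, '')
    assert(!/[^01]/.test(bin), 'Binary may only contain 0 and 1')
    assert(bin.length % 8 === 0, 'Binary length must be a multiple of 8')

    return new TextDecoder().decode(
        Uint8Array.from({ length: bin.length / 8 }, (_, i) => parseInt(
            bin.slice(i * 8, (i + 1) * 8),
            2,
        )),
    )
}

// Round-trip check: ASCII, CJK (3 bytes per char) and emoji (4 bytes)
for (const text of ['ascii', '文字', '💩']) {
    const converted = textToUtf8Bin(text)
    assert(utf8BinToText(converted) === text, utf8BinToText(converted))
}
```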
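One possible refinement, not part of the proposal above: `TextDecoder` by default replaces malformed byte sequences with U+FFFD, so `utf8BinToText('10010110')` returns a replacement character rather than an error. Passing `{ fatal: true }` makes `decode()` throw a `TypeError` instead, which would let the tool report invalid UTF-8 the same way it reports a bad bit count. A sketch, where `utf8BinToTextStrict` is a hypothetical name:

```javascript
// Hypothetical stricter variant of utf8BinToText: malformed UTF-8 throws
// instead of being silently replaced with U+FFFD.
function utf8BinToTextStrict(bin) {
    const bits = bin.replace(/\s+/g, '')
    if (/[^01]/.test(bits) || bits.length % 8 !== 0) {
        throw new RangeError('Binary must consist of whole 8-bit bytes')
    }
    const bytes = Uint8Array.from(
        { length: bits.length / 8 },
        (_, i) => parseInt(bits.slice(i * 8, (i + 1) * 8), 2),
    )
    // fatal: true makes decode() throw a TypeError on invalid byte sequences
    return new TextDecoder('utf-8', { fatal: true }).decode(bytes)
}

console.log(utf8BinToTextStrict('11100110 10010110 10000111')) // 文

try {
    // A lone continuation byte (10xxxxxx) is not valid UTF-8
    utf8BinToTextStrict('10010110')
} catch (err) {
    console.log(err instanceof TypeError) // true
}
```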
