Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[stdlib] Add more safeguards in the String constructors #2691

Conversation

gabrieldemarmiesse
Copy link
Contributor

@gabrieldemarmiesse gabrieldemarmiesse commented May 16, 2024

Related to #2687

Fix #2392

Also related to #2678

Description of the problem:

sddefault

EDIT: Please note that now, an empty string will allocate one byte on the heap for the null terminator. This will degrade the performance of some programs, the tradeoff is more safety. This performance penalty will go away with SSO, as the null terminator will be on the stack, which I hope can be implemented soon-ish when #2637 is fixed

@gabrieldemarmiesse gabrieldemarmiesse requested a review from a team as a code owner May 16, 2024 20:30
@@ -609,6 +613,7 @@ struct String(
fn __init__(inout self):
"""Construct an uninitialized string."""
self._buffer = Self._buffer_type()
self._buffer.append(0)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that this change means that constructing an empty string will now allocate, where it did not before. That may be what we want, but its a notable change to the default representation of a String.

E.g. a List[String] default initialized to empty String and then later filled in would now do O(N) allocations.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

E.g. a List[String] default initialized to empty String and then later filled in would now do O(N) allocations.

Can you give a small code example? I'm not sure I see the regression you are talking about.

Pratically speaking, when we have SSO, empty strings will have null termintors allocated on the stack, so it will be very cheap. I'd advise that until then, we err on the side of caution about safety in String.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a compelling point regarding the small-string optimization.

I'm convinced now that NUL terminating by default is not necessarily a bad behavior to aim for long term🙂

Can you give a small code example? I'm not sure I see the regression you are talking about.

Not a regression per se, but consider:

var list = List[String](String(), String(), String(), String(), String())

for count in len(list):
    list[count] = str(count)

Before this change, this program only allocates a String for each element once.

After this change, first five empty strings will be allocated, and then overwritten with the final content of the list as initialized by the for loop.

I don't necessarily think this is a convincing reason on its own not to move forward with this change. I'm just offering this as an example of the change in behavior this PR brings.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the insight. Indeed it's a good example where we may have degraded performance. With SSO, that shouldn't be an issue. I'd be interested to know if a null terminator needs to be a requirement for String, but I guess this can be discussed after SSO is done


assert_equal(String(), String(List[Int8]()))


Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One of the main benefits I can see from this change is that an empty String is now validly NUL terminated, so it can be passed to functions that are expecting a non-null string pointer.

So if thats the desired behavior, we should add some tests for that🙂

(But, I'd argue most functions that could accept an empty string should arguably be written to accept a null pointer, because as mentioned then they'll optimally allow the caller to avoid allocating space just to store an empty string.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So if thats the desired behavior, we should add some tests for that🙂

Do you mean adding a test where if do assert_equal(String()._buffer[0], 0)?

EDIT: Nevermind, I understand now. I'll add the test tomorrow.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done :)

Signed-off-by: gabrieldemarmiesse <gabrieldemarmiesse@gmail.com>
modularbot pushed a commit that referenced this pull request May 23, 2024
…str__` method (#40421)

[External] [stdlib] Fix: Add null terminator to the buffer inside
`__str__` method

String really has a hard time working without this null terminator and I
feel this PR is a step towards fixing
#2687 without requirering
controversial changes like the ones in
#2691.

Co-authored-by: Gabriel de Marmiesse <gabriel.demarmiesse@datadoghq.com>
Closes #2787
MODULAR_ORIG_COMMIT_REV_ID: 95f438ebbb6e076fe6ffb79debf7cd4fd70248de
martinvuyk pushed a commit to martinvuyk/mojo that referenced this pull request May 24, 2024
…str__` method (#40421)

[External] [stdlib] Fix: Add null terminator to the buffer inside
`__str__` method

String really has a hard time working without this null terminator and I
feel this PR is a step towards fixing
modularml#2687 without requirering
controversial changes like the ones in
modularml#2691.

Co-authored-by: Gabriel de Marmiesse <gabriel.demarmiesse@datadoghq.com>
Closes modularml#2787
MODULAR_ORIG_COMMIT_REV_ID: 95f438ebbb6e076fe6ffb79debf7cd4fd70248de
@gabrieldemarmiesse
Copy link
Contributor Author

I'll close this PR in favor of the SSO work. We can always see after SSO if this is still relevant.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants