Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

builtins.concatStringsSep: Guarantee amortised O(n) #3942

Closed
wants to merge 1 commit into from

Conversation

nh2
Copy link
Contributor

@nh2 nh2 commented Aug 18, 2020

@Ericson2314
Copy link
Member

What if we made another C++ concatStringsSep that took an iterator (something I wanted anyways, FWIW) and then wrote them both interms of that?

@nh2
Copy link
Contributor Author

nh2 commented Aug 18, 2020

Answering #3941 (comment):

Doesn't std::string grow the capacity by a constant factor when it exceeds the current capacity, thereby amortizing the cost to O(1) per character?

There's no C++ standard guarantee for it: https://en.cppreference.com/w/cpp/string/basic_string/operator%2B%3D:

Complexity
There are no standard complexity guarantees

The amortised-constant guarantee that the standard gives for vector/string is on push_back().

GCC's libstdc++ and LLVM's libc++ currently implement += with exponential growth (see e.g. this, section Growth strategy), but they may change it and other standard library implementations may not do it this way.

So it's more of a "do you want to code against API guarantees or implementation details" choice.

This PR guarantees the complexity relying only on what the standard says. It would also be OK to rely on the implementation but I think that should have a comment then.


(For my memory, and in contrast to what I believed before, stringstream does not provide a complexity guarantee in the standard either, and it can be slower than string's += because it also does locale encoding.)

@edolstra
Copy link
Member

I don't think we need to worry too much about theoretical implementations not doing the sane thing. Having said that, it would be nice to avoid doing any reallocations, but then we should evaluate each string, sum the sizes and reserve() the entire result string.

BTW another optimization that might be worthwhile is to avoid allocating the temporary strings returned by coerceToString() (which is wasteful if the value is a plain string). Ideally it would return something like Rust's Cow type, i.e. either a string or a string_view.

@nh2
Copy link
Contributor Author

nh2 commented Aug 18, 2020

I don't think we need to worry too much about theoretical implementations not doing the sane thing.

OK, I shall switch it then to just a comment that says why it's not quadratic on the platforms we use.

Having said that, it would be nice to avoid doing any reallocations, but then we should evaluate each string, sum the sizes and reserve() the entire result string.

@edolstra How should that best be done? There seem to be 2 straightforward ways:

  • Computing state.coerceToString(pos, *args[1]->listElems()[n], context); twice for each string (the first time just to compute its length, the second time for copying it over). I'm doing that in util: concatStringsSep(): Avoid quadratic string append #3941 but I'm a bit scared of that here because inside coerceToString() the v.type == tPath looks like it'll have to copyPathToStore(), which requires a typedef std::map<Path, StorePath> SrcToStore lookup, which might be slower than what the optimisation provides.
  • Computing once it once and storing the returned strings (but this would require ~50% more maximum memory, assuming that at the worst point we'd have to keep around the original nix strings, the output of the corercion, and the final string buffer), unless your Cow idea is also implemented.

@edolstra
Copy link
Member

The second option should be fine since they're temporary strings anyway and probably not very large.

@stale
Copy link

stale bot commented Feb 14, 2021

I marked this as stale due to inactivity. → More info

@stale stale bot added the stale label Feb 14, 2021
@stale
Copy link

stale bot commented Apr 16, 2022

I closed this issue due to inactivity. → More info

@stale stale bot closed this Apr 16, 2022
@Ericson2314
Copy link
Member

Hold up, fine if stale bot is marking things stale, but it should be closing them. We need to actually triage old stuff. We can't just let a bot due our duties for us.

I assume this was just a misconfiguration, as the Nixpkgs stale bot doesn't close things to my knowledge.

@nh2
Copy link
Contributor Author

nh2 commented Apr 16, 2022

It looks like nix's stale bot was configured to close already 14 months ago:

#4526 (review)

It also closed some other PRs that I think really shouldn't be closed: #3208 (comment)

@nh2 nh2 mentioned this pull request Apr 16, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants