-
-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix: Int32 overflow when using File.read on big files #3209 #3218
Conversation
I hate that in Crystal such things might keep coming up and I don't think this is the right solution, but it's an improvement 👍 |
def self.read(filename, encoding = nil, invalid = nil) : String | ||
File.open(filename, "r") do |file| | ||
if encoding | ||
file.set_encoding(encoding, invalid: invalid) | ||
file.gets_to_end | ||
else | ||
if file.size > Int32::MAX |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't this check be performed always and not just when no encoding is defined?
I believe it will be more consistent behavior that File.read
checks size on any scenario, providing or not an encoding for the file being read.
Shouldn't we switch |
I'm afraid this is a bad solution: @jhass definitely, sizes should be expressed with an unsigned integer (as is usually the case in C) so we could grow up to 4GB strings/slices. Maybe we could even consider the case where |
To me it sounds ridiculous to switch from one arbitrary limitation to another slightly bigger arbitrary limitation, especially at the price of having to deal with unsigned types (let's face it, non- In my opinion, all sizes should be expressed with a 64-bit type (yes, ideally it would be platform-dependent, but that is also a pain to deal with in Crystal). The 32-bit plague can't go on. I've seen it in many places throughout the codebase, and this is only the beginning of the problems that will happen and may hurt the reputation of Crystal: "Oh, I can't work with big files in Crystal? Well, that's too bad." |
Yes, sizes and indeces should definitely be signed 64 bit integer on a 64 bit capable platform imo. Or simply: 64 bit wide integers should be the norm (from Crystal number literals too). Int32 should only be used when doing something very specific, like heavy integer-division or for size optimization of structs / arrays or whatever. |
I'd 👍 an Thought, using a signed integer, we do arbitrarily limit the maximum size to half of what it could it be. This is especially remarkable on 32bits. |
The limitation in size because of sign is not arbitrary, it's quite an acceptable one, if it causes less boring type-conflicts when coding (UInt* vs Int*). Most hardware addressing schemes can't handle more than 47 bits. And those last bits of addressing space are usually forbidden in user-space anyway (must be zero according to spec). I think it's safe to say that sacrificing that last bit is perfectly alright, since we won't be near touching it anyway. Such allocation sizes won't happen until 64-bit arch has been dead for decades. (Just like we now are trying to finally put 32-bit arch in the second room - it's time!) However, this is probably bike-shedding nit-picking, sorry. |
@luislavena this is only done in the case no encoding is specified because if I understood the code properly, @ysbaddaden @BlaXpirit @ozra Why not introduce a Like this every modifications on the primitive type used to represent sizes would be done in one single place (and the type may be chosen at compile time depending on the arch ?). |
BTW, if making this fix the right way (use 64-bits Slices/sizes, ...) takes too much time/effort, could we at least merge a quick fix like this one to avoid silent errors ? (see #3209) |
@olbat considering the 0.19 release is a while away, we have time to settle on and implement the correct solution here. |
Okay, sorry I'm not really aware of the releases schedule I was thinking that the 0.19 will be released soon. @ysbaddaden Why not |
Just note that File.size has |
No description provided.