Navigation Menu

Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add formatters and parsers for standard time formats #4729

Closed

Conversation

straight-shoota
Copy link
Member

@straight-shoota straight-shoota commented Jul 19, 2017

This adds format and parse methods to Time for the commonly used standardized time formats:

The formats are defined as Time::Format::RFC_1123 and Time::Format::RFC_3339.

A use case is already include in HTTP::Cookie. These are common formats for exchanging time information in information networks and parsing/formatting should be directly supported by Time class.

@bew
Copy link
Contributor

bew commented Jul 19, 2017

I think we shouldn't have the links of RFCs hard-coded in the docs, but just specify the RFC or ISO number.

@Sija
Copy link
Contributor

Sija commented Jul 20, 2017

@bew, there are already many links throughout the stdlib docs, I also see no reason why not to?

@straight-shoota
Copy link
Member Author

I don't care if the links are hardcoded, though I don't see any harm and such links to specifications and definitions are all over the doc comments in stdlib.

@RX14
Copy link
Contributor

RX14 commented Jul 24, 2017

I'm not entirely sure we want an explicit method for each time format, it might lead to an explosion of methods.

@straight-shoota
Copy link
Member Author

@RX14 I think those standard formats should be included in stdlib and be easily accessible.
I've considered allowing to_s to receive the format as a symbol. Perhaps, this would be better than explicit methods? It reduces discoverability, though.
I don't expect an explosion of methods because they are only a limited set of standardized and widely-used time formats. We probably would not need to add any more of this kind.

@RX14
Copy link
Contributor

RX14 commented Jul 24, 2017

If we're not going to add more then this amount is probably fine, but i'd like to get the rest of the core team's opinion on this too.

@ysbaddaden
Copy link
Contributor

I'm not fond of adding many methods, but alternatives ain't better. If limited to these 3, I guess it's fine.

Please use ISO8601 naming, it's a well known international standard; I never heard about RFC3339 before.

We should harmonize how to represent RFC and ISO numbers. For example the methods don't have an underscore (to_rfc1123) but links insert a space (RFC 1123) and constants insert an underscore RFC_1123. I would make both RFC1123 without space or underscore. It's simpler, and may improve readability in docs as it would avoid RFC\n1123 line breaks.

@straight-shoota
Copy link
Member Author

straight-shoota commented Jul 25, 2017

I have switched from calling it ISO8601 to RFC3339 because ISO8601 includes a number of formats and variations, that are not supported by this formatter. The purpose of RFC3339 is exactly to make ISO8601 better usable for internet protocols by limiting it to a specific subset that fits this use-case.
Other modern languages call it RFC 3339 instead of ISO 8601 as well: Go time, Rust chrono::DateTime. Explanation from Rust doc:

Why isn't this named parse_from_iso8601? That's because ISO 8601 allows some freedom over the syntax and RFC 3339 exercises that freedom to rigidly define a fixed format.

See also What's the difference between ISO 8601 and RFC 3339 Date Formats?

Harmonizing the standards' names seems like a good idea. Though in text ISO/RFC should be separated from the number with a space - this is the usual naming scheme employed by the standards organisations themselves as well as most other resources. Apart from the occasional internal line break, this is easier readable (try for yourself: the first references is this comment are without, the rest with space) and people are used to this.
We could change the methods to to_rfc_3339 though I think this is unnecessary bulky. I'd rather harmonize the in-code naming by removing the underscores from the constants. Rust and Go have it this way, too.

@RX14
Copy link
Contributor

RX14 commented Jul 25, 2017

We should be able to parse the ISO 8601 spec, and all it's corner cases, in the stdlib. And even if we emit RFC 3339 standard timestamps, they're automatically ISO 8601 timestamps so it's much more discoverable to call it to_iso8601. As a side note, i'm pretty sure the current time formatter can't parse ISO 8601, we might have to do it manually.

RFC_3339 = new "%FT%X%:z"

# The [RFC 1123](https://tools.ietf.org/html/rfc1123#page-55) datetime format.
RFC_1123 = new "%a, %d %b %Y %H:%M:%S GMT", Time::Kind::Utc
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't parse timezones, it's not up to spec.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it was initially intended for formatting only.

RFC_1123 = new "%a, %d %b %Y %H:%M:%S GMT", Time::Kind::Utc

# The [RFC 2822](https://tools.ietf.org/html/rfc2822) datetime format.
RFC_2822 = new "%a, %d %b %Y %H:%M:%S %z"
Copy link
Contributor

@RX14 RX14 Jul 25, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

RFC 2822 obsoletes RFC 822, RFC 1123 updates RFC 822 to enforce 4 digit years: it's the same time format, except RFC 2822 defines some parts of it as obsolete. There's no need for 2 separate patterns here at all, they're different versions of the same spec.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right. Keeping track of all these formats is really disturbing... So we should keep RFC 2822 and add notes, that it can also be used (mostly) for 822 and 1123?

@RX14
Copy link
Contributor

RX14 commented Jul 25, 2017

Also note that rust's chrono uses hand-written parsers for ISO 8601 and RFC 2822, despite having a strftime-based parser. We should do the same: it's impossible to parse these complex standards using a simple format string.

@straight-shoota
Copy link
Member Author

straight-shoota commented Jul 25, 2017

As I understand it, RFC 3339 should generally be preferred over ISO 8601 in the context of internet protocols. For these purposes, you'd deliberatly not want the full ISO 8601 spec, which also includes durations and date-only values. I would expect parse_iso8601 to support those data types as well, whereas parse_rfc3339 is explicitly for datetime values and will fail if the input is something else. Same goes for the formatter: ISO 8601 allows some variations, but we should use a strict format defined by RFC 3339 - and if we use it, why should we not name it explicitly? If we add a complete ISO 8601 parser, that one should be named accordingly, but it's different to the proposed RFC 3339 parser/formatter.
I believe it to be the best to follow the naming route that Rust and Go took. We could add aliases for ISO 8601 to support both (like Ruby's Datetime) but that would probably not be welcome in Crystal.
Discoverability should not be too much a concern, searching for ISO 8601 in the docs should point to the RFC 3339 methods and explain why it is this way (like Rust does).

Edit: There are some further explanations about Rust's chrono in chronotope/chrono#43 (comment)

@straight-shoota
Copy link
Member Author

I am aware that the formatters can't parse all variants of the full specs. But I'd suggest to implement custom parsers in a different PR.
Intermediately we could leave the parse methods for rudimentary support - I could also remove them until properly implemented.

@straight-shoota straight-shoota changed the title Add formatter and parser for standards ISO8601, RFC1123, RFC2822 Add formatters and parsers for standard time formats Jul 25, 2017
The date format used for HTTP applications (such as cookies) according to [RFC 2612](https://tools.ietf.org/html/rfc2616#section-3.3.1)
is *not* an arbitrary [RFC 1123](https://tools.ietf.org/html/rfc1123#page-55) string but has a fixed timezone `GMT`. Therefore it
is not the same as `Time#to_rfc2822` and the method should not be called `rfc1123_date`.
@straight-shoota
Copy link
Member Author

In 558d19e the original behaviour of HTTP.rfc1123_date was restored but renamed to HTTP.format_date:
The date format used for HTTP applications (such as cookies) according to RFC 2612
is not an arbitrary RFC 1123 format but has a fixed timezone GMT. Therefore it is not the same as Time#to_rfc2822 and the method should not be called rfc1123_date.

I am not sure, if this formatter should better be moved to Time/Time::Format with the others. This would put all the included date formats in one place. On the other hand, it is an application specific format used with HTTP, so it would make sense to have it in HTTP. The more I think about it, I tend to move it with the others, but would like to get some feedback on that.

@RX14
Copy link
Contributor

RX14 commented Jul 25, 2017

@straight-shoota Isn't that exactly the same as time.to_utc.to_rfc2822?

@straight-shoota
Copy link
Member Author

straight-shoota commented Jul 25, 2017

That's what I had initially assumed, but as these failed specs show, they differ in the notation of the time zone.
While RFC 2822/RFC 1123 support both time offsets (like +00:00) and time zone identifiers (like UTC), for HTTP applications RFC 2616 mandates the use of GMT as time zone identifier (which is to be interpreted as UTC in this context). Therefore this needs to be handled as a special case, when usually for unified time UTC (or Z) would be used instead of GMT (because these are not identical) if not even a time offset +00:00 (which to_rfc2822 produces). While most applications would probably understand +00:00 or UTC as well, the standard requires GMT and there are certainly cases where this matters.

@ysbaddaden
Copy link
Contributor

I repeat myself and stand strong here: the stdlib must support ISO 8601 which is an international standard, not RFC 3339 which is an internet protocol. The stdlib musn't restrict itself to web applications.

Renaming HTTP.format_date is a good idea. It's a better name, and the format is HTTP specific, so it's the right place —unless the format is expected in other places.

@straight-shoota
Copy link
Member Author

straight-shoota commented Jul 26, 2017

This PR supports ISO 8601 in due form of it's profile RFC 3339. It provide formatters and parsers for standard time formats to properly support exchanging time values in computer and network systems. For this purpose, we neither need nor want to support the full ISO 8601 standard as this would be very complicated and mostly unnecessary.
What we really need is to support the more strict RFC 3339 which covers the common use-case of ISO 8601. Yes, it's an internet standard but internet standards are also applied for other applications, this can hardly be a point against RFC 3339.
If you want to add an implementation of full ISO 8601 formats, I'm fine with it, but that's a different issue and doesn't mean the RFC 3339 formatter should be replaced, because it is sometimes explicitly required to support only RFC 3339 instead of full ISO 8601.

We could call this ISO 8601 and technically, that wouldn't be wrong, because it essentially is ISO 8601 (though not all of it). But it would still be bad naming - similar to HTTP.rfc1123_date, which not referring to arbitrary RFC 1123 format but a very specific variant.

TL;DR: Renaming from RFC 3339 to ISO 8601 has no benefit and is misleading. Changing the behaviour from supporting RFC 3339 to full ISO 8601 would require much effort for little benefit and we'd probably still wan't to support RFC 3339 anyway.

@RX14
Copy link
Contributor

RX14 commented Jul 26, 2017

@straight-shoota RFC 3339 is a subset of ISO 8601. That means that emitting RFC 3339 means you support ISO 8601. However, we do not correctly parse ISO 8601. Nothing needs to be changed in the emitting side, but we should be able to parse ISO 8601 properly. And that's impossible using just a Time::Format.

By emitting RFC 3339 and parsing ISO 8601, we have compatibility with both specs. Why not?

@straight-shoota
Copy link
Member Author

Let's put it the other way around: what's the use case for parsing full ISO 8601 with all it's variations? Wouldn't RFC 3339 be enough?

Parsing RFC 3339 correctly already requires a customized parser. And parsing ISO 8601 is a different story.

@RX14
Copy link
Contributor

RX14 commented Jul 26, 2017

@straight-shoota the usecase is parsing ISO 8601, some of which might not be RFC 3339 compatible. This is because i've literally never heard of RFC 3339 before, so why expect all ISO 8601 to be compatible with a standard which isn't even well-known?

@straight-shoota
Copy link
Member Author

straight-shoota commented Jul 26, 2017

It seems like many people don't know ISO 8601 but call it that when they really mean RFC 3339.
Python datetime.isoformat, Ruby Time.iso8601 and Elixir DateTime.from_iso8601 are just a few examples of methods named ISO 8601 but essentially only support RFC 3339 (or a little bit more).
Have you ever seen a serious use of e.g. 2017-W11-2 091537,123 to exchange a time value in a standardized format?

I am not entirely opposed to implementing a full ISO 8601 format parser but I think this would be a) to much for this PR and b) there should be a parser for the more strictly defined RFC 3339 format as well.
Therefore I'd suggest to stay with the idea of this PR and add real ISO 8601 support later.

@akzhan
Copy link
Contributor

akzhan commented Jul 26, 2017

Why not to keep iso8601 name? rfc3339 is just subset of, and a lot of people will look at iso8601 while searching for conversion method.

@straight-shoota
Copy link
Member Author

straight-shoota commented Jul 26, 2017

@akzhan Because it's only a subset and not full ISO 8601. Therefore the parse method would be utterly misnamed (as it parses only one major variant), and the formatter would also only create one specific flavour of ISO 8601 - which is commonly known as RFC 3339.

@ysbaddaden
Copy link
Contributor

Please cool down. Starting a flame war over this is pointless.

Seeing how Python, Rust, Ruby, along others, all use ISO 8601 as reference and naming, seeing that Wikipedia redirects the RFC article to the ISO page, the single fact that the RFC is a subset to the ISO standard, all confirm my opinion on this topic.

Introducing RFC 3339 methods means that introducing an ISO 8601 parser later would lead to have somewhat duplicate methods (bad: duplication), or to rename the methods (bad: breaking change).

Hence the strong opposition.

A proper ISO 8601 parser should be introduced (now or later); RFC named methods should not.

@straight-shoota
Copy link
Member Author

straight-shoota commented Jul 26, 2017

Sorry if this was perceived this way, but I have no intention to flame, just to defend my argument since I am convinced about my opinion as well ;)

Rust does not use ISO 8601, but RFC 3339, as does Go. They are relatively modern languages.
Ruby and Python already had their established Time API's when RFC 3339 was released (in 2002) but as I have explained three comments above, they are actually employing the format described by RFC 3339, not the full ISO standard.

The duplication of having both RFC 3339 and ISO 8601 is negligible: Both formatters should produce the same string by default, but the ISO formatter should support other formats as well. Both parsers would understand the RFC 3339 format, but the ISO 8601 would also understand e.g. ordinal dates or week numbers. So it is not exactly the same but each has its case: RFC 3339 for a fast and simple strict mode, ISO 8601 for more broader, quirky uses. Though I am still not convinced that the standard lib should necessarily support full ISO 8601. As explained before, many languages don't have it.

@straight-shoota
Copy link
Member Author

So, I think we could proceed in one of the following ways:

  • A) Merge this as-is. Add an enhanced parser for RFC 3339 and ISO 8601 support (formatter and parser) in subsequent PRs. (final methods: to_rfc3339, parse_rfc3339, to_iso8601, parse_iso8601)
  • B1) Rename RFC 3339 identifiers to ISO 8601. Improve parser for ISO 8601 in a subsequent PR. (final methods: to_iso8601, parse_iso8601)
  • B2) Rename RFC 3339 identifiers to ISO 8601. Improve parser for ISO 8601 with a "strict" option to only recognize RFC 3339 in a subsequent PR. (final methods: to_iso8601, parse_iso8601(rfc3339 : Bool))
  • B3) Rename RFC 3339 formatter to ISO 8601, but keep RFC 3339 parser with a temporary alias for ISO 8601. Add enhanced parsers for RFC 3339 and ISO 8601 in subsequent PRs. (final methods: parse_rfc3339, to_iso8601, parse_iso8601)
  • C) Remove RFC 3339. Add an enhanced formatter and parser for ISO 8601 in a subsequent PR. (final methods: to_iso8601, parse_iso8601)

Did I miss something? What would you suggest @RX14 @ysbaddaden?

Another issue is how to deal with alternative representations in the parser and formatter.
With RFC 3339 there are AFAIK only two valid variations: T and Z may also be lower case and the time may or may not have subsecond precision. Lower/uppercase is just representation, so this is only relevant for the parser; the formatter can safely stick with upper case. But for the precision, there should probably be a flag in the formatter.
In ISO 8601 there are a whole lot of syntactical variations, but also quite a few semantical. I have no compiled list because it is so much. But it is safe to say that the parser would need to be quite complex and there would probably need to be some formatter options for meaningful variations.

@RX14
Copy link
Contributor

RX14 commented Jul 30, 2017

I'd say option B2 is what I prefer @straight-shoota.

@akzhan
Copy link
Contributor

akzhan commented Jul 31, 2017

About B2: I think that boolean parameters is bad choice. They aren't extendable. Flags enum is OK.

[@Flags]
enum Iso8601Flavor
  Rfc3339,
end
Rfc3339 = Iso8601Flavor::Rfc3339

parse_iso8601(rule : Iso8601Flavor)

@straight-shoota
Copy link
Member Author

straight-shoota commented Aug 7, 2017

@akzhan You're probably right. But because of this I would rather prefer B3 in order to provide easy access to a parser for a neatly defined specification RFC 3339 and a different parser method for the more flexible ISO 8601 formats. Having to use an enum for this is quite cumbersome. If this would be advisable, we could just use the plain parse method for everything and configure the format through an enum argument.
I'd also argue that B3 is a more future-proof option than B2 because we don't need to implement a lenient parse_iso8601 right away and both standards will eventually have quite different behaviour: parse_rfc3339 as a fast and simple method when you know that the data is in a strict format and explicitly don't want to support anything else and parse_iso8601 is for the case where you need to support some or all formats defined by the ISO standard. This might or might not be configurable through options - but this would need to be discussed on the implementation of a full blown ISO 8601 parser.

@straight-shoota
Copy link
Member Author

#5123 incorporates this and has a better implementation of the formats.

@straight-shoota straight-shoota deleted the time-formats branch November 18, 2021 17:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants