Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Std.os.time #933

Merged
merged 9 commits into from Apr 22, 2018
Merged

Std.os.time #933

merged 9 commits into from Apr 22, 2018

Conversation

tgschultz
Copy link
Contributor

I've added a namespace time to std.os and included within it basic utc timestamp functions, a high performance timer, and constants for various epochs. It made sense to me to move os.sleep into time so I did that too.

std/os/epoch.zig Outdated
pub const amiga = 252460800; //Jan 01, 1978 AD
pub const pickos = -63244800; //Dec 31, 1967 AD
pub const gps = 315964800; //Jan 06, 1980 AD
pub const clr = 62135769600; //Jan 01, 0001 AD
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be negative instead?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it should.

std/os/time.zig Outdated
},
Os.linux => {
var ts: posix.timespec = undefined;
var result = posix.clock_getres(posix.CLOCK_MONOTONIC, &ts);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we want CLOCK_MONOTONIC_RAW here instead. The _RAW variant is the same but avoids any time skewing due to ntp and/or adjtime. This makes it more suitable as a high-performance timer since we won't get any sudden jumps/delays.

Windows' QueryPerformanceCounter isn't time-adjusted either so we match this behavior across operating systems. Not 100% on MacOS.

Copy link
Member

@tiehuis tiehuis Apr 18, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that CLOCK_MONOTONIC_RAW this is only present on kernels 2.6.28 and newer and is linux-specific so we would need to keep that in mind:
https://linux.die.net/man/2/clock_gettime

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm actually not 100% convinced about this change after doing some more reading. So happy to hear disagreements.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had the same thought and did some reading on the subject. I went this way for two reasons:

  1. CLOCK_MONOTONIC_RAW is linux specific from what I can tell. Easy enough to switch our way through that, but:
  2. Pretty much everyone I've seen having this discussion has ultimately decided on using regular MONOTONIC. The reasoning is that even though the interval of CLOCK_MONOTONIC is not stable, it will average out to being more correct in relation to the real passage of time.

Both are guaranteed to only increase (bugs aside) and neither should "jump" ahead, my understanding is that the skews applied at this level are small.

That said, I'm not entirely sure RAW isn't a better choice just because it matches the QPC behavior. My guess is that the Darwin behavior is also not subject to skew based on how it works.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think CLOCK_MONOTONIC_RAW is best for linux.

  • it's OK to depend on OS-specific things. we'll abstract darwin, linux, and windows (and others when supported) into a platform-agnostic API.
  • timer is trying to measure an absolute duration, rather than measure the difference between 2 clock times.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A thing to consider is whether handling ntp adjustments leads to more accurate time on small periods i.e. if the cpu oscillator is inaccurate enough that the constant skewing leads to more accurate times using CLOCK_MONTONIC on moderate intervals (say 1second+). Looking at ntpd specifically, it seems that the lower poll interval is 64 seconds by default so I would presume that the onboard cpu clock is accurate enough for the time periods (sub-minute, likely second) we are interested in.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, I see.

This looks like a nice writeup: ros2/rcutils#43

In this case I'm happy with CLOCK_MONOTONIC.

Copy link
Sponsor Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've been working on a distributed clock fault-tolerance problem the past week and spotted the comment in std/time.zig for the Timer implementation pointing here, so I thought I'd drop a comment if this is revisited again:

  1. CLOCK_MONOTONIC trumps CLOCK_MONOTONIC_RAW for the reasons already given by @tgschultz and @andrewrk . It's definitely the better choice of the two. As far as I can see, CLOCK_MONOTONIC_RAW is mostly useful for programs like NTP that want to get at the raw data to measure errors with respect to drift (i.e. clocks that tick too fast or too slow) as opposed to skew (i.e. clocks that are head or behind the "true time"), but CLOCK_MONOTONIC_RAW is not what you want when you want to measure elapsed units of time. So we're good here.

  2. However, for measuring elapsed time that corresponds to time passing across multiple systems, CLOCK_MONOTONIC is trumped again by CLOCK_BOOTTIME, which is exactly the same as CLOCK_MONOTONIC but which can measure elapsed time accurately, even across a suspend. That CLOCK_MONOTONIC fails to measure elapsed time after a suspend accurately was more an accidental historical bug in the kernel. There was an attempt to fix this in the kernel at one point to make CLOCK_MONOTONIC the same as CLOCK_BOOTTIME but this broke applications. Compared to CLOCK_MONOTONIC, CLOCK_BOOTTIME allows applications to get a true suspend aware monotonic clock.

Granted, some applications may only want to measure elapsed time while the process is running, so may prefer CLOCK_MONOTONIC but the choice would be application-specific, and it seems there are also quite a few applications (e.g. distributed databases) suggesting or using CLOCK_MONOTONIC for timers that govern leader leases where they probably should be using CLOCK_BOOTTIME.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the comments @jorangreef. Based on this information I think we may want to consider an iteration on the std lib time API to take into consideration hardware suspension and perhaps force the user to make a choice.

…our mind in the future.

Updated std.os imported tests' block with lazy declaration workaround and added time.zig.
Corrected some incorrect comments.
Copy link
Contributor

@bnoordhuis bnoordhuis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left some comments. Nice work!

std/os/index.zig Outdated
_ = @import("windows/index.zig");
_ = @import("test.zig");
comptime {
if(builtin.is_test) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Style nit: space before (. Happens in other places too, I won't point them out individually.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I missed that subtlety of the styling in the other std code. One day zig-fmt will save me from having to switch styles depending on what I'm working on.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah let's not worry about style. Style concerns should be in the form of pull requests to zig fmt.

req = rem;
continue;
},
else => return,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't anything but EINTR be a fatal error? I mean, they shouldn't happen in a bug-free program.

// Sometimes Darwin returns EINVAL for no reason.

Interesting. I've never observed that but looking at the nanosleep implementation in libc and __semwait_signal_nocancel() in xnu, they can indeed fail with EINVAL for some runtime errors.

I guess treating it as EINTR is a bad idea, that might result in busy-looping.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't write the sleep code, I just moved it from std.os and substituted named constants, so I'm not sure of the reasoning.

std/os/time.zig Outdated
var err = darwin.gettimeofday(&tv, null);
debug.assert(err == 0);
const sec_ms = tv.tv_sec * ms_per_s;
const usec_ms = @divFloor(tv.tv_usec, (us_per_s / ms_per_s));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Somewhat inconsistent use of parens vis-a-vis line 95. (The ones on line 72 aren't necessary either, for that matter.)


/// Divisions of a second
pub const ns_per_s = 1000000000;
pub const us_per_s = 1000000;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shame, could have benefited from #504...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could use something like comptime u64(math.pow(f64, 10, 9)); here, but I'm not sure that's actually more readable.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, I agree. It was more wistful thinking-out-aloud.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Noted

std/os/time.zig Outdated

//Initialize the timer structure.
//This gives us an oportunity to grab the counter frequency in windows.
//On Windows: QueryPerformanceCounter will succeed on anything >= XP/2000.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See libuv/libuv#1268 and the issues linked therein - not an argument per se against QueryPerformanceCounter() but it's not as stable as clock_gettime(CLOCK_MONOTONIC) is on the Unices.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting. I'll add some asserts to cover things like this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, I don't need to add anything since zig has me covered with runtime checks in the appropriate modes.

std/os/time.zig Outdated
},
Os.linux => {
var ts: posix.timespec = undefined;
var result = posix.clock_getres(monotonic_clock_id, &ts);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could benefit from caching. clock_getres() is not a hugely expensive system call (might not be a system call at all, often served from the vDSO) but it's still a few cycles.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The result is cached in the Timer struct. As long as you use the same Timer to do your timing you won't call it again. We could store it in a global and only ever call clock_getres/QPC/mach_timebase_info once, but it seems unintuitive to me that the time.monotonic_resolution (or whatever) would be 0 or some other invalid value until the first call to Timer.start(), so then I'd want to introduce a time.getMonotonicResolution() to retrieve the value and handle initialization if necessary.

I dunno, I guess I didn't consider this because there's too much Java in my past.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another reason not to globally cache it: every call to getMonotonicResultion() would have to check for an error, instead of just once per start().

std/os/time.zig Outdated
self.start_time = u64(ts.tv_sec * ns_per_s + ts.tv_nsec);
},
Os.macosx, Os.ios => {
darwin.mach_timebase_info(&self.frequency);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could also benefit from caching. mach_timebase_info() is fairly expensive.


fn clockLinux() u64 {
var ts: posix.timespec = undefined;
var result = posix.clock_gettime(monotonic_clock_id, &ts);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't know if it matters but seccomp2 sandboxes can reject clock_gettime(), although that's arguably a broken-beyond-reasonableness sandbox.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I hadn't considered that a system might allow clock_getres but not clock_gettime. It's possible and therefore should be considered. My first thought is to add another check in start() and only throw the error there, but my understanding is that seccomp could be used to reject syscalls using arbitrary criteria, meaning that just because a syscall succeeded once doesn't mean it will in the future, so the appropriate thing to do would be to make every call that makes a syscall potentially throw an error.

That seems really annoying, but edge cases matter, so I dunno. At the very least I can add a check to start() and make EACESS throw an error instead of unreachable.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually it turns out seccomp can return an arbitrary errno value. I feel like the best way to handle this is to just use asserts and unreachable because this kind of thing would be roughly on par with using a debugger to insert random values. Open to suggestions though.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds reasonable to me.

std/os/time.zig Outdated
var ts: posix.timespec = undefined;
var result = posix.clock_gettime(monotonic_clock_id, &ts);
debug.assert(posix.getErrno(result) == 0);
return u64(ts.tv_sec * ns_per_s + ts.tv_nsec);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this can overflow on 32 bits systems since tv_sec and tv_nsec are of type isize? Happens in a few more places in this file.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. I had casted the individual values to u64 other places, but apparently was hasty and missed several of them.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants