
Http cache: introduce cache entries, prevent cache misses when resource will be fetched #24203

Closed
wants to merge 1 commit

Conversation

gterzian
Member

@gterzian gterzian commented Sep 12, 2019

FIX #24166


  • ./mach build -d does not report any errors
  • ./mach test-tidy does not report any errors
  • These changes fix #___ (GitHub issue number if applicable)
  • There are tests for these changes OR
  • These changes do not require tests because ___


@highfive

Heads up! This PR modifies the following files:

  • @KiChjang: components/net/tests/http_cache.rs, components/net/fetch/methods.rs, components/net/http_loader.rs, components/net/http_cache.rs

@highfive highfive added the S-awaiting-review There is new code that needs to be reviewed. label Sep 12, 2019
@gterzian gterzian force-pushed the cache_experiment branch 4 times, most recently from 7986748 to d6ea5dc on September 12, 2019 16:18
@gterzian gterzian changed the title Http cache: prevent cache misses when resource will be fetched Http cache: introduce cache entries, prevent cache misses when resource will be fetched Sep 12, 2019
@gterzian
Member Author

@bors-servo try=wpt

bors-servo pushed a commit that referenced this pull request Sep 12, 2019
Http cache: introduce cache entries, prevent cache misses when resource will be fetched

@bors-servo
Contributor

⌛ Trying commit c4f6a6b with merge 2742af8...

@bors-servo
Contributor

💔 Test failed - linux-rel-css

@highfive highfive added the S-tests-failed The changes caused existing tests to fail. label Sep 12, 2019
Member

@asajeffrey asajeffrey left a comment

The big comment is that this looks like it's structured differently from the spec, which (IIRC) leaves all caching to network_or_cache_fetch.

Also, it would be nice if a GET could join on to a GET in progress, rather than waiting for all its payload to arrive. That might be a big change though.

@@ -224,6 +225,13 @@ pub fn main_fetch(
// Step 11.
// Not applicable: see fetch_async.

// Get the cache entry corresponding to the url, after potential hsts switch.
let cache_entry = if let Ok(mut http_cache) = context.state.http_cache.write() {
http_cache.get_entry(&request)
Member

Hmm, is this spec-compliant? Shouldn't anything that touches the cache be in http_network_or_cache_fetch?

Member Author

@gterzian gterzian Sep 13, 2019

Strictly speaking, I think it's OK: at this point we're not running any specified cache-related logic, we're just getting the entry corresponding to the current request, which will later be used to check the cache. So the "entry" is, from the spec's perspective, the interface to the "cache", and it will be used later in http_network_or_cache_fetch.

I think the spec in this case is not really providing the full algorithm, but rather describing the outcome from a black-box perspective.

For example, see the diagram for the state-machine of the Chromium cache, little of which shows up in the spec: https://www.chromium.org/developers/design-documents/network-stack/http-cache

Same thing goes for the Gecko cache: https://developer.mozilla.org/en-US/docs/Mozilla/HTTP_cache#Implementation (more specifically, see their CacheEntry description).

If anything, ours matches the spec most closely, and yet ours seems to be the most naive implementation of them all.

Member

I'd still prefer a solution that kept all the caching logic inside http_network_or_cache_fetch, I find it confusing moving some bits of caching out. What's the motivation for doing that?

response: response,
needs_validation: has_expired,
};
if cached_resource.aborted.load(Ordering::Relaxed) {
Member

Ordering::Relaxed? You are a brave soul!

Member Author

@gterzian gterzian Sep 13, 2019

The other parts of this workflow also use Relaxed, and we could change that in this PR as well in another commit, see

response.aborted.store(true, Ordering::Relaxed);

response: response,
needs_validation: has_expired,
};
if cached_resource.aborted.load(Ordering::Relaxed) {
return None;
Member

Shouldn't we construct a new CachedResponse at this point and return it, rather than returning None?

Member Author

Actually, I realize this change was left over from #22813, and I'm not sure it's really necessary. There is no need to make this return an Option<CachedResponse> based on the check for aborted, or at least it's not solving any known problem.

Member Author

@gterzian gterzian Sep 13, 2019

On second thought, I think this change makes sense, since we shouldn't construct an aborted response, and I should have extracted it from the previous PR at the time.

So I switched to release/acquire for that flow.
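
For illustration, a minimal sketch of the Release/Acquire pairing described here, using hypothetical stand-in types rather than the actual Servo ones:

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;

// Hypothetical stand-in for a cached resource shared between fetches.
struct CachedBody {
    aborted: Arc<AtomicBool>,
}

// The aborting fetch: Release makes all writes performed before the store
// visible to any thread that later observes `aborted == true`.
fn abort(body: &CachedBody) {
    body.aborted.store(true, Ordering::Release);
}

// The fetch constructing a response from the cache: Acquire pairs with
// the Release store above, so it never acts on a half-aborted resource.
fn construct_if_not_aborted(body: &CachedBody) -> Option<&'static str> {
    if body.aborted.load(Ordering::Acquire) {
        return None;
    }
    Some("construct the cached response here")
}

fn main() {
    let body = CachedBody { aborted: Arc::new(AtomicBool::new(false)) };
    assert!(construct_if_not_aborted(&body).is_some());
    abort(&body);
    assert!(construct_if_not_aborted(&body).is_none());
}
```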

@@ -563,6 +580,132 @@ impl HttpCache {
}
}

/// Get the cache entry corresponding to this request
pub fn get_entry(&mut self, request: &Request) -> Option<HttpCacheEntry> {
Member

Slightly annoyingly, that means we need a write lock just to read the cache :( We may as well make it a Mutex rather than an RwLock.

Member Author

@gterzian gterzian Sep 13, 2019

Ok, I can change that to a mutex.

Member Author

How about we just keep it a RwLock for now, since everything else on HttpState uses one?

pub struct HttpState {

Member

sure, bit of a waste, but not a huge problem.
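
For context, a rough sketch (hypothetical, simplified types, not the PR's code) of why a get_entry(&mut self) sitting behind RwLock<HttpCache> forces callers to take the write lock even for a lookup:

```rust
use std::collections::HashMap;
use std::sync::RwLock;

struct HttpCacheEntry; // placeholder

struct HttpCache {
    entries: HashMap<String, HttpCacheEntry>,
}

impl HttpCache {
    // &mut self because a miss inserts a fresh entry.
    fn get_entry(&mut self, key: &str) -> &mut HttpCacheEntry {
        self.entries
            .entry(key.to_string())
            .or_insert(HttpCacheEntry)
    }
}

fn lookup(cache: &RwLock<HttpCache>, key: &str) {
    // cache.read() only yields &HttpCache, which cannot call a &mut self
    // method, so even a plain lookup has to go through write().
    let mut guard = cache.write().unwrap();
    let _entry = guard.get_entry(key);
}

fn main() {
    let cache = RwLock::new(HttpCache { entries: HashMap::new() });
    lookup(&cache, "https://example.com/");
}
```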

let entry = self
.entries
.entry(entry_key.clone())
.or_insert(HttpCacheEntry::new(entry_key));
Member

Interesting that invalidation can end up inserting an entry into the cache.

Member Author

Yes, I think we actually want to return if there is no corresponding entry :)
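
A possible sketch of that early return (hypothetical, simplified), so invalidation no longer inserts entries on a miss:

```rust
use std::collections::HashMap;

#[derive(Default)]
struct HttpCacheEntry {
    needs_revalidation: bool, // stand-in for per-resource freshness state
}

struct HttpCache {
    entries: HashMap<String, HttpCacheEntry>,
}

impl HttpCache {
    fn invalidate(&mut self, key: &str) {
        // get_mut instead of entry().or_insert(): a miss means there is
        // nothing to invalidate, so simply return.
        if let Some(entry) = self.entries.get_mut(key) {
            entry.needs_revalidation = true;
        }
    }
}

fn main() {
    let mut cache = HttpCache { entries: HashMap::new() };
    cache.invalidate("https://example.com/"); // no entry is created
    assert!(cache.entries.is_empty());
}
```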

/// if a pending network fetch has already begun,
/// and a pending response has been stored in the cache,
/// or if there is no pending network fetch.
is_ready_to_construct: Arc<(Mutex<bool>, Condvar)>,
Member

Can we use futures rather than a condvar here? Essentially we're building a Future<Option<CachedResource>> here.

Member Author

@gterzian gterzian Sep 13, 2019

I think that would require moving all of fetch to futures/tasks, since a fetch currently runs on a thread, and in such a setup it's more reliable to do this with condvars and locks.

There is an issue at #22813
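
For reference, a minimal sketch of the Arc<(Mutex<bool>, Condvar)> wait/notify pattern under discussion (simplified; in the PR the flag lives on the cache entry):

```rust
use std::sync::{Arc, Condvar, Mutex};
use std::thread;

fn main() {
    // Shared "ready to construct" flag plus its condvar.
    let is_ready = Arc::new((Mutex::new(false), Condvar::new()));

    // The fetch that goes to the network: once it has stored a pending
    // resource in the cache entry, it flips the flag and wakes waiters.
    let flag = Arc::clone(&is_ready);
    let network_fetch = thread::spawn(move || {
        // ... store a pending resource in the cache entry ...
        let (lock, cvar) = &*flag;
        *lock.lock().unwrap() = true;
        cvar.notify_all();
    });

    // A concurrent fetch for the same entry: wait until the flag is set
    // before checking the cache, so it doesn't see a spurious miss.
    let (lock, cvar) = &*is_ready;
    let mut ready = lock.lock().unwrap();
    while !*ready {
        ready = cvar.wait(ready).unwrap();
    }
    // ... now check the cache and find the pending resource ...

    network_fetch.join().unwrap();
}
```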

@gterzian
Member Author

Ok lots of TIMEOUT, I must have missed a wake-up somewhere...

@gterzian
Member Author

gterzian commented Sep 13, 2019

Also, it would be nice if a GET could join on to a GET in progress, rather than waiting for all its payload to arrive. That might be a big change though.

On that, with the current changes, there are in fact two separate "wait" workflows:

  1. The newly introduced wait on a condvar, based on whether an entry is "ready to construct" a response. "Ready" is false only for as long as a concurrent fetch has checked the cache (via a specific entry), found nothing, and moved on to a network fetch, but has not yet had a chance to store a "pending" resource in the cache. This workflow prevents "quick succession" concurrent requests from getting a cache miss just because the "first" fetch hasn't actually started the network fetch yet. The wait does not block on any network data; it only waits for the network future to be launched, after which a "pending" resource is immediately stored in the cache.

  2. The existing "wait for a pending resource" workflow. That happens when a concurrent fetch hits the cache and gets a pending resource. At that point the fetch waits for the concurrent network fetch to finish. See

    fn wait_for_cached_response(done_chan: &mut DoneChannel, response: &mut Option<Response>) {

If by "joining" you mean actually sending the intermediary steps on the FetchTaskTarget, I wonder if we should do that. It might be best, from the perspective of script, to see an entry coming from the cache as one chunk, rather than to "join" into the networking operation of the fetch that added the resource to the cache, since the fetch that hits the cache doesn't go through a "network fetch" (at least not directly, although it could be said that it does so indirectly via the concurrent cache read). But I'm not sure.
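
And a rough sketch of the second workflow, with a plain mpsc channel standing in for the done channel used by wait_for_cached_response (names and types are simplified stand-ins, not the actual ones):

```rust
use std::sync::mpsc;
use std::thread;

// Stand-in for the "pending fetch finished" signal; the real done
// channel carries more than this.
enum Done {
    Finished,
    Cancelled,
}

fn main() {
    let (sender, receiver) = mpsc::channel();

    // The fetch that owns the network request signals completion once
    // the full response has been stored in the cache.
    thread::spawn(move || {
        // ... network fetch runs, body is cached ...
        let _ = sender.send(Done::Finished);
    });

    // A fetch that found a pending resource in the cache blocks here until
    // the concurrent network fetch is done, then re-reads the cache.
    match receiver.recv() {
        Ok(Done::Finished) => { /* construct the response from the cache */ }
        Ok(Done::Cancelled) | Err(_) => { /* fall back to a network fetch */ }
    }
}
```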

@highfive highfive removed the S-tests-failed The changes caused existing tests to fail. label Sep 13, 2019
@gterzian
Member Author

@bors-servo try=wpt

@bors-servo
Contributor

⌛ Trying commit 5087540 with merge faa97eb...

bors-servo pushed a commit that referenced this pull request Sep 13, 2019
Http cache: introduce cache entries, prevent cache misses when resource will be fetched

@bors-servo
Contributor

💔 Test failed - linux-rel-css

@highfive highfive added S-tests-failed The changes caused existing tests to fail. and removed S-tests-failed The changes caused existing tests to fail. labels Sep 13, 2019
@gterzian
Member Author

@bors-servo try=wpt

@bors-servo
Contributor

⌛ Trying commit 7ea31c0 with merge 0bde330...

bors-servo pushed a commit that referenced this pull request Sep 13, 2019
Http cache: introduce cache entries, prevent cache misses when resource will be fetched

@bors-servo
Contributor

💔 Test failed - linux-rel-css

@highfive highfive added S-tests-failed The changes caused existing tests to fail. and removed S-tests-failed The changes caused existing tests to fail. labels Sep 13, 2019
@gterzian
Member Author

gterzian commented Sep 16, 2019

@bors-servo try=wpt (100ms timeout)

@bors-servo
Contributor

⌛ Trying commit 474dcf3 with merge e7fca8d...

bors-servo pushed a commit that referenced this pull request Sep 16, 2019
Http cache: introduce cache entries, prevent cache misses when resource will be fetched

@bors-servo
Contributor

☀️ Test successful - linux-rel-css, linux-rel-wpt, status-taskcluster
State: approved= try=True

@gterzian
Member Author

@bors-servo try=wpt (increase timeout from 100 to 500ms)

@bors-servo
Contributor

⌛ Trying commit 589c09a with merge bb81703...

bors-servo pushed a commit that referenced this pull request Sep 16, 2019
Http cache: introduce cache entries, prevent cache misses when resource will be fetched

@gterzian
Member Author


Ok, now it's really ready for another round of review @asajeffrey

@jdm jdm assigned asajeffrey and unassigned emilio Sep 16, 2019
@asajeffrey
Member

Oh /html/semantics/scripting-1/the-script-element/async_004.htm is an interesting one.

One way we could deal with that is to change how prefetching is done: rather than treating it like a regular cache access, we could treat the first request for the resource as a cache hit, even if the resource didn't have cache metadata. This does worry me a bit, as it's not clear whether that's spec compliant, but it would get this test to pass, and would improve perf in cases where we prefetched a resource that turned out to have no cache metadata.

I'll read over the latest version.

Member

@asajeffrey asajeffrey left a comment

OK, I'm still confused about why we're moving caching logic out of http_network_or_cache_fetch, and there's still the question about how to treat the first fetch of a cached prefetch, but otherwise LGTM.

@@ -224,6 +225,13 @@ pub fn main_fetch(
// Step 11.
// Not applicable: see fetch_async.

// Get the cache entry corresponding to the url, after potential hsts switch.
let cache_entry = if let Ok(mut http_cache) = context.state.http_cache.write() {
http_cache.get_entry(&request)
Member

I'd still prefer a solution that kept all the caching logic inside http_network_or_cache_fetch, I find it confusing moving some bits of caching out. What's the motivation for doing that?

@@ -563,6 +580,132 @@ impl HttpCache {
}
}

/// Get the cache entry corresponding to this request
pub fn get_entry(&mut self, request: &Request) -> Option<HttpCacheEntry> {
Member

sure, bit of a waste, but not a huge problem.

@gterzian
Member Author

gterzian commented Sep 17, 2019

I'd still prefer a solution that kept all the caching logic inside http_network_or_cache_fetch, I find it confusing moving some bits of caching out. What's the motivation for doing that?

I understand your concern, and I think the proposed changes actually fit in with your thinking, for the following reasons:

  1. All the underlying cache operations involving cached resources (construct/store/refresh) still happen inside http_network_or_cache_fetch. The proposed changes add a get_entry operation, but that doesn't manipulate the resources of the cache directly; get_entry only gives us the "partition" of the cache matching the current request, based on its URL, which is later used for the specified cache operations inside http_network_or_cache_fetch.

  2. To make the situation clearer, I could rename the HttpCache to something like HttpCacheManager and the HttpCacheEntry to HttpCache, to show that the "entry" really is the "cache", from the point of view of the current request.

In terms of the motivations of the proposed approach, they are:

  1. In practice, it makes it easier to introduce some blocking like this PR does, since we only want to block other fetches using the same entry, not all fetches. This would be harder to do if all fetches were still operating on the same RwLock<HttpCache>.

  2. The CacheEntry acts as a partitioned interface to the underlying cache storage. The partitioning is based on the URL, and it could also be based on additional keys, for example the top-level origin, to support something like the double-keyed HTTP cache (whatwg/fetch#904).

  3. In theory, it also improves parallelism: construct/store/refresh operations by different fetches will not contend unless they use the same entry, whereas currently all fetches operate on the same RwLock<HttpCache>, so any construct/store/refresh potentially blocks any other fetch, regardless of the URL. A rough sketch of this partitioning follows below.
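
To make the partitioning concrete, here is a hypothetical sketch (illustrative only; the names and types are simplified stand-ins, not this PR's actual code) of a cache whose entries are keyed per request and individually lockable:

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex, RwLock};

// Hypothetical key: the URL, possibly extended with e.g. the top-level
// origin for a double-keyed cache (whatwg/fetch#904).
#[derive(Clone, PartialEq, Eq, Hash)]
struct EntryKey {
    url: String,
}

#[derive(Default)]
struct EntryContents {
    resources: Vec<String>, // stand-in for the cached resources
}

// Each entry is independently lockable, so fetches for different URLs
// don't contend once they hold their own entry.
type HttpCacheEntry = Arc<Mutex<EntryContents>>;

#[derive(Default)]
struct HttpCache {
    entries: HashMap<EntryKey, HttpCacheEntry>,
}

impl HttpCache {
    // Only this map lookup needs the outer lock; construct/store/refresh
    // then happen under the per-entry lock.
    fn get_entry(&mut self, key: EntryKey) -> HttpCacheEntry {
        Arc::clone(self.entries.entry(key).or_insert_with(Default::default))
    }
}

fn main() {
    let cache = RwLock::new(HttpCache::default());
    let entry = cache.write().unwrap().get_entry(EntryKey {
        url: "https://example.com/script.js".to_string(),
    });
    // The outer lock is already released; further cache operations for
    // this request only touch the entry.
    entry.lock().unwrap().resources.push("cached body".to_string());
}
```

With this shape, whether the outer map sits behind an RwLock or a Mutex matters much less, since a fetch only holds it long enough to clone the Arc for its entry.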

Please let me know what you think.

@asajeffrey
Member

OK, we appear to be at a bit of a stalemate here about whether all caching logic should be in http_network_or_cache_fetch, as per the spec. What I was hoping is that at step 5.19.1 of https://fetch.spec.whatwg.org/#http-network-or-cache-fetch, if there's already an outstanding request, we'd block waiting for its headers to arrive to know whether we can return a cached response or not. We'd have to block without holding the lock on the cache. The rest of the implementation wouldn't change; there'd be another case when doing a cache lookup, corresponding to "a cacheable GET request has been issued, but the response headers haven't come back yet, so we don't know yet whether there's a cache entry".
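
To illustrate the suggestion (hypothetical types, not taken from the PR), the cache lookup inside http_network_or_cache_fetch would gain a third outcome alongside hit and miss, along these lines:

```rust
use std::sync::mpsc::Receiver;

// Hypothetical placeholders, for illustration only.
struct CachedResponse; // a response constructed from the cache
struct ResponseHeaders; // headers of the in-flight request

// Possible outcomes of a cache lookup in http_network_or_cache_fetch.
enum CacheLookup {
    // A stored response that can be returned (possibly after revalidation).
    Hit(CachedResponse),
    // Nothing stored and no outstanding request: go to the network.
    Miss,
    // A cacheable GET for this request is already in flight; block
    // (without holding the cache lock) until its headers arrive, then
    // decide whether a cached response can be served.
    PendingHeaders(Receiver<ResponseHeaders>),
}
```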

@asajeffrey asajeffrey added S-awaiting-answer Someone asked a question that requires an answer. and removed S-awaiting-review There is new code that needs to be reviewed. labels Sep 27, 2019
@gterzian
Member Author

gterzian commented Sep 29, 2019

Ok so I've opened #24318 as an alternative implementation, which does not introduce the concept of entries, so you can choose one, or suggest a third approach.

The commit that differs is 89fe0ba

@asajeffrey

@gterzian gterzian added S-awaiting-review There is new code that needs to be reviewed. and removed S-awaiting-answer Someone asked a question that requires an answer. labels Sep 29, 2019
@gterzian gterzian mentioned this pull request Oct 2, 2019
5 tasks
@gterzian
Member Author

gterzian commented Oct 3, 2019

Closing in favor of #24318

Successfully merging this pull request may close these issues.

Concurrent requests for the same resource can result in cache misses