Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[5.2] Validation performance optimisation #12651

Merged
merged 11 commits into from
Apr 1, 2016
Merged

[5.2] Validation performance optimisation #12651

merged 11 commits into from
Apr 1, 2016

Conversation

themsaid
Copy link
Member

@themsaid themsaid commented Mar 7, 2016

The use case for testing performance is that I'm uploading a large excel file, convert it to array, validate the array before saving into the database.

The array looks like:

array:7 [
  "products" => array:257 []
  "product-sizes" => array:257 []
  "product-size-ingredients" => array:1142 []
]

Inside the validator we flatten the data to the dot.format and iterate over the results in multiple areas in the code, like in the validateDistinct, validateArray, and initializeAttributeOnData.

The issue here is that we hit all the flattened records even the ones that don't belong to the attribute we're checking, and this is causing a huge performance issue.

This PR grabs only the required attribute values before flattening them with Arr::dot, so based on the example above if I'm validating products.code is distinct I'll only iterate over products only, not the whole data.


Sometimes the data is nested:

Validator::make(
    [
        'foo' => [
            'bar' => [['baz' =>1], ['baz' =>2]],
            'cc'
        ],
        'bb'
    ],
    ['foo.bar.*.baz' => 'distinct']
);

Here we only need to flatten $data['foo']['bar'] for the check, and Arr::only was only working for single dimensional arrays, so I modified Arr::only to accept dot-keys and used the explicit part of the attribute name foo.bar to get only this data before passing it to Arr:dot.


UPDATE:

The Arr::only method was left untouched, and a new method was added to the Validator extractData().

@vlakoff
Copy link
Contributor

vlakoff commented Mar 7, 2016

I'm not sure it's a safe idea to bring dot notation in Arr::only.

To be more consistent your Arr::only should use Arr::get/has instead, but precisely because of the #12626 above I'm not sure that would work.

I think the best solution might be to not touch Arr::only and add a new protected method in Validator for your purpose.

$attributeData = [];

foreach ((array) $keys as $key) {
if (! $keyValue = data_get($array, $key)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You miss all falsy values because of this.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmmm, this one is to pass the following test:

$array = ['name' => 'taylor', 'age' => 26];
$this->assertEquals(['name' => 'taylor'], array_only($array, ['name']));
$this->assertSame([], array_only($array, ['nonExistingKey']));

Not sure if there are other cases I missed.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ahaaa, you mean null, 0, false, etc..? You are right, this should be fixed.

@themsaid
Copy link
Member Author

themsaid commented Mar 7, 2016

Well giving it a deep thought I find the new addition to array_only is quite useful, maybe the method itself can be left alone but it'd be nice to have another method that outputs an array of only the keys you need using dot notation.

I'm thinking of API response for example, maybe you want to present ['parent'] and ['other_parent']['child1'] only for example.

@vlakoff
Copy link
Contributor

vlakoff commented Mar 7, 2016

Indeed it would be useful, I only say I think this goes outside of the scope of Arr.

Just some idea for later, creating some new class Illuminate\Support\Data, and moving into it all the heavy, "swiss-knife" functions, namely data_get/set() to begin with. These functions are a fundamental part of the framework and are presented as simple helpers, it looks a bit wrong to me.

@themsaid
Copy link
Member Author

themsaid commented Mar 7, 2016

I see, maybe a Data::only() method in that class instead of Arr::only(). Let's see what Taylor thinks of the matter.

*/
protected function getExplicitAddress($attribute)
{
return str_replace_last('.', '', explode('*', $attribute)[0]);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should use the Str::replaceLast method.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or maybe just rtrim( ... , '.') ?

@themsaid
Copy link
Member Author

themsaid commented Mar 7, 2016

@vlakoff updates were made. Thanks for the notes.

continue;
}

data_set($attributeData, $key, $keyValue);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do your $key contain "*" ? I think the data_set is too "useful", Arr::set should work.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@vlakoff Yes, the Arr::set works on dots

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, right. So you could use Arr::get/set in this modified Arr::only. It would be a much better fit, while still working with the code in Validator.

This modifed Arr::only is probably slower, that's the only caveat I can think of.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@vlakoff, I'm even on data_get or Arr::get/set, since I agree on your concerns above due to #12626.
for data_set part, what I think is use simpler code as possible.

@vlakoff
Copy link
Contributor

vlakoff commented Mar 7, 2016

Because of the getExplicitAddress's just above in Validator, there could not be *, but there could be dots.

@themsaid themsaid changed the title [5.2] Validation performance optimisation [5.2] [WIP] Validation performance optimisation Mar 28, 2016
@themsaid themsaid changed the title [5.2] [WIP] Validation performance optimisation [5.2] Validation performance optimisation Mar 28, 2016
@taylorotwell
Copy link
Member

Would it be possible for people to bypass validation by passing missing as their value? Could that possibly be exploited in any way?

@taylorotwell
Copy link
Member

What did the before / after benchmarks look like on this?

@themsaid
Copy link
Member Author

@taylorotwell If the value passed was __missing__ the end result of extractData() will be an empty array. I believe an empty array of data is not harmful, maybe we should leave the value as null instead of __missing__.

@themsaid
Copy link
Member Author

@taylorotwell doing a very basic benchmarking, applying the distinct rule on 1,656 array items exceeded 30 seconds of execution. Applying the change and only passing the data the rule needs dropped it to 8 seconds.

The test is on an array that looks like:

array:7 [
  "products" => array:257 []
  "product-sizes" => array:257 []
  "product-size-ingredients" => array:1142 []
]

The rules are:

[
    'products.*.id' => 'required|distinct',
    'product-sizes.*.id' => 'required|distinct',
]

The difference in execution time is that before we pass all the data to the distinct rule to check for only the ones we need:

 $data = Arr::where(Arr::dot($this->data), function ($key) use ($attribute, $attributeName) {
            return $key != $attribute && Str::is($attributeName, $key);
        });

The change is:

$attributeData = $this->extractData($explicitAddress);
        $data = Arr::where(Arr::dot($attributeData), function ($key) use ($attribute, $attributeName) {
            return $key != $attribute && Str::is($attributeName, $key);
        });

Here extractData will only bring products when validating for products, and product-sizes when checking for product sizes. product-size-ingredients will never be passed to the validator method because we don't need it.

@vlakoff
Copy link
Contributor

vlakoff commented Mar 28, 2016

Just as a suggestion, how about this:

protected function extractData($attribute)
{
    static $missing;
    $missing || $missing = random_bytes(64);

    $results = [];

    $value = Arr::get($this->data, $attribute, $missing);

    if ($value !== $missing) {
        Arr::set($results, $attribute, $value);
    }

    return $results;
}

GrahamCampbell pushed a commit that referenced this pull request Apr 1, 2016
Closes #12651

Signed-off-by: Graham Campbell <graham@alt-three.com>
@taylorotwell taylorotwell merged commit 32b9eb8 into laravel:5.2 Apr 1, 2016
@themsaid themsaid deleted the validation-performance branch April 1, 2016 20:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants