Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

libexpr: Recognize newline in more places in lexer #1939

Merged
merged 1 commit into from Mar 16, 2018

Conversation

dezgeg
Copy link
Contributor

@dezgeg dezgeg commented Mar 2, 2018

Flex's regexes have an annoying feature: the dot matches everything
except a newline. This causes problems for expressions like:

"${0}\
"

where the backslash-newline combination matches this rule instead of the
intended one mentioned in the comment:

    <STRING>\$|\\|\$\\ {
                    /* This can only occur when we reach EOF, otherwise the above
                    (...|\$[^\{\"\\]|\\.|\$\\.)+ would have triggered.
                    This is technically invalid, but we leave the problem to the
                    parser who fails with exact location. */
                    return STR;
                }

However, the parser actually accepts the resulting token sequence
('"' DOLLAR_CURLY 0 '}' STR '"'), which is a problem because the lexer
rule didn't assign anything to yylval. Ultimately this leads to a crash
when dereferencing a NULL pointer in ExprConcatStrings::bindVars().

The fix does change the syntax of the language in some corner cases
but I think it's only turning previously invalid (or crashing) syntax
to valid syntax. E.g.

"a\
b"

and

''a''\
b''

were previously syntax errors but now both result in "a\nb".

Found by afl-fuzz.

Flex's regexes have an annoying feature: the dot matches everything
except a newline. This causes problems for expressions like:

"${0}\
"

where the backslash-newline combination matches this rule instead of the
intended one mentioned in the comment:

    <STRING>\$|\\|\$\\ {
                    /* This can only occur when we reach EOF, otherwise the above
                    (...|\$[^\{\"\\]|\\.|\$\\.)+ would have triggered.
                    This is technically invalid, but we leave the problem to the
                    parser who fails with exact location. */
                    return STR;
                }
However, the parser actually accepts the resulting token sequence
('"' DOLLAR_CURLY 0 '}' STR '"'), which is a problem because the lexer
rule didn't assign anything to yylval. Ultimately this leads to a crash
when dereferencing a NULL pointer in ExprConcatStrings::bindVars().

The fix does change the syntax of the language in some corner cases
but I think it's only turning previously invalid (or crashing) syntax
to valid syntax. E.g.

"a\
b"

and

''a''\
b''

were previously syntax errors but now both result in "a\nb".

Found by afl-fuzz.
@shlevy
Copy link
Member

shlevy commented Mar 2, 2018

LGTM, would like a second set of eyes to make sure we're not breaking any valid expression with this (or, if we are, that it's worth it).

@dezgeg
Copy link
Contributor Author

dezgeg commented Mar 16, 2018

cc @edolstra

@edolstra edolstra merged commit 64441f0 into NixOS:master Mar 16, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants