Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

LB-158 / LB-170: Fix listenstore code for users with special characters, add tests and upgrade influx #199

Merged
merged 4 commits into from Jun 19, 2017

Conversation

paramsingh
Copy link
Collaborator

@paramsingh paramsingh commented Jun 17, 2017

So I spent the day looking through the influxdb-python source code. And the reason this was so hard to debug is a combination of random weird behaviour in influx with backslashes and some problems in our code.

Influxdb-python automatically escapes measurement names when we use write_points which can be seen here, so when we send a measurement with name "para\m" to write_points, it shows up in influx as "para\\m". As a result, the line protocol only requires escaping of \n characters. On the other hand, the query language requires escaping of everything and since influx had replaced the \ in the measurement name with \\, we need FOUR backslashes in the query to get the data.

So, I have changed the code so that it does this and added a bunch of comments explaining stuff as best I could. I expect more changes will be requested to make it clearer. Also, I have upgraded to influx 1.2.4 and influxdb-python 4.1.0 .

Also, there is some weird behaviour when we backslash quotes in influx. Influx SOMETIMES totally ignores backslashes present in measurement names, if they come before quotes. This behaviour is present in influx itself (not influxdb-python) and I am not sure if this is a bug or if I am misunderstanding something. For example:

Backslash away from quotes:

> insert "random\user"",val=1 hello=1
> show measurements
name: measurements
name
----
"random\user""

> select * from "\"random\\user\"\""
name: "random\user""
time                hello val
----                ----- ---
1497719311488498867 1     1

Backslash in front of a quote gets ignored

> insert "randomuser\"",val=1 hello=1
> show measurements
name: measurements
name
----
"random\user""
"randomuser""

> select * from "\"randomuser\"\""
name: "randomuser""
time                hello val
----                ----- ---
1497719359874380564 1     1

Stuff like this is hard to write code for, unless there is something that I am missing completely...

@paramsingh paramsingh closed this Jun 17, 2017
@paramsingh paramsingh reopened this Jun 17, 2017
@paramsingh paramsingh changed the title Escape users LB-158 / LB-170: Fix listenstore code for users with special characters, add tests and upgrade influx Jun 17, 2017
@paramsingh
Copy link
Collaborator Author

Clicked Create Pull Request before it was ready. 😅

@@ -196,7 +197,7 @@ def to_influx(self, measurement):
# add the user generated keys present in additional info to fields
for key, value in self.data['additional_info'].items():
if key not in Listen.SUPPORTED_KEYS:
data['fields'][key] = value
data['fields'][key] = escape(value)
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did this because additional_info variables may have random newlines in them, which would lead to parse errors.

Influx has different behaviour for the line protocol and for
the query language, so we have to use 2 backslashes for each
backslash in the line protocol and 4 backslashes for each
backslash in the query language. Change the code so that it
does that everywhere.

Also, because newlines cause parse errors in the line protocol
replace newlines in the measurement names and the values of
extraneous variables in additional_info with `\n`.

I've also added a unit test and an integration test with a user
with a weird user name.
Setuptools might have an error that is causing build errors
See here: https://travis-ci.org/paramsingh/listenbrainz-server/builds/244051410

The bug seems to be known, see this:
pypa/setuptools#951
Copy link
Member

@mayhem mayhem left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for doing all of this painful work! 🛡 💯

@mayhem mayhem merged commit 81c826c into metabrainz:master Jun 19, 2017
@paramsingh paramsingh deleted the escape-users branch June 19, 2017 15:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
2 participants