New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
LB-158 / LB-170: Fix listenstore code for users with special characters, add tests and upgrade influx #199
Conversation
Clicked |
listenbrainz/listen.py
Outdated
@@ -196,7 +197,7 @@ def to_influx(self, measurement): | |||
# add the user generated keys present in additional info to fields | |||
for key, value in self.data['additional_info'].items(): | |||
if key not in Listen.SUPPORTED_KEYS: | |||
data['fields'][key] = value | |||
data['fields'][key] = escape(value) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did this because additional_info
variables may have random newlines in them, which would lead to parse errors.
Influx has different behaviour for the line protocol and for the query language, so we have to use 2 backslashes for each backslash in the line protocol and 4 backslashes for each backslash in the query language. Change the code so that it does that everywhere. Also, because newlines cause parse errors in the line protocol replace newlines in the measurement names and the values of extraneous variables in additional_info with `\n`. I've also added a unit test and an integration test with a user with a weird user name.
Setuptools might have an error that is causing build errors See here: https://travis-ci.org/paramsingh/listenbrainz-server/builds/244051410 The bug seems to be known, see this: pypa/setuptools#951
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for doing all of this painful work! 🛡 💯
So I spent the day looking through the influxdb-python source code. And the reason this was so hard to debug is a combination of random weird behaviour in influx with backslashes and some problems in our code.
Influxdb-python automatically escapes measurement names when we use write_points which can be seen here, so when we send a measurement with name
"para\m"
to write_points, it shows up in influx as"para\\m"
. As a result, the line protocol only requires escaping of\n
characters. On the other hand, the query language requires escaping of everything and since influx had replaced the\
in the measurement name with\\
, we need FOUR backslashes in the query to get the data.So, I have changed the code so that it does this and added a bunch of comments explaining stuff as best I could. I expect more changes will be requested to make it clearer. Also, I have upgraded to influx 1.2.4 and influxdb-python 4.1.0 .
Also, there is some weird behaviour when we backslash quotes in influx. Influx SOMETIMES totally ignores backslashes present in measurement names, if they come before quotes. This behaviour is present in influx itself (not influxdb-python) and I am not sure if this is a bug or if I am misunderstanding something. For example:
Backslash away from quotes:
Backslash in front of a quote gets ignored
Stuff like this is hard to write code for, unless there is something that I am missing completely...