Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PICARD-300: Dont save files if metadata is unchanged #535

Closed
wants to merge 12 commits into from

Conversation

samj1912
Copy link
Collaborator

@samj1912 samj1912 commented Jan 8, 2017

No description provided.

Copy link
Member

@mineo mineo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In addition to the inline comments, how does this handle the various settings on the Tags options page that influence how tags are written and which tags are removed from files, especially the ID3 version etc?

orig_values = orig_metadata.getall(name)
if new_values != orig_values:
return False
if orig_metadata.length != new_metadata.length:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is not correct, see #109 and PICARD-342.

new_metadata = self.new_metadata
orig_metadata = self.orig_metadata
tags = set(new_metadata.keys() + orig_metadata.keys())
for name in filter(lambda x: not x.startswith("~") and self.supports_tag(x), tags):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

x.startswith("~") is not a correct check. In some cases we do actually write ~ tags to files (I think ~rating (which also depends on the email adress) and ~id3-prefixed tags (look in ID3File._save for those)).

@samj1912
Copy link
Collaborator Author

samj1912 commented Jan 8, 2017

Updated as per review :)

Copy link
Member

@mineo mineo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about the ID3 version, text encoding etc. settings? I gave this a try and if I change them, the tags are still not saved.

@@ -62,6 +62,13 @@
</widget>
</item>
<item>
<widget class="QCheckBox" name="dont_save_on_unchanged">
<property name="text">
<string>Don't save files if the tags are uncahnged</string>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"unchanged"

@samj1912
Copy link
Collaborator Author

samj1912 commented Jan 9, 2017

What about the ID3 version, text encoding etc. settings? I gave this a try and if I change them, the tags are still not saved.

Hmm, yes, for a file specific don't save on unchanged, I am thinking I will add the settings that affect saving as "~" tags in the file metadata when its loaded and compare them to determine if the settings have changed.

Any suggestions?

@samj1912
Copy link
Collaborator Author

Updated as requested. Now saves if settings are changed.

Copy link
Member

@mineo mineo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This completely ignores image tags.

What if I save a file, quit Picard and a few months later want to retag that same file with different settings? If it's loaded after the preferences are changed, orig_metadata will contain the wrong initial value and the file won't be saved.

(Unfortunately I don't have a complete list of settings and scenarios to cover in my head, I write them down as I think of them :-( )

orig_values = orig_metadata.getall(name)
if new_values != orig_values:
return False
if abs(float(orig_metadata.length) - float(new_metadata.length)) > 2000:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

2000 here (and in the tagdiff code) should be moved to a new constant in picard/const/__init__.py, like MAX_INSIGNIFICANT_LENGTH_DIFFERENCE or something like that.

if new_metadata[name] != orig_metadata[name]:
return False
for name in filter(lambda x: (not x.startswith("~")
or x.startswith("~id3")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This condition is unique to formats using ID3 tags.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I dont think we need to add a check specifically for id3 tags since, we are already 'AND' the above result with self.supports_tag

def _load_preserved_config(self, metadata):
# Adding additional config to metadata to prevent unecessary saving
metadata['~config:rating_steps'] = config.setting['rating_steps']
metadata['~config:rating_user_email'] = config.setting['rating_user_email']
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this embed user email in file tags ?
If yes, it leads to privacy issues.
I would make only one hash of config options, stored in ~config:hash and recalculate/compare on load.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding ~ metadata is not in itself a privacy issue as ~ data is not (with some exceptions) saved to media files.

However, email addresses are saved to the id3 rating tag - this is how the rating tag is defined in the id3 standard, not a choice made by Picard.

BUT, I counsel against storing config or indeed non-track related data in ~ metadata. Just because it is not visible in the standard UI, doesn't make this a good idea.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a feeling here that we are over complicating this solution by trying to make it perfect for all combinations of changing option settings and saving combinations of files. e.g. if we change options and save one file and change them back again before saving the rest. This seems to be why we are saving configuration settings in file metadata.

Most people will have a standard set of settings that they use daily - and won't change them frequently. The easiest solution in my opinion is to preserve globally the settings when Picard is started, and if current settings have changed from these, then we save the files. If we want to get a bit cleverer, then we can have a list of settings that we check or ignore (so that e.g. script settings changed don't trigger global saves because script changes of note will have changed metadata which will trigger a save).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P.S. It seems to me that a list of settings which change media tags without changes to Picard's metadata is quite small. Images, ratings, what else? So we create a list of these settings, and preserve the startup values and then compare the startup values to current values when user saves files.

What do people think of this idea?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we want the best of both, I can just define a new variable called in the File class called config and store the above there?

Removes all the above associated problems.

Copy link
Collaborator Author

@samj1912 samj1912 Jan 13, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently we are always saving it for the first time a save is being called on it and then skipping on saving it afterwards if no settings have been changed.

Me and @mineo discussed it over irc that we should avoid saving any tags to the file and that a first save is done on a new file regardless is the way to go.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still think the right approach is to compare old and new tags every time, but if @mineo still thinks your way is better after reviewing my most recent comment, then I will certainly defer to his greater knowledge and experience.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also currently in the above code, it only prevents writing metadata to files. I haven't implemented prevent_save for other operations.(Saving image to files, rename files, move files, move additional files, delete empty dirs)

I can include it though.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, I think this would be overkill. If you don't save a file because nothing has changed you don't rename files, delete directories etc.

BUT, if you previously saved images to files and now save them to tags (or vice versa) then the file has changed. We would be better concentrating on loading file metadata more comprehensively, and then checking whether it has changed - so for images we should load files and tags (noting their source at the same time) and then see if the images have changed c.f. downloaded.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, true. Will update the PR to reflect these changes as well :)

@samj1912
Copy link
Collaborator Author

Updated PR. Not saving non metadata related things to self.metadata now.

@Sophist-UK
Copy link
Contributor

  1. If changing settings makes the file saveable, then this needs to be noted in the UI by change of Album / file icon.

  2. These changes also seem to include a lot of format-specific changes (i.e. to save settings relevant to this format on a per file basis as the file is loaded), however if it is possible to have a more generic solution that would be preferable as it would make point 1. easier to implement.

file_config['remove_ape_from_mp3'] = config.setting['remove_ape_from_mp3']
if self._File != mutagen.aiff.AIFF:
file_config['write_id3v1'] = config.setting['write_id3v1']
return True
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As previously commented (and now hidden) IMO this is not the way to note that tags have changed because settings have changed.

The only reliable way to check whether tags have changed is to compare the original tag with the new tag - we do that for metadata, but not for special tags like rating and images.

So I think we should concentrate on:

a. Saving the existing ratings tag in hidden metadata and then comparing it to the new one at save time to see if it has changed.

b. Determining the coding for a file (which may not be possible) and comparing it at save time. If we cannot determine encoding from tag content, then we should save the encoding as new MUSICBRAINZ_ENCODING metadata in order to be able to see if it has changed.

c. I think Join_With may already be handled (because the existing ID3v23 tags are stored in Metadata as a string) and we already generate a string for multi-value tags when checking if the metadata has changed.

d. I am not certain whether we can determine whether file is ID3v23 or ID3v24 in all cases. certainly we should be able to tell if it e.g. has TMCL tags, but there is no guarantee that these tags will be used. Again, perhaps we need a MUSICBRAINZ_ID3V2x tag.

e. We should try to identify whether the existing file has APE or ID3v1 metadata and if so, set a hidden metadata flag.

f. The code does not handle changes to settings to how images are stored. As previously suggested, I think we should load images from both tags and files when we load the media file into Picard (noting their source) and then determine whether downloaded images are different when determine whether the file has changed.

Returns False if no related configs"""
raise False

def _test_same(self):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we call this _is_metadata_same or (with opposite return values) _has_metadata_changed?

@samj1912
Copy link
Collaborator Author

  1. will require changing a lot of icons if the number of files loaded is large.

@samj1912
Copy link
Collaborator Author

I think we are complicating it even more. Currently I will implement only the file tags related changes. Further caching of other methods(move files, rename, etc) can be implemented in a different PR after discussion as a lot of config settings come into play then.

@Sophist-UK
Copy link
Contributor

Saving config settings to files is complicating things - but more importantly it won't do what it is supposed to if you changed the settings before the file was loaded. This is NOT the right approach - the only approach which will work properly is to look at the existing tags and compare them to the new ones.

@samj1912
Copy link
Collaborator Author

samj1912 commented Jan 13, 2017

@Sophist-UK it will actually. We are ALWAYS saving the file the first time it is saved after being loaded.

We are skipping on a complicated implementation with the downside that we are saving it once regardless of the fact that the tags have been changed or not(or the related settings for that matter)

@Sophist-UK
Copy link
Contributor

@samj1912 said: "1. will require changing a lot of icons if the number of files loaded is large."

If you changed settings before loading, then when the file is loaded the icon will be set.

If you change settings after loading, then yes - you need to run through all albums and files and change those that now need saving as a result of changing the setting. But, that could be done as a queued background task (in the same way that loading and saving files and doing web service calls are handled in the background).

@Sophist-UK
Copy link
Contributor

Sophist-UK commented Jan 13, 2017

@samj1912 said:

@Sophist-UK it will actually. We are always saving the file the first time it is saved after being loaded.

We are skipping on a complicated implementation with the downside that we are saving it once regardless of the fact that the tags have been changed or not(or the related settings for that matter)

That in itself makes this enhancement MUCH less useful. When I load my library, I currently use the Remove Perfect Albums plugin to get rid of any which haven't changed, and then save the rest. If this PR doesn't work on first save (and avoid having to use that plugin), it probably doesn't give much benefit.

I think you may need to bite the bullet and think about doing a more complicated but proper job.

@samj1912
Copy link
Collaborator Author

It is possible to include a much more complicated way to cache the saving process but I think it is out of the scope of this PR as it will involve looking at format specific ways to determine the file's existing configuration and comparing it with the settings.

This PR just serves as a base to prevent useless saving when -

for eg - You did a lot of small changes across albums and some of them are already saved. You select all and press 'Save'.

I agree that it is not perfect but it is better than the current implementation and requires very less implementation.

We can implement a much more elaborate save cache in v2.0 maybe but seeing as we are approaching v1.4 deadline, I think a simple augmentation like this might save on some unnecessary IO.

@samj1912
Copy link
Collaborator Author

Added image tag related settings and made the code more modular.

@Sophist-UK
Copy link
Contributor

@samj1912 said:

It is possible to include a much more complicated way to cache the saving process but I think it is out of the scope of this PR as it will involve looking at format specific ways to determine the file's existing configuration and comparing it with the settings.

This PR just serves as a base to prevent useless saving when -

for eg - You did a lot of small changes across albums and some of them are already saved. You select all and press 'Save'.

I agree that it is not perfect but it is better than the current implementation and requires very less implementation.

I am not the ultimate arbiter of whether PRs get merged or not - but IMO as it stands this PR should not be merged because:

A. In the above use case you can already select which albums/tracks to save in the UI;

B. In my use case refreshing metadata on my entire library, it does not eliminate the need for using Remove Perfect Albums;

C. Does not handle the use case where settings were changed before files were loaded;

D. This does not solve the general problem of properly identifying which files have non-metadata tags that have changed;

E. Does not properly display in the UI files that now need saving because settings have changed

Consequently I think that this introduces code which takes Picard in the wrong direction, and results in Picard doing different things in different circumstances in a way which will be confusing to users.

@Sophist-UK
Copy link
Contributor

Link to JIRA issue: PICARD-300

@samj1912
Copy link
Collaborator Author

samj1912 commented Jan 13, 2017

A few comments -

  1. This does not serve to replace Remove Perfect Albums. This PR addresses a different use case entirely?
  2. I have already addressed how we are handling C.
  3. I dont get E at all? Currently also in picard if you change tag related settings, picard doesn't indicate that a particular file needs to be saved.
    My PR makes practically no difference to this use case?
    There is no need to show which files need to be saved. Just because we are internally skipping on saving files to preserve processing in case the user tries to save files which have not changed. It is kind of similar to the way we have cached cover art images. The fact that we are not writing to the files itself is only visible in the debug log which I don't think a normal user will be affected by.
  4. It's an optional setting which is disabled by default.

@samj1912
Copy link
Collaborator Author

samj1912 commented Jan 13, 2017

Also, if the option name in the ui seems confusing maybe we can change that? :)

@Sophist-UK
Copy link
Contributor

To address your points:

  1. The use case it addresses can be met already. If it always saves files on the first save, then 90% of its usefulness has gone. Because if you are then tweaking a few other albums and saving them, Picard already allows you to select those albums and save only them. And I think that the Remove Perfect Albums use case is one that will be far more useful to more people.

  2. Exactly - it doesn't handle C except by reverting to standard behaviour.

  3. E - OK - I accept that this PR is not intended to, and does not address this issue. But it is part of what needs improving.

But the bottom line is that as currently scoped, IMO this PR does not address the full need, the bits it does address have an alternative existing way of being achieved, its only addressing the needs of a very minor use case but is hacking the Picard innards to achieve it in a somewhat unaesthetic way.

I know its hard to hear someone say that they don't think your hard-work in coding this is useful - and I know because I have been there myself - but I have to be honest about this. And it is harder for me to do because I can see the effort you have put into all the other improvements and fixes you have recently done, and the last thing I want to do is to dent your enthusiasm.

But, as I have said before, in the end it is not by decision - it is @mineo or @zas who need to merge this, and my comments are only my personal opinion. But I also have to be honest about whether the functionality is right and whether this is taking Picard in the right direction - and in this case IMO it isn't.

@samj1912
Copy link
Collaborator Author

samj1912 commented Mar 6, 2017

Closing this PR as I don't see this working out.

@samj1912 samj1912 closed this Mar 6, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
4 participants