Conversation
Fields: | ||
number (models.PositiveIntegerField): A positive integer that uniquely identifies a version | ||
of a specific repository. Each new version for a repo should have this field set to | ||
1 + the most recent version. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suppose the latest version is deleted and then a new version is created. It would have the same version number as the deleted version. I think a better idea would be to store a field (e.g. latest_version) on the repository.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see in delete_version() you prevent users from deleting the latest to prevent the problem I mention.
Looking at the squashing code, it looks like it possible that multiple RepositoryContent records could point to the same version. This means you might see a version in which a content unit displays in the added list but is not in the repo version, or is displayed in the removed list but is in the repo version. We may want to clean up RepositoryContent records or filter them out in added()/removed(). |
@daviddavis I don't think I quite follow that. Could you maybe add an example of the scenario you're describing? |
Sure, for the first case, suppose you add a content unit in version 1 and remove it in version 2. Then you delete version 1. Now you have a RepositoryContent where vadded and vremoved are both '2'. When I see the list of 'added' units in the API for version 2, the repository version shows the content unit but it's not actually in the version's content. The reverse case would require version 1 to add the unit, version 2 to remove the unit, and version 3 to re-add the unit. Then I remove version 2. Now I have 2 RepositoryContent records pointing to version 3—one where vremoved is version 3 (but vadded is still version 1) and one where vadded is version 3. So the unit appears in 'removed' but is still in version 3's content. This is really a minor problem though and perhaps something we can live with for the MVP. |
That RepositoryContent record would get deleted during the squash for the reasons you're bringing up. You can find it from this comment in that task: "# delete any relationships added in the version being deleted and removed in the next one."
That case is interesting. Everything will behave correctly as you observed, because the content sets for each version will continue to show the right memberships. It's just weird to show the same content being removed and added by the same version. Probably the best way to handle that is at the end of the squash task, query for any content that was both added and removed by version 3, and fix those relationships up. I think that would mean deleting the RepositoryContent(vadded=3, vremoved=None) record, and setting vremoved=None on the other record. |
@daviddavis I added "test_7_add_remove_add" to the gist to test that it behaves correctly in the case you described. https://gist.github.com/mhrivnak/69af54063dff7465212914094dff34c2#file-vrtest-py-L225 |
repository = models.ForeignKey(Repository, on_delete=models.CASCADE) | ||
vadded = models.ForeignKey('RepositoryVersion', related_name='added_memberships') | ||
vremoved = models.ForeignKey('RepositoryVersion', null=True, related_name='removed_memberships') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't get the memberships theme in related_name
. Can you help me understand the thinking here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, maybe another word would be better. The idea is that on a repo version object, there will be a django manager attribute that corresponds to all of the RepositoryVersion
objects that add something for that version. What should that attribute be named? added_repository_contents
seems awkward. Thinking of these objects as "memberships" seemed like a reasonable noun to replace "repository_contents". But introducing a new word might just make things worse.
What do you think it should be called? Perhaps added_repository_contents
isn't so bad?
repository = models.ForeignKey(Repository) | ||
number = models.PositiveIntegerField(db_index=True, default=1) | ||
created = models.DateTimeField(auto_now_add=True) | ||
action = models.TextField(choices=ACTIONS) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see the value if tracking the actions
and recommend removing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
agreed, I'm +1 to removing it.
content (pulpcore.plugin.models.Content): a content model to add | ||
""" | ||
# duplicates are ok | ||
with suppress(IntegrityError): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 suppress
|
||
def get_url(self, obj, view_name, request, format): | ||
""" | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
missing description in docstring.
view_name='versions-removed', | ||
lookup_field='number', parent_lookup_kwargs={'repository_pk': 'repository__pk'}, | ||
) | ||
number = serializers.IntegerField( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 number
@@ -80,4 +88,8 @@ def sync(importer_pk): | |||
'repository': importer.repository.name, | |||
'importer': importer.name | |||
}) | |||
importer.sync() | |||
importer.sync(new_version, old_version) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggest:
try:
importer.sync(new_version, old_version)
except Exception:
new_version.delete()
Another thing to consider is that when the new version is saved without transactions, it will be visable in a half-baked state for the duration of the sync. This might be bad for read operations not locked by reservation like applicability calculation.
log.info('The repository version was not found. Nothing to do.') | ||
return | ||
|
||
log.info('Deleting and squashing version %(v)d of repository %(r)s', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i18n
next_version = version.next() | ||
except models.RepositoryVersion.DoesNotExist: | ||
# do something friendly here | ||
log.info('Cannot delete and squash version until a newer one is created.') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i18n
repo_relations.filter(vremoved=version).update(vremoved=next_version) | ||
|
||
# With no more relationships remaining, delete the version. | ||
version.delete() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 Like how simple the squash is.
{ | ||
'repository': publisher.repository.name, | ||
'publisher': publisher.name | ||
'publisher': publisher.name, | ||
'version': version.num, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe s/num/number/ ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
@@ -221,25 +207,147 @@ class RepositoryContent(Model): | |||
|
|||
content (models.ForeignKey): The associated content. | |||
repository (models.ForeignKey): The associated repository. | |||
vadded (models.ForeignKey): The RepositoryVersion which added the referenced Content. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe a personal preference but I prefer version_added
over vadded
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
(REMOVE, 'Remove'), | ||
) | ||
|
||
repository = models.ForeignKey(Repository) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this needs a user_visible field that defaults to False. It is set to true as the last step of the a sync before recording it on the task status as the 'created_resource'. This is written in the MVP use cases some. This resolves the inconsistant state comments below. Note that the viewset should not display user_visible; to the user it doesn't exist.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also the MVP talks about having a 'add_many' and a 'remove_many' both of which would be list href attributes to be added and removed in bulk via a POST to the RepositoryVersion detail. To add them I think 3 things need to happen:
- They need to be recorded at the DB level as data provided by the user for the creation of this repo version.
- During the sync, after the creation of the repo version, core should associate the add_many units and remove the remove_many units before calling into any specific importer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this needs a user_visible field that defaults to False. It is set to true as the last step of the a sync before recording it on the task status as the 'created_resource'. This is written in the MVP use cases some. This resolves the inconsistant state comments below. Note that the viewset should not display user_visible; to the user it doesn't exist.
I like that idea. We've talked at some point about a similar thing for publications also.
There's also a use case for recovery from an incomplete operation, for example a sync that crashed. I think it would go like this: When the core is about to make a new version, it would look for any existing versions with this boolean set to False
and delete them. Then it would proceed with making a new version and handing it to the appropriate plugin code.
Any other thoughts on that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Regarding the "add_many" and "remove_many" comment, I'm just not following. Can you clarify? Did you mean to say "RepositoryVersion" instead of "RepositoryContent"? What's the user story behind this? Is it "As a user, I can add and remove content in a repository by POSTing the representation of a new RepositoryVersion to the repo version collection endpoint" or something like that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 to the design you propose regarding deletion. That is a better design because it makes Pulp more crash-safe. I think we are planning to do the exact same thing in Publications (from the MVP call convo).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes I mean "RepositoryVersion" not "RepositoryComment" I updated the comment.
Sure. The use case idea is to have this fulfil both move and copy and associate/unassociate. For instance you upload and create N content units, you could get them into a repo with add_many=['href1', 'href2', 'href3']
etc. Or copy these Y units into a repo, or remove these Z units, etc. It's immutable so the only thing the user can do is to add and/or remove content when making a new one anyway. Another option they can use in conjunction w/ add_many, remove_many is to call an importer and have it import content. It's a strange use case, but the idea is the options could all be used together.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
During the sync, after the creation of the repo version, core should associate the add_many units and remove the remove_many units before calling into any specific importer.
Are you suggesting a user would supply "add_many" and "remove_many" to a sync operation? I don't see how those would work together. I would expect those two options to be parameters for a different kind of task besides a sync.
Is this a nested implementation where each version detail URL is "nested" under a repository? Since the urls have been flattened I think this is the one area where nesting them makes sense. |
(REMOVE, 'Remove'), | ||
) | ||
|
||
repository = models.ForeignKey(Repository) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also the MVP calls out removing the association of an importer with any specific repo, and replace it with an attribute on a repo called default_importer
that is a foreign key to a single importer. A post can specify an 'importer' setting which tells the sync code to call that importer to finish the repo version. If 'importer' is not specified, but the associated importer has default_importer
set, then use that as the importer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lastly there is an idea to use a base_version
. This base_version
would be an option the user could specify. If unspecified it defaults to the latest version of the associated repo. If specified as None then the new version starts with no content. If specified as a href to a version, the repo version begins with the content from that version before applying add_many or remove_many or calling into an importer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also the MVP calls out removing the association of an importer with any specific repo, and replace it with an attribute on a repo called default_importer that is a foreign key to a single importer. A post can specify an 'importer' setting which tells the sync code to call that importer to finish the repo version. If 'importer' is not specified, but the associated importer has default_importer set, then use that as the importer.
Can you elaborate on how that change is relevant to this code? I'm just not seeing the connection.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lastly there is an idea to use a base_version. This base_version would be an option the user could specify. If unspecified it defaults to the latest version of the associated repo. If specified as None then the new version starts with no content. If specified as a href to a version, the repo version begins with the content from that version before applying add_many or remove_many or calling into an importer.
Interesting idea. What are the use cases motivating it?
I assume that behavior would mostly happen in task code, and not here on the model, right?
And would this be a 3.1+ idea?
Taking a step back, I propose that this PR remain at approximately the same level of scope it currently has. Additional behaviors such as POSTing to create new versions more directly can be cleanly separated into follow-on stories and PRs. That will let us quickly get the base behaviors and data structures in place, unblocking other work such as that of plugin writers. I think it'll also make it easier to plan and implement those additional features in a focused way. I can take a stab at adding the How does that sound? |
@mhrivnak I think that sounds fine. |
@mhrivnak I agree, we can handle all of the followon work in separate PRs. Can the conflicts be fixed up and I'm +1 to merging it. |
Hello @mhrivnak! Thanks for updating the PR. Cheers ! There are no PEP8 issues in this Pull Request. 🍻 Comment last updated on December 19, 2017 at 17:54 Hours UTC |
af6ad1e
to
e74a4b7
Compare
I decided to rebase the PR onto latest 3.0-dev just so it doesn't get too stale, particularly since the un-nesting work had a substantial impact. I also updated the gist containing tests. I also addressed all the feedback on the PR. I did not implement any new behavior, such as adding a boolean value to the version model, since it seems like discussions may not be complete about what to name it and how it should be used. That seems easy enough to implement as an additional change. |
@@ -61,10 +60,12 @@ class ChangeSet: | |||
>>> | |||
""" | |||
|
|||
def __init__(self, importer, additions=(), removals=()): | |||
def __init__(self, importer, version, additions=(), removals=()): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can this become s/version/repo_version/ ? I think that would be clearer and also the same name as the attribute.
@@ -25,7 +25,7 @@ class Importer(PlatformImporter): | |||
:class: `pulpcore.plugin.serializers.repository.ImporterSerializer`. | |||
""" | |||
|
|||
def sync(self): | |||
def sync(self, new_version, old_version): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about just passing in new_version and leaving out old_version? The plugin code can easily query for other versions as it needs them. Most cases I believe will only require new_version. Also what about calling it repo_version
instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
old_version
is how a plugin will determine what content is currently in the repo, which is required information when trying to decide what to add and remove. Given that, every sync operation will require both of these.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As soon as new_version is created, it will implicitly have all of the same content. Couldn't the plugin writer look at new_version and answer the same question?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
new_version may change as the sync does what it does, so it's not a reliable source of information about what was in the previous version. The previous version is explicitly the immutable content set that the plugin needs in order to create a new content set. So I think the simplest and least-confusing thing to do is give the plugin a direct reference to the content set it is interested in.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My suggestion is to adopt this for now (since it works and users can sync pulp_file repos) and then meet up to develop a comprehensive design around the plugin api for repo versions when we address the plugin writer user stories. Thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That sounds good to me. It would be a big step forward. We can fix the other parts based on the user stories.
""" | ||
|
||
created = models.DateTimeField(auto_now_add=True) | ||
|
||
publisher = models.ForeignKey('Publisher', on_delete=models.CASCADE) | ||
|
||
version = models.ForeignKey('RepositoryVersion', on_delete=models.CASCADE) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about calling this repo_version
for clarity?
@@ -30,6 +31,12 @@ def sync(importer_pk): | |||
if not importer.feed_url: | |||
raise ValueError(_("An importer must have a 'feed_url' attribute to sync.")) | |||
|
|||
new_version = models.RepositoryVersion(repository=importer.repository) | |||
new_version.save() | |||
old_version = None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think old_version can come out entirely of the interface.
The latest change renames |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm +1 on merging this and then finishing any final things as follow-up work. I think overall it's a huge step in the right direction. Thank you so much @mhrivnak because this will significantly benefit the Pulp community.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tested the REST api changes in this PR and pulp/pulp_file#20. I was able to sync using an importer to produce a repository version 1. I could view the content in repository version 1. I also confirmed that the endpoints for content added and removed work.
…dded Repo version squash should be resolved according to this design: pulp#3228 (comment) fixes pulp#3233 https://pulp.plan.io/issues/3233
…dded Repo version squash should be resolved according to this design: pulp#3228 (comment) fixes pulp#3233 https://pulp.plan.io/issues/3233
…dded Repo version squash should be resolved according to this design: pulp#3228 (comment) fixes pulp#3233 https://pulp.plan.io/issues/3233
This is in good condition, only missing docs and release notes. Please see email on pulp-dev@ for details and context, and I greatly appreciate feedback, questions and review.