Skip to content
This repository has been archived by the owner on Dec 7, 2022. It is now read-only.

Adds repository versions #3228

Merged
merged 1 commit into from Dec 19, 2017
Merged

Adds repository versions #3228

merged 1 commit into from Dec 19, 2017

Conversation

mhrivnak
Copy link
Contributor

This is in good condition, only missing docs and release notes. Please see email on pulp-dev@ for details and context, and I greatly appreciate feedback, questions and review.

Fields:
number (models.PositiveIntegerField): A positive integer that uniquely identifies a version
of a specific repository. Each new version for a repo should have this field set to
1 + the most recent version.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suppose the latest version is deleted and then a new version is created. It would have the same version number as the deleted version. I think a better idea would be to store a field (e.g. latest_version) on the repository.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see in delete_version() you prevent users from deleting the latest to prevent the problem I mention.

@daviddavis
Copy link
Contributor

Looking at the squashing code, it looks like it possible that multiple RepositoryContent records could point to the same version. This means you might see a version in which a content unit displays in the added list but is not in the repo version, or is displayed in the removed list but is in the repo version. We may want to clean up RepositoryContent records or filter them out in added()/removed().

@mhrivnak
Copy link
Contributor Author

@daviddavis I don't think I quite follow that. Could you maybe add an example of the scenario you're describing?

@daviddavis
Copy link
Contributor

daviddavis commented Nov 28, 2017

Sure, for the first case, suppose you add a content unit in version 1 and remove it in version 2. Then you delete version 1. Now you have a RepositoryContent where vadded and vremoved are both '2'. When I see the list of 'added' units in the API for version 2, the repository version shows the content unit but it's not actually in the version's content.

The reverse case would require version 1 to add the unit, version 2 to remove the unit, and version 3 to re-add the unit. Then I remove version 2. Now I have 2 RepositoryContent records pointing to version 3—one where vremoved is version 3 (but vadded is still version 1) and one where vadded is version 3. So the unit appears in 'removed' but is still in version 3's content.

This is really a minor problem though and perhaps something we can live with for the MVP.

@mhrivnak
Copy link
Contributor Author

suppose you add a content unit in version 1 and remove it in version 2. Then you delete version 1. Now you have a RepositoryContent that has vadded and vremoved is both '2'.

That RepositoryContent record would get deleted during the squash for the reasons you're bringing up. You can find it from this comment in that task: "# delete any relationships added in the version being deleted and removed in the next one."

The reverse case would require version 1 to add the unit, version 2 to remove the unit, and version 3 to re-add the unit. Then I remove version 2. Now I have 2 RepositoryContent records pointing to version 3—one where vremoved is version 3 (but vadded is still version 1) and one where vadded is version 3. So the unit appears in 'removed' but is still in version 3's content.

That case is interesting. Everything will behave correctly as you observed, because the content sets for each version will continue to show the right memberships. It's just weird to show the same content being removed and added by the same version.

Probably the best way to handle that is at the end of the squash task, query for any content that was both added and removed by version 3, and fix those relationships up. I think that would mean deleting the RepositoryContent(vadded=3, vremoved=None) record, and setting vremoved=None on the other record.

@mhrivnak
Copy link
Contributor Author

@daviddavis I added "test_7_add_remove_add" to the gist to test that it behaves correctly in the case you described.

https://gist.github.com/mhrivnak/69af54063dff7465212914094dff34c2#file-vrtest-py-L225

repository = models.ForeignKey(Repository, on_delete=models.CASCADE)
vadded = models.ForeignKey('RepositoryVersion', related_name='added_memberships')
vremoved = models.ForeignKey('RepositoryVersion', null=True, related_name='removed_memberships')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't get the memberships theme in related_name. Can you help me understand the thinking here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, maybe another word would be better. The idea is that on a repo version object, there will be a django manager attribute that corresponds to all of the RepositoryVersion objects that add something for that version. What should that attribute be named? added_repository_contents seems awkward. Thinking of these objects as "memberships" seemed like a reasonable noun to replace "repository_contents". But introducing a new word might just make things worse.

What do you think it should be called? Perhaps added_repository_contents isn't so bad?

repository = models.ForeignKey(Repository)
number = models.PositiveIntegerField(db_index=True, default=1)
created = models.DateTimeField(auto_now_add=True)
action = models.TextField(choices=ACTIONS)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see the value if tracking the actions and recommend removing.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agreed, I'm +1 to removing it.

content (pulpcore.plugin.models.Content): a content model to add
"""
# duplicates are ok
with suppress(IntegrityError):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 suppress


def get_url(self, obj, view_name, request, format):
"""

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

missing description in docstring.

view_name='versions-removed',
lookup_field='number', parent_lookup_kwargs={'repository_pk': 'repository__pk'},
)
number = serializers.IntegerField(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 number

@@ -80,4 +88,8 @@ def sync(importer_pk):
'repository': importer.repository.name,
'importer': importer.name
})
importer.sync()
importer.sync(new_version, old_version)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggest:

try:
    importer.sync(new_version, old_version)
except Exception:
    new_version.delete()

Another thing to consider is that when the new version is saved without transactions, it will be visable in a half-baked state for the duration of the sync. This might be bad for read operations not locked by reservation like applicability calculation.

log.info('The repository version was not found. Nothing to do.')
return

log.info('Deleting and squashing version %(v)d of repository %(r)s',
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i18n

next_version = version.next()
except models.RepositoryVersion.DoesNotExist:
# do something friendly here
log.info('Cannot delete and squash version until a newer one is created.')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i18n

repo_relations.filter(vremoved=version).update(vremoved=next_version)

# With no more relationships remaining, delete the version.
version.delete()
Copy link
Contributor

@jortel jortel Nov 29, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 Like how simple the squash is.

{
'repository': publisher.repository.name,
'publisher': publisher.name
'publisher': publisher.name,
'version': version.num,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe s/num/number/ ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

@@ -221,25 +207,147 @@ class RepositoryContent(Model):

content (models.ForeignKey): The associated content.
repository (models.ForeignKey): The associated repository.
vadded (models.ForeignKey): The RepositoryVersion which added the referenced Content.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe a personal preference but I prefer version_added over vadded.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

(REMOVE, 'Remove'),
)

repository = models.ForeignKey(Repository)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this needs a user_visible field that defaults to False. It is set to true as the last step of the a sync before recording it on the task status as the 'created_resource'. This is written in the MVP use cases some. This resolves the inconsistant state comments below. Note that the viewset should not display user_visible; to the user it doesn't exist.

Copy link
Member

@bmbouter bmbouter Dec 7, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also the MVP talks about having a 'add_many' and a 'remove_many' both of which would be list href attributes to be added and removed in bulk via a POST to the RepositoryVersion detail. To add them I think 3 things need to happen:

  1. They need to be recorded at the DB level as data provided by the user for the creation of this repo version.
  2. During the sync, after the creation of the repo version, core should associate the add_many units and remove the remove_many units before calling into any specific importer.

Copy link
Contributor Author

@mhrivnak mhrivnak Dec 7, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this needs a user_visible field that defaults to False. It is set to true as the last step of the a sync before recording it on the task status as the 'created_resource'. This is written in the MVP use cases some. This resolves the inconsistant state comments below. Note that the viewset should not display user_visible; to the user it doesn't exist.

I like that idea. We've talked at some point about a similar thing for publications also.

There's also a use case for recovery from an incomplete operation, for example a sync that crashed. I think it would go like this: When the core is about to make a new version, it would look for any existing versions with this boolean set to False and delete them. Then it would proceed with making a new version and handing it to the appropriate plugin code.

Any other thoughts on that?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regarding the "add_many" and "remove_many" comment, I'm just not following. Can you clarify? Did you mean to say "RepositoryVersion" instead of "RepositoryContent"? What's the user story behind this? Is it "As a user, I can add and remove content in a repository by POSTing the representation of a new RepositoryVersion to the repo version collection endpoint" or something like that?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 to the design you propose regarding deletion. That is a better design because it makes Pulp more crash-safe. I think we are planning to do the exact same thing in Publications (from the MVP call convo).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes I mean "RepositoryVersion" not "RepositoryComment" I updated the comment.

Sure. The use case idea is to have this fulfil both move and copy and associate/unassociate. For instance you upload and create N content units, you could get them into a repo with add_many=['href1', 'href2', 'href3'] etc. Or copy these Y units into a repo, or remove these Z units, etc. It's immutable so the only thing the user can do is to add and/or remove content when making a new one anyway. Another option they can use in conjunction w/ add_many, remove_many is to call an importer and have it import content. It's a strange use case, but the idea is the options could all be used together.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

During the sync, after the creation of the repo version, core should associate the add_many units and remove the remove_many units before calling into any specific importer.

Are you suggesting a user would supply "add_many" and "remove_many" to a sync operation? I don't see how those would work together. I would expect those two options to be parameters for a different kind of task besides a sync.

@bmbouter
Copy link
Member

bmbouter commented Dec 7, 2017

Is this a nested implementation where each version detail URL is "nested" under a repository? Since the urls have been flattened I think this is the one area where nesting them makes sense.

(REMOVE, 'Remove'),
)

repository = models.ForeignKey(Repository)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also the MVP calls out removing the association of an importer with any specific repo, and replace it with an attribute on a repo called default_importer that is a foreign key to a single importer. A post can specify an 'importer' setting which tells the sync code to call that importer to finish the repo version. If 'importer' is not specified, but the associated importer has default_importer set, then use that as the importer.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lastly there is an idea to use a base_version. This base_version would be an option the user could specify. If unspecified it defaults to the latest version of the associated repo. If specified as None then the new version starts with no content. If specified as a href to a version, the repo version begins with the content from that version before applying add_many or remove_many or calling into an importer.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also the MVP calls out removing the association of an importer with any specific repo, and replace it with an attribute on a repo called default_importer that is a foreign key to a single importer. A post can specify an 'importer' setting which tells the sync code to call that importer to finish the repo version. If 'importer' is not specified, but the associated importer has default_importer set, then use that as the importer.

Can you elaborate on how that change is relevant to this code? I'm just not seeing the connection.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lastly there is an idea to use a base_version. This base_version would be an option the user could specify. If unspecified it defaults to the latest version of the associated repo. If specified as None then the new version starts with no content. If specified as a href to a version, the repo version begins with the content from that version before applying add_many or remove_many or calling into an importer.

Interesting idea. What are the use cases motivating it?

I assume that behavior would mostly happen in task code, and not here on the model, right?

And would this be a 3.1+ idea?

@mhrivnak
Copy link
Contributor Author

mhrivnak commented Dec 8, 2017

Taking a step back, I propose that this PR remain at approximately the same level of scope it currently has. Additional behaviors such as POSTing to create new versions more directly can be cleanly separated into follow-on stories and PRs. That will let us quickly get the base behaviors and data structures in place, unblocking other work such as that of plugin writers. I think it'll also make it easier to plan and implement those additional features in a focused way.

I can take a stab at adding the user_visible boolean and related behavior, which is quite straight-forward.

How does that sound?

@daviddavis
Copy link
Contributor

@mhrivnak I think that sounds fine.

@bmbouter
Copy link
Member

bmbouter commented Dec 8, 2017

@mhrivnak I agree, we can handle all of the followon work in separate PRs. Can the conflicts be fixed up and I'm +1 to merging it.

@pep8speaks
Copy link

pep8speaks commented Dec 17, 2017

Hello @mhrivnak! Thanks for updating the PR.

Cheers ! There are no PEP8 issues in this Pull Request. 🍻

Comment last updated on December 19, 2017 at 17:54 Hours UTC

@mhrivnak mhrivnak force-pushed the vr-updated branch 2 times, most recently from af6ad1e to e74a4b7 Compare December 17, 2017 20:13
@mhrivnak
Copy link
Contributor Author

I decided to rebase the PR onto latest 3.0-dev just so it doesn't get too stale, particularly since the un-nesting work had a substantial impact. I also updated the gist containing tests.

I also addressed all the feedback on the PR. I did not implement any new behavior, such as adding a boolean value to the version model, since it seems like discussions may not be complete about what to name it and how it should be used. That seems easy enough to implement as an additional change.

@@ -61,10 +60,12 @@ class ChangeSet:
>>>
"""

def __init__(self, importer, additions=(), removals=()):
def __init__(self, importer, version, additions=(), removals=()):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this become s/version/repo_version/ ? I think that would be clearer and also the same name as the attribute.

@@ -25,7 +25,7 @@ class Importer(PlatformImporter):
:class: `pulpcore.plugin.serializers.repository.ImporterSerializer`.
"""

def sync(self):
def sync(self, new_version, old_version):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about just passing in new_version and leaving out old_version? The plugin code can easily query for other versions as it needs them. Most cases I believe will only require new_version. Also what about calling it repo_version instead?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

old_version is how a plugin will determine what content is currently in the repo, which is required information when trying to decide what to add and remove. Given that, every sync operation will require both of these.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As soon as new_version is created, it will implicitly have all of the same content. Couldn't the plugin writer look at new_version and answer the same question?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

new_version may change as the sync does what it does, so it's not a reliable source of information about what was in the previous version. The previous version is explicitly the immutable content set that the plugin needs in order to create a new content set. So I think the simplest and least-confusing thing to do is give the plugin a direct reference to the content set it is interested in.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My suggestion is to adopt this for now (since it works and users can sync pulp_file repos) and then meet up to develop a comprehensive design around the plugin api for repo versions when we address the plugin writer user stories. Thoughts?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That sounds good to me. It would be a big step forward. We can fix the other parts based on the user stories.

"""

created = models.DateTimeField(auto_now_add=True)

publisher = models.ForeignKey('Publisher', on_delete=models.CASCADE)

version = models.ForeignKey('RepositoryVersion', on_delete=models.CASCADE)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about calling this repo_version for clarity?

@@ -30,6 +31,12 @@ def sync(importer_pk):
if not importer.feed_url:
raise ValueError(_("An importer must have a 'feed_url' attribute to sync."))

new_version = models.RepositoryVersion(repository=importer.repository)
new_version.save()
old_version = None
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think old_version can come out entirely of the interface.

@mhrivnak
Copy link
Contributor Author

The latest change renames version to repo_version on the Publication model per review.

Copy link
Contributor

@daviddavis daviddavis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd say we can merge this once @bmbouter ACKs and @dkliban finishes testing.

Copy link
Member

@bmbouter bmbouter left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm +1 on merging this and then finishing any final things as follow-up work. I think overall it's a huge step in the right direction. Thank you so much @mhrivnak because this will significantly benefit the Pulp community.

Copy link
Member

@dkliban dkliban left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tested the REST api changes in this PR and pulp/pulp_file#20. I was able to sync using an importer to produce a repository version 1. I could view the content in repository version 1. I also confirmed that the endpoints for content added and removed work.

@daviddavis daviddavis merged commit 60812ce into pulp:3.0-dev Dec 19, 2017
werwty added a commit to werwty/pulp that referenced this pull request Jan 9, 2018
werwty added a commit to werwty/pulp that referenced this pull request Jan 10, 2018
werwty added a commit to werwty/pulp that referenced this pull request Jan 10, 2018
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
8 participants