Similarity plugin (WIP) #230

Merged
merged 16 commits into from May 24, 2017

Conversation

@ralsina (Member) commented May 22, 2017


from __future__ import print_function, unicode_literals

import gensim
Member

needs a requirements.txt file

vec_lsi = lsi[vec_bow]
sims = index[vec_lsi]
sims = sorted(enumerate(sims), key=lambda item: -item[1])
print(i, sims[:10])
Member

Hm, shouldn’t this write to e.g. a JSON file? Or otherwise make the results available to others?

Also, you have no doit tasks here.
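
One way to act on the JSON suggestion is to write each post's ranked matches to a file that templates or other plugins can read. This is a minimal sketch, assuming Python 3 and that `sims` is the list of similarity scores for one post, as in the snippet under review. `top_similar`, `write_similar_json`, and the output layout are hypothetical names chosen for illustration, not the plugin's actual code:

```python
import json
import os
import tempfile


def top_similar(sims, limit=10):
    """Return the `limit` best (index, score) pairs, highest score first."""
    ranked = sorted(enumerate(sims), key=lambda item: -item[1])
    # float() keeps the result JSON-serializable if scores are NumPy floats.
    return [(idx, float(score)) for idx, score in ranked[:limit]]


def write_similar_json(path, sims, limit=10):
    """Write the ranked matches to `path` as JSON (hypothetical layout)."""
    data = [
        {"post": idx, "score": score}
        for idx, score in top_similar(sims, limit)
    ]
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(data, fh, indent=2)
    return data


if __name__ == "__main__":
    out = os.path.join(tempfile.mkdtemp(), "similar.json")
    print(write_similar_json(out, [0.1, 0.9, 0.4], limit=2))
```

Keeping the ranking separate from the file writing means the same ranking could feed a doit action later without changes.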

Member Author

That was very early code :)

'name': out_name,
'targets': [out_name],
'actions': [(write_similar, (out_name, related))],
# 'file_dep': ['####MAGIC####TIMELINE'],
Member

I think you need to depend on the render_posts task for magic timeline to work.

Member Author

ahhhh good point. I sort of need to do it differently, so I may have to extend that support.
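
For illustration, the dependency suggested above could be declared with doit's `task_dep` key. This is a sketch under assumptions: `gen_similarity_tasks`, the `similarity` basename, and the stand-in `write_similar` are hypothetical, and the thread does not confirm that `task_dep` alone is enough for the magic timeline dependency to work:

```python
import json


def write_similar(out_name, related):
    """Stand-in for the PR's write_similar: dump related posts as JSON."""
    with open(out_name, "w", encoding="utf-8") as fh:
        json.dump(related, fh)


def gen_similarity_tasks(related_by_output):
    """Yield doit-style task dicts, one per output file.

    `related_by_output` maps an output path to its related-posts data.
    """
    for out_name, related in sorted(related_by_output.items()):
        yield {
            "basename": "similarity",  # hypothetical task group name
            "name": out_name,
            "targets": [out_name],
            "actions": [(write_similar, (out_name, related))],
            # Run only after posts are rendered, per the review comment.
            "task_dep": ["render_posts"],
        }
```

Sorting the outputs keeps the task order stable between runs, which makes the generated task list easier to compare when debugging.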

similarity data for **all** posts.
* The more translations you have, the longer it takes.
* My test site contains 1300 posts of varied lengths in 2 languages,
and initialization takes ~90 seconds.
Member

  • Is it really worth it?
  • How powerful is your machine?
  • Is there a way to save the startup data? Even with something like manual pickle/unpickle?

Member Author

Yes, there is, and I can make this better using setup tasks, which is a feature in doit I did not know/recall.
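
The manual pickle/unpickle idea raised in the review could look roughly like the sketch below. It assumes the expensive initialization can be wrapped in a single `build(texts)` callable whose result is picklable. `corpus_fingerprint` and `load_or_build` are hypothetical helpers, not part of the plugin, and this does not show the doit setup-task approach mentioned in the reply:

```python
import hashlib
import os
import pickle
import tempfile


def corpus_fingerprint(texts):
    """Hash all post texts so the cache is rebuilt when any text changes."""
    digest = hashlib.sha256()
    for text in texts:
        digest.update(text.encode("utf-8"))
        digest.update(b"\0")  # separator so ["ab", "c"] != ["a", "bc"]
    return digest.hexdigest()


def load_or_build(cache_path, texts, build):
    """Return (data, from_cache); `build` is the expensive initialization."""
    key = corpus_fingerprint(texts)
    if os.path.exists(cache_path):
        with open(cache_path, "rb") as fh:
            cached = pickle.load(fh)
        if cached.get("key") == key:
            return cached["data"], True
    # Cache missing or stale: rebuild and store alongside the fingerprint.
    data = build(texts)
    with open(cache_path, "wb") as fh:
        pickle.dump({"key": key, "data": data}, fh)
    return data, False


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "similarity.pkl")
    print(load_or_build(path, ["first post", "second post"], len))
```

Pickle files should only be loaded from a trusted location such as the site's own cache directory. gensim objects also provide their own save and load methods, which may be a better fit than raw pickle for the model and index.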

@Kwpolska
Member

The $64,000 question is: how can WordPress plugins generate a list like that dynamically on every request? Perhaps this approach is a bit overdone?

@ralsina
Member Author

ralsina commented May 24, 2017

Oh, they have all the text in a DB with full-text search (FTS). With that, it's easier.

@ralsina ralsina merged commit d235b5e into master May 24, 2017
@ralsina ralsina deleted the similarity branch May 24, 2017 18:23