Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

loadMemberGroups() #381

Closed
wants to merge 1 commit into from
Closed

loadMemberGroups() #381

wants to merge 1 commit into from

Conversation

joshuaadickerson
Copy link
Contributor

Signed-off-by:Joshua Dickerson joshua.a.dickerson@gmail.com

Signed-off-by:Joshua Dickerson <joshua.a.dickerson@gmail.com>
@norv
Copy link
Contributor

norv commented May 10, 2013

I'd suggest to always create a new branch from master, when you're working on more different things/PRs.

If a commit (like this) is related to another and based on it - like this is based on loadMemberGroups() in the first place - then just checkout that branch and continue on it.

@norv norv mentioned this pull request May 10, 2013
@norv
Copy link
Contributor

norv commented May 10, 2013

Actually. I'd suggest to try to use instead:
https://github.com/elkarte/Elkarte/blob/master/sources/subs/Membergroups.subs.php#L890

@joshuaadickerson
Copy link
Contributor Author

getBasicMembergroupData() is not exactly the same and really, should probably be deprecated when you have all of the groups already.

@eurich
Copy link
Member

eurich commented May 10, 2013

Josh, please don't get offended but IMHO it's bad practice to fetch all data and keep them in memory if you don't need them at all.(at least php memory_limit could cause trouble)

getBasicMembergroupData() isn't a perfect designed function but it allows you to filter the groups via SQL. Feel free to improve that function or replace it with a better alternative, but with the ability to filter groups (include certain groups, exclude groups) via SQL.

@joshuaadickerson
Copy link
Contributor Author

I'm not offended but it isn't bad practice. It is common practice. In fact, SMF does the same thing with $context. Get a lot of data from your data store, keep it for the life of the script, destroy it when the script dies. Think about the amount of RAM you're talking about as opposed to the amount of RAM for other operations. If you held all of the categories and boards for the SMF community in one array, how much RAM do you think that equates to? Say 200 boards, 13 categories, maybe around 60k (very rough estimation). That is the size of one long forum post. If you have a forum that large, you should dedicate a lot more RAM anyway. SM.org has a ridiculous amount of boards. A more likely scenario is probably a tenth that size.

The entire idea behind all of this is that databases don't scale well. However, cache systems and PHP servers scale as easily as adding a load balancer and more nodes. Say you are on that shared host or VPS. Now you want to scale. You could move your entire application to a dedicated host or ask your host to give your database more resources. Or, you could increase the caching and add more of those $5/m PHP servers and connect them to one database. Now say you are at AWS - you could spend more for a better RDS instance or you could add some more cheap EC2 instances when you need to. Adding EC2 instances also helps with reliability, whereas scaling up a database doesn't. Say you are on a single dedicated host. You have your MySQL database, Nginx, and PHP running on the same machine with no virtualization. You gave your database tons of RAM but it doesn't want to use it. You aren't technically savvy so you don't know that there are a 1000 tuning variables and they change every version. The easy solution is to add caching instead of worrying about tuning your database.

The next point is that caches should have as few variables in their keys as possible so that they hit more often. If you are caching each permutation of a set of data, you will have a huge overhead of caching. I am not just talking about user-variable caches like memcached, I am also talking about MySQL's query cache. When you change the query, you change the likelihood that it will hit in that cache. Even shared hosts have QC enabled.

Finally, technical debt. The more LOC, the harder it is to maintain. The more conditions, the more you have to think about all of the ways it can do something else. If you make it simple and just say "here goes everything you could want to know about member groups," the developer knows that information will always be available. That's really what's important - development speed.

As for this function, it allows you to filter via PHP just the same. I committed that somewhere. The one thing a little more difficult is sorting, but I can write that in PHP as well.

@norv
Copy link
Contributor

norv commented May 10, 2013

Re:

The more conditions, the more you have to think about all of the ways it can do something else. If you make it simple and just say "here goes everything you could want to know about member groups," the developer knows that information will always be available. That's really what's important - development speed.

The information is always available: it's a call to getBasic_etc() away. Call it, you have the info.
Instead, in your version, the client code developer has to implement filtering. Again. And again. That's meaningless, when you can call a function, set the right params, and there you have it.

That's the whole purpose of designing .subs functions more like an API: reusability. The client code doesn't have to reinvent implementation of some filter or whatnot, to take from the store only what they need. In your version, all clients, both core controllers, and add-ons, would have to figure out how to split that thing for their needs, over and over.
That's not less development time.
That's more development time.

@norv
Copy link
Contributor

norv commented May 10, 2013

Re:

The next point is that caches should have as few variables in their keys as possible so that they hit more often. If you are caching each permutation of a set of data, you will have a huge overhead of caching.

Agree. We shouldn't cache each permutation of a set of data. But here's the thing: lets take a parametrized function like this getBasic_stuff($kind_of_param, $another_kind_of_params). (Assuming params are very closely related, conceptually, to the meaning of the function, equals to the common needs of the client code. (if not, further refactoring.))

We might start with caching the default call. Hopefully, the default call is the most commonly used. (= params set to their default values)
In time, we can figure out (we might know already, or find calls through the code, or we can get reports, or we can add logging and find out, or you name it) that: 3-4 other calls are very widely used. We can add caching for those. The conditions only depend on params. That caching, we add it to the .subs function only. Tweak it there.
For user installations, any update will deliver it. Or an experimental 'big boards' set of .subs files. Or a performance tweaks and tips.

@norv
Copy link
Contributor

norv commented May 10, 2013

Re:

If you held all of the categories and boards for the SMF community in one array, how much RAM do you think that equates to? Say 200 boards, 13 categories, maybe around 60k (very rough estimation). That is the size of one long forum post.

This is the crux of your argument, and it's interesting. (even if the numbers may be off)
Categories is probably the best example: it's indeed very unlikely that forums will have more than a small number. So, if I understand you correctly. You're saying it's better if we just don't bother with a smaller subset of categories, when the difference to the whole is negligible anyway.

I can't estimate it, but yes, it's possible... in a very limited set of examples.
My opinion: the argument is correct, but this implementation is off.
Look at it this way: we use the exact reasoning in other cases, such as exactly getMembergroups, I believe (and if we don't, we will, and so will addons), when we choose a subset of columns and data, and never a smaller subset. i.e. if you need only name and id, you still call the function for, lets say, like 2-3 columns more. That's fine, IMO, and within the limits of reasonable.

I think there is really a consensus here, as you can see throughout the discussions, that we shouldn't take a higher threshold than that. If you disagree, numbers please. (testing, benchmarking, something we miss).

To put it differently: the .subs functions should be designed to serve well the common cases, and support the rest. Caching should be added to them. In each case, on a case by case basis. I don't think we can generalize, nor we should, to something like "all filtering is useless, and if it's useful, we implement it always in php".
Moreover. In these .subs functions, perhaps in a particular case it's better to simplify the query and add some php around. Dunno. On a case by case basis: lets see where, with what results. All in those functions, not by adding custom processing to client/controller code.

@norv
Copy link
Contributor

norv commented May 11, 2013

Now say you are at AWS

I don't know. I'd like Elk to be ready for cloud-based hosting conditions, but I can't estimate those conditions. I haven't worked with them. (well not more than negligible)

Here's a suggestion, on membergroups: (this is only a suggestion for you and Elk-based app, if you're willing to consider it)

  • try to replace implementation of getBasicMembergroupData() with: the simplified query you propose, cache that, and add PHP fiter/sort
  • refactor queries throughout controllers, to use this function. (lets tweak/improve its interface (I mean signature) if not enough yet, or add another if we don't have yet, for what maybe it doesn't cover)
  • use your version of getBasicMembergroupData() through your app.
  • in the ideal case, your only change to Elk, on this, would be the implementation of this function. (well, that, and joins)

If it's possible and you're willing to consider it, it would help at many levels: you would use Elk code unmodified for the rest, which simplifies your keeping up to date process. And still benefit from your wanted PHP implementation of query conditions. Let us know how it performs in time. You could even switch implementations and compare/test/benchmark/whatnot, in much better conditions.

@eurich
Copy link
Member

eurich commented May 11, 2013

Thank you Norv for the detailed explanations :)

@joshuaadickerson
regarding developement speed: if you want to increase developement speed to a maximum for mod developers, you should IMHO add a large set of STORED PROCEDURES and table VIEWS to the database:

Just a short sample to show you the benefits:
DELIMITER //
CREATE PROCEDURE gimmeAdmins()
BEGIN
SELECT id_member, member_name FROM elkarte_members WHERE id_group = 1 OR FIND_IN_SET(1, additional_groups);
END //
DELIMITER ;

all you need is as simple as that: mysql_query("CALL gimmeAdmins();"); and you're done ;)

@joshuaadickerson
Copy link
Contributor Author

Oh hell no. Stored procedures and views are the opposite of what you want
when speeding up development. First off, debugging is like finding the
answer to life. It doesn't scale at all. If you want to test them, you need
to either copy all of your data or change all of the instances of the call.
Does SQLite even allow them? I know this from an awful experience of
dealing with maintaining 1000s of them. If what you're after is
centralizing them, use a dbal with good models.

On Sat, May 11, 2013 at 7:22 AM, Thorsten Eurich
notifications@github.comwrote:

Thank you Norv for the detailed explanations :)

@joshuaadickerson https://github.com/joshuaadickerson
regarding developement speed: if you want to increase developement speed
to a maximum for mod developers, you should IMHO add a large set of STORED
PROCEDURES and table VIEWS to the database:

Just a short sample to show you the benefits:
DELIMITER //
CREATE PROCEDURE gimmeAdmins()
BEGIN
SELECT id_member, member_name FROM elkarte_members WHERE id_group = 1 OR
FIND_IN_SET(1, additional_groups);
END //
DELIMITER ;

all you need is as simple as that: mysql_query("CALL gimmeAdmins();"); and
you're done ;)


Reply to this email directly or view it on GitHubhttps://github.com//pull/381#issuecomment-17758329
.

@eurich
Copy link
Member

eurich commented May 11, 2013

naah, I don't have any intentions to integrate stored procedures or views in Elk ;) I reckon it wouldn't work for most forums because most hosts don't allow more than just the basics.. Even CREATE TEMPORARY TABLE is sometimes a problem 'cause of missing mysql permissions.

The smaller functions in subs are often somewhat similar compared to a stored procedure.. just throw in some infos (a CALL for the procedure, a parameter for the function) and you get exactly the data you wanted in the first time. It's absolutely fine the way it is.

@norv
Copy link
Contributor

norv commented May 17, 2013

This is still valid, as refactoring issue: those queries don't belong there, and it's nice to see them all at once. But the solution is not acceptable (and the function doesn't exist in Elk core). @joshuaadickerson I suggest we need them updated to getBasicMembergroupData() wherever possible.
(no idea off-hand if the function itself might need some update)

@joshuaadickerson
Copy link
Contributor Author

loadMemberGroups() should remain and getBasicMembergroupData() is still not what should fill the void, but a getter for this should implement filtering.

@norv
Copy link
Contributor

norv commented May 17, 2013

If getBasicMembergroupData() is not entirely suitable for this refactoring (which I'm not sure at a glance), perhaps you can propose an update for it?
It is a getter. And it implements filtering.

@eurich
Copy link
Member

eurich commented Jul 6, 2013

Closing this one. IIRC Norv cherry-picked the important ones and the PR is completely outdated.

@eurich eurich closed this Jul 6, 2013
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants