Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Polygon Cleanup #387

Closed
wants to merge 1 commit into from
Closed

Conversation

foxox
Copy link

@foxox foxox commented Mar 16, 2019

I have been using SolveSpace a lot to design things for my 3d printer recently and I'm a professional software engineer so I thought I would take a look at the code.
I cleaned up various things in the Polygon component... const correctness, project code format adherence, int types, pass by const reference for nonprimitives, readability improvements, etc.
These changes should make the cost more robust to bugs in the future and may even have some performance impact.
If you like this I may do the same with other files later.
I can squash commits after review if you like.

@CLAassistant
Copy link

CLAassistant commented Mar 16, 2019

CLA assistant check
All committers have signed the CLA.

@whitequark
Copy link
Contributor

Thank you for spending time to improve SolveSpace. However, I would ask you that you consult the maintainers before making far-reaching, sweeping changes.

There are a few serious issues with this PR that would prevent it from being merged, and which are general for any changes that can be applied:

  • Every commit should do one specific change with one specific purpose.
  • Every commit should have a commit message that clearly states why the change is done.
  • SolveSpace is a mature codebase that is written in a somewhat unorthodox dialect of C++. We do not change it for the sake of change or for the sake of making it slightly more elegant, since the amount of churn and bugs introduced due to typos and other simple mistakes outweighs the benefits.
  • In particular, as an extension of the above, SolveSpace does not have a large testsuite. We heavily rely on git annotate for determining what commit introduced some code, and use that for determining how it is supposed to work. Sweeping changes like modifying indentation, adding braces, etc break this completely, which would make further maintenance much harder.

Specific to this PR:

  • There is no good justification for making local variables (themselves) constant. In a better language (like Rust) this would be the default, but we already have our codebase in C++, and changing almost every single variable declaration does not justify the minimal (if non-zero at all) amount of bugs it would highlight.
  • There is absolutely no point in replacing i++ with ++i when i is POD. If there is a compiler where the latter is more efficient, then this compiler is explicitly not supported for building SolveSpace. Trivial optimizations can always be relied on happening. (I'm pretty sure this is also true for non-POD cases at this point, but I'd have to check.)

Cleaned up various things in the Polygon component... const correctness, code format, readability improvements, etc.
@foxox
Copy link
Author

foxox commented Mar 16, 2019

@whitequark Pull requests exist to facilitate this kind of conversation -- thank you. In my day job, a 300 line cleanup PR to one component is nothing... gets reviewed by 2 or 3 other people, they leave a few comments, I push commits in response, get approval, squash the commits into one, it gets merged, and everyone is glad for it. I don't often contribute to open source projects, so your response was a bit surprising. Since you maintain this project, I'll concede on most points since they don't matter.

However, I disagree about const correctness. Const correctness should be applied to all code, even local variables. If a variable is not meant to be mutated after it is initialized, it should be const. It makes the code easier to read and easier to change because it shows the author's intent and (as you mentioned) helps catch bugs. Rust is irrelevant... const is a core C++ feature and it's there to be used. C++ coding standards like HIC++ or the JSF standard or whatever typically include a requirement for const correctness in all situations.

Also, "churn" isn't a problem if code follows best practices and is documented properly. Would things be different if there were more unit tests in this project?

@whitequark
Copy link
Contributor

In my day job, a 300 line cleanup PR to one component is nothing... gets reviewed by 2 or 3 other people, they leave a few comments, I push commits in response, get approval, squash the commits into one, it gets merged, and everyone is glad for it. I don't often contribute to open source projects, so your response was a bit surprising.

This is common in certain open source projects with a large team of developers typically paid by top tech companies maintaining them, for example, LLVM. However, SolveSpace has one maintainer--me--I am not paid (in fact I'm beyond broke at this point), and I have health issues that limit the amount of time I can spend on FOSS. I definitely do not have the bandwidth to review thousands of lines of cleanups that just check a box in some coding standard.

C++ coding standards like HIC++ or the JSF standard or whatever typically include a requirement for const correctness in all situations.

Coding standards are not a gospel but a tool in a large array of tools that improve code quality. SolveSpace does not and will not blindly adhere to any particular coding standard or rule unless there's demonstrable evidence that following it (and changing code to follow it) does more good than harm.

If you look through the git log, there is a number of commits that justify a few sweeping changes in great detail, e.g. the commit that changed the initialization style. I expect any cleanups to justify themselves at least that well, including any churn they cause.

Also, "churn" isn't a problem if code follows best practices and is documented properly. Would things be different if there were more unit tests in this project?

I've spent something like 1.5 years learning how to maintain this project without breaking it, and about half of my time was spent fixing bugs that were the direct result of various "cleanups" and "consistency improvements". This codebase is sparsely tested--almost untested, really--and has a very large amount of hidden invariants. Your approach is unfortunately not applicable to a legacy codebase like this.

In short, my direct experience doing what you are trying to do in this PR results in my understanding that it is undesirable and infeasible to do while removing more bugs than adding them. This, of course, could change if this codebase would become well-documented and well-tested to the point where any trivial errors introduced in bulk refactoring would be caught by the test suite. That is not the case today.

@whitequark
Copy link
Contributor

whitequark commented Mar 16, 2019

However, I disagree about const correctness. Const correctness should be applied to all code, even local variables. If a variable is not meant to be mutated after it is initialized, it should be const. It makes the code easier to read and easier to change because it shows the author's intent and (as you mentioned) helps catch bugs.

To expand on this, I agree with you in principle, and, although I do not use this style in my other C++ projects, I would be open to using it in a greenfield codebase. But I don't think that const correctness for locals justifies a commit that would break git annotate for almost the entire codebase; git annotate is a tool that is critical to understanding the code, so it's not going to happen.

@Evil-Spirit
Copy link
Collaborator

Evil-Spirit commented Mar 18, 2019

@foxox const correctness is something you can apply for projects created from scratch. Even in this case, you can't link any third-party libraries without pain. Your programming skills and time will start working on const-correctness maintaining only. If sometime someone show you what really means to follow some code style exactly how it written, you will be scared about typing every space and newline. You will be disarmed completely. My way was a hard - I was stucked inside premature optimizations, const-correctness, template magic, and so on. Pure programming without a result. But I worked with some people who passed through this. Now I don't bother about optimization - there is profiler to show me what's wrong with, I don't bother because of codestyle - I can follow any codestyle. I don't bother about programming language - I can use any. I can blame any code and make it more optimal in some context - make it faster, make it shorter, make it more flexible, or more understandable. But I can't do it all simultaneously - only balance can be reached.

@whitequark
Copy link
Contributor

@Evil-Spirit Thanks, I completely forgot about the issue with 3rd party libraries, which we've already hit.

Copy link
Contributor

@bcmpinc bcmpinc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see quite a few changes that hurt readability, possibly hurt performance or increase the risk of introducing bugs. For a code cleanup, improving readability should be your primary concern. Applying coding guidelines, can be useful, but does require you to understand their purpose and underlying considerations.

W.r.t. const, I hope you understand that C++ does not really have a concept of constant variables. The only thing that const does is tell the compiler it should create an error whenever you try to assign to that variable name. The variable might still be modified through other means. Hence for compilation and optimization purposes, the compiler completely ignores const. I suppose that those errors are still useful in commercial code, where code quality often varies wildly and can be extremely low, often due to time constraints, frequent deadlines and a lack of maintenance, but this is not commercial code.

@@ -180,14 +184,14 @@ class STriangle {
Vector normals[3];
};

static STriangle From(STriMeta meta, Vector a, Vector b, Vector c);
static STriangle From(const STriMeta& meta, const Vector& a, const Vector& b, const Vector& c);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why pass STriMeta and Vector as references? They are small POD-type classes, used in a similar fashion as an int. Just like writing foo(const int& bar) instead of foo(int bar) is less readable and probably slower, so is using const STriMeta& meta as argument.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the cutoff? How big does a type need to be before it's worthwhile to pass by reference/pointer instead of by value? I can find a number of references online indicating that it's good practice in C++ to pass any non-primitive type by reference. Has this stuff been profiled?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Has this stuff been profiled?

Yes, and we've converted most functions on the hot paths to pass POD classes that are larger than a pointer to pass-by-const-reference.

for(i = 0; i < l.n; i++) {
int SPointList::IndexForPoint(const Vector& pt) const {
size_t i;
for(i = 0; i < l.n; ++i) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In a function returning a signed int value, why define the variable that is used as the return value as an unsigned int (size_t)?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, this function should probably initialize int i with -1 and return it at the end

if(se->EdgeCrosses(a, b, pi, spl)) {
inters++;
++inters;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Optimizations like these are so easy that a compiler does them for you. A compiler is probably much better at that too, so don't try to outsmart it.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not a matter of outsmarting the compiler. Pre-increment is the preferred operation in this context because it is simpler, which makes it more readable/understandable and less prone to errors introduced in the future. Ex: imagine if something like this would need to be done in the future: const size_t num_inters_this_iteration = inters++; It's easy to make this kind of bug when post-increment is used habitually. It's a shame that the language wasn't called ++C

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't agree that pre-increment is simpler or more readable in this context (indeed, since no other codebase I work on uses pre-increment pervasively, it trips me up each time), but more importantly, you are not considering the unintended consequences of making such a change globally. Namely:

  • it is not semantics-preserving, and so it will introduce bugs, and we don't have nearly enough test coverage to catch them;
  • it screws up git blame, and we don't have nearly enough documentation to not rely on that.

Therefore this change and changes similar to it are not useful, but are harmful.

bool STriangle::ContainsPointProjd(const Vector& n, const Vector& p) const {
const Vector ab = b.Minus(a);
const Vector bc = c.Minus(b);
const Vector ca = a.Minus(c);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Splitting these out over multiple lines is indeed useful. In this case it makes it easier to check that they all follow the same pattern xy = y.Minux(x) where x, y are replaced with a, b or c.

void FixContourDirections();
void Clear();
bool SelfIntersecting(Vector *intersectsAt) const;
bool IsEmpty() const;
Vector AnyPoint() const;
void OffsetInto(SPolygon *dest, double r) const;
void OffsetInto(SPolygon *dest, const double r) const;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whether r is modified inside OffsetInto is an implementation detail and should not be exposed in the header file.

bool IsClockwiseProjdToNormal(const Vector& n) const;
bool ContainsPointProjdToNormal(const Vector& n, const Vector& p) const;
void OffsetInto(SContour *dest, const double r) const;
void CopyInto(SContour * const dest) const;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is confusing, you define dest as const, but in the implementation dest is clearly modified. Yes, I understand it is the pointer which is const, not the object being pointed to. However it takes time to realize that, so here, using const hurts readability.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't this basic C++?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, it is nothing but visual noise in the header (the constness of the argument inside the function has no bearing whatsoever on its public interface; yet another design mistake in C++).

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's like writing void foo(const int param). No one writing code like this.

for(i = 0; i < (l.n - 1); i++) {
void SContour::MakeEdgesInto(SEdgeList * const el) const {
size_t i;
for(i = 0; (i + 1) < l.n; ++i) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The loop condition here is correct, but written in a non-usual way to avoid underflow errors (size_t is unsigned). You need to be careful about this when using size_t, which makes this type prone for introducing errors and thus should be avoided.

@whitequark
Copy link
Contributor

@bcmpinc Thank you, this review reflects my thoughts as well.

@whitequark
Copy link
Contributor

I believe this kind of cleanup is unproductive and so I am going to close this PR. The reasons are outlined above multiple times and I don't have anything to add to that.

I would, of course, be happy to collaborate with you on other changes that take into account the requirements imposed by the project's history and maintenance needs.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants