Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Helix crashes #611

Closed
ruevs opened this issue May 12, 2020 · 31 comments · Fixed by #612
Closed

Helix crashes #611

ruevs opened this issue May 12, 2020 · 31 comments · Fixed by #612
Labels

Comments

@ruevs
Copy link
Member

ruevs commented May 12, 2020

System information

SolveSpace version: master e84fd46
Operating system: Windows 10
Microsoft Visual C++ 2019 x64

Expected behavior

Create a helix

Actual behavior

Crashes here https://github.com/solvespace/solvespace/blob/master/src/group.cpp#L1033
With

Exception thrown at 0x756707F2 in solvespace.exe: Microsoft C++ exception: std::length_error at memory location 0x00EFC7E8.
Unhandled exception at 0x756707F2 in solvespace.exe: Microsoft C++ exception: std::length_error at memory location 0x00EFC7E8.

Call stack:

 	KernelBase.dll!756707f2()	Unknown
 	KernelBase.dll![Frames below may be incorrect and/or missing, no symbols loaded for KernelBase.dll]	Unknown
 	[External Code]	
>	solvespace.exe!SolveSpace::Group::CopyEntity(SolveSpace::IdList<SolveSpace::Entity,SolveSpace::hEntity> * el, SolveSpace::Entity * ep, int timesApplied, int remap, SolveSpace::hParam dx, SolveSpace::hParam dy, SolveSpace::hParam dz, SolveSpace::hParam qw, SolveSpace::hParam qvx, SolveSpace::hParam qvy, SolveSpace::hParam qvz, SolveSpace::hParam dist, SolveSpace::Group::CopyAs as) Line 1033	C++
 	solvespace.exe!SolveSpace::Group::Generate(SolveSpace::IdList<SolveSpace::Entity,SolveSpace::hEntity> * entity, SolveSpace::IdList<SolveSpace::Param,SolveSpace::hParam> * param) Line 649	C++
 	solvespace.exe!SolveSpace::SolveSpaceUI::GenerateAll(SolveSpace::SolveSpaceUI::Generate type, bool andFindFree, bool genForBBox) Line 243	C++
 	solvespace.exe!SolveSpace::SolveSpaceUI::GenerateAll(SolveSpace::SolveSpaceUI::Generate type, bool andFindFree, bool genForBBox) Line 197	C++
 	solvespace.exe!SolveSpace::Group::MenuGroup(SolveSpace::Command id, SolveSpace::Platform::Path linkFile) Line 343	C++
 	solvespace.exe!SolveSpace::Group::MenuGroup(SolveSpace::Command id) Line 68	C++
 	[External Code]	
 	solvespace.exe!SolveSpace::Platform::WindowImplWin32::WndProc(HWND__ * h, unsigned int msg, unsigned int wParam, long lParam) Line 1041	C++
 	[External Code]	
 	solvespace.exe!SolveSpace::Platform::RunGui() Line 1679	C++
 	solvespace.exe!main(int argc, char * * argv) Line 29	C++
 	solvespace.exe!WinMain(HINSTANCE__ * hInstance, HINSTANCE__ * hPrevInstance, char * lpCmdLine, int nCmdShow) Line 40	C++

Additional information

Watch the movie ;-)
HelixCrash.zip

@whitequark whitequark added the bug label May 12, 2020
@phkahler
Copy link
Member

Does not happen on my Fedora setup. Did it really die on copying the font of an entity? That sounds like unitialized memory or something. Couldn this be due to the recent change of memory allocator?

@whitequark
Copy link
Contributor

whitequark commented May 12, 2020

@phkahler Have you tried to reproduce this issue on a normal build, or on a build with sanitizers enabled?

@ruevs
Copy link
Member Author

ruevs commented May 12, 2020

If I go back to 74103ee (just before 521473e) then it crashes here:

void *p = HeapAlloc(PermHeap, HEAP_NO_SERIALIZE | HEAP_ZERO_MEMORY, n);

Critical error detected c0000374
 	ntdll.dll!776093dd()	Unknown
 	ntdll.dll![Frames below may be incorrect and/or missing, no symbols loaded for ntdll.dll]	Unknown
 	[External Code]	
>	solvespace.exe!SolveSpace::MemAlloc(unsigned int n) Line 60	C++
 	solvespace.exe!SolveSpace::List<SolveSpace::SEdge>::ReserveMore(int howMuch) Line 218	C++
 	solvespace.exe!SolveSpace::List<SolveSpace::SEdge>::AllocForOneMore() Line 232	C++
 	solvespace.exe!SolveSpace::List<SolveSpace::SEdge>::Add(const SolveSpace::SEdge * t) Line 236	C++
 	solvespace.exe!SolveSpace::SEdgeList::AddEdge(SolveSpace::Vector a, SolveSpace::Vector b, int auxA, int auxB, int tag) Line 224	C++
 	solvespace.exe!SolveSpace::SContour::MakeEdgesInto(SolveSpace::SEdgeList * el) Line 564	C++
 	solvespace.exe!SolveSpace::SPolygon::MakeEdgesInto(SolveSpace::SEdgeList * el) Line 674	C++
 	solvespace.exe!SolveSpace::SPolygon::UvGridTriangulateInto(SolveSpace::SMesh * mesh, SolveSpace::SSurface * srf) Line 425	C++
 	solvespace.exe!SolveSpace::SSurface::TriangulateInto(SolveSpace::SShell * shell, SolveSpace::SMesh * sm) Line 436	C++
 	solvespace.exe!SolveSpace::SShell::TriangulateInto$omp$1() Line 1058	C++

Which I think is this:
#600 (comment)

If I checkout up to and including 521473e then the crash changes to the one above. So @phkaler - yes, I think 521473e removes that crash and introduces this one.

@phkahler
Copy link
Member

@phkahler Have you tried to reproduce this issue on a normal build, or on a build with sanitizers enabled?

I only run release builds, sorry. It's kind of siily I haven't adopted a decent IDE yet or used the debugger. gedit and the terminal over here...

@ruevs yeah using OpenMP was a risk, but that first crash you had was in such a strange place, it doesn't seem related on the surface.

That second crash (that you posted) looks like it's in the merging of the triangle lists. That was not done by separate threads prior to e84fd46 Does it crash if you go one prior to that?

@whitequark
Copy link
Contributor

I only run release builds, sorry. It's kind of siily I haven't adopted a decent IDE yet or used the debugger. gedit and the terminal over here...

You don't need an IDE or a debugger to run a build with sanitizers. Try configuring with cmake .. -DENABLE_SANITIZERS=ON (probably in another build directory than your normal one, like build-debug or something), then reproducing the bug. If there is a memory issue you'll see it in the terminal.

@phkahler
Copy link
Member

phkahler commented May 12, 2020

OK, probably not important but when running cmake:

[paul@localhost build]$ cmake .. -DCMAKE_BUILD_TYPE=Release -DENABLE_SANITIZERS=ON
-- Using sanitizer options address;alignment;bounds;shift;signed-integer-overflow;integer-divide-by-zero;null;bool;enum;return
-- Using in-tree libdxfrw
-- Using in-tree flatbuffers
-- Using in-tree q3d
CMake Warning (dev) at /usr/share/cmake/Modules/FindPackageHandleStandardArgs.cmake:272 (message):
  The package name passed to `find_package_handle_standard_args` (SPACEWARE)
  does not match the name of the calling package (SpaceWare).  This can lead
  to problems in calling code that expects `find_package` result variables
  (e.g., `_FOUND`) to follow a certain pattern.
Call Stack (most recent call first):
  cmake/FindSpaceWare.cmake:19 (FIND_PACKAGE_HANDLE_STANDARD_ARGS)
  CMakeLists.txt:266 (find_package)
This warning is for project developers.  Use -Wno-dev to suppress it.

-- Configuring done
-- Generating done

Then immediately on running solvespace:

SolveSpace!

=================================================================
==180558==ERROR: AddressSanitizer: new-delete-type-mismatch on 0x6060000df640 in thread T0:
  object passed to delete has wrong type:
  size of the allocated type:   56 bytes;
  size of the deallocated type: 48 bytes.
    #0 0x7f85834e7175 in operator delete(void*, unsigned long) (/lib64/libasan.so.5+0x111175)
    #1 0x5775e2 in SolveSpace::Platform::TimerImplGtk::RunAfter(unsigned int) (/home/paul/projects/solvespace/build/bin/solvespace+0x5775e2)
    #2 0x4d85ee in SolveSpace::SolveSpaceUI::Init() (/home/paul/projects/solvespace/build/bin/solvespace+0x4d85ee)
    #3 0x49adb4 in main (/home/paul/projects/solvespace/build/bin/solvespace+0x49adb4)
    #4 0x7f85812521a2 in __libc_start_main (/lib64/libc.so.6+0x271a2)
    #5 0x4a68ad in _start (/home/paul/projects/solvespace/build/bin/solvespace+0x4a68ad)

0x6060000df640 is located 0 bytes inside of 56-byte region [0x6060000df640,0x6060000df678)
allocated by thread T0 here:
    #0 0x7f85834e5a97 in operator new(unsigned long) (/lib64/libasan.so.5+0x10fa97)
    #1 0x5773db in SolveSpace::Platform::TimerImplGtk::RunAfter(unsigned int) (/home/paul/projects/solvespace/build/bin/solvespace+0x5773db)

SUMMARY: AddressSanitizer: new-delete-type-mismatch (/lib64/libasan.so.5+0x111175) in operator delete(void*, unsigned long)
==180558==HINT: if you don't care about these errors you may set ASAN_OPTIONS=new_delete_type_mismatch=0
==180558==ABORTING

@whitequark
Copy link
Contributor

The first issue (CMake) is unimportant but something we should fix in the long run just to reduce noise.

The second issue (new-delete-type-mismatch) is a major memory issue that needs to be fixed.

@whitequark
Copy link
Contributor

@phkahler I forgot to mention that to get useful backtraces with sanitizer builds you should add -DCMAKE_BUILD_TYPE=Debug to the cmake command line.

@whitequark
Copy link
Contributor

The second issue (new-delete-type-mismatch) is a major memory issue that needs to be fixed.

It looks like a bug in libsigc++ (which is a transitive dependency through GTK 3): libsigcplusplus/libsigcplusplus#10. We can't do anything about it unfortunately so you should export ASAN_OPTIONS=new_delete_type_mismatch=0 like it suggests you to work around the abort.

@phkahler
Copy link
Member

@whitequark Thanks I'll try that next. If I ignore the error it appears to run but then crashes if I try to Revolve or Helix (but not extrude) it's claiming a use-after-free in CopyEntity called from Group::Generate.

I'm impressed that these tools are finding issues that have gone undetected so long.

@whitequark
Copy link
Contributor

I'm impressed that these tools are finding issues that have gone undetected so long.

Sanitizers are extremely powerful and valuable tools and it is almost impossible to write reliable C++ code without using them; completely impossible if a team of people with a varying degree of understaning C++ is working on a shared codebase. (Before sanitizers there was Valgrind, which is still occasionally useful, but it can fundamentally discover fewer issues in C/C++ codebases, and is much slower.)

The main downside is a small amount of false positives they have (e.g. TSan doesn't appear to be usable with the GNU OpenMP implementation, sadly), but almost everything they report tends to be a real issue after all.

@whitequark
Copy link
Contributor

whitequark commented May 12, 2020

Oh, and we do already run sanitizers on CI, but the scarce test coverage means many issues go undetected anyway.

@phkahler
Copy link
Member

Lathe is failing too.

@phkahler
Copy link
Member

Oh, lots of memory leaks too without doing anything, just run it and close the window.

@whitequark
Copy link
Contributor

whitequark commented May 12, 2020

Oh, lots of memory leaks too without doing anything, just run it and close the window.

There are quite a few allocations in GTK and OpenGL drivers that aren't freed, so leak checking is more usefully performed with Windows builds, unfortunately. The leaks are "real" in the sense that it is memory that is never freed and, in fact, cannot be freed, but they are "not real" in the sense that it's done on purpose by an upstream project such that this memory is gone with the process and isn't going to be fixed.

@phkahler
Copy link
Member

The leaks look mostly GTK related too. The crashes are not...

@whitequark
Copy link
Contributor

Yep, the crashes are exceedingly likely to point to a real issue.

@phkahler
Copy link
Member

Oh.... I think I did something dumb. I need to review some history. And by dumb I don't mean the typical "wrote a bug dumb", but the "ignored the comments that warn you about a hazard" kinda dumb.

phkahler added a commit to phkahler/solvespace that referenced this issue May 12, 2020
@phkahler
Copy link
Member

@ruevs can you try this:
phkahler@f06fb4f

I probably introduced this when I removed extra entity copies from those 3 tools. There is a comment right there not to use *e after copy because it may not be valid. I changed to code for clarity but ignored that comment.

@phkahler
Copy link
Member

Yes, the bugs I just fixed have been there since: 45eb246

There seem to be a lot of people building from source but apparently not on Windows?

@ruevs
Copy link
Member Author

ruevs commented May 12, 2020

EDIT: A had started writing this comment about three hours before I posted it and I did not see your progress before posting it... leaving it in "for history"

@ruevs yeah using OpenMP was a risk, but that first crash you had was in such a strange place, it doesn't seem related on the surface.

That second crash (that you posted) looks like it's in the merging of the triangle lists. That was not done by separate threads prior to e84fd46 Does it crash if you go one prior to that?

  1. If I revert only void SShell::TriangulateInto(SMesh *sm) to this:

    void SShell::TriangulateInto(SMesh *sm) {
    version then it crashes in the first way. So the second crash is certainly due to (a race condition in?) the parallel code. Sorry I did not state this more clearly. I thought pointing to this Use OpenMP for triangulation #600 (comment) made it obvious.

  2. As for Whitequarks' heap change 521473e it does not git revert cleanly so I need to spend more time to figure out what reverting it does, Tomorrow.

@phkahler
Copy link
Member

phkahler commented May 12, 2020

@ruevs did you try the patch I linked. It's in PR #612, waiting for you to verify it actually fixes your issue and not just the problems I found.

@ruevs
Copy link
Member Author

ruevs commented May 12, 2020

Just did. Your phkahler/issue611 fixes the "first crash".
I do build on Windows all the time (VS, MinGW, clang) but I had not created new helix-es in a while :-) Opening existing ones (obviously) works just fine.

So you can merge #612

@ruevs
Copy link
Member Author

ruevs commented May 12, 2020

Now I'll try with your patch, the heap change AND parallel SShell::TriangulateInto. Essentially the new master at 23dfd97 ...

@phkahler
Copy link
Member

Oops. I linked this issue to the PR so it closed on merge. Please let us know if you still have the other problem.

@whitequark
Copy link
Contributor

Parallel triangulation won't ever work on Windows without removing HEAP_NO_SERIALIZE.

@phkahler
Copy link
Member

Is that done on master?

@whitequark
Copy link
Contributor

Of course.

@ruevs
Copy link
Member Author

ruevs commented May 12, 2020

It does not crash at all with current master 23dfd97

So @whitequark your removal of the Win32 heap seems to fix (hide?) the "OpenMP SShell::TriangulateInt" related crash.

And @phkahler your #612 fixes the original crash in this issue.

And by the way I like your "Split quads based on angle" - now that I can try it I will play some more :-)

@phkahler
Copy link
Member

Awesome!

@whitequark
Copy link
Contributor

So @whitequark your removal of the Win32 heap seems to fix (hide?) the "OpenMP SShell::TriangulateInt" related crash.

It doesn't "hide" anything. It fixes the root cause of the crash, which is the lack of serialization on heap access. In this context, serialization means using a mutex to synchronize heap access, which would cause the threads to access it one at a time, or serially.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants