Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use OpenMP for triangulation #600

Merged
merged 1 commit into from May 9, 2020
Merged

Conversation

phkahler
Copy link
Member

@phkahler phkahler commented May 8, 2020

This implements the change suggested in #481. Performance gain with simple helical test models was 15-25 percent on current master. Gain is more apparent at lower chord tolerance. Existing builds should not see any difference until they are told to use OpenMP. For GCC that just means passing -fopenmp.

@phkahler phkahler linked an issue May 8, 2020 that may be closed by this pull request
@whitequark
Copy link
Contributor

Looks good. Can we use range based for loops here? We've been migrating away from using .l / .n containers (mostly thanks to @rpavlik's efforts) and I'm not sure if this is OMP compatible. (But the other two loops should be fine.)

@phkahler
Copy link
Member Author

phkahler commented May 9, 2020

Is this better? I even used auto ;-)

My testing seems to show force-to-triangle-mesh performing slower but indicating faster (lower time) at the bottom. Any idea why either of those might be? I'm using a single helix for testing so there shouldn't be any group combining going on.

@whitequark
Copy link
Contributor

Is this better?

Yep, that's actually a lot easier to understand for me! Less things to track while reading the code.

My testing seems to show force-to-triangle-mesh performing slower but indicating faster (lower time) at the bottom.

I suspect it might be a measurement artifact, since the frame timer at the bottom is only supposed to measure rendering performance (the OpenGL display list stuff and such). Is triangulation even included in that timer?

@rpavlik
Copy link
Contributor

rpavlik commented May 9, 2020

Looks like range-for on random-access container is ok for parallel for in open mp 5, but possibly not before? https://stackoverflow.com/questions/17848521/using-openmp-with-c11-range-based-for-loops

@whitequark
Copy link
Contributor

Thanks for looking @rpavlik! I don't think we have OpenMP 5 on our version of MSVC (I'm not sure if any version of MSVC implements it?) so as I suspected that loop is best kept in a simpler form.

@phkahler
Copy link
Member Author

phkahler commented May 9, 2020

@whitequark The frame timer seem to measure actual frame rate. Rendering time from OpenGL is utterly meaningless these days. I have AMD integrated graphics (APU) and I can rotate or zoom full screen at 4K with 1ms or less frame time. Stretching an extrusion shows sensible numbers in the 10ms or 100ms range and going into the 1000ms range for really sluggish stuff (that still rotates blazingly fast). Not sure what it's measuring when forced to mesh but the indicated frame time is lower while visually it's slower.

@whitequark
Copy link
Contributor

Rendering time from OpenGL is utterly meaningless these days. I have AMD integrated graphics (APU) and I can rotate or zoom full screen at 4K with 1ms or less frame time.

A lot of effort, mostly by @Evil-Spirit, went into making that happen. The old immediate mode code was much slower.

Not sure what it's measuring when forced to mesh but the indicated frame time is lower while visually it's slower.

I think it might be converting NURBS to mesh within the region measured by the timer (since we do it lazily and cache), but performing booleans on meshes outside of that region, or something like that.

@phkahler
Copy link
Member Author

phkahler commented May 9, 2020

How should we handle updating the builds to take advantage of this change?

@whitequark
Copy link
Contributor

whitequark commented May 9, 2020

Something like:

diff --git a/CMakeLists.txt b/CMakeLists.txt
index 3e98325d..05c1bbf0 100644
--- a/CMakeLists.txt
+++ b/CMakeLists.txt
@@ -96,6 +96,14 @@ if(CMAKE_SYSTEM_PROCESSOR STREQUAL "i686" OR CMAKE_SYSTEM_PROCESSOR STREQUAL "X8
     set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} ${FLOAT_FLAGS}")
 endif()
 
+# We use OpenMP to speed up some geometric operations, but it's optional.
+include(FindOpenMP)
+if(OpenMP_FOUND)
+    set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} ${OpenMP_CXX_FLAGS}")
+else()
+    message(WARNING "OpenMP not found, geometric operations will be single-threaded")
+endif()
+
 if(APPLE OR CMAKE_SYSTEM_NAME STREQUAL "FreeBSD")
     set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -stdlib=libc++")
 endif()
diff --git a/src/CMakeLists.txt b/src/CMakeLists.txt
index 928b4522..4834a655 100644
--- a/src/CMakeLists.txt
+++ b/src/CMakeLists.txt
@@ -224,6 +224,7 @@ add_dependencies(solvespace-core
     q3d_header)
 
 target_link_libraries(solvespace-core
+    ${OpenMP_CXX_LIBRARIES}
     dxfrw
     ${util_LIBRARIES}
     ${ZLIB_LIBRARY}

@phkahler phkahler merged commit b4e1ce4 into solvespace:master May 9, 2020
@whitequark
Copy link
Contributor

Did you forget to update the PR with the CMake OpenMP changes?

@phkahler
Copy link
Member Author

phkahler commented May 9, 2020

Did you forget to update the PR with the CMake OpenMP changes?

Oops yes.

@whitequark
Copy link
Contributor

Fixed in 47e8279.

@whitequark
Copy link
Contributor

That broke the tests on Windows... could you take a look?

@phkahler
Copy link
Member Author

phkahler commented May 9, 2020

I looked. Don't see any reason for it :(

@whitequark
Copy link
Contributor

Do you have a VS12 install you can use to check the backtrace?

@phkahler
Copy link
Member Author

phkahler commented May 9, 2020

I haven't used Windows or VS anything at home since 2005. If not for your awesome porting efforts I wouldn't even be participating! :-)

@whitequark
Copy link
Contributor

Ah, OK! I'll take a look at it later then.

@whitequark
Copy link
Contributor

@phkahler The testsuite crashes with error C0000374, which is STATUS_HEAP_CORRUPTION, on any MSVC version from 2013 to 2017. I suspect this is a genuine problem but I don't yet see where it is.

@whitequark
Copy link
Contributor

The issue is most certainly real, you can reproduce it with ThreadSanitizer (try -DENABLE_SANITIZERS=ON -DSANITIZERS=thread on master).

WARNING: ThreadSanitizer: data race (pid=703290)
  Write of size 8 at 0x7ffcd68dd1d0 by main thread:
    #0 std::_Function_handler<void (SolveSpace::PolylineBuilder::Vertex*, SolveSpace::PolylineBuilder::Vertex*, SolveSpace::PolylineBuilder::Edge*), SolveSpace::PolylineBuilder::GenerateOutlines(SolveSpace::SOutlineList*)::{lambda(SolveSpace::PolylineBuilder::Vertex*, SolveSpace::PolylineBuilder::Vertex*, SolveSpace::PolylineBuilder::Edge*)#1}>::_M_invoke(std::_Any_data const&, SolveSpace::PolylineBuilder::Vertex*&&, std::_Any_data const&, SolveSpace::PolylineBuilder::Edge*&&) <null> (solvespace+0x17ae85)
    #1 SolveSpace::PolylineBuilder::Generate(std::function<void (SolveSpace::PolylineBuilder::Vertex*, SolveSpace::PolylineBuilder::Vertex*, SolveSpace::PolylineBuilder::Edge*)> const&, std::function<void (SolveSpace::PolylineBuilder::Vertex*, SolveSpace::PolylineBuilder::Edge*)> const&, std::function<void (SolveSpace::PolylineBuilder::Edge*)> const&, std::function<void ()> const&) <null> (solvespace+0x17b9b7)
    #2 SolveSpace::PolylineBuilder::GenerateOutlines(SolveSpace::SOutlineList*) <null> (solvespace+0x17be75)
    #3 SolveSpace::Group::GenerateDisplayItems() <null> (solvespace+0x13a671)
    #4 SolveSpace::Group::Draw(SolveSpace::Canvas*) <null> (solvespace+0x13a744)
    #5 SolveSpace::GraphicsWindow::DrawPersistent(SolveSpace::Canvas*) <null> (solvespace+0x7e5db)
    #6 SolveSpace::GraphicsWindow::Draw(SolveSpace::Canvas*) <null> (solvespace+0x80123)
    #7 SolveSpace::GraphicsWindow::Paint() <null> (solvespace+0x81b79)
    #8 std::_Function_handler<void (), std::_Bind<void (SolveSpace::GraphicsWindow::*(SolveSpace::GraphicsWindow*))()> >::_M_invoke(std::_Any_data const&) <null> (solvespace+0x121ddf)
    #9 std::function<void ()>::operator()() const <null> (solvespace+0x6bce6)
    #10 SolveSpace::Platform::GtkGLWidget::on_render(Glib::RefPtr<Gdk::GLContext> const&) <null> (solvespace+0x6be08)
    #11 Gtk::GLArea_Class::render_callback(_GtkGLArea*, _GdkGLContext*) <null> (libgtkmm-3.0.so.1+0x36355a)
    #12 main <null> (solvespace+0x63808)

  Previous read of size 8 at 0x7ffcd68dd1d0 by thread T2:
    #0 SolveSpace::SShell::TriangulateInto(SolveSpace::SMesh*) [clone ._omp_fn.0] <null> (solvespace+0x2339ff)
    #1 <null> <null> (libgomp.so.1+0x19de5)

  Location is stack of main thread.

  Location is global '<null>' at 0x000000000000 ([stack]+0x00000001e1d0)

  Thread T2 (tid=703293, running) created by main thread at:
    #0 pthread_create <null> (libtsan.so.0+0x58ba2)
    #1 <null> <null> (libgomp.so.1+0x1a3fa)
    #2 SolveSpace::Group::GenerateDisplayItems() <null> (solvespace+0x13a267)
    #3 SolveSpace::Group::Draw(SolveSpace::Canvas*) <null> (solvespace+0x13a744)
    #4 SolveSpace::GraphicsWindow::DrawPersistent(SolveSpace::Canvas*) <null> (solvespace+0x7e5db)
    #5 SolveSpace::GraphicsWindow::Draw(SolveSpace::Canvas*) <null> (solvespace+0x80123)
    #6 SolveSpace::GraphicsWindow::Paint() <null> (solvespace+0x81b79)
    #7 std::_Function_handler<void (), std::_Bind<void (SolveSpace::GraphicsWindow::*(SolveSpace::GraphicsWindow*))()> >::_M_invoke(std::_Any_data const&) <null> (solvespace+0x121ddf)
    #8 std::function<void ()>::operator()() const <null> (solvespace+0x6bce6)
    #9 SolveSpace::Platform::GtkGLWidget::on_render(Glib::RefPtr<Gdk::GLContext> const&) <null> (solvespace+0x6be08)
    #10 Gtk::GLArea_Class::render_callback(_GtkGLArea*, _GdkGLContext*) <null> (libgtkmm-3.0.so.1+0x36355a)
    #11 main <null> (solvespace+0x63808)

SUMMARY: ThreadSanitizer: data race (/home/whitequark/Projects/solvespace/build/gcctsan/bin/solvespace+0x17ae85) in std::_Function_handler<void (SolveSpace::PolylineBuilder::Vertex*, SolveSpace::PolylineBuilder::Vertex*, SolveSpace::PolylineBuilder::Edge*), SolveSpace::PolylineBuilder::GenerateOutlines(SolveSpace::SOutlineList*)::{lambda(SolveSpace::PolylineBuilder::Vertex*, SolveSpace::PolylineBuilder::Vertex*, SolveSpace::PolylineBuilder::Edge*)#1}>::_M_invoke(std::_Any_data const&, SolveSpace::PolylineBuilder::Vertex*&&, std::_Any_data const&, SolveSpace::PolylineBuilder::Edge*&&)

@whitequark
Copy link
Contributor

Ah, those are likely false positives.

@whitequark
Copy link
Contributor

I figured it out. The OMP code is probably correct. The problem was that the heap on Windows was created with HEAP_NO_SERIALIZE and the triangulation code allocated from it. This fixes it:

diff --git a/src/platform/utilwin.cpp b/src/platform/utilwin.cpp
index 4e452802..b43a84ee 100644
--- a/src/platform/utilwin.cpp
+++ b/src/platform/utilwin.cpp
@@ -60 +60 @@ void *MemAlloc(size_t n) {
-    void *p = HeapAlloc(PermHeap, HEAP_NO_SERIALIZE | HEAP_ZERO_MEMORY, n);
+    void *p = HeapAlloc(PermHeap, 0 | HEAP_ZERO_MEMORY, n);
@@ -65 +65 @@ void MemFree(void *p) {
-    HeapFree(PermHeap, HEAP_NO_SERIALIZE, p);
+    HeapFree(PermHeap, 0, p);
@@ -70 +70 @@ void vl() {
-    ssassert(HeapValidate(PermHeap, HEAP_NO_SERIALIZE, NULL), "Corrupted heap");
+    ssassert(HeapValidate(PermHeap, 0, NULL), "Corrupted heap");
@@ -85 +85 @@ std::vector<std::string> InitPlatform(int argc, char **argv) {
-    PermHeap = HeapCreate(HEAP_NO_SERIALIZE, 1024*1024*20, 0);
+    PermHeap = HeapCreate(0, 1024*1024*20, 0);

@whitequark
Copy link
Contributor

whitequark commented May 10, 2020

@jwesthues Do you have any objection to getting rid of the PermHeap on Windows entirely? On *nix, we've been using the CRT heap (malloc/free) all this time; this is actually an improvement in debug experience because the usual pointer debugging tools (valgrind, ASan, LSan) all intercept malloc/free. It looks like on Windows, CRT debug heap has provided this functionality for a while, and recently ASan got ported to MSVC. (Also, we've been using STL collections for a while, which use the CRT heap anyway.)

I'm going to leave TempHeap intact for now, but for completeness, the situation with it is quite weird. On *nix, the current code for AllocTemporary/FreeAllTemporary is actually slower than the normal MemAlloc/MemFree because it maintains a linked list on top of the normal heap instead of using a separate arena; I don't recall why it looks like that, but it has the benefit that it integrates well with pointer debugging tools.

Long term, unfortunately the way AllocTemporary works presents an obstacle for many features (parallel generation of multiple sketches, exporting Expr through FFI, etc) because it's a single global heap. It is possible to add a per-Expr bump allocator (in a rather invasive way), but I suspect that a better way forward is to just use a better allocator (perhaps jemalloc) application-wide on all platforms, and allocate temporaries like all other objects. This of course would have to be benchmarked to make sure that we don't do worse overall, which I don't have time to do right now.

Thoughts?

@whitequark
Copy link
Contributor

whitequark commented May 10, 2020

@phkahler I came up with this simplification / optimization:

 void SShell::TriangulateInto(SMesh *sm) {
-    std::vector<SMesh> tm(surface.n);
-
-#pragma omp parallel for
+    #pragma omp parallel for
     for(int i=0; i<surface.n; i++) {
         SSurface *s = &surface[i];
-        s->TriangulateInto(this, &tm[i]);
-    }
-
-    // merge the per-surface meshes
-    for (auto& m : tm) {
+        SMesh m;
+        s->TriangulateInto(this, &m);
+        #pragma omp critical
         sm->MakeFromCopyOf(&m);
         m.Clear();
     }

Do you agree this is valid? Could you measure the performance impact? Also, I wonder if OpenMP fares any better if you LD_PRELOAD jemalloc into SolveSpace.

@jwesthues
Copy link
Member

No objection to switching the permanent heap to malloc(). I think I used the Windows functions only because this was a Windows-only project initially, and those gave me better control/debuggability; but obviously lots of good tools for malloc() too.

AllocTemporary() is indeed trickier, since we don't free Exprs explicitly and rely on the FreeAllTemporary(). If I remember correctly, you can end up with multiple references to a given Expr after automatic differentiation and stuff, and to add explicit freeing would thus be a somewhat unpleasant task. I suspect your easiest path there would be to make the allocator thread-safe somehow (mutex but that's slow, separate pools per thread but that's complicated, etc.), and then keep a single global free when all the threads completed.

@whitequark
Copy link
Contributor

I suspect your easiest path there would be to make the allocator thread-safe somehow (mutex but that's slow, separate pools per thread but that's complicated, etc.)

That's precisely what jemalloc provides: it keeps per-thread memory pools (segregated by size as well, to reduce fragmentation), and uses optimistic locking so that most of the time it doesn't have to take a mutex on malloc. Rust used to use jemalloc by default on all platforms, with major performance improvements over the platform malloc on nearly every benchmark, but they stopped doing that because people didn't like the increase in binary size (jemalloc is around 1 MB, so your "Hello, world" would be at least that large) and because it caused issues in some FFI scenarios.

If I remember correctly, you can end up with multiple references to a given Expr after automatic differentiation and stuff, and to add explicit freeing would thus be a somewhat unpleasant task

I would actually be tempted to use a smart pointer--either shared_ptr, or our own lower-overhead version of it that doesn't use an atomic reference count. That's not free, but I think the overhead is worth it: it'll eliminate use-after-frees within the CAD core, and it's especially handy for constructing or inspecting expressions via an integrated scripting language, which I think would be really useful to have once we can handle FFI.

@jwesthues
Copy link
Member

I see the benefit of smart pointers for integration with a scripting language, but otherwise I doubt it would get us much. The structure of the code makes use-after-free unlikely, since there's a pretty clear boundary between state that does vs. doesn't persist across regenerations. Maybe we could get usefully more contiguous use of memory (thus better cache performance) from earlier freeing, but I doubt it.

Perhaps worth considering, but I wouldn't personally touch it without really good reason--feels like a lot of work, and you could solve your immediate problem pretty easily with some slightly smarter allocator and the same global free....

@whitequark
Copy link
Contributor

The reason I wanted to use smart pointers for Expr in the core is to be able to use it for various auxiliary tasks that aren't necessarily pinned to the generation boundary. For example:

  • keeping parsed expressions around when implementing Parametric sketches #77 so that it would be easy to go through all constraints that reference the (symbolic) parameters and check if they'll get broken when removing a parameter (or even just look at the reference count of the parameter object itself?);
  • keeping a single parsed expression around in the internationalization code, which uses a very similar grammar for choosing the appropriate pluralized form.

I agree that if we stick solely to the current use of Expr there's no real benefit to reference counting them, but I think there's a good opportunity to expand their use while simplifying the invariants new contributors would have to keep in mind.

Also, while full support for scripting is of course very challenging, I think there's a specific niche where small improvements can lead to large gains--that would be the testing harness. It currently uses C++ (which I went with just to get it off the ground), but that's very unpleasant to write API (or worse, UI) acceptance tests in. With the recent addition of many new features I think testing is a very important area to focus on.

feels like a lot of work

I've actually reviewed a patch that did this change--the changes required to get an allocator reference everywhere a new Expr could be created were really invasive, and the reason I didn't merge it, but the rest wasn't bad at all.

@jwesthues
Copy link
Member

For #77 just re-parsing every time seems kind of ugly but easy and perhaps fine; but certainly no specific objection to reference counting here, if you really want to take on all that work.

@phkahler
Copy link
Member Author

@whitequark No, I don't think that change is OK. The line:

sm->MakeFromCopyOf(&m);

Was pulled out of the parallel loop because it's not actually making a copy, it's adding triangle from m into sm. I can't say for certain, but I don't think adding to a list is thread-safe.

I found it surprising that SovleSpace is actually written in a very functional style, which makes many things in there thread-safe. Sometimes whole data structures are created and deleted inside a complex function. It sometimes seems inefficient, but results in no shared/modified state or side effects. I'd say that's best for preventing bugs, while being able to use OMP is a side effect.

BTW if it is causing a real bug this PR is not huge performance win, but using OMP opens the door to similar improvements.

@ruevs
Copy link
Member

ruevs commented May 10, 2020

With "force NURBS booleans..." on a helix for me it crashes here

void *v = HeapAlloc(TempHeap, HEAP_NO_SERIALIZE | HEAP_ZERO_MEMORY, n);

With

Exception thrown at 0x00007FFBD65060A9 (ntdll.dll) in solvespace.exe: 0xC00000FD: Stack overflow (parameters: 0x0000000000000001, 0x0000007A02EA3FF0).
Critical error detected c0000374

Call stack:

>	solvespace.exe!SolveSpace::AllocTemporary(unsigned __int64 n) Line 23	C++
 	solvespace.exe!SolveSpace::Expr::AllocExpr() Line 52	C++
 	solvespace.exe!SolveSpace::Expr::From(SolveSpace::hParam p) Line 215	C++
 	solvespace.exe!SolveSpace::EntityBase::PointGetExprsInWorkplane(SolveSpace::hEntity wrkpl, SolveSpace::Expr * * u, SolveSpace::Expr * * v) Line 623	C++
 	solvespace.exe!SolveSpace::ConstraintBase::GenerateEquations(SolveSpace::IdList<SolveSpace::Equation,SolveSpace::hEquation> * l, bool forReference) Line 374	C++
 	solvespace.exe!SolveSpace::System::WriteEquationsExceptFor(SolveSpace::hConstraint hc, SolveSpace::Group * g) Line 345	C++
 	solvespace.exe!SolveSpace::System::Solve(SolveSpace::Group * g, int * rank, int * dof, SolveSpace::List<SolveSpace::hConstraint> * bad, bool andFindBad, bool andFindFree, bool forceDofCheck) Line 415	C++
 	solvespace.exe!SolveSpace::SolveSpaceUI::SolveGroup(SolveSpace::hGroup hg, bool andFindFree) Line 536	C++
 	solvespace.exe!SolveSpace::SolveSpaceUI::SolveGroupAndReport(SolveSpace::hGroup hg, bool andFindFree) Line 493	C++
 	solvespace.exe!SolveSpace::SolveSpaceUI::GenerateAll(SolveSpace::SolveSpaceUI::Generate type, bool andFindFree, bool genForBBox) Line 272	C++
 	solvespace.exe!SolveSpace::SolveSpaceUI::GenerateAll(SolveSpace::SolveSpaceUI::Generate type, bool andFindFree, bool genForBBox) Line 197	C++

If I remove "HEAP_NO_SERIALIZE" then it starts crashing with access violation

Exception thrown at 0x00007FFBD65054C2 (ntdll.dll) in solvespace.exe: 0xC0000005: Access violation writing location 0x000000811DC00FF8.

on the same spot.

@ruevs
Copy link
Member

ruevs commented May 10, 2020

Hmmm even with HEAP_NO_SERIALIZE in place and -openmp removed it still crashes...

>	solvespace.exe!SolveSpace::AllocTemporary(unsigned __int64 n) Line 23	C++
 	solvespace.exe!BspUtil::AllocClassify(unsigned __int64 size) Line 175	C++
 	solvespace.exe!BspUtil::ClassifyTriangle(SolveSpace::STriangle * tri, SolveSpace::SBsp3 * node) Line 194	C++
 	solvespace.exe!SolveSpace::SBsp3::Insert(SolveSpace::STriangle * tr, SolveSpace::SMesh * instead) Line 469	C++
 	solvespace.exe!SolveSpace::SBsp3::InsertOrCreate(SolveSpace::SBsp3 * where, SolveSpace::STriangle * tr, SolveSpace::SMesh * instead) Line 461	C++
 	solvespace.exe!SolveSpace::SBsp3::InsertHow(SolveSpace::BspClass how, SolveSpace::STriangle * tr, SolveSpace::SMesh * instead) Line 90	C++
 	solvespace.exe!SolveSpace::SBsp3::InsertConvexHow(SolveSpace::BspClass how, SolveSpace::STriMeta meta, SolveSpace::Vector * vertex, unsigned __int64 n, SolveSpace::SMesh * instead) Line 413	C++
 	solvespace.exe!SolveSpace::SBsp3::InsertConvex(SolveSpace::STriMeta meta, SolveSpace::Vector * vertex, unsigned __int64 cnt, SolveSpace::SMesh * instead) Line 421	C++
 	solvespace.exe!SolveSpace::SBsp3::InsertConvexHow(SolveSpace::BspClass how, SolveSpace::STriMeta meta, SolveSpace::Vector * vertex, unsigned __int64 n, SolveSpace::SMesh * instead) Line 401	C++

skip > 2500 of these

 	solvespace.exe!SolveSpace::SBsp3::InsertConvexHow(SolveSpace::BspClass how, SolveSpace::STriMeta meta, SolveSpace::Vector * vertex, unsigned __int64 n, SolveSpace::SMesh * instead) Line 401	C++
 	solvespace.exe!SolveSpace::SBsp3::InsertConvex(SolveSpace::STriMeta meta, SolveSpace::Vector * vertex, unsigned __int64 cnt, SolveSpace::SMesh * instead) Line 421	C++
 	solvespace.exe!SolveSpace::SBsp3::InsertConvexHow(SolveSpace::BspClass how, SolveSpace::STriMeta meta, SolveSpace::Vector * vertex, unsigned __int64 n, SolveSpace::SMesh * instead) Line 401	C++
 	solvespace.exe!SolveSpace::SBsp3::Insert(SolveSpace::STriangle * tr, SolveSpace::SMesh * instead) Line 506	C++
 	solvespace.exe!SolveSpace::SBsp3::InsertOrCreate(SolveSpace::SBsp3 * where, SolveSpace::STriangle * tr, SolveSpace::SMesh * instead) Line 461	C++
 	solvespace.exe!SolveSpace::SBsp3::InsertHow(SolveSpace::BspClass how, SolveSpace::STriangle * tr, SolveSpace::SMesh * instead) Line 85	C++
 	solvespace.exe!SolveSpace::SBsp3::Insert(SolveSpace::STriangle * tr, SolveSpace::SMesh * instead) Line 482	C++
 	solvespace.exe!SolveSpace::SBsp3::InsertOrCreate(SolveSpace::SBsp3 * where, SolveSpace::STriangle * tr, SolveSpace::SMesh * instead) Line 461	C++
 	solvespace.exe!SolveSpace::SBsp3::InsertHow(SolveSpace::BspClass how, SolveSpace::STriangle * tr, SolveSpace::SMesh * instead) Line 90	C++
 	solvespace.exe!SolveSpace::SBsp3::Insert(SolveSpace::STriangle * tr, SolveSpace::SMesh * instead) Line 485	C++
 	solvespace.exe!SolveSpace::SBsp3::InsertOrCreate(SolveSpace::SBsp3 * where, SolveSpace::STriangle * tr, SolveSpace::SMesh * instead) Line 461	C++
 	solvespace.exe!SolveSpace::SBsp3::InsertHow(SolveSpace::BspClass how, SolveSpace::STriangle * tr, SolveSpace::SMesh * instead) Line 90	C++
 	solvespace.exe!SolveSpace::SBsp3::Insert(SolveSpace::STriangle * tr, SolveSpace::SMesh * instead) Line 485	C++
 	solvespace.exe!SolveSpace::SBsp3::InsertOrCreate(SolveSpace::SBsp3 * where, SolveSpace::STriangle * tr, SolveSpace::SMesh * instead) Line 461	C++
 	solvespace.exe!SolveSpace::SBsp3::InsertHow(SolveSpace::BspClass how, SolveSpace::STriangle * tr, SolveSpace::SMesh * instead) Line 85	C++
 	solvespace.exe!SolveSpace::SBsp3::Insert(SolveSpace::STriangle * tr, SolveSpace::SMesh * instead) Line 482	C++
 	solvespace.exe!SolveSpace::SBsp3::InsertOrCreate(SolveSpace::SBsp3 * where, SolveSpace::STriangle * tr, SolveSpace::SMesh * instead) Line 461	C++
 	solvespace.exe!SolveSpace::SBsp3::InsertHow(SolveSpace::BspClass how, SolveSpace::STriangle * tr, SolveSpace::SMesh * instead) Line 90	C++
 	solvespace.exe!SolveSpace::SBsp3::Insert(SolveSpace::STriangle * tr, SolveSpace::SMesh * instead) Line 485	C++
 	solvespace.exe!SolveSpace::SBsp3::InsertOrCreate(SolveSpace::SBsp3 * where, SolveSpace::STriangle * tr, SolveSpace::SMesh * instead) Line 461	C++
 	solvespace.exe!SolveSpace::SBsp3::InsertHow(SolveSpace::BspClass how, SolveSpace::STriangle * tr, SolveSpace::SMesh * instead) Line 85	C++
 	solvespace.exe!SolveSpace::SBsp3::Insert(SolveSpace::STriangle * tr, SolveSpace::SMesh * instead) Line 482	C++
 	solvespace.exe!SolveSpace::SBsp3::InsertOrCreate(SolveSpace::SBsp3 * where, SolveSpace::STriangle * tr, SolveSpace::SMesh * instead) Line 461	C++
 	solvespace.exe!SolveSpace::SBsp3::InsertHow(SolveSpace::BspClass how, SolveSpace::STriangle * tr, SolveSpace::SMesh * instead) Line 90	C++
 	solvespace.exe!SolveSpace::SBsp3::Insert(SolveSpace::STriangle * tr, SolveSpace::SMesh * instead) Line 485	C++
 	solvespace.exe!SolveSpace::SBsp3::InsertOrCreate(SolveSpace::SBsp3 * where, SolveSpace::STriangle * tr, SolveSpace::SMesh * instead) Line 461	C++
 	solvespace.exe!SolveSpace::SBsp3::InsertHow(SolveSpace::BspClass how, SolveSpace::STriangle * tr, SolveSpace::SMesh * instead) Line 90	C++
 	solvespace.exe!SolveSpace::SBsp3::Insert(SolveSpace::STriangle * tr, SolveSpace::SMesh * instead) Line 485	C++
 	solvespace.exe!SolveSpace::SBsp3::InsertOrCreate(SolveSpace::SBsp3 * where, SolveSpace::STriangle * tr, SolveSpace::SMesh * instead) Line 461	C++
 	solvespace.exe!SolveSpace::SBsp3::InsertHow(SolveSpace::BspClass how, SolveSpace::STriangle * tr, SolveSpace::SMesh * instead) Line 90	C++
 	solvespace.exe!SolveSpace::SBsp3::Insert(SolveSpace::STriangle * tr, SolveSpace::SMesh * instead) Line 485	C++
 	solvespace.exe!SolveSpace::SBsp3::InsertOrCreate(SolveSpace::SBsp3 * where, SolveSpace::STriangle * tr, SolveSpace::SMesh * instead) Line 461	C++
 	solvespace.exe!SolveSpace::SBsp3::FromMesh(const SolveSpace::SMesh * m) Line 27	C++
 	solvespace.exe!SolveSpace::SMesh::MakeFromIntersectionOf(SolveSpace::SMesh * a, SolveSpace::SMesh * b) Line 301	C++
 	solvespace.exe!SolveSpace::Group::GenerateForBoolean<SolveSpace::SMesh>(SolveSpace::SMesh * prevs, SolveSpace::SMesh * thiss, SolveSpace::SMesh * outs, SolveSpace::Group::CombineAs how) Line 184	C++
 	solvespace.exe!SolveSpace::Group::GenerateShellAndMesh() Line 417	C++
 	solvespace.exe!SolveSpace::SolveSpaceUI::GenerateAll(SolveSpace::SolveSpaceUI::Generate type, bool andFindFree, bool genForBBox) Line 275	C++
Exception thrown at 0x00007FFBD65060B4 (ntdll.dll) in solvespace.exe: 0xC0000005: Access violation writing location 0x0000001AC9200FF8

It's too late I'm missing something...

@whitequark
Copy link
Contributor

Was pulled out of the parallel loop because it's not actually making a copy, it's adding triangle from m into sm. I can't say for certain, but I don't think adding to a list is thread-safe.

@phkahler Of course it's not. That's why there is a #pragma omp critical preceding it!

@whitequark
Copy link
Contributor

@ruevs You might be hitting #320, unrelated to anything in this PR.

@phkahler
Copy link
Member Author

@whitequark Your change works fine but it's hard to tell if it's any faster (might be). I was concerned about the time to restart the threads if they're blocked. Then I remembered, the vector of SMeshes might run into cache coherence issues as the lists update their lengths. It's probably good.

@ruevs ruevs mentioned this pull request May 12, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Question/Suggestion Parallel tesselation.
5 participants