Use OpenMP for triangulation #600

phkahler · 2020-05-08T17:48:31Z

This implements the change suggested in #481. Performance gain with simple helical test models was 15-25 percent on current master. Gain is more apparent at lower chord tolerance. Existing builds should not see any difference until they are told to use OpenMP. For GCC that just means passing -fopenmp.

whitequark · 2020-05-09T01:40:53Z

Looks good. Can we use range based for loops here? We've been migrating away from using .l / .n containers (mostly thanks to @rpavlik's efforts) and I'm not sure if this is OMP compatible. (But the other two loops should be fine.)

phkahler · 2020-05-09T03:56:13Z

Is this better? I even used auto ;-)

My testing seems to show force-to-triangle-mesh performing slower but indicating faster (lower time) at the bottom. Any idea why either of those might be? I'm using a single helix for testing so there shouldn't be any group combining going on.

whitequark · 2020-05-09T04:07:42Z

Is this better?

Yep, that's actually a lot easier to understand for me! Less things to track while reading the code.

My testing seems to show force-to-triangle-mesh performing slower but indicating faster (lower time) at the bottom.

I suspect it might be a measurement artifact, since the frame timer at the bottom is only supposed to measure rendering performance (the OpenGL display list stuff and such). Is triangulation even included in that timer?

rpavlik · 2020-05-09T04:10:22Z

Looks like range-for on random-access container is ok for parallel for in open mp 5, but possibly not before? https://stackoverflow.com/questions/17848521/using-openmp-with-c11-range-based-for-loops

whitequark · 2020-05-09T04:11:41Z

Thanks for looking @rpavlik! I don't think we have OpenMP 5 on our version of MSVC (I'm not sure if any version of MSVC implements it?) so as I suspected that loop is best kept in a simpler form.

phkahler · 2020-05-09T04:28:51Z

@whitequark The frame timer seem to measure actual frame rate. Rendering time from OpenGL is utterly meaningless these days. I have AMD integrated graphics (APU) and I can rotate or zoom full screen at 4K with 1ms or less frame time. Stretching an extrusion shows sensible numbers in the 10ms or 100ms range and going into the 1000ms range for really sluggish stuff (that still rotates blazingly fast). Not sure what it's measuring when forced to mesh but the indicated frame time is lower while visually it's slower.

whitequark · 2020-05-09T04:34:23Z

Rendering time from OpenGL is utterly meaningless these days. I have AMD integrated graphics (APU) and I can rotate or zoom full screen at 4K with 1ms or less frame time.

A lot of effort, mostly by @Evil-Spirit, went into making that happen. The old immediate mode code was much slower.

Not sure what it's measuring when forced to mesh but the indicated frame time is lower while visually it's slower.

I think it might be converting NURBS to mesh within the region measured by the timer (since we do it lazily and cache), but performing booleans on meshes outside of that region, or something like that.

phkahler · 2020-05-09T13:28:23Z

How should we handle updating the builds to take advantage of this change?

whitequark · 2020-05-09T13:58:00Z

Something like:

diff --git a/CMakeLists.txt b/CMakeLists.txt
index 3e98325d..05c1bbf0 100644
--- a/CMakeLists.txt
+++ b/CMakeLists.txt
@@ -96,6 +96,14 @@ if(CMAKE_SYSTEM_PROCESSOR STREQUAL "i686" OR CMAKE_SYSTEM_PROCESSOR STREQUAL "X8
     set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} ${FLOAT_FLAGS}")
 endif()
 
+# We use OpenMP to speed up some geometric operations, but it's optional.
+include(FindOpenMP)
+if(OpenMP_FOUND)
+    set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} ${OpenMP_CXX_FLAGS}")
+else()
+    message(WARNING "OpenMP not found, geometric operations will be single-threaded")
+endif()
+
 if(APPLE OR CMAKE_SYSTEM_NAME STREQUAL "FreeBSD")
     set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -stdlib=libc++")
 endif()
diff --git a/src/CMakeLists.txt b/src/CMakeLists.txt
index 928b4522..4834a655 100644
--- a/src/CMakeLists.txt
+++ b/src/CMakeLists.txt
@@ -224,6 +224,7 @@ add_dependencies(solvespace-core
     q3d_header)
 
 target_link_libraries(solvespace-core
+    ${OpenMP_CXX_LIBRARIES}
     dxfrw
     ${util_LIBRARIES}
     ${ZLIB_LIBRARY}

whitequark · 2020-05-09T17:57:41Z

Did you forget to update the PR with the CMake OpenMP changes?

phkahler · 2020-05-09T18:01:26Z

Did you forget to update the PR with the CMake OpenMP changes?

Oops yes.

whitequark · 2020-05-09T19:46:30Z

Fixed in 47e8279.

whitequark · 2020-05-09T20:01:16Z

That broke the tests on Windows... could you take a look?

phkahler · 2020-05-09T20:17:48Z

I looked. Don't see any reason for it :(

whitequark · 2020-05-09T20:26:18Z

Do you have a VS12 install you can use to check the backtrace?

phkahler · 2020-05-09T22:11:52Z

I haven't used Windows or VS anything at home since 2005. If not for your awesome porting efforts I wouldn't even be participating! :-)

whitequark · 2020-05-09T22:34:59Z

Ah, OK! I'll take a look at it later then.

whitequark · 2020-05-10T02:36:01Z

@phkahler The testsuite crashes with error C0000374, which is STATUS_HEAP_CORRUPTION, on any MSVC version from 2013 to 2017. I suspect this is a genuine problem but I don't yet see where it is.

whitequark · 2020-05-10T02:56:21Z

The issue is most certainly real, you can reproduce it with ThreadSanitizer (try -DENABLE_SANITIZERS=ON -DSANITIZERS=thread on master).

WARNING: ThreadSanitizer: data race (pid=703290)
  Write of size 8 at 0x7ffcd68dd1d0 by main thread:
    #0 std::_Function_handler<void (SolveSpace::PolylineBuilder::Vertex*, SolveSpace::PolylineBuilder::Vertex*, SolveSpace::PolylineBuilder::Edge*), SolveSpace::PolylineBuilder::GenerateOutlines(SolveSpace::SOutlineList*)::{lambda(SolveSpace::PolylineBuilder::Vertex*, SolveSpace::PolylineBuilder::Vertex*, SolveSpace::PolylineBuilder::Edge*)#1}>::_M_invoke(std::_Any_data const&, SolveSpace::PolylineBuilder::Vertex*&&, std::_Any_data const&, SolveSpace::PolylineBuilder::Edge*&&) <null> (solvespace+0x17ae85)
    #1 SolveSpace::PolylineBuilder::Generate(std::function<void (SolveSpace::PolylineBuilder::Vertex*, SolveSpace::PolylineBuilder::Vertex*, SolveSpace::PolylineBuilder::Edge*)> const&, std::function<void (SolveSpace::PolylineBuilder::Vertex*, SolveSpace::PolylineBuilder::Edge*)> const&, std::function<void (SolveSpace::PolylineBuilder::Edge*)> const&, std::function<void ()> const&) <null> (solvespace+0x17b9b7)
    #2 SolveSpace::PolylineBuilder::GenerateOutlines(SolveSpace::SOutlineList*) <null> (solvespace+0x17be75)
    #3 SolveSpace::Group::GenerateDisplayItems() <null> (solvespace+0x13a671)
    #4 SolveSpace::Group::Draw(SolveSpace::Canvas*) <null> (solvespace+0x13a744)
    #5 SolveSpace::GraphicsWindow::DrawPersistent(SolveSpace::Canvas*) <null> (solvespace+0x7e5db)
    #6 SolveSpace::GraphicsWindow::Draw(SolveSpace::Canvas*) <null> (solvespace+0x80123)
    #7 SolveSpace::GraphicsWindow::Paint() <null> (solvespace+0x81b79)
    #8 std::_Function_handler<void (), std::_Bind<void (SolveSpace::GraphicsWindow::*(SolveSpace::GraphicsWindow*))()> >::_M_invoke(std::_Any_data const&) <null> (solvespace+0x121ddf)
    #9 std::function<void ()>::operator()() const <null> (solvespace+0x6bce6)
    #10 SolveSpace::Platform::GtkGLWidget::on_render(Glib::RefPtr<Gdk::GLContext> const&) <null> (solvespace+0x6be08)
    #11 Gtk::GLArea_Class::render_callback(_GtkGLArea*, _GdkGLContext*) <null> (libgtkmm-3.0.so.1+0x36355a)
    #12 main <null> (solvespace+0x63808)

  Previous read of size 8 at 0x7ffcd68dd1d0 by thread T2:
    #0 SolveSpace::SShell::TriangulateInto(SolveSpace::SMesh*) [clone ._omp_fn.0] <null> (solvespace+0x2339ff)
    #1 <null> <null> (libgomp.so.1+0x19de5)

  Location is stack of main thread.

  Location is global '<null>' at 0x000000000000 ([stack]+0x00000001e1d0)

  Thread T2 (tid=703293, running) created by main thread at:
    #0 pthread_create <null> (libtsan.so.0+0x58ba2)
    #1 <null> <null> (libgomp.so.1+0x1a3fa)
    #2 SolveSpace::Group::GenerateDisplayItems() <null> (solvespace+0x13a267)
    #3 SolveSpace::Group::Draw(SolveSpace::Canvas*) <null> (solvespace+0x13a744)
    #4 SolveSpace::GraphicsWindow::DrawPersistent(SolveSpace::Canvas*) <null> (solvespace+0x7e5db)
    #5 SolveSpace::GraphicsWindow::Draw(SolveSpace::Canvas*) <null> (solvespace+0x80123)
    #6 SolveSpace::GraphicsWindow::Paint() <null> (solvespace+0x81b79)
    #7 std::_Function_handler<void (), std::_Bind<void (SolveSpace::GraphicsWindow::*(SolveSpace::GraphicsWindow*))()> >::_M_invoke(std::_Any_data const&) <null> (solvespace+0x121ddf)
    #8 std::function<void ()>::operator()() const <null> (solvespace+0x6bce6)
    #9 SolveSpace::Platform::GtkGLWidget::on_render(Glib::RefPtr<Gdk::GLContext> const&) <null> (solvespace+0x6be08)
    #10 Gtk::GLArea_Class::render_callback(_GtkGLArea*, _GdkGLContext*) <null> (libgtkmm-3.0.so.1+0x36355a)
    #11 main <null> (solvespace+0x63808)

SUMMARY: ThreadSanitizer: data race (/home/whitequark/Projects/solvespace/build/gcctsan/bin/solvespace+0x17ae85) in std::_Function_handler<void (SolveSpace::PolylineBuilder::Vertex*, SolveSpace::PolylineBuilder::Vertex*, SolveSpace::PolylineBuilder::Edge*), SolveSpace::PolylineBuilder::GenerateOutlines(SolveSpace::SOutlineList*)::{lambda(SolveSpace::PolylineBuilder::Vertex*, SolveSpace::PolylineBuilder::Vertex*, SolveSpace::PolylineBuilder::Edge*)#1}>::_M_invoke(std::_Any_data const&, SolveSpace::PolylineBuilder::Vertex*&&, std::_Any_data const&, SolveSpace::PolylineBuilder::Edge*&&)

whitequark · 2020-05-10T03:23:15Z

Ah, those are likely false positives.

whitequark · 2020-05-10T03:55:40Z

I figured it out. The OMP code is probably correct. The problem was that the heap on Windows was created with HEAP_NO_SERIALIZE and the triangulation code allocated from it. This fixes it:

diff --git a/src/platform/utilwin.cpp b/src/platform/utilwin.cpp
index 4e452802..b43a84ee 100644
--- a/src/platform/utilwin.cpp
+++ b/src/platform/utilwin.cpp
@@ -60 +60 @@ void *MemAlloc(size_t n) {
-    void *p = HeapAlloc(PermHeap, HEAP_NO_SERIALIZE | HEAP_ZERO_MEMORY, n);
+    void *p = HeapAlloc(PermHeap, 0 | HEAP_ZERO_MEMORY, n);
@@ -65 +65 @@ void MemFree(void *p) {
-    HeapFree(PermHeap, HEAP_NO_SERIALIZE, p);
+    HeapFree(PermHeap, 0, p);
@@ -70 +70 @@ void vl() {
-    ssassert(HeapValidate(PermHeap, HEAP_NO_SERIALIZE, NULL), "Corrupted heap");
+    ssassert(HeapValidate(PermHeap, 0, NULL), "Corrupted heap");
@@ -85 +85 @@ std::vector<std::string> InitPlatform(int argc, char **argv) {
-    PermHeap = HeapCreate(HEAP_NO_SERIALIZE, 1024*1024*20, 0);
+    PermHeap = HeapCreate(0, 1024*1024*20, 0);

whitequark · 2020-05-10T04:16:25Z

@jwesthues Do you have any objection to getting rid of the PermHeap on Windows entirely? On *nix, we've been using the CRT heap (malloc/free) all this time; this is actually an improvement in debug experience because the usual pointer debugging tools (valgrind, ASan, LSan) all intercept malloc/free. It looks like on Windows, CRT debug heap has provided this functionality for a while, and recently ASan got ported to MSVC. (Also, we've been using STL collections for a while, which use the CRT heap anyway.)

I'm going to leave TempHeap intact for now, but for completeness, the situation with it is quite weird. On *nix, the current code for AllocTemporary/FreeAllTemporary is actually slower than the normal MemAlloc/MemFree because it maintains a linked list on top of the normal heap instead of using a separate arena; I don't recall why it looks like that, but it has the benefit that it integrates well with pointer debugging tools.

Long term, unfortunately the way AllocTemporary works presents an obstacle for many features (parallel generation of multiple sketches, exporting Expr through FFI, etc) because it's a single global heap. It is possible to add a per-Expr bump allocator (in a rather invasive way), but I suspect that a better way forward is to just use a better allocator (perhaps jemalloc) application-wide on all platforms, and allocate temporaries like all other objects. This of course would have to be benchmarked to make sure that we don't do worse overall, which I don't have time to do right now.

Thoughts?

whitequark · 2020-05-10T04:22:49Z

@phkahler I came up with this simplification / optimization:

 void SShell::TriangulateInto(SMesh *sm) {
-    std::vector<SMesh> tm(surface.n);
-
-#pragma omp parallel for
+    #pragma omp parallel for
     for(int i=0; i<surface.n; i++) {
         SSurface *s = &surface[i];
-        s->TriangulateInto(this, &tm[i]);
-    }
-
-    // merge the per-surface meshes
-    for (auto& m : tm) {
+        SMesh m;
+        s->TriangulateInto(this, &m);
+        #pragma omp critical
         sm->MakeFromCopyOf(&m);
         m.Clear();
     }

Do you agree this is valid? Could you measure the performance impact? Also, I wonder if OpenMP fares any better if you LD_PRELOAD jemalloc into SolveSpace.

jwesthues · 2020-05-10T04:40:46Z

No objection to switching the permanent heap to malloc(). I think I used the Windows functions only because this was a Windows-only project initially, and those gave me better control/debuggability; but obviously lots of good tools for malloc() too.

AllocTemporary() is indeed trickier, since we don't free Exprs explicitly and rely on the FreeAllTemporary(). If I remember correctly, you can end up with multiple references to a given Expr after automatic differentiation and stuff, and to add explicit freeing would thus be a somewhat unpleasant task. I suspect your easiest path there would be to make the allocator thread-safe somehow (mutex but that's slow, separate pools per thread but that's complicated, etc.), and then keep a single global free when all the threads completed.

whitequark · 2020-05-10T05:01:11Z

I suspect your easiest path there would be to make the allocator thread-safe somehow (mutex but that's slow, separate pools per thread but that's complicated, etc.)

That's precisely what jemalloc provides: it keeps per-thread memory pools (segregated by size as well, to reduce fragmentation), and uses optimistic locking so that most of the time it doesn't have to take a mutex on malloc. Rust used to use jemalloc by default on all platforms, with major performance improvements over the platform malloc on nearly every benchmark, but they stopped doing that because people didn't like the increase in binary size (jemalloc is around 1 MB, so your "Hello, world" would be at least that large) and because it caused issues in some FFI scenarios.

If I remember correctly, you can end up with multiple references to a given Expr after automatic differentiation and stuff, and to add explicit freeing would thus be a somewhat unpleasant task

I would actually be tempted to use a smart pointer--either shared_ptr, or our own lower-overhead version of it that doesn't use an atomic reference count. That's not free, but I think the overhead is worth it: it'll eliminate use-after-frees within the CAD core, and it's especially handy for constructing or inspecting expressions via an integrated scripting language, which I think would be really useful to have once we can handle FFI.

jwesthues · 2020-05-10T05:39:31Z

I see the benefit of smart pointers for integration with a scripting language, but otherwise I doubt it would get us much. The structure of the code makes use-after-free unlikely, since there's a pretty clear boundary between state that does vs. doesn't persist across regenerations. Maybe we could get usefully more contiguous use of memory (thus better cache performance) from earlier freeing, but I doubt it.

Perhaps worth considering, but I wouldn't personally touch it without really good reason--feels like a lot of work, and you could solve your immediate problem pretty easily with some slightly smarter allocator and the same global free....

whitequark · 2020-05-10T06:08:15Z

The reason I wanted to use smart pointers for Expr in the core is to be able to use it for various auxiliary tasks that aren't necessarily pinned to the generation boundary. For example:

keeping parsed expressions around when implementing Parametric sketches #77 so that it would be easy to go through all constraints that reference the (symbolic) parameters and check if they'll get broken when removing a parameter (or even just look at the reference count of the parameter object itself?);
keeping a single parsed expression around in the internationalization code, which uses a very similar grammar for choosing the appropriate pluralized form.

I agree that if we stick solely to the current use of Expr there's no real benefit to reference counting them, but I think there's a good opportunity to expand their use while simplifying the invariants new contributors would have to keep in mind.

Also, while full support for scripting is of course very challenging, I think there's a specific niche where small improvements can lead to large gains--that would be the testing harness. It currently uses C++ (which I went with just to get it off the ground), but that's very unpleasant to write API (or worse, UI) acceptance tests in. With the recent addition of many new features I think testing is a very important area to focus on.

feels like a lot of work

I've actually reviewed a patch that did this change--the changes required to get an allocator reference everywhere a new Expr could be created were really invasive, and the reason I didn't merge it, but the rest wasn't bad at all.

jwesthues · 2020-05-10T06:13:43Z

For #77 just re-parsing every time seems kind of ugly but easy and perhaps fine; but certainly no specific objection to reference counting here, if you really want to take on all that work.

phkahler · 2020-05-10T15:51:44Z

@whitequark No, I don't think that change is OK. The line:

sm->MakeFromCopyOf(&m);

Was pulled out of the parallel loop because it's not actually making a copy, it's adding triangle from m into sm. I can't say for certain, but I don't think adding to a list is thread-safe.

I found it surprising that SovleSpace is actually written in a very functional style, which makes many things in there thread-safe. Sometimes whole data structures are created and deleted inside a complex function. It sometimes seems inefficient, but results in no shared/modified state or side effects. I'd say that's best for preventing bugs, while being able to use OMP is a side effect.

BTW if it is causing a real bug this PR is not huge performance win, but using OMP opens the door to similar improvements.

ruevs · 2020-05-10T19:01:07Z

With "force NURBS booleans..." on a helix for me it crashes here

solvespace/src/platform/utilwin.cpp

Line 23 in c95a07a

void *v = HeapAlloc(TempHeap, HEAP_NO_SERIALIZE | HEAP_ZERO_MEMORY, n);

With

Exception thrown at 0x00007FFBD65060A9 (ntdll.dll) in solvespace.exe: 0xC00000FD: Stack overflow (parameters: 0x0000000000000001, 0x0000007A02EA3FF0).
Critical error detected c0000374

Call stack:

>	solvespace.exe!SolveSpace::AllocTemporary(unsigned __int64 n) Line 23	C++
 	solvespace.exe!SolveSpace::Expr::AllocExpr() Line 52	C++
 	solvespace.exe!SolveSpace::Expr::From(SolveSpace::hParam p) Line 215	C++
 	solvespace.exe!SolveSpace::EntityBase::PointGetExprsInWorkplane(SolveSpace::hEntity wrkpl, SolveSpace::Expr * * u, SolveSpace::Expr * * v) Line 623	C++
 	solvespace.exe!SolveSpace::ConstraintBase::GenerateEquations(SolveSpace::IdList<SolveSpace::Equation,SolveSpace::hEquation> * l, bool forReference) Line 374	C++
 	solvespace.exe!SolveSpace::System::WriteEquationsExceptFor(SolveSpace::hConstraint hc, SolveSpace::Group * g) Line 345	C++
 	solvespace.exe!SolveSpace::System::Solve(SolveSpace::Group * g, int * rank, int * dof, SolveSpace::List<SolveSpace::hConstraint> * bad, bool andFindBad, bool andFindFree, bool forceDofCheck) Line 415	C++
 	solvespace.exe!SolveSpace::SolveSpaceUI::SolveGroup(SolveSpace::hGroup hg, bool andFindFree) Line 536	C++
 	solvespace.exe!SolveSpace::SolveSpaceUI::SolveGroupAndReport(SolveSpace::hGroup hg, bool andFindFree) Line 493	C++
 	solvespace.exe!SolveSpace::SolveSpaceUI::GenerateAll(SolveSpace::SolveSpaceUI::Generate type, bool andFindFree, bool genForBBox) Line 272	C++
 	solvespace.exe!SolveSpace::SolveSpaceUI::GenerateAll(SolveSpace::SolveSpaceUI::Generate type, bool andFindFree, bool genForBBox) Line 197	C++

If I remove "HEAP_NO_SERIALIZE" then it starts crashing with access violation

Exception thrown at 0x00007FFBD65054C2 (ntdll.dll) in solvespace.exe: 0xC0000005: Access violation writing location 0x000000811DC00FF8.

on the same spot.

ruevs · 2020-05-10T19:22:38Z

Hmmm even with HEAP_NO_SERIALIZE in place and -openmp removed it still crashes...

>	solvespace.exe!SolveSpace::AllocTemporary(unsigned __int64 n) Line 23	C++
 	solvespace.exe!BspUtil::AllocClassify(unsigned __int64 size) Line 175	C++
 	solvespace.exe!BspUtil::ClassifyTriangle(SolveSpace::STriangle * tri, SolveSpace::SBsp3 * node) Line 194	C++
 	solvespace.exe!SolveSpace::SBsp3::Insert(SolveSpace::STriangle * tr, SolveSpace::SMesh * instead) Line 469	C++
 	solvespace.exe!SolveSpace::SBsp3::InsertOrCreate(SolveSpace::SBsp3 * where, SolveSpace::STriangle * tr, SolveSpace::SMesh * instead) Line 461	C++
 	solvespace.exe!SolveSpace::SBsp3::InsertHow(SolveSpace::BspClass how, SolveSpace::STriangle * tr, SolveSpace::SMesh * instead) Line 90	C++
 	solvespace.exe!SolveSpace::SBsp3::InsertConvexHow(SolveSpace::BspClass how, SolveSpace::STriMeta meta, SolveSpace::Vector * vertex, unsigned __int64 n, SolveSpace::SMesh * instead) Line 413	C++
 	solvespace.exe!SolveSpace::SBsp3::InsertConvex(SolveSpace::STriMeta meta, SolveSpace::Vector * vertex, unsigned __int64 cnt, SolveSpace::SMesh * instead) Line 421	C++
 	solvespace.exe!SolveSpace::SBsp3::InsertConvexHow(SolveSpace::BspClass how, SolveSpace::STriMeta meta, SolveSpace::Vector * vertex, unsigned __int64 n, SolveSpace::SMesh * instead) Line 401	C++

skip > 2500 of these

 	solvespace.exe!SolveSpace::SBsp3::InsertConvexHow(SolveSpace::BspClass how, SolveSpace::STriMeta meta, SolveSpace::Vector * vertex, unsigned __int64 n, SolveSpace::SMesh * instead) Line 401	C++
 	solvespace.exe!SolveSpace::SBsp3::InsertConvex(SolveSpace::STriMeta meta, SolveSpace::Vector * vertex, unsigned __int64 cnt, SolveSpace::SMesh * instead) Line 421	C++
 	solvespace.exe!SolveSpace::SBsp3::InsertConvexHow(SolveSpace::BspClass how, SolveSpace::STriMeta meta, SolveSpace::Vector * vertex, unsigned __int64 n, SolveSpace::SMesh * instead) Line 401	C++
 	solvespace.exe!SolveSpace::SBsp3::Insert(SolveSpace::STriangle * tr, SolveSpace::SMesh * instead) Line 506	C++
 	solvespace.exe!SolveSpace::SBsp3::InsertOrCreate(SolveSpace::SBsp3 * where, SolveSpace::STriangle * tr, SolveSpace::SMesh * instead) Line 461	C++
 	solvespace.exe!SolveSpace::SBsp3::InsertHow(SolveSpace::BspClass how, SolveSpace::STriangle * tr, SolveSpace::SMesh * instead) Line 85	C++
 	solvespace.exe!SolveSpace::SBsp3::Insert(SolveSpace::STriangle * tr, SolveSpace::SMesh * instead) Line 482	C++
 	solvespace.exe!SolveSpace::SBsp3::InsertOrCreate(SolveSpace::SBsp3 * where, SolveSpace::STriangle * tr, SolveSpace::SMesh * instead) Line 461	C++
 	solvespace.exe!SolveSpace::SBsp3::InsertHow(SolveSpace::BspClass how, SolveSpace::STriangle * tr, SolveSpace::SMesh * instead) Line 90	C++
 	solvespace.exe!SolveSpace::SBsp3::Insert(SolveSpace::STriangle * tr, SolveSpace::SMesh * instead) Line 485	C++
 	solvespace.exe!SolveSpace::SBsp3::InsertOrCreate(SolveSpace::SBsp3 * where, SolveSpace::STriangle * tr, SolveSpace::SMesh * instead) Line 461	C++
 	solvespace.exe!SolveSpace::SBsp3::InsertHow(SolveSpace::BspClass how, SolveSpace::STriangle * tr, SolveSpace::SMesh * instead) Line 90	C++
 	solvespace.exe!SolveSpace::SBsp3::Insert(SolveSpace::STriangle * tr, SolveSpace::SMesh * instead) Line 485	C++
 	solvespace.exe!SolveSpace::SBsp3::InsertOrCreate(SolveSpace::SBsp3 * where, SolveSpace::STriangle * tr, SolveSpace::SMesh * instead) Line 461	C++
 	solvespace.exe!SolveSpace::SBsp3::InsertHow(SolveSpace::BspClass how, SolveSpace::STriangle * tr, SolveSpace::SMesh * instead) Line 85	C++
 	solvespace.exe!SolveSpace::SBsp3::Insert(SolveSpace::STriangle * tr, SolveSpace::SMesh * instead) Line 482	C++
 	solvespace.exe!SolveSpace::SBsp3::InsertOrCreate(SolveSpace::SBsp3 * where, SolveSpace::STriangle * tr, SolveSpace::SMesh * instead) Line 461	C++
 	solvespace.exe!SolveSpace::SBsp3::InsertHow(SolveSpace::BspClass how, SolveSpace::STriangle * tr, SolveSpace::SMesh * instead) Line 90	C++
 	solvespace.exe!SolveSpace::SBsp3::Insert(SolveSpace::STriangle * tr, SolveSpace::SMesh * instead) Line 485	C++
 	solvespace.exe!SolveSpace::SBsp3::InsertOrCreate(SolveSpace::SBsp3 * where, SolveSpace::STriangle * tr, SolveSpace::SMesh * instead) Line 461	C++
 	solvespace.exe!SolveSpace::SBsp3::InsertHow(SolveSpace::BspClass how, SolveSpace::STriangle * tr, SolveSpace::SMesh * instead) Line 85	C++
 	solvespace.exe!SolveSpace::SBsp3::Insert(SolveSpace::STriangle * tr, SolveSpace::SMesh * instead) Line 482	C++
 	solvespace.exe!SolveSpace::SBsp3::InsertOrCreate(SolveSpace::SBsp3 * where, SolveSpace::STriangle * tr, SolveSpace::SMesh * instead) Line 461	C++
 	solvespace.exe!SolveSpace::SBsp3::InsertHow(SolveSpace::BspClass how, SolveSpace::STriangle * tr, SolveSpace::SMesh * instead) Line 90	C++
 	solvespace.exe!SolveSpace::SBsp3::Insert(SolveSpace::STriangle * tr, SolveSpace::SMesh * instead) Line 485	C++
 	solvespace.exe!SolveSpace::SBsp3::InsertOrCreate(SolveSpace::SBsp3 * where, SolveSpace::STriangle * tr, SolveSpace::SMesh * instead) Line 461	C++
 	solvespace.exe!SolveSpace::SBsp3::InsertHow(SolveSpace::BspClass how, SolveSpace::STriangle * tr, SolveSpace::SMesh * instead) Line 90	C++
 	solvespace.exe!SolveSpace::SBsp3::Insert(SolveSpace::STriangle * tr, SolveSpace::SMesh * instead) Line 485	C++
 	solvespace.exe!SolveSpace::SBsp3::InsertOrCreate(SolveSpace::SBsp3 * where, SolveSpace::STriangle * tr, SolveSpace::SMesh * instead) Line 461	C++
 	solvespace.exe!SolveSpace::SBsp3::InsertHow(SolveSpace::BspClass how, SolveSpace::STriangle * tr, SolveSpace::SMesh * instead) Line 90	C++
 	solvespace.exe!SolveSpace::SBsp3::Insert(SolveSpace::STriangle * tr, SolveSpace::SMesh * instead) Line 485	C++
 	solvespace.exe!SolveSpace::SBsp3::InsertOrCreate(SolveSpace::SBsp3 * where, SolveSpace::STriangle * tr, SolveSpace::SMesh * instead) Line 461	C++
 	solvespace.exe!SolveSpace::SBsp3::FromMesh(const SolveSpace::SMesh * m) Line 27	C++
 	solvespace.exe!SolveSpace::SMesh::MakeFromIntersectionOf(SolveSpace::SMesh * a, SolveSpace::SMesh * b) Line 301	C++
 	solvespace.exe!SolveSpace::Group::GenerateForBoolean<SolveSpace::SMesh>(SolveSpace::SMesh * prevs, SolveSpace::SMesh * thiss, SolveSpace::SMesh * outs, SolveSpace::Group::CombineAs how) Line 184	C++
 	solvespace.exe!SolveSpace::Group::GenerateShellAndMesh() Line 417	C++
 	solvespace.exe!SolveSpace::SolveSpaceUI::GenerateAll(SolveSpace::SolveSpaceUI::Generate type, bool andFindFree, bool genForBBox) Line 275	C++

Exception thrown at 0x00007FFBD65060B4 (ntdll.dll) in solvespace.exe: 0xC0000005: Access violation writing location 0x0000001AC9200FF8

It's too late I'm missing something...

whitequark · 2020-05-11T05:51:43Z

Was pulled out of the parallel loop because it's not actually making a copy, it's adding triangle from m into sm. I can't say for certain, but I don't think adding to a list is thread-safe.

@phkahler Of course it's not. That's why there is a #pragma omp critical preceding it!

whitequark · 2020-05-11T05:52:37Z

@ruevs You might be hitting #320, unrelated to anything in this PR.

phkahler · 2020-05-11T17:44:37Z

@whitequark Your change works fine but it's hard to tell if it's any faster (might be). I was concerned about the time to restart the threads if they're blocked. Then I remembered, the vector of SMeshes might run into cache coherence issues as the lists update their lengths. It's probably good.

phkahler linked an issue May 8, 2020 that may be closed by this pull request

Question/Suggestion Parallel tesselation. #481

Closed

Use OpenMP for triangulation

43f7934

phkahler force-pushed the parallel branch from 6e0ff0d to 43f7934 Compare May 9, 2020 03:39

phkahler merged commit b4e1ce4 into solvespace:master May 9, 2020

ruevs mentioned this pull request May 12, 2020

Helix crashes #611

Closed

ruevs added the performance label Jan 4, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Use OpenMP for triangulation #600

Use OpenMP for triangulation #600

phkahler commented May 8, 2020

whitequark commented May 9, 2020

phkahler commented May 9, 2020

whitequark commented May 9, 2020

rpavlik commented May 9, 2020 •

edited

Loading

whitequark commented May 9, 2020

phkahler commented May 9, 2020

whitequark commented May 9, 2020

phkahler commented May 9, 2020

whitequark commented May 9, 2020 •

edited

Loading

whitequark commented May 9, 2020

phkahler commented May 9, 2020

whitequark commented May 9, 2020

whitequark commented May 9, 2020

phkahler commented May 9, 2020

whitequark commented May 9, 2020

phkahler commented May 9, 2020

whitequark commented May 9, 2020

whitequark commented May 10, 2020

whitequark commented May 10, 2020

whitequark commented May 10, 2020

whitequark commented May 10, 2020

whitequark commented May 10, 2020 •

edited

Loading

whitequark commented May 10, 2020 •

edited

Loading

jwesthues commented May 10, 2020

whitequark commented May 10, 2020

jwesthues commented May 10, 2020

whitequark commented May 10, 2020

jwesthues commented May 10, 2020

phkahler commented May 10, 2020

ruevs commented May 10, 2020 •

edited

Loading

ruevs commented May 10, 2020

whitequark commented May 11, 2020

whitequark commented May 11, 2020

phkahler commented May 11, 2020

Use OpenMP for triangulation #600

Use OpenMP for triangulation #600

Conversation

phkahler commented May 8, 2020

whitequark commented May 9, 2020

phkahler commented May 9, 2020

whitequark commented May 9, 2020

rpavlik commented May 9, 2020 • edited Loading

whitequark commented May 9, 2020

phkahler commented May 9, 2020

whitequark commented May 9, 2020

phkahler commented May 9, 2020

whitequark commented May 9, 2020 • edited Loading

whitequark commented May 9, 2020

phkahler commented May 9, 2020

whitequark commented May 9, 2020

whitequark commented May 9, 2020

phkahler commented May 9, 2020

whitequark commented May 9, 2020

phkahler commented May 9, 2020

whitequark commented May 9, 2020

whitequark commented May 10, 2020

whitequark commented May 10, 2020

whitequark commented May 10, 2020

whitequark commented May 10, 2020

whitequark commented May 10, 2020 • edited Loading

whitequark commented May 10, 2020 • edited Loading

jwesthues commented May 10, 2020

whitequark commented May 10, 2020

jwesthues commented May 10, 2020

whitequark commented May 10, 2020

jwesthues commented May 10, 2020

phkahler commented May 10, 2020

ruevs commented May 10, 2020 • edited Loading

ruevs commented May 10, 2020

whitequark commented May 11, 2020

whitequark commented May 11, 2020

phkahler commented May 11, 2020

rpavlik commented May 9, 2020 •

edited

Loading

whitequark commented May 9, 2020 •

edited

Loading

whitequark commented May 10, 2020 •

edited

Loading

whitequark commented May 10, 2020 •

edited

Loading

ruevs commented May 10, 2020 •

edited

Loading