Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFC: Use mimalloc to avoid freelist woes on *nix #642

Closed
whitequark opened this issue Jul 6, 2020 · 16 comments · Fixed by #646
Closed

RFC: Use mimalloc to avoid freelist woes on *nix #642

whitequark opened this issue Jul 6, 2020 · 16 comments · Fixed by #646

Comments

@whitequark
Copy link
Contributor

Right now we implement AllocTemporary and FreeAllTemporary on *nix using a freelist. This does work but it is strictly worse than what happens on Windows, which is unfortunate because I'd like to not privilege any platforms. There are also some multithreading related issues that are currently solved in not especially elegant way.

It looks like mimalloc is pretty much perfect:

  • works on all of our platforms;
  • uses CMake so easy to integrate;
  • supports heaps so we can efficiently deallocate temporary objects.

One thing to note is their performance claims. It's not a bad allocator but the revolutionary claims should be taken with a grain of salt.

@nabijaczleweli
Copy link
Contributor

nabijaczleweli commented Jul 8, 2020

I'd If I'm reading the heap doc right, then mimalloc might not help with the concurrency issues at all?

A heap can only be used for (re)allocation in the thread that created this heap! Any allocated blocks can be freed by any other thread though.

@whitequark
Copy link
Contributor Author

Ah, that's unfortunate...

nabijaczleweli added a commit to nabijaczleweli/solvespace that referenced this issue Jul 9, 2020
@nabijaczleweli
Copy link
Contributor

nabijaczleweli commented Jul 9, 2020

My mimalloc branch might have some problems (CMake defeated me WRT declaring the right dependency tree, so explicitly running make libmimalloc.a or editing the final link command might be necessary on Linux) but should be good enough for testing.

Running interactively on Win32, it seems to work just as well as the clean version; I never got a window on my Buster box, so no interactive comparisons there.

Benchmarks say

stock:
	D:\Programy\solvespace\build>bin\solvespace-benchmark.exe load t:\assembly.slvs
	Iterations: 5
	Time:       214.974 s
	Per iter.:  42.995 s

mimalloc:
	D:\Programy\solvespace\build>bin\solvespace-benchmark.exe load t:\assembly.slvs
	Iterations: 5
	Time:       172.823 s
	Per iter.:  34.565 s


stock:
	nabijaczleweli@tarta:~/uwu/solvespace/build$ ./bin/solvespace-benchmark load ~/uwu/assembly.slvs
	Iterations: 5
	Time:       75.682 s
	Per iter.:  15.136 s

mimalloc:
	nabijaczleweli@tarta:~/uwu/solvespace/build$ ./bin/solvespace-benchmark load ~/uwu/assembly.slvs
	Iterations: 5
	Time:       75.379 s
	Per iter.:  15.076 s

Granted, I have zero idea if this data is of any use at all, but.

@ruevs
Copy link
Member

ruevs commented Jul 9, 2020

Would you try it with OpenMP re-enabled? In other words revert this a80a033

@nabijaczleweli
Copy link
Contributor

Even with the diff below for solvespace to link to libgomp, it didn't go above 1 CPU on Windows (as expected, I guess?):

diff --git a/CMakeLists.txt b/CMakeLists.txt
index 03baff2..4348add 100644
--- a/CMakeLists.txt
+++ b/CMakeLists.txt
@@ -99,7 +99,7 @@ endif()
 # We use OpenMP to speed up some geometric operations, but it's optional.
 include(FindOpenMP)
 # No MinGW distribution actually ships an OpenMP runtime, but FindOpenMP detects it anyway.
-if(OpenMP_FOUND AND NOT MINGW)
+if(OpenMP_FOUND)
     set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} ${OpenMP_CXX_FLAGS}")
 else()
     message(WARNING "OpenMP not found, geometric operations will be single-threaded")

Buster slammed all 24 CPUs out of the box.

stock + revert + diff:
	D:\Programy\solvespace\build>bin\solvespace-benchmark.exe load t:\assembly.slvs
	Iterations: 5
	Time:       206.955 s
	Per iter.:  41.391 s

mimalloc + revert + diff:
	D:\Programy\solvespace\build>bin\solvespace-benchmark.exe load t:\assembly.slvs
	Iterations: 5
	Time:       217.641 s
	Per iter.:  43.528 s


stock + revert:
	nabijaczleweli@tarta:~/uwu/solvespace/build$ ./bin/solvespace-benchmark load ~/uwu/assembly.slvs
	Iterations: 5
	Time:       81.802 s
	Per iter.:  16.360 s

mimalloc + revert:
	nabijaczleweli@tarta:~/uwu/solvespace/build$ ./bin/solvespace-benchmark load ~/uwu/assembly.slvs
	Iterations: 5
	Time:       80.201 s
	Per iter.:  16.040 s

@ruevs
Copy link
Member

ruevs commented Jul 9, 2020

Thanks! So minmalloc appears to work fine with OpenMP enabled. Is this not contrary to the statement above that you (@nabijaczleweli)pointed out? Perhaps run it with sanitizers to see if some problems are detected?
@whitequark what do you think?
Sorry for not doing this myself - too little time lately :-(

@nabijaczleweli
Copy link
Contributor

Well, the documentation does say not to do this, so presumably it's just a race to the bottom.

I reconfigured with -DENABLE_SANITIZERS=ON, got a

-- Using sanitizer options address;alignment;bounds;shift;signed-integer-overflow;integer-divide-by-zero;null;bool;enum;return

back, reran the benchmark, now agonisingly slow, and:

nabijaczleweli@tarta:~/uwu/solvespace/build$ ./bin/solvespace-benchmark load ~/uwu/assembly.slvsenmp=libomp -O1 -fs
Iterations: 5
Time:       746.069 s
Per iter.:  149.214 s

=================================================================
==11042==ERROR: LeakSanitizer: detected memory leaks

Direct leak of 3563520 byte(s) in 60 object(s) allocated from:
    #0 0x4daaed in operator new[](unsigned long) (/home/nabijaczleweli/uwu/solvespace/build/bin/solvespace-benchmark+0x4daaed)
    #1 0x573708 in SolveSpace::List<SolveSpace::STriangle>::ReserveMore(int) build/../src/dsc.h:218:31
    #2 0x573605 in SolveSpace::List<SolveSpace::STriangle>::AllocForOneMore() build/../src/dsc.h:230:13
    #3 0x55882b in SolveSpace::List<SolveSpace::STriangle>::Add(SolveSpace::STriangle const*) build/../src/dsc.h:235:9
    #4 0x55882b in SolveSpace::SMesh::AddTriangle(SolveSpace::STriangle const*) build/../src/mesh.cpp:37:7
    #5 0x55882b in SolveSpace::SMesh::MakeFromCopyOf(SolveSpace::SMesh*) build/../src/mesh.cpp:309:9
    #6 0x6ed7ab in .omp_outlined._debug__ build/../src/srf/surface.cpp:1059:13
    #7 0x6ed7ab in .omp_outlined. build/../src/srf/surface.cpp:1053:1
    #8 0x7f3a8ad106f2 in __kmp_invoke_microtask (/lib/x86_64-linux-gnu/libomp.so.5+0xb06f2)
    #9 0x7f3a8aca3a42  (/lib/x86_64-linux-gnu/libomp.so.5+0x43a42)

Direct leak of 3072000 byte(s) in 20 object(s) allocated from:
    #0 0x4daaed in operator new[](unsigned long) (/home/nabijaczleweli/uwu/solvespace/build/bin/solvespace-benchmark+0x4daaed)
    #1 0x573708 in SolveSpace::List<SolveSpace::STriangle>::ReserveMore(int) build/../src/dsc.h:218:31
    #2 0x573605 in SolveSpace::List<SolveSpace::STriangle>::AllocForOneMore() build/../src/dsc.h:230:13
    #3 0x55882b in SolveSpace::List<SolveSpace::STriangle>::Add(SolveSpace::STriangle const*) build/../src/dsc.h:235:9
    #4 0x55882b in SolveSpace::SMesh::AddTriangle(SolveSpace::STriangle const*) build/../src/mesh.cpp:37:7
    #5 0x55882b in SolveSpace::SMesh::MakeFromCopyOf(SolveSpace::SMesh*) build/../src/mesh.cpp:309:9
    #6 0x54d91e in SolveSpace::Group::GenerateShellAndMesh() build/../src/groupmesh.cpp:223:23
    #7 0x51288e in SolveSpace::SolveSpaceUI::GenerateAll(SolveSpace::SolveSpaceUI::Generate, bool, bool) build/../src/generate.cpp:274:24
    #8 0x7ad1c2 in SolveSpace::SolveSpaceUI::AfterNewFile() build/../src/solvespace.cpp:433:5
    #9 0x4de113 in main::$_1::operator()() const build/../bench/harness.cpp:66:20
    #10 0x4de113 in std::_Function_handler<bool (), main::$_1>::_M_invoke(std::_Any_data const&) /bin/../lib/gcc/x86_64-linux-gnu/8/../../../../include/c++/8/bits/std_function.h:282:9
    #11 0x4dda6d in std::function<bool ()>::operator()() const /bin/../lib/gcc/x86_64-linux-gnu/8/../../../../include/c++/8/bits/std_function.h:687:14
    #12 0x4dda6d in RunBenchmark(std::function<void ()>, std::function<bool ()>, std::function<void ()>, unsigned long, double) build/../bench/harness.cpp:26:9
    #13 0x4dda6d in main build/../bench/harness.cpp:59:18
    #14 0x7f3a8a33d09a in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2409a)

Direct leak of 614400 byte(s) in 4 object(s) allocated from:
    #0 0x4daaed in operator new[](unsigned long) (/home/nabijaczleweli/uwu/solvespace/build/bin/solvespace-benchmark+0x4daaed)
    #1 0x573708 in SolveSpace::List<SolveSpace::STriangle>::ReserveMore(int) build/../src/dsc.h:218:31
    #2 0x573605 in SolveSpace::List<SolveSpace::STriangle>::AllocForOneMore() build/../src/dsc.h:230:13
    #3 0x55882b in SolveSpace::List<SolveSpace::STriangle>::Add(SolveSpace::STriangle const*) build/../src/dsc.h:235:9
    #4 0x55882b in SolveSpace::SMesh::AddTriangle(SolveSpace::STriangle const*) build/../src/mesh.cpp:37:7
    #5 0x55882b in SolveSpace::SMesh::MakeFromCopyOf(SolveSpace::SMesh*) build/../src/mesh.cpp:309:9
    #6 0x54d91e in SolveSpace::Group::GenerateShellAndMesh() build/../src/groupmesh.cpp:223:23
    #7 0x51288e in SolveSpace::SolveSpaceUI::GenerateAll(SolveSpace::SolveSpaceUI::Generate, bool, bool) build/../src/generate.cpp:274:24
    #8 0x7ad1c2 in SolveSpace::SolveSpaceUI::AfterNewFile() build/../src/solvespace.cpp:433:5
    #9 0x4de113 in main::$_1::operator()() const build/../bench/harness.cpp:66:20
    #10 0x4de113 in std::_Function_handler<bool (), main::$_1>::_M_invoke(std::_Any_data const&) /bin/../lib/gcc/x86_64-linux-gnu/8/../../../../include/c++/8/bits/std_function.h:282:9
    #11 0x4dd997 in std::function<bool ()>::operator()() const /bin/../lib/gcc/x86_64-linux-gnu/8/../../../../include/c++/8/bits/std_function.h:687:14
    #12 0x4dd997 in RunBenchmark(std::function<void ()>, std::function<bool ()>, std::function<void ()>, unsigned long, double) build/../bench/harness.cpp:14:9
    #13 0x4dd997 in main build/../bench/harness.cpp:59:18
    #14 0x7f3a8a33d09a in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2409a)

SUMMARY: AddressSanitizer: 7249920 byte(s) leaked in 84 allocation(s).

@whitequark
Copy link
Contributor Author

Does it do the same thing without mimalloc? I might just not be cleaning up enough stuff in the benchmark runner.

@nabijaczleweli
Copy link
Contributor

It yields this, which is identical, save for the addresses and times

nabijaczleweli@tarta:~/uwu/solvespace/build$ ./bin/solvespace-benchmark load ~/uwu/assembly.slvs
Iterations: 5
Time:       741.646 s
Per iter.:  148.329 s

=================================================================
==25540==ERROR: LeakSanitizer: detected memory leaks

Direct leak of 3563520 byte(s) in 60 object(s) allocated from:
    #0 0x4daa7d in operator new[](unsigned long) (/home/nabijaczleweli/uwu/solvespace/build/bin/solvespace-benchmark+0x4daa7d)
    #1 0x573698 in SolveSpace::List<SolveSpace::STriangle>::ReserveMore(int) build/../src/dsc.h:218:31
    #2 0x573595 in SolveSpace::List<SolveSpace::STriangle>::AllocForOneMore() build/../src/dsc.h:230:13
    #3 0x5587bb in SolveSpace::List<SolveSpace::STriangle>::Add(SolveSpace::STriangle const*) build/../src/dsc.h:235:9
    #4 0x5587bb in SolveSpace::SMesh::AddTriangle(SolveSpace::STriangle const*) build/../src/mesh.cpp:37:7
    #5 0x5587bb in SolveSpace::SMesh::MakeFromCopyOf(SolveSpace::SMesh*) build/../src/mesh.cpp:309:9
    #6 0x6eda7b in .omp_outlined._debug__ build/../src/srf/surface.cpp:1059:13
    #7 0x6eda7b in .omp_outlined. build/../src/srf/surface.cpp:1053:1
    #8 0x7fea8275b6f2 in __kmp_invoke_microtask (/lib/x86_64-linux-gnu/libomp.so.5+0xb06f2)
    #9 0x7fea826eea42  (/lib/x86_64-linux-gnu/libomp.so.5+0x43a42)

Direct leak of 3072000 byte(s) in 20 object(s) allocated from:
    #0 0x4daa7d in operator new[](unsigned long) (/home/nabijaczleweli/uwu/solvespace/build/bin/solvespace-benchmark+0x4daa7d)
    #1 0x573698 in SolveSpace::List<SolveSpace::STriangle>::ReserveMore(int) build/../src/dsc.h:218:31
    #2 0x573595 in SolveSpace::List<SolveSpace::STriangle>::AllocForOneMore() build/../src/dsc.h:230:13
    #3 0x5587bb in SolveSpace::List<SolveSpace::STriangle>::Add(SolveSpace::STriangle const*) build/../src/dsc.h:235:9
    #4 0x5587bb in SolveSpace::SMesh::AddTriangle(SolveSpace::STriangle const*) build/../src/mesh.cpp:37:7
    #5 0x5587bb in SolveSpace::SMesh::MakeFromCopyOf(SolveSpace::SMesh*) build/../src/mesh.cpp:309:9
    #6 0x54d8ae in SolveSpace::Group::GenerateShellAndMesh() build/../src/groupmesh.cpp:223:23
    #7 0x51281e in SolveSpace::SolveSpaceUI::GenerateAll(SolveSpace::SolveSpaceUI::Generate, bool, bool) build/../src/generate.cpp:274:24
    #8 0x7ad492 in SolveSpace::SolveSpaceUI::AfterNewFile() build/../src/solvespace.cpp:433:5
    #9 0x4de0a3 in main::$_1::operator()() const build/../bench/harness.cpp:66:20
    #10 0x4de0a3 in std::_Function_handler<bool (), main::$_1>::_M_invoke(std::_Any_data const&) /bin/../lib/gcc/x86_64-linux-gnu/8/../../../../include/c++/8/bits/std_function.h:282:9
    #11 0x4dd9fd in std::function<bool ()>::operator()() const /bin/../lib/gcc/x86_64-linux-gnu/8/../../../../include/c++/8/bits/std_function.h:687:14
    #12 0x4dd9fd in RunBenchmark(std::function<void ()>, std::function<bool ()>, std::function<void ()>, unsigned long, double) build/../bench/harness.cpp:26:9
    #13 0x4dd9fd in main build/../bench/harness.cpp:59:18
    #14 0x7fea81d8809a in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2409a)

Direct leak of 614400 byte(s) in 4 object(s) allocated from:
    #0 0x4daa7d in operator new[](unsigned long) (/home/nabijaczleweli/uwu/solvespace/build/bin/solvespace-benchmark+0x4daa7d)
    #1 0x573698 in SolveSpace::List<SolveSpace::STriangle>::ReserveMore(int) build/../src/dsc.h:218:31
    #2 0x573595 in SolveSpace::List<SolveSpace::STriangle>::AllocForOneMore() build/../src/dsc.h:230:13
    #3 0x5587bb in SolveSpace::List<SolveSpace::STriangle>::Add(SolveSpace::STriangle const*) build/../src/dsc.h:235:9
    #4 0x5587bb in SolveSpace::SMesh::AddTriangle(SolveSpace::STriangle const*) build/../src/mesh.cpp:37:7
    #5 0x5587bb in SolveSpace::SMesh::MakeFromCopyOf(SolveSpace::SMesh*) build/../src/mesh.cpp:309:9
    #6 0x54d8ae in SolveSpace::Group::GenerateShellAndMesh() build/../src/groupmesh.cpp:223:23
    #7 0x51281e in SolveSpace::SolveSpaceUI::GenerateAll(SolveSpace::SolveSpaceUI::Generate, bool, bool) build/../src/generate.cpp:274:24
    #8 0x7ad492 in SolveSpace::SolveSpaceUI::AfterNewFile() build/../src/solvespace.cpp:433:5
    #9 0x4de0a3 in main::$_1::operator()() const build/../bench/harness.cpp:66:20
    #10 0x4de0a3 in std::_Function_handler<bool (), main::$_1>::_M_invoke(std::_Any_data const&) /bin/../lib/gcc/x86_64-linux-gnu/8/../../../../include/c++/8/bits/std_function.h:282:9
    #11 0x4dd927 in std::function<bool ()>::operator()() const /bin/../lib/gcc/x86_64-linux-gnu/8/../../../../include/c++/8/bits/std_function.h:687:14
    #12 0x4dd927 in RunBenchmark(std::function<void ()>, std::function<bool ()>, std::function<void ()>, unsigned long, double) build/../bench/harness.cpp:14:9
    #13 0x4dd927 in main build/../bench/harness.cpp:59:18
    #14 0x7fea81d8809a in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2409a)

SUMMARY: AddressSanitizer: 7249920 byte(s) leaked in 84 allocation(s).

@whitequark
Copy link
Contributor Author

Yeah so this is a bug(ish) in the benchmark runner. Maybe worth asking the authors of mimalloc if that's still a restriction or might have been fixed?

@nabijaczleweli
Copy link
Contributor

nabijaczleweli commented Jul 9, 2020

Came across microsoft/mimalloc#96 (September 2019):

the assumption that a heap as accessed only from its owner thread is deeply builtin into mimalloc; only free operations can be done from any thread.

Expanded in microsoft/mimalloc#214 (March):

In mimalloc each thread has its own heap and no other thread can allocate or free in there: the only thing other threads (B) can do is add blocks to the page's thread_local_free list and the heap's thread_delayed_free list -- on each page, other threads will add first to the owning heap's thread (A) delayed free list, and after that to the page local one for performance.

This sounds like a deeply-ingrained design decision we're not hitting the corner of, so I wouldn't really get my hopes up?

@whitequark
Copy link
Contributor Author

Okay, so you can wrap mimalloc_heap_t into a RAII wrapper and put it into a C++11 thread_local variable, right? That should take care of the requirements of mimalloc.

nabijaczleweli added a commit to nabijaczleweli/solvespace that referenced this issue Jul 9, 2020
@nabijaczleweli
Copy link
Contributor

Something like nabijaczleweli@dc98046?

@whitequark
Copy link
Contributor Author

Well... no. Sure, the worker threads all clean up after themselves once they quit, but the main thread doesn't, so you still do need FreeAllTemporary.

nabijaczleweli added a commit to nabijaczleweli/solvespace that referenced this issue Jul 9, 2020
@nabijaczleweli
Copy link
Contributor

nabijaczleweli commented Jul 9, 2020

Ah yeah, didn't think about that; nabijaczleweli@f123199?

@whitequark
Copy link
Contributor Author

Something like that

nabijaczleweli added a commit to nabijaczleweli/solvespace that referenced this issue Jul 9, 2020
The heaps are wrapped in a RAIIish thread_local handler,
since being affined affined to a single thread for allocations is
required by the API

Ref: solvespace#642
nabijaczleweli added a commit to nabijaczleweli/solvespace that referenced this issue Jul 9, 2020
The heaps are wrapped in a RAIIish thread_local handler,
since being affined affined to a single thread for allocations is
required by the API

Ref: solvespace#642
nabijaczleweli added a commit to nabijaczleweli/solvespace that referenced this issue Jul 9, 2020
The heaps are wrapped in a RAIIish thread_local handler,
since being affined affined to a single thread for allocations is
required by the API

Ref: solvespace#642
whitequark pushed a commit that referenced this issue Jul 12, 2020
The heaps are wrapped in a RAIIish thread_local handler,
since being affined affined to a single thread for allocations is
required by the API

Ref: #642
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants