
New file format #75

Open
whitequark opened this issue Oct 14, 2016 · 64 comments

Comments

@whitequark
Contributor

whitequark commented Oct 14, 2016

SolveSpace would benefit from a file format with the following required properties:

  • compact, with a binary representation;
  • has an alternate text representation, for tests;
  • fast to read and write;
  • easily extensible, while being backwards compatible;
  • easily supports deeply nested structures, such as for hierarchical sketches.

In addition, an optional but desirable property would be:

  • easily readable and perhaps even writable from 3rd party software.

The current file format fulfills none of the above (not even the last requirement, since there is no explicit schema). I propose to replace it with a format based on Google Protocol Buffers, which has the following properties:

  • the binary representation is naturally compact: it writes out your data essentially as-is, with no special tricks beyond a variable-length encoding for integers;
  • the text representation is available alongside binary representation, and requires no manual work;
  • has a highly optimized reader and writer implementation in C++;
  • was designed specifically to handle protocol evolution well;
  • is naturally nested;
  • has a separate schema and very broad library support.

Such a format, with the initial version closely resembling the existing file format, seems like an ideal choice for us. The plan is to evolve it further by gradual extension, while remaining compatible with old versions, as opposed to explicit versioning; unknown fields are generally ignored, so with some care and testing it is possible to retain an excellent degree of compatibility.
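To make the binary/text duality concrete, here is a minimal C++ sketch. The Sketch message type, its version field, and sketch.pb.h are hypothetical placeholders, assuming the usual protoc-generated class for a schema along the lines of message Sketch { uint32 version = 1; }:

    #include <string>
    #include <google/protobuf/text_format.h>
    // #include "sketch.pb.h"   // hypothetical generated header

    void RoundTrip(const Sketch &sk) {
        std::string bin, txt;
        sk.SerializeToString(&bin);   // compact wire format, for .slvs files
        google::protobuf::TextFormat::PrintToString(sk, &txt);   // for tests
        Sketch reparsed;              // the text form parses right back
        google::protobuf::TextFormat::ParseFromString(txt, &reparsed);
    }

Both forms come straight from the schema, which is exactly the "requires no manual work" property above.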

@whitequark whitequark added this to the 3.0 milestone Oct 14, 2016
@whitequark
Contributor Author

@jwesthues Any objections? Disregard my earlier thoughts about using LLVM bitcode; after an in-depth discussion with LLVM folks, that turned out to be a rather poor choice.

@whitequark
Contributor Author

The new file format should also use 64-bit handles (as would the in-memory representation). The rationale is that, once we make large assemblies fast, remapping of entities can rapidly exhaust the 65535-entity limit for one group, if one imports many instances of a large sketch. This migration is an ideal opportunity to add the 32→64 bit expansion code robustly.
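As an illustration only, not a proposed layout: a 64-bit handle leaves room to pack a full 32-bit copy number next to a 32-bit base handle, where today remapped entities must squeeze into a 16-bit space per group.

    #include <cstdint>

    // Hypothetical packing: the upper half identifies the imported
    // copy/instance, the lower half the original entity, so each instance
    // of a large sketch gets its own 32-bit entity space.
    struct hEntity64 { uint64_t v; };

    inline hEntity64 Remapped(uint32_t original, uint32_t copyNumber) {
        return { ((uint64_t)copyNumber << 32) | original };
    }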

@jwesthues
Member

That seems generally reasonable. It would be fashionable to use JSON or some other text representation today; but especially if you optimize assemblies to make better use of the GPU, a binary format's speedup seems material.

@whitequark
Contributor Author

Actually, version 3 of Protocol Buffers has a canonical JSON serialization, so we could just use that. (Google generally recommends version 2, for reasons that I do not fully understand; but 3 is probably fine too.)

@whitequark
Contributor Author

It's also worth taking a look at Cap'n Proto, designed by the original author of Protocol Buffers while taking into account many of the downsides of the latter. It has some rather desirable properties, but a) completely lacks a human-readable format usable at runtime (in fact the author recommends the use of JSON in that case); b) lacks a bidirectional JSON parser and serializer; c) is not supported nearly as broadly. Overall I feel it's not the right choice, but it deserves a mention.

@Evil-Spirit
Collaborator

The rationale is that, once we make large assemblies fast, remapping of entities can rapidly exhaust the 65535-entity limit

Actually, when I was implementing DXF import, I easily found a file that doesn't fit into 64k handles.

@Evil-Spirit
Collaborator

The main problem is to design a format that doesn't contain implementation details like remap lists. This format must be human-readable (in text form). I think it is a good idea to have two sections:

  1. The basic compact representation: only the requests, groups, and main parameters. This part is fully human-readable and logically structured, and it must be sufficient to rebuild the whole sketch on its own.
  2. The precomputed part, for fast loading while building assemblies. By default this is included in the file, but the user can choose to omit it (for more compact files).

For the new-version migration we can just use 1) and not bother with 2).
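A rough structural sketch of this two-section split, with hypothetical types throughout (Request, Group, and the section names are illustrative, not a schema proposal):

    #include <cstdint>
    #include <optional>
    #include <vector>

    struct Request { /* type, group, workplane, parameters ... */ };
    struct Group   { /* type, source group, operation ... */ };

    struct CoreSection {                   // 1) human-readable, regenerable
        std::vector<Request> requests;
        std::vector<Group>   groups;
        std::vector<double>  params;
    };
    struct PrecomputedSection {            // 2) cache for fast assembly loads
        std::vector<uint8_t> meshData;     // placeholder for mesh/surface data
    };
    struct SlvsFile {
        CoreSection core;                       // always present
        std::optional<PrecomputedSection> pre;  // omitted for compact files
    };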

@jwesthues
Member

The main problem is to design a format that doesn't contain implementation details like remap lists.

Without remapping, how do you propose to support nested assemblies?

@Evil-Spirit
Collaborator

Evil-Spirit commented Oct 15, 2016

@jwesthues
Use some efficient way to represent a handle-chain path to entities, like the FullSubentPath used in AutoCAD.
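A minimal sketch of what such a variable-length identifier could look like, in the spirit of AutoCAD's FullSubentPath; the field names are assumptions, not an actual proposal:

    #include <cstdint>
    #include <vector>

    struct EntityPath {
        std::vector<uint32_t> containers;  // assembly group, sub-assembly, ...
        uint32_t entity;                   // id within the innermost sketch
        bool operator==(const EntityPath &o) const {
            return containers == o.containers && entity == o.entity;
        }
    };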

@jwesthues
Member

You mean by making the entity's unique identifier variable-length?

@Evil-Spirit
Collaborator

Evil-Spirit commented Oct 15, 2016

@jwesthues, yes. Since we want to have nested sketches (assemblies being one of the cases), we don't want one big heap of entities, so we can use variable-length identifiers.

@jwesthues
Member

@Evil-Spirit So what used to be a fixed 64-bit field now must be allocated dynamically for every single entity, and you also break backwards compatibility. What's the offsetting benefit?

@Evil-Spirit
Collaborator

@jwesthues, entities themselves can contain only the current sketch id (or group), as before. Only constraints would need to contain a full path, since they can be applied between different groups/sketches.

Benefits:

  1. It removes the remap lists from files.
  2. It allows nested asm-in-asm structures without exponential growth in memory needs.
  3. It makes generation faster.
  4. It makes it possible to store only one copy of a sketch in memory and insert it multiple times, using just an asm-group with a csys basis and a pointer to the sketch (see the sketch in code below).

I know that this is a dramatic change and that not all the implementation aspects are clear. But this is just a way we can think about the new format. Maybe we can invent something more robust.
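Point 4 above, sketched with hypothetical types: the sketch geometry is stored once, and each placement is just a coordinate-system basis plus a pointer to it.

    #include <vector>

    struct Sketch;                  // shared geometry, stored exactly once
    struct Csys {                   // coordinate-system basis of an instance
        double origin[3];
        double basis[9];            // 3x3 rotation
    };
    struct AsmInstance {
        Csys          csys;
        const Sketch *sketch;       // a pointer, not a copy
    };
    struct AsmGroup {
        std::vector<AsmInstance> instances;
    };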

@jwesthues
Member

@Evil-Spirit The thing that dominates regeneration time is constraint solution and surface Booleans, and this speeds up neither. The memory saving is not material. You've proposed this change before, and I still don't see the benefit.

@Evil-Spirit
Collaborator

Selection can operate on the nested structures and generate paths for constraints. Actually, the existence of each entity in memory is not needed; it can be generated on the fly from the requests and stored parameters, for writing constraint equations or performing visualization. Looking up an entity by its id inside a group can use the request id stored in the entity id, the entity id inside the request, and the unique entity id inside the group to decide which type and which parameters to take...

@Evil-Spirit
Collaborator

Evil-Spirit commented Oct 15, 2016

@jwesthues
Constraint solution time is not the problem; I know what we can do about it.
Booleans are painful, but they can still be improved by using external libraries or by improving the current solutions. They can be abstracted (as is actually done for choosing between meshes and surfaces).

But we are talking about the file format, the basis for all the other aspects. It is as if solving one small problem now would solve a couple of big problems in the future. I am not talking about actually changing things yet, just thinking about how it could be done. Maybe some of these ideas will help make a better file format or solve some problems.

@whitequark
Contributor Author

@Evil-Spirit

Actually, when I was implementing DXF import, I easily found a file that doesn't fit into 64k handles.

I don't consider importing such files useful (what are you going to do with them exactly?), so it doesn't matter. This is just an artificial case.

This format must be human-readable (in text form).

I agree that there should be a human-readable representation, for one reason only: we need to write tests. I see no reason to have the human-readable representation as our only or main one. There is no benefit (it is actually easier to read protobuf files, because you don't have to write any parsers), and there are drawbacks: the files are bloated, and the parsing is slow.

For the new-version migration we can just use 1) and not bother with 2).

Nope, the new file format should be (at least) functionally equivalent to the old one.

@whitequark
Contributor Author

@jwesthues Can you please write a (more in-depth than the comment in sketch.h) explanation of the function that remapping serves? I see that it's used for both assemblies and any groups that produce entities from entities, but I don't follow the logic behind that.

@jwesthues
Member

@whitequark To maintain an associative link between two entities (like an entity imported from a part, and the imported entity in the assembly), you can either identify entities by a complete path that specifies the associative link, or keep a table that records the association. Assuming arbitrarily deep import nesting, the former requires arbitrarily long identifiers. The latter is that remap table.
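A minimal sketch of such a remap table (hypothetical types; the actual SolveSpace table differs in detail): the key records which input entity, under which copy, produced the new entity, so the assigned handle stays stable even as entities are added to or deleted from the input group.

    #include <cstdint>
    #include <map>

    struct EntityKey {
        uint32_t input;        // handle of the entity in the source group
        int      copyNumber;   // which copy/instance produced it
        bool operator<(const EntityKey &o) const {
            return input != o.input ? input < o.input
                                    : copyNumber < o.copyNumber;
        }
    };
    using RemapTable = std::map<EntityKey, uint32_t>;

    // Return the stable handle for (input, copyNumber), allocating one the
    // first time this association is seen.
    uint32_t Remap(RemapTable &t, EntityKey k, uint32_t &nextHandle) {
        auto it = t.find(k);
        if(it != t.end()) return it->second;
        return t[k] = nextHandle++;
    }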

@Evil-Spirit
Collaborator

Evil-Spirit commented Oct 15, 2016

I don't consider importing such files useful (what are you going to do with them exactly?), so it doesn't matter. This is just an artificial case.

Actually, this is not as big a file as you might think. And SolveSpace is good enough to be a good pure 2D editor (since we added DXF import/export). Look at
https://github.com/whitequark/solvespace-testcases/tree/master/tests/import/Download

@Evil-Spirit
Collaborator

@jwesthues, this sounds reasonable for assemblies (where we actually have the asm-in-asm hierarchy case and the case of an arbitrary number of instances of the same part), but why is this needed for every group? Can we painlessly avoid it?

@whitequark
Contributor Author

Speaking of Protocol Buffers... I'm looking further into it and I see some potential scalability problems. Specifically, Protocol Buffers are designed to work by deserializing the entire message at once. The problems here are twofold:

  • Parsing demands. In general, the documentation recommends avoiding messages larger than 1 MB; there is a 64 MB soft limit to guard against DoS, and a 512 MB hard limit at which point parsing breaks due to integer overflows. Additionally, parsing involves allocation (which is cheap, as it uses arenas, but it still expands the input in memory) as well as deserialization from the wire format.
  • Lack of a seek capability. When we are loading a sketch that we know we will regenerate, it is absolutely wasteful to read all the meshes and such. Similarly, with hierarchical sketches, we might not be interested in many of them at all.

Cap'n Proto solves both quite elegantly. It looks like I'll have to prototype some code with both...
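The first problem, in code: with the stock protobuf API the entire message is decoded eagerly, even when only a small part of it is wanted. SlvsFile and its header().version() accessor are hypothetical protoc-generated names.

    #include <cstdint>
    #include <string>
    // Hypothetical generated type with a small header and large mesh fields.

    uint32_t ReadVersionOnly(const std::string &wholeFileBytes) {
        SlvsFile f;
        f.ParseFromString(wholeFileBytes);  // decodes everything, meshes too
        return f.header().version();        // ...even if this is all we need
    }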

@jwesthues
Member

@Evil-Spirit The nesting of associative links can get pretty deep even without assemblies, like if you sketch a contour, step translating, then step translating again for a 2d array, then extrude, then ...

@Evil-Spirit
Collaborator

@jwesthues, this seems like a case where we could apply just a remapping function instead of storing all of this inside a map... or not?

@jwesthues
Member

@Evil-Spirit How do you remap in a way that (a) uses a fixed-length identifier, (b) maintains the associative link, even when entities in the input group are added and deleted, and (c) doesn't require that table?

@Evil-Spirit
Collaborator

@jwesthues, yes, (b) is a serious issue. Actually, this is solved if we use pointers instead of ids at runtime and use ids only for saving... but that is a different story.

@baryluk

baryluk commented Jan 16, 2020

+1 for the Protocol Buffers based format. It makes adding new fields very easy, and parsing/exporting is 1) fast, 2) easy to get right, 3) extensible in the future, 4) easy to use in other tools (like a CLI post-processor or analyzer, possibly written in another language), 5) compact, 6) easy to debug even without access to the protocol buffer description, and 7) type-checked out of the box.

I can't stand XML and JSON in any serious project.

Maybe I am biased, but I have used Protocol Buffers for the last 8 years and never found them problematic.

@traverseda

I still think capnproto is a good idea, even years later. You can memory-map a capnproto file (read-only) and use copy-on-write techniques to make loading of complicated assemblies nearly instant, if your disk format closely follows the actual data structures you use internally.
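A minimal POSIX sketch of the memory-mapping half of this idea; MAP_PRIVATE gives the copy-on-write behavior mentioned, and a zero-copy reader can then point straight into the mapped bytes:

    #include <cstddef>
    #include <fcntl.h>
    #include <sys/mman.h>
    #include <sys/stat.h>
    #include <unistd.h>

    // Map a file read-only; pages are loaded lazily on first access, so
    // "loading" is nearly instant regardless of file size.
    const void *MapFileReadOnly(const char *path, size_t *size) {
        int fd = open(path, O_RDONLY);
        if(fd < 0) return nullptr;
        struct stat st;
        if(fstat(fd, &st) != 0) { close(fd); return nullptr; }
        void *p = mmap(nullptr, (size_t)st.st_size, PROT_READ,
                       MAP_PRIVATE, fd, 0);
        close(fd);   // the mapping stays valid after the fd is closed
        if(p == MAP_FAILED) return nullptr;
        *size = (size_t)st.st_size;
        return p;
    }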

@baryluk

baryluk commented Jan 16, 2020

@traverseda capnproto is not significantly faster; in fact, if you include object creation, it can be much slower. capnproto can be a security risk: AFAIK it doesn't validate all the fields as strictly as protobuf does during reads. capnproto output can be, and often is, significantly bigger than protobuf's (2-4x in pathological cases). I did many benchmarks in the past, and capnproto was often the slowest of all the extensible binary serialization methods.

So I would still lean toward protobuf, because it is good enough, extensible, and has good cross-platform support in many tools. Using a niche format wouldn't be a good idea.

Sure, capnproto is somewhat popular, but for how long? Plus, IMHO, the capnproto IDL is ugly.

The best approach is to actually do benchmarks and comparisons and do proper engineering on this topic, since selecting a format is quite a big deal in the long term. It impacts changeability, stability of the file API, and also how you use things in the C++ code (setters, getters, extensions, etc.).

@whitequark
Contributor Author

The current plan is to use neither capnproto nor protobuf but rather flatbuffers. Protobuf v3 especially has many awful design decisions that only pass for good ones because people assume Google knows what they're doing, and they don't.

@baryluk

baryluk commented Jan 16, 2020

Well, I don't like protobuf v3 either, not only due to the changes to some features (things like optional fields, extensions, etc.), but also due to how they are used in C++ and Go. I like v2 and have the most experience with it.

I don't have an opinion on flatbuffers. I am just always skeptical of new solutions that claim to be superior. Usually they are better in some aspects, but worse in others.

@whitequark
Contributor Author

It's a good thing I spent a while going into detail on the possible solutions here, such as the two reasons protobuf v2 isn't suitable for the task.

@baryluk

baryluk commented Jan 16, 2020

@whitequark Makes sense. Out of curiosity, what kind of slvs file sizes or memory usage have you seen at the extreme?

My slvs files are usually 1 MB and pretty simple, but that is in the text format; doing a quick gzip -1 on them, to "emulate" a binary format and remove basic field-name repetition, brings them down to 132 kB.

I can imagine that in the current slvs format some medium projects can be pretty big indeed.

@whitequark
Contributor Author

Out of curiosity, what kind of slvs file sizes or memory usage have you seen at the extreme?

The current slvs format includes the entire rendered 3D model for inclusion in subassemblies. This can grow in an essentially unbounded way, easily to many megabytes, and well over the recommended limit for protobuf. If the authors of protobuf specifically say not to use it in cases like this one, I would rather trust them.

Flatbuffers allows arbitrary seeking and provides zero-copy parsing, so it solves this issue nicely. An unused rendered model will not incur any overhead if we memory-map the uncompressed file, or only minor overhead if we handle compressed models carefully.
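A sketch of how that looks with the flatbuffers C++ API, given a hypothetical generated root type Sketch (say, from table Sketch { version:uint; mesh:[ubyte]; }); a memory-mapped buffer can be handed to it directly:

    #include <cstdint>
    #include "flatbuffers/flatbuffers.h"
    // The hypothetical generated header for the schema above would go here.

    uint32_t ReadVersionZeroCopy(const void *mappedBytes) {
        auto *sk = flatbuffers::GetRoot<Sketch>(mappedBytes);
        return sk->version();   // follows offsets in place; the mesh vector's
                                // pages are never touched unless dereferenced
    }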

@baryluk

baryluk commented Jan 16, 2020

@whitequark Yes, I can confirm that Protocol Buffers are a rather bad fit for 100 MB+ messages. They were simply not designed/optimized with that in mind.

@baryluk

baryluk commented Jan 16, 2020

I see SolveSpace already has flatbuffers as a dependency, used in one of the exporters. Nice. 10% already done :D

I looked into file.cpp and it is relatively short, so at least starting to use flatbuffers there doesn't look like an impossible task. It would not get all the benefits initially, but it could be a good start.

Did anybody start designing a flatbuffers schema that also takes into account the other future wishlist items (like hierarchical sketches) that could impact the format?

@whitequark
Contributor Author

No. I've been sick for several years and only had a limited ability to work on SolveSpace, unfortunately.

@baryluk

baryluk commented Jan 16, 2020

@whitequark Ok, no worries! I will see what I can do, and do some prototyping for fun.

@whitequark
Contributor Author

Just keep in mind that it is unlikely that your prototype will be merged in any reasonable timeframe.

@ConductiveInsulation

I think it would also be great to have the possibility to add comments inside the slvs file. Currently it's possible to add a comment somewhere in the text, but it gets removed when the "corrupted" file is repaired.

I use the comments when I share files via git (which works great, since they are just text files), so it's not really helpful when they just disappear; keeping them in a separate file also adds some inconvenience.

A possible solution would be to allow longer group names; I haven't counted the characters, but after a certain length SolveSpace just crashes while opening the file. Alternatively, a group or file description might be doable.

@phkahler
Member

phkahler commented Dec 6, 2021

@ConductiveInsulation you know you can put comments directly on the sketch, right? Press ;. You can also select a point first to position the comment relative to it.

@ConductiveInsulation

directly on the sketch

Do you mean the Text function? It bloats the file enormously, since not only the message gets stored but also the shape/extrusion of the message. Unfortunately, the message itself also doesn't get stored at the beginning of the file, so it won't be visible when the file is viewed, for example when it has been directly linked.

I made an example of how I currently do it: https://bitbucket.org/conductiveinsulation/solvespace/raw/b638da1b1d850ff2bbe52965f9cc919ecf7775f9/Logo.slvs

@ruevs
Member

ruevs commented Dec 6, 2021

@ConductiveInsulation No, not the "Text" function. The "Comment" function from the menu Constrain | Comment or with the ; (semicolon) shortcut key. If you select a point before adding the comment it will be positioned relative to the point and will move with it.

@ConductiveInsulation

@ruevs I've been using SolveSpace for a pretty long time now, but I have to admit that I must have totally overlooked that function. I'll look into it after work and give you an update.

@ConductiveInsulation

@ruevs it appears to be impossible to store the comments at the beginning of the file, since the constraints get stored at the end of the file.

@hinell

hinell commented Feb 12, 2022

I would certainly go for a JSON-based format. It would be best for web-based software handling such files. Loading performance is less important than modelling performance.

I would also vote for a zip-based archiving approach. FreeCAD uses something like that for its main format.

@hinell

hinell commented Aug 11, 2023

Check out SQLite. I think the best approach would be to use an SQLite DB as the application file container. It has so many benefits I can't even describe them all. The only major drawback is that it's not easily streamable: you have to buffer data.
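For what it's worth, a minimal sketch of the SQLite-as-container idea, with a hypothetical table layout: each logical section of the model becomes a table, and partial loading is just a query instead of a custom seek scheme.

    #include <sqlite3.h>

    // Open (or create) a model container and ensure a hypothetical schema.
    int OpenModel(const char *path, sqlite3 **db) {
        int rc = sqlite3_open(path, db);
        if(rc != SQLITE_OK) return rc;
        return sqlite3_exec(*db,
            "CREATE TABLE IF NOT EXISTS groups(h INTEGER PRIMARY KEY, type INT);"
            "CREATE TABLE IF NOT EXISTS params(h INTEGER PRIMARY KEY, val REAL);",
            nullptr, nullptr, nullptr);
    }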

@KronosTheLate

KronosTheLate commented Oct 10, 2023

I want to mention that I had some issues getting my system to recognize the correct MIME type for .slvs files (Linux).
My band-aid fix was to create the file /usr/share/mime/packages/solvespace.xml and put the following content inside:

<?xml version="1.0" encoding="utf-8"?>
<mime-type type="text/solvespace">
  <comment>Solvespace model source code</comment>
  <glob pattern="*.slvs"/>
</mime-type>

After running sudo update-mime-database /usr/share/mime in the shell, the file extension is now correctly recognized.

EDIT: I created this issue to get some help integrating the solvespace file type into the shared mime database.
