New file format #75
@jwesthues Any objections? Disregard my earlier thoughts about using LLVM bitcode; after an in-depth discussion with LLVM folks, that turned out to be a rather poor choice.
The new file format should also use 64-bit handles (as would the in-memory representation). The rationale is that, once we make large assemblies fast, remapping of entities can rapidly exhaust the 65535-entity limit for one group if one imports many instances of a large sketch. This migration is an ideal opportunity to add the 32→64 bit expansion code robustly.
That seems generally reasonable. It would be fashionable to use JSON or some other text representation today; but especially if you optimize assemblies to make better use of the GPU, a binary format's speedup seems material.
Actually, version 3 of Protocol Buffers has a canonical JSON serialization, so we could just use that. (Google generally recommends version 2, for reasons that I do not fully understand; but 3 is probably fine too.)
It's also worth taking a look at Cap'n Proto, designed by the original author of Protocol Buffers while taking into account many of the downsides of the latter. It has some rather desirable properties, but a) it completely lacks a human-readable format usable at runtime (in fact the author recommends the use of JSON in that case); b) it lacks a bidirectional JSON parser and serializer; c) it is not supported nearly as broadly. Overall I feel it's not the right choice, but it deserves a mention.
Actually, when I was implementing DXF import, I easily found a file which doesn't fit into 64k handles.
The main problem is to build a format which doesn't contain implementation details like remapping lists. This format must be human-readable (in text form). I think it is a good idea to have two sections:
For the new-version migration we can just use 1) and not bother about 2).
Without remapping, how do you propose to support nested assemblies?
@jwesthues
You mean by making the entity's unique identifier variable-length? |
@jwesthues, yes. Since we want to have nested sketches (assembly is one such case), we don't want to have one big heap of entities, so we can use variable-length identifiers.
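For concreteness, a minimal sketch of what such a variable-length identifier might look like (illustrative C++ only; the names are hypothetical, not actual SolveSpace types), with the allocation cost it implies discussed in the reply below:

```cpp
// Hypothetical illustration of the variable-length scheme proposed above.
#include <cstdint>
#include <vector>

struct EntityRef {
    std::vector<uint32_t> sketchPath; // chain of nested sketch/group ids, outermost first
    uint32_t localId;                 // entity id, unique only within the innermost sketch
};
```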
@Evil-Spirit So what used to be a fixed 64-bit field now must be allocated dynamically for every single entity, and you also break backwards compatibility. What's the offsetting benefit?
@jwesthues, entities themselves can contain only the current sketch id (or group), as before. Only constraints should contain a full path, since they can be applied between different groups/sketches. Benefits:
I know that this is a dramatic change and that not all the implementation aspects are clear. But this is just one way we can think about the new format. Maybe we can invent something more robust.
@Evil-Spirit The thing that dominates regeneration time is constraint solution and surface Booleans, and this speeds up neither. The memory saving is not material. You've proposed this change before, and I still don't see the benefit.
Selection can operate on the nested structures and generate paths for constraints. Actually, the existence of each entity in memory is not needed; it can be generated on the fly from requests and stored parameters, for writing constraint equations or performing visualization. Requesting an entity by its id inside a group can use the request id stored in the entity id, the entity id inside the request, and the unique entity id inside the group to decide what type and which params to take...
@jwesthues But we are talking about the file format, which is the basis for all of these aspects. It's something like: if we solve one small problem now, it solves a couple of big problems in the future. I'm not talking about actually making the change, just thinking about how it could be done. Maybe some of these ideas will help make a better file format or solve some problems.
I don't consider importing such files useful (what are you going to do with them, exactly?), so it doesn't matter. This is just an artificial case.
I agree that there should be a human-readable representation, for one reason only: we need to write tests. I see no reason to have the human-readable representation as our only or main one. There is no benefit (it is actually easier to read protobuf files, because you don't have to write any parsers), and there are drawbacks: the files are bloated, and the parsing is slow.
Nope, the new file format should be (at least) functionally equivalent to the old one.
@jwesthues Can you please write a (more in-depth than the comment in sketch.h) explanation of the function that remapping serves? I see that it's used for both assemblies and any groups that produce entities from entities, but I don't follow the logic behind that.
@whitequark To maintain an associative link between two entities (like an entity imported from a part, and the imported entity in the assembly), you can either identify entities by a complete path that specifies the associative link, or keep a table that records the association. Assuming arbitrarily deep import nesting, the former requires arbitrarily long identifiers. The latter is that remap table.
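To make the table option concrete, here is a minimal sketch (hypothetical names, not the actual SolveSpace code) of a per-group remap table that records the association while keeping identifiers fixed-width:

```cpp
// Sketch of a remap table: each derived group maps (source entity, copy number)
// to a fresh fixed-width id, recording the associative link instead of
// encoding it into an arbitrarily long path.
#include <cstdint>
#include <map>
#include <utility>

using EntityId = uint64_t;

struct RemapTable {
    std::map<std::pair<EntityId, int>, EntityId> fwd; // (input id, copy#) -> new id
    EntityId nextId = 1;

    // Return a stable id for the given copy of an input entity,
    // allocating a fresh one the first time it is requested.
    EntityId Remap(EntityId input, int copyNumber) {
        auto key = std::make_pair(input, copyNumber);
        auto it = fwd.find(key);
        if(it != fwd.end()) return it->second; // association already recorded
        EntityId fresh = nextId++;
        fwd.emplace(key, fresh);
        return fresh;
    }
};
```

Because the link is recorded in the table rather than in the identifier, ids stay fixed-width at any nesting depth, and an entity in the output group keeps its id even as other entities in the input group are added or deleted.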
Actually this is not as big a file as you might think. And SolveSpace is good enough to be a good pure 2d editor (since we implemented dxf import/export).
@jwesthues, This sounds reasonable for assemblies (where we actually have the asm-in-asm hierarchy case, and the case where we have an arbitrary number of copies of the same part), but why is this needed for every group? Can we painlessly avoid it?
Speaking of Protocol Buffers... I'm looking further into it and I see some potential scalability problems. Specifically, Protocol Buffers are designed to work by deserializing the entire message at once. The problems here are twofold:
Cap'n Proto solves both quite elegantly. It looks like I'll have to prototype some code with both...
@Evil-Spirit The nesting of associative links can get pretty deep even without assemblies, like if you sketch a contour, step translating, then step translating again for a 2d array, then extrude, then ...
@jwesthues, This seems like a case where we could apply just a remapping function instead of storing all this inside a map... or not?
@Evil-Spirit How do you remap in a way that (a) uses a fixed-length identifier, (b) maintains the associative link, even when entities in the input group are added and deleted, and (c) doesn't require that table?
@jwesthues, yes, (b) is a serious issue. Actually, that is solved if we use pointers instead of ids at runtime and use ids only for saving... but that is a different story.
+1 for the protocol buffers based format. It makes adding new fields very easy, and parsing / exporting is 1) fast, 2) easy to get right, 3) extensible in the future, 4) easy to use in other tools (like a CLI post-processor or analyzer, possibly written in another language), 5) compact, 6) easy to debug even without access to the protocol buffer description, 7) type checking works out of the box. I can't stand XML and JSON in any serious project. Maybe I am biased, but I have used Protocol Buffers for the last 8 years and never found them problematic.
I still think capnproto is a good idea, even years later. You can memory map a capnproto file (read only) and use copy-on-write techniques to make loading of complicated assemblies nearly instant, if your disk format closely follows the actual data structures you use internally.
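As a rough sketch of that idea (POSIX mmap; `Sketch` stands in for a hypothetical root struct generated from a Cap'n Proto schema, and error handling is omitted):

```cpp
// Memory-map a Cap'n Proto file and read it in place, without copying.
// `Sketch` is a hypothetical generated type, not part of SolveSpace today.
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>
#include <capnp/serialize.h>
#include "sketch.capnp.h" // hypothetical generated header

void loadMapped(const char *path) {
    int fd = open(path, O_RDONLY);
    struct stat st;
    fstat(fd, &st);
    const void *ptr = mmap(nullptr, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);

    capnp::FlatArrayMessageReader reader(kj::arrayPtr(
        reinterpret_cast<const capnp::word *>(ptr),
        st.st_size / sizeof(capnp::word)));
    auto sketch = reader.getRoot<Sketch>(); // a zero-copy view into the mapping
    // ... traverse `sketch` lazily; pages that are never touched are never read ...

    munmap(const_cast<void *>(ptr), st.st_size);
    close(fd);
}
```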
@traverseda capnproto is not significantly faster; in fact, if you include the object creation it can be much slower. capnproto can be a security risk: afaik it doesn't validate all the fields as strictly as protobuf during read. capnproto output can be, and often is, significantly bigger than protobuf's (2-4x in pathological cases). I did many benchmarks in the past, and capnproto was often the slowest of all the extensible binary serialization methods. So I would still lean toward protobuf, because it is good enough, extensible, and has good cross-platform support in many tools. Using a niche format wouldn't be a good idea. Sure, capnproto is somewhat popular, but for how long? Plus, IMHO, the capnproto IDL is ugly. The best approach is to actually do benchmarks and comparisons and do proper engineering on this topic, as selecting a format is quite a big deal long term. It impacts changeability, stability of the file API, and also how you use things in the C++ code (setters, getters, extensions, etc.).
The current plan is to use neither capnproto nor protobuf but rather flatbuffers. Protobuf v3 in particular has many awful design decisions that only pass for good because people think Google knows what they're doing, and they don't.
Well, I don't like protobuf v3 either, not only due to the changes to some features (things like optional fields, extensions, etc.), but also due to how they are used in C++ and Go. I like v2 and have the most experience with it. I don't have an opinion on flatbuffers. I am just always skeptical of new solutions that claim to be superior; usually they are better in some aspects, but worse in others.
It's a good thing I spent a while going into detail on the possible solutions here, such as the two reasons protobuf v2 isn't suitable for the task.
@whitequark Makes sense. Out of curiosity, what kinds of sizes are we talking about? I can imagine that in the current slvs format some medium projects can be pretty big indeed.
The current slvs format includes the entire rendered 3d model for inclusion in subassemblies. This can grow in an essentially unbounded way, easily to many megabytes, and well over the recommended limit for protobuf. If the authors of protobuf specifically say not to use it in cases like this one, I would rather trust them. Flatbuffers allows arbitrary seeking and provides zero-copy parsing, so it solves this issue nicely. An unused rendered model will not incur any overhead if we memory map the uncompressed file, or only minor overhead if we use compressed models carefully.
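To illustrate, a sketch of what that zero-copy access could look like, assuming a hypothetical `sketch.fbs` schema with root type `SketchFile` (the `GetSketchFile` and `VerifySketchFileBuffer` helpers would then be generated by flatc):

```cpp
// Zero-copy access to a FlatBuffers-encoded file: no parse step, just a
// typed pointer into the buffer, so an unused embedded 3d model costs
// nothing beyond the bytes it occupies on disk.
#include <cstddef>
#include <cstdint>
#include "sketch_generated.h" // hypothetical flatc output for sketch.fbs

const SketchFile *OpenSketch(const uint8_t *buf, size_t len) {
    // Validate untrusted input before handing out accessors.
    flatbuffers::Verifier verifier(buf, len);
    if(!VerifySketchFileBuffer(verifier)) return nullptr;
    return GetSketchFile(buf); // zero-copy: reads happen lazily, on access
}
```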
@whitequark Yes, I can confirm that Protocol Buffers are rather a bad fit for 100MB+ messages. They are simply not designed/optimized with that in mind from the start.
I see solvespace already does have flatbuffers as a dependency, used in one of the exporters. Nice. 10% already done :D I looked into it. Did anybody start designing a flatbuffers schema, taking into account also the other future wishlist items (like hierarchical sketches) that could impact the format?
No. I've been sick for several years and only had a limited ability to work on SolveSpace, unfortunately.
@whitequark Ok, no worries! I will see what I can do, and do some prototyping for fun.
Just keep in mind that it is unlikely that your prototype will be merged in any reasonable timeframe.
I think it would also be great to have the possibility to add comments inside the slvs file. Currently it's possible to add a comment somewhere in the text, but it'll get removed when the "corrupted" file gets repaired. I use comments when I use git to share some files (works great since they are just text files), so it's not really helpful when they just disappear; having them in a separate file also adds some inconvenience.

A possible solution would be to allow longer group names; I haven't counted the letters, but after a certain length SolveSpace just crashes while opening the file. Alternatively, a group or file description might be doable.
@ConductiveInsulation You know you can put comments directly on the sketch, right? Press ; and you can also select a point first to position the comment relative to it.
Do you mean the text function? It incredibly bloats the file, since not only the message gets stored but also the shape/extrusion of the message. Unfortunately the message itself also doesn't get stored at the beginning of the file, so it won't be visible when the file is viewed, for example because it got directly linked. I made an example of how I currently do it: https://bitbucket.org/conductiveinsulation/solvespace/raw/b638da1b1d850ff2bbe52965f9cc919ecf7775f9/Logo.slvs
@ConductiveInsulation No, not the "Text" function. The "Comment" function from the menu.
@ruevs I've been using SolveSpace for a pretty long time now, but I have to admit that I must have totally overlooked that function. I'll look into it after work and give you an update.
@ruevs It appears to be impossible to store the comments at the beginning of the file, since the constraints get stored at the end of the file.
I would also vote for a zip-based archiving approach. FreeCAD uses something like that for their main format.
Check out SQLite. I think it would be the best approach to use a SQLite DB as the application file container. It has so many benefits I can't even describe them all. The only major drawback is that it's not easily streamable: you have to buffer data.
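As a sketch of that container idea (table and column names are made up for illustration), each logical section of a document could become a row that tools can read or update without rewriting the whole file:

```cpp
// Store one named section of a document as a BLOB in an SQLite container.
// Schema is illustrative only; error handling is reduced to a bool.
#include <sqlite3.h>
#include <string>

bool SaveSection(sqlite3 *db, const std::string &name,
                 const void *data, int len) {
    sqlite3_exec(db,
        "CREATE TABLE IF NOT EXISTS sections("
        "  name TEXT PRIMARY KEY, body BLOB)",
        nullptr, nullptr, nullptr);

    sqlite3_stmt *stmt = nullptr;
    sqlite3_prepare_v2(db,
        "INSERT OR REPLACE INTO sections(name, body) VALUES(?, ?)",
        -1, &stmt, nullptr);
    sqlite3_bind_text(stmt, 1, name.c_str(), -1, SQLITE_TRANSIENT);
    sqlite3_bind_blob(stmt, 2, data, len, SQLITE_TRANSIENT);
    bool ok = sqlite3_step(stmt) == SQLITE_DONE;
    sqlite3_finalize(stmt);
    return ok;
}
```

Transactions, incremental blob I/O, and indexing then come for free, which is much of the appeal over an ad-hoc binary container.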
I want to mention that I had some issues getting my system to recognize the correct MIME type for `.slvs` files (Linux):

```xml
<?xml version="1.0" encoding="utf-8"?>
<mime-type type="text/solvespace">
    <comment>Solvespace model source code</comment>
    <glob pattern="*.slvs"/>
</mime-type>
```

After running

EDIT: I created this issue to get some help integrating the solvespace file type into the shared mime database.
SolveSpace would benefit from a file format that satisfies the following required properties:
In addition, optional but desirable properties would be:
The current file format fulfills none of the above (not even the last requirement, since there is no explicit schema). I propose to replace it with a format based on Google Protocol Buffers, which has the following properties:
Such a format, with the initial version closely resembling the existing file format, seems like an ideal choice for us. The plan is to evolve it further by gradual extension while remaining compatible with old versions, as opposed to versioning; unknown entries are generally ignored, so with some care and testing it is possible to retain an excellent degree of compatibility.