Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CRC part is also possible with v0/v1 #110

Closed
wants to merge 1 commit into from

Conversation

JeromeMartinez
Copy link
Contributor

FFmpeg permits to add CRC data after SliceContent in a v0/v1 stream (e.g. ./ffmpeg -f lavfi -i mandelbrot -t 1 -c ffv1 -slicecrc 1 a.mkv), so we document it (for the moment, no extra data is expected after the SliceContent part in v0/v1, so the stream may be considered as invalid).
It is an implicit signaling (no ec metadata), so we detect the presence of CRC by checking if 5 bytes are remaining at the end of the frame.

FFmpeg permits to add CRC data after a v0/v1 stream, so we document it.
@michaelni
Copy link
Member

The implementation does not detect the CRC in these files. So specification and decoder implementation currently agree that this is not correct.
Its the encoder implementation that should be fixed not the specification

@michaelni michaelni closed this Apr 20, 2018
@JeromeMartinez
Copy link
Contributor Author

Having the decoder ignoring it is not important IMO (does not impact other decoders)
The encoder does not agree with specifications.
Should we consider that all files created with FFmpeg -c ffv1 -slicecrc 1 (in the case versions 0/1 are the ones by default in FFmpeg) are invalid?

IMO the goal when doing this specification is to document files in the wild, and here such files are in the wild, decoding is fine with such files, I don't like the idea to consider such files currently created by FFmpeg as invalid when there is no decoding issue with it.

Just to be clear: do you say that FFmpeg may currently create, with -c ffv1 -slicecrc 1, invalid FFV1 streams, and that someone looking for long term preservation should rencode such files before storing it?

@michaelni
Copy link
Member

If the decoder doesnt decode it and the specification doesnt match it. Then i think there can be no question that it is invalid currently.
Do you disagree ?
Adding it to the specification now will not make it work better with implementations based on the current specification.
Also is there really an action required ? The implementation thats based on the clean specification decodes the file it just ignores the crc IIUC

Also I do not think the specification should describe every ffv1 file out there no matter what buggy version or potentially intentionally contradictionary parameters it was encoded with. And declare every such file valid.
the specification should describe primarly the intended and valid syntax. It also should describe enough so all files can be decoded correctly, We IMO very conciously do not want to create more oddball files even if they can be decoded. I think we should try to keep the spec clean and easy to implement
Do you disagree about some of this ? I mostly just write what felt common sense to me but i may easyly be missing something

@JeromeMartinez
Copy link
Contributor Author

I answered too vaguely yesterday.

So for your sentence: "So specification and decoder implementation currently agree that this is not correct" no they don't agree.

  • Decoder says that if there are trailing bytes in the bitstream, ignore it.
  • Specification says that there MUST NOT be any trailing bytes.

Specification adds a constraint compared to decoder.
Decoder has no problem with this difference, but it does not mean that the bitstream having trailing bytes are valid, from specs.

Adding it to the specification now will not make it work better with implementations based on the current specification.

IMO out of topic: permitting CRC in v0/v1 does not mandate handling CRC.

Also is there really an action required ?

Yes, because you think that FFmpeg does not create invalid bitstream because the decoder decodes it correctly, but the specification explicitly says that bitstream with trailing bytes are invalid (it does not permit it).

Do you disagree about some of this ?

I disagree, right, because the current spec says that bitstream created by "FFmpeg -level 1 -slicecrc 1" are invalid.

I think we should try to keep the spec clean and easy to implement

My patch does not make implementations more complex: both encoder and decoder can ignore this patch (encoder does not create that, decoder ignores as FFmpeg decoder does currently. Nothing in spec mandates to compute CRC, whatever is the version, it is optional).
Note that indicating that "we should keep spec clean and easy to implement" means that we should remove exceptions we already have (it would make my decoder a lot more simpler!), I understood that the discussion was to document files in the wild, not to have clean specs.


I see 2 different issues:

1/ We need to decide (no choice "don't decide", no decision is a decision) if trailing bytes in a bitstream are valid or not valid (for v0/v1, I think we all agree it is invalid for v3)
1a/ We decide it is invalid (we let specs as is): invalid files are in the wild (the ones created with "FFmpeg -level 1 -slicecrc 1" for the last few years). I don't like this idea at all, the goal by creating spec is to describe files in the wild, bugs included (reason with have exceptions in the current spec), and I explicitely ask to you that confirm that you consider such bitstream as invalid, so we are clear about that.
1b/ We decide it is valid: we need to add in spec "if( version <= 1 && remaining_bits_in_bitstream( NumBytes ) ) then reserved"

2/ the only files I found with "reserved" bytes are the ones with CRC, and I don't see something against documenting that: implementation would not be more complex (this part can be ignored by encoders and decoders), and it adds the crc feature to v0/v1. Additionally, it permits in theory FFV1 fixers to fix by eliminating invalid decodings when testing (if trailing bytes are valid, fixer can no more consider bitstream modified for testing validity as invalid); note that in practice it does not prevent fixers to be aware that the only trailing bytes in the wild are CRC and suggest to user that bitstreams with trailing bytes not being CRC are not the ones he is looking for.


Should I modify this PR (or new PR?) for putting "if( version <= 1 && remaining_bits_in_bitstream( NumBytes ) ) then reserved" instead of CRC so spec does not say that "FFmpeg -c ffv1 -slicecrc 1" creates invalid bitstreams?

@michaelni
Copy link
Member

  • Decoder says that if there are trailing bytes in the bitstream, ignore it.
  • Specification says that there MUST NOT be any trailing bytes.

No, you mix things up here. The specification has to require both that trailing bytes must not be added by an encoder following "this" version as well as must require a decoder to ignore them. Thats required so that we can add fields in a "minor" update without breaking decodability with existing decoders.

If you want to describe in some "non normative" part of the spec all kinds of broken files (there are many more than this) i surely dont mind but i do not agree to add implementation bugs into the normative spec so that we in the future start creating such files and all implementations must by made buggy to be compliant.
Also the crc flag is NOT stored in version 0 and 1.

@JeromeMartinez
Copy link
Contributor Author

No, you mix things up here.

I disagree: I just explained what is the current status of each part (decoder and spec)

You sentences are one of the change we could do, but this is not the only possibility. I suggest a possibility to add CRC feature to v0/v1 spec as it is currently in FFmpeg encoder, as it does not hurt, that's all. I didn't expect that it is a problem, as the decoder ignores such bits and it does not create any complexity (decoders not using the feature just ignore bits as done today, decoders willing to use the feature test if 5 bytes are remaining and set "ec" flag to 1 if it is the case). Just an easy/lazy way to add crc feature to v0/v1.

If you want to describe in some "non normative" part

I don't want to add "non normative" part, I just want to be clear about FFmpeg creating for the last few years invalid streams or not, and/or to add (normative) features when it does not hurt. Here, I don't see what is blocking for that unexpected feature from FFmpeg becomes normative, so I suggested first to make it normative.
Currently, with specs written as they are, ./ffmpeg -f lavfi -i mandelbrot -t 1 -c ffv1 -slicecrc 1 a.mkv creates an invalid stream if we check the spec.

Also the crc flag is NOT stored in version 0 and 1.

You mean "ec" flag? True, but I detect "implicitly" its presence with the patch. please check the patch for that. Not a problem.


Anyway, I created #112 for meeting "The specification has to require both that trailing bytes must not be added by an encoder following "this" version as well as must require a decoder to ignore them" sentence.

@michaelni
Copy link
Member

You are a bit stubborn. ;)

The specification describes what is a valid file and how to decode it.
The FFmpeg ffv1 encoder has a bug that allowed one to store CRCs with v1 files, while the syntax elements needed for CRCs to work properly where only added in later versions.
This practice is not intended and does not result in a valid file.
Thus the specification should not describe it as if it was valid, we do not want encoders to create this half CRC syntax as it does not provide the protection that CRCs normally do. It also is a huge mess, its detection interferes with potential future extensions.

The question about rencode for long term preservation. I dont want to jump to a hasty conclusion here and cause avoidable pain. But it does seem to me that reencoding these files if someone was so unfortunate to actually pick this unlucky combination of options ... yes it does make sense to reencode i think. I dont think it needs to be done immedeatly but maybe as part of some other maintaince or changes thats needed in the future ...

The person wanted CRCs, the code wrote only half the syntax for CRCs, also the person asked for slicecrc but there are also no slices just one "frame". So this might not be what the person wanted. Its just a bug that resulted in half the CRC syntax to be written into the old version. OTOH if this is exactly what the user wanted then he maybe doesnt need to reencode but most of the tools might not be able to work with the CRCs as these are different.

Also adding crc support into v0/v1 is alot of work nothing like this patch. It requires designing, implementing and testing of the detection and correction code, testcases, ...
We cannot "just" add something to the normative spec because we found a buggy file.
The additions have to actually work and make sense
To describe the bug, yes, to allow creation of that extension intentionally by new encoders, no.
I hope i have explained the issue clearer now

@JeromeMartinez
Copy link
Contributor Author

You are a bit stubborn. ;)

I have contradictory requests, like documenting what exists (files in the wild) and not documenting what exists (reject files not expected to be there even if FFmpeg created them). Note that I already accepted the idea to not add CRC in spec, see #112.

I also create a FFV1 checker, and I see during implementation that the reference decoder and the spec are contradictory. A checker says "valid" or "invalid", not possible to say "ignore trailing bytes". So please indicate which one is the correct one (are trailing bytes valid or not valid), not saying which one is correct is not good for a spec (introduce doubts).

Note that the goal of a spec is that a decoder could be implemented without looking at FFmpeg code. Here, a person relying on the spec may reject the files created by current version of "ffmpeg --level 0 -crc 1" even if FFmpeg accepts it, because the spec currently indicates that no bytes should appear after SliceContent in versions 0/1, so facing trailing bytes may indicate that something went wrong (exactly like if CRC is wrong in version 3). FFmpeg decoder code and spec do not match.

Trashing the issue will not make it disappear.

The FFmpeg ffv1 encoder has a bug that allowed one to store CRCs with v1 files

FFmpeg has other bugs, there are 2 exceptions in the spec because the priority was to document FFmpeg bugs, I just do same here so I am surprised that the patch proposal is rejected.

This practice is not intended and does not result in a valid file.

Do you confirm that "ffmpeg --level 0 -crc 1" currently creates invalid files?
If you say "no", spec must be changed, at least with #112 (modified if you don't like it as is).
If you say "yes", fine but I was thinking that you were wanting to avoid to have files created by FFmpeg documented as invalid, that's all.

also the person asked for slicecrc but there are also no slices just one "frame".

It is same with "ffmpeg -level 3 -slicecrc 1 -slices 1", could be considered as "no slice" too, and CRC is there anyway.

Also adding crc support into v0/v1 is alot of work nothing like this patch. It requires designing, implementing and testing of the detection and correction code, testcases, ...

Already done: before sending the patch, I implemented it in my decoder (latest version of MediaConch tests that CRC is present then test it. In previous release, it was flagging the stream as invalid as trailing bytes were detected and it is forbidden by spec; can be changed to whatever in next version of MediaConch, but right now it works pretty well with that, with the millions of combinations from FFmpeg options I tested), and I tested that FFmpeg decoder is not impacted (it ignores the CRC). Note that my patch here (and my code) only accepts 40 bits, rejecting any other trailing bytes size, in order to limit use cases, but we could say "40 bits or more" if we want to keep the possibility to add features in versions 0/1 in the future. I document FFmpeg encoder behavior, nothing else.
Correction code: would be exactly same as with version 3.

The additions have to actually work and make sense

It works, and IMO it makes lot of sense (expanding version 0/1 with CRC feature without breaking old decoders). But I don't use it myself, so I am fine if this idea is rejected. I actually just need to know if trailing bytes are accepted (FFmpeg decoder code) or not (FFV1 current specification).

I hope i have explained the issue clearer now

Sorry, but still confusing for me, please be more explicit: do you confirm that "ffmpeg -level 0 -crc 1" currently creates invalid files or do you consider that such files are valid?
If "valid", we have to update specs, at least for saying that trailing bytes are acceptable in v0/v1 (this is the alternate solution in #112). The question would be about accepting only 40 bits (handling FFmpeg bug) or any (authorizing future updates) (to be debated in the other PR).


I already accepted that the idea to keep current FFmpeg behavior (injecting CRC in version 0/1) is not OK, I don't argue anymore about that, but it does not remove the issue of mismatch between reference decoder and spec, please indicate which one is fine: decoder (so #112, modified or not depending of some tweaking, should be merged) or spec (so #112 is also rejected and we are clear that "ffmpeg -level 0 -crc 1" currently creates invalid files).

@JeromeMartinez
Copy link
Contributor Author

It also is a huge mess, its detection interferes with potential future extensions.

Just for reference: there is no extension possible with current spec (as well with this patch, I didn't change want to change that here).

And I don't think it would be a mess, there are lot of possibilities for handling that nicely, I could do some proposal if you change your mind about this PR (else #112 is enough as it is like current FFmpeg decoder code, people would think about that when they need to add extensions, I added a comment about buggy files for this reason)

@michaelni
Copy link
Member

I have contradictory requests, like documenting what exists (files in the wild) and not documenting what exists (reject files not expected to be there even if FFmpeg created them). Note that I already accepted the idea to not add CRC in spec, see #112.

There is some sort of misunderstanding here.
What i suggest is to document the bug but not to declare it valid syntax.
Thats as if ISO MPEG says, some people set the interlace flags incorrect. Thats not the same as saying the mpeg spec requires people to set the interlace flags wrong.

What i suggest is to document that there exist some files that include the CRC in v0/v1 and that decoders may want to support this but that it is not valid and that encoders must not generate such files.
do you disagree about this ?

ill reply to the rest of your message too seperatly

@michaelni
Copy link
Member

I also create a FFV1 checker, and I see during implementation that the reference decoder and the spec are contradictory. A checker says "valid" or "invalid", not possible to say "ignore trailing bytes". So please indicate which one is the correct one (are trailing bytes valid or not valid), not saying which one is correct is not good for a spec (introduce doubts).

Define what "valid" means
Define where the trailing bytes are that you talk about
I think we already talk about slight different things here

About "valid" if your checker checks that a stream is decodable by a compliant to spec 123 decoder thats different than if it checks if a stream is encoded correctly according to spec 123.
why ?
because a spec 124 can do something that is still decodable by spec 123 but will not be allowed output for a spec 123 encoder.
This is not ffv1 specific it more generally true.
For example a html X web browser will be able to display a page with some html X+1 but html X+1 is not going to pass a html X checker as it will find unknown tags or other extensions

@michaelni
Copy link
Member

Do you confirm that "ffmpeg --level 0 -crc 1" currently creates invalid files?
If you say "no", spec must be changed, at least with #112 (modified if you don't like it as is).
If you say "yes", fine but I was thinking that you were wanting to avoid to have files created by FFmpeg documented as invalid, that's all.

I think it does create invalid files. But if these files are widespread we should support it in the decoder (which it already does) and make sure any implementation of the spec behaves reasonable with these files

@JeromeMartinez
Copy link
Contributor Author

What i suggest is to document that there exist some files that include the CRC in v0/v1 and that decoders may want to support this but that it is not valid and that encoders must not generate such files.
do you disagree about this ?

I disagree, as I think it is not bad to extend v0/v1 this way, so we could have hit two targets with one shot in a smooth manner, but I did not succeed to convince you despite the tests I have done for being sure there is no conflict, so let's focus on how to document the original issue (decoder/spec mismatch).

Define what "valid" means

A decoder conforming to spec can decode a file conform to spec.
A decoder should not consider that what it was tolerated in decoder for spec X can be used in spec X+1 (e.g. if a decoder tolerates trailing bytes despite the spec indicates that no trailing byte is planned, developer of this decoder should not say "hey, my decoder can do that, so let's add that in spec X+1" because other decoders may have implemented a less tolerant decoder).
Note: I am aware that we usually say that the reference decoder has priority other spec, but here we talk about this decoder/spec difference, so it would not be fair for other decoders or checkers to say that the reference decoder has priority if we did not change the spec when we saw the difference.

Define where the trailing bytes are that you talk about

At the end of v0/v1 frame, see current patch proposal, it is indicated there.

For example a html X web browser will be able to display a page with some html X+1 but html X+1 is not going to pass a html X checker as it will find unknown tags or other extensions

Yes, if it is planned. Here, it is not the case (spec does not reserve bits for future extension).
I see a problem in the discussion: you say "its detection interferes with potential future extensions": currently, there is NO way to add extension. Talk and spec are not coherent. Spec says that no extension is authorized, you imagine there could be some extensions. We need to be clear about that (extension possible or not).

I think it does create invalid files. But if these files are widespread we should support it in the decoder (which it already does) and make sure any implementation of the spec behaves reasonable with these files

I am not comfortable with something "invalid but decoder should decode", IMO the spec should just say what is valid (and if not valid, it is invalid). Maybe with a background, but not saying "this is invalid but you should decode them". If a decoder should decode such stream, the stream is expected to be considered valid IMO.


semantic apart, let's be clear about what is authorized or not: which possibility is acceptable for you?

1/ Trailing bytes are acceptable in v0/v1 (for future extension of v0/v1) --> In that case, I already provided the patch: #112 (taking care of "-level 0 -slicecrc 1"), using reference decoder behavior, please merge it so we have coherency between decoder, talks, and spec.

2/ Trailing bytes are NOT acceptable in v0/v1 (except for "-level 0 -slicecrc 1") --> In that case, I adapt this PR for saying that the stream should be decoded ignoring trailing 40 bits if 40 bits are present. We are also clear that "future extension" is not possible (forbidden by spec), even if reference decoder currently authorize that by ignoring trailing bits (because other decoders may be less tolerant to streams not conforming to spec).

@JeromeMartinez JeromeMartinez deleted the crc_v0 branch June 23, 2020 19:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants