This document specifies the storage format for AV1 bitstreams in Matroska video tracks. Every time Matroska is mentioned it applies equally to WebM.
Elements in this document inside square brackets [] refer to elements as defined in the AV1 Specifications.
A Matroska element to store a Frame. Can also be a SimpleBlock when not inside a BlockGroup. Using Block in this document will mean both forms of Block.
The name used to describe a codec in Matroska.
A Coded Video Sequence (CVS) is a sequence of Temporal Units where the contents of [sequence_header_obu] must be bit-identical for all the Sequence Header OBUs found in the sequence before Matroska encapsulation, except for the contents of [operating_parameters_info]. A Sequence Header OBU made of all the identical bits in the CVS is referred to as the CVS Sequence Header OBU.
Extra data stored in Matroska and passed to the decoder before decoding starts. It can also be used to store the profiles and other data to better identify the codec.
Open Bitstream Unit is the basic unit of data in AV1. It contains a header and a payload.
The top Matroska element that contains interleaved audio, video, subtitles as well as track descriptions, chapters, tags, etc. Usually a Matroska file is made of one Segment.
All the OBUs that are associated with a time instant. It consists of a Temporal Delimiter OBU and all the OBUs that follow, up to but not including the next Temporal Delimiter OBU. It MAY contain multiple frames but only one is presented.
EBML Path: \Segment\Tracks\TrackEntry\CodecID | Mandatory: Yes
The CodecID MUST be the ASCII string V_AV1.
EBML Path: \Segment\Tracks\TrackEntry\CodecPrivate | Mandatory: Yes
The CodecPrivate consists of 4 octets similar to the first 4 octets of the ISOBMFF AV1CodecConfigurationRecord. Most of the values in this bitfield come from the CVS Sequence Header OBU. The bits are spread as follows, with the most significant bit first:
unsigned int (1) marker always 1
unsigned int (7) version currently 1
unsigned int (3) seq_profile
unsigned int (5) seq_level_idx_0
unsigned int (1) seq_tier_0
unsigned int (1) high_bitdepth
unsigned int (1) twelve_bit
unsigned int (1) monochrome
unsigned int (1) chroma_subsampling_x
unsigned int (1) chroma_subsampling_y
unsigned int (2) chroma_sample_position
unsigned int (3) reserved currently 0
unsigned int (1) initial_presentation_delay_present
unsigned int (4) initial_presentation_delay_minus_one
- seq_profile corresponds to the [seq_profile] in the CVS Sequence Header OBU.
- seq_level_idx_0 corresponds to the [seq_level_idx[0]] in the CVS Sequence Header OBU.
- high_bitdepth corresponds to the [high_bitdepth] in the CVS Sequence Header OBU.
- seq_tier_0 corresponds to the [seq_tier[0]] in the CVS Sequence Header OBU.
- twelve_bit corresponds to the [twelve_bit] in the CVS Sequence Header OBU, or 0 if not present.
- monochrome corresponds to the [mono_chrome] in the CVS Sequence Header OBU.
- chroma_subsampling_x corresponds to the [subsampling_x] in the CVS Sequence Header OBU.
- chroma_subsampling_y corresponds to the [subsampling_y] in the CVS Sequence Header OBU.
- chroma_sample_position corresponds to the [chroma_sample_position] in the CVS Sequence Header OBU, or 0 (CSP_UNKNOWN) if not defined.
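As an illustration of the bit layout above, the following Python sketch packs and unpacks the 4 octets. The function names `parse_codec_private_header` and `pack_codec_private_header`, and the use of a plain dictionary to hold the fields, are assumptions made for this example and are not defined by this document.

```python
# (field name, width in bits), most significant bit first, as listed above.
_FIELDS = [
    ("marker", 1), ("version", 7),
    ("seq_profile", 3), ("seq_level_idx_0", 5),
    ("seq_tier_0", 1), ("high_bitdepth", 1), ("twelve_bit", 1),
    ("monochrome", 1), ("chroma_subsampling_x", 1),
    ("chroma_subsampling_y", 1), ("chroma_sample_position", 2),
    ("reserved", 3), ("initial_presentation_delay_present", 1),
    ("initial_presentation_delay_minus_one", 4),
]


def parse_codec_private_header(data: bytes) -> dict:
    """Split the first 4 octets of a CodecPrivate into named fields."""
    if len(data) < 4:
        raise ValueError("CodecPrivate needs at least 4 octets")
    word = int.from_bytes(data[:4], "big")
    fields = {}
    shift = 32
    for name, width in _FIELDS:
        shift -= width
        fields[name] = (word >> shift) & ((1 << width) - 1)
    if fields["marker"] != 1:
        raise ValueError("marker bit must be 1")
    return fields


def pack_codec_private_header(fields: dict) -> bytes:
    """Build the 4 octets; fields missing from the dict default to 0."""
    word = 0
    for name, width in _FIELDS:
        value = fields.get(name, 0)
        if value < 0 or value >> width:
            raise ValueError(f"{name} does not fit in {width} bit(s)")
        word = (word << width) | value
    return word.to_bytes(4, "big")
```

For instance, the octets `81 08 0C 00` decode to version 1, seq_profile 0, seq_level_idx_0 8 and 4:2:0 chroma subsampling.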
The initial_presentation_delay_minus_one field indicates the number of frames (minus one) that need to be decoded prior to starting the presentation of the first frame so that each frame will be decoded prior to its presentation time under the constraints indicated by seq_level_idx_0 in the CodecPrivate. More precisely, the following procedure MUST NOT return any error:
- construct a hypothetical bitstream consisting of the OBUs carried in the frame followed by the OBUs carried in all the frames referring to that frame,
- for each Sequence Header OBU set [initial_display_delay_minus_1[0]] to the number of frames, minus one, contained in the first (initial_presentation_delay_minus_one + 1) Blocks, including the non-presentable frames,
- set the [frame_presentation_time] field of the [frame_header_obu] of each presentable frame such that it matches the presentation time difference between the frame carrying this frame and the previous frame (if it exists, 0 otherwise),
- apply the decoder model specified in AV1 to this hypothetical bitstream using the first operating point. If [buffer_removal_time] information is present in the bitstream for this operating point, the decoding schedule mode MUST be applied, otherwise the resource availability mode MUST be applied.
If a muxer cannot verify the above procedure, initial_presentation_delay_present SHOULD be set to 0.
[initial_display_delay_minus_1[0]] and initial_presentation_delay_minus_one are very similar. The former deals with all frames in the bitstream, even non-visible ones, whereas the latter only deals with visible frames found in the Blocks. The non-visible frames are also in the Blocks but are not known at the container level.
If initial_presentation_delay_present is 0, then all bits of initial_presentation_delay_minus_one SHOULD be 0 and MUST be discarded.
This structure MAY be followed by OBUs that are valid for the whole CVS. Only OBUs of type OBU_SEQUENCE_HEADER and OBU_METADATA are allowed in the CodecPrivate. If present, the OBU of type OBU_SEQUENCE_HEADER, the CVS Sequence Header OBU, MUST be the only one of type OBU_SEQUENCE_HEADER and the first OBU after the structure.
OBUs in the CodecPrivate MUST have the [obu_has_size_field] set to 1, indicating that the size of the OBU payload follows the header, and that it is coded using [LEB128], except for the last OBU in the CodecPrivate, for which [obu_has_size_field] MAY be set to 0, in which case it is assumed to fill the remainder of the CodecPrivate.
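The OBU framing rules above can be sketched in Python as follows. The helper names `read_leb128`, `iter_obus` and `codec_private_obus` are assumptions made for this example; the header bit positions and the numeric OBU type values are taken from the AV1 Specifications, not from this document.

```python
from typing import Iterator, List, Tuple

OBU_SEQUENCE_HEADER = 1
OBU_METADATA = 5


def read_leb128(buf: bytes, pos: int) -> Tuple[int, int]:
    """Decode a [LEB128] value at pos; return (value, bytes consumed)."""
    value = 0
    for i in range(8):  # AV1 limits LEB128 values to 8 bytes
        if pos + i >= len(buf):
            raise ValueError("truncated LEB128 value")
        byte = buf[pos + i]
        value |= (byte & 0x7F) << (7 * i)
        if not byte & 0x80:
            return value, i + 1
    raise ValueError("LEB128 value longer than 8 bytes")


def iter_obus(buf: bytes) -> Iterator[Tuple[int, bytes]]:
    """Yield (obu_type, payload) for each OBU found in buf."""
    pos = 0
    while pos < len(buf):
        header = buf[pos]
        obu_type = (header >> 3) & 0x0F
        has_extension = bool(header & 0x04)
        has_size_field = bool(header & 0x02)
        pos += 2 if has_extension else 1
        if pos > len(buf):
            raise ValueError("truncated OBU header")
        if has_size_field:
            size, used = read_leb128(buf, pos)
            pos += used
        else:
            size = len(buf) - pos  # last OBU: fills the remainder
        if pos + size > len(buf):
            raise ValueError("OBU payload overruns the buffer")
        yield obu_type, buf[pos:pos + size]
        pos += size


def codec_private_obus(codec_private: bytes) -> List[Tuple[int, bytes]]:
    """Return the OBUs after the 4-octet structure, checking the rules above."""
    obus = list(iter_obus(codec_private[4:]))
    types = [obu_type for obu_type, _ in obus]
    if any(t not in (OBU_SEQUENCE_HEADER, OBU_METADATA) for t in types):
        raise ValueError("only Sequence Header and Metadata OBUs are allowed")
    if types.count(OBU_SEQUENCE_HEADER) > 1:
        raise ValueError("more than one Sequence Header OBU")
    if OBU_SEQUENCE_HEADER in types and types[0] != OBU_SEQUENCE_HEADER:
        raise ValueError("the Sequence Header OBU must come first")
    return obus
```

An OBU without a size field is only unambiguous in last position, which is why the iterator lets it consume the rest of the buffer.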
The [timing_info_present_flag] of the Sequence Header OBU SHOULD be 0. Even when it is 1, the presentation time of the Frame Header OBUs in Blocks should be discarded. In other words, only the timestamps given by the Matroska container MUST be used.
EBML Path: \Segment\Tracks\TrackEntry\Video\PixelWidth | Mandatory: Yes
The PixelWidth MUST be [max_frame_width_minus_1]+1.
EBML Path: \Segment\Tracks\TrackEntry\Video\PixelHeight | Mandatory: Yes
The PixelHeight MUST be [max_frame_height_minus_1]+1.
Each Block contains one Temporal Unit containing one or more OBUs. Each OBU stored in the Block MUST contain its header and its payload.
The OBUs in the Block follow the [Low Overhead Bitstream Format syntax]. They MUST have the [obu_has_size_field] set to 1 except for the last OBU in the frame, for which [obu_has_size_field] MAY be set to 0, in which case it is assumed to fill the remainder of the frame.
The order of OBUs should follow the order defined in section 7.5 of the AV1 Specifications.
There MUST be at least one Frame Header OBU per Block.
OBU trailing bits SHOULD be limited to octet alignment and SHOULD NOT be used for padding.
OBUs of type OBU_TEMPORAL_DELIMITER, OBU_REDUNDANT_FRAME_HEADER and OBU_PADDING SHOULD NOT be used.
OBUs of type OBU_TILE_LIST MUST NOT be used.
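A muxer or validator could check the Block constraints above along the following lines. This is a minimal sketch: the names `obu_types` and `check_block` are hypothetical, the numeric OBU type values come from the AV1 Specifications, the input is assumed to be well-formed, and treating an OBU_FRAME as carrying a Frame Header OBU is an assumption of this example.

```python
OBU_TEMPORAL_DELIMITER = 2
OBU_FRAME_HEADER = 3
OBU_FRAME = 6
OBU_REDUNDANT_FRAME_HEADER = 7
OBU_TILE_LIST = 8
OBU_PADDING = 15


def obu_types(block: bytes) -> list:
    """Return the obu_type of every OBU in a (well-formed) Block payload."""
    types, pos = [], 0
    while pos < len(block):
        header = block[pos]
        pos += 2 if header & 0x04 else 1  # skip the optional extension byte
        if header & 0x02:  # obu_has_size_field: read the LEB128 size
            size = shift = 0
            while True:
                byte = block[pos]
                pos += 1
                size |= (byte & 0x7F) << shift
                shift += 7
                if not byte & 0x80:
                    break
        else:  # only allowed for the last OBU: fills the remainder
            size = len(block) - pos
        types.append((header >> 3) & 0x0F)
        pos += size
    return types


def check_block(block: bytes):
    """Return (errors, warnings) for MUST and SHOULD violations."""
    types = obu_types(block)
    errors, warnings = [], []
    # Assumption: an OBU_FRAME counts because it embeds a frame header.
    if not any(t in (OBU_FRAME_HEADER, OBU_FRAME) for t in types):
        errors.append("no Frame Header OBU in Block")
    if OBU_TILE_LIST in types:
        errors.append("OBU_TILE_LIST MUST NOT be used")
    discouraged = (OBU_TEMPORAL_DELIMITER, OBU_REDUNDANT_FRAME_HEADER, OBU_PADDING)
    for t in types:
        if t in discouraged:
            warnings.append(f"OBU type {t} SHOULD NOT be used")
    return errors, warnings
```

Keeping MUST and SHOULD violations in separate lists lets a caller reject a Block on the former while only logging the latter.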
A SimpleBlock MUST NOT be marked as a Keyframe if it doesn't contain a Frame OBU. A SimpleBlock MUST NOT be marked as a Keyframe if the first Frame OBU doesn't have a [frame_type] of KEY_FRAME. A SimpleBlock MUST NOT be marked as a Keyframe if it doesn't contain a Sequence Header OBU.
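The three conditions above combine into a single predicate. In this sketch, `BlockSummary` and `may_mark_keyframe` are hypothetical names, and the summary is assumed to have been filled in by an OBU parser that is not shown; the value 0 for KEY_FRAME comes from the AV1 Specifications.

```python
from dataclasses import dataclass
from typing import Optional

KEY_FRAME = 0  # [frame_type] value defined in the AV1 Specifications


@dataclass
class BlockSummary:
    """What a muxer needs to know about the OBUs of one SimpleBlock."""
    has_sequence_header: bool
    # [frame_type] of the first Frame OBU, or None if there is no Frame OBU.
    first_frame_type: Optional[int]


def may_mark_keyframe(summary: BlockSummary) -> bool:
    """True only if none of the three MUST NOT conditions applies."""
    return (
        summary.first_frame_type is not None
        and summary.first_frame_type == KEY_FRAME
        and summary.has_sequence_header
    )
```

Note that a Block starting with a KEY_FRAME but lacking a Sequence Header OBU still fails the test.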
A Block inside a BlockGroup MUST use ReferenceBlock elements if the first Frame OBU in the Block has a [frame_type] other than KEY_FRAME. A Block inside a BlockGroup MUST use ReferenceBlock elements if the Block doesn't contain a Sequence Header OBU.
A Block with [frame_header_obu] where the [frame_type] is INTRA_ONLY_FRAME MUST use a ReferenceBlock with a value of 0 to reference itself. This way it cannot be mistaken for a Random Access Point.
ReferenceBlocks inside a BlockGroup MUST reference frames in other previous Blocks according to the [ref_frame_idx] values of the frame which is neither a KEY_FRAME nor an INTRA_ONLY_FRAME.
Note: SimpleBlock and BlockGroup can be used for each type of frame. SimpleBlock is usually preferred if features of the BlockGroup (BlockDuration, BlockAdditions, etc.) are not needed.
The [temporal_point_info] contained in Frame OBUs or Frame Header OBUs SHOULD be discarded.
The Block timestamp is translated from the [PresentationTime] without the [InitialPresentationDelay].
When reconstructing the AV1 bitstream from a Block, a Temporal Delimiter OBU should be prepended to the Block data.
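Reconstructing a Temporal Unit from Block data can be sketched as below. The function name `block_to_temporal_unit` is hypothetical; the two-byte Temporal Delimiter OBU (`12 00`: obu_type 2 with [obu_has_size_field] set, followed by a size of 0) is derived from the OBU header layout in the AV1 Specifications. Leaving the data untouched when it already starts with a Temporal Delimiter OBU is a choice of this example, not a rule from this document.

```python
OBU_TEMPORAL_DELIMITER = 2

# obu_type = 2, obu_has_size_field = 1, followed by obu_size = 0.
TEMPORAL_DELIMITER = bytes([0x12, 0x00])


def block_to_temporal_unit(block_data: bytes) -> bytes:
    """Prepend a Temporal Delimiter OBU unless one is already present."""
    if block_data and ((block_data[0] >> 3) & 0x0F) == OBU_TEMPORAL_DELIMITER:
        return block_data
    return TEMPORAL_DELIMITER + block_data
```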
Matroska restricts the allowed changes within a codec for the whole Segment. Each output frame of a Segment MUST have the same pixel dimensions (PixelWidth and PixelHeight).
An AV1 Track has the same requirements as the CVS: the contents of [sequence_header_obu] must be bit-identical for all the Sequence Header OBUs found in the Blocks except for the contents of [operating_parameters_info] which can vary.
Matroska uses CuePoints for seeking. Each Block can be referenced in the Cues but in practice it's better to only seek to proper Random Access Points of the codec. It means only SimpleBlock marked as Keyframe and BlockGroup with no ReferenceBlock SHOULD be referenced in the Cues.
The Encryption scheme is similar to the one used for WebM, using the ContentEncryption elements and extra ContentEncAESSettings and AESSettingsCipherMode elements. Only the Subsample Encrypted Block Format mode SHOULD be used when encryption is needed. It is similar to the Common Encryption subsample pattern encryption scheme cens, using partial encryption of subsamples with AES-CTR cipher mode. More details on the WebM encryption system can be found at https://www.webmproject.org/docs/webm-encryption/.
Protected Blocks MUST be exactly spanned by one or more contiguous partitions.
- An OBU MAY be spanned by one or more partitions, especially when it has multiple ranges of protected data. However, writers SHOULD reduce the number of partitions as much as possible. This can be achieved by using a partition that spans multiple consecutive unprotected OBUs as well as the first unprotected part of the following protected OBU, if such a protected OBU exists.
- A large subsample that is larger than the maximum size of a single partition (stored as a 32-bit integer) MAY be spanned over multiple partitions separated by a zero-size partition, since the partitions alternate between protected and unprotected.
Within a protected Block, the following constraints apply to all the OBUs it contains:
- All [obu_header] structures and associated [obu_size] fields MUST NOT be encrypted.
- OBUs of type OBU_TEMPORAL_DELIMITER, OBU_SEQUENCE_HEADER, OBU_FRAME_HEADER (including within an OBU_FRAME), OBU_REDUNDANT_FRAME_HEADER and OBU_PADDING MUST NOT be encrypted.
- OBUs of type OBU_METADATA MAY be encrypted.
- OBUs of type OBU_FRAME and OBU_TILE_GROUP are partially encrypted. Within such OBUs, the following applies:
  - Encrypted partitions MUST be a multiple of 16 bytes.
  - An encrypted partition MUST be created for each tile whose [decode_tile] structure size (including any trailing bits) is larger than or equal to 16 bytes. Smaller [decode_tile] structures MUST NOT be encrypted.
  - Encrypted partitions MUST end on the last byte of the [decode_tile] structure (including any trailing bits).
  - Encrypted partitions MUST span all complete 16-byte blocks of the [decode_tile] structure (including any trailing bits).
  - Bytes at the beginning of the [decode_tile] that do not fit in the 16-byte encrypted partitions SHOULD be added to the preceding unprotected partition. As a result, the encrypted partitions might not start at the first byte of the [decode_tile] structure, but some number of bytes following that.
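The tile rules above reduce to simple arithmetic on the size of each [decode_tile] structure. In the sketch below, `tile_partition_sizes` is a hypothetical helper name; it only computes sizes and does not perform any encryption.

```python
AES_BLOCK_SIZE = 16


def tile_partition_sizes(tile_size: int) -> tuple:
    """Return (clear_bytes, encrypted_bytes) for one [decode_tile] structure.

    tile_size includes any trailing bits. The clear bytes sit at the start
    of the tile and belong to the preceding unprotected partition; the
    encrypted bytes run to the last byte of the tile.
    """
    if tile_size < 0:
        raise ValueError("tile_size must not be negative")
    if tile_size < AES_BLOCK_SIZE:
        return tile_size, 0  # too small: MUST NOT be encrypted
    clear = tile_size % AES_BLOCK_SIZE
    return clear, tile_size - clear
```

For example, a 37-byte tile leaves its first 5 bytes in the clear and encrypts the remaining 32, so the encrypted partition covers both complete 16-byte blocks and ends on the last byte of the tile.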
The elements described in the main TrackEntry section are vital for correct playback. Here we present a list of elements found in a TrackEntry that SHOULD also be mapped when possible.
The following TrackEntry values SHOULD be extracted from the CVS Sequence Header OBU, i.e. the bits common to all Sequence Header OBUs in the CVS.
EBML Path: \Segment\Tracks\TrackEntry\DefaultDuration | Mandatory: No
The DefaultDuration MAY be used if [timing_info_present_flag] and [equal_picture_interval] are set to 1.
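This document does not spell out how the value is computed. The sketch below shows one plausible derivation, assuming the AV1 [timing_info] semantics (a display tick lasts [num_units_in_display_tick] / [time_scale] seconds and each picture lasts [num_ticks_per_picture_minus_1] + 1 ticks) and that DefaultDuration is expressed in nanoseconds as in Matroska. The function name `default_duration_ns` is hypothetical.

```python
from fractions import Fraction
from typing import Optional


def default_duration_ns(
    timing_info_present_flag: int,
    equal_picture_interval: int,
    num_units_in_display_tick: int,
    time_scale: int,
    num_ticks_per_picture_minus_1: int,
) -> Optional[int]:
    """Return a DefaultDuration in nanoseconds, or None if it cannot be set."""
    if not (timing_info_present_flag and equal_picture_interval):
        return None
    if time_scale <= 0:
        raise ValueError("time_scale must be greater than 0")
    ticks_per_picture = num_ticks_per_picture_minus_1 + 1
    # Exact rational arithmetic avoids floating-point rounding surprises.
    seconds = Fraction(num_units_in_display_tick * ticks_per_picture, time_scale)
    return round(seconds * 1_000_000_000)
```

With a tick of 1001/30000 seconds and one tick per picture, this yields 33366667 ns.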
EBML Path: \Segment\Tracks\TrackEntry\Video\DisplayWidth | Mandatory: No
If custom aspect ratio and crop values are not needed and the DisplayUnit is in pixels, the DisplayWidth SHOULD be [render_width_minus_1]+1 if [render_and_frame_size_different] is 1 and [max_frame_width_minus_1]+1 otherwise.
Note: in Matroska the DisplayWidth doesn't have to be written if it's the same value as the PixelWidth.
EBML Path: \Segment\Tracks\TrackEntry\Video\DisplayHeight | Mandatory: No
If custom aspect ratio and crop values are not needed and the DisplayUnit is in pixels, the DisplayHeight SHOULD be [render_height_minus_1]+1 if [render_and_frame_size_different] is 1 and [max_frame_height_minus_1]+1 otherwise.
Note: in Matroska the DisplayHeight doesn't have to be written if it's the same value as the PixelHeight.
EBML Path: \Segment\Tracks\TrackEntry\Video\Colour\Range | Mandatory: No
The Range corresponds to the [color_range].
- 0 (Studio) in AV1 corresponds to 1 in Matroska
- 1 (Full) in AV1 corresponds to 2 in Matroska
EBML Path: \Segment\Tracks\TrackEntry\Video\Colour\BitsPerChannel | Mandatory: No
The BitsPerChannel corresponds to the [BitDepth].
EBML Path: \Segment\Tracks\TrackEntry\Video\Colour\MatrixCoefficients | Mandatory: No
The MatrixCoefficients corresponds to the [matrix_coefficients]. Some values might not map correctly to values found in Matroska.
EBML Path: \Segment\Tracks\TrackEntry\Video\Colour\ChromaSitingHorz | Mandatory: No
ChromaSitingHorz is deduced from [chroma_sample_position]:
- 0 ([CSP_UNKNOWN]) in AV1 corresponds to 0 in Matroska
- 1 ([CSP_VERTICAL]) in AV1 corresponds to 1 in Matroska
- 2 ([CSP_COLOCATED]) in AV1 corresponds to 1 in Matroska
- 3 ([CSP_RESERVED]) in AV1: ChromaSitingHorz MUST NOT be written
EBML Path: \Segment\Tracks\TrackEntry\Video\Colour\ChromaSitingVert | Mandatory: No
ChromaSitingVert is deduced from [chroma_sample_position]:
- 0 ([CSP_UNKNOWN]) in AV1 corresponds to 0 in Matroska
- 1 ([CSP_VERTICAL]) in AV1 corresponds to 2 in Matroska
- 2 ([CSP_COLOCATED]) in AV1 corresponds to 1 in Matroska
- 3 ([CSP_RESERVED]) in AV1: ChromaSitingVert MUST NOT be written
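The two mappings above can be held in a single lookup table. The names `CHROMA_SITING` and `chroma_siting` are hypothetical; the values are exactly those listed above.

```python
from typing import Optional, Tuple

# [chroma_sample_position] -> (ChromaSitingHorz, ChromaSitingVert)
CHROMA_SITING = {
    0: (0, 0),  # CSP_UNKNOWN
    1: (1, 2),  # CSP_VERTICAL
    2: (1, 1),  # CSP_COLOCATED
}


def chroma_siting(chroma_sample_position: int) -> Optional[Tuple[int, int]]:
    """Return the Matroska values, or None when neither element may be written."""
    return CHROMA_SITING.get(chroma_sample_position)
```

Returning None for 3 (CSP_RESERVED) makes the "MUST NOT be written" case explicit for the caller.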
EBML Path: \Segment\Tracks\TrackEntry\Video\Colour\TransferCharacteristics | Mandatory: No
The TransferCharacteristics corresponds to the [transfer_characteristics].
EBML Path: \Segment\Tracks\TrackEntry\Video\Colour\Primaries | Mandatory: No
The Primaries corresponds to the [color_primaries]. Some values might not map correctly to values found in Matroska.
The following TrackEntry values SHOULD be extracted from the Metadata OBUs. They SHOULD NOT be set if the values vary across the entire CVS.
EBML Path: \Segment\Tracks\TrackEntry\Video\Colour\MaxCLL | Mandatory: No
The MaxCLL corresponds to [max_cll] of the Metadata OBU of type METADATA_TYPE_HDR_CLL.
EBML Path: \Segment\Tracks\TrackEntry\Video\Colour\MaxFALL | Mandatory: No
The MaxFALL corresponds to [max_fall] of the Metadata OBU of type METADATA_TYPE_HDR_CLL.
EBML Path: \Segment\Tracks\TrackEntry\Video\Colour\MasteringMetadata\PrimaryRChromaticityX | Mandatory: No
The PrimaryRChromaticityX corresponds to [primary_chromaticity_x[0]] of the Metadata OBU of type METADATA_TYPE_HDR_MDCV.
EBML Path: \Segment\Tracks\TrackEntry\Video\Colour\MasteringMetadata\PrimaryRChromaticityY | Mandatory: No
The PrimaryRChromaticityY corresponds to [primary_chromaticity_y[0]] of the Metadata OBU of type METADATA_TYPE_HDR_MDCV.
EBML Path: \Segment\Tracks\TrackEntry\Video\Colour\MasteringMetadata\PrimaryGChromaticityX | Mandatory: No
The PrimaryGChromaticityX corresponds to [primary_chromaticity_x[1]] of the Metadata OBU of type METADATA_TYPE_HDR_MDCV.
EBML Path: \Segment\Tracks\TrackEntry\Video\Colour\MasteringMetadata\PrimaryGChromaticityY | Mandatory: No
The PrimaryGChromaticityY corresponds to [primary_chromaticity_y[1]] of the Metadata OBU of type METADATA_TYPE_HDR_MDCV.
EBML Path: \Segment\Tracks\TrackEntry\Video\Colour\MasteringMetadata\PrimaryBChromaticityX | Mandatory: No
The PrimaryBChromaticityX corresponds to [primary_chromaticity_x[2]] of the Metadata OBU of type METADATA_TYPE_HDR_MDCV.
EBML Path: \Segment\Tracks\TrackEntry\Video\Colour\MasteringMetadata\PrimaryBChromaticityY | Mandatory: No
The PrimaryBChromaticityY corresponds to [primary_chromaticity_y[2]] of the Metadata OBU of type METADATA_TYPE_HDR_MDCV.
EBML Path: \Segment\Tracks\TrackEntry\Video\Colour\MasteringMetadata\WhitePointChromaticityX | Mandatory: No
The WhitePointChromaticityX corresponds to [white_point_chromaticity_x] of the Metadata OBU of type METADATA_TYPE_HDR_MDCV.
EBML Path: \Segment\Tracks\TrackEntry\Video\Colour\MasteringMetadata\WhitePointChromaticityY | Mandatory: No
The WhitePointChromaticityY corresponds to [white_point_chromaticity_y] of the Metadata OBU of type METADATA_TYPE_HDR_MDCV.
EBML Path: \Segment\Tracks\TrackEntry\Video\Colour\MasteringMetadata\LuminanceMin | Mandatory: No
The LuminanceMin corresponds to [luminance_min] of the Metadata OBU of type METADATA_TYPE_HDR_MDCV.
EBML Path: \Segment\Tracks\TrackEntry\Video\Colour\MasteringMetadata\LuminanceMax | Mandatory: No
The LuminanceMax corresponds to [luminance_max] of the Metadata OBU of type METADATA_TYPE_HDR_MDCV.
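The MasteringMetadata elements are stored as floating-point values in Matroska, while the AV1 Metadata OBU carries fixed-point integers, so a conversion is needed. This document does not describe it; the sketch below assumes the fixed-point formats given in the AV1 Specifications (0.16 for chromaticities, 24.8 for [luminance_max], 18.14 for [luminance_min]). The function names are hypothetical.

```python
def chromaticity_to_float(value: int) -> float:
    """Convert a 0.16 fixed-point chromaticity coordinate to a float."""
    return value / (1 << 16)


def luminance_max_to_float(value: int) -> float:
    """Convert the 24.8 fixed-point [luminance_max] to cd/m2."""
    return value / (1 << 8)


def luminance_min_to_float(value: int) -> float:
    """Convert the 18.14 fixed-point [luminance_min] to cd/m2."""
    return value / (1 << 14)
```

For example, a [luminance_max] of 256000 maps to a LuminanceMax of 1000.0 under these assumptions.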
Official PDF: https://aomediacodec.github.io/av1-spec/av1-spec.pdf
IETF draft: https://tools.ietf.org/html/draft-ietf-cellar-matroska
Original Specifications: https://www.matroska.org/technical/elements.html
AV1 Codec ISO Media File Format Binding: https://aomediacodec.github.io/av1-isobmff/
Official Specification based on the Matroska specifications: https://www.webmproject.org/docs/container/
The WebM encryption documentation: https://www.webmproject.org/docs/webm-encryption/
This is version 1 of this document.