
Shared Memory

Mapping Shared Memory

Before playback can be initiated, the shared memory region must be mapped; see Rialto Application Session Management#ApplicationManagement.

Shared Memory Layout

The shared memory buffer shall be split into two regions for each playback session: one for the video stream and one for audio. Only one concurrent audio track is supported; to select a different audio track, the previous one must be removed first. Each region must be large enough to accommodate the largest possible frame of audio/video data plus its associated decryption parameters and metadata. At the end of the buffer there is also a separate section for Web Audio regions, common to all playback sessions; there can be zero or more Web Audio regions per Rialto Client. The buffer shall initially be sized at 8 MB per playback session (to allow some overhead) plus 10 KB per Web Audio region.

For apps that support more than one concurrent playback session, the shared memory buffer shall be sized accordingly and partitioned into a separate logical area for each session. The partitions need not be equally sized; for example, if an app supports one UHD and one HD playback, the HD partition may be smaller.
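The sizing rule above (8 MB per AV playback session plus 10 KB per Web Audio region) can be sketched as follows; the constant and function names are hypothetical, chosen only for illustration:

```cpp
#include <cstddef>

// Hypothetical sizing helper following the rule above:
// 8 MB per AV playback session plus 10 KB per Web Audio region.
constexpr std::size_t kAvSessionBytes = 8 * 1024 * 1024; // 8 MB
constexpr std::size_t kWebAudioBytes  = 10 * 1024;       // 10 KB

constexpr std::size_t sharedMemorySize(std::size_t maxPlaybackSessions,
                                       std::size_t maxWebAudioSessions)
{
    return maxPlaybackSessions * kAvSessionBytes +
           maxWebAudioSessions * kWebAudioBytes;
}
```

A single-session client with no Web Audio would therefore map an 8 MB buffer.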


Shared memory partitioning for multiple playbacks


Note that the application is not directly aware of the layout of the shared memory region, or even of its existence; this is all managed internally by Rialto.

Metadata format

The following defines the format of the metadata (i.e. data about each AV frame) stored in the shm buffer. Note that all offsets are relative to the base of the shm region.

Metadata Format Version 1

V1 metadata uses a fixed byte format similar to the AVBus specification but with the following changes:

  • Removed is_timeposition_accurate as always set to true
  • Removed language as not used
  • Removed stream_type as not used
  • Moved decryption info before variable length fields
  • Added Media Keys ID field for multi-CDM instance support


Parameter                  Size
Shared memory buffer       8 MB
Max frames to request      24
Metadata size per frame    ~100 bytes
Metadata region size       ~2.4 KB
Video frame region size    7 MB
Max video frame size       7 MB - 2.4 KB
Audio frame region size    1 MB
Max audio frame size       1 MB - 2.4 KB
Web Audio region size      -


The following diagram shows schematically how the shared memory buffer is partitioned into two regions for a playback session, for audio and video data respectively. Within each region the metadata for each frame is written sequentially from the beginning and the media data is stored in the remainder of the region, the offset & length of each frame being specified in the metadata.

Shared memory buffer layout for AV stream


The metadata regions have the format shown in the diagram below: a 4-byte version field followed by concatenated metadata structures for each media frame, in the format specified in Metadata format below. The version field specifies the version of the metadata structure written to the buffer; this allows different versions of the Rialto client (which may write different metadata formats) to interoperate with the same version of the Rialto server. The Rialto server only understands a certain set of metadata format versions, so only compatible client versions should be used. If the Rialto server sees an unsupported version number it should raise an error and fail streaming.
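The version check described above can be sketched as follows; the list of supported versions and the function name are assumptions for illustration:

```cpp
#include <cstdint>
#include <cstring>

// Sketch: read the 4-byte version field at the start of a metadata region
// and reject versions this server build does not understand.
// kSupportedVersions is a hypothetical build-time list.
constexpr std::uint32_t kSupportedVersions[] = {1, 2};

inline bool isMetadataVersionSupported(const std::uint8_t* metadataRegion)
{
    std::uint32_t version = 0;
    std::memcpy(&version, metadataRegion, sizeof(version)); // avoid unaligned reads
    for (std::uint32_t supported : kSupportedVersions)
        if (version == supported)
            return true;
    return false; // caller should raise an error and fail streaming
}
```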

Metadata format v1


Frame metadata is stored in the shm region in the following fixed format; undefined values should be set to 0.

    uint32_t                        offset;                              /* Offset of first byte of sample in shm buffer */
    uint32_t                        length;                              /* Number of bytes in sample */
    int64_t                         time_position;                       /* Position in stream in nano-seconds */
    int64_t                         sample_duration;                     /* Frame/sample duration in ns */
    uint32_t                        stream_id;                           /* stream id (unique ID for ES, as defined in attachSource()) */
    uint32_t                        extra_data_size;                     /* extraData size */
    uint8_t                         extra_data[32];                      /* buffer containing extradata */

    uint32_t                        media_keys_id;                       /* Identifier of MediaKeys instance to use for decryption. If 0 use any CDM containing the MKS ID */
    uint32_t                        media_key_session_identifier_offset; /* Offset to the location of the MediaKeySessionIdentifier */
    uint32_t                        media_key_session_identifier_length; /* Length of the MediaKeySessionIdentifier */
    uint32_t                        init_vector_offset;                  /* Offset to the location of the initialization vector */
    uint32_t                        init_vector_length;                  /* Length of initialization vector */
    uint32_t                        sub_sample_info_offset;              /* Offset to the location of the sub sample info table */
    uint32_t                        sub_sample_info_len;                 /* Length of sub-sample Info table */
    uint32_t                        init_with_last_15;

    if (IS_AUDIO(stream_id))
    {
        uint32_t                    sample_rate;                         /* Samples per second */
        uint32_t                    channels_num;                        /* Number of channels */
    }
    else if (IS_VIDEO(stream_id))
    { 
        uint32_t                    width;                               /* Video width in pixels */
        uint32_t                    height;                              /* Video height in pixels */
    }
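The fixed layout above (audio variant) can be sketched as a plain struct; zeroing the whole structure first satisfies the rule that undefined values are set to 0. This is an illustration of the listing above, not the authoritative Rialto definition:

```cpp
#include <cstdint>
#include <cstring>

// Sketch of the fixed-layout V1 frame metadata (audio variant); field
// order and widths follow the listing above.
struct V1AudioFrameMetadata
{
    std::uint32_t offset;
    std::uint32_t length;
    std::int64_t  time_position;
    std::int64_t  sample_duration;
    std::uint32_t stream_id;
    std::uint32_t extra_data_size;
    std::uint8_t  extra_data[32];
    std::uint32_t media_keys_id;
    std::uint32_t media_key_session_identifier_offset;
    std::uint32_t media_key_session_identifier_length;
    std::uint32_t init_vector_offset;
    std::uint32_t init_vector_length;
    std::uint32_t sub_sample_info_offset;
    std::uint32_t sub_sample_info_len;
    std::uint32_t init_with_last_15;
    std::uint32_t sample_rate;
    std::uint32_t channels_num;
};

// Undefined values should be set to 0, so zero the whole struct first.
inline V1AudioFrameMetadata makeAudioMetadata(std::uint32_t shmOffset,
                                              std::uint32_t frameLen)
{
    V1AudioFrameMetadata m;
    std::memset(&m, 0, sizeof(m));
    m.offset = shmOffset;
    m.length = frameLen;
    return m;
}
```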

Metadata Format Version 2

Parameter                  Size
Shared memory buffer       8 MB * max_playback_sessions + 10 KB * max_web_audio_playback_sessions
Max frames to request      24
Metadata size per frame    Clear: variable but <100 bytes; Encrypted: TODO
Video frame region size    7 MB
Max video frame size       TODO
Audio frame region size    1 MB
Max audio frame size       TODO
Web Audio region size      10 KB


The following diagram shows schematically how the shared memory buffer is partitioned into two regions for a playback session, for audio and video data respectively. Within each region there is a 4-byte version field indicating v2 metadata, followed by concatenated metadata/frame pairs.

Shared memory buffer layout v2


V2 metadata uses protobuf to serialise the frames' properties to the shared memory buffer. This use of protobuf aligns with the IPC protocol but also allows support for optional fields and for fields to be added and removed without causing backward/forward compatibility issues. It also supports variable length fields so the MKS ID, IV & sub-sample information can all be directly encoded in the metadata, avoiding the complexities of interleaving them with the media frames and referencing them with offsets/lengths as used in the V1 metadata format.

enum SegmentAlignment {
    ALIGNMENT_UNDEFINED = 0;
    ALIGNMENT_NAL       = 1;
    ALIGNMENT_AU        = 2;
}

enum CipherMode {
    CIPHER_MODE_UNDEFINED = 0;
    CIPHER_MODE_CENC      = 1; /* AES-CTR scheme */
    CIPHER_MODE_CBC1      = 2; /* AES-CBC scheme */
    CIPHER_MODE_CENS      = 3; /* AES-CTR subsample pattern encryption scheme */
    CIPHER_MODE_CBCS      = 4; /* AES-CBC subsample pattern encryption scheme */
}

message MediaSegmentMetadata {
    optional uint32                 length               = 1;             /* Number of bytes in sample */
    optional sint64                 time_position        = 2;             /* Position in stream in nanoseconds */
    optional sint64                 sample_duration      = 3;             /* Frame/sample duration in nanoseconds */
    optional uint32                 stream_id            = 4;             /* stream id (unique ID for ES, as defined in attachSource()) */
    optional uint32                 sample_rate          = 5;             /* Samples per second for audio segments */
    optional uint32                 channels_num         = 6;             /* Number of channels for audio segments */
    optional uint32                 width                = 7;             /* Frame width in pixels for video segments */
    optional uint32                 height               = 8;             /* Frame height in pixels for video segments */
    optional SegmentAlignment       segment_alignment    = 9;             /* Segment alignment can be specified for H264/H265, will use NAL if not set */
    optional bytes                  extra_data           = 10;            /* Buffer containing extradata */
    optional bytes                  media_key_session_id = 11;            /* Buffer containing key session ID to use for decryption */
    optional bytes                  key_id               = 12;            /* Buffer containing Key ID to use for decryption */
    optional bytes                  init_vector          = 13;            /* Buffer containing the initialization vector for decryption */
    optional uint32                 init_with_last_15    = 14;            /* initWithLast15 value for decryption */
    repeated SubsamplePair          sub_sample_info      = 15;            /* If present, use gather/scatter decryption based on this list of clear/encrypted byte lengths. */
                                                                          /* If not present and content is encrypted then entire media segment needs decryption (unless    */
                                                                          /* cipher_mode indicates pattern encryption in which case crypt/skip byte block value specify    */
                                                                          /* the encryption pattern)                                                                       */
    optional bytes                  codec_data           = 16;            /* Buffer containing updated codec data for video segments */
    optional CipherMode             cipher_mode          = 17;            /* Block cipher mode of operation when common encryption used */
    optional uint32                 crypt_byte_block     = 18;            /* Crypt byte block value for CBCS cipher mode pattern */
    optional uint32                 skip_byte_block      = 19;            /* Skip byte block value for CBCS cipher mode pattern */
    optional Fraction               frame_rate           = 20;            /* Fractional frame rate of the video segments */
}

message SubsamplePair
{
    optional uint32                 num_clear_bytes      = 1;             /* How many of the next bytes in sequence are clear */
    optional uint32                 num_encrypted_bytes  = 2;             /* How many of the next bytes in sequence are encrypted */
}
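Writing a v2 metadata/frame pair into a region amounts to appending the serialised MediaSegmentMetadata bytes followed by the media bytes. The sketch below treats the serialised metadata as an opaque byte span (in the real implementation it comes from the protobuf library); the 4-byte length prefix is an assumption about the framing, used here only to show the concatenated-pairs idea:

```cpp
#include <cstdint>
#include <cstring>
#include <vector>

// Sketch: append one [metadata size][metadata bytes][frame bytes] record
// to a region buffer. Returns the new end offset of the region.
inline std::size_t appendSegment(std::vector<std::uint8_t>& region,
                                 const std::vector<std::uint8_t>& metadata,
                                 const std::vector<std::uint8_t>& frame)
{
    const std::uint32_t metaLen = static_cast<std::uint32_t>(metadata.size());
    const std::uint8_t* p = reinterpret_cast<const std::uint8_t*>(&metaLen);
    region.insert(region.end(), p, p + sizeof(metaLen));      // length prefix
    region.insert(region.end(), metadata.begin(), metadata.end());
    region.insert(region.end(), frame.begin(), frame.end());
    return region.size();
}
```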


Playback Control

Rialto interactions with Client & GStreamer

Start/Resume Playback


Pause Playback


Stop


Set Playback Rate


Cobalt Integration

Play/Pause/Set speed


Render Frame (Video Peek)

Render frame may be called whilst playback is paused, either at the start of playback or immediately after a seek operation. The client must wait for the readyToRenderFrame() callback first.
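The gating rule above can be sketched as follows; readyToRenderFrame() is named in the text, while the surrounding class and tryRenderFrame() are hypothetical:

```cpp
// Sketch: gate the render-frame request on the readyToRenderFrame()
// callback having fired.
class VideoPeek
{
public:
    void readyToRenderFrame() { m_ready = true; }  // Rialto callback

    bool tryRenderFrame()
    {
        if (!m_ready)
            return false;  // must wait for readyToRenderFrame() first
        // ... issue the render-frame request to Rialto here ...
        return true;
    }

private:
    bool m_ready{false};
};
```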

Media data pipeline

Note that the data pipelines for different data sources (e.g. audio & video) should operate entirely independently. Rialto should:

  • attempt to keep the shm buffer as full as possible by requesting a refill for a source whenever that source's memory buffer is empty
  • attempt to push all available frames for a source to GStreamer, i.e. push until GStreamer indicates that it can accept no more data
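The two per-source rules above can be sketched as a simple pump; all names here are hypothetical placeholders for the real Rialto internals:

```cpp
#include <functional>

// Sketch of a per-source pump: request a refill while the source's shm
// partition has room, and push frames until GStreamer refuses more.
struct SourcePump
{
    std::function<bool()> partitionHasSpace; // shm partition can take more data
    std::function<void()> requestMoreData;   // ask the client for a refill
    std::function<bool()> pushOneFrame;      // returns false when the sink is full
};

inline void pump(SourcePump& s)
{
    if (s.partitionHasSpace())
        s.requestMoreData(); // keep the shm buffer as full as possible
    while (s.pushOneFrame())
        ;                    // push until GStreamer accepts no more
}
```

Each attached source would run its own independent instance of such a pump.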

Cobalt to GStreamer



Netflix to Rialto Client



Cobalt to Rialto Client


GStreamer Client to Rialto Server

Note: Due to the common APIs on the client and server, the parameters must be used slightly differently depending on whether the app is running in a client process or directly on the Rialto server, as shown in the following two diagrams. In client-server mode the shared memory buffer is refilled as follows:



1. The Rialto server notifies the client that a refill is required. sourceId should match that specified in the attachSource() call for the A/V data stream; needDataRequestId must be a unique ID for this playback session.
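One simple way to guarantee a unique needDataRequestId per playback session is a monotonically increasing counter, sketched below; the struct and class names are hypothetical:

```cpp
#include <atomic>
#include <cstdint>

// Sketch: each need-data notification carries the sourceId from
// attachSource() and a request ID unique within the playback session.
struct NeedDataRequest
{
    std::int32_t  sourceId;
    std::uint32_t needDataRequestId;
};

class NeedDataRequestIdGenerator
{
public:
    NeedDataRequest next(std::int32_t sourceId)
    {
        return {sourceId, m_nextId.fetch_add(1)};
    }

private:
    std::atomic<std::uint32_t> m_nextId{1}; // per-session counter
};
```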


Media data flows as follows when running in server only mode:


The code for populating the shm buffer from the parameters to addSegment() will be common to the client and server sides, so it should be stored in a common location and used by both implementations.


See also Rialto Client MSE Player Session Streaming State Machine for some additional clarity on how the Rialto client should manage the flow of data in particular regard to seek operations.

Rialto Server to GStreamer

This algorithm should be run for all attached sources. A haveData() call in the above sequence can restart the algorithm when it previously stopped due to data exhaustion.



Frames are decrypted in the pipeline when they are pulled for playback.


Playback State

Position Reporting

The position reporting timer should be started whenever the PLAYING state is entered and stopped whenever the session moves to another playback state, i.e. stop polling during IDLE, BUFFERING, SEEKING etc.
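The start/stop rule above reduces to "the timer runs if and only if the state is PLAYING", sketched below; the enum values follow the states named in the text, while the class and timer representation are hypothetical:

```cpp
// Sketch: poll playback position only while in the PLAYING state.
enum class PlaybackState { IDLE, BUFFERING, SEEKING, PLAYING, PAUSED };

class PositionReporter
{
public:
    void onStateChanged(PlaybackState state)
    {
        // Start the timer on entering PLAYING, stop it on leaving.
        m_timerRunning = (state == PlaybackState::PLAYING);
    }

    bool timerRunning() const { return m_timerRunning; }

private:
    bool m_timerRunning{false};
};
```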



End of stream

Underflow


