CN112839242A

CN112839242A - Method for packaging audio/video media file

Info

Publication number: CN112839242A
Application number: CN202011620928.1A
Authority: CN
Inventors: 张雷鸣; 姚亮; 常吕伦
Original assignee: Sichuan Changhong Network Technology Co Ltd
Current assignee: Sichuan Changhong Network Technology Co Ltd
Priority date: 2020-12-31
Filing date: 2020-12-31
Publication date: 2021-05-25
Anticipated expiration: 2040-12-31
Also published as: CN112839242B

Abstract

The invention relates to the field of multimedia playback, and in particular to a method for encapsulating audio/video media files. The method splits each audio/video ES frame into fixed-size data packets, numbers the data packets in sequence, and adds a fixed-size header to each data packet to form a packet, so that every packet carries the necessary audio/video information. A file composed of such packets solves the problem that audio/video cannot be played when header key information is lost or damaged by an abnormal power failure, or when key description information is lost during network transmission.

Description

Method for packaging audio/video media file

Technical Field

The invention relates to the field of multimedia playback, and in particular to a method for encapsulating audio/video media files.

Background

In recent years, a large number of network cameras have been developed. These cameras need to store real-time video and audio recordings, which places higher demands on how the recordings are stored.

Conventional media container formats, such as MP4, TS, AVI, RA, RM, RMVB, MPEG, DivX, 3GP and FLV, each have drawbacks: some consume too much CPU; some cannot quickly locate a user-specified time point when the user reviews a recording; some do not meet the requirements of network streaming; and some can have critical header information destroyed by an abnormal power failure during recording and storage, which makes the whole file unplayable.

Therefore, a media format is needed that can quickly locate a specific time, supports streaming, can be played even if the header is missing, uses few resources, and is suitable for embedded use.

Disclosure of Invention

The technical problem solved by the invention is as follows: a method for packaging audio/video media files is provided that solves the problem that audio/video cannot be played when header key information is lost or damaged by an abnormal power failure, or when key description information is lost during network transmission.

The technical solution adopted by the invention to solve this problem is a method for packaging audio/video media files, comprising the following steps:

S01, splitting each audio/video frame into fixed-byte data packets and numbering them in sequence;

S02, adding a fixed-byte header containing the number to each data packet to form a packet;

S03, arranging the packets in number order to form an encapsulation frame;

S04, combining the encapsulation frames into a file in the time order of the frames.

Further, in S01, when the audio/video ES frame data is smaller than the fixed byte size, the missing portion is filled with 0 to form a data packet; when the audio/video ES frame exceeds the fixed byte size, the frame is split into a plurality of data packets, and the missing portion of the last data packet is filled with 0.

Further, in S02, the fixed-byte header includes a sync word, and the sync word is used to detect the start and end of a packet.

Further, in S02, the fixed-length header includes re-framing information, and the re-framing information is used to restore the split audio/video ES frame.

Further, in S02, the fixed-length header includes time, encryption flag, reserve_1, reserve_2, payload type, coding type, audio sampling rate, audio channels, number of audio sample bits, valid data length of the current packet, key frame flag, key frame offset in packets, and total number of packets in the frame to which the data of the current packet belongs.

The beneficial effects of the invention are as follows: the method splits each audio/video ES frame into fixed-byte data packets, numbers them in sequence, and adds a fixed-byte header to each data packet to form a packet, so that every packet carries the necessary audio/video information. A file composed of such packets solves the problem that audio/video cannot be played when header key information is lost or damaged by an abnormal power failure, or when key description information is lost during network transmission.

Drawings

Fig. 1 is a flowchart of an embodiment of a method for packaging an audio/video media file according to the present invention.

Fig. 2 is a schematic diagram of a packaged media file according to an embodiment of the method for packaging audio/video media files of the present invention.

Detailed Description

The invention provides a method for packaging audio/video media files. The method splits each audio/video ES frame into fixed-byte data packets, numbers them in sequence, and adds a fixed-byte header to each data packet to form a packet, so that every packet carries the necessary audio/video information. A file composed of such packets solves the problem that audio/video cannot be played when header key information is lost or damaged by an abnormal power failure, or when key description information is lost during network transmission.

Example:

An embodiment of the invention is described below, in which each data packet is fixed at 320 bytes and each packet header is fixed at 16 bytes, as shown in Fig. 1:

The first step: split each audio/video frame into 320-byte data packets and number the data packets in sequence. Specifically, divide the number of data bytes in the frame by 320. If the remainder is 0, the quotient is the total number of packets in the frame; otherwise the total is the quotient plus 1, and the remainder is the valid data length of the last packet of the frame. The packets are numbered in sequence, starting from 0.
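The arithmetic of the first step can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `split_frame` and the tuple layout it returns are hypothetical, while the 320-byte size, the zero padding and the numbering from 0 come from the text.

```python
PAYLOAD_SIZE = 320  # fixed data-packet size used in the embodiment


def split_frame(frame: bytes, payload_size: int = PAYLOAD_SIZE):
    """Split one ES frame into zero-padded, fixed-size data packets.

    Returns a list of (number, valid_length, payload) tuples.
    The name and return shape are illustrative assumptions.
    """
    if not frame:
        raise ValueError("frame must not be empty")
    quotient, remainder = divmod(len(frame), payload_size)
    # Quotient if the frame divides evenly, otherwise quotient + 1.
    pack_count = quotient if remainder == 0 else quotient + 1
    packets = []
    for number in range(pack_count):  # numbering starts from 0
        chunk = frame[number * payload_size:(number + 1) * payload_size]
        valid_length = len(chunk)  # equals the remainder for a short last packet
        payload = chunk.ljust(payload_size, b"\x00")  # fill the rest with 0
        packets.append((number, valid_length, payload))
    return packets
```

For a 700-byte frame this yields three data packets with valid lengths 320, 320 and 60, each padded to 320 bytes.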

The second step: add a fixed-byte header containing the number to each data packet to form a packet. Specifically, the header contains sync_byte, version, encryption_type, reserve_1, payload_type, audio_type, video_type, smaple_rate, channels, bits, pack_length, reserve_2, key_frame_flag, last_key_frame_offset, current_pack_number, pack_count, and time_stamp;

sync_byte is an 8-bit sync word, 0x55, used to detect the start of a packet;

version is 3 bits; the initial version is 1, and the version is increased when the format fields are extended;

encryption_type is a 2-bit encryption flag, where 0 means no encryption and 1 means encryption mode 1;

reserve_1 is 3 reserved bits;

payload_type is 1 bit and represents the payload type, where 0 means audio and 1 means video;

audio_type is 3 bits and represents the audio coding type, where 0 means G711A and 1 means AAC;

video_type is 4 bits and represents the video coding type, where 0 means H.264 and 1 means H.265;

smaple_rate is 3 bits and represents the audio sampling rate in Hz, where 0 means 8000, 1 means 16000, 2 means 32000, 3 means 48000, 4 means 64000, and 5 means 96000;

channels is 3 bits and represents the audio channels, where 0 means mono, 1 means stereo, and 2 means 5.1 channels;

bits is 2 bits and represents the number of bits per audio sample, where 0 means 8 bits, 1 means 16 bits, and 2 means 24 bits;

pack_length is 12 bits and represents the valid data length of the current packet, i.e. the remainder from the first step;

reserve_2 is 3 reserved bits;

key_frame_flag is 1 bit and is the key frame flag, where 0 means a non-key frame and 1 means a key frame;

last_key_frame_offset is 16 bits and represents the offset, in packets, of the most recent key frame;

current_pack_number is 16 bits and represents the number of the current packet within the frame, i.e. the number assigned in the first step;

pack_count is 16 bits and represents the total number of packets in the frame to which the payload of the current packet belongs, i.e. the quotient from the first step, or the quotient plus 1;

time_stamp is 32 bits and represents the time in milliseconds;

With this layout, no file header has to be updated after each frame is written during real-time video storage. An abnormal power failure therefore cannot damage a file header and prevent playback, and the file does not become unplayable because key description information is lost during network transmission.
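The field widths listed in the second step add up to 128 bits, which matches the 16-byte header of the embodiment. The sketch below packs and unpacks such a header. The text does not state the bit order or byte order, so the layout used here (fields packed most significant bit first, in the listed order, big-endian) is an assumption, and the names `FIELDS`, `pack_header` and `unpack_header` are hypothetical.

```python
# (name, bit width) in the order the embodiment lists the fields.
# "smaple_rate" keeps the spelling used in the source text.
FIELDS = [
    ("sync_byte", 8), ("version", 3), ("encryption_type", 2), ("reserve_1", 3),
    ("payload_type", 1), ("audio_type", 3), ("video_type", 4),
    ("smaple_rate", 3), ("channels", 3), ("bits", 2),
    ("pack_length", 12), ("reserve_2", 3), ("key_frame_flag", 1),
    ("last_key_frame_offset", 16), ("current_pack_number", 16),
    ("pack_count", 16), ("time_stamp", 32),
]
HEADER_BITS = sum(width for _, width in FIELDS)
assert HEADER_BITS == 128  # 16 bytes, as in the embodiment


def pack_header(**values: int) -> bytes:
    """Pack the fields MSB-first into 16 bytes (assumed bit order)."""
    unknown = set(values) - {name for name, _ in FIELDS}
    if unknown:
        raise ValueError(f"unknown fields: {sorted(unknown)}")
    word = 0
    for name, width in FIELDS:
        value = values.get(name, 0)  # omitted fields default to 0
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name} does not fit in {width} bits")
        word = (word << width) | value
    return word.to_bytes(HEADER_BITS // 8, "big")


def unpack_header(header: bytes) -> dict:
    """Inverse of pack_header under the same assumed layout."""
    if len(header) != HEADER_BITS // 8:
        raise ValueError("header must be exactly 16 bytes")
    word = int.from_bytes(header, "big")
    fields, shift = {}, HEADER_BITS
    for name, width in FIELDS:
        shift -= width
        fields[name] = (word >> shift) & ((1 << width) - 1)
    return fields
```

Under this assumed layout the fields happen to fall on byte boundaries in groups (16, 8, 8, 16, 16, 16, 16 and 32 bits), so a reader could also decode the header with fixed byte offsets.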

The third step: arrange the packets in number order to form an encapsulation frame. Each encapsulation frame corresponds to one audio/video frame and may contain a single packet or a plurality of packets.

The fourth step: combine the encapsulation frames into a file in the order of the corresponding frames. A schematic diagram of the file is shown in Fig. 2.

In this embodiment, a player can jump directly to the position of a key frame by using the offset of the nearest key frame stored in the packet header, instead of searching packet by packet to locate the specified time. The operation is simple and fast, video key frames can be located quickly, and CPU and IO read/write usage is reduced.

Claims

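1. A method for packaging an audio/video media file, characterized by comprising the following steps: S01, splitting each audio/video frame into fixed-byte data packets and numbering them in sequence; S02, adding a fixed-byte header containing the number to each data packet to form a packet; S03, arranging the packets in number order to form an encapsulation frame; S04, combining the encapsulation frames into a file in the time order of the frames.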

2. The method for packaging an audio/video media file according to claim 1, wherein in S01, when the audio/video ES frame data is smaller than the fixed byte size, the missing portion is filled with 0 to form a data packet; when the audio/video ES frame exceeds the fixed byte size, the frame is split into a plurality of data packets, and the missing portion of the last data packet is filled with 0.

3. The method for packaging an audio/video media file according to claim 1, wherein in S02, the fixed-byte header includes a sync word, and the sync word is used to detect the start and end of a packet.

4. The method for packaging an audio/video media file according to any one of claims 1 to 3, wherein in S02, the fixed-length header includes re-framing information, and the re-framing information is used to restore the split audio/video ES frames.

5. The method for packaging an audio/video media file according to any one of claims 1 to 3, wherein in S02, the fixed-length header includes time, encryption flag, reserve_1, reserve_2, payload type, coding type, audio sampling rate, audio channels, number of audio sample bits, valid data length of the current packet, key frame flag, key frame offset in packets, and total number of packets in the frame to which the data of the current packet belongs.