CN113261283A - Video processing method, device and computer readable storage medium - Google Patents


Info

Publication number
CN113261283A
CN113261283A (application number CN201980086119.6A)
Authority
CN
China
Prior art keywords
frame
gop
video
playing
frames
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201980086119.6A
Other languages
Chinese (zh)
Other versions
CN113261283B (en)
Inventor
杨胜凯
刘俊
杨海涛
陈绍林
Current Assignee
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd filed Critical Huawei Technologies Co Ltd
Publication of CN113261283A publication Critical patent/CN113261283A/en
Application granted granted Critical
Publication of CN113261283B publication Critical patent/CN113261283B/en
Status: Active

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/102 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N19/103 Selection of coding mode or of prediction mode
    • H04N19/114 Adapting the group of pictures [GOP] structure, e.g. number of B-frames between two anchor frames
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/169 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
    • H04N19/177 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, the unit being a group of pictures [GOP]
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/50 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N19/503 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving temporal prediction

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

A video processing technology applicable to scenarios such as playing or downloading GOPs. The scheme inserts one or more additional I frames into a GOP, where each newly inserted I frame is closer to a VI frame than the GOP's original I frame. When the content in the GOP needs to be played or downloaded, the VI frame is located according to the requested time, and the newly inserted I frame, rather than the GOP's original I frame, is used as the reference frame for decoding the VI frame. This improves video processing efficiency and the accuracy of the playing time.

Description

Video processing method, device and computer readable storage medium

Technical Field
The present invention relates to the field of internet technologies, and in particular, to a video processing method and apparatus, and a computer readable storage medium.
Background
With the rapid development of computer and network communication technologies, the demand for multimedia information keeps growing. In recent years, video applications have spread across fields such as video conferencing, video surveillance, and mobile television. In these fields, video is usually transmitted in compressed form to save network transmission resources. Existing video transmission mostly adopts a group of pictures (GOP) structure, where one GOP is a group of consecutive pictures (i.e., frame images, referred to simply as frames).
Correspondingly, after receiving the video, the computing device needs to decode and play it. For example, when the computing device receives a play request triggered by dragging the progress bar of a video, it responds by acquiring, from the drag stop position, the GOPs that make up the video, and decodes and plays each GOP. Specifically, if the target frame pointed to by the drag stop position is a non-I frame, the computing device must search the frames before or after the target frame for an I frame and start decoding and playing from that I frame. When the distance between the I frame and the target frame is large, video processing efficiency drops to some extent, which affects the user's viewing experience.
If the frames before or after the target frame contain no I frame, the GOP cannot be decoded and played. The computing device may discard the GOP containing the target frame and proceed to decode and play the next GOP. As a result, some important video information is discarded, which also affects the viewing experience.
Disclosure of Invention
The embodiments of the invention disclose a video processing method, a video processing apparatus, and a computer readable storage medium, which can solve problems of the existing schemes such as reduced video processing efficiency and loss of important video information.
In a first aspect, an embodiment of the present invention discloses a video processing method applied in a computing device. The method includes: acquiring a group of pictures (GOP) in a video, where the first frame of the GOP is a first I frame, the GOP includes M frames, and M is a positive integer; determining whether the M frames include a virtual intra-coded (VI) frame; and, when they do, inserting a second I frame before the VI frame. The second I frame is the frame that the VI frame references during video decoding.
By implementing this embodiment of the invention, inserting the second I frame before the VI frame makes it convenient to decode and play the video starting from the second I frame. This can solve prior-art problems such as reduced video processing efficiency, loss of important video information, and wasted storage resources on the computing device, thereby improving video processing efficiency.
With reference to the first aspect, in some possible embodiments, the computing device determines, in response to a video play request, that the start time carried in the request falls after the second I frame in the GOP, and then decodes and plays the video starting from the second I frame.
By implementing this step, after the second I frame is inserted before the VI frame, the video can be decoded and played from the second I frame in a video playing scenario. Compared with the prior art, which starts decoding from the first I frame of the GOP, this saves video decoding time and improves video processing efficiency.
With reference to the first aspect, in some possible embodiments, the second I frame is the frame immediately preceding the VI frame.
With reference to the first aspect, in some possible embodiments, the GOP further includes index information of the GOP, and the storage address of the second I frame is recorded in that index information. Before inserting the second I frame before the VI frame, the computing device may retrieve the second I frame from its storage address based on the index information of the GOP.
With reference to the first aspect, in some possible embodiments, after the second I frame is inserted before the VI frame, index information pointing to the second I frame is inserted into the VI frame that follows it. The computing device may then obtain the second I frame from the index information of the VI frame.
By implementing this step, the computing device can obtain the second I frame to be inserted according to the index information of the GOP or of the VI frame, which facilitates the subsequent insertion of the second I frame before the VI frame and thus enables faster decoding of the video.
With reference to the first aspect, in some possible embodiments, the second I frame is used only for decoding the VI frame and is not output for display.
With reference to the first aspect, in some possible embodiments, the GOP includes at least one network abstraction layer unit (NALU), and the computing device determines whether the M frames include a VI frame by identifying whether the GOP contains a supplemental enhancement information (SEI) NALU. The SEI NALU indicates that the frame containing the i-th NALU before it, or the frame containing the j-th NALU after it, is a VI frame.
By implementing the step, the computing device can identify the VI frames in the GOP by identifying the SEI NALU in the GOP, so that the convenience and the efficiency of VI frame identification can be improved.
With reference to the first aspect, in some possible embodiments, the GOP includes reference picture set (RPS) information for each frame. The computing device determines whether the M frames include a VI frame by examining the RPS information of each frame in the GOP: when a frame's RPS information indicates that an I frame is referenced when decoding the frame, and the frame's previous frame is a non-I frame, the frame is a VI frame.
By implementing this step, the computing device determines whether a frame is a VI frame directly from the frame's RPS information, which improves the accuracy of VI frame identification.
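The RPS-based check above can be sketched as follows. This is an illustrative model only: the frame/RPS representation (dicts with a `type` string and a list of referenced frame indices) is a hypothetical simplification, not the patent's actual bitstream syntax.

```python
def is_vi_frame(frames, idx):
    """A frame is a VI frame when its RPS references an I frame
    while its immediately preceding frame is a non-I frame."""
    f = frames[idx]
    if idx == 0 or f["type"] == "I":
        return False  # the leading I frame (or any I frame) is never a VI frame
    refs_i = any(frames[r]["type"] == "I" for r in f.get("rps", []))
    prev_is_non_i = frames[idx - 1]["type"] != "I"
    return refs_i and prev_is_non_i

frames = [
    {"type": "I"},
    {"type": "P", "rps": [0]},  # references the I frame, but follows an I frame
    {"type": "P", "rps": [1]},  # ordinary P frame
    {"type": "P", "rps": [0]},  # references the I frame and follows a non-I frame: VI
]
```

With this model, frame 3 is identified as a VI frame while frame 1 is not, matching the two conditions stated above.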
With reference to the first aspect, in some possible embodiments, the computing device receives a video processing request carrying a start time of the video, where the video includes at least one GOP. In response to the request, the GOP corresponding to the start time is acquired from a GOP index table. The GOP index table records at least one mapping from each GOP to its index information, and the index information of a GOP includes the GOP's start time.
With reference to the first aspect, in some possible embodiments, the video processing request is a video play request or a video download request. When it is a video play request, the computing device responds by acquiring from the GOP index table the GOP in which the start time falls. When it is a video download request, the computing device instead acquires from the GOP index table at least one GOP, starting from the GOP in which the start time falls.
By implementing this step, the computing device can acquire the corresponding GOPs of the video according to the application scenario and process them, which helps the device acquire exactly the GOPs it needs for video processing.
With reference to the first aspect, in some possible embodiments, the index information of the GOP further includes the playing time of each frame. When the video processing request is a video play request, the VI frame is the VI frame in the GOP whose playing time differs least from the start time.
By implementing this step, in a video playing scenario the computing device can locate the VI frame closest to the requested playing time and insert the second I frame there, avoiding I frame insertion for every VI frame in the GOP, saving device resources, and improving video processing efficiency.
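The index-table lookup described above can be sketched as follows. This is a minimal illustration under assumed data shapes: the index table is modeled as a sorted list of `(gop_start_time, gop_id)` pairs, and VI frames by their playing times, neither of which is a format defined by the patent.

```python
import bisect

def find_gop(gop_index, t):
    """Return the GOP whose start time is the latest one not after t.
    gop_index is a list of (gop_start_time, gop_id), sorted by start time."""
    starts = [s for s, _ in gop_index]
    i = bisect.bisect_right(starts, t) - 1
    return gop_index[max(i, 0)][1]

def nearest_vi(vi_play_times, t):
    """Pick the VI frame whose playing time differs least from the start time t."""
    return min(vi_play_times, key=lambda pt: abs(pt - t))

index = [(0.0, "gop0"), (20.0, "gop1"), (40.0, "gop2")]
print(find_gop(index, 25.0))            # the request at 25 s falls in "gop1"
print(nearest_vi([21.0, 27.5, 34.0], 26.0))  # VI frame at 27.5 s is closest
```

A download request would instead return the slice of `gop_index` from `find_gop`'s position onward, per the download embodiment above.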
In a second aspect, embodiments of the present invention provide a video processing apparatus comprising functional modules or units for performing the methods as described in the first aspect or any possible implementation manner of the first aspect.
In a third aspect, an embodiment of the present invention provides a computing device, including: a processor, a memory, a communication interface and a bus; the processor, the communication interface and the memory are communicated with each other through a bus; a communication interface for receiving and transmitting data; a memory to store instructions; a processor for invoking instructions in a memory for performing the method described in the first aspect or any possible implementation manner of the first aspect.
In a fourth aspect, a computer-readable storage medium is provided, storing instructions for performing the method described in the first aspect.
In a fifth aspect, a computer program product is provided which, when run on a computer, causes the computer to perform the method described in the first aspect.
In a sixth aspect, a chip product is provided for carrying out the method of the first aspect or any possible embodiment of the first aspect.
On the basis of the implementations provided by the above aspects, further combinations can be made to provide more implementations.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below.
Fig. 1 is a schematic structural diagram of a GOP according to an embodiment of the present invention.
Fig. 2 is a schematic structural diagram of a NALU according to an embodiment of the present invention.
Fig. 3 is a schematic structural diagram of an SEI NALU according to an embodiment of the present invention.
Fig. 4 is a schematic structural diagram of a video processing system according to an embodiment of the present invention.
Fig. 5 is a schematic structural diagram of a video encoding unit according to an embodiment of the present invention.
Fig. 6 is a schematic diagram of another GOP structure according to the embodiment of the present invention.
Fig. 7 is a schematic structural diagram of inserting an SEI NALU into a GOP according to an embodiment of the present invention.
Fig. 8 is a schematic diagram of another GOP structure with an inserted SEI NALU according to an embodiment of the present invention.
Fig. 9 is a schematic structural diagram of a video read/write unit according to an embodiment of the present invention.
Fig. 10 is a schematic diagram of another GOP structure according to the embodiment of the present invention.
Fig. 11 is a schematic diagram of a user dragging a video playing progress bar according to an embodiment of the present invention.
Fig. 12 is a schematic diagram of another GOP structure according to the embodiment of the present invention.
Fig. 13A is a schematic diagram of storing GOPs in a time-indexed manner according to an embodiment of the present invention.
Fig. 13B is a schematic diagram of storing GOPs in a frame-number-indexed manner according to an embodiment of the present invention.
Fig. 14 is a flowchart illustrating a video processing method according to an embodiment of the present invention.
Fig. 15 is a schematic diagram of a GOP composing a video according to an embodiment of the present invention.
Fig. 16 is a schematic diagram illustrating an operation of a user downloading a video offline according to an embodiment of the present invention.
Fig. 17 is a diagram illustrating a new GOP structure according to an embodiment of the present invention.
Fig. 18 is a schematic structural diagram of a video processing apparatus according to an embodiment of the present invention.
Fig. 19 is a schematic structural diagram of a computing device according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be described in detail below with reference to the accompanying drawings of the present invention.
First, some technical terms or technical concepts to which the present invention is applicable are introduced.
A GOP, also called a group of pictures, is a set of consecutive image pictures (also called frames), specifically the set of images between two I frames. The GOP length indicates the distance between two I frames.
An I frame, also called an intra-coded frame, is a self-contained frame carrying all of its own information, and can be decoded independently without reference to other frames. The first frame of a video is typically an I frame.
non-I frames refer to frames other than I frames, and specifically include B frames or P frames.
B frames, also called bi-directional predictive coded frames, record the difference between the current frame and both the previous and the next frame. That is, decoding a B frame requires referring to the frame immediately before it and the frame immediately after it.
P frames, also called inter-frame predictive coded frames, record the difference between the current frame and the previous frame. That is, decoding a P frame requires referring to the frame before it (specifically a P frame or an I frame).
A VI frame, also called a virtual I frame, is in essence also a P frame, but a VI frame is decoded with reference to the I frame that precedes it. Fig. 1 shows the structure of such a GOP: the GOP includes 3 VI frames, and each VI frame is decoded with reference only to the I frame that appears before it in the GOP, as indicated by the arrows in the figure.
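The reference relationships among I, P, and VI frames described above can be sketched as a small model. The frame list mirrors the structure of Fig. 1 in spirit only; the string labels and function are illustrative, not part of the patent.

```python
# Hypothetical GOP mirroring the description: a leading I frame, ordinary
# P frames, and VI frames that reference the GOP's I frame directly.
gop = ["I", "P", "P", "VI", "P", "P", "VI", "P"]

def reference_of(gop, idx):
    """Return the index of the frame that frame `idx` references when decoded."""
    kind = gop[idx]
    if kind == "I":
        return None                 # I frames decode independently
    if kind == "VI":
        # a VI frame references the closest I frame that precedes it
        return max(i for i in range(idx) if gop[i] == "I")
    return idx - 1                  # a P frame references its previous frame

print(reference_of(gop, 3))  # VI at index 3 references the I frame at index 0
print(reference_of(gop, 4))  # P at index 4 references its previous frame, index 3
```

This is exactly why a VI frame is a convenient place to splice in a new I frame: its only dependency is a single I frame, not the whole chain of P frames before it.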
A network abstraction layer unit (NALU) is the basic unit of video compression; in video coding, each frame consists of at least one NALU. Fig. 2 is a schematic structural diagram of a NALU according to an embodiment of the present invention. As shown in Fig. 2, a NALU includes a NAL Header and a NAL Body. In the H.264 video coding standard, the NAL Header has a fixed length of 1 byte, i.e., 8 bits, and contains three fields: the forbidden bit field forbidden_zero_bit, the importance indication field nal_ref_idc, and the type field nal_unit_type.
forbidden_zero_bit occupies 1 bit and must be 0 as specified in video coding standards such as H.264. If the network detects an error in a NALU, forbidden_zero_bit may be set to 1 so that the receiver can correct the error or discard the NALU.
nal_ref_idc occupies 2 bits and indicates the importance of the NALU, with values ranging from 00 to 11. The larger the value of nal_ref_idc, the more important the current NALU is and the higher the priority with which it should be protected.
nal_unit_type occupies 5 bits and indicates the type of the NALU.
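The three-field layout of the 1-byte NAL header can be sketched with simple bit operations. This is an illustrative decoder of the header byte only, following the bit widths given above (1 + 2 + 5 bits).

```python
def parse_nal_header(first_byte: int) -> dict:
    """Split the 1-byte H.264 NAL header into its three fields."""
    return {
        "forbidden_zero_bit": (first_byte >> 7) & 0x01,  # must be 0 in a valid stream
        "nal_ref_idc": (first_byte >> 5) & 0x03,         # importance, 0..3
        "nal_unit_type": first_byte & 0x1F,              # NALU type, 0..31
    }

print(parse_nal_header(0x06))  # SEI NALU header: ref_idc 0, type 6
print(parse_nal_header(0x67))  # SPS NALU header: ref_idc 3, type 7
```

Note that the header bytes 0x67 and 0x68 mentioned later in this document encode nal_ref_idc = 3 together with type values 7 (SPS) and 8 (PPS) respectively; the type field itself is only the low 5 bits.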
The NAL Body encapsulates the payload data (video data). In practice, a video bitstream produced by video coding has three layers of encapsulation. First layer: the extended byte sequence payload (EBSP), which includes the emulation_prevention_three_byte field, inserted to prevent the contents of the NAL Body from colliding with the NALU start code (0x000001 or 0x00000001). Second layer: the raw byte sequence payload (RBSP), which is the NAL Body with the emulation_prevention_three_byte removed, i.e., the data produced by further processing the raw syntax-element bitstream (encoded data); the basic structure of the RBSP appends trailing bits to the original encoded data for byte alignment. Third layer: the string of data bits (SODB), the actual raw binary stream after the syntax elements of the H.264 coding standard have been encoded.
Alternatively, in the H.264 video coding standard, a NALU may include only the NAL Header and the RBSP; that is, the NAL Body is the RBSP. For the NAL Header and RBSP, see the descriptions above, which are not repeated here.
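The EBSP/RBSP relationship above can be sketched as the pair of transforms H.264 defines: on encoding, a 0x03 byte is inserted after any two consecutive zero bytes when the next byte is 0x03 or smaller; on decoding, that byte is removed. This is an illustrative sketch of the mechanism, not production parsing code.

```python
def rbsp_to_ebsp(rbsp: bytes) -> bytes:
    """Insert emulation_prevention_three_byte (0x03) after 0x00 0x00
    whenever the next byte is <= 0x03, avoiding start-code collisions."""
    out, zeros = bytearray(), 0
    for b in rbsp:
        if zeros >= 2 and b <= 0x03:
            out.append(0x03)
            zeros = 0
        out.append(b)
        zeros = zeros + 1 if b == 0x00 else 0
    return bytes(out)

def ebsp_to_rbsp(ebsp: bytes) -> bytes:
    """Remove the 0x03 bytes that follow each 0x00 0x00 pair."""
    out, zeros = bytearray(), 0
    for b in ebsp:
        if zeros >= 2 and b == 0x03:
            zeros = 0       # drop the emulation prevention byte
            continue
        out.append(b)
        zeros = zeros + 1 if b == 0x00 else 0
    return bytes(out)

print(rbsp_to_ebsp(b"\x00\x00\x01").hex())  # the would-be start code gains a 0x03
```

The round trip `ebsp_to_rbsp(rbsp_to_ebsp(data)) == data` holds for any payload, which is the whole point: the NAL Body can carry arbitrary bytes without ever containing a start code.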
A supplemental enhancement information network abstraction layer unit (SEI NALU) is a NALU whose type field nal_unit_type indicates the SEI type. Fig. 3 is a schematic structural diagram of an SEI NALU according to an embodiment of the present invention. As shown in Fig. 3, the SEI NALU includes a NAL Header and a NAL Body. For the NAL Header, refer to the description of the embodiment of Fig. 2. nal_unit_type in the NAL Header occupies 5 bits and indicates the type of the NALU; in practice, different NALU types are distinguished by the value of the nal_unit_type field. For example, when nal_unit_type is 0x06, the NALU is of the SEI type; when it is 0x67, the NALU is of the sequence parameter set (SPS) type; when it is 0x68, the NALU is of the picture parameter set (PPS) type, and so on. In the present invention, nal_unit_type is 0x06, indicating that the NALU is of the SEI type.
The NAL Body includes an SEI payload type, an SEI payload size, a globally unique identifier (SEI UUID), and custom fields of the SEI payload. The SEI payload type field occupies 1 byte (8 bits) and indicates the type of payload data carried in the SEI NALU, such as video data, SPS, or PPS data. The SEI payload size field indicates the size of the payload data. The SEI UUID field occupies 16 bytes and uniquely identifies the payload data. The number of bytes occupied by the custom fields can be defined by the system and is used to carry system-defined data; the invention places no limit on this.
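Assembling an SEI NALU with the layout above can be sketched as follows. This is a simplified illustration: it assumes the payload type and size each fit in a single byte (real H.264 SEI uses 0xFF continuation bytes for values of 255 or more), and the UUID value is a hypothetical marker, not one defined by the patent.

```python
import uuid

def build_sei_nalu(user_data: bytes, sei_uuid: bytes) -> bytes:
    """Assemble a user-data SEI NALU: NAL header, payload type, payload size,
    16-byte UUID, custom data, then RBSP trailing bits."""
    assert len(sei_uuid) == 16
    payload = sei_uuid + user_data
    assert len(payload) < 255                     # simplification: one size byte
    header = bytes([0x06])                        # forbidden=0, ref_idc=0, type=6 (SEI)
    body = bytes([0x05, len(payload)]) + payload  # 0x05 = user_data_unregistered
    return header + body + bytes([0x80])          # rbsp_trailing_bits

# Hypothetical marker UUID used only for this sketch.
MARK_UUID = uuid.UUID("00000000-0000-0000-0000-0000000000aa").bytes
nalu = build_sei_nalu(b"VI+1", MARK_UUID)
print(nalu.hex())
```

A decoder scanning the GOP would recognize such a unit by `nalu[0] & 0x1F == 6` and then match the UUID to know the custom field (here, a VI-frame position hint) is present.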
With the progress of network communication technology and the growth of network bandwidth, network video is increasingly developed and applied. Currently, to save network transmission resources, video is usually transmitted in compressed form. A video consists of a number of temporally consecutive frames and can be divided into several GOPs for encoding. For example, when the computing device receives a play request triggered by dragging the progress bar of a video, if the target frame pointed to by the drag stop position is a non-I frame, the I frame closest to the target frame must be found among the frames before or after it, and GOP decoding and playing start from that I frame. When the GOP is large and the distance between the target frame and the I frame is large, decoding time is prolonged and video processing efficiency drops considerably, harming the user's viewing experience. Conversely, if the frames before or after the target frame contain no I frame, the target frame cannot be decoded; part of the video is discarded, and decoding and playing must start from the next I frame. As a result, some important video information is discarded, which reduces the accuracy of video information acquisition and affects the viewing experience.
As another example, when a computing device receives a video reverse-play request, a complete GOP of the video is fed to the decoder, and the decoding result (the decoded video) is stored in a buffer and then played in reverse order. For instance, for a 5-minute short video, in a reverse-play scenario the computing device must play the video backwards from its end (the 5th minute) to its beginning. In practice, if the GOP is large, the buffer space occupied by the decoded GOP is large: for 4K video transmitted at a frame rate of 25 fps (frames per second), if a single GOP spans 20 s (seconds), the computing device needs about 5.8 GB of storage to buffer the decoded video. This wastes the computing device's storage resources.
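The 5.8 GB figure above can be checked with simple arithmetic, assuming the decoded frames are held as uncompressed 4K (3840 x 2160) YUV 4:2:0, i.e., 1.5 bytes per pixel (an assumption consistent with the figure, though the patent does not state the pixel format):

```python
# Buffer size for one decoded 20 s GOP of 4K video at 25 fps, YUV 4:2:0.
width, height = 3840, 2160
bytes_per_frame = width * height * 3 // 2   # 1.5 bytes/pixel -> 12,441,600 bytes
frames = 25 * 20                            # 25 fps * 20 s = 500 frames
total = bytes_per_frame * frames
print(round(total / 2**30, 1))              # gibibytes needed to buffer the GOP
```

The result, about 5.8 GiB for a single GOP, is why decoding a whole large GOP just to play it backwards is described here as a waste of storage resources.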
To solve the problems of the prior art, such as reduced video processing efficiency, loss of important video information, and wasted storage resources in large-GOP reverse-play scenarios, the invention provides a video processing method, a system to which the method is applicable, and related products. Fig. 4 is a schematic structural diagram of a video processing system according to an embodiment of the present invention. The video processing system 100 shown in Fig. 4 includes a video encoding unit 102, a video read/write unit 104, a video decoding unit 106, and a storage unit 108.
the video encoding unit 102 is responsible for encoding an input original video into a video stream, and specifically may convert a format file of the original video into a file of another video format. For example, the video encoding unit 102 may encode the original video into a video bitstream using an encoding standard such as h.261, h.263, h.264, h.265, or h.266. Common video formats include, but are not limited to, Audio Video Interleaved (AVI), digital video-audio video interleaved (DV-AVI), Moving Picture Experts Group (MPEG), Advanced Streaming Format (ASF), windows media video format (WMV), real media format (RM), or other video-supported formats.
In video encoding, video encoding unit 102 may divide the video into several GOPs for encoding. In other words, a video (i.e. a video stream) may include one or more GOPs, and the invention will be described below with reference to the case where a video (or a video stream) includes one GOP.
In practical applications, the video encoding unit 102 may be specifically an encoder or other device supporting image or video encoding. For example, the video encoding unit 102 may be deployed in a camera device, such as a camera, or the like; may be deployed as a separate encoder, etc.
The storage unit 108 is used for storing video, for example, storing a video code stream obtained by encoding by the video encoding unit 102.
The video read/write unit 104 writes the video bitstream into the storage unit 108, or reads the video bitstream (specifically, the GOPs in the bitstream) from the storage unit 108 and feeds it to the video decoding unit 106 for decoding.
The video decoding unit 106 is responsible for decoding the input video code stream and outputting the decoded video code stream. Specifically, a GOP included in a video stream is decoded, and each frame included in the GOP is output.
In practical applications, the video read/write unit 104 may be an input/output (IO) device supporting data reading and writing, such as an IO interface. The video decoding unit 106 may be a device supporting video decoding, such as a decoder; it may be deployed in a video processing apparatus of a computing device or as a separate decoder, and the invention is not limited in this respect. The storage unit 108 may be any device supporting data storage, including but not limited to random access memory (RAM), flash memory, read-only memory (ROM), hard disks, registers, and the like.
The video processing technology provided by the embodiments of the invention is applicable to scenarios such as GOP playing or downloading. The scheme inserts one or more additional I frames into a GOP, where each newly inserted I frame is closer to a VI frame than the GOP's original I frame. When the content in a given GOP needs to be played or downloaded, the GOP and the VI frame corresponding to the requested time are located, and the newly inserted I frame is used as the reference frame of the VI frame for video decoding (instead of decoding with the GOP's original I frame as the reference). This improves video processing efficiency and the accuracy of the playing time. The benefits of the embodiments are more pronounced the more frames the GOP contains.
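The core of the scheme just described can be sketched as a toy model: given a GOP with a marked VI frame, splice a new I frame immediately before it, so that decoding for a request landing at that VI frame can start at the spliced-in I frame. The frame labels and function here are illustrative only.

```python
def insert_i_before_vi(gop, vi_idx, second_i):
    """Return a new frame list with `second_i` placed just before the VI frame
    at index `vi_idx`."""
    assert gop[vi_idx] == "VI"
    return gop[:vi_idx] + [second_i] + gop[vi_idx:]

gop = ["I0", "P", "P", "VI", "P"]
new_gop = insert_i_before_vi(gop, 3, "I1")
print(new_gop)  # the VI frame now has its own reference I frame right before it
```

In the original GOP, playing from the VI frame would require decoding forward from "I0" (indices 0..3); in the new GOP, decoding starts directly at the inserted "I1", which is the efficiency gain claimed above.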
Fig. 5 is a schematic structural diagram of the video encoding unit 102 according to an embodiment of the present invention. As shown in Fig. 5, the video encoding unit 102 includes a VI detector 1021. Optionally, for video encoding, the video encoding unit 102 may further be divided into two layers: a video coding layer (VCL) and a network abstraction layer (NAL). As shown in Fig. 5, the video encoding unit 102 may thus further include a video coding layer VCL 1022 and a network abstraction layer NAL 1023.
The video encoding unit 102 encodes the input original video through the video coding layer VCL 1022 to obtain an encoded bitstream, referred to as a video stream for short, and specifically the GOPs in the video stream. The VI detector 1021 then performs VI frame identification on the GOP produced by the video coding layer VCL 1022. The specific manner of VI frame identification is not limited; for example, identification may follow the definition of a VI frame, or rely on received out-of-band information. The out-of-band information indicates that the frame corresponding to a preset timestamp in the GOP is a VI frame; for example, it may indicate that the frame at the 3rd second of the GOP is a VI frame, and so on.
If a VI frame is identified, the network abstraction layer NAL1023 is notified to mark the VI frame so as to indicate its position in the GOP. The specific implementation of VI frame marking is not limited: for example, a supplemental enhancement information (SEI) marking manner may be used, another marking manner conforming to the video coding standard that marks the specific location of the VI frame in the GOP, or an out-of-band manner that conveys the location of the VI frame in the GOP.
Meanwhile, the GOP encoded by the video encoding unit 102 may also be sent to the network abstraction layer NAL1023 for encapsulation, so as to encapsulate the GOP into network abstraction layer units (NALUs). In other words, a GOP is composed of multiple NALUs. Fig. 6 is a schematic diagram of a GOP. As shown in fig. 6, the GOP is composed of a series of NALUs. Typically, the first data of a GOP is a picture parameter set (PPS) and a sequence parameter set (SPS), followed by an I frame and other frames. As shown, the GOP includes at least one frame, each of which includes one or more NALUs. The PPS contains information for all slices of a picture (i.e., a frame), and the SPS contains information for a sequence of pictures (i.e., each frame in the GOP).
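As an illustration only, the GOP layout just described (parameter sets first, then an I frame, then further frames, each carried in one or more NALUs) might be modeled as follows; the record fields are assumptions for the sketch, not real H.264/HEVC NALU syntax.

```python
# Simplified model of a GOP as a list of NALU records (assumed field names):
# PPS/SPS first, then the I frame, then the remaining frames; one frame may
# span several slice NALUs.

gop = [
    {"nal_type": "SPS"},                                   # sequence parameter set
    {"nal_type": "PPS"},                                   # picture parameter set
    {"nal_type": "slice", "frame": 0, "frame_type": "I"},
    {"nal_type": "slice", "frame": 1, "frame_type": "P"},
    {"nal_type": "slice", "frame": 1, "frame_type": "P"},  # 2nd slice NALU of frame 1
    {"nal_type": "slice", "frame": 2, "frame_type": "P"},
]

def frames_in_gop(nalus):
    """Distinct frame numbers carried by the slice NALUs of the GOP."""
    return sorted({n["frame"] for n in nalus if n["nal_type"] == "slice"})
```

Here three frames (0, 1, 2) are carried by four slice NALUs, matching the statement that each frame includes one or more NALUs.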
For example, taking the case where the VI frame is marked with an SEI flag to indicate its location in the GOP, after identifying the VI frame in the GOP the VI detector may notify the network abstraction layer NAL1023 to generate a custom supplemental enhancement information network abstraction layer unit (SEI NAL unit, SEI NALU). The SEI NALU is inserted before or after the VI frame, to indicate either that the frame containing the ith NALU before the SEI NALU is the VI frame, or that the frame containing the jth NALU after the SEI NALU is the VI frame; for example, the SEI NALU may specifically indicate that the previous frame or the next frame of the SEI NALU is the VI frame. Please refer to fig. 7, which illustrates inserting an SEI NALU into a GOP. Fig. 7 shows the original structure of the GOP, and the structure of the new GOP after an SEI NALU is inserted before or after the VI frame. As shown in fig. 7, the GOP obtained by the video coding layer VCL1022 includes P frames and a VI frame. After the video encoding unit 102 detects the VI frame in the GOP through the VI detector 1021, it notifies the network abstraction layer NAL1023 to add an SEI NALU in front of the VI frame, or to add an SEI NALU after the VI frame. The specific position of the SEI NALU added before or after the VI frame is not limited; for example, the SEI NALU may be added before the VI frame as the jth NALU before the first NALU contained in the VI frame, to indicate that the frame containing the jth NALU after the SEI NALU is the VI frame; or the SEI NALU may be added after the VI frame as the ith NALU after the last NALU contained in the VI frame, to indicate that the frame containing the ith NALU before the SEI NALU is the VI frame.
Please refer to fig. 8, which illustrates an example of inserting SEI NALUs into a GOP. As shown in fig. 8, each frame (including the VI frame) in the GOP is composed of one or more NALUs; the VI frame in the figure includes 3 NALUs, namely NALU1 to NALU3. Accordingly, when the network abstraction layer NAL1023 marks the VI frame in the SEI marking manner, an SEI NALU may be added in front of the first NALU (illustrated as NALU1) included in the VI frame, i.e., as the first NALU before NALU1; alternatively, an SEI NALU may be added after the last NALU (illustrated as NALU3) contained in the VI frame, i.e., as the first NALU after NALU3. In this example, the SEI NALU specifically indicates that the frame containing the previous NALU or the next NALU of the SEI NALU is the VI frame.
In practical applications, the network abstraction layer NAL1023 may set the value of a relevant field in the SEI NALU to indicate that the frame containing the ith NALU before the SEI NALU is the VI frame, or that the frame containing the jth NALU after the SEI NALU is the VI frame. For example, the network abstraction layer NAL1023 can indicate the position of the VI frame in the GOP by setting the value of the Type field (specifically, the SEI payload type field) or the SEI UUID field in the SEI NALU. Alternatively, the network abstraction layer NAL1023 may add a new field to the custom fields of the SEI NALU and set its value to indicate the location of the VI frame in the GOP. Taking the SEI payload type field as an example, if the network abstraction layer NAL1023 sets the SEI payload type to +1, it indicates that the frame containing the NALU immediately before the SEI NALU is the VI frame. Conversely, if the network abstraction layer NAL1023 sets the SEI payload type to -1, it indicates that the frame containing the NALU immediately after the SEI NALU is the VI frame.
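As a rough illustration of this marking and detection logic (a minimal sketch: the dictionary records and the +1/-1 payload-type convention follow the description above and are otherwise assumptions, not real SEI syntax):

```python
# Hypothetical sketch of SEI-based VI-frame marking. A GOP is modeled as a
# list of NALU records; payload_type -1 means "the next NALU's frame is the
# VI frame", +1 means "the previous NALU's frame is the VI frame".

def mark_vi_frame(nalus, vi_index, before=True):
    """Insert an SEI NALU before (or after) the NALU at vi_index."""
    if before:
        sei = {"type": "SEI", "payload_type": -1}   # VI frame follows
        return nalus[:vi_index] + [sei] + nalus[vi_index:]
    sei = {"type": "SEI", "payload_type": +1}       # VI frame precedes
    return nalus[:vi_index + 1] + [sei] + nalus[vi_index + 1:]

def find_vi_frames(nalus):
    """Return indices, in the marked stream, of NALUs flagged as VI frames."""
    hits = []
    for i, n in enumerate(nalus):
        if n.get("type") == "SEI":
            if n["payload_type"] == -1 and i + 1 < len(nalus):
                hits.append(i + 1)
            elif n["payload_type"] == +1 and i > 0:
                hits.append(i - 1)
    return hits
```

Marking NALU 3 of a five-NALU GOP with `mark_vi_frame(gop, 3)` shifts that NALU to index 4 in the marked stream, and `find_vi_frames` recovers it from the SEI indication alone, mirroring how the code stream detector later locates VI frames without re-parsing the frames.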
Fig. 9 is a schematic structural diagram of a video read/write unit 104 according to an embodiment of the present invention. As shown in fig. 9, the video read/write unit 104 includes a code stream detector 1041, an index generator 1042 and a code stream modifier 1043. The video read/write unit 104 and the storage unit 108 are in communication with each other.
The code stream detector 1041 is configured to perform frame detection (i.e., frame identification) on a video stream (specifically, a GOP in the video stream) input to the video read/write unit 104, so as to determine each frame included in the GOP and the position of each frame. For example, the respective positions of the I frame and the VI frame in the GOP can be determined. The representation of the position is not limited: it may be represented by a frame index, the playing time corresponding to the frame in the GOP, the storage location (also referred to as a storage address) of the frame in the storage unit 108, or other information indicating the position of the frame in the GOP.
Specifically, taking detection of the VI frame in the GOP as an example, the code stream detector 1041 performs VI frame mark detection on the GOP to detect the VI frame in the GOP and its position. Since the video encoding unit 102 may mark the VI frame in the GOP in different ways, the specific implementation of VI frame mark detection by the code stream detector 1041 also differs; two specific ways of VI frame mark detection are given below as examples.
First, the code stream detector 1041 detects whether the GOP includes an SEI NALU; if so, it determines, according to the indication of the SEI NALU, that the frame containing the ith NALU before the SEI NALU is the VI frame, or that the frame containing the jth NALU after the SEI NALU is the VI frame. The number of SEI NALUs is not limited and may be one or more. When there are multiple SEI NALUs, the code stream detector 1041 can detect, according to the above principle, the multiple VI frames included in the GOP and the position of each VI frame in the GOP.
Secondly, the code stream detector 1041 performs VI frame analysis on the GOP according to the out-of-band information sent by the video encoding unit 102, and determines the VI frame included in the GOP and its position in the GOP. The out-of-band information indicates the location of the VI frame in the GOP, for example that the fifth frame in the GOP is the VI frame, or that the frame corresponding to the 3rd second in the GOP is the VI frame, etc. Optionally, the code stream detector 1041 may further detect the VI frame in the GOP by parsing the GOP, as shown in the third way below.
Thirdly, the code stream detector 1041 parses each frame included in the GOP, identifies the reference picture set (RPS) information in each frame, and determines the VI frame included in the GOP and its position.
It should be understood that in video coding a frame of picture is coded as one or more slices, and the slices of each frame are carried in NALUs for transmission. The first slice of each frame contains RPS information. The RPS information is composed of identification information whose meaning is system-defined, such as indicating which frames are used as references for decoding the current frame or subsequent frames. The RPS information includes the reference frame information of the current frame; if the reference frame information indicates that the current frame has an I frame as its unique decoding reference and the previous frame of the current frame is a non-I frame, the current frame is a VI frame. Specifically, the RPS information indicates the picture order count (POC) of the reference frames; if it indicates that the current frame has exactly one reference frame and that reference frame is an I frame, then decoding the current frame refers only to the I frame. Further, if the code stream detector 1041 detects that the previous frame of the current frame is a non-I frame, it may determine that the current frame is a VI frame.
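The RPS-based rule above can be sketched as follows; the frame records (a `type` and a list of reference-frame indices) are assumptions for the illustration, not real slice-header syntax.

```python
# Sketch of the RPS-based VI-frame rule: a frame is a VI frame if its only
# decoding reference is an I frame AND its immediately preceding frame is
# not an I frame.

def is_vi_frame(frames, idx):
    f = frames[idx]
    refs = f["rps_refs"]               # indices of this frame's reference frames
    only_ref_is_i = len(refs) == 1 and frames[refs[0]]["type"] == "I"
    prev_is_non_i = idx > 0 and frames[idx - 1]["type"] != "I"
    return only_ref_is_i and prev_is_non_i

def detect_vi_frames(frames):
    return [i for i in range(len(frames)) if is_vi_frame(frames, i)]
```

Note that a P frame directly after the I frame also references only the I frame, but is excluded by the second condition, which is why the rule checks that the predecessor is a non-I frame.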
When detecting that the GOP includes a VI frame, the code stream detector 1041 may send a VI frame identification signal to the index generator 1042, notifying it of the VI frame included in the GOP and related information of the VI frame, such as the index (i.e., frame number) of the VI frame, the playing time of the VI frame in the GOP, and the storage address of the VI frame in the storage unit 108. Optionally, and similarly, when detecting that the GOP includes an I frame, the code stream detector 1041 may send an I frame identification signal to the index generator 1042, notifying it of the I frame included in the GOP and related information of the I frame, such as the frame number of the I frame, the playing time of the I frame in the GOP, and the storage address of the I frame in the storage unit 108.
The index generator 1042 is configured to receive the I frame identification signal and the VI frame identification signal sent by the code stream detector 1041. After receiving the VI frame identification signal, the index generator 1042 may determine a target I frame (hereinafter also referred to as a second I frame) to be referred to when decoding the VI frame, and store the target I frame in association with the VI frame; for example, it may store the storage address of the target I frame in the index information of the GOP, to indicate that the target I frame stored at that address is referred to when decoding the VI frame. Alternatively, index information corresponding to the VI frame may point to the target I frame, specifically indicating the target I frame to be used when decoding the VI frame. The index information of the GOP is used to identify the GOP, and may include, but is not limited to, the index number of the GOP, the time length of the GOP, the start time and end time corresponding to the GOP (video stream), whether the GOP includes a VI frame identifier, the storage address of the GOP in the storage unit 108, the storage address or offset of the VI frame in the GOP, and other information. The target I frame may specifically be an I frame that occurs before the VI frame in the GOP, that is, the playing time corresponding to the target I frame precedes the playing time corresponding to the VI frame. Alternatively, the target I frame may be the I frame closest to the VI frame in the GOP.
For example, please refer to fig. 10, which shows a schematic diagram of a GOP. As shown in fig. 10, the GOP is a 10-second video stream. The frame at the 7th second in the figure is a VI frame. In this example, if the target I frame referred to when decoding the VI frame is an I frame appearing before the VI frame in the GOP, the target I frame is the I frame at the 0th second in the figure. If the target I frame is the I frame closest to the VI frame in the GOP, the target I frame is the I frame at the 9th second in the figure.
Alternatively, if the video includes a plurality of GOPs, each GOP corresponds to its own index information, and the index generator 1042 may store each GOP and its index information in the storage unit 108 in the form of a GOP index table. The GOP index table stores at least one mapping relation, each mapping one GOP to the index information of that GOP. For the index information of the GOP, refer to the description above, which is not repeated here.
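A minimal sketch of such a GOP index table follows; the field names, and the choice to key VI frames by their offset in the GOP, are illustrative assumptions based on the fields listed above.

```python
# Minimal GOP index table: maps a GOP id to its index information, including,
# for each VI frame, the storage address of the target I frame to be read
# when decoding that VI frame.

gop_index_table = {}

def index_gop(gop_id, start, end, addr, vi_entries):
    """vi_entries: list of (vi_offset, target_i_frame_addr) pairs."""
    gop_index_table[gop_id] = {
        "start_time": start,
        "end_time": end,
        "storage_addr": addr,
        "has_vi": bool(vi_entries),
        "vi_frames": dict(vi_entries),
    }

def target_i_addr(gop_id, vi_offset):
    """Storage address of the target I frame for a given VI frame."""
    return gop_index_table[gop_id]["vi_frames"][vi_offset]
```

With this layout, the code stream modifier described below only needs the GOP id and the VI frame offset to fetch the target I frame's storage address.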
The code stream modifier 1043 is configured to modify the GOP input to the video read/write unit 104 to obtain modified new GOPs. Specifically, the code stream modifier 1043 reads the VI frame included in the GOP and the target I frame referred to when decoding the VI frame, and then inserts the target I frame in front of the VI frame, thereby obtaining at least two new GOPs (which may also be referred to as multiple GOPs). The specific position at which the target I frame is inserted before the VI frame is not limited; for example, the target I frame may be inserted as the mth frame before the VI frame, where m is a positive integer.
It should be noted that, for a more intuitive description, the embodiments of the present invention can be understood as follows: a GOP is divided into new GOPs by inserting new I frames into the original GOP, where each new GOP has an I frame. However, in the embodiment of the present invention, the newly inserted I frame only needs to be usable as a decoding reference by the VI frame, and therefore may not possess all functions of the I frame in the original GOP (for example, it may not be playable), as long as it suffices as a decoding reference for the VI frame. In other words, the newly inserted frame only provides the decoding-reference function of an I frame for the VI frame, and such a newly inserted frame may therefore be referred to as a quasi-I frame. In this case, since the inserted frame is not a true I frame, the original GOP can be considered as not actually divided into a plurality of new GOPs, but still one GOP (except that one or more quasi-I frames are newly added to it). Of course, in another case, if the newly inserted I frame is identical to the I frame in the original GOP, the original GOP may be considered to be divided into a plurality of new GOPs. For convenience of description, unless otherwise specified, the following embodiments do not distinguish between these two cases: the inserted frames are collectively referred to as I frames, and the result of inserting I frames (or quasi-I frames) into the GOP is collectively referred to as obtaining "new GOPs". In short, the I frame (e.g., the second I frame) inserted in the embodiment of the present invention is either a frame identical to the I frame in the original GOP, or a frame that provides only the decoding-reference function that the I frame in the original GOP provides for the VI frame.
In a specific implementation, when the video read/write unit 104 receives a video processing request, the code stream detector 1041 detects whether there is a VI frame in the GOP. If there is a VI frame in the GOP, the code stream modifier 1043 reads the target I frame to be referred to when decoding the VI frame from the storage address of the target I frame recorded in the index information of the GOP. The code stream modifier 1043 then inserts the read target I frame before the VI frame, resulting in a plurality of new GOPs. This solves the problems in the prior art that, when the GOP is large and the distance between the I frame and the VI frame is large, the decoding time is too long, video processing efficiency is reduced, or some important video information is lost. By inserting an I frame in front of the VI frame, a large GOP can be split into a plurality of small GOPs, and decoding and playing can be performed based on the split small GOPs during video playback. Compared with the prior art, this avoids decoding unnecessary information, improves video decoding efficiency, avoids discarding important video information, and ensures the viewing experience of the user.
By implementing the embodiment of the present invention, the video encoding unit 102 can mark the VI frame included in the GOP and transmit the VI frame mark along with the GOP, which can improve the compatibility of the video encoding unit 102. The video read/write unit 104 can insert the target I frame before the VI frame and divide the large GOP into a plurality of new GOPs, so that control is performed with the VI frame as the granularity, which can effectively improve the video playing effect. Especially in a video reverse-playing scenario, caching the new GOPs instead of the large GOP can effectively save storage resources.
Two application scenarios to which the present invention is applicable are described below.
First, the video playing scenario. Here the video processing request is specifically a video playing request. Specifically, when watching a video, the user can drag the video playing progress bar at will according to his or her own needs; please refer to fig. 11, which shows a schematic diagram of the user dragging the video playing progress bar. When detecting that the user drags the video playing progress bar, the computing device may generate a corresponding video playing request. In response to the video playing request, it acquires the GOP where the dragging stop position is located, and then identifies whether the GOP includes a VI frame. If the GOP includes a VI frame, the storage address of the target I frame referred to when decoding the VI frame is obtained from the GOP index table, the target I frame is obtained from that storage address, and the target I frame is inserted in front of the VI frame. The target I frame may specifically be an I frame appearing before the VI frame in the GOP, or the I frame closest to the VI frame in the GOP; see the example shown in fig. 10.
If the GOP includes multiple VI frames, in order to save device processing resources in the video playing scenario, the computing device may process only the VI frame closest to the dragging stop position in the GOP, that is, insert the target I frame before that VI frame to obtain two new GOPs. Optionally, the playing time corresponding to the inserted target I frame precedes the playing time corresponding to the dragging stop position. The new GOP in which the dragging stop position is located is then decoded and played. Please refer to fig. 12, which illustrates the structure of a GOP. As shown in fig. 12, the GOP is a 10-second video stream and includes two VI frames, VI frame 1 and VI frame 2. The playing time corresponding to VI frame 1 is the 5th second, and the playing time corresponding to VI frame 2 is the 7th second. When watching the video stream online, the user can drag the video playing progress bar at will. If the user drags the progress bar to stop at the 3rd second, the VI frame closest to the dragging stop position is VI frame 1. At this time, the computing device may insert the target I frame before VI frame 1; the insertion position of the target I frame is not limited, for example, any position between the dragging stop position and the VI frame, or any position before the dragging stop position. This ensures that the playing time corresponding to the inserted target I frame is not later than (i.e., less than or equal to) the playing time corresponding to the VI frame, so that important video information can be prevented from being lost.
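The "closest VI frame" selection above can be sketched in a few lines (times in seconds; choosing by absolute distance to the drag-stop position is how the fig. 12 example reads):

```python
# Among the VI frames of the GOP, pick the one closest to the drag-stop
# position, as in the playing scenario above (fig. 12: VI frames at the
# 5th and 7th second, drag stop at the 3rd second).

def closest_vi(vi_times, drag_stop):
    """vi_times: playing times (seconds) of the VI frames in the GOP."""
    return min(vi_times, key=lambda t: abs(t - drag_stop))
```

With `closest_vi([5, 7], 3)` the VI frame at the 5th second (VI frame 1) is selected, matching the fig. 12 example; only that VI frame then receives an inserted target I frame.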
Second, the video downloading scenario. Here the video processing request is specifically a video downloading request. Specifically, if the user wants to watch a video offline, the video can be downloaded and cached locally in advance. Accordingly, after receiving a video downloading request, the computing device may download the video (specifically, one or more GOPs included in the video) in response to the request. Optionally, the video downloading request may carry the start time and the end time of the video, and the computing device downloads the video (i.e., one or more GOPs in the video) from the start time to the end time, specifically from the GOP in which the start time is located to the GOP in which the end time is located. It is then identified whether each GOP includes a VI frame. If a GOP includes a VI frame, the storage address of the target I frame referred to when decoding the VI frame is obtained from the GOP index table, the target I frame is obtained from that storage address, and the target I frame is inserted in front of the VI frame. For the description of the target I frame, refer to the related description in the first application scenario, which is not repeated here.
In practical applications, different GOPs correspond to different playing periods; when the start time falls within the playing period of a certain GOP, it can be simply understood that the start time is in that GOP, and that GOP is taken as the GOP where the start time is located. See specifically the example of fig. 15 below.
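The "GOP where the start time is located" rule can be sketched as a lookup over playing periods; the half-open interval convention is an assumption, since the text only says the start time falls within the GOP's playing period.

```python
# Find the GOP whose playing period contains time ts; each period is a
# half-open interval [start, end).

def gop_at(periods, ts):
    """periods: list of (gop_id, start, end) playing periods in seconds."""
    for gop_id, start, end in periods:
        if start <= ts < end:
            return gop_id
    raise ValueError("no GOP covers time %r" % ts)
```

For three consecutive 10-second GOPs, a start time of 20 s falls in the third GOP's period, consistent with treating each period boundary as belonging to the later GOP.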
In a video downloading scenario, considering that the user may drag the video playing progress bar to start playing the video at any position, the computing device may process every VI frame included in each GOP of the video, that is, insert a target I frame before each VI frame, thereby splitting a large GOP into multiple small GOPs. The processing procedure of the computing device for any VI frame in each GOP is the same; refer to the related description of the foregoing embodiments, which is not repeated here.
The following describes related embodiments involved in GOP storage. Different video processing systems may use different indexing schemes to create and store corresponding index information for the GOP. In other words, the indexing modes corresponding to the indexing information of the GOPs in different video processing systems may be different, for example, a time indexing mode, a frame number indexing mode, or the like is supported. Specific implementations of the two indexing approaches are given as examples below.
First, the time index manner. The computing device creates corresponding index information for the GOP in a time index manner and stores it. Specifically, the computing device creates an index for the GOP according to a preset time length (for example, 1 s), obtaining the index information of the GOP. The index information includes, but is not limited to, the number of the GOP, whether the GOP includes an I frame, the storage address of the I frame, whether the GOP includes a VI frame, the storage address of the GOP in the storage unit 108, the playing time corresponding to each frame in the GOP, and the like. The preset time length is set by the system in a self-defined manner; for example, it may be customized according to user requirements, or obtained from statistics over a series of empirical data. Fig. 13A is a diagram of a GOP stored in the time index manner. As shown, the GOP is a 10-second video stream, specifically the video stream of the 0th to 9th seconds. Each second corresponds to one frame (picture) in the GOP.
Second, the frame number index manner. The computing device creates corresponding index information for the GOP in a frame number index manner and stores it. Specifically, the computing device creates an index for the GOP according to the I-frame interval, obtaining the index information of the GOP. Here a GOP indicates a set of consecutive frames between two I frames. For the index information of the GOP, refer to the description above, which is not repeated here. Please refer to fig. 13B, which illustrates a schematic diagram of a GOP stored using the frame number index. As shown, the GOP is a video stream including 10 frames, namely frame 0 to frame 9 as shown. Each frame corresponds to its index number.
Based on the foregoing embodiments, please refer to fig. 14, which is a flowchart illustrating a video processing method according to an embodiment of the present invention. The method as shown in fig. 14 comprises the following implementation steps:
step S102, the computing equipment acquires a group of pictures (GOP) in the video, wherein the first frame of the GOP is a first I frame. The GOP includes M frames, where M is a positive integer.
The computing device obtains a video processing request, where the video processing request carries the start time of the video. In response to the video processing request, the video corresponding to the start time is acquired, that is, at least one GOP in the video is acquired. The video processing request may also carry the end time of the video, or other system-defined information, which is not limited in the present invention. The video processing request may be generated by a user performing a corresponding video operation, or may be received from another device. The video processing request may also differ in different application scenarios. For example, in a video playing scenario, the computing device detects a dragging operation of the user on the video playing progress bar and may generate a corresponding video playing request. In a video downloading scenario, when the computing device detects a downloading operation of the user for a video of a preset time period (the period from a start time to an end time), it may generate a corresponding video downloading request, and so on.
The following details the specific implementation of step S102, taking the video processing request as a video playing request and as a video downloading request, respectively.
In one embodiment, if the video processing request is a video playing request, the video playing request carries the start time Ts of the video. The video includes a plurality of GOPs. In response to the video playing request, the computing device may retrieve, from the multiple GOPs of the video, the group of pictures GOP in which the start time Ts is located.
In another embodiment, if the video processing request is a video downloading request, the video downloading request carries the start time Ts of the video, and optionally may also carry the end time Te of the video. In response to the video downloading request, the computing device may download from the GOP in which the start time Ts is located until the GOP in which the end time Te is located, thereby downloading the at least one GOP composing the video.
For example, please refer to fig. 15, which is a schematic diagram of the GOPs composing a video according to an embodiment of the present invention. A user plays a movie XXX online on a computing device. As shown in fig. 15, the movie includes 8 GOPs. Suppose the user drags the playing progress bar of the movie to stay at time Ts, in order to start playing the video from time Ts. The computing device may generate a video playing request upon detecting that the user drags the playing progress bar of the movie. The video playing request carries the start time Ts of the video. Further, in response to the video playing request, the computing device may obtain the GOP in which the start time Ts is located, shown specifically as GOP 3.
If the user needs to download the movie offline, the computing device may generate a video downloading request when detecting a downloading operation of the user for the movie. The video downloading request may carry the start time and the end time of the video to be downloaded. The video to be downloaded may be a video clip (e.g., the leader or trailer) of the movie XXX, or the entire video. The start time and the end time of the video to be downloaded can be customized by the user according to actual requirements, for example 00:01:00 to 00:21:00 (i.e., downloading the video segment from the 1st minute to the 21st minute). The user may configure the offline download on a display interactive interface provided by the computing device; please refer to fig. 16, which illustrates an operation diagram of offline video downloading by the user. As shown in fig. 16, information such as the start time, the end time, and the video name of the video to be downloaded is set in the display interactive interface according to the user's requirements. Accordingly, when the computing device detects an offline downloading operation on the display interactive interface, it can start downloading from the GOP in which the start time is located until the GOP in which the end time is located. Assuming in this example that the GOP in which the start time 00:01:00 is located is GOP1 and the GOP in which the end time 00:21:00 is located is GOP3, the 20-minute video downloaded by the computing device may specifically include GOP1, GOP2, and GOP 3.
Step S104, the computing device determines whether the M frames include a VI frame.
In one embodiment, the GOP includes one or more NALUs. The computing device determines whether a VI frame is included in the M frames of the GOP by identifying whether an SEI NALU is included in the GOP. Specifically, if the GOP includes an SEI NALU, it determines, according to the indication of the SEI NALU, that the frame containing the ith NALU before the SEI NALU is the VI frame, or that the frame containing the jth NALU after the SEI NALU is the VI frame. The number of SEI NALUs is not limited and may be one or more. When there are multiple SEI NALUs, the computing device may determine, with reference to the VI frame determination principle described above, the VI frame indicated by each of the multiple SEI NALUs, thereby determining the one or more VI frames included in the M frames.
In yet another embodiment, the GOP includes at least one frame, and each frame includes the reference picture set (RPS) information of the frame. The computing device may analyze the RPS information of each of the M frames to determine whether that frame is a VI frame. Specifically, if the RPS information of any frame in the GOP indicates that the frame has an I frame as its decoding reference, and the previous frame is a non-I frame (specifically, a B frame or a P frame), the frame is determined to be a VI frame. Otherwise, the frame is determined not to be a VI frame.
In yet another embodiment, the computing device obtains out-of-band information for the GOP, the out-of-band information indicating a location of a VI frame contained in the GOP. The position refers to a specific or determined position of the VI frame in the GOP, which may include, but is not limited to, a frame number (index number) of the VI frame, a playing time corresponding to the VI frame, and the like. The out-of-band information may be specifically sent by the computing device receiving from another device (e.g., a server); or may be obtained from its own video encoding unit for the computing device, and the invention is not limited thereto. Correspondingly, the computing device identifies whether the VI frame and the position of the VI frame are included in the M frames of the GOP according to the out-of-band information of the GOP.
In an alternative embodiment, when the computing device determines that no VI frame is included in the GOP, the computing device need not process the GOP. When playing the video corresponding to the GOP, the computing device may start decoding and playing from the first I frame in the GOP.
Step S106: when the computing device determines that the M frames include one or more VI frames, it inserts a second I frame before each VI frame to obtain a plurality of new GOPs. The number of new GOPs is the number of VI frames included in the GOP plus 1.
After identifying that a VI frame is included in the M frames, the computing device may obtain the target I frame (also referred to as the second I frame) associated with the VI frame. For example, the computing device may determine, from the index information of the GOP, the storage address of the second I frame corresponding to the VI frame, and then obtain the second I frame from that address. Alternatively, the computing device may look up the second I frame pointed to by the index information of the VI frame. The second I frame may be an I frame appearing before the VI frame in the GOP, or the I frame closest to the VI frame in the GOP; reference may be made to the related description of the target I frame, which is not repeated here. The index information of the GOP records information such as the second I frame referenced when decoding the VI frame, the storage address of the second I frame, the frame index of each frame, the playing time corresponding to each frame, the playing time of the GOP, and the start time and end time of the GOP.
After obtaining the second I frame, the computing device may insert it before the VI frame, specifically as the m-th frame before the VI frame, where m is a positive integer; for example, as the frame immediately preceding the VI frame. The computing device thereby splits the GOP into a number of new GOPs equal to the number of VI frames in the GOP plus 1. For example, if the GOP includes 4 VI frames, inserting a second I frame before each VI frame yields 5 new GOPs. Referring to fig. 17, a diagram of the new GOPs is shown: the GOP includes 4 VI frames, and the computing device inserts a corresponding second I frame before each VI frame using the insertion principle described above, resulting in 5 new GOPs.
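The splitting step can be sketched as follows, under an illustrative in-memory frame representation (k VI frames yield k + 1 new GOPs):

```python
def split_gop(frames, vi_indices, i_frame_for):
    """Insert, before each VI frame, the second I frame it references
    (i_frame_for maps a VI frame index to that I frame), splitting the GOP
    at every inserted I frame."""
    new_gops, current = [], []
    for idx, frame in enumerate(frames):
        if idx in vi_indices:
            new_gops.append(current)
            current = [i_frame_for(idx)]  # inserted second I frame opens the new GOP
        current.append(frame)
    new_gops.append(current)
    return new_gops
```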
Optionally, after inserting the second I frame before the VI frame, the computing device may modify the value of a relevant field of the second I frame (e.g., a control field or flag field in the second I frame) to mark the second I frame as a non-display frame, or non-output frame, without affecting video playback quality. In other words, the second I frame is used only for decoding the VI frame and is not output for display. The new GOP referred to in this application is not synonymous with the conventionally defined GOP; the term is retained for ease of understanding. The new GOP still indicates the distance between two I frames, but the first I frame of the new GOP is used for decoding only and not for display output. Illustratively, the computing device modifies the second I frame as described in the following pseudo-code:
[Pseudo-code figure PCTCN2019125411-APPB-000001 from the original filing; available only as an image and not reproduced here.]
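Since the original pseudo-code is only available as an image, the following is a minimal sketch of the idea; the `output_flag` field name is illustrative (in an actual bitstream it would correspond to an output/display flag in the frame's header):

```python
def mark_non_display(i_frame):
    """Clear the output/display flag on the inserted second I frame so that a
    decoder uses it as a reference for decoding only, never for display."""
    i_frame = dict(i_frame)   # leave the caller's frame object untouched
    i_frame["output_flag"] = 0
    return i_frame
```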
It should be noted that, for different application scenarios, the specific processing of the GOP of the video and of the VI frames included in the GOP in the present invention also differs. Specifically:
First, in a video playing scene, the video processing request in S102 is specifically a video playing request. The video playing request carries the start time Ts of the video. Correspondingly, the computing device responds to the video playing request by acquiring the GOP in which the start time Ts is located, and then identifies whether the GOP includes a VI frame. If the GOP includes a plurality of VI frames, the computing device acquires from among them the VI frame closest to the start time Ts and processes it, i.e., inserts a second I frame before the acquired VI frame, so that two new GOPs are obtained. For the acquisition of the VI frame, reference may be made to the related description in the example of fig. 12, which is not repeated here. Optionally, to ensure that video information is not lost, the playing time corresponding to the inserted second I frame precedes the start time Ts.
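Selecting the VI frame closest to the start time Ts might be sketched as (frame representation assumed for illustration):

```python
def nearest_vi_frame(vi_frames, t_s):
    """vi_frames: list of dicts like {"index": int, "play_time": float}.
    Pick the VI frame whose playing time is closest to the requested start
    time Ts, as in the playback scenario above."""
    return min(vi_frames, key=lambda f: abs(f["play_time"] - t_s))
```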
Secondly, in a video downloading scene, the video processing request in S102 is specifically a video downloading request. The video downloading request carries the start time Ts and the end time Te of the video. The computing device responds to the video downloading request by downloading from the GOP in which the start time Ts is located to the GOP in which the end time Te is located, thereby downloading the plurality of GOPs that form the video. For each GOP, it identifies whether a VI frame is included. If the GOP includes one or more VI frames, the computing device inserts a corresponding second I frame before each VI frame, thereby splitting the GOP into multiple new GOPs. Reference may be made to the related description in the embodiment of fig. 9, which is not repeated here.
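The selection of the GOPs spanning [Ts, Te] in the download scenario might be sketched as (index layout assumed):

```python
def gops_to_download(gop_index, t_s, t_e):
    """gop_index: list of dicts like {"start": float, "end": float}, one per
    GOP in play order. Returns the GOPs overlapping the interval [Ts, Te]."""
    return [g for g in gop_index if g["end"] > t_s and g["start"] <= t_e]
```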
In an optional embodiment, after obtaining the plurality of new GOPs, if a video playing request is received, the computing device may respond to it by decoding and playing the corresponding new GOP. The specific implementations in different application scenarios are as follows:
In a video playing scene, the video processing request in S102 is a video playing request. The computing device responds to the video playing request by inserting the second I frame before the VI frame closest to the start time Ts in the GOP, obtaining two new GOPs. It further responds to the video playing request by obtaining the new GOP in which the start time Ts is located, and decodes and plays that new GOP starting from its second I frame. In other words, in response to the video playing request, the computing device determines that the start time Ts is located after the second I frame in the GOP, and then decodes and plays the video corresponding to the GOP starting from the second I frame.
In a video downloading scene, the video processing request in S102 is a video downloading request. The computing device responds to it by downloading the plurality of GOPs included in the video and performing second-I-frame insertion before each VI frame included in each GOP, obtaining a plurality of new GOPs. When watching the video, the user may drag the playing progress bar at will. Upon detecting that the user has dragged the progress bar, the computing device may generate a corresponding video playing request carrying the start time Ts of the video. In response to the video playing request, it searches the plurality of new GOPs for the new GOP in which the start time Ts is located, then decodes that new GOP starting from its second I frame and plays it.
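Looking up the new GOP in which the start time Ts falls might be sketched as (GOP layout assumed):

```python
def gop_for_seek(new_gops, t_s):
    """new_gops: list of dicts like {"start": float, "end": float, "frames": [...]}.
    Return the new GOP whose time span contains Ts; playback then starts by
    decoding from that GOP's first (non-display second I) frame."""
    for gop in new_gops:
        if gop["start"] <= t_s < gop["end"]:
            return gop
    return None
```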
By implementing the embodiments of the present invention, problems in the prior art such as low video processing efficiency, loss of important video information, and waste of storage resources of the computing device in large-GOP reverse-playing scenarios can be solved.
In conjunction with the embodiments described above with reference to figs. 1-17, devices and apparatuses to which the present invention is applicable are described below. Fig. 18 is a schematic structural diagram of a video processing apparatus according to an embodiment of the present invention. As shown in fig. 18, the video processing apparatus 18 includes an acquisition unit 181, a determination unit 182, and an insertion unit 183. Optionally, a decoding playing unit 184 may also be included. Specifically:
the acquiring unit 181 is configured to acquire a group of pictures GOP in a video, where a first frame of the GOP is a first I frame, the GOP includes M frames, and M is a positive integer;
the determining unit 182, configured to determine whether a virtual intra-coded VI frame is included in the M frames;
the inserting unit 183, configured to insert a second I frame before the VI frame when the VI frame is included in the M frames;
wherein the second I frame is a frame to which the VI frame refers at video decoding time.
In some possible embodiments, the video processing apparatus 18 may further include a decoding playing unit 184. The determining unit 182 is configured to determine, in response to a video playing request, that the start time of the video in the video playing request is located after the second I frame in the GOP; the decoding playing unit 184 is configured to decode and play the video starting from the second I frame.
In some possible embodiments, the second I frame is a previous frame of the VI frame.
In some possible embodiments, the GOP further includes index information of the GOP, the index information records a storage address of the second I frame, and before the second I frame is inserted before the VI frame, the obtaining unit 181 is further configured to obtain the second I frame from the storage address of the second I frame according to the index information of the GOP.
In some possible embodiments, the second I frame is used to decode the VI frame and is not used for output display.
In some possible embodiments, the obtaining unit 181 is specifically configured to receive a video processing request, where the video processing request carries a start time of a video, and the video includes at least one group of pictures GOP; responding to the video processing request, and acquiring a group of pictures (GOP) corresponding to the starting time from a GOP index table;
the GOP index table records at least one mapping relation, the mapping relation is that each GOP corresponds to the index information of the GOP, and the index information of the GOP comprises the starting time of the GOP.
In some possible embodiments, the index information of the GOP further includes a playing time of the frame, and when the video processing request is a video playing request, the VI frame is a VI frame with a smallest difference between the playing time in the GOP and the starting time of the GOP.
In practical applications, the functions of the acquisition unit 181 and the determination unit 182 of the present invention can be implemented by the code stream detector 1041 of fig. 9, the function of the insertion unit 183 by the code stream modifier 1043 of fig. 9, and the functions of the decoding playing unit 184 by the video decoding unit 106 of fig. 4. In other words, the code stream detector 1041 in the video read-write unit 104 of fig. 4 or fig. 9 may be implemented by functional modules such as the acquisition unit 181 and the determination unit 182; the code stream modifier 1043 in the video read-write unit 104 may be implemented by a functional module such as the insertion unit 183; and the video decoding unit 106 may be implemented by a functional module such as the decoding playing unit 184.
The modules or units involved in the apparatus 18 of the embodiment of the present invention may be specifically implemented by software programs or hardware. When implemented by a software program, each module or unit involved in the apparatus 18 is a software module or a software unit, and when implemented by hardware, each module or unit involved in the apparatus 18 may be implemented by an application-specific integrated circuit (ASIC), or a Programmable Logic Device (PLD), which may be a Complex Programmable Logic Device (CPLD), a field-programmable gate array (FPGA), a General Array Logic (GAL), or any combination thereof, which is not limited in the present invention.
It should be noted that fig. 18 is only one possible implementation manner of the embodiment of the present invention, and in practical applications, the video processing apparatus may further include more or less components, which is not limited herein. For the content that is not shown or described in the embodiment of the present invention, reference may be made to the relevant explanation in the foregoing method embodiment, which is not described herein again.
Fig. 19 is a schematic structural diagram of a computing device 19 according to an embodiment of the present invention. The computing device shown in fig. 19 includes one or more processors 1901, a communication interface 1902, and a memory 1903, and the processors 1901, the communication interface 1902, and the memory 1903 may be connected by a bus, or may communicate by other means such as wireless transmission. The embodiment of the present invention is exemplified by being connected through a bus 1904, wherein the memory 1903 is used for storing instructions, and the processor 1901 is used for executing the instructions stored in the memory 1903. The memory 1903 stores program code, and the processor 1901 may call the program code stored in the memory 1903 to implement the video processing apparatus 18 as shown in fig. 18.
In practical applications, in the embodiment of the present invention, the processor 1901 may call the program code stored in the memory 1903 to execute all or part of the steps described in the embodiment of the method illustrated in fig. 14, and/or other contents described in the text, and details are not described here again.
It is to be appreciated that the processor 1901 may include one or more general-purpose processors, such as a Central Processing Unit (CPU). The processor 1901 may be used to run the programs of the relevant functional modules in the associated program code. The functional modules may specifically include, but are not limited to, any one or combination of the obtaining unit 181, the determining unit 182, and the inserting unit 183 described above. That is, the processor 1901 executes the functions of any one or more of the functional modules described above. For each functional module mentioned here, reference may be made to the relevant explanations in the foregoing embodiments, which are not repeated here.
The communication interface 1902 may be a wired interface (e.g., an Ethernet interface) or a wireless interface (e.g., a cellular network interface or a wireless local area network interface) for communicating with other modules or devices. For example, the communication interface 1902 in this embodiment of the present invention may specifically be used to obtain GOPs in videos, and the like.
The memory 1903 may include volatile memory, such as Random Access Memory (RAM); it may also include non-volatile memory, such as Read-Only Memory (ROM), flash memory, a Hard Disk Drive (HDD), or a Solid-State Drive (SSD); the memory 1903 may also include a combination of the above types of memory. The memory 1903 may be used to store a set of program codes, so that the processor 1901 calls the program codes stored in the memory 1903 to implement the functions of the above-mentioned functional modules involved in the embodiments of the present invention.
It should be noted that fig. 19 is only one possible implementation manner of the embodiment of the present invention, and in practical applications, the computing device may further include more or less components, which is not limited herein. For the content that is not shown or described in the embodiment of the present invention, reference may be made to the relevant explanation in the foregoing method embodiment, which is not described herein again.
Embodiments of the present invention further provide a computer-readable storage medium that stores instructions which, when run on a computing device, implement the method flow shown in the embodiment of fig. 14.
Embodiments of the present invention further provide a computer program product, where when the computer program product runs on a computing device, the method flow shown in the embodiment in fig. 14 is implemented.
The steps of a method or algorithm described in connection with the embodiments disclosed herein may be embodied in hardware or in software executed by a processor. The software instructions may consist of corresponding software modules, which may be stored in a Random Access Memory (RAM), flash memory, Read-Only Memory (ROM), Erasable Programmable ROM (EPROM), Electrically Erasable Programmable ROM (EEPROM), a register, a hard disk, a removable hard disk, a Compact Disc Read-Only Memory (CD-ROM), or any other form of storage medium known in the art. An exemplary storage medium is coupled to the processor such that the processor can read information from, and write information to, the storage medium. Of course, the storage medium may also be integral to the processor. The processor and the storage medium may reside in an ASIC. Additionally, the ASIC may reside in a computing device. Of course, the processor and the storage medium may also reside as discrete components in a computing device.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program, which can be stored in a computer-readable storage medium, and when executed, can include the processes of the embodiments of the methods described above. And the aforementioned storage medium includes: various media that can store program codes, such as ROM, RAM, magnetic or optical disks.

Claims (16)

  1. A video processing method applied to a computing device, the method comprising:
    acquiring a group of pictures (GOP) in a video, wherein the first frame of the GOP is a first I frame, the GOP comprises M frames, and M is a positive integer;
    determining whether a virtual intra-coded VI frame is included in the M frames;
    inserting a second I frame before a VI frame when the VI frame is included in the M frames;
    wherein the second I frame is a frame to which the VI frame refers at video decoding time.
  2. The method of claim 1, wherein the method further comprises:
    in response to a video playing request, determining that the starting time of a video in the video playing request is located after the second I frame in the GOP;
    decoding and playing the video starting from the second I frame.
  3. The method of claim 1 or 2, wherein the second I frame is a previous frame to the VI frame.
  4. The method of any of claims 1-3, wherein the GOP further comprises index information for the GOP, wherein the index information records a storage address of the second I-frame, and wherein the method further comprises, before inserting a second I-frame before the VI-frame:
    and acquiring the second I frame from the storage address of the second I frame according to the index information of the GOP.
  5. The method of any of claims 1-4, wherein the second I frame is used to decode the VI frame and is not used for output display.
  6. The method of any of claims 1-5, wherein said obtaining a group of pictures, GOP, in the video comprises:
    receiving a video processing request, wherein the video processing request carries the starting time of a video, and the video comprises at least one group of pictures (GOP);
    responding to the video processing request, and acquiring a group of pictures (GOP) corresponding to the starting time from a GOP index table;
    the GOP index table records at least one mapping relation, the mapping relation is that each GOP corresponds to the index information of the GOP, and the index information of the GOP comprises the starting time of the GOP.
  7. The method of claim 6, wherein the index information of the GOP further includes a play time of the frame,
    and when the video processing request is a video playing request, the VI frame is the VI frame with the minimum difference between the playing time in the GOP and the starting time of the GOP.
  8. A video processing apparatus comprising an acquisition unit, a determination unit, and an insertion unit, wherein:
    the acquisition unit is used for acquiring a group of pictures (GOP) in a video, wherein the first frame of the GOP is a first I frame, the GOP comprises M frames, and M is a positive integer;
    the determining unit is configured to determine whether a virtual intra-coded VI frame is included in the M frames;
    the inserting unit is configured to insert a second I frame before a VI frame when the VI frame is included in the M frames;
    wherein the second I frame is a frame to which the VI frame refers at video decoding time.
  9. The apparatus of claim 8, wherein the apparatus further comprises a decoding playback unit,
    the determining unit is used for responding to a video playing request, and determining that the starting time of the video in the video playing request is positioned after the second I frame in the GOP;
    and the decoding playing unit is used for decoding and playing the video from the second I frame.
  10. The apparatus of claim 8 or 9, wherein the second I frame is a previous frame to the VI frame.
  11. The apparatus according to any of claims 8-10, wherein the GOP further comprises index information of the GOP, the index information recording a storage address of the second I-frame, prior to inserting the second I-frame prior to the VI-frame,
    the obtaining unit is further configured to obtain the second I frame from the storage address of the second I frame according to the index information of the GOP.
  12. The apparatus of any of claims 8-11, wherein the second I frame is used to decode the VI frame and is not used for output display.
  13. The apparatus of any one of claims 8-12,
    the acquiring unit is specifically configured to receive a video processing request, where the video processing request carries a start time of a video, and the video includes at least one group of pictures (GOP); responding to the video processing request, and acquiring a group of pictures (GOP) corresponding to the starting time from a GOP index table;
    the GOP index table records at least one mapping relation, the mapping relation is that each GOP corresponds to the index information of the GOP, and the index information of the GOP comprises the starting time of the GOP.
  14. The apparatus of claim 13, wherein the index information of the GOP further includes a play time of the frame,
    and when the video processing request is a video playing request, the VI frame is the VI frame with the minimum difference between the playing time in the GOP and the starting time of the GOP.
  15. A computing device comprising a processor and an interface, the processor in communication with the interface, the interface to receive a GOP and send to the processor, the processor to execute the method of any of claims 1-7 by executing program instructions.
  16. A computer program product, which, when run on a computer, causes the computer to perform the method of any one of claims 1 to 7.
CN201980086119.6A 2019-12-13 2019-12-13 Video processing method, apparatus and computer readable storage medium Active CN113261283B (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2019/125411 WO2021114305A1 (en) 2019-12-13 2019-12-13 Video processing method and apparatus, and computer readable storage medium

Publications (2)

Publication Number Publication Date
CN113261283A true CN113261283A (en) 2021-08-13
CN113261283B CN113261283B (en) 2024-07-05

Family

ID=76328817

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201980086119.6A Active CN113261283B (en) 2019-12-13 2019-12-13 Video processing method, apparatus and computer readable storage medium

Country Status (2)

Country Link
CN (1) CN113261283B (en)
WO (1) WO2021114305A1 (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102378008A (en) * 2011-11-02 2012-03-14 深圳市融创天下科技股份有限公司 Video encoding method, video encoding device and video encoding system for shortening waiting time for playing
CN105847825A (en) * 2015-01-16 2016-08-10 杭州海康威视数字技术股份有限公司 Encoding, index storage and access methods for video encoding code stream and corresponding apparatus
CN105847790A (en) * 2015-01-16 2016-08-10 杭州海康威视数字技术股份有限公司 Code stream transmission method and device
CN106791875A (en) * 2016-11-30 2017-05-31 华为技术有限公司 Video data decoding method, coding method and relevant device
CN107124610A (en) * 2017-04-06 2017-09-01 浙江大华技术股份有限公司 A kind of method for video coding and device
US20190289322A1 (en) * 2016-11-16 2019-09-19 Gopro, Inc. Video encoding quality through the use of oncamera sensor information

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH08242452A (en) * 1995-03-02 1996-09-17 Matsushita Electric Ind Co Ltd Video signal compression coder
CN101127919B (en) * 2007-09-28 2010-08-04 中兴通讯股份有限公司 A video sequence coding method


Also Published As

Publication number Publication date
CN113261283B (en) 2024-07-05
WO2021114305A1 (en) 2021-06-17


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant