WO2003084235A1 - Video pre-processing - Google Patents

Video pre-processing

Info

Publication number
WO2003084235A1
WO2003084235A1 (PCT/GB2003/001323)
Authority
WO
WIPO (PCT)
Prior art keywords
frame
pixels
pixel
background
frames
Prior art date
Application number
PCT/GB2003/001323
Other languages
French (fr)
Inventor
Othon Kamariotis
Original Assignee
British Telecommunications Public Limited Company
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from GB0230332A external-priority patent/GB0230332D0/en
Application filed by British Telecommunications Public Limited Company filed Critical British Telecommunications Public Limited Company
Publication of WO2003084235A1 publication Critical patent/WO2003084235A1/en

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/50Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N19/503Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving temporal prediction
    • H04N19/507Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving temporal prediction using conditional replenishment
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/194Segmentation; Edge detection involving foreground-background segmentation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/20Analysis of motion
    • G06T7/215Motion-based segmentation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/80Analysis of captured images to determine intrinsic or extrinsic camera parameters, i.e. camera calibration
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/20Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using video object coding
    • H04N19/23Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using video object coding with coding of regions that are present throughout a whole video segment, e.g. sprites, background or mosaic
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10016Video; Image sequence

Abstract

A sequence of complete video frames (V1) to (Vn) is compared with a corresponding sequence of mask frames (M1) to (Mn) according to an automatic background extraction process (55) to form a single background frame (60). Each mask frame (M1) to (Mn) includes an opaque part (51) and a transparent part (52). Each frame comprises MxN pixels arranged in (M) columns 1,2,3,…,M and (N) rows 1,2,3,…,N. Where a pixel takes the value associated with transparency, it indicates that the corresponding pixel in the corresponding complete video frame is a visible background pixel. Similarly, where a pixel in a mask frame takes the value corresponding to an opaque pixel, it indicates that the corresponding pixel in the corresponding complete video frame is a pixel associated with the foreground 'object'. All of the mask frames (M1) to (Mn) are considered in turn to identify all of the background pixels visible in at least one complete video frame of the sequence. As soon as a background pixel is visible for the first time, its value (ie brightness, colour, etc) at that time is stored in a reference background frame, until all possible pixels within the reference background frame have been set to the corresponding values of background pixels in the complete video frames, at which point the reference background frame is output as the single background frame (60).

Description

VIDEO PRE-PROCESSING
The present invention relates to a method and apparatus for pre-processing video frames, and in particular to a method and apparatus for processing a sequence of video frames to form a single background frame which can be used with each of a plurality of foreground object frames (extracted from the original sequence of complete video frames) to recreate each of the original complete video frames respectively.
A video sequence comprises a sequential series of complete video frames intended to be displayed successively at a predetermined rate. Each frame comprises a matrix of pixels, each of which has, in the case of black and white video sequences, an associated value indicating the brightness of the pixel, or, in the case of colour video sequences, a triplet of values which together indicate both the brightness and the colour of each pixel.
MPEG-4 is a standard for transmitting such a video sequence from a transmitting unit to a receiving unit over a packet-switched data network, especially when employing the Internet Protocol. The MPEG-4 standard is distinguished from earlier versions of the standard in that it permits a video "object" to be transmitted separately from a background against which the object moves (in the foreground). Where the background is static (ie is substantially the same in all of the frames in a particular sequence), the amount of bandwidth required to transmit the sequence can be reduced by transmitting the background information only once for the complete sequence of frames in which the background is static and transmitting only the object information for each frame on a frame-by-frame basis.
Additionally, where there is a background which is larger than can be seen in a single frame, but which the camera pans around, the amount of bandwidth can still be reduced by transmitting the complete background information only once for the complete sequence of frames wherein different frames show different parts of the complete background. In some circumstances, the background information need not be transmitted at all and the receiving unit generates (or selects) an arbitrary (or pre-stored) background for combination with the foreground objects prior to displaying a set of complete video frames. Additionally, in some circumstances, the background information is separately available to the transmitting unit and the transmitting unit does not need to perform any pre-processing on the original video sequence to be transmitted in order to extract the background information.
However, in many circumstances no information is available to the transmitting unit apart from the actual original frames of the video sequence itself. Methods have been developed for processing sequences of video frames in such circumstances. In particular, a paper called "Approaches to static background identification and removal" by Shelley and Seed (Sheffield Univ., UK), IEE Colloquium on 'Image Processing for Transport Applications' (Digest No. 1993/236), 9 Dec. 1993, IEE pp 6/1-4, describes a number of such methods in which the rate of change of the values of a pixel between successive frames is used to try to determine which pixels (and which values associated with the pixels) represent background pixels and which represent foreground (object) pixels. As soon as a pixel is identified as a background pixel, the corresponding values associated with that pixel (for the particular frame used in identifying the pixel as a background pixel) are set as the values for the correspondingly positioned pixel in the single background frame being created. Pixel-based routines are distinguished by the independence of all the pixels; non-pixel-based techniques employ a global updating control derived from some measurable image attribute.
However, the present inventor has realized that there are some important cases where some information about the object moving in the foreground is already available to the transmitting unit, (for example in the form of a binary mask specifying, in respect of each frame in the sequence, which pixels are background pixels and which are object pixels in that particular frame), and that methods such as those described above are inefficient ways to produce a single background frame in such cases. According to a first aspect of the present invention, there is provided a method of detecting camera panning occurring from one video frame to another, the method comprising the steps of comparing an input pair of pre-processing complete video frames with an input corresponding pair of comparison frames each of which indicates which of the pixels, in the corresponding input pre-processing complete video frame, represent object pixels, to thereby determine which of the pixels in each of the complete video frames represent background pixels and performing motion estimation in respect of the background pixels only, ignoring the object pixels.
According to a second aspect of the present invention, there is provided a method of generating a single background frame for combining with a series of object frames to create a series of post-processing complete video frames each of which consists of the respective object frame superimposed onto the single background frame or a part of the single background frame, the method comprising comparing an input series of pre-processing complete video frames with an input corresponding series of comparison frames each of which indicates which of the pixels, in the corresponding input pre-processing complete video frame, represent object pixels, to thereby derive said single background video frame.
Each of the comparison frames may comprise a binary mask having a number of pixels each of which corresponds to a respective one of the pixels in each input preprocessing complete video frame and each of which is allocated a first or a second value (eg a zero or a one) in dependence upon whether or not the corresponding pixel in the corresponding input pre-processing complete video frame is a foreground object pixel or a background pixel. Instead of allocating a first or second value to each pixel, the pixels may be grouped into blocks of pixels, and each block may be assigned a first or second value in dependence upon whether or not the corresponding block of pixels includes one or more object pixels or not. Each block may comprise two hundred and fifty six pixels arranged in a sixteen by sixteen square arrangement, which blocks are hereinafter referred to as macroblocks.
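By way of illustration, a block-level comparison frame can be derived from a pixel-level mask by marking a macroblock as an object block whenever it contains at least one object pixel. The following is a minimal sketch of that reduction, assuming numpy arrays, a pixel mask coded 1 for object and 0 for background, and frame dimensions that are exact multiples of the block size; the function name is illustrative and not taken from the patent.

```python
import numpy as np

def macroblock_mask(pixel_mask: np.ndarray, block: int = 16) -> np.ndarray:
    """Reduce a per-pixel binary mask (1 = object, 0 = background) to a
    per-macroblock mask: a block is 1 if it contains any object pixel.
    Assumes frame dimensions are exact multiples of the block size."""
    rows, cols = pixel_mask.shape
    blocks = pixel_mask.reshape(rows // block, block, cols // block, block)
    return blocks.max(axis=(1, 3))  # any object pixel -> block value 1
```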
The comparison step may comprise the sub-steps of: identifying a first pair of one of the input series of pre-processing complete video frames and the corresponding comparison frame; generating a reference background video frame by providing a plurality of pixels each of which corresponds to a respective one of the pixels in each of the input pre-processing complete video frames, setting each pixel corresponding to a pixel not indicated by the comparison frame of the first pair as being an object pixel to the value (for black-and-white) or triplet of values (for colour) associated with the corresponding pixel in the input pre-processing complete video frame of the first pair, and marking each such pixel as set, whilst leaving each pixel corresponding to a pixel which is indicated by the comparison frame of the first pair as being an object pixel to the default value or triplet of values and marking such pixels as unset; identifying a next pair of a complete pre-processing video frame and its corresponding comparison frame to form a current pair; identifying unset pixels in the reference background video frame which correspond to pixels indicated by the comparison frame of the current pair as not corresponding to object pixels, and setting any thus identified pixels in the reference background frame to the value or values associated with the corresponding pixels in the complete pre-processing video frame of the current pair; and repeating the last two preceding steps until all of the input complete pre- processing video frames and comparison frames have been processed.
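These sub-steps map naturally onto a simple fill loop. The sketch below is one illustrative reading of them, not the patent's implementation: frames are assumed to be HxWx3 numpy arrays, masks the corresponding HxW binary comparison frames (1 = object, 0 = background), and the set/unset marking is kept as a boolean array playing the role of the reference mask frame.

```python
import numpy as np

def extract_background(frames, masks):
    """Sketch of the comparison sub-steps: fill a reference background
    frame from whichever input frame first shows each background pixel."""
    reference = np.zeros_like(frames[0])      # default (unset) values
    unset = masks[0].astype(bool)             # True where still unset
    reference[~unset] = frames[0][~unset]     # fill from the first frame
    for frame, mask in zip(frames[1:], masks[1:]):
        newly_uncovered = unset & (mask == 0) # background now visible
        reference[newly_uncovered] = frame[newly_uncovered]
        unset &= ~newly_uncovered             # mark those pixels as set
        if not unset.any():                   # no further unset pixels
            break
    return reference
```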
According to a third aspect of the present invention, there is provided apparatus for generating a single background frame for combining with a series of object frames to create a series of post-processing complete video frames each of which consists of the respective object frame superimposed onto the single background frame, the apparatus comprising means for comparing an input series of pre-processing complete video frames with an input corresponding series of comparison frames each of which indicates which of the pixels, in the corresponding input pre-processing complete video frame, represent object pixels, to thereby derive said single background video frame. In order that the present invention may be better understood, an embodiment thereof will now be described in greater detail with reference to the accompanying drawings in which:
Figure 1 is a block diagram of a system incorporating an embodiment of the present invention;
Figure 2 is a flow chart illustrating the steps performed in accordance with a method according to the present invention;
Figure 3 is a flow chart illustrating the steps performed in accordance with a subroutine forming one of the steps illustrated in Figure 2; Figure 4 is a schematic diagram illustrating the effects of the steps illustrated in Figure 3; and
Figure 5 is a schematic diagram showing two of the video frames and two of the mask frames illustrated in Figure 4 in more detail.
Figure 1 is a schematic illustration of a system comprising a mobile device 10 connected via an air interface 15 to a base station 20 and thence, via a computer network 30,40, to the Internet 50. In the present embodiment, the mobile device 10, which includes a screen 11, keypad 12 and aerial 13, is operable to receive and display in real time a video originating from a computer (not shown) connected to the Internet 50, such as, for example, a video telephone call or a live streamed video broadcast.
In the present embodiment, the video broadcast, which originates from the Internet 50, is routed firstly to a video pre-processing server computer 40 which is connected to the Internet 50 via a Local Area Network (LAN) 30. The video pre-processing server computer 40 pre-processes the video broadcast to be transmitted to the mobile device 10 to account for the fact that the air interface 15 over which the video broadcast must travel to reach the mobile device 10 has only a limited amount of bandwidth (eg for a GPRS over GSM connection an average bandwidth of approximately 20kbps would be typical, whilst an average bandwidth of approximately 10kbps would be typical for a normal GSM connection (without GPRS)). In the present embodiment, the video broadcast is transmitted over the Internet 50 and across the air interface 15 using the MPEG-4 standard (see ISO/IEC 14496-2:2001(E), 1 rue de Varembe, 1211 Geneva 20, Switzerland, tel. +41 22 749 0111, fax +41 22 734 1079, internet: [email protected], http://www.m4if.org/).
The MPEG-4 standard permits background frames to be transmitted separately from foreground frames. The video pre-processing server computer 40 in the present embodiment is used in cases where the background in the video broadcast is substantially static. The server computer 40 extracts a single background frame 60 (see Figure 4) which, together with a foreground frame in respect of each video frame contained in the original video broadcast, can be used to reconstruct, to a certain extent at least, the original video broadcast. This reconstruction is done in the present embodiment by the mobile device 10 which includes a video decoder compliant with the MPEG-4 standard. The present inventor has realised that in cases where the background is substantially static, very little notice is taken of the background by a user who is viewing a video transmission and thus there is very little perception of a lack of quality of the video transmission in cases where the background is kept exactly static because only a single background frame is employed.
The server computer 40 of the present embodiment is also employed in cases where a mask frame is available in respect of each frame of the original video broadcast. There are a number of circumstances in which such masks will be available. For example, the mask frames may be generated automatically from the original video frames using an alternative program running in the server computer 40. An example of a method which could be employed by the server computer 40 to generate the mask frames in this way is described in the present Applicant's co-pending European patent application No (IPD ref A30150). Alternative algorithms could however be employed, or the mask frames may have been received by the server computer together with the complete video frames as part of the received video broadcast.
The video pre-processing server computer 40 requires a sequence of complete video frames together with a corresponding sequence of respective mask frames. Such a sequence can be derived from the video broadcast by dividing the video broadcast up into a series of sequences each corresponding to a period of video transmission in the order of a few seconds in duration.
The steps performed by the server computer 40 are now described with reference to Figure 2. Upon commencement of the method, flow passes to step S5 in which the sequence of complete video frames and corresponding mask frames is received by the server computer 40.
Upon completion of step S5, flow passes to step S10. Step S10 is a subroutine, the steps of which are described below with reference to Figure 3, which processes the complete video frames and corresponding mask frames received in step S5 to produce a single background frame.
Upon completion of subroutine S10, flow passes to step S15 in which the single background frame extracted in subroutine S10 is transmitted from the server computer 40 to the mobile device 10. This transmission is done in the present embodiment via the LAN 30, base station 20 and air interface 15.
Upon completion of step S15, flow passes to step S20 in which a sequence of foreground video frames is transmitted from the server computer 40 to the mobile device 10 (also via the LAN 30, base station 20 and air interface 15). By transmitting only information about the pixels representing the foreground object, considerable bandwidth is spared. The bandwidth freed in this way can then be used to enhance the quality of the video information transmitted in respect of the foreground object, thus improving the perceived quality of the received video transmission at the mobile device as a whole.
The detailed steps performed by the subroutine S10 are now described with reference to Figures 3, 4 and 5.
Figure 4 schematically illustrates the overall operation of the subroutine. A sequence of complete video frames V1 to Vn is compared with a corresponding sequence of mask frames M1 to Mn according to an automatic background extraction process 55 (which is the subroutine S10 whose steps are illustrated in Figure 3) to form a single background frame 60. It can be seen from Figure 4 that each mask frame M1 to Mn includes an opaque part 51 and a transparent part 52. Referring now to Figure 5, it can be seen that each frame comprises MxN pixels arranged in M columns 1,2,3,...,M and N rows 1,2,3,...,N. Each frame can therefore be represented as a two-dimensional array comprising MxN members. In the case of each video frame, each member of the array comprises a multi-value number or triplet of multi-value numbers indicating the brightness, or the brightness, chroma and hue (or equivalent values such as red, green and blue values). In the case of each mask frame, each member of the array is a binary number which can take either the value 0 to represent transparent pixels or the value 1 to represent opaque pixels. Where a pixel takes the value associated with transparency, it indicates that the corresponding pixel in the corresponding complete video frame is a pixel associated with the background. Similarly, where a pixel in a mask frame takes the value 1 corresponding to an opaque pixel, it indicates that the corresponding pixel in the corresponding complete video frame is a pixel associated with the foreground "object" (see the MPEG-4 standard).
Referring now to Figure 3, upon commencement of the sub-routine S10, flow passes to step S100 in which a reference background frame is set by masking the first complete video frame V1 with the first mask frame M1. This essentially involves not setting any of the pixels in the reference background frame which correspond to opaque pixels in the mask frame M1, whilst setting all of the pixels in the reference background frame which correspond to transparent pixels in the mask frame M1 to the same values as the corresponding pixels in the first complete video frame V1.
Upon completion of step S100, flow passes to step S110 in which a reference mask frame identical to the first mask frame M1 is created.
Upon completion of step S110, flow passes to step S120 in which the next video frame and mask frame are selected to form a current pair. On the first occurrence of this step (ie when step S120 has been reached directly from step S110 rather than from step S170, for which see below) the next video frame is video frame V2 and the next mask frame is M2, and these are selected as the current pair.
Upon completion of step S120, flow passes to step S122 in which an attempt is made to detect a scene change. This is done in the present embodiment by taking the Sum of Absolute Difference (SAD) between the new pair and the reference pair, disregarding any pixels which correspond either to object pixels in the current pair or to not-yet-filled-in pixels in the reference background frame. The comparison therefore involves only background pixels common to both pairs (current and reference). If the SAD is zero or very close to zero (ie below some predefined low threshold), it is assumed that there is no scene change and flow proceeds to step S130. Otherwise, flow passes to step S124.
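As a sketch, the scene-change test of step S122 can be expressed as a SAD restricted to the pixels that are background in the current pair and already filled in the reference background frame. The per-pixel normalisation and the threshold value below are assumptions for illustration, since the patent specifies only "some predefined low threshold"; all names are illustrative.

```python
import numpy as np

def scene_change(ref_bg, ref_unset, frame, mask, threshold=2.0):
    """Return True if the SAD over shared background pixels suggests a
    scene change; ref_unset is True where the reference is still unfilled,
    mask is the current comparison frame (1 = object, 0 = background)."""
    common = (~ref_unset) & (mask == 0)       # background in both pairs
    if not common.any():
        return True                           # nothing to compare against
    sad = np.abs(ref_bg[common].astype(np.int32)
                 - frame[common].astype(np.int32)).sum()
    return sad / common.sum() > threshold     # normalised per shared pixel
```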
In step S124, an attempt is made to detect panning. This is done by attempting to determine whether the background is largely unchanged except for having been translated by a certain amount. In order to do that, a well-known technique used in standards such as MPEG-4 and H.263 and referred to as motion estimation is employed. In the present embodiment, the motion estimation is carried out in the following manner: i) select sample box - first an attempt is made to find a square area of pixels which contains (set values of) background pixels in both the reference background frame and the current video frame and which has a border which contains set background pixel values in the reference background frame. Initially an attempt is made to find such a square of size 16 by 16 with a border of 16 pixels width (to permit the sample area of 16 by 16 pixels to be tried in positions up to 16 pixels translated in any direction away from the original position). If no such square can be found, the search is carried out with progressively smaller sample areas and borders until one is found. ii) once a sample box has been found, a SAD is calculated for all possible translated positions of the pixels in the current video frame superimposed onto the reference background frame. The position giving the smallest SAD is selected, or, if there is more than one minimum SAD value, all such positions are selected. iii) the next step is to move the entire reference frame by the corresponding selected amounts and perform a SAD on all (set values of) background pixels in both the reference background frame and the current video frame (ignoring pixels where there is now no overlap between the reference and the current frames). If any of these result in a SAD below a minimum threshold value, then a match is considered to have been made such that it is determined that panning has been detected; the translation corresponding to the lowest calculated SAD is selected as the result of the panning and flow passes to step S126, otherwise flow passes to step S128.
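A condensed sketch of sub-steps (i) to (iii): given a sample box already chosen as in sub-step (i), ie surrounded by a border of filled background pixels in the reference background frame, the sample from the current frame is tried at every translation in the search range, and the best translation is accepted only if its SAD falls below a threshold. The full-frame verification pass of sub-step (iii) is collapsed into the sample search here for brevity; the function name and the threshold value are assumptions.

```python
import numpy as np

def detect_pan(ref_bg, frame, box, search=16, sad_limit=4.0):
    """box = (top, left, size): a sample square with a 'search'-pixel
    border of filled background around it in ref_bg, so every translation
    tried below stays inside the frame. Returns the best (dy, dx)
    translation if its per-pixel SAD is below sad_limit, else None."""
    top, left, size = box
    sample = frame[top:top + size, left:left + size].astype(np.int32)
    best_sad, best_shift = None, None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            patch = ref_bg[top + dy:top + dy + size,
                           left + dx:left + dx + size].astype(np.int32)
            sad = np.abs(patch - sample).mean()   # per-pixel SAD
            if best_sad is None or sad < best_sad:
                best_sad, best_shift = sad, (dy, dx)
    return best_shift if best_sad < sad_limit else None
```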
In step S126, in the present embodiment, the reference background frame is firstly output as a background frame for all of the frames processed up to this point, and a new reference frame for the post-panning video frames is generated by moving the old reference frame in the selected direction (by the selected amount) and "filling as many gaps" as is possible from the background pixels in the current video frame. Upon completion of step S126, flow passes back to step S120.
In step S128, it is assumed that there has been a change of scene such that the old reference background frame will not be valid for video frames after the scene change. Therefore, the reference background frame is firstly output as a background frame for all of the frames processed up to this point (ie before the change of scene). The current pair of video frame and mask frame (after the change of scene) is then designated as the new first video and mask frames, and the process is re-started by passing flow back to step S100.
If at step S122 no scene change is detected, flow passes to step S130 in which the mask frame in the current pair is compared with the reference mask frame to seek to detect any pixels which are transparent in the current mask frame (ie with value 0 in the present embodiment, indicating that the corresponding pixel in the current video frame represents a background pixel) but which are set as opaque (ie with a value of 1 in the present embodiment) in the reference mask frame.
Upon completion of step S130, flow passes to step S140 in which it is determined whether any pixels were detected which are transparent in the current mask frame but opaque in the reference mask frame. Such pixels represent newly uncovered background pixels which were previously obscured by the foreground object image, and which have therefore not yet been set to their appropriate background values in the reference background frame. If any such pixels are detected, flow passes to step S150.
In step S150, the pixels in the reference background frame which correspond to the newly uncovered pixels detected in step S130 are set to the values of the corresponding pixels in the current video frame.
Upon completion of step S150, flow passes to step S160 in which the pixels in the reference mask frame which correspond to the newly uncovered pixels identified in step S130 are set to the transparent value, to indicate that the corresponding pixels in the reference background frame have now been set and do not therefore need to be set again.
Upon completion of step S160, flow passes to step S170. Additionally, if it is determined in step S140 that no newly uncovered pixels were detected in step S130, then flow passes directly from step S140 to step S170, missing out steps S150 and S160.
In step S170 it is determined whether there are both more video frames remaining in the sequence which could be processed to attempt to uncover more background pixels and one or more pixels in the reference mask frame which have not yet been set to transparent, indicating that there are more background pixels to uncover.
Provided that both of these conditions are met, flow returns to step S120 and a new pair of video frame and mask frame are selected for comparison in step S130, etc.
If one of the above conditions is not met, flow passes to step S180 in which the reference background frame is set as the single background frame 60 which is the final output from the subroutine S10 as a whole. Upon completion of step S180, the subroutine S10 comes to an end and flow passes to step S15. Although the embodiment has been described with special reference to the MPEG-4 standard, the present invention is, of course, suited for use with any video transmission standard which permits video information about foreground objects to be transmitted separately from video information about the background of a particular sequence of video frames.
Although the embodiment described above referred to a particular networked arrangement of devices, the present invention is of course applicable to any arrangement in which a transmitting unit transmits a sequence of video frames to a receiving unit using a transmission standard which permits video information about foreground objects to be transmitted separately from video information about the background of a particular sequence of video frames.
As an alternative to the described method of generating old and new reference background frames when camera panning is detected, the background frame could be extended and additional information transmitted to indicate which part of the larger background frame should be used for each of the transmitted video frames.
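Under that alternative, the decoder would keep the one larger background frame and, for each displayed frame, crop the window indicated by a transmitted offset before compositing the foreground object. A minimal sketch, assuming a (row, column) offset convention that is not specified by the patent:

```python
import numpy as np

def window_from_sprite(sprite: np.ndarray, offset, height: int, width: int):
    """Crop the part of a larger background frame ('sprite') to be used
    for one displayed frame; offset = (top, left) of the window within
    the sprite. The offset convention is an illustrative assumption."""
    top, left = offset
    return sprite[top:top + height, left:left + width]
```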

Claims

1. A method of detecting camera panning occurring from one video frame to another, the method comprising the steps of comparing an input pair of pre-processing complete video frames with an input corresponding pair of comparison frames each of which indicates which of the pixels, in the corresponding input pre-processing complete video frame, represent object pixels, to thereby determine which of the pixels in each of the complete video frames represent background pixels and performing motion estimation in respect of the background pixels only, ignoring the object pixels.
2. A method of generating a single background frame for combining with a series of foreground object frames to create a series of post-processing complete video frames each of which consists of the respective foreground object frame superimposed onto the single background frame, the method comprising comparing an input series of pre-processing complete video frames with an input corresponding series of comparison frames each of which indicates which of the pixels, in the corresponding input pre-processing complete video frame, represent object pixels, to thereby derive said single background video frame.
3. A method according to claim 2 wherein each of the comparison frames comprises a binary mask having a number of pixels each of which corresponds to a respective one of the pixels in each input pre-processing complete video frame and each of which is allocated a first or a second value in dependence upon whether or not the corresponding pixel in the corresponding input pre-processing complete video frame is a foreground object pixel or a background pixel.
4. A method according to any one of claims 2 or 3 wherein the comparing step comprises the substeps of: identifying a first pair of one of the input series of pre-processing complete video frames and the corresponding comparison frame; generating a reference background video frame by providing a plurality of pixels each of which corresponds to a respective one of the pixels in each of the input pre-processing complete video frames, setting each pixel corresponding to a pixel not indicated by the comparison frame of the first pair as being an object pixel to the value (for black-and-white) or triplet of values (for colour) associated with the corresponding pixel in the input pre-processing complete video frame of the first pair and marking each such pixel as set, whilst leaving each pixel corresponding to a pixel which is indicated by the comparison frame of the first pair as being an object pixel to the default value or triplet of values and marking such pixels as unset; identifying a next pair of a complete pre-processing video frame and its corresponding comparison frame to form a current pair; identifying unset pixels in the reference background video frame which correspond to pixels indicated by the comparison frame of the current pair as not corresponding to object pixels, and setting any thus identified pixels in the reference background frame to the value or values associated with the corresponding pixels in the complete pre-processing video frame of the current pair; and repeating the last two preceding steps until all of the input complete preprocessing video frames and comparison frames have been processed or there are no further unset pixels in the reference background frame.
5. A method as claimed in claim 4 wherein the marking of pixels as set or unset in the reference background frame is performed by generating a reference mask frame having a corresponding pixel for each pixel in the reference background frame and setting each pixel to one of two possible values depending on whether the corresponding pixel in the reference background frame is set or not.
6. A carrier medium carrying processor implementable instructions for causing a digital processor to carry out the method of any of claims 1 to 5 during implementation of the instructions.
7. A method of transmitting a sequence of video frames from a transmitting unit to a receiving unit including generating a single background frame from the sequence of video frames according to the method of any one of claims 2 to 5 and transmitting the single background frame to the receiving unit.
8. A method of any of claims 2 to 5 or 7 including the step of detecting camera panning as claimed in claim 1 and generating a first background frame in respect of video frames prior to the camera panning action, and generating a new background frame for video frames post the camera panning which new background frame incorporates some of the pixel values from the first background frame.
9. Apparatus for generating a single background frame for combining with a series of object frames to create a series of post-processing complete video frames each of which consists of the respective object frame superimposed onto the single background frame, the apparatus comprising means for comparing an input series of pre-processing complete video frames with an input corresponding series of comparison frames each of which indicates which of the pixels, in the corresponding input pre-processing complete video frame, represent object pixels, to thereby derive said single background video frame.
10. Apparatus according to claim 9 wherein each of the comparison frames comprises a binary mask having a number of pixels each of which corresponds to a respective one of the pixels in each input pre-processing complete video frame and each of which is allocated a first or a second value in dependence upon whether or not the corresponding pixel in the corresponding input pre-processing complete video frame is a foreground object pixel or a background pixel.
11. Apparatus according to either claim 9 or claim 10 wherein the comparing means comprises:
first identifying means for identifying a first pair of one of the input series of pre-processing complete video frames and the corresponding comparison frame;
generating means for generating a reference background video frame by providing a plurality of pixels each of which corresponds to a respective one of the pixels in each of the input pre-processing complete video frames, setting each pixel corresponding to a pixel not indicated by the comparison frame of the first pair as being an object pixel to the value (for black-and-white) or triplet of values (for colour) associated with the corresponding pixel in the input pre-processing complete video frame of the first pair and marking each such pixel as set, whilst leaving each pixel corresponding to a pixel which is indicated by the comparison frame of the first pair as being an object pixel at the default value or triplet of values and marking such pixels as unset;
second identifying means for identifying a next pair of a complete pre-processing video frame and its corresponding comparison frame to form a current pair;
third identifying means for identifying unset pixels in the reference background video frame which correspond to pixels indicated by the comparison frame of the current pair as not corresponding to object pixels, and setting any thus identified pixels in the reference background frame to the value or values associated with the corresponding pixels in the complete pre-processing video frame of the current pair; and
decision means for causing the second and third identifying means to repeatedly identify subsequent pairs of video and comparison frames and to identify and appropriately set previously unset pixels in the reference background frame until all of the input complete pre-processing video frames and comparison frames have been processed or there are no further unset pixels in the reference background frame.
12. Apparatus for detecting camera panning occurring from one video frame to another, the apparatus comprising comparison means for comparing an input pair of pre-processing complete video frames with an input corresponding pair of comparison frames, each of which indicates which of the pixels in the corresponding input pre-processing complete video frame represent object pixels, to thereby determine which of the pixels in each of the complete video frames represent background pixels, and motion estimation means for performing motion estimation in respect of the background pixels only, ignoring the object pixels. (See the third sketch below.)
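Purely as an editorial illustration, and not the claimed implementation itself, the background-accumulation procedure of claims 4 and 5 can be sketched in Python with NumPy. Everything concrete here is an assumption introduced for the sketch: the function name build_reference_background, 8-bit colour frames, and the convention that a nonzero mask value marks an object pixel. The returned boolean array plays the role of the reference mask frame of claim 5.

import numpy as np

def build_reference_background(frames, masks, default=0):
    """Accumulate a single reference background frame from (frame, mask) pairs.

    frames: iterable of H x W x 3 uint8 arrays (pre-processing complete frames)
    masks:  iterable of H x W arrays, nonzero where a pixel is an object pixel
    """
    frames, masks = iter(frames), iter(masks)

    # First pair: copy every background pixel into the reference background
    # frame and mark it as set; object pixels keep the default value.
    frame, mask = next(frames), next(masks)
    background = np.full_like(frame, default)
    is_set = (mask == 0)                 # stands in for the reference mask frame
    background[is_set] = frame[is_set]

    # Subsequent pairs: fill only pixels that are still unset and that the
    # current comparison frame marks as background.
    for frame, mask in zip(frames, masks):
        if is_set.all():                 # no unset pixels remain
            break
        fill = ~is_set & (mask == 0)
        background[fill] = frame[fill]
        is_set |= fill

    return background, is_set

On this reading the loop stops as soon as every pixel of the reference background has been set, or once the input pairs are exhausted, matching the two stopping conditions recited in claim 4.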
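Claim 8 leaves open exactly how pixel values from the first background frame are carried into the new one. One plausible reading, sketched below purely under that assumption, shifts the pre-pan background by the estimated global pan vector (dx, dy) so that background pixels still in view seed the post-pan background; the function name and sign conventions are hypothetical.

import numpy as np

def seed_background_after_pan(old_background, old_set, dx, dy, default=0):
    """Seed a post-pan background frame by shifting the pre-pan one by (dx, dy).

    old_set: boolean H x W array marking which pixels of old_background are set.
    """
    h, w = old_set.shape
    new_background = np.full_like(old_background, default)
    new_set = np.zeros_like(old_set)

    # Destination/source windows that still overlap after the shift.
    dst_y, src_y = slice(max(0, dy), min(h, h + dy)), slice(max(0, -dy), min(h, h - dy))
    dst_x, src_x = slice(max(0, dx), min(w, w + dx)), slice(max(0, -dx), min(w, w - dx))

    new_background[dst_y, dst_x] = old_background[src_y, src_x]
    new_set[dst_y, dst_x] = old_set[src_y, src_x]
    return new_background, new_set

Pixels that scroll out of view remain unset, and the accumulation procedure of claims 4 and 5 then fills them from the post-pan frames.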
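Finally, a minimal sketch of the background-only motion estimation of claim 12, assuming an exhaustive search over small global shifts in which only pixels marked as background in both comparison frames contribute to the match. The sum-of-absolute-differences criterion, the ±8-pixel search range, and the name detect_panning are illustrative choices, not taken from the specification.

import numpy as np

def detect_panning(frame_a, frame_b, mask_a, mask_b, max_shift=8):
    """Estimate a global shift between two frames from background pixels only.

    mask_a, mask_b: nonzero where a pixel is an object pixel; such pixels are
    excluded from the match. Returns the best-matching (dx, dy) shift.
    """
    gray_a = frame_a.mean(axis=2)        # crude luminance, for illustration
    gray_b = frame_b.mean(axis=2)
    h, w = gray_a.shape

    best_sad, best_shift = np.inf, (0, 0)
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            # Overlapping windows of frame_a shifted by (dx, dy) against frame_b.
            ya, yb = max(0, dy), max(0, -dy)
            xa, xb = max(0, dx), max(0, -dx)
            hh, ww = h - abs(dy), w - abs(dx)
            a = gray_a[ya:ya + hh, xa:xa + ww]
            b = gray_b[yb:yb + hh, xb:xb + ww]
            # Compare only pixels that are background in both frames.
            bg = (mask_a[ya:ya + hh, xa:xa + ww] == 0) \
               & (mask_b[yb:yb + hh, xb:xb + ww] == 0)
            if not bg.any():
                continue
            sad = np.abs(a[bg] - b[bg]).mean()
            if sad < best_sad:
                best_sad, best_shift = sad, (dx, dy)
    return best_shift

A clearly nonzero best shift, found while every object pixel is ignored, indicates camera panning between the two frames.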
PCT/GB2003/001323 2002-03-28 2003-03-27 Video pre-processing WO2003084235A1 (en)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
EP02252314.6 2002-03-28
EP02252314 2002-03-28
GB0230332.9 2002-12-31
GB0230332A GB0230332D0 (en) 2002-12-31 2002-12-31 Video pre-processing

Publications (1)

Publication Number Publication Date
WO2003084235A1 true WO2003084235A1 (en) 2003-10-09

Family

ID=28676396

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/GB2003/001323 WO2003084235A1 (en) 2002-03-28 2003-03-27 Video pre-processing

Country Status (1)

Country Link
WO (1) WO2003084235A1 (en)

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
DE19827835A1 (en) * 1998-06-23 1999-12-30 Bosch Gmbh Robert Image transmission method e.g. for video surveillance system

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
CAVALLARO A ET AL: "VIDEO OBJECT EXTRACTION BASED ON ADAPTIVE BACKGROUND AND STATISTICAL CHANGE DETECTION", PROCEEDINGS OF THE SPIE, SPIE, BELLINGHAM, VA, US, vol. 4310, 24 January 2001 (2001-01-24), pages 465 - 475, XP001061316 *
DUFAUX F ET AL: "Background mosaicking for low bit rate video coding", PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP) LAUSANNE, SEPT. 16 - 19, 1996, NEW YORK, IEEE, US, vol. 1, 16 September 1996 (1996-09-16), pages 673 - 676, XP010202155, ISBN: 0-7803-3259-8 *
GUSE W ET AL: "EFFECTIVE EXPLOITATION OF BACKGROUND MEMORY FOR CODING OF MOVING VIDEO USING OBJECT MASK GENERATION", PROCEEDINGS OF THE SPIE, SPIE, BELLINGHAM, VA, US, vol. 1360, 1 October 1990 (1990-10-01), pages 512 - 523, XP000374164 *
KUI ZHANG ET AL: "Using background memory for efficient video coding", IMAGE PROCESSING, 1998. ICIP 98. PROCEEDINGS. 1998 INTERNATIONAL CONFERENCE ON CHICAGO, IL, USA 4-7 OCT. 1998, LOS ALAMITOS, CA, USA,IEEE COMPUT. SOC, US, 4 October 1998 (1998-10-04), pages 944 - 947, XP010308944, ISBN: 0-8186-8821-1 *
MECH R ET AL: "A noise robust method for 2D shape estimation of moving objects in video sequences considering a moving camera", SIGNAL PROCESSING. EUROPEAN JOURNAL DEVOTED TO THE METHODS AND APPLICATIONS OF SIGNAL PROCESSING, ELSEVIER SCIENCE PUBLISHERS B.V. AMSTERDAM, NL, vol. 66, no. 2, 30 April 1998 (1998-04-30), pages 203 - 217, XP004129641, ISSN: 0165-1684 *

Cited By (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10616576B2 (en) 2003-05-12 2020-04-07 Google Llc Error recovery using alternate reference frame
WO2009031751A1 (en) * 2007-09-05 2009-03-12 Electronics And Telecommunications Research Institute Video object extraction apparatus and method
US9374596B2 (en) 2008-09-11 2016-06-21 Google Inc. System and method for video encoding using constructed reference frame
EP2327212A2 (en) * 2008-09-11 2011-06-01 Google Inc. System and method for video encoding using constructed reference frame
US8385404B2 (en) 2008-09-11 2013-02-26 Google Inc. System and method for video encoding using constructed reference frame
US11375240B2 (en) 2008-09-11 2022-06-28 Google Llc Video coding using constructed reference frames
EP2327212A4 (en) * 2008-09-11 2012-11-28 Google Inc System and method for video encoding using constructed reference frame
JP2012502590A (en) * 2008-09-11 2012-01-26 グーグル インコーポレイテッド Video coding system and method using configuration reference frame
US8665952B1 (en) 2010-09-15 2014-03-04 Google Inc. Apparatus and method for decoding video encoded using a temporal filter
US9154799B2 (en) 2011-04-07 2015-10-06 Google Inc. Encoding and decoding motion via image segmentation
US9392280B1 (en) 2011-04-07 2016-07-12 Google Inc. Apparatus and method for using an alternate reference frame to decode a video frame
US9426459B2 (en) 2012-04-23 2016-08-23 Google Inc. Managing multi-reference picture buffers and identifiers to facilitate video data coding
US9609341B1 (en) 2012-04-23 2017-03-28 Google Inc. Video data encoding and decoding using reference picture lists
US9014266B1 (en) 2012-06-05 2015-04-21 Google Inc. Decimated sliding windows for multi-reference prediction in video coding
US9756331B1 (en) 2013-06-17 2017-09-05 Google Inc. Advance coded reference prediction
EP3094090A1 (en) * 2015-01-16 2016-11-16 Hangzhou Hikvision Digital Technology Co., Ltd. Systems, devices and methods for video encoding and decoding
US10567796B2 (en) 2015-01-16 2020-02-18 Hangzhou Hikvision Digital Technology Co., Ltd. Systems, devices and methods for video encoding and decoding
CN111526417A (en) * 2020-04-20 2020-08-11 北京英迈琪科技有限公司 Video image transmission method and transmission system
WO2023221636A1 (en) * 2022-05-19 2023-11-23 腾讯科技(深圳)有限公司 Video processing method and apparatus, and device, storage medium and program product

Similar Documents

Publication Publication Date Title
US8020180B2 (en) Digital video signature apparatus and methods for use with video program identification systems
US8488868B2 (en) Generation of a depth map from a monoscopic color image for rendering stereoscopic still and video images
CN106067974B (en) For handling the method and apparatus of the video flowing in video camera
RU2504010C2 (en) Method and device for filling occluded areas of depth or disparity map estimated from two images
US7340094B2 (en) Image segmentation by means of temporal parallax difference induction
CN108933935B (en) Detection method and device of video communication system, storage medium and computer equipment
CN109862389B (en) Video processing method, device, server and storage medium
WO2003084235A1 (en) Video pre-processing
JP2008282416A (en) Method and apparatus for segmenting image prior to coding
KR100301113B1 (en) How to segment video objects by contour tracking
CA2486164A1 (en) Video pre-processing
US7268834B2 (en) Method and apparatus for combining video signals to one comprehensive video signal
US5940140A (en) Backing luminance non-uniformity compensation in real-time compositing systems
CN112788329A (en) Video static frame detection method and device, television and storage medium
EP2833637A1 (en) Method for processing a current image of an image sequence, and corresponding computer program and processing device
Köppel et al. Filling disocclusions in extrapolated virtual views using hybrid texture synthesis
CN115941920A (en) Naked eye 3D video generation method, device, equipment and storage medium
CN109389674B (en) Data processing method and device, MEC server and storage medium
KR101473648B1 (en) Method and system for real-time chroma-key image synthesis without background screen
CN112637573A (en) Multi-lens switching display method and system, intelligent terminal and storage medium
CN111179317A (en) Interactive teaching system and method
US20230267620A1 (en) Computing platform using machine learning for foreground mask estimation
CN115686402A (en) Image display system and method, and non-volatile storage medium
CN117635788A (en) Image rendering method and device, electronic equipment and storage medium
Lin et al. 2D-to-3D Video Conversion: Techniques and Applications in 3D Video Communications

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): CA US

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IT LU MC NL PT RO SE SI SK TR

121 Ep: the epo has been informed by wipo that ep was designated in this application
122 Ep: pct application non-entry in european phase