GB2549942A - Method, apparatus and computer program product for encoding a mask of a frame and decoding an encoded mask of a frame - Google Patents

Info

Publication number
GB2549942A
GB2549942A GB1607587.1A GB201607587A GB2549942A GB 2549942 A GB2549942 A GB 2549942A GB 201607587 A GB201607587 A GB 201607587A GB 2549942 A GB2549942 A GB 2549942A
Authority
GB
United Kingdom
Prior art keywords
mask
frame
pixel
encoded
value
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
GB1607587.1A
Other versions
GB201607587D0 (en)
Inventor
M Williams John
Ono Tomohiro
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Kudan Ltd
Original Assignee
Kudan Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Kudan Ltd filed Critical Kudan Ltd
Priority to GB1607587.1A priority Critical patent/GB2549942A/en
Publication of GB201607587D0 publication Critical patent/GB201607587D0/en
Priority to PCT/EP2017/059437 priority patent/WO2017186580A2/en
Publication of GB2549942A publication Critical patent/GB2549942A/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T11/002D [Two Dimensional] image generation
    • G06T11/60Editing figures and text; Combining figures or text
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/20Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using video object coding
    • H04N19/29Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using video object coding involving scalability at the object level, e.g. video object layer [VOL]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T19/00Manipulating 3D models or images for computer graphics
    • G06T19/006Mixed reality
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/102Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N19/132Sampling, masking or truncation of coding units, e.g. adaptive resampling, frame skipping, frame interpolation or high-frequency transform coefficient masking
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/134Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
    • H04N19/167Position within a video image, e.g. region of interest [ROI]
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/169Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
    • H04N19/17Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/20Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using video object coding
    • H04N19/21Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using video object coding with binary alpha-plane coding for video objects, e.g. context-based arithmetic encoding [CAE]
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/85Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using pre-processing or post-processing specially adapted for video compression
    • H04N19/88Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using pre-processing or post-processing specially adapted for video compression involving rearrangement of data among different coding units, e.g. shuffling, interleaving, scrambling or permutation of pixel data or permutation of transform coefficient data among different blocks

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Graphics (AREA)
  • Computer Hardware Design (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

The invention is directed to a method 100 of encoding 120 a mask of a frame, wherein the mask comprises a plurality of pixels with at least one component, wherein the value of each component of a pixel is set to a first value if the pixel is within an area of interest or set to a second value if the pixel is outside the area of interest. The method 100 comprises: segmenting 121 the mask into first, second and third segments; setting 123 the value of the first colour component of each pixel of an encoded mask to the value of a component of a respective pixel of the first segment; setting 124 the value of the second colour component of each pixel of the encoded mask to the value of a component of a respective pixel of the second segment; and setting 125 the value of the third colour component of each pixel of the encoded mask to the value of a component of a respective pixel of the third segment. Additional inventions defined by the claims relate to a means for extracting a mask window and a means for scaling a mask, segmenting it and rearranging the segments.

Description

METHOD, APPARATUS AND COMPUTER PROGRAM PRODUCT FOR ENCODING A MASK OF A FRAME AND DECODING AN ENCODED MASK OF A FRAME
Field of the invention [0001] The invention relates to augmented reality in which video elements (e.g. elements of a live direct or indirect view of a physical, real-world environment) are augmented or supplemented by computer-generated video or graphical input. In particular, the invention is directed to a method, an apparatus and a computer program product for encoding a mask of a frame and decoding an encoded mask of a frame, e.g. for purpose of generating an alpha video (i.e. a transparent video).
Background [0002] An alpha video is a video wherein each frame includes an alpha channel (i.e. transparency channel) in addition to a green channel, a blue channel and a red channel or in addition to a greyscale channel.
[0003] Conventionally, a method of generating an alpha video comprises inputting a frame 2, see Fig. 1, to a processor. Here, the frame 2 is a color frame and thus includes a green channel 2g, a blue channel 2b and a red channel 2r. The frame 2 comprises an area of interest, here represented as a triangle, capturing objects and/or people in the foreground. The processor generates a mask 4 of the frame 2. Here, the mask 4 (the alpha channel) is coded as a grayscale mask and thus is represented by equal values of green, blue and red in the three channels. If a pixel is within the area of interest, the grayscale component is set to '255', otherwise the grayscale component is set to '0'. By encoding the alpha channel (mask 4) in this way as a grayscale image on a second frame, the image and the mask 4 can be processed by standard video processing hardware.
[0004] At a rendering time, the processor sets the alpha component of each pixel of the frame 2 to the grayscale component of a respective pixel of the mask 4. If a pixel has an alpha component set to '255', that pixel is to be rendered (i.e. the pixel is opaque), whereas if a pixel has an alpha component set to '0', that pixel is not to be rendered (i.e. the pixel is transparent). In this way, the processor can extract the area of interest from the frame 2 and can overlay it over another frame (not represented).
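The conventional mask generation and alpha compositing described above can be sketched as follows. This is an illustrative NumPy sketch, not code from the patent; the array sizes, pixel values and variable names are all assumptions made for the example.

```python
import numpy as np

# Hypothetical 6x6 greyscale "frame" with a bright foreground object.
H, W = 6, 6
frame = np.zeros((H, W), dtype=np.uint8)
frame[1:4, 1:4] = 200  # area of interest (e.g. a person in the foreground)

# Conventional mask: 255 inside the area of interest, 0 outside.
mask = np.where(frame > 0, 255, 0).astype(np.uint8)

# At rendering time the mask becomes the alpha channel: opaque where 255,
# transparent where 0, so the area of interest can be overlaid on a background.
background = np.full((H, W), 50, dtype=np.uint8)
alpha = mask.astype(np.float32) / 255.0
composited = (alpha * frame + (1 - alpha) * background).astype(np.uint8)
```

Inside the area of interest the composite shows the frame's pixels; everywhere else it shows the background, which is the behaviour the patent attributes to the alpha channel.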
[0005] An objective of the present invention is to compress an alpha video to reduce the amount of memory resources required to store the alpha video and the amount of communication resources required to send the alpha video.
Summary of the Invention [0006] The invention relates to a method of encoding a mask of a frame, wherein the mask comprises a plurality of pixels with at least one component, wherein the value of each component of a pixel is set to a first value if the pixel is within an area of interest or set to a second value if the pixel is outside the area of interest. The method comprises: segmenting the mask into first, second and third segments; setting the value of the first color component of each pixel of an encoded mask to the value of a component of a respective pixel of the first segment; setting the value of the second color component of each pixel of the encoded mask to the value of a component of a respective pixel of the second segment; and setting the value of the third color component of each pixel of the encoded mask to the value of a component of a respective pixel of the third segment. In this way, the mask of the frame can be compressed by at least a factor of three.
[0007] According to one aspect, the mask comprises a plurality of pixels with a grayscale component, wherein the value of the grayscale component of a pixel is set to a first value if the pixel is within an area of interest or set to a second value if the pixel is outside the area of interest. The method comprises: segmenting the mask into first, second and third segments; setting the value of the first color component of each pixel of an encoded mask to the value of the grayscale component of a respective pixel of the first segment; setting the value of the second color component of each pixel of the encoded mask to the value of the grayscale component of a respective pixel of the second segment; and setting the value of the third color component of each pixel of the encoded mask to the value of the grayscale component of a respective pixel of the third segment.
[0008] According to another aspect, the mask comprises a plurality of pixels with first, second and third color components, wherein the values of the first, second and third color components of a pixel are all set to a first value if the pixel is within an area of interest or all set to a second value if the pixel is outside the area of interest. The method comprises: segmenting the mask into first, second and third segments; setting the value of the first color component of each pixel of an encoded mask to the value of the first color component of a respective pixel of the first segment; setting the value of the second color component of each pixel of the encoded mask to the value of the second color component of a respective pixel of the second segment; and setting the value of the third color component of each pixel of the encoded mask to the value of the third color component of a respective pixel of the third segment.
[0009] According to one aspect, the first, second and third color components are green, blue and red components.
[0010] According to another aspect, the encoded mask and the first, second and third segments have the same dimensions.
[0011] According to one aspect, the area of interest captures at least one person or an object in the foreground of the frame.
[0012] According to another aspect, the first value is '255' and the second value is '0' or vice versa.
[0013] According to one aspect, the method further comprises reducing the dimensions of the frame and the encoded mask.
[0014] The invention further relates to a method of decoding an encoded mask of a frame, wherein the encoded mask comprises a plurality of pixels with first, second and third color components, comprising: segmenting the frame into first, second and third segments; setting an alpha component of each pixel of the first segment of the frame to the first color component of a respective pixel of the encoded mask; setting an alpha component of each pixel of the second segment of the frame to the second color component of a respective pixel of the encoded mask; and setting an alpha component of each pixel of the third segment of the frame to the third color component of a respective pixel of the encoded mask.
[0015] According to one aspect, the first, second and third color components are green, blue and red components.
[0016] According to another aspect, the encoded mask and the first, second and third segments have the same dimensions.
[0017] According to one aspect, the first, second and third color components of each pixel of the encoded mask are set to a first value or to a second value.
[0018] According to one aspect, the first value is '255' and the second value is '0' or vice versa.
[0019] According to another aspect, the method further comprises increasing (130) the dimensions of the frame and the encoded mask.
[0020] The invention further relates to a method of encoding a mask of a frame, wherein the mask comprises a plurality of pixels with at least one component, wherein the value of each component of a pixel is set to a first value if the pixel is within an area of interest or set to a second value if a pixel is outside the area of interest. The method comprises: selecting at least one mask window overlapping with an area of interest; extracting the at least one mask window to form an encoded mask. In this way, the mask of a frame can be compressed.
[0021] According to one aspect, the method comprises: selecting a plurality of mask windows overlapping with the area of interest; extracting and rearranging (544) the plurality of mask windows to form the encoded mask.
[0022] According to another aspect, the area of interest captures at least one person or an object in the foreground of the frame.
[0023] According to one aspect, the at least one mask window is selected for the frame independently from any other frame, such that the at least one mask window varies from one frame to another.
[0024] According to another aspect, the at least one mask window is selected for the frame and at least one other frame, such that the at least one mask window is identical from one frame to another.
[0025] According to one aspect, the method further comprises: determining an area of interest for the frame; determining an area of interest for the at least one other frame; selecting the at least one mask window to be the smallest at least one mask window that encompasses all the areas of interest.
[0026] According to another aspect, the method further comprises encoding the frame, wherein encoding the frame includes: selecting at least one frame window overlapping with the area of interest; extracting the at least one frame window to form an encoded frame; and storing the encoded frame in association with information to locate the at least one frame window within the frame.
[0027] According to one aspect, the method further comprises: selecting a plurality of frame windows overlapping with the area of interest; extracting and rearranging the plurality of frame windows to form the encoded frame; and storing the encoded frame in association with information to locate the plurality of frame windows within the frame and within the encoded frame.
[0028] According to another aspect, each mask window has a corresponding frame window with the same shape, with the same dimensions, with the same location within the mask and within the frame and with the same location within the encoded mask and within the encoded frame.
[0029] According to one aspect, the component of each pixel of the mask is a greyscale component set to a first value or a second value.
[0030] According to another aspect, the first value is '255' and the second value is '0' or vice versa.
[0031] The invention further relates to a method of decoding an encoded mask (4*) of a frame, comprising: setting an alpha component of each pixel of an encoded frame to a component of a respective pixel of the encoded mask.
[0032] According to one aspect, the method further comprises: extracting a plurality of frame windows from an encoded frame based on location information locating the plurality of frame windows within the encoded frame; and rearranging the plurality of frame windows based on location information locating the plurality of frame windows within the frame.
[0033] The invention further relates to a method of encoding a mask of a frame, wherein the mask comprises a plurality of pixels with at least one component, wherein the value of each component of a pixel is set to a first value if the pixel is within an area of interest or set to a second value if the pixel is outside the area of interest. The method comprises: decreasing the dimensions of the mask by a factor N, where N is an integer greater than 1; segmenting the mask into N segments; and rearranging the N segments to form an encoded mask having one common dimension with the mask. In an example, the encoded mask and the mask have the same height and a different width. In another example, the encoded mask and the mask have the same width and a different height.
[0034] According to one aspect, the method comprises: decreasing the dimensions h x w of the mask by a factor N to have dimensions h/N x w/N; segmenting the mask into N segments having dimensions h/N x w/N²; rearranging the N segments to form an encoded mask having dimensions h x w/N².
[0035] According to another aspect, the method comprises: decreasing the dimensions h x w of the mask by a factor N to have dimensions h/N x w/N; segmenting the mask into N segments having dimensions h/N² x w/N; rearranging the N segments to form an encoded mask having dimensions h/N² x w.
[0036] According to one aspect, the factor N is equal to 2.
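With N = 2, the scale-segment-rearrange scheme of paragraphs [0033] to [0036] can be sketched as follows. This is an illustrative NumPy sketch, not the patent's implementation; the 2x2 averaging downscale, the function name and the vertical-strip segmentation are assumptions.

```python
import numpy as np

def encode_mask_scaled(mask, N=2):
    h, w = mask.shape
    # Decrease the dimensions h x w by a factor N -> h/N x w/N
    # (here by simple block averaging; the downscale method is illustrative).
    small = mask.reshape(h // N, N, w // N, N).mean(axis=(1, 3)).astype(np.uint8)
    # Segment into N vertical strips of dimensions h/N x w/N**2 each ...
    strips = np.split(small, N, axis=1)
    # ... and stack them to form an encoded mask of dimensions h x w/N**2,
    # which shares its height (one common dimension) with the original mask.
    return np.concatenate(strips, axis=0)

mask = np.full((8, 8), 255, dtype=np.uint8)   # 8 x 8 all-opaque mask
encoded = encode_mask_scaled(mask, N=2)       # 8 x 2 encoded mask
```

With h = w = 8 and N = 2 the encoded mask has dimensions 8 x 2, matching the h x w/N² case of paragraph [0034].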
[0037] The invention further relates to a method of decoding an encoded mask of a frame, comprising: setting an alpha component of each pixel of a frame to a component of a respective pixel of the encoded mask.
[0038] The invention further relates to an apparatus comprising means for performing the above methods.
[0039] The invention finally relates to a computer program product comprising instructions which when executed by an apparatus perform the above methods.
[0040] Other features and advantages of the invention will become apparent after review of the entire application, including the following sections: brief description of the drawings, detailed description and claims.
Brief description of the drawings [0041] The accompanying drawings illustrate exemplary aspects of the invention, and, together with the general description given above and the detailed description given below, serve to explain features of the invention. Wherever possible, the same reference numbers will be used throughout the drawings to refer to the same or like parts.
[0042] Fig. 1 represents a frame and a mask of a frame.
[0043] Fig. 2 represents a flow diagram of a first method of encoding a mask of a frame.
[0044] Fig. 3 to Fig. 5 represent a frame and a mask of a frame at various stages of the first method of encoding a mask of a frame.
[0045] Fig. 6 represents a flow diagram of a method of decoding an encoded mask of a frame.
[0046] Fig. 7 represents a flow diagram of a second method of encoding a mask of a frame.
[0047] Fig. 8 and Fig. 9 represent a frame and a mask of a frame at various stages of the second method of encoding a mask of a frame.
[0048] Fig. 10 represents a flow diagram of a method of decoding an encoded mask of a frame.
[0049] Fig. 11 represents a flow diagram of a third method of encoding a mask of a frame.
[0050] Fig. 12 and Fig. 13 represent a frame and a mask of a frame at various stages of the third method of encoding a mask of a frame.
[0051] Fig. 14 represents a flow diagram of a method of decoding an encoded mask of a frame.
[0052] Fig. 15 represents a flow diagram of a fourth method of encoding a mask of a frame.
[0053] Fig. 16 and Fig. 17 represent a frame and a mask of a frame at various stages of the fourth method of encoding a mask of a frame.
[0054] Fig. 18 represents a flow diagram of a fourth method of decoding an encoded mask of a frame.
[0055] Fig. 19 represents a mobile device for implementing any one of the previous methods.
Detailed Description [0056] The following detailed description describes four solutions to the problem of compressing an alpha video.
First solution [0057] Fig. 2 represents a flow diagram of a method 100 of encoding the mask 4 of the frame 2 shown in Fig. 3. The method starts at step 110 with the frame 2 and the mask 4 being input.
[0058] At step 120, the mask 4 is encoded following operations 121 to 127. At operation 121, the mask 4 is segmented into three segments 4₁, 4₂ and 4₃ of the same dimensions (see Fig. 3). Here, the segments 4₁, 4₂ and 4₃ are vertical but alternatively the segments 4₁, 4₂ and 4₃ could be horizontal or could have some other (preferably non-overlapping) configuration.
[0059] Then, an encoded mask 4* (see Fig. 4) is generated based on the segments 4₁, 4₂ and 4₃. The encoded mask 4* is split into a green channel, a blue channel and a red channel. The encoded mask 4* (in total) has the same dimensions as the segments 4₁, 4₂ and 4₃.
[0060] At operation 122, a pixel of the encoded mask 4* is selected. At operation 123, the green component of the selected pixel is set to the alpha component of a respective pixel of the segment 4₁. At operation 124, the blue component of the selected pixel is set to the alpha component of a respective pixel of the segment 4₂. At operation 125, the red component of the selected pixel is set to the alpha component of a respective pixel of the segment 4₃. At operation 126, it is determined whether each pixel of the encoded mask 4* has been selected. If at least one pixel of the encoded mask 4* has not been selected, operations 122 to 126 are performed again for a next pixel (e.g. in a raster scan manner). Otherwise, step 120 comes to an end at operation 127.
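The per-pixel loop of operations 121 to 127 can be sketched in vectorised form as follows. This is an illustrative NumPy sketch, not code from the patent; the function name, the requirement that the width be divisible by three, and the G/B/R channel ordering are assumptions.

```python
import numpy as np

def encode_mask(mask):
    # Segment the greyscale mask into three vertical segments of equal
    # dimensions, then pack them into the green, blue and red channels of a
    # single encoded mask one third the width of the original.
    h, w = mask.shape
    assert w % 3 == 0, "width must be divisible by 3 for equal vertical segments"
    s1 = mask[:, : w // 3]
    s2 = mask[:, w // 3 : 2 * w // 3]
    s3 = mask[:, 2 * w // 3 :]
    return np.stack([s1, s2, s3], axis=-1)  # channel order: G, B, R

mask = np.zeros((4, 9), dtype=np.uint8)
mask[:, 3:6] = 255            # area of interest falls in the middle segment
encoded = encode_mask(mask)   # 4 x 3 x 3: one third the pixels, three channels
```

The encoded mask carries the same information in one third the number of pixels, which is the factor-of-three compression claimed in paragraph [0006].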
[0061] At an optional step 130 (Fig. 5), the resolution of the frame 2 and the encoded mask 4* can be reduced such that the combined dimensions of the frame 2 and the encoded mask 4* are equal to the dimensions of the original frame 2 input at step 110. In this way, the combined size of the frame 2 and the encoded mask 4* is equal to the size of the original frame 2. This is useful because video encoding hardware components are designed to handle standard frame sizes and can therefore process the frame and its mask as a single frame of standard size.
[0062] In another embodiment (not represented) the mask 4 of the frame 2 input at step 110 is not a grayscale mask. Instead, the mask 4 is a color mask and includes a green channel 4g, a blue channel 4b and a red channel 4r. In this other embodiment, if a pixel is within the area of interest, the green component, blue component and red component would be set to '255', otherwise the green component, blue component and red component would be set to '0'. In this other embodiment, operations 121 to 127 are slightly modified. At operation 121, the mask 4 is still segmented into three segments 4₁, 4₂ and 4₃ of the same dimensions. At operation 122, a pixel of the encoded mask 4* is selected. At operation 123, the green component of the selected pixel is set to the green component of a respective pixel of the segment 4₁. At operation 124, the blue component of the selected pixel is set to the blue component of a respective pixel of the segment 4₂. At operation 125, the red component of the selected pixel is set to the red component of a respective pixel of the segment 4₃. At operation 126, it is determined whether each pixel of the encoded mask 4* has been selected. If at least one pixel of the encoded mask 4* has not been selected, operations 122 to 126 are performed again for a next pixel. Otherwise, step 120 comes to an end at operation 127.
[0063] Fig. 6 represents a flow diagram of a method 200 of decoding the encoded mask 4* obtained via the method 100. The method 200 starts at step 210 with the frame 2 and the encoded mask 4* being input.
[0064] At an optional step 220, the dimensions of the frame 2 and the encoded mask 4* can be increased such that the dimensions of the frame 2 are equal to the dimensions of the original frame 2.
[0065] At step 230, the encoded mask 4* is decoded following operations 231 to 237. At operation 231, the frame 2 is segmented into three segments 2₁, 2₂ and 2₃ having the same dimensions as the encoded mask 4*. Here, the segments 2₁, 2₂ and 2₃ are vertical segments but alternatively the segments 2₁, 2₂ and 2₃ could be horizontal segments. Then, the encoded mask 4* is used to generate the alpha channel of the frame 2. At operation 232, a pixel of the frame 2 is selected. At operation 233, if the pixel is selected in the first segment 2₁, an alpha component of that selected pixel is set to the green component of a respective pixel of the encoded mask 4*. At operation 234, if the pixel is selected in the second segment 2₂, an alpha component of that selected pixel is set to the blue component of a respective pixel of the encoded mask 4*. At operation 235, if the pixel is selected in the third segment 2₃, an alpha component of that selected pixel is set to the red component of a respective pixel of the encoded mask 4*. At operation 236, it is determined whether each pixel of the frame 2 has been selected. If at least one pixel of the frame 2 has not been selected, operations 232 to 236 are performed again. Otherwise, step 230 comes to an end at operation 237.
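The decoding of operations 231 to 237 can be sketched as the inverse of the packing: the three colour channels of the encoded mask are laid side by side to restore the full-width alpha channel. This is an illustrative NumPy sketch, not code from the patent; names and shapes are assumptions.

```python
import numpy as np

def decode_mask(encoded):
    # The channels were packed G, B, R from the first, second and third
    # vertical segments; concatenating them side by side restores the
    # original full-width alpha channel.
    g, b, r = encoded[..., 0], encoded[..., 1], encoded[..., 2]
    return np.concatenate([g, b, r], axis=1)

encoded = np.zeros((4, 3, 3), dtype=np.uint8)
encoded[..., 1] = 255          # second segment was inside the area of interest
alpha = decode_mask(encoded)   # 4 x 9 alpha channel for the frame
```

The middle third of the reconstructed alpha channel is opaque, matching the segment that carried the area of interest.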
[0066] In the above description, the step 130 of redimensioning is included in order to provide a frame 2 and mask 4* that conform with standard frame sizes and can be further handled by standard video software and hardware. If it is not necessary to keep within standard frame sizes, the frame 2 and the mask 4* can be combined together without redimensioning and thus without loss of resolution. This principle applies equally to other solutions described herein.
Second solution [0067] Fig. 7 represents a flow diagram of a method 300 of encoding the mask 4 of the frame 2. The method 300 starts at step 310 with the frame 2 being input.
[0068] At step 320, the frame 2 is encoded following operations 322 to 326. At operation 322, a frame window 10 (see Fig. 8) is selected within the frame 2. The frame window 10 overlaps with the area of interest and preferably entirely encompasses the area of interest. The frame window 10 has an offset 12 within the frame 2. The offset 12 is a set of coordinates of a reference point (e.g. upper left corner) of the frame window 10 within the frame 2. It is to be understood that the frame window 10 is always smaller than the frame 2 and can be of any shape. Here the frame window 10 is a rectangular window but alternatively it could be a square window, a triangular window or the like, or it could be a window of a shape defined by a code (1 for square, 2 for triangle, etc.). At operation 324, the frame window 10 is extracted to form an encoded frame 2* (see Fig. 9).
[0069] At step 330, the mask 4 of the frame 2 is input.
[0070] At step 340, the mask 4 is encoded following operations 342 and 344. At operation 342, a mask window 14 (see Fig. 8) is selected within the mask 4. The mask window 14 has an offset 16 within the mask 4. The offset 16 is a set of coordinates of a reference point (e.g. upper left corner) of the mask window 14 within the mask 4. Here, the frame window 10 corresponds to the mask window 14, that is they have the same shapes, the same dimensions and the same offsets 12 and 16. At operation 344, the mask window 14 is extracted to form an encoded mask 4* (see Fig. 9). In Fig. 9, the mask is shown as being duplicated across all three channels. It will be understood that the technique of Fig. 2 can be applied to split it in three and avoid repeating the same information across different channels.
[0071] The offset 16 need not be duplicated across all three channels. It is the same offset for each. It can be encoded in a selected one of the channels (e.g. in the green channel by default) or it can be split into elements (e.g. x, y and size) and the different elements can be encoded on different channels.
[0072] At step 350, the encoded frame 2*, the encoded mask 4* and location information (e.g. offset 12) to locate the frame window 10 within the frame 2 are stored.
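The window extraction of steps 320 to 350 can be sketched as follows for a rectangular window. This is an illustrative NumPy sketch, not the patent's implementation; the bounding-box computation, the function name and the array layout are assumptions.

```python
import numpy as np

def extract_window(frame, mask):
    # Select the smallest rectangular window encompassing the area of
    # interest, extract it from both the frame and the mask, and keep the
    # offset (coordinates of the upper left corner) so the window can be
    # located within the original frame at decoding time.
    ys, xs = np.nonzero(mask)
    top, left = ys.min(), xs.min()
    bottom, right = ys.max() + 1, xs.max() + 1
    encoded_frame = frame[top:bottom, left:right]
    encoded_mask = mask[top:bottom, left:right]
    return encoded_frame, encoded_mask, (top, left)

frame = np.arange(64, dtype=np.uint8).reshape(8, 8)
mask = np.zeros((8, 8), dtype=np.uint8)
mask[2:5, 3:7] = 255                  # area of interest
enc_frame, enc_mask, offset = extract_window(frame, mask)
```

Only the window and its offset are stored, which is the source of the compression: the pixels outside the area of interest are discarded.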
[0073] Fig. 10 represents a flow diagram of a method 400 of decoding the encoded mask 4* obtained by the method 300. The method 400 starts at step 410 with the encoded frame 2*, the encoded mask 4* and the location information (e.g. offset 12) being input.
[0074] At step 420, the encoded mask 4* is decoded following operations 421 to 424 to generate an alpha channel for the encoded frame 2*. At operation 421, a pixel is selected within the encoded frame 2*. At operation 422, the alpha component of that pixel is set to the greyscale component of a respective pixel of the encoded mask 4*. At operation 423, it is determined whether each pixel of the encoded frame 2* has been selected. If at least one pixel of the encoded frame 2* has not been selected, then operations 421 to 423 are performed again. Otherwise, step 420 comes to an end at operation 424.
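Operations 421 to 424 amount to attaching the greyscale encoded mask to the encoded frame as an alpha channel, which can be sketched as follows. This is an illustrative NumPy sketch, not code from the patent; names and shapes are assumptions.

```python
import numpy as np

def apply_alpha(encoded_frame, encoded_mask):
    # encoded_frame: h x w x 3 colour window; encoded_mask: h x w greyscale.
    # Each pixel's alpha component is set to the greyscale component of the
    # respective mask pixel, yielding an h x w x 4 RGBA window.
    return np.dstack([encoded_frame, encoded_mask])

enc_frame = np.full((3, 4, 3), 120, dtype=np.uint8)
enc_mask = np.zeros((3, 4), dtype=np.uint8)
enc_mask[:, :2] = 255                # left half of the window is opaque
rgba = apply_alpha(enc_frame, enc_mask)
```

The resulting window can then be placed back into a full frame at the stored offset 12 before rendering.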
[0075] It is to be understood that the area of interest may vary over time from one frame 2 to another as a result of the number, the size and the location of the objects and/or people captured in the foreground varying. For example, the frame window 10 and the mask window 14 are selected on a frame 2 basis such that the frame window 10 and the mask window 14 change from one frame 2 to another. Alternatively, the frame window 10 and the mask window 14 are selected on a plurality of frames 2 basis such that the frame window 10 and the mask window 14 are identical from one frame 2 to the next frame in a set of frames 2. For example, the area of interest is determined in each frame 2 of the set of frames 2 and the frame window 10 and the mask window 14 are selected to be the smallest frame window 10 and mask window 14 that encompasses all the areas of interest. In this way, it is ensured that the area of interest will be captured in each frame 2 of the set of frames 2.
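Selecting the smallest window that encompasses the areas of interest of every frame in a set can be sketched as a union of per-frame bounding boxes. This is an illustrative NumPy sketch, not the patent's implementation; the function name and the (top, left, bottom, right) representation are assumptions.

```python
import numpy as np

def common_window(masks):
    # For each mask, find the bounding box of its area of interest; the
    # common window is the smallest rectangle covering all of them.
    tops, lefts, bottoms, rights = [], [], [], []
    for m in masks:
        ys, xs = np.nonzero(m)
        tops.append(int(ys.min()))
        lefts.append(int(xs.min()))
        bottoms.append(int(ys.max()) + 1)
        rights.append(int(xs.max()) + 1)
    return min(tops), min(lefts), max(bottoms), max(rights)

m1 = np.zeros((8, 8), dtype=np.uint8); m1[1:3, 1:3] = 255
m2 = np.zeros((8, 8), dtype=np.uint8); m2[4:7, 5:8] = 255
window = common_window([m1, m2])   # encompasses both areas of interest
```

Using one window for the whole set means the offset only needs to be stored once, at the cost of a window no smaller than the largest excursion of the area of interest.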
Third solution [0076] Fig. 11 represents a flow diagram of a method 500 of encoding the mask 4 of the frame 2. The method 500 starts at step 510 with the frame 2 being input.
[0077] At step 520, the frame 2 is encoded following operations 522 to 526. At operation 522, a plurality of frame windows 10 (see Fig. 8) is selected within the frame 2. Each frame window 10 overlaps partially with the area of interest. Preferably the frame windows 10 are non-overlapping with each other, i.e. define mutually exclusive portions of the area of interest, but this is not essential. Each frame window 10 has an offset 12 within the frame 2. Each offset 12 is a set of coordinates of a reference point (e.g. upper left corner) of a respective frame window 10 within the frame 2. It is to be understood that each frame window 10 is smaller than the frame 2 and can be of any shape. Here each frame window 10 is a rectangular window but alternatively it could be a square window, a triangular window or the like. Preferably the frame windows 10 all have the same shape but, as before, the shape of all the frame windows 10 or the shapes of individual frame windows 10 can be parameters to be encoded.
[0078] In the illustration, three frame windows 10 are shown, but there could be fewer or more (e.g. 16 or 64). The number of frame windows 10 can be an encoding parameter.
[0079] At operation 524, the plurality of frame windows 10 is extracted and rearranged to form an encoded frame 2* (see Fig. 13). Each frame window 10 has an offset 13 within the encoded frame 2*. Each offset 13 is a set of coordinates of a reference point (e.g. upper left corner) of a respective frame window 10 within the encoded frame 2*.
[0080] At step 530, the mask 4 of the frame 2 is input.
[0081] At step 540, the mask 4 is encoded following operations 542 and 544. At operation 542, a plurality of mask windows 14 (see Fig. 12) is selected within the mask 4. Each mask window 14 has an offset 16 within the mask 4. Each offset 16 is a set of coordinates of a reference point (e.g. upper left corner) of a respective mask window 14 within the mask 4.
[0082] At operation 544, the plurality of mask windows 14 is extracted and rearranged to form an encoded mask 4* (see Fig. 13). Each mask window 14 has an offset 17 within the encoded mask 4*. Each offset 17 is a set of coordinates of a reference point (e.g. upper left corner) of a respective mask window 14 within the encoded mask 4*. Here, the plurality of frame windows 10 is paired with the plurality of mask windows 14, that is, each frame window 10 corresponds to a mask window 14 having the same shape, the same dimensions, the same offsets 12 and 16 and the same offsets 13 and 17.
[0083] Advantageously, there are three mask windows and each of these, with its respective offset, is encoded in a different one of the three channels as shown in Fig. 13. Alternatively, there may be more mask windows, but it is advantageous that the number of mask windows is a multiple of 3.
[0084] At step 550, the encoded frame 2*, the encoded mask 4* and location information (e.g. offsets 12, 13) to locate the frame windows within the frame 2 and within the encoded frame 2* are stored.
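The extraction and rearrangement of operations 522/524 (and, symmetrically, 542/544) can be sketched as follows. This is an illustrative sketch under stated assumptions: images are nested lists, windows are (x, y, w, h) rectangles, and the packing strategy (simple vertical stacking of extracted windows) is an assumption for illustration; the patent leaves the rearrangement layout open.

```python
def extract(image, x, y, w, h):
    """Cut a w x h window out of a nested-list image at offset (x, y)."""
    return [row[x:x + w] for row in image[y:y + h]]

def encode_windows(image, windows):
    """windows: list of (x, y, w, h) offsets 12/16 within the image.

    Returns (encoded, placements), where placements pairs each source
    offset with its offset 13/17 inside the encoded image.
    """
    encoded, placements, y_out = [], [], 0
    for (x, y, w, h) in windows:
        encoded.extend(extract(image, x, y, w, h))   # rearrange: stack
        placements.append(((x, y), (0, y_out)))      # offset 12 -> offset 13
        y_out += h
    return encoded, placements

img = [[c + 10 * r for c in range(4)] for r in range(4)]   # 4 x 4 test image
enc, where = encode_windows(img, [(0, 0, 2, 1), (2, 2, 2, 2)])
print(enc)    # -> [[0, 1], [22, 23], [32, 33]]
print(where)  # -> [((0, 0), (0, 0)), ((2, 2), (0, 1))]
```

Applying the same function to the mask 4 with the paired mask windows 14 yields the encoded mask 4*, since paired windows share shapes, dimensions and offsets.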
[0085] Fig. 14 represents a flow diagram of a method 600 of decoding the encoded mask 4* obtained with method 500. The method 600 starts at step 610 with the encoded frame 2*, the encoded mask 4* and the location information (e.g. offsets 12, 13) being input (as well as the number of frame windows 10 if this is an encoding parameter, and their shape or shapes if these are encoding parameters).
[0086] At step 620, the encoded mask 4* is decoded following operations 621 to 624 to generate an alpha channel for the encoded frame 2*. At operation 621, a pixel is selected within the encoded frame 2*. At operation 622, the alpha component of that pixel is set to the greyscale component of a respective pixel of the encoded mask 4*. At operation 623, it is determined whether each pixel of the encoded frame 2* has been selected. If at least one pixel of the encoded frame 2* has not been selected, then operations 621 to 623 are performed again. Otherwise, step 620 comes to an end at operation 624.
[0087] As per the second solution, it is to be understood that the area of interest may vary over time from one frame 2 to another as a result of the number, the size and the location of the objects and/or people captured in the foreground varying. In a first embodiment, the plurality of frame windows 10 and the plurality of mask windows 14 are selected anew for each frame 2, such that the plurality of frame windows 10 and the plurality of mask windows 14 change from one frame 2 to another. In a second embodiment, the plurality of frame windows 10 and the plurality of mask windows 14 are selected for a set of frames 2, such that the plurality of frame windows 10 and the plurality of mask windows 14 are identical from one frame 2 to the next. For example, the area of interest is determined in each frame 2 of the set of frames 2. Then, the plurality of frame windows 10 and the plurality of mask windows 14 are selected to be the smallest pluralities of frame windows 10 and mask windows 14 that encompass all the areas of interest. In this way, the area of interest is certain to be captured in each frame 2 of the set of frames 2.
Fourth solution [0088] Fig. 15 represents a flow diagram of a method 700 of encoding the mask 4 of the frame 2. The method 700 starts at step 710 with the frame 2 and the mask 4 being input. Both the frame 2 and the mask 4 have the same dimensions h x w, where h is an integer representing the height and w is an integer representing the width.
[0089] At step 720, the mask 4 is encoded following operations 721 to 723. At operation 721, the resolution of the mask 4 is reduced by a factor 2 such that the dimensions of the mask 4 are h/2 x w/2 (see Fig. 16). At operation 722, the mask 4 is segmented into two vertical segments 4₁ and 4₂ having dimensions h/2 x w/4.
[0090] At operation 723, the segments 4₁ and 4₂ are rearranged on top of each other to form an encoded mask 4* having dimensions h x w/4 (see Fig. 17).
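Operations 721 to 723 can be sketched as below. The downscaling filter is not specified by the patent, so plain 2x2 subsampling is used here as an assumption; the function names are likewise illustrative.

```python
def downscale2(mask):
    # Operation 721: halve the resolution (simple 2x2 subsampling here;
    # the patent does not specify the filter, so this is an assumption).
    return [row[::2] for row in mask[::2]]

def encode_solution4(mask):
    small = downscale2(mask)                 # h/2 x w/2
    half = len(small[0]) // 2
    left = [row[:half] for row in small]     # segment 4_1: h/2 x w/4
    right = [row[half:] for row in small]    # segment 4_2: h/2 x w/4
    return left + right                      # operation 723: stack -> h x w/4

mask = [[10 * r + c for c in range(8)] for r in range(8)]   # 8 x 8 mask
print(encode_solution4(mask))
# -> [[0, 2], [20, 22], [40, 42], [60, 62], [4, 6], [24, 26], [44, 46], [64, 66]]
```

Note that the result has the same height h as the original mask and a quarter of its width, so it can sit alongside the frame 2 in a standard video buffer.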
[0091] At an optional step 730, the resolution of the resultant frame with its encoded mask 4* can be reduced such that the dimensions of both the frame 2 and the encoded mask 4* are equal to the dimensions of the original frame 2 input at step 710. In this way, the size of both the frame 2 and the encoded mask 4* is equal to the size of the original frame 2 and, moreover, both the frame 2 and the encoded mask 4* can be handled by standard video software and hardware.
[0092] It is to be understood that at step 720 the mask 4 could be encoded following modified operations 721 to 723. For example, at operation 721, the resolution of the mask 4 could be reduced by a factor 2 such that the dimensions of the mask 4 are h/2 x w/2 (see Fig. 16). At operation 722, the mask 4 could be segmented into two horizontal segments 4₁ and 4₂ having dimensions h/4 x w/2. At operation 723, the segments 4₁ and 4₂ could be rearranged next to each other to form an encoded mask 4* having dimensions h/4 x w.
[0093] Fig. 14 represents a flow diagram of a method 800 of decoding the encoded mask 4* obtained with method 700. The method 800 starts at step 810 with the frame 2 and the encoded mask 4* being input.
[0094] At an optional step 820, the dimensions of the frame 2 and the encoded mask 4* can be increased such that the dimensions of the frame 2 are equal to the dimensions of the original frame 2.
[0095] At step 830, the encoded mask 4* is decoded following operations 831 to 834 to generate an alpha channel for the frame 2. At operation 831, a pixel of the frame 2 is selected. At operation 832, the alpha component of that pixel is set to the greyscale component of a corresponding pixel of the encoded mask 4*. At operation 833, it is determined whether each pixel of the frame 2 has been selected. If at least one pixel of the frame 2 has not been selected yet, then operations 831 to 833 are performed again. Otherwise, step 830 comes to an end at operation 834.
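Before the per-pixel alpha copy, the decoder must undo the fourth solution's rearrangement: split the h x w/4 encoded mask back into its two stacked segments and place them side by side to recover the h/2 x w/2 reduced mask. A sketch of that geometric inverse, with illustrative names (the subsequent alpha assignment is the same per-pixel copy as in the other solutions):

```python
def decode_solution4(encoded_mask):
    """Invert operation 723: unstack two segments and rejoin them."""
    h = len(encoded_mask) // 2
    top, bottom = encoded_mask[:h], encoded_mask[h:]   # segments 4_1, 4_2
    return [t + b for t, b in zip(top, bottom)]        # side by side again

enc = [[0, 2], [20, 22], [40, 42], [60, 62], [4, 6], [24, 26], [44, 46], [64, 66]]
print(decode_solution4(enc))
# -> [[0, 2, 4, 6], [20, 22, 24, 26], [40, 42, 44, 46], [60, 62, 64, 66]]
```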
Combination of the first, second, third and fourth solutions [0096] For increased benefit, the above solutions can be combined (i.e. any two or three solutions can be applied one after another).
[0097] In particular, solution 1 can be combined with solution 2, 3 or 4. That is, the encoded mask 4* obtained via step 340, via step 540 or via step 720 can be further encoded via step 120. For example, at operation 121, the encoded mask 4* is segmented into three segments 4*₁, 4*₂ and 4*₃ of the same dimensions. Then, an encoded mask 4** is generated based on the segments 4*₁, 4*₂ and 4*₃. The encoded mask 4** includes a green channel, a blue channel and a red channel. The encoded mask 4** (in total) has the same dimensions as the segments 4*₁, 4*₂ and 4*₃. At operation 122, a pixel of the encoded mask 4** is selected. At operation 123, the green component of the selected pixel is set to the greyscale component of a respective pixel of the segment 4*₁. At operation 124, the blue component of the selected pixel is set to the greyscale component of a respective pixel of the segment 4*₂. At operation 125, the red component of the selected pixel is set to the greyscale component of a respective pixel of the segment 4*₃. At operation 126, it is determined whether each pixel of the encoded mask 4** has been selected. If at least one pixel of the encoded mask 4** has not been selected, operations 122 to 126 are performed again for the next pixel (e.g. in a raster scan manner). Otherwise, step 120 comes to an end at operation 127.
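The channel packing of step 120 can be sketched as follows. This is a hedged illustration: the mask is split into horizontal thirds here (the patent only requires three segments of equal dimensions), pixels are modelled as [red, green, blue] lists, and the function name is an assumption.

```python
def pack_three_channels(mask):
    """Pack a greyscale mask into the G, B and R channels of one image
    a third of its size (operations 121 to 127, sketched)."""
    h = len(mask) // 3                       # assumes height divisible by 3
    s1, s2, s3 = mask[:h], mask[h:2 * h], mask[2 * h:]
    packed = []
    for r1, r2, r3 in zip(s1, s2, s3):
        # pixel = [red, green, blue]; operations 123-125 assign
        # green <- segment 1, blue <- segment 2, red <- segment 3
        packed.append([[p3, p1, p2] for p1, p2, p3 in zip(r1, r2, r3)])
    return packed

mask = [[255, 0], [0, 0], [255, 255]]        # 3 x 2 greyscale mask
print(pack_three_channels(mask))
# -> [[[255, 255, 0], [255, 0, 0]]]
```

Applied after step 340, 540 or 720, this gives a further factor-of-three size reduction on top of the windowing or resolution reduction.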
[0098] Equally, solution 4 can be combined with solution 1, 2 or 3. That is, the encoded mask 4* obtained via step 120, via step 340 or via step 540 can be further encoded via step 720. For example, at operation 721, the resolution of the mask 4* is reduced by a factor 2 such that the dimensions of the mask 4* are h/2 x w/2 (see Fig. 16). At operation 722, the mask 4* is segmented into two vertical segments 4*₁ and 4*₂ having dimensions h/2 x w/4. At operation 723, the segments 4*₁ and 4*₂ are rearranged on top of each other to form an encoded mask 4** having dimensions h x w/4.
Mobile device [0099] Fig. 20 shows a mobile device 1000 comprising means for implementing the steps of any one of the previous methods. The mobile device 1000 comprises a video camera 1010 for capturing a video, a processor 1030 for generating an alpha video based on the captured video, a display 1020 for rendering the alpha video and volatile and non-volatile memories 1040 for storing instructions which when executed by the processor 1030 perform the steps of any one of the previous methods.
[00100] It will be understood that the above aspects have been described by way of example only, and that various changes and modifications may be made without departing from the scope of the claims.

Claims (38)

1. A method (100) of encoding (120) a mask (4) of a frame (2), wherein the mask (4) comprises a plurality of pixels with at least one component, wherein the value of each component of a pixel is set to a first value if the pixel is within an area of interest or set to a second value if the pixel is outside the area of interest, the method (100) comprising: segmenting (121) the mask (4) into first, second and third segments (4₁, 4₂, 4₃); setting (123) the value of the first color component of each pixel of an encoded mask (4*) to the value of a component of a respective pixel of the first segment (4₁); setting (124) the value of the second color component of each pixel of the encoded mask (4*) to the value of a component of a respective pixel of the second segment (4₂); and setting (125) the value of the third color component of each pixel of the encoded mask (4*) to the value of a component of a respective pixel of the third segment (4₃).
2. The method (100) of claim 1, wherein the mask (4) comprises a plurality of pixels with a grayscale component, wherein the value of the grayscale component of a pixel is set to a first value if the pixel is within an area of interest or set to a second value if a pixel is outside the area of interest, the method comprising: - segmenting (121) the mask (4) into first, second and third segments (4₁, 4₂, 4₃); - setting (123) the value of the first color component of each pixel of an encoded mask (4*) to the value of the grayscale component of a respective pixel of the first segment (4₁); - setting (124) the value of the second color component of each pixel of the encoded mask (4*) to the value of the grayscale component of a respective pixel of the second segment (4₂); and - setting (125) the value of the third color component of each pixel of the encoded mask (4*) to the value of the grayscale component of a respective pixel of the third segment (4₃).
3. The method (100) of claim 1, wherein the mask (4) comprises a plurality of pixels with first, second and third color components, wherein the values of the first, second and third color component of a pixel are all set to a first value if the pixel is within an area of interest or all set to a second value if a pixel is outside the area of interest, the method (100) comprising: - segmenting (121) the mask (4) into first, second and third segments (4₁, 4₂, 4₃); - setting (123) the value of the first color component of each pixel of an encoded mask (4*) to the value of the first color component of a respective pixel of the first segment (4₁); - setting (124) the value of the second color component of each pixel of the encoded mask (4*) to the value of the second color component of a respective pixel of the second segment (4₂); and - setting (125) the value of the third color component of each pixel of the encoded mask (4*) to the value of the third color component of a respective pixel of the third segment (4₃).
4. The method (100) of any one of claims 1 to 3, wherein the first, second and third color components are green, blue and red components.
5. The method (100) of any one of claims 1 to 4, wherein the encoded mask (4*) and the first, second and third segments (4₁, 4₂, 4₃) have the same dimensions.
6. The method (100) of any one of claims 1 to 5, wherein the area of interest captures at least one person or an object in the foreground of the frame (2).
7. The method (100) of any one of claims 1 to 6, wherein the first value is '255' and the second value is '0' or vice versa.
8. The method (100) of any one of claims 1 to 7, further comprising: reducing (130) the dimensions of the frame (2) and the encoded mask (4*).
9. A method (200) of decoding (230) an encoded mask (4*) of a frame (2), wherein the encoded mask (4*) comprises a plurality of pixels with first, second and third color components, comprising: segmenting the frame (2) into a first, second and third segments; - setting an alpha component of each pixel of the first segment of the frame to the first color component of a respective pixel of the encoded mask (4*); setting an alpha component of each pixel of the second segment of the frame to the second color component of a respective pixel of the encoded mask (4*); and setting an alpha component of each pixel of the third segment of the frame to the third color component of a respective pixel of the encoded mask (4*).
10. The method (200) of claim 9, wherein the first, second and third color components are green, blue and red components.
11. The method (200) of any one of claims 9 to 10, wherein the encoded mask (4*) and the first, second and third segments have the same dimensions.
12. The method (200) of any one of claims 9 to 11, wherein the first, second and third color components of each pixel of the encoded mask (4*) are set to a first value or to a second value.
13. The method (200) of claim 12, wherein the first value is '255' and the second value is '0' or vice versa.
14. The method (200) of any one of claims 9 to 13, further comprising: increasing (130) the dimensions of the frame (2) and the encoded mask (4*).
15. An apparatus comprising means for performing the method of any of claims 1 to 14.
16. A computer program product comprising instructions which when executed by an apparatus perform the method of any of claims 1 to 14.
17. A method (300; 500) of encoding (340; 540) a mask (4) of a frame (2), wherein the mask (4) comprises a plurality of pixels with at least one component, wherein the value of each component of a pixel is set to a first value if the pixel is within an area of interest or set to a second value if the pixel is outside the area of interest, comprising: selecting (342; 542) at least one mask window (14) overlapping with an area of interest; - extracting (344; 544) the at least one mask window (14) to form an encoded mask (4*).
18. The method (500) of claim 17, comprising: - selecting (542) a plurality of mask windows (14) overlapping with the area of interest; - extracting and rearranging (544) the plurality of mask windows (14) to form the encoded mask (4*).
19. The method (300; 500) of any one of claims 17 or 18, wherein the area of interest captures at least one person or an object in the foreground of the frame (2).
20. The method (300; 500) of any of claims 17 to 19, wherein the at least one mask window (14) is selected for the frame (2) independently from any other frame (2), such that the at least one mask window (14) varies from one frame (2) to another.
21. The method (300; 500) of any of claims 17 to 20, wherein the at least one mask window (14) is selected for the frame (2) and at least one other frame (2), such that the at least one mask window (14) is identical from one frame to another.
22. The method (300; 500) of claim 21, further comprising: - determining an area of interest for the frame (2); - determining an area of interest for the at least one other frame (2); - selecting the at least one mask window (14) to be the smallest at least one mask window (14) that encompasses all the areas of interest.
23. The method (300; 500) of any of claims 17 to 22, further comprising encoding (320; 520) the frame, wherein encoding (320; 520) the frame includes: selecting (322; 522) at least one frame window (10) overlapping with the area of interest; - extracting (324; 524) the at least one frame window (10) to form an encoded frame (2*); and - storing the encoded frame (2*) in association with information to locate the at least one frame window (10) within the frame (2).
24. The method (500) of claim 23, comprising: selecting (522) a plurality of frame windows (10) overlapping with the area of interest; extracting and rearranging (524) the plurality of frame windows (10) to form the encoded frame (2*); and storing the encoded frame (2*) in association with information to locate the plurality of frame windows (10) within the frame (2) and within the encoded frame (2*).
25. The method (300; 500) of any one of claims 23 to 24, wherein each mask window (14) has a corresponding frame window (10) with the same shape, with the same dimensions, with the same location within the mask (4) and within the frame (2) and with the same location within the encoded mask (4*) and within the encoded frame (2*).
26. The method (300; 500) of any one of claims 17 to 25, wherein the component of each pixel of the mask is a greyscale component set to a first value or a second value.
27. The method (300; 500) of any one of claims 17 to 26, wherein the first value is '255' and the second value is '0' or vice versa.
28. A method (400; 600) of decoding (420; 620) an encoded mask (4*) of a frame (2), comprising: setting (421, 422, 423; 621, 622, 623) an alpha component of each pixel of an encoded frame (2*) to a component of a respective pixel of the encoded mask (4*).
29. The method (400; 600) of claim 28, further comprising: extracting (624) a plurality of frame windows (10) from an encoded frame (2*) based on location information (13) locating the plurality of frame windows (10) within the encoded frame (2*); and rearranging (625) the plurality of frame windows (10) based on location information (12) locating the plurality of frame windows (10) within the frame (2).
30. An apparatus comprising means for performing the method of any of claims 17 to 29.
31. A computer program product comprising instructions which when executed by an apparatus perform the method of any of claims 17 to 29.
32. A method (700) of encoding (720) a mask (4) of a frame (2), wherein the mask (4) comprises a plurality of pixels with at least one component, wherein the value of each component of a pixel is set to a first value if the pixel is within an area of interest or set to a second value if a pixel is outside the area of interest, comprising: decreasing (721) the dimensions of the mask (4) by a factor N, where N is an integer greater than 1; segmenting (722) the mask (4) into N segments (4₁, 4₂); and rearranging (723) the N segments (4₁, 4₂) to form an encoded mask (4*) having one common dimension with the mask (4).
33. The method (700) of claim 32, comprising: decreasing (721) the dimensions h x w of the mask (4) by a factor N to have dimensions h/N x w/N; segmenting (722) the mask (4) into N segments (4₁, 4₂) having dimensions h/N x w/N²; rearranging (723) the N segments (4₁, 4₂) to form an encoded mask (4*) having dimensions h x w/N².
34. The method of claim 32, comprising: - decreasing (721) the dimensions h x w of the mask (4) by a factor N to have dimensions h/N x w/N; segmenting (722) the mask into N segments (4₁, 4₂) having dimensions h/N² x w/N; rearranging (723) the N segments (4₁, 4₂) to form an encoded mask having dimensions h/N² x w.
35. The method of any of claims 32 to 34, wherein the factor N is equal to 2.
36. A method of decoding (830) an encoded mask (4*) of a frame (2), comprising: setting (831, 832, 833) an alpha component of each pixel of a frame (2) to a component of a respective pixel of the encoded mask (4*).
37. An apparatus comprising means for performing the method of any of claims 32 to 36.
38. A computer program product comprising instructions which when executed by an apparatus perform the method of any of claims 32 to 36.
GB1607587.1A 2016-04-29 2016-04-29 Method, apparatus and computer program product for encoding a mask of a frame and decoding an encoded mask of a frame Withdrawn GB2549942A (en)