WO2023123512A1 - Filter coefficient generation and filtering method, video encoding and decoding method, device and *** - Google Patents

Filter coefficient generation and filtering method, video encoding and decoding method, device and ***

Info

Publication number
WO2023123512A1
Authority
WO
WIPO (PCT)
Prior art keywords
video
video image
category
filter
neighborhood
Application number
PCT/CN2021/144056
Other languages
English (en)
French (fr)
Inventor
元辉
邢金睿
王璐
王婷婷
李明
Original Assignee
Oppo广东移动通信有限公司
Application filed by Oppo广东移动通信有限公司
Priority to CN202180104218.XA (CN118235392A)
Priority to PCT/CN2021/144056 (WO2023123512A1)
Publication of WO2023123512A1

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/102Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N19/117Filters, e.g. for pre-processing or post-processing

Definitions

  • Embodiments of the present disclosure relate to, but are not limited to, video technologies, and more specifically, relate to a filter coefficient generation and filtering method, a video encoding and decoding method, device, and system.
  • Image filtering is an important operation in image processing to suppress the noise of the target image while preserving the details of the image as much as possible.
  • Neighborhood filtering is a commonly used filtering method: for each pixel in the image, a new pixel value is calculated from the pixel value of that pixel and the pixel values of its neighboring pixels. Neighborhood filtering includes Wiener filtering, Gaussian filtering, mean filtering and the like, and its effect still needs to be enhanced. Take Wiener filtering of images as an example.
  • When the filter order K is constant, for large images and images with severe local changes, the quality enhancement obtained by filtering with the same set of coefficients is not good; blindly increasing K only slightly improves the filtering effect while increasing the code stream size and time complexity to a certain extent, and may even make the overall performance worse.
  • An embodiment of the present disclosure provides a video decoding method, including:
  • parsing filter parameters from a code stream, and performing neighborhood filtering on the first video image according to the filter parameters.
  • An embodiment of the present disclosure also provides a method for generating filter coefficients, including:
  • dividing the pixels in a first video image into multiple categories according to the neighborhood differences of the pixels, and generating corresponding filter coefficients for some or all of the multiple categories respectively.
  • An embodiment of the present disclosure also provides a video filtering method, wherein:
  • pixels of each category with corresponding filter coefficients in the first video image are filtered using the filter coefficients corresponding to the category.
  • An embodiment of the present disclosure also provides a video encoding method, including:
  • the pixels in the first video image are divided into multiple categories, and corresponding filter coefficients are respectively generated for some or all of the multiple categories, where one category corresponds to a set of filter coefficients;
  • An embodiment of the present disclosure also provides a code stream, wherein the code stream is an encoded video code stream, the code stream includes encoded filter parameters, the filter parameters include filter coefficients, and the filter coefficients are used to perform neighborhood filtering on the first video image.
  • An embodiment of the present disclosure also provides a video decoding device, including a processor and a memory storing a computer program, wherein, when the processor executes the computer program, the video decoding method described in any embodiment of the present disclosure is implemented.
  • An embodiment of the present disclosure also provides a video decoding device, including a geometric frame reconstruction module and a texture conversion module, which also includes:
  • the Wiener filter module is configured to receive the reconstructed geometric video image output by the geometric frame reconstruction module and the filter parameters parsed from the code stream, execute the video decoding method described in any embodiment of the present disclosure, and output the filtered reconstructed geometric video image to the texture conversion module.
  • An embodiment of the present disclosure also provides a video encoding device, including a processor and a memory storing a computer program, wherein, when the processor executes the computer program, the video encoding method described in any embodiment of the present disclosure is implemented.
  • An embodiment of the present disclosure also provides a video encoding device applied to a video-based point cloud compression system, including a texture frame generation module, and a geometric frame generation module, a geometric frame filling module and a geometric frame video compression module connected in sequence, and further including:
  • a Wiener filter module configured to receive the reconstructed geometric video image output by the geometric frame video compression module and the original geometric video image output by the geometric frame generation module or the geometric frame filling module, execute the video encoding method described in any embodiment of the present disclosure, and output the filtered reconstructed geometric video image to the texture frame generation module.
  • An embodiment of the present disclosure further provides a video encoding and decoding system, which includes the video encoding device according to any embodiment of the present disclosure and the video decoding device according to any embodiment of the present disclosure.
  • An embodiment of the present disclosure also provides a video filtering device, including a processor and a memory storing a computer program, wherein, when the processor executes the computer program, the video filtering method described in any embodiment of the present disclosure is implemented.
  • An embodiment of the present disclosure also provides a non-transitory computer-readable storage medium storing a computer program, wherein, when the computer program is executed by a processor, the method described in any embodiment of the present disclosure is implemented.
  • FIG. 1 is a flowchart of a method for generating filter coefficients according to an embodiment of the present disclosure
  • FIG. 2 is a flowchart of a video filtering method according to an embodiment of the present disclosure
  • Figure 3A, Figure 3B and Figure 3C are schematic diagrams of several neighborhoods of a pixel
  • Figure 4A, Figure 4B and Figure 4C are several examples of the pixel values of a pixel and its neighboring pixels
  • FIG. 5 is a schematic diagram of a rhombus window used in an embodiment of the present disclosure.
  • FIG. 6 is a flow chart of the subclass merging process in the method for generating filter coefficients according to an embodiment of the present disclosure
  • Fig. 7 is a framework diagram of the V-PCC encoding end
  • Fig. 8 is a framework diagram of the V-PCC decoding end
  • FIG. 9 is a flowchart of a video encoding method according to an embodiment of the present disclosure.
  • Fig. 10 is a schematic diagram of an exemplary video geometry frame according to an embodiment of the present disclosure.
  • FIG. 11 is a block diagram of a way of adding a Wiener filter module at the V-PCC encoding end according to an embodiment of the present disclosure
  • Fig. 12 is a module diagram of another way of adding a Wiener filter module at the V-PCC encoding end according to an embodiment of the present disclosure
  • FIG. 13 is a schematic diagram of adding a Wiener filtering unit to a video encoding device according to an embodiment of the present disclosure
  • Fig. 14 is a flowchart of a video decoding method according to an embodiment of the present disclosure.
  • FIG. 15 is a schematic diagram of adding a Wiener filter module at the V-PCC decoding end according to an embodiment of the present disclosure
  • FIG. 16 is a schematic diagram of adding a Wiener filtering unit to a video decoding device according to an embodiment of the present disclosure
  • Figure 17 is a schematic diagram of a part of the test results of each sequence on CTC_C2;
  • Figure 18 is a schematic diagram of another part of the test results of each sequence on CTC_C2;
  • Fig. 19, Fig. 20 and Fig. 21 are schematic diagrams of comparing the original point cloud, the reconstructed point cloud and the quality-enhanced point cloud according to an embodiment of the present disclosure
  • FIG. 22 is a hardware architecture diagram of a video encoding device according to an embodiment of the present disclosure.
  • Words such as "exemplary" or "for example" are used to mean serving as an example, instance or illustration. Any embodiment described in this disclosure as "exemplary" or "for example" should not be construed as preferred or advantageous over other embodiments.
  • "And/or" herein describes the association relationship between associated objects and indicates that three relationships are possible; for example, "A and/or B" can mean: A exists alone, both A and B exist, or B exists alone.
  • “A plurality” means two or more than two.
  • Words such as "first" and "second" are used to distinguish identical or similar items with substantially the same function and effect. Those skilled in the art can understand that words such as "first" and "second" do not limit the number or the execution order, and do not necessarily indicate that the items are different.
  • the current neighborhood filtering algorithm uses the same set of coefficients to filter all the pixels in the video image, without considering the neighborhood differences of the pixels in the video image, and the quality enhancement effect is not good.
  • the encoding end and decoding end of the video codec system filter separately, and the decoding end does not use the filtering parameters of the encoding end, which affects the filtering effect.
  • an embodiment of the present disclosure provides a method for generating filter coefficients, as shown in FIG. 1 , including:
  • Step 110 dividing the pixels in the first video image into multiple categories according to the neighborhood difference of the pixels in the first video image
  • Step 120 generating corresponding filter coefficients for some or all of the categories.
  • different types of pixel points may use different filter coefficients, so the neighborhood filtering in the present disclosure does not include neighborhood filtering algorithms with fixed coefficients such as mean filtering and median filtering.
  • An embodiment of the present disclosure also provides a video filtering method, as shown in FIG. 2 , including:
  • Step 210 acquiring filter coefficients generated by the filter coefficient generation method described in any embodiment of the present disclosure and corresponding to some or all of the multiple categories;
  • the filter coefficients obtained here may be generated locally or transmitted from outside.
  • the video encoding end generates the filter coefficients and encodes them for transmission, and the video decoding end acquires the filter coefficients through analysis and uses them.
  • Step 220 Perform neighborhood filtering on the first video image, wherein, for each category of pixels in the first video image that has corresponding filter coefficients, use the filter coefficients corresponding to the category to perform filtering.
  • a set of weighting coefficients used in the weighted average adopts a set of filter coefficients corresponding to the category.
  • The embodiments of the present disclosure classify the pixels in the video image according to the neighborhood differences of the pixels, generate corresponding filter coefficients for different categories, and use them as the filter coefficients of the pixels of the corresponding category during neighborhood filtering, so that appropriate filter coefficients are adaptively generated for pixels with different neighborhood differences, thereby improving the effect of neighborhood filtering of the video image.
  • the adaptive neighborhood filtering method based on neighborhood differences proposed by the embodiments of the present disclosure can enhance image quality through adaptive neighborhood filtering.
  • The first video image in the present disclosure includes but is not limited to a video frame; the first video image can also be a smaller video unit such as a slice or a slice segment in a video frame, or a larger video unit such as a sequence of video frames.
  • the processing of video frames in the following embodiments of the present disclosure is also applicable to the processing of other video images.
  • the pixel value of a pixel point may be the value of any one of the three components (also referred to as three channels) of the color.
  • Y represents the brightness (Luminance or Luma) component
  • the value of the brightness component is usually called the gray value
  • "U" and "V" represent the chrominance (Chrominance or Chroma) components
  • chrominance components can be stored as Cb (also written as Chroma Cb) and Cr (also written as Chroma Cr), where Cb is the blue chrominance component and Cr is the red chrominance component.
  • the pixel value of the pixel point can be the value of the luminance component, the blue chroma component or the red chrominance component; in the RGB format image, the pixel value of the pixel point can be the value of red, green or blue, etc.
  • the pixel value of the pixel point refers to grayscale value.
  • The two chrominance components can be filtered separately; when the blue chrominance component is filtered, the pixel value of the above-mentioned pixel refers to the value of the blue chrominance component, and when the red chrominance component is filtered, the pixel value of the above-mentioned pixel refers to the value of the red chrominance component.
  • Performing neighborhood filtering on a video image means that, for each pixel in the video image, the k filter coefficients used by that pixel are taken as weighting coefficients, the pixel values of the k pixels in the window centered on the pixel are weighted and averaged, and the result is used as the new pixel value of the pixel, where k is the filter order.
  • This is basically the same as traditional neighborhood filtering, except that traditional neighborhood filtering uses the same filter coefficients for all pixels.
  • the pixel points of each category with corresponding filter coefficients use the filter coefficients corresponding to the category, and the filter coefficients corresponding to different categories are generated separately, and usually they are not the same. There may also be some pixels in the video image, and the category to which they belong does not have corresponding filter coefficients.
  • the "window” in this article refers to the window used in the neighborhood filter scan, and can also be called a template, a template window, a convolution kernel, a box, and the like.
  • the scan operation may also be referred to as convolution.
  • one example of this embodiment uses a diamond-shaped filter window, as shown in FIG. 5 . Compared with the original rectangular box filter, this example changes the shape of the window, which can better adapt to boundary changes and extract neighborhood information more efficiently.
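  • As an illustration only (the exact order and shape of the window in FIG. 5 are not reproduced here), a rhombus window can be generated as a Boolean mask. The sketch below, in Python with NumPy, builds a diamond of a given radius; radius 2 gives a 13-tap window inside a 5×5 box.

```python
import numpy as np

def diamond_window(radius):
    """Boolean mask of a rhombus (diamond) window of the given radius.
    radius 2 -> 13 taps inside a 5x5 box; purely illustrative."""
    y, x = np.mgrid[-radius:radius + 1, -radius:radius + 1]
    return (np.abs(y) + np.abs(x)) <= radius

print(diamond_window(2).astype(int))
# [[0 0 1 0 0]
#  [0 1 1 1 0]
#  [1 1 1 1 1]
#  [0 1 1 1 0]
#  [0 0 1 0 0]]
```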
  • the present disclosure may also use filters of other orders such as 9th order, 16th order, 36th order, etc. to perform neighborhood filtering.
  • The neighborhood difference of a pixel in the first video image is obtained from a statistic of the absolute values of the differences between the pixel value of the pixel and the pixel value of each pixel in the neighborhood of the pixel, where the statistic is the sum, the mean value or the maximum value.
  • The neighborhood of a pixel refers to the eight-neighborhood, four-neighborhood or diagonal neighborhood of the pixel.
  • FIG. 3A shows the eight-neighborhood of pixel A (the hatched area), which includes the eight pixels in the one ring of pixels around pixel A.
  • FIG. 3B shows the four-neighborhood of pixel A (the cross-hatched area), which includes the 4 pixels located directly above, below, left and right of pixel A among the 8 pixels around pixel A.
  • FIG. 3C shows the diagonal neighborhood of pixel A (the cross-hatched area), which includes the pixels located at the four corners among the 8 pixels around pixel A. But the present disclosure is not limited thereto.
  • The neighborhood of a pixel is not limited to the above eight-neighborhood, four-neighborhood and diagonal neighborhood; for example, it may also include the 24 pixels in the two rings around the pixel.
  • Taking the eight-neighborhood of a pixel as an example, the neighborhood difference of a pixel in the first video image is obtained by summing the absolute values of the differences between the pixel value of the pixel and the pixel values of the pixels in its neighborhood.
  • In one example, the pixel value of pixel A is 2 and the pixel values of its eight neighboring pixels are 3, 1, 3, 2, 4, 2, 2 and 4; denoting the neighborhood difference as diff, then diff = |2-3| + |2-1| + |2-3| + |2-2| + |2-4| + |2-2| + |2-2| + |2-4| = 7.
  • In another example, the pixel value of pixel A is 1 and the pixel values of its eight neighboring pixels are 1, 1, 1, 2, 1, 2, 2 and 1; then diff = 0 + 0 + 0 + 1 + 0 + 1 + 1 + 0 = 3.
  • In a third example, the pixel value of pixel A is 5 and the pixel values of its eight neighboring pixels are 4, 7, 8, 8, 6, 5, 2 and 3; then diff = 1 + 2 + 3 + 3 + 1 + 0 + 3 + 2 = 15.
  • The statistic can also be the mean value, that is, the above sum divided by the number of pixels in the neighborhood; or the statistic can be the maximum value, that is, the maximum of the multiple absolute values obtained.
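  • The three worked examples above can be reproduced with a few lines of code. The sketch below (Python/NumPy, written only for illustration; the function name is not part of the disclosure) computes the neighborhood difference of a single pixel under the sum, mean or maximum statistic.

```python
import numpy as np

def neighborhood_diff(center, neighbors, stat="sum"):
    """Statistic of |center - neighbor| over the pixel's neighborhood
    (eight-, four- or diagonal-neighborhood alike)."""
    abs_diffs = np.abs(np.asarray(neighbors, dtype=np.int64) - int(center))
    if stat == "sum":
        return int(abs_diffs.sum())
    if stat == "mean":
        return float(abs_diffs.mean())
    return int(abs_diffs.max())  # stat == "max"

# Eight-neighborhood examples from the text (sum statistic):
print(neighborhood_diff(2, [3, 1, 3, 2, 4, 2, 2, 4]))  # 7
print(neighborhood_diff(1, [1, 1, 1, 2, 1, 2, 2, 1]))  # 3
print(neighborhood_diff(5, [4, 7, 8, 8, 6, 5, 2, 3]))  # 15
```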
  • the statistical value can be used to directly represent the neighborhood difference.
  • a neighborhood difference of a pixel in the first video image is determined according to a difference between pixel values of the pixel and pixels in the neighborhood of the pixel.
  • The neighborhood difference of pixel A is obtained from the variation of the pixel values over the whole 3×3 area, that is, by considering the dispersion of 2, 3, 1, 3, 2, 4, 2, 2, 4 together; the variation can be represented by, for example, the range or the mean square error.
  • In contrast, the neighborhood difference in the previous embodiment is obtained from statistics of the absolute differences between the pixel value of the pixel and the pixel value of each pixel in its neighborhood, and is more sensitive to pixel-by-pixel variation: it can capture the drastic changes of the gray value at object edges, classify the pixels at edges into the same category more accurately, generate appropriate filter coefficients, and improve the filtering effect.
  • The filter coefficients corresponding to each of the some or all categories are used as the filter coefficients of the pixels of that category when neighborhood filtering is performed on the first video image;
  • Performing neighborhood filtering on the first video image includes: for each category with corresponding filter coefficients, scanning each pixel of the category with a window, weighted-averaging the pixel values of all pixels in the window, and updating the pixel value of the category pixel located at the center of the window to the weighted-average result, where the set of weighting coefficients used in the weighted average is obtained from the filter coefficients corresponding to the category, and the window is rectangular or rhombic.
  • the operation of performing neighborhood filtering on pixels of a class described here is also applicable to the following process of merging multiple subclasses into one class, and performing neighborhood filtering on pixels of a merged class.
  • The process of filtering with filter coefficients and the process of generating filter coefficients will be described below using the Wiener filter as an example.
  • Although the embodiments of the present disclosure take Wiener filtering as an example, the present disclosure can also be used with other neighborhood filtering algorithms that can directly calculate optimal coefficients from the lossy signal and the original signal, and with other neighborhood filtering algorithms whose filter coefficients can be changed.
  • The Wiener filter is a linear filter proposed by the mathematician Norbert Wiener. It filters a signal mixed with noise by using the correlation characteristics and spectral characteristics of a stationary random process. Under certain constraints, the square of the difference between its output and a given function (called the desired signal) reaches a minimum, and through mathematical manipulation the problem finally reduces to solving a Toeplitz equation.
  • The error between the output signal and the desired signal can be calculated as e = d - y, where y is the filter output, that is, the weighted sum of the k input samples in the window with the filter coefficients as weights, and d is the desired signal.
  • The Wiener filter uses the Minimum Mean Squared Error (MMSE) as the objective function, so the objective function is J = E[e^2] = E[(d - y)^2].
  • To minimize the objective function, the derivative of the objective function with respect to the filter coefficients should be 0, namely ∂J/∂H = 0, which yields Rxx · H = Rxd and hence H = Rxx^(-1) · Rxd, where:
  • Rxx is the autocorrelation matrix of the signal to be filtered (that is, the input signal containing noise)
  • Rxd is the cross-correlation matrix of the signal to be filtered and the desired signal.
  • Finding the Wiener filter coefficients requires the signal to be filtered and the desired signal.
  • the two can correspond to the lossy image (also called distorted image) and the original image (also called real image) respectively.
  • The algorithm can calculate k coefficients from the pixel values of each pixel in the lossy image and the original image as the optimal coefficients of the Wiener filter, and performing Wiener filtering on the lossy image with these coefficients yields a restored image that is close to the original image in the mean-square-error sense and also has a better subjective effect.
  • the order of the filter is k.
  • the total number of pixels in the video image is n;
  • the matrix P(n, k) is an n×k matrix whose n rows correspond to the n pixels in the image; the row of a pixel is the sequence of the k pixel values in the window centered on that pixel, which is called the filter correlation vector of the pixel;
  • the matrix P(n, k), composed of the filter correlation vectors of the n pixels in the video image, is called the filter correlation matrix of the video image herein;
  • the vector S(n) contains the n pixel values of the n pixels at the corresponding positions in the original image and is called the original pixel value vector.
  • H(k) is a group of filter coefficients of the k-order filter, also called Wiener coefficients, including k filter coefficients in total.
  • the neighborhood filtering is Wiener filtering
  • the first video image includes a lossy video image
  • For each category, the filter correlation matrix of the category is composed of the filter correlation vectors of all pixels of the category in the lossy video image, and the pixel values of the pixels of the category at the corresponding positions in the corresponding original video image form the original pixel value vector of the category; the filter coefficients corresponding to the category are obtained by multiplying the inverse of the autocorrelation matrix of the category's filter correlation matrix by the cross-correlation matrix of the category's filter correlation matrix and the category's original pixel value vector.
  • the cross-correlation matrix of the i-th category: B_i(k) = P(n_i, k)^T × S(n_i);
  • the autocorrelation matrix of the i-th category: A_i(k, k) = P(n_i, k)^T × P(n_i, k);
  • the filter coefficients corresponding to the i-th category: H_i(k) = A_i(k, k)^(-1) × B_i(k).
  • the filter correlation vector of a pixel refers to the vector composed of k pixel values in the window centered on the pixel point
  • P(n_i, k) is the filter correlation matrix of the i-th category;
  • n_i is the number of pixels of the i-th category;
  • S(n_i) is the original pixel value vector of the pixels of the i-th category.
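  • The per-category coefficient computation above can be sketched as follows (Python/NumPy, a minimal illustration rather than the patent's implementation; the function name, window offsets and synthetic test data are assumptions). Each row of P holds the k window samples of one category-i pixel taken from the lossy image, S holds the original values of those pixels, and H_i solves A_i · H_i = B_i.

```python
import numpy as np

def category_wiener_coeffs(lossy, original, mask, offsets):
    """H_i = (P^T P)^(-1) (P^T S) for the pixels selected by `mask`.
    `offsets` lists the k (dy, dx) window positions; the mask is assumed to
    exclude border pixels so that every window stays inside the image."""
    ys, xs = np.nonzero(mask)
    P = np.empty((len(ys), len(offsets)))
    for col, (dy, dx) in enumerate(offsets):
        P[:, col] = lossy[ys + dy, xs + dx]   # filter correlation matrix P(n_i, k)
    S = original[ys, xs]                      # original pixel value vector S(n_i)
    A = P.T @ P                               # autocorrelation matrix A_i(k, k)
    B = P.T @ S                               # cross-correlation matrix B_i(k)
    return np.linalg.solve(A, B)              # filter coefficients H_i(k)

# Toy usage with a 3x3 window (k = 9) and synthetic data:
rng = np.random.default_rng(0)
orig = rng.integers(0, 256, (16, 16)).astype(float)
lossy = orig + rng.normal(0.0, 2.0, orig.shape)
mask = np.zeros(orig.shape, dtype=bool)
mask[1:-1, 1:-1] = True                       # interior pixels of one category
offsets = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1)]
H = category_wiener_coeffs(lossy, orig, mask, offsets)
```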
  • When Wiener filtering is performed on the first video image, the algorithm is basically the same as traditional Wiener filtering: for each pixel belonging to a category with corresponding filter coefficients, the pixel values of the k pixels in the window centered on the pixel are weighted and averaged, and the result is used as the new pixel value of the pixel; the difference is that the filter coefficients corresponding to the pixel's category are used in the weighted average, instead of the same filter coefficients for all pixels.
  • the embodiment of the present disclosure also allows a part of pixels to be classified into a category without corresponding filter coefficients, and the original pixel values are directly retained when Wiener filtering is performed on the first video image without participating in the calculation.
  • Whether before or after merging, and whether neighborhood filtering is performed on a category of pixels or filter coefficients are generated for a category, the k pixel values in the window centered on a pixel in the first video image do not change; that is, the filter correlation vector of the pixel always remains the k pixel values covered by the window centered on that pixel in the first video image.
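  • Filtering with the per-category coefficients can then be sketched as below (Python/NumPy, illustrative only; it reuses the `offsets` convention of the previous sketch and assumes the per-pixel category indices have already been computed). Pixels whose category has no coefficients simply keep their original values.

```python
import numpy as np

def filter_by_category(lossy, categories, coeffs, offsets):
    """Weighted average over the window for every pixel whose category has a
    set of coefficients; other pixels are left unchanged.
    `coeffs` maps a category index to its H_i(k); windows are assumed to stay
    inside the image (e.g. border pixels carry a category without coefficients)."""
    out = lossy.astype(float).copy()
    for cat, H in coeffs.items():
        ys, xs = np.nonzero(categories == cat)
        acc = np.zeros(len(ys))
        for weight, (dy, dx) in zip(H, offsets):
            acc += weight * lossy[ys + dy, xs + dx]
        out[ys, xs] = acc                     # new pixel values of this category
    return out
```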
  • Dividing the pixels in the first video image into multiple categories according to the neighborhood differences of the pixels in the first video image includes: dividing the value range of the neighborhood difference into multiple value intervals, determining the value interval to which the neighborhood difference of each pixel in the first video image belongs, and classifying the pixel into the category corresponding to that value interval.
  • the corresponding relationship between the category and the value range of the neighborhood difference is set.
  • the value range of the neighborhood difference is [0,100]
  • the set number of categories is 3
  • the value range corresponding to the first category is [0,3]
  • the value interval corresponding to the second category is [4,10]
  • the value interval corresponding to the third category is [11,100].
  • This method classifies the pixels in the video frame according to the neighborhood differences of the pixels. Under this classification, the number of categories and the correspondence between categories and value intervals can be preset, or can be obtained through machine learning; either way, the filtering effect can still be improved.
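  • Using the example intervals above ([0,3], [4,10], [11,100]), mapping a neighborhood-difference value to a category index is a simple lookup; the boundaries below are taken from that example, not from a normative table.

```python
import numpy as np

# Lower edges of the 2nd and 3rd intervals: [0,3], [4,10], [11,100]
boundaries = np.array([4, 11])

def classify(diff_values):
    """Category index (0, 1 or 2) of each neighborhood-difference value."""
    return np.digitize(diff_values, boundaries)

print(classify(np.array([0, 3, 4, 9, 11, 57])))  # [0 0 1 1 2 2]
```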
  • The filter coefficients corresponding to each category can be calculated by an algorithm, set empirically, or selected as the optimal set from multiple preset sets of filter coefficients according to the image-quality gain before and after filtering.
  • the correspondence between categories and value ranges is not preset, but can be dynamically selected.
  • the pixel points in the first video frame are divided into multiple categories according to the neighborhood difference of the pixel points in the first video frame, including:
  • dividing the value range of the neighborhood difference into a plurality of value intervals, determining the value interval to which the neighborhood difference of each pixel in the first video image belongs, and classifying the pixel into the subcategory corresponding to that value interval, the number of said subcategories being greater than the number of said categories;
  • The correspondence between subcategories and neighborhood-difference value intervals is fixed, and subcategories are merged into categories through dynamic selection, so that the correspondence between categories and neighborhood-difference value intervals can be adjusted adaptively to achieve the best filtering effect.
  • Traversing multiple ways of merging the subcategories into categories, and classifying the pixels of the multiple subcategories into the multiple categories according to the optimal merging way, includes:
  • in the first round, one or more merging attempts are performed on the plurality of subclasses, each time merging part or all of the subclasses into a first class in a different way; based on the filter coefficients generated for the first class, neighborhood filtering is performed on the pixels of the first class and the gain is calculated; the pixels of the merging attempt whose gain is the largest and greater than or equal to the corresponding gain threshold are classified into the first class, and the subclasses merged into the first class are recorded;
  • when the number i of completed merging rounds is less than the set maximum number of rounds and the number of unmerged subclasses is greater than 1, the (i+1)-th round of one or more merging attempts is performed on the remaining unmerged subclasses, each time merging part or all of them into an (i+1)-th class in a different way; based on the filter coefficients generated for the (i+1)-th class, neighborhood filtering is performed on its pixels and the gain is calculated; the pixels of the merging attempt whose gain is the largest and greater than or equal to the corresponding gain threshold are classified into the (i+1)-th class, and the subclasses merged into the (i+1)-th class are recorded.
  • the entire merging process can be ended if the following conditions are met:
  • If, in the current round, the gains of all merging attempts are less than the corresponding gain threshold, no merging is performed in the current round and the entire merging process ends;
  • the pixels of all the unmerged subclasses are classified into a category without corresponding filter coefficients.
  • The pixels of this category do not participate in the filtering operation, and their pixel values do not need to be updated.
  • subclasses are combined in various ways, including:
  • the constraint conditions include one or more of the following:
  • Condition 2: in each round of merging, first traverse the possible merging ways between the unmerged subclass at the front of the queue and the other unmerged subclasses; if that merging fails, then traverse other possible merging ways or end the entire merging process;
  • the queue refers to a queue in which the multiple subcategories are arranged in ascending order of values in the corresponding value range.
  • Constrained traversal can exploit regularities found during experiments to improve computational efficiency.
  • The gain produced by performing neighborhood filtering on the pixels of a category can be represented by the enhancement of the image quality of those pixels after filtering relative to before filtering, for example by the difference between the post-filtering image quality and the pre-filtering image quality.
  • For this image-quality difference, the set of all pixels of the category can be regarded as a sub-image, and quality parameters of the sub-image such as the PSNR, the structural similarity (SSIM) or the mean structural similarity (Mean Structural Similarity, MSSIM for short) can be used.
  • The gain calculated for each merging attempt is a weighted gain, where the weight equals the ratio of the total number of pixels in all subclasses merged this time to the total number of pixels in the first video frame; this better reflects the influence of the local gain on the entire video frame and thus better achieves the purpose of improving the overall image quality of the video frame.
  • the gain thresholds set in different rounds may be the same or different.
  • the number of sub-classes can be preset, and the number of classes may vary with video frame data, but the maximum number of rounds for merging attempts for sub-classes is configurable. In this example, the number of subclasses is greater than or equal to 8 and less than or equal to 20, and the maximum number of rounds is equal to 1 or 2 or 3 or 4.
  • the filter coefficients corresponding to a category are a set of weighting coefficients used in the weighted average.
  • The set of weighting coefficients includes 2N coefficients in a symmetric matrix, and the filter coefficients corresponding to one category include the N coefficients on one side of the main diagonal of the symmetric matrix, where N ≥ 1.
  • Step 310 divide the pixel points into a plurality of subclasses according to the value range of the neighborhood difference
  • the correspondence between the subclasses and the value intervals of neighborhood differences can be fixed, or can be dynamically calculated according to an agreed rule. At this time, the correspondence calculated by different video frames may be different. However, the encoding end and the decoding end can obtain the same corresponding relationship according to the same rules.
  • Step 320 judging whether the maximum round of merging has been reached? If yes, end, if no, execute step 330;
  • the number of rounds in this embodiment is set to 3, but it can also be set to 1 or 2 or a value greater than 3.
  • Step 330 traversing the possible merging modes of the unmerged subclasses, and calculating the filter coefficient and the filtering gain of the class obtained by each merging;
  • Step 340 judging whether there is any gain greater than the corresponding gain threshold among the calculated gains? If yes, execute step 350, if no, end;
  • the gain threshold can be set to 0, or some positive value. Gain thresholds for different rounds can be the same or different. If no merging method that brings the expected gain can be found, there is no need to merge and the entire merging process can be ended.
  • Step 350 recording the combination method with the largest gain and the obtained filter coefficient
  • The merging way can be represented by the indices of the first and last merged subclasses. For example, suppose there are 12 subcategories in total: the value interval corresponding to the 0th subcategory is 0, the value interval corresponding to the 1st subcategory is 1, ..., the value interval corresponding to the 10th subcategory is 10, and the value interval corresponding to the 11th subcategory is all values of 11 and above.
  • When, for example, the first 6 subclasses are merged, the merging way can be represented by the index "0" of the first merged subclass and the index "5" of the last merged subclass, recorded as (0, 5).
  • The recorded filter coefficients may or may not be used to perform neighborhood filtering on the first video frame locally; for example, the encoding end may use the original video frame to generate the optimal filter coefficients for the reconstructed video frame and send them to the decoding end, so that the decoding end enhances the quality of the reconstructed video frame.
  • Step 360 removing the indices of the merged subclasses from the queue, and judging whether there are still subclasses that can be merged; if yes, return to step 320; if no, end.
  • The pixels of a subclass that is never merged are not filtered. Alternatively, the judgment can be based on the number of pixels of the remaining subclasses: when the number is greater than a threshold, it is judged that there is still a subclass that can be merged; when the number is less than the threshold, it is judged that there is no subclass that can be merged; and so on.
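  • The flow of FIG. 6 described above can be summarized by the greedy sketch below (Python, illustrative only, not the patent's implementation). The `gain_of` callback is a hypothetical stand-in for "generate the Wiener coefficients of the merged class, filter its pixels and return the weighted gain"; restricting the attempts to groups starting at the front of the queue follows Condition 2.

```python
def greedy_merge(queue, gain_of, max_rounds=3, gain_threshold=0.0):
    """Multi-round merging of sub-classes into classes.
    `queue`: sub-class indices ordered by their neighborhood-difference interval.
    `gain_of(group)`: hypothetical callback returning the weighted filtering
    gain of merging that contiguous group into one class."""
    queue = list(queue)
    merged = []                               # recorded classes, e.g. (first, last)
    for _ in range(max_rounds):
        if len(queue) <= 1:
            break
        # Condition 2: only try groups that start at the front of the queue.
        candidates = [tuple(queue[:end]) for end in range(2, len(queue) + 1)]
        best = max(candidates, key=gain_of)
        if gain_of(best) < gain_threshold:
            break                             # no merge reaches the expected gain
        merged.append((best[0], best[-1]))    # record as first/last sub-class index
        queue = queue[len(best):]             # remove the merged sub-classes
    return merged                             # leftover sub-classes: no coefficients
```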
  • For the lossy first video images in a video image sequence, after the pixels of the plurality of subclasses have been divided into the plurality of categories by subclass merging for one first video image, for a first video image following that image the pixels of the multiple subclasses may be classified into the multiple categories using the same subclass merging way as the preceding first video image; or each first video image may perform subclass merging independently.
  • Adjacent video images in a sequence have relatively similar content; by sharing the subclass merging way of the previous video image with the following video image, the time spent by the following image on traversing and comparing merging ways can be saved and efficiency improved, with little impact on quality. However, if each video image performs subclass merging independently, a better quality enhancement effect is usually obtained after filtering.
  • The filter coefficient generation method and the corresponding video filtering method of the above embodiments of the present disclosure divide the pixels into multiple categories according to the neighborhood differences of the pixels, and neighborhood-filter the pixels of each category with the filter coefficients generated for that category; the influence of pixel neighborhood differences on neighborhood filtering is fully taken into account, which can significantly improve the filtering effect.
  • the method for generating filter coefficients and the corresponding video filtering method in the embodiments of the present disclosure can be used in various video coding and decoding systems to filter reconstructed video frames and improve image quality.
  • V-PCC codec framework is shown in Figure 7 and Figure 8.
  • Fig. 7 shows the structure of the video encoding device at the V-PCC encoding end, which can perform the following video encoding process:
  • After the 3D patch (block) generation module (3D patch generation) 11 generates 3D patches from the input point cloud frame, they are output to the patch packing module (patch packing) 13, the geometry frame generation module (Geometry image generation) 19, the texture frame generation module (Attribute image generation) 15, the patch sequence compression module (Patch sequence compression) 27 and the smoothing module (Smoothing) 17, respectively.
  • The patch packing module 13 packs the patches, generates an occupancy map (Occupancy map) and outputs it to the geometry frame generation module 19 and the first video compression (Video compression) module 21; the first video compression module 21 compresses the occupancy map, outputs the compressed occupancy sub-stream (occupancy sub stream) to the multiplexer (Multiplexer) 33, and outputs the reconstructed occupancy map (Reconstructed occupancy map) to the geometry frame filling module (image padding) 23, the texture frame filling module (image padding) 25, the texture frame generation module 15 and the smoothing module 17.
  • The geometry frame generation module 19 generates a geometry frame from the input occupancy map, 3D patches and point cloud frame and outputs it to the geometry frame filling module 23; the geometry frame filling module 23 outputs the padded geometry frame (Padded geometry) to the second video compression module 31; the second video compression module 31 outputs the compressed geometry sub-stream (geometry sub stream) to the multiplexer 33 and outputs the reconstructed geometry frame (Reconstructed geometry image) to the smoothing module 17.
  • the smoothing module 17 performs smoothing processing on the reconstructed geometric frame according to the reconstructed occupancy map and the 3D patch, and outputs the smoothed reconstructed geometric frame to the texture frame generation module 15.
  • The texture frame generation module 15 generates a texture frame from the input smoothed reconstructed geometry frame, 3D patches, reconstructed occupancy map and point cloud frame and outputs it to the texture frame filling module 25; the texture frame filling module 25 outputs the padded texture frame to the third video compression module 29.
  • the third video compression module 29 outputs the compressed texture sub-stream to the multiplexer 33 .
  • the Patch sequence compression module 27 outputs the compressed Patch sub-stream to the multiplexer 33 .
  • the multiplexer 33 outputs a compressed code stream (Compression bitstream) after multiplexing the input Patch substream, texture substream, geometry substream and occupied substream.
  • FIG. 8 shows the structure of a video decoding device at the V-PCC decoding end.
  • the video decoding device can realize the following video decoding processing:
  • The compressed code stream is demultiplexed by a demultiplexer (Demultiplexer) 41 to output a sequence parameter set (Sequence Parameter Set, SPS for short), a Patch sub-stream, a texture sub-stream, a geometry sub-stream and an occupancy sub-stream.
  • The SPS parsing module 43 outputs syntax elements to the patch sequence decompression module (Patch sequence decompression) 45, the first video decompression module (Video decompression) 47, the second video decompression module (Video decompression) 49, the third video decompression module (Video decompression) 51, the geometry and texture reconstruction module (Geometry/Attribute Reconstruction) 53, the geometry post-processing module (Geometry Post-Processing (e.g. smoothing)) 55, and the texture conversion and smoothing module (Attribute transfer & smoothing) 57.
  • the Patch sequence decompression module 45 decompresses the Patch substream of the input according to the syntax element, and outputs the patch information (patch information) to the geometry and texture reconstruction module 53.
  • The first video decompression module 47 decompresses the input occupancy sub-stream according to the syntax elements, and outputs the occupancy map to the geometry and texture reconstruction module 53.
  • the second video decompression module 49 decompresses the input geometry substream according to the syntax elements, and outputs geometry frames to the geometry and texture reconstruction module 53.
  • the third video decompression module 51 decompresses the input texture substream according to the syntax elements, and outputs the texture frame to the geometry and texture reconstruction module 53.
  • The geometry and texture reconstruction module 53 obtains the reconstructed geometry frame and the reconstructed texture frame from the input syntax elements, patch information, occupancy map, geometry frame and texture frame; the reconstructed geometry frame is output to the geometry post-processing module 55, and the reconstructed texture frame is output to the texture conversion and smoothing module 57.
  • The geometry post-processing module 55 smooths the reconstructed geometry frame according to the syntax elements, and outputs the smoothed reconstructed geometry frame to the texture conversion and smoothing module 57.
  • the texture conversion and smoothing module 57 performs texture conversion and smoothing according to the input syntax elements, texture frames and smoothed reconstructed geometric frames, and outputs smoothed reconstructed point cloud frames.
  • In V-PCC there are multiple smoothing processes applied to the video frames obtained after lossy compression, for example the smoothing of the reconstructed geometry frame and the smoothing of the reconstructed texture frame at the encoding end (the latter is not shown in the figure), and the smoothing of the reconstructed geometry frame and of the reconstructed point cloud at the decoding end.
  • In V-PCC, blocks (patches) that are not adjacent in the image may be very close to each other in three-dimensional space, which can cause the patch-based video encoder to mix adjacent pixels and introduce artifacts into the reconstruction; to address this, a color smoothing algorithm can use the occupancy map to find the pixels in the reconstructed frame corresponding to patch boundary points and then smooth them with a median filter.
  • V-PCC can also use a boundary filtering algorithm that finds the pixels corresponding to patch boundary points and locally changes the depth values at patch edges, to address the possible discontinuity of patches in the reconstructed point cloud.
  • 3D points can be added to make the boundary continuous for the patch discontinuity problem.
  • the reconstructed point cloud can be gridded, and the center of the grid can be used to perform trilinear filtering on the edge points of the patch to improve the visual effect of the point cloud.
  • After the basic reconstruction of the decoded video frames, the V-PCC codec framework only performs image smoothing operations, so the quality of the reconstructed point cloud still has large room for improvement; to further improve compression performance, it is beneficial to obtain quality-enhanced point clouds while changing the bitstream size very little.
  • Wiener filtering on an image can enhance the quality of the image, but as mentioned above, the Wiener filtering of the image has some shortcomings.
  • When the filter order K is constant, for large images and images with severe local changes, the image quality enhancement obtained by filtering with the same set of coefficients is not good; if K is increased blindly, the filtering effect is only slightly improved while the code stream size and time complexity increase to a certain extent, and the overall performance may even become worse.
  • There is also a Wiener filtering scheme that divides the filter into 25 categories: it calculates the gradient direction and degree of variation of each 4×4 block in the image, determines the filter class of the block, computes the coefficients and performs the filtering.
  • this method is difficult to apply to codec quality enhancement, because it does not specifically address the characteristics of closely arranged V-PCC image frame patches and large neighborhood changes. At the same time, this method needs to transmit more data and has a large code stream overhead.
  • the embodiments of the present disclosure propose a video encoding method and a video decoding method that can be applied to various video encoding and decoding frameworks based on a video filtering method in a neighborhood difference domain.
  • An embodiment of the present disclosure proposes a video encoding method, as shown in FIG. 9 , including:
  • Step 410 dividing the pixels in the first video image into multiple categories, and generating corresponding filter coefficients for some or all of the multiple categories, where one category corresponds to a set of filter coefficients;
  • Step 420 Encoding and sending the filter parameters, or encoding and sending the filter parameters whose filter coefficients meet the sending conditions, where the filter parameters include the filter coefficients and category information.
  • the filter parameters are encoded and sent.
  • the filter coefficient at the encoding end can be used to improve the filtering effect at the decoding end and reduce the computation load at the decoding end.
  • the original video image at the encoding end can be used to calculate the optimal filter coefficient, so that the decoding end can also use the optimal filter coefficient to enhance the quality of the video image.
  • the filter coefficients generated by the encoding end may be sent directly or selectively sent after being judged first. Because sending filter parameters will increase the code stream and affect the coding efficiency.
  • Encoding and sending the filter parameters whose filter coefficients meet the sending conditions includes: judging, group by group, whether the generated filter coefficients meet the sending conditions, and encoding and sending each group of filter coefficients that meets the sending conditions together with the information of the category corresponding to that group of filter coefficients;
  • the sending conditions include any one or more of the following conditions:
  • the rate-distortion when neighborhood filtering is performed becomes smaller than the rate-distortion when neighborhood filtering is not performed;
  • the amount of the reduction is greater than the corresponding rate-distortion gain threshold.
  • the gain threshold used in the above sending condition may be the same as or different from the gain threshold used in the merging process.
  • the gain threshold used in the sending condition may be higher than the gain threshold used in the combining process.
  • Rate-distortion (distortion-rate) is used to measure the relationship between image distortion and encoding bit rate.
  • the image distortion can be measured by the PSNR between the original image and the reconstructed image, which can be the PSNR of the luminance component, or a linear combination of the PSNR of the luminance component and the PSNR of the chrominance component.
  • the encoding rate indicates the amount of overall encoding data that needs to be transmitted based on the selected encoding parameters, quantization parameters, and prediction modes.
  • the rate-distortion can be calculated through the cost function, and the smaller the rate-distortion, the higher the encoding efficiency of the encoder.
  • the aforementioned rate-distortion gain threshold may be set to 0, or a value greater than 0.
  • the reduced amount is the difference obtained by subtracting the rate-distortion when the neighborhood filtering is performed from the rate-distortion when the neighborhood filtering is not performed.
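  • As an illustration only, the sending decision can be sketched as below in Python, assuming the common cost model J = D + λ·R; the function and parameter names are hypothetical and the patent does not fix a specific cost function.

```python
def should_send(dist_filtered, dist_unfiltered, extra_bits, lam, rd_gain_threshold=0.0):
    """Send a group of filter parameters only if the rate-distortion cost with
    filtering (including the bits spent on the parameters) drops by at least
    the configured threshold compared with not filtering."""
    rd_with = dist_filtered + lam * extra_bits
    rd_without = dist_unfiltered
    return (rd_without - rd_with) >= rd_gain_threshold
```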
  • Encoding each group of filter coefficients that meets the sending conditions together with the information of the category corresponding to that group includes: adding, before each group of filter coefficients and the information of its corresponding category, a flag indicating whether category information and filter coefficients are present.
  • the information of at least one category is represented by index information of a plurality of subcategories merged into the category, and each subcategory corresponds to an agreed value range of neighborhood differences. Examples have been given above and will not be repeated here.
  • Since the filter coefficients in this embodiment correspond to categories, the corresponding category information needs to be transmitted together with the filter coefficients. Because some groups may not meet the sending conditions or the number of merged categories may change, the number of groups of filter coefficients actually transmitted is variable, and some or all groups may be empty; therefore a flag bit can be set for each group of filter coefficients to indicate whether category information and filter coefficients are present.
  • the encoding end and the decoding end can agree on a maximum number of groups that can be transmitted, for example, it can be set to the aforementioned maximum number of rounds.
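  • A hypothetical serialization of the per-group flag described above (illustrative only, not the actual bitstream syntax of the patent): for each of the agreed maximum number of groups a presence flag is written; when the flag is set, the category information, here as the (first, last) merged sub-class indices, and the filter coefficients follow.

```python
def write_filter_params(groups, max_groups):
    """groups: list of ((first, last), coefficients) that met the sending
    conditions; empty slots are signalled with a 0 flag."""
    symbols = []
    for i in range(max_groups):
        if i < len(groups):
            (first, last), coeffs = groups[i]
            symbols += [1, first, last] + list(coeffs)  # flag = 1: data follows
        else:
            symbols.append(0)                           # flag = 0: empty group
    return symbols
```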
  • the video encoding method is applied to an encoding end of a video-based point cloud compression system.
  • Before encoding, the input point cloud of each frame is divided into patches; the patches are closely arranged and mapped to two single-channel images of a far layer and a near layer, that is, the generated geometry video frames (Geometry video); the texture information of each patch is mapped to two three-channel images, that is, the generated texture video frames (Attribute video); the difference between the two geometry (or texture) images of each frame is very small.
  • the occupancy map is used to represent the occupancy of useful pixels in the video.
  • the generated signal includes reconstructed geometry video frame (lossy) and original geometry video frame (lossless).
  • the compressed code stream output by the encoding end includes a geometric code stream written with encoding information.
  • the reconstructed geometric video frame can be filtered using the neighborhood difference-based adaptive Wiener filtering method of the embodiment of the present disclosure, so as to improve the quality of the reconstructed geometric video frame.
  • the first video frame includes a lossy reconstructed geometric video frame
  • The video encoding method further includes: performing neighborhood filtering on the reconstructed geometric video frame, and using the filtered reconstructed geometric video frame to generate the corresponding texture video frame; wherein, when neighborhood filtering is performed on the reconstructed geometric video frame, pixels of each category with corresponding filter coefficients in the reconstructed geometric video frame are filtered using the set of filter coefficients corresponding to that category.
  • a Wiener filtering device may be added at the V-PCC coding end, and FIG. 11 shows a related partial architecture.
  • a Wiener filter module 35 is added to the encoding device, which may also be called a Wiener filter.
  • the Wiener filtering module 35 receives the smoothed reconstructed geometric video frame (referred to as the reconstructed geometric frame) output from the smoothing module 17, and the original geometric video frame output from the geometric frame filling module 23 (or geometric frame generation module 19) (referred to as the original geometry frame).
  • Wiener filtering is performed on the reconstructed geometric video frame, and the filtered reconstructed geometric video frame is output to the texture frame generating module 15 for generating corresponding texture video frames.
  • the Wiener filter module 35 can also record the generated filter coefficients, encode them and send them together with the geometric video frame data, or send them as syntax elements.
  • Fig. 12 shows another exemplary architecture that can implement the video encoding method of this embodiment.
  • the added Wiener filter module 35 receives the reconstructed geometric video frame output from the second video compression module 31 and the original geometric video frame output from the geometric frame filling module 23 (or the geometric frame generation module 19 ).
  • the filtered reconstructed geometric video frames are output to the smoothing module 17 .
  • An effect similar to that of the architecture in Figure 11 can also be achieved.
  • the smoothing module 17 can also be eliminated, and the Wiener filtering module 35 receives the reconstructed geometric video frame output from the second video compression module 31 and the original geometric video frame output from the geometric frame filling module 23 .
  • the filtered reconstructed geometric video frames are output to the texture frame generation module 15 .
  • The Wiener filtering method of this embodiment may also be used to filter the reconstructed texture video frame, with corresponding filter coefficients generated, encoded and sent; details are not repeated here.
  • the video encoding method is applied to an encoding end of a video-based point cloud compression system, and the neighborhood filtering is Wiener filtering; the first video image includes a reconstructed geometric video image ; or, the first video image includes a reconstructed texture video image.
  • the encoding end may also only generate the filter coefficients without itself performing Wiener filtering; the filter coefficients are encoded and sent to the decoding end for quality enhancement of the reconstructed geometric video image and/or the reconstructed texture video image at the decoding end.
  • the first video image includes two reconstructed video images mapped from point cloud frames
  • the video coding method further includes: for the two reconstructed video images mapped from the same point cloud frame, after filter coefficients are generated for the first reconstructed video image according to the filter coefficient generation method with subclass merging described in any embodiment of the present disclosure, when generating the filter coefficients for the second reconstructed video image, the pixels of the second reconstructed video image are first divided into subclasses and then grouped into the multiple categories using the same subclass merging method as the first reconstructed video image; the reconstructed video image includes a lossy geometric video image or a texture video image. Because the same point cloud frame is mapped into the two reconstructed video images, sharing the merging method can improve the encoding speed and effect, and the impact on quality is controllable.
  • the video coding method is also applicable to video codec systems for processing two-dimensional video images, for example H.264/AVC, H.265/HEVC, VVC/H.266 and other similar standard video codec systems.
  • a traditional video encoding device 1000 includes a prediction processing unit 1100, a division unit 1101, a residual generation unit 1102, a transform processing unit 1104, a quantization unit 1106, an inverse quantization unit 1108, an inverse transform processing unit 1110, a reconstruction unit 1112, a filter unit 1113, a decoded picture buffer 1114, an image resolution adjustment unit 1115, and an entropy encoding unit 1116.
  • the prediction processing unit 1100 includes an inter prediction processing unit 1121 and an intra prediction processing unit 1126.
  • the video encoding device may contain more, fewer, or different functional components than in this example.
  • the division unit 1101 cooperates with the prediction processing unit 1100 to divide the received video data into slices (Slices), CTUs or other larger units.
  • the video data received by the dividing unit 1101 may be a video sequence including video frames such as I frames, P frames, or B frames.
  • the prediction processing unit 1100 may divide a CTU into CUs, and perform intra-frame predictive coding or inter-frame predictive coding on the CUs.
  • the CU can be divided into one or more prediction units (PU: prediction unit).
  • the inter prediction processing unit 1121 may perform inter prediction on the PU to generate prediction data of the PU, the prediction data including the prediction block of the PU, motion information of the PU and various syntax elements.
  • the intra prediction processing unit 1126 may perform intra prediction on the PU to generate prediction data for the PU.
  • the prediction data for a PU may include the prediction block and various syntax elements for the PU.
  • the residual generation unit 1102 may generate the residual block of the CU by subtracting the prediction blocks of the PUs into which the CU is divided from the original block of the CU.
  • the transform processing unit 1104 may divide the CU into one or more transform units (TU: Transform Unit), and the residual block associated with the TU is a sub-block obtained by dividing the residual block of the CU.
  • a TU-associated coefficient block is generated by applying one or more transforms to the TU-associated residual block.
  • the quantization unit 1106 can quantize the coefficients in the coefficient block based on the selected quantization parameter, and the degree of quantization of the coefficient block can be adjusted by adjusting the QP value.
  • the inverse quantization unit 1108 and the inverse transformation unit 1110 may respectively apply inverse quantization and inverse transformation to the coefficient blocks to obtain TU-associated reconstruction residual blocks.
  • the reconstruction unit 1112 may add the reconstruction residual block and the prediction block generated by the prediction processing unit 1100 to generate a reconstruction block of the CU.
  • after the filter unit 1113 performs loop filtering on the reconstructed block, the result is stored in the decoded picture buffer 1114 as a reference image.
  • the intra prediction processing unit 1126 may extract reference images of blocks adjacent to the PU from the decoded picture buffer 1114 to perform intra prediction.
  • the inter prediction processing unit 1121 may perform inter prediction on the PU of the current frame image using the reference image of the previous frame buffered in the decoded picture buffer 1114 .
  • the image resolution adjustment unit 1115 resamples the reference images stored in the decoded picture buffer 1114, which may include upsampling and/or downsampling, and obtains reference images of various resolutions and stores them in the decoded picture buffer 1114.
  • the entropy encoding unit 1116 may perform an entropy encoding operation on received data (such as syntax elements, quantized coefficient blocks, motion information, etc.).
  • a Wiener filtering unit 1128 can be added between the filter unit 1113 and the decoded picture buffer 1114 in the figure; in this case, the direct output from the filter unit 1113 to the decoded picture buffer 1114 is disconnected.
  • the Wiener filtering unit receives the filtered reconstructed video image output by the filter unit 1113 (which may be a video image of any specification defined in the corresponding standard), obtains the corresponding original video image from the division unit 1101, performs Wiener filtering on the reconstructed video image, and outputs the filtered reconstructed video image to the decoded picture buffer 1114 for storage.
  • the Wiener filtering unit 1128 can also be arranged before the filter unit 1113, receiving the reconstructed video image output by the reconstruction unit (adder) 1112 and the original video image output by the division unit 1101, performing Wiener filtering on the reconstructed video image and then outputting it to the filter unit 1113; alternatively, the Wiener filtering unit 1128 may be integrated into the filter unit 1113 or replace the filter unit 1113.
  • An embodiment of the present disclosure also provides a code stream, wherein the code stream is an encoded video code stream, and the code stream includes encoded filter parameters, and the filter parameters include Filter coefficients for neighborhood filtering.
  • the encoded video code stream carries filtering parameters for performing neighborhood filtering on the first video image, and the optimal filter coefficient obtained at the encoding end can be transmitted to the decoding end for enhancing the video image. Improve the quality of reconstructed video images.
  • the encoded filter parameters include one or more information units, each of which includes the following subunits:
  • a flag subunit, set to indicate whether filter coefficient and category information is present;
  • an index subunit, set to carry the information of one category, or empty, where the information of the category is represented by the index information of the category or by the index information of the multiple subcategories merged into the category;
  • a coefficient subunit, set to carry one set of filter coefficients, or empty; the category in the index subunit is the category corresponding to this set of filter coefficients.
  • transmitting the filter parameters in the above data format gives the encoding end sufficient flexibility: part or even all of the filter coefficients may be sent or withheld depending on the situation, thereby maintaining high encoding efficiency.
  • by parsing the flags, the decoding end can quickly and correctly read the required filtering parameters.
  • the code stream is a code stream sent by an encoding end of a video-based point cloud compression system, and the first video image includes a reconstructed video image;
  • the encoded filter parameters are carried in the video image data stream, located after the delimiter in the geometric code stream and before the data of the reconstructed video image, wherein the reconstructed video image includes a lossy reconstructed geometric video image or reconstruct the textured video image; or
  • the encoded filter parameters are carried in the sequence parameter set of the code stream.
  • An embodiment of the present disclosure also provides a video decoding method, as shown in FIG. 14 , including:
  • Step 510 decode the code stream, and determine the filter parameters of the first video image, where the filter parameters include filter coefficients;
  • Step 520 Perform neighborhood filtering on the first video image according to the filtering parameters.
  • the filter coefficient sent by the encoding end is analyzed from the code stream, and used to perform neighborhood filtering on the decoded first video image, which can reduce the calculation burden on the decoding end. Moreover, when the original video image exists at the encoding end, it is easier to generate optimal filter coefficients, so that the filtering effect at the decoding end is better.
  • the filter parameters include one or more sets of filter coefficients, and information of a category corresponding to each set of filter coefficients; the category is one of the multiple categories into which the pixels of the first video image are divided according to the neighborhood differences of the pixels in the first video image, and corresponds to one or more value intervals of the neighborhood difference.
  • stating that the filtering parameters include one or more sets of filtering coefficients only defines the content of the filtering parameters; it does not mean that the decoding device necessarily receives valid filter coefficients and category information for every video image. For some video images the decoding device may receive no filter coefficients and category information at all, or may only receive a flag indicating that there are no filter coefficients and category information.
  • the neighborhood difference of a pixel in the first video image is obtained by a statistic over the absolute values of the differences between the pixel value of that pixel and the pixel value of each pixel in its neighborhood, the statistic being a sum, a mean, or a maximum; alternatively, the neighborhood difference of a pixel in the first video image is determined from the differences among the pixel values of the pixel and the pixels in its neighborhood. The pixel neighborhood refers to the eight-neighborhood, four-neighborhood, or diagonal neighborhood of the pixel; see the earlier description for how neighborhood differences are calculated.
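The following sketch shows one way the neighborhood difference could be computed for the variants listed above (eight-, four- or diagonal neighborhood, with a sum, mean or maximum statistic); the function and its defaults are assumptions for illustration only.

```python
import numpy as np

NEIGHBORHOODS = {
    'eight':    [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)],
    'four':     [(-1, 0), (0, -1), (0, 1), (1, 0)],
    'diagonal': [(-1, -1), (-1, 1), (1, -1), (1, 1)],
}

def neighborhood_difference(image, kind='eight', statistic='sum'):
    """Per-pixel statistic of |p - q| over the chosen neighborhood of each pixel p."""
    img = image.astype(np.float64)
    padded = np.pad(img, 1, mode='edge')
    h, w = img.shape
    diffs = [np.abs(img - padded[1 + dy:1 + dy + h, 1 + dx:1 + dx + w])
             for dy, dx in NEIGHBORHOODS[kind]]
    stack = np.stack(diffs)
    if statistic == 'sum':
        return stack.sum(axis=0)
    if statistic == 'mean':
        return stack.mean(axis=0)
    return stack.max(axis=0)
```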
  • the decoding of the code stream to determine the filtering parameters of the first video image includes:
  • parsing one or more information units carrying the filter parameters in the code stream: for each information unit, a 1-bit flag is read first; if the value of the flag indicates that filter coefficient and category information is present, a set of filter coefficients and the information of the category corresponding to that set are then read; if the flag indicates that there is no filter coefficient and category information, parsing continues with the subsequent information units.
  • this parsing method is based on the per-information-unit data format of flag + category information + filter coefficients; for details, refer to the description of the code stream in the embodiments of the present disclosure.
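A minimal, byte-aligned reading loop in the spirit of that format is sketched below. The fixed sizes (one byte per flag and per subclass index, 25 little-endian int32 coefficients) are assumptions chosen for clarity; the actual binarization and entropy coding are defined by the codec, not here.

```python
import struct

def parse_filter_parameters(stream, max_units=3, num_coeffs=25):
    """Illustrative parser for a flag + category + coefficients layout."""
    units = []
    for _ in range(max_units):
        flag = stream.read(1)
        if not flag:
            break                                  # no more data
        if flag[0] == 0:
            continue                               # no coefficients for this unit
        first_sub, last_sub = stream.read(1)[0], stream.read(1)[0]
        coeffs = struct.unpack('<%di' % num_coeffs, stream.read(4 * num_coeffs))
        units.append({'subclasses': (first_sub, last_sub), 'coeffs': list(coeffs)})
    return units
```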
  • the category information may be category index information.
  • performing neighborhood filtering on the first video image according to the filtering parameters includes: for each parsed set of filter coefficients, determining the value interval of the neighborhood difference corresponding to the category according to the information of that category, and, when performing neighborhood filtering on the first video image, using this set of filter coefficients for the pixels whose neighborhood difference falls within that value interval.
  • the information of at least one category includes the index information of the multiple subcategories merged into that category, each subcategory corresponding to an agreed value interval of the neighborhood difference; the value interval of the neighborhood difference corresponding to such a category is the union of the value intervals corresponding to the multiple subcategories merged into it.
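The next sketch shows how a decoder could map a pixel's neighborhood difference to one of the parsed categories when the category information is a merged subclass range. It assumes the 12-subclass convention used later in this disclosure (subclass i covers diff == i for 0..10, subclass 11 covers all larger values).

```python
def category_of_pixel(diff_value, parsed_units):
    """Return the index of the parsed unit whose merged subclass range covers diff_value,
    or None when the pixel belongs to a category without filter coefficients."""
    sub = min(int(diff_value), 11)                 # 12-subclass convention (assumed)
    for idx, unit in enumerate(parsed_units):
        first_sub, last_sub = unit['subclasses']
        if first_sub <= sub <= last_sub:
            return idx
    return None
```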
  • the video decoding method is applied to a decoding end of a video-based point cloud compression system
  • the first video image includes a lossy reconstructed geometric video image, and after performing neighborhood filtering on the reconstructed geometric video image, the video decoding method further includes: using the filtered reconstructed geometric video image for the corresponding texture video image quality enhancement; or
  • the first video image includes a lossy reconstructed textured video frame.
  • the video decoding method of the embodiment of the present disclosure is executed at the decoding end of the V-PCC system.
  • a video decoding device as shown in FIG. 15 can be used, in which a Wiener filtering module 59 is added to the original architecture of the video decoding device.
  • the Wiener filtering module 59 receives the reconstructed geometric video frame output by the geometric post-processing module 55, filters the reconstructed geometric video frame according to the analyzed filter parameters, and sends the reconstructed geometric video frame after the Wiener filter to texture conversion and smoothing module 57 for generating the reconstructed point cloud.
  • the filter parameters can be obtained from the SPS syntax analysis module 43 , and can also be obtained by parsing from the geometric substream.
  • the filter parameters can be parsed by the second video decompression module 49 and then output to the Wiener filter module 59 .
  • the Wiener filter module 59 can also be set between the geometry and texture reconstruction module 53 and the geometry post-processing module 55, or be integrated with the geometry post-processing module 55, or replace the original geometry post-processing module 55.
  • the filter parameters in this embodiment include one or more sets of filter coefficients, and information of a category corresponding to each set of filter coefficients.
  • the video decoding method is also applicable to video codec systems for processing two-dimensional video images, for example H.264/AVC, H.265/HEVC, VVC/H.266 and other similar standard video codec systems.
  • the neighborhood filter is a Wiener filter; the first video image includes a reconstructed video image.
  • a traditional video decoding device 101 includes an entropy decoding unit 150, a prediction processing unit 152, an inverse quantization unit 154, an inverse transform processing unit 156, a reconstruction unit 158 (indicated by a circle with a plus sign in the figure), a filter unit 159, and a picture buffer 160.
  • the video decoding device may contain more, fewer, or different functional components.
  • the entropy decoding unit 150 may perform entropy decoding on the received code stream to extract information such as syntax elements, quantized coefficient blocks, and PU motion information.
  • the prediction processing unit 152 , the inverse quantization unit 154 , the inverse transform processing unit 156 , the reconstruction unit 158 and the filter unit 159 can all perform corresponding operations based on the syntax elements extracted from the code stream.
  • the inverse quantization unit 154 may inverse quantize the quantized TU-associated coefficient blocks.
  • Inverse transform processing unit 156 may apply one or more inverse transforms to the inverse quantized coefficient block in order to generate the reconstructed residual block of the TU.
  • the prediction processing unit 152 includes an inter prediction processing unit 162 and an intra prediction processing unit 164. If the PU is encoded using intra prediction, the intra prediction processing unit 164 can determine the intra prediction mode of the PU based on the syntax elements parsed from the code stream, and perform intra prediction according to the determined mode and the reconstructed reference information of adjacent PUs obtained from the picture buffer 160, producing the prediction block of the PU. If the PU is encoded using inter prediction, the inter prediction processing unit 162 may determine one or more reference blocks for the PU based on the motion information of the PU and the corresponding syntax elements to generate the prediction block of the PU.
  • the reconstruction unit 158 may obtain the reconstruction block of the CU based on the reconstruction residual block associated with the TU and the prediction block of the PU generated by the prediction processing unit 152 (ie intra prediction data or inter prediction data).
  • the filter unit 159 may perform loop filtering on the reconstructed block of the CU to obtain a reconstructed picture.
  • the reconstructed pictures are stored in the picture buffer 160 .
  • the picture buffer 160 can provide reference pictures for subsequent motion compensation, intra prediction, inter prediction, etc., and can also output the reconstructed video data as decoded video data for presentation on a display device.
  • the above display 105 may be, for example, a liquid crystal display, a plasma display, an organic light emitting diode display or other types of display devices.
  • the decoding end may not include the display 105, but may include other devices that can apply the decoded data.
  • when applying the video decoding method of the embodiment of the present disclosure to the video decoding device, as shown in FIG. 16, a Wiener filter unit 166 needs to be added; in this case, the direct output from the filter unit 159 to the picture buffer 160 is disconnected.
  • a Wiener filter unit 166 can be added between the filter unit 159 and the picture buffer 160 in the figure. The Wiener filter unit receives the filtered reconstructed video image output by the filter unit 159 (which may be any video image specified in the corresponding standard) and the filter parameters (including filter coefficients and category information) parsed by the entropy decoding unit 150, performs Wiener filtering on the reconstructed video image, and stores the filtered reconstructed video image in the picture buffer 160.
  • the Wiener filter unit can also be arranged before the filter unit 159, receiving the reconstructed video image output by the reconstruction unit 158 and outputting it to the filter unit 159 after performing Wiener filtering; alternatively, the Wiener filter unit may be integrated into the filter unit 159 or replace the filter unit 159.
  • the filter parameters in this embodiment include one or more sets of filter coefficients, and information of a category corresponding to each set of filter coefficients.
  • the video encoding device and/or video decoding device in the above-mentioned embodiments of the present disclosure can be implemented by any one or any combination of the following circuits: one or more microprocessors, digital signal processors, application-specific integrated circuits, field-programmable gate arrays, discrete logic, or hardware. If the present disclosure is implemented partially in software, the instructions for the software may be stored in a suitable non-transitory computer-readable storage medium and executed in hardware using one or more processors, thereby implementing the methods of the embodiments of the present disclosure.
  • An embodiment of the present disclosure provides a video encoding device, which is applied to a video-based point cloud compression system, as shown in FIG. 11 and FIG. 12, including a texture frame generation module, and a geometric frame generation module, a geometric frame filling module and a geometric frame video compression module (corresponding to the second video compression module 31 in Fig. 11 and Fig. 12) connected in sequence, and further including:
  • a Wiener filter module configured to receive the reconstructed geometric video frame output by the geometric frame video compression module and the original geometric video frame output by the geometric frame generation module or the geometric frame filling module, execute the video encoding method described in any embodiment of the present disclosure to perform Wiener filtering on the reconstructed geometric video frame, and output the filtered reconstructed geometric video frame to the texture frame generation module.
  • the reconstructed geometric video frame received by the Wiener filtering module from the geometric frame video compression module may be output to the Wiener filtering module directly or indirectly (for example, through the smoothing module in between), and the Wiener filtering module may likewise output the filtered reconstructed geometric video frame to the texture frame generation module directly or indirectly (for example, through the smoothing module in between).
  • An embodiment of the present disclosure also provides a video encoding device, as shown in FIG. 22 , including a processor 5 and a memory 6 storing a computer program, wherein, when the processor 5 executes the computer program, it implements the The video encoding method described in any one of the embodiments.
  • An embodiment of the present disclosure also provides a video decoding device, including a geometric frame reconstruction module and a texture conversion module, see FIG. 15 , and also includes:
  • the Wiener filter module is configured to receive the reconstructed geometric video frame output by the geometric frame reconstruction module and the filter parameters parsed from the code stream, execute the video decoding method described in any embodiment of the present disclosure, and output the filtered reconstructed geometric video frame to the texture conversion module.
  • the geometric frame reconstruction module described in this embodiment can be integrated in the geometric and texture reconstruction module 53 in Figure 15, and the reconstructed geometric video frame output by the geometric frame reconstruction module can be directly or indirectly output to the Wiener filter module (or called Wiener filter, other embodiments are the same).
  • the texture conversion module described in this embodiment can be integrated into the texture conversion and smoothing module 57 in FIG. 15 , and the reconstructed geometric video frame filtered by the Wiener filter module can be directly or indirectly output to the texture conversion module.
  • An embodiment of the present disclosure also provides a video decoding device, including a processor and a memory storing a computer program, wherein, when the processor executes the computer program, the video decoding described in any embodiment of the present disclosure is implemented. method.
  • An embodiment of the present disclosure further provides a video encoding and decoding system, including the video encoding device according to any embodiment of the present disclosure and the video decoding device according to any embodiment of the present disclosure.
  • An embodiment of the present disclosure also provides a video filtering device, including a processor and a memory storing a computer program, wherein, when the processor executes the computer program, the video filtering described in any embodiment of the present disclosure is implemented. method.
  • An embodiment of the present disclosure also provides a non-transitory computer-readable storage medium storing a computer program, wherein, when the computer program is executed by a processor, the method described in any embodiment of the present disclosure is implemented.
  • the processor in the above-mentioned video encoding device and video decoding device can be a general-purpose processor, including a central processing unit (CPU), a network processor (NP), etc.; it can also be a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or discrete hardware components.
  • a general-purpose processor may be a microprocessor, or the processor may be any conventional processor, or the like.
  • An embodiment of the present disclosure also provides a method for performing neighborhood difference-based adaptive Wiener filtering on geometric frames reconstructed by V-PCC, so as to achieve the purpose of enhancing point cloud quality and subjective effect.
  • this embodiment proposes setting one or more Wiener filters at the encoding end and the decoding end according to the data and the filtering effect; the encoding end obtains the optimal filter coefficients through calculation and transmits them to the decoding end, and the decoding end decodes the optimal filter coefficients and then performs post-processing on the reconstructed point cloud geometry frames.
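The "optimal filter coefficients obtained through calculation" correspond to the classical Wiener/least-squares solution H = (PᵀP)⁻¹PᵀS. The sketch below computes it for one category of pixels; the square window and the use of a least-squares solver instead of an explicit matrix inverse are implementation choices made here for illustration.

```python
import numpy as np

def wiener_coefficients(lossy, original, positions, win=5):
    """Wiener (least-squares) coefficients for one category of pixels.

    lossy, original : 2-D arrays of the reconstructed and original frames.
    positions       : iterable of (y, x) coordinates of the pixels in this category.
    """
    pad = win // 2
    padded = np.pad(lossy.astype(np.float64), pad, mode='edge')
    P = np.array([padded[y:y + win, x:x + win].ravel() for y, x in positions])
    S = np.array([float(original[y, x]) for y, x in positions])
    # lstsq solves the normal equations and is numerically safer than inverting P^T P.
    H, *_ = np.linalg.lstsq(P, S, rcond=None)
    return H
```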
  • This embodiment is aimed at the V-PCC geometrically lossy and attribute lossy encoding manner.
  • before encoding, the input point cloud of each frame is divided into patches; the patches are closely arranged and mapped to two single-channel images for the far layer and the near layer, generating the geometry video, and the texture information of each patch is mapped to two three-channel images, generating the attribute video; within each frame, the difference between the two geometry/texture images is very small.
  • the Occupancy map is used to indicate the occupancy of useful pixels in the video.
  • in the V-PCC geometry coding part, the reconstructed (lossy) geometry video frame and the original (lossless) geometry video frame can be obtained, together with the geometry code stream carrying the coding information.
  • this embodiment calculates, for each pixel, the sum of the absolute values of the differences between its pixel value and the pixel values of its eight neighbors; this sum is recorded as diff and used for classification.
  • the diff value is limited to the range 0 to 11 (values greater than 11 are clipped to 11), dividing the pixels into 12 subcategories; the best merging points (i.e., the best merging method) are then searched, so that the 12 subcategories are merged into 3 categories.
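A small sketch of this clipping-based subclass assignment, assuming a per-pixel diff map has already been computed:

```python
import numpy as np

def assign_subclasses(diff_map, num_subclasses=12):
    """Clip diff to [0, num_subclasses - 1] and use it directly as the subclass index;
    with the default, diff values 0..10 map to subclasses 0..10 and >= 11 to subclass 11."""
    return np.clip(diff_map.astype(np.int64), 0, num_subclasses - 1)
```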
  • for the best merging method, refer to Fig. 6 and the corresponding explanations.
  • the subcategories merged in the traversal-based merging method all correspond to adjacent diff values; based on test results, adjacent subcategories give better results when sharing filter coefficients, and this constraint effectively reduces time complexity. In addition, when calculating the maximum gain of each merging method, the gain represented by the PSNR increase is weighted before comparison, because the number of pixels in the merged category differs between merging methods: when the number of pixels is large, improving the PSNR is more difficult but contributes more to the quality enhancement of the whole image. Therefore, the ratio of the number of pixels in the category obtained by the current merging method to the resolution of the whole image is used as the weight for the gain.
  • the maximum number of merging rounds set for each video frame (i.e., each image) is 3, so at most three sets of filter coefficients are generated; the three categories corresponding to the three sets of filter coefficients do not necessarily cover all pixels of the video frame, and the remaining pixels can be regarded as a category that has no corresponding filter coefficients and does not need to be filtered. It is subsequently judged whether filtering the pixels of each category with the filter coefficients of that category improves the overall performance; if not, it is not necessary to transmit all the filter coefficients.
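The following sketch outlines this round-based search in simplified form: each round tries merging runs of adjacent, still-unmerged subclasses starting from the front of the queue, scores every candidate by its pixel-count-weighted gain, and keeps the best candidate if its gain reaches the threshold. The `filter_gain` callback, which would generate coefficients, filter the candidate pixels and return the PSNR improvement, is left abstract; the greedy traversal order is an assumption consistent with the constraints described here.

```python
def search_merging(counts, total_pixels, filter_gain, max_rounds=3, gain_threshold=0.0):
    """Greedy search for merging adjacent subclasses into at most max_rounds categories.

    counts       : pixel count per subclass (index = subclass).
    total_pixels : resolution of the frame, used to weight the gain.
    filter_gain  : callback (first_sub, last_sub) -> PSNR gain of filtering those pixels.
    Returns a list of (first_sub, last_sub) ranges, one per merged category.
    """
    categories = []
    start = 0                                       # front-most unmerged subclass
    for _ in range(max_rounds):
        if start >= len(counts):
            break                                   # nothing left to merge
        best = None
        for end in range(start, len(counts)):       # runs of adjacent subclasses only
            n = sum(counts[start:end + 1])
            if n == 0:
                continue
            weighted = (n / total_pixels) * filter_gain(start, end)
            if best is None or weighted > best[0]:
                best = (weighted, end)
        if best is None or best[0] < gain_threshold:
            break                                   # no merge reaches the threshold
        categories.append((start, best[1]))
        start = best[1] + 1
    return categories
```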
  • this embodiment further proposes a method in which two images share the best merging method: after the difference value diff of the second image has been calculated, the traversal of merging methods is no longer performed; instead, the merging method information saved for the previous image frame is used directly to group the pixels and perform the subsequent operations. Experiments show that this method effectively reduces the time complexity while the final performance is almost the same as that of the original method.
  • a Wiener filtering operation may be performed on the pixels of each category.
  • this embodiment also changes the shape of the filter and uses a rhombus (diamond) filter, so that neighborhood information can be extracted more effectively.
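One way to realize a 25-tap rhombus support is a Manhattan-distance mask of radius 3, which contains exactly 2·3² + 2·3 + 1 = 25 positions; whether this embodiment uses exactly this support is an assumption based on the 25-coefficient diamond window described here.

```python
import numpy as np

def diamond_mask(radius=3):
    """Boolean mask of a rhombus (diamond) filter support: Manhattan distance <= radius."""
    yy, xx = np.mgrid[-radius:radius + 1, -radius:radius + 1]
    return (np.abs(yy) + np.abs(xx)) <= radius

assert int(diamond_mask().sum()) == 25   # radius 3 gives a 25-tap filter
```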
  • after filtering, this embodiment calculates the PSNR and rate-distortion performance of these pixels against the original points and compares them with those before filtering.
  • the following cost function is used to trade off rate and distortion: J = D + λ·R, where D is the SSE between the original point set (original pixel set) and either the reconstructed point set (unfiltered reconstructed pixel set) or the filtered point set (filtered reconstructed pixel set), i.e., the sum of squared errors over corresponding points; λ is a quantity related to the quantization parameter QP and is set accordingly in this scheme; and R is the bitstream size. If the cost J_f after filtering is less than the cost J_r of the unfiltered reconstructed point set, the flag of this group is set to 1 and the group of coefficients together with the best merging method (or merging points) is saved; otherwise the flag is set to 0 and no other information is saved.
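A minimal sketch of this rate-distortion decision is given below; the Lagrange multiplier derived from QP is passed in as a parameter because its exact value is not reproduced here.

```python
def should_send_coefficients(sse_filtered, sse_reconstructed, coeff_bits, lam):
    """Send the coefficients only if J_f = D_f + lam * R is below J_r = D_r.

    sse_filtered      : SSE of filtered points vs. original points (D_f).
    sse_reconstructed : SSE of unfiltered reconstructed points vs. original points (D_r).
    coeff_bits        : extra bitstream size R for the flag, category info and coefficients.
    lam               : QP-dependent Lagrange multiplier (value not specified here).
    """
    return (sse_filtered + lam * coeff_bits) < sse_reconstructed
```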
  • the data written at the encoding end is: for each category of each video frame, the flag (bool type) is written first; if the flag is 0, nothing further is written for that category; if the flag is 1, the best merging method, i.e. the category information (char type), is written, followed by the k filter coefficients (converted to int type). The category information can be represented by subclass indices; multiple subclasses that are adjacent in the queue can be represented by the index of the first subclass and the index of the last subclass.
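A byte-level illustration of this writing order (the mirror of the parsing sketch given earlier) follows; the exact binarization, including how the coefficients are scaled before being converted to integers, is an assumption.

```python
import struct

def write_filter_parameters(stream, units, num_coeffs=25):
    """Write flag + category info + coefficients for each category.

    units: list whose entries are either None (rate-distortion check failed, only a
           0 flag is written) or {'subclasses': (first, last), 'coeffs': [...]}."""
    for unit in units:
        if unit is None:
            stream.write(b'\x00')                        # flag = 0, nothing follows
            continue
        stream.write(b'\x01')                            # flag = 1
        first_sub, last_sub = unit['subclasses']
        stream.write(bytes([first_sub, last_sub]))       # merged subclass range
        coeffs = [int(round(c)) for c in unit['coeffs']] # scaling/quantization omitted
        stream.write(struct.pack('<%di' % num_coeffs, *coeffs))
```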
  • since the filter-coefficient data transmitted by the encoding-end filter precedes the geometric video frame data, after V-PCC decoding locates the geometry code stream it first reads, from the fifth bit and according to the encoding-end format, the Wiener filter flags, the category information and the sets of filter coefficients, then restores the original code stream and continues the V-PCC decoding operation to obtain the reconstructed geometric video frame, passing the reconstructed geometric video frame and the filter parameters to the Wiener filter.
  • from the flags parsed from the code stream it can be judged which categories of pixels in each frame need to be filtered, and the pixels that need to be filtered are assigned to their respective categories according to the category information (such as the subclass merging method).
  • the pixels of each category are then filtered with the corresponding filter coefficients, and the filtered pixel values replace the pixel values of the corresponding pixels of the reconstructed point cloud; after all frames have been filtered, quality-enhanced geometric video frames are obtained and the V-PCC procedure resumes. Owing to the optimization of the geometric video frames, the positions of the points in the reconstructed point cloud are closer to their true values, thereby improving the quality of the entire point cloud and giving a better subjective effect.
  • Geom.BD-TotGeomRate is the BD-Rate of the geometric PSNR relative to the geometric code stream
  • End-to-End BD-AttrRate is the BD-Rate of the end-to-end attribute PSNR relative to the attribute code stream
  • Geom.BD-TotalRate is the BD-Rate of the geometric PSNR relative to the total code stream
  • End-to-End BD-TotalRate is the BD-Rate of the end-to-end attribute PSNR relative to the total code stream.
  • the test results of each sequence on CTC_C2 are shown in Figure 17 and Figure 18, and the bottom rows of Figure 17 and Figure 18 give the average gain of the test on CTC_C2. The BD-TotalRate shown in Figure 17 is the improvement of the overall compression rate, i.e. geometric or texture quality measured against the overall code stream, whereas Figure 18 measures the geometric quality improvement against the geometry code stream and the texture quality improvement against the texture code stream, i.e. the improvement of the respective compression rates. It can be seen from the figures that, compared with the original scheme, adaptive Wiener filtering of the geometric video frames greatly improves the quality of the point cloud, further increases the compression efficiency, and significantly reduces the BD-Rate.
  • the compression rate of color attributes has not changed much, while the geometry attributes have been greatly improved.
  • the BD-Rate of the color attribute also decreases slightly, which is the beneficial effect of the geometric quality improvement on color coding and reconstruction; the geometric gain of the Cat2-C sequences is still considerable, indicating that the algorithm has a more obvious effect on improving the quality of point clouds whose reconstruction quality is already higher.
  • Figure 19 shows the original point cloud (Ground Truth)
  • Figure 20 shows the reconstructed point cloud
  • Figure 21 shows the point cloud after geometric video frame quality enhancement. It can be clearly seen from the figures that, after the geometric video frames undergo adaptive Wiener filtering based on neighborhood differences, the resulting point cloud has smoother boundary contours than the reconstructed point cloud, and some outlier points return to their correct positions, giving a better subjective impression.
  • the V-PCC point cloud post-processing quality enhancement method proposed in this embodiment uses an adaptive Wiener filter algorithm based on neighborhood differences, and has at least the following characteristics:
  • Wiener filtering is no longer performed on the whole image; instead, the total difference over the eight-neighborhood of each pixel is calculated first, and subcategory division is performed according to the difference value.
  • the pixel points with the most quality improvement after sharing the filter parameters are combined into large categories, which are divided into three categories, and Wiener filtering is performed in turn to obtain three sets of filter coefficients.
  • whether information such as the filter coefficients is transmitted to the decoding end is determined based on rate-distortion indicators: when the code stream cost of data such as the filter coefficients and the merging method is greater than the benefit of the quality improvement, the filter coefficients are not transmitted.
  • the filter shape adopts diamond shape to extract neighborhood information more efficiently.
  • the optimal merging method of the second frame image is no longer recalculated but is directly set to be the same as that of the first frame image, which reduces time complexity while ensuring the effect.
  • the first frame image and the second frame image may be geometric video frames or texture video frames, namely the two frames of near-layer and far-layer images.
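A small sketch of reusing the first frame's merging result for the second frame of the pair; the diff clipping convention and the -1 marker for unfiltered pixels are assumptions carried over from the earlier sketches.

```python
import numpy as np

def classify_second_frame(diff_map_frame2, categories_from_frame1):
    """Group the second frame's pixels using the (first_sub, last_sub) ranges
    found for the first frame, instead of searching the merging method again."""
    sub = np.clip(diff_map_frame2.astype(np.int64), 0, 11)
    cat_map = np.full(sub.shape, -1, dtype=np.int64)     # -1: no category / no filtering
    for idx, (first_sub, last_sub) in enumerate(categories_from_frame1):
        cat_map[(sub >= first_sub) & (sub <= last_sub)] = idx
    return cat_map
```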
  • the functions described may be implemented in hardware, software, firmware, or any combination thereof. If implemented in software, the functions may be stored on or transmitted over, as one or more instructions or code, a computer-readable medium and executed by a hardware-based processing unit.
  • Computer-readable media may include computer-readable storage media that correspond to tangible media such as data storage media, or communication media including any medium that facilitates transfer of a computer program from one place to another, eg, according to a communication protocol. In this manner, a computer-readable medium may generally correspond to a non-transitory tangible computer-readable storage medium or a communication medium such as a signal or carrier wave.
  • Data storage media may be any available media that can be accessed by one or more computers or one or more processors to retrieve instructions, code and/or data structures for implementation of the techniques described in this disclosure.
  • a computer program product may comprise a computer readable medium.
  • such computer-readable storage media may include RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk or other magnetic storage, flash memory, or may be used to store instructions or data Any other medium that stores desired program code in the form of a structure and that can be accessed by a computer.
  • any connection may also properly be termed a computer-readable medium. For example, if instructions are transmitted from a website, server, or other remote source using coaxial cable, fiber optic cable, twisted pair, DSL, or wireless technologies such as infrared, radio, and microwave, then the coaxial cable, fiber optic cable, twisted pair, DSL, or the wireless technologies are included in the definition of medium.
  • disk and disc, as used here, include compact disc (CD), laser disc, optical disc, digital versatile disc (DVD), floppy disk, and Blu-ray disc, where disks usually reproduce data magnetically, while discs reproduce data optically with lasers. Combinations of the above should also be included within the scope of computer-readable media.
  • the instructions may be executed by one or more processors, such as one or more digital signal processors (DSPs), general-purpose microprocessors, application-specific integrated circuits (ASICs), field-programmable logic arrays (FPGAs), or other equivalent integrated or discrete logic circuits.
  • processors may refer to any of the foregoing structure or any other structure suitable for implementation of the techniques described herein.
  • the functionality described herein may be provided within dedicated hardware and/or software modules configured for encoding and decoding, or incorporated in a combined codec.
  • the techniques may be fully implemented in one or more circuits or logic elements.
  • the technical solutions of the embodiments of the present disclosure may be implemented in a wide variety of devices or devices, including a wireless handset, an integrated circuit (IC), or a set of ICs (eg, a chipset).
  • Various components, modules, or units are described in the disclosed embodiments to emphasize functional aspects of devices configured to perform the described techniques, but do not necessarily require realization by different hardware units. Rather, as described above, the various units may be combined in a codec hardware unit or provided by a collection of interoperable hardware units (comprising one or more processors as described above) in combination with suitable software and/or firmware.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)
  • Image Processing (AREA)

Abstract

A filter coefficient generation and filtering method, a video encoding and decoding method, device and system. A decoding end parses filter parameters of a first video image from a code stream, the filter parameters including filter coefficients, and performs neighborhood filtering on the decoded first video image according to the filter parameters. The filter coefficients are obtained by dividing the pixels in the first video image into multiple categories according to the neighborhood differences of the pixels in the first video image and generating corresponding filter coefficients for some or all of the multiple categories; they may be generated by the encoding end and encoded and sent. The embodiments of the present disclosure can improve the filtering effect and enhance the image quality of video images.

Description

滤波系数生成及滤波方法、视频编解码方法、装置和***
技术领域
本公开实施例涉及但不限于视频技术,更具体地,涉及一种滤波系数生成及滤波方法、视频编解码方法、装置和***。
背景技术
图像滤波要在尽量保留图像细节特征的条件下对目标图像的噪声进行抑制,是图像处理中重要的操作。邻域滤波是一种常用的滤波方式,邻域滤波基于图像中每一个像素点的像素值及其邻域像素点的像素值进行计算,结果作为该像素点新的像素值。邻域滤波包括维纳滤波、高斯滤波、均值滤波等。邻域滤波的效果还有待增强。以图像的维纳滤波为例,比如在滤波器阶数K一定的情况下,对于大尺度图像以及局部变化较为剧烈的图像,利用同一组系数进行滤波得到的质量增强效果并不好;如果一味增大K的大小,滤波效果只有小幅度提升,同时在一定程度上会增加码流大小及时间复杂度,甚至使得综合性能更差。
发明概述
以下是对本文详细描述的主题的概述。本概述并非是为了限制权利要求的保护范围。
本公开一实施例提供了一种视频解码方法,包括:
解码码流,确定第一视频图像的滤波参数,所述滤波参数包括滤波系数;
根据所述滤波参数对所述第一视频图像进行邻域滤波。
本公开一实施例还提供了一种滤波系数生成方法,包括:
根据第一视频图像中像素点的邻域差异将所述第一视频图像中的像素点分成多个类别;
为所述多个类别中的部分或全部类别分别生成对应的滤波系数。
本公开一实施例还提供了一种视频滤波方法,其中:
获取按照本公开任一实施例所述的滤波系数生成方法生成的,与所述多个类别中的部分或全部类别对应的滤波系数;
对所述第一视频图像进行邻域滤波时,所述第一视频图像中有对应滤波系数的每一类别的像素点使用该类别对应的滤波系数进行滤波。
本公开一实施例还提供了一种视频编码方法,包括:
按照如本公开任一实施例所述的滤波系数生成方法,将第一视频图像中的像素点分成多个类别,为所述多个类别中的部分或全部类别分别生成对应的滤波系数,一个所述类别对应一组滤波系数;
对滤波参数编码并发送,或者对所述滤波系数中符合发送条件的滤波参数编码并发送,其中,所述滤波参数包括所述滤波系数和类别信息。
本公开一实施例还提供了一种码流,其中,所述码流为已编码视频码流,所述码流中包括已编码的滤波参数,所述滤波参数包括滤波系数,所述滤波系数用于对第一视频图像进行邻域滤波。
本公开一实施例还提供了一种视频解码装置,包括处理器以及存储有计算机程序的存储器,其中,所述处理器执行所述计算机程序时实现如本公开任一实施例所述的视频解码方法。
本公开一实施例还提供了一种视频解码装置,包括几何帧重建模块和纹理转换模块,其中,还包括:
维纳滤波模块,设置为接收所述几何帧重建模块输出的重建几何视频图像,及从码流中解析得到的滤波参数,执行如本公开任一实施例所述的视频解码方法,输出滤波后的重建几何视频图像到所述纹理转换模块。
本公开一实施例还提供了一种视频编码装置,包括处理器以及存储有计算机程序的存储器,其中,所述处理器执行所述计算机程序时实现如本公开任一实施例所述的视频编码方法。
本公开一实施例还提供了一种视频编码装置,应用于基于视频的点云压缩***,包括纹理帧生成模块,以及依次连接的几何帧生成模块、几何帧填充模块和几何帧视频压缩模块,其中,还包括:
维纳滤波模块,设置为接收所述几何帧视频压缩模块输出的重建几何视频图像,及所述几何帧生成模块或几何帧填充模块输出的原始几何视频图像,执行如本公开实施例所述的视频编码方法,输出滤波后的重建几何视频图像到所述纹理帧生成模块。
本公开一实施例还提供了一种视频编解码***,其中,包括如本公开任一实施例所述的视频编码装置和如本公开任一实施例所述的视频解码装置。
本公开一实施例还提供了一种视频滤波装置,包括处理器以及存储有计算机程序的存储器,其中,所述处理器执行所述计算机程序时实现如本公开任一实施例所述的视频滤波方法。
本公开一实施例还提供了一种非瞬态计算机可读存储介质,所述计算机可读存储介质存储有计算机程序,其中,所述计算机程序时被处理器执行时实现如本公开任一实施例所述的方法。
在阅读并理解了附图和详细描述后,可以明白其他方面。
附图概述
附图用来提供对本公开实施例的理解,并且构成说明书的一部分,与本公开实施例一起用于解释本公开的技术方案,并不构成对本公开技术方案的限制。
图1是本公开一实施例滤波系数生成方法的流程图;
图2是本公开一实施例视频滤波方法的流程图;
图3A、图3B和图3C是一个像素点几种邻域的示意图;
图4A、图4B和图4C是一个像素点及其邻域像素点的像素值取值的几个示例;
图5是本公开一实施例采用的菱形窗口的示意图;
图6是本公开一实施例滤波系数生成方法中小类合并过程的流程图;
图7是V-PCC编码端的框架图;
图8是V-PCC解码端的框架图;
图9是本公开一实施例视频编码方法的流程图;
图10是本公开一实施例作为示例的视频几何帧的示意图;
图11是本公开一实施例在V-PCC编码端增加维纳滤波模块的一种方式的模块图;
图12是本公开一实施例在V-PCC编码端增加维纳滤波模块的另一方式的模块图;
图13是本公开一实施例在视频编码装置增加维纳滤波单元的示意图;
图14是本公开一实施例视频解码方法的流程图;
图15是本公开一实施例在V-PCC解码端增加维纳滤波模块的示意图;
图16是本公开一实施例在视频解码装置增加维纳滤波单元的示意图;
图17是CTC_C2上每个序列的测试结果中一部分的示意图;
图18是CTC_C2上每个序列的测试结果中另一部分的示意图;
图19、图20和图21是本公开一实施例方法原始点云、重建点云和经本公开实施例质量增强的点云比较的示意图;
图22是本公开一实施例视频编码装置的硬件架构图。
详述
本公开描述了多个实施例,但是该描述是示例性的,而不是限制性的,并且对于本邻域的普通技术人员来说显而易见的是,在本公开所描述的实施例包含的范围内可以有更多的实施例和实现方案。
本公开的描述中,“示例性的”或者“例如”等词用于表示作例子、例证或说明。本公开中被描述为“示例性的”或者“例如”的任何实施例不应被解释为比其他实施例更优选或更具优势。本文中的“和/或”是对关联对象的关联关系的一种描述,表示可以存在三种关系,例如,A和/或B,可以表示:单独存在A,同时存在A和B,单独存在B这三种情况。“多个”是指两个或多于两个。另外,为了便于清楚描述本公开实施例的技术方案,采用了“第一”、“第二”等字样对功能和作用基本相同的相同项或相似项进行区分。本邻域技术人员可以理解“第一”、“第二”等字样并不对数量和执行次序进行限定,并且“第一”、“第二”等字样也并不限定一定不同。
在描述具有代表性的示例性实施例时,说明书可能已经将方法和/或过程呈现为特定的步骤序列。然而,在该方法或过程不依赖于本文所述步骤的特定顺序的程度上,该方法或过程不应限于所述的特定顺序的步骤。如本邻域普通技术人员将理解的,其它的步骤顺序也是可能的。因此,说明书中阐述的步骤的特定顺序不应被解释为对权利要求的限制。此外,针对该方法和/或过程的权利要求不应限于按照所写顺序执行它们的步骤,本邻域技术人员可以容易地理解,这些顺序可以变化,并且仍然保持在本公开实施例的精神和范围内。
目前的邻域滤波算法对视频图像中所有的像素点采用同一组系数进行滤波,没有考虑视频图 像中像素点的邻域差异,质量增强效果不好。在前景和背景交界处亮度变化极为剧烈的区域,与前景或背景亮度变化平缓的区域,如果采用相同的滤波系数进行邻域滤波,难以达到好的效果。此外,视频编解码***的编码端和解码端分别滤波,解码端没有利用编码端的滤波参数,影响了滤波效果。
为此,本公开一实施例提供了一种滤波系数生成方法,如图1所示,包括:
步骤110,根据第一视频图像中像素点的邻域差异将所述第一视频图像中的像素点分成多个类别;
步骤120,为所述多个类别中的部分或全部类别分别生成对应的滤波系数。
本公开实施例不同类别像素点可以使用不同的滤波系数,因此本公开的邻域滤波不包括那些系数固定的邻域滤波算法如均值滤波、中值滤波等。
本公开一实施例还提供了一种视频滤波方法,如图2所示,包括:
步骤210,获取本公开任一实施例所述的滤波系数生成方法生成的,与所述多个类别中的部分或全部类别对应的滤波系数;
此处获取的滤波系数可以是本地生成的,也可以是从外部传输来的,例如视频编码端生成滤波系数并编码发送,视频解码端通过解析获取所述滤波系数并使用。
步骤220,对所述第一视频图像进行邻域滤波,其中,对所述第一视频图像中有对应滤波系数的每一类别的像素点,使用该类别对应的滤波系数进行滤波。
也即对所述第一视频图像进行邻域滤波时,对有对应滤波系数的每一类别,使用窗口扫描该类别的每一像素点,对所述窗口内所有像素点的像素值加权平均,将位于所述窗口中心的该类别像素点的像素值更新为加权平均的结果,加权平均使用的一组加权系数采用该类别对应的一组滤波系数。
本公开实施例根据像素点的邻域差域对视频图像中的像素点分类,对不同的类别分别生成对应的滤波系数并在邻域滤波时作为该类别像素点的滤波系数,可以为邻域差异不同的像素点自适应地生成合适的滤波系数,从而提高视频图像邻域滤波的效果。本公开实施例提出的基于邻域差异的自适应邻域滤波方法、可以通过自适应邻域滤波增强图像的质量。
本公开的第一视频图像包括但不限于视频帧,第一视频图像还可以是视频帧中的条带(slice)、条带片段(slice segment)等更小的视频单位,或者更大的视频单位,如多个视频帧序列。本公开以下实施例对视频帧的处理也可适用于对其他视频图像的处理。
本文中,像素点的像素值可以是颜色的三个分量(也称为三个通道)中任一分量的值。例如,在YUV格式的图像中,“Y”表示亮度(Luminance或Luma)分量,亮度分量的值通常称为灰度值;“U”和“V”表示的则是色度(Chrominance或Chroma)分量,色度分量可以被存储成Cb(也写为Chroma Cb)和Cr(也写入Chroma Cr),其中Cb为蓝色色度分量,Cr为红色色度分量。则像素点的像素值可以是亮度分量、蓝色色度分量或红色色度分量的值;在RGB格式的图像中,像素点的像素值可以是红色、绿色或蓝色的值,等等。有时,第一视频图像只具有部分分量时,例如在基于视频的点云压缩(video-based point cloud compression,简称V-PCC)***中,对重建几何视频图像滤波时,像素点的像素值指灰度值。而对重建纹理视频图像滤波时,可以对两个色度分量分别滤波,则对蓝色色度分量滤波时,上述像素点的像素值指蓝色色度分量的值,对红色色度分量滤波时,上述像素点的像素值指红色色度分量的值。
本公开实施例中,对视频图像进行邻域滤波,是对视频图像中每一像素点,以该像素点使用的k个滤波系数为加权系数,对以该像素点为中心的窗口内k个像素点的像素值进行加权平均,结果作为该像素点新的像素值,其中,k为滤波阶数。这与传统的邻域滤波是基本相同的,只是传统的邻域滤波所有像素点使用的是相同的滤波系数。而在本公开实施例中,具有对应滤波系数的每一类别的像素点使用的是该类别对应的滤波系数,不同类别对应的滤波系数分别生成,通常并不相同。视频图像中还可能存在一部分像素点,其所属的类别没有对应的滤波系数,对视频图像进行邻域滤波时,这部分像素点保留原来的像素值,不参与滤波运算。
本文中的“窗口”均指邻域滤波扫描时使用的窗口,也可以称为模板、模板窗口、卷积核、方框等。所述扫描操作也可以称之为卷积。但本实施例的一个示例使用菱形的滤波器窗口,如图5所示。相较于原始的矩形框滤波器,该示例改变了窗口的形状,可以更好地适应边界变化,更有效地提取邻域信息。图5中的窗口包括25个像素点,对应于k=25阶的滤波器。本公开也可以使用如9阶、16阶、36阶等其他阶数的滤波器进行邻域滤波。
在本公开一示例性实施例中,所述第一视频图像中一个像素点的邻域差异根据该像素点的像素值与该像素点邻域中每一像素点的像素值之差的绝对值统计得到,所述统计为求和、求均值或者求最大值。
在一个示例中,该像素点的邻域指该像素点的八邻域或四邻域或对角邻域。图3A示出了像素点A的八邻域即带有剖面线的区域,八邻域中包括像素点A周围一圈的8个像素点。图3B示出了像素点A的四邻域即带有剖面线的区域,四邻域中包括像素A周边8个像素点中位于上、下、左、右的4个像素点。图3C示出了像素点A的对角邻域即带有剖面线的区域,对角邻域中包括像素A周边8个像素点位于四个角上的像素点。但本公开不局限于此,在另一示例中,一个像素点的邻域不局限于上述八邻域、四邻域和对角邻域,如也可以包括该像素点周围二圈的24个像素点。
在一个示例中,像素点的邻域为八邻域为例,第一视频图像中一个像素点的邻域差异根据该像素点的像素值与该像素点邻域中每一像素点的像素值之差的绝对值求和得到。如图4A的示例,像素点A的像素值为2,八邻域中的像素点分别为3,1,3,2,4,2,2,4,邻域差异为diff。则有:
diff=|2-3|+|2-1|+|2-3|+|2-2|+|2-4|+|2-2|+|2-2|+|2-4|=7
如图4B的示例,像素点A的像素值为1,八邻域中的像素点分别为1,1,1,2,1,2,2,1,邻域差异为diff。则有:
diff=|1-1|+|1-1|+|1-1|+|1-2|+|1-1|+|1-2|+|1-2|+|1-1|=3
如图4C的示例,像素点A的像素值为5,八邻域中的像素点分别为4,7,8,8,6,5,2,3,邻域差异为diff。则有:
diff=|5-4|+|5-7|+|5-8|+|5-8|+|5-6|+|5-5|+|5-2|+|5-3|=15
除了求和外,所述统计也可以是求均值,即求和之后除以邻域中像素点的个数;或者所述统计也可以是求最大值,即求得到的所述多个绝对值中的最大值。此外,在得到统计值之后,可以用该统计值直接表示邻域差异。但也可以对第一视频图像中所有像素点的统计值做归一化处理,如将统计值映射到[0,10]的区间内,用归一化之后的值表示邻域差异。
在本公开一示例性实施例中,所述第一视频图像中一个像素点的邻域差异根据该像素点及该像素点邻域中像素点的像素值之间的差异确定。以图3A的数据为例,像素点A的邻域差异由该3×3区域整体的像素值的差异得到,即考虑2,3,1,3,2,4,2,2,4的差异,该差异可以用如极差、均方差等表示。
与本实施例相比,上一实施例邻域差异根据该像素点的像素值与该像素点邻域中每一像素点的像素值之差的绝对值统计得到,对逐像素的变化更为敏感,能捕捉到物体边缘处灰度值的剧烈变化,较准确地将边缘处的像素点分入同一类别,生成相适应的滤波系数,提高滤波效果。
在本公开一示例性实施例中,,所述部分或全部类别中每一类别对应的滤波系数设置为对所述第一视频图像进行邻域滤波时该类别像素点使用的滤波系数;所述对所述第一视频图像进行邻域滤波,包括:对有对应滤波系数的每一类别,使用窗口扫描该类别的每一像素点,对所述窗口内所有像素点的像素值加权平均,将位于所述窗口中心的该类别像素点的像素值更新为加权平均的结果,所述加权平均使用的一组加权系数根据该类别对应的滤波系数得到,所述窗口为矩形或菱形。此处描述的对一个类别的像素点进行邻域滤波的操作也适用于以下将多个小类合并为一个类别的过程中,对合并成的一个类别的像素点进行邻域滤波。
下面用维纳滤波器(Wiener filter)说明一下使用滤波系数进行滤波的过程以及滤波系数的生成过程。虽然本公开实施例以维纳滤波为例,但本公开也可用于可以根据有损信号和原始信号直接计算出最优系数的其他邻域滤波算法,以及其他滤波系数可以变化的邻域滤波算法。
维纳滤波器是由数学家Norbert Wiener提出的一种线性滤波器,是利用平稳随机过程的相关特性和频谱特性对混有噪声的信号进行滤波的方法。在一定的约束条件下,其输出与一给定函数(称为期望信号)的差的平方达到最小,通过数学运算最终可变为一个托布利兹方程的求解问题。
维纳滤波具体算法如下:
对于一列混有噪声的信号(称为待滤波信号)x,滤波器长度或阶数为M时输出为:
y(n)=∑_{m=0}^{M} h(m)·x(n-m)
用矩阵形式表示为:y(n)=H(m)×X(n)
已知期望信号为d,则可以计算输出信号与期望信号之间的误差:
e(n)=d(n)-y(n)=d(n)–H(m)×X(n),m=0,1…..M
维纳滤波器以最小均方误差(Minimum Mean Squared Error,简称MMSE)为目标函数,故令目标函数为:
Min E(e(n) 2)=E[(d(n)-H(m)×X(n) 2)]
当滤波系数为最优时,目标函数对滤波系数的导数应该为0,即:
∂E[e(n)²]/∂H(m)=0
2E[(d(n)-H(m)×X(n))]×X(n)=0
E[d(n)X(n)]-H(m)E[X(n)X(n)]=0
上式可表示为:
Rxd–H×Rxx=0
从而由维纳--霍夫方程有:
H=Rxx -1×Rxd
得到最优滤波系数的矩阵H,其中,Rxx为待滤波信号(即含有噪声的输入信号)的自相关矩阵,Rxd为待滤波信号与期望信号的互相关矩阵。
求取维纳滤波系数需要待滤波信号与期望信号,在图像处理邻域,两者可以分别对应于有损图像(也称为失真图像)与原始图像(也称为真实图像)。对于k阶的维纳滤波器,该算法可以根据有损图像与原始图像中每个像素的像素值计算得到k个系数,作为维纳滤波最优的系数,利用该系数对有损图像进行维纳滤波,可以得到在均方误差上逼近原始图像的恢复图像,同时也会具有更好的主观效果。
以视频图像的滤波为例,假定滤波器阶数为k。视频图像的像素总数为n,矩阵P(n,k)是一个n×k矩阵,n行分别对应于图像中n个像素点,对于图像中的每一像素点,文中将以该像素点为中心的窗口内k个像素值的序列称为该像素点的滤波相关向量,文中将矩阵P(n,k)称为视频图像的滤波相关矩阵,由视频图像中n个像素点的滤波相关向量组成。向量S(n)表示原始图像内n个像素点的n个像素值,称为原始像素值向量。
由上述算法可得:
互相关矩阵B(k):B(k)=P(n,k) T×S(n)
自相关矩阵A(k,k):A(k,k)=P(n,k) T×P(n,k)
则最优滤波系数(向量)H(k):H(k)=A(k,k) -1×B(k)
即,H(k)是该k阶滤波器的一组滤波系数,也叫维纳系数,共包括k个滤波系数。
生成维纳系数后,使用维纳系数对待滤波信号进行维纳滤波,输出信号可以最大限度地恢复原始图像,输出信号R(n)=P(n,k)×H(k),R(n)代表输出信号的n个像素值。
在本公开一示例性的实施例中,所述邻域滤波为维纳滤波,所述第一视频图像包括有损视频图像;为所述多个类别中的一个类别生成对应的滤波系数时,是将所述有损视频图像中该类别所有像素点的滤波相关向量组成该类别滤波相关矩阵,将该类别像素点在相应原始视频图像中相应位置的像素点的像素值组成该类别原始像素值向量,将该类别滤波相关矩阵和该类别原始像素值向量的互相关矩阵左乘该类别滤波相关矩阵的自相关矩阵的逆矩阵,得到该类别对应的滤波系数。用公式表示为:
第i类别互相关矩阵B i(k)=P(n i,k) T×S(n i);
第i类别自相关矩阵A i(k,k)=P(n i,k) T×P(n i,k);
第i类别对应的滤波系数H i(k)=A i(k,k) -1×B i(k)。
其中,一个像素点的滤波相关向量指以该像素点为中心的窗口内的k个像素值组成的向量,P(n i,k)是第i类别滤波相关矩阵,n i是第i类别像素点的数量,S(n i)是第i类别像素点的原始像素值向量。
本公开一示例性实施例中,对第一视频图像进行维纳滤波时,与传统维纳滤波的算法基本相同。对视频图像中有对应滤波系数的类别中的每一像素点,也是对以该像素点为中心的窗口内k个像素点的像素值进行加权平均,结果作为该像素点新的像素值,只是加权平均时使用该类别对应的滤波系数,不再对所有像素点使用相同的滤波系数。本公开实施例还允许一部分像素点分入没有对应滤波系数的一个类别,对第一视频图像进行维纳滤波时直接保留原像素值,不需要参与运算。本公开任一实施例中,无论是合并前还是合并后,无论是对一个类别的像素点进行邻域滤波还是生成一个类别的滤波系数,对第一视频图像中的一个像素点来说,以该像素点为中心的窗口内的k个像素值均不改变,也即 该像素点的滤波相关向量不变,均是第一视频图像中以该像素点为中心的窗口所覆盖的k个像素值。
在本公开一示例性的实施例中,所述根据第一视频图像中像素点的邻域差异将所述第一视频图像中的像素点分成多个类别,包括:将邻域差异的取值范围分为多个取值区间,确定所述第一视频图像中每一像素点的邻域差异所属的取值区间,将该像素点分入该取值区间对应的一个类别。
本实施例中,类别与邻域差异的取值区间之间的对应关系是设定好的。在一示例中,假定邻域差异的取值范围为[0,100],设定的类别个数为3,第一个类别对应的取值区间为[0,3],第二个类别对应的取值区间为[4,10],第三个类别对应的取值区间为[11,100]。对视频图像中的每一像素点,均可以按照前述方法计算出一个邻域差异的值,例如用diff表示邻域差异,如果diff=5,则将该像素点分入第二类别或者说该像素点属于第二类别,如果diff=15,则将该像素点分入第三类别或者说该像素点属于第三类别。这种方式根据像素点的邻域差异对视频帧中的像素点做了分类,这种分类下,类别的个数和类别与取值区间的对应关系是可以设定的,或者可以通过机器学习得到,仍然能够提升滤波的效果。每一类别对应的滤波系数可以通过算法计算,或者过经验设置,或者从设置好的多组滤波系数中通过滤波前后图像质量的增益选择出最优的一组。
在本公开另一示例性的实施例中,类别与取值区间的对应关系不是预先设定的,而是可以动态地择优选择。本实施例中,以视频帧为例,所述根据第一视频帧中像素点的邻域差异将所述第一视频帧中的像素点分成多个类别,包括:
将邻域差异的取值范围分成多个取值区间,确定所述第一视频图像中每一像素点的邻域差异的值所属的取值区间,将该像素点分入该取值区间对应的一个小类,所述小类的数量大于所述类别的数量;
遍历多种将小类合并为类别的方式,按照最优的合并方式将所述多个小类的像素点分入所述多个类别。
本公开实施例中,小类与邻域差异的取值区间之间关系是固定,通过动态择优的方式将小类合并为类别,可以实现类别与邻域差异的取值区间的自适应调整,达到最优的滤波效果。
在本实施例一示例性的实施例中,所述遍历多种将小类合并为类别的方式,按照最优的合并方式将所述多个小类的像素点分入所述多个类别,包括:
对所述多个小类进行第一轮的多次合并,每次按不同方式将其中的部分或全部小类合并为第一个类别,基于为该第一个类别生成的滤波系数对该第一个类别的像素点进行邻域滤波并计算增益,将增益最大且大于等于相应增益阈值的一次合并所合并小类的像素点分入该第一个类别,记录合并为该第一个类别的多个小类;
在前一轮次合并成功,已合并的轮次i小于设定的最大轮数且未合并的小类数量大于1的情况下,对未合并的多个小类进行第i+1轮的一次或多次合并,每次按不同方式将其中的部分或全部小类合并为第i+1个类别,基于为所述第i+1个类别生成的滤波系数对此次合并的小类中的像素点进行邻域滤波并计算增益,将增益最大且大于等于相应增益阈值的一次合并所合并小类的像素点分入所述第i+1个类别,记录合并为该第i+1个类别的多个小类。
在一个示例中,上述对小类合并的过程中,如果满足以下条件时可以结束整个合并过程:
在当前轮次,如果所有合并方式的增益均小于相应增益阈值的情况下,当前轮次不进行合并且结束整个合并过程;
在已合并的轮次等于设定的最大轮数或没有可以合并的小类时,结束整个合并过程;
整个合并过程结束如果还有未合并的小类,则将未合并的所有小类的像素点分入没有对应滤波系数的一个类别,在对第一视频帧进行邻域滤波时,该类别的像素点不参与滤波运算,像素值不需要更新。
在一个示例中,所述按不同方式将其中的部分或全部小类合并,包括:
遍历所有可能的合并方式,将其中的部分或全部小类合并,这是一种无约束的遍历方式,在小类的数量比较多时,会比较耗时,但找到最优的合并方式的可能性更大。
在一个示例中,所述按不同方式将其中的部分或全部小类合并,包括:
遍历满足约束条件时的可能的合并方式,将其中的部分或全部小类合并,所述约束条件包括以下一个或多个:
条件一,只能将队列中位置连续的多个小类合并;
条件二,每一轮合并时,先遍历队列位置最靠前的未合并小类与其他未合并小类之间可能的 合并方式,如合并失败再遍历其他可能的合并方式或结束整个合并过程;
其中,所述队列指按照对应取值区间中的值从小到大的顺序将所述多个小类排列成的队列。
有约束的遍历可以利用实验过程中找到的规律性来提高运算效率。在基于一些数据的实验中,发现遍历时合并的小类在队伍中的位置连续时,使得相邻类别即diff相似的像素点共享滤波系数,才可能有更好的结果,这可以有效降低算法的时间复杂度。
在一个示例中,对该类别的像素点进行邻域滤波产生的增益可以用该类别像素点滤波后的图像质量相对滤波前的图像质量的增强表示,例如可以用滤波后的图像质量与滤波前的图像质量的差值表示。本申请一个类别像素点的图像质量可以通过将该类别所有像素点的集合视为一幅子图像,用该子图像的PSNR、结构相似性(Structural Similarity,简称SSIM)、或平均结构相似性(Mean Structural Similarity,简称MSSIM)等质量参数表示。
在一个示例中,所述每次合并计算的增益均是加权后的增益,权值等于此次合并的所有小类中像素点的总数与所述第一视频帧中像素点的总数的比值,这样能够更好地反映局部增益对整个视频帧的影响,以更好地达到提高视频帧整体的图像质量的目的。
在该示例中,不同轮次设置的增益阈值可以相同也可以不同。
在一个示例中,小类的数量可以预先设定,而类别的数量可能会随着视频帧数据的不同而有所变化,但对小类进行合并尝试的最大轮次是可以设定的。在该示例中,所述小类的数量大于等于8且小于等于20,所述最大轮数等于1或2或3或4。
在一个示例中,一个类别对应的滤波系数为所述加权平均使用的一组加权系数;或者
所述一组加权系数包括对称矩阵中的2N个系数,一个类别对应的滤波系数包括所述对称矩阵中主对角线一侧的N个系数,N≥1。
下面结合图6对本实施例上述小类合并的过程再简述一下,如图所示,该过程包括:
步骤310,按邻域差异的取值区间将像素点分入多个小类;
小类与邻域差异的取值区间之间的对应关系可以是固定的,也可以按照约定的某种规则动态计算,此时不同的视频帧计算出的对应关系可能是不同的。但编码端和解码端可以按照相同的规则得到相同的对应关系。
步骤320,判断是否已达到合并的最大轮次?如果是,结束,如果否,执行步骤330;
本实施例的轮次设置为3,但也可以设置为1或2或大于3的数值。
步骤330,遍历未合并小类可能的合并方式,计算每次合并得到的类别的滤波系数和滤波的增益;
步骤340,判断计算出的增益中是否有大于相应增益阈值的增益?如果是,执行步骤350,如果否,结束;
增益阈值可以设置为0,或者某个正值。不同的轮次的增益阈值可以相同或不同。如果已经找不到带来预期增益的合并方式,则无需合并,可以结束整个合并过程。
步骤350,记录增益最大的合并方式及其得到的滤波系数;
在需要满足合并小类在队列中位置连续这一约束条件的情况下,合并方式可以用首尾两个小类的索引表示。例如一共有12个小类,第0个小类对应的取值区间为0,第1个小类对应的取值区间为1,……,第10个小类对应的取值区间为10,第11个小类对应的取值区间是11以上的所有取值。假定第一轮增益最大且大于等于相应增益阈值的合并方式是将第0个小类至第5个小类合并,则该合并方式可以用合并的6个小类中第1个小类的索引“0”和最后一个小类的索引“5”表示,记录为0,5。在其他示例中,也可以记录第1个小类的索引“0”以及合并的小类的数量“6”,或者用位图的方式将一个12比特的位图的前6位置为1,后6位置为0。
记录的滤波系数可以用于本地对第一视频帧进行邻域滤波,也可以不在本地对第一视频帧进行邻域滤波,例如可以在编码端利用原始视频帧为重建视频帧生成最优的滤波系数,将该最优的滤波系数发送到解码端,用于解码端对重建视频帧的质量增强。
步骤360,将被合并的小类的序号从队列中去除,判断是否还有可以合并的小类?如果是,转入步骤320,记录增益最大的合并方式及其得到的滤波系数;如果否,结束。
一般来说,如果还有2个未合并的小类,判断为还有可以合并的小类。如果所有小类均已合并完成,判断为没有可以合并的小类。对于还有1个未合并的小类的情况,可以判断为还有可以合并的小类,在后续的处理中直接将该小类的像素点分入下一个类别,计算相应的滤波系数和增益,在增益大于阈值时记录该类别的滤波系数和该小类的索引信息。对于还有1个未合并的小类的情况,也可以判断为没有可以合并的小类,不对这个小类的像素点进行滤波。或者结合该小类的像素点的个数判断, 在个数大于阈值时判断还有可以合并的小类,在个数小于阈值时判断没有可以合并的小类,等等。这里可以根据情况选择一种方式处理。
在本实施例一示例性的实施例中,
对视频图像序列中有损的第一视频图像,对其中一个第一视频图像通过小类合并的方式将所述多个小类的像素点分入所述多个类别后,对该第一视频图像后的第一个视频图像,采用与该第一视频图像相同的小类合并方式,将所述多个小类的像素点分入所述多个类别;或者
对视频图像序列中有损的第一视频图像,分别执行所述滤波系数生成方法。
本实施例对序列中相邻的视频图像,内容上通有较大的相似性,通过将前一视频图像的小类合并方式共享给后一视频图像,可以节省后一视频图像遍历合并方式和比较的时间,提高效率。对质量的影响较小。而每一视频图像都独立地进行小类合并,则滤波后通常会得到更好的质量增强效果。
本公开上述实施例的滤波系数生成方法和相应的视频滤波方法,针对像素点的邻域差异将像素点分成多个类别,每个类别的像素点用为该类别生成的滤波系数进行邻域滤波,充分地考虑到像素点邻域差异对邻域滤波的影响,可以显著提升滤波的效果。
本公开实施例的滤波系数生成方法和相应的视频滤波方法可以用于各种视频编解码***,用于对重建视频帧的滤波,提升图像质量。
以V-PCC***为例,V-PCC编解码框架如图7和图8所示。
图7示出了V-PCC编码端的视频编码装置的结构,该视频编码装置可实现以下视频编码处理:
3D patch(块)生成模块(3D patch generation)11基于输入的点云帧生成3D的patch后,分别输出到patch封装模块(patch packing)13、几何帧生成模块(Geometry image generation)19、纹理帧生成模块(Attribute image generation)15、patch序列压缩模块(Patch sequence compression)27以及平滑模块(Smoothing)17。Patch封装模块13对patch进行封装,生成占用图(Occupancy map)并输出到几何帧生成模块19和第一视频压缩(Video compression)模块21,第一视频压缩模块21对占用图进行压缩,输出经压缩的占用子流(occupancy sub stream)到多路复用器(Multiplexer)33,并输出重建占用图(Reconstructed occupancy map)到几何帧填充模块(image padding)23、纹理帧填充模块(image padding)25、纹理帧生成模块15和平滑模块17。几何帧生成模块19根据输入的占用图、3D patch和点云帧生成几何帧并输出到几何帧填充模块23,几何帧填充模块23根据输入的几何帧和重建占用图,输出已填充几何帧(Padded geometry)到第二视频压缩模块31,第二视频压缩模块31输出经压缩的几何子流(geometry sub stream)到多路复用器33,并输出重建几何帧(Reconstructed geometry image)到平滑模块17。平滑模块17根据重建占用图、3D patch对重建几何帧进行平滑处理,输出经平滑处理的重建几何帧到纹理帧生成模块15。纹理帧生成模块15根据输入的经平滑处理的重建几何帧、3D patch、重建占用图和点云帧生成纹理帧并输出到纹理帧填充模块25,纹理帧填充模块25输出已填充的纹理帧到第三视频压缩模块29。第三视频压缩模块29输出已压缩的纹理子流到多路复用器33。Patch序列压缩模块27则输出已压缩的Patch子流到多路复用器33。多路复用器33对输入的Patch子流、纹理子流、几何子流和占用子流进行复用后输出经压缩的码流(Compression bitstream)。
图8示出了V-PCC解码端的视频解码装置的结构。该视频解码装置可实现以下视频解码处理:
经压缩的码流经解多路复用器(Demultiplexer)41解复用,输出序列参数集(Sequence Paramater Set”,简称SPS)、Patch子流、纹理子流、几何子流和占用子流。SPS语法分析模块(SPS parsing)43对SPS进行语法分析后输出语法元素到Patch序列解压缩模块(Patch sequence decompression)45、第一视频解压缩模块(Video decompression)47、第二视频解压缩模块(Video decompression)49、第三视频解压缩模块(Video decompression)51、几何和纹理重建模块(Geometry/Attribute Reconstruction)53、几何后处理模块(Geometry Post-Processing(e.g.smoothing))55和纹理转换和平滑模块(Attribute transfer&smoothing)57。Patch序列解压缩模块45根据语法元素对输入的Patch子流解压缩,输出patch信息(patch information)到几何和纹理重建模块53。第一视频解压缩模块47根据语法元素对输入的占用子流解压缩,输出占用图到几何和纹理重建模块53。第二视频解压缩模块49根据语法元素对输入的几何子流解压缩,输出几何帧到几何和纹 理重建模块53。第三视频解压缩模块51根据语法元素对输入的纹理子流解压缩,输出纹理帧到几何和纹理重建模块53。几何和纹理重建模块53根据输入的语法元素、patch信息、占用图、几何帧和纹理帧得到重建几何帧和重建纹理帧,将重建几何帧输出到几何后处理模块55,将重建纹理帧输出到纹理转换和平滑模块57。几何后处理模块55根据语法元素对重建几何帧进行平滑处理后输出到纹理转换和平滑模块57,纹理转换和平滑模块57根据输入的语法元素、纹理帧和经平滑处理的重建几何帧进行纹理转换和平滑,输出经平滑处理的重建点云帧。
V-PCC中存在对有损压缩后解码得到的视频帧(video)的多次平滑处理,例如在编码端,对重建几何帧的平滑处理,及对重建纹理帧的平滑处理(图中未示出);又如在解码端,对重建几何帧的平滑处理和对重建点云的平滑处理。
对于纹理视频帧(Texture video/frame),V-PCC针对在三维空间中非相邻的块(patch)在图像内可能非常临近,导致基于patch的视频编码器可能将相邻像素混淆使得重建伪影出现的问题,可以采用颜色平滑算法,在重建帧中利用占有图(Occupancy map)找到patch边界点对应的像素,然后利用中值滤波进行平滑处理。对于几何视频帧(Geometry video/frame),V-PCC针对重建点云patch可能不连续的问题,可以采用边界滤波算法,找到patch边界点对应的像素,局部改变patch边缘的深度值。对于重建点云,针对patch不连续的问题,可以添加3D点,使边界连续。针对由于失真导致外点和噪点产生的问题,可以将重建点云网格化(grid),利用网格的中心对patch边缘点进行三线性滤波,提升点云的视觉效果。
V-PCC编解码框架在对解码后视频帧进行基础性重建后,只进行了图像平滑操作,重建点云质量依然有较大的提升空间。为了进一步提升压缩性能,在码流大小变化不大的基础上得到质量增强的点云是有益的。
对图像进行维纳滤波可以增强图像的质量,但如上文所述,图像的维纳滤波存在着一些缺点,比如在滤波器阶数K一定的情况下,对于大尺度图像以及局部变化较为剧烈的图像,利用同一组系数进行滤波得到的质量增强的图像效果并不好;如果一味增大K的大小,滤波效果只有小幅度提升,同时在一定程度上会增加码流大小及时间复杂度,甚至使得综合性能更差。而更为常用的自适应环路滤波器,也是以维纳滤波为基础,将滤波器分为25个类别,计算图像中每个4×4小块的梯度方向和变化剧烈程度,据此确定该块的滤波器类别,计算系数并进行滤波。不过这种方法难以应用在编解码质量增强方面,因为没有专门针对V-PCC图像帧patch紧密排列、邻域变化大的特点,同时此方法需要传递的数据较多,码流开支大。
在2D图像的编解码方面也存在着类似的问题,这里不再赘述。
针对以上问题,本公开实施例基于邻域差域的视频滤波方法,提出了可以应用于各种视频编解码框架的视频编码方法和视频解码方法。
本公开一实施例提出一种视频编码方法,如图9所示,包括:
步骤410,按照本公开任一实施例所述的滤波系数生成方法,将第一视频图像中的像素点分成多个类别,为所述多个类别中的部分或全部类别分别生成对应的滤波系数,一个所述类别对应一组滤波系数;
步骤420,对滤波参数编码并发送,或者对所述滤波系数中符合发送条件的滤波参数编码并发送,其中,所述滤波参数包括所述滤波系数和类别信息。
本实施例在编码端按照前述滤波系数生成方法将像素点分类并得到各类别的滤波系数之后,将滤波参数编码发送。可以利用编码端的滤波系数,提升解码端的滤波效果,减轻解码端的运算量。特别在采用维纳滤波等算法时,可以利用编码端的原始视频图像计算出最优的滤波系数,使得解码端也可以利用该最优的滤波系数,对视频图像进行质量增强。
本实施例中,编码端生成的滤波系数可以直接发送,或者先进行判决再有选择性地发送,因为发送滤波参数会增大码流,影响编码效率。
在本公开一示例性的实施例中,所述对所述滤波系数中符合发送条件的滤波参数编码并发送,包括:分组判断生成的滤波系数是否符合发送条件,将符合发送条件的每一组滤波系数及该组滤波系数对应的一个类别的信息进行编码并发送;
其中,所述发送条件包括以下条件中的任意一种或更多种:
对一组滤波系数,使用该组滤波系数对所述第一视频图像中对应类别的像素点进行邻域滤波,获 得的增益大于相应的增益阈值;
对一组滤波系数,使用该组滤波系数对所述第一视频图像中对应类别的像素点进行邻域滤波时的率失真相对不进行所述邻域滤波时的率失真变小且变小的量大于相应的率失真增益阈值。
在通过小类合并为类别的情况下,上述发送条件中使用的增益阈值可以与合并过程中使用的增益阈值相同,也可以不同。例如,在发送条件中使用的增益阈值可以高于合并过程中使用的增益阈值。
率失真(rate-distortion)用于衡量图像失真度与编码码率二者之间的相互关系。其中的图像失真度可以采用原始图像与重建图像之间的PSNR来衡量,可以是亮度分量的PSNR,或者亮度分量的PSNR与色度分量的PSNR的线性组合。编码码率表示基于选取的编码参数、量化参数、预测模式最终所需传输的总体编码数据的多少。率失真可以通过代价函数计算得到,率失真越小,表示编码器的编码效率越高。上述率失真增益阈值可以设置为0,或大于0的某个数值。所述变小的量即不进行所述邻域滤波时的率失真减去进行邻域滤波时的率失真得到的差。
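上述失真度量的计算可以用下面的Python草图示意。其中亮度与色度PSNR线性组合的权重w_luma、w_chroma仅为示意性假设,并非本公开实施例规定的取值:

```python
import numpy as np

def psnr(orig, recon, peak=255.0):
    """计算原始图像与重建图像之间的PSNR(dB)。"""
    mse = np.mean((orig.astype(np.float64) - recon.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")
    return 10.0 * np.log10(peak * peak / mse)

def combined_psnr(orig_yuv, recon_yuv, w_luma=6.0, w_chroma=1.0):
    """亮度PSNR与色度PSNR的线性组合, 权重仅为示意性假设。

    orig_yuv / recon_yuv: (Y, U, V) 三个分量平面组成的序列
    """
    p_y = psnr(orig_yuv[0], recon_yuv[0])
    p_u = psnr(orig_yuv[1], recon_yuv[1])
    p_v = psnr(orig_yuv[2], recon_yuv[2])
    return (w_luma * p_y + w_chroma * (p_u + p_v)) / (w_luma + 2 * w_chroma)
```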
在本公开一示例性的实施例中,所述将符合发送条件的每一组滤波系数及该组滤波系数对应的一个类别的信息进行编码,包括:在每一组滤波系数及该组滤波系数对应的一个类别的信息前增加一个标志,用于指示是否存在类别信息和滤波系数。在一示例中,至少一个所述类别的信息用合并为该类别的多个小类的索引信息表示,每一个小类对应于一个约定的邻域差异的取值区间。上文已经给出示例,这里不再赘述。
因为本实施例的滤波系数是与类别对应的,因此需要在传输滤波系数的同时传输与其对应的类别信息。因为可能存在不满足发送条件的滤波系数组,且合并成的类别数量是变化的,实际传输的滤波系数的组数也是可变的,部分组或全部组的滤波系数可能为空,所以可以为每一组滤波系数设置一个标志位,来指示是否存在类别的信息和滤波系数。编码端和解码端可以约定一个最多可以传输的组数,例如可设置为前述的最大轮数。
在本公开一示例性的实施例中,所述视频编码方法应用于基于视频的点云压缩系统的编码端。
依据V-PCC的编码过程,对于每一帧输入点云,在编码前都会被划分成一块块patch,patch间紧密排列,并映射到远层和近层两幅单通道图像上,即生成几何视频帧(Geometry video);而每个patch的纹理信息则会映射到两幅三通道图像上,即生成纹理视频帧(Attribute video);每帧中两幅几何/纹理图像差距很小。占用图(Occupancy map)用来表示视频(video)中有用像素的占用情况。在V-PCC几何编码部分,生成的信号包括重建几何视频帧(有损)、原始几何视频帧(无损)。编码端输出的经压缩的码流中包括写有编码信息的几何码流。
本实施例中,考虑到几何视频帧背景为黑色,而一些几何视频帧中前景的一块块patch灰度分布不均匀,且在与背景交界处亮度变化极为剧烈,如图10所示,如果采用传统的维纳滤波方法,整个几何视频帧使用一组滤波系数并进行滤波,效果很难达到预期。经过测试,如果采用k=25的滤波器直接进行维纳滤波,图像的峰值信噪比(peak signal to noise ratio,简称PSNR)甚至会下降0.05dB左右。这是由于没有充分考虑到在几何视频帧中每个像素点之间邻域差异较大的特点。因此可以使用本公开实施例基于邻域差异的自适应维纳滤波方法对重建几何视频帧进行滤波,以提升重建几何视频帧的质量。
本实施例中,所述第一视频帧包括有损的重建几何视频帧;
所述视频编码方法在生成滤波系数之后,还包括:对所述重建几何视频帧进行邻域滤波,将滤波后的重建几何视频帧用于对应的纹理视频帧的生成;其中,对所述重建几何视频帧进行邻域滤波时,所述重建几何视频帧中有对应滤波系数的每一类别的像素点使用该类别对应的一组滤波系数进行滤波。
为了执行本实施例的视频编码方法,可以在V-PCC编码端增加一个维纳滤波装置,图11示出了相关的局部架构。如图所示,本实施例在编码装置中增加了一个维纳滤波模块35,也可以称为维纳滤波器。该维纳滤波模块35接收从平滑模块17输出的经平滑处理的重建几何视频帧(简称为重建几何帧),以及从几何帧填充模块23(或几何帧生成模块19)输出的原始几何视频帧(简称为原始几何帧)。对所述重建几何视频帧进行维纳滤波,滤波后的重建几何视频帧输出到纹理帧生成模块15,用于对应的纹理视频帧的生成。经维纳滤波后,重建几何视频帧的质量得到增强,这也能够使得生成的纹理视频帧的质量得到提升。在本实施例中,维纳滤波模块35同样可以记录生成的滤波系数,编码后随几何视频帧数据一起发送,或者作为语法元素发送。
图12是可以执行本实施例视频编码方法的另一种示例性的架构。如图所示,本示例中,增加的维纳滤波模块35接收从第二视频压缩模块31输出的重建几何视频帧,以及从几何帧填充模块23(或几何帧生成模块19)输出的原始几何视频帧。滤波后的重建几何视频帧输出到平滑模块17。也可以达到与图11架构类似的效果。
在其他示例中,也可以取消平滑模块17,维纳滤波模块35接收从第二视频压缩模块31输出的重建几何视频帧,以及从几何帧填充模块23输出的原始几何视频帧。滤波后的重建几何视频帧输出到纹理帧生成模块15。
容易理解,在本公开另一示例性的实施例中,也可以采用本实施例的维纳滤波方法对重建纹理视频帧进行滤波,生成相应的滤波系数并编码和发送。这里不再赘述。
在本公开一示例性的实施例中,所述视频编码方法应用于基于视频的点云压缩系统的编码端,所述邻域滤波为维纳滤波;所述第一视频图像包括重建几何视频图像;或者,所述第一视频图像包括重建纹理视频图像。在本实施例中,编码端可以只生成滤波系数而不进行维纳滤波,该滤波系数编码后发送到解码端,用于解码端重建几何视频图像和/或重建纹理视频图像的质量增强。
在本公开一示例性的实施例中,所述第一视频图像包括点云帧映射成的两幅重建视频图像;
所述视频编码方法还包括:对同一点云帧映射成的两幅重建视频图像,按照如本公开进行小类合并的任一实施例所述的滤波系数生成方法为第一幅重建视频图像生成滤波系数后,在为第二幅重建视频图像生成滤波系数时,对第二幅重建视频图像的像素点划分小类后,采用与第一幅重建视频图像相同的小类合并方式,将所述多个小类的像素点分入所述多个类别;其中,所述重建视频图像包括有损的几何视频图像或纹理视频图像。因为同一点云帧映射成的两幅重建视频图像,共享合并方式可以提高编码速度和效果,对质量的影响可控。
在本公开一示例性的实施例中,所述视频编码方法同样可应用于处理二维视频图像的视频编解码系统,例如H.264/AVC、H.265/HEVC、VVC/H.266及其他类似标准的视频编解码系统。
如图13所示,一种传统的视频编码装置1000包含预测处理单元1100、划分单元1101、残差产生单元1102、变换处理单元1104、量化单元1106、反量化单元1108、反变换处理单元1110、重建单元1112、滤波器单元1113、已解码图片缓冲器1114、图像分辨率调整单元1115,以及熵编码单元1116。预测处理单元1100包含帧间预测处理单元1121和帧内预测处理单元1126。在其他实施例中,视频编码装置1000可以包含比该示例更多、更少或不同功能组件。
划分单元1101与预测处理单元1100配合将接收的视频数据划分为切片(Slice)、CTU或其它较大的单元。划分单元1101接收的视频数据可以是包括I帧、P帧或B帧等视频帧的视频序列。
预测处理单元1100可以将CTU划分为CU,对CU执行帧内预测编码或帧间预测编码。对CU做帧内预测和帧间预测时,可以将CU划分为一个或多个预测单元(PU:prediction unit)。
帧间预测处理单元1121可对PU执行帧间预测,产生PU的预测数据,所述预测数据包括PU的预测块、PU的运动信息和各种语法元素。
帧内预测处理单元1126可对PU执行帧内预测,产生PU的预测数据。PU的预测数据可包含PU的预测块和各种语法元素。
残差产生单元1102可基于CU的原始块减去CU划分的PU的预测块,产生CU的残差块。
变换处理单元1104可将CU划分为一个或多个变换单元(TU:Transform Unit),TU关联的残差块是CU的残差块划分得到的子块。通过将一种或多种变换应用于TU关联的残差块来产生TU关联的系数块。
量化单元1106可基于选定的量化参数对系数块中的系数进行量化,通过调整QP值可以调整对系数块的量化程度。
反量化单元1108和反变换单元1110可分别将反量化和反变换应用于系数块,得到TU关联的重建残差块。
重建单元1112可将所述重建残差块和预测处理单元1100产生的预测块相加,产生CU的重建块。
滤波器单元1113对所述重建块执行环路滤波后,将其存储在已解码图片缓冲器1114中作为参考图像。帧内预测处理单元1126可以从已解码图片缓冲器1114中提取PU邻近的块的参考图像以执行帧内预测。帧间预测处理单元1121可使用已解码图片缓冲器1114缓存的上一帧的参考图像对当前帧图像的PU执行帧间预测。
图像分辨率调整单元1115对已解码图片缓冲器1114中存储的参考图像进行重采样,可以包括上采样和/或下采样,得到多种分辨率的参考图像保存在已解码图片缓冲器1114中。
熵编码单元1116可以对接收的数据(如语法元素、量化后的系数块、运动信息等)执行熵编码操作。
将本公开实施例的视频编码方法应用于图13所示的编码架构时,在一个示例中,可以在图中的滤波器单元1113和已解码图片缓冲器1114之间增加一个维纳滤波单元1128,此时断开滤波器单元1113到已解码图片缓冲器1114的输出。该维纳滤波单元接收滤波器单元1113输出的已滤波的重建视频图像(可以是相应标准中规定的任何规格的视频图像),并且从划分单元1101获取相应的原始视频图像,对重建视频信号进行维纳滤波,滤波后的重建视频图像输出到已解码图像缓冲器1114保存。类似的,在其他示例中,该维纳滤波单元1128也可以设置在滤波器单元1113之前,接收加法器1112输出的重建视频图像以及划分单元1101输出的原始视频图像,对重建视频图像进行维纳滤波后输出到滤波器单元1113,或者该维纳滤波单元1128可以集成在滤波器单元1113中,或者替代滤波器单元1113。
本公开一实施例还提供了一种码流,其中,所述码流为已编码视频码流,所述码流中包括已编码的滤波参数,所述滤波参数包括用于对第一视频图像进行邻域滤波的滤波系数。
本实施例在已编码视频码流中携带用于对第一视频图像进行邻域滤波的滤波参数,可以将编码端得到的最优的滤波系数传输到解码端,用于对视频图像的增强,提升重建视频图像的质量。
在本公开一示例性的实施例中,所述已编码的滤波参数包括一个或多个信息单元,每个所述信息单元包括以下子单元:
标志子单元,设置为指示是否存在滤波系数和类别信息;
索引子单元,设置为写入一个类别的信息,或为空,其中该类别的信息用该类别的索引信息或者合并为该类别的多个小类的索引信息表示;
系数子单元,设置为写入一组滤波系数,或为空;其中,所述索引子单元中的类别是该组滤波系数对应的类别。
通过上述数据格式传递滤波参数,可以使得编码端具有足够的编码灵活性,可以根据情况发送或不发送部分乃至全部的滤波系数,从而保证获得较高的编码效率。而解码端通过对标志的解析,可以快速、正确读取到所需要的滤波参数。
在本公开一示例性的实施例中,所述码流是基于视频的点云压缩系统的编码端发送的码流,所述第一视频图像包括重建视频图像;
所述已编码的滤波参数携带在视频图像数据流中,位于几何码流中的分隔符之后和所述重建视频图像的数据之前,其中,所述重建视频图像包括有损的重建几何视频图像或重建纹理视频图像;或者
所述已编码的滤波参数携带在码流的序列参数集中。
本公开一实施例还提供了一种视频解码方法,如图14所示,包括:
步骤510,解码码流,确定第一视频图像的滤波参数,所述滤波参数包括滤波系数;
步骤520,根据所述滤波参数对所述第一视频图像进行邻域滤波。
本实施例从码流中解析出编码端发送的滤波系数,用于对解码得到的所述第一视频图像进行邻域滤波,可以减轻解码端的运算负担。而且编码端在存在原始视频图像的情况下,更容易生成最优的滤波系数,从而使得解码端的滤波效果更好。
本公开一示例性的实施例中,所述滤波参数包括一组或多组滤波系数,以及其中每一组滤波系数对应的一个类别的信息;其中,所述一个类别是根据所述第一视频图像中像素点的邻域差异将所述第一视频图像中的像素点分成的多个类别中的一个,所述一个类别对应所述邻域差异的一个或多个取值区间。此处限定所述滤波参数包括一组或多组滤波系数,是限定滤波参数的内容。但并不表示相应的解码装置对所有视频图像都必须收到有效的滤波系数和类别信息,解码装置也可能没有收到任何的滤波系数和类别信息,或者只收到表示不存在滤波系数和类别信息的标志。
本公开一示例性的实施例中,所述第一视频图像中一个像素点的邻域差异根据该像素点的像素值与该像素点邻域中每一像素点的像素值之差的绝对值统计得到,所述统计为求和、求均值或者求最大值;或者,所述第一视频图像中一个像素点的邻域差异根据该像素点及该像素点邻域中像素点的像素值之间的差异确定;其中,该像素点邻域指该像素点的八邻域或四邻域或对角邻域。关于邻域差异如何计算可参见上文。
本公开一示例性的实施例中,所述解码码流,确定第一视频图像的滤波参数,包括:
对码流中携带所述滤波参数的一个或多个信息单元分别解析,对每一个所述信息单元,先读取1位标志,如该标志的值表示存在滤波系数和类别信息,再读取一组滤波系数和该组滤波系数对应的一个类别的信息。如该标志的值表示不存在滤波系数和类别信息,则继续读取后续的其他信息单元。这种解析方式是基于信息单元的标志+类别信息+滤波系数的数据格式,具体可参见本公开实施例关于码流的说明。
所述类别的信息可以为类别的索引信息,也可以为合并为该类别的多个小类的索引信息。
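按照上述标志、类别信息、滤波系数依次排列的格式,解码端对信息单元的解析可以用下面的Python草图示意。其中标志与类别信息各占1字节、每个系数按32位整型读取、组数上限max_groups等均为示意性假设,具体位宽以编解码端的约定为准:

```python
import struct

def parse_filter_params(payload, k=25, max_groups=3):
    """从携带滤波参数的字节串中解析各信息单元(示意性实现)。

    payload:    已定位到滤波参数起始位置的字节串
    k:          每组滤波系数的个数(此处假设为25)
    max_groups: 编解码端约定的最多可传输的组数
    返回列表, 每个元素为 (类别信息, [k个系数]) 或 None(标志为0)。
    """
    pos = 0
    groups = []
    for _ in range(max_groups):
        flag = payload[pos]
        pos += 1
        if flag == 0:
            groups.append(None)          # 不存在类别信息和滤波系数
            continue
        class_info = payload[pos]        # 类别(或小类合并方式)的索引信息
        pos += 1
        coeffs = list(struct.unpack_from("<%di" % k, payload, pos))
        pos += 4 * k
        groups.append((class_info, coeffs))
    return groups
```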
在本公开一示例性的实施例中,根据所述滤波参数对所述第一视频图像进行邻域滤波,包括:对解析出的每一组滤波系数,根据该组滤波系数对应的该类别的信息确定该类别对应的邻域差异的取值区间,对所述第一视频图像进行邻域滤波时,所述第一视频图像中邻域差异的值属于所述取值区间的像素点使用该组滤波系数进行滤波。
在本公开一示例性的实施例中,至少一个所述类别的信息包括合并为该类别的多个小类的索引信息,每一小类对应一个约定的邻域差异的取值区间;每一所述类别对应的邻域差异的取值区间是合并为该类别的多个小类对应的取值区间的并集。
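确定了某一类别对应的取值区间后,对该类别像素点应用其滤波系数的过程可以用下面的Python草图示意。其中菱形窗口取|dy|+|dx|≤3共25个抽头、边界按最近像素填充等均为示意性假设:

```python
import numpy as np

def diamond_offsets(radius=3):
    """|dy|+|dx| <= radius 的菱形窗口偏移, radius=3时共25个抽头。"""
    return [(dy, dx) for dy in range(-radius, radius + 1)
            for dx in range(-radius, radius + 1)
            if abs(dy) + abs(dx) <= radius]

def apply_class_filter(recon, diff_map, interval, coeffs, offsets):
    """对邻域差异落在interval内的像素点, 用该类别的一组系数做加权平均。

    recon:    重建图像(单通道)
    diff_map: 每个像素的邻域差异值
    interval: 该类别对应取值区间 (lo, hi), 闭区间
    coeffs:   与offsets一一对应的一组滤波系数
    """
    lo, hi = interval
    h, w = recon.shape
    out = recon.astype(np.float64)
    for y in range(h):
        for x in range(w):
            if not (lo <= diff_map[y, x] <= hi):
                continue
            acc = 0.0
            for (dy, dx), c in zip(offsets, coeffs):
                yy = min(max(y + dy, 0), h - 1)   # 边界按最近像素填充(假设)
                xx = min(max(x + dx, 0), w - 1)
                acc += c * recon[yy, xx]
            out[y, x] = acc
    return out
```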
在本公开一示例性的实施例中,所述视频解码方法应用于基于视频的点云压缩系统的解码端;
所述第一视频图像包括有损的重建几何视频图像,对所述重建几何视频图像进行邻域滤波后,所述视频解码方法还包括:将滤波后的重建几何视频图像用于对应的纹理视频图像的质量增强;或者
所述第一视频图像包括有损的重建纹理视频帧。
在V-PCC系统的解码端执行本公开实施例的视频解码方法,在一个示例中,可以采用如图15所示的视频解码装置,该视频解码装置在原有架构中增加了一个维纳滤波模块59,该维纳滤波模块59接收几何后处理模块55输出的重建几何视频帧,根据解析出的滤波参数对所述重建几何视频帧进行滤波,维纳滤波后的重建几何视频帧再送入纹理转换和平滑模块57用于生成重建点云。所述滤波参数可以从SPS语法分析模块43中获取,也可以从几何子流中解析得到,如可以由第二视频解压缩模块49解析出所述滤波参数再输出到维纳滤波模块59。在另一示例中,也可以将维纳滤波模块59设置在几何和纹理重建模块53和几何后处理模块55之间,或者与几何后处理模块55集成在一起,或者替代原来的几何后处理模块55。本实施例的所述滤波参数包括一组或多组滤波系数,以及其中每一组滤波系数对应的一个类别的信息。
在本公开一示例性的实施例中,所述视频解码方法同样可应用于处理二维视频图像的视频编解码系统,例如H.264/AVC、H.265/HEVC、VVC/H.266及其他类似标准的视频编解码系统。所述邻域滤波为维纳滤波;所述第一视频图像包括重建视频图像。
如图16所示,一种传统的视频解码装置101包含熵解码单元150、预测处理单元152、反量化单元154、反变换处理单元156、重建单元158(图中用带加号的圆圈表示)、滤波器单元159,以及图片缓冲器160。在其它实施例中,视频解码装置101可以包含更多、更少或不同的功能组件。
熵解码单元150可对接收的码流进行熵解码,提取语法元素、量化后的系数块和PU的运动信息等信息。预测处理单元152、反量化单元154、反变换处理单元156、重建单元158以及滤波器单元159均可基于从码流提取的语法元素来执行相应的操作。
作为执行重建操作的功能组件,反量化单元154可对量化后的TU关联的系数块进行反量化。反变换处理单元156可将一种或多种反变换应用于反量化后的系数块以便产生TU的重建残差块。
预测处理单元152包含帧间预测处理单元162和帧内预测处理单元164。如果PU使用帧内预测编码,帧内预测处理单元164可基于从码流解析出的语法元素确定PU的帧内预测模式,根据确定的帧内预测模式和从图片缓冲器160获取的PU邻近的已重建参考信息执行帧内预测,产生PU的预测块。如果PU使用帧间预测编码,帧间预测处理单元162可基于PU的运动信息和相应的语法元素来确定PU的一个或多个参考块,基于所述参考块来产生PU的预测块。
重建单元158可基于TU关联的重建残差块和预测处理单元152产生的PU的预测块(即帧内预测数据或帧间预测数据),得到CU的重建块。
滤波器单元159可对CU的重建块执行环路滤波,得到重建的图片。重建的图片存储在图片缓冲器160中。图片缓冲器160可提供参考图片以用于后续运动补偿、帧内预测、帧间预测等,也可将重建的视频数据作为已解码视频数据输出,以在显示装置上呈现。
上述显示器105例如可以是液晶显示器、等离子显示器、有机发光二极管显示器或其它类型的显示装置。在其他示例中,解码端也可以不包含显示器105,而是包含可应用解码后数据的其他装置。
将本公开实施例的视频解码方法应用于该视频解码装置时,如图16所示,需要增加维纳滤波单元166,此时断开滤波器单元159到图片缓冲器160的输出。如图16的虚线所示,可以在图中的滤波器单元159和图片缓冲器160之间增加一个维纳滤波单元166,该维纳滤波单元接收滤波器单元159输出的经滤波的重建视频图像(可以是相应标准中规定的任何视频图像),并且从熵解码单元150接收解析得到的滤波参数(包括滤波系数和类别信息),对所述重建视频图像进行维纳滤波,滤波后的重建视频图像保存在图片缓冲器160中。在其他示例中,该维纳滤波单元也可以设置在滤波器单元159之前,接收加法器158输出的重建视频图像,对重建视频图像进行维纳滤波后输出到滤波器单元159,或者该维纳滤波单元可以集成在滤波器单元159中,或者替代滤波器单元159。本实施例的所述滤波参数包括一组或多组滤波系数,以及其中每一组滤波系数对应的一个类别的信息。
本公开上述实施例的视频编码装置和/或视频解码装置可使用以下电路中的任意一种或者以下电路的任意组合来实现:一个或多个微处理器、数字信号处理器、专用集成电路、现场可编程门阵列、离散逻辑、硬件。如果部分地以软件来实施本公开,那么可将用于软件的指令存储在合适的非易失性计算机可读存储媒体中,且可使用一个或多个处理器在硬件中执行所述指令从而实施本公开实施例的方法。
本公开一实施例提供了一种视频编码装置,应用于基于视频的点云压缩系统,参见图11和图12,包括纹理帧生成模块,以及依次连接的几何帧生成模块、几何帧填充模块和几何帧视频压缩模块(对应于图11和图12中的第二视频压缩模块31),其中,还包括:维纳滤波模块,设置为接收所述几何帧视频压缩模块输出的重建几何视频帧,及所述几何帧生成模块或几何帧填充模块输出的原始几何视频帧,执行如本公开对重建几何视频帧进行维纳滤波的实施例所述的视频编码方法,输出滤波后的重建几何视频帧到所述纹理帧生成模块。需要说明的是,维纳滤波模块接收所述几何帧视频压缩模块输出的重建几何视频帧,可以是几何帧视频压缩模块直接或间接输出给维纳滤波模块(如中间经过平滑模块),而维纳滤波模块输出滤波后的重建几何视频帧到所述纹理帧生成模块,也可以是直接或间接输出到纹理帧生成模块(如中间经过平滑模块)。
本公开一实施例还提供了一种视频编码装置,如图22所示,包括处理器5以及存储有计算机程序的存储器6,其中,所述处理器5执行所述计算机程序时实现如本公开任一实施例所述的视频编码方法。
本公开一实施例还提供了一种视频解码装置,包括几何帧重建模块和纹理转换模块,参见图15,还包括:
维纳滤波模块,设置为接收所述几何帧重建模块输出的重建几何视频帧,及从码流中解析得到的滤波参数,执行如本公开任一实施例所述的视频解码方法,输出滤波后的重建几何视频帧到所述纹理转换模块。
本实施例所述的几何帧重建模块可以集成在图15中的几何和纹理重建模块53中,几何帧重建模块输出的重建几何视频帧可以直接或间接输出到所述维纳滤波模块(或叫维纳滤波器,其他实施例同此)。本实施例所述的纹理转换模块可以集成在图15中的纹理转换和平滑模块57中,维纳滤波模块滤波后的重建几何视频帧可以直接或间接地输出到纹理转换模块。
本公开一实施例还提供了一种视频解码装置,包括处理器以及存储有计算机程序的存储器,其中,所述处理器执行所述计算机程序时实现如本公开任一实施例所述的视频解码方法。
本公开一实施例还提供了一种视频编解码系统,包括如本公开任一实施例所述的视频编码装置和如本公开任一实施例所述的视频解码装置。
本公开一实施例还提供了一种视频滤波装置,包括处理器以及存储有计算机程序的存储器,其中,所述处理器执行所述计算机程序时实现如本公开任一实施例所述的视频滤波方法。
本公开一实施例还提供了一种非瞬态计算机可读存储介质,所述计算机可读存储介质存储有计算机程序,其中,所述计算机程序被处理器执行时实现如本公开任一实施例所述的方法。
上述视频编码装置和解码装置中的处理器可以是通用处理器,包括中央处理器(Central Processing Unit,简称CPU)、网络处理器(Network Processor,简称NP)等;还可以是数字信号处理器(DSP)、专用集成电路(ASIC)、现场可编程门阵列(FPGA)或者其他可编程逻辑器件、分立门或者晶体管逻辑器件、分立硬件组件,可以实现或者执行本公开实施例中公开的各方法、步骤及逻辑框图。通用处理器可以是微处理器或者该处理器也可以是任何常规的处理器等。
本公开一实施例还提供了一种针对V-PCC重建几何帧进行基于邻域差异的自适应维纳滤波的方法,以达到增强点云质量及主观效果的目的。本实施例提出在编解码端依据数据及滤波效果设置一个或多个维纳滤波器,编码端通过计算得到最优的滤波系数并传递到解码端,解码端解码出所述最优的滤波系数后对点云重建几何帧做后处理。本实施例针对的是V-PCC几何有损、属性有损的编码方式。
在编码端操作如下:
依据V-PCC的编码过程,对于每一帧输入点云,在编码前都会被划分成一块块patch,patch间紧密排列,并映射到远层和近层两幅单通道图像上,即生成Geometry video;而每个patch的纹理信息则会映射到两幅三通道图像上,即生成Attribute video;每帧中两幅几何/纹理图像差距很小。Occupancy map用来表示video中有用像素的占用情况。在V-PCC几何编码部分,我们可以获得重建(有损)几何视频帧与原始(无损)几何视频帧,同时得到写有编码信息的几何码流。
本实施例中,考虑到几何视频帧背景为黑色,而前景即一块块patch灰度分布不均匀,且在与背景交界处亮度变化极为剧烈,提出了基于邻域差异的自适应维纳滤波技术。如果采用传统的维纳滤波方法,整幅图像计算一组最优系数并进行滤波,效果很难达到预期。实际上,经过测试,如果采用k=25的滤波器直接进行维纳滤波,图像PSNR甚至会下降0.05dB左右。这是由于没有充分考虑到在几何视频帧中每个像素点之间邻域差异较大的特点。
基于此,该实施例计算了每个像素点与其八邻域每个点像素值差的绝对值的和,记为diff,并利用该值进行分类。在实际测试中,将diff限制在0~11范围内(大于11的取值11),即共分为12个小类,并进行最佳组合点(即最佳合并方式)的寻找,从该12小类中组合出3个类别。最佳合并方式的确定参见图6及相应说明。
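上述diff的计算与12个小类的划分可以用下面的Python草图表示。其中函数命名为示意,边界像素按最近像素填充属于假设的处理方式:

```python
import numpy as np

def neighborhood_diff(img):
    """计算每个像素与其八邻域像素值之差的绝对值之和(边界按最近像素填充)。"""
    pad = np.pad(img.astype(np.int32), 1, mode="edge")
    diff = np.zeros(img.shape, dtype=np.int32)
    h, w = img.shape
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            if dy == 0 and dx == 0:
                continue
            shifted = pad[1 + dy:1 + dy + h, 1 + dx:1 + dx + w]
            diff += np.abs(img.astype(np.int32) - shifted)
    return diff

def to_small_classes(diff, num_classes=12):
    """将diff限制在0~11(大于11的取11), 得到每个像素所属的小类索引。"""
    return np.clip(diff, 0, num_classes - 1)
```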
在本实施例中,遍历的合并方式中用于合并的小类对应的diff值都是相邻的,基于测试显示,相邻的类别在共享滤波系数时有更好的结果,也能有效降低时间复杂度;同时,在计算每种合并方式的最大增益时,对PSNR增加值表示的增益进行加权后再比较,因为不同合并方式下合并成的类别中的像素点的数目不同,当点数多时,PSNR提升更难,且对整幅图像的质量增强贡献更大,因此利用当前合并方式得到的类别中像素点的数量与整幅图像分辨率大小的比值作为权重对增益进行加权。
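按上述加权方式,某一种合并方式的加权增益可按下面的Python草图计算。增益以PSNR提升表示,函数与参数命名均为示意性假设,psnr()为常规定义、为保持片段自含在此重复给出:

```python
import numpy as np

def psnr(orig, recon, peak=255.0):
    mse = np.mean((orig.astype(np.float64) - recon) ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(peak * peak / mse)

def weighted_gain(orig_pix, recon_pix, filtered_pix, num_pixels_total):
    """计算一种合并方式的加权增益(示意性实现)。

    orig_pix / recon_pix / filtered_pix: 该合并方式所得类别中像素的
        原始值、滤波前重建值、滤波后值(一维数组, 一一对应)
    num_pixels_total: 整幅图像的像素总数
    """
    gain = psnr(orig_pix, filtered_pix) - psnr(orig_pix, recon_pix)
    weight = orig_pix.size / float(num_pixels_total)   # 类别像素数占全图的比例
    return weight * gain
```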
本实施例对每一视频帧(即每幅图像)设置的合并的最大轮次为3,最多可以生成三组滤波系数,三组滤波系数对应的3个类别不一定覆盖视频帧中所有的像素点,余下的像素点可以视为一个没有对应滤波系数、不需要滤波的类别。且后续需要判断基于每一类别的滤波系数对该类别像素点进行维纳滤波是否会带来整体性能的提升,如果不能,可以不传递所有的滤波系数。实际测试证明,大多数几何视频帧图像仅对第一个类别的像素点滤波会带来正增益,也即只需要传递第一组滤波系数与合并方式的信息(即对应的类别的信息);少数情况下,对前两个类别的像素点按照各自的滤波系数进行滤波都有增益。本实施例设置三个类别,可以最大程度保证质量提升的可能性。
考虑到每个点云帧会生成两张几何视频帧(Geometry video),而这两张图像的差异极小,因此本实施例提出了两幅图像共享最佳合并方式的方法:第二张图像计算出差异值diff后,不再进行合并方式的遍历,而是直接利用前一张图像帧保存的合并方式信息进行像素点的组合并进行后续操作。实验证明,该方法能有效降低时间复杂度,同时最后的性能与原方法几乎没有差别。
在一示例中,得到三个类别的重建像素点集以及对应的原始像素点集后,可以对每个类别的像素点进行维纳滤波操作。相较于原始的矩形框滤波器,本实施例中改变了滤波器形状,采用了菱形滤波器,这样可以更有效地提取邻域信息。本实施例中采用k阶滤波器,k=25。
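为一个类别求取维纳滤波系数,可按自相关矩阵的逆乘以互相关的最小二乘方式实现,下面是一个示意性的Python草图。菱形窗口取|dy|+|dx|≤3共25个抽头,边界按最近像素填充、矩阵直接求解等均为示意性假设:

```python
import numpy as np

def diamond_offsets(radius=3):
    """|dy|+|dx| <= 3 的菱形窗口偏移, 共25个抽头。"""
    return [(dy, dx) for dy in range(-radius, radius + 1)
            for dx in range(-radius, radius + 1)
            if abs(dy) + abs(dx) <= radius]

def wiener_coeffs_for_class(recon, orig, mask, offsets):
    """对mask标记的某一类别像素求一组维纳滤波系数(示意性实现)。

    recon: 有损重建图像;orig: 对应的原始图像;mask: 该类别像素的布尔掩码。
    """
    h, w = recon.shape
    ys, xs = np.nonzero(mask)
    rows, targets = [], []
    for y, x in zip(ys, xs):
        vec = []
        for dy, dx in offsets:
            yy = min(max(y + dy, 0), h - 1)   # 边界按最近像素填充(假设)
            xx = min(max(x + dx, 0), w - 1)
            vec.append(float(recon[yy, xx]))
        rows.append(vec)                      # 该像素点的滤波相关向量
        targets.append(float(orig[y, x]))
    A = np.asarray(rows)                      # 该类别的滤波相关矩阵
    b = np.asarray(targets)                   # 该类别的原始像素值向量
    # (A^T A)^-1 A^T b: 自相关矩阵的逆 乘以 互相关
    return np.linalg.solve(A.T @ A, A.T @ b)
```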
对于具有对应的一组滤波系数的类别下的像素点,该实施例会计算滤波后这些像素点与原始点的PSNR以及率失真性能,并与滤波前相比较。其中利用代价函数来进行率失真的权衡:
J=D+λ×R
其中,D是原始点集(原始像素点集)与重建点集(不滤波时的重建像素点集)或滤波点集(滤波后的重建像素点集)的SSE,即对应点误差的平方和;λ是与量化参数QP有关的量,该方案中的具体取值由QP按公式图(PCTCN2021144056-appb-000003)确定;
R为比特流大小。如果滤波后的代价J_f小于不滤波时的重建点集的代价J_r,则该组的标志设为1,保存该组系数和最佳的合并方式(或称组合点);否则标志设为0,不保存其余信息。
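上述代价比较与标志位的判决可以用下面的Python草图示意。λ与QP的具体关系由前述公式图给出,这里作为参数lam传入;额外比特数R的估计方式以及标志、类别信息、系数的位宽均为示意性假设:

```python
import numpy as np

def sse(a, b):
    """对应点误差的平方和。"""
    d = a.astype(np.float64) - b.astype(np.float64)
    return float(np.sum(d * d))

def decide_flag(orig_pix, recon_pix, filt_pix, lam, k=25, bits_per_coeff=32):
    """比较滤波与不滤波的率失真代价 J = D + λ×R, 决定该组的标志(示意)。"""
    j_r = sse(orig_pix, recon_pix)                      # 不滤波: 无需额外码流, R≈0
    extra_bits = 8 + 8 + k * bits_per_coeff             # 标志+类别信息+k个系数(假设的位宽)
    j_f = sse(orig_pix, filt_pix) + lam * extra_bits
    return 1 if j_f < j_r else 0
```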
最后,将需要传递至解码端的信息写入码流。具体写入码流的步骤如下:在H.264编码框架中, 常用0x00 00 00 01作为分隔符,用于开始一段新的编码,之后便是NALU(网络抽象层单元),即帧数据信息。在V-PCC几何视频帧编码中,可以得到几何码流(也称为几何子流)数据,也即从分隔符开始的信息,因此可以将本实施例所需要传递的滤波系数相关的数据写入码流中分隔符之后的位置,完成后再将原有数据写入。在解码端读取码流时,由于分隔符的存在,可以正确定位,之后从第五个字节开始读取滤波系数相关的数据,即可得到编码端传递的滤波参数;读取完滤波系数相关的数据后,再将后续的几何码流数据恢复即可。
编码端需写入的数据为:对于每一视频帧的每一类别,首先写入标志(bool类型);如果标志为0,则不再继续写入;如果标志为1,则写入最佳的合并方式也即类别的信息(char类型),并写入k个滤波器系数(转化为int类型)。类别的信息可以用小类的索引表示,在队列中位置连续的多个小类,用第一个小类的索引和最后一个小类的索引表示即可。实验证明,本实施例的系数传递方法是可行的,在解码端并没有信息的丢失。
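将滤波参数插入几何码流分隔符之后的过程可用下面的Python草图示意。其中标志按1字节、类别信息按1字节、系数按32位整型写入,系数定点化放大倍数scale等均为示意性假设,并非规范规定的语法:

```python
import struct

START_CODE = b"\x00\x00\x00\x01"   # H.264风格的分隔符(起始码)

def insert_filter_data(geom_stream, groups, k=25, scale=1 << 16):
    """把各组滤波参数插入几何码流起始码之后(示意性实现)。

    groups: 每个元素为 None(标志为0) 或 (类别信息, [k个浮点系数])
    scale:  系数定点化放大倍数, 仅为假设
    """
    payload = bytearray()
    for g in groups:
        if g is None:
            payload += struct.pack("<B", 0)        # 标志为0, 不再写入其余信息
            continue
        class_info, coeffs = g
        payload += struct.pack("<B", 1)            # 标志为1
        payload += struct.pack("<B", class_info)   # 最佳合并方式(类别)的信息
        payload += struct.pack("<%di" % k,
                               *(int(round(c * scale)) for c in coeffs))
    pos = geom_stream.find(START_CODE)
    if pos < 0:
        raise ValueError("geometry sub-stream start code not found")
    insert_at = pos + len(START_CODE)              # 即从第五个字节开始的位置
    return geom_stream[:insert_at] + bytes(payload) + geom_stream[insert_at:]
```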
在解码端操作如下:
由于编码端滤波器所传递的滤波系数的相关数据在几何视频帧数据之前,V-PCC解码定位到几何码流后,先从第五位起按照编码端的格式读取维纳滤波标志、类别的信息及一组滤波系数,之后恢复原始码流,继续V-PCC解码操作,得到重建几何视频帧,将重建几何视频帧与滤波参数传递至维纳滤波器。
根据码流中解析得到的标志可判断出每一帧中哪些类别的像素点需要滤波,将需要滤波的像素点按照类别的信息(如小类的合并方式)分入各自的类别,利用各类别对应的滤波器系数分别进行滤波,并将滤波后的像素值替换重建点云对应的像素点的像素值;所有帧的滤波完毕后,即得到质量增强的几何视频帧。之后返回V-PCC程序。由于几何视频帧的优化,可以使得重建点云中点的位置与真实值更为接近,从而提升整个点云的质量,同时有更好的主观效果。
本实施例提出的方法在V-PCC参考软件TMC2V14.0上实现后,在CTC-C2测试条件下对MPEG的测试序列的前32帧进行了测试(cat2-A&cat2-B&cat2-C),采用C2编码方式,几何有损-属性有损(帧内)。测试结果的截图如图17和图18所示。图中BD-Rate就是衡量压缩率的一个指标,在相同PSNR下所用的码流对比。D1是点到点的PSNR,D2是点到平面的PSNR。Geom.BD-TotGeomRate是几何PSNR相对于几何码流的BD-Rate,End-to-End BD-AttrRate是端到端属性PSNR相对于属性码流的BD-Rate,Geom.BD-TotalRate是几何PSNR相对于总码流的BD-Rate,End-to-End BD-TotalRate是端到端属性PSNR相对于总码流的BD-Rate。
图17和图18中示出了CTC_C2上每个序列的测试结果,图17和图18下面几行均表示CTC_C2上测试的平均增益,其中,图17所示的BD-TotalRate是几何或纹理质量提升相对于整体码流来看,总的压缩率提升;图18所示是几何质量提升针对几何码流,纹理质量提升针对纹理码流,各自的压缩率提升。从图中可以看出,相较于原始程序,经过几何视频帧自适应维纳滤波后,点云的质量有了较大的提升,同时压缩效率进一步增加,BD-Rate有显著降低。从几何或颜色属性质量相对于整体码流大小来看(即BD-TotalRate),颜色属性的压缩率变化不大,而几何属性有了很大的提升。从二者相对于各自码流大小来看,颜色属性的BD-Rate也有少量的降低,这是几何质量提升而对颜色编码与重建产生的有益影响;Cat2-C序列几何增益依旧可观,说明该算法对于重建质量较高的点云质量提升效果更为明显。
本实施例同样可以明显提升重建点云的主观质量。图19、图20和图21展示了在R1码率下,点云序列redandblack_vox10_1450.ply质量增强前后效果对比图。其中图19为原始点云(Ground Truth),图20为重建点云,图21为进行几何视频帧质量增强后的点云。从图片中可以明显看出,几何视频帧进行基于邻域差异的自适应维纳滤波后,得到的点云相对于重建点云,边界轮廓更为平滑,部分离群点也回到了正确的位置,给人以更好的主观感受。
本实施例提出的V-PCC点云后处理质量增强方法,采用了基于邻域差异的自适应维纳滤波算法,至少具有以下特点:
针对几何视频帧图像中Patch与背景交界处变化较大的特点,不再进行全图像的维纳滤波,而是先计算每个像素点的八邻域总差异,根据差异值大小进行小类的划分。
进一步合并和分类,将共享滤波器参数后质量提升最多的像素点小类组合成大的类别,共分成三个类别,依次进行维纳滤波并得到三组滤波系数。
滤波系数等信息是否传递到解码端,是依据率失真指标等来确定,当滤波系数、合并方式等数据的码流代价大于质量提升的效果时,不进行滤波系数的传递。
滤波器形状采用菱形,以更有效率地提取邻域信息。
基于同一点云帧中远近层图像差异极小的特点,第二帧图像的最佳合并方式不再重新计算,而是直接确定为与第一帧图像相同,保证效果的同时可以减少时间复杂度。这里的第一帧图像和第二帧图像可以是几何视频帧,也可以是纹理视频帧,均包含近层和远层的两帧图像。
依据H.264编码特点,提出了将额外的数据写入码流的一种方式。
在一个或多个示例性实施例中,所描述的功能可以硬件、软件、固件或其任一组合来实施。如果以软件实施,那么功能可作为一个或多个指令或代码存储在计算机可读介质上或经由计算机可读介质传输,且由基于硬件的处理单元执行。计算机可读介质可包含对应于例如数据存储介质等有形介质的计算机可读存储介质,或包含促进计算机程序例如根据通信协议从一处传送到另一处的任何介质的通信介质。以此方式,计算机可读介质通常可对应于非暂时性的有形计算机可读存储介质或例如信号或载波等通信介质。数据存储介质可为可由一个或多个计算机或者一个或多个处理器存取以检索用于实施本公开中描述的技术的指令、代码和/或数据结构的任何可用介质。计算机程序产品可包含计算机可读介质。
举例来说且并非限制,此类计算机可读存储介质可包括RAM、ROM、EEPROM、CD-ROM或其它光盘存储装置、磁盘存储装置或其它磁性存储装置、快闪存储器或可用来以指令或数据结构的形式存储所要程序代码且可由计算机存取的任何其它介质。而且,还可以将任何连接称作计算机可读介质。举例来说,如果使用同轴电缆、光纤电缆、双绞线、数字订户线(DSL)或例如红外线、无线电及微波等无线技术从网站、服务器或其它远程源传输指令,则同轴电缆、光纤电缆、双绞线、DSL或例如红外线、无线电及微波等无线技术包含于介质的定义中。然而应了解,计算机可读存储介质和数据存储介质不包含连接、载波、信号或其它瞬时(瞬态)介质,而是针对非瞬时有形存储介质。如本文中所使用,磁盘及光盘包含压缩光盘(CD)、激光光盘、光学光盘、数字多功能光盘(DVD)、软磁盘或蓝光光盘等,其中磁盘通常以磁性方式再生数据,而光盘使用激光以光学方式再生数据。上文的组合也应包含在计算机可读介质的范围内。
可由例如一个或多个数字信号处理器(DSP)、通用微处理器、专用集成电路(ASIC)、现场可编程门阵列(FPGA)或其它等效集成或离散逻辑电路等一个或多个处理器来执行指令。因此,如本文中所使用的术语“处理器”可指上述结构或适合于实施本文中所描述的技术的任一其它结构中的任一者。另外,在一些方面中,本文描述的功能性可提供于经配置以用于编码和解码的专用硬件和/或软件模块内,或并入在组合式编解码器中。并且,可将所述技术完全实施于一个或多个电路或逻辑元件中。
本公开实施例的技术方案可在广泛多种装置或设备中实施,包含无线手机、集成电路(IC)或一组IC(例如,芯片组)。本公开实施例中描述各种组件、模块或单元,以强调经配置以执行所描述的技术的装置的功能方面,但不一定需要通过不同硬件单元来实现。而是,如上所述,各种单元可在编解码器硬件单元中组合或由互操作硬件单元(包含如上所述的一个或多个处理器)的集合结合合适软件和/或固件来提供。

Claims (38)

  1. 一种视频解码方法,包括:
    解码码流,确定第一视频图像的滤波参数,所述滤波参数包括滤波系数;
    根据所述滤波参数对所述第一视频图像进行邻域滤波。
  2. 根据权利要求1所述的视频解码方法,其中:
    所述滤波参数包括一组或多组滤波系数,以及其中每一组滤波系数对应的一个类别的信息;其中,所述一个类别是根据所述第一视频图像中像素点的邻域差异将所述第一视频图像中的像素点分成的多个类别中的一个,所述一个类别对应所述邻域差异的一个或多个取值区间。
  3. 根据权利要求1所述的视频解码方法,其中:
    所述邻域滤波为维纳滤波;所述第一视频图像包括重建视频图像。
  4. 根据权利要求1所述的视频解码方法,其中:
    所述第一视频图像中一个像素点的邻域差异根据该像素点的像素值与该像素点邻域中每一像素点的像素值之差的绝对值统计得到,所述统计为求和、求均值或者求最大值;或者,所述第一视频图像中一个像素点的邻域差异根据该像素点及该像素点邻域中像素点的像素值之间的差异确定;
    其中,该像素点邻域指该像素点的八邻域或四邻域或对角邻域。
  5. 根据权利要求2所述的视频解码方法,其中:
    所述解码码流,确定第一视频图像的滤波参数,包括:
    对码流中携带所述滤波参数的一个或多个信息单元分别解析,对每一个所述信息单元,先读取1位标志,如该标志的值表示存在滤波系数和类别信息,再读取一组滤波系数和该组滤波系数对应的一个类别的信息。
  6. 根据权利要求2所述的视频解码方法,其中:
    根据所述滤波参数对所述第一视频图像进行邻域滤波,包括:对解析出的每一组滤波系数,根据该组滤波系数对应的该类别的信息确定该类别对应的邻域差异的取值区间,对所述第一视频图像进行邻域滤波时,所述第一视频图像中邻域差异的值属于所述取值区间的像素点使用该组滤波系数进行滤波。
  7. 根据权利要求6所述的视频解码方法,其中:
    至少一个所述类别的信息包括合并为该类别的多个小类的索引信息,每一小类对应一个约定的邻域差异的取值区间;每一所述类别对应的邻域差异的取值区间是合并为该类别的多个小类对应的取值区间的并集。
  8. 根据权利要求1或3所述的视频解码方法,其中:
    所述视频解码方法应用于基于视频的点云压缩系统的解码端;
    所述第一视频图像包括有损的重建几何视频图像,对所述重建几何视频图像进行邻域滤波后,所述视频解码方法还包括:将滤波后的重建几何视频图像用于对应的纹理视频图像的质量增强;或者
    所述第一视频图像包括有损的重建纹理视频图像。
  9. 一种滤波系数生成方法,包括:
    根据第一视频图像中像素点的邻域差异将所述第一视频图像中的像素点分成多个类别;
    为所述多个类别中的部分或全部类别分别生成对应的滤波系数。
  10. 根据权利要求9所述的滤波系数生成方法,其中:
    所述第一视频图像中一个像素点的邻域差异根据该像素点的像素值与该像素点邻域中每一像素点的像素值之差的绝对值统计得到,所述统计为求和、求均值或者求最大值;或者,所述第一视频图像中一个像素点的邻域差异根据该像素点及该像素点邻域中像素点的像素值之间的差异确定;
    其中,该像素点邻域指该像素点的八邻域或四邻域或对角邻域。
  11. 根据权利要求9所述的滤波系数生成方法,其中:
    所述部分或全部类别中每一类别对应的滤波系数设置为对所述第一视频图像进行邻域滤波时该类别像素点使用的滤波系数;
    所述对所述第一视频图像进行邻域滤波,包括:对有对应滤波系数的每一类别,使用窗口扫描该类别的每一像素点,对所述窗口内所有像素点的像素值加权平均,将位于所述窗口中心的该类别像素点的像素值更新为加权平均的结果,加权平均使用的一组加权系数采用该类别对应的一组滤波系数,所述窗口为矩形或菱形。
  12. 根据权利要求9所述的滤波系数生成方法,其中:
    所述邻域滤波为维纳滤波,所述第一视频图像包括有损视频图像;
    为一个所述类别生成对应的滤波系数时,将所述有损视频图像中该类别所有像素点的滤波相关向量组成该类别滤波相关矩阵,将该类别像素点在相应原始视频图像中相应位置的像素点的像素值组成该类别原始像素值向量;将该类别滤波相关矩阵和该类别原始像素值向量的互相关矩阵左乘该类别滤波相关矩阵的自相关矩阵的逆矩阵,得到该类别对应的滤波系数;
    其中,一个像素点的滤波相关向量指以该像素点为中心的窗口内的k个像素值组成的向量。
  13. 根据权利要求9所述的滤波系数生成方法,其中:
    所述根据第一视频图像中像素点的邻域差异将所述第一视频图像中的像素点分成多个类别,包括:
    将邻域差异的取值范围分为多个取值区间,确定所述第一视频图像中每一像素点的邻域差异所属的取值区间,将该像素点分入该取值区间对应的一个类别。
  14. 根据权利要求9至12中任一所述的滤波系数生成方法,其中:
    所述根据第一视频图像中像素点的邻域差异将所述第一视频图像中的像素点分成多个类别,包括:
    将邻域差异的取值范围分成多个取值区间,确定所述第一视频图像中每一像素点的邻域差异的值所属的取值区间,将该像素点分入该取值区间对应的一个小类,所述小类的数量大于所述类别的数量;
    遍历多种将小类合并为类别的方式,按照最优的合并方式将所述多个小类的像素点分入所述多个类别。
  15. 根据权利要求14所述的滤波系数生成方法,其中:
    所述遍历多种将小类合并为类别的方式,按照最优的合并方式将所述多个小类的像素点分入所述多个类别,包括:
    对所述多个小类进行第一轮的多次合并,每次按不同方式将其中的部分或全部小类合并为第一个类别,基于为该第一个类别生成的滤波系数对该第一个类别的像素点进行邻域滤波并计算增益,将增益最大且大于等于相应增益阈值的一次合并所合并小类的像素点分入该第一个类别,记录合并为该第一个类别的多个小类;
    在前一轮次合并成功,已合并的轮次i小于设定的最大轮数且未合并的小类数量大于1的情况下,对未合并的多个小类进行第i+1轮的一次或多次合并,每次按不同方式将其中的部分或全部小类合并为第i+1个类别,基于为该第i+1个类别生成的滤波系数对该第i+1个类别中的像素点进行邻域滤波并计算增益,将增益最大且大于等于相应增益阈值的一次合并所合并小类的像素点分入该第i+1个类别,记录合并为该第i+1个类别的多个小类。
  16. 根据权利要求15所述的滤波系数生成方法,其中:
    所述遍历多种将小类合并为类别的方式,按照最优的合并方式将所述多个小类的像素点分入所述多个类别,还包括:
    在当前轮次,在所有合并方式的增益均小于相应增益阈值的情况下,当前轮次不进行合并且结束整个合并过程;
    在已合并的轮次等于设定的最大轮数或没有可以合并的小类时,结束整个合并过程;
    在整个合并过程结束还有未合并的小类的情况下,将未合并的所有小类的像素点分入没有对应滤波系数的一个类别,不参与滤波运算。
  17. 根据权利要求15所述的滤波系数生成方法,其中:
    所述按不同方式将其中的部分或全部小类合并,包括:遍历所有可能的合并方式,将其中的部分或全部小类合并;
    或者
    所述按不同方式将其中的部分或全部小类合并,包括:遍历满足约束条件时的可能的合并方式,将其中的部分或全部小类合并,所述约束条件包括以下一个或多个:
    只能将队列中位置连续的多个小类合并;
    每一轮合并时,先遍历队列位置最靠前的未合并小类与其他未合并小类之间可能的合并方式,如合并失败再遍历其他可能的合并方式;
    其中,所述队列指按照对应取值区间中的值从小到大的顺序将所述多个小类排列成的队列。
  18. 根据权利要求15所述的滤波系数生成方法,其中:
    所述对该第一个类别的像素点进行邻域滤波的增益用该类别像素点滤波后的图像质量相对滤波前的图像质量的增加来表示;
    所述每次合并计算的增益均是加权后的增益,权值等于此次合并的所有小类中像素点的总数与所述第一视频图像中像素点的总数的比值。
  19. 根据权利要求15所述的滤波系数生成方法,其中:
    所述小类的数量大于等于8且小于等于20,所述最大轮数等于1或2或3或4。
  20. 根据权利要求15所述的滤波系数生成方法,其中:
    对视频图像序列中有损的第一视频图像,对其中一个第一视频图像通过小类合并的方式将所述多个小类的像素点分入所述多个类别后,对该第一视频图像后的第一个视频图像,采用与该第一视频图像相同的小类合并方式,将所述多个小类的像素点分入所述多个类别;或者
    对视频图像序列中有损的第一视频图像,分别执行所述滤波系数生成方法。
  21. 一种视频滤波方法,其中:
    获取按照如权利要求9至20中任一所述的滤波系数生成方法生成的,与所述多个类别中的部分或全部类别对应的滤波系数;
    对所述第一视频图像进行邻域滤波,其中,对所述第一视频图像中有对应滤波系数的每一类别的像素点,使用该类别对应的滤波系数进行滤波。
  22. 一种视频编码方法,包括:
    按照如权利要求9至20中任一所述的滤波系数生成方法,将第一视频图像中的像素点分成多个类别,为所述多个类别中的部分或全部类别分别生成对应的滤波系数,一个所述类别对应一组滤波系数;
    对滤波参数编码并发送,或者对符合发送条件的滤波参数编码并发送,其中,所述滤波参数包括所述滤波系数和类别信息。
  23. 根据权利要求22所述的视频编码方法,其中:
    所述对符合发送条件的滤波参数编码发送,包括:分组判断生成的滤波系数是否符合发送条件,将符合发送条件的每一组滤波系数及该组滤波系数对应的一个类别的信息进行编码并发送;其中,所述发送条件包括以下条件中的任意一种或更多种:
    对一组滤波系数,使用该组滤波系数对所述第一视频图像中对应类别的像素点进行邻域滤波,获得的增益大于相应的增益阈值;
    对一组滤波系数,使用该组滤波系数对所述第一视频图像中对应类别的像素点进行邻域滤波时的率失真相对不进行所述邻域滤波时的率失真变小且变小的量大于相应的率失真增益阈值。
  24. 根据权利要求22所述的视频编码方法,其中:
    所述将符合发送条件的每一组滤波系数及该组滤波系数对应的一个类别的信息进行编码,包括:在每一组滤波系数及该组滤波系数对应的一个类别的信息前增加一个标志,用于指示是否存在类别信息和滤波系数,其中,至少一个所述类别的信息用合并为该类别的多个小类的索引信息表示。
  25. 根据权利要求24所述的视频编码方法,其中:
    所述视频编码方法应用于处理二维视频图像的视频编码系统,所述邻域滤波为维纳滤波,所述第一视频图像为重建视频图像,所述滤波系数根据所述重建视频图像和对应的原始视频图像生成。
  26. 根据权利要求22所述的视频编码方法,其中:
    所述视频编码方法应用于基于视频的点云压缩系统的编码端,所述第一视频图像包括有损的重建几何视频图像;
    所述视频编码方法还包括:对所述重建几何视频图像进行邻域滤波,将滤波后的重建几何视频图像用于对应的纹理视频图像的生成;其中,对所述重建几何视频图像进行邻域滤波时,所述重建几何视频图像中有对应滤波系数的每一类别的像素点使用该类别对应的一组滤波系数进行滤波。
  27. 根据权利要求22所述的视频编码方法,其中:
    所述视频编码方法应用于基于视频的点云压缩系统的编码端,所述邻域滤波为维纳滤波;所述第一视频图像包括重建几何视频图像;或者,所述第一视频图像包括重建纹理视频图像。
  28. 根据权利要求22所述的视频编码方法,其中:
    所述视频编码方法应用于基于视频的点云压缩系统的编码端;所述第一视频图像包括点云帧映射成的两幅重建视频图像;
    所述视频编码方法还包括:对同一点云帧映射成的两幅重建视频图像,按照如权利要求14-19中任一所述的滤波系数生成方法为第一幅重建视频图像生成滤波系数后,在为第二幅重建视频图像生成滤波系数时,对第二幅重建视频图像的像素点划分小类后,采用与第一幅重建视频图像相同的小类合并方式,将所述多个小类的像素点分入所述多个类别;
    其中,所述重建视频图像包括有损的几何视频图像或纹理视频图像。
  29. 一种码流,其中,所述码流为已编码视频码流,所述码流中包括已编码的滤波参数,所述滤波参数包括用于对第一视频图像进行邻域滤波的滤波系数。
  30. 根据权利要求29所述的码流,其中:
    所述已编码的滤波参数包括一个或多个信息单元,每个所述信息单元包括以下子单元:
    标志子单元,设置为指示是否存在滤波系数和类别信息;
    索引子单元,设置为写入一个类别的信息,或为空,其中该类别的信息用该类别的索引信息或者合并为该类别的多个小类的索引信息表示;
    系数子单元,设置为写入一组滤波系数,或为空;其中,所述索引子单元中的类别是该组滤波系数对应的类别。
  31. 根据权利要求29所述的码流,其中:
    所述码流是基于视频的点云压缩系统的编码端发送的码流,所述第一视频图像包括重建视频图像;
    所述已编码的滤波参数位于几何码流中的分隔符之后和所述重建视频图像的数据之前,其中,所述重建视频图像包括有损的重建几何视频图像或重建纹理视频图像;或者
    所述已编码的滤波参数携带在码流的序列参数集中。
  32. 一种视频解码装置,包括处理器以及存储有计算机程序的存储器,其中,所述处理器执行所述计算机程序时实现如权利要求1至8中任一所述的视频解码方法。
  33. 一种视频解码装置,包括几何帧重建模块和纹理转换模块,其中,还包括:
    维纳滤波模块,设置为接收所述几何帧重建模块输出的重建几何视频图像,及从码流中解析得到的滤波参数,执行如权利要求8所述的视频解码方法,输出滤波后的重建几何视频图像到所述纹理转换模块。
  34. 一种视频编码装置,包括处理器以及存储有计算机程序的存储器,其中,所述处理器执行所述计算机程序时实现如权利要求22至28中任一所述的视频编码方法。
  35. 一种视频编码装置,应用于基于视频的点云压缩系统,包括纹理帧生成模块,以及依次连接的几何帧生成模块、几何帧填充模块和几何帧视频压缩模块,其中,还包括:
    维纳滤波模块,设置为接收所述几何帧视频压缩模块输出的重建几何视频图像,及所述几何帧生成模块或几何帧填充模块输出的原始几何视频图像,执行如权利要求26所述的视频编码方法,输出滤波后的重建几何视频图像到所述纹理帧生成模块。
  36. 一种视频编解码系统,其中,包括如权利要求34或35所述的视频编码装置和如权利要求32或33所述的视频解码装置。
  37. 一种视频滤波装置,包括处理器以及存储有计算机程序的存储器,其中,所述处理器执行所述计算机程序时实现如权利要求21所述的视频滤波方法。
  38. 一种非瞬态计算机可读存储介质,所述计算机可读存储介质存储有计算机程序,其中,所述计算机程序被处理器执行时实现如权利要求1至28中任一所述的方法。
PCT/CN2021/144056 2021-12-31 2021-12-31 滤波系数生成及滤波方法、视频编解码方法、装置和系统 WO2023123512A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202180104218.XA CN118235392A (zh) 2021-12-31 2021-12-31 滤波系数生成及滤波方法、视频编解码方法、装置和***
PCT/CN2021/144056 WO2023123512A1 (zh) 2021-12-31 2021-12-31 滤波系数生成及滤波方法、视频编解码方法、装置和***

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2021/144056 WO2023123512A1 (zh) 2021-12-31 2021-12-31 滤波系数生成及滤波方法、视频编解码方法、装置和***

Publications (1)

Publication Number Publication Date
WO2023123512A1 true WO2023123512A1 (zh) 2023-07-06

Family

ID=86997236

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/144056 WO2023123512A1 (zh) 2021-12-31 2021-12-31 滤波系数生成及滤波方法、视频编解码方法、装置和***

Country Status (2)

Country Link
CN (1) CN118235392A (zh)
WO (1) WO2023123512A1 (zh)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102484714A (zh) * 2009-08-26 2012-05-30 索尼公司 图像处理装置和方法
CN103503451A (zh) * 2011-05-06 2014-01-08 西门子公司 用于对经编码的图像分区进行滤波的方法和设备
CN103108187A (zh) * 2013-02-25 2013-05-15 清华大学 一种三维视频的编码方法、解码方法、编码器和解码器
US20190349582A1 (en) * 2017-01-11 2019-11-14 Interdigital Vc Holdings, Inc. A method and a device for image encoding and decoding

Also Published As

Publication number Publication date
CN118235392A (zh) 2024-06-21

Similar Documents

Publication Publication Date Title
CN114424542B (zh) 具有非规范平滑的基于视频的点云压缩
US11265559B2 (en) High dynamic range image/video coding
US20200193647A1 (en) Artificial intelligence encoding and artificial intelligence decoding methods and apparatuses using deep neural network
KR100974177B1 (ko) 랜덤 필드 모델을 사용한 사진 및 비디오 압축과 프레임레이트 업 변환을 개선시키는 방법 및 장치
CN103596008B (zh) 编码器及编码方法
US11871011B2 (en) Efficient lossless compression of captured raw image information systems and methods
TW202147849A (zh) 點雲壓縮方法、編碼器、解碼器及儲存媒介
CN116405688A (zh) 视频译码中使用交叉分量线性模型进行帧内预测
WO2019105179A1 (zh) 颜色分量的帧内预测方法及装置
CN107251557A (zh) 高色度分辨率细节的编码/解码
GB2509707A (en) Encoding or decoding a video sequence using SAO parameters
US20180309991A1 (en) Video encoding with adaptive rate distortion control by skipping blocks of a lower quality video into a higher quality video
CN115606179A (zh) 用于使用学习的下采样特征进行图像和视频编码的基于学习的下采样的cnn滤波器
CN113994688A (zh) 视频译码中的残差的处理
WO2018001208A1 (zh) 编解码的方法及设备
WO2023000179A1 (zh) 视频超分辨网络及视频超分辨、编解码处理方法、装置
JP2020537468A (ja) ビデオコーディングのための空間変動変換
CN111434115A (zh) 视频编码中纹理合成的聚类修正
Hu et al. An adaptive two-layer light field compression scheme using GNN-based reconstruction
CN115552905A (zh) 用于图像和视频编码的基于全局跳过连接的cnn滤波器
CN117480778A (zh) 残差编码和视频编码方法、装置、设备和***
CN114531952A (zh) 视频编码中的残差的量化
WO2023123512A1 (zh) 滤波系数生成及滤波方法、视频编解码方法、装置和***
CN111903124B (zh) 图像处理装置和图像处理方法
WO2022226850A1 (zh) 点云质量增强方法、编码和解码方法及装置、存储介质

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21969855

Country of ref document: EP

Kind code of ref document: A1