WO2023142926A1 - 一种图像处理方法和装置 - Google Patents

一种图像处理方法和装置 Download PDF

Info

Publication number
WO2023142926A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
current
image frame
sub
block
Prior art date
Application number
PCT/CN2023/070405
Other languages
English (en)
French (fr)
Inventor
那彦波
卢运华
Original Assignee
京东方科技集团股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 京东方科技集团股份有限公司
Publication of WO2023142926A1

Links

Images

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/134Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
    • H04N19/136Incoming video signal characteristics or properties
    • H04N19/137Motion inside a coding unit, e.g. average field, frame or block difference
    • H04N19/139Analysis of motion vectors, e.g. their magnitude, direction, variance or reliability
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10Complex mathematical operations
    • G06F17/14Fourier, Walsh or analogous domain transformations, e.g. Laplace, Hilbert, Karhunen-Loeve, transforms
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/102Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N19/12Selection from among a plurality of transforms or standards, e.g. selection between discrete cosine transform [DCT] and sub-band transform or selection between H.263 and H.264
    • H04N19/122Selection of transform size, e.g. 8x8 or 2x4x8 DCT; Selection of sub-band transforms of varying structure or type
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/50Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N19/503Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving temporal prediction
    • H04N19/51Motion estimation or motion compensation
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/60Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding
    • H04N19/625Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding using discrete cosine transform [DCT]

Definitions

  • the present disclosure relates to the image field, and in particular to an image processing method and device.
  • inter-frame encoding may be performed on the current image frame.
  • the coded image block that is most similar to the current image block can be searched for in the reference image frame as the matching block of the current image block, the offset between the current image block and the matching block is used as a motion vector, and the motion vector is encoded, so that the decoding end can determine the position of the current image block from the motion vector.
  • Embodiments of the present application provide an image processing method and device, which can more accurately calculate motion vectors and improve image quality.
  • an image processing method comprising: firstly, acquiring a current image frame and a reference image frame, sequentially downsampling and upsampling the current image frame to obtain a processed current image frame,
  • the reference image frame is sequentially down-sampled and up-sampled to obtain a processed reference image frame.
  • divide the processed current image frame and the processed reference image frame into a plurality of current sub-image blocks and a plurality of reference sub-image blocks respectively according to a preset division method, and determine, among the plurality of reference sub-image blocks, the reference sub-image block with the smallest similarity value with respect to each current sub-image block as the matching block of that current sub-image block.
  • Based on each current sub-image block and the matching block corresponding to the current sub-image block, a motion vector corresponding to the current sub-image block is obtained.
  • the current image frame is encoded based on the motion vector.
  • the current image frame is the original current image frame or an image frame obtained by scaling the original current image frame at least once by the scaling factor K,
  • and the reference image frame is the original reference image frame or an image frame obtained by scaling the original reference image frame at least once by the scaling factor K.
  • the current image frame and the reference image frame can be scaled at least once to obtain current image frames and reference image frames of different sizes, and the current image frame and reference image frame of the corresponding size are down-sampled and up-sampled in order to reduce the high-frequency components of the current image frame and the reference image frame that do not carry the main information and to filter the noise signal; the best matching block of each current sub-image block at each size can then be obtained more accurately from the processed current image frame and reference image frame.
  • within the search range corresponding to the current sub-image block, the distances between the reference sub-image blocks in the search range and the current sub-image block are respectively calculated.
  • a regularization process is performed on the distances between multiple reference sub-image blocks within the search range and the current sub-image block to obtain multiple candidate similarities.
  • the reference sub-image block corresponding to the smallest similarity among the plurality of candidate similarities is determined as the matching block of the current sub-image block.
  • the reference sub-image block corresponding to the minimum similarity is determined as the matching block of the current sub-image block,
  • which can ensure that the obtained matching block is the best matching block of the current image block.
  • the smallest similarity among multiple candidate similarities corresponding to each current sub-image block is the similarity corresponding to the current sub-image block.
  • the method further includes: according to the first similarity corresponding to the first current sub-image block and the second similarity corresponding to the second current sub-image block, determining the target motion vector from the first motion vector corresponding to the first current sub-image block and the second motion vector corresponding to the second current sub-image block.
  • the first current sub-image block is a scaled image block of the second current sub-image block.
  • the first similarity is compared with the second similarity; when the first similarity is less than or equal to the second similarity, the first motion vector is determined as the target motion vector, and when the first similarity is greater than the second similarity, the second motion vector is determined as the target motion vector.
  • if the similarity of the large block is higher than that of the small block, the motion vector corresponding to the large block is used as the optimal motion vector; if the similarity of the large block is lower than that of the small block,
  • the motion vector corresponding to the small block is used as the optimal motion vector, so the optimal motion vector can be selected across different sizes, and the accuracy of the motion vector can be further improved.
  • the current image frame can be divided into image blocks of different sizes for encoding. It can be understood that a smaller similarity value indicates a higher similarity, and a larger similarity value indicates a lower similarity.
  • the current image frame is encoded based on the target motion vector.
  • Image processing devices include:
  • An acquisition module configured to acquire a current image frame and a reference image frame
  • the sampling module is used to sequentially down-sample and up-sample the current image frame to obtain the processed current image frame, and to sequentially down-sample and up-sample the reference image frame to obtain the processed reference image frame;
  • a division module configured to divide the processed current image frame and the processed reference image frame into a plurality of current sub-image blocks and a plurality of reference sub-image blocks according to a preset division method
  • a determination module configured to determine a reference sub-image block having the smallest similarity with each current sub-image block among multiple reference sub-image blocks as a matching block of the current sub-image block;
  • the determining module is further configured to obtain a motion vector corresponding to the current sub-image block based on each current sub-image block and the matching block corresponding to the current sub-image block;
  • the encoding module is used to encode the current image frame based on the motion vector.
  • the current image frame is the original current image frame or an image frame obtained by scaling the original current image frame at least once by the scaling factor K,
  • and the reference image frame is the original reference image frame or an image frame obtained by scaling the original reference image frame at least once by the scaling factor K.
  • the determination module is specifically configured to: within the search range corresponding to the current sub-image block, respectively calculate multiple reference sub-images within the search range The distance between the block and the current subimage block.
  • a regularization process is performed on the distances between multiple reference sub-image blocks within the search range and the current sub-image block to obtain multiple candidate similarities.
  • the reference sub-image block corresponding to the smallest similarity among the plurality of candidate similarities is determined as the matching block of the current sub-image block.
  • the smallest similarity among multiple candidate similarities corresponding to each current sub-image block is the similarity corresponding to the current sub-image block.
  • the determining module is further configured to determine, according to the first similarity corresponding to the first current sub-image block and the second similarity corresponding to the second current sub-image block, the target motion vector from the first motion vector corresponding to the first current sub-image block and the second motion vector corresponding to the second current sub-image block; wherein the first current sub-image block is an image block obtained by scaling the second current sub-image block.
  • the determining module is specifically configured to compare the first similarity with the second similarity, and if the first similarity is less than or equal to the second similarity In this case, the first motion vector is determined as the target motion vector; when the first similarity is greater than the second similarity, the second motion vector is determined as the target motion vector.
  • the encoding module is specifically configured to encode the current image frame based on the target motion vector.
  • a computer-readable storage medium stores computer program instructions, and when the computer program instructions are run on a computer (for example, an image processing device), the computer is made to execute the image processing method as described in any one of the above embodiments.
  • a fourth aspect of the embodiments of the present application provides a computer program product.
  • the computer program product includes computer program instructions.
  • When the computer program instructions are executed on a computer (e.g., an image processing device), the computer program instructions cause the computer to execute the image processing method described in any one of the above embodiments.
  • a fifth aspect of the embodiments of the present application provides a computer program.
  • When the computer program is executed on a computer (for example, an image processing device), the computer program causes the computer to execute the image processing method described in the above-mentioned embodiments.
  • FIG. 1 is a structural diagram of an image encoder provided in an embodiment of the present application
  • FIG. 2 is a flow chart of an image processing method provided in an embodiment of the present application.
  • FIG. 3 is an application diagram of an image processing method provided in an embodiment of the present application.
  • FIG. 4 is an application diagram of another image processing method provided by the embodiment of the present application.
  • FIG. 5 is an application diagram of another image processing method provided by the embodiment of the present application.
  • FIG. 6 is an application diagram of another image processing method provided in the embodiment of the present application.
  • FIG. 7 is an application diagram of another image processing method provided in the embodiment of the present application.
  • FIG. 8 is an application diagram of another image processing method provided in the embodiment of the present application.
  • FIG. 9 is an application diagram of another image processing method provided in the embodiment of the present application.
  • FIG. 10 is an application diagram of another image processing method provided in the embodiment of the present application.
  • FIG. 11 is an application diagram of another image processing method provided in the embodiment of the present application.
  • FIG. 12 is a flow chart of another image processing method provided by the embodiment of the present application.
  • Fig. 13 is an application diagram of another image processing method provided by the embodiment of the present application.
  • FIG. 14 is a structural diagram of an image processing device provided by an embodiment of the present application.
  • The terms "first" and "second" are used for descriptive purposes only, and cannot be understood as indicating or implying relative importance or implicitly specifying the quantity of the indicated technical features. Thus, a feature defined as "first" or "second" may explicitly or implicitly include one or more of these features. In the description of the embodiments of the present disclosure, unless otherwise specified, "plurality" means two or more.
  • "A and/or B" includes the following three combinations: A only, B only, and a combination of A and B.
  • the term "if" is optionally interpreted to mean "when" or "upon" or "in response to determining" or "in response to detecting", depending on the context.
  • the phrases "if it is determined that ..." or "if [the stated condition or event] is detected" are optionally construed to mean "upon determining ..." or "in response to determining ..." or "upon detection of [the stated condition or event]" or "in response to detection of [the stated condition or event]", depending on the context.
  • Down-sampling refers to shrinking an image frame, and through down-sampling, the image frame can be made to conform to the size of the display area, or a thumbnail of the image frame can be generated. For example, c-fold downsampling is performed on an image frame with a size of M*N pixels to obtain an image frame of (M/c)*(N/c) pixels, where c is a common divisor of M and N.
  • Upsampling refers to enlarging an image frame, and the image frame can be displayed on a display device with a higher resolution through upsampling. Upsampling almost always uses the interpolation method, that is, on the basis of the pixels of the original image frame, new elements are inserted between the pixels using a suitable interpolation algorithm.
  • Intra-frame coding refers to the coding method in which, during image compression, discrete cosine transform (DCT), zigzag scanning, quantization and variable length coding (VLC) are performed on image frames.
  • Inter-frame coding refers to a coding method that uses time redundancy of video images to encode motion vectors and texture (prediction residual) information between image frames during image compression processing.
  • Fig. 1 is a structural diagram of an image encoder applying the method provided by the present disclosure provided by an embodiment of the present disclosure.
  • the image encoder includes a residual calculation unit, a selection switch, a DCT module, a quantizer, a VLC module, a buffer, a rate control module, a dequantizer, an inverse DCT module, a frame storage module, and a motion estimation and compensation module.
  • the image encoder receives image frames through an input interface.
  • the image frame may be an image frame in a sequence of pictures forming a video or a video sequence.
  • the image frame received by the image encoder may also be referred to as a current image frame or an image frame to be encoded.
  • the image encoder can perform intra-frame coding or inter-frame coding on the image frames it receives.
  • the selection switch is used to select an image compression method of intra-frame coding or an image compression method of inter-frame coding when compressing an image.
  • intra-frame coding is generally used for still images (pictures)
  • inter-frame coding is generally used for moving images (video).
  • the residual calculation unit is used for calculating the residual based on the current image frame and the predicted image frame.
  • the DCT module is used to transform the spatial domain image to the frequency domain for image compression.
  • In the spatial domain, the content of images varies greatly, but in the frequency domain, statistical analysis of a large number of images shows that after an image is transformed by the DCT, the main components of its frequency coefficients are concentrated in a relatively small range and mainly located in the low-frequency part. According to the statistical characteristics of the image signal in the frequency domain, some measures can be taken to discard the parts of the spectrum with less energy and keep the main frequency components of the transmitted spectrum as far as possible, so as to achieve the purpose of image data compression.
  • the quantizer is used to process the frequency data processed by the DCT module again to further compress the amount of data. Since human eyes have different sensitivities to various frequencies, the frequency data processed by the DCT module can be quantized, and the quantized DCT coefficient matrix will have many zero values. Generally, the quotient of the data in the upper left corner is non-zero, and the quotient of the data in the lower right corner is very small, and can be abbreviated as 0 after being rounded to an integer. There are many 0 values on the coefficient matrix, which greatly reduces the amount of data. On the one hand, the main part of the image information is preserved, and on the other hand, the image data is compressed.
  • the VLC module is used to encode the above quantized coefficient matrix.
  • the VLC module can convert the above-mentioned quantized coefficient matrix into a one-dimensional array through zigzag scanning.
  • the tail of the one-dimensional array has multiple "0"s, which can be replaced by another representation;
  • the multiple "0"s are restored during decoding to fill up the 64 positions of the matrix. Therefore, image data can be further compressed by VLC encoding. For example, 00000000 can be represented as 80, which is restored to 00000000 when decoded.
  • the intra-frame encoding of image data can be completed through the above-mentioned DCT module, quantizer and VLC module, reducing the amount of image data.
  • the buffer is used to temporarily store image compression data.
  • the rate control module is used for adjusting the code rate of the image according to the data cache amount of the buffer.
  • the code rate is higher when the image is more complex, and the code rate is lower when the image is simpler.
  • the rate control module is used to control the code rate within a certain range.
  • the dequantizer and inverse DCT module are used to restore the encoded image data to the image data before encoding as the reference image frame.
  • Frame storage is used to store reference frames and motion vectors.
  • the motion estimation and compensation module is used to calculate the motion vector and motion residual according to the reference image frame and the current image frame.
  • inter-frame coding of image data can be performed according to reference frames, motion vectors and motion residuals to reduce the amount of image data.
  • an embodiment of the present application provides an image processing method, which can find the best matching block and obtain a more accurate motion vector, thereby reducing the size of the bit stream and improving the image quality.
  • FIG. 2 is a flow chart of an image processing method provided by an embodiment of the present application. As shown in FIG. 2 , the method includes steps 201-206.
  • the current image frame is the image frame received by the encoder
  • the reference image frame is the image frame restored by the dequantizer and the inverse DCT module after encoding the image data.
  • the current image frame can be the original current image frame or an image frame obtained by scaling the original current image frame at least once using a scaling factor K,
  • and the reference image frame can be the original reference image frame or an image frame obtained by scaling the original reference image frame at least once using the scaling factor K.
  • the embodiment of the present application does not limit whether the current image frame and the reference image frame are scaled, nor does it limit the specific value of the scaling factor K.
  • the following embodiments take the scaling factor K equal to 2 as an example for illustration.
  • the number of times that the current image frame and the reference image frame can be scaled is related to the image coding standard. For example, taking moving picture experts group-2 (MPEG2) as the image coding standard, since MPEG2 only supports image blocks of 8*8 size, the current image frame and the reference image frame are not scaled; in this case, the current image frame is an unscaled current image frame (also called an original current image frame), and the reference image frame is an unscaled reference image frame (also called an original reference image frame).
  • taking high efficiency video coding (HEVC) as an example, since HEVC supports image blocks of 4*4, 8*8, 16*16, 32*32 and 64*64, a scaling factor of 2 can be used,
  • and the current image frame and the reference image frame are scaled 4 times to get 5 levels of current image frames and reference image frames, the sizes of which are successively reduced.
  • the embodiment of the present application does not limit which encoding standard is used for image encoding, the specific size of the scaling coefficient, and the specific scaling times.
  • the following embodiments take the scaling factor as 2, and the current image frame and the reference image frame are respectively scaled twice as an example for illustration.
  • the current image frame of 1280*1280 and the reference image frame are respectively scaled once by using a scaling factor of 2 to obtain a current image frame of 640*640 pixels and a reference image frame of 640*640 pixels.
  • the current image frame of 640*640 pixels and the reference image frame of 640*640 are respectively scaled once to obtain the current image frame of 320*320 pixels and the reference image frame of 320*320 pixels. That is, when the number of scaling is 2, three levels of current image frame and reference image frame can be obtained.
  • the first layer is the current image frame of 1280*1280 pixels and the reference image frame of 1280*1280 pixels,
  • the second layer is the current image frame of 640*640 pixels and the reference image frame of 640*640 pixels,
  • and the third layer is the current image frame of 320*320 pixels and the reference image frame of 320*320 pixels.
  • the current image frame x_n is sequentially down-sampled and up-sampled to obtain the processed current image frame x_{n+1}, and the reference image frame y_n is sequentially down-sampled and up-sampled to obtain the processed reference image frame y_{n+1}.
  • the reference image frame obtained by the largest number of scalings may not be down-sampled or up-sampled; that reference image frame may be used directly as the processed reference image frame.
  • the embodiment of the present application does not limit whether the reference image frame obtained by the largest number of scalings is down-sampled or up-sampled; the following is an exemplary illustration in which it is not.
  • the current image frame is an image frame of 1280*1280 pixels
  • the reference image frame is an image frame of 1280*1280 pixels
  • the scaling factor K is 2, and scaling is performed twice as an example.
  • the current image frame p_0 of 1280*1280 pixels is sequentially down-sampled and up-sampled to obtain the processed current image frame p_1,
  • the current image frame p_n of 640*640 pixels is sequentially down-sampled and up-sampled to obtain the processed current image frame p_{n+1},
  • and the current image frame p_N of 320*320 pixels is sequentially down-sampled and up-sampled to obtain the processed current image frame p_{N+1}.
  • the reference image frame q_0 of 1280*1280 pixels is sequentially down-sampled and up-sampled to obtain the processed reference image frame q_1,
  • and the reference image frame q_n of 640*640 pixels is sequentially down-sampled and up-sampled to obtain the processed reference image frame q_{n+1};
  • the reference image frame q_N of 320*320 pixels is not down-sampled or up-sampled, and q_N is used as the processed reference image frame q_{N+1}.
  • the current image frame and the reference image frame are scaled at least once to obtain current image frames and reference image frames of different sizes, and the current image frame and reference image frame of the corresponding size are down-sampled
  • and up-sampled to reduce the high-frequency parts that do not carry the main information in the current image frame and the reference image frame and to filter the noise signal; from the processed current image frame and reference image frame of each size,
  • the best matching block of each current sub-image block can then be obtained more accurately.
  • different preset division methods may be used. If the image coding standard supports a division mode, use the division mode as the default division mode. If the image coding standard supports multiple division methods, the division method containing the least number of pixels may be used as the preset division method. The embodiment of the present application does not limit which division mode is specifically adopted for the preset division mode.
  • the preset division method is to adopt the size of 8*8 pixels, and the processed current image frame and the processed reference image frame Divided into multiple current sub-image blocks and multiple reference sub-image blocks.
  • the preset division method can use 4*4 pixel size, and the The processed current image frame and the processed reference image frame are divided into multiple current sub-image blocks and multiple reference sub-image blocks.
  • the processed current image frame can be divided into M current sub-image blocks by using a preset division method of 4*4 pixels, and the processed reference image frame can be divided into M current sub-image blocks. M reference sub-image blocks.
  • the specific number of the multiple current sub-image blocks and the specific number of the multiple reference sub-image blocks are related to the image size of the current image frame and the reference image frame, and the preset division method.
  • the embodiment of the present application does not limit the specific number of multiple current sub-image blocks and the specific number of multiple reference sub-image blocks.
  • Determining the matching block of the current sub-image block among multiple reference sub-image blocks may include steps 1-3.
  • Step 1 Within the search range corresponding to the current sub-image block, respectively calculate the distances between a plurality of reference sub-image blocks within the search range and the current sub-image block.
  • Each current sub-image block corresponds to a search range
  • the search ranges corresponding to different sub-image blocks may be different
  • the search ranges corresponding to different sub-image blocks may include the same image block.
  • the embodiment of the present application does not limit the size of the search range corresponding to each current sub-image block and the positional relationship between the search range and the current sub-image block, and the size of the search range corresponding to the current sub-image block is related to the image coding standard.
  • the search range corresponding to the current sub-image block can be a square region around the current sub-image block in which the search is performed with a radius of 16 pixels or a radius of 32 pixels, for example.
  • the distance between multiple reference sub-image blocks within the search range and the current sub-image block can be calculated by means of the mean squared error (MSE): D_{i,j} = MSE(K_i, Q_j),
  • where D_{i,j} represents the distance between the reference sub-image block and the current sub-image block,
  • K_i represents the current sub-image block,
  • and Q_j represents the reference sub-image block.
  • the L reference sub-image blocks in FIG. 7 are the image blocks within the search range corresponding to the current sub-image block; the distances between the multiple reference sub-image blocks and the current sub-image block can be calculated through the MSE, and the smaller the distance, the higher the similarity between the two image blocks.
  • Step 2 Regularize the distances between the multiple reference sub-image blocks within the search range and the current sub-image block to obtain multiple candidate similarities.
  • the embodiment of the present application does not limit the specific processing manner of the regularization.
  • the smaller the similarity obtained by the regularization processing, the smaller the difference between the reference sub-image block and the current sub-image block, and the more similar the reference sub-image block is to the current sub-image block.
  • the distance between multiple reference sub-image blocks within the search range and the current sub-image block can be regularized by the following formula:
  • S_{i,j} represents the similarity between the reference sub-image block and the current sub-image block,
  • D_{i,j} represents the distance between the reference sub-image block and the current sub-image block,
  • and α is a non-zero parameter.
  • the search range of the current sub-image block includes L reference sub-image blocks as an example, by comparing multiple reference sub-image blocks with the current The distance between sub-image blocks is regularized to obtain multiple candidate similarities.
  • Step 3 Determine the reference sub-image block corresponding to the smallest similarity among multiple candidate similarities as the matching block of the current sub-image block.
  • the matching block of the current sub-image block can be determined by the following formula:
  • j_nn(i) represents the reference sub-image block (matching block) corresponding to the current sub-image block, and the formula denotes searching for the reference sub-image block corresponding to the minimum similarity among the multiple candidate similarities.
  • j_nn(i) can be used to find the reference sub-image block corresponding to the smallest similarity among the multiple candidate similarities; that is, the reference sub-image blocks corresponding to the smallest similarities such as 0.7, 0.3 and 0.2 are the matching blocks of the respective current sub-image blocks.
  • the reference sub-image block corresponding to the minimum similarity is determined as the matching block of the current sub-image block, which can ensure that the obtained matching block is the best matching block of the current image block.
  • depending on the image coding standard, the current image frame and the reference image frame can be scaled a different number of times, and the number of motion vectors corresponding to the current sub-image block also differs.
  • the embodiment of the present application does not limit the type of image coding standard used or the specific number of motion vectors obtained.
  • the above steps 203 and 204 can be executed by a patches nearest neighbors (PNN) module, and for the specific execution steps of the PNN module reference can be made to the relevant content of steps 203 and 204 above.
  • if the current image frame and the reference image frame can be zoomed multiple times, then at each image size, according to the current sub-image block and the matching block corresponding to the current sub-image block, one motion vector corresponding to the current sub-image block can be obtained. Therefore, after the image is scaled multiple times, multiple sets of motion vectors can be obtained for each current sub-image block: mv = (K^g · c_x, K^g · c_y),
  • where K represents the scaling factor
  • and g represents the scaling level corresponding to the current sub-image block.
  • for the first layer, the motion vector is mv1 = (K^1 · c_x, K^1 · c_y);
  • for the second layer, the motion vector is mv2 = (K^2 · c_x, K^2 · c_y);
  • for the third layer, the motion vector is mv3 = (K^3 · c_x, K^3 · c_y).
  • the current image frame is encoded based on the motion vectors.
  • the image processing method provided by the embodiment of the present application can reduce the high-frequency parts of the current image frame and the reference image frame that do not carry the main information and filter the noise signal by sequentially down-sampling and up-sampling the current image frame and the reference image frame; then, by dividing the processed current image frame and reference image frame into image blocks, the best matching block of each current sub-image block can be obtained more accurately. Therefore, the motion vector obtained according to the best matching block is more accurate, and when the current image frame is encoded based on this motion vector, the size of the bit stream can be reduced and the image quality can be improved.
  • in addition to the above-mentioned steps 201-206, the image processing method provided in the embodiment of the present application may also include step 207 before step 206.
  • according to the first similarity corresponding to the first current sub-image block and the second similarity corresponding to the second current sub-image block,
  • the target motion vector is determined from the first motion vector corresponding to the first current sub-image block and the second motion vector corresponding to the second current sub-image block.
  • the first current sub-image block is a scaled image block of the second current sub-image block.
  • the first current sub-image block may be the image block after the second current sub-image block has been scaled once, and the first current sub-image block may also be the image block after the second current sub-image block has been scaled multiple times .
  • the embodiment does not limit how many times the second current sub-image block is scaled to obtain the first current sub-image block.
  • the second current sub-image block may be a plurality of current sub-image blocks, and each current sub-image block may correspond to a second similarity.
  • the embodiment of the present application does not limit the number of current sub-image blocks included in the second current sub-image block, and the number of current sub-image blocks included in the second current sub-image block is related to parameters such as scaling coefficients.
  • the first current sub-image block is an image block in the current image frame of 320*320 pixels,
  • and the second current sub-image block is an image block in the current image frame of 640*640 pixels.
  • taking the case where the size of the first current sub-image block and the second current sub-image block is 4*4 as an example, since the current image frame of 320*320 pixels is the image frame obtained after the current image frame of 640*640 pixels is scaled once,
  • the current image frame of 320*320 pixels is smaller than the current image frame of 640*640 pixels, so one 4*4 first current sub-image block in the current image frame of 320*320 pixels corresponds to
  • four 4*4 second current sub-image blocks in the current image frame of 640*640 pixels. That is to say, the four 4*4 second current sub-image blocks can be scaled once to obtain one 4*4 first current sub-image block.
  • the first degree of similarity is compared with the second degree of similarity, and when the first degree of similarity is less than or equal to the second degree of similarity, the first motion vector is determined as the target motion vector. In the case where the first degree of similarity is greater than the second degree of similarity, the second motion vector is determined as the target motion vector.
  • the first current sub-image block includes 1 sub-image block
  • the first similarity is S 0
  • the first current sub-image block is scaled once for the second current sub-image block
  • the second current sub-image block includes 4 sub-image blocks
  • the second similarity includes S 1 , S 2 , S 3 and S 4 .
  • Sequentially compare the first similarity S 0 and the second similarity S 1 -S 4 and determine the first motion vector as the target when the first similarity S 0 is less than or equal to the second similarity S 1 -S 4 Motion vector. If the first similarity S 0 is greater than any second similarity in S 1 -S 4 , determine the second motion vector as the target motion vector.
  • the current image frame may be encoded based on the target motion vector determined in step 207 .
  • the above-mentioned steps 203-207 may be processed in parallel by using a tensor processing framework to improve processing efficiency.
  • the tensor processing framework may be, for example, PyTorch or TensorFlow.
  • the embodiment of the present application does not limit what type of tensor processing framework is specifically used to perform the parallel computing.
  • for example, the parallel computing may be performed by a graphics processing unit (GPU).
  • if the similarity of the large block is higher than that of the small block, the motion vector corresponding to the large block is used as the optimal motion vector; if the similarity of the large block is lower than that of the small block,
  • the motion vector corresponding to the small block is used as the optimal motion vector, so the optimal motion vector can be selected across different sizes, and the accuracy of the motion vector can be further improved.
  • the current image frame can be divided into image blocks of different sizes for encoding. It can be understood that a smaller similarity value indicates a higher similarity, and a larger similarity value indicates a lower similarity.
  • An embodiment of the present application provides an image processing device, and the device may be an image encoder. Specifically, the image processing device is configured to perform steps 201-207 in the above image processing method.
  • the image processing apparatus provided in the embodiment of the present application may include modules corresponding to corresponding steps.
  • the embodiment of the present application may divide the functional modules of the image processing device according to the above method example, for example, each functional module may be divided corresponding to each function, or two or more functions may be integrated into one processing module.
  • the above-mentioned integrated modules can be implemented in the form of hardware or in the form of software function modules.
  • the division of modules in the embodiment of the present application is schematic, and is only a logical function division, and there may be other division methods in actual implementation.
  • FIG. 14 shows a possible structural diagram of the image processing device involved in the above embodiment.
  • an image processing apparatus 1400 may include an acquisition module 1401 , a sampling module 1402 , a division module 1403 , a determination module 1404 , and an encoding module 1405 .
  • the functions of each module are as follows:
  • An acquisition module 1401 configured to acquire a current image frame and a reference image frame.
  • the sampling module 1402 is configured to sequentially perform down-sampling and up-sampling on the current image frame to obtain a processed current image frame, and sequentially perform down-sampling and up-sampling on the reference image frame to obtain a processed reference image frame.
  • the division module 1403 is configured to divide the processed current image frame and the processed reference image frame into a plurality of current sub-image blocks and a plurality of reference sub-image blocks according to a preset division method.
  • a determining module 1404 configured to determine, among the plurality of reference sub-image blocks, a reference sub-image block having the smallest similarity with each current sub-image block as a matching block of the current sub-image block.
  • the determining module 1404 is further configured to obtain a motion vector corresponding to the current sub-image block based on each current sub-image block and the matching block corresponding to the current sub-image block.
  • An encoding module 1405, configured to encode the current image frame based on the motion vector.
  • the current image frame is the original current image frame or the image frame after the original current image frame is scaled at least once by the scaling factor K
  • the reference image frame is the original reference image frame or an image frame obtained by scaling the original reference image frame at least once by the scaling factor K.
  • the determining module 1404 is specifically configured to: within the search range corresponding to the current sub-image block, respectively calculate the distances between multiple reference sub-image blocks within the search range and the current sub-image block.
  • a regularization process is performed on the distances between multiple reference sub-image blocks within the search range and the current sub-image block to obtain multiple candidate similarities.
  • the reference sub-image block corresponding to the smallest similarity among the plurality of candidate similarities is determined as the matching block of the current sub-image block.
  • the smallest similarity among multiple candidate similarities corresponding to each current sub-image block is the similarity corresponding to the current sub-image block.
  • the determining module 1404 is further configured to, according to the first similarity corresponding to the first current sub-image block and the second similarity corresponding to the second current sub-image block,
  • the target motion vector is determined from the first motion vector of the current sub-image block and the second motion vector corresponding to the second current sub-image block.
  • the first current sub-image block is a scaled image block of the second current sub-image block.
  • the determining module 1404 is specifically configured to compare the first similarity with the second similarity, and determine the first motion vector as the target if the first similarity is less than or equal to the second similarity Motion vector. In the case where the first degree of similarity is greater than the second degree of similarity, the second motion vector is determined as the target motion vector.
  • the encoding module 1405 is specifically configured to encode the current image frame based on the target motion vector.
  • Some embodiments of the present application provide a computer-readable storage medium (for example, a non-transitory computer-readable storage medium) in which computer program instructions are stored; when the computer program instructions are run on a computer (for example, an image processing device), the computer is caused to execute the image processing method described in any one of the above embodiments.
  • the above-mentioned computer-readable storage medium may include, but is not limited to: a magnetic storage device (for example, a hard disk, a floppy disk or a magnetic tape), an optical disk (for example, a CD (compact disc) or a DVD (digital versatile disc)), a smart card, and a flash memory device (for example, an EPROM (erasable programmable read-only memory), a card, a stick or a key drive).
  • Various computer-readable storage media described in this disclosure can represent one or more devices and/or other machine-readable storage media for storing information.
  • the term "machine-readable storage medium" may include, but is not limited to, wireless channels and various other media capable of storing, containing and/or carrying instructions and/or data.
  • Some embodiments of the present disclosure also provide a computer program product, for example, the computer program product is stored on a non-transitory computer-readable storage medium.
  • the computer program product includes computer program instructions.
  • When the computer program instructions are executed on a computer (for example, an image processing device), the computer program instructions cause the computer to execute the image processing method as described in the above-mentioned embodiments.
  • Some embodiments of the present disclosure also provide a computer program.
  • When the computer program is executed on a computer (for example, an image processing device), the computer program causes the computer to execute the image processing method described in the above-mentioned embodiments.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Mathematical Physics (AREA)
  • Pure & Applied Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Mathematical Optimization (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Analysis (AREA)
  • Computational Mathematics (AREA)
  • Discrete Mathematics (AREA)
  • Algebra (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Databases & Information Systems (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

The embodiments of the present application disclose an image processing method and device, relating to the image field. The specific scheme is as follows: first, a current image frame and a reference image frame are acquired; the current image frame is sequentially down-sampled and up-sampled to obtain a processed current image frame, and the reference image frame is sequentially down-sampled and up-sampled to obtain a processed reference image frame. Then, the processed current image frame and the processed reference image frame are respectively divided into a plurality of current sub-image blocks and a plurality of reference sub-image blocks according to a preset division method. Among the plurality of reference sub-image blocks, the reference sub-image block with the smallest similarity value with respect to each current sub-image block is determined as the matching block of that current sub-image block. Based on each current sub-image block and the matching block corresponding to that current sub-image block, a motion vector corresponding to the current sub-image block is obtained. Finally, the current image frame is encoded based on the motion vector.

Description

Image processing method and device
This application claims priority to the Chinese patent application No. 202210086020.X, entitled "Image processing method and device", filed with the China National Intellectual Property Administration on January 25, 2022, the entire contents of which are incorporated herein by reference.
Technical Field
The present disclosure relates to the image field, and in particular to an image processing method and device.
Background
At present, when encoding an image frame, in order to reduce the amount of encoded data of the image frame, inter-frame encoding may be performed on the current image frame. When inter-frame encoding is performed on the current image frame, the encoded image block that is most similar to a current image block can be searched for in a reference image frame as the matching block of that current image block; the offset between the current image block and the matching block is taken as a motion vector, and the motion vector is encoded, so that the decoding end can determine the position of the current image block from the motion vector.
Summary
The embodiments of the present application provide an image processing method and device, which can calculate motion vectors more accurately and improve image quality.
A first aspect of the embodiments of the present application provides an image processing method. The method includes: first, acquiring a current image frame and a reference image frame, sequentially down-sampling and up-sampling the current image frame to obtain a processed current image frame, and sequentially down-sampling and up-sampling the reference image frame to obtain a processed reference image frame; then, dividing the processed current image frame and the processed reference image frame into a plurality of current sub-image blocks and a plurality of reference sub-image blocks respectively according to a preset division method, and determining, among the plurality of reference sub-image blocks, the reference sub-image block with the smallest similarity value with respect to each current sub-image block as the matching block of that current sub-image block; obtaining, based on each current sub-image block and the matching block corresponding to that current sub-image block, a motion vector corresponding to the current sub-image block; and finally, encoding the current image frame based on the motion vector.
Based on this scheme, by sequentially down-sampling and up-sampling the current image frame and the reference image frame, the high-frequency components that do not carry the main information of the current image frame and the reference image frame can be reduced and the noise signal can be filtered; by then dividing the processed current image frame and reference image frame into image blocks, the best matching block of each current sub-image block can be obtained more accurately. The motion vector obtained from this best matching block is therefore more accurate, and encoding the current image frame based on this motion vector reduces the size of the bit stream and improves image quality.
With reference to the first aspect, in a possible implementation, the current image frame is the original current image frame or an image frame obtained by scaling the original current image frame at least once with a scaling factor K, and the reference image frame is the original reference image frame or an image frame obtained by scaling the original reference image frame at least once with the scaling factor K.
Based on this scheme, the current image frame and the reference image frame can be scaled at least once to obtain current image frames and reference image frames of different sizes, and the current image frame and reference image frame of each size are down-sampled and up-sampled to reduce the high-frequency components that do not carry the main information and to filter the noise signal; the best matching block of each current sub-image block at each size can then be obtained more accurately from the processed current image frame and reference image frame.
With reference to the first aspect and the above possible implementation, in another possible implementation, within the search range corresponding to a current sub-image block, the distances between the plurality of reference sub-image blocks in the search range and the current sub-image block are respectively calculated; the distances between the plurality of reference sub-image blocks in the search range and the current sub-image block are regularized to obtain a plurality of candidate similarities; and the reference sub-image block corresponding to the smallest of the plurality of candidate similarities is determined as the matching block of the current sub-image block.
Based on this scheme, within the search range corresponding to the current sub-image block, by calculating the distances between the reference sub-image blocks and the current sub-image block, processing these distances, and determining the reference sub-image block corresponding to the smallest similarity as the matching block of the current sub-image block, the obtained matching block is ensured to be the best matching block of the current image block.
With reference to the first aspect and the above possible implementation, in another possible implementation, the smallest of the plurality of candidate similarities corresponding to each current sub-image block is the similarity corresponding to that current sub-image block.
With reference to the first aspect and the above possible implementation, in another possible implementation, the method further includes: determining, according to a first similarity corresponding to a first current sub-image block and a second similarity corresponding to a second current sub-image block, a target motion vector from a first motion vector corresponding to the first current sub-image block and a second motion vector corresponding to the second current sub-image block, where the first current sub-image block is an image block obtained by scaling the second current sub-image block.
With reference to the first aspect and the above possible implementation, in another possible implementation, the first similarity is compared with the second similarity; if the first similarity is less than or equal to the second similarity, the first motion vector is determined as the target motion vector, and if the first similarity is greater than the second similarity, the second motion vector is determined as the target motion vector.
Based on this scheme, since a set of motion vectors and similarities is obtained at each size when the current image frame and the reference image frame are scaled, the multiple sets of similarities of the current image block at different sizes need to be compared to determine the best motion vector. When determining the best motion vector, if the similarity of the large block is higher than that of the small block, the motion vector corresponding to the large block is taken as the best motion vector; if the similarity of the large block is lower than that of the small block, the motion vector corresponding to the small block is taken as the best motion vector. The best motion vector can therefore be selected across different sizes, further improving the accuracy of the motion vector. In addition, with this scheme the current image frame can be divided into image blocks of different sizes for encoding. It can be understood that a smaller similarity value indicates a higher similarity, and a larger similarity value indicates a lower similarity.
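As an illustration of this cross-scale selection, the sketch below compares, for one block position, the similarity obtained at a coarser level (the "large block") with the similarities of the co-located finer-level blocks, and keeps the coarse motion vector only when its similarity value is no larger. This is a minimal sketch, not code from the patent; the function and variable names are illustrative.

```python
# Minimal sketch of the cross-scale motion-vector selection described above.
# Assumption: smaller similarity values mean better matches, and one coarse
# (large) block covers several co-located fine (small) blocks.

def select_target_mv(coarse_sim, coarse_mv, fine_sims, fine_mvs):
    """Return one target motion vector per fine block.

    coarse_sim: similarity of the large (scaled) block
    coarse_mv:  (dx, dy) of the large block, already rescaled to fine resolution
    fine_sims:  similarities of the co-located small blocks
    fine_mvs:   (dx, dy) offsets of the co-located small blocks
    """
    selected = []
    for fine_sim, fine_mv in zip(fine_sims, fine_mvs):
        if coarse_sim <= fine_sim:       # large block matches at least as well
            selected.append(coarse_mv)
        else:                            # small block matches better
            selected.append(fine_mv)
    return selected

# Example: the coarse block beats two of its four sub-blocks.
print(select_target_mv(0.3, (4, 2),
                       [0.5, 0.2, 0.4, 0.1],
                       [(3, 2), (5, 1), (4, 3), (6, 0)]))
```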
With reference to the first aspect and the above possible implementation, in another possible implementation, the current image frame is encoded based on the target motion vector.
A second aspect of the embodiments of the present application provides an image processing device. The image processing device includes:
an acquisition module, configured to acquire a current image frame and a reference image frame;
a sampling module, configured to sequentially down-sample and up-sample the current image frame to obtain a processed current image frame, and to sequentially down-sample and up-sample the reference image frame to obtain a processed reference image frame;
a division module, configured to divide the processed current image frame and the processed reference image frame into a plurality of current sub-image blocks and a plurality of reference sub-image blocks respectively according to a preset division method;
a determination module, configured to determine, among the plurality of reference sub-image blocks, the reference sub-image block with the smallest similarity value with respect to each current sub-image block as the matching block of that current sub-image block;
the determination module being further configured to obtain, based on each current sub-image block and the matching block corresponding to that current sub-image block, a motion vector corresponding to the current sub-image block; and
an encoding module, configured to encode the current image frame based on the motion vector.
With reference to the second aspect, in a possible implementation, the current image frame is the original current image frame or an image frame obtained by scaling the original current image frame at least once with a scaling factor K, and the reference image frame is the original reference image frame or an image frame obtained by scaling the original reference image frame at least once with the scaling factor K.
With reference to the second aspect and the above possible implementation, in another possible implementation, the determination module is specifically configured to: within the search range corresponding to a current sub-image block, respectively calculate the distances between the plurality of reference sub-image blocks in the search range and the current sub-image block; regularize the distances between the plurality of reference sub-image blocks in the search range and the current sub-image block to obtain a plurality of candidate similarities; and determine the reference sub-image block corresponding to the smallest of the plurality of candidate similarities as the matching block of the current sub-image block.
With reference to the second aspect and the above possible implementation, in another possible implementation, the smallest of the plurality of candidate similarities corresponding to each current sub-image block is the similarity corresponding to that current sub-image block.
With reference to the second aspect and the above possible implementation, in another possible implementation, the determination module is further configured to determine, according to a first similarity corresponding to a first current sub-image block and a second similarity corresponding to a second current sub-image block, a target motion vector from a first motion vector corresponding to the first current sub-image block and a second motion vector corresponding to the second current sub-image block, where the first current sub-image block is an image block obtained by scaling the second current sub-image block.
With reference to the second aspect and the above possible implementation, in another possible implementation, the determination module is specifically configured to compare the first similarity with the second similarity, determine the first motion vector as the target motion vector if the first similarity is less than or equal to the second similarity, and determine the second motion vector as the target motion vector if the first similarity is greater than the second similarity.
With reference to the second aspect and the above possible implementation, in another possible implementation, the encoding module is specifically configured to encode the current image frame based on the target motion vector.
A third aspect of the embodiments of the present application provides a computer-readable storage medium. The computer-readable storage medium stores computer program instructions which, when run on a computer (for example, an image processing device), cause the computer to execute the image processing method described in any one of the above embodiments.
A fourth aspect of the embodiments of the present application provides a computer program product. The computer program product includes computer program instructions which, when executed on a computer (for example, an image processing device), cause the computer to execute the image processing method described in any one of the above embodiments.
A fifth aspect of the embodiments of the present application provides a computer program. When the computer program is executed on a computer (for example, an image processing device), the computer program causes the computer to execute the image processing method described in the above embodiments.
Brief Description of the Drawings
In order to explain the technical solutions in the present disclosure more clearly, the drawings needed in some embodiments of the present disclosure are briefly introduced below. Obviously, the drawings in the following description are only drawings of some embodiments of the present disclosure, and those of ordinary skill in the art can obtain other drawings from them. In addition, the drawings in the following description may be regarded as schematic diagrams and are not limitations on the actual size of the products, the actual flow of the methods, the actual timing of the signals, and the like involved in the embodiments of the present disclosure.
FIG. 1 is a structural diagram of an image encoder provided by an embodiment of the present application;
FIG. 2 is a flow chart of an image processing method provided by an embodiment of the present application;
FIG. 3 is an application diagram of an image processing method provided by an embodiment of the present application;
FIG. 4 is an application diagram of another image processing method provided by an embodiment of the present application;
FIG. 5 is an application diagram of another image processing method provided by an embodiment of the present application;
FIG. 6 is an application diagram of another image processing method provided by an embodiment of the present application;
FIG. 7 is an application diagram of another image processing method provided by an embodiment of the present application;
FIG. 8 is an application diagram of another image processing method provided by an embodiment of the present application;
FIG. 9 is an application diagram of another image processing method provided by an embodiment of the present application;
FIG. 10 is an application diagram of another image processing method provided by an embodiment of the present application;
FIG. 11 is an application diagram of another image processing method provided by an embodiment of the present application;
FIG. 12 is a flow chart of another image processing method provided by an embodiment of the present application;
FIG. 13 is an application diagram of another image processing method provided by an embodiment of the present application;
FIG. 14 is a structural diagram of an image processing device provided by an embodiment of the present application.
Detailed Description
The technical solutions in some embodiments of the present disclosure will be described clearly and completely below with reference to the drawings. Obviously, the described embodiments are only some of the embodiments of the present disclosure, not all of them. All other embodiments obtained by those of ordinary skill in the art based on the embodiments provided in the present disclosure fall within the protection scope of the present disclosure.
Unless the context requires otherwise, throughout the specification and claims the term "comprise" and its other forms, such as the third-person singular "comprises" and the present participle "comprising", are to be interpreted in an open, inclusive sense, that is, "including, but not limited to". In the description of the specification, the terms "one embodiment", "some embodiments", "exemplary embodiments", "example", "specific example" or "some examples" are intended to indicate that a particular feature, structure, material or characteristic related to the embodiment or example is included in at least one embodiment or example of the present disclosure. The schematic representations of the above terms do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials or characteristics described may be included in any one or more embodiments or examples in any suitable manner.
Hereinafter, the terms "first" and "second" are used for descriptive purposes only and cannot be understood as indicating or implying relative importance or implicitly specifying the number of the indicated technical features. Thus, a feature defined with "first" or "second" may explicitly or implicitly include one or more of that feature. In the description of the embodiments of the present disclosure, unless otherwise specified, "a plurality of" means two or more.
"A and/or B" includes the following three combinations: only A, only B, and the combination of A and B.
As used herein, the term "if" is optionally interpreted to mean "when" or "upon" or "in response to determining" or "in response to detecting", depending on the context. Similarly, depending on the context, the phrases "if it is determined that ..." or "if [the stated condition or event] is detected" are optionally interpreted to mean "upon determining ..." or "in response to determining ..." or "upon detection of [the stated condition or event]" or "in response to detection of [the stated condition or event]".
In addition, the use of "based on" is meant to be open and inclusive, in that a process, step, calculation or other action "based on" one or more stated conditions or values may, in practice, be based on additional conditions or values beyond those stated.
First, the terms used in the embodiments of the present disclosure are explained.
Down-sampling (also called sub-sampling) refers to shrinking an image frame. Through down-sampling, the image frame can be made to fit the size of the display area, or a thumbnail of the image frame can be generated. For example, performing c-fold down-sampling on an image frame with a size of M*N pixels yields an image frame of (M/c)*(N/c) pixels, where c is a common divisor of M and N.
Up-sampling (also called image interpolation) refers to enlarging an image frame, so that the image frame can be displayed on a display device with a higher resolution. Up-sampling almost always uses interpolation, that is, on the basis of the pixels of the original image frame, new elements are inserted between the pixels using a suitable interpolation algorithm.
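As a concrete illustration of these two operations, the sketch below performs c-fold down-sampling by block averaging and then up-samples back to the original size by pixel repetition; this down-then-up round trip is the low-pass preprocessing used later in the method. It is a minimal NumPy sketch under those assumptions, not code from the patent, and the function names are illustrative.

```python
import numpy as np

def downsample(frame: np.ndarray, c: int) -> np.ndarray:
    """c-fold down-sampling of an M*N frame by averaging each c*c block."""
    m, n = frame.shape
    return frame.reshape(m // c, c, n // c, c).mean(axis=(1, 3))

def upsample(frame: np.ndarray, c: int) -> np.ndarray:
    """c-fold up-sampling by repeating each pixel into a c*c block."""
    return np.repeat(np.repeat(frame, c, axis=0), c, axis=1)

frame = np.random.rand(1280, 1280).astype(np.float32)
processed = upsample(downsample(frame, 2), 2)   # same size, high frequencies reduced
print(frame.shape, processed.shape)             # (1280, 1280) (1280, 1280)
```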
Intra-frame coding refers to a coding method in which, during image compression, discrete cosine transform (DCT), zigzag scanning, quantization and variable length coding (VLC) are performed on an image frame.
Inter-frame coding refers to a coding method in which, during image compression, the temporal redundancy of video images is exploited to encode the motion vector and texture (prediction residual) information between image frames.
Next, the system architecture involved in the present disclosure is introduced.
FIG. 1 is a structural diagram, provided by an embodiment of the present disclosure, of an image encoder to which the method provided by the present disclosure is applied. The image encoder includes a residual calculation unit, a selection switch, a DCT module, a quantizer, a VLC module, a buffer, a rate control module, a dequantizer, an inverse DCT module, a frame storage module, and a motion estimation and compensation module.
As shown in FIG. 1, the image encoder receives image frames through an input interface. For example, an image frame may be one frame in a sequence of pictures forming a video or a video sequence. The image frame received by the image encoder may also be referred to as the current image frame or the image frame to be encoded. The image encoder can perform intra-frame coding or inter-frame coding on the image frames it receives.
The selection switch is used to select, when compressing an image, either the intra-frame coding compression method or the inter-frame coding compression method. For better compression, intra-frame coding is generally used for still images (pictures), and inter-frame coding is generally used for moving images (video).
The residual calculation unit is used to calculate a residual based on the current image frame and a predicted image frame. The DCT module is used to transform the image from the spatial domain to the frequency domain for image compression. In the spatial domain the content of images varies greatly, but in the frequency domain, statistical analysis of a large number of images shows that after an image is transformed by the DCT, the main components of its frequency coefficients are concentrated in a relatively small range and mainly located in the low-frequency part. According to these statistical characteristics of the image signal in the frequency domain, measures can be taken to discard the low-energy parts of the spectrum and keep the main frequency components as far as possible, so as to achieve the purpose of image data compression.
The quantizer is used to further process the frequency data produced by the DCT module so as to further compress the amount of data. Since the human eye has different sensitivities to different frequencies, the frequency data produced by the DCT module can be quantized, and the quantized DCT coefficient matrix will contain many zero values. In general the quotients of the data in the upper-left corner are non-zero, while the quotients of the data in the lower-right corner are very small and can be written as 0 after rounding to an integer. The many zero values in the coefficient matrix greatly reduce the amount of data: the main part of the image information is preserved while the image data is compressed.
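The effect described in the two paragraphs above can be reproduced in a few lines: a 2-D DCT of an 8*8 block concentrates most of the energy in the upper-left (low-frequency) coefficients, and dividing by a quantization step and rounding turns most of the remaining coefficients into zeros. This is only an illustrative sketch using SciPy's DCT; the quantization step of 16 is an arbitrary choice, not a value from the patent.

```python
import numpy as np
from scipy.fftpack import dct

# An 8*8 block with smoothly varying content (typical of natural images).
x, y = np.meshgrid(np.arange(8), np.arange(8))
block = 128 + 20 * np.sin(x / 4.0) + 10 * np.cos(y / 4.0)

# 2-D DCT: apply the 1-D DCT along rows and then along columns.
coeffs = dct(dct(block, type=2, norm='ortho', axis=0), type=2, norm='ortho', axis=1)

# Uniform quantization with an arbitrary step of 16: most coefficients become 0.
quantized = np.round(coeffs / 16.0).astype(int)
print(quantized)                      # non-zero values cluster in the upper-left corner
print(np.count_nonzero(quantized))    # only a few non-zero entries remain
```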
The VLC module is used to encode the quantized coefficient matrix. During encoding, the VLC module can convert the quantized coefficient matrix into a one-dimensional array through zigzag scanning. The tail of this one-dimensional array contains many "0"s, which can be replaced by another representation and restored during decoding so as to fill up the 64 positions of the matrix. Image data can therefore be further compressed by VLC encoding. For example, 00000000 can be represented as 80 and restored to 00000000 when decoding.
For still images, intra-frame encoding of the image data can be completed through the above DCT module, quantizer and VLC module, reducing the amount of image data.
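The zigzag scan and the long run of trailing zeros it produces can be sketched as below; this is an illustrative implementation of the standard anti-diagonal traversal, not the patent's own code, and the toy coefficient values are invented for the example.

```python
import numpy as np

def zigzag(block: np.ndarray):
    """Read an n*n block in zigzag order (anti-diagonals, alternating direction)."""
    n = block.shape[0]
    order = []
    for s in range(2 * n - 1):                       # s = i + j, one anti-diagonal at a time
        diag = [(i, s - i) for i in range(n) if 0 <= s - i < n]
        order.extend(diag if s % 2 else diag[::-1])  # alternate the traversal direction
    return [int(block[i, j]) for i, j in order]

quantized = np.zeros((8, 8), dtype=int)
quantized[0, 0], quantized[0, 1], quantized[1, 0] = 21, 3, -2   # toy low-frequency values

seq = zigzag(quantized)
trailing_zeros = len(seq) - len(np.trim_zeros(seq, trim='b'))
print(seq[:6], '... plus', trailing_zeros, 'trailing zeros')    # the zero run is what VLC compacts
```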
The buffer is used to temporarily store the compressed image data.
The rate control module is used to adjust the bit rate of the image according to the amount of data buffered in the buffer. The bit rate is higher when the image is more complex and lower when the image is simpler; the rate control module keeps the bit rate within a certain range.
The dequantizer and the inverse DCT module are used to restore the encoded image data to the image data before encoding, which serves as the reference image frame.
The frame storage module is used to store reference frames and motion vectors.
The motion estimation and compensation module is used to calculate the motion vector and the motion residual from the reference image frame and the current image frame.
For moving images, inter-frame encoding of the image data can be performed according to the reference frame, the motion vector and the motion residual, reducing the amount of image data.
At present, in the image processing of inter-frame coding, the matching block found by the search may not be the best matching block, so the resulting motion vector is inaccurate; an inaccurate motion vector leads to a larger bit stream, which occupies more storage space, and also degrades image quality. To solve this problem, an embodiment of the present application provides an image processing method that can find the best matching block and obtain a more accurate motion vector, thereby reducing the size of the bit stream and improving image quality.
图2为本申请实施例提供的一种图像处理方法的流程图,如图2所示,该方法包括步骤201-206。
201、获取当前图像帧和参考图像帧。
结合图1,当前图像帧为编码器接收的图像帧,参考图像帧为编码后的图像数据经去量化器和逆DCT模块恢复的图像帧。
可选的,当前图像帧可以为原始当前图像帧或采用缩放系数K对原始当前图像帧缩放至少一次后的图像帧,参考图像帧可以为原始参考图像帧或采用缩放系数K对原始参考图像帧缩放至少一次后的图像帧。本申请实施例对于当前图像帧和参考图像帧是否进行缩放并不限定,对于缩放系数K的具体取值并不限定,下述实施例以缩放系数K等于2为例进行示例性说明。
当前图像帧与参考图像帧能够缩放的次数与图像编码标准有关。例如,以图像编码标准为活动图像专家组(moving picture experts group-2,MPEG2)为例,由于MPEG2只支持8*8大小的图像块,因此不对当前图像帧与参考图像帧进行缩放,此时,当前图像帧为未经缩放的当前图像帧(也可以称为原始当前图像帧),参考图像帧为未经缩放的参考图像帧(也可以称为原始参考图像帧)。以图像编码标准为高效视频编码(high efficiency video coding,HEVC)为例,由于HEVC支持4*4、8*8、16*16、32*32、64*64的图像块,因此可以采用缩放系数为2对当前图像帧与参考图像帧进行4次缩放,得到5个层级的当前图像帧与参考图像帧,而且该5个层级的当前图像帧与参考图像帧的尺寸依次减小。本申请实施例对于采用何种编码标准进行图像编码、缩放系数的具体大小,以及具体缩放次数并不限定。为了方便说明,下述实施例以缩放系数为2,对当前图像帧和参考图像帧分别进行2次缩放为例进行说明。
如图3所示,以当前图像帧的大小为1280*1280,参考图像帧的大小为1280*1280,缩放系数K为2,缩放2次为例。首先,采用缩放系数2,对1280*1280的当前图像帧和参考图像帧分别进行一次缩放,得到640*640像素的当前图像帧和640*640像素的参考图像帧。然后,再采用缩放系数2,对640*640像素的当前图像帧和640*640的参考图像帧分别进行一次缩放,得到320*320像素的当前图像帧和320*320像素的参考图像帧。即,缩放次数为2时,可以得到三个层级的当前图像帧与参考图像帧,第一层为1280*1280像素的当前图像帧和1280*1280像素的参考图像帧,第二层为640*640像素的当前图像帧和640*640像素的参考图像帧,第三层为320*320像素的当前图像帧和320*320像素的参考图像帧。
202、将当前图像帧依次进行下采样和上采样,得到处理后的当前图像帧,将参考图像帧依次进行下采样和上采样,得到处理后的参考图像帧。
如图4所示，以当前图像帧与参考图像帧不进行缩放为例，当前图像帧x_n依次进行下采样和上采样，得到处理后的当前图像帧x_{n+1}；参考图像帧y_n依次进行下采样和上采样，得到处理后的参考图像帧y_{n+1}。
可选的，当参考图像帧经过多次缩放后，为了避免图像信息的丢失，缩放最多次得到的参考图像帧可以不进行下采样、上采样处理，直接将缩放最多次得到的参考图像帧作为处理后的参考图像帧。本申请实施例对于缩放最多次得到的参考图像帧是否进行下采样、上采样处理并不限定，下述实施例以对缩放最多次得到的参考图像帧不进行下采样、上采样处理为例进行示例性说明。
如图5所示，以当前图像帧为1280*1280像素的图像帧，参考图像帧为1280*1280像素的图像帧，缩放系数K为2，缩放2次为例。1280*1280像素的当前图像帧p_0依次进行下采样和上采样，得到处理后的当前图像帧p_1；640*640像素的当前图像帧p_n依次进行下采样和上采样，得到处理后的当前图像帧p_{n+1}；320*320像素的当前图像帧p_N依次进行下采样和上采样，得到处理后的当前图像帧p_{N+1}。1280*1280像素的参考图像帧q_0依次进行下采样和上采样，得到处理后的参考图像帧q_1；640*640像素的参考图像帧q_n依次进行下采样和上采样，得到处理后的参考图像帧q_{n+1}；320*320像素的参考图像帧q_N不进行下采样和上采样，而是直接将参考图像帧q_N作为处理后的参考图像帧q_{N+1}。
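与图5对应，下面的示意性片段演示步骤202对各层级图像帧的处理方式：各层级的当前图像帧均做“先下采样再上采样”的低通处理，缩放最多次得到的参考图像帧则直接作为处理后的参考图像帧；其中smooth的具体实现(均值池化加最近邻插值)以及cur_levels、ref_levels等变量名均为本示例的假设：

```python
import numpy as np

def smooth(frame, c=2):
    """先c倍下采样再c倍上采样，滤除不代表主要信息的高频部分(示意实现)。"""
    m, n = frame.shape
    small = frame.reshape(m // c, c, n // c, c).mean(axis=(1, 3))
    return np.repeat(np.repeat(small, c, axis=0), c, axis=1)

# 三个层级的当前图像帧与参考图像帧(此处用随机数据代替)
cur_levels = [np.random.rand(s, s) for s in (1280, 640, 320)]
ref_levels = [np.random.rand(s, s) for s in (1280, 640, 320)]

processed_cur = [smooth(lv) for lv in cur_levels]                          # p_0→p_1等
processed_ref = [smooth(lv) for lv in ref_levels[:-1]] + [ref_levels[-1]]  # q_N不处理
```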
本申请实施例提供的图像处理方法,通过对当前图像帧和参考图像帧进行至少一次缩放,得到不同尺寸的当前图像帧和参考图像帧,并对相应尺寸的当前图像帧和参考图像帧进行下采样和上采样处理,以减少当前图像帧和参考图像帧中不代表主要信息的高频部分,过滤噪声信号,再根据处理后的当前图像帧和参考图像帧能够较为准确的得到每个尺寸下每个当前子图像块的最佳匹配块。
203、根据预设划分方式分别将处理后的当前图像帧和处理后的参考图像帧划分为多个当前子图像块和多个参考子图像块。
对于不同的图像编码标准,可以采用不同的预设划分方式。如果图像编码标准支持一种划分方式,将该划分方式作为预设划分方式。如果图像编码标准支持多种划分方式,可以将包含像素数量最少的划分方式作为预设划分方式。本申请实施例对于预设划分方式具体采用哪种划分方式并不限定。
例如，以图像编码标准为MPEG2为例，由于MPEG2仅支持8*8像素的图像块，因此预设划分方式为采用8*8像素大小，将处理后的当前图像帧和处理后的参考图像帧划分为多个当前子图像块和多个参考子图像块。以图像编码标准为HEVC为例，由于HEVC支持4*4、8*8、16*16、32*32和64*64像素的图像块，因此预设划分方式可以采用4*4像素大小，将处理后的当前图像帧和处理后的参考图像帧划分为多个当前子图像块和多个参考子图像块。
如图6所示,以编码标准为HEVC为例,可以采用4*4像素的预设划分方式将处理后的当前图像帧划分为M个当前子图像块,将处理后的参考图像帧划分为M个参考子图像块。
多个当前子图像块的具体数量和多个参考子图像块的具体数量,与当前图像帧和参考图像帧的图像大小,以及预设划分方式有关。本申请实施例对于多个当前子图像块的具体数量和多个参考子图像块的具体数量并不限定。
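下面的示意性片段演示按4*4像素的预设划分方式把处理后的图像帧划分为多个子图像块(划分大小与图像尺寸均为示例假设)：

```python
import numpy as np

def split_blocks(frame, b=4):
    """按b*b像素的预设划分方式把图像帧划分为子图像块，返回形状(块数, b, b)。"""
    m, n = frame.shape
    return frame.reshape(m // b, b, n // b, b).swapaxes(1, 2).reshape(-1, b, b)

frame = np.random.rand(320, 320)
blocks = split_blocks(frame)     # 共(320/4)*(320/4)=6400个4*4当前子图像块
print(blocks.shape)              # (6400, 4, 4)
```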
204、在多个参考子图像块中确定与每个当前子图像块相似度最小的参考子图像块作为该当前子图像块的匹配块。
在多个参考子图像块中确定当前子图像块的匹配块可以包括步骤1-3。
步骤1、在当前子图像块对应的搜索范围内,分别计算该搜索范围内的多个参考子图像块与当前子图像块之间的距离。
每个当前子图像块对应一个搜索范围,不同子图像块对应的搜索范围可以不同,不同子图像块对应的搜索范围可以包括同一个图像块。本申请实施例对每个当前子图像块对应的搜索范围的大小和该搜索范围与当前子图像块的位置关系并不限定,该当前子图像块对应的搜索范围的大小与图像编码标准有关。
示例性的，以图像编码标准为HEVC，采用4*4像素的预设划分方式将处理后的当前图像帧划分为多个当前子图像块为例，当前子图像块对应的搜索范围可以为以该当前子图像块为中心、半径为16像素或32像素等的正方形区域。
可选的,可以通过均方误差(mean squared error,MSE)计算搜索范围内的多个参考子图像块与当前子图像块之间的距离:
D_{i,j} = MSE(K_i, Q_j)
其中，D_{i,j}表示参考子图像块与当前子图像块之间的距离，K_i表示当前子图像块，Q_j表示参考子图像块。
结合图6,如图7所示,以处理后的当前图像帧划分为M个当前子图像块为例,图7中的L个参考子图像块为当前子图像块对应的搜索范围内的图像块,通过MSE可以计算得到多个参考子图像块与当前子图像块之间的距离,距离越近,表示两个图像块之间的相似性越高。
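下面的示意性片段对应步骤1：在当前子图像块对应的搜索范围内，逐一计算各参考子图像块与当前子图像块之间的MSE距离。其中把搜索范围取为以当前子图像块的块坐标为中心、半径4个块(约合16像素)的正方形，候选取为预设划分得到的参考子图像块，这些取法均为本示例的假设：

```python
import numpy as np

def mse(a, b):
    """两个子图像块之间的均方误差，即距离D_{i,j} = MSE(K_i, Q_j)。"""
    return float(np.mean((a - b) ** 2))

def to_block_grid(frame, b=4):
    """把图像帧整理成(行块数, 列块数, b, b)的块网格。"""
    m, n = frame.shape
    return frame.reshape(m // b, b, n // b, b).swapaxes(1, 2)

def distances_in_search_range(cur_block, ref_grid, bi, bj, r=4):
    """在以块坐标(bi, bj)为中心、半径r个块的搜索范围内计算各距离。"""
    rows, cols = ref_grid.shape[:2]
    return {(i, j): mse(cur_block, ref_grid[i, j])
            for i in range(max(0, bi - r), min(rows, bi + r + 1))
            for j in range(max(0, bj - r), min(cols, bj + r + 1))}

cur_grid = to_block_grid(np.random.rand(320, 320))
ref_grid = to_block_grid(np.random.rand(320, 320))
d = distances_in_search_range(cur_grid[25, 25], ref_grid, 25, 25)
print(len(d), min(d.values()))
```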
步骤2、对搜索范围内的多个参考子图像块与当前子图像块之间的距离进行正则化处理,得到多个候选相似度。
本申请实施例对于正则化的具体处理方式并不限定。正则化处理得到的相似度越小,参考子图像块与当前子图像块之间的差异越小,参考子图像块与当前子图像块越相似。
示例性的，可以通过以下公式对搜索范围内的多个参考子图像块与当前子图像块之间的距离进行正则化处理：
[正则化公式，原文以公式图像PCTCN2023070405-appb-000001给出]
其中，S_{i,j}表示参考子图像块与当前子图像块之间的相似度，D_{i,j}表示参考子图像块与当前子图像块之间的距离，α为非0的参数，min_j D_{i,j}表示当前子图像块与多个参考子图像块的距离中最小的D_{i,j}。
如图8所示,以处理后的当前图像帧划分为M个当前子图像块,当前子图像块的搜索范围内包括L个参考子图像块为例,通过对多个参考子图像块与当前子图像块之间的距离进行正则化处理,得到多个候选相似度。
步骤3、将多个候选相似度中最小的相似度对应的参考子图像块确定为当前子图像块的匹配块。
可选的，可以通过以下公式确定当前子图像块的匹配块：
j_nn(i) = argmin_j S_{i,j}
其中，j_nn(i)表示当前子图像块对应的参考子图像块(匹配块)，argmin_j S_{i,j}表示在多个候选相似度中寻找最小相似度对应的参考子图像块。
如图9所示，以处理后的当前图像帧划分为M个当前子图像块，当前子图像块的搜索范围内包括L个参考子图像块为例，通过j_nn(i)可以在多个候选相似度中找到最小的相似度对应的参考子图像块，即图9中0.7、0.3和0.2等最小相似度各自对应的参考子图像块为相应当前子图像块的匹配块。
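下面的示意性片段把步骤2和步骤3串起来：先对距离做正则化得到候选相似度，再取最小相似度对应的参考子图像块作为匹配块。原文的正则化公式以公式图像给出，此处假设其形式为S_{i,j} = D_{i,j}/(min_j D_{i,j} + α)，该具体形式以及示例数值均为本文的假设，并非本公开公式的原文：

```python
def normalize(dists, alpha=1e-6):
    """对距离做正则化得到候选相似度S_{i,j}(具体公式形式为示例假设)。"""
    d_min = min(dists.values())
    return {j: d / (d_min + alpha) for j, d in dists.items()}

def best_match(dists, alpha=1e-6):
    """步骤3：j_nn(i) = argmin_j S_{i,j}，最小相似度对应的参考子图像块即匹配块。"""
    sims = normalize(dists, alpha)
    j_nn = min(sims, key=sims.get)
    return j_nn, sims[j_nn]

d = {(24, 25): 0.9, (25, 25): 0.4, (26, 25): 0.1}   # {参考子图像块坐标: 距离}，示例值
print(best_match(d))                                # 距离最小的(26, 25)被确定为匹配块
```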
本申请实施例提供的图像处理方法,通过计算参考子图像块与当前子图像块之间的距离并对距离进行处理,将最小的相似度对应的参考子图像块确定为当前子图像块的匹配块,能够确保得到的匹配块为当前图像块的最佳匹配块。
205、基于每个当前子图像块,以及该当前子图像块对应的匹配块,得到当前子图像块对应的运动矢量。
根据图像编码标准的不同，当前图像帧和参考图像帧可以缩放的次数不同，得到当前子图像块对应的运动矢量的数量也不同。本申请实施例对于具体采用的图像编码标准的类型，以及得到的运动矢量的具体数量并不限定。
如图10所示，上述步骤203和204可以由最近邻(patches nearest neighbors，PNN)模块执行，PNN模块的具体执行步骤可参考上述203和204的相关内容。以图像编码标准为MPEG2为例，当前图像帧和参考图像帧不进行缩放，根据当前子图像块，以及当前子图像块对应的匹配块，可以得到当前子图像块对应的1组运动矢量mv=(c_x, c_y)，其中(c_x, c_y)表示当前图像块与匹配块之间的相对坐标。
可选的,如果当前图像帧和参考图像帧可以缩放多次,在每种图像尺寸下,根据当前子图像块,以及该当前子图像块对应的匹配块可以得到该当前子图像块对应的一个运动矢量。因此,图像缩放多次后,对于每个当前子图像块可以得到多组运动矢量:
mv = (K^g·c_x, K^g·c_y)
其中，K表示缩放系数，g表示当前子图像块对应的缩放层级。
如图11所示，以当前图像帧和参考图像帧可以缩放2次，共3个缩放层级为例，对于第一层，运动矢量mv1=(K^1·c_x, K^1·c_y)。对于第二层，运动矢量mv2=(K^2·c_x, K^2·c_y)。对于第三层，运动矢量mv3=(K^3·c_x, K^3·c_y)。即在每一层得到当前子图像块与其匹配块之间的运动矢量(c_x, c_y)后，还需要将该运动矢量(c_x, c_y)乘以K^g，得到该当前图像块未压缩时对应的运动矢量。
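下面的示意性片段演示如何把某一缩放层级上得到的相对坐标(c_x, c_y)换算回未压缩图像对应的运动矢量；其中按上文公式把K的g次幂理解为第g层的放大倍数，K=2、层级数3以及示例坐标均为假设值：

```python
def to_full_resolution_mv(cx, cy, k, g):
    """mv = (K**g * c_x, K**g * c_y)：把第g层的相对坐标乘以K的g次幂。"""
    return (k ** g * cx, k ** g * cy)

k = 2
cx, cy = 3, -1                        # 某层级上匹配块相对当前子图像块的坐标(示例值)
for g in (1, 2, 3):                   # 对应图11中的第一层、第二层、第三层
    print(f"mv{g} =", to_full_resolution_mv(cx, cy, k, g))
```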
206、基于运动矢量对当前图像帧进行编码。
如果基于每个当前子图像块,以及该当前子图像块对应的匹配块,得到一组当前子图像块对应的运动矢量,基于该运动矢量对当前图像帧进行编码。
本申请实施例提供的图像处理方法,通过将当前图像帧和参考图像帧依次进行下采样和上采样处理,能够减少当前图像帧和参考图像帧中不代表主要信息的高频部分,过滤噪声信号,再通过对处理后的当前图像帧和参考图像帧划分图像块,能够较为准确的得到每个当前子图像块的最佳匹配块,因此,根据该最佳匹配块得到的运动矢量较准确,基于该运动矢量对当前图像帧进行编码时,能够减小比特流大小,提升图像质量。
如图12所示，如果基于每个当前子图像块，以及该当前子图像块对应的匹配块，得到多组当前子图像块对应的运动矢量，本申请实施例提供的图像处理方法除包括上述步骤201-206外，在步骤206之前还可以包括步骤207。
207、根据第一当前子图像块对应的第一相似度和第二当前子图像块对应的第二相似度，在第一当前子图像块对应的第一运动矢量和第二当前子图像块对应的第二运动矢量中确定目标运动矢量。其中，第一当前子图像块为第二当前子图像块进行缩放后的图像块。
可选的,第一当前子图像块可以是第二当前子图像块进行1次缩放后的图像块,第一当前子图像块也可以是第二当前子图像块进行多次缩放后的图像块。本申请实施例对于第一当前子图像块具体为第二当前子图像块进行几次缩放后的图像块并不限定。
第二当前子图像块可以为多个当前子图像块,每个当前子图像块可以对应一个第二相似度。本申请实施例对于第二当前子图像块具体包括的当前子图像块的数量并不限定,该第二当前子图像块包括的当前子图像块的数量与缩放系数等参数有关。
例如,如图3所示,以第一当前子图像块为320*320像素的当前图像帧中的图像块,第二当前子图像块为640*640像素的当前图像帧中的图像块,第一当前子图像块和第二当前子图像块的大小均为4*4为例,由于320*320像素的当前图像帧是640*640像素的当前图像帧进行一次缩放后的图像帧,320*320像素的当前图像帧较640*640像素的当前图像帧的尺寸小,因此320*320像素的当前图像帧中的一个4*4的第一当前子图像块对应640*640像素的当前图像帧中的4个4*4的第二当前子图像块。也就是说,4个4*4的第二当前子图像块经过一次缩放后可以得到一个4*4的第一当前子图像块。
比较第一相似度和第二相似度,在第一相似度小于或等于第二相似度的情况下,将第一运动矢量确定为目标运动矢量。在第一相似度大于第二相似度的情况下,将第二运动矢量确定为目标运动矢量。
如图13所示，以缩放系数K等于2，第一当前子图像块包括1个子图像块，第一相似度为S_0，第一当前子图像块为第二当前子图像块进行1次缩放后的图像块为例，第二当前子图像块包括4个子图像块，第二相似度包括S_1、S_2、S_3和S_4。依次比较第一相似度S_0和第二相似度S_1至S_4，在第一相似度S_0小于或等于第二相似度S_1至S_4的情况下，将第一运动矢量确定为目标运动矢量。在第一相似度S_0大于S_1至S_4中任一第二相似度的情况下，将第二运动矢量确定为目标运动矢量。
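下面的示意性片段对应图13的比较逻辑，其中相似度数值越小表示越相似；当S_0大于某个第二相似度时具体选用哪个第二运动矢量，原文未逐一展开，此处示例取第二相似度最小者，变量名与数值均为假设：

```python
def select_target_mv(s0, mv1, second):
    """second为[(第二相似度, 第二运动矢量), ...]，对应4个第二当前子图像块。
    S_0不大于全部第二相似度时取第一运动矢量，否则取第二相似度最小者对应的运动矢量。"""
    s_min, mv_min = min(second, key=lambda t: t[0])
    return mv1 if s0 <= s_min else mv_min

s0, mv1 = 0.25, (4, -2)                                # 第一当前子图像块及其运动矢量
second = [(0.30, (2, -1)), (0.45, (2, 0)),
          (0.28, (3, -1)), (0.33, (2, -2))]            # 4个第二当前子图像块
print(select_target_mv(s0, mv1, second))               # S_0最小，输出(4, -2)
```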
相应的,步骤206中可以基于步骤207确定的目标运动矢量对当前图像帧进行编码。
可选的，可以使用张量处理框架并行处理上述步骤203-207，以提高处理效率，例如Pytorch(python torch)、Tensorflow。本申请实施例对于具体采用什么类型的张量处理框架进行并行计算并不限定。
可选的,可以通过图形处理器(graphics processing unit,GPU)提高编码效率。
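作为用张量处理框架并行执行上述块划分与匹配、并可借助GPU加速的一个示意(以Pytorch为例；块大小、在整帧范围内做最近邻搜索而非限定搜索半径等均为本示例的简化假设)：

```python
import torch
import torch.nn.functional as F

def match_blocks(cur, ref, b=4,
                 device="cuda" if torch.cuda.is_available() else "cpu"):
    """把当前帧与参考帧划分为b*b块后，并行计算两两MSE距离并取每个当前块的最近邻。"""
    cur = cur.to(device)[None, None]                      # (1, 1, H, W)
    ref = ref.to(device)[None, None]
    cur_p = F.unfold(cur, kernel_size=b, stride=b)[0].T   # (当前块数, b*b)
    ref_p = F.unfold(ref, kernel_size=b, stride=b)[0].T   # (参考块数, b*b)
    dist = torch.cdist(cur_p, ref_p) ** 2 / (b * b)       # 两两MSE距离矩阵
    best = dist.argmin(dim=1)                             # 每个当前子图像块的匹配块索引
    return best, dist.gather(1, best[:, None]).squeeze(1)

idx, d = match_blocks(torch.rand(320, 320), torch.rand(320, 320))
print(idx.shape, d.shape)                                 # 各为(6400,)
```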
本申请实施例提供的图像处理方法由于对当前图像帧和参考图像帧进行缩放时,每个尺寸下都可以得到一组运动矢量和相似度,因此需要对当前图像块在不同尺寸下的多组相似度进行比较,并确定最佳的运动矢量。而且本方案在确定最佳运动矢量时,如果大块的相似度高于小块的相似度,将大块对应的运动矢量作为最佳运动矢量,如果大块的相似度低于小块的相似度,将小块对应的运动矢量作为最佳运动矢量,因此可以在不同尺寸下选择最佳的运动矢量,进一步提高运动矢量的准确度。另外,通过本方案可以将当前图像帧划分为不同大小的图像块进行编码。可以理解的,相似度数值越小表示相似度越高,相似度数值越大表示相似度越低。
本申请实施例提供一种图像处理装置,该装置可以为图像编码器,具体的,图像处理装置用于执行以上图像处理方法中的步骤201-207。本申请实施例提供的图像处理装置可以包括相应步骤所对应的模块。
本申请实施例可以根据上述方法示例对图像处理装置进行功能模块的划分,例如,可以对应各个功能划分各个功能模块,也可以将两个或两个以上的功能集成在一个处理模块中。上述集成的模块既可以采用硬件的形式实现,也可以采用软件功能模块的形式实现。本申请实施例中对模块的划分是示意性的,仅仅为一种逻辑功能划分,实际实现时可以有另外的划分方式。
在采用对应各个功能划分各个功能模块的情况下,图14示出上述实施例中所涉及图像处理装置的一种可能的结构示意图。如图14所示,图像处理装置1400可以包括获取模块1401、采样模块1402、划分模块1403、确定模块1404、编码模块1405。具体的,各模块功能如下:
获取模块1401,用于获取当前图像帧和参考图像帧。
采样模块1402,用于将当前图像帧依次进行下采样和上采样,得到处理后的当前图像帧,将参考图像帧依次进行下采样和上采样,得到处理后的参考图像帧。
划分模块1403,用于根据预设划分方式分别将处理后的当前图像帧和处理后的参考图像帧划分为多个当前子图像块和多个参考子图像块。
确定模块1404,用于在多个参考子图像块中确定与每个当前子图像块相似度最小的参考子图像块作为该当前子图像块的匹配块。
确定模块1404,还用于基于每个当前子图像块,以及该当前子图像块对应的匹配块,得到当前子图像块对应的运动矢量。
编码模块1405,用于基于运动矢量对当前图像帧进行编码。
在一种可行的实施方式中,当前图像帧为原始当前图像帧或采用缩放系数K对原始当前图像帧缩放至少一次后的图像帧,参考图像帧为原始参考图像帧或采用缩放系数K对原始参考图像帧缩放至少一次后的图像帧。
在一种可行的实施方式中,确定模块1404具体用于:在当前子图像块对应的搜索范围内,分别计算该搜索范围内的多个参考子图像块与当前子图像块之间的距离。对搜索范围内的多个参考子图像块与当前子图像块之间的距离进行正则化处理,得到多个候选相似度。将多个候选相似度中最小的相似度对应的参考子图像块确定为当前子图像块的匹配块。
在一种可行的实施方式中,每个当前子图像块对应的多个候选相似度中最小的相似度为该当前子图像块对应的相似度。
在一种可行的实施方式中,确定模块1404还用于根据第一当前子图像块对应的第一相似度和第二当前子图像块对应的第二相似度,在第一当前子图像块对应的第一运动矢量和第二当前子图像块对应的第二运动矢量中确定目标运动矢量。其中,第一当前子图像块为第二当前子图像块进行缩放后的图像块。
在一种可行的实施方式中,确定模块1404具体用于比较第一相似度和第二相似度,在第一相似度小于或等于第二相似度的情况下,将第一运动矢量确定为目标运动矢量。在第一相似度大于第二相似度的情况下,将第二运动矢量确定为目标运动矢量。
在一种可行的实施方式中,编码模块1405具体用于基于目标运动矢量对当前图像帧进行编码。
本申请的一些实施例提供了一种计算机可读存储介质(例如,非暂态计算机可读存储介质),该计算机可读存储介质中存储有计算机程序指令,计算机程序指令在计算机(例如,图像处理装置)上运行时,使得计算机执行如上述实施例中任一实施例所述的图像处理方法。
示例性的，上述计算机可读存储介质可以包括，但不限于：磁存储器件(例如，硬盘、软盘或磁带等)，光盘(例如，CD(Compact Disk，压缩盘)、DVD(Digital Versatile Disk，数字通用盘)等)，智能卡和闪存器件(例如，EPROM(Erasable Programmable Read-Only Memory，可擦写可编程只读存储器)、卡、棒或钥匙驱动器等)。本公开描述的各种计算机可读存储介质可代表用于存储信息的一个或多个设备和/或其它机器可读存储介质。术语“机器可读存储介质”可包括但不限于，无线信道和能够存储、包含和/或承载指令和/或数据的各种其它介质。
本公开的一些实施例还提供了一种计算机程序产品,例如该计算机程序产品存储在非瞬时性的计算机可读存储介质上。该计算机程序产品包括计算机程序指令,在计算机(例如,图像处理装置)上执行该计算机程序指令时,该计算机程序指令使计算机执行如上述实施例所述的图像处理方法。
本公开的一些实施例还提供了一种计算机程序。当该计算机程序在计算机(例如,图像处理装置)上执行时,该计算机程序使计算机执行如上述实施例所述的图像处理方法。
上述计算机可读存储介质、计算机程序产品及计算机程序的有益效果和上述一些实施例所述的图像处理方法的有益效果相同,此处不再赘述。
以上所述,仅为本公开的具体实施方式,但本公开的保护范围并不局限于此,任何熟悉本技术领域的技术人员在本公开揭露的技术范围内,想到变化或替换,都应涵盖在本公开的保护范围之内。因此,本公开的保护范围应以所述权利要求的保护范围为准。

Claims (14)

  1. 一种图像处理方法,所述方法包括:
    获取当前图像帧和参考图像帧;
    将所述当前图像帧依次进行下采样和上采样,得到处理后的当前图像帧,将所述参考图像帧依次进行下采样和上采样,得到处理后的参考图像帧;
    根据预设划分方式分别将所述处理后的当前图像帧和所述处理后的参考图像帧划分为多个当前子图像块和多个参考子图像块;
    在所述多个参考子图像块中确定与每个所述当前子图像块相似度最小的所述参考子图像块作为该当前子图像块的匹配块;
    基于每个所述当前子图像块,以及该当前子图像块对应的匹配块,得到所述当前子图像块对应的运动矢量;
    基于所述运动矢量对所述当前图像帧进行编码。
  2. 根据权利要求1所述的方法,所述当前图像帧为原始当前图像帧或采用缩放系数K对所述原始当前图像帧缩放至少一次后的图像帧,所述参考图像帧为原始参考图像帧或采用所述缩放系数K对所述原始参考图像帧缩放至少一次后的图像帧。
  3. 根据权利要求2所述的方法,所述在所述多个参考子图像块中确定与每个所述当前子图像块相似度最小的所述参考子图像块作为该当前子图像块的匹配块,包括:
    在所述当前子图像块对应的搜索范围内,分别计算该搜索范围内的多个所述参考子图像块与所述当前子图像块之间的距离;
    对所述搜索范围内的多个所述参考子图像块与所述当前子图像块之间的距离进行正则化处理,得到多个候选相似度;
    将所述多个候选相似度中最小的相似度对应的所述参考子图像块确定为所述当前子图像块的匹配块。
  4. 根据权利要求3所述的方法,每个所述当前子图像块对应的多个候选相似度中最小的相似度为该当前子图像块对应的相似度。
  5. 根据权利要求4所述的方法,所述方法还包括:
    根据第一当前子图像块对应的第一相似度和第二当前子图像块对应的第二相似度,在所述第一当前子图像块对应的第一运动矢量和所述第二当前子图像块对应的第二运动矢量中确定目标运动矢量;其中,所述第一当前子图像块为所述第二当前子图像块进行缩放后的图像块。
  6. 根据权利要求5所述的方法，所述根据第一当前子图像块对应的第一相似度和第二当前子图像块对应的第二相似度，在所述第一当前子图像块对应的第一运动矢量和所述第二当前子图像块对应的第二运动矢量中确定目标运动矢量，包括：
    比较所述第一相似度和所述第二相似度,在所述第一相似度小于或等于第二相似度的情况下,将所述第一运动矢量确定为所述目标运动矢量;在所述第一相似度大于所述第二相似度的情况下,将所述第二运动矢量确定为所述目标运动矢量。
  7. 根据权利要求5或6所述的方法,所述基于所述运动矢量对所述当前图像帧进行编码,包括:
    基于所述目标运动矢量对所述当前图像帧进行编码。
  8. 一种图像处理装置,所述装置包括:
    获取模块,用于获取当前图像帧和参考图像帧;
    采样模块,用于将所述当前图像帧依次进行下采样和上采样,得到处理后的当前图像帧,将所述参考图像帧依次进行下采样和上采样,得到处理后的参考图像帧;
    划分模块,用于根据预设划分方式分别将所述处理后的当前图像帧和所述处理后的参考图像帧划分为多个当前子图像块和多个参考子图像块;
    确定模块，用于在所述多个参考子图像块中确定与每个所述当前子图像块相似度最小的所述参考子图像块作为该当前子图像块的匹配块；
    确定模块,还用于基于每个所述当前子图像块,以及该当前子图像块对应的匹配块,得到所述当前子图像块对应的运动矢量;
    编码模块,用于基于所述运动矢量对所述当前图像帧进行编码。
  9. 根据权利要求8所述的装置,所述当前图像帧为原始当前图像帧或采用缩放系数K对所述原始当前图像帧缩放至少一次后的图像帧,所述参考图像帧为原始参考图像帧或采用所述缩放系数K对所述原始参考图像帧缩放至少一次后的图像帧。
  10. 根据权利要求9所述的装置,所述确定模块具体用于:
    在所述当前子图像块对应的搜索范围内,分别计算该搜索范围内的多个所述参考子图像块与所述当前子图像块之间的距离;
    对所述搜索范围内的多个所述参考子图像块与所述当前子图像块之间的距离进行正则化处理,得到多个候选相似度;
    将所述多个候选相似度中最小的相似度对应的所述参考子图像块确定为所述当前子图像块的匹配块。
  11. 根据权利要求10所述的装置，每个所述当前子图像块对应的多个候选相似度中最小的相似度为该当前子图像块对应的相似度。
  12. 根据权利要求11所述的装置,所述确定模块还用于根据第一当前子图像块对应的第一相似度和第二当前子图像块对应的第二相似度,在所述第一当前子图像块对应的第一运动矢量和所述第二当前子图像块对应的第二运动矢量中确定目标运动矢量;其中,所述第一当前子图像块为所述第二当前子图像块进行缩放后的图像块。
  13. 根据权利要求12所述的装置,所述确定模块具体用于比较所述第一相似度和所述第二相似度,在所述第一相似度小于或等于第二相似度的情况下,将所述第一运动矢量确定为所述目标运动矢量;在所述第一相似度大于所述第二相似度的情况下,将所述第二运动矢量确定为所述目标运动矢量。
  14. 根据权利要求12或13所述的装置,所述编码模块具体用于基于所述目标运动矢量对所述当前图像帧进行编码。
PCT/CN2023/070405 2022-01-25 2023-01-04 一种图像处理方法和装置 WO2023142926A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210086020.X 2022-01-25
CN202210086020.XA CN114531596A (zh) 2022-01-25 2022-01-25 图像处理方法和装置

Publications (1)

Publication Number Publication Date
WO2023142926A1 true WO2023142926A1 (zh) 2023-08-03

Family

ID=81622256

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/070405 WO2023142926A1 (zh) 2022-01-25 2023-01-04 一种图像处理方法和装置

Country Status (2)

Country Link
CN (1) CN114531596A (zh)
WO (1) WO2023142926A1 (zh)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114531596A (zh) * 2022-01-25 2022-05-24 京东方科技集团股份有限公司 图像处理方法和装置
CN114900691B (zh) * 2022-07-14 2022-10-28 浙江大华技术股份有限公司 编码方法、编码器及计算机可读存储介质

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103929648A (zh) * 2014-03-27 2014-07-16 华为技术有限公司 一种帧率上采样中的运动估计方法和装置
CN108702512A (zh) * 2017-10-31 2018-10-23 深圳市大疆创新科技有限公司 运动估计方法和装置
CN108833917A (zh) * 2018-06-20 2018-11-16 腾讯科技(深圳)有限公司 视频编码、解码方法、装置、计算机设备和存储介质
US20210084291A1 (en) * 2019-03-11 2021-03-18 Alibaba Group Holding Limited Inter coding for adaptive resolution video coding
CN113597764A (zh) * 2019-03-11 2021-11-02 阿里巴巴集团控股有限公司 用于自适应分辨率视频编码的帧间编码
CN112738517A (zh) * 2019-10-14 2021-04-30 珠海格力电器股份有限公司 运动估计搜索方法、装置、设备及存储介质
CN114531596A (zh) * 2022-01-25 2022-05-24 京东方科技集团股份有限公司 图像处理方法和装置

Also Published As

Publication number Publication date
CN114531596A (zh) 2022-05-24

Similar Documents

Publication Publication Date Title
WO2023142926A1 (zh) 一种图像处理方法和装置
JP3753578B2 (ja) 動きベクトル探索装置および方法
CN107318026B (zh) 视频编码器以及视频编码方法
CN108028941B (zh) 用于通过超像素编码和解码数字图像的方法和装置
Montrucchio et al. New sorting-based lossless motion estimation algorithms and a partial distortion elimination performance analysis
EP3146719B1 (en) Re-encoding image sets using frequency-domain differences
JP2004227519A (ja) 画像処理方法
AU2008245952B2 (en) Image compression and decompression using the pixon method
EP3104615B1 (en) Moving image encoding device and moving image encoding method
WO2019080892A1 (zh) 确定仿射编码块的运动矢量的方法和装置
JP2013537381A5 (zh)
CN108235020A (zh) 一种面向量化分块压缩感知的螺旋式逐块测量值预测方法
CN117014601A (zh) 用于发出虚拟边界和环绕运动补偿的信号的方法
US20150350670A1 (en) Coding apparatus, computer system, coding method, and computer product
EP2552115A1 (en) A method for coding a sequence of digital images
CN115836525B (zh) 用于从多个交叉分量进行预测的视频编码、解码方法和设备
Zhu et al. Video coding with spatio-temporal texture synthesis and edge-based inpainting
CN114651270A (zh) 通过时间可变形卷积进行深度环路滤波
Xia et al. Visual sensitivity-based low-bit-rate image compression algorithm
US11202082B2 (en) Image processing apparatus and method
JPH1023420A (ja) 動き検出方法および動き検出装置
CN114424528A (zh) 视频编码的运动补偿方法
US12003728B2 (en) Methods and systems for temporal resampling for multi-task machine vision
US7706440B2 (en) Method for reducing bit rate requirements for encoding multimedia data
WO2024078509A1 (en) Methods and non-transitory computer readable storage medium for pre-analysis based resampling compression for machine vision

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23745811

Country of ref document: EP

Kind code of ref document: A1

WWE Wipo information: entry into national phase

Ref document number: 18558741

Country of ref document: US