WO2023155032A1 - Image processing method and image processing apparatus - Google Patents

Image processing method and image processing apparatus

Info

Publication number
WO2023155032A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
scale
feature map
feature
weight
Application number
PCT/CN2022/076277
Other languages
French (fr)
Chinese (zh)
Inventor
林永兵
张培科
马莎
万蕾
Original Assignee
华为技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Application filed by 华为技术有限公司 (Huawei Technologies Co., Ltd.)
Priority to PCT/CN2022/076277
Publication of WO2023155032A1


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis

Definitions

  • the present application relates to the field of image processing, and in particular to an image processing method and an image processing device.
  • the camera has the characteristics of high resolution, non-contact sensing, convenient use and low cost, and has a wide range of applications in the field of environmental perception.
  • more and more cameras are installed on vehicles, which can achieve blind-spot-free coverage.
  • users can observe any position of the vehicle; the cameras can also support machine vision artificial intelligence (AI) processing, such as object recognition, semantic segmentation, traffic light detection and lane line detection, which is conducive to ensuring the safety of automatic driving.
  • the video output by the camera requires more and more transmission bandwidth.
  • the encoder can compress the image or video output by the camera at a relatively high compression rate, which often uses lossy image or video compression technology and inevitably leads to image or video quality damage. If the image or video is severely damaged, it will affect the accuracy of subsequent tasks such as object detection and semantic segmentation.
  • the image compressed by the encoder can be reconstructed by the decoder to obtain a reconstructed image.
  • the quality of the reconstructed image is of key significance for subsequent tasks, so a high-quality reconstructed image is particularly important. However, the quality of the reconstructed image cannot be guaranteed because the compressed image is damaged.
  • the present application provides an image processing method and an image processing device, which are used to obtain the similarity between the reconstructed image and the original image to evaluate the quality of the reconstructed image.
  • in a first aspect, an image processing method is provided, comprising: acquiring a first image and a second image; respectively inputting the first image and the second image into a single-layer convolutional neural network to obtain the first-scale feature map of the first image and the first-scale feature map of the second image, where the single-layer convolutional neural network includes C convolution kernels and the number of first-scale feature maps of the first image and the number of first-scale feature maps of the second image are both C; respectively performing M-1 downsamplings on the first-scale feature map of the first image and the first-scale feature map of the second image to obtain feature maps at M-1 further scales of the first image and feature maps at M-1 further scales of the second image, where C and M are both positive integers greater than 1; and determining the similarity of the first image relative to the second image according to the first-scale feature map of the first image, the first-scale feature map of the second image, the feature maps at the M-1 further scales of the first image, the feature maps at the M-1 further scales of the second image, and a weight matrix related to the M scales and the number C of feature maps.
  • the first image and the second image are two different images.
  • when the first image is an original image, the second image may be a reconstructed image; when the first image is a reconstructed image, the second image may be an original image.
  • the parameters of the single-layer convolutional neural network come from the parameters of the first convolutional layer of a pre-trained model, where the pre-trained model is a pre-trained convolutional neural network model.
  • a single-layer convolutional neural network may include C convolution kernels, and different convolution kernels are used to extract different features and obtain different feature maps. These different feature maps can be referred to as first-scale feature maps.
  • the image processing device may perform the first of the M-1 downsampling operations on the first-scale feature map of the first image to obtain the second-scale feature map of the first image, then perform the second downsampling on the second-scale feature map of the first image to obtain the third-scale feature map of the first image, and so on, until the Mth-scale feature map of the first image is obtained.
  • in other words, the image processing device performs M-1 downsamplings on the first-scale feature map of the first image to obtain the feature maps at M-1 further scales of the first image; similarly, the image processing device can obtain the feature maps at M-1 further scales of the second image.
  • since the number of first-scale feature maps of the first image is C, downsampling them yields the second-scale feature maps of the first image, whose number is also C; the number of feature maps at every other scale of the first image is likewise C. That is, each of the M-1 further scales of the first image contains C feature maps, and the number of feature maps at each of the M-1 further scales of the second image is also C.
  • for example, downsampling the first feature map among the first-scale feature maps of the first image yields the first feature map among the second-scale feature maps of the first image; the rest follow in the same way and are not repeated here. A minimal sketch of this pipeline follows below.
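  • The following is a minimal sketch of the pipeline described above (assuming PyTorch and average-pooling downsampling; the sizes follow the C = 64, M = 5, 7×7 stride-1 example given later in this publication, and the helper name feature_pyramid is ours):

        import torch
        import torch.nn.functional as F

        C, M = 64, 5  # number of convolution kernels and number of scales (example values)

        # Single-layer convolutional neural network: C kernels of 7x7, stride 1 (no downsampling)
        conv = torch.nn.Conv2d(in_channels=3, out_channels=C, kernel_size=7, stride=1, padding=3)

        def feature_pyramid(img):
            """Return [first-scale maps, ..., Mth-scale maps], each tensor holding C feature maps."""
            maps = [conv(img)]                                      # first-scale feature maps, (N, C, H, W)
            for _ in range(M - 1):                                  # M-1 downsamplings
                maps.append(F.avg_pool2d(maps[-1], kernel_size=2))  # halve H and W at each scale
            return maps

        x = torch.rand(1, 3, 256, 256)  # first image (e.g., the original image)
        y = torch.rand(1, 3, 256, 256)  # second image (e.g., the reconstructed image)
        fx, fy = feature_pyramid(x), feature_pyramid(y)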
  • the image processing method provided by this application uses a single-layer convolutional neural network to obtain the first-scale feature maps; it needs only a single convolution pass, so the calculation is simple.
  • extracting the underlying general features through a single convolution layer is conducive to ensuring the multi-task generalization ability of the image features, and the high resolution of the first-scale feature maps helps ensure the quality of the image.
  • multiple downsamplings of the high-resolution feature maps can obtain more image feature information.
  • the weight matrix assigns weights to different feature maps at different scales, flexibly adjusting the weight values across the image-feature and scale dimensions; this optimizes machine vision performance and achieves efficient machine-vision-oriented image coding and compression, which is conducive to using the similarity of the first image relative to the second image to evaluate the quality of the first image.
  • in an optional embodiment, the above method also includes: determining the weight coefficients of the first-scale feature maps according to the peak signal-to-noise ratio of the first-scale feature maps, where the first-scale feature maps include the first-scale feature map of the first image and the first-scale feature map of the second image; and determining the similarity of the first image relative to the second image according to the weight coefficients of the first-scale feature maps, the first-scale feature map of the first image, the first-scale feature map of the second image, the feature maps at the M-1 further scales of the first image, the feature maps at the M-1 further scales of the second image, and the weight matrix related to the M scales and the number C of feature maps.
  • since the number of first-scale feature maps of the first image and the number of first-scale feature maps of the second image are both C, the number of peak signal-to-noise ratios of the first-scale feature maps is C, and the image processing device can obtain the weight coefficient of each first-scale feature map based on its peak signal-to-noise ratio.
  • weighting based on the peak signal-to-noise ratio of the first-scale feature maps helps keep the color, brightness and other features of the first image and the second image consistent; obtaining the weight coefficient of each first-scale feature map from its peak signal-to-noise ratio and weighting each feature map with that coefficient can enhance the detail contribution of the feature maps, improve the performance of both human vision and machine vision, and take into account the needs of both. A sketch of this weighting follows below.
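  • The following is a sketch of the PSNR-based weighting (assuming PyTorch; the exact mapping from PSNR_i to the weight coefficient is not spelled out in this excerpt, so the per-channel peak estimate and the normalization below are our assumptions):

        import torch

        def per_channel_psnr(fx1, fy1):
            """PSNR_i between the i-th first-scale feature maps of the two images.
            fx1, fy1: (N, C, H, W) first-scale feature maps."""
            mse = ((fx1 - fy1) ** 2).mean(dim=(0, 2, 3))              # per-channel MSE, shape (C,)
            peak = fx1.amax(dim=(0, 2, 3)) - fx1.amin(dim=(0, 2, 3))  # assumed per-channel dynamic range
            return 10 * torch.log10(peak ** 2 / (mse + 1e-12))

        def psnr_weights(fx1, fy1):
            psnr = per_channel_psnr(fx1, fy1)
            return psnr / psnr.sum()  # assumption: normalize so the C weights sum to 1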
  • In the similarity formula: ASSIM is the similarity; x is the first image; y is the second image; f(·) is the convolution operation of the single-layer convolutional neural network; i is the i-th convolution kernel in the single-layer convolutional neural network, 1 ≤ i ≤ C, which may also be called the i-th channel (this is not limited in this application); j is the j-th scale; w_i(f_1^i(x), f_1^i(y)) is the weight coefficient of the feature map of the first scale; ω_ij is the weight coefficient in row i, column j of the weight matrix; f_1^i(x) is the i-th feature map among the first-scale feature maps of the first image; f_1^i(y) is the i-th feature map among the first-scale feature maps of the second image; and PSNR_i is the peak signal-to-noise ratio of the i-th feature map among the first-scale feature maps.
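  • The equation itself is rendered as an image in the original publication and is not reproduced in this text. Based purely on the symbol definitions above, a plausible form (our assumption, taking SSIM as the per-scale comparison and writing f_j^i(·) for the i-th feature map at the j-th scale) is:

        ASSIM(x, y) = \sum_{i=1}^{C} w_i(f_1^i(x), f_1^i(y)) \sum_{j=1}^{M} \omega_{ij} \, \mathrm{SSIM}(f_j^i(x), f_j^i(y))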
  • the above weight matrix is a matrix of C rows and M columns; the weight coefficients of each row in the weight matrix represent the weight coefficients of a feature map at different scales, and the sum of the weight coefficients of each row in the weight matrix is 1.
  • the weight coefficient of each row in the weight matrix is the weight coefficient of the feature map corresponding to the same channel (or the feature map corresponding to the same convolution kernel) at different scales, and the sum of the weight coefficients of the feature maps corresponding to the same channel at different scales is 1 .
  • optionally, each weight coefficient within a row of the weight matrix is the same.
  • making each weight coefficient within a row the same is the simplest implementation, and the calculation is simple, which is conducive to improving the efficiency of calculating the similarity.
  • optionally, M = 5, and the weight coefficients of each row in the weight matrix are the same: [0.2 0.2 0.2 0.2 0.2].
  • in an optional embodiment, the above C convolution kernels are divided into a high-frequency feature group, an intermediate-frequency feature group and a low-frequency feature group according to the characteristic frequency, and the feature maps of the M scales are divided into low-scale feature maps, medium-scale feature maps and high-scale feature maps according to the resolution of the feature maps of the M scales. The distribution of the weight coefficients of the feature maps corresponding to the high-frequency, intermediate-frequency and low-frequency feature groups at different scales in the weight matrix satisfies the following conditions: the non-zero weight coefficients of the feature maps corresponding to the high-frequency feature group are distributed on the low-scale feature maps of the first image and the second image, and the weight coefficients of their high-scale feature maps are 0; the non-zero weight coefficients of the feature maps corresponding to the low-frequency feature group are distributed on the low-scale, medium-scale and high-scale feature maps; and the non-zero weight coefficients of the feature maps corresponding to the intermediate-frequency feature group are distributed over more scales than those of the high-frequency feature group but fewer than those of the low-frequency feature group.
  • the image processing device can sort the C convolution kernels according to the characteristic frequency from high to low or from low to high, and then partition the sorted convolution kernels into the high-frequency feature group, the intermediate-frequency feature group and the low-frequency feature group.
  • the feature map corresponding to the high-frequency feature group can be understood as the feature map corresponding to each convolution kernel in the high-frequency feature group, and the feature map corresponding to each convolution kernel includes feature maps of M scales.
  • the feature map corresponding to the intermediate frequency feature group can be understood as the feature map corresponding to each convolution kernel in the intermediate frequency feature group, and the feature map corresponding to each convolution kernel includes feature maps of M scales.
  • the feature map corresponding to the low-frequency feature group can be understood as the feature map corresponding to each convolution kernel in the low-frequency feature group, and the feature map corresponding to each convolution kernel includes feature maps of M scales.
  • the image processing device can sort the feature maps of the M scales according to their resolution from high to low or from low to high, and then divide the sorted feature maps of the M scales into low-scale feature maps, medium-scale feature maps and high-scale feature maps.
  • the weight matrix can assign weight coefficients to different features and scales, letting different features play their roles. For the features corresponding to the high-frequency feature group, higher weights are allocated on the low-scale feature maps. For the features corresponding to the low-frequency feature group, the ability to extract contour information at high scales is considered, and the weights are distributed evenly across different scales. The features corresponding to the intermediate-frequency feature group fall between the high-frequency and low-frequency feature groups: they capture high-frequency details while also taking contour information into account, which helps to improve the performance of machine vision.
  • optionally, each of the non-zero weight coefficients of the feature maps corresponding to the high-frequency feature group is the same; each of the non-zero weight coefficients of the feature maps corresponding to the low-frequency feature group is the same; and each of the non-zero weight coefficients of the feature maps corresponding to the intermediate-frequency feature group is the same.
  • the feature map corresponding to each convolution kernel in the high-frequency feature group, low-frequency feature group, and intermediate-frequency feature group includes M scales.
  • the non-zero weight coefficients of the feature maps corresponding to the high-frequency feature groups are distributed in the low-scale feature maps of the M scales, and each weight coefficient of the low-scale feature maps is the same.
  • the non-zero weight coefficients of the feature maps corresponding to the low-frequency feature groups are distributed in the low-scale feature maps, medium-scale feature maps, and high-scale feature maps (that is, M scales), and each weight coefficient of the feature maps of M scales are the same. The same is true for the intermediate frequency feature group, which will not be repeated here.
  • for example, when M = 5, the weight coefficients of the first column of the weight matrix represent the weight coefficients of the low-scale feature maps, the weight coefficients of the second and third columns represent the weight coefficients of the medium-scale feature maps, and the weight coefficients of the fourth and fifth columns represent the weight coefficients of the high-scale feature maps.
  • the convolution kernels corresponding to the high-frequency feature group are used to extract the texture features of the image, the convolution kernels corresponding to the low-frequency feature group can be used to extract the smooth features of the image, and the convolution kernels corresponding to the intermediate-frequency feature group can be used to extract the edge features of the image.
  • the weight matrix can assign weight coefficients to different features and scales, letting different features play their roles. For detail features, a weight of 1 is assigned on the high-resolution first-scale feature map. For smooth features, the ability to extract contour information at high scales is considered, and the weights are distributed evenly across different scales. Edge features fall between detail features and smooth features: they consider both high-frequency details and contour information, which helps to improve the performance of machine vision. A hedged sketch of such a weight matrix follows below.
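  • The following is a hedged sketch of such a weight matrix for C = 64 and M = 5 (the 8/24/32 group sizes come from FIG. 7 later in this publication; the non-zero values for the intermediate-frequency rows are our assumption, chosen only to satisfy the stated conditions that each row sums to 1 and that the group spans more scales than the high-frequency group and fewer than the low-frequency group):

        import numpy as np

        C, M = 64, 5
        W = np.zeros((C, M))
        W[:8]   = [1.0, 0.0, 0.0, 0.0, 0.0]     # high-frequency group: weight 1 on the low-scale (first) column
        W[8:32] = [1/3, 1/3, 1/3, 0.0, 0.0]     # intermediate-frequency group: assumed low + medium scales
        W[32:]  = [0.2, 0.2, 0.2, 0.2, 0.2]     # low-frequency group: weights spread evenly over all scales
        assert np.allclose(W.sum(axis=1), 1.0)  # each row sums to 1, as required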
  • optionally, the above characteristic frequency is determined by the modulus of the convolution kernels of the single-layer convolutional neural network and/or by the scaling factor and normalization coefficient of the normalization layer of the single-layer convolutional neural network.
  • in an optional embodiment, the first image is the input of the codec and the second image is the output of the codec; the codec is used to compress and reconstruct the first image to output the second image, and the similarity is used to optimize the codec.
  • the first image, as the input of the codec, can also be called the original image; the second image, as the output of the codec, can also be called the reconstructed image. The image processing device can use the above method to obtain the similarity of the original image relative to the reconstructed image in order to optimize the codec.
  • the image processing method provided in this application can be used to obtain the similarity between the reconstructed image and the original image to evaluate the quality of the reconstructed image, so as to guide the optimization of the codec.
  • in another optional embodiment, the above first image is an image obtained after image signal processing of a third image; the third image is the input of the codec and the second image is the output of the codec; the codec is used to perform image signal processing, compression and reconstruction on the third image to output the second image, and the similarity is used to optimize the codec.
  • the first image, being the third image after image signal processing, may also be called the original image; the second image, as the output of the codec, may also be called the reconstructed image. The image processing device can use the above method to obtain the similarity of the original image relative to the reconstructed image for optimizing the codec.
  • in a second aspect, an image processing device is provided, including an acquisition module and a processing module.
  • the acquisition module is used to acquire the first image and the second image.
  • the processing module is used to: respectively input the first image and the second image into a single-layer convolutional neural network to obtain the first-scale feature map of the first image and the first-scale feature map of the second image, where the single-layer convolutional neural network includes C convolution kernels and the number of first-scale feature maps of the first image and the number of first-scale feature maps of the second image are both C; respectively perform M-1 downsamplings on the first-scale feature map of the first image and the first-scale feature map of the second image to obtain feature maps at M-1 further scales of the first image and feature maps at M-1 further scales of the second image, where C and M are both positive integers greater than 1; and determine the similarity of the first image relative to the second image according to the first-scale feature map of the first image, the first-scale feature map of the second image, the feature maps at the M-1 further scales of the first image and of the second image, and the weight matrix.
  • in an optional embodiment, the above processing module is further configured to: determine the weight coefficients of the first-scale feature maps according to the peak signal-to-noise ratio of the first-scale feature maps, where the first-scale feature maps include the first-scale feature map of the first image and the first-scale feature map of the second image; and determine the similarity according to the weight coefficients of the first-scale feature maps, the first-scale feature map of the first image, the first-scale feature map of the second image, the feature maps at the M-1 further scales of the first image, the feature maps at the M-1 further scales of the second image, and the weight matrix.
  • In the similarity formula: ASSIM is the similarity; x is the first image; y is the second image; f(·) is the convolution operation of the single-layer convolutional neural network; i is the i-th convolution kernel in the single-layer convolutional neural network, 1 ≤ i ≤ C; j is the j-th scale; w_i(f_1^i(x), f_1^i(y)) is the weight coefficient of the feature map of the first scale; ω_ij is the weight coefficient in row i, column j of the weight matrix; f_1^i(x) is the i-th feature map among the first-scale feature maps of the first image; f_1^i(y) is the i-th feature map among the first-scale feature maps of the second image; and PSNR_i is the peak signal-to-noise ratio of the i-th feature map among the first-scale feature maps.
  • the above weight matrix is a matrix with C rows and M columns, and the weight coefficient of each row in the weight matrix is used to represent the weight coefficient of the feature map at different scales, The sum of the weight coefficients of each row in the weight matrix is 1.
  • optionally, each weight coefficient within a row of the weight matrix is the same.
  • optionally, M = 5, and the weight coefficients of each row in the weight matrix are the same: [0.2 0.2 0.2 0.2 0.2].
  • in an optional embodiment, the above C convolution kernels are divided into a high-frequency feature group, an intermediate-frequency feature group and a low-frequency feature group according to the characteristic frequency, and the feature maps of the M scales are divided into low-scale feature maps, medium-scale feature maps and high-scale feature maps according to the resolution of the feature maps of the M scales. The distribution of the weight coefficients of the feature maps corresponding to the high-frequency, intermediate-frequency and low-frequency feature groups at different scales in the weight matrix satisfies the following conditions: the non-zero weight coefficients of the feature maps corresponding to the high-frequency feature group are distributed on the low-scale feature maps of the first image and the second image, and the weight coefficients of their high-scale feature maps are 0; the non-zero weight coefficients of the feature maps corresponding to the low-frequency feature group are distributed on the low-scale, medium-scale and high-scale feature maps; and the non-zero weight coefficients of the feature maps corresponding to the intermediate-frequency feature group are distributed over more scales than those of the high-frequency feature group but fewer than those of the low-frequency feature group.
  • optionally, each of the non-zero weight coefficients of the feature maps corresponding to the high-frequency feature group is the same; each of the non-zero weight coefficients of the feature maps corresponding to the low-frequency feature group is the same; and each of the non-zero weight coefficients of the feature maps corresponding to the intermediate-frequency feature group is the same.
  • optionally, when M = 5, the weight coefficients of the first column of the weight matrix represent the weight coefficients of the low-scale feature maps, the weight coefficients of the second and third columns represent the weight coefficients of the medium-scale feature maps, and the weight coefficients of the fourth and fifth columns represent the weight coefficients of the high-scale feature maps.
  • optionally, the above characteristic frequency is determined by the modulus of the convolution kernels of the single-layer convolutional neural network and/or by the scaling factor and normalization coefficient of the normalization layer of the single-layer convolutional neural network.
  • in an optional embodiment, the above first image is the input of the codec and the second image is the output of the codec; the codec is used to compress and reconstruct the first image to output the second image, and the similarity is used to optimize the codec.
  • in another optional embodiment, the above first image is an image obtained after image signal processing of a third image; the third image is the input of the codec and the second image is the output of the codec; the codec is used to perform image signal processing, compression and reconstruction on the third image to output the second image, and the similarity is used to optimize the codec.
  • in another aspect, an image processing device is provided, including a processor and a memory; the memory is used to store a computer program, and the processor is used to call and run the computer program from the memory, so that the device performs the method in any possible implementation manner of the first aspect above.
  • optionally, there are one or more processors, and one or more memories.
  • the memory may be integrated with the processor, or the memory may be set separately from the processor.
  • optionally, the device also includes a transmitter and a receiver.
  • the transmitter and receiver can be set separately or integrated together as a transceiver.
  • the present application provides a processor, including: an input circuit, an output circuit, and a processing circuit.
  • the processing circuit is configured to receive a signal through the input circuit and transmit a signal through the output circuit, so that the processor executes the method in any possible implementation manner of the first aspect above.
  • the above processor can be a chip, the input circuit can be an input pin, the output circuit can be an output pin, and the processing circuit can be transistors, gate circuits, flip-flops and various logic circuits.
  • the input signal received by the input circuit may be received and input by, for example but not limited to, a receiver; the signal output by the output circuit may be output to, for example but not limited to, a transmitter and transmitted by the transmitter; and the input circuit and the output circuit may be the same circuit, used as an input circuit and an output circuit at different times.
  • the present application does not limit the specific implementation manners of the processor and various circuits.
  • a computer-readable storage medium is provided, which stores a computer program (also referred to as code, or instructions) that, when run on a computer, causes the computer to perform the method in any possible implementation manner of the first aspect above.
  • a computer program product is provided, including a computer program (also referred to as code, or instructions) that, when executed, causes the computer to perform the method in any possible implementation manner of the first aspect above.
  • FIG. 1 is a schematic diagram of an application scenario applicable to an embodiment of the present application
  • FIG. 2 is a schematic block diagram of image compression.
  • FIG. 3 is a schematic block diagram of another image compression provided by the embodiment of the present application.
  • FIG. 4 is a schematic flowchart of an image processing method provided in an embodiment of the present application.
  • FIG. 5 is a schematic block diagram of a downsampling provided by an embodiment of the present application.
  • FIG. 6 is a schematic block diagram of calculating similarity provided by an embodiment of the present application.
  • FIG. 7 is a schematic diagram of a feature map grouping provided by an embodiment of the present application.
  • FIG. 8 is a schematic diagram of a simulation provided by an embodiment of the present application.
  • FIG. 9 is another schematic diagram of simulation provided by the embodiment of the present application.
  • FIG. 10 is another schematic diagram of simulation provided by the embodiment of the present application.
  • FIG. 11 is another schematic diagram of simulation provided by the embodiment of the present application.
  • FIG. 12 is a schematic block diagram of an image processing device provided by an embodiment of the present application.
  • FIG. 13 is a schematic block diagram of another image processing apparatus provided by an embodiment of the present application.
  • the term "and/or" in the embodiment of the present application is only an association relationship describing associated objects, which means that there may be three relationships, for example, A and/or B, which can mean: A exists alone, and at the same time There are A and B, and there are three cases of B alone, where A and B can be singular or plural.
  • the character "/" in this article generally indicates that the related objects are an “or” relationship, but it may also indicate an “and/or” relationship, which can be understood by referring to the context.
  • "At least one" means one or more, and "multiple" means two or more.
  • "At least one of the following" or similar expressions refer to any combination of the listed items, including any combination of single items or plural items.
  • for example, at least one item (piece) of a, b, or c can represent: a, b, c, a-b, a-c, b-c, or a-b-c, where a, b and c can be single or multiple.
  • FIG. 1 shows a schematic diagram of a vehicle 100.
  • the vehicle 100 includes a camera 101, a display device 102 and a mobile data computing platform (mobile data center, MDC) 103. Communication between the camera 101 and the display device 102, and between the camera 101 and the MDC 103, can be performed through a network.
  • the vehicle 100 may include multiple cameras including the camera 101 , and the embodiment of the present application only uses the camera 101 as an example for illustration, but this is not limited thereto.
  • MDC 103 can include at least one processor that can execute instructions stored in a computer-readable medium such as memory.
  • the MDC 103 may also be a plurality of computing devices that control individual components or subsystems of the vehicle 100 in a distributed manner.
  • the processor can be any conventional processor, such as a central processing unit (central processing unit, CPU).
  • CPU central processing unit
  • the processor may also include a graphics processor (graphic process unit, GPU), a field programmable gate array (field programmable gate array, FPGA), a system on chip (system on chip, SOC), an application-specific integrated chip (application specific integrated circuit, ASIC) or their combination.
  • memory may also store data such as road maps, route information, the vehicle's position, direction, speed, and other such vehicle data, among other information. Such information may be used by the vehicle 100 and the MDC 103 during operation of the vehicle 100 in autonomous, semi-autonomous, and/or manual modes.
  • the structure of the vehicle in Fig. 1 should not be construed as limiting the embodiment of the present application.
  • the above-mentioned vehicle 100 may be a car, truck, motorcycle, bus, boat, plane, helicopter, lawn mower, recreational vehicle, playground vehicle, construction equipment, tram, golf cart, train, etc.
  • the embodiments of the present application do not make special limitations.
  • the above-mentioned camera 101 can collect images or videos, and send the images or videos to the display device 102 and the MDC 103 through the network.
  • the display device 102 may display the image or video for viewing by the user.
  • after the MDC 103 receives the image or video, it can perform target recognition, semantic segmentation, traffic light detection and lane line detection on the image or video.
  • the output video requires more and more transmission bandwidth.
  • compression is generally used to reduce bandwidth requirements.
  • the camera may output a Bayer raw image or video.
  • the Bayer raw image or video has higher precision and has higher requirements on transmission bandwidth.
  • for example, when the camera outputs ultra-high-definition (UHD) video with a frame rate of 30 frames per second (fps), a sampling depth of 16 bits (bit depth) and 4K resolution (that is, 4096 × 2160), the bandwidth required to transmit the video is as high as about 4 gigabits per second (Gbps), that is, 4K × 2K × 30 × 16.
  • therefore, the camera can compress the original Bayer image or video before transmission.
  • FIG. 2 shows a schematic block diagram of image compression.
  • as shown in FIG. 2, the camera 101 outputs the original Bayer image, and the original Bayer image is processed through an image signal processing (ISP) module 201 to obtain a red-green-blue (RGB) image; the precision of the RGB image is much lower than that of the original image output by the camera, thereby reducing the bandwidth requirements for network transmission.
  • the RGB image can be encoded and compressed, and then decoded and reconstructed, through a codec 202 (encoder and decoder, CODEC) to obtain the reconstructed image.
  • if the reconstructed image is transmitted to the MDC 103, it can be used for tasks such as object detection, semantic segmentation, and detection of traffic lights and lane lines; if the reconstructed image is transmitted to the display device 102, it can be displayed for viewing by the user.
  • the quality requirement of automatic driving visual perception on the reconstructed image is to meet the needs of both human vision and machine vision.
  • specifically, the quality requirements of automatic driving visual perception for reconstructed images may include: 1) machine-vision-oriented: automatic driving recognizes objects in images; 2) adapting to various machine vision tasks, including object detection, semantic segmentation, traffic light/lane line detection, etc.; 3) human vision: the images collected during automatic driving are provided to human eyes.
  • however, the objective evaluation indicators commonly used for machine vision do not consider the needs of human vision, and the objective evaluation indicators commonly used for human vision do not consider the needs of machine vision; that is, the existing evaluation indicators cannot take into account the needs of both human vision and machine vision.
  • the above tasks such as target detection, semantic segmentation, and detection of traffic lights and lane lines can be called machine vision tasks; that is, the reconstructed image is processed directly by a machine system, so the reconstructed image mainly needs to be quickly identified and detected by machine systems.
  • machine-vision-oriented evaluation indicators are used to evaluate the quality of reconstructed images, including image classification evaluation indicators such as accuracy (Acc), target detection evaluation indicators such as mean average precision (mAP), semantic segmentation evaluation indicators such as mean intersection over union (mIoU), and lane line detection evaluation indicators such as Acc.
  • besides evaluation indicators for machine vision, there are also evaluation indicators for human vision, including peak signal-to-noise ratio (PSNR), multiscale structural similarity (MSSSIM), learned perceptual image patch similarity (LPIPS), depth image structural texture similarity (DISTS) and other evaluation indicators.
  • the reconstructed image output by a CODEC optimized according to evaluation indicators for human vision can better match the subjective perception of the human eye; for example, when the reconstructed image is directly displayed on the display screen inside the vehicle for the driver to watch, the reconstructed image needs to have higher definition and be easy for human eyes to view.
  • ISP processing will cause image or video information to be damaged.
  • the quality of the reconstructed image reconstructed by the decoder is of key significance for subsequent tasks, so it is particularly important to have a high quality reconstructed image.
  • the embodiment of the present application provides an image processing method and an image processing device, which are used to obtain the similarity of the reconstructed image relative to the original image, so as to evaluate the quality of the reconstructed image and thereby guide the optimization of the encoder and/or decoder. At the same time, the reconstructed image can meet the needs of various machine vision tasks as well as the needs of human vision, and information damage caused by ISP processing can also be avoided.
  • the technical solution proposed in this application provides an evaluation index that is "machine-oriented, task-decoupled, and human-eye friendly" to calculate the similarity of the reconstructed image relative to the original image and evaluate the quality of the reconstructed image, thereby guiding the optimization of the encoder and/or decoder.
  • the vehicle may communicate with other devices through technologies such as vehicle-to-everything (V2X), long-term evolution-vehicle (LTE-V), or vehicle-to-vehicle (V2V).
  • the other devices include but are not limited to: vehicle-mounted terminals, vehicle-mounted controllers, vehicle-mounted modules, vehicle-mounted components, vehicle-mounted chips, vehicle-mounted units, vehicle-mounted radars, or vehicle-mounted cameras, and the vehicle can implement the method provided by the embodiments of the present application through these vehicle-mounted devices.
  • the solutions in the embodiments of the present application can also be used in smart terminals with mobile control functions other than vehicles, or be set in such smart terminals or in components of such terminals.
  • the smart terminal may be a smart transportation device, a smart home device, a robot, and the like, including but not limited to the smart terminal itself, or controllers, chips, sensors such as radars or cameras, and other components in the smart terminal.
  • the technical solution provided by the embodiment of the present application can be applied to the image compression scenario shown in Figure 2 above.
  • in the scenario of FIG. 2, the ISP module 201 and the CODEC 202 are deployed independently; the solution can be oriented to machine vision and improve the detection accuracy of machine vision tasks.
  • the machine vision task model does not need to be changed (that is, the MDC 103 does not need to be changed).
  • the image reconstructed by the decoder can also take human vision into account and is friendly to human vision.
  • the technical solution provided by the embodiment of the present application can be applied not only to the image compression scenario shown in FIG. 2 above, but also to an image compression scenario where the ISP and the CODEC are deployed together; the scenario where the ISP and CODEC are deployed together can solve the above problem 1).
  • FIG. 3 shows a schematic block diagram of another image compression.
  • as shown in FIG. 3, the camera 101 outputs the original Bayer image, which can be reconstructed through the CODEC 301.
  • the CODEC 301 can perform image signal processing, compression and reconstruction on the original Bayer image to obtain the reconstructed image. If the reconstructed image is transmitted to the MDC 103, it can be used for tasks such as object detection, semantic segmentation, and detection of traffic lights and lane lines. If the reconstructed image is transmitted to the display device 102, it can be displayed for viewing by the user.
  • applying the technical solution provided by the embodiment of the present application to the image compression scenario shown in FIG. 3 can serve machine vision and improve the detection accuracy of machine vision tasks.
  • the reconstructed image can also take human vision into account and is friendly to human vision, and information damage caused by separate ISP processing can be avoided.
  • in addition, this image compression scenario does not need a separately deployed ISP module, so deployment is simple.
  • FIG. 4 is a schematic flowchart of an image processing method 400 provided by an embodiment of the present application, and the method 400 may be executed by an image processing device.
  • the method 400 may be applied to the scenario shown in FIG. 2 or FIG. 3 above, but this embodiment of the present application is not limited thereto.
  • the method 400 may include the following steps:
  • S401. Acquire a first image and a second image, where the first image and the second image are two different images.
  • when the first image is an original image, the second image may be a reconstructed image; when the first image is a reconstructed image, the second image may be an original image.
  • for example, the first image may be the RGB image in FIG. 2 above, and the second image may be the reconstructed image in FIG. 2.
  • alternatively, the first image may be the reconstructed image in FIG. 2 above, and the second image may be the RGB image in FIG. 2.
  • the image processing device may obtain the first image from the ISP, and obtain the second image from the codec.
  • the image processing device needs to use an additional ISP to process the original Bayer image to obtain the first image.
  • the image processing device may acquire the second image from the codec.
  • S402. Input the first image and the second image respectively into the single-layer convolutional neural network to obtain the feature map of the first scale of the first image and the feature map of the first scale of the second image.
  • the single-layer convolutional neural network includes C convolution kernels, the number of feature maps of the first scale of the first image and the number of feature maps of the first scale of the second image are both C.
  • the parameters of the single-layer convolutional neural network come from the parameters of the first convolutional layer of a pre-trained model.
  • the pre-trained model is a pre-trained convolutional neural network model, such as classification models like ResNet, AlexNet, VGGNet or RegNet trained on the large-scale ImageNet training set; a sketch of reusing such parameters follows below.
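  • The following is a sketch of taking the single-layer network's parameters from the first convolutional layer of a pre-trained model (assuming torchvision's ResNet-50, which the publication names only as one example; its first layer has 64 kernels of 7×7 with stride 2, so the weights are reused in a stride-1 layer to avoid downsampling, as described above):

        import torch
        from torchvision.models import resnet50

        pretrained = resnet50(weights="IMAGENET1K_V1")  # classification model pre-trained on ImageNet

        conv = torch.nn.Conv2d(3, 64, kernel_size=7, stride=1, padding=3, bias=False)
        with torch.no_grad():
            conv.weight.copy_(pretrained.conv1.weight)  # reuse the pre-trained first-layer kernels
        conv.requires_grad_(False)                      # use as a fixed feature extractor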
  • the single-layer convolutional neural network may include C convolution kernels, and different convolution kernels among the C convolution kernels are used to extract different features and obtain different feature maps. These different feature maps can be referred to as first-scale feature maps.
  • the first-scale feature map refers to an image obtained by processing the first image or the second image once (that is, by the single-layer convolutional neural network).
  • when the image processing device inputs the first image into the single-layer convolutional neural network, it obtains the C first-scale feature maps of the first image; when it inputs the second image into the single-layer convolutional neural network, it obtains the C first-scale feature maps of the second image.
  • that is, there is a one-to-one correspondence between the C convolution kernels and the C first-scale feature maps of the first image, and likewise between the C convolution kernels and the C first-scale feature maps of the second image.
  • each of the above C convolution kernels can be a 7×7 convolution kernel with a stride of 1; that is, the single-layer convolutional neural network does not perform downsampling, so the obtained first-scale feature maps have the characteristic of high resolution.
  • optionally, the downsampling can be average-pooling downsampling, which reduces the loss of image information compared with other forms of downsampling (such as random downsampling).
  • the number of feature maps of each scale in the M-1 scale feature maps of the first image and the M-1 scale feature maps of the second image is C.
  • the image processing device may perform the first of the M-1 downsampling operations on the first-scale feature map of the first image to obtain the second-scale feature map of the first image, then perform the second downsampling on the second-scale feature map of the first image to obtain the third-scale feature map of the first image, and so on, until the Mth-scale feature map of the first image is obtained. In other words, the image processing device performs M-1 downsamplings on the first-scale feature map of the first image to obtain the feature maps at M-1 further scales of the first image; similarly, the image processing device can obtain the feature maps at M-1 further scales of the second image.
  • since the number of first-scale feature maps of the first image is C, downsampling them yields the second-scale feature maps of the first image, whose number is also C; the number of feature maps at every other scale of the first image is likewise C. That is, each of the M-1 further scales of the first image contains C feature maps, and the number of feature maps at each of the M-1 further scales of the second image is also C. For example, downsampling the first feature map among the first-scale feature maps of the first image yields the first feature map among the second-scale feature maps of the first image; the rest follow in the same way and are not repeated here.
  • FIG. 5 shows a schematic block diagram of downsampling.
  • as shown in FIG. 5, the first image and the second image pass through the single-layer convolutional neural network to obtain the first-scale feature maps of the first image and of the second image; the first-scale feature maps of the two images are downsampled a first time to obtain their second-scale feature maps; the second-scale feature maps of the two images are downsampled a second time to obtain their third-scale feature maps; and so on, until the Mth-scale feature maps of the first image and the second image are obtained.
  • for example, when M is 5 and C is 64, the image processing device can obtain feature maps at 4 further scales of the first image through S403, namely the second-scale, third-scale, fourth-scale and fifth-scale feature maps of the first image, and likewise feature maps at 4 further scales of the second image, namely its second-scale, third-scale, fourth-scale and fifth-scale feature maps.
  • the feature maps at each of the 4 scales include 64 feature maps.
  • the weight matrix includes weight coefficients for different feature maps of different scales.
  • the dimension of the weight matrix is C ⁇ M, and the weight matrix includes weight coefficients of different feature maps of different scales.
  • for example, the weight coefficient of the fourth first-scale feature map is the value in the fourth row and first column of the weight matrix. It should be understood that the values in the weight matrix are weight coefficients.
  • the weight matrix can be understood as a matrix composed of weight coefficients of different feature maps of different scales.
  • the weight matrix is only one form of expression, and the weight coefficients of different feature maps of different scales can also be expressed in other forms, such as charts, texts, etc., which are not limited in this embodiment of the present application.
  • the image processing method provided by the embodiment of the present application uses a single-layer convolutional neural network to obtain the first-scale feature maps; it needs only a single convolution pass, so the calculation is simple. Extracting the underlying general features through a single convolution layer is beneficial to ensuring the multi-task generalization ability of the image features, and the high resolution of the first-scale feature maps helps ensure the quality of the image.
  • multiple downsamplings of the high-resolution feature maps can obtain more image feature information.
  • the weight matrix assigns weights to different feature maps at different scales, flexibly adjusting the weight values across the image-feature and scale dimensions; this optimizes machine vision performance and achieves efficient machine-vision-oriented image coding and compression, which is beneficial to using the similarity of the first image relative to the second image to evaluate the quality of the first image.
  • in an optional embodiment, the first image is the input of the codec and the second image is the output of the codec; the codec is used to compress and reconstruct the first image to output the second image, and the similarity is used to optimize the codec.
  • in another optional embodiment, the first image is the image obtained after image signal processing of the third image, which can also be called the original image; the third image is the input of the codec, and the second image is the output of the codec, which can also be called the reconstructed image; the codec is used to perform image signal processing, compression and reconstruction on the third image to output the second image, and the image processing device can use the above method 400 to obtain the similarity of the original image relative to the reconstructed image for use in optimizing the codec.
  • the image processing method provided in this application can be used to obtain the similarity between the reconstructed image and the original image to evaluate the quality of the reconstructed image, so as to guide the optimization of the codec.
  • in an optional embodiment, the method 400 further includes: determining the weight coefficients of the first-scale feature maps according to the peak signal-to-noise ratio of the first-scale feature maps, where the first-scale feature maps include the first-scale feature map of the first image and the first-scale feature map of the second image; and determining the similarity of the first image relative to the second image according to the weight coefficients of the first-scale feature maps, the first-scale feature map of the first image, the first-scale feature map of the second image, the feature maps at the M-1 further scales of the first image, the feature maps at the M-1 further scales of the second image, and the weight matrix related to the M scales and the number C of feature maps.
  • since the number of first-scale feature maps of the first image and the number of first-scale feature maps of the second image are both C, the number of peak signal-to-noise ratios of the first-scale feature maps is C, and the image processing device can obtain the weight coefficient of each first-scale feature map based on its peak signal-to-noise ratio.
  • in other words, each first-scale feature map has two weight coefficients: one determined according to the peak signal-to-noise ratio, and the other being the corresponding value in the first column of the weight matrix.
  • weighting based on the peak signal-to-noise ratio of the first-scale feature maps helps keep the color, brightness and other characteristics of the first image and the second image consistent. Obtaining the weight coefficient of each first-scale feature map from its peak signal-to-noise ratio and weighting each feature map with that coefficient can enhance the detail contribution of the feature maps, improve the performance of both human vision and machine vision, take into account the needs of both, and solve the above problem 3).
  • In the similarity formula: ASSIM is the similarity; x is the first image; y is the second image; f(·) is the convolution operation of the single-layer convolutional neural network; i is the i-th convolution kernel in the single-layer convolutional neural network, 1 ≤ i ≤ C; j is the j-th scale, 1 ≤ j ≤ M; w_i(f_1^i(x), f_1^i(y)) is the weight coefficient of the feature map of the first scale; ω_ij is the weight coefficient in row i, column j of the weight matrix; f_1^i(x) is the i-th feature map among the first-scale feature maps of the first image; f_1^i(y) is the i-th feature map among the first-scale feature maps of the second image; and PSNR_i is the peak signal-to-noise ratio of the i-th feature map.
  • i is the i-th convolution kernel in the single-layer convolutional neural network, which may also be referred to as the i-th channel; this is not limited in this embodiment of the present application.
  • the ASSIM may also be called an adaptive structural similarity (ASSIM) index; it should be understood that this name is only an example and is not limited in this embodiment of the present application.
  • optionally, C can be 64, M can be 5, and α can be 0.1; the single-layer convolutional neural network can include 64 convolution kernels of 7×7 with a stride of 1, that is, the single-layer convolutional neural network performs no downsampling, so the obtained first-scale feature maps have the characteristic of high resolution. A sketch combining the earlier pieces follows below.
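  • Combining the sketches above into a similarity score (still under the same assumptions: an SSIM-like per-scale term and PSNR-normalized w_i; the publication's exact formula, including the role of α, is not reproduced in this excerpt):

        import torch

        def ssim_global(a, b, eps=1e-12):
            """Simplified per-channel SSIM using global statistics instead of a sliding window (assumption)."""
            mu_a, mu_b = a.mean(dim=(0, 2, 3)), b.mean(dim=(0, 2, 3))
            var_a, var_b = a.var(dim=(0, 2, 3)), b.var(dim=(0, 2, 3))
            cov = ((a - mu_a[None, :, None, None]) * (b - mu_b[None, :, None, None])).mean(dim=(0, 2, 3))
            return ((2 * mu_a * mu_b + eps) * (2 * cov + eps)) / \
                   ((mu_a ** 2 + mu_b ** 2 + eps) * (var_a + var_b + eps))

        def assim(fx, fy, W, w):
            """fx, fy: lists of M tensors (N, C, H, W); W: (C, M) weight matrix; w: (C,) PSNR weights."""
            score = torch.tensor(0.0)
            for j in range(len(fx)):                       # loop over the M scales
                s_j = ssim_global(fx[j], fy[j])            # per-channel similarity at scale j, shape (C,)
                omega_j = torch.as_tensor(W[:, j], dtype=s_j.dtype)
                score = score + (w * omega_j * s_j).sum()  # weight by w_i and omega_ij, sum over channels
            return score

        score = assim(fx, fy, W, psnr_weights(fx[0], fy[0]))  # using the earlier sketches' outputs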
  • FIG. 6 shows a schematic diagram of calculating similarity.
  • the weight matrix is a matrix with C rows and M columns, and the weight coefficient of each row in the weight matrix is used to represent the weight coefficient of the feature map at different scales, and the sum of the weight coefficients of each row in the weight matrix is 1.
  • the weight coefficient of each row in the weight matrix is the weight coefficient of the feature map corresponding to the same channel (or the feature map corresponding to the same convolution kernel) at different scales, and the sum of the weight coefficients of the feature maps corresponding to the same channel at different scales is 1 .
  • the feature map corresponding to each channel includes feature maps of 5 scales.
  • for example, the feature maps corresponding to the first of the 64 channels may include the first-scale feature map f_1^1(x) of the first image, its second-scale feature map f_2^1(x), its third-scale feature map f_3^1(x), its fourth-scale feature map f_4^1(x), and its fifth-scale feature map f_5^1(x).
  • In one implementation, every weight coefficient within each row of the weight matrix is the same.
  • the weight matrix is a matrix with 64 rows and 5 columns, and the weight coefficient of each row in the 64 rows is [0.2 0.2 0.2 0.2 0.2].
  • weight coefficients in the weight matrix may also be displayed in the form of graphs or text, but this embodiment of the present application is not limited thereto.
  • When every weight coefficient within each row of the weight matrix is the same, the calculation is simple, which is conducive to improving the efficiency of calculating the similarity.
  • The weight coefficients in the weight matrix can also reuse the weight coefficients of the traditional multi-scale structural similarity (MSSSIM) index, which were determined through human-vision experiments.
  • the weight coefficients in the traditional MSSSIM index include 0.0448, 0.2856, 0.3001, 0.2363 and 0.1333.
  • the weight matrix is a matrix with 64 rows and 5 columns, and the weight coefficient of each row in the 64 rows is [0.0448 0.2856 0.3001 0.2363 0.1333].
  • the weight coefficients in the weight matrix can reuse the weight coefficients in the traditional MSSSIM index, which helps to meet the visual needs of the human eye.
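  • A small numpy sketch of the two row-constant weight matrices just described (C = 64, M = 5); the variable names are illustrative:

```python
import numpy as np

C, M = 64, 5
uniform = np.full((C, M), 1.0 / M)            # every row is [0.2 0.2 0.2 0.2 0.2]
msssim_row = np.array([0.0448, 0.2856, 0.3001, 0.2363, 0.1333])
reuse_msssim = np.tile(msssim_row, (C, 1))    # every row reuses the MSSSIM weights
assert np.allclose(uniform.sum(axis=1), 1.0)  # each row sums to 1
# (the published MSSSIM row sums to ~1.0001 due to rounding of the coefficients)
```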
  • The C convolution kernels are divided into a high-frequency feature group, an intermediate-frequency feature group, and a low-frequency feature group according to feature frequency, and the feature maps of the M scales are divided into low-scale, medium-scale, and high-scale feature maps according to their resolutions.
  • The weight coefficients of the feature maps corresponding to the high-frequency, intermediate-frequency, and low-frequency feature groups in the weight matrix are distributed across scales as follows: the non-zero weight coefficients of the feature maps corresponding to the high-frequency feature group are distributed over the low-scale feature maps of the first image and the second image, and the weight coefficients of their high-scale feature maps are 0; the non-zero weight coefficients of the feature maps corresponding to the low-frequency feature group are distributed over the low-scale, medium-scale, and high-scale feature maps; and the non-zero weight coefficients of the feature maps corresponding to the intermediate-frequency feature group are distributed over more scales than those of the high-frequency feature group and fewer scales than those of the low-frequency feature group.
  • The image processing device can sort the C convolution kernels by feature frequency from high to low or from low to high, and then group the sorted convolution kernels into the high-frequency feature group, the intermediate-frequency feature group, and the low-frequency feature group.
  • the image processing device can sort the 64 convolution kernels according to the feature frequency from high to low, and then group the sorted convolution kernels into high-frequency feature groups and intermediate-frequency feature groups and low-frequency feature groups.
  • Figure 7 shows a schematic diagram of convolution kernel grouping. As shown in Figure 7, the high-frequency feature group may include 8 convolution kernels (convolution kernel 1 to convolution kernel 8), the intermediate-frequency feature group may include 24 convolution kernels (convolution kernel 9 to convolution kernel 32), and the low-frequency feature group may include 32 convolution kernels (convolution kernel 33 to convolution kernel 64).
  • The feature maps corresponding to the high-frequency feature group can be understood as the feature maps corresponding to each convolution kernel in the high-frequency feature group, where the feature maps corresponding to each convolution kernel include feature maps of M scales.
  • the feature map corresponding to the intermediate frequency feature group can be understood as the feature map corresponding to each convolution kernel in the intermediate frequency feature group, and the feature map corresponding to each convolution kernel includes feature maps of M scales.
  • the feature map corresponding to the low-frequency feature group can be understood as the feature map corresponding to each convolution kernel in the low-frequency feature group, and the feature map corresponding to each convolution kernel includes feature maps of M scales.
  • The image processing device can sort the feature maps of the M scales by resolution from high to low or from low to high, and then divide the sorted feature maps of the M scales into low-scale feature maps, medium-scale feature maps, and high-scale feature maps.
  • The weight matrix can assign weight coefficients to different features at different scales, so that different features can play their respective roles. For the features corresponding to the high-frequency feature group, a higher weight is assigned on the low-scale feature maps. For the features corresponding to the low-frequency feature group, the ability to extract contour information at high scales is considered, and the weight is distributed evenly across the different scales. For the features corresponding to the intermediate-frequency feature group, which lie between the high-frequency and low-frequency feature groups, both high-frequency details and contour information are taken into account, which helps to improve machine-vision performance.
  • Each of the non-zero weight coefficients of the feature maps corresponding to the high-frequency feature group is the same, each of the non-zero weight coefficients of the feature maps corresponding to the low-frequency feature group is the same, and each of the non-zero weight coefficients of the feature maps corresponding to the intermediate-frequency feature group is the same.
  • the weight coefficient of the feature map corresponding to the high-frequency feature group is [1 0 0 0 0]
  • the weight coefficient of the feature map corresponding to the low-frequency feature group is [1/5 1/5 1/5 1/5 1/5]
  • the weight coefficient of the feature map corresponding to the intermediate frequency feature is [1/3 1/3 1/3 0 0];
  • In [1 0 0 0 0], [1/5 1/5 1/5 1/5 1/5], and [1/3 1/3 1/3 0 0], the weight coefficient in the first column represents the weight coefficient of the low-scale feature map, the weight coefficients in the second and third columns represent the weight coefficients of the medium-scale feature maps, and the weight coefficients in the fourth and fifth columns represent the weight coefficients of the high-scale feature maps.
  • For example, the weight matrix is a matrix of 64 rows and 5 columns. If the high-frequency feature group includes 8 convolution kernels, 8 rows of weight coefficients in the weight matrix correspond to the high-frequency feature group, and each of these 8 rows is the same, namely [1 0 0 0 0]. If the intermediate-frequency feature group includes 24 convolution kernels, 24 rows of weight coefficients in the weight matrix correspond to the intermediate-frequency feature group, and each of these 24 rows is the same, namely [1/3 1/3 1/3 0 0]. If the low-frequency feature group includes 32 convolution kernels, 32 rows of weight coefficients in the weight matrix correspond to the low-frequency feature group, and each of these 32 rows is the same, namely [1/5 1/5 1/5 1/5 1/5].
  • The statement that the weight coefficients in the first column of the above weight matrix represent the weight coefficients of low-scale feature maps, the weight coefficients in the second and third columns represent the weight coefficients of medium-scale feature maps, and the weight coefficients in the fourth and fifth columns represent the weight coefficients of high-scale feature maps is only an example, and is not limited in this embodiment of the present application.
  • The weight matrix can assign weight coefficients to different features at different scales, so that different features can play their respective roles. For detail features, a weight of 1 is assigned on the high-resolution first-scale feature map. For smooth features, the ability to extract contour information at high scales is considered, and the weight is distributed evenly across the different scales. For edge features, which lie between detail features and smooth features, both high-frequency details and contour information are taken into account, which helps to improve machine-vision performance.
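  • The grouped weight matrix described above can be written down directly; a numpy sketch for C = 64 kernels grouped as in Figure 7 (8 high-frequency, 24 intermediate-frequency, 32 low-frequency) and M = 5 scales:

```python
import numpy as np

rows = ([[1, 0, 0, 0, 0]] * 8 +           # detail (high-frequency): scale 1 only
        [[1/3, 1/3, 1/3, 0, 0]] * 24 +    # edge (intermediate-frequency): scales 1-3
        [[1/5] * 5] * 32)                 # smooth (low-frequency): all 5 scales
beta = np.array(rows)                     # shape (64, 5); each row sums to 1
assert beta.shape == (64, 5) and np.allclose(beta.sum(axis=1), 1.0)
```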
  • the feature frequency is determined by the modulus of the convolution kernel of the single-layer convolutional neural network and/or the scaling factor and normalization coefficient of the normalization layer of the single-layer convolutional neural network.
  • The modulus (norm) of the i-th convolution kernel of the single-layer convolutional neural network can be denoted, for example, ‖K_i‖.
  • The scaling coefficient and normalization coefficient of the normalization layer can be taken from the parameters of the batch normalization (BN) layer of a pre-training model, where the pre-training model is a pre-trained convolutional neural network model, such as the ResNet-50 pretrained model.
  • The scaling factor of the normalization layer and the normalization coefficient can be denoted, for example, γ_i and β_i respectively.
  • the feature frequency can be determined by the scaling factor and the normalization coefficient of the normalization layer of the single-layer convolutional neural network.
  • the feature frequency can be determined by the modulus size of the convolution kernel of the single-layer convolutional neural network, the scaling factor of the normalization layer, and the normalization coefficient.
  • That is, the feature frequency can be determined jointly by the modulus of the convolution kernel together with the scaling factor and normalization coefficient of the normalization layer.
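  • A hedged sketch of how kernels might be ranked by feature frequency and split into the three groups follows; the scoring function combining the kernel norm with the BN parameters is an assumption, since the exact expression is not spelled out in this text:

```python
import numpy as np

def group_kernels(kernels, gamma, beta, n_high=8, n_mid=24):
    """kernels: (C, in_ch, k, k) conv weights; gamma, beta: (C,) BN parameters.
    Returns index arrays (high, mid, low), sorted from high to low frequency."""
    norm = np.linalg.norm(kernels.reshape(len(kernels), -1), axis=1)
    score = norm * np.abs(gamma) / (np.abs(beta) + 1e-6)  # assumed combination
    order = np.argsort(-score)                            # highest frequency first
    return order[:n_high], order[n_high:n_high + n_mid], order[n_high + n_mid:]
```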
  • Calculating the similarity ASSIM with the image processing method provided by the embodiment of the present application, compared with calculating the mean square error (MSE) in the pixel domain, calculating the mean square error in the feature domain (wfMSE), and calculating the MSSSIM in the feature domain (wfMSSSIM for short), can improve both machine-vision performance (such as target-detection mAP and semantic-segmentation mIoU) and human-vision performance (such as LPIPS and DISTS).
  • ASSIM has better LPIPS in human visual performance.
  • Fig. 8 shows a schematic diagram of simulation.
  • The curves in Figure 8 show how LPIPS changes as bits per pixel (BPP) changes. The figure includes 4 curves: a curve with black-filled circles, a curve with white-filled circles, a curve with black-filled triangles, and a curve with white-filled triangles.
  • the curve filled with black circles is used to represent ASSIM
  • the curve filled with white circles is used to represent wfMSSSIM
  • the curve filled with black triangles is used to represent wfMSE
  • The curve filled with white triangles is used to represent MSE. It can be seen from the figure that the ASSIM curve is above the other curves, i.e., ASSIM obtains better LPIPS.
  • ASSIM also achieves better DISTS in human-vision performance.
  • Fig. 9 shows a schematic diagram of simulation.
  • The curves in Figure 9 show how DISTS changes as BPP changes. The figure includes 4 curves: a curve with black-filled circles, a curve with white-filled circles, a curve with black-filled triangles, and a curve with white-filled triangles.
  • the curve filled with black circles is used to represent ASSIM
  • the curve filled with white circles is used to represent wfMSSSIM
  • the curve filled with black triangles is used to represent wfMSE
  • the curve filled with white triangles is used to represent MSE.
  • The ASSIM curve is above the other curves, i.e., ASSIM obtains better DISTS.
  • ASSIM has better mAP in machine vision performance.
  • Fig. 10 shows a schematic diagram of simulation.
  • The curves in Figure 10 show how mAP changes as BPP changes. The figure includes 5 curves: a curve with black-filled circles, a curve with white-filled circles, a curve with black-filled triangles, a curve with white-filled triangles, and a straight line.
  • the curve filled with black circles is used to represent ASSIM
  • the curve filled with white circles is used to represent wfMSSSIM
  • the curve filled with black triangles is used to represent wfMSE
  • the curve filled with white triangles is used to represent MSE
  • the straight line is used to represent No compression.
  • The ASSIM curve is above the curves of wfMSSSIM, wfMSE, and MSE. When BPP is 1.2 or above, the mAP of ASSIM is even above the no-compression straight line, i.e., ASSIM obtains better mAP.
  • At the same machine-vision performance, ASSIM saves about 50% of the coding bits for target-detection networks such as Faster-RCNN.
  • ASSIM has better mIoU in machine vision performance.
  • Fig. 11 shows a schematic diagram of simulation.
  • The curves in Figure 11 show how mIoU changes as BPP changes. The figure includes 5 curves: a curve with black-filled circles, a curve with white-filled circles, a curve with black-filled triangles, a curve with white-filled triangles, and a straight line.
  • the curve filled with black circles is used to represent ASSIM
  • the curve filled with white circles is used to represent wfMSSSIM
  • the curve filled with black triangles is used to represent wfMSE
  • the curve filled with white triangles is used to represent MSE
  • the straight line is used to represent No compression.
  • The ASSIM curve is above the curves of wfMSE and MSE, and below the curve of wfMSSSIM; it can be concluded that ASSIM obtains better mIoU than wfMSE and MSE.
  • At the same machine-vision performance, ASSIM saves about 40% of the coding bits for semantic-segmentation networks such as the pyramid scene parsing network (PSPnet).
  • sequence numbers of the above processes do not mean the order of execution, and the execution order of each process should be determined by its function and internal logic, and should not constitute any limitation on the implementation process of the embodiment of the present application.
  • FIG. 12 shows an image processing apparatus 1200 provided by an embodiment of the present application.
  • the image processing apparatus 1200 includes: an acquisition module 1210 and a processing module 1220 .
  • the acquiring module 1210 is used for: acquiring the first image and the second image
  • The processing module 1220 is used for: inputting the first image and the second image respectively into the single-layer convolutional neural network to obtain the feature maps of the first scale of the first image and the feature maps of the first scale of the second image, where the number of feature maps of the first scale of the first image and the number of feature maps of the first scale of the second image are both C; respectively performing M-1 downsamplings on the feature map of the first scale of the first image and the feature map of the first scale of the second image to obtain the feature maps of M-1 scales of the first image and the feature maps of M-1 scales of the second image, where C and M are both positive integers greater than 1; and determining the similarity of the first image relative to the second image according to the feature map of the first scale of the first image, the feature map of the first scale of the second image, the feature maps of M-1 scales of the first image, the feature maps of M-1 scales of the second image, and a weight matrix related to the M scales and the number C of feature maps, where the weight matrix includes weight coefficients of different feature maps at different scales.
  • The above processing module 1220 is further configured to: determine the weight coefficient of the feature map of the first scale according to the peak signal-to-noise ratio of the feature map of the first scale, where the feature map of the first scale includes the feature map of the first scale of the first image and the feature map of the first scale of the second image; and determine the similarity according to the weight coefficient of the feature map of the first scale, the feature map of the first scale of the first image, the feature map of the first scale of the second image, the feature maps of M-1 scales of the first image, the feature maps of M-1 scales of the second image, and the weight matrix.
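  • A minimal sketch of how the acquisition module 1210 and processing module 1220 might map onto code is shown below; the class computes a simplified, uniformly weighted multi-scale similarity (the per-channel PSNR weights and the full weight matrix are omitted for brevity), and all names are illustrative assumptions:

```python
import torch
import torch.nn.functional as F

class ImageProcessingApparatus(torch.nn.Module):
    def __init__(self, C=64, M=5):
        super().__init__()
        self.conv = torch.nn.Conv2d(3, C, kernel_size=7, stride=1,
                                    padding=3, bias=False)
        self.M = M

    def acquire(self, x, y):                 # acquisition module 1210
        return x.float(), y.float()

    def forward(self, x, y, c=1e-4):         # processing module 1220
        x, y = self.acquire(x, y)
        fx, fy = self.conv(x), self.conv(y)  # first-scale feature maps (C channels)
        total = 0.0
        for _ in range(self.M):              # uniform weighting over M scales
            vx = fx.var(dim=(2, 3), unbiased=False)
            vy = fy.var(dim=(2, 3), unbiased=False)
            cov = ((fx - fx.mean(dim=(2, 3), keepdim=True))
                   * (fy - fy.mean(dim=(2, 3), keepdim=True))).mean(dim=(2, 3))
            total = total + ((2 * cov + c) / (vx + vy + c)).mean() / self.M
            fx, fy = F.avg_pool2d(fx, 2), F.avg_pool2d(fy, 2)
        return total                         # scalar similarity
```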
  • Here, ASSIM is the similarity, x is the first image, and y is the second image;
  • f() is the convolution operation of the single-layer convolutional neural network;
  • i is the i-th convolution kernel in the single-layer convolutional neural network, 1 ≤ i ≤ C;
  • j is the j-th scale, 1 ≤ j ≤ M;
  • w_i(f_1^i(x), f_1^i(y)) is the weight coefficient of the feature map of the first scale;
  • β_ij is the weight coefficient in the i-th row and j-th column of the weight matrix;
  • f_1^i(x) is the i-th feature map among the feature maps of the first scale of the first image, and f_1^i(y) is the i-th feature map among the feature maps of the first scale of the second image;
  • PSNR_i is the peak signal-to-noise ratio of the i-th feature map of the first scale.
  • the above weight matrix is a matrix of C rows and M columns, the weight coefficient of each row in the weight matrix is used to represent the weight coefficient of the feature map at different scales, and the sum of the weight coefficients of each row in the weight matrix is 1.
  • each weight coefficient of each row of the weight coefficients in the above weight matrix is the same.
  • The above C convolution kernels are divided into a high-frequency feature group, an intermediate-frequency feature group, and a low-frequency feature group according to feature frequency, and the feature maps of the M scales are divided into low-scale, medium-scale, and high-scale feature maps according to their resolutions.
  • The weight coefficients of the feature maps corresponding to the high-frequency, intermediate-frequency, and low-frequency feature groups in the weight matrix are distributed across scales as follows: the non-zero weight coefficients of the feature maps corresponding to the high-frequency feature group are distributed over the low-scale feature maps of the first image and the second image, and the weight coefficients of their high-scale feature maps are 0; the non-zero weight coefficients of the feature maps corresponding to the low-frequency feature group are distributed over the low-scale, medium-scale, and high-scale feature maps; and the non-zero weight coefficients of the feature maps corresponding to the intermediate-frequency feature group are distributed over more scales than those of the high-frequency feature group and fewer scales than those of the low-frequency feature group.
  • Each of the non-zero weight coefficients of the feature maps corresponding to the high-frequency feature group is the same, each of the non-zero weight coefficients of the feature maps corresponding to the low-frequency feature group is the same, and each of the non-zero weight coefficients of the feature maps corresponding to the intermediate-frequency feature group is the same.
  • When M = 5, the weight coefficient of the feature map corresponding to the high-frequency feature group is [1 0 0 0 0], the weight coefficient of the feature map corresponding to the low-frequency feature group is [1/5 1/5 1/5 1/5 1/5], and the weight coefficient of the feature map corresponding to the intermediate-frequency feature group is [1/3 1/3 1/3 0 0].
  • In these rows, the weight coefficient in the first column represents the weight coefficient of the low-scale feature map, the weight coefficients in the second and third columns represent the weight coefficients of the medium-scale feature maps, and the weight coefficients in the fourth and fifth columns represent the weight coefficients of the high-scale feature maps.
  • the characteristic frequency is determined by the modulus of the convolution kernel of the single-layer convolutional neural network and/or the scaling coefficient and normalization coefficient of the normalization layer of the single-layer convolutional neural network.
  • the above-mentioned first image is the input of the codec
  • the second image is the output of the codec
  • the codec is used to compress and reconstruct the first image to output the second image
  • The similarity is used to optimize the codec.
  • the above-mentioned first image is an image after image signal processing of the third image
  • the third image is the input of the codec
  • the second image is the output of the codec
  • The codec is used to perform image signal processing, compression, and reconstruction on the third image to output the second image, and the similarity is used to optimize the codec.
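  • As an illustration of how the similarity could drive codec optimization, a hedged rate-distortion training sketch follows; `codec` and `assim` stand in for whatever learned codec and ASSIM implementation are actually used, and the trade-off weight `lam` is arbitrary:

```python
import torch

def rd_loss(x, codec, assim, lam=0.01):
    x_hat, bits = codec(x)                  # reconstruction + estimated bit cost
    distortion = 1.0 - assim(x, x_hat)      # higher similarity -> lower distortion
    bpp = bits / (x.shape[0] * x.shape[-1] * x.shape[-2])  # bits per pixel
    return distortion + lam * bpp           # distortion traded off against rate

# Typical step: opt.zero_grad(); rd_loss(batch, codec, assim).backward(); opt.step()
```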
  • "Module" here may refer to an application-specific integrated circuit (ASIC), an electronic circuit, a processor (such as a shared processor, a dedicated processor, or a group processor) and memory for executing one or more software or firmware programs, combinational logic, and/or other suitable components that support the described functionality.
  • the above-mentioned image processing apparatus 1200 has the function of implementing the corresponding steps performed by the image processing device in the above-mentioned method embodiment; the above-mentioned functions can be realized by hardware, and can also be realized by executing corresponding software by hardware.
  • the hardware or software includes one or more modules corresponding to the above functions.
  • the image processing device 1200 in FIG. 12 may also be a chip or a chip system, for example: a system on chip (system on chip, SoC).
  • FIG. 13 is a schematic block diagram of another image processing apparatus 1300 provided by an embodiment of the present application.
  • the image processing device 1300 includes a processor 1310 , a communication interface 1320 and a memory 1330 .
  • The processor 1310, the communication interface 1320, and the memory 1330 communicate with each other through an internal connection path; the memory 1330 is used to store instructions, and the processor 1310 is used to execute the instructions stored in the memory 1330 to control the communication interface 1320 to send and/or receive signals.
  • The image processing apparatus 1300 may specifically be the image processing device in the above method embodiments, or the functions of the image processing device in the above method embodiments may be integrated into the image processing apparatus 1300; the image processing apparatus 1300 may be used to execute each step and/or process corresponding to the image processing device in the above method embodiments.
  • the memory 1330 may include read-only memory and random-access memory, and provides instructions and data to the processor. A portion of the memory may also include non-volatile random access memory.
  • the memory may also store device type information.
  • the processor 1310 may be configured to execute instructions stored in the memory, and when the processor executes the instructions, the processor may execute various steps and/or processes corresponding to the image processing device in the foregoing method embodiments.
  • each step of the above method can be completed by an integrated logic circuit of hardware in a processor or an instruction in the form of software.
  • the steps of the methods disclosed in connection with the embodiments of the present application may be directly implemented by a hardware processor, or implemented by a combination of hardware and software modules in the processor.
  • the software module can be located in random access memory, flash memory, read-only memory, programmable read-only memory or electrically erasable programmable memory, registers and other mature storage media in the field.
  • the storage medium is located in the memory, and the processor executes the instructions in the memory, and completes the steps of the above method in combination with its hardware. To avoid repetition, no detailed description is given here.
  • An embodiment of the present application further provides a computer-readable storage medium, where the computer-readable storage medium is used to store a computer program, and the computer program is used to implement the method corresponding to the image processing device in the foregoing embodiments.
  • the embodiment of the present application also provides a chip system, which is used to support the image processing device in the above method embodiment to implement the functions shown in the embodiment of the present application.
  • An embodiment of the present application provides a computer program product. The computer program product includes a computer program (also called code, or instructions); when the computer program runs on a computer, the computer can execute the method corresponding to the image processing device in the above embodiments.
  • the disclosed systems, devices and methods may be implemented in other ways.
  • the device embodiments described above are only illustrative.
  • The division of the units is only a logical function division; in actual implementation, there may be other division methods. For example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not implemented.
  • the mutual coupling or direct coupling or communication connection shown or discussed may be through some interfaces, and the indirect coupling or communication connection of devices or units may be in electrical, mechanical or other forms.

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Image Analysis (AREA)

Abstract

Provided in the present application are an image processing method and an image processing apparatus, which are used for acquiring the similarity between two images, so as to evaluate the quality of the images, thereby guiding the optimization of an encoder and/or a decoder. The method comprises: acquiring a first image and a second image; respectively inputting the first image and the second image into a single-layer convolutional neural network, so as to obtain a feature map of a first scale of the first image and a feature map of the first scale of the second image; respectively performing M-1 instances of downsampling on the feature map of the first scale of the first image and the feature map of the first scale of the second image, so as to obtain feature maps of M-1 scales of the first image and feature maps of M-1 scales of the second image; and determining the similarity of the first image with respect to the second image according to the feature map of the first scale of the first image, the feature map of the first scale of the second image, the feature maps of the M-1 scales of the first image, the feature maps of the M-1 scales of the second image, and a weight matrix related to the M scales and the number C of feature maps.

Description

Therefore, there is an urgent need for a method to evaluate the quality of reconstructed images, to provide a guarantee for subsequent task processing.

Contents of the invention
In a first aspect, an image processing method is provided. The method includes: acquiring a first image and a second image; inputting the first image and the second image respectively into a single-layer convolutional neural network to obtain a feature map of a first scale of the first image and a feature map of the first scale of the second image, where the single-layer convolutional neural network includes C convolution kernels, and the number of feature maps of the first scale of the first image and the number of feature maps of the first scale of the second image are both C; respectively performing M-1 downsamplings on the feature map of the first scale of the first image and the feature map of the first scale of the second image to obtain feature maps of M-1 scales of the first image and feature maps of M-1 scales of the second image, where C and M are both positive integers greater than 1; and determining the similarity of the first image relative to the second image according to the feature map of the first scale of the first image, the feature map of the first scale of the second image, the feature maps of M-1 scales of the first image, the feature maps of M-1 scales of the second image, and a weight matrix related to the M scales and the number C of feature maps, where the weight matrix includes weight coefficients of different feature maps at different scales.
The first image and the second image are two different images. When the first image is an original image, the second image may be a reconstructed image; when the first image is a reconstructed image, the second image may be an original image.

The parameters of the single-layer convolutional neural network come from the parameters of the first convolutional layer of a pre-training model, where the pre-training model is a pre-trained convolutional neural network model. The single-layer convolutional neural network may include C convolution kernels, and different convolution kernels are used to extract different features and obtain different feature maps. These different feature maps may be called feature maps of the first scale. When the image processing device inputs the first image into the single-layer convolutional neural network, the feature maps of the first scale of the first image are obtained; when the image processing device inputs the second image into the single-layer convolutional neural network, the feature maps of the first scale of the second image are obtained.

The image processing device may perform the first of the M-1 downsamplings on the feature map of the first scale of the first image to obtain the feature map of the second scale of the first image, perform the second of the M-1 downsamplings on the feature map of the second scale of the first image to obtain the feature map of the third scale of the first image, and so on, until the feature map of the M-th scale of the first image is obtained.

It should be understood that the image processing device performs M-1 downsamplings on the feature map of the first scale of the first image to obtain the feature maps of M-1 scales of the first image. Similarly, the image processing device can obtain the feature maps of M-1 scales of the second image.

It should also be understood that the number of feature maps of the first scale of the first image is C. After the first downsampling of the feature maps of the first scale of the first image, the feature maps of the second scale of the first image are obtained, and their number is also C. Similarly, the number of feature maps of every other scale of the first image is also C; that is, the number of feature maps at each of the M-1 scales of the first image is C. Likewise, the number of feature maps at each of the M-1 scales of the second image is also C.

It should be noted that downsampling the first feature map among the feature maps of the first scale of the first image yields the first feature map among the feature maps of the second scale of the first image; the same applies to the other feature maps, which will not be repeated here.
The image processing method provided by this application uses a single-layer convolutional neural network to obtain the feature maps of the first scale, requiring only a single-layer convolution computation, which is simple. Extracting low-level general features through a single convolution layer helps guarantee the generalization ability across image tasks. The feature maps of the first scale have high resolution, which can guarantee image quality; at the same time, downsampling the high-resolution feature maps multiple times can obtain more image feature information. In addition, the weight matrix assigns weights to different feature maps at different scales, flexibly adjusting the weight values over the feature and scale dimensions, optimizing machine-vision performance and achieving efficient machine-vision-oriented image coding and compression performance, which is conducive to using the similarity of the first image relative to the second image to evaluate the quality of the first image.
In combination with the first aspect, in some implementations of the first aspect, before determining the similarity of the first image relative to the second image according to the feature map of the first scale of the first image, the feature map of the first scale of the second image, the feature maps of M-1 scales of the first image, the feature maps of M-1 scales of the second image, and the weight matrix related to the M scales and the number C of feature maps, the above method further includes: determining the weight coefficient of the feature map of the first scale according to the peak signal-to-noise ratio of the feature map of the first scale, where the feature map of the first scale includes the feature map of the first scale of the first image and the feature map of the first scale of the second image. Determining the similarity then includes: determining the similarity according to the weight coefficient of the feature map of the first scale, the feature map of the first scale of the first image, the feature map of the first scale of the second image, the feature maps of M-1 scales of the first image, the feature maps of M-1 scales of the second image, and the weight matrix.

The numbers of feature maps of the first scale of the first image and of the second image are both C, so the number of peak signal-to-noise ratios of the feature maps of the first scale is C. The image processing device can obtain the weight coefficient of each feature map among the feature maps of the first scale according to these peak signal-to-noise ratios, and can then determine the similarity according to the weight coefficient of the feature map of the first scale, the feature map of the first scale of the first image, the feature map of the first scale of the second image, the feature maps of M-1 scales of the first image, the feature maps of M-1 scales of the second image, and the weight matrix.

In the image processing method provided by this application, the peak signal-to-noise ratio of the feature map of the first scale can keep characteristics such as color and brightness consistent between the first image and the second image. Based on the peak signal-to-noise ratio of the feature map of the first scale, the weight coefficient of each feature map among the feature maps of the first scale is obtained, and each feature map is weighted with its weight coefficient, which can enhance the detail in the feature maps, improve human-vision and machine-vision performance, and take into account both the needs of human vision and the needs of machine vision.
In combination with the first aspect, in some implementations of the first aspect, the above similarity is obtained by the following formulas:

$$\mathrm{ASSIM}(x,y)=\sum_{i=1}^{C} w_i\!\left(f_1^i(x),f_1^i(y)\right)\sum_{j=1}^{M}\beta_{ij}\,\mathrm{SSIM}\!\left(f_j^i(x),f_j^i(y)\right)$$

$$\mathrm{SSIM}\!\left(f_j^i(x),f_j^i(y)\right)=\frac{2\,\sigma_{f_j^i(x)f_j^i(y)}+c}{\sigma_{f_j^i(x)}^{2}+\sigma_{f_j^i(y)}^{2}+c}$$

$$w_i\!\left(f_1^i(x),f_1^i(y)\right)=\frac{e^{\alpha\,\mathrm{PSNR}_i}}{\sum_{k=1}^{C} e^{\alpha\,\mathrm{PSNR}_k}}$$

$$\mathrm{PSNR}_i=10\log_{10}\frac{\bigl(\max(f_1^i(x),f_1^i(y))-\min(f_1^i(x),f_1^i(y))\bigr)^{2}}{\frac{1}{HW}\sum_{p=1}^{HW}\bigl(f_1^i(x)_p-f_1^i(y)_p\bigr)^{2}}$$

Here, ASSIM is the similarity, x is the first image, y is the second image, and f() is the convolution operation of the single-layer convolutional neural network. i is the i-th convolution kernel in the single-layer convolutional neural network, 1 ≤ i ≤ C, and j is the j-th scale, 1 ≤ j ≤ M. w_i(f_1^i(x), f_1^i(y)) is the weight coefficient of the feature map of the first scale, SSIM(f_j^i(x), f_j^i(y)) is the structural similarity of the i-th feature map at the j-th scale, and β_ij is the weight coefficient in the i-th row and j-th column of the weight matrix. f_1^i(x) is the i-th feature map among the feature maps of the first scale of the first image, and f_1^i(y) is the i-th feature map among the feature maps of the first scale of the second image. PSNR_i is the peak signal-to-noise ratio of the i-th feature map of the first scale, α is a constant, and c is a stabilizing constant. σ²_{f_j^i(x)} is the variance of the i-th feature map of the j-th scale of the first image, σ²_{f_j^i(y)} is the variance of the i-th feature map of the j-th scale of the second image, and σ_{f_j^i(x)f_j^i(y)} is the covariance of the i-th feature map of the j-th scale of the first image and the i-th feature map of the j-th scale of the second image. max(f_1^i(x), f_1^i(y)) is the maximum pixel value in the feature maps of the first scale of the first image and the second image, min(f_1^i(x), f_1^i(y)) is the minimum pixel value in those feature maps, H is the height of the first image or the second image, and W is the width of the first image or the second image.
i is the i-th convolution kernel in the single-layer convolutional neural network, and may also be called the i-th channel, which is not limited in this application.

In combination with the first aspect, in some implementations of the first aspect, the above weight matrix is a matrix of C rows and M columns; the weight coefficient of each row in the weight matrix is used to represent the weight coefficients of a feature map at different scales, and the sum of the weight coefficients of each row in the weight matrix is 1.

The weight coefficients of each row in the weight matrix are the weight coefficients of the feature maps corresponding to the same channel (or the same convolution kernel) at different scales, and the sum of the weight coefficients of the feature maps corresponding to the same channel across different scales is 1.

In combination with the first aspect, in some implementations of the first aspect, every weight coefficient within each row of the weight matrix is the same.

In the image processing method provided by this application, making every weight coefficient within each row of the weight matrix the same is the simplest implementation; the calculation is simple, which is conducive to improving the efficiency of calculating the similarity.

In combination with the first aspect, in some implementations of the first aspect, M = 5, and the weight coefficients of each row in the weight matrix are the same, namely [0.2 0.2 0.2 0.2 0.2].

In combination with the first aspect, in some implementations of the first aspect, M = 5, and the weight coefficients of each row in the weight matrix are the same, namely [0.0448 0.2856 0.3001 0.2363 0.1333].

In combination with the first aspect, in some implementations of the first aspect, the above C convolution kernels are divided into a high-frequency feature group, an intermediate-frequency feature group, and a low-frequency feature group according to feature frequency, and the feature maps of the M scales are divided into low-scale, medium-scale, and high-scale feature maps according to their resolutions. The weight coefficients of the feature maps corresponding to the high-frequency, intermediate-frequency, and low-frequency feature groups in the weight matrix are distributed across scales as follows: the non-zero weight coefficients of the feature maps corresponding to the high-frequency feature group are distributed over the low-scale feature maps of the first image and the second image, and the weight coefficients of their high-scale feature maps are 0; the non-zero weight coefficients of the feature maps corresponding to the low-frequency feature group are distributed over the low-scale, medium-scale, and high-scale feature maps; and the non-zero weight coefficients of the feature maps corresponding to the intermediate-frequency feature group are distributed over more scales than those of the high-frequency feature group and fewer scales than those of the low-frequency feature group.

The image processing device can sort the C convolution kernels by feature frequency from high to low or from low to high, and then group the sorted convolution kernels into the high-frequency feature group, the intermediate-frequency feature group, and the low-frequency feature group.

The feature maps corresponding to the high-frequency feature group can be understood as the feature maps corresponding to each convolution kernel in the high-frequency feature group, where the feature maps corresponding to each convolution kernel include feature maps of M scales. The same applies to the intermediate-frequency feature group and the low-frequency feature group: the feature maps corresponding to each convolution kernel in each group include feature maps of M scales.

The image processing device can sort the feature maps of the M scales by resolution from high to low or from low to high, and then divide the sorted feature maps of the M scales into low-scale, medium-scale, and high-scale feature maps.

In the image processing method provided by this application, the weight matrix can assign weight coefficients to different features at different scales, so that different features can play their respective roles. For the features corresponding to the high-frequency feature group, a higher weight is assigned on the low-scale feature maps; for the features corresponding to the low-frequency feature group, the ability to extract contour information at high scales is considered, and the weight is distributed evenly across the different scales; for the features corresponding to the intermediate-frequency feature group, which lie between the high-frequency and low-frequency feature groups, both high-frequency details and contour information are taken into account, which helps to improve machine-vision performance.

In combination with the first aspect, in some implementations of the first aspect, each of the non-zero weight coefficients of the feature maps corresponding to the high-frequency feature group is the same, each of the non-zero weight coefficients of the feature maps corresponding to the low-frequency feature group is the same, and each of the non-zero weight coefficients of the feature maps corresponding to the intermediate-frequency feature group is the same.

The feature maps corresponding to each convolution kernel in the high-frequency, low-frequency, and intermediate-frequency feature groups all include M scales. The non-zero weight coefficients of the feature maps corresponding to the high-frequency feature group are distributed over the low-scale feature maps among the M scales, and every one of these low-scale weight coefficients is the same. The non-zero weight coefficients of the feature maps corresponding to the low-frequency feature group are distributed over the low-scale, medium-scale, and high-scale feature maps (that is, all M scales), and every one of these weight coefficients is the same. The same applies to the intermediate-frequency feature group, which will not be repeated here.

In combination with the first aspect, in some implementations of the first aspect, M = 5, the weight coefficient of the feature map corresponding to the high-frequency feature group is [1 0 0 0 0], the weight coefficient of the feature map corresponding to the low-frequency feature group is [1/5 1/5 1/5 1/5 1/5], and the weight coefficient of the feature map corresponding to the intermediate-frequency feature group is [1/3 1/3 1/3 0 0]. In [1 0 0 0 0], [1/5 1/5 1/5 1/5 1/5], and [1/3 1/3 1/3 0 0], the weight coefficient in the first column represents the weight coefficient of the low-scale feature map, the weight coefficients in the second and third columns represent the weight coefficients of the medium-scale feature maps, and the weight coefficients in the fourth and fifth columns represent the weight coefficients of the high-scale feature maps.

The convolution kernels corresponding to the high-frequency feature group are used to extract texture features of the image, the convolution kernels corresponding to the low-frequency feature group can be used to extract smooth features of the image, and the convolution kernels corresponding to the intermediate-frequency feature group can be used to extract edge features of the image.

In the image processing method provided by this application, the weight matrix can assign weight coefficients to different features at different scales, so that different features can play their respective roles. For detail features, a weight of 1 is assigned on the high-resolution first-scale feature map; for smooth features, the ability to extract contour information at high scales is considered, and the weight is distributed evenly across the different scales; for edge features, which lie between detail features and smooth features, both high-frequency details and contour information are taken into account, which helps to improve machine-vision performance.
结合第一方面,在第一方面的某些实现方式中,上述特征频率由单层卷积神经网络的卷积核的模大小和/或单层卷积神经网络的归一化层的缩放系数和归一化系数确定。In conjunction with the first aspect, in some implementations of the first aspect, the above-mentioned characteristic frequency is determined by the modulus size of the convolution kernel of the single-layer convolutional neural network and/or the scaling factor of the normalization layer of the single-layer convolutional neural network And the normalization coefficient is determined.
With reference to the first aspect, in some implementations of the first aspect, the first image is the input of the codec, the second image is the output of the codec, the codec is configured to compress and reconstruct the first image to output the second image, and the similarity is used to optimize the codec.
The first image, being the input of the codec, may also be called the original image; the second image, being the output of the codec, may also be called the reconstructed image. The image processing device can use the above method to obtain the similarity of the original image relative to the reconstructed image, so as to optimize the codec.
The image processing method provided by this application can be used to obtain the similarity between the reconstructed image and the original image to evaluate the quality of the reconstructed image, thereby guiding the optimization of the codec.
With reference to the first aspect, in some implementations of the first aspect, the first image is an image obtained by performing image signal processing on a third image, the third image is the input of the codec, the second image is the output of the codec, the codec is configured to perform image signal processing, compression, and reconstruction on the third image to output the second image, and the similarity is used to optimize the codec.
The first image, obtained by performing image signal processing on the third image, may also be called the original image; the third image is the input of the codec, and the second image, being the output of the codec, may also be called the reconstructed image. The codec is configured to perform image signal processing, compression, and reconstruction on the third image to output the second image, and the image processing device can use the above method to obtain the similarity of the original image relative to the reconstructed image for optimizing the codec. A sketch of how this similarity can drive the optimization appears below.
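The following is a hedged sketch of one way the similarity could be used as a training signal for a learned codec. It assumes a differentiable codec model that also reports an estimated bit rate, an assim(x, y) function implementing the formulas given later in this document, and an illustrative rate-distortion trade-off weight lmbda; none of these names come from the original text.

```python
def training_step(codec, optimizer, x, lmbda=0.01):
    """One rate-distortion optimization step using 1 - ASSIM as distortion."""
    y, bits = codec(x)                 # reconstructed image and estimated rate
    distortion = 1.0 - assim(x, y)     # higher similarity -> lower distortion
    loss = distortion + lmbda * bits   # illustrative rate-distortion objective
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```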
In a second aspect, an image processing apparatus is provided. The apparatus includes an acquisition module and a processing module. The acquisition module is configured to acquire a first image and a second image. The processing module is configured to: input the first image and the second image respectively into a single-layer convolutional neural network to obtain feature maps of a first scale of the first image and feature maps of the first scale of the second image, where the single-layer convolutional neural network includes C convolution kernels, and the number of feature maps of the first scale of the first image and the number of feature maps of the first scale of the second image are both C; perform M-1 downsamplings respectively on the feature maps of the first scale of the first image and the feature maps of the first scale of the second image to obtain feature maps of M-1 scales of the first image and feature maps of M-1 scales of the second image, where C and M are both positive integers greater than 1; and determine the similarity of the first image relative to the second image according to the feature maps of the first scale of the first image, the feature maps of the first scale of the second image, the feature maps of the M-1 scales of the first image, the feature maps of the M-1 scales of the second image, and a weight matrix related to the M scales and the number C of feature maps, where the weight matrix includes weight coefficients of different feature maps at different scales.
With reference to the second aspect, in some implementations of the second aspect, the processing module is further configured to: determine weight coefficients of the feature maps of the first scale according to peak signal-to-noise ratios of the feature maps of the first scale, where the feature maps of the first scale include the feature maps of the first scale of the first image and the feature maps of the first scale of the second image; and determine the similarity according to the weight coefficients of the feature maps of the first scale, the feature maps of the first scale of the first image, the feature maps of the first scale of the second image, the feature maps of the M-1 scales of the first image, the feature maps of the M-1 scales of the second image, and the weight matrix.
With reference to the second aspect, in some implementations of the second aspect, the similarity is obtained by the following formulas:

$$\mathrm{ASSIM}(x,y)=\sum_{i=1}^{C}w_i\left(f_1^i(x),f_1^i(y)\right)\sum_{j=1}^{M}\beta_{ij}\,\mathrm{SSIM}\left(f_j^i(x),f_j^i(y)\right)$$

$$\mathrm{SSIM}\left(f_j^i(x),f_j^i(y)\right)=\frac{2\sigma_{f_j^i(x)f_j^i(y)}+c}{\sigma_{f_j^i(x)}^{2}+\sigma_{f_j^i(y)}^{2}+c}$$

$$w_i\left(f_1^i(x),f_1^i(y)\right)=\frac{\mathrm{PSNR}_i^{\alpha}}{\sum_{k=1}^{C}\mathrm{PSNR}_k^{\alpha}}$$

$$\mathrm{PSNR}_i=10\log_{10}\frac{\left(\max\left(f_1^i(x),f_1^i(y)\right)-\min\left(f_1^i(x),f_1^i(y)\right)\right)^{2}}{\frac{1}{HW}\sum_{h=1}^{H}\sum_{w=1}^{W}\left(f_1^i(x)_{h,w}-f_1^i(y)_{h,w}\right)^{2}}$$

where ASSIM is the similarity, x is the first image, y is the second image, f() is the convolution operation of the single-layer convolutional neural network, i indexes the i-th convolution kernel in the single-layer convolutional neural network, 1≤i≤C, j indexes the j-th scale, 1≤j≤M, $w_i(f_1^i(x),f_1^i(y))$ is the weight coefficient of the feature maps of the first scale, $\mathrm{SSIM}(f_j^i(x),f_j^i(y))$ is the structural similarity of the i-th feature map at the j-th scale, $\beta_{ij}$ is the weight coefficient in row i and column j of the weight matrix, $f_1^i(x)$ is the i-th feature map among the feature maps of the first scale of the first image, $f_1^i(y)$ is the i-th feature map among the feature maps of the first scale of the second image, $\mathrm{PSNR}_i$ is the peak signal-to-noise ratio of the i-th feature map at the first scale, α is a constant, c is a constant, $\sigma_{f_j^i(x)}^{2}$ is the variance of the i-th feature map of the first image at the j-th scale, $\sigma_{f_j^i(y)}^{2}$ is the variance of the i-th feature map of the second image at the j-th scale, $\sigma_{f_j^i(x)f_j^i(y)}$ is the covariance between the i-th feature map of the first image at the j-th scale and the i-th feature map of the second image at the j-th scale, $\max(f_1^i(x),f_1^i(y))$ is the maximum pixel value in the feature maps of the first scale of the first image and the second image, $\min(f_1^i(x),f_1^i(y))$ is the minimum pixel value in the feature maps of the first scale of the first image and the second image, H is the height of the first image or the second image, and W is the width of the first image or the second image.
With reference to the second aspect, in some implementations of the second aspect, the weight matrix is a matrix with C rows and M columns, the weight coefficients in each row of the weight matrix represent the weight coefficients of a feature map at different scales, and the weight coefficients in each row of the weight matrix sum to 1.
With reference to the second aspect, in some implementations of the second aspect, the weight coefficients within each row of the weight matrix are all equal.
With reference to the second aspect, in some implementations of the second aspect, M=5, and the weight coefficients of each row in the weight matrix are the same, namely [0.2 0.2 0.2 0.2 0.2].
With reference to the second aspect, in some implementations of the second aspect, M=5, and the weight coefficients of each row in the weight matrix are the same, namely [0.0448 0.2856 0.3001 0.2363 0.1333].
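As a small illustration, a weight matrix whose rows are all identical can be built in one step; this is a sketch only, using the values given above and an assumed C = 64.

```python
import numpy as np

# Every one of the C = 64 rows carries the same five scale weights.
beta_uniform = np.tile([0.2, 0.2, 0.2, 0.2, 0.2], (64, 1))
beta_unequal = np.tile([0.0448, 0.2856, 0.3001, 0.2363, 0.1333], (64, 1))
```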
With reference to the second aspect, in some implementations of the second aspect, the C convolution kernels are divided into a high-frequency feature group, an intermediate-frequency feature group, and a low-frequency feature group according to feature frequency, and the feature maps of the M scales are divided into low-scale feature maps, medium-scale feature maps, and high-scale feature maps according to their resolutions. The distributions, over the scales, of the weight coefficients in the weight matrix for the feature maps corresponding to the high-frequency, intermediate-frequency, and low-frequency feature groups satisfy the following conditions: the non-zero weight coefficients of the feature maps corresponding to the high-frequency feature group are distributed over the low-scale feature maps of the first image and the second image, and the weight coefficients on the high-scale feature maps are 0; the non-zero weight coefficients of the feature maps corresponding to the low-frequency feature group are distributed over the low-scale feature maps, the medium-scale feature maps, and the high-scale feature maps; and the non-zero weight coefficients of the feature maps corresponding to the intermediate-frequency feature group are distributed more widely than those of the feature maps corresponding to the high-frequency feature group and less widely than those of the feature maps corresponding to the low-frequency feature group.
With reference to the second aspect, in some implementations of the second aspect, the non-zero weight coefficients of the feature maps corresponding to the high-frequency feature group are all equal, the non-zero weight coefficients of the feature maps corresponding to the low-frequency feature group are all equal, and the non-zero weight coefficients of the feature maps corresponding to the intermediate-frequency feature group are all equal.
With reference to the second aspect, in some implementations of the second aspect, M=5, the weight coefficients of the feature maps corresponding to the high-frequency feature group are [1 0 0 0 0], the weight coefficients of the feature maps corresponding to the low-frequency feature group are [1/5 1/5 1/5 1/5 1/5], and the weight coefficients of the feature maps corresponding to the intermediate-frequency feature group are [1/3 1/3 1/3 0 0];
where, in [1 0 0 0 0], [1/5 1/5 1/5 1/5 1/5], and [1/3 1/3 1/3 0 0], the weight coefficient in the first column represents the weight coefficient of the low-scale feature map, the weight coefficients in the second and third columns represent the weight coefficients of the medium-scale feature maps, and the weight coefficients in the fourth and fifth columns represent the weight coefficients of the high-scale feature maps.
With reference to the second aspect, in some implementations of the second aspect, the above feature frequency is determined by the modulus of the convolution kernels of the single-layer convolutional neural network and/or by the scaling coefficient and the normalization coefficient of the normalization layer of the single-layer convolutional neural network.
With reference to the second aspect, in some implementations of the second aspect, the first image is the input of the codec, the second image is the output of the codec, the codec is configured to compress and reconstruct the first image to output the second image, and the similarity is used to optimize the codec.
With reference to the second aspect, in some implementations of the second aspect, the first image is an image obtained by performing image signal processing on a third image, the third image is the input of the codec, the second image is the output of the codec, the codec is configured to perform image signal processing, compression, and reconstruction on the third image to output the second image, and the similarity is used to optimize the codec.
In a third aspect, an image processing apparatus is provided, including a processor and a memory. The memory is configured to store a computer program, and the processor is configured to call and run the computer program from the memory, so that the apparatus performs the method in any one of the possible implementations of the first aspect.
Optionally, there are one or more processors and one or more memories.
Optionally, the memory may be integrated with the processor, or the memory may be disposed separately from the processor.
Optionally, the apparatus further includes a transmitter and a receiver. The transmitter and the receiver may be disposed separately, or may be integrated together as a transceiver.
In a fourth aspect, this application provides a processor, including an input circuit, an output circuit, and a processing circuit. The processing circuit is configured to receive a signal through the input circuit and transmit a signal through the output circuit, so that the processor performs the method in any one of the possible implementations of the first aspect.
In a specific implementation process, the processor may be a chip, the input circuit may be an input pin, the output circuit may be an output pin, and the processing circuit may be transistors, gate circuits, flip-flops, various logic circuits, and the like. The input signal received by the input circuit may be received and input by, for example but not limited to, a receiver; the signal output by the output circuit may be, for example but not limited to, output to a transmitter and transmitted by the transmitter; and the input circuit and the output circuit may be the same circuit, which serves as the input circuit and the output circuit at different times. This application does not limit the specific implementations of the processor and the various circuits.
In a fifth aspect, a computer-readable storage medium is provided. The computer-readable storage medium stores a computer program (which may also be called code or instructions) that, when run on a computer, causes the computer to perform the method in any one of the possible implementations of the first aspect.
In a sixth aspect, a computer program product is provided. The computer program product includes a computer program (which may also be called code or instructions) that, when run, causes a computer to perform the method in any one of the possible implementations of the first aspect.
Description of drawings
Fig. 1 is a schematic diagram of an application scenario applicable to an embodiment of this application;
Fig. 2 is a schematic block diagram of image compression;
Fig. 3 is a schematic block diagram of another image compression provided by an embodiment of this application;
Fig. 4 is a schematic flowchart of an image processing method provided by an embodiment of this application;
Fig. 5 is a schematic block diagram of downsampling provided by an embodiment of this application;
Fig. 6 is a schematic block diagram of similarity calculation provided by an embodiment of this application;
Fig. 7 is a schematic diagram of feature map grouping provided by an embodiment of this application;
Fig. 8 is a schematic diagram of a simulation provided by an embodiment of this application;
Fig. 9 is another schematic diagram of a simulation provided by an embodiment of this application;
Fig. 10 is yet another schematic diagram of a simulation provided by an embodiment of this application;
Fig. 11 is still another schematic diagram of a simulation provided by an embodiment of this application;
Fig. 12 is a schematic block diagram of an image processing apparatus provided by an embodiment of this application;
Fig. 13 is a schematic block diagram of another image processing apparatus provided by an embodiment of this application.
Detailed description
The technical solutions in this application will be described below with reference to the accompanying drawings.
For a better understanding of the embodiments of this application, the following explanations are given first:
First, the term "and/or" in the embodiments of this application merely describes an association relationship between associated objects and indicates that three relationships may exist. For example, A and/or B may indicate three cases: A alone, both A and B, and B alone, where A and B may be singular or plural. In addition, the character "/" in this document generally indicates an "or" relationship between the preceding and following associated objects, but may also indicate an "and/or" relationship, which can be understood with reference to the context.
Second, in the embodiments of this application, "at least one" means one or more, and "multiple" means two or more. "At least one of the following" or similar expressions refer to any combination of these items, including any combination of single items or plural items. For example, at least one of a, b, or c may represent: a, b, c, a-b, a-c, b-c, or a-b-c, where each of a, b, and c may be single or multiple.
Exemplarily, Fig. 1 shows a schematic diagram of a vehicle 100. As shown in Fig. 1, the vehicle 100 includes a camera 101, a display device 102, and a mobile data center (MDC) 103. The camera 101 can communicate with the display device 102 and with the MDC 103 through a network. The vehicle 100 may include multiple cameras including the camera 101; the embodiments of this application take only the camera 101 as an example for description, without being limited thereto.
It should be understood that some or all of the functions of the vehicle 100 may be controlled by the MDC 103. The MDC 103 may include at least one processor, and the processor may execute instructions stored in a computer-readable medium such as a memory. In some embodiments, the MDC 103 may also be multiple computing devices that control individual components or subsystems of the vehicle 100 in a distributed manner. The processor may be any conventional processor, such as a central processing unit (CPU). Alternatively, the processor may also include a graphics processing unit (GPU), a field programmable gate array (FPGA), a system on chip (SOC), an application-specific integrated circuit (ASIC), or a combination thereof.
In addition to instructions, the memory may also store data, such as road maps, route information, the position, direction, and speed of the vehicle and other such vehicle data, as well as other information. Such information may be used by the vehicle 100 and the MDC 103 during operation of the vehicle 100 in autonomous, semi-autonomous, and/or manual modes.
It should be understood that the structure of the vehicle in Fig. 1 should not be construed as a limitation on the embodiments of this application. Optionally, the vehicle 100 may be a car, a truck, a motorcycle, a bus, a boat, an airplane, a helicopter, a lawn mower, a recreational vehicle, an amusement park vehicle, construction equipment, a tram, a golf cart, a train, or the like, which is not specifically limited in the embodiments of this application.
The camera 101 can capture images or videos and send them to the display device 102 and the MDC 103 through the network. After receiving an image or video, the display device 102 can display it for the user to watch. After receiving an image or video, the MDC 103 can perform processing on it such as object recognition, semantic segmentation, traffic light detection, and lane line detection.
However, as camera resolution, frame rate, sampling depth, and so on keep increasing, the output video requires more and more transmission bandwidth. To relieve the pressure on the transmission network, compression is generally used to reduce the bandwidth requirement.
Exemplarily, the camera may output a Bayer raw image or video. A Bayer raw image or video has high precision and places high demands on transmission bandwidth. For example, the camera may output an ultra high definition (UHD) video with a frame rate of 30 frames per second (fps), a sampling depth of 16 bits (bit depth), and a resolution of 4K (i.e., 4096×2160); the bandwidth required for the camera to transmit this video is as high as about 4 Gbps, i.e., 4K*2K*30*16.
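For reference, this bandwidth figure follows directly from the video parameters: 4096 pixels × 2160 pixels × 30 fps × 16 bits per sample ≈ 4.25 × 10⁹ bits per second, i.e., roughly 4 Gbps.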
Therefore, in order to relieve the pressure on the transmission network, the camera can compress the image or video before transmitting the Bayer raw image or video.
Exemplarily, Fig. 2 shows a schematic block diagram of image compression. As shown in Fig. 2, the camera 101 outputs a Bayer raw image, and an image signal processing (ISP) module 201 processes the Bayer raw image to obtain a red green blue (RGB) image. The precision of the RGB image is much lower than that of the raw image output by the camera, which reduces the bandwidth requirement for network transmission. The RGB image passes through a codec (encoder and decoder, CODEC) 202 to obtain a reconstructed image; that is, after encoding and compression, a reconstructed image is obtained through decoding and reconstruction. If the reconstructed image is transmitted to the MDC 103, it can be used for tasks such as object detection, semantic segmentation, and detection of traffic lights and lane lines. If the reconstructed image is transmitted to the display device 102, it can be displayed for the user to watch.
Although compression technology can reduce the bandwidth requirement of image or video transmission, it inevitably causes degradation of image or video quality; at the same time, ISP processing also causes loss of image or video information. Degraded image or video quality reduces the accuracy of artificial intelligence tasks and also reduces the clarity and resolution of the image or video, affecting the user's viewing and hence the user experience. Therefore, in Fig. 2 above, the quality of the reconstructed image obtained through decoding and reconstruction is of key significance for subsequent tasks, so it is particularly important that the reconstructed image has high quality.
The quality requirement of autonomous driving visual perception on the reconstructed image is to meet both the needs of human vision and the needs of machine vision.
Exemplarily, the quality requirements of autonomous driving visual perception on the reconstructed image may include: 1) machine vision oriented: autonomous driving recognizes objects in the image; 2) adaptation to multiple machine vision tasks, including object detection, semantic segmentation, traffic light/lane line detection, etc.; 3) human vision: the images captured in autonomous driving are provided for human eyes to watch.
At present, however, the objective evaluation metrics commonly used for machine vision do not consider the needs of human vision, and the objective evaluation metrics commonly used for human vision do not consider the needs of machine vision; that is, the existing evaluation metrics cannot take into account both the needs of human vision and the needs of machine vision.
Exemplarily, to some extent, tasks such as the above object detection, semantic segmentation, and detection of traffic lights and lane lines can be called machine vision tasks; that is, the reconstructed image is processed directly by a machine system, so the reconstructed image mainly needs to be quickly recognized and detected by the machine system. There are various machine-vision-oriented evaluation metrics for evaluating the quality of the reconstructed image, including image classification metrics such as accuracy (Acc), object detection metrics such as mean average precision (mAP), semantic segmentation metrics such as mean intersection over union (mIoU), and lane line detection metrics such as Acc.
Using the above evaluation metrics to optimize the CODEC for the corresponding single machine vision task can achieve good results, but a CODEC optimized in this way is coupled to that single machine vision task and cannot adapt to multiple task scenarios. For example, if the CODEC is optimized using an image classification metric such as Acc, the reconstructed image output by the optimized CODEC works well for the image classification task but still works poorly for tasks such as object detection, semantic segmentation, and lane line detection, so task-generalized evaluation cannot be achieved; meanwhile, the impact on human vision is not considered.
In addition to the above machine-vision-oriented evaluation metrics, there are also human-vision-oriented evaluation metrics, including peak signal-to-noise ratio (PSNR), multiscale structural similarity (MSSSIM), learned perceptual image patch similarity (LPIPS), depth image structural texture similarity (DISTS), and so on. A reconstructed image output by a CODEC optimized according to human-vision-oriented evaluation metrics can better match the subjective perception of the human eye; for example, the reconstructed image may be displayed directly on a screen inside the vehicle for the driver to watch, so the reconstructed image needs to have higher clarity and be easy for human eyes to view.
In reality, however, the complexity of the human eye is difficult to satisfy with a single evaluation metric, and the existing human-vision-oriented evaluation metrics each have their own shortcomings. Taking LPIPS as an example, using the LPIPS metric to evaluate reconstructed image quality requires computing all the convolutional layers of the network, which is computationally complex; in addition, since the network involves pooling and downsampling, image information is lost during evaluation, and only low-resolution feature maps are used, so it is difficult to produce accurate evaluation results; inaccurate evaluation results in turn make it difficult to guarantee the optimization result of the CODEC, and hence the quality of the reconstructed image.
To sum up, there are three problems concerning the reconstructed image:
1) ISP processing causes loss of image or video information.
2) The quality of the reconstructed image obtained by the decoder is of key significance for subsequent tasks, so it is particularly important that the reconstructed image has high quality.
3) The existing image quality evaluation metrics optimize the CODEC only for a single machine vision task; the optimized CODEC is coupled to that single machine vision task and cannot adapt to multiple machine vision tasks; meanwhile, the impact on human vision is not considered.
In view of this, the embodiments of this application provide an image processing method and an image processing apparatus for obtaining the similarity between a reconstructed image and an original image, so as to evaluate the quality of the reconstructed image and thereby guide the optimization of the encoder and/or decoder. At the same time, the reconstructed image can meet both the needs of multiple machine vision tasks and the needs of human vision; in addition, the information loss caused by ISP processing can also be avoided.
It should be noted that the technical solution proposed in this application provides an evaluation metric that is machine oriented, task decoupled, and considerate of the human eye, with which the similarity between the reconstructed image and the original image is calculated to evaluate the quality of the reconstructed image, thereby guiding the optimization of the encoder and/or decoder.
It should also be noted that the technical solution proposed in this application can be applied to the Internet of Vehicles, such as vehicle to everything (V2X), long term evolution-vehicle (LTE-V), and vehicle to vehicle (V2V). For example, it can be applied to a vehicle with a driving function, or to another apparatus with a driving function in a vehicle. The other apparatus includes but is not limited to an on-board terminal, an on-board controller, an on-board module, an on-board component, an on-board chip, an on-board unit, and other sensors such as an on-board radar or an on-board camera; the vehicle can implement the vehicle control method provided by the embodiments of this application through the on-board terminal, on-board controller, on-board module, on-board component, on-board chip, on-board unit, on-board radar, or on-board camera. Of course, the solutions in the embodiments of this application can also be used in intelligent terminals with movement control functions other than vehicles, or be disposed in such intelligent terminals or in components of such terminals. The intelligent terminal may be an intelligent transportation device, a smart home device, a robot, or the like, including but not limited to the intelligent terminal itself or, within the intelligent terminal, a controller, a chip, other sensors such as a radar or a camera, and other components.
The technical solution provided by the embodiments of this application can be applied to the image compression scenario shown in Fig. 2 above. In the image compression scenario shown in Fig. 2, the ISP module 201 and the CODEC 202 are deployed independently, which can be machine vision oriented and improve the detection accuracy of machine vision tasks; at the same time, the machine vision task model does not need to be changed (i.e., the MDC 103 does not need to be changed); in addition, the image reconstructed by the decoder can also take human vision into account and is friendly to human vision. Besides the image compression scenario shown in Fig. 2 above, the technical solution provided by the embodiments of this application can also be applied to an image compression scenario in which the ISP and the CODEC are deployed as one. The image compression scenario in which the ISP and the CODEC are deployed as one can solve problem 1) above.
Exemplarily, Fig. 3 shows a schematic block diagram of another image compression. As shown in Fig. 3, the camera 101 outputs a Bayer raw image, and a reconstructed image can be obtained after the Bayer raw image passes through the CODEC 301. The CODEC 301 can perform image signal processing, compression, and reconstruction on the Bayer raw image to obtain the reconstructed image. If the reconstructed image is transmitted to the MDC 103, it can be used for tasks such as object detection, semantic segmentation, and detection of traffic lights and lane lines. If the reconstructed image is transmitted to the display device 102, it can be displayed for the user to watch.
When the technical solution provided by the embodiments of this application is applied to the image compression scenario shown in Fig. 3, it can be machine vision oriented and improve the detection accuracy of machine vision tasks; at the same time, the reconstructed image can also take human vision into account and is friendly to human vision; in addition, the information loss caused by ISP processing can be avoided. Moreover, this image compression scenario does not require a separately deployed ISP module, so deployment is simple.
It should be noted that the image compression scenario in which the ISP and the CODEC are deployed as one is proposed by this application.
It should also be noted that when the technical solution provided by the embodiments of this application is applied to the image compression scenario shown in Fig. 3, an additional ISP is needed to process the Bayer raw image to obtain the original image, and the similarity of the reconstructed image relative to the original image is calculated to evaluate the quality of the reconstructed image, thereby guiding the optimization of the encoder and/or decoder.
Fig. 4 is a schematic flowchart of an image processing method 400 provided by an embodiment of this application. The method 400 may be performed by an image processing device. The method 400 may be applied to the scenario shown in Fig. 2 or Fig. 3 above, but the embodiments of this application are not limited thereto.
As shown in Fig. 4, the method 400 may include the following steps:
S401. Acquire a first image and a second image.
The first image and the second image are two different images. When the first image is the original image, the second image may be the reconstructed image; when the first image is the reconstructed image, the second image may be the original image.
Exemplarily, the first image may be the RGB image in Fig. 2 above, and the second image may be the reconstructed image in Fig. 2.
Exemplarily, the first image may be the reconstructed image in Fig. 2 above, and the second image may be the RGB image in Fig. 2.
There are multiple possible ways in which the image processing device may acquire the first image and the second image, which is not limited in the embodiments of this application.
In one possible implementation, in the scenario shown in Fig. 2 above, when the first image is the original image and the second image is the reconstructed image, the image processing device may acquire the first image from the ISP and the second image from the codec.
In another possible implementation, in the scenario shown in Fig. 3 above, when the first image is the original image and the second image is the reconstructed image, the image processing device needs to process the Bayer raw image with an additional ISP to obtain the first image, and may acquire the second image from the codec.
S402. Input the first image and the second image respectively into a single-layer convolutional neural network to obtain feature maps of a first scale of the first image and feature maps of the first scale of the second image, where the single-layer convolutional neural network includes C convolution kernels, and the number of feature maps of the first scale of the first image and the number of feature maps of the first scale of the second image are both C.
The parameters of the single-layer convolutional neural network come from the parameters of the first convolutional layer of a pre-trained model. The pre-trained model is a convolutional neural network model trained in advance, for example, a classification model such as ResNet, AlexNet, VGGNet, or RegNet trained on the large-scale training set ImageNet.
The single-layer convolutional neural network may include C convolution kernels, and different kernels among the C convolution kernels are used to extract different features and obtain different feature maps. These different feature maps may be called the feature maps of the first scale.
It should be noted that the feature maps of the first scale here refer to the images obtained after the first image and the second image are processed once (i.e., processed by the single-layer convolutional neural network).
When the image processing device inputs the first image into the single-layer convolutional neural network, the feature maps of the first scale of the first image are obtained, and their number is C; when the image processing device inputs the second image into the single-layer convolutional neural network, the feature maps of the first scale of the second image are obtained, and their number is also C. The C convolution kernels correspond one-to-one to the C feature maps of the first scale of the first image, and the C convolution kernels correspond one-to-one to the C feature maps of the first scale of the second image.
In addition, each of the above C convolution kernels may be a 7×7 convolution kernel with a stride of 1; that is, the single-layer convolutional neural network does not perform downsampling, so the feature maps of the first scale obtained in this way have the characteristic of high resolution. A sketch of this extraction step is given below.
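The following is a minimal sketch, assuming PyTorch and a pre-trained ResNet-18 as the source of the single-layer network's parameters (any of the pre-trained classification models named above could be used instead); the stride is set to 1 so that no downsampling takes place, as described above.

```python
import torch
import torchvision

# Copy the first convolutional layer (C = 64 kernels of size 7x7) of a
# pre-trained ResNet-18 into a stride-1 single-layer network.
pretrained = torchvision.models.resnet18(weights="IMAGENET1K_V1")
single_layer = torch.nn.Conv2d(3, 64, kernel_size=7, stride=1, padding=3, bias=False)
single_layer.weight.data.copy_(pretrained.conv1.weight.data)
single_layer.weight.requires_grad_(False)  # fixed feature extractor

def first_scale_features(image: torch.Tensor) -> torch.Tensor:
    """image: (N, 3, H, W) -> first-scale feature maps: (N, 64, H, W)."""
    return single_layer(image)
```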
S403. Perform M-1 downsamplings respectively on the feature maps of the first scale of the first image and the feature maps of the first scale of the second image to obtain feature maps of M-1 scales of the first image and feature maps of M-1 scales of the second image, where C and M are both positive integers greater than 1.
A specific implementation of the downsampling may be average pooling downsampling, which, compared with other kinds of downsampling (for example, random downsampling), can reduce the loss of image information.
The number of feature maps at each scale among the feature maps of the M-1 scales of the first image and the feature maps of the M-1 scales of the second image is C.
The image processing device may perform the first of the M-1 downsamplings on the feature maps of the first scale of the first image to obtain the feature maps of the second scale of the first image, perform the second of the M-1 downsamplings on the feature maps of the second scale of the first image to obtain the feature maps of the third scale of the first image, and so on, to obtain the feature maps of the M-th scale of the first image.
It should be understood that the image processing device performs M-1 downsamplings on the feature maps of the first scale of the first image to obtain the feature maps of the M-1 scales of the first image. Similarly, the image processing device can obtain the feature maps of the M-1 scales of the second image.
It should also be understood that the number of feature maps of the first scale of the first image is C; after the first downsampling of the feature maps of the first scale of the first image, the feature maps of the second scale of the first image are obtained, and their number is also C. Similarly, the number of feature maps of the first image at each of the other scales is also C; that is, the number of feature maps at each of the M-1 scales of the first image is C. Likewise, the number of feature maps at each of the M-1 scales of the second image is also C.
It should be noted that downsampling the first feature map among the feature maps of the first scale of the first image yields the first feature map among the feature maps of the second scale of the first image, and the same applies to the other feature maps, which will not be repeated here.
Exemplarily, Fig. 5 shows a schematic block diagram of downsampling. As shown in Fig. 5, the first image and the second image pass through the single-layer convolutional neural network to obtain the feature maps of the first scale of the first image and of the second image; these are downsampled a first time to obtain the feature maps of the second scale of the first image and of the second image; the feature maps of the second scale are downsampled a second time to obtain the feature maps of the third scale of the first image and of the second image; and so on, until after the (M-1)-th downsampling, the feature maps of the M-th scale of the first image and of the second image are obtained. If M=5 and C=64, the image processing device can obtain through S403 the feature maps of 4 scales of the first image, namely the feature maps of the second, third, fourth, and fifth scales of the first image, as well as the feature maps of 4 scales of the second image, namely the feature maps of the second, third, fourth, and fifth scales of the second image; moreover, each of the 4 scales of the first image and each of the 4 scales of the second image includes 64 feature maps. A sketch of this pyramid construction follows.
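The following is a minimal sketch of building the M-scale feature pyramid with average pooling, continuing the PyTorch sketch above; the 2×2 window with stride 2 used for each of the M-1 downsampling steps is an illustrative assumption.

```python
import torch.nn.functional as F

def feature_pyramid(feat, M=5):
    """feat: (..., C, H, W) first-scale maps -> list of M tensors, one per scale."""
    scales = [feat]
    for _ in range(M - 1):
        feat = F.avg_pool2d(feat, kernel_size=2, stride=2)  # halve H and W
        scales.append(feat)
    return scales  # the number of feature maps C is preserved at every scale
```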
S404. Determine the similarity of the first image relative to the second image according to the feature maps of the first scale of the first image, the feature maps of the first scale of the second image, the feature maps of the M-1 scales of the first image, the feature maps of the M-1 scales of the second image, and a weight matrix related to the M scales and the number C of feature maps, where the weight matrix includes weight coefficients of different feature maps at different scales.
The dimension of the weight matrix is C×M, and the weight matrix includes the weight coefficients of different feature maps at different scales. For example, the weight coefficient of the fourth feature map at the first scale is the value in the fourth row and the first column of the weight matrix. It should be understood that the values in the weight matrix can be understood as weight coefficients.
It should be noted that the weight matrix can be understood as a matrix composed of the weight coefficients of different feature maps at different scales. The weight matrix is only one form of representation; the weight coefficients of different feature maps at different scales can also be represented in other forms, such as tables or text, which is not limited in the embodiments of this application.
In the image processing method provided by the embodiments of this application, a single-layer convolutional neural network is used to obtain the feature maps of the first scale, which requires only a single-layer convolution computation and is therefore simple to compute; extracting low-level general features through a single-layer convolution helps to guarantee multi-task generalization for images. The feature maps of the first scale have high resolution, which helps to guarantee image quality; at the same time, performing multiple downsamplings on the high-resolution feature maps yields more image feature information. In addition, the weight matrix is used to assign weights to different feature maps at different scales, flexibly adjusting the weight values in the feature and scale dimensions, optimizing machine vision performance, and achieving efficient machine-vision-oriented image coding and compression performance, which facilitates using the similarity of the first image relative to the second image to evaluate the quality of the first image.
It should be noted that when the embodiments of this application are applied to the scenario shown in Fig. 2 above, the first image is the input of the codec, the second image is the output of the codec, the codec is configured to compress and reconstruct the first image to output the second image, and the similarity is used to optimize the codec.
When the embodiments of this application are applied to the scenario shown in Fig. 3 above, the first image is an image obtained by performing image signal processing on a third image and may also be called the original image; the third image is the input of the codec, and the second image is the output of the codec and may also be called the reconstructed image. The codec is configured to perform image signal processing, compression, and reconstruction on the third image to output the second image, and the image processing device can use the above method 400 to obtain the similarity of the original image relative to the reconstructed image for optimizing the codec.
The image processing method provided by this application can be used to obtain the similarity between the reconstructed image and the original image to evaluate the quality of the reconstructed image, thereby guiding the optimization of the codec.
As an optional embodiment, before S404 above (determining the similarity of the first image relative to the second image according to the feature maps of the first scale of the first image, the feature maps of the first scale of the second image, the feature maps of the M-1 scales of the first image, the feature maps of the M-1 scales of the second image, and the weight matrix related to the M scales and the number C of feature maps), the method 400 further includes: determining weight coefficients of the feature maps of the first scale according to peak signal-to-noise ratios of the feature maps of the first scale, where the feature maps of the first scale include the feature maps of the first scale of the first image and the feature maps of the first scale of the second image. In this case, determining the similarity of the first image relative to the second image according to the feature maps of the first scale of the first image, the feature maps of the first scale of the second image, the feature maps of the M-1 scales of the first image, the feature maps of the M-1 scales of the second image, and the weight matrix related to the M scales and the number C of feature maps includes: determining the similarity according to the weight coefficients of the feature maps of the first scale, the feature maps of the first scale of the first image, the feature maps of the first scale of the second image, the feature maps of the M-1 scales of the first image, the feature maps of the M-1 scales of the second image, and the weight matrix.
The number of feature maps of the first scale of the first image and the number of feature maps of the first scale of the second image are both C, so the number of peak signal-to-noise ratios of the feature maps of the first scale is C. The image processing device can obtain the weight coefficient of each feature map among the feature maps of the first scale according to the peak signal-to-noise ratios of the feature maps of the first scale, and can then determine the similarity according to the weight coefficients of the feature maps of the first scale, the feature maps of the first scale of the first image, the feature maps of the first scale of the second image, the feature maps of the M-1 scales of the first image, the feature maps of the M-1 scales of the second image, and the weight matrix.
It should be understood that a feature map of the first scale thus has two weight coefficients: one determined according to the peak signal-to-noise ratio, and the other being the value in the first column of the weight matrix.
In the image processing method provided by the embodiments of this application, the peak signal-to-noise ratios of the feature maps of the first scale can keep characteristics such as the color and brightness of the first image and the second image consistent. Based on the peak signal-to-noise ratios of the feature maps of the first scale, the weight coefficient of each feature map among the feature maps of the first scale is obtained, and each feature map is weighted by its weight coefficient, which can enhance the detail of the feature maps, improve human vision and machine vision performance, and take into account both the needs of human vision and the needs of machine vision, i.e., solve problem 3) above. A sketch of these PSNR-based weights follows.
作为一个可选的实施例,上述相似度可以通过下列公式得到:As an optional embodiment, the above similarity can be obtained by the following formula:
ASSIM(x, y) = \sum_{i=1}^{C} w_i(f_1^i(x), f_1^i(y)) \sum_{j=1}^{M} \beta_{ij} \cdot SSIM(f_j^i(x), f_j^i(y))

w_i(f_1^i(x), f_1^i(y)) = \frac{(PSNR_i)^{\alpha}}{\sum_{k=1}^{C} (PSNR_k)^{\alpha}}

SSIM(f_j^i(x), f_j^i(y)) = \frac{2 \sigma_{f_j^i(x) f_j^i(y)}}{\sigma_{f_j^i(x)}^2 + \sigma_{f_j^i(y)}^2}

PSNR_i = 10 \log_{10} \frac{\left( \max(f_1^i(x), f_1^i(y)) - \min(f_1^i(x), f_1^i(y)) \right)^2}{\frac{1}{HW} \sum_{h=1}^{H} \sum_{w=1}^{W} \left( f_1^i(x)_{h,w} - f_1^i(y)_{h,w} \right)^2}
Here, ASSIM is the similarity, x is the first image, y is the second image, f() is the convolution operation of the single-layer convolutional neural network, i indexes the i-th convolution kernel in the single-layer convolutional neural network, 1 ≤ i ≤ C, j indexes the j-th scale, 1 ≤ j ≤ M, w_i(f_1^i(x), f_1^i(y)) is the weight coefficient of the i-th feature map of the first scale, SSIM(f_j^i(x), f_j^i(y)) is the structural similarity of the i-th feature map of the j-th scale, β_ij is the weight coefficient corresponding to the i-th row and the j-th column of the weight matrix, f_1^i(x) is the i-th feature map among the feature maps of the first scale of the first image, f_1^i(y) is the i-th feature map among the feature maps of the first scale of the second image, PSNR_i is the peak signal-to-noise ratio of the i-th feature map of the first scale, α is a constant, σ_{f_j^i(x)}^2 is the variance of the i-th feature map of the j-th scale of the first image, σ_{f_j^i(y)}^2 is the variance of the i-th feature map of the j-th scale of the second image, σ_{f_j^i(x)f_j^i(y)} is the covariance between the i-th feature map of the j-th scale of the first image and the i-th feature map of the j-th scale of the second image, max(f_1^i(x), f_1^i(y)) is the maximum pixel value in the feature maps of the first scale of the first image and the second image, min(f_1^i(x), f_1^i(y)) is the minimum pixel value in the feature maps of the first scale of the first image and the second image, H is the height of the first image or the second image, and W is the width of the first image or the second image.
Here, i denotes the i-th convolution kernel in the single-layer convolutional neural network, which may also be called the i-th channel; this is not limited in the embodiments of this application.
ASSIM may also be called the adaptive structural similarity (adaptive structure similarity, ASSIM) index. It should be understood that this name is merely an example and is not limited in the embodiments of this application.
Optionally, C may be 64, M may be 5, and α may be 0.1; the single-layer convolutional neural network may include 64 convolution kernels of size 7*7 with a stride of 1, that is, the single-layer convolutional neural network performs no downsampling, so that the feature maps of the first scale obtained in this way have a high resolution.
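Exemplarily, the extraction of the first-scale feature maps and the M-1 downsampling steps can be sketched as follows. This is merely an illustrative sketch and not a limitation of this application: it assumes a PyTorch-style implementation, random stand-in weights for the 64 7*7 kernels, and 2x2 average pooling as the downsampling operation, none of which are mandated above.

```python
# A minimal sketch of the multi-scale feature extraction described above,
# assuming a PyTorch-style implementation. The 64 7x7 kernels are random
# stand-ins; in practice they may come from a pretrained model.
import torch
import torch.nn.functional as F

conv = torch.nn.Conv2d(in_channels=3, out_channels=64,
                       kernel_size=7, stride=1, padding=3, bias=False)

def scale_pyramid(image, num_scales=5):
    """Return a list of num_scales tensors of shape (1, 64, H_j, W_j).

    The first scale keeps full resolution (stride 1, no downsampling);
    each further scale halves the resolution, here with 2x2 average
    pooling (the downsampling operator is an assumption of this sketch).
    """
    maps = [conv(image)]                       # first scale: (1, 64, H, W)
    for _ in range(num_scales - 1):
        maps.append(F.avg_pool2d(maps[-1], kernel_size=2))
    return maps

x = torch.rand(1, 3, 256, 256)    # first image (original)
y = torch.rand(1, 3, 256, 256)    # second image (reconstructed)
pyramid_x = scale_pyramid(x)      # M = 5 scales, C = 64 maps per scale
pyramid_y = scale_pyramid(y)
```

With stride 1 and padding 3, the first scale keeps the full input resolution, matching the high-resolution property described above.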
Exemplarily, FIG. 6 shows a schematic diagram of calculating the similarity. As shown in FIG. 6, the single-layer convolutional neural network includes C = 64 convolution kernels of size 7*7 with a stride of 1, and M is 5. The feature maps of the first to fifth scales of the first image are f_1^i(x), f_2^i(x), f_3^i(x), f_4^i(x), and f_5^i(x), and the feature maps of the first to fifth scales of the second image are f_1^i(y), f_2^i(y), f_3^i(y), f_4^i(y), and f_5^i(y). The image processing device may calculate the weight coefficients w_i(f_1^i(x), f_1^i(y)) of the feature maps of the first scale and the structural similarity SSIM(f_1^i(x), f_1^i(y)) of the first scale according to f_1^i(x) and f_1^i(y); calculate the structural similarity SSIM(f_2^i(x), f_2^i(y)) of the second scale according to f_2^i(x) and f_2^i(y); calculate the structural similarity SSIM(f_3^i(x), f_3^i(y)) of the third scale according to f_3^i(x) and f_3^i(y); calculate the structural similarity SSIM(f_4^i(x), f_4^i(y)) of the fourth scale according to f_4^i(x) and f_4^i(y); calculate the structural similarity SSIM(f_5^i(x), f_5^i(y)) of the fifth scale according to f_5^i(x) and f_5^i(y); and then obtain the similarity ASSIM by the foregoing formulas from w_i(f_1^i(x), f_1^i(y)) and the structural similarities SSIM(f_j^i(x), f_j^i(y)) of the five scales, where 1 ≤ i ≤ C and 1 ≤ j ≤ M.
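The computation in FIG. 6 can be condensed into a short numerical sketch. This is only an illustration, not the definitive formula of this application: the weight form w_i = PSNR_i^α / Σ_k PSNR_k^α and the plain weighted sum over scales follow the reconstruction given earlier, the small 1e-12 terms are added only to avoid division by zero, and fx[j][i] / fy[j][i] are hypothetical containers holding the i-th feature map of scale j+1 (for example, taken from the pyramids in the previous sketch).

```python
# A compact numerical sketch of assembling ASSIM from the per-map statistics.
import numpy as np

def ssim_struct(a, b):
    """Structural term 2*cov(a, b) / (var(a) + var(b)) for two feature maps."""
    cov = np.mean((a - a.mean()) * (b - b.mean()))
    return 2.0 * cov / (a.var() + b.var() + 1e-12)   # 1e-12 only guards 0/0

def psnr(a, b):
    """PSNR of one pair of first-scale maps, with dynamic range max - min."""
    peak = max(a.max(), b.max()) - min(a.min(), b.min())
    mse = np.mean((a - b) ** 2)
    return 10.0 * np.log10(peak ** 2 / (mse + 1e-12) + 1e-12)

def assim(fx, fy, beta, alpha=0.1):
    """fx, fy: M-element lists of C-element lists of 2-D arrays; beta: (C, M)."""
    C, M = beta.shape
    p = np.array([psnr(fx[0][i], fy[0][i]) for i in range(C)])
    w = p ** alpha / np.sum(p ** alpha)   # assumes all PSNR values are positive
    score = 0.0
    for i in range(C):
        for j in range(M):
            score += w[i] * beta[i, j] * ssim_struct(fx[j][i], fy[j][i])
    return score
```

With the rows of beta summing to 1 and the w_i summing to 1, the result remains a convex combination of the per-map structural similarities.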
As an optional embodiment, the weight matrix is a matrix with C rows and M columns, the weight coefficients of each row in the weight matrix represent the weight coefficients of one feature map at different scales, and the weight coefficients of each row in the weight matrix sum to 1.
The weight coefficients of each row in the weight matrix are the weight coefficients, at different scales, of the feature maps corresponding to the same channel (or the feature maps corresponding to the same convolution kernel), and the weight coefficients of the feature maps corresponding to the same channel at different scales sum to 1.
Exemplarily, in the example shown in FIG. 6 above, the single-layer convolutional neural network includes C = 64 convolution kernels of size 7*7, that is, 64 channels, and M is 5, so the feature maps corresponding to each of the 64 channels include feature maps of 5 scales. For example, the feature maps corresponding to the first of the 64 channels may include the feature map f_1^1(x) of the first scale of the first image, the feature map f_2^1(x) of the second scale of the first image, the feature map f_3^1(x) of the third scale of the first image, the feature map f_4^1(x) of the fourth scale of the first image, and the feature map f_5^1(x) of the fifth scale of the first image.
In this case, there are multiple possible implementations of the weight coefficients of each row in the weight matrix.
In one possible implementation, the weight coefficients within each row of the weight matrix are all the same.
Exemplarily, if M = 5, each row in the weight matrix has 5 weight coefficients; since these 5 weight coefficients are all the same and the weight coefficients of each row sum to 1, each weight coefficient in a row is 1/5 = 0.2, and the weight coefficients of each row in the weight matrix are [0.2 0.2 0.2 0.2 0.2].
If C = 64, the weight matrix is a matrix with 64 rows and 5 columns, and the weight coefficients of each of the 64 rows are [0.2 0.2 0.2 0.2 0.2].
The weight coefficients in the weight matrix may also be presented in the form of a chart or text, but this embodiment of this application is not limited thereto.
In this implementation, every weight coefficient in each row of the weight matrix is the same, so the calculation is simple, which helps improve the efficiency of calculating the similarity.
In another possible implementation, the weight coefficients in the weight matrix may reuse the weight coefficients of the conventional multi-scale structural similarity (multi-scale structure similarity index, MSSSIM) index; the weight coefficients of the conventional MSSSIM index were determined through human visual experiments.
The weight coefficients of the conventional MSSSIM index are 0.0448, 0.2856, 0.3001, 0.2363, and 0.1333. In this embodiment of this application, if M = 5, the weight coefficients of each row in the weight matrix are the same and may be [0.0448 0.2856 0.3001 0.2363 0.1333].
If C = 64, the weight matrix is a matrix with 64 rows and 5 columns, and the weight coefficients of each of the 64 rows are [0.0448 0.2856 0.3001 0.2363 0.1333].
In this implementation, the weight coefficients in the weight matrix reuse the weight coefficients of the conventional MSSSIM index, which helps meet the needs of human vision.
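Exemplarily, the two row layouts described above can be constructed as follows; this is an illustrative numpy sketch, not a limitation of this application.

```python
# An illustrative construction of the two row layouts described above.
import numpy as np

C, M = 64, 5
beta_uniform = np.full((C, M), 1.0 / M)      # every row is [0.2 ... 0.2]
beta_msssim = np.tile([0.0448, 0.2856, 0.3001, 0.2363, 0.1333], (C, 1))

assert np.allclose(beta_uniform.sum(axis=1), 1.0)
# The published MSSSIM weights sum to 1.0001 due to rounding, so the
# row-sum check is relaxed for that variant.
assert np.allclose(beta_msssim.sum(axis=1), 1.0, atol=2e-4)
```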
In yet another possible implementation, the C convolution kernels are divided into a high-frequency feature group, an intermediate-frequency feature group, and a low-frequency feature group according to feature frequency, and the feature maps of the M scales are divided into low-scale feature maps, medium-scale feature maps, and high-scale feature maps according to the resolution of the feature maps of the M scales. The distribution of the weight coefficients, at different scales, of the feature maps corresponding to the high-frequency feature group, the intermediate-frequency feature group, and the low-frequency feature group in the weight matrix satisfies the following conditions: the non-zero weight coefficients of the feature maps corresponding to the high-frequency feature group are distributed over the low-scale feature maps of the first image and the second image, and the weight coefficients of the high-scale feature maps are 0; the non-zero weight coefficients of the feature maps corresponding to the low-frequency feature group are distributed over the low-scale feature maps, the medium-scale feature maps, and the high-scale feature maps; and the non-zero weight coefficients of the feature maps corresponding to the intermediate-frequency feature group are distributed over more scales than those of the feature maps corresponding to the high-frequency feature group and over fewer scales than those of the feature maps corresponding to the low-frequency feature group.
The image processing device may sort the C convolution kernels by feature frequency from high to low or from low to high, and then group the sorted convolution kernels into the high-frequency feature group, the intermediate-frequency feature group, and the low-frequency feature group.
Exemplarily, with C = 64, the image processing device may sort the 64 convolution kernels by feature frequency from high to low and then group the sorted kernels into the high-frequency feature group, the intermediate-frequency feature group, and the low-frequency feature group. FIG. 7 shows a schematic diagram of such a convolution kernel grouping. As shown in FIG. 7, the high-frequency feature group may include 8 convolution kernels, namely convolution kernel 1 to convolution kernel 8; the intermediate-frequency feature group may include 24 convolution kernels, namely convolution kernel 9 to convolution kernel 32; and the low-frequency feature group may include 32 convolution kernels, namely convolution kernel 33 to convolution kernel 64.
The feature maps corresponding to the high-frequency feature group can be understood as the feature maps corresponding to each convolution kernel in the high-frequency feature group, where the feature maps corresponding to each convolution kernel include feature maps of M scales. The feature maps corresponding to the intermediate-frequency feature group can be understood as the feature maps corresponding to each convolution kernel in the intermediate-frequency feature group, where the feature maps corresponding to each convolution kernel include feature maps of M scales. Similarly, the feature maps corresponding to the low-frequency feature group can be understood as the feature maps corresponding to each convolution kernel in the low-frequency feature group, where the feature maps corresponding to each convolution kernel include feature maps of M scales.
The image processing device may sort the feature maps of the M scales by resolution from high to low or from low to high, and then divide the sorted feature maps of the M scales into the low-scale feature maps, the medium-scale feature maps, and the high-scale feature maps.
In this implementation of the image processing method provided in this application, the weight matrix can assign weight coefficients to different features at different scales, allowing different features to play their respective roles: for the features corresponding to the high-frequency feature group, higher weights are assigned on the low-scale feature maps; for the features corresponding to the low-frequency feature group, the ability to extract contour information at high scales is considered and the weights are distributed evenly across scales; and for the features corresponding to the intermediate-frequency feature group, the distribution lies between those of the high-frequency and low-frequency feature groups, taking into account both high-frequency details and contour information, which helps improve machine vision performance.
Optionally, the non-zero weight coefficients of the feature maps corresponding to the high-frequency feature group are all the same, the non-zero weight coefficients of the feature maps corresponding to the low-frequency feature group are all the same, and the non-zero weight coefficients of the feature maps corresponding to the intermediate-frequency features are all the same.
Optionally, if M = 5, the weight coefficients of the feature maps corresponding to the high-frequency feature group are [1 0 0 0 0], the weight coefficients of the feature maps corresponding to the low-frequency feature group are [1/5 1/5 1/5 1/5 1/5], and the weight coefficients of the feature maps corresponding to the intermediate-frequency features are [1/3 1/3 1/3 0 0];
here, in [1 0 0 0 0], [1/5 1/5 1/5 1/5 1/5], and [1/3 1/3 1/3 0 0], the weight coefficient in the first column represents the weight coefficient of the low-scale feature map, the weight coefficients in the second and third columns represent the weight coefficients of the medium-scale feature maps, and the weight coefficients in the fourth and fifth columns represent the weight coefficients of the high-scale feature maps.
If C = 64, the weight matrix is a matrix with 64 rows and 5 columns. If the high-frequency feature group includes 8 convolution kernels, then 8 rows of the weight matrix correspond to the high-frequency feature group, each of these 8 rows being [1 0 0 0 0]; if the intermediate-frequency feature group includes 24 convolution kernels, then 24 rows of the weight matrix correspond to the intermediate-frequency feature group, each of these 24 rows being [1/3 1/3 1/3 0 0]; and if the low-frequency feature group includes 32 convolution kernels, then 32 rows of the weight matrix correspond to the low-frequency feature group, each of these 32 rows being [1/5 1/5 1/5 1/5 1/5].
It should be noted that the assignment above, in which the first column of the weight matrix represents the weight coefficient of the low-scale feature map, the second and third columns represent the weight coefficients of the medium-scale feature maps, and the fourth and fifth columns represent the weight coefficients of the high-scale feature maps, is merely an example and is not limited in this embodiment of this application. In the image processing method provided in this embodiment of this application, the weight matrix can assign weight coefficients to different features at different scales, allowing different features to play their respective roles: for detail features, a weight of 1 is assigned on the high-resolution first-scale feature map; for smooth features, the ability to extract contour information at high scales is considered and the weights are distributed evenly across scales; and edge features lie between detail features and smooth features, taking into account both high-frequency details and contour information, which helps improve machine vision performance.
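Exemplarily, the grouped weight matrix of this example (8 high-frequency rows, 24 intermediate-frequency rows, and 32 low-frequency rows) can be built as follows; the sketch assumes the kernels have already been sorted by feature frequency from high to low, as described above, and is merely illustrative.

```python
# An illustrative construction of the grouped weight matrix: 8 high-frequency
# rows, 24 intermediate-frequency rows, 32 low-frequency rows (kernels are
# assumed to be sorted by feature frequency from high to low already).
import numpy as np

row_high = [1.0, 0.0, 0.0, 0.0, 0.0]   # all weight on the low-scale column
row_mid  = [1/3, 1/3, 1/3, 0.0, 0.0]   # low- and medium-scale columns
row_low  = [0.2, 0.2, 0.2, 0.2, 0.2]   # spread evenly over all five scales

beta = np.array([row_high] * 8 + [row_mid] * 24 + [row_low] * 32)
assert beta.shape == (64, 5)
assert np.allclose(beta.sum(axis=1), 1.0)   # each row still sums to 1
```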
Optionally, the feature frequency is determined by the modulus of the convolution kernels of the single-layer convolutional neural network and/or by the scaling coefficients and normalization coefficients of the normalization layer of the single-layer convolutional neural network.
In one possible implementation, the modulus of the i-th convolution kernel of the single-layer convolutional neural network may be denoted ||f_i||, and the feature frequency may be determined by the modulus of the convolution kernels of the single-layer convolutional neural network.
In another possible implementation, the scaling coefficients and normalization coefficients of the normalization layer may be parameters of the normalization layer (batch normalization, BN) of a pretrained model, where the pretrained model is a pre-trained convolutional neural network model, for example, a Resnet-50 pretrained model. The scaling coefficient of the normalization layer may be denoted γ_i, and the normalization coefficient may be denoted σ_i; the feature frequency may be determined by the scaling coefficients and normalization coefficients of the normalization layer of the single-layer convolutional neural network.
In yet another possible implementation, the feature frequency may be determined jointly by the modulus of the convolution kernels of the single-layer convolutional neural network and the scaling coefficients and normalization coefficients of the normalization layer; for example, the feature frequency may be determined by ||f_i|| \cdot \gamma_i / \sigma_i.
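Exemplarily, the ranking and grouping by feature frequency can be sketched as follows. The product ||f_i|| · γ_i / σ_i is one plausible reading of the combined criterion, and the random kernels and BN parameters are stand-ins; neither is a limitation of this application.

```python
# An illustrative ranking of kernels by an assumed feature-frequency score
# ||f_i|| * gamma_i / sigma_i, followed by the 8/24/32 grouping of FIG. 7.
# The kernels and BN parameters below are random stand-ins.
import numpy as np

def frequency_score(kernels, gamma, sigma):
    """kernels: (C, in_ch, k, k); gamma, sigma: (C,) BN parameters."""
    norms = np.sqrt((kernels ** 2).sum(axis=(1, 2, 3)))   # ||f_i||
    return norms * gamma / sigma

C = 64
kernels = np.random.randn(C, 3, 7, 7)
gamma = np.abs(np.random.randn(C)) + 0.1    # BN scaling coefficients
sigma = np.abs(np.random.randn(C)) + 0.1    # BN normalization coefficients

order = np.argsort(-frequency_score(kernels, gamma, sigma))  # high -> low
high_freq, mid_freq, low_freq = order[:8], order[8:32], order[32:]
```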
The image processing method provided in the embodiments of this application has been described in detail above with reference to FIG. 1 to FIG. 7; simulation results of the embodiments of this application are described below with reference to FIG. 8 to FIG. 11.
Compared with computing the mean square error (mean square error, MSE) in the pixel domain, computing the mean square error in the feature domain (wfMSE), and computing MSSSIM in the feature domain (abbreviated wfMSSSIM), computing the similarity ASSIM with the image processing method provided in the embodiments of this application can improve machine vision performance (such as target detection mAP and semantic segmentation mIoU) and human vision performance (such as LPIPS and DISTS).
Compared with wfMSSSIM, wfMSE, and MSE, ASSIM achieves better LPIPS among the human vision metrics.
Exemplarily, FIG. 8 shows a simulation diagram. The curves in FIG. 8 plot LPIPS as a function of bits per pixel (bit per pixel, BPP). The figure includes 4 curves: a curve with black-filled circles representing ASSIM, a curve with white-filled circles representing wfMSSSIM, a curve with black-filled triangles representing wfMSE, and a curve with white-filled triangles representing MSE. It can be seen in the figure that the ASSIM curve lies above the other curves, so the LPIPS of ASSIM is better.
Compared with wfMSSSIM, wfMSE, and MSE, ASSIM achieves better DISTS among the human vision metrics.
Exemplarily, FIG. 9 shows a simulation diagram. The curves in FIG. 9 plot DISTS as a function of BPP. The figure includes 4 curves: a curve with black-filled circles (ASSIM), a curve with white-filled circles (wfMSSSIM), a curve with black-filled triangles (wfMSE), and a curve with white-filled triangles (MSE). It can be seen in the figure that the ASSIM curve lies above the other curves, so the DISTS of ASSIM is better.
Compared with wfMSSSIM, wfMSE, and MSE, ASSIM achieves better mAP among the machine vision metrics.
Exemplarily, FIG. 10 shows a simulation diagram. The curves in FIG. 10 plot mAP as a function of BPP. The figure includes 5 curves: a curve with black-filled circles (ASSIM), a curve with white-filled circles (wfMSSSIM), a curve with black-filled triangles (wfMSE), a curve with white-filled triangles (MSE), and a straight line (no compression). It can be seen in the figure that the ASSIM curve lies above the wfMSSSIM, wfMSE, and MSE curves; moreover, when the BPP is 1.2 or above, the mAP of ASSIM lies above the no-compression straight line, so the mAP of ASSIM is better. In addition, at the arrow shown in the figure, compared with the MSE index, ASSIM saves about 50% of the coding bits for target detection, for example Faster-RCNN, at the same machine vision performance.
Compared with wfMSE and MSE, ASSIM achieves better mIoU among the machine vision metrics.
Exemplarily, FIG. 11 shows a simulation diagram. The curves in FIG. 11 plot mIoU as a function of BPP. The figure includes 5 curves: a curve with black-filled circles (ASSIM), a curve with white-filled circles (wfMSSSIM), a curve with black-filled triangles (wfMSE), a curve with white-filled triangles (MSE), and a straight line (no compression). It can be seen in the figure that the ASSIM curve lies above the wfMSE and MSE curves and below the wfMSSSIM curve, so the mIoU of ASSIM is better than that of wfMSE and MSE. In addition, at the arrow shown in the figure, compared with the MSE index, ASSIM saves about 40% of the coding bits for semantic segmentation, for example the pyramid scene parsing network (pyramid scene parsing network, PSPnet), at the same machine vision performance.
The sequence numbers of the foregoing processes do not imply an order of execution; the execution order of the processes should be determined by their functions and internal logic, and should not constitute any limitation on the implementation of the embodiments of this application.
The image processing method of the embodiments of this application has been described in detail above with reference to FIG. 1 to FIG. 11; the image processing apparatus of the embodiments of this application is described in detail below with reference to FIG. 12 and FIG. 13.
FIG. 12 shows an image processing apparatus 1200 provided by an embodiment of this application. The image processing apparatus 1200 includes an acquisition module 1210 and a processing module 1220. The acquisition module 1210 is configured to acquire a first image and a second image. The processing module 1220 is configured to: input the first image and the second image respectively into a single-layer convolutional neural network to obtain feature maps of a first scale of the first image and feature maps of the first scale of the second image, where the number of feature maps of the first scale of the first image and the number of feature maps of the first scale of the second image are both C; downsample the feature maps of the first scale of the first image and the feature maps of the first scale of the second image M-1 times respectively to obtain feature maps of M-1 scales of the first image and feature maps of M-1 scales of the second image, where C and M are both positive integers greater than 1; and determine the similarity of the first image relative to the second image according to the feature maps of the first scale of the first image, the feature maps of the first scale of the second image, the feature maps of M-1 scales of the first image, the feature maps of M-1 scales of the second image, and a weight matrix related to the M scales and the number C of feature maps, where the weight matrix includes weight coefficients of different feature maps at different scales.
Optionally, the processing module 1220 is further configured to: determine weight coefficients of the feature maps of the first scale according to peak signal-to-noise ratios of the feature maps of the first scale, where the feature maps of the first scale include the feature maps of the first scale of the first image and the feature maps of the first scale of the second image; and determine the similarity according to the weight coefficients of the feature maps of the first scale, the feature maps of the first scale of the first image, the feature maps of the first scale of the second image, the feature maps of M-1 scales of the first image, the feature maps of M-1 scales of the second image, and the weight matrix.
Optionally, the similarity is obtained by the following formulas:
ASSIM(x, y) = \sum_{i=1}^{C} w_i(f_1^i(x), f_1^i(y)) \sum_{j=1}^{M} \beta_{ij} \cdot SSIM(f_j^i(x), f_j^i(y))

w_i(f_1^i(x), f_1^i(y)) = \frac{(PSNR_i)^{\alpha}}{\sum_{k=1}^{C} (PSNR_k)^{\alpha}}

SSIM(f_j^i(x), f_j^i(y)) = \frac{2 \sigma_{f_j^i(x) f_j^i(y)}}{\sigma_{f_j^i(x)}^2 + \sigma_{f_j^i(y)}^2}

PSNR_i = 10 \log_{10} \frac{\left( \max(f_1^i(x), f_1^i(y)) - \min(f_1^i(x), f_1^i(y)) \right)^2}{\frac{1}{HW} \sum_{h=1}^{H} \sum_{w=1}^{W} \left( f_1^i(x)_{h,w} - f_1^i(y)_{h,w} \right)^2}
Here, ASSIM is the similarity, x is the first image, y is the second image, f() is the convolution operation of the single-layer convolutional neural network, i indexes the i-th convolution kernel in the single-layer convolutional neural network, 1 ≤ i ≤ C, j indexes the j-th scale, 1 ≤ j ≤ M, w_i(f_1^i(x), f_1^i(y)) is the weight coefficient of the i-th feature map of the first scale, SSIM(f_j^i(x), f_j^i(y)) is the structural similarity of the i-th feature map of the j-th scale, β_ij is the weight coefficient corresponding to the i-th row and the j-th column of the weight matrix, f_1^i(x) is the i-th feature map among the feature maps of the first scale of the first image, f_1^i(y) is the i-th feature map among the feature maps of the first scale of the second image, PSNR_i is the peak signal-to-noise ratio of the i-th feature map of the first scale, α is a constant, σ_{f_j^i(x)}^2 is the variance of the i-th feature map of the j-th scale of the first image, σ_{f_j^i(y)}^2 is the variance of the i-th feature map of the j-th scale of the second image, σ_{f_j^i(x)f_j^i(y)} is the covariance between the i-th feature map of the j-th scale of the first image and the i-th feature map of the j-th scale of the second image, max(f_1^i(x), f_1^i(y)) is the maximum pixel value in the feature maps of the first scale of the first image and the second image, min(f_1^i(x), f_1^i(y)) is the minimum pixel value in the feature maps of the first scale of the first image and the second image, H is the height of the first image or the second image, and W is the width of the first image or the second image.
Optionally, the weight matrix is a matrix with C rows and M columns, the weight coefficients of each row in the weight matrix represent the weight coefficients of one feature map at different scales, and the weight coefficients of each row in the weight matrix sum to 1.
Optionally, the weight coefficients within each row of the weight matrix are all the same.
Optionally, M = 5, and the weight coefficients of each row in the weight matrix are the same and are [0.2 0.2 0.2 0.2 0.2].
Optionally, M = 5, and the weight coefficients of each row in the weight matrix are the same and are [0.0448 0.2856 0.3001 0.2363 0.1333].
Optionally, the C convolution kernels are divided into a high-frequency feature group, an intermediate-frequency feature group, and a low-frequency feature group according to feature frequency, and the feature maps of the M scales are divided into low-scale feature maps, medium-scale feature maps, and high-scale feature maps according to the resolution of the feature maps of the M scales; the distribution of the weight coefficients, at different scales, of the feature maps corresponding to the high-frequency feature group, the intermediate-frequency feature group, and the low-frequency feature group in the weight matrix satisfies the following conditions: the non-zero weight coefficients of the feature maps corresponding to the high-frequency feature group are distributed over the low-scale feature maps of the first image and the second image, and the weight coefficients of the high-scale feature maps are 0; the non-zero weight coefficients of the feature maps corresponding to the low-frequency feature group are distributed over the low-scale feature maps, the medium-scale feature maps, and the high-scale feature maps; and the non-zero weight coefficients of the feature maps corresponding to the intermediate-frequency feature group are distributed over more scales than those of the feature maps corresponding to the high-frequency feature group and over fewer scales than those of the feature maps corresponding to the low-frequency feature group.
Optionally, the non-zero weight coefficients of the feature maps corresponding to the high-frequency feature group are all the same, the non-zero weight coefficients of the feature maps corresponding to the low-frequency feature group are all the same, and the non-zero weight coefficients of the feature maps corresponding to the intermediate-frequency features are all the same.
Optionally, M = 5, the weight coefficients of the feature maps corresponding to the high-frequency feature group are [1 0 0 0 0], the weight coefficients of the feature maps corresponding to the low-frequency feature group are [1/5 1/5 1/5 1/5 1/5], and the weight coefficients of the feature maps corresponding to the intermediate-frequency features are [1/3 1/3 1/3 0 0];
here, in [1 0 0 0 0], [1/5 1/5 1/5 1/5 1/5], and [1/3 1/3 1/3 0 0], the weight coefficient in the first column represents the weight coefficient of the low-scale feature map, the weight coefficients in the second and third columns represent the weight coefficients of the medium-scale feature maps, and the weight coefficients in the fourth and fifth columns represent the weight coefficients of the high-scale feature maps.
Optionally, the feature frequency is determined by the modulus of the convolution kernels of the single-layer convolutional neural network and/or by the scaling coefficients and normalization coefficients of the normalization layer of the single-layer convolutional neural network.
Optionally, the first image is the input of a codec and the second image is the output of the codec, where the codec is used to compress and reconstruct the first image to output the second image, and the similarity is used to optimize the codec.
Optionally, the first image is an image obtained after image signal processing is performed on a third image, the third image is the input of a codec, and the second image is the output of the codec, where the codec is used to perform image signal processing, compression, and reconstruction on the third image to output the second image, and the similarity is used to optimize the codec.
It should be understood that the image processing apparatus 1200 here is embodied in the form of functional modules. The term "module" here may refer to an application-specific integrated circuit (ASIC), an electronic circuit, a processor (for example, a shared processor, a dedicated processor, or a group processor) and memory for executing one or more software or firmware programs, a merged logic circuit, and/or other suitable components that support the described functions.
The image processing apparatus 1200 has the functions of implementing the corresponding steps performed by the image processing device in the foregoing method embodiments. These functions may be implemented by hardware, or by hardware executing corresponding software, where the hardware or software includes one or more modules corresponding to the foregoing functions.
In the embodiments of this application, the image processing apparatus 1200 in FIG. 12 may also be a chip or a chip system, for example, a system on chip (system on chip, SoC).
FIG. 13 is a schematic block diagram of another image processing apparatus 1300 provided by an embodiment of this application. The image processing apparatus 1300 includes a processor 1310, a communication interface 1320, and a memory 1330. The processor 1310, the communication interface 1320, and the memory 1330 communicate with one another through an internal connection path; the memory 1330 is configured to store instructions, and the processor 1310 is configured to execute the instructions stored in the memory 1330 to control the communication interface 1320 to send and/or receive signals.
It should be understood that the image processing apparatus 1300 may specifically be the image processing device in the foregoing method embodiments, or the functions of the image processing device in the foregoing method embodiments may be integrated into the image processing apparatus 1300, and the image processing apparatus 1300 may be configured to perform the steps and/or procedures corresponding to the image processing device in the foregoing method embodiments. Optionally, the memory 1330 may include a read-only memory and a random access memory, and provide instructions and data to the processor. A part of the memory may further include a non-volatile random access memory; for example, the memory may further store information about the device type. The processor 1310 may be configured to execute the instructions stored in the memory, and when the processor executes the instructions, the processor may perform the steps and/or procedures corresponding to the image processing device in the foregoing method embodiments.
During implementation, the steps of the foregoing method may be completed by an integrated logic circuit of hardware in the processor or by instructions in the form of software. The steps of the methods disclosed in connection with the embodiments of this application may be directly performed by a hardware processor, or performed by a combination of hardware and software modules in the processor. The software module may be located in a mature storage medium in the art, such as a random access memory, a flash memory, a read-only memory, a programmable read-only memory, an electrically erasable programmable memory, or a register. The storage medium is located in the memory, and the processor executes the instructions in the memory and completes the steps of the foregoing method in combination with its hardware. To avoid repetition, details are not described here again.
An embodiment of this application further provides a computer-readable storage medium, where the computer-readable storage medium is configured to store a computer program, and the computer program is used to implement the method corresponding to the image processing device in the foregoing embodiments.
An embodiment of this application further provides a chip system, where the chip system is configured to support the image processing device in the foregoing method embodiments in implementing the functions shown in the embodiments of this application.
An embodiment of this application provides a computer program product, where the computer program product includes a computer program (which may also be called code or instructions); when the computer program runs on a computer, the computer can perform the method corresponding to the image processing device in the foregoing embodiments.
A person of ordinary skill in the art may be aware that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein can be implemented by electronic hardware or a combination of computer software and electronic hardware. Whether these functions are performed by hardware or software depends on the specific application and design constraints of the technical solution. A person skilled in the art may use different methods to implement the described functions for each specific application, but such implementation should not be considered beyond the scope of this application.
A person skilled in the art can clearly understand that, for convenience and brevity of description, for the specific working processes of the system, apparatus, and units described above, reference may be made to the corresponding processes in the foregoing method embodiments, and details are not described here again.
In the several embodiments provided in this application, it should be understood that the disclosed system, apparatus, and method may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative; the division of the units is merely a logical function division, and there may be other division methods in actual implementation; for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not performed. In addition, the mutual couplings or direct couplings or communication connections shown or discussed may be implemented through some interfaces, and the indirect couplings or communication connections between apparatuses or units may be in electrical, mechanical, or other forms.
The foregoing is merely specific implementations of this application, but the protection scope of this application is not limited thereto. Any change or substitution that a person skilled in the art can easily think of within the technical scope disclosed in this application shall be covered within the protection scope of this application. Therefore, the protection scope of this application shall be subject to the protection scope of the claims.

Claims (30)

1. An image processing method, characterized by comprising:
    acquiring a first image and a second image;
    inputting the first image and the second image respectively into a single-layer convolutional neural network to obtain feature maps of a first scale of the first image and feature maps of the first scale of the second image, wherein the single-layer convolutional neural network comprises C convolution kernels, and the number of feature maps of the first scale of the first image and the number of feature maps of the first scale of the second image are both C;
    downsampling the feature maps of the first scale of the first image and the feature maps of the first scale of the second image M-1 times respectively to obtain feature maps of M-1 scales of the first image and feature maps of M-1 scales of the second image, wherein C and M are both positive integers greater than 1; and
    determining a similarity of the first image relative to the second image according to the feature maps of the first scale of the first image, the feature maps of the first scale of the second image, the feature maps of M-1 scales of the first image, the feature maps of M-1 scales of the second image, and a weight matrix related to the M scales and the number C of feature maps, wherein the weight matrix comprises weight coefficients of different feature maps at different scales.
2. The method according to claim 1, characterized in that before the determining a similarity of the first image relative to the second image according to the feature maps of the first scale of the first image, the feature maps of the first scale of the second image, the feature maps of M-1 scales of the first image, the feature maps of M-1 scales of the second image, and the weight matrix related to the M scales and the number C of feature maps, the method further comprises:
    determining weight coefficients of the feature maps of the first scale according to peak signal-to-noise ratios of the feature maps of the first scale, wherein the feature maps of the first scale comprise the feature maps of the first scale of the first image and the feature maps of the first scale of the second image; and
    the determining a similarity of the first image relative to the second image according to the feature maps of the first scale of the first image, the feature maps of the first scale of the second image, the feature maps of M-1 scales of the first image, the feature maps of M-1 scales of the second image, and the weight matrix related to the M scales and the number C of feature maps comprises:
    determining the similarity according to the weight coefficients of the feature maps of the first scale, the feature maps of the first scale of the first image, the feature maps of the first scale of the second image, the feature maps of M-1 scales of the first image, the feature maps of M-1 scales of the second image, and the weight matrix.
3. The method according to claim 2, characterized in that the similarity is obtained by the following formulas:
    ASSIM(x, y) = \sum_{i=1}^{C} w_i(f_1^i(x), f_1^i(y)) \sum_{j=1}^{M} \beta_{ij} \cdot SSIM(f_j^i(x), f_j^i(y))

    w_i(f_1^i(x), f_1^i(y)) = \frac{(PSNR_i)^{\alpha}}{\sum_{k=1}^{C} (PSNR_k)^{\alpha}}

    SSIM(f_j^i(x), f_j^i(y)) = \frac{2 \sigma_{f_j^i(x) f_j^i(y)}}{\sigma_{f_j^i(x)}^2 + \sigma_{f_j^i(y)}^2}

    PSNR_i = 10 \log_{10} \frac{\left( \max(f_1^i(x), f_1^i(y)) - \min(f_1^i(x), f_1^i(y)) \right)^2}{\frac{1}{HW} \sum_{h=1}^{H} \sum_{w=1}^{W} \left( f_1^i(x)_{h,w} - f_1^i(y)_{h,w} \right)^2}
    其中,ASSIM为所述相似度,x为所述第一图像,y为所述第二图像,所述f()为所述单层卷积神经网络的卷积操作,i为所述单层卷积神经网络中的第i个卷积核,1≤i≤C,j为第j个尺度,1≤j≤M,w i(f 1 i(x),f 1 i(y))为所述第一尺度的特征图的权重系数,
    Figure PCTCN2022076277-appb-100005
    为第j个尺度的第i个特征图的结构相似性,β ij为所述权重矩阵中第i行第j列对应的权重系数,
    Figure PCTCN2022076277-appb-100006
    为所述第一图像的第一尺度的特征图中的第i个特征图,f 1 i(y)为所述第二图像的第一尺度的特征图中的第i个特征图,PSNR i为第一尺度第i个特征图的峰值信噪比,α为常数,
    Figure PCTCN2022076277-appb-100007
    为所述第一图像的第j个尺度的第i个特征图的方差,
    Figure PCTCN2022076277-appb-100008
    为所述第二图像的第j个尺度的第i个特征图的方差,
    Figure PCTCN2022076277-appb-100009
    为所述第一图像的第j个尺度的第i个特征图和所述第二图像的第j个尺度的第i个特征图的协方差,max(f 1 i(x),f 1 i(y))为所述第一图像和第二图像的第一尺度的特征图中像素值的最大值,min(f 1 i(x),f 1 i(y))为所述第一图像和第二图像的第一尺度的特征图中像素值的最小值,H为所述第一图像或所述第二图像的高度,W为所述第一图像或所述第二图像的宽度。
    Wherein, ASSIM is the similarity, x is the first image, y is the second image, the f() is the convolution operation of the single-layer convolutional neural network, and i is the single-layer The i-th convolution kernel in the convolutional neural network, 1≤i≤C, j is the j-th scale, 1≤j≤M, w i (f 1 i (x),f 1 i (y)) is The weight coefficient of the feature map of the first scale,
    Figure PCTCN2022076277-appb-100005
    is the structural similarity of the i-th feature map of the j-th scale, β ij is the weight coefficient corresponding to the i-th row and j-th column in the weight matrix,
    Figure PCTCN2022076277-appb-100006
    is the i-th feature map in the first-scale feature map of the first image, f 1 i (y) is the i-th feature map in the first-scale feature map of the second image, PSNR i is the peak signal-to-noise ratio of the i-th feature map in the first scale, α is a constant,
    Figure PCTCN2022076277-appb-100007
    is the variance of the i-th feature map of the j-th scale of the first image,
    Figure PCTCN2022076277-appb-100008
    is the variance of the i-th feature map of the j-th scale of the second image,
    Figure PCTCN2022076277-appb-100009
    is the covariance of the i-th feature map of the j-th scale of the first image and the i-th feature map of the j-th scale of the second image, max(f 1 i (x),f 1 i (y)) is the maximum value of the pixel value in the feature map of the first scale of the first image and the second image, and min(f 1 i (x), f 1 i (y)) is the first image and the minimum value of pixel values in the feature map of the first scale of the second image, H is the height of the first image or the second image, and W is the width of the first image or the second image.
4. The method according to any one of claims 1 to 3, wherein the weight matrix is a matrix with C rows and M columns, the weight coefficients in each row of the weight matrix represent the weight coefficients of one feature map at the different scales, and the weight coefficients in each row of the weight matrix sum to 1.
5. The method according to claim 4, wherein the weight coefficients within each row of the weight matrix are all equal.
6. The method according to claim 5, wherein M = 5, and the weight coefficients of every row of the weight matrix are the same, namely [0.2 0.2 0.2 0.2 0.2].
7. The method according to claim 4, wherein M = 5, and the weight coefficients of every row of the weight matrix are the same, namely [0.0448 0.2856 0.3001 0.2363 0.1333].
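Claims 6 and 7 pin M = 5 and repeat a single row vector across all C kernels; the vector in claim 7 matches the per-scale weights commonly used for multi-scale SSIM. A minimal sketch of constructing either weight matrix (the helper names are illustrative):

```python
import numpy as np

def uniform_weight_matrix(C, M=5):
    # Claim 6 style: every row identical, every scale weighted equally (rows sum to 1).
    return np.full((C, M), 1.0 / M)

def scale_weighted_matrix(C):
    # Claim 7 style: every row identical, using the listed per-scale weights.
    row = np.array([0.0448, 0.2856, 0.3001, 0.2363, 0.1333])
    return np.tile(row, (C, 1))
```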
8. The method according to claim 4, wherein the C convolution kernels are divided by feature frequency into a high-frequency feature group, an intermediate-frequency feature group, and a low-frequency feature group, and the feature maps of the M scales are divided by resolution into low-scale feature maps, medium-scale feature maps, and high-scale feature maps;
    the distributions of the weight coefficients, at the different scales, of the feature maps corresponding to the high-frequency feature group, the intermediate-frequency feature group, and the low-frequency feature group satisfy the following conditions:
    the non-zero weight coefficients of the feature maps corresponding to the high-frequency feature group are distributed over the low-scale feature maps, and the weight coefficients of the high-scale feature maps are 0;
    the non-zero weight coefficients of the feature maps corresponding to the low-frequency feature group are distributed over the low-scale feature maps, the medium-scale feature maps, and the high-scale feature maps; and
    the non-zero weight coefficients of the feature maps corresponding to the intermediate-frequency feature group are distributed over more scales than those of the high-frequency feature group and over fewer scales than those of the low-frequency feature group.
9. The method according to claim 8, wherein the non-zero weight coefficients of the feature maps corresponding to the high-frequency feature group are all equal, the non-zero weight coefficients of the feature maps corresponding to the low-frequency feature group are all equal, and the non-zero weight coefficients of the feature maps corresponding to the intermediate-frequency feature group are all equal.
10. The method according to claim 9, wherein M = 5, the weight coefficients of the feature maps corresponding to the high-frequency feature group are [1 0 0 0 0], the weight coefficients of the feature maps corresponding to the low-frequency feature group are [1/5 1/5 1/5 1/5 1/5], and the weight coefficients of the feature maps corresponding to the intermediate-frequency feature group are [1/3 1/3 1/3 0 0];
    wherein, in [1 0 0 0 0], [1/5 1/5 1/5 1/5 1/5], and [1/3 1/3 1/3 0 0], the weight coefficient in the first column represents the weight coefficient of the low-scale feature maps, the weight coefficients in the second and third columns represent the weight coefficients of the medium-scale feature maps, and the weight coefficients in the fourth and fifth columns represent the weight coefficients of the high-scale feature maps.
11. The method according to any one of claims 8 to 10, wherein the feature frequency is determined by the modulus of the convolution kernels of the single-layer convolutional neural network and/or by the scaling coefficients and normalization coefficients of the normalization layer of the single-layer convolutional neural network.
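One way to realize claims 8 to 11 in code: rank the C kernels by a feature-frequency proxy and assign each group the row vectors of claim 10. Using the kernel L2 norm as the frequency measure and splitting the kernels into equal terciles are assumptions; claim 11 also allows deriving the frequency from the normalization layer's scaling and normalization coefficients:

```python
import numpy as np

def frequency_grouped_weight_matrix(kernels, M=5):
    """Build the C-by-M weight matrix of claim 10 from a frequency ordering
    of the kernels. kernels: array of shape (C, k, k). The L2-norm proxy
    and the tercile split are assumptions, not taken from the claims."""
    C = kernels.shape[0]
    freq = np.linalg.norm(kernels.reshape(C, -1), axis=1)
    order = np.argsort(freq)  # ascending: assumed low to high feature frequency
    rows = {
        "low":  np.array([1/5, 1/5, 1/5, 1/5, 1/5]),
        "mid":  np.array([1/3, 1/3, 1/3, 0.0, 0.0]),
        "high": np.array([1.0, 0.0, 0.0, 0.0, 0.0]),
    }
    beta = np.empty((C, M))
    for rank, i in enumerate(order):
        group = "low" if rank < C // 3 else ("mid" if rank < 2 * C // 3 else "high")
        beta[i] = rows[group]
    return beta
```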
12. The method according to any one of claims 1 to 11, wherein the first image is an input of a codec, the second image is an output of the codec, the codec is configured to compress and reconstruct the first image to output the second image, and the similarity is used to optimize the codec.
13. The method according to any one of claims 1 to 11, wherein the first image is an image obtained after image signal processing is performed on a third image, the third image is an input of a codec, the second image is an output of the codec, the codec is configured to perform image signal processing, compression, and reconstruction on the third image to output the second image, and the similarity is used to optimize the codec.
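Since the similarity is stated to drive codec optimization, a hedged sketch of one such loop follows: pick the strongest compression setting whose reconstruction still meets an ASSIM target, reusing the assim sketch above. The encode, decode, and extract_feats hooks, the candidate ordering, and the 0.98 target are all hypothetical stand-ins, not part of the claims:

```python
def tune_codec(image, encode, decode, extract_feats, beta, candidates, target=0.98):
    # candidates are assumed ordered from strongest to weakest compression.
    feats_x = extract_feats(image)
    for setting in candidates:
        recon = decode(encode(image, setting))
        if assim(feats_x, extract_feats(recon), beta) >= target:
            return setting  # strongest compression that preserves enough similarity
    return candidates[-1]   # fall back to the weakest compression
```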
14. An image processing apparatus, comprising:
    an acquisition module, configured to acquire a first image and a second image; and
    a processing module, configured to: input the first image and the second image respectively into a single-layer convolutional neural network to obtain feature maps of a first scale of the first image and feature maps of a first scale of the second image, wherein the single-layer convolutional neural network comprises C convolution kernels, and the number of feature maps of the first scale of the first image and the number of feature maps of the first scale of the second image are both C;
    downsample the feature maps of the first scale of the first image and the feature maps of the first scale of the second image M-1 times respectively to obtain feature maps of M-1 scales of the first image and feature maps of M-1 scales of the second image, wherein C and M are both positive integers greater than 1; and
    determine the similarity of the first image relative to the second image according to the feature maps of the first scale of the first image, the feature maps of the first scale of the second image, the feature maps of the M-1 scales of the first image, the feature maps of the M-1 scales of the second image, and a weight matrix related to the M scales and the number C of feature maps, wherein the weight matrix comprises weight coefficients of different feature maps at different scales.
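A minimal sketch of the processing module's feature extraction: one convolution with C kernels, then M-1 successive downsamplings of each channel, reusing the downsample helper from the ASSIM sketch. Valid-mode correlation and the absence of padding, stride, bias, and nonlinearity are assumptions:

```python
import numpy as np

def extract_pyramid(image, kernels, M=5):
    """Return a list of M levels, each of shape (C, h, w): the C feature
    maps at the first scale plus M-1 successively downsampled copies."""
    def conv2d_valid(img, k):
        # Plain valid-mode correlation; padding and stride choices are assumptions.
        kh, kw = k.shape
        H, W = img.shape[0] - kh + 1, img.shape[1] - kw + 1
        out = np.empty((H, W))
        for r in range(H):
            for c in range(W):
                out[r, c] = np.sum(img[r:r + kh, c:c + kw] * k)
        return out
    level = np.stack([conv2d_valid(image, k) for k in kernels])  # first scale, C maps
    pyramid = [level]
    for _ in range(M - 1):
        level = np.stack([downsample(ch) for ch in level])  # next, coarser scale
        pyramid.append(level)
    return pyramid
```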
15. The apparatus according to claim 14, wherein the processing module is further configured to:
    determine a weight coefficient of the feature maps of the first scale according to a peak signal-to-noise ratio of the feature maps of the first scale, wherein the feature maps of the first scale comprise the feature map of the first scale of the first image and the feature map of the first scale of the second image; and
    determine the similarity according to the weight coefficient of the feature maps of the first scale, the feature map of the first scale of the first image, the feature map of the first scale of the second image, the feature maps of the M-1 scales of the first image, the feature maps of the M-1 scales of the second image, and the weight matrix.
16. The apparatus according to claim 15, wherein the similarity is obtained by the following formulas:

    ASSIM(x, y) = Σ_{i=1}^{C} w_i(f_1^i(x), f_1^i(y)) · Σ_{j=1}^{M} β_ij · SSIM(f_j^i(x), f_j^i(y))

    w_i(f_1^i(x), f_1^i(y)) = PSNR_i^α / Σ_{k=1}^{C} PSNR_k^α

    SSIM(f_j^i(x), f_j^i(y)) = (2·σ_{f_j^i(x)f_j^i(y)} + c) / (σ²_{f_j^i(x)} + σ²_{f_j^i(y)} + c)

    PSNR_i = 10·log10( (max(f_1^i(x), f_1^i(y)) − min(f_1^i(x), f_1^i(y)))² / ((1/(H·W)) · Σ (f_1^i(x) − f_1^i(y))²) )

    wherein ASSIM is the similarity, x is the first image, y is the second image, f() is the convolution operation of the single-layer convolutional neural network, i indexes the i-th convolution kernel of the single-layer convolutional neural network, 1 ≤ i ≤ C, j indexes the j-th scale, 1 ≤ j ≤ M, w_i(f_1^i(x), f_1^i(y)) is the weight coefficient of the feature maps of the first scale, SSIM(f_j^i(x), f_j^i(y)) is the structural similarity of the i-th feature maps at the j-th scale, β_ij is the weight coefficient in row i and column j of the weight matrix, f_1^i(x) is the i-th feature map among the first-scale feature maps of the first image, f_1^i(y) is the i-th feature map among the first-scale feature maps of the second image, PSNR_i is the peak signal-to-noise ratio of the i-th feature maps at the first scale, α is a constant, c is a stabilizing constant, σ²_{f_j^i(x)} is the variance of the i-th feature map at the j-th scale of the first image, σ²_{f_j^i(y)} is the variance of the i-th feature map at the j-th scale of the second image, σ_{f_j^i(x)f_j^i(y)} is the covariance between the i-th feature map at the j-th scale of the first image and the i-th feature map at the j-th scale of the second image, max(f_1^i(x), f_1^i(y)) is the maximum pixel value over the first-scale feature maps of the first and second images, min(f_1^i(x), f_1^i(y)) is the minimum pixel value over the first-scale feature maps of the first and second images, H is the height of the first image or the second image, and W is the width of the first image or the second image.
17. The apparatus according to any one of claims 14 to 16, wherein the weight matrix is a matrix with C rows and M columns, the weight coefficients in each row of the weight matrix represent the weight coefficients of one feature map at the different scales, and the weight coefficients in each row of the weight matrix sum to 1.
18. The apparatus according to claim 17, wherein the weight coefficients within each row of the weight matrix are all equal.
19. The apparatus according to claim 18, wherein M = 5, and the weight coefficients of every row of the weight matrix are the same, namely [0.2 0.2 0.2 0.2 0.2].
20. The apparatus according to claim 17, wherein M = 5, and the weight coefficients of every row of the weight matrix are the same, namely [0.0448 0.2856 0.3001 0.2363 0.1333].
21. The apparatus according to claim 17, wherein the C convolution kernels are divided by feature frequency into a high-frequency feature group, an intermediate-frequency feature group, and a low-frequency feature group, and the feature maps of the M scales are divided by resolution into low-scale feature maps, medium-scale feature maps, and high-scale feature maps;
    the distributions of the weight coefficients, at the different scales, of the feature maps corresponding to the high-frequency feature group, the intermediate-frequency feature group, and the low-frequency feature group in the weight matrix satisfy the following conditions:
    the non-zero weight coefficients of the feature maps corresponding to the high-frequency feature group are distributed over the low-scale feature maps of the first image and the second image, and the weight coefficients of the high-scale feature maps are 0;
    the non-zero weight coefficients of the feature maps corresponding to the low-frequency feature group are distributed over the low-scale feature maps, the medium-scale feature maps, and the high-scale feature maps; and
    the non-zero weight coefficients of the feature maps corresponding to the intermediate-frequency feature group are distributed over more scales than those of the high-frequency feature group and over fewer scales than those of the low-frequency feature group.
22. The apparatus according to claim 21, wherein the non-zero weight coefficients of the feature maps corresponding to the high-frequency feature group are all equal, the non-zero weight coefficients of the feature maps corresponding to the low-frequency feature group are all equal, and the non-zero weight coefficients of the feature maps corresponding to the intermediate-frequency feature group are all equal.
23. The apparatus according to claim 22, wherein M = 5, the weight coefficients of the feature maps corresponding to the high-frequency feature group are [1 0 0 0 0], the weight coefficients of the feature maps corresponding to the low-frequency feature group are [1/5 1/5 1/5 1/5 1/5], and the weight coefficients of the feature maps corresponding to the intermediate-frequency feature group are [1/3 1/3 1/3 0 0];
    wherein, in [1 0 0 0 0], [1/5 1/5 1/5 1/5 1/5], and [1/3 1/3 1/3 0 0], the weight coefficient in the first column represents the weight coefficient of the low-scale feature maps, the weight coefficients in the second and third columns represent the weight coefficients of the medium-scale feature maps, and the weight coefficients in the fourth and fifth columns represent the weight coefficients of the high-scale feature maps.
24. The apparatus according to any one of claims 21 to 23, wherein the feature frequency is determined by the modulus of the convolution kernels of the single-layer convolutional neural network and/or by the scaling coefficients and normalization coefficients of the normalization layer of the single-layer convolutional neural network.
25. The apparatus according to any one of claims 14 to 24, wherein the first image is an input of a codec, the second image is an output of the codec, the codec is configured to compress and reconstruct the first image to output the second image, and the similarity is used to optimize the codec.
26. The apparatus according to any one of claims 14 to 24, wherein the first image is an image obtained after image signal processing is performed on a third image, the third image is an input of a codec, the second image is an output of the codec, the codec is configured to perform image signal processing, compression, and reconstruction on the third image to output the second image, and the similarity is used to optimize the codec.
27. An image processing apparatus, comprising a processor and a memory, wherein the memory is configured to store code instructions, and the processor is configured to run the code instructions to perform the method according to any one of claims 1 to 13.
28. A chip system, comprising a processor configured to call and run a computer program from a memory, so that a communication device in which the chip system is installed performs the method according to any one of claims 1 to 13.
29. A computer-readable storage medium, wherein the computer-readable storage medium stores a computer program that, when run on a computer, causes the method according to any one of claims 1 to 13 to be performed.
30. A computer program product, comprising instructions that, when executed, cause the method according to any one of claims 1 to 13 to be performed.
PCT/CN2022/076277 2022-02-15 2022-02-15 Image processing method and image processing apparatus WO2023155032A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/CN2022/076277 WO2023155032A1 (en) 2022-02-15 2022-02-15 Image processing method and image processing apparatus


Publications (1)

Publication Number Publication Date
WO2023155032A1 true WO2023155032A1 (en) 2023-08-24

Family

ID=87577251


Country Status (1)

Country Link
WO (1) WO2023155032A1 (en)



Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP3624452A1 (en) * 2017-07-06 2020-03-18 Samsung Electronics Co., Ltd. Method for encoding/decoding image, and device therefor
CN108830796A (en) * 2018-06-20 2018-11-16 重庆大学 Based on the empty high spectrum image super-resolution reconstructing method combined and gradient field is lost of spectrum
CN111046893A (en) * 2018-10-12 2020-04-21 富士通株式会社 Image similarity determining method and device, and image processing method and device
CN111754403A (en) * 2020-06-15 2020-10-09 南京邮电大学 Image super-resolution reconstruction method based on residual learning
US11170581B1 (en) * 2020-11-12 2021-11-09 Intrinsic Innovation Llc Supervised domain adaptation
CN113269843A (en) * 2021-04-16 2021-08-17 西安数合信息科技有限公司 Digital radiographic image reconstruction algorithm based on basic dense connecting block
CN113487564A (en) * 2021-07-02 2021-10-08 杭州电子科技大学 Double-current time sequence self-adaptive selection video quality evaluation method for user original video
CN113724222A (en) * 2021-08-30 2021-11-30 京东方科技集团股份有限公司 Video quality evaluation method and device, electronic equipment and storage medium
CN113947642A (en) * 2021-10-18 2022-01-18 北京航空航天大学 X-space magnetic particle imaging deconvolution method

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
HAO-HSIANG YANG ET AL.: "Y-Net: Multi-Scale Feature Aggregation Network with Wavelet Structure Similarity Loss Function for Single Image Dehazing", ICASSP 2020, 31 May 2020 (2020-05-31), pages 2628-2632, XP033793574, DOI: 10.1109/ICASSP40776.2020.9053920 *
HUIJUAN HUANG, YU JING, AND SUN WEIDONG: "A No-Reference SVD-Based Image Quality Assessment Method for Super-Resolution Reconstruction", JOURNAL OF COMPUTER-AIDED DESIGN & COMPUTER GRAPHICS, vol. 24, no. 9, 15 September 2012 (2012-09-15), pages 1204-1210, XP093085612 *
XIAOYI WANG: "Analysis of Image Quality Evaluation Algorithms for Super-resolution Reconstruction", INFORMATION & COMMUNICATIONS, vol. 9, 15 September 2017 (2017-09-15), pages 109-111, XP093085615 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116843690A (en) * 2023-09-01 2023-10-03 荣耀终端有限公司 Image quality evaluation method, device and system
CN116843690B (en) * 2023-09-01 2024-03-01 荣耀终端有限公司 Image quality evaluation method, device and system

Similar Documents

Publication Publication Date Title
EP3916628A1 (en) Object identification method and device
WO2021164731A1 (en) Image enhancement method and image enhancement apparatus
EP4090022A1 (en) Image processing method and related device
EP4075374A1 (en) Image processing method and apparatus, and image processing system
CN111046781B (en) Robust three-dimensional target detection method based on ternary attention mechanism
CN111274980B (en) Small-size traffic sign identification method based on YOLOV3 and asymmetric convolution
JP2021536071A (en) Obstacle detection method, intelligent driving control method, device, medium, and equipment
CN109522831B (en) Real-time vehicle detection method based on micro-convolution neural network
KR20210064123A (en) Method and apparatus for recognizing wearing state of safety belt, electronic device, and storage medium
WO2023155032A1 (en) Image processing method and image processing apparatus
WO2024002211A1 (en) Image processing method and related apparatus
CN117111055A (en) Vehicle state sensing method based on thunder fusion
CN113705427B (en) Fatigue driving monitoring and early warning method and system based on vehicle-mounted chip SoC
EP3575986B1 (en) A lossy data compressor for vehicle control systems
CN117274115A (en) Image enhancement method and system based on multi-scale sparse transducer network
CN110633630B (en) Behavior identification method and device and terminal equipment
CN117132964A (en) Model training method, point cloud coding method, object processing method and device
CN115909078A (en) Ship classification method based on HRRP and SAR data feature level fusion
CN113744220A (en) PYNQ-based preselection-frame-free detection system
WO2021189321A1 (en) Image processing method and device
CN116883770A (en) Training method and device of depth estimation model, electronic equipment and storage medium
WO2023082162A1 (en) Image processing method and apparatus
EP4364091A1 (en) Multimodal method and apparatus for segmentation and depht estimation
Krömer et al. Adaptive fuzzy video compression control for advanced driver assistance systems
WO2024032075A1 (en) Training method for image processing network, and coding method, decoding method, and electronic device

Legal Events

Date Code Title Description
121 Ep: the EPO has been informed by WIPO that EP was designated in this application (Ref document number: 22926370; Country of ref document: EP; Kind code of ref document: A1)