CN111630570A - Image processing method, apparatus and computer-readable storage medium - Google Patents

Image processing method, apparatus and computer-readable storage medium

Info

Publication number
CN111630570A
CN111630570A
Authority
CN
China
Prior art keywords
image
processed
channel
frequency domain
neural network
Prior art date
Legal status
Pending
Application number
CN201980008045.4A
Other languages
Chinese (zh)
Inventor
李恒杰
赵文军
Current Assignee
SZ DJI Technology Co Ltd
Shenzhen Dajiang Innovations Technology Co Ltd
Original Assignee
SZ DJI Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by SZ DJI Technology Co Ltd filed Critical SZ DJI Technology Co Ltd
Publication of CN111630570A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 9/00 Image coding
    • G06T 9/002 Image coding using neural networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

The embodiment of the invention provides an image processing method, an image processing device, and a computer-readable storage medium. The image processing method includes the following steps: acquiring frequency domain information of an image to be processed, wherein the frequency domain information is obtained by an image encoder performing time-frequency conversion processing on the image to be processed; processing the frequency domain information through a preset first neural network model to obtain a first encoding parameter of the image to be processed; and sending the first encoding parameter to the image encoder so that the image encoder encodes the image to be processed according to the first encoding parameter. According to the embodiment of the invention, the image encoder is effectively combined with the first neural network model, so that the efficiency and the performance of image encoding are improved.

Description

Image processing method, apparatus and computer-readable storage medium
Technical Field
Embodiments of the present invention relate to the field of image processing technologies, and in particular, to an image processing method, an image processing apparatus, and a computer-readable storage medium.
Background
Since image information often contains a great deal of redundancy, it must be encoded before being transmitted or stored digitally. Image coding, also called image compression, is a technique for representing an image, or the information it contains, with a smaller number of bits while still meeting a given quality requirement.
Existing image coding is generally performed by an image encoder. Because different images differ, their characteristics differ as well; when an image is coded by the image encoder, encoding personnel must manually extract the characteristics of each image and adjust the encoder parameters according to the extracted characteristics.
However, this approach has two drawbacks. First, manual feature extraction cannot directly capture the essential features of an image. Second, it places high demands on the professional skill of encoding personnel, and because both feature extraction and parameter determination are done by hand, it is time-consuming and labor-intensive, which keeps image coding efficiency low.
Disclosure of Invention
The embodiment of the invention provides an image processing method, an image processing device, and a computer-readable storage medium, aiming to solve the technical problem that manual image coding in the prior art is time-consuming and labor-intensive.
According to a first aspect of embodiments of the present invention, there is provided an image processing method including: acquiring frequency domain information of an image to be processed, wherein the frequency domain information is obtained by an image encoder performing time-frequency conversion processing on the image to be processed; processing the frequency domain information through a preset first neural network model to obtain a first encoding parameter of the image to be processed; and sending the first encoding parameter to the image encoder so that the image encoder encodes the image to be processed according to the first encoding parameter.
According to a second aspect of the embodiments of the present invention, there is provided an image processing device including: a memory and a processor; the memory is used for storing program code; the processor, invoking the program code, when executed, is configured to: acquire frequency domain information of an image to be processed, wherein the frequency domain information is obtained by an image encoder performing time-frequency conversion processing on the image to be processed; process the frequency domain information through a preset first neural network model to obtain a first encoding parameter of the image to be processed; and send the first encoding parameter to the image encoder so that the image encoder encodes the image to be processed according to the first encoding parameter.
According to a third aspect of embodiments of the present invention, there is provided a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the image processing method of the first aspect.
In the technical solutions provided by some embodiments of the present invention, frequency domain information of an image to be processed is obtained, the frequency domain information having been produced by an image encoder through time-frequency conversion processing; the frequency domain information is processed through a preset first neural network model to obtain a first encoding parameter of the image to be processed; and the first encoding parameter is sent to the image encoder so that the image encoder encodes the image to be processed according to the first encoding parameter. On one hand, by combining the image encoder with the first neural network model and designing a deep-learning-based parameter optimization scheme for the image encoder, the first encoding parameter of the image to be processed can be optimized automatically through the first neural network model. This effectively improves image coding efficiency, saves human resources, removes the need for manually designed features and for complex manual calculation and parameter selection, and reduces the difficulty and time consumption of image encoder optimization. On the other hand, based on deep learning, the optimal first encoding parameter is selected for the image encoder according to the frequency domain information of the image to be processed, which improves the coding efficiency and performance of the image encoder: a higher-quality coding result can be obtained at the same compression rate, and therefore a better decoding result under the same evaluation index, realizing an effective combination of deep learning and the internal structure of the image encoder. The embodiment of the invention thus discloses a way to optimize image encoder parameters based on frequency domain characteristics, applicable to products involving image and video compression coding.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present invention, the drawings needed to be used in the description of the embodiments are briefly introduced below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without inventive labor.
Fig. 1 is a flowchart illustrating an image encoding method according to an exemplary embodiment of the present invention;
FIG. 2 is a block diagram of an image encoder provided in an exemplary embodiment of the present invention;
FIG. 3 is a block diagram of a VGG-16 model provided in an exemplary embodiment of the invention;
FIG. 4 is a block diagram of the ResNet34 model provided in an exemplary embodiment of the invention;
FIG. 5 is a block diagram of a GoogLeNet model provided in an exemplary embodiment of the invention;
FIG. 6 is a block diagram of an image encoder based on a first neural network model according to an exemplary embodiment of the present invention;
fig. 7 is a block diagram of an image encoder based on a first neural network model according to another exemplary embodiment of the present invention;
FIG. 8 is a block diagram of an image encoder based on a first neural network model and a second neural network model provided by an exemplary embodiment of the present invention;
FIG. 9 is a flowchart of processing image information according to an exemplary embodiment of the present invention;
fig. 10 is a schematic structural diagram of an image processing apparatus according to a sixth embodiment of the present invention.
Detailed Description
Example embodiments will now be described more fully with reference to the accompanying drawings. Example embodiments may, however, be embodied in many different forms and should not be construed as limited to the examples set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of example embodiments to those skilled in the art.
Furthermore, the described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. In the following description, numerous specific details are provided to give a thorough understanding of embodiments of the disclosure. One skilled in the relevant art will recognize, however, that the subject matter of the present disclosure can be practiced without one or more of the specific details, or with other methods, components, devices, steps, and so forth. In other instances, well-known methods, devices, implementations, or operations have not been shown or described in detail to avoid obscuring aspects of the disclosure.
The block diagrams shown in the figures are functional entities only and do not necessarily correspond to physically separate entities. I.e. these functional entities may be implemented in the form of software, or in one or more hardware modules or integrated circuits, or in different networks and/or processor means and/or microcontroller means.
The flow charts shown in the drawings are merely illustrative and do not necessarily include all of the contents and operations/steps, nor do they necessarily have to be performed in the order described. For example, some operations/steps may be decomposed, and some operations/steps may be combined or partially combined, so that the actual execution sequence may be changed according to the actual situation.
Currently, image coding parameters are optimized mainly by manually selecting encoder parameters; because an image encoder is relatively complex, this optimization process is often very involved and time-consuming. In addition, the optimization of coding parameters cannot fully account for the differences between pictures, so the same coding parameters can perform very differently on different pictures.
Deep learning does not require manually selected features: it extracts image features by training a network and then generates subsequent decisions from the extracted features, realizing functions such as classification and recognition. In other words, a neural network has a strong learning capability and can comprehensively and accurately establish the mapping relation between samples and labels, completing many tasks that traditional methods cannot, or greatly improving their efficiency and precision. Therefore, tuning the parameters of an image encoder with deep learning can largely avoid the deficiencies of manual encoder parameter optimization.
In order to solve the above technical problems, the present invention provides an image processing method, an image processing apparatus, and a computer-readable storage medium, which effectively combine deep learning with an encoding process or an internal structure of an image encoder, and fully exert the advantages of deep learning while avoiding the disadvantages of manual tuning.
It should be noted that the image encoding method, the image encoding device, and the computer-readable storage medium provided by the present invention can be applied to any image encoding scene.
Fig. 1 is a flowchart illustrating an image encoding method according to an exemplary embodiment of the present invention. The method provided by the embodiment of the present invention may be executed by any terminal device and/or server with computing processing capability, which is not limited in the present invention. As shown in fig. 1, the method provided by the embodiment of the present invention may include the following steps.
In step S110, frequency domain information of an image to be processed is obtained, and the frequency domain information is obtained by performing time-frequency conversion processing on the image to be processed by an image encoder. Specifically, the time-frequency transformation includes any one of K-L transformation, fourier transformation, cosine transformation, and wavelet transformation.
In the embodiment of the present invention, a standard image encoder is taken as an example to illustrate the method, but the technical solution provided by the embodiment of the present invention may be applied to any image encoder.
Fig. 2 is a block diagram of an image encoder according to an exemplary embodiment of the present invention. As shown in fig. 2, the general flow of a standard image encoder is: the image to be processed is input into the image encoder, and code stream information is output after transformation processing, quantization processing, and entropy coding processing are applied in sequence. The main current image coding standards, such as JPEG (Joint Photographic Experts Group) and JPEG2000, perform their basic encoding under the framework of fig. 2.
The transformation processing is mainly a process of converting the information of the image to be processed from a time domain to a frequency domain, and aims to separate frequency domain information of different frequency bands, select different quantization step lengths for different frequency bands by utilizing the characteristic that human eyes are insensitive to high-frequency information, and reduce spatial redundancy in image compression so as to obtain higher compression ratio.
The quantization process is to approximately represent the transformed frequency domain information according to a certain quantization step, and the quantized image information can be represented by fewer bits and is an important link for compressing the image.
The entropy encoding process is to represent the quantized image information according to a certain encoding rule. And code stream information obtained after entropy coding is the representation of the image to be processed after being coded by the image coder.
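For illustration only, this three-stage flow can be sketched in a few lines of Python; the `transform` and `entropy_encode` callables and the quantization-table shape are assumptions of this sketch, not details fixed by the standards above.

```python
import numpy as np

def quantize(coeffs: np.ndarray, qtable: np.ndarray) -> np.ndarray:
    """Quantization: approximate each frequency coefficient by an integer
    multiple of the step size assigned to its frequency band."""
    return np.round(coeffs / qtable).astype(np.int32)

def encode(blocks, transform, qtable, entropy_encode) -> bytes:
    """The flow of Fig. 2: transform -> quantize -> entropy-code each block."""
    stream = []
    for block in blocks:
        coeffs = transform(block)               # time domain -> frequency domain
        levels = quantize(coeffs, qtable)       # fewer bits per coefficient
        stream.append(entropy_encode(levels))   # e.g. Huffman coding
    return b"".join(stream)
```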
At present, traditional optimization of an image encoder mainly tunes parameters manually, starting from the encoding flow of the image encoder, i.e., transformation precision, quantization table design, probability distribution estimation in entropy coding, and so on. However, such optimization is limited by the high demands it places on the professional skill of encoding personnel and by its cost in time and labor; meanwhile, current image encoders do not fully consider the differences between images, so the coding parameters generalize poorly.
In step S120, the frequency domain information is processed through a preset first neural network model, so as to obtain a first encoding parameter of the image to be processed.
In an exemplary embodiment, the first neural network model may include N operation units connected in sequence. Processing the frequency domain information through a preset first neural network model to obtain a first encoding parameter of the image to be processed may include: inputting the frequency domain information to the n-th operation unit of the first neural network model, and outputting the first encoding parameter through the subsequent operation units up to the N-th operation unit, where N ≥ n and n is a positive integer greater than or equal to 2. It should be noted that the operation units mentioned in the embodiment of the present invention may be software modules.
In an exemplary embodiment, the first neural network model may include at least one of: the VGG-16 model, the VGG-19 model, the ResNet model, the GoogLeNet model, and the like.
In an exemplary embodiment, n is 2. However, the present invention is not limited to this; in the following description, n = 2 is used as an example.
VGGNet is a deep convolutional neural network. VGGNet explored the relation between the depth and the performance of a convolutional network and successfully constructed networks 16 to 19 layers deep, showing that increasing depth influences final performance and can greatly reduce the error rate. VGGNet is also highly extensible and generalizes very well when migrated to other image data, which makes it well suited to extracting image features.
VGGNet has several configurations, such as the VGG-16 and VGG-19 models. Fig. 3 below shows a modification of the VGG-16 model used as the first neural network model for parameter optimization in the embodiment of the present invention. Since the transformation processing of the image encoder can be considered similar in function to a convolutional layer, the first two convolutional layers and the first pooling layer of the VGG-16 model can be removed, yielding a new neural network that serves as the parameter-optimization model.
As shown in fig. 3, the VGG-16 model may include a first operation unit 310, a second operation unit 320, a third operation unit 330, a fourth operation unit 340, a fifth operation unit 350, a sixth operation unit 360, and a seventh operation unit 370, which are connected in sequence. The first operation unit 310 may include a first convolutional layer 311, a second convolutional layer 312, and a first pooling layer 313. The second operation unit 320 may include a third convolutional layer 321 and a fourth convolutional layer 322. The third operation unit 330 may include a second pooling layer 331, a fifth convolution layer 332, a sixth convolution layer 333, and a seventh convolution layer 334. The fourth operation unit 340 may include a third pooling layer 341, an eighth convolutional layer 342, a ninth convolutional layer 343, and a tenth convolutional layer 344. The fifth arithmetic unit 350 may include a fourth pooling layer 351, an eleventh convolution layer 352, a twelfth convolution layer 353, and a thirteenth convolution layer 354. The sixth operation unit 360 may include a fifth pooling layer 361, a first fully-connected layer 362, a second fully-connected layer 363, and a third fully-connected layer 364. The seventh operation unit 370 includes a softmax layer (normalization layer).
In the embodiment of the present invention, the frequency domain information output after the image to be processed is transformed by the image encoder is directly input to the second operation unit 320 of the VGG-16 model, and then sequentially processed by the third operation unit 330, the fourth operation unit 340, the fifth operation unit 350, the sixth operation unit 360, and the seventh operation unit 370, so as to output the first encoding parameter. That is, the network that processes the frequency domain information contains 11 convolutional layers, 4 pooling layers, and 3 fully-connected layers. Compared with 13 convolutional layers, 5 pooling layers and 3 full-connection layers of the VGG-16 network, the network layer number for processing the frequency domain information is reduced, the calculation amount is reduced, the speed is increased, and the real-time performance of the encoding process is improved.
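A hedged sketch of this modification (not the patent's reference implementation) can be built by slicing off the first operation unit of torchvision's VGG-16 and re-creating the first remaining convolution so it accepts the three frequency-domain planes directly; the input resolution and the number of output coding parameters are illustrative assumptions.

```python
import torch
import torch.nn as nn
from torchvision.models import vgg16

NUM_CODING_PARAMS = 8  # hypothetical size of the first-encoding-parameter vector

# features[0:5] of torchvision's VGG-16 are Conv-ReLU-Conv-ReLU-MaxPool,
# i.e. the first operation unit (two conv layers plus the first pooling layer).
layers = list(vgg16(weights=None).features.children())[5:]
# The first kept conv expects 64 input channels; rebuild it to take the 3
# cascaded Y/U/V coefficient planes directly (an assumption of this sketch).
layers[0] = nn.Conv2d(3, 128, kernel_size=3, padding=1)

model = nn.Sequential(
    *layers,
    nn.AdaptiveAvgPool2d(7), nn.Flatten(),
    nn.Linear(512 * 7 * 7, 4096), nn.ReLU(),
    nn.Linear(4096, NUM_CODING_PARAMS),
)

freq = torch.randn(1, 3, 224, 224)   # cascaded DCT/DWT coefficient planes
first_coding_params = model(freq)    # shape: (1, NUM_CODING_PARAMS)
```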
Fig. 4 is a structural diagram of a ResNet34 model according to an exemplary embodiment of the present invention.
As shown in fig. 4, the first neural network model for processing frequency domain information in the embodiment of the present invention may also use a ResNet model with the first convolutional layer and the first pooling layer (i.e., the first arithmetic unit 410 in fig. 4) removed, that is, directly input the frequency domain information output by the image encoder to the second convolutional layer, and finally output the first encoding parameter through the fully connected layer (fc).
Fig. 5 is a structural diagram of a GoogLeNet model according to an exemplary embodiment of the present invention.
As shown in fig. 5, the first neural network model for processing frequency domain information in the embodiment of the present invention may also use a GoogLeNet model without the first operation unit 510. The first operation unit 510 may include a first convolutional layer, a first max-pooling layer, and a first LRN (Local Response Normalization) layer; the first encoding parameter is finally output through a softmax (normalization) layer. IM in fig. 5 denotes an Inception module.
It should be noted that, although fig. 3-5 above illustrate first neural network models designed from the VGG-16, ResNet34, and GoogLeNet models, the present invention is not limited to these. Other neural networks may be selected for feature extraction according to the specific application scenario and actual requirements, such as other variants of CNNs (Convolutional Neural Networks) or RNNs (Recurrent Neural Networks), for example VGG-19, ResNet50 and deeper ResNets, GoogLeNet, or other networks of a comparable depth or level. In addition, although the above description takes removing the first operation unit of the first neural network model as an example, in other embodiments more operation units may be removed.
In step S130, the first encoding parameter is sent to the image encoder, so that the image encoder encodes the image to be processed according to the first encoding parameter.
In an exemplary embodiment, the first encoding parameter may include at least one of a typical quantization parameter design, a quantization table design, a feature transformation precision design, a rate-control scale design, and the like. The typical quantization parameter design and the quantization table design correspond to the parameters the image encoder uses for quantization processing; the feature transformation precision corresponds to the parameters for the time-frequency conversion of the image encoder; and the rate-control scale design corresponds to the parameters for the entropy coding processing of the image encoder.
According to the image processing method provided by the embodiment of the invention, frequency domain information of the image to be processed, produced by an image encoder through time-frequency conversion processing, is obtained; the frequency domain information is processed through a preset first neural network model to obtain a first encoding parameter of the image to be processed; and the first encoding parameter is sent to the image encoder so that the image encoder encodes the image to be processed according to the first encoding parameter. On one hand, by combining the image encoder with the first neural network model and designing a deep-learning-based parameter optimization scheme for the image encoder, the first encoding parameter of the image to be processed can be optimized automatically through the first neural network model. This effectively improves image coding efficiency, saves human resources, removes the need for manually designed features and for complex manual calculation and parameter selection, and reduces the difficulty and time consumption of image encoder optimization. On the other hand, based on deep learning, the optimal first encoding parameter is selected for the image encoder according to the frequency domain information of the image to be processed, which improves the coding efficiency and performance of the image encoder: a higher-quality coding result can be obtained at the same compression rate, and therefore a better decoding result under the same evaluation index, realizing an effective combination of deep learning and the internal structure of the image encoder. The embodiment of the invention thus discloses a way to optimize image encoder parameters based on frequency domain characteristics, applicable to products involving image and video compression coding.
In an exemplary embodiment, the sending the first encoding parameter to the image encoder so that the image encoder encodes the image to be processed according to the first encoding parameter may include: and sending the first encoding parameter to the image encoder so that the image encoder performs quantization processing and entropy encoding processing on the frequency domain information of the image to be processed according to the first encoding parameter.
In an exemplary embodiment, the method may further include: and the image encoder encodes the image to be processed according to the first encoding parameter to obtain code stream information of the image to be processed.
In an exemplary embodiment, the method may further include: and executing decoding operation on the code stream information by using an image decoder to obtain a reconstructed image to be processed.
Fig. 6 is a block diagram of an image encoder based on a first neural network model according to an exemplary embodiment of the present invention. In the embodiment of fig. 6, the first encoding parameter may include typical quantization parameter design and/or quantization table design and rate-controlled scaling design.
As shown in fig. 6, firstly, an image to be processed is input into an image encoder, and frequency domain information of the image to be processed is output after transformation processing is performed; secondly, the frequency domain information of the image to be processed is used as the input of a first neural network model; then, the first neural network model can output a first coding parameter of the image to be processed; then, the obtained first encoding parameters are input into the image encoder so as to encode the image to be processed. The image encoder quantizes the image to be processed according to the first encoding parameter (e.g., typical quantization parameter design and/or quantization table design) to generate quantization information of the image to be processed, and then performs entropy encoding processing on the quantization information according to the first encoding parameter (e.g., rate control scale design) to output code stream information. And finally, inputting the code stream information into an image decoder for decoding operation, and outputting a decoded image by the image decoder, namely obtaining a reconstructed image to be processed.
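The Fig. 6 flow can be summarized in a few lines; `encoder` here is a hypothetical object exposing the three coding stages, and the way the parameters split between quantization and rate control follows the description above.

```python
def encode_with_model(image, encoder, first_model):
    """Fig. 6: pick coding parameters from frequency-domain information."""
    freq = encoder.transform(image)                # time-frequency conversion
    params = first_model(freq)                     # first encoding parameters
    levels = encoder.quantize(freq, params)        # quantization table / QP design
    return encoder.entropy_encode(levels, params)  # rate-control scale design
```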
In an exemplary embodiment, the sending the first encoding parameter to the image encoder so that the image encoder encodes the image to be processed according to the first encoding parameter may include: and sending the first encoding parameter to the image encoder so that the image encoder performs time-frequency conversion on the image to be processed again according to the first encoding parameter and generates new frequency domain information, and then performing quantization processing and entropy encoding processing on the new frequency domain information based on the first encoding parameter. By the method, the parameters of time-frequency conversion can be reversely adjusted, and the encoding performance of the image encoder is further improved. This is illustrated below in connection with the schematic view of fig. 7.
Fig. 7 is a block diagram of an image encoder based on a first neural network model according to another exemplary embodiment of the present invention.
As shown in fig. 7, an image to be processed is input to an image encoder, and is subjected to transform operation to perform a time-frequency transform process, so as to generate frequency domain information of the image to be processed; then, inputting the frequency domain information to a first neural network model, processing the frequency domain information through the first neural network model, outputting a first coding parameter of the image to be processed, inputting the first coding parameter to a transformation operation of an image encoder, performing time-frequency conversion on the image to be processed again according to the first coding parameter to generate new frequency domain information of the image to be processed, and performing quantization processing on the newly generated new frequency domain information according to the first coding parameter to generate quantization information of the image to be processed; and entropy coding processing is carried out on the quantization information according to the first coding parameter to generate code stream information. And then, an image decoder receives the code stream information and can decode the image to reconstruct the image to be processed.
In an exemplary embodiment, the sending the first encoding parameter to the image encoder so that the image encoder encodes the image to be processed according to the first encoding parameter may include: and sending the first encoding parameter to the image encoder so that the image encoder performs quantization processing on the frequency domain information according to the first encoding parameter and generates quantization information of the image to be processed.
In an exemplary embodiment, the method may further include: obtaining quantitative information of the image to be processed; processing the quantization information through a preset second neural network model to obtain a second coding parameter of the image to be processed; and sending the second encoding parameter to the image encoder so that the image encoder encodes the image to be processed according to the second encoding parameter.
In the embodiment of the invention, the quantization information after quantization processing of the image encoder can be processed by utilizing the second neural network model, and the second encoding parameter is output, so that the image encoder can further encode the image to be processed according to the second encoding parameter. Thereby further improving coding efficiency and performance. The second neural network model may adopt any one or more neural network structures, and the present invention is not limited thereto.
In an exemplary embodiment, the second neural network model may include M operation units connected in sequence. Processing the quantization information through a preset second neural network model to obtain a second encoding parameter of the image to be processed may include: inputting the quantization information into the m-th operation unit of the second neural network model, and outputting the second encoding parameter through the subsequent operation units up to the M-th operation unit, where M ≥ m and m is a positive integer greater than or equal to 2. For example, m may be equal to 2 or 3, but the invention is not limited thereto.
In an exemplary embodiment, the second neural network model may include at least one of: the VGG-16 model, the VGG-19 model, the ResNet model, the GoogLeNet model, and the like. Reference may be made to the description of the first neural network model in the embodiments of fig. 3-5 above.
In an exemplary embodiment, the second encoding parameter may include at least one of a typical quantization parameter design, a quantization table design, a scale design of rate control, and the like.
In an exemplary embodiment, the sending the second encoding parameter to the image encoder so that the image encoder encodes the image to be processed according to the second encoding parameter may include: and sending the second encoding parameter to the image encoder, so that the image encoder performs quantization processing on the frequency domain information of the image to be processed again according to the second encoding parameter (such as typical quantization parameter design and/or quantization table design) to generate new quantization information, and performs entropy encoding processing on the new quantization information based on the second encoding parameter (such as rate-controlled scale design). By the method, the parameters of the quantization processing can be reversely adjusted, and the encoding performance of the image encoder is further improved.
In an exemplary embodiment, the sending the second encoding parameter to the image encoder so that the image encoder encodes the image to be processed according to the second encoding parameter may include: and sending the second encoding parameter to the image encoder so that the image encoder performs entropy encoding processing on the quantization information according to the second encoding parameter. This is illustrated below in connection with fig. 8.
Fig. 8 is a block diagram of an image encoder based on a first neural network model and a second neural network model according to an exemplary embodiment of the present invention. In the embodiment of fig. 8, the first encoding parameters may include an exemplary quantization parameter design and/or a quantization table design; the second encoding parameter may include a scaling of rate control.
As shown in fig. 8, the image to be processed is input to the image encoder and undergoes the transform operation, i.e., the time-frequency conversion process, to generate frequency domain information of the image to be processed. The frequency domain information is then input to the first neural network model, which processes it and outputs the first encoding parameter of the image to be processed; the first encoding parameter is fed to the quantization operation of the image encoder, which quantizes the frequency domain information accordingly to generate quantization information of the image to be processed. The quantization information is then input to the second neural network model, which outputs the second encoding parameter of the image to be processed, and entropy coding is performed on the quantization information according to the second encoding parameter to generate code stream information. An image decoder then receives the code stream information and can decode it to reconstruct the image to be processed.
In deep learning, a neural network is trained so that it acquires feature-extraction capability: each layer extracts feature information on the basis of the previous layer, so a deep network can extract high-dimensional features, establish the mapping relation between a sample and a target, and complete complex tasks such as classification and regression. In the field of image applications, the convolution kernels of a neural network can be regarded as a series of filters, and the initial convolutional layers can be regarded as extracting frequency-domain features of the image. Meanwhile, as mentioned above, the transform processing of an image encoder extracts the frequency characteristics of the image and separates frequency domain information of different bands; that is, the transform processing is also a filtering operation.
In the embodiment of the present invention, in consideration of the similarity between the convolution process of the neural network and the transformation process in the encoding flow of the image encoder, the output of the image encoder after transformation (for example, DCT (Discrete Cosine Transform), DWT (Discrete Wavelet Transform), etc.) is used as the input of the first neural network model. The frequency domain information is used as the input of the first neural network model, so that the number of layers of the neural network can be reduced, the training process is accelerated, the internal structure of the image encoder can be better combined with deep learning, the advantage of the deep learning is better played, and the application capability of the deep learning on the parameter optimization task of the image encoder is improved.
In an exemplary embodiment, before the processing the frequency domain information by the preset first neural network model, the method may further include: and training the first neural network model through a preset first training data set. Here the first training data set may comprise frequency domain information of several images for which the first encoding parameters have been labeled.
In an exemplary embodiment, the first training data set may be obtained by: carrying out time-frequency conversion on a plurality of images marked with first coding parameters to obtain frequency domain information of the images; and forming the first training data set according to the frequency domain information of the image marked with the first coding parameter.
For example, K (K is a positive integer greater than or equal to 1) groups of optimal first encoding parameters are prepared in advance for an image data set, and the optimal first encoding parameters corresponding to at least a part of the image in the image data set are known, that is, at least a part of the image label is known.
And obtaining corresponding frequency domain information after time-frequency transformation of the image with the known label as a first training data set of the first neural network model, so that the first neural network model learns the mapping relation between the image characteristics and the optimal first coding parameters. After multiple iterations, network training is completed, and modeling of the mapping relation between the image and the optimal first coding parameter is achieved.
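A minimal training sketch follows, assuming the K prepared parameter groups are treated as class labels so the model learns which group fits a given coefficient matrix; the batch size, optimizer, and loss are assumptions of this sketch, not values fixed by the embodiment.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader

def train_first_model(model, dataset, epochs=10, lr=1e-4):
    """dataset yields (freq, label): DCT/DWT coefficients of a labeled image
    and the index (0..K-1) of its optimal first-encoding-parameter group."""
    loader = DataLoader(dataset, batch_size=32, shuffle=True)
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    criterion = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for freq, label in loader:
            optimizer.zero_grad()
            loss = criterion(model(freq), label)  # map features -> parameter group
            loss.backward()
            optimizer.step()
    return model
```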
And the trained first neural network model has the capability of optimizing parameters of the image encoder. When the method is used, for each image to be processed, after the transformation processing of the image encoder, the obtained frequency domain information is input into the first neural network model, and the optimal first encoding parameter can be output.
In an exemplary embodiment, before the processing the quantitative information through the preset second neural network model, the method may further include: and training the second neural network model through a preset second training data set. Here the second training data set comprises quantization information of several images for which the second encoding parameters have been labeled.
In an exemplary embodiment, the second training data set may be obtained by: carrying out time-frequency conversion and quantization processing on a plurality of images marked with second coding parameters to obtain quantization information of the images; and forming the second training data set according to the quantization information of the image marked with the second coding parameter.
For example, P (P is a positive integer greater than or equal to 1) groups of optimal second encoding parameters are prepared in advance for the image data set, and the optimal second encoding parameters corresponding to at least a part of the images in the image data set are known, that is, at least a part of the image labels are known.
And obtaining corresponding quantization information after time-frequency transformation and quantization processing of the image with the known label, wherein the quantization information is used as a second training data set of the second neural network model, so that the second neural network model learns the mapping relation between the image characteristics and the optimal second coding parameters. After multiple iterations, network training is completed, and modeling of the mapping relation between the image and the optimal second coding parameter is achieved.
And the trained second neural network model has the capability of optimizing parameters of the image encoder. When the method is used, for each image to be processed, after the transformation processing and the quantization processing of the image encoder, the obtained quantization information is input into the second neural network model, and the optimal second encoding parameter can be output.
In an exemplary embodiment, the method may further include: if the image to be processed is in a YUV format, determining the dimensionality of a U channel and a dimensionality of a V channel of the image to be processed and the dimensionality of a Y channel; and if the dimensions of the U channel and the V channel are not consistent with the dimensions of the Y channel, performing preprocessing operation on the image to be processed to enable the dimensions of the U channel and the V channel of the image to be processed to be consistent with the dimensions of the Y channel.
In an exemplary embodiment, the method may further include: and if the image to be processed is in a preset format, performing preprocessing operation on the image to be processed to enable the dimensions of the U channel and the V channel of the image to be processed to be consistent with the dimension of the Y channel.
In an exemplary embodiment, when the preset format is YUV422 format or YUV420 format, a preprocessing operation is performed on the image to be processed, so that the dimensions of the U channel and the V channel of the image to be processed are consistent with the dimension of the Y channel.
At present, the image formats input to an image encoder are essentially YUV formats, the main ones being YUV444, YUV422, YUV420, and the like. In the YUV422 and YUV420 data formats the UV components are down-sampled, so the dimensions of the three channels are inconsistent.
In an exemplary embodiment, the performing a preprocessing operation on the image to be processed to make the dimensions of the U channel and the V channel of the image to be processed coincide with the dimensions of the Y channel may include: and performing upsampling operation on the U channel and the V channel of the image to be processed, so that the dimensions of Y, U and the V channel of the image to be processed are the same.
In an exemplary embodiment, the upsampling the U channel and the V channel of the image to be processed so that the dimensions of the Y, U and the V channel of the image to be processed are the same may include: and carrying out bilinear interpolation operation on the U channel and the V channel of the image to be processed, so that the dimensions of Y, U and the V channel of the image to be processed are the same.
In an exemplary embodiment, performing a preprocessing operation on the image to be processed to make the dimensions of the U channel and the V channel of the image to be processed coincide with the dimensions of the Y channel may include: and carrying out downsampling operation on the Y channel of the image to be processed, so that the dimensions of Y, U and V channels of the image to be processed are the same.
In the embodiment of the invention, in the data preprocessing stage, the problem of differing YUV channel dimensions can be solved by upsampling the U and V channels, unifying all three channels to the Y dimension. For example, the three channels may be unified to the Y dimension by bilinear interpolation of the U and V channels. Alternatively, the Y channel can be downsampled, unifying the three channels to the UV dimension. The invention is not limited in this regard.
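A minimal sketch of the U/V upsampling branch, assuming OpenCV's bilinear resize; the alternative of downsampling Y would replace the two resize calls.

```python
import cv2
import numpy as np

def align_yuv(y: np.ndarray, u: np.ndarray, v: np.ndarray) -> np.ndarray:
    """Bilinearly upsample the subsampled U and V planes (e.g. YUV420)
    to the Y dimensions so the three channels can be processed together."""
    h, w = y.shape
    u_up = cv2.resize(u, (w, h), interpolation=cv2.INTER_LINEAR)
    v_up = cv2.resize(v, (w, h), interpolation=cv2.INTER_LINEAR)
    return np.stack([y, u_up, v_up], axis=0)  # shape: (3, H, W)
```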
In an exemplary embodiment, obtaining the frequency domain information of the image to be processed after time-frequency conversion processing by the image encoder may include: performing DCT transformation separately on the dimension-aligned Y, U, and V channels of the image to be processed to generate frequency domain information for the Y, U, and V channels.
In an exemplary embodiment, obtaining the frequency domain information of the image to be processed after time-frequency conversion processing by the image encoder may include: performing DWT (discrete wavelet transform) separately on the dimension-aligned Y, U, and V channels of the image to be processed to generate frequency domain information for the Y, U, and V channels.
In the embodiment of the present invention, transform processing may be implemented by Discrete Cosine Transform (DCT), Discrete Wavelet Transform (DWT), or the like. These are exemplified below.
For the image to be processed, a two-dimensional DCT transformation can be applied. For example, the image is divided into blocks of different sizes, and each block undergoes the following transformation:

$$F(u,v) = C_u C_v \sum_{x=0}^{N-1}\sum_{y=0}^{N-1} f(x,y)\,\cos\frac{(2x+1)u\pi}{2N}\,\cos\frac{(2y+1)v\pi}{2N}$$

wherein

$$C_u = \begin{cases}\sqrt{1/N}, & u = 0\\ \sqrt{2/N}, & u \neq 0\end{cases} \qquad C_v = \begin{cases}\sqrt{1/N}, & v = 0\\ \sqrt{2/N}, & v \neq 0\end{cases}$$

In the above formulas, $C_u$ and $C_v$ are the first and second transformation parameters, respectively; $f(x,y)$ is the pixel at position $(x,y)$ of an $N \times N$ block, and $F(u,v)$ is the corresponding frequency-domain coefficient.
In the two-dimensional discrete wavelet transform, the input image to be processed can be regarded as $x[m,n]$, where m and n are positive integers greater than or equal to 1. The DWT repeatedly separates the different frequencies of the image to be processed through a high-pass filter $h[n]$ and a low-pass filter $g[n]$. Each DWT stage proceeds as follows.
First, high-pass and low-pass filtering is performed along the n direction:

$$v_{1,L}[m,n] = \sum_{k} x[m,k]\, g[2n-k]$$

$$v_{1,H}[m,n] = \sum_{k} x[m,k]\, h[2n-k]$$

Then $v_{1,L}[m,n]$ and $v_{1,H}[m,n]$ are high-pass and low-pass filtered along the m direction:

$$x_{1,LL}[m,n] = \sum_{k} v_{1,L}[k,n]\, g[2m-k]$$

$$x_{1,LH}[m,n] = \sum_{k} v_{1,L}[k,n]\, h[2m-k]$$

$$x_{1,HL}[m,n] = \sum_{k} v_{1,H}[k,n]\, g[2m-k]$$

$$x_{1,HH}[m,n] = \sum_{k} v_{1,H}[k,n]\, h[2m-k]$$

yielding the four subbands LL, LH, HL, and HH.
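A one-level example of this decomposition, assuming the PyWavelets library with a Haar filter pair standing in for $h$ and $g$:

```python
import numpy as np
import pywt

x = np.random.rand(64, 64)  # image to be processed, x[m, n]
# One DWT stage: separable high/low-pass filtering with downsampling along
# both directions, producing the four subbands derived above.
LL, (LH, HL, HH) = pywt.dwt2(x, 'haar')  # each subband is 32 x 32
```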
In an exemplary embodiment, before processing the frequency domain information through the preset first neural network model, the method may further include: cascading the frequency domain information of the Y, U, and V channels of the image to be processed.
Fig. 9 is a flowchart of processing image information according to an exemplary embodiment of the present invention. In the embodiment of fig. 9, the up-sampling of the U and V channels, respectively, is exemplified.
As shown in fig. 9, the Y channel undergoes DCT/DWT conversion directly, while the U and V channels are each upsampled so that the Y, U, and V channels have the same dimension and are then DCT/DWT converted. The frequency domain information of the converted YUV channels, all of the same dimension, is cascaded and output as the input of the first neural network model. Thus, the input to the first neural network model is the DCT/DWT coefficient matrix, i.e., the frequency domain information. The transformation does not change the dimensions of the matrix, so for each image to be processed the transformed coefficient matrix can still be regarded as three-channel data and, after the DCT/DWT transformation, can be used directly as input to the first neural network model.
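Putting the Fig. 9 steps together (a sketch reusing the approach of the earlier snippets; `dct2` and the pre-aligned channel arguments are assumptions of this illustration):

```python
import numpy as np
from scipy.fftpack import dct

def dct2(plane: np.ndarray) -> np.ndarray:
    """Orthonormal 2-D DCT of one channel plane."""
    return dct(dct(plane, axis=0, norm='ortho'), axis=1, norm='ortho')

def model_input(y: np.ndarray, u_up: np.ndarray, v_up: np.ndarray) -> np.ndarray:
    """Transform each dimension-aligned channel, then cascade the coefficient
    matrices into one three-channel array for the first neural network model."""
    return np.stack([dct2(y), dct2(u_up), dct2(v_up)], axis=0)
```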
The image processing method provided by the embodiment of the invention utilizes the similarity between the transformation process and the deep learning feature extraction in the encoding process of the image encoder, uses the frequency domain information output by the transformation link in the encoding process of the image encoder as the input of the neural network model, and combines the time-frequency transformation part with the neural network model in the encoding process of the image encoder, so that the image features can be extracted by using a new neural network model with fewer layers. On the other hand, through network training, the neural network model provided by the embodiment of the invention can automatically establish the mapping relation between the image characteristics and the optimal coding parameters, so that the optimal coding parameter mode can be adaptively selected according to the input image to be processed to guide the image encoder to carry out image coding, the coding parameter optimization is simplified, and the deep learning is applied to the parameter optimization task of the image encoder. Meanwhile, through the learning of the neural network model, the neural network model can be used for adaptively generating the corresponding optimal coding parameters for each image to be processed, the efficiency of parameter optimization of the image encoder is improved, meanwhile, the characteristics of each image can be subjected to targeted optimization, and the difference between the images is considered, so that the coding performance of the image encoder is improved, and the quality of the decoded image is greatly improved.
Fig. 10 is a schematic structural diagram of an image processing device according to an exemplary embodiment of the present invention. As shown in fig. 10, the image processing device may include a memory 101 and a processor 102. The memory 101 may be used to store program code. The processor 102 may invoke the program code, which, when executed, is configured to: acquire frequency domain information of an image to be processed, wherein the frequency domain information is obtained by an image encoder performing time-frequency conversion processing on the image to be processed; process the frequency domain information through a preset first neural network model to obtain a first encoding parameter of the image to be processed; and send the first encoding parameter to the image encoder so that the image encoder encodes the image to be processed according to the first encoding parameter.
According to the image processing device provided by the embodiment of the invention, frequency domain information of the image to be processed, produced by an image encoder through time-frequency conversion processing, is obtained; the frequency domain information is processed through a preset first neural network model to obtain a first encoding parameter of the image to be processed; and the first encoding parameter is sent to the image encoder so that the image encoder encodes the image to be processed according to the first encoding parameter. On one hand, by combining the image encoder with the first neural network model and designing a deep-learning-based parameter optimization scheme for the image encoder, the first encoding parameter of the image to be processed can be optimized automatically through the first neural network model. This effectively improves image coding efficiency, saves human resources, removes the need for manually designed features and for complex manual calculation and parameter selection, and reduces the difficulty and time consumption of image encoder optimization. On the other hand, based on deep learning, the optimal first encoding parameter is selected for the image encoder according to the frequency domain information of the image to be processed, which improves the coding efficiency and performance of the image encoder: a higher-quality coding result can be obtained at the same compression rate, and therefore a better decoding result under the same evaluation index, realizing an effective combination of deep learning and the internal structure of the image encoder. The embodiment of the invention thus discloses a way to optimize image encoder parameters based on frequency domain characteristics, applicable to products involving image and video compression coding.
Further, on the basis of any of the above embodiments, the first neural network model may include N operation units connected in sequence. When processing the frequency domain information through the preset first neural network model to obtain the first encoding parameter of the image to be processed, the processor may be configured to: input the frequency domain information to the n-th operation unit of the first neural network model, and output the first encoding parameter through the subsequent operation units up to the N-th operation unit, where N ≥ n and n is a positive integer greater than or equal to 2.
Further, on the basis of any of the above embodiments, the first neural network model may include at least one of: the VGG-16 model, the VGG-19 model, the ResNet model, the GoogLeNet model.
Further, in any of the above embodiments, n is 2.
Further, on the basis of any of the above embodiments, before the processing of the frequency domain information by the preset first neural network model, the processor may further be configured to: training the first neural network model through a preset first training data set; wherein the first training data set comprises frequency domain information of a number of images for which first encoding parameters have been labeled.
Further, on the basis of any of the above embodiments, the processor may obtain the first training data set by performing the following steps: carrying out time-frequency conversion on a plurality of images marked with first coding parameters to obtain frequency domain information of the images; and forming the first training data set according to the frequency domain information of the image marked with the first coding parameter.
Further, on the basis of any of the above embodiments, the processor may be further configured to: if the image to be processed is in a YUV format, determining the dimensionality of a U channel and a dimensionality of a V channel of the image to be processed and the dimensionality of a Y channel; and if the dimensions of the U channel and the V channel are not consistent with the dimensions of the Y channel, performing preprocessing operation on the image to be processed to enable the dimensions of the U channel and the V channel of the image to be processed to be consistent with the dimensions of the Y channel.
Further, on the basis of any of the above embodiments, the processor may be further configured to: and if the image to be processed is in a preset format, performing preprocessing operation on the image to be processed to enable the dimensions of the U channel and the V channel of the image to be processed to be consistent with the dimension of the Y channel.
Further, on the basis of any of the above embodiments, when the preset format is YUV422 format or YUV420 format, the processor performs a preprocessing operation on the image to be processed, so that the dimensions of the U channel and the V channel of the image to be processed are consistent with the dimension of the Y channel.
Further, on the basis of any of the above embodiments, when performing a preprocessing operation on the image to be processed so that the dimensions of the U channel and the V channel of the image to be processed are consistent with the dimension of the Y channel, the processor may be configured to: performing an upsampling operation on the U channel and the V channel of the image to be processed, so that the dimensions of the Y, U and V channels of the image to be processed are the same.
Further, on the basis of any of the above embodiments, the processor, in performing the upsampling operation on the U channel and the V channel of the image to be processed so that the dimensions of the Y, U and V channels of the image to be processed are the same, may be configured to: performing a bilinear interpolation operation on the U channel and the V channel of the image to be processed, so that the dimensions of the Y, U and V channels of the image to be processed are the same.
Further, on the basis of any of the above embodiments, when performing a preprocessing operation on the image to be processed so that the dimensions of the U channel and the V channel of the image to be processed are consistent with the dimension of the Y channel, the processor may be configured to: performing a downsampling operation on the Y channel of the image to be processed, so that the dimensions of the Y, U and V channels of the image to be processed are the same.
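Both preprocessing options can be sketched as follows for a YUV420 input, assuming OpenCV is available; in YUV420 the U and V planes are half the Y plane in each dimension:

```python
import numpy as np
import cv2

h, w = 64, 64
y = np.random.randint(0, 256, (h, w), dtype=np.uint8)
u = np.random.randint(0, 256, (h // 2, w // 2), dtype=np.uint8)
v = np.random.randint(0, 256, (h // 2, w // 2), dtype=np.uint8)

# Option A: bilinear upsampling of the U and V channels to the Y dimensions.
u_up = cv2.resize(u, (w, h), interpolation=cv2.INTER_LINEAR)
v_up = cv2.resize(v, (w, h), interpolation=cv2.INTER_LINEAR)

# Option B: downsampling the Y channel to the U/V dimensions (INTER_AREA is
# one reasonable choice; the embodiment does not fix the filter).
y_down = cv2.resize(y, (w // 2, h // 2), interpolation=cv2.INTER_AREA)
```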
Further, on the basis of any of the foregoing embodiments, when obtaining frequency domain information of the image to be processed after the time-frequency conversion processing performed by the image encoder, the processor may be configured to: performing DCT transformation on the dimension-aligned Y, U and V channels of the image to be processed respectively, to generate frequency domain information of the Y, U and V channels.
Further, on the basis of any of the foregoing embodiments, when obtaining frequency domain information of the image to be processed after the time-frequency conversion processing performed by the image encoder, the processor may alternatively be configured to: performing DWT (discrete wavelet transform) on the dimension-aligned Y, U and V channels of the image to be processed respectively, to generate frequency domain information of the Y, U and V channels.
Further, on the basis of any of the above embodiments, before processing the frequency domain information through the preset first neural network model, the processor may further be configured to: concatenating (cascading) the frequency domain information of the Y, U and V channels of the image to be processed.
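A sketch of the per-channel transform and the concatenation, assuming dimension-aligned planes and SciPy's DCT (a DWT-based variant could use PyWavelets' `pywt.dwt2` instead):

```python
import numpy as np
from scipy.fft import dctn

y, u, v = (np.random.rand(64, 64) for _ in range(3))  # aligned Y/U/V planes

# DCT transformation is applied to each channel separately.
freq_y = dctn(y, norm="ortho")
freq_u = dctn(u, norm="ortho")
freq_v = dctn(v, norm="ortho")

# The three frequency maps are concatenated (cascaded) along a new channel
# axis before being fed to the first neural network model.
freq_info = np.stack([freq_y, freq_u, freq_v], axis=0)  # shape (3, 64, 64)
```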
Further, on the basis of any of the above embodiments, the first encoding parameter includes at least one of typical quantization parameter design, quantization table design, feature transformation precision design, and rate control scale design.
Further, on the basis of any of the above embodiments, when sending the first encoding parameter to the image encoder, so that the image encoder encodes the image to be processed according to the first encoding parameter, the processor may be configured to: and sending the first encoding parameter to the image encoder so that the image encoder performs quantization processing and entropy encoding processing on the frequency domain information of the image to be processed according to the first encoding parameter.
Further, on the basis of any of the above embodiments, when sending the first encoding parameter to the image encoder, so that the image encoder encodes the image to be processed according to the first encoding parameter, the processor may be configured to: and sending the first encoding parameter to the image encoder so that the image encoder performs quantization processing on the frequency domain information according to the first encoding parameter and generates quantization information of the image to be processed.
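The quantization step itself can be illustrated as below, assuming the first encoding parameter arrives as a single quantization step; a real encoder would typically apply a per-coefficient quantization table instead:

```python
import numpy as np

freq_info = np.random.randn(8, 8) * 100.0  # frequency domain coefficients
q_step = 12.0                              # hypothetical predicted parameter

# Quantization of the frequency domain information; the resulting
# quantization information is what the entropy-coding stage consumes.
quantization_info = np.round(freq_info / q_step).astype(np.int32)
```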
Further, on the basis of any of the above embodiments, the processor may be further configured to: obtaining quantization information of the image to be processed; processing the quantization information through a preset second neural network model to obtain a second coding parameter of the image to be processed; and sending the second encoding parameter to the image encoder so that the image encoder encodes the image to be processed according to the second encoding parameter.
Further, on the basis of any of the above embodiments, when sending the second encoding parameter to the image encoder, so that the image encoder encodes the image to be processed according to the second encoding parameter, the processor may be configured to: and sending the second encoding parameter to the image encoder so that the image encoder performs entropy encoding processing on the quantization information according to the second encoding parameter.
Further, on the basis of any one of the above embodiments, the second neural network model includes M operation units connected in sequence; when the processor processes the quantization information through a preset second neural network model to obtain a second encoding parameter of the image to be processed, the processor may be configured to: inputting the quantization information into the m-th operation unit of the second neural network model, and outputting the second coding parameter through the M-th operation unit of the second neural network model; wherein M ≥ m; m is a positive integer greater than or equal to 2.
Further, on the basis of any of the above embodiments, the second neural network model may include at least one of: VGG-16 model, VGG-19 model, ResNet model, GoogleNet model.
Further, on the basis of any of the above embodiments, the second encoding parameter may include at least one of typical quantization parameter design, quantization table design, and rate control scale design.
Further, on the basis of any of the above embodiments, the processor may be further configured to: controlling the image encoder to encode the image to be processed according to the first encoding parameter, so as to obtain code stream information of the image to be processed.
Further, on the basis of any of the above embodiments, the processor may be further configured to: executing a decoding operation on the code stream information by using an image decoder, to obtain a reconstructed image to be processed.
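By way of illustration, a reconstructed image could be compared against the original with PSNR as one possible evaluation index; `encoder_encode` and `decoder_decode` below are hypothetical round-trip functions, not APIs defined by the embodiments:

```python
import numpy as np

def psnr(original: np.ndarray, reconstructed: np.ndarray) -> float:
    # Peak signal-to-noise ratio for 8-bit images.
    mse = np.mean((original.astype(np.float64) - reconstructed) ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(255.0 ** 2 / mse)

# code_stream = encoder_encode(image, first_encoding_parameter)
# reconstructed = decoder_decode(code_stream)
# print(psnr(image, reconstructed))
```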
In addition, the present embodiment also provides a computer-readable storage medium on which a computer program is stored; when the computer program is executed by a processor, the image processing method described in the above embodiments is implemented.
In the embodiments provided in the present invention, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the units is only one logical division, and other divisions may be realized in practice, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, or in a form of hardware plus a software functional unit.
The integrated unit implemented in the form of a software functional unit may be stored in a computer-readable storage medium. The software functional unit is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device) or a processor to execute some of the steps of the methods according to the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
It is obvious to those skilled in the art that, for convenience and simplicity of description, the foregoing division of the functional modules is merely used as an example, and in practical applications, the above function distribution may be performed by different functional modules according to needs, that is, the internal structure of the device is divided into different functional modules to perform all or part of the above described functions. For the specific working process of the device described above, reference may be made to the corresponding process in the foregoing method embodiment, which is not described herein again.
Finally, it should be noted that: the above embodiments are only used to illustrate the technical solution of the present invention, and not to limit the same; while the invention has been described in detail and with reference to the foregoing embodiments, it will be understood by those skilled in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; and the modifications or the substitutions do not make the essence of the corresponding technical solutions depart from the scope of the technical solutions of the embodiments of the present invention.

Claims (51)

1. An image processing method, comprising:
acquiring frequency domain information of an image to be processed, wherein the frequency domain information is obtained through time-frequency conversion processing performed by an image encoder;
processing the frequency domain information through a preset first neural network model to obtain a first coding parameter of the image to be processed;
and sending the first encoding parameter to the image encoder so that the image encoder encodes the image to be processed according to the first encoding parameter.
2. The method of claim 1, wherein the first neural network model comprises N operation units connected in sequence; and processing the frequency domain information through a preset first neural network model to obtain a first coding parameter of the image to be processed comprises:
inputting the frequency domain information to the n-th operation unit of the first neural network model, and outputting the first coding parameter through the N-th operation unit of the first neural network model;
wherein N ≥ n; n is a positive integer greater than or equal to 2.
3. The method of claim 1, wherein the first neural network model comprises at least one of: VGG-16 model, VGG-19 model, ResNet model, GoogleNet model.
4. The method of claim 2, wherein n = 2.
5. The method according to any one of claims 1 to 4, wherein before the processing the frequency domain information by the preset first neural network model, the method further comprises:
training the first neural network model through a preset first training data set; wherein the first training data set comprises frequency domain information of a number of images for which first encoding parameters have been labeled.
6. The method of claim 5, wherein the first training data set is obtained by:
carrying out time-frequency conversion on a plurality of images marked with first coding parameters to obtain frequency domain information of the images;
and forming the first training data set according to the frequency domain information of the images marked with the first coding parameters.
7. The method according to any one of claims 1 to 6, further comprising:
if the image to be processed is in a YUV format, determining the dimensions of the U channel, the V channel and the Y channel of the image to be processed;
and if the dimensions of the U channel and the V channel are not consistent with the dimension of the Y channel, performing a preprocessing operation on the image to be processed so that the dimensions of the U channel and the V channel of the image to be processed are consistent with the dimension of the Y channel.
8. The method according to any one of claims 1 to 6, further comprising:
and if the image to be processed is in a preset format, performing preprocessing operation on the image to be processed to enable the dimensions of the U channel and the V channel of the image to be processed to be consistent with the dimension of the Y channel.
9. The method according to claim 8, wherein when the preset format is YUV422 format or YUV420 format, a preprocessing operation is performed on the image to be processed, so that the U-channel and V-channel dimensions of the image to be processed are consistent with those of the Y-channel.
10. The method according to any one of claims 7 to 9, wherein the performing a preprocessing operation on the image to be processed so that the U-channel and V-channel dimensions of the image to be processed are consistent with the Y-channel dimension comprises:
performing an upsampling operation on the U channel and the V channel of the image to be processed, so that the dimensions of the Y, U and V channels of the image to be processed are the same.
11. The method of claim 10, wherein the upsampling of the U channel and the V channel of the image to be processed so that the dimensions of the Y, U and V channels of the image to be processed are the same comprises:
performing a bilinear interpolation operation on the U channel and the V channel of the image to be processed, so that the dimensions of the Y, U and V channels of the image to be processed are the same.
12. The method according to any one of claims 7 to 9, wherein the performing a preprocessing operation on the image to be processed so that the U-channel and V-channel dimensions of the image to be processed are consistent with the Y-channel dimension comprises:
performing a downsampling operation on the Y channel of the image to be processed, so that the dimensions of the Y, U and V channels of the image to be processed are the same.
13. The method according to any one of claims 7 to 12, wherein the obtaining frequency domain information of the image to be processed comprises:
performing DCT transformation on the dimension-aligned Y, U and V channels of the image to be processed respectively, to generate frequency domain information of the Y, U and V channels.
14. The method according to any one of claims 7 to 12, wherein the obtaining frequency domain information of the image to be processed comprises:
performing DWT (discrete wavelet transform) on the dimension-aligned Y, U and V channels of the image to be processed respectively, to generate frequency domain information of the Y, U and V channels.
15. The method according to claim 13 or 14, wherein before the processing the frequency domain information by the preset first neural network model, the method further comprises:
concatenating the frequency domain information of the Y, U and V channels of the image to be processed.
16. The method according to any of claims 1 to 15, wherein the first encoding parameter comprises at least one of a typical quantization parameter design, a quantization table design, a characteristic transform precision design, and a scale design for code rate control.
17. The method according to any one of claims 1 to 15, wherein sending the first encoding parameter to the image encoder so that the image encoder encodes the image to be processed according to the first encoding parameter comprises:
and sending the first encoding parameter to the image encoder so that the image encoder performs quantization processing and entropy encoding processing on the frequency domain information of the image to be processed according to the first encoding parameter.
18. The method according to any one of claims 1 to 15, wherein sending the first encoding parameter to the image encoder so that the image encoder encodes the image to be processed according to the first encoding parameter comprises:
and sending the first encoding parameter to the image encoder so that the image encoder performs quantization processing on the frequency domain information according to the first encoding parameter and generates quantization information of the image to be processed.
19. The method of claim 18, further comprising:
obtaining quantization information of the image to be processed;
processing the quantization information through a preset second neural network model to obtain a second coding parameter of the image to be processed;
and sending the second encoding parameter to the image encoder so that the image encoder encodes the image to be processed according to the second encoding parameter.
20. The method according to claim 19, wherein sending the second encoding parameter to the image encoder so that the image encoder encodes the image to be processed according to the second encoding parameter comprises:
and sending the second encoding parameter to the image encoder so that the image encoder performs entropy encoding processing on the quantization information according to the second encoding parameter.
21. The method of claim 19, wherein the second neural network model comprises M operation units connected in sequence; and processing the quantization information through a preset second neural network model to obtain a second coding parameter of the image to be processed comprises:
inputting the quantization information into the m-th operation unit of the second neural network model, and outputting the second coding parameter through the M-th operation unit of the second neural network model;
wherein M ≥ m; m is a positive integer greater than or equal to 2.
22. The method of claim 19, wherein the second neural network model comprises at least one of: VGG-16 model, VGG-19 model, ResNet model, GoogleNet model.
23. The method of claim 19, wherein the second encoding parameter comprises at least one of a typical quantization parameter design, a quantization table design, and a rate-controlled scaling design.
24. The method of any one of claims 1 to 17, further comprising:
encoding, by the image encoder, the image to be processed according to the first encoding parameter to obtain code stream information of the image to be processed.
25. The method of claim 24, further comprising:
and executing decoding operation on the code stream information by using an image decoder to obtain a reconstructed image to be processed.
26. An image processing apparatus characterized by comprising: a memory and a processor;
the memory is used for storing program codes;
the processor, invoking the program code, when executed, is configured to:
acquiring frequency domain information of an image to be processed, wherein the frequency domain information is obtained through time-frequency conversion processing performed by an image encoder;
processing the frequency domain information through a preset first neural network model to obtain a first coding parameter of the image to be processed;
and sending the first encoding parameter to the image encoder so that the image encoder encodes the image to be processed according to the first encoding parameter.
27. The apparatus of claim 26, wherein the first neural network model comprises N operation units connected in sequence; and the processor, when processing the frequency domain information through a preset first neural network model to obtain a first encoding parameter of the image to be processed, is configured to:
inputting the frequency domain information to the n-th operation unit of the first neural network model, and outputting the first coding parameter through the N-th operation unit of the first neural network model;
wherein N ≥ n; n is a positive integer greater than or equal to 2.
28. The apparatus of claim 26, wherein the first neural network model comprises at least one of: VGG-16 model, VGG-19 model, ResNet model, GoogleNet model.
29. The apparatus of claim 27, wherein n = 2.
30. The apparatus according to any one of claims 26 to 29, wherein the processor, prior to processing the frequency domain information by a preset first neural network model, is further configured to:
training the first neural network model through a preset first training data set; wherein the first training data set comprises frequency domain information of a number of images for which first encoding parameters have been labeled.
31. The apparatus of claim 30, wherein the processor obtains the first training data set by performing the steps of:
carrying out time-frequency conversion on a plurality of images marked with first coding parameters to obtain frequency domain information of the images;
and forming the first training data set according to the frequency domain information of the images marked with the first coding parameters.
32. The apparatus of any one of claims 26 to 31, wherein the processor is further configured to:
if the image to be processed is in a YUV format, determining the dimensions of the U channel, the V channel and the Y channel of the image to be processed;
and if the dimensions of the U channel and the V channel are not consistent with the dimension of the Y channel, performing a preprocessing operation on the image to be processed so that the dimensions of the U channel and the V channel of the image to be processed are consistent with the dimension of the Y channel.
33. The apparatus of any one of claims 26 to 31, wherein the processor is further configured to:
and if the image to be processed is in a preset format, performing preprocessing operation on the image to be processed to enable the dimensions of the U channel and the V channel of the image to be processed to be consistent with the dimension of the Y channel.
34. The device of claim 33, wherein the processor is further configured to perform a pre-processing operation on the image to be processed when the preset format is YUV422 format or YUV420 format, so that the U-channel and V-channel dimensions of the image to be processed are consistent with the Y-channel dimensions.
35. The apparatus according to any one of claims 32 to 34, wherein the processor, when performing a preprocessing operation on the image to be processed so that the U-channel and V-channel dimensions of the image to be processed are consistent with the Y-channel dimension, is configured to:
performing an upsampling operation on the U channel and the V channel of the image to be processed, so that the dimensions of the Y, U and V channels of the image to be processed are the same.
36. The apparatus of claim 35, wherein the processor, in performing the upsampling operation on the U channel and the V channel of the image to be processed so that the dimensions of the Y, U and V channels of the image to be processed are the same, is configured to:
performing a bilinear interpolation operation on the U channel and the V channel of the image to be processed, so that the dimensions of the Y, U and V channels of the image to be processed are the same.
37. The apparatus according to any one of claims 32 to 34, wherein the processor, when performing a preprocessing operation on the image to be processed so that the U-channel and V-channel dimensions of the image to be processed are consistent with the Y-channel dimension, is configured to:
performing a downsampling operation on the Y channel of the image to be processed, so that the dimensions of the Y, U and V channels of the image to be processed are the same.
38. The apparatus according to any one of claims 32 to 37, wherein the processor, when obtaining frequency domain information of the image to be processed, is configured to:
performing DCT transformation on the dimension-aligned Y, U and V channels of the image to be processed respectively, to generate frequency domain information of the Y, U and V channels.
39. The apparatus according to any one of claims 32 to 37, wherein the processor, when obtaining frequency domain information of the image to be processed, is configured to:
performing DWT (discrete wavelet transform) on the dimension-aligned Y, U and V channels of the image to be processed respectively, to generate frequency domain information of the Y, U and V channels.
40. The apparatus of claim 38 or 39, wherein the processor, before processing the frequency domain information through a preset first neural network model, is further configured to:
concatenating the frequency domain information of the Y, U and V channels of the image to be processed.
41. The apparatus according to any of claims 26 to 40, wherein the first encoding parameter comprises at least one of a typical quantization parameter design, a quantization table design, a characteristic transform precision design, a scale design for rate control.
42. The apparatus according to any of claims 26 to 40, wherein the processor, when sending the first encoding parameter to the image encoder, causes the image encoder to encode the image to be processed according to the first encoding parameter, is configured to:
and sending the first encoding parameter to the image encoder so that the image encoder performs quantization processing and entropy encoding processing on the frequency domain information of the image to be processed according to the first encoding parameter.
43. The apparatus according to any of claims 26 to 40, wherein the processor, when sending the first encoding parameter to the image encoder, causes the image encoder to encode the image to be processed according to the first encoding parameter, is configured to:
and sending the first encoding parameter to the image encoder so that the image encoder performs quantization processing on the frequency domain information according to the first encoding parameter and generates quantization information of the image to be processed.
44. The device of claim 43, wherein the processor is further configured to:
obtaining quantization information of the image to be processed;
processing the quantization information through a preset second neural network model to obtain a second coding parameter of the image to be processed;
and sending the second encoding parameter to the image encoder so that the image encoder encodes the image to be processed according to the second encoding parameter.
45. The apparatus of claim 44, wherein the processor, when sending the second encoding parameter to the image encoder, causes the image encoder to encode the image to be processed according to the second encoding parameter, is configured to:
and sending the second encoding parameter to the image encoder so that the image encoder performs entropy encoding processing on the quantization information according to the second encoding parameter.
46. The apparatus of claim 44, wherein the second neural network model comprises M operation units connected in sequence; and the processor, when processing the quantization information through a preset second neural network model to obtain a second coding parameter of the image to be processed, is configured to:
inputting the quantization information into the m-th operation unit of the second neural network model, and outputting the second coding parameter through the M-th operation unit of the second neural network model;
wherein M ≥ m; m is a positive integer greater than or equal to 2.
47. The apparatus of claim 44, wherein the second neural network model comprises at least one of: VGG-16 model, VGG-19 model, ResNet model, GoogleNet model.
48. The apparatus of claim 44, wherein the second encoding parameter comprises at least one of a typical quantization parameter design, a quantization table design, and a rate-controlled scaling design.
49. The apparatus according to any one of claims 26 to 42, wherein the processor is further configured to:
controlling the image encoder to encode the image to be processed according to the first encoding parameter to obtain code stream information of the image to be processed.
50. The device of claim 49, wherein the processor is further configured to:
and executing decoding operation on the code stream information by using an image decoder to obtain a reconstructed image to be processed.
51. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the image processing method of any one of claims 1 to 25.
CN201980008045.4A 2019-05-31 2019-05-31 Image processing method, apparatus and computer-readable storage medium Pending CN111630570A (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2019/089588 WO2020237646A1 (en) 2019-05-31 2019-05-31 Image processing method and device, and computer-readable storage medium

Publications (1)

Publication Number Publication Date
CN111630570A (en) 2020-09-04

Also Published As

Publication number Publication date
WO2020237646A1 (en) 2020-12-03


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20200904