CN116958534A - Image processing method, training method of image processing model and related device - Google Patents

Image processing method, training method of image processing model and related device

Info

Publication number
CN116958534A
Authority
CN
China
Prior art keywords
features
image
semantic
fusion
feature
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211712785.6A
Other languages
Chinese (zh)
Inventor
陈铭良
王子瑞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd
Priority to CN202211712785.6A
Publication of CN116958534A
Legal status: Pending


Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 - Arrangements for image or video recognition or understanding
    • G06V 10/20 - Image preprocessing
    • G06V 10/26 - Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 - Arrangements for image or video recognition or understanding
    • G06V 10/40 - Extraction of image or video features
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 - Arrangements for image or video recognition or understanding
    • G06V 10/40 - Extraction of image or video features
    • G06V 10/56 - Extraction of image or video features relating to colour
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 - Arrangements for image or video recognition or understanding
    • G06V 10/70 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/77 - Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V 10/774 - Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 - Arrangements for image or video recognition or understanding
    • G06V 10/70 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/77 - Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V 10/80 - Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V 10/806 - Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Databases & Information Systems (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Image Processing (AREA)

Abstract

The application discloses an image processing method, a training method of an image processing model, and a related device, which can be applied to scenarios such as cloud technology, artificial intelligence, intelligent transportation, and assisted driving. Semantic segmentation is performed on an image to be processed to obtain semantic probability distribution information, feature extraction is performed on the semantic probability distribution information to obtain semantic features, and feature fusion is performed on the image features of the image to be processed and the semantic features to obtain fusion semantic features. Global features and local features are obtained from the fusion semantic features, the local features and the semantic features are fused to obtain local fusion features, and the global features and the local fusion features are fused to obtain target fusion semantic features. A color adjustment coefficient for the image to be processed is obtained from the target fusion semantic features, and the image to be processed is processed according to the color adjustment coefficient to obtain a color enhanced image.

Description

Image processing method, training method of image processing model and related device
Technical Field
The present application relates to the field of data processing, and in particular, to an image processing method and related apparatus.
Background
Videos and images often require color adjustment, for example when a frame is overall dark or suffers from color cast; the colors then need to be analyzed and improved along multiple dimensions such as brightness, contrast, and saturation. Old films and old photographs are especially prone to severe color cast and even fading to the point where the colors nearly vanish, and such low-quality color distributions vary widely. Color adjustment can be performed on images through artificial intelligence; however, it is difficult for conventional color adjustment methods to provide a stable color adjustment scheme across various scenes.
Disclosure of Invention
In order to solve the technical problems, the application provides an image processing method, a training method of an image processing model and a related device, which improve the color adjustment effect and style stability.
The embodiment of the application discloses the following technical scheme:
in one aspect, the present application provides an image processing method, the method comprising:
carrying out semantic segmentation on the image to be processed to obtain semantic probability distribution information;
extracting features of the semantic probability distribution information to obtain semantic features;
carrying out feature fusion on the image features of the image to be processed and the semantic features to obtain fusion semantic features;
obtaining global features and local features according to the fusion semantic features;
carrying out feature fusion on the local features and the semantic features to obtain local fusion features;
fusing the global features and the local fusion features to obtain target fusion semantic features;
obtaining a color adjustment coefficient for the image to be processed according to the target fusion semantic features;
and processing the image to be processed according to the color adjustment coefficient to obtain a color enhanced image.
In another aspect, the present application provides a training method of an image processing model, the image processing model being used for executing the image processing method, the method comprising:
training an initial model by using a reference image and a degraded image of the reference image to obtain the image processing model, where in the training process the degraded image of the reference image is input into the initial model for image processing to obtain a predicted image, a loss function is constructed based on the reference image and the predicted image, and the initial model is trained into the image processing model according to the loss function.
In another aspect, the present application provides an image processing apparatus, the apparatus comprising:
the semantic segmentation unit is used for carrying out semantic segmentation on the image to be processed to obtain semantic probability distribution information;
the feature extraction unit is used for extracting features of the semantic probability distribution information to obtain semantic features;
the first feature fusion unit is used for carrying out feature fusion on the image features of the image to be processed and the semantic features to obtain fusion semantic features;
the branch unit is used for obtaining global features and local features according to the fusion semantic features;
the second feature fusion unit is used for carrying out feature fusion on the local features and the semantic features to obtain local fusion features;
the third feature fusion unit is used for fusing the global features and the local fusion features to obtain target fusion semantic features;
the coefficient determining unit is used for obtaining a color adjustment coefficient for the image to be processed according to the target fusion semantic features;
and the image processing unit is used for processing the image to be processed according to the color adjustment coefficient to obtain a color enhanced image.
In another aspect, the present application provides a training apparatus for an image processing model, wherein the image processing model is used for executing the image processing method, the apparatus comprising:
the training unit is used for training an initial model by using a reference image and a degraded image of the reference image to obtain the image processing model, where in the training process the degraded image of the reference image is input into the initial model for image processing to obtain a predicted image, a loss function is constructed based on the reference image and the predicted image, and the initial model is trained into the image processing model according to the loss function.
In another aspect, the application provides a computer device comprising a processor and a memory:
the memory is used for storing a computer program and transmitting the computer program to the processor;
the processor is configured to execute the image processing method according to the above aspect or the training method of the image processing model according to instructions in the computer program.
In another aspect, an embodiment of the present application provides a computer readable storage medium storing a computer program for executing the image processing method described in the above aspect or the training method of the image processing model.
In another aspect, embodiments of the present application provide a computer program product comprising a computer program which, when run on a computer device, causes the computer device to perform the image processing method or the training method of the image processing model.
According to the above technical solution, semantic segmentation can be performed on the image to be processed to obtain semantic probability distribution information, which indicates the semantic distribution probability of each pixel point of the image to be processed. Feature extraction is performed on the semantic probability distribution information to obtain semantic features, and the image features of the image to be processed and the semantic features are fused to obtain fusion semantic features. Because the fusion semantic features take both the image features and the semantic features of the image to be processed into account, the subsequent color adjustment coefficients can be constrained by the semantic features, and more semantics-related details can be retained during image adjustment. Global features and local features are obtained from the fusion semantic features, and the local features and the semantic features are fused to obtain local fusion features, which further reflect local semantics-related details. The global features and the local fusion features are then fused to obtain target fusion semantic features, so that the target fusion semantic features both contain the semantic features and reflect more local semantics-related details. After a color adjustment coefficient for the image to be processed is obtained from the target fusion semantic features, the image to be processed can be processed according to the color adjustment coefficient to obtain a color enhanced image. Because the color adjustment coefficient is influenced by the semantic features, the adjustment of the image to be processed is not based merely on statistical color rules but takes the semantic features into account, which helps stabilize the color adjustment style, preserve local details, reduce color jumps, and improve the color enhancement capability.
Drawings
In order to more clearly illustrate the embodiments of the application or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described, it being obvious that the drawings in the following description are only some embodiments of the application, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
Fig. 1 is a schematic view of an application scenario of an image processing method according to an embodiment of the present application;
FIG. 2 is a flowchart of an image processing method according to an embodiment of the present application;
FIG. 3 is a diagram of a color adjustment network according to an embodiment of the present application;
FIG. 4 is a schematic structural diagram of a spatial feature transformation convolution layer according to an embodiment of the present application;
FIG. 5 is a schematic diagram of a codec calculation flow according to an embodiment of the present application;
FIG. 6 is a schematic diagram of upsampling according to an embodiment of the present application;
FIG. 7 is a schematic diagram showing a random degradation process of a reference image according to an embodiment of the present application;
FIG. 8 is a comparison chart of color adjustment effects according to an embodiment of the present application;
FIG. 9 is a diagram showing another color adjustment effect according to an embodiment of the present application;
Fig. 10 is a schematic diagram of an image processing effect according to an embodiment of the present application;
FIG. 11 is a schematic diagram showing a color enhancement effect comparison according to an embodiment of the present application;
fig. 12 is a block diagram of an image processing apparatus according to an embodiment of the present application;
FIG. 13 is a block diagram of a training device for an image processing model according to an embodiment of the present application;
fig. 14 is a block diagram of a terminal device according to an embodiment of the present application;
fig. 15 is a block diagram of a server according to an embodiment of the present application.
Detailed Description
Embodiments of the present application are described below with reference to the accompanying drawings.
Currently, videos and images often suffer from overall dark frames and dull colors, and it is difficult for conventional color adjustment methods to provide a stable color adjustment scheme across various scenes.
In order to solve the technical problems, the embodiment of the application provides an image processing method, a training method of an image processing model and a related device, wherein image color adjustment is performed according to semantic features, and the color adjustment effect and style stability are improved.
The image processing method and the training method of the image processing model provided by the embodiments of the present application are implemented based on artificial intelligence (AI). Artificial intelligence is the theory, method, technology, and application system that uses a digital computer, or a machine controlled by a digital computer, to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use that knowledge to obtain the best results. In other words, artificial intelligence is a comprehensive branch of computer science that attempts to understand the essence of intelligence and to produce new intelligent machines that can react in a way similar to human intelligence. Artificial intelligence studies the design principles and implementation methods of various intelligent machines so that the machines have the capabilities of perception, reasoning, and decision-making.
Artificial intelligence technology is a comprehensive discipline covering a wide range of fields, including both hardware-level and software-level technologies. Basic artificial intelligence technologies generally include technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing, operation/interaction systems, and mechatronics. Artificial intelligence software technologies mainly include directions such as computer vision, speech processing, natural language processing, machine learning/deep learning, autonomous driving, and intelligent transportation.
In the embodiments of the present application, the artificial intelligence software technologies mainly involved include computer vision (CV), machine learning/deep learning, and the like. For example, deep learning (Deep Learning) in machine learning (ML) may be involved, including various types of artificial neural networks (Artificial Neural Network, ANN).
The image processing method and the training method of the image processing model provided by the embodiments of the present application may be implemented by a computer device with data processing capability. The computer device may be a terminal device or a server. The server may be an independent physical server, a server cluster or distributed system composed of multiple physical servers, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, CDN, big data, and artificial intelligence platforms. Terminal devices include, but are not limited to, smartphones, tablet computers, notebook computers, desktop computers, smart speakers, smart watches, and the like. The terminal device and the server may be directly or indirectly connected through wired or wireless communication, which is not limited herein. Embodiments of the present application may be applied to a variety of scenarios including, but not limited to, cloud technology, artificial intelligence, digital humans, virtual humans, games, virtual reality, extended reality (XR), and the like.
Computer vision is a science that studies how to make machines "see"; it uses cameras and computers in place of human eyes to recognize, track, and measure targets, and further performs graphics processing so that the result becomes an image more suitable for human observation or for transmission to instruments for detection. As a scientific discipline, computer vision studies related theories and technologies in an attempt to build artificial intelligence systems that can obtain information from images or multidimensional data. Computer vision technologies typically include image processing, image recognition, image semantic understanding, image retrieval, OCR, video processing, video semantic understanding, video content/behavior recognition, three-dimensional object reconstruction, 3D technology, virtual reality, augmented reality, simultaneous localization and mapping, and common biometric technologies such as face recognition and fingerprint recognition.
The computer device with data processing capability has machine learning capability. Machine learning is a multidisciplinary subject involving probability theory, statistics, approximation theory, convex analysis, algorithmic complexity theory, and other disciplines. It studies how a computer simulates or implements human learning behavior to acquire new knowledge or skills and reorganizes existing knowledge structures to continuously improve its own performance. Machine learning is the core of artificial intelligence and the fundamental way to make computers intelligent; it is applied throughout all fields of artificial intelligence. Machine learning and deep learning typically include techniques such as artificial neural networks, belief networks, reinforcement learning, transfer learning, and inductive learning.
In the image processing method, the training method of the image processing model, and the related device provided by the embodiments of the present application, the artificial intelligence models used mainly involve color adjustment processing and semantic recognition of the image to be processed: semantic features are obtained through semantic recognition, and color enhancement processing is performed on the image to be processed according to the semantic features, thereby improving the color adjustment effect and style stability.
With the research and progress of artificial intelligence technology, artificial intelligence has been researched and applied in many fields, such as smart homes, smart wearable devices, virtual assistants, smart speakers, smart marketing, autonomous driving, drones, robots, smart healthcare, smart customer service, the Internet of Vehicles, and intelligent transportation. It is believed that with the development of technology, artificial intelligence will be applied in more fields and play an increasingly important role.
Artificial intelligence cloud services are also commonly referred to as AIaaS (AI as a Service). This is currently the mainstream service mode for artificial intelligence platforms. Specifically, an AIaaS platform splits several common AI services and provides independent or packaged services in the cloud. This service mode is similar to an AI-themed marketplace: all developers can access one or more of the artificial intelligence services provided by the platform through API interfaces, and some experienced developers can also use the AI framework and AI infrastructure provided by the platform to deploy, operate, and maintain their own exclusive cloud artificial intelligence services.
In order to facilitate understanding of the technical scheme provided by the application, an image processing method and a training method of an image processing model provided by the embodiment of the application will be described next with reference to an actual application scenario.
Referring to fig. 1, fig. 1 is a schematic view of an application scenario of an image processing method according to an embodiment of the present application. In the application scenario shown in fig. 1, the server 100 is illustrated by taking the foregoing computer device as an example, and the server 100 may communicate with a terminal device, where the image to be processed may be from the terminal device, and the color enhanced image may be provided to the terminal device.
The server 100 may perform semantic segmentation on the image to be processed to obtain semantic probability distribution information, which indicates the semantic distribution probability of each pixel point of the image to be processed. The server 100 then performs feature extraction on the semantic probability distribution information to obtain semantic features and fuses the image features of the image to be processed with the semantic features to obtain fusion semantic features. Because the fusion semantic features take both the image features and the semantic features into account, the subsequent color adjustment coefficients can be constrained by the semantic features, and more semantics-related details can be retained during image adjustment.
The server 100 obtains global features and local features from the fusion semantic features and fuses the local features with the semantic features to obtain local fusion features, which further reflect local semantics-related details. The global features and the local fusion features are then fused to obtain target fusion semantic features, so that the target fusion semantic features both contain the semantic features and reflect the local semantics-related details.
After obtaining the color adjustment coefficient of the image to be processed according to the target fusion semantic features, the server 100 can process the image to be processed according to the color adjustment coefficient to obtain a color enhanced image. Because the color adjustment coefficient is influenced by the semantic features, the adjustment of the image to be processed is not based merely on statistical color rules but takes the semantic features into account, which helps stabilize the color adjustment style, preserve local details, reduce color jumps, and improve the color enhancement capability.
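Read end to end, the processing performed by the server 100 can be summarized by the following sketch; every attribute and function name in it is hypothetical and only mirrors the steps described above, assuming a PyTorch-style model object.

```python
import torch

def color_enhance(image: torch.Tensor, model) -> torch.Tensor:
    """Hedged end-to-end sketch of the described flow (all names are illustrative)."""
    seg_prob = model.semantic_segmentation(image)            # semantic probability distribution information
    sem_feats = model.semantic_feature_extraction(seg_prob)  # multi-scale semantic features
    fused = model.mixed_semantic_encoding(image, sem_feats)  # fusion semantic features
    local_fused = model.local_branch(fused, sem_feats[-1])   # local fusion features
    global_fused = model.global_branch(fused)                # global fusion features
    target = global_fused + local_fused                      # target fusion semantic features
    coeffs = model.predict_coefficients(target, image)       # per-pixel color adjustment coefficients
    return model.apply_coefficients(image, coeffs)           # color enhanced image
```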
Next, an image processing method and a training method for an image processing model according to an embodiment of the present application will be described with reference to the accompanying drawings.
Referring to fig. 2, fig. 2 is a flowchart of an image processing method provided in an embodiment of the present application; taking a server executing the image processing method as an example, the method includes:
S101, carrying out semantic segmentation on an image to be processed to obtain semantic probability distribution information, and carrying out feature extraction on the semantic probability distribution information to obtain semantic features.
In the embodiment of the present application, the image to be processed can be an image with color problems, such as a dark picture or dull colors, and can be an image captured of an actual scene or an image frame in a video. The color of the image to be processed is enhanced by adjusting dimensions such as its brightness, contrast, and saturation.
At present, color enhancement can be performed through the CSRNET network and the HDRNET network, both of which learn color adjustment coefficients with a neural network to enhance the colors of the whole picture. Specifically, CSRNET performs a uniform fully connected nonlinear transformation on each pixel point independently and uses another group of network branches (GFM) to extract global color features that modulate the fully connected layers, thereby approximating traditional color operations such as brightness adjustment and saturation adjustment; because it is not limited by a fixed order of traditional color adjustment operations, it can in theory learn all possible combinations of color adjustments. HDRNET performs color enhancement by directly learning the coefficients of a linear color transform with a convolutional neural network (Convolutional Neural Network, CNN); to speed up inference of the whole network, global and local color features can be extracted simultaneously by the CNN on a low-resolution picture, and these features are then fused to infer the color transform coefficients.
However, both networks train the color adjustment coefficients directly end to end, which leads to unstable styles and color jumps, so they cannot be deployed stably. In addition, the CSRNET network only extracts color-related features for the transformation; its local color adjustment capability is far from sufficient and its transform relation is single, so it only adaptively approximates combinations of traditional color adjustment operators and cannot handle the various local color casts and color degradations found in real, complex scenes. Although HDRNET uses a CNN structure with stronger feature extraction, the extracted features mostly remain at the level of statistical color rules, so the adjustment effect is easily influenced by the color migration rules of the training data set, the color adjustment style cannot be stabilized, and color jumps exist.
In the embodiment of the present application, semantic segmentation can be performed on the image to be processed to obtain a semantic segmentation map, which serves as the semantic segmentation result. Before the segmentation result is obtained, semantic segmentation produces semantic probability distribution information, which can be expressed as a segmentation probability map (Segmentation Probability Map) and is used to determine the color adjustment strategy for the image to be processed, thereby stabilizing the color adjustment style and reducing color jumps. The semantic probability distribution information indicates the semantic distribution probability of each pixel point of the image to be processed and may comprise multiple channels, where each pixel value in a channel represents the probability that the corresponding pixel point belongs to a certain semantic category.
Referring to fig. 3, a color adjustment network structure diagram is provided in an embodiment of the present application, where an input image of the color adjustment network structure is an image 300 to be processed, and semantic segmentation may be implemented through a semantic segmentation network or a classification model, for example, a PSPNet semantic segmentation network, to obtain semantic probability distribution information 301, and a semantic segmentation graph 302 may be obtained by prediction according to the semantic probability distribution information 301. Before the semantic segmentation of the image to be processed, it may be subjected to feature extraction.
After the semantic probability distribution information is obtained, feature extraction can be performed on it to obtain semantic features. Specifically, feature extraction on the semantic probability distribution information can be performed by continuous downsampling so as to extract features under different receptive fields (Feature Extraction), generating multi-scale semantic features, which can be represented by semantic conditional maps (Semantic Conditional Map, SCM); together, the multi-scale semantic features form a hierarchical semantic conditional map (Hierarchical Semantic Conditional Map).
Feature extraction on the semantic probability distribution information can be implemented with first convolution layers (Conv). Referring to fig. 3, each first convolution layer can output semantic features at one scale. The number of first convolution layers and the convolution kernel size can be determined according to the actual situation; denoting the number of first convolution layers as n, feature extraction can be performed on the semantic probability distribution information through the n first convolution layers to obtain n layers of semantic features corresponding to the n first convolution layers respectively. For example, the convolution kernel of each first convolution layer can be 3×3 and the number of first convolution layers can be 4, giving 4 layers of semantic features corresponding to the 4 first convolution layers respectively: the layer-1, layer-2, layer-3, and layer-4 semantic features.
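As a rough sketch of this multi-scale extraction, the following PyTorch module stacks stride-2 convolutions; the number of semantic classes, channel widths, and layer count are illustrative assumptions rather than values fixed by this application.

```python
import torch
import torch.nn as nn

class SemanticFeatureExtractor(nn.Module):
    """Sketch: n first convolution layers that repeatedly downsample the semantic
    probability distribution map into multi-scale semantic features."""
    def __init__(self, num_classes=21, channels=(32, 64, 128, 256)):  # illustrative sizes
        super().__init__()
        layers = []
        in_ch = num_classes
        for out_ch in channels:
            # a 3x3 convolution with stride 2 halves the spatial resolution at each level
            layers.append(nn.Sequential(
                nn.Conv2d(in_ch, out_ch, kernel_size=3, stride=2, padding=1),
                nn.ReLU(inplace=True)))
            in_ch = out_ch
        self.levels = nn.ModuleList(layers)

    def forward(self, seg_prob):  # seg_prob: (B, num_classes, H, W)
        feats = []
        x = seg_prob
        for level in self.levels:
            x = level(x)
            feats.append(x)          # layer-1 ... layer-n semantic features
        return feats                 # one semantic conditional map per scale
```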
S102, carrying out feature fusion on the image features and the semantic features of the image to be processed to obtain fusion semantic features.
After the semantic features are acquired, the image features and the semantic features of the image to be processed can be subjected to feature fusion to obtain fusion semantic features, and the fusion semantic features take the image features and the semantic features of the image to be processed into consideration, so that the subsequent color adjustment coefficients can be constrained through the semantic features, and more semantic related details can be reserved for image adjustment. Feature fusion of image features and semantic features of an image to be processed may be achieved by hybrid semantic coding (Mixed Semantic Encode), which may introduce semantic features for image features by spatial feature transformation (Spatial Feature Transform, SFT).
Feature fusion of the image features and the semantic features of the image to be processed can be performed at multiple scales. Before feature fusion, feature extraction can be performed on the image to be processed to obtain multi-scale image features so that multi-scale feature fusion can be carried out. Specifically, feature extraction on the image to be processed can be performed by continuous downsampling to extract features under different receptive fields, resulting in multi-scale image features.
Feature extraction on the image to be processed can be implemented with second convolution layers. Referring to fig. 3, each second convolution layer can output image features at one scale, and the number of second convolution layers and the convolution kernel size can be determined according to the actual situation. Taking the extraction of semantic features through n first convolution layers as an example, the number of second convolution layers for feature extraction on the image to be processed is also n, for example 4, and the convolution kernel size of the second convolution layers is, for example, 3×3. The fusion of the image features and the semantic features can introduce the semantic features through spatial feature transformation convolution (SFT Conv) layers so as to transform the image features; the input of each spatial feature transformation convolution layer can include image features and semantic features at the same scale, and the number of spatial feature transformation convolution layers can also be denoted as n.
Taking the i-th second convolution layer and the i-th spatial feature transformation convolution layer as an example, where i is a positive integer less than or equal to n, the i-th second convolution layer can convolve the (i-1)-th layer fusion semantic features to obtain the i-th layer image features, and the i-th layer image features and the i-th layer semantic features are then fused by the i-th spatial feature transformation convolution layer to obtain the i-th layer fusion semantic features; as shown in fig. 3, the total number of spatial feature transformation convolution layers is also n. The image to be processed can be used as the layer-0 fusion semantic features: the 1st second convolution layer processes the image to be processed to obtain the layer-1 image features, the 2nd second convolution layer processes the layer-1 fusion semantic features to obtain the layer-2 image features, the 3rd second convolution layer processes the layer-2 fusion semantic features to obtain the layer-3 image features, and so on, until the layer-n fusion semantic features are finally obtained through the convolution operations of the n second convolution layers and the n spatial feature transformation convolution layers. That is, downsampling and feature fusion are performed alternately, so that the image features at each scale are fused with the semantic features at the same scale, and the resulting fusion semantic features contain the semantic features at every scale and better reflect the internal semantic associations of the image to be processed.
In the embodiment of the present application, after the n-th spatial feature transformation convolution layer outputs the layer-n fusion semantic features, the layer-n fusion semantic features can be used as the final fusion semantic features 304; alternatively, a convolution operation with a 3×3 kernel can be performed on the layer-n fusion semantic features to obtain the final fusion semantic features 304, so that further feature extraction yields deeper fusion semantic features.
Fusing the i-th layer image features and the i-th layer semantic features through the i-th spatial feature transformation convolution layer to obtain the i-th layer fusion semantic features works as follows: the convolution operations of the i-th spatial feature transformation convolution layer convert the i-th layer semantic features into a first transformation parameter and a second transformation parameter, and the i-th spatial feature transformation convolution layer then transforms the i-th layer image features according to the first transformation parameter and the second transformation parameter. Because the first transformation parameter and the second transformation parameter are obtained from the i-th layer semantic features, the image features can be fused by an affine transformation along these two dimensions, so that the semantic features and the image features are fused without adding much computation.
Referring to fig. 4, which is a schematic structural diagram of a spatial feature transformation convolution layer provided in an embodiment of the present application, the first transformation parameter corresponding to the i-th layer semantic features can be denoted γ_i, the second transformation parameter β_i, and the i-th layer semantic features C_i. Through the operation M(·), the i-th layer semantic features can be converted into the first transformation parameter and the second transformation parameter. The operation M(·) may be implemented by the convolution of convolution layers and can be expressed as: (γ_i, β_i) = M(C_i). Specifically, the first transformation parameter can be obtained through the convolution of a first group of convolution layers and the second transformation parameter through the convolution of a second group of convolution layers. The number of convolution layers in each group and their convolution kernels can be determined according to the actual situation; for example, each group may include two convolution layers (Conv) with 3×3 kernels.
The i-th layer image features may be denoted F_i. The i-th layer image features are converted into the i-th layer fusion semantic features, denoted CF_i, through the operation SFT(·). A multiplication (⊙) is performed on the first transformation parameter and the i-th layer image features to obtain a product result, and the sum of the product result and the second transformation parameter is taken as the i-th layer fusion semantic features, which can be expressed as: CF_i = SFT(F_i; γ_i, β_i) = γ_i ⊙ F_i + β_i.
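A minimal PyTorch sketch of such a spatial feature transformation convolution layer, together with the alternating encoder described above, is given below; all channel widths, the hidden width of M(·), and the number of scales are illustrative assumptions, and the class and attribute names are hypothetical.

```python
import torch
import torch.nn as nn

class SFTLayer(nn.Module):
    """Sketch of a spatial feature transformation convolution layer:
    (gamma_i, beta_i) = M(C_i);  CF_i = gamma_i * F_i + beta_i."""
    def __init__(self, feat_ch, cond_ch, hidden=64):  # hidden width is an assumption
        super().__init__()
        # first group of convolution layers -> first transformation parameter (gamma)
        self.gamma = nn.Sequential(
            nn.Conv2d(cond_ch, hidden, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(hidden, feat_ch, 3, padding=1))
        # second group of convolution layers -> second transformation parameter (beta)
        self.beta = nn.Sequential(
            nn.Conv2d(cond_ch, hidden, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(hidden, feat_ch, 3, padding=1))

    def forward(self, feat, cond):
        return self.gamma(cond) * feat + self.beta(cond)  # element-wise affine modulation

class MixedSemanticEncoder(nn.Module):
    """Sketch of the mixed semantic encoding: second convolution layers that downsample
    the image branch alternate with SFT layers that inject same-scale semantic features."""
    def __init__(self, in_ch=3, channels=(32, 64, 128, 256)):  # illustrative widths
        super().__init__()
        self.convs, self.sfts = nn.ModuleList(), nn.ModuleList()
        prev = in_ch
        for ch in channels:
            self.convs.append(nn.Sequential(
                nn.Conv2d(prev, ch, 3, stride=2, padding=1), nn.ReLU(inplace=True)))
            self.sfts.append(SFTLayer(feat_ch=ch, cond_ch=ch))
            prev = ch
        self.out_conv = nn.Conv2d(channels[-1], channels[-1], 3, padding=1)

    def forward(self, image, semantic_feats):
        x = image                                    # layer-0 fusion semantic features
        for conv, sft, cond in zip(self.convs, self.sfts, semantic_feats):
            x = conv(x)                              # i-th layer image features
            x = sft(x, cond)                         # i-th layer fusion semantic features
        return self.out_conv(x)                      # final fusion semantic features
```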
S103, obtaining global features and local features according to the fused semantic features, and carrying out feature fusion on the local features and the semantic features to obtain local fused features.
S104, fusing the global features and the local fusion features to obtain target fusion semantic features.
After the fusion semantic features are obtained, global branch (Global Branch) features and local branch (Local Branch) features can be obtained from the fusion semantic features. The local features are fused with the semantic features to obtain local fusion features, which can further reflect local semantics-related details; the global features and the local fusion features are then fused to obtain the target fusion semantic features, so that the target fusion semantic features both contain the semantic features and reflect the local semantics-related details. In S103, the fusion semantic features are obtained through feature extraction and feature fusion, so they contain both small-scale local features and large-scale global features; the fusion semantic features can therefore be divided into two branches, and the color adjustment coefficients can be predicted from both the local and the global perspectives.
Specifically, in S103, the local features and the semantic features can be fused to obtain the local fusion features. The local features are small-scale features and can be fused with the small-scale part of the semantic features; the fusion can be implemented by a spatial feature transformation convolution layer, in a process similar to that of the i-th spatial feature transformation convolution layer in S102. Taking the local features as the layer-n fusion semantic features as an example, the inputs of this spatial feature transformation convolution layer can be the layer-n fusion semantic features and the layer-n semantic features. In a specific implementation, the layer converts the layer-n semantic features into a first transformation parameter and a second transformation parameter through convolution operations and then transforms the layer-n fusion semantic features according to these parameters: a multiplication is performed between the first transformation parameter and the layer-n fusion semantic features to obtain a product result, and the sum of the product result and the second transformation parameter can be taken as the local fusion features; alternatively, a convolution operation, for example with a 3×3 kernel, can be performed on this sum and its result taken as the local fusion features.
In S104, the global feature and the local fusion feature may be fused to obtain the target fusion semantic feature, where the fusion of the global feature and the local fusion feature may be implemented by direct addition.
In S104, the global feature may be processed to obtain a global fusion feature, and then the global fusion feature and the local fusion feature are fused to obtain a target fusion semantic feature, where the fusion of the global fusion feature and the local fusion feature may be achieved by a direct addition method. Referring to FIG. 3, the local fusion feature 305 and the global fusion feature 306 are fused to obtain a target fusion semantic feature 307.
Specifically, the global features and the position coding information of the image to be processed can be encoded and decoded to update the global features and obtain global fusion features, and the global fusion features and the local fusion features are fused to obtain the target fusion semantic features. Encoding and decoding the global features together with the position coding information allows the features of different region blocks within the global features to be fused, so that different region blocks can interact during color adjustment. Analyzing and adjusting each region block separately in this way improves the local color adjustment capability and the ability to handle the various local color casts and color degradations that occur in different application scenarios.
Encoding and decoding the global features and the position coding information of the image to be processed means determining, for the features of each region block in the global features, a corresponding updated feature; these updated features form the global fusion features. When determining the updated feature corresponding to the j-th region block, the correlation weights of the other region blocks with respect to the j-th region block are determined according to the position coding information of the j-th region block and of the other region blocks, together with the features of the other region blocks and of the j-th region block. The updated feature of the j-th region block is then determined from these correlation weights and the features of the other region blocks and of the j-th region block, where the j-th region block is any region block in the global features. In this way the feature of each region block is influenced by the features of the other region blocks, so different region blocks can interact during color adjustment.
Referring to fig. 5, which is a schematic diagram of a codec calculation flow provided in an embodiment of the present application, the encoding and decoding can be implemented by a Transformer codec. The Transformer is a deep learning model that uses a self-attention mechanism to weight the importance of each part of the input data differently. The codec calculation flow may include an encoding process (Transformer Encode) and a decoding process (Transformer Decode). The encoding process includes multiple encoding operations, each of which may include multi-head self-attention (Multi-Head Self-Attention), residual connection and normalization (Add & Norm), a feed-forward network (FFN), and another residual connection and normalization (Add & Norm); the number of encoding operations may be denoted as N. The decoding process may likewise include multiple decoding operations with the same components, and the number of decoding operations may be denoted as M. The position coding information may be encoded through learned parameters so that it can be fused with the global features, which may be referred to as learned positional encoding (Learned Positional Encoding).
When the global features and the position coding information of the image to be processed are encoded and decoded, the similarity between the feature vector of the j-th region block and the feature vectors of the other region blocks can be computed in a non-local manner from the features of the j-th region block. The similarities are normalized to obtain the weights corresponding to the other region blocks, each weight is multiplied by the feature of the corresponding region block, and the sum of these products is the updated feature of the j-th region block, so that the updated feature contains global information. The similarity indicates how much the other region blocks contribute to the j-th region block: in the weighted update, the stronger the relation of another region block to the j-th region block, the larger the similarity and the larger its influence on the updated feature.
The updated feature corresponding to the j-th region block can be the sum, over the region blocks, of the product of each block's correlation weight with respect to the j-th region block and that block's feature. The feature vector obtained by learned positional encoding for the j-th region block can be denoted Q (Query), the feature vectors obtained by learned positional encoding for the other region blocks can be denoted K (Key), and the features of the j-th region block and the other region blocks can be denoted V (Value). The similarity between the feature vector of the j-th region block and the feature vectors of the other region blocks can then be expressed as softmax(QK^T/√d_k), where softmax(·) is a normalization function, T represents the transpose of K, and d_k is the dimension of the vectors used to compare the j-th region block with the other region blocks. There can be multiple other region blocks; each of them yields a correlation weight with respect to the j-th region block, and the correlation weight of the j-th region block with respect to itself is the largest.
In this way, by combining the global features and the position coding information, the Q, K and V matrices can be generated, and by computing the multi-head attention Attention(Q, K, V), the correlation between each region block feature and the other region block features can be obtained, and all other region block features are fused to update the feature at the current position. Specifically, the updated feature may be expressed as Attention(Q, K, V), that is:
Attention(Q, K, V) = softmax(QK^T/√d_k)·V
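As a minimal single-head illustration of this update, the sketch below computes Attention(Q, K, V) directly, taking Q and K from the learned positional encodings and V from the region block features as described above; the learned projections and the multi-head splitting of the actual encoder and decoder are omitted as simplifying assumptions.

```python
import math
import torch

def region_block_attention(pos_q: torch.Tensor, pos_k: torch.Tensor, values: torch.Tensor) -> torch.Tensor:
    """Sketch: Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V.
    pos_q, pos_k: vectors derived from the learned positional encodings, shape (B, N, d_k)
    values:       region block features, shape (B, N, d_v)
    Returns the updated feature of every region block as a weighted sum of all
    region block features, so each block is influenced by all the others."""
    d_k = pos_q.size(-1)
    scores = torch.matmul(pos_q, pos_k.transpose(-2, -1)) / math.sqrt(d_k)  # (B, N, N)
    weights = torch.softmax(scores, dim=-1)   # normalized correlation weights per block
    return torch.matmul(weights, values)      # updated region block features (B, N, d_v)
```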
S105, obtaining a color adjustment coefficient for the image to be processed according to the target fusion semantic features, and processing the image to be processed according to the color adjustment coefficient to obtain a color enhanced image.
After the target fusion semantic features are obtained, a color adjustment coefficient for the image to be processed can be obtained from them; the color adjustment coefficient is used to adjust the image to be processed. The color adjustment coefficients of the image to be processed may include a color adjustment coefficient corresponding to each pixel point of the image to be processed, so that each pixel point can be color-adjusted according to its own coefficient.
Specifically, a color adjustment coefficient of a first size can be obtained from the target fusion semantic features; this coefficient is then reconstructed (reshaped) into bilateral grid coefficients (Bilateral Grid of Coefficients), and the bilateral grid coefficients are upscaled by slicing to obtain the color adjustment coefficient corresponding to each pixel point of the image to be processed. Referring to fig. 3, the bilateral grid coefficients 308 are enlarged in scale after reconstruction, and upsampling by bilateral grid slicing (Bilateral Grid Slicing) speeds up inference and improves computational efficiency. The bilateral grid is a traditional edge-aware image manipulation method that allows edge-preserving image processing operations.
The color adjustment coefficient of the first size is a small-scale coefficient. Obtaining it from the target fusion semantic features can be achieved by convolution operations on the target fusion semantic features; the convolution operations are implemented through convolution layers, whose number and kernel size can be determined according to the actual situation, for example 2 convolution layers with 3×3 kernels.
In a specific implementation, a convolution operation can be performed on the image to be processed to obtain a bilateral grid guide map (Guide Map); based on the guide map, the bilateral grid coefficients are upsampled by slicing to obtain the color adjustment coefficient corresponding to each pixel point of the image to be processed, that is, the final color adjustment coefficients are per-pixel coefficients at the original image size. The color adjustment coefficient corresponding to each pixel point can be represented by a matrix, and the size of the matrix can be 3×4. The convolution operation on the image to be processed can be implemented by a convolution layer and an activation layer; the convolution kernel of the convolution layer can be 1×1 and the activation layer can be a sigmoid activation layer, which better preserves edges.
Referring to fig. 6, an upsampling schematic diagram provided in an embodiment of the present application, the size of the image 300 to be processed may be denoted H×W, a bilateral grid guide map 310 of size H×W can be obtained by the convolution operation of the convolution layer, and the bilateral grid coefficients 308 are upsampled based on the bilateral grid guide map 310 by slicing. In addition to the correlation between pixels at different positions (in the XY plane), the correlation between different channels of the same pixel (in the Z direction) is considered, so that the color adjustment coefficients 311 obtained after upsampling are more accurate and the computation is more efficient.
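A rough sketch of the slicing-based upsampling is shown below, implemented here with trilinear sampling of the bilateral grid guided by the guide map; the grid layout, depth D, and the use of grid_sample are implementation assumptions rather than details fixed by this application.

```python
import torch
import torch.nn.functional as F

def slice_bilateral_grid(grid: torch.Tensor, guide: torch.Tensor) -> torch.Tensor:
    """Sketch of bilateral grid slicing (upsampling).
    grid:  (B, 12, D, Hg, Wg) low-resolution bilateral grid holding 3x4 coefficients
    guide: (B, H, W) guide map in [0, 1] from a 1x1 convolution plus sigmoid activation
    Returns per-pixel color adjustment coefficients of shape (B, 12, H, W)."""
    B, H, W = guide.shape
    # Normalized sampling positions: x, y follow the pixel location, z follows the guide value
    ys, xs = torch.meshgrid(
        torch.linspace(-1, 1, H, device=guide.device),
        torch.linspace(-1, 1, W, device=guide.device), indexing="ij")
    xs = xs.expand(B, H, W)
    ys = ys.expand(B, H, W)
    zs = guide * 2 - 1                                        # map guide to [-1, 1]
    coords = torch.stack([xs, ys, zs], dim=-1).unsqueeze(1)   # (B, 1, H, W, 3)
    # Trilinear interpolation over neighboring grid cells and neighboring guide levels
    sliced = F.grid_sample(grid, coords, mode="bilinear", align_corners=True)
    return sliced.squeeze(2)                                  # (B, 12, H, W)
```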
After the color adjustment coefficients of the image to be processed are obtained according to the target fusion semantic features, the image to be processed can be processed according to the color adjustment coefficients to obtain a color enhanced image 309, as shown in fig. 3. Because the color adjustment coefficients are influenced by the semantic features, the adjustment of the image to be processed is not based merely on statistical color rules but takes the semantic features into account, which helps stabilize the color adjustment style, preserve local details, reduce color jumps, and improve the color enhancement capability.
Taking the color adjustment coefficient corresponding to each pixel point as a 3×4 matrix as an example, the color adjustment coefficients may be applied (Apply Coefficients) to the image to be processed to perform color adjustment. The pixel value of a pixel point on the image to be processed may be expressed as the column vector [R, G, B, 1]^T, and its color adjustment coefficient as a 3×4 matrix A.
The corrected pixel values RT, GT and BT, corresponding to the red channel, the green channel and the blue channel respectively, can be obtained by multiplying the color adjustment coefficient and the pixel value: [RT, GT, BT]^T = A·[R, G, B, 1]^T.
after the image to be processed is processed according to the color adjustment coefficient, corrected pixel values of all pixel points can be obtained, the corrected pixel values of all pixel points can form a color enhancement image, color enhancement is realized, and in practice, the correction of the pixel values can not only improve the color degraded image, but also convert the image to a fixed style.
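As a worked illustration of this per-pixel correction, the sketch below multiplies each pixel's homogeneous RGB vector by its 3×4 coefficient matrix; the tensor layout is an assumption made for the example.

```python
import torch

def apply_coefficients(image: torch.Tensor, coeffs: torch.Tensor) -> torch.Tensor:
    """Sketch: per-pixel affine color correction.
    image:  (B, 3, H, W) image to be processed, RGB values in [0, 1]
    coeffs: (B, 12, H, W) per-pixel 3x4 color adjustment coefficients
    Returns the color enhanced image of shape (B, 3, H, W)."""
    B, _, H, W = image.shape
    A = coeffs.view(B, 3, 4, H, W)                      # one 3x4 matrix per pixel
    ones = torch.ones(B, 1, H, W, device=image.device, dtype=image.dtype)
    rgb1 = torch.cat([image, ones], dim=1)              # homogeneous [R, G, B, 1]
    # [RT, GT, BT]^T = A . [R, G, B, 1]^T, evaluated for every pixel at once
    out = (A * rgb1.unsqueeze(1)).sum(dim=2)
    return out.clamp(0.0, 1.0)
```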
Based on the image processing method provided by the above embodiments, an embodiment of the present application further provides a training method for an image processing model, where the image processing model is used to execute the foregoing image processing method. The training method may include: training an initial model by using a reference image and a degraded image of the reference image to obtain the image processing model. During training of the initial model, the degraded image of the reference image is input into the initial model for image processing to obtain a predicted image, a loss function is constructed based on the reference image and the predicted image, and the initial model is trained into the image processing model according to the loss function.
The reference image is an image that conforms to the desired correction result; for example, it may be a higher-quality image or an image in a certain style. The degraded image of the reference image is an image whose color quality has been reduced by applying at least one of brightness adjustment, contrast adjustment and saturation adjustment to the reference image. The degraded image of the reference image is processed by the initial model to obtain a predicted image; when the predicted image approaches the reference image, the initial model has good color enhancement capability, can be used to perform image processing aimed at color enhancement, and can therefore serve as the image processing model.
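A minimal training-step sketch under these definitions is given below; the choice of an L1 reconstruction loss and the function names are assumptions made for illustration, since the embodiment only requires that a loss be constructed from the reference image and the predicted image:

```python
import torch
import torch.nn.functional as F

def train_step(model, optimizer, reference, degraded):
    """One training step: the degraded image is processed by the model and the
    prediction is pulled towards the reference image."""
    optimizer.zero_grad()
    predicted = model(degraded)             # image processing on the degraded input
    loss = F.l1_loss(predicted, reference)  # compare the prediction with the reference
    loss.backward()
    optimizer.step()
    return loss.item()
```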
The image processing model may include a semantic segmentation network model or a classification model for semantic segmentation, and may further include convolution layers for feature extraction, spatial feature transformation convolution layers for feature fusion, a Transformer encoder-decoder for the encoding and decoding operations, convolution layers for up-sampling, and the like.
The degraded image of the reference image can be produced by a color degradation operation. The color degradation operation can be indicated by degradation coefficients, which may include at least one of a brightness adjustment coefficient, a contrast adjustment coefficient and a saturation adjustment coefficient. When the color degradation operation includes at least one of a brightness adjustment operation, a contrast adjustment operation and a saturation adjustment operation, an operation order can be randomly generated for the color degradation operation, and the execution order of the multiple color degradation operations is determined according to that operation order. Current image processing models, trained simply with one or several fixed color degradation data sets, are prone to style shifts and color jumps in color adjustment.
The degradation coefficients can be generated randomly: the brightness adjustment coefficient, the contrast adjustment coefficient and the saturation adjustment coefficient in each set of degradation coefficients are each randomly generated. That is, multiple sets of degradation coefficients can be randomly generated for the reference image, and an operation order is randomly generated for the color degradation operation on the reference image. In other words, in the process of generating degraded images of the reference image, the execution order of the degradation operations is random and the adjustment coefficient used by each degradation operation is also random, so that after random degradation each reference image can in theory yield countless different low-quality degraded images. Training the model on such pairs of reference and degraded images enables it to handle different types of high-quality and low-quality color data and to generate high-quality color images with a relatively uniform style, thereby reducing jumps.
The brightness adjustment coefficient used by the brightness adjustment operation may be denoted b, the contrast adjustment coefficient used by the contrast adjustment operation may be denoted c, and the saturation adjustment coefficient used by the saturation adjustment operation may be denoted s. With the input feature of an adjustment operation denoted I_in, the output feature after the brightness adjustment operation is I_bright = b * I_in, the output feature after the contrast adjustment operation is I_contrast = c * (I_in - mean(I_in)) + mean(I_in), and the output feature after the saturation adjustment operation is I_saturation = s * (I_in - channel_mean(I_in)) + channel_mean(I_in). When these adjustment operations are performed in sequence, the output of the previous operation serves as the input of the next. Here mean(·) denotes the function that takes the global mean of the image, whose output is a single value, and channel_mean(·) denotes the function that averages the image over all channels pixel by pixel, whose output is a single-channel map, for example a single-channel color mean map computed for an RGB three-channel image.
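Under the above formulas, a random color degradation of one reference image can be sketched as follows; the sampling ranges of the coefficients and the final value clamping are illustrative assumptions rather than values fixed by the embodiment:

```python
import random
import torch

def random_degrade(image):
    """Applies brightness, contrast and saturation adjustments in a random
    order with randomly drawn coefficients.

    image: (3, H, W) tensor with values in [0, 1]
    """
    b = random.uniform(0.6, 1.4)  # brightness adjustment coefficient (assumed range)
    c = random.uniform(0.6, 1.4)  # contrast adjustment coefficient (assumed range)
    s = random.uniform(0.6, 1.4)  # saturation adjustment coefficient (assumed range)

    def brightness(x):
        return b * x

    def contrast(x):
        m = x.mean()                         # global mean of the image, a single value
        return c * (x - m) + m

    def saturation(x):
        m = x.mean(dim=0, keepdim=True)      # per-pixel mean over the channels
        return s * (x - m) + m

    ops = [brightness, contrast, saturation]
    random.shuffle(ops)                      # random execution order
    out = image
    for op in ops:                           # output of one operation feeds the next
        out = op(out)
    return out.clamp(0.0, 1.0)               # keep values displayable (added safeguard)
```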
Referring to fig. 7, which is a schematic diagram of a random degradation process of a reference image according to an embodiment of the present application, the reference image is the image on the left. The first set of degradation coefficients includes a contrast adjustment coefficient 0.791, a brightness adjustment coefficient 1.113 and a saturation adjustment coefficient 1.096, with the corresponding operation order being the contrast adjustment operation, the brightness adjustment operation and the saturation adjustment operation; the second set of degradation coefficients includes a saturation adjustment coefficient 1.139, a contrast adjustment coefficient 1.191 and a brightness adjustment coefficient 0.845, with the corresponding operation order being the saturation adjustment operation, the contrast adjustment operation and the brightness adjustment operation; the third set of degradation coefficients includes a contrast adjustment coefficient 0.513, a saturation adjustment coefficient 1.128 and a brightness adjustment coefficient 0.791, with the corresponding operation order being the contrast adjustment operation, the saturation adjustment operation and the brightness adjustment operation. In this way, three degraded images of the reference image, such as the images on the right of the figure, can be obtained from the three sets of degradation coefficients respectively.
In the embodiments of the present application, the image processing method can enhance the image quality of the image to be processed, realizing color adjustment, enhancement and migration to a designated style and improving the color appearance of the image to be processed; for example, darkness and color cast can be corrected. When the image processing method is executed by the image processing model and the degraded images of the reference image are generated by random degradation, the model achieves a stable color adjustment effect when applied to new and old pictures and videos of various scenes.
Referring to fig. 8, a color adjustment effect comparison chart provided by an embodiment of the present application, each image in the first row (8A-8G) is an image to be processed. Color adjustment of these various input images with the current CSRNet model yields the images in the second row (8H-8N), where images 1-7 of the second row (8H-8N) are obtained by processing images 1-7 of the first row (8A-8G) respectively; as can be seen from the figure, the adjustment suffers from large jumps. When the image processing method is executed by the image processing model trained with degraded images of the reference image generated by random degradation, the processed images are those in the third row (8O-8U), where images 1-7 of the third row (8O-8U) are obtained by processing images 1-7 of the first row (8A-8G) respectively; as can be seen from the figure, the stability of the adjustment is significantly improved.
Referring to fig. 9, another color adjustment effect comparison chart provided by an embodiment of the present application: for the wider variety of input images to be processed shown in fig. 9A and fig. 9B, the images obtained after color adjustment with the current HDR model are shown in fig. 9C and fig. 9D and exhibit excessive color enhancement; when degraded images produced by random color degradation are introduced into the HDR model as training data, the images obtained with that model are shown in fig. 9E and fig. 9F, so the generalization of the model can be significantly improved.
To further illustrate the effectiveness of the image processing method provided by the embodiments of the present application, the proposed scheme is compared with other state-of-the-art color enhancement methods (Exposure, Distort-and-Recover, DUPE, CSRNet, HDRNet) on a public data set (the MIT-Adobe FiveK dataset). Three indices are used to compare the image processing results on the public dataset: PSNR, the peak signal-to-noise ratio, where a higher value indicates the adjusted result is closer to the ground truth; SSIM, which measures the structural similarity between the ground truth and the color adjustment result, where higher is better; and LPIPS, the learned perceptual image patch similarity, which measures the difference between the color adjustment result and the ground truth, where a lower value indicates greater similarity.
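Of the three indices, PSNR is simple enough to sketch directly (SSIM and LPIPS are typically taken from existing implementations); a minimal version assuming images normalized to [0, 1] is:

```python
import torch

def psnr(prediction, target, max_value=1.0):
    """Peak signal-to-noise ratio between a color-adjusted result and the
    ground-truth image; a higher value means the result is closer to the truth."""
    mse = torch.mean((prediction - target) ** 2)
    return 10.0 * torch.log10(max_value ** 2 / mse)
```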
Referring to fig. 10, which is a schematic diagram of image processing effects provided by an embodiment of the present application, the image to be processed may be a degraded image A obtained by degrading a high-quality image B with a fixed color degradation, and the degraded image A serves as the original test image (Original Degenerate Dataset). To demonstrate that the proposed scheme can perform color adjustment with a stable style when encountering different types of images to be processed, the test images can be expanded: random color degradation is performed on the degraded image A to obtain 5 groups of different new degraded images A-, which are used as variable test images (Variable Degenerate Dataset); random color degradation is performed on the high-quality image B to obtain 5 different degraded images B-, which are used as random test images (GT Degenerate Dataset). The high-quality image B is taken as the comparison image for the image processing results of the test images, so test results corresponding to different models and different test images can be obtained. The results obtained on the different test images can be averaged to obtain the average value (over) of each model across the various test images, and the running time (Running Time) on the graphics processor (GPU) and the central processing unit (CPU) represents the computational cost of the image processing process.
First, the other color enhancement methods (Exposure, Distort-and-Recover, DUPE, CSRNet, HDRNet) can each obtain corresponding PSNR, SSIM and LPIPS results on the three types of test data; as can be seen from the figure, the CSRNet network and the HDR network obtain better image processing results with the original training data.
Second, to verify the effect of random color degradation on stability, the current HDR network can be retrained with the random color degradation operation added, and the resulting test result is recorded as HDR-Data buffer. From the indices, training the HDR network with random data degradation significantly improves the robustness of the model to data disturbances.
Next, to verify the effectiveness of the semantic-fusion-based color adjustment network (SCM) in fixed-style color adjustment, the HDR network is replaced by the semantic-fusion-based Transformer structure, denoted the SCM Transformer structure. It can be seen that the SCM Transformer structure significantly improves the color adjustment effect on the various degradation test data. For fairness, the parameter count of the original HDR network is increased to match that of the proposed SCM Transformer structure, yielding Fat HDR; the SCM Transformer still shows a clear improvement on the comparison indices.
To further verify the respective influence of the SCM semantic information and the Transformer global information exchange structure within the SCM Transformer structure on color adjustment, experiments were carried out replacing the SCM and the Transformer separately. Our Transformer is the result of replacing only the Transformer part; it performs better than Fat HDR with the same number of parameters, at the cost of an extra 0.008 s of GPU computation time (all timings are measured on 1080×1920 video frames). Compared with the direct, simple fusion of local and global information in the HDR network, the non-local Transformer structure better fuses the interaction of different regions for color adjustment. Our SCM is the result of replacing only the SCM; the introduction of semantic information does not always improve the model (compared with Fat HDR), because the degraded image A in the Original data pair is mostly a very dark scene, and existing semantic segmentation algorithms easily produce wrong segmentation features that slightly reduce the effect (on Original and Variable Degenerate Dataset). Once the segmentation improves, fusing the scene semantic features brings gains on GT Degenerate Dataset, which better matches real application scenarios; this means that the accuracy of semantic segmentation has a great influence on the effect of the SCM Transformer structure.
Overall, Our SCM Transformer is the result of replacing both the SCM and the Transformer; by better fusing semantic information through the Transformer structure it further improves the color enhancement effect and is more robust in each disturbance scenario, while its running speed (including 0.026 s of semantic segmentation time) is comparable to that of CSRNet. This is attributed to the color conversion coefficients being learned at a small resolution for acceleration and then applied to the original-resolution image through the bilateral grid.
Referring to fig. 11, which is a color enhancement effect comparison provided by an embodiment of the present application: fig. 11A is the image to be processed; fig. 11B is obtained with the CSRNet model, and it can be seen that CSRNet easily over-fits a fixed data pair, which makes the effect unstable and prone to color shift; fig. 11C is obtained with the Fat HDR model, and the disappearance of the snowfield details in the figure shows that Fat HDR, which handles global features with full connections, easily over-exposes details; fig. 11F shows the result of Our Transformer, where the Transformer structure correlates sub-block features and handles details better; fig. 11E is the result of Our SCM, where with semantic information the enhanced colors are more realistic (associated with the specified color style); fig. 11D shows the result of Our SCM Transformer, which combines the advantages of both.
Based on the image processing method provided by the foregoing embodiment, the embodiment of the present application further provides an image processing apparatus, referring to fig. 12, fig. 12 is a block diagram of an image processing apparatus provided by the embodiment of the present application, where the image processing apparatus 1200 includes:
the semantic segmentation unit 1201 is configured to perform semantic segmentation on the image to be processed to obtain semantic probability distribution information;
a feature extraction unit 1202, configured to perform feature extraction on the semantic probability distribution information to obtain semantic features;
a first feature fusion unit 1203, configured to perform feature fusion on the image feature of the image to be processed and the semantic feature to obtain a fused semantic feature;
a branching unit 1204, configured to obtain global features and local features according to the fused semantic features;
a second feature fusion unit 1205, configured to perform feature fusion on the local feature and the semantic feature to obtain a local fusion feature;
a third feature fusion unit 1206, configured to fuse the global feature and the local fusion feature to obtain a target fusion semantic feature;
a coefficient determining unit 1207, configured to obtain a color adjustment coefficient for the image to be processed according to the target fusion semantic feature;
The image processing unit 1208 is configured to process the image to be processed according to the color adjustment coefficient, so as to obtain a color enhanced image.
Optionally, the feature extraction unit 1202 is specifically configured to:
sequentially extracting features of the semantic probability distribution information through n first convolution layers to obtain n layers of semantic features respectively corresponding to the n first convolution layers;
the first feature fusion unit 1203 includes:
the convolution unit is used for convolving the i-1 th layer fusion semantic features by using an i-th layer second convolution layer to obtain i-th layer image features, wherein the total layer number of the second convolution layer is n, i is a positive integer less than or equal to n, and the image to be processed is used as 0-th layer fusion semantic features;
the feature fusion subunit is used for carrying out feature fusion on the ith layer image features and the ith layer semantic features through the ith layer spatial feature transformation convolution layer to obtain the ith layer fusion semantic features until the nth layer fusion semantic features are obtained, and the total layer number of the spatial feature transformation convolution layer is n.
Optionally, the feature fusion subunit includes:
the feature conversion unit is used for converting the ith layer semantic features into a first transformation parameter and a second transformation parameter through the convolution operation of the ith layer spatial feature transformation convolution layer;
and the feature transformation unit is used for performing, through the ith layer spatial feature transformation convolution layer, feature transformation on the ith layer image features according to the first transformation parameter and the second transformation parameter; a sketch of this transformation is given below.
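Assuming the two transformation parameters act as a per-position scale and shift on the image feature, in the style of spatial feature transform layers, the feature conversion and feature transformation units could be sketched together as follows (the class name, kernel size and channel layout are illustrative assumptions, not the exact layers of this embodiment):

```python
import torch
import torch.nn as nn

class SFTLayer(nn.Module):
    """Converts a semantic feature into two transformation parameters and
    applies them to an image feature of the same spatial size."""

    def __init__(self, semantic_channels, feature_channels):
        super().__init__()
        # Convolutions that turn the semantic feature into the two parameters.
        self.to_scale = nn.Conv2d(semantic_channels, feature_channels, kernel_size=3, padding=1)
        self.to_shift = nn.Conv2d(semantic_channels, feature_channels, kernel_size=3, padding=1)

    def forward(self, image_feature, semantic_feature):
        scale = self.to_scale(semantic_feature)   # first transformation parameter
        shift = self.to_shift(semantic_feature)   # second transformation parameter
        # Per-position affine modulation of the image feature.
        return image_feature * scale + shift
```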
Optionally, the third feature fusion unit 1206 includes:
the encoding and decoding operation unit is used for encoding and decoding the global feature and the position coding information of the image to be processed so as to update the global feature to obtain a global fusion feature;
and the feature fusion subunit is used for fusing the global fusion feature and the local fusion feature to obtain a target fusion semantic feature.
Optionally, the codec operation unit is specifically configured to:
encoding and decoding the global features and the position coding information of the image to be processed to determine updating features corresponding to all area blocks in the global features, wherein all the updating features form global fusion features; in the determining process of the update feature corresponding to the jth region block in the features of each region block, determining the correlation weight of the other region block to the jth region block according to the position coding information of the jth region block and the position coding information of other region blocks, as well as the features of the other region blocks and the features of the jth region block, and determining the update feature corresponding to the jth region block according to the correlation weight, the features of the other region blocks and the features of the jth region block, wherein the jth region block is any region block in the global feature.
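One way to realize such an update, assuming a standard single-head scaled dot-product attention over the region-block features with their position encodings added, is sketched below; the exact attention form used by the embodiment may differ, and the class and parameter names are assumptions for illustration:

```python
import math
import torch
import torch.nn as nn

class RegionBlockAttention(nn.Module):
    """Updates each region-block feature from all other blocks, with the
    correlation weights computed from the block features and their position
    encodings (single-head scaled dot-product attention sketch)."""

    def __init__(self, dim):
        super().__init__()
        self.q = nn.Linear(dim, dim)
        self.k = nn.Linear(dim, dim)
        self.v = nn.Linear(dim, dim)

    def forward(self, block_features, position_encodings):
        # block_features, position_encodings: (B, N, dim), N = number of region blocks
        x = block_features + position_encodings   # inject position information
        q, k, v = self.q(x), self.k(x), self.v(block_features)
        # Correlation weight of every block towards every other block.
        attn = torch.softmax(q @ k.transpose(-2, -1) / math.sqrt(q.size(-1)), dim=-1)
        # Updated feature of each block: weighted combination of the block features.
        return attn @ v
```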
Optionally, the coefficient determining unit 1207 includes:
the first coefficient determining unit is used for obtaining a color adjustment coefficient of a first size according to the target fusion semantic features;
a reconstruction unit, configured to reconstruct the color adjustment coefficient of the first size to obtain a bilateral grid coefficient;
and the upsampling unit is used for upsampling the bilateral grid coefficients in a slicing mode to obtain color adjustment coefficients corresponding to each pixel point of the image to be processed.
Optionally, the upsampling unit includes:
the convolution unit is used for carrying out convolution operation on the image to be processed to obtain a bilateral grid guide graph;
and the upsampling subunit is used for upsampling the bilateral grid coefficients in a slicing way based on the bilateral grid guide map to obtain color adjustment coefficients corresponding to each pixel point of the image to be processed.
According to the technical scheme, semantic segmentation can be performed on the image to be processed to obtain semantic probability distribution information, the semantic probability distribution information is used for indicating semantic distribution probability of each pixel point of the image to be processed, semantic feature is obtained by feature extraction of the semantic probability distribution information, fusion semantic feature is obtained by feature fusion of the image features and the semantic features of the image to be processed, and the fusion semantic features take the image features and the semantic features of the image to be processed into consideration, so that subsequent color adjustment coefficients can be constrained through the semantic features, and more semantic related details can be reserved for image adjustment. According to the method, global features and local features are obtained according to the fusion semantic features, the local features and the semantic features are subjected to feature fusion to obtain local fusion features, the local fusion features can further reflect local semantic related details, then the global features and the local fusion features are fused to obtain target fusion semantic features, so that the target fusion semantic features can contain semantic features and reflect more local semantic related details, after a color adjustment coefficient of an image to be processed is obtained according to the target fusion semantic features, the image to be processed can be processed according to the color adjustment coefficient to obtain a color enhanced image, and as the color adjustment coefficient is influenced by the semantic features, adjustment of the image to be processed is not based on color rule statistics but considered semantic features, thereby being beneficial to stabilizing color adjustment style, preserving the local details, reducing color jump and improving color enhancement capability.
Based on the training method of the image processing model provided by the embodiment, the embodiment of the application also provides a training device of the image processing model, referring to fig. 13, fig. 13 is a structural block diagram of the training device of the image processing model provided by the embodiment of the application, where the image processing model is used to execute the image processing method, and the training device of the image processing model may include:
training unit 1301 is configured to train an initial model to obtain the image processing model by using a reference image and a degraded image of the reference image, in the training process, input the degraded image of the reference image into the initial model to perform image processing to obtain a predicted image, construct a loss function based on the reference image and the predicted image, and train the initial model to the image processing model according to the loss function.
Optionally, the apparatus further includes:
a coefficient generation unit for randomly generating a plurality of groups of degradation coefficients for the reference image, the degradation coefficients including a brightness adjustment coefficient, a contrast adjustment coefficient, and a saturation adjustment coefficient;
and the color degradation operation execution unit is used for carrying out color degradation operation on the reference image according to the multiple groups of degradation coefficients to obtain multiple groups of degraded images of the reference image, wherein the color degradation operation comprises brightness adjustment operation, contrast adjustment operation and saturation adjustment operation.
Optionally, the apparatus further includes:
an operation order generation unit configured to randomly generate an operation order for the color degradation operation, the operation order being used to indicate an execution order of the brightness adjustment operation, the contrast adjustment operation, and the saturation adjustment operation.
The embodiment of the application also provides a computer device, which is the computer device introduced above, and can comprise a terminal device or a server, wherein the image processing device and the training device of the image processing model can be configured in the computer device. The computer device is described below with reference to the accompanying drawings.
If the computer device is a terminal device, please refer to fig. 14, an embodiment of the present application provides a terminal device, which is exemplified by a mobile phone:
fig. 14 is a block diagram showing a part of the structure of a mobile phone related to a terminal device provided by an embodiment of the present application. Referring to fig. 14, the mobile phone includes: radio Frequency (RF) circuitry 1410, memory 1420, input unit 1430, display unit 1440, sensor 1450, audio circuitry 1460, wireless fidelity (WiFi) module 1470, processor 1480, and power supply 1490. It will be appreciated by those skilled in the art that the handset structure shown in fig. 14 does not constitute a limitation of the handset, which may include more or fewer components than shown, combine certain components, or arrange the components differently.
The following describes the components of the mobile phone in detail with reference to fig. 14:
the RF circuit 1410 may be used for receiving and transmitting signals during a message or a call; in particular, after downlink information of a base station is received, it is handed to the processor 1480 for processing, and uplink data is sent to the base station.
The memory 1420 may be used to store software programs and modules, and the processor 1480 performs various functional applications and data processing of the cellular phone by executing the software programs and modules stored in the memory 1420. The memory 1420 may mainly include a storage program area that may store an operating system, application programs required for at least one function (such as a sound playing function, an image playing function, etc.), and a storage data area; the storage data area may store data (such as audio data, phonebook, etc.) created according to the use of the handset, etc. In addition, memory 1420 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other volatile solid-state storage device.
The input unit 1430 may be used to receive input numeric or character information and generate key signal inputs related to user settings and function control of the handset. In particular, the input unit 1430 may include a touch panel 1431 and other input devices 1432.
The display unit 1440 may be used to display information input by a user or information provided to the user and various menus of the mobile phone. The display unit 1440 may include a display panel 1441.
The handset can also include at least one sensor 1450, such as a light sensor, motion sensor, and other sensors.
Audio circuitry 1460, speaker 1461, microphone 1462 may provide an audio interface between the user and the handset.
WiFi belongs to a short-distance wireless transmission technology, and a mobile phone can help a user to send and receive emails, browse webpages, access streaming media and the like through a WiFi module 1470, so that wireless broadband Internet access is provided for the user.
The processor 1480 is a control center of the handset, connects various parts of the entire handset using various interfaces and lines, performs various functions of the handset and processes data by running or executing software programs and/or modules stored in the memory 1420, and invoking data stored in the memory 1420.
The handset also includes a power supply 1490 (e.g., a battery) that provides power to the various components.
In this embodiment, the processor 1480 included in the terminal apparatus also has the following functions:
carrying out semantic segmentation on the image to be processed to obtain semantic probability distribution information;
Extracting features of the semantic probability distribution information to obtain semantic features;
carrying out feature fusion on the image features of the image to be processed and the semantic features to obtain fusion semantic features;
obtaining global features and local features according to the fusion semantic features;
carrying out feature fusion on the local features and the semantic features to obtain local fusion features;
fusing the global features and the local fusion features to obtain target fusion semantic features;
obtaining a color adjustment coefficient aiming at the image to be processed according to the target fusion semantic features;
and processing the image to be processed according to the color adjustment coefficient to obtain a color enhanced image.
or, alternatively:
training an initial model by using a reference image and a degradation image of the reference image to obtain an image processing model, inputting the degradation image of the reference image into the initial model to perform image processing in the training process to obtain a predicted image, constructing a loss function based on the reference image and the predicted image, training the initial model into the image processing model according to the loss function, and performing the image processing method by using the image processing model.
If the computer device is a server, as shown in fig. 15, fig. 15 is a block diagram of a server 1500 according to an embodiment of the present application, where the server 1500 may have a relatively large difference due to different configurations or performances, and may include one or more processors 1522, such as a central processing unit (Central Processing Units, abbreviated as CPU), a memory 1532, one or more storage media 1530 (such as one or more mass storage devices) storing application programs 1542 or data 1544. Wherein the memory 1532 and the storage medium 1530 may be transitory or persistent storage. The program stored on the storage medium 1530 may include one or more modules (not shown), each of which may include a series of instruction operations on the server. Still further, a processor 1522 may be provided in communication with the storage medium 1530, executing a series of instruction operations on the server 1500 in the storage medium 1530.
The server 1500 may also include one or more power supplies 1526, one or more wired or wireless network interfaces 1550, one or more input/output interfaces 1558, and/or one or more operating systems 1541, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, etc.
The steps performed by the server in the above embodiments may be based on the server structure shown in fig. 15.
In addition, the embodiment of the application also provides a storage medium for storing a computer program for executing the method provided by the embodiment.
The embodiments of the present application also provide a computer program product comprising instructions which, when run on a computer, cause the computer to perform the method provided by the above embodiments.
Those of ordinary skill in the art will appreciate that: all or part of the steps for implementing the above method embodiments may be implemented by hardware related to program instructions, where the above program may be stored in a computer readable storage medium, and when the program is executed, the program performs steps including the above method embodiments; and the aforementioned storage medium may be at least one of the following media: read-only Memory (ROM), RAM, magnetic disk or optical disk, etc.
It should be noted that, in the present specification, each embodiment is described in a progressive manner, and identical and similar parts of each embodiment are all referred to each other, and each embodiment is mainly described in a different point from other embodiments. In particular, for the apparatus and system embodiments, since they are substantially similar to the method embodiments, the description is relatively simple, with reference to the description of the method embodiments in part. The apparatus and system embodiments described above are merely illustrative, in which elements illustrated as separate elements may or may not be physically separate, and elements shown as elements may or may not be physical elements, may be located in one place, or may be distributed over a plurality of network elements. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art will understand and implement the present application without undue burden.
The foregoing is only one specific embodiment of the present application, but the scope of the present application is not limited thereto, and any changes or substitutions easily contemplated by those skilled in the art within the technical scope of the present application should be included in the scope of the present application. Further combinations of the present application may be made to provide further implementations based on the implementations provided in the above aspects. Therefore, the protection scope of the present application should be subject to the protection scope of the claims.

Claims (15)

1. An image processing method, the method comprising:
carrying out semantic segmentation on the image to be processed to obtain semantic probability distribution information;
extracting features of the semantic probability distribution information to obtain semantic features;
carrying out feature fusion on the image features of the image to be processed and the semantic features to obtain fusion semantic features;
obtaining global features and local features according to the fusion semantic features;
carrying out feature fusion on the local features and the semantic features to obtain local fusion features;
fusing the global features and the local fusion features to obtain target fusion semantic features;
Obtaining a color adjustment coefficient aiming at the image to be processed according to the target fusion semantic features;
and processing the image to be processed according to the color adjustment coefficient to obtain a color enhanced image.
2. The method according to claim 1, wherein the feature extraction of the semantic probability distribution information to obtain semantic features includes:
sequentially extracting features of the semantic probability distribution information through n first convolution layers to obtain n layers of semantic features respectively corresponding to the n first convolution layers;
the step of carrying out feature fusion on the image features of the image to be processed and the semantic features to obtain fusion semantic features comprises the following steps:
convolving the i-1 th layer fusion semantic features by using an i-th layer second convolution layer to obtain i-th layer image features, wherein the total layer number of the second convolution layer is n, i is a positive integer less than or equal to n, and the image to be processed is used as 0-th layer fusion semantic features;
and carrying out feature fusion on the ith image features and the ith semantic features through an ith spatial feature transformation convolution layer to obtain the ith fusion semantic features until the nth fusion semantic features are obtained, wherein the total layer number of the spatial feature transformation convolution layer is n.
3. The method according to claim 2, wherein the feature fusion of the ith image feature and the ith semantic feature by the ith spatial feature transformation convolution layer to obtain an ith fused semantic feature comprises:
converting the ith layer semantic features into first transformation parameters and second transformation parameters through convolution operation of an ith layer spatial feature transformation convolution layer;
and performing feature transformation on the ith layer image features according to the first transformation parameters and the second transformation parameters through an ith layer spatial feature transformation convolution layer.
4. A method according to any one of claims 1-3, wherein the fusing the global feature and the local fusion feature to obtain a target fusion semantic feature comprises:
performing encoding and decoding operations on the global features and the position coding information of the image to be processed so as to update the global features to obtain global fusion features;
and fusing the global fusion feature and the local fusion feature to obtain a target fusion semantic feature.
5. The method of claim 4, wherein encoding and decoding the global feature and the position coding information of the image to be processed to fuse different regional block features in the global feature to obtain a global fusion feature, comprises:
Encoding and decoding the global features and the position coding information of the image to be processed to determine updating features corresponding to all area blocks in the global features, wherein all the updating features form global fusion features; in the determining process of the update feature corresponding to the jth region block in the features of each region block, determining the correlation weight of the other region block to the jth region block according to the position coding information of the jth region block and the position coding information of other region blocks, as well as the features of the other region blocks and the features of the jth region block, and determining the update feature corresponding to the jth region block according to the correlation weight, the features of the other region blocks and the features of the jth region block, wherein the jth region block is any region block in the global feature.
6. A method according to any one of claims 1-3, wherein said deriving color adjustment coefficients for the image to be processed from the target fusion semantic features comprises:
obtaining a color adjustment coefficient of a first size according to the target fusion semantic features;
reconstructing the color adjustment coefficient of the first size to obtain a bilateral grid coefficient;
And up-sampling the bilateral grid coefficient in a slicing mode to obtain color adjustment coefficients corresponding to all pixel points of the image to be processed.
7. The method of claim 6, wherein the upsampling the bilateral mesh coefficients by slicing to obtain color adjustment coefficients corresponding to each pixel of the image to be processed comprises:
performing convolution operation on the image to be processed to obtain a bilateral grid guide map;
and based on the bilateral grid guide graph, upsampling the bilateral grid coefficient in a slicing mode to obtain color adjustment coefficients corresponding to all pixel points of the image to be processed.
8. A method of training an image processing model for performing the image processing method of any one of claims 1-7, the method comprising:
and training an initial model by utilizing a reference image and a degradation image of the reference image to obtain the image processing model, inputting the degradation image of the reference image into the initial model to perform image processing in the training process to obtain a predicted image, constructing a loss function based on the reference image and the predicted image, and training the initial model into the image processing model according to the loss function.
9. The method of claim 8, wherein the method further comprises:
randomly generating a plurality of groups of degradation coefficients for the reference image, wherein the degradation coefficients comprise a brightness adjustment coefficient, a contrast adjustment coefficient and a saturation adjustment coefficient;
and performing color degradation operation on the reference image according to the multiple groups of degradation coefficients to obtain multiple groups of degraded images of the reference image, wherein the color degradation operation comprises brightness adjustment operation, contrast adjustment operation and saturation adjustment operation.
10. The method according to claim 9, wherein the method further comprises:
an operation order for indicating an execution order of the brightness adjustment operation, the contrast adjustment operation, and the saturation adjustment operation is randomly generated for the color degradation operation.
11. An image processing apparatus, characterized in that the apparatus comprises:
the semantic segmentation unit is used for carrying out semantic segmentation on the image to be processed to obtain semantic probability distribution information;
the feature extraction unit is used for extracting features of the semantic probability distribution information to obtain semantic features;
the first feature fusion unit is used for carrying out feature fusion on the image features of the image to be processed and the semantic features to obtain fusion semantic features;
The branch unit is used for obtaining global features and local features according to the fusion semantic features;
the second feature fusion unit is used for carrying out feature fusion on the local features and the semantic features to obtain local fusion features;
the third feature fusion unit is used for fusing the global features and the local fusion features to obtain target fusion semantic features;
the coefficient determining unit is used for obtaining a color adjustment coefficient aiming at the image to be processed according to the target fusion semantic features;
and the image processing unit is used for processing the image to be processed according to the color adjustment coefficient to obtain a color enhanced image.
12. A training apparatus of an image processing model for performing the image processing method of any one of claims 1 to 7, the apparatus comprising:
the training unit is used for training an initial model by utilizing a reference image and a degradation image of the reference image to obtain the image processing model, inputting the degradation image of the reference image into the initial model to perform image processing to obtain a predicted image in the training process, constructing a loss function based on the reference image and the predicted image, and training the initial model into the image processing model according to the loss function.
13. A computer device, the computer device comprising a processor and a memory:
the memory is used for storing a computer program and transmitting the computer program to the processor;
the processor is configured to perform the image processing method according to any one of claims 1-7 or the training method of the image processing model according to any one of claims 8-10 according to instructions in the computer program.
14. A computer-readable storage medium storing a computer program for executing the image processing method according to any one of claims 1 to 7 or the training method of the image processing model according to any one of claims 8 to 10.
15. A computer program product comprising a computer program, characterized in that it, when run on a computer device, causes the computer device to perform the image processing method of any one of claims 1-7 or the training method of the image processing model of any one of claims 8-10.