CN116452466A - Image processing method, device, equipment and computer readable storage medium - Google Patents

Image processing method, device, equipment and computer readable storage medium

Info

Publication number
CN116452466A
Authority
CN
China
Prior art keywords
image
feature
dictionary
processing
features
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202310703061.3A
Other languages
Chinese (zh)
Other versions
CN116452466B (en)
Inventor
夏致冰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Honor Device Co Ltd
Original Assignee
Honor Device Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Honor Device Co Ltd filed Critical Honor Device Co Ltd
Priority to CN202310703061.3A priority Critical patent/CN116452466B/en
Publication of CN116452466A publication Critical patent/CN116452466A/en
Application granted granted Critical
Publication of CN116452466B publication Critical patent/CN116452466B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 5/00 - Image enhancement or restoration
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 - Arrangements for image or video recognition or understanding
    • G06V 10/40 - Extraction of image or video features
    • G06V 10/44 - Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 - Arrangements for image or video recognition or understanding
    • G06V 10/70 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/77 - Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V 10/80 - Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V 10/806 - Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T 2207/30 - Subject of image; Context of image processing
    • G06T 2207/30196 - Human being; Person
    • G06T 2207/30201 - Face

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Databases & Information Systems (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Image Processing (AREA)

Abstract

The application provides an image processing method, apparatus, device and computer readable storage medium, which can be used for image restoration processing. The method comprises the following steps: performing image preprocessing on an image to be processed to obtain a first image; invoking a target image processing model to extract the image features of each of N image regions of the first image, and invoking the target image processing model to perform first fusion processing on the image features of each image region and the dictionary features corresponding to that image region to obtain the reconstruction features corresponding to each image region; and invoking the target image processing model to perform second fusion processing on the image features of each image region and the reconstruction features corresponding to that image region to obtain a target reconstructed image corresponding to the image to be processed, where the target reconstructed image comprises the reconstructed images corresponding to the respective image regions. By implementing the embodiments of the application, different image regions can be reconstructed based on their corresponding dictionary features, thereby improving the image restoration effect.

Description

Image processing method, device, equipment and computer readable storage medium
Technical Field
The present disclosure relates to the field of artificial intelligence, and in particular, to the field of computer vision, and more particularly, to an image processing method, apparatus, device, and computer readable storage medium.
Background
With the development of electronic equipment and computer technology, users' quality requirements for images and videos have gradually increased, so image restoration has gradually become one of the most important image processing technologies in the field of computer vision and is widely applied to tasks such as face recovery and super-resolution processing. Image restoration is a technique that restores a degraded image to its original appearance by using prior knowledge of the degradation process. A degraded image, affected by factors such as an imperfect imaging system, device limitations and transmission loss, lacks detail and is of lower quality. The restored image contains more detail, so the definition of the image is improved.
Currently, in order to improve the definition of an image, a technician can filter the image to eliminate noise and thereby improve the image quality. However, this processing method may generate pseudo-textures, and the image restoration effect is poor.
Therefore, how to improve the image restoration effect is a technical problem to be solved.
Disclosure of Invention
Embodiments of the present application provide an image processing method, apparatus, device, and computer readable storage medium, which can reduce pseudo textures in a degraded image, thereby being beneficial to improving the image restoration effect.
In a first aspect, embodiments of the present application provide an image processing method, which may be performed by an electronic device, or by a module in an electronic device, such as a chip or a processor. The method may include: performing image preprocessing on an image to be processed to obtain a first image, where the first image comprises N image regions and N is an integer greater than 1; invoking a target image processing model to extract the image features of each of the N image regions, and invoking the target image processing model to perform first fusion processing on the image features of each image region and the dictionary features corresponding to that image region to obtain the reconstruction features corresponding to each image region; and invoking the target image processing model to perform second fusion processing on the image features of each image region and the reconstruction features corresponding to that image region to obtain a target reconstructed image corresponding to the image to be processed, where the target reconstructed image comprises the reconstructed images corresponding to the respective image regions.
According to the method provided by the first aspect, when processing the image to be processed, the target image processing model reconstructs different image regions based on the dictionary features corresponding to those regions. On one hand, because each image region is processed separately, the flexibility and differentiation of the image restoration processing are improved, which is especially useful for scenes in which different image regions of the image to be processed require different degrees of restoration, so user requirements can be met. On the other hand, because the dictionary features corresponding to each image region are fused, the dictionary features are matched more accurately, pseudo-textures that should not appear during restoration are reduced to a certain extent, and the definition of the image to be processed is improved, thereby improving the image restoration effect.
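For ease of understanding, the following is a minimal Python sketch of the flow described in the first aspect. The method names and interfaces (extract_features, first_fusion, second_fusion) are illustrative assumptions and do not appear in the embodiments of the present application.

```python
def restore_regions(regions, model, dictionary_feature_set):
    """regions: dict mapping region name -> crop of the first image (after preprocessing).
    dictionary_feature_set: dict mapping region name -> dictionary features of that region.
    Returns per-region reconstructed images; names and interfaces are assumptions."""
    reconstructed = {}
    for name, region in regions.items():
        # Extract the image features of image region n (image feature k).
        feature_k = model.extract_features(region)
        # First fusion: image features of the region with its corresponding dictionary features.
        reconstruction_feature = model.first_fusion(feature_k, dictionary_feature_set[name])
        # Second fusion: image features with the reconstruction features -> reconstructed image.
        reconstructed[name] = model.second_fusion(feature_k, reconstruction_feature)
    # The target reconstructed image is composed of these per-region reconstructed images.
    return reconstructed
```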
With reference to the first aspect, in one implementation, the target image processing model includes a cross-attention module; the dictionary feature corresponding to image region n comprises a plurality of reference dictionary features; the cross-attention module is configured to determine the similarity between the image feature k of image region n and each of the plurality of reference dictionary features corresponding to image region n to obtain a plurality of similarities, and to perform first fusion processing on the plurality of reference dictionary features corresponding to image region n according to the plurality of similarities to obtain the reconstruction feature corresponding to image region n; image region n is any one of the N image regions, and image feature k is the extracted image feature of image region n. In this embodiment, image restoration is performed on different image regions based on the reference dictionary features included in the dictionary features corresponding to those regions, which makes the restoration more flexible and is particularly applicable to scenes in which different regions of the image to be processed require different degrees of restoration. Moreover, because the fusion is performed over all reference dictionary features rather than a single one, the fusion results are more diverse, the reference dictionary features are fully utilized, and a small number of reference dictionary features can represent many different features, so the dictionary features are more flexible and widely applicable.
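As an illustration of how such a cross-attention module may compute the first fusion, the sketch below uses a dot-product similarity followed by a softmax; the exact similarity measure and normalization are assumptions, since the embodiment only requires that the reference dictionary features be weighted by their similarity to image feature k.

```python
import torch
import torch.nn.functional as F

def cross_attention_fusion(image_feature, reference_dict_feats):
    """image_feature: tensor of shape (C,), the image feature k of region n.
    reference_dict_feats: tensor of shape (K, C), reference dictionary features of region n.
    Returns the reconstruction feature of region n (shape (C,))."""
    # Similarity between the image feature and each reference dictionary feature
    # (dot product + softmax is an assumption; the embodiment only requires a similarity).
    sims = reference_dict_feats @ image_feature              # (K,)
    weights = F.softmax(sims, dim=0)                         # (K,)
    # First fusion: similarity-weighted combination of the reference dictionary features.
    reconstruction_feature = weights @ reference_dict_feats  # (C,)
    return reconstruction_feature
```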
With reference to the first aspect, in one implementation manner, the target image processing model includes a feature fusion module; the feature fusion module is configured to acquire a super-resolution coefficient corresponding to image region n, and to perform second fusion processing on the image feature k of image region n and the reconstruction feature corresponding to image region n based on the super-resolution coefficient, to obtain the reconstructed image corresponding to image region n, where the super-resolution coefficient corresponding to image region n indicates the degree of reconstruction of image region n; image region n is any one of the N image regions, and image feature k is the extracted image feature of image region n. In this embodiment, different image regions are restored based on different super-resolution coefficients, which suits scenes where different image regions of the image to be processed require different degrees of restoration; for example, an image region that is more blurred requires a higher degree of restoration, while a less blurred region requires a lower degree. Different degrees of restoration can therefore be applied to different image regions according to user requirements, which improves the effect and applicability of the image restoration processing and meets user needs.
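One simple way to let a per-region coefficient control the degree of reconstruction is a linear blend between the original image feature and the reconstruction feature, for example as below; the blend formula is an illustrative assumption and is not specified by the embodiment.

```python
def second_fusion(image_feature, reconstruction_feature, sr_coefficient):
    """sr_coefficient in [0, 1]: 0 keeps the original feature, 1 uses only the reconstruction.
    The linear blend is an illustrative choice, not mandated by the embodiment."""
    fused = (1.0 - sr_coefficient) * image_feature + sr_coefficient * reconstruction_feature
    return fused  # decoded downstream into the reconstructed image of region n
```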
With reference to the first aspect, in an embodiment, the method further includes: performing image preprocessing on an initial training sample set to obtain a training sample set, where the training sample set comprises M training image regions, training image region m1 among the M training image regions has the same feature category as image region n1 among the N image regions, and M is an integer greater than 1; training an initial reconstruction model based on the training sample set and an initial dictionary feature set to obtain a first loss parameter; adjusting, according to the first loss parameter, the model parameters of the initial reconstruction model and the initial dictionary feature set to obtain an adjusted dictionary feature set and a reference reconstruction model, where the adjusted dictionary feature set comprises the dictionary feature corresponding to each of the M training image regions; training the reference reconstruction model based on the training sample set to obtain a second loss parameter; and, when the second loss parameter satisfies a training end condition, determining that the dictionary feature corresponding to image region n1 is the dictionary feature corresponding to training image region m1. In this embodiment, by continuously adjusting the feature vectors in the initial dictionary feature set during iterative training, the effectiveness of the dictionary features can be gradually improved; and because each image region corresponds to its own dictionary feature, the dictionary features can be matched more accurately in subsequent image restoration processing, which reduces the difficulty of dictionary matching and improves the restoration effect to a certain extent.
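The two-stage training described above can be pictured roughly as follows; the optimizer, loss function and end condition are assumptions for illustration, and `dictionary` is assumed to be a learnable tensor holding the initial dictionary feature set.

```python
import torch

def train_dictionary(model, dictionary, train_loader, criterion, stage1_epochs=10, lr=1e-4):
    """Rough two-stage sketch (optimizer, loss and end condition are assumptions).
    `dictionary` is assumed learnable, e.g. torch.randn(M, K, C, requires_grad=True)."""
    # Stage 1: jointly adjust the model parameters and the initial dictionary feature set
    # according to the first loss parameter, yielding the reference reconstruction model
    # and the adjusted dictionary feature set.
    optimizer = torch.optim.Adam(list(model.parameters()) + [dictionary], lr=lr)
    for _ in range(stage1_epochs):
        for degraded, target in train_loader:
            loss = criterion(model(degraded, dictionary), target)   # first loss parameter
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()

    # Stage 2: keep the adjusted dictionary fixed and continue training the reference model
    # until the second loss parameter satisfies the training end condition.
    dictionary.requires_grad_(False)
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    for degraded, target in train_loader:
        loss = criterion(model(degraded, dictionary), target)       # second loss parameter
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        if loss.item() < 1e-2:                                       # illustrative end condition
            break
    return model, dictionary
```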
With reference to the first aspect, in one implementation manner, performing image preprocessing on the image to be processed to obtain the first image includes: performing alignment processing on the image to be processed according to a reference standard image corresponding to the image to be processed to obtain an aligned image; extracting the brightness component of the aligned image to obtain a brightness component image; and extracting the high-frequency feature image from the brightness component image to obtain the first image. In this embodiment, by performing alignment processing on the image to be processed and extracting the brightness component and the high-frequency feature information, on one hand, the low-frequency part of the image to be processed is not involved in the image restoration processing and only the high-frequency part carrying detail information is restored, so the main content of the original image (the image to be processed) is not changed by the restoration; for example, when the image to be processed is a face image, it can be ensured that the face in the restored result is the face of the same person as in the image to be processed. On the other hand, the image restoration processing does not involve the colors of the image to be processed, so the restoration result does not change the colors of the original image, which helps improve the effect of the image restoration processing.
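As a concrete illustration of this preprocessing, the following sketch uses OpenCV: the brightness component is taken as the Y channel of a YCrCb conversion, and the high-/low-frequency split is done with a Gaussian low-pass filter. The color space and filter are assumptions; the embodiment only requires extracting the brightness component and its high-frequency feature image.

```python
import cv2
import numpy as np

def extract_first_image(aligned_bgr):
    """Split an aligned image into its high-frequency brightness detail (the first image),
    the low-frequency brightness part, and the full YCrCb image (kept for later recombination)."""
    ycrcb = cv2.cvtColor(aligned_bgr, cv2.COLOR_BGR2YCrCb)
    brightness = ycrcb[:, :, 0].astype(np.float32)         # brightness component image
    low_freq = cv2.GaussianBlur(brightness, (21, 21), 0)   # low-frequency feature image
    high_freq = brightness - low_freq                      # high-frequency feature image = first image
    return high_freq, low_freq, ycrcb
```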
With reference to the first aspect, in one embodiment, the reference standard image corresponding to the image to be processed includes position information of P preset key points, where P is an integer greater than or equal to 1; performing alignment processing on the image to be processed according to the reference standard image corresponding to the image to be processed to obtain the aligned image includes: identifying the position information of P key points in the image to be processed; and performing alignment processing on the image to be processed according to the position information of the P preset key points and the position information of the P key points to obtain the aligned image, where the position information of key point p in the aligned image is the same as the position information of the preset key point corresponding to key point p, and key point p is any one of the P key points. In this embodiment, because the image to be processed is aligned, the positions of the different image regions are fixed, which makes it easier for the target image processing model to extract image features and improves the effect of the subsequent image restoration processing.
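The alignment can be realized, for example, by estimating a geometric transform that maps the P detected key points onto the P preset key points of the reference standard image and warping the image with it. The use of a partial affine (similarity) transform is an assumption; the embodiment only requires that the aligned key points coincide with the preset ones.

```python
import cv2
import numpy as np

def align(image, detected_pts, preset_pts, out_size):
    """Warp the image so that its P detected key points land on the P preset key points
    of the reference standard image. out_size = (width, height) of the aligned image."""
    detected = np.asarray(detected_pts, dtype=np.float32)   # (P, 2) in the image to be processed
    preset = np.asarray(preset_pts, dtype=np.float32)       # (P, 2) in the reference standard image
    matrix, _ = cv2.estimateAffinePartial2D(detected, preset)
    aligned = cv2.warpAffine(image, matrix, out_size)
    return aligned, matrix                                   # matrix is reused for anti-alignment
```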
With reference to the first aspect, in an embodiment, the method further includes: extracting a low-frequency feature image from the brightness component image, and combining the target reconstructed image corresponding to the image to be processed with the low-frequency feature image to obtain a reconstructed brightness component image; extracting the chrominance component of the aligned image to obtain a chrominance component image; combining the reconstructed brightness component image with the chrominance component image to obtain a transformed reconstructed image; and performing anti-alignment processing on the transformed reconstructed image to obtain a target restoration image corresponding to the image to be processed. In this embodiment, only the brightness component and the high-frequency information of the aligned image obtained by aligning the image to be processed are subjected to the image restoration processing, and the obtained target reconstructed image is then transformed back by reversing the image preprocessing, so as to obtain the result image of the image restoration processing.
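Continuing the preprocessing sketch above, the reverse processing could look as follows; the names and the affine inversion mirror the assumed alignment sketch and are illustrative only.

```python
import cv2
import numpy as np

def postprocess(target_recon_high_freq, low_freq, ycrcb_aligned, align_matrix, orig_size):
    """Reverse of the preprocessing sketch above (names and filters are assumptions)."""
    # Reconstructed brightness = reconstructed high-frequency detail + untouched low-frequency part.
    y_recon = np.clip(target_recon_high_freq + low_freq, 0, 255).astype(np.uint8)
    ycrcb_aligned[:, :, 0] = y_recon                         # merge with the chrominance components
    recon_bgr = cv2.cvtColor(ycrcb_aligned, cv2.COLOR_YCrCb2BGR)
    # Anti-alignment: warp back with the inverse of the alignment transform.
    inv = cv2.invertAffineTransform(align_matrix)
    return cv2.warpAffine(recon_bgr, inv, orig_size)         # target restoration image
```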
In a second aspect, an embodiment of the present application provides an electronic device, where the electronic device includes: one or more processors and a memory; the memory is coupled to the one or more processors and is configured to store computer program code, the computer program code comprising computer instructions that are invoked by the one or more processors to cause the electronic device to perform:
image preprocessing is carried out on an image to be processed to obtain a first image, wherein the first image comprises N image areas, and N is an integer greater than 1; invoking a target image processing model to extract image features of each image region in the N image regions, and invoking the target image processing model to perform first fusion processing on the image features of each image region and dictionary features corresponding to each image region to obtain reconstruction features corresponding to each image region; and calling a target image processing model to perform second fusion processing on the image characteristics of each image area and the reconstruction characteristics corresponding to each image area to obtain a target reconstruction image corresponding to the image to be processed, wherein the target reconstruction image corresponding to the image to be processed comprises the reconstruction images corresponding to each image area.
With reference to the second aspect, in one embodiment, the target image processing model includes a cross-attention module; the dictionary features corresponding to the image area n comprise a plurality of reference dictionary features; the cross attention module is used for determining the similarity between the image feature k of the image region n and a plurality of reference dictionary features corresponding to the image region n to obtain a plurality of similarities, and performing first fusion processing on the plurality of reference dictionary features corresponding to the image region n according to the plurality of similarities to obtain reconstruction features corresponding to the image region n; the image region N is any one of the N image regions, and the image feature k is an image feature of the extracted image region N.
With reference to the second aspect, in one embodiment, the target image processing model includes a feature fusion module; the feature fusion module is configured to acquire a super-resolution coefficient corresponding to image region n, and to perform second fusion processing on the image feature k of image region n and the reconstruction feature corresponding to image region n based on the super-resolution coefficient, to obtain the reconstructed image corresponding to image region n, where the super-resolution coefficient corresponding to image region n indicates the degree of reconstruction of image region n; image region n is any one of the N image regions, and image feature k is the extracted image feature of image region n.
With reference to the second aspect, in one embodiment, the one or more processors are further configured to invoke the computer instructions to cause the electronic device to perform: performing image preprocessing on an initial training sample set to obtain a training sample set, where the training sample set comprises M training image regions, training image region m1 among the M training image regions has the same feature category as image region n1 among the N image regions, and M is an integer greater than 1; training an initial reconstruction model based on the training sample set and an initial dictionary feature set to obtain a first loss parameter; adjusting, according to the first loss parameter, the model parameters of the initial reconstruction model and the initial dictionary feature set to obtain an adjusted dictionary feature set and a reference reconstruction model, where the adjusted dictionary feature set comprises the dictionary feature corresponding to each of the M training image regions; training the reference reconstruction model based on the training sample set to obtain a second loss parameter; and, when the second loss parameter satisfies a training end condition, determining that the dictionary feature corresponding to image region n1 is the dictionary feature corresponding to training image region m1.
With reference to the second aspect, in one embodiment, the one or more processors are configured to call the computer instructions to cause the electronic device to perform image preprocessing on the image to be processed to obtain the first image, and are specifically configured to perform: performing alignment processing on the image to be processed according to a reference standard image corresponding to the image to be processed to obtain an aligned image; extracting the brightness component of the aligned image to obtain a brightness component image; and extracting the high-frequency feature image from the brightness component image to obtain the first image.
With reference to the second aspect, in one embodiment, the reference standard image corresponding to the image to be processed includes position information of P preset key points, where P is an integer greater than or equal to 1; the one or more processors are configured to call the computer instructions to cause the electronic device to perform alignment processing on the image to be processed according to the reference standard image corresponding to the image to be processed to obtain the aligned image, and are specifically configured to perform: identifying the position information of P key points in the image to be processed; and performing alignment processing on the image to be processed according to the position information of the P preset key points and the position information of the P key points to obtain the aligned image, where the position information of key point p in the aligned image is the same as the position information of the preset key point corresponding to key point p, and key point p is any one of the P key points.
With reference to the second aspect, in one embodiment, the one or more processors are further configured to invoke the computer instructions to cause the electronic device to perform: extracting a low-frequency feature image from the brightness component image, and combining the target reconstructed image corresponding to the image to be processed with the low-frequency feature image to obtain a reconstructed brightness component image; extracting the chrominance component of the aligned image to obtain a chrominance component image; combining the reconstructed brightness component image with the chrominance component image to obtain a transformed reconstructed image; and performing anti-alignment processing on the transformed reconstructed image to obtain a target restoration image corresponding to the image to be processed.
In a third aspect, an embodiment of the present application provides an electronic device, including: one or more processors and one or more memories; the one or more processors are coupled with the one or more memories, the one or more memories being configured to store computer program code, the computer program code comprising computer instructions that, when executed by the one or more processors, cause the electronic device to perform the method according to the first aspect or any implementation of the first aspect.
In a fourth aspect, embodiments of the present application provide a computer program product comprising instructions which, when run on an electronic device, cause the electronic device to perform the method according to the first aspect or any implementation of the first aspect.
In a fifth aspect, embodiments of the present application provide a computer storage medium storing computer software instructions for an electronic device provided in the second aspect, where the computer storage medium includes a program designed to execute the method according to the second aspect.
In a sixth aspect, embodiments of the present application provide a computer program comprising computer instructions which, when executed by a computer, cause the computer to perform the process performed by the electronic device in the second aspect.
In a seventh aspect, the present application provides a chip system comprising a processor for supporting an electronic device to implement the functions referred to in the first aspect above, e.g. to generate or process information referred to in the image processing method above. In one possible design, the chip system further includes a memory for storing program instructions and data necessary for the data transmission device. The chip system can be composed of chips, and can also comprise chips and other discrete devices.
Drawings
In order to more clearly describe the technical solutions in the embodiments or the background of the present application, the following description will describe the drawings that are required to be used in the embodiments or the background of the present application.
Fig. 1 is a schematic architecture diagram of an image processing system according to an embodiment of the present application.
Fig. 2 is a schematic hardware structure of an electronic device according to an embodiment of the present application.
Fig. 3 is a schematic diagram of a software system according to an embodiment of the present application.
Fig. 4 is a flowchart of an image processing method according to an embodiment of the present application.
Fig. 5 is a schematic diagram of an alignment process according to an embodiment of the present application.
Fig. 6 is a schematic diagram of a segmentation process according to an embodiment of the present application.
Fig. 7 is a timing diagram of an image processing method according to an embodiment of the present application.
Fig. 8 is a timing diagram of image restoration of a face image according to an embodiment of the present application.
Fig. 9 is a schematic flow chart of constructing dictionary features according to an embodiment of the present application.
FIG. 10 is a timing diagram for constructing dictionary features provided by embodiments of the present application.
Fig. 11 is a timing diagram for constructing a face dictionary feature according to an embodiment of the present application.
Fig. 12 is a schematic structural diagram of an image processing apparatus according to an embodiment of the present application.
Detailed Description
Embodiments of the present application will be described below with reference to the accompanying drawings in the embodiments of the present application.
The terms "first," "second," "third," and "fourth" and the like in the description and in the claims of this application and in the drawings, are used for distinguishing between different objects and not for describing a particular sequential order. Furthermore, the terms "comprise" and "have," as well as any variations thereof, are intended to cover a non-exclusive inclusion. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to only those listed steps or elements but may include other steps or elements not listed or inherent to such process, method, article, or apparatus.
It should be understood that in this application, "at least one" means one or more, and "a plurality" means two or more. "And/or" describes an association relationship between associated objects and indicates that three relationships may exist; for example, "A and/or B" may indicate that only A exists, only B exists, or both A and B exist, where A and B may be singular or plural. The character "/" generally indicates that the associated objects are in an "or" relationship. "At least one of the following" or similar expressions refers to any combination of these items, including any combination of single items or plural items. For example, at least one of a, b or c may represent: a, b, c, "a and b", "a and c", "b and c", or "a and b and c", where a, b and c may each be singular or plural.
Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment may be included in at least one embodiment of the present application. The appearances of such phrases in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. Those of skill in the art will explicitly and implicitly appreciate that the embodiments described herein may be combined with other embodiments.
As used in this specification, the terms "component," "module," "system," and the like are intended to refer to a computer-related entity, either hardware, firmware, a combination of hardware and software, or software in execution. For example, a component may be, but is not limited to being, a process running on a processor, an object, an executable, a thread of execution, a program, and/or a computer. By way of illustration, both an application running on a computing device and the computing device can be a component. One or more components may reside within a process and/or thread of execution and a component may be localized on one computer and/or distributed between 2 or more computers. Furthermore, these components can execute from various computer readable media having various data structures stored thereon. The components may communicate by way of local and/or remote processes such as in accordance with a signal having one or more data packets (e.g., data from two components interacting with one another in a local system, distributed system, and/or across a network such as the internet with other systems by way of the signal).
Before describing embodiments of the present application in further detail, the terms and terminology involved in the embodiments of the present application will be described, and the terms and terminology involved in the embodiments of the present application are suitable for the following explanation:
1. Image restoration
Image restoration, which may also be referred to as image repair or image recovery, refers to performing image restoration processing on a degraded image to restore the original appearance of the image. A degraded image is an image whose quality cannot meet the expected requirements due to factors such as an imperfect imaging system, device limitations and an imperfect transmission medium. Image restoration removes the blurred parts of the degraded image by deblurring means so as to restore the original appearance of the image; it mainly repairs or reconstructs the degraded image based on prior knowledge of the degradation process, eliminating the blurred parts and adding detail to the degraded image, so as to improve the definition of the degraded image and thereby improve its quality.
In the following embodiments of the present application, the term "image restoration" may also be referred to as "image repair" or "image reconstruction"; when an image including a face is subjected to image restoration processing, it may also be referred to as "face reconstruction", "face restoration", or the like. Their meanings are as explained above, and the specific terms used are not to be interpreted as limiting this embodiment.
In the embodiments of the present application, the image to be processed is a degraded image. The image features of the image to be processed can be extracted through the target image processing model, and image restoration processing is performed based on these image features to obtain a target restoration image; compared with the image to be processed, the subject information of the target restoration image is unchanged, while the definition and quality of the image are improved.
2. Dictionary features
A dictionary feature, which may also be referred to as a dictionary (Dictionary) or discrete dictionary, is analogous to a dictionary in everyday life: it is a collection of features (feature vectors), where each feature vector is a multidimensional vector used to represent a feature (such as an image feature); for example, a dictionary feature may include a plurality of 512-dimensional feature vectors. The features included in a dictionary feature can be compared to the words in an everyday dictionary: just as any sentence can be composed of a number of words, a sample (an image feature) can be represented by one or more of the feature vectors included in the dictionary feature. Thus, a dictionary feature can be understood as a reduced-dimension representation of a vast feature library that extracts the essential features of things (e.g., images).
The stage of constructing dictionary features (Dictionary Generate, which may also be referred to as dictionary construction, dictionary creation, dictionary restoration, etc.) and the stage of representing samples with the feature vectors included in the dictionary features (sparse coding with a precomputed dictionary) may together be referred to as dictionary learning (Dictionary Learning). In the stage of constructing dictionary features, the objective of the construction is to make each feature vector in the dictionary features sparse. Mathematically, a sparse vector is a vector in which most elements are zero; because irrelevant redundant information is removed, the information that remains is the most important and essential. Therefore, the sparse representation of each sample (such as an image feature) can be completed with one or more feature vectors in the dictionary features, that is, the sample to be represented is restored as faithfully as possible with sparse vectors, expressing as much knowledge as possible with as few resources as possible. Image features can thus be sparsely represented with one or more features in the dictionary features and used for the task of image restoration.
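For intuition, the toy example below shows a sample feature expressed as a sparse combination of dictionary atoms; the sizes and values are arbitrary and serve only to illustrate the idea of sparse representation.

```python
import numpy as np

# A toy dictionary of 4 reference feature vectors ("atoms"), each 512-dimensional.
rng = np.random.default_rng(0)
dictionary = rng.standard_normal((4, 512))

# A sample feature can be approximated by a sparse combination of a few atoms:
# here only atoms 0 and 2 contribute, so the code is sparse (most coefficients are zero).
sparse_code = np.array([0.7, 0.0, 0.3, 0.0])
sample_feature = sparse_code @ dictionary   # sparse representation of the sample
```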
In the embodiments of the present application, a dictionary feature is one of a plurality of dictionary features included in a dictionary feature set, and each dictionary feature in the dictionary feature set may correspond to a different image region. For example, in an image including a face, the image regions of different parts of the face may each correspond to a dictionary feature, such as a hair dictionary feature corresponding to the hair image region and an eye dictionary feature corresponding to the eye image region; the dictionary features corresponding to the image regions of the parts of the face together form the dictionary feature set. The one or more feature vectors included in a dictionary feature may be referred to as reference dictionary features. Since one or more of them can be used to represent image features, and each is derived from the image features of a better-quality image (or image region), the target image processing model can be invoked to implement image restoration based on the dictionary feature set.
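The per-region organization of the dictionary feature set can be pictured as a simple mapping from region names to tensors of reference dictionary features; the region names and the sizes K and C below are illustrative assumptions.

```python
import torch

K, C = 256, 512   # number of reference dictionary features per region, and their dimension (assumed)
dictionary_feature_set = {
    "hair":  torch.randn(K, C),   # hair dictionary feature
    "eyes":  torch.randn(K, C),   # eye dictionary feature
    "nose":  torch.randn(K, C),
    "mouth": torch.randn(K, C),
}
# During restoration, the features of each image region are fused with the matching entry,
# e.g. dictionary_feature_set["eyes"] for the eye image region.
```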
With the development of electronic devices and computer technologies, especially technologies related to computer vision, quality requirements for images have gradually increased, and an important index for evaluating image quality is image definition. At present, affected by factors such as an imperfect imaging system, device limitations and transmission loss, images captured by electronic devices lack detail and are not very sharp. For example, when people take pictures with electronic devices such as mobile phones, the signal-to-noise ratio of the pictures is low due to hardware limitations of the phone, especially in backlit and night-time shooting scenes. Currently, in order to add missing image detail and improve image definition, image restoration processing may be performed on the image to be restored, and this can be achieved in different ways. For example, restoration can be realized by a deep learning method based on a deep convolutional neural network. However, the deep learning method may generate too many details and its results are unstable, which may cause pseudo-textures to appear in the restored image or cause the subject content of the original image to change. For another example, the image to be restored can be filtered to remove noise, thereby improving definition and quality. However, this approach offers limited improvement in image sharpness, and the image restoration effect is poor.
Based on this, the embodiments of the present application provide an image processing method that can perform image restoration processing on an image to be restored, adding detail and improving definition while leaving the main content of the image unchanged and reducing the occurrence of pseudo-textures, thereby helping to improve the image restoration effect and meet users' image restoration needs.
In order to facilitate understanding of the embodiments of the present application, one of the image processing system architectures on which the embodiments of the present application are based is described below.
Referring to fig. 1, fig. 1 is a schematic diagram of an image processing system architecture according to an embodiment of the present application. As shown in fig. 1, the image processing system includes an image input device 101, an image restoration device 102, a dictionary feature construction device 103, and an image acquisition device 104. The image input device 101 may be directly or indirectly connected to the image restoration device 102 through a wired or wireless manner, the image restoration device 102 may be directly or indirectly connected to the dictionary feature construction device 103 through a wired or wireless manner, and the dictionary feature construction device 103 may be directly or indirectly connected to the image acquisition device 104 through a wired or wireless manner.
In one implementation, the image input device 101 may be the same device as any two or more of the image restoration device 102, the dictionary feature construction device 103, and the image capture device 104. For example, the image input device 101 and the image restoration device 102 may be the same device, and for example, the dictionary feature construction device 103 and the image acquisition device 104 may be the same device. For another example, the image input device 101 is the same device as the image restoration device 102 and the dictionary feature construction device 103. For another example, the image input device 101 may be the same device as the image restoration device 102, the dictionary feature construction device 103, and the image acquisition device 104.
It should be noted that the number and form of the devices shown in fig. 1 are used as examples and do not limit the embodiments of the present application. In practical applications, the image processing system may include two devices, such as an image input device and an image restoration device, or may include a single device, such as an image restoration device. The image processing system may in practice also comprise at least one image input device, at least one image restoration device, and so on. The embodiments of the present application are drawn and explained taking one image input device 101, one image restoration device 102, one dictionary feature construction device 103, and one image acquisition device 104 as an example.
As shown in fig. 1, the image input by the image input device 101 is an image to be processed, where the image to be processed may be a degraded image acquired by the image input device 101, for example, the image input device 101 is an electronic device with a shooting function, the image to be processed may be an image shot by the image input device 101, and the image to be processed may also be an image received by the image input device 101 and sent from another device, which is not limited in this application. After the image input device 101 acquires the image to be processed, the image input device 101 may transmit the image to be processed to the image restoration device 102. The image restoration device 102 performs image preprocessing on the image to be processed to obtain a first image, and performs image restoration processing on the first image to obtain an image restoration result, namely, a target reconstructed image corresponding to the image to be processed. The image restoration device 102 may deploy a target image processing model for image restoration processing, and the image restoration device 102 may perform fusion processing on image features of different image areas included in the first image and dictionary features corresponding to each image area by calling the target image processing model to obtain reconstructed images corresponding to each image area, and then combine the reconstructed images corresponding to each image area to obtain a target reconstructed image.
Specifically, the image restoration device 102 performs fusion processing during image restoration processing of an image to be processed, including first fusion processing performed on image features of each image region and dictionary features corresponding to each image region, and second fusion processing performed on reconstructed features obtained by the first fusion processing and image features of each image region. Since the dictionary features are obtained based on the image features of the image (image area) with better image quality and higher definition, the image restoration device 102 can perform image restoration processing on the image to be processed based on the dictionary features corresponding to each area, so that the image quality of the image to be processed can be improved to a certain extent.
Alternatively, after acquiring the image to be processed, the image input device 101 may perform image preprocessing on the image to be processed to obtain a first image, and then transmit the first image to the image restoration device 102.
In the image processing system, the dictionary feature construction device 103 is configured to construct dictionary features, and may specifically be configured to construct a dictionary feature set including dictionary features corresponding to different image areas. Specifically, the dictionary feature construction apparatus 103 may initialize dictionary features to obtain initial dictionary features of each image area, and the dictionary feature construction apparatus 103 may adjust the initial dictionary features by means of model training. For example, the dictionary feature construction apparatus 103 may obtain a dictionary feature set satisfying the training end condition, that is, dictionary features corresponding to the respective image areas, by acquiring an image with higher image quality as a training sample set, and performing iterative training based on the training sample set through an initial reconstruction model, and adjusting model parameters and initial dictionary features of the initial reconstruction model during the training process.
The training sample set may consist of images transmitted from the image acquisition device 104 to the dictionary feature construction device 103, where each training sample image in the training sample set is an image with higher image quality; for example, if the image acquisition device 104 is a device capable of capturing high-definition images, the training sample images may include images captured by the image acquisition device 104. Each training sample image may comprise a plurality of image regions, and the plurality of image regions respectively correspond to dictionary features, namely the dictionary features used for performing image restoration processing on the different image regions of the first image. The image acquisition device 104 may acquire an initial training sample set and transmit it to the dictionary feature construction device 103, and the dictionary feature construction device 103 performs image preprocessing on the initial training sample set to obtain the training sample set and then constructs the dictionary features.
Optionally, after acquiring the initial training sample set, the image acquisition device 104 may perform image preprocessing on the initial training sample set to obtain a training sample set, and further transmit the training sample set to the dictionary feature construction device 103. It will be appreciated that this image preprocessing is the same as the image preprocessing described above for the image to be processed.
Any of the image input device 101, the image restoration device 102, the dictionary feature construction device 103, and the image acquisition device 104 may be an electronic device with a certain computing capability, a certain storage capability, and a certain communication resource, for example, but not limited to, a smart phone, a tablet computer, a notebook computer, a desktop computer, a smart speaker, a smart watch, and the like. Any one of the image input device 101, the image restoration device 102, the dictionary feature construction device 103, and the image collection device 104 may be a server, for example, an independent physical server (such as a central server), a server cluster or a distributed system formed by a plurality of physical servers, or a cloud server that provides cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, content distribution networks (Content Delivery Network, CDN), and basic cloud computing services such as big data and artificial intelligence platforms.
It may be understood that, the image processing system described in the embodiments of the present application is for more clearly describing the technical solution of the embodiments of the present application, and does not constitute a limitation on the technical solution provided in the embodiments of the present application, and those skilled in the art can know that, with the evolution of the system architecture and the appearance of a new service scenario, the technical solution provided in the embodiments of the present application is equally applicable to similar technical problems. The image processing system architecture in fig. 1 is for example and is not to be construed as limiting the embodiments of the present application.
The electronic device according to the present application may be, for example, the above-described image input device 101 and image restoration device 102, dictionary feature construction device 103, and image acquisition device 104, or may be a module, such as a chip or a processor, in the image input device 101 and image restoration device 102, dictionary feature construction device 103, and image acquisition device 104.
The image processing method provided by the embodiment of the application can carry out image restoration processing on the degraded image. The image processing method can be applied to various scenes such as face recovery and image stylization so as to realize specific functions.
The application scene is an exemplary face recovery scene.
The face recovery scene refers to: performing image restoration processing on a degraded face image to restore the original appearance of the face. It can be understood that the face image to be recovered has lower definition and lower image quality, and the face part in the image is blurred; details of the face need to be added and the information content of the face image enriched so as to improve image quality. The face image may be an image including a face captured by an electronic device such as a mobile phone, and the signal-to-noise ratio of the captured face image is low due to hardware limitations of the electronic device, such as lens limitations, especially in special shooting scenes such as backlight, night scenes and shake. However, one current method of face recovery filters the face image to remove noise, which offers limited improvement in definition and a poor face recovery effect. Another method is deep learning; recovering a face image by deep learning may change face attributes in the image, so the face in the original image and the face in the image obtained after restoration may no longer appear to be the same person, that is, the subject content of the face image is changed. Moreover, the deep learning method may generate too many details and its results are unstable, so pseudo-textures may appear in the face image obtained after the restoration processing.
Therefore, the electronic device (such as the image restoration device 102) can perform image preprocessing on the face image to be processed, and call the target image processing model to perform fusion processing based on the dictionary features corresponding to each image region of the preprocessed image and the extracted image features of each image region, so as to obtain a target reconstructed image corresponding to the image to be processed. For example, the electronic device may invoke the target image processing model to fuse, in the preprocessed image to be processed, the eye dictionary feature with the image features of the eye image region, the nose dictionary feature with the image features of the nose image region, and the mouth dictionary feature with the image features of the mouth image region, so as to obtain a face image with higher definition and better image quality. Because the dictionary features are derived from the image features of images (image regions) with better image quality, performing image restoration processing on the face image to be processed with the dictionary features corresponding to the different image regions can improve the definition and image quality of the face image; and because the main body (low-frequency) part of the face image is not involved in the restoration, the main content of the image remains unchanged while its definition and quality are improved, so the image restoration effect can be improved.
The application scene is an image stylized scene, for example.
The image stylized scene refers to: converting an image into an image with an artistic style, where the artistic style may refer to the texture style, hue style, structural style, etc. of the image. It can be understood that an artistic-style image differs in style from the current image while retaining the content of the original image as much as possible after stylization. The inputs of image stylization are an original image and a style image, and the output is a result image after style transfer. Image styles may include a cartoon style, a pencil-drawing style, an oil-painting style, and the like. At present, one way of performing image style transfer on an image to be processed is to fuse the extracted image features of the original image with the image features of the style image; because the style image may not accurately describe the latent features of a given artistic style, the fused transfer image may lack part of the texture style information, so the transfer effect is poor. Another way is image stylization based on a generative adversarial network; this approach may not account for the unequal amounts of information between the source-style image domain and the target-style image domain when learning from their data sets, so the image quality of the resulting transfer image is poor and cannot meet users' image stylization needs.
At this time, the electronic device (such as the image restoration device 102) may perform image preprocessing on the migration image to be processed, and call the target image processing model to perform fusion processing based on dictionary features corresponding to each image area in the image after image preprocessing, and the extracted image features of each image area, so as to obtain a target reconstructed image corresponding to the image to be processed. The dictionary features can be image features of images in artistic styles such as cartoon styles, pencil drawing styles, oil painting styles and the like, and the target reconstructed image is a result after style migration. The electronic device can perform fusion processing based on dictionary features of an artistic style and image features of each image area of the image after image preprocessing to obtain a target reconstructed image. Because the dictionary features can cover the features of texture styles, color tones and styles of different artistic styles, the image stylization processing based on the dictionary features is more flexible, and the requirements of the image stylization of a user can be met, thereby being beneficial to improving the effect of the image stylization processing.
It should be noted that the application scenarios described above are merely examples; the method may also be applied to other scenarios, which is not limited in the embodiments of the present application.
Based on the architecture of the image processing system provided in fig. 1, the embodiment of the present application further provides an electronic device, and the following description describes the structure of the electronic device 200. The electronic device 200 may be one or more of the image input device 101, the image restoration device 102, the dictionary feature construction device 103, and the image capture device 104 shown in fig. 1. Referring to fig. 2, fig. 2 is a schematic structural diagram of an electronic device 200 according to an embodiment of the present application.
The electronic device 200 may include a processor 110, an external memory interface 120, an internal memory 121, a universal serial bus (universal serial bus, USB) interface 130, a charge management module 140, a power management module 141, a battery 142, an antenna 1, an antenna 2, a mobile communication module 150, a wireless communication module 160, an audio module 170, a speaker 170A, a receiver 170B, a microphone 170C, an earphone interface 170D, a sensor module 180, keys 190, a motor 191, an indicator 192, a camera 193, a display 194, and a subscriber identity module (subscriber identification module, SIM) card interface 195, etc. The sensor module 180 may include a pressure sensor 180A, a gyro sensor 180B, an air pressure sensor 180C, a magnetic sensor 180D, an acceleration sensor 180E, a distance sensor 180F, a proximity sensor 180G, a fingerprint sensor 180H, a temperature sensor 180J, a touch sensor 180K, an ambient light sensor 180L, a bone conduction sensor 180M, and the like.
It is to be understood that the structure illustrated in the embodiments of the present application does not constitute a specific limitation on the electronic device 200. In other embodiments of the present application, electronic device 200 may include more or fewer components than shown, or certain components may be combined, or certain components may be split, or different arrangements of components. The illustrated components may be implemented in hardware, software, or a combination of software and hardware.
The processor 110 may include one or more processing units, such as: the processor 110 may include an application processor (application processor, AP), a modem processor, a graphics processor (graphics processing unit, GPU), an image signal processor (image signal processor, ISP), a controller, a memory, a video codec, a digital signal processor (digital signal processor, DSP), a baseband processor, and/or a neural network processor (neural-network processing unit, NPU), etc. Wherein the different processing units may be separate devices or may be integrated in one or more processors.
The controller may be a neural hub and a command center of the electronic device 200, among others. The controller can generate operation control signals according to the instruction operation codes and the time sequence signals to finish the control of instruction fetching and instruction execution.
A memory may also be provided in the processor 110 for storing instructions and data. In some embodiments, the memory in the processor 110 is a cache memory. The memory may hold instructions or data that the processor 110 has just used or recycled. If the processor 110 needs to reuse the instruction or data, it can be called directly from the memory. Repeated accesses are avoided and the latency of the processor 110 is reduced, thereby improving the efficiency of the system.
The electronic device 200 implements display functions through a GPU, a display screen 194, an application processor, and the like. The GPU is a microprocessor for image processing, and is connected to the display 194 and the application processor. The GPU is used to perform mathematical and geometric calculations for graphics rendering. Processor 110 may include one or more GPUs that execute program instructions to generate or change display information.
The display screen 194 is used to display images, videos, and the like. The display 194 includes a display panel. The display panel may employ a liquid crystal display (liquid crystal display, LCD), an organic light-emitting diode (organic light-emitting diode, OLED), an active-matrix organic light-emitting diode (active-matrix organic light emitting diode, AMOLED), a flexible light-emitting diode (flexible light-emitting diode, FLED), a MiniLED, a MicroLED, a Micro-OLED, a quantum dot light-emitting diode (quantum dot light emitting diodes, QLED), or the like. In some embodiments, the electronic device 200 may include 1 or N display screens 194, N being a positive integer greater than 1.
The electronic device 200 may implement photographing functions through an ISP, a camera 193, a video codec, a GPU, a display screen 194, an application processor, and the like.
The ISP is used to process data fed back by the camera 193. For example, when photographing, the shutter is opened, light is transmitted to the camera photosensitive element through the lens, the optical signal is converted into an electric signal, and the camera photosensitive element transmits the electric signal to the ISP for processing and is converted into an image visible to naked eyes. ISP can also optimize the noise, brightness and skin color of the image. The ISP can also optimize parameters such as exposure, color temperature and the like of a shooting scene. In some embodiments, the ISP may be provided in the camera 193.
The camera 193 is used to capture still images or video. The object generates an optical image through the lens and projects the optical image onto the photosensitive element. The photosensitive element may be a charge coupled device (charge coupled device, CCD) or a Complementary Metal Oxide Semiconductor (CMOS) phototransistor. The photosensitive element converts the optical signal into an electrical signal, which is then transferred to the ISP to be converted into a digital image signal. The ISP outputs the digital image signal to the DSP for processing. The DSP converts the digital image signal into an image signal in a standard RGB, YUV, or the like format. In some embodiments, the electronic device 200 may include 1 or N cameras 193, N being a positive integer greater than 1.
The NPU is a neural-network (NN) computing processor, and can rapidly process input information by referencing a biological neural network structure, for example, referencing a transmission mode between human brain neurons, and can also continuously perform self-learning. Applications such as intelligent cognition of the electronic device 200 may be implemented by the NPU, for example: image recognition, face recognition, speech recognition, text understanding, etc.
The software system of the electronic device 200 may employ a layered architecture, an event-driven architecture, a microkernel architecture, a microservice architecture, or a cloud architecture. In this embodiment, taking an Android system with a layered architecture as an example, the software structure of the electronic device 200 is illustrated. Fig. 3 is a software structure block diagram of the electronic device 200 of the embodiment of the present application. The layered architecture divides the software into several layers, each with distinct roles and responsibilities. The layers communicate with each other through software interfaces. In some embodiments, the Android system is divided into four layers, which are, from top to bottom, an application layer, an application framework layer, the Android runtime (Android runtime) and system libraries, and a kernel layer.
The application layer may include a series of application packages.
As shown in fig. 3, the application package may include applications for cameras, gallery, calendar, phone calls, maps, navigation, WLAN, bluetooth, music, video, short messages, etc.
In some embodiments, the application program further includes a function of an image restoration application to implement an image restoration process. In other embodiments, the image restoration function may also be implemented in an application program such as a camera application or gallery application.
The application framework layer provides an application programming interface (application programming interface, API) and programming framework for application programs of the application layer. The application framework layer includes a number of predefined functions.
As shown in FIG. 3, the application framework layer may include a window manager, a content provider, a view system, a telephony manager, a resource manager, a notification manager, and the like.
The window manager is used for managing window programs. The window manager can acquire the size of the display screen, judge whether a status bar exists, lock the screen, intercept the screen and the like.
The content provider is used to store and retrieve data and make such data accessible to applications. The data may include video, images, audio, calls made and received, browsing history and bookmarks, phonebooks, etc.
The view system includes visual controls, such as controls to display text, controls to display pictures, and the like. The view system may be used to build applications. The display interface may be composed of one or more views. For example, a display interface including a text message notification icon may include a view displaying text and a view displaying a picture.
The telephony manager is used to provide the communication functions of the electronic device 200. Such as the management of call status (including on, hung-up, etc.).
The resource manager provides various resources for the application program, such as localization strings, icons, pictures, layout files, video files, and the like.
The notification manager allows the application to display notification information in the status bar, and can be used to communicate notification-type messages that automatically disappear after a short stay without requiring user interaction. For example, the notification manager is used to notify that a download is complete, to give message alerts, and so on. The notification manager may also present notifications in the form of a chart or scroll-bar text in the system top status bar, such as a notification of an application running in the background, or present notifications on the screen in the form of a dialog window. For example, a text message is prompted in the status bar, a prompt tone is emitted, the electronic device vibrates, or an indicator light blinks.
The Android runtime includes a core library and virtual machines. The Android runtime is responsible for scheduling and management of the Android system.
The core library consists of two parts: one part is the functions that the java language needs to call, and the other part is the core library of Android.
The application layer and the application framework layer run in a virtual machine. The virtual machine executes java files of the application program layer and the application program framework layer as binary files. The virtual machine is used for executing the functions of object life cycle management, stack management, thread management, security and exception management, garbage collection and the like.
The system library may include a plurality of functional modules. For example: surface manager (surface manager), media Libraries (Media Libraries), three-dimensional graphics processing Libraries (e.g., openGL ES), 2D graphics engines (e.g., SGL), etc.
The surface manager is used to manage the display subsystem and provides a fusion of 2D and 3D layers for multiple applications.
Media libraries support a variety of commonly used audio, video format playback and recording, still image files, and the like. The media library may support a variety of audio and video encoding formats, such as MPEG4, h.264, MP3, AAC, AMR, JPG, PNG, etc.
The three-dimensional graphic processing library is used for realizing three-dimensional graphic drawing, image rendering, synthesis, layer processing and the like.
The 2D graphics engine is a drawing engine for 2D drawing.
The kernel layer is a layer between hardware and software. The inner core layer at least comprises a display driver, a camera driver, an audio driver and a sensor driver.
In some embodiments of the present application, the image restoration device 102 may take a photo through a preset APP (for example, a camera APP), and receive an operation that triggers image restoration processing on the taken photo, where the operation may also indicate restoration procedures for different image areas of the taken photo, and information related to the operation is sent to the kernel layer. The kernel layer processes the operation into a trigger event. The application framework layer acquires a trigger event from the kernel layer, and sends the trigger event to the gallery APP, and the gallery APP triggers a process of performing image restoration processing on the shot photo. After the image restoration processing is finished, the gallery APP calls a display driver of the kernel layer through an interface of the application program framework layer to drive the display screen to display the result of the image restoration processing.
Based on the architecture of the image processing system provided in fig. 1, the hardware structure of the electronic device in fig. 2, and the software structure of the electronic device in fig. 3, the following specifically analyzes and solves the technical problems presented in the present application with the image processing method provided in the present application. Referring to fig. 4, fig. 4 is a flowchart of an image processing method according to an embodiment of the present application. The image processing method according to the embodiment of the present application may be performed by an electronic device, which may be the image restoration device 102 in the image processing system shown in fig. 1, and the image restoration device 102 may be the same device as the image input device 101, the dictionary feature construction device 103, and the image capturing device 104 described above. The method may comprise the following steps 401-403.
Step 401, the electronic device performs image preprocessing on an image to be processed to obtain a first image.
In the embodiment of the application, the image to be processed is an image needing image restoration processing. The image to be processed may be an image with lower image definition, which may simply be called a low-definition image; such an image is blurred and has poor image quality. The electronic device may acquire the image to be processed; for example, if the electronic device is a device with a shooting function, the image to be processed may be an image shot by the electronic device, and the image to be processed may also be an image that the electronic device receives from another electronic device. The image to be processed includes N image areas, where N is an integer greater than 1, and different image areas included in the image to be processed may respectively correspond to one feature class. For example, when the image to be processed is a face image, the face image may include image areas corresponding to different parts, and each part may respectively be one feature class: the face image includes a hair image area, whose corresponding feature class is the hair class; the face image includes an eyebrow image area, whose corresponding feature class is the eyebrow class; and the face image further includes an eye image area, whose corresponding feature class is the eye class.
The image preprocessing performed by the electronic device on the image to be processed may include alignment processing and high-frequency extraction processing performed on the image to be processed. The alignment process may be understood as a process of transforming the image to be processed such that the positions of the key points included in the image to be processed are at preset positions, i.e., positions of the pixel points in the image to be processed are changed. The alignment image obtained by performing alignment processing on the image to be processed can facilitate the extraction of the image characteristics of each image area in the subsequent image restoration processing process, thereby being beneficial to improving the image restoration effect.
In one possible implementation, the electronic device performing the alignment process on the image to be processed may be referred to as aligning the image to be processed to a standard space, where the standard space may refer to a reference standard image including a plurality of preset key points, for example, P preset key points, where P is an integer greater than or equal to 1. The electronic device can perform alignment processing according to the reference standard image corresponding to the image to be processed to obtain an alignment image. Specifically, the electronic device may detect key points in the image to be processed, where the key points may be pixel points where image areas of different feature classes in the image to be processed are located. It should be noted that the number of detected key points is also P, and detecting the key points refers to obtaining the position information of each key point. Alternatively, the key points of the image to be processed may be detected in different manners, which is not limited in this application; for example, the image to be processed may be input into a keypoint detection model to obtain the key points in the image to be processed.
Further, the electronic device may acquire a corresponding reference standard image for the image to be processed, where the reference standard image may include preset key points, each preset key point corresponds to one piece of location information, and the number of preset key points may be the same as the number of key points, for example, the number of preset key points is P. The reference standard image may be an image with the same content as the image to be processed, or the reference standard image is an image with the same category as the image to be processed, for example, the image to be processed is a face image, and the reference standard image is also a face image. The image to be processed is a landscape image, and the reference standard image is also a landscape image. Furthermore, the electronic device may perform transformation processing on the image to be processed, for example, by performing transformation processing such as rotation, translation, clipping, scaling, and the like, to align each key point in the image to be processed to a position of a corresponding preset key point, thereby completing alignment processing to obtain an aligned image, and it can be understood that in the aligned image, the position information of the key point p is the same as the position information of the preset key point corresponding to the key point p. Here, the transformation process changes only the position of the pixel point, and does not change the pixel value of the pixel point, and this transformation may be referred to as Image Warp (Image Warp) or affine transformation.
Illustratively, explanation is given by taking the image to be processed as a face image. Referring to fig. 5, fig. 5 is a schematic diagram of an alignment process according to an embodiment of the present application. As shown in fig. 5, the upper left corner is the image to be processed, and the upper right corner is the acquired reference standard image corresponding to the image to be processed (drawn by taking the preset key points included in the reference standard image as an example). The electronic device transforms the image to be processed so that the key points in the image to be processed are located at the preset key points of the reference standard image. First, the electronic device detects key points in the image to be processed to obtain position information of a plurality of key points, such as a (x1, y1) and b (x2, y2), where point a may be, for example, a key point on an eye, and point b may be, for example, a key point on the mouth; two key points are taken as an example for explanation. Furthermore, the electronic device may obtain the reference standard image corresponding to the face image, where the reference standard image includes position information of the preset key points, such as the gray dots in fig. 5, for example, A (100, 100) and B (200, 100), where A is the preset key point corresponding to point a, and B is the preset key point corresponding to point b. The image to be processed may then be transformed based on the position information of each key point and the position information of the preset key point corresponding to each key point, so that the position information of point a becomes the same as that of point A and the position information of point b becomes the same as that of point B, that is, point a is located at (100, 100) and point b is located at (200, 100). Since the subsequent processing is related to features such as shape and texture in the image and is unrelated to the position of the points, the image features can be conveniently extracted in the subsequent image restoration processing.
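As an illustration of the alignment processing described above, the following sketch uses OpenCV to estimate a warp matrix from a few keypoint correspondences and warp the image into the standard space; the keypoint coordinates, file name, and output size are hypothetical, and the actual keypoint detection model and the number of key points P are not specified here.

```python
import cv2
import numpy as np

def align_to_standard_space(image, keypoints, preset_keypoints, out_size=(512, 512)):
    # Estimate the warp (affine) matrix that moves each detected keypoint onto
    # its preset counterpart in the reference standard image; only pixel
    # positions change, pixel values are untouched.
    src = np.asarray(keypoints, dtype=np.float32)
    dst = np.asarray(preset_keypoints, dtype=np.float32)
    warp_matrix, _ = cv2.estimateAffinePartial2D(src, dst)
    aligned = cv2.warpAffine(image, warp_matrix, out_size)
    return aligned, warp_matrix

# Hypothetical example: point a on an eye, point b on the mouth, plus one more point.
image = cv2.imread("face_to_process.png")                  # low-definition face image (assumed file)
keypoints = [(143, 210), (260, 388), (300, 215)]           # detected in the image to be processed
preset_keypoints = [(100, 100), (200, 100), (180, 180)]    # from the reference standard image
aligned, warp_matrix = align_to_standard_space(image, keypoints, preset_keypoints)
```

The returned warp matrix is kept so that the inverse alignment described later can map the restored image back to the original coordinates.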
The high-frequency extraction process may be a process of processing an alignment image obtained by the above-described alignment process, and the high-frequency extraction process may be a process of converting the alignment image from a spatial domain to a frequency domain, and extracting a high-frequency feature image in the alignment image. Because the main components of the image are low-frequency information, the edge and detail of the image are determined by the high-frequency information, the extracted high-frequency characteristic image comprises the detail, edge and other information of the original image (such as the image to be processed), the subsequent image restoration processing does not involve the low-frequency part in the image to be processed, and the obtained image restoration processing result does not change the main content of the original image (such as the image to be processed), so that the image restoration effect is improved.
In one possible implementation manner, the electronic device may convert the aligned image from the spatial domain to the frequency domain, to obtain a frequency domain feature map corresponding to the image to be processed, where the frequency domain feature map may include the frequency value of each pixel point. Furthermore, the electronic device may determine the set of high-frequency pixel points as the high-frequency feature map corresponding to the image to be processed. Optionally, the electronic device may also convert the set of high-frequency pixel points to the spatial domain, to obtain the high-frequency feature map corresponding to the image to be processed. The electronic device may perform a discrete wavelet transform (discrete wavelet transform, DWT), a discrete cosine transform (discrete cosine transform, DCT), a discrete Fourier transform (discrete Fourier transform, DFT), or the like on the image to be processed.
Alternatively, after converting the aligned image to the frequency domain, the electronic device may distinguish between high frequency and low frequency by setting a frequency threshold; for example, a pixel point above the frequency threshold may be determined as a high-frequency pixel point, whereas a pixel point below the frequency threshold may be determined as a low-frequency pixel point. The electronic device may also input the frequency feature map converted to the frequency domain into a pre-trained classification model that classifies the high-frequency pixel points and the low-frequency pixel points. The above-described methods of obtaining the high-frequency feature map are merely examples, and the high-frequency feature map may be obtained by other methods, which is not limited in this application.
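Purely as an illustration of the threshold-based high/low frequency split mentioned above (not the wavelet approach used later), the following sketch moves the aligned luminance image to the frequency domain with a Fourier transform and separates coefficients by their distance from the spectrum center; the radius value is an assumed, illustrative threshold.

```python
import numpy as np

def split_by_frequency_threshold(y_channel, radius=30):
    # Transform to the frequency domain and shift the zero frequency to the center.
    spectrum = np.fft.fftshift(np.fft.fft2(y_channel.astype(np.float64)))
    h, w = y_channel.shape
    yy, xx = np.ogrid[:h, :w]
    dist = np.sqrt((yy - h / 2.0) ** 2 + (xx - w / 2.0) ** 2)
    high = np.where(dist > radius, spectrum, 0)   # coefficients above the threshold: high frequency
    low = np.where(dist <= radius, spectrum, 0)   # coefficients below the threshold: low frequency
    return high, low
```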
For example, the electronic device may perform wavelet transform processing on the image to be processed to obtain a 3-channel high-frequency feature map. The number of channels of the obtained high-frequency feature map may be related to the manner of the transform processing. Four components of the image to be processed may be obtained through the wavelet transform processing, for example, a component LL that is low-frequency in both the horizontal and vertical directions, a component LH that is low-frequency in the horizontal direction and high-frequency in the vertical direction, a component HL that is high-frequency in the horizontal direction and low-frequency in the vertical direction, and a component HH that is high-frequency in both the horizontal and vertical directions; thereby, LH, HL, and HH may be regarded as the 3-channel high-frequency feature map corresponding to the image to be processed. It will be appreciated that a high-frequency feature map obtained by other means may also be single-channel, which is not limited in this application, and the number of channels is not involved in the subsequent processing. Since the high-frequency information (high-frequency feature map) can be used to represent detailed information such as edges and textures of the image, and the low-frequency information can be used to represent the main information of the basic structure, the subsequent processing can process only the high-frequency feature map, so that the main information of the original image (the image to be processed) is not changed, thereby being beneficial to improving the image restoration effect.
Alternatively, before the electronic device converts the aligned image from the spatial domain to the frequency domain, the luminance component of the image to be processed may be extracted, resulting in a luminance component image. Specifically, the electronic device may first convert the image to be processed into an image in YUV format, where the Y (Luminance or Luma) channel represents brightness, that is, luminance information, and U and V represent chrominance (Chroma), that is, information such as the color and hue of the image. Then, the electronic device may extract the image of the Y channel in the image in YUV format, resulting in the luminance component image. Further, the electronic device may extract the high-frequency characteristic image in the luminance component image, resulting in a first image. The first image is the image used for the subsequent image restoration processing; it can be understood that the first image is a high-frequency characteristic image, and extracting the high-frequency information and the brightness information does not change the content in the image to be processed, so the first image includes the N image areas in the image to be processed. It can also be understood that the first image includes the high-frequency information and the brightness information of the N image areas, and each image area can respectively correspond to one feature class. Thus, the amount of calculation can be reduced when the image restoration processing is further performed, and the subsequent image restoration processing does not involve color processing, so that the color of the original image (the image to be processed) is not changed in the process of image restoration, and the effect of image restoration can be improved.
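The luminance extraction and wavelet-based high-frequency extraction described above might be sketched as follows with OpenCV and PyWavelets; the wavelet basis ("haar") and the BGR input format are assumptions, not details stated in this application.

```python
import cv2
import numpy as np
import pywt

def extract_first_image(aligned_bgr):
    # Convert to YUV and keep the Y (luminance) channel as the luminance component image.
    yuv = cv2.cvtColor(aligned_bgr, cv2.COLOR_BGR2YUV)
    y = yuv[:, :, 0].astype(np.float32)
    # Single-level 2-D discrete wavelet transform: LL carries the low-frequency
    # main content, while LH, HL and HH carry the high-frequency details.
    LL, (LH, HL, HH) = pywt.dwt2(y, "haar")
    first_image = np.stack([LH, HL, HH], axis=-1)   # 3-channel high-frequency feature map
    return first_image, LL, yuv

first_image, LL, yuv = extract_first_image(aligned)   # `aligned` comes from the alignment step
```

The untouched LL band and the YUV image are kept so that the inverse processing described later can rebuild the restored image.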
The image preprocessing of the image to be processed may further include segmentation processing, where the segmentation processing refers to determining the feature class of each image area in the image to obtain a segmented image. The segmented image may be a parsing map image (parsing map), which may be understood as an image obtained by segmenting the image areas of different feature classes in the image to be processed. The feature classes of the different image areas in the first image can thus be determined based on the segmented image for the subsequent image restoration processing. Alternatively, the segmentation processing may be performed on the aligned image by the electronic device after the alignment processing. The segmentation processing may be semantic segmentation processing or other segmentation processing, which is not limited in this application.
In one possible implementation manner, the electronic device performing segmentation processing on the image to be processed may be classifying all pixels included in the aligned image. For example, the aligned image is an image including a vehicle, a road, and a building, and the obtained segmented image may be divided into image areas with different colors, where the image area of one color may include the set of pixel points of one object. For example, the segmented image may include a vehicle image area, a road image area, and a building image area, where the vehicle image area comprises the set of pixel points of the vehicle, the road image area comprises the set of pixel points of the road, and the building image area comprises the set of pixel points of the building.
Further, the electronic device may detect the feature class corresponding to each image area, for example, determine that the feature class of the vehicle image area is the vehicle class, the feature class of the road image area is the road class, and so on. Alternatively, the electronic device may input the image to be processed into a segmentation model, which may be, for example, a convolutional neural network or the like, to obtain the segmented image. Alternatively, the segmentation processing may further distinguish specific object instances within the same feature class; for example, if the vehicle image area described above includes multiple vehicles, the image area of each vehicle may be determined.
Illustratively, explanation is given by taking the image to be processed as a face image. Referring to fig. 6, fig. 6 is a schematic diagram illustrating a segmentation result according to an embodiment of the present application. As shown in fig. 6, the upper side is the image to be processed, and the lower side is the result of the segmentation processing, that is, the segmented image (parsing map image). The electronic device may input the aligned image into a pre-trained face segmentation network, and segment the aligned image with the face segmentation network to obtain the segmented image. For example, the segmented image includes a hair image area, an eyebrow image area, an eye image area, an ear image area, a mouth image area, a nose image area, and a skin image area, and the colors of the different image areas are different. It should be noted that performing the segmentation processing on the image to be processed does not change the content of the image to be processed; the image is processed only to determine the feature class corresponding to each image area (each pixel point).
Step 402, the electronic device invokes a target image processing model to extract image features of each image region of the N image regions included in the first image, invokes the target image processing model to perform a first fusion process on the image features of each image region and dictionary features corresponding to each image region, and obtains reconstruction features corresponding to each image region.
In the embodiment of the present application, the target image processing model may be a model that performs image restoration processing on the image to be processed, and the structure of the target image processing model may be composed of an encoder and a decoder. The encoder may be used to extract image features of an input image, for example, the image features of the first image, where the image features are feature vectors; the decoder may be used to fuse the extracted image features with dictionary features to obtain the image restoration processing result. A dictionary feature is the dictionary feature corresponding to one of the image areas; a dictionary feature can be understood as being similar to a dictionary in daily life, and one dictionary feature comprises a plurality of reference dictionary features. The dictionary feature corresponding to an image area can be understood as the dictionary feature of the feature class corresponding to that image area. For example, the feature class corresponding to the eye image area included in the first image is the eye class, the dictionary feature is the eye dictionary feature corresponding to the eye class, and the plurality of reference dictionary features included in the dictionary feature are feature vectors belonging to the eye class. The electronic device may perform fusion processing (first fusion processing) on the image feature of each image area and the reference dictionary features included in the dictionary feature corresponding to each image area, so as to obtain a fusion result, that is, the reconstruction feature corresponding to each image area; the reconstruction feature is an intermediate quantity in the target image processing model and is used in the subsequent image restoration processing.
The image features extracted by the target image processing model may be the image features of each of the N image areas included in the first image. Taking the image feature of one image area as an example for explanation, it can be understood that the extracted image feature of the image area is a feature vector corresponding to each pixel point in the image area, and the feature vector corresponding to each pixel point carries the position information of the pixel point. For example, the image to be processed is a face image, and the eye image area of the high-frequency feature image (first image) of the image to be processed comprises 12 pixel points, so that the feature vector corresponding to each pixel point can be extracted through the target image processing model to obtain 12 feature vectors, and these 12 feature vectors are the image features extracted from the eye image area.
After extracting the image features of the first image, the electronic device needs to acquire dictionary features corresponding to each image area in the N image areas included in the first image. Therefore, the electronic device can compare the position information carried by the extracted feature vector with the position information in the segmented image (such as the analysis mapping image) obtained by the segmentation processing, so as to determine the feature class of the feature vector, and further obtain the dictionary feature corresponding to the image area according to the feature class. Specifically, the image preprocessing includes a segmentation process, where the segmentation process may be used to determine a feature class of each pixel, and the feature class of each pixel further includes location information of each pixel. The electronic device may compare the position information carried by the extracted image features (feature vectors) with the position information of each pixel point, so as to determine a feature class corresponding to the extracted feature vectors, and may obtain dictionary features of the feature class based on the feature class.
By taking an image to be processed as a face image as an example, the electronic device may extract image features of N image areas included in a high-frequency feature image (first image) of the face image by calling an encoder in a target image processing model, and then compare the image features according to position information carried by the image features and position information of pixel points in a segmentation result, so as to determine that the extracted image features such as an eye image area are image features belonging to an eye category, the extracted image features of a skin image area are image features belonging to a skin category, the extracted image features of a hair image area are image features belonging to a hair category, and the like, and further respectively obtain dictionary features corresponding to the eye image area, dictionary features corresponding to the skin image area, and dictionary features corresponding to the hair image area for performing first fusion processing.
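The following sketch illustrates how the parsing map could be used to group the extracted per-pixel feature vectors by feature class and pick the matching dictionary; the dictionary sizes, the feature dimension of 512, the class labels, and the random placeholder data are illustrative assumptions.

```python
import numpy as np

# Hypothetical per-class dictionaries: each maps a feature class to a bank of
# reference dictionary features (here 1024 vectors of dimension 512).
dictionaries = {
    "eye":  np.random.randn(1024, 512).astype(np.float32),
    "nose": np.random.randn(1024, 512).astype(np.float32),
    "skin": np.random.randn(1024, 512).astype(np.float32),
}

def gather_region_features(feature_map, parsing_map, class_id, class_name):
    # Select the feature vectors of every pixel whose parsing-map label equals
    # class_id, and return them together with the dictionary of that class.
    mask = parsing_map == class_id            # positions belonging to this image region
    region_features = feature_map[mask]       # shape (num_pixels, 512)
    return region_features, dictionaries[class_name]

# Hypothetical usage: feature_map is the encoder output, parsing_map the segmentation result.
feature_map = np.random.randn(256, 256, 512).astype(np.float32)
parsing_map = np.zeros((256, 256), dtype=np.int32)
parsing_map[100:112, 120:121] = 3             # e.g. label 3 marks a 12-pixel eye region
eye_features, eye_dictionary = gather_region_features(feature_map, parsing_map, 3, "eye")
```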
Further, after the dictionary features corresponding to the respective image areas are acquired, fusion processing (first fusion processing) may be performed for each image area and the reference dictionary features included in the dictionary feature corresponding to it, to obtain the reconstruction features. The electronic device may determine, by invoking the target processing model, the similarity between the image feature extracted from each image region and the plurality of reference dictionary features included in the dictionary feature; the similarity may be calculated from the distance between two feature vectors, for example, the similarity between two feature vectors may be determined by a cosine distance, a Euclidean distance, or the like. Then, the first fusion processing is performed by calling the target processing model based on the similarities between the image features and the plurality of reference dictionary features included in the dictionary features, to obtain the reconstruction features.
Specifically, the target image processing model may include a cross attention module (cross attention), where the cross attention module may be configured to determine a similarity between an image feature of each image area in the first image and each reference dictionary feature in the dictionary feature corresponding to each image area, so as to obtain a plurality of similarities. Furthermore, the cross attention module may be further configured to perform a first fusion process on a plurality of reference dictionary features included in the dictionary features based on a plurality of similarities, to obtain reconstructed features corresponding to each image region. Taking an image area N as an example, the image area N is any image area in N image areas included in the first image, after the electronic device extracts an image feature k of the image area N by calling a target image processing model, a cross attention module in the target image processing model can determine similarity between the image feature k of the image area N and each reference dictionary feature in a plurality of reference dictionary features corresponding to the image area N to obtain a plurality of similarity, and further perform first fusion processing on the plurality of reference dictionary features corresponding to the image area N based on the plurality of similarity to obtain a reconstruction feature corresponding to the image area N.
The mathematical expression of the cross attention module performing the first fusion processing on the plurality of reference dictionary features corresponding to the image area n according to the plurality of similarities may be as shown in formula 1:
Equation 1:    F_rec = Softmax(Q·Kᵀ / √d) · V

In equation 1, F_rec is the feature obtained by the first fusion processing, that is, the fusion result output by the cross attention module; taking the reconstruction feature corresponding to one image region as an example, F_rec is the reconstruction feature corresponding to image region n. Q is the image feature of the image region extracted by the target image processing model, such as the image feature k of image region n, and can be understood as the query value input to the cross attention module; for example, when the image to be processed is a face image, Q may be the input query value of the degraded face, that is, the extracted image feature of the face image. K is the index of the high-definition dictionary, and can be understood as a reference dictionary feature corresponding to image region n, or as the feature vector obtained by converting such a reference dictionary feature through a linear layer or a convolution layer. d in equation 1 is a constant, which may be referred to as a normalization constant, and the value of d may be, for example, 64. V is the feature value (that is, the feature vector) of the reference dictionary feature corresponding to image region n. Softmax is a normalization function used for normalization processing. Taking image region n as an example, the first fusion processing can be understood as taking the plurality of similarities as weights, and fusing each reference dictionary feature in the dictionary feature corresponding to image region n based on these weights to obtain the fusion result, that is, the reconstruction feature corresponding to image region n.
It will be appreciated that the dimensions of the feature vectors of the plurality of reference dictionary features corresponding to image region n for image feature k of image region n are the same, for example 512 dimensions. In the process of invoking the target image processing model by the electronic device to perform similarity calculation and the first fusion processing, the calculation may be performed based on a matrix converted from the feature vector. For example, by invoking the target image processing model to extract 12 feature vectors included in the eye image area, each feature vector is 512-dimensional, so that the feature vector can be converted into a 12×512 matrix, taking 1024 reference dictionary features included in the eye dictionary features as an example, the eye dictionary features can be converted into a 1024×512 matrix, and then the operation and processing of the matrix can be performed.
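A minimal numpy sketch of the soft matching (first fusion processing) described by equation 1 follows, using the example shapes above (12 eye-region feature vectors of dimension 512 and 1024 reference dictionary features); in a real model, Q, K, and V would typically be produced by learned linear or convolution layers, which is omitted here.

```python
import numpy as np

def soft_match(region_features, dictionary, d=64):
    Q = region_features            # e.g. (12, 512) image features of the eye region
    K = dictionary                 # e.g. (1024, 512) reference dictionary features (index)
    V = dictionary                 # feature values of the reference dictionary features
    scores = Q @ K.T / np.sqrt(d)                     # similarities between Q and every K
    scores -= scores.max(axis=1, keepdims=True)       # numerical stability for the softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)     # Softmax over the dictionary entries
    return weights @ V             # (12, 512) reconstruction features for the region

eye_features = np.random.randn(12, 512).astype(np.float32)
eye_dictionary = np.random.randn(1024, 512).astype(np.float32)
reconstructed = soft_match(eye_features, eye_dictionary)
```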
The process of the first fusion processing involves performing fusion processing on all the reference dictionary features in the dictionary feature corresponding to image region n. Therefore, the process of calculating the similarities between the image feature k of image region n and the plurality of reference dictionary features corresponding to image region n, and performing fusion based on the plurality of similarities and the plurality of reference dictionary features, may be referred to as soft matching. Soft matching is fusion processing based on all the reference dictionary features in the dictionary feature; compared with hard matching, which takes the single reference dictionary feature with the largest similarity in the dictionary feature as the reconstruction feature, soft matching makes the reconstruction feature obtained by the fusion processing more flexible and can make full use of the resources in the dictionary feature (namely, each reference dictionary feature). Fusing all the reference dictionary features in the dictionary feature can yield different reconstruction features, whereas the reconstruction features obtainable by hard matching are limited; therefore, soft matching yields a larger number of reconstruction features and has a wider applicable range, thereby improving the expansibility of the dictionary features.
And step 403, the electronic device calls the target image processing model to perform second fusion processing on the image characteristics of each image area and the reconstruction characteristics corresponding to each image area, so as to obtain a target reconstruction image corresponding to the image to be processed.
In this embodiment of the present application, the reconstruction features corresponding to each image area are the features obtained through the first fusion processing. The target reconstructed image corresponding to the image to be processed is an image obtained by performing image restoration processing on the image to be processed. Because the image restoration processing is performed not on the original image (the image to be processed) but on the first image, namely the high-frequency feature image obtained after performing image preprocessing on the image to be processed, the target reconstructed image can be understood as the result of the image restoration processing output by the target image processing model, and the target reconstructed image comprises the high-frequency restoration result information of the image restoration processing of the image to be processed. For example, the image to be processed is a face image, and the target reconstructed image includes high-definition face information of the face image, where the high-definition face information refers to face information with higher definition. The second fusion processing is fusion processing between the image features of each image area and the reconstruction features corresponding to each image area, and the fusion processing is performed for each image area, so that the target reconstructed image includes a reconstructed image corresponding to each image area, that is, the target reconstructed image includes the high-frequency restoration result information corresponding to each image area.
In the process of invoking the target image processing model to perform the second fusion processing on the image features of each image area and the reconstruction features corresponding to each image area, the subjective feelings and evaluations of different users for each image area in the image to be processed are different. For example, the image to be processed is a face image; the user may want a lower degree of restoration for the eyes, which indicates that the user wants the shape, texture, and the like of the eyes of the original image to be maintained, and a higher degree of restoration for the nose, which indicates that the user wants the shape, texture, and the like of the nose to be changed. Therefore, when the electronic device invokes the target image processing model, the image features and the reconstruction features of the different image areas can be subjected to the second fusion processing according to different fusion coefficients, so as to obtain the reconstructed images corresponding to the different image areas. The fusion coefficients can be used to indicate the reconstruction degree, namely the restoration degree, of the different image areas, and a fusion coefficient may be referred to as a superdivision coefficient (ratio).
Specifically, the electronic device may call the target image processing model to obtain the superdivision coefficient of each image area; the superdivision coefficients of the image areas may be called a fusion map (ratio map), and may also be called a high-low-definition fusion map. The fusion map may be a fusion map corresponding to the class of the image to be processed; for example, if the image to be processed is a face image, the face image corresponds to one fusion map, and if the image to be processed is a scenery image, the scenery image corresponds to another fusion map, and so on. The target image processing model may include a feature fusion module, where the feature fusion module is configured to perform the second fusion processing on the image features of each image area and the reconstruction features corresponding to each image area. Taking image region n as an example for explanation, image region n is any image area among the N image areas included in the first image, and the image feature of image region n extracted by calling the target image processing model is image feature k. The feature fusion module can be used to acquire the superdivision coefficient corresponding to image region n: the feature fusion module can first acquire the fusion map, further acquire the superdivision coefficient corresponding to image region n based on the fusion map, and perform the second fusion processing on image feature k and the reconstruction feature corresponding to image region n based on that superdivision coefficient, to obtain the reconstructed image corresponding to image region n; the reconstructed image corresponding to image region n can represent the high-frequency restoration result information corresponding to image region n obtained by the image restoration processing.
The mathematical expression of the feature fusion module for performing the second fusion process based on the image features of each image area and the reconstruction features corresponding to each image area may be as shown in formula 2:
Equation 2:    F_out = F_de + ω · (α ⊙ F_de + β),  where (α, β) = Conv(Concat(F_de, F_rec))

In equation 2, image region n is taken as an example for explanation. F_out is the reconstructed image (features) corresponding to image region n, namely the fusion result obtained by the feature fusion module. F_de is the image feature k of image region n, i.e., the image feature of image region n extracted by the target image processing model, and can also be understood as the degraded image feature; for example, when the image to be processed is a face image, F_de can be understood as the degraded face feature. F_rec is the reconstruction feature obtained by performing the first fusion processing based on each reference dictionary feature in the dictionary feature, and can be obtained by the above equation 1; since each reference dictionary feature is an image feature based on an image (image area) with higher definition, F_rec can also be understood as a high-definition reconstruction feature. ω is the superdivision coefficient corresponding to image region n, and ω is used to indicate the degree of reconstruction (degree of image restoration) of image region n. α and β are parameters output by a convolution layer: specifically, the image feature k (degraded image feature) of image region n and the reconstruction feature F_rec corresponding to image region n are first fused, where this fusion may also be referred to as feature stitching (concat) processing and can be understood as fusion completed by splicing (increasing the number of channels), and α and β are the parameters obtained after the spliced features pass through the convolution layer.
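A sketch of the second fusion processing in the spirit of equation 2 might look as follows; the function standing in for the convolution layer that produces alpha and beta is purely illustrative, and the value of omega would in practice come from the fusion map (ratio map).

```python
import numpy as np

def predict_alpha_beta(concat_features):
    # Hypothetical stand-in for the convolution layer applied to the concatenated
    # (degraded, reconstructed) features; a real model would use learned weights.
    half = concat_features.shape[-1] // 2
    alpha = np.tanh(concat_features[..., :half])
    beta = 0.1 * concat_features[..., half:]
    return alpha, beta

def second_fusion(degraded_feat, reconstructed_feat, omega):
    # Concatenate along the channel dimension (feature stitching / concat),
    # obtain alpha and beta, and let omega control how strongly the
    # dictionary-based reconstruction modifies the degraded features.
    concat = np.concatenate([degraded_feat, reconstructed_feat], axis=-1)
    alpha, beta = predict_alpha_beta(concat)
    return degraded_feat + omega * (alpha * degraded_feat + beta)

fused = second_fusion(np.random.randn(12, 512), np.random.randn(12, 512), omega=0.8)
```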
Alternatively, the target image processing model may include one or both of the cross attention module and the feature fusion module, or the target image processing model may include neither module but instead include functions for implementing the cross attention module and the feature fusion module.
Further, after the electronic device obtains the target reconstructed image including the reconstructed images corresponding to the image areas, since the target reconstructed image is a high-frequency characteristic image, the target reconstructed image can be reversely processed based on the image preprocessing of the image to be processed, so as to obtain a final target restoration image, wherein the target restoration image is a final result of the image restoration processing of the image to be processed, and is an image with improved definition and image quality compared with the image to be processed. The image preprocessing performed on the image to be processed may include an alignment process and a high-frequency extraction process. Thus, the electronic device can perform inverse processing on the target reconstructed image based on the high-frequency extraction processing and the alignment processing, and obtain a target restored image.
In one possible implementation, since the high-frequency extraction processing is performed on the alignment image obtained by the above-described alignment processing, the electronic device may extract the low-frequency characteristic image from the brightness component image, and combine the low-frequency characteristic image with the target reconstructed image to obtain the reconstructed brightness component image. Because the low-frequency information is not changed, the main body information of the image to be processed contained in the low-frequency information is not changed, so that the content of the image to be processed is not changed in the process of image restoration. The low-frequency characteristic image in the luminance component image may correspond to the high-frequency characteristic image, and may be in the frequency domain or the spatial domain, which is not limited in the present application. The low-frequency characteristic image and the target reconstructed image are combined in the spatial domain or in the frequency domain, so that the reconstructed brightness component image can be obtained.
The high-frequency feature map (first image) may be obtained by converting the aligned image into a frequency domain, so as to obtain a frequency domain feature map corresponding to the image to be processed, where the high-frequency pixel point set may be determined as the high-frequency feature map in the frequency domain, or the high-frequency pixel point set may be converted into a spatial domain, so as to obtain the high-frequency feature map. Thus, the electronic device may extract a low-frequency feature map of the luminance component image, where the low-frequency feature map corresponds to the high-frequency feature map, and may be a low-frequency pixel point set in the frequency domain, or may be an image in which the low-frequency pixel point set is converted from the frequency domain to the spatial domain. Further, the electronic device may combine the low-frequency characteristic image and the high-frequency characteristic image in the frequency domain, and transform the low-frequency characteristic image and the high-frequency characteristic image into the spatial domain by inverse transformation, to obtain a reconstructed luminance component image. The inverse transform may be, for example, an inverse wavelet transform (inverse wavelet transform, IWT), or may be other inverse transform methods, such as inverse discrete cosine transform, which is not limited in this application.
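Continuing the earlier wavelet sketch, recombining the untouched low-frequency band with the restored high-frequency bands and inverting the transform could look like this (the "haar" basis again being an assumption):

```python
import numpy as np
import pywt

def rebuild_luminance(LL, target_reconstructed):
    # target_reconstructed holds the restored LH, HL and HH bands as 3 channels;
    # combine them with the unchanged LL band and invert the wavelet transform
    # to obtain the reconstructed luminance component image.
    LH, HL, HH = (target_reconstructed[..., i] for i in range(3))
    y_reconstructed = pywt.idwt2((LL, (LH, HL, HH)), "haar")
    return np.clip(y_reconstructed, 0, 255).astype(np.uint8)
```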
Before extracting the high-frequency feature map, the electronic device may extract a luminance component image of the image to be processed, that is, extract an image of a Y channel of the image in YUV format, and further extract the high-frequency feature map of the image of the Y channel. Thus, after combining the low-frequency feature map and the high-frequency feature map (target reconstructed image), a transformed reconstructed image may be obtained based on the obtained reconstructed luminance component image and the chrominance component image extracted from the alignment image, for example, combining the reconstructed luminance component image (i.e., the image of the Y channel) and the image of the UV channel in the alignment image. Further, if the image to be processed is an image in another format, the converted reconstructed image in the YUV format may be converted into an image in a corresponding format, for example, if the image to be processed is an image in the RGB format, the obtained converted reconstructed image in the YUV format may be converted into the converted reconstructed image in the RGB format.
Further, the alignment processing in the image preprocessing may be performed in an inverse manner. Since the image preprocessing performs transformation processing on the image to be processed according to the position information of the key points and the position information of the preset key points, and this transformation processing may be performed on the basis of a transformation matrix, which may be referred to as a warp matrix, the electronic device may calculate the inverse of the warp matrix during the inverse alignment processing, and perform the inverse alignment processing on the format-converted image according to the inverse of the warp matrix, to obtain the target restoration image.
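Matching the alignment sketch given earlier, the inverse alignment could be sketched as follows, applying the inverse of the warp matrix to map the restored image back to the original coordinates; the output size argument is assumed to be the original image size.

```python
import cv2

def unalign(restored_image, warp_matrix, original_size):
    # Invert the 2x3 warp (affine) matrix used during alignment and warp the
    # restored image back from the standard space to the original image space.
    inverse_matrix = cv2.invertAffineTransform(warp_matrix)
    return cv2.warpAffine(restored_image, inverse_matrix, original_size)
```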
Referring to fig. 7 together, fig. 7 is a timing diagram of an image processing method according to an embodiment of the present application; as shown in fig. 7, it should be noted that the image to be processed is an image in RGB format for drawing and explanation. First, the electronic device performs image preprocessing on the image to be processed, where the image preprocessing includes performing alignment processing on the image to be processed to obtain an aligned image. After the alignment processing is performed on the image to be processed, the processing can be divided into two aspects. On the one hand, the high-frequency extraction processing is performed on the aligned image, which may specifically include extracting the high-frequency feature image of the luminance component of the image to be processed: the electronic device may first perform format conversion, that is, convert the format of the aligned image into the YUV format, and may further extract the image of the Y channel in the image in YUV format, so as to obtain the luminance component image; the brightness component image is then converted into the frequency domain through transform processing to extract the high-frequency characteristic image, so as to obtain the high-frequency characteristic image corresponding to the image to be processed. On the other hand, the aligned image is subjected to segmentation processing to obtain a segmented image, such as a parsing map image, where the parsing map image is used to indicate the N image areas included in the aligned image, and the segmentation processing is used to determine the feature class of each image area in the aligned image, or determine the feature class corresponding to each pixel point in the aligned image. Thus, the image preprocessing of the image to be processed is completed through these two aspects, and the first image is obtained, where the first image includes N image areas, namely N image segmentation areas, and each image area corresponds to one feature class respectively.
Further, the electronic device inputs the first image into the target image processing model, and extracts image features of the first image by calling the encoder of the target image processing model, where the image features may include the image features of the respective image areas in the first image. Dictionary features corresponding to the respective image areas are then determined based on the position information carried by the extracted image features and the segmented image, with different dictionary features corresponding to different image areas. The extracted image features and the dictionary features of each image area are matched and fused based on the cross-attention module in the target image processing model, so as to obtain the reconstruction features corresponding to each image area. Furthermore, the target image processing model may perform fusion processing based on the superdivision coefficient of each image area (representing the superdivision degree of the different image areas), the image features, and the reconstruction features corresponding to each image area, so as to obtain the reconstructed images corresponding to the image areas and thus the target reconstructed image.
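The matching and first fusion performed by the cross-attention module can be illustrated as scaled dot-product attention between the features of one image area (queries) and the dictionary features of that area's feature class (keys and values); this is an assumed formulation for illustration, not necessarily the exact module used in the model:

```python
import torch

def cross_attention_fuse(region_feat: torch.Tensor, dict_feat: torch.Tensor) -> torch.Tensor:
    """Soft-match region features against a class-specific dictionary and fuse them.
    region_feat: (L, D) features of one image area; dict_feat: (K, D) reference dictionary features."""
    scale = region_feat.shape[-1] ** 0.5
    weights = torch.softmax(region_feat @ dict_feat.T / scale, dim=-1)  # (L, K) similarities
    return weights @ dict_feat                                          # (L, D) reconstruction features
```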
Further, the electronic device may perform, on the target reconstructed image, the inverse processing corresponding to the image preprocessing, that is, the inverse of the alignment processing and of the high-frequency extraction processing. Specifically, the electronic device may acquire the low-frequency feature image of the luminance component image and combine the low-frequency feature image with the target reconstructed image to obtain a frequency-domain feature map. The electronic device may then perform inverse transform processing on the frequency-domain feature map, for example inverse wavelet transform processing, to obtain a reconstructed luminance component image (i.e., the image of the Y channel), and combine the image of the UV channels in the aligned image with the reconstructed luminance component image to obtain an image in YUV format. Further, the electronic device may convert the image in YUV format into an image in RGB format. Finally, anti-alignment processing corresponding to the alignment processing is performed on the format-converted image based on the position information of the key points and the position information of the preset key points, to obtain the target restoration image. Optionally, the anti-alignment processing may further include performing fusion processing on the image obtained by the inverse affine transformation and the image to be processed to obtain the target restoration image.
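The inverse transform step could be sketched as follows, again assuming PyWavelets with a Haar wavelet; here the reconstructed high-frequency sub-bands stand in for the target reconstructed image:

```python
import numpy as np
import pywt

def reconstruct_luminance(low_freq: np.ndarray, high_freq: np.ndarray) -> np.ndarray:
    """Combine the low-frequency map with the reconstructed high-frequency sub-bands
    and invert the wavelet transform to recover the Y channel (illustrative sketch)."""
    h, v, d = high_freq                               # (LH, HL, HH) sub-bands
    return pywt.idwt2((low_freq, (h, v, d)), 'haar')  # reconstructed luminance component image
```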
For an explanation taking the image to be processed as a face image as an example, please refer to fig. 8. Fig. 8 is a timing diagram of image restoration for a face image according to an embodiment of the present application. As shown in fig. 8, the image to be processed is taken as an initial face image in RGB format for drawing and explanation. Firstly, the electronic device may align the initial face image to a standard face space to obtain an aligned image, and then segment the aligned image to obtain a face parsing map used for representing the feature classes of different image areas. On the other hand, the electronic device may perform format conversion processing on the aligned image, converting the RGB format into the YUV format, then extract the luminance component image of the Y channel from the format-converted image, and obtain a high-frequency feature image corresponding to the initial face image based on wavelet transformation. The high-frequency feature image is also referred to as a first image, and the first image may also be referred to as a low-definition face image. The first image includes image areas corresponding to the respective parts of the face.
Further, the electronic device invokes an encoder (encoder) of the target image processing model to extract image features of the first image to obtain image features of the first image, wherein the image features of the first image comprise image features of each image region, and dictionary features corresponding to each image feature are determined according to position information carried by the image features and a face analysis mapping chart, and the dictionary features can be discrete high-definition dictionary features. Further, soft matching and fusion processing (first fusion processing) of the image features and each reference dictionary feature in the dictionary features is performed according to a cross-attention module in the target image processing model, so as to obtain reconstructed features corresponding to each image region, wherein a process of obtaining the reconstructed features may be referred to as a cross-attention mechanism. And a feature fusion module in the target image processing model fuses the extracted image features of each image region and the corresponding reconstruction features according to the superdivision coefficient representing the superdivision degree of different image regions (second fusion processing) to obtain the reconstruction images corresponding to each image region, and then combines the reconstructed images to obtain the target reconstruction image.
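One plausible reading of the second fusion controlled by the superdivision coefficient is a per-area blend between the original image features and the reconstruction features; the blending formula below is an assumption made only for illustration and is not the exact operation of the feature fusion module:

```python
import torch

def second_fusion(region_feat: torch.Tensor, recon_feat: torch.Tensor, coeff: float) -> torch.Tensor:
    """Blend original and reconstructed features of one image area; coeff in [0, 1]
    controls how strongly that area is reconstructed (assumed form)."""
    return (1.0 - coeff) * region_feat + coeff * recon_feat
```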
Furthermore, the electronic device may combine the target reconstructed image with the low-frequency feature image of the luminance component image, and perform inverse wavelet transform processing to obtain a reconstructed luminance component image, that is, the image of the Y channel. The image of the Y channel is combined with the image of the UV components (chrominance component image) in the aligned image to obtain a transformed reconstructed image in YUV format, and the obtained transformed reconstructed image in YUV format is converted into a transformed reconstructed image in RGB format. Finally, the electronic device can perform an inverse warp transformation on the transformed reconstructed image, mapping the pixel points back to their positions in the initial face image, and perform fusion processing with the initial face image to obtain a target restoration image, where the target restoration image is the high-definition face image. In this way, only the high-frequency part of the Y channel is processed, so neither the color nor the main content of the original image is changed. Meanwhile, because the dictionary features are classified based on the segmentation processing in the image preprocessing, when a face is reconstructed, the dictionary features of the corresponding feature classes are matched separately for the different image areas in the face image, and the soft matching and fusion processing is then performed based on these dictionary features, so that the model matches the dictionary features more accurately, the output is more flexible, and pseudo textures caused by matching errors are avoided in the face image. In addition, by performing fusion processing on different image areas according to different degrees of image restoration processing, the degree to which the original textures and contents of different parts are changed can be determined by the user, the user's requirements for image restoration processing can be met, and the output image restoration result is more flexible.
The construction of dictionary features is described in detail below.
Referring to fig. 9, fig. 9 is a schematic flow chart of constructing dictionary features according to an embodiment of the present application. The construction of the dictionary feature may include the following steps 901-905.
Step 901, the electronic device performs image preprocessing on the initial training sample set to obtain a training sample set.
In the embodiment of the present application, each training sample included in the initial training sample set is an image for training the initial reconstruction model, and each training sample may belong to the same class as the image to be processed; for example, the training samples included in the initial training sample set are face images, and the image to be processed is also a face image. The initial training sample set consists of images with higher definition and higher image quality, such as high-definition face images.
In an actual scenario, the image to be processed may be an image captured by an electronic device such as a mobile phone, and the training samples included in the initial training sample set may be images captured by a high-definition camera (such as a single-lens reflex camera). Each training sample in the training sample set may include a plurality of image areas, and the training sample set may include M training image areas in total. It should be noted that the M training image areas refer to training image areas corresponding to feature classes, with one feature class corresponding to one training image area, rather than the sum of the image areas included in all training samples. For example, in the case where the training sample images included in the initial training sample set are face images, the image areas of the respective parts each correspond to one feature class; for example, the feature class corresponding to the hair image area is the hair class, the feature class corresponding to the eye image area is the eye class, and so on.
It is to be understood that the value of M may be the same as or different from the value of the number N of image areas included in the first image, which is not limited in this application; for example, M may be greater than N. The feature classes of the N image areas included in the first image correspond to feature classes among the M training image areas; for example, the feature class of the image area n1 of the N image areas is the same as the feature class of the training image area m1 of the M training image areas. It can be understood that when the image to be processed belongs to the same class as the training samples in the initial training sample set, the feature classes corresponding to the image areas are the same. For example, when the image to be processed and the training sample images included in the initial training sample set are both face images, the hair image area among the M training image areas of the training sample set and the hair image area among the N image areas of the first image both belong to the hair feature class.
In one possible implementation, the electronic device may perform image preprocessing on an initial training sample set to obtain a training sample set, where the training sample set includes a plurality of training sample images. In order to facilitate the processing of the target image processing model based on the dictionary features, the electronic device performs the same image preprocessing on the initial training sample set as the subsequent image preprocessing on the image to be processed when constructing the dictionary features. The method specifically comprises the steps of carrying out alignment processing and high-frequency extraction processing on each training sample in the initial training sample set, and also comprises the step of carrying out segmentation processing on each training sample in the initial training sample set.
Specifically, the alignment processing may include the electronic device detecting the key points of each training sample to obtain the position information of the key points, and performing transformation processing on each training sample according to the position information of the preset key points in the reference standard image corresponding to the training sample, so as to obtain an aligned image corresponding to each training sample in the initial training sample set.
The high-frequency extraction processing may include the electronic device extracting high-frequency feature images from the aligned images corresponding to the respective training samples, and each training sample image may be the high-frequency feature image so obtained. Specifically, each aligned image may be converted into an image in YUV format, the luminance component image of the Y channel may be extracted, and the luminance component image corresponding to each training sample may be converted from the spatial domain to the frequency domain by wavelet transformation to extract the high-frequency feature image, thereby completing the image preprocessing of each training sample in the initial training sample set and obtaining the training sample images.
The segmentation process may include the electronic device determining different image regions in each training sample and the feature classes of each of the different image regions to obtain segmented images.
Step 902, the electronic device trains the initial reconstruction model based on the training sample set and the initial dictionary feature set, and obtains a first loss parameter.
In the embodiment of the present application, the initial dictionary feature set includes a plurality of initial dictionary features, and each initial dictionary feature may correspond to one of the M training image areas. It will be appreciated that each initial dictionary feature may correspond to a feature class, and each initial dictionary feature includes a plurality of initial reference dictionary features. Associating the dictionary features with feature classes in this way can facilitate the matching performed by the subsequent target image processing model, making the matching more accurate and avoiding pseudo textures in the image obtained by the image restoration processing due to matching errors of the target image processing model.
The plurality of initial dictionary features included in the initial dictionary feature set may be acquired by the electronic device, for example, sent by another electronic device, or obtained by the electronic device initializing the dictionary features. Initializing a dictionary feature may mean randomly selecting k samples from the sample set as the initial reference dictionary features of a certain initial dictionary feature (e.g., initial dictionary feature i). It should be noted that, since the plurality of initial dictionary features in the initial dictionary feature set are constructed in the same manner, for convenience of description, the construction of one initial dictionary feature is taken as an example for explanation.
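The initialization described above could be sketched as randomly selecting k feature vectors of one feature class as that class's initial reference dictionary features; the sizes 1024 and 512 match the example values given later in this description, and the names are illustrative:

```python
import numpy as np

def init_dictionary(class_features: np.ndarray, k: int = 1024) -> np.ndarray:
    """Randomly pick k feature vectors (e.g. 512-dimensional) from the features of one
    feature class as the initial reference dictionary features (illustrative sketch)."""
    idx = np.random.choice(len(class_features), size=k, replace=False)
    return class_features[idx].copy()
```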
Specifically, the initial reconstruction model may be a model for performing restoration processing on an image; the image restoration processing may be performed, based on the initial dictionary features, on the input training sample set. The initial reconstruction model can extract the image features of the training sample images in the training sample set and perform image restoration processing based on the initial dictionary features and the extracted image features, where the image restoration processing aims to restore each training sample in the initial training sample set. It should be noted that, since what is input into the initial reconstruction model is the training sample set, that is, the high-frequency feature images, the result of the image restoration processing may be obtained by performing, on the output of the initial reconstruction model, the inverse processing corresponding to the image preprocessing.
In one possible implementation manner, the electronic device may input the training sample set into the initial reconstruction model for training to obtain a first loss parameter, where the first loss parameter may be obtained from two parts of difference data. One part may be determined based on the difference data between an image feature extracted by the initial reconstruction model and the initial reference dictionary feature most similar to that extracted image feature in the corresponding initial dictionary feature, where the image feature extracted by the initial reconstruction model may be the image feature of an image area, and the initial dictionary feature is the initial dictionary feature corresponding to that image area. The other part may be derived based on the difference data between the obtained image restoration results and the respective training samples in the initial training sample set. Since the training sample images in the training sample set do not carry any label images, this method may also be referred to as training the initial reconstruction model by means of self-supervised learning.
Specifically, after the training sample set is input into the initial reconstruction model, the initial reconstruction model may extract the image features of each training sample image in the training sample set, where the extracted image features may include image features of one or more of the M training image areas. For convenience of description, the image area m1 and the construction of the dictionary feature corresponding to the image area m1 are taken as an example for explanation; the dictionary features corresponding to the other image areas are constructed in the same manner. The initial reconstruction model may extract the image features of the respective training sample image, which may include the image feature of the image area m1. Further, the electronic device may determine the corresponding feature class according to the position information carried by the image feature and the segmented image, that is, determine the feature class corresponding to the image area m1, and acquire the initial dictionary feature corresponding to the feature class of the image area m1. It can be understood that each initial reference dictionary feature in the initial dictionary feature corresponding to the image area m1 belongs to the feature class corresponding to the image area m1.
Further, the similarity between the image feature of the image area m1 and each initial reference dictionary feature in the initial dictionary feature corresponding to the image area m1 may be determined, so as to determine the initial reference dictionary feature most similar to the image feature of the image area m1. The similarity can be determined in a nearest-neighbour matching manner: the initial reference dictionary features are feature vectors and the extracted image features are also feature vectors, so the similarity can be determined by calculating the distance between the two feature vectors. Since one of the training targets of the initial reconstruction model is to make the extracted image feature and the initial reference dictionary feature with the highest similarity as close as possible, the first partial loss parameter can be determined according to the difference data between the initial reference dictionary feature with the highest similarity and the extracted image feature.
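The nearest-neighbour matching can be illustrated as a distance computation between the extracted feature vector and every reference dictionary feature of the corresponding class (Euclidean distance is assumed here):

```python
import numpy as np

def nearest_reference(feature: np.ndarray, dictionary: np.ndarray):
    """Return the index of the reference dictionary feature closest to `feature`,
    together with that distance (illustrative sketch)."""
    distances = np.linalg.norm(dictionary - feature, axis=1)
    best = int(np.argmin(distances))
    return best, float(distances[best])
```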
Another training target of the initial reconstruction model is that the image obtained by the image restoration processing should be as similar as possible to the corresponding original training sample in the initial training sample set, that is, each training sample should be restored as faithfully as possible through the initial reconstruction model. For example, the initial reconstruction model may perform fusion processing on the extracted image feature of the image area m1 and the initial reference dictionary features in the initial dictionary feature corresponding to the image area m1 to obtain a training reconstructed image, where the training reconstructed image may include a reconstructed image of the image area m1, and the second partial loss parameter may then be determined according to the difference data between the reconstructed image of the image area m1 in the training reconstructed image and the original image of the image area m1 in the corresponding training sample. Taking one image area as an example, the second partial loss parameter is in effect determined based on the difference data between the reconstructed image and the training sample.
Further, the electronic device may determine the first loss parameter according to the first partial loss parameter and the second partial loss parameter; for example, the first loss parameter may be the sum of the first partial loss parameter and the second partial loss parameter, or the first loss parameter may otherwise be composed of the first partial loss parameter and the second partial loss parameter.
Alternatively, the first loss parameter may be a value of a loss function, which may be, for example, an average absolute value error (mean absolute error, MAE), also referred to as L1 loss, or a contrast loss function (GAN loss), which is not limited in this application.
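Under the description above, the first loss parameter could, for illustration, be assembled as the sum of a feature-matching term and a reconstruction term using the L1 loss mentioned above; the equal weighting is an assumption:

```python
import torch
import torch.nn.functional as F

def first_loss(region_feat, matched_dict_feat, recon_image, train_sample):
    """First partial loss: pull the extracted feature towards its most similar dictionary feature.
    Second partial loss: pull the reconstructed image towards the original training sample."""
    feature_loss = F.l1_loss(region_feat, matched_dict_feat)
    recon_loss = F.l1_loss(recon_image, train_sample)
    return feature_loss + recon_loss
```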
Step 903, the electronic device adjusts the model parameters and the initial dictionary feature set of the initial reconstruction model according to the first loss parameter to obtain an adjustment dictionary feature set and a reference reconstruction model.
In this embodiment of the present application, the electronic device may adjust the model parameters and the initial dictionary feature set of the initial reconstruction model according to the first loss parameter, which may be understood as using the initial dictionary feature set as a part of the model parameters of the initial reconstruction model to adjust the model parameters, and simultaneously adjust the initial reference dictionary features in the initial dictionary feature set. And after the adjustment is completed, obtaining an adjustment dictionary feature set and a reference reconstruction model. The electronic device may adjust each initial dictionary feature in the initial dictionary feature set, for example, may adjust an initial reference dictionary feature in each initial dictionary feature, that is, adjust feature vectors in each initial dictionary feature, so as to obtain an adjustment dictionary feature set, where the adjustment dictionary feature set includes a plurality of adjustment dictionary features, and each adjustment dictionary feature includes a plurality of reference adjustment dictionary features.
The model parameters of the initial reconstructed model can be adjusted by a gradient descent method (gradient descent). When the model parameters are updated by using a gradient descent method, the gradient of the loss function is calculated, the model parameters are iteratively updated according to the gradient, so that the initial reconstruction model is gradually converged to improve the accuracy of regression processing of the model, wherein the accuracy can be the similarity of the training reconstruction image and each training sample in the initial training sample set.
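Since the dictionary features are treated as part of the model parameters, the joint gradient-descent update could be sketched as follows; the optimizer choice, learning rate, and the placeholder model are assumptions for illustration:

```python
import torch

# Initial reference dictionary features of one class, made trainable (K=1024, D=512 assumed)
codebook = torch.nn.Parameter(torch.randn(1024, 512))
model = torch.nn.Conv2d(3, 3, 3, padding=1)  # placeholder for the initial reconstruction model

optimizer = torch.optim.Adam(list(model.parameters()) + [codebook], lr=1e-4)

def training_step(loss: torch.Tensor) -> None:
    """One gradient-descent update of both the model parameters and the dictionary features,
    assuming `loss` was computed from outputs that depend on `model` and `codebook`."""
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```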
After the first loss parameter is obtained, it is necessary to judge whether the first loss parameter meets the training ending condition. When the first loss parameter does not meet the training ending condition, the operation of adjusting the model parameters and the initial dictionary feature set of the initial reconstruction model according to the first loss parameter is executed, that is, iterative training is continuously performed on the initial reconstruction model until the obtained loss parameter meets the training ending condition, thereby obtaining the constructed dictionary features.
Step 904, the electronic device trains the reference reconstruction model based on the training sample set to obtain a second loss parameter.
In the embodiment of the application, in the case where the first loss parameter does not meet the training ending condition, training continues to be performed on the parameter-adjusted reference reconstruction model based on the training sample set. Similarly, each training sample image in the training sample set is input into the reference reconstruction model, and the reference reconstruction model is trained based on the training sample set to obtain a second loss parameter. The second loss parameter may also be composed of two partial loss parameters. One part may be determined according to the difference data between an image feature extracted by the reference reconstruction model from a training sample image (which may be the image feature of a certain image area in the training sample image) and the reference adjustment dictionary feature with the highest similarity among the adjustment dictionary features corresponding to that image area. The other part may be obtained based on the difference data between the reconstructed image output by the reference reconstruction model and the corresponding training sample in the initial training sample set; specifically, it may be determined based on the difference data between the reference reconstructed image, obtained by performing image restoration processing based on the image features extracted by the reference reconstruction model and the adjustment dictionary features, and the corresponding training sample.
For example, the reference reconstruction model extracts the image feature of the image area m1, and then determines, by a nearest-neighbour algorithm, the reference adjustment dictionary feature with the highest similarity among the adjustment dictionary features corresponding to the image area m1, where the adjustment dictionary feature is the adjustment dictionary feature in the adjustment dictionary feature set that belongs to the feature class corresponding to the image area m1. Further, the electronic device may determine the third partial loss parameter based on the difference data between the image feature of the image area m1 and the reference adjustment dictionary feature with the highest similarity to it, and determine the fourth partial loss parameter according to the difference data between the reference reconstructed image output by the reference reconstruction model, that is, the image restoration processing result, and the corresponding training sample. A second loss parameter is then determined based on the third partial loss parameter and the fourth partial loss parameter. It should be noted that the result of the image restoration processing may be obtained by performing, on the output of the reference reconstruction model, the inverse processing corresponding to the image preprocessing.
Alternatively, the second loss parameter may use the same loss function as the first loss parameter, for example, L1 loss or GAN loss.
Step 905, the electronic device determines, in the case where the second loss parameter satisfies the training end condition, that the dictionary feature corresponding to the image area n1 is the dictionary feature corresponding to the training image area m1.
In this embodiment of the present application, the training end condition may be, for example, that the value of the loss function in the loss parameter (e.g., the second loss parameter) falls within a preset threshold range, for example, that the value of the loss function reaches a minimum; or the training end condition may be that, among the values of the loss function obtained in several consecutive iterations of training, the number of times the difference between the values obtained in two adjacent iterations is smaller than a preset difference threshold is greater than or equal to a preset count threshold. The application is not limited thereto, and the condition may be determined according to the usage scenario. The electronic device may, in the case where the loss parameter (e.g., the second loss parameter) satisfies the training end condition, determine the adjustment dictionary feature set at that time as the dictionary feature set used for the image restoration processing, the dictionary feature set including the dictionary features corresponding to the respective feature classes. When the target image processing model is called to perform image restoration processing on the first image, the dictionary feature set to which the dictionary features corresponding to the respective image areas belong is this adjustment dictionary feature set.
For example, when the above-mentioned target image processing model is called to perform image restoration processing on the first image including N image areas, the image area n1 is one of the N image areas, and the feature class corresponding to the image area n1 is the same as the feature class of the training image area m1; the adjustment dictionary feature corresponding to the training image area m1 in the constructed adjustment dictionary feature set may then be used as the dictionary feature corresponding to the image area n1. It should be noted that each adjustment dictionary feature in the adjustment dictionary feature set constructed in this way is the dictionary feature corresponding to one image area, and the feature vectors included in each dictionary feature are its reference dictionary features. The number of reference dictionary features is not limited in this application and may be determined according to the actual business scenario; for example, in this application the number of reference dictionary features in each dictionary feature is 1024, each feature vector is a D-dimensional vector, and the value of D may be, for example, 512. The image restoration processing of any image of the image class corresponding to the initial training sample set can theoretically be completed through the constructed dictionary features (the adjustment dictionary feature set). For example, if the initial training sample set consists of face images, each adjustment dictionary feature in the adjustment dictionary feature set may be the dictionary feature corresponding to one part of the face, and the reconstruction of a face can theoretically be completed based on the adjustment dictionary feature set.
Alternatively, the electronic device may use a decoder (decoder) of the reference reconstruction model satisfying the training end condition as the decoder of the above-described target image processing model.
The initial reconstruction model may be any model having an image feature extraction function and an image restoration function, for example, a convolutional neural network. This application is explained by taking the structure of the convolutional neural network as a U-shaped network (U-net) structure as an example. The U-net is a deep neural network structure composed of an encoder and a decoder. The encoder includes downsampling processing and convolution processing and is used for extracting image features, for example, the image features of each training sample image in the training sample set; the encoder forms the left half of the U-shaped network and serves as the feature extraction network. The decoder of the U-net is used for fusion processing based on the extracted image features and the dictionary features; it includes feature concatenation (concat) processing and convolution processing, so as to obtain the result of the fusion processing, such as the result of image restoration processing, for example, the training reconstructed image and the reference reconstructed image.
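A compact sketch of such a U-shaped encoder/decoder is given below; the channel sizes and depth are assumptions, and the decoder concatenates the skip features before convolution as described above:

```python
import torch
import torch.nn as nn

class TinyUNet(nn.Module):
    """Minimal U-net: encoder = downsampling + convolution, decoder = upsampling + concat + convolution."""
    def __init__(self, ch: int = 32):
        super().__init__()
        self.enc1 = nn.Sequential(nn.Conv2d(1, ch, 3, padding=1), nn.ReLU())
        self.enc2 = nn.Sequential(nn.Conv2d(ch, ch * 2, 3, stride=2, padding=1), nn.ReLU())
        self.up = nn.ConvTranspose2d(ch * 2, ch, 2, stride=2)
        self.dec = nn.Conv2d(ch * 2, 1, 3, padding=1)  # convolution after feature concatenation

    def forward(self, x):
        f1 = self.enc1(x)                       # encoder features kept as a skip connection
        f2 = self.enc2(f1)                      # downsampled features
        u = self.up(f2)                         # upsample back to the skip resolution
        return self.dec(torch.cat([u, f1], 1))  # concat + convolution in the decoder
```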
Referring to fig. 10 together, fig. 10 is a timing diagram for constructing dictionary features according to an embodiment of the present application, and as shown in fig. 10, it should be noted that, drawing and explaining are performed by taking an example in which each image included in the initial training sample set is an image in RGB format. The electronic device may perform image preprocessing on an initial training sample set with higher definition and higher image quality, where the image preprocessing may include performing alignment processing on each image in the initial training sample set, converting the format into YUV format, extracting a luminance component to obtain a luminance component image, converting the luminance component image into a frequency domain, and extracting a high-frequency feature image to obtain a high-frequency feature image corresponding to each training sample in the initial training sample set, that is, obtain a training sample set, where each high-frequency feature image is a training sample image in the training sample set. The aligned images corresponding to the respective training samples may also be subjected to a segmentation process for determining the feature class of the respective image areas included in the respective aligned images. It will be appreciated that the image preprocessing performed on each training sample in the initial training sample set is the same as the image preprocessing performed on the image to be processed described above.
Further, inputting the obtained training sample images into an initial reconstruction network, extracting image features of different image areas of the training sample images in the training sample set through the initial reconstruction network, and carrying out fusion processing based on the initial dictionary features corresponding to the different image areas in the initial dictionary feature set to obtain a training reconstruction image. The method comprises the steps of determining a part of loss parameters based on image features of different image areas of each training sample image in a training sample set extracted by an initial reconstruction network and matched initial reference dictionary features, wherein the initial reference dictionary features are image features with highest similarity to the image features of the image areas in the initial dictionary features, and determining the other part of loss parameters based on each training sample image and training reconstruction image in the training sample set. Further, model parameters of the initial reconstruction model and the initial dictionary features in the initial dictionary feature set may be adjusted based on the two-part loss. In the continuous iterative training process, the model parameters and the dictionary features are adjusted to obtain an adjustment dictionary feature set at the end of training as a dictionary feature set for image restoration processing, wherein the adjustment dictionary feature set comprises adjustment dictionary features corresponding to different image areas, namely dictionary features corresponding to different feature categories.
For an explanation taking the initial training sample set as face images as an example, please refer to fig. 11. Fig. 11 is a timing diagram of constructing face dictionary features according to an embodiment of the present application. As shown in fig. 11, the drawing and explanation take the images included in the initial training sample set as high-definition face images in RGB format as an example. Firstly, the electronic device performs alignment processing on the high-definition face images, that is, each high-definition face image is aligned to the standard face space, and the aligned image corresponding to each high-definition face image is obtained. The aligned images are segmented to obtain the face parsing map corresponding to each aligned image, where the face parsing map is used for representing the feature classes of the different image areas. Further, format conversion processing is performed on each aligned image to convert it into the YUV format, the luminance component image of the Y channel is extracted, and wavelet transformation is performed on the luminance component image to obtain the high-frequency feature image corresponding to the high-definition face image, thereby obtaining the training sample image corresponding to each high-definition face image; these training sample images may form the training sample set.
Further, the electronic device invokes the initial reconstruction model to extract the image features of each training sample image, obtaining the image features of each training image area in each training sample image, and determines the initial dictionary feature corresponding to each image area according to the position information carried by the image features and the face parsing map, where the initial dictionary feature may be the initial dictionary feature of a certain feature class in the initial dictionary feature set, that is, discrete high-definition dictionary features. Furthermore, the initial reconstruction model may match the extracted image features, that is, determine, from the initial dictionary feature, the reference initial dictionary feature with the highest similarity to the image feature of the extracted image area, and calculate one loss parameter between them; another loss parameter is calculated according to the result (training reconstructed image) output by the initial reconstruction model and each high-definition face image in the initial training sample set, and the model parameters of the initial reconstruction model and the initial dictionary features in the initial dictionary feature set are adjusted based on these two loss parameters. Dictionary features corresponding to different parts of the face are obtained through iterative training.
The built dictionary features are discrete vectors, and each reference dictionary feature in the dictionary features is an image feature for representing an image (image area) with better image quality, so that the built dictionary features can be called as a high-definition discrete dictionary, each reference dictionary feature in the dictionary features can be called as a discrete high-definition dictionary feature, and the above-mentioned building process can be called as high-definition discrete dictionary recovery. Therefore, a dictionary feature is respectively constructed based on different image areas (feature types), so that the matching difficulty is lower, the matching can be conveniently carried out during the image restoration processing, and the image restoration effect is improved.
Referring to fig. 12, fig. 12 is a schematic structural diagram of an image processing apparatus according to an embodiment of the present application. The image processing apparatus 1200 shown in fig. 12 may include: a processing unit 1201, a calling unit 1202, a training unit 1203, an adjusting unit 1204, a determining unit 1205, and an extracting unit 1206. Wherein, the detailed description of each unit is as follows:
a processing unit 1201, configured to perform image preprocessing on an image to be processed to obtain a first image, where the first image includes N image areas, and N is an integer greater than 1;
A calling unit 1202, configured to call a target image processing model to extract image features of each image area in the N image areas, and call the target image processing model to perform a first fusion process on the image features of each image area and dictionary features corresponding to each image area, so as to obtain reconstruction features corresponding to each image area;
the calling unit 1202 is further configured to call the target image processing model to perform a second fusion process on the image features of the image areas and the reconstruction features corresponding to the image areas, so as to obtain a target reconstructed image corresponding to the image to be processed, where the target reconstructed image corresponding to the image to be processed includes the reconstructed image corresponding to the image areas.
In one possible implementation, the target image processing model includes a cross-attention module; the dictionary features corresponding to the image area n comprise a plurality of reference dictionary features;
the cross attention module is used for determining the similarity between the image feature k of the image region n and a plurality of reference dictionary features corresponding to the image region n to obtain a plurality of similarities, and performing the first fusion processing on the plurality of reference dictionary features corresponding to the image region n according to the plurality of similarities to obtain a reconstruction feature corresponding to the image region n;
The image region n is any one of the N image regions, and the image feature k is an extracted image feature of the image region n.
In one possible implementation, the target image processing model includes a feature fusion module;
the feature fusion module is used for acquiring a superdivision coefficient corresponding to an image area n, and carrying out second fusion processing on an image feature k of the image area n and a reconstruction feature corresponding to the image area n based on the superdivision coefficient to obtain a reconstruction image corresponding to the image area n, wherein the superdivision coefficient corresponding to the image area n is used for indicating the reconstruction degree of the image area n;
the image region N is any one of the N image regions, and the image feature k is an extracted image feature of the image region N.
In a possible implementation manner, the processing unit 1201 is further configured to perform image preprocessing on the initial training sample set to obtain a training sample set; the training sample set comprises M training image areas; a feature class of a training image area m1 in the M training image areas is the same as a feature class of an image area n1 in the N image areas, and M is an integer greater than 1;
The training unit 1203 is configured to train the initial reconstruction model based on the training sample set and the initial dictionary feature set, to obtain a first loss parameter;
an adjusting unit 1204, configured to adjust model parameters and an initial dictionary feature set of the initial reconstruction model according to the first loss parameter, to obtain an adjusted dictionary feature set and a reference reconstruction model, where the adjusted dictionary feature set includes dictionary features corresponding to each of the M training image areas;
the training unit 1203 is configured to train the reference reconstruction model based on the training sample set to obtain a second loss parameter;
a determining unit 1205, configured to determine, when the second loss parameter meets a training end condition, that the dictionary feature corresponding to the image area n1 is the dictionary feature corresponding to the training image area m1.
In one possible implementation manner, the processing unit 1201 is configured to perform image preprocessing on an image to be processed to obtain a first image, specifically configured to:
carrying out alignment processing on the image to be processed according to a reference standard image corresponding to the image to be processed to obtain an alignment image;
extracting the brightness component of the aligned image to obtain a brightness component image;
And extracting a high-frequency characteristic image in the brightness component image to obtain the first image.
In one possible implementation manner, the reference standard image corresponding to the image to be processed includes position information of P preset key points, where P is an integer greater than or equal to 1; the processing unit 1201 is configured to perform alignment processing on the image to be processed according to a reference standard image corresponding to the image to be processed, so as to obtain an aligned image, and specifically is configured to: identifying the position information of P key points in the image to be processed;
according to the position information of the P preset key points and the position information of the P key points, carrying out alignment processing on the image to be processed to obtain the alignment image; the position information of the key point P in the alignment image is the same as the position information of a preset key point corresponding to the key point P, and the key point P is any one of the P key points.
In one possible implementation, the image processing apparatus 1200 further includes:
an extracting unit 1206, configured to extract a low-frequency feature image in the luminance component image, and combine a target reconstructed image corresponding to the image to be processed with the low-frequency feature image to obtain a reconstructed luminance component image;
The extracting unit 1206 is further configured to extract a chrominance component in the aligned image, to obtain a chrominance component image;
the processing unit 1201 is further configured to combine the reconstructed luminance component image and the chrominance component image to obtain a transformed reconstructed image;
the processing unit 1201 is further configured to perform anti-alignment processing on the transformed reconstructed image to obtain a target restoration image corresponding to the image to be processed.
It should be noted that, for the functions of the functional units in the image processing apparatus 1200 described in the embodiments of the present application, reference may be made to the related descriptions of steps 401 to 403 of the method embodiment in fig. 4 and steps 901 to 905 of the method embodiment in fig. 9, which are not repeated here.
In the above embodiments, it may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, may be implemented in whole or in part in the form of a computer program product. The computer program product described above includes one or more computer instructions. When the computer program instructions described above are loaded and executed on a computer, the processes or functions described above in accordance with the present application are produced in whole or in part. The computer may be a general purpose computer, a special purpose computer, a computer network, or other programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another computer-readable storage medium, for example, from one website, computer, server, or data center by wired (e.g., coaxial cable, fiber optic, digital subscriber line), or wireless (e.g., infrared, wireless, microwave, etc.) means. The computer readable storage media may be any available media that can be accessed by a computer or a data storage device such as a server, data center, or the like that contains an integration of one or more available media. The usable medium may be a magnetic medium (e.g., a floppy Disk, a hard Disk, a magnetic tape), an optical medium (e.g., a DVD), or a semiconductor medium (e.g., a Solid State Disk), etc.
In the foregoing embodiments, the descriptions of the embodiments are emphasized, and for parts of one embodiment that are not described in detail, reference may be made to related descriptions of other embodiments.
It should be noted that, for simplicity of description, the foregoing method embodiments are all expressed as a series of action combinations, but it should be understood by those skilled in the art that the present application is not limited by the order of actions described, as some steps may be performed in other order or simultaneously in accordance with the present application. Further, those skilled in the art will also appreciate that the embodiments described in the specification are all preferred embodiments, and that the acts and modules referred to are not necessarily required in the present application.
In the several embodiments provided in this application, it should be understood that the disclosed apparatus may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, such as the above-described division of units, merely a division of logic functions, and there may be additional manners of dividing in actual implementation, such as multiple units or components may be combined or integrated into another system, or some features may be omitted, or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be an indirect coupling or communication connection via some interfaces, devices or units, or may be in electrical or other forms.
The units described above as separate components may or may not be physically separate, and components shown as units may or may not be physical units, may be located in one place, or may be distributed over a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in the embodiments of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.
The integrated units described above, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present application may be essentially or a part contributing to the prior art or all or part of the technical solution may be embodied in the form of a software product stored in a storage medium, including several instructions for causing an electronic device (which may be a personal computer, a server or a network device, or may be a processor in an electronic device in particular) to perform all or part of the steps of the above-mentioned method of the embodiments of the present application. Wherein the aforementioned storage medium may comprise: various media capable of storing program codes, such as a U disk, a removable hard disk, a magnetic disk, a compact disk, a Read-Only Memory (abbreviated as ROM), or a random access Memory (Random Access Memory, abbreviated as RAM), are provided.
The above embodiments are merely for illustrating the technical solution of the present application, and not for limiting the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the corresponding technical solutions.

Claims (10)

1. An image processing method, comprising:
image preprocessing is carried out on an image to be processed to obtain a first image, wherein the first image comprises N image areas, and N is an integer greater than 1;
invoking a target image processing model to extract image features of each image region in the N image regions, and invoking the target image processing model to perform first fusion processing on the image features of each image region and dictionary features corresponding to each image region to obtain reconstruction features corresponding to each image region;
and calling the target image processing model to perform second fusion processing on the image characteristics of each image area and the reconstruction characteristics corresponding to each image area to obtain a target reconstruction image corresponding to the image to be processed, wherein the target reconstruction image corresponding to the image to be processed comprises the reconstruction images corresponding to each image area.
2. The method of claim 1, wherein the target image processing model comprises a cross-attention module; the dictionary features corresponding to the image area n comprise a plurality of reference dictionary features;
the cross attention module is used for determining the similarity between the image feature k of the image region n and a plurality of reference dictionary features corresponding to the image region n to obtain a plurality of similarities, and performing the first fusion processing on the plurality of reference dictionary features corresponding to the image region n according to the plurality of similarities to obtain a reconstruction feature corresponding to the image region n;
the image region N is any one of the N image regions, and the image feature k is an extracted image feature of the image region N.
3. The method of claim 1, wherein the target image processing model comprises a feature fusion module;
the feature fusion module is used for acquiring a superdivision coefficient corresponding to an image area n, and carrying out second fusion processing on an image feature k of the image area n and a reconstruction feature corresponding to the image area n based on the superdivision coefficient to obtain a reconstruction image corresponding to the image area n, wherein the superdivision coefficient corresponding to the image area n is used for indicating the reconstruction degree of the image area n;
The image region n is any one of the N image regions, and the image feature k is an extracted image feature of the image region n.
4. The method according to claim 1, wherein the method further comprises:
performing image preprocessing on the initial training sample set to obtain a training sample set; the training sample set comprises M training image areas; a feature class of a training image area m1 in the M training image areas is the same as a feature class of an image area n1 in the N image areas, and M is an integer greater than 1;
training an initial reconstruction model based on the training sample set and the initial dictionary feature set to obtain a first loss parameter;
according to the first loss parameters, adjusting model parameters of the initial reconstruction model and an initial dictionary feature set to obtain an adjustment dictionary feature set and a reference reconstruction model, wherein the adjustment dictionary feature set comprises dictionary features corresponding to each training image area in the M training image areas;
training the reference reconstruction model based on the training sample set to obtain a second loss parameter;
and under the condition that the second loss parameter meets the training ending condition, determining the dictionary feature corresponding to the image area n1 as the dictionary feature corresponding to the training image area m1.
5. The method according to claim 2, wherein the performing image preprocessing on the image to be processed to obtain the first image includes:
carrying out alignment processing on the image to be processed according to a reference standard image corresponding to the image to be processed to obtain an alignment image;
extracting the brightness component of the aligned image to obtain a brightness component image;
and extracting a high-frequency characteristic image in the brightness component image to obtain the first image.
6. The method according to claim 5, wherein the reference standard image corresponding to the image to be processed includes position information of P preset key points, and P is an integer greater than or equal to 1;
the aligning the image to be processed according to the reference standard image corresponding to the image to be processed to obtain an aligned image, which comprises the following steps:
identifying the position information of P key points in the image to be processed;
according to the position information of the P preset key points and the position information of the P key points, carrying out alignment processing on the image to be processed to obtain the alignment image; the position information of the key point P in the alignment image is the same as the position information of a preset key point corresponding to the key point P, and the key point P is any one of the P key points.
7. The method of claim 5, wherein the method further comprises:
extracting a low-frequency characteristic image in the brightness component image, and combining a target reconstructed image corresponding to the image to be processed with the low-frequency characteristic image to obtain a reconstructed brightness component image;
extracting a chrominance component in the aligned image to obtain a chrominance component image;
combining the reconstructed luminance component image with the chrominance component image to obtain a transformed reconstructed image;
and performing anti-alignment processing on the transformed reconstructed image to obtain a target restoration image corresponding to the image to be processed.
8. An electronic device comprising a processor and a memory, the processor and the memory being interconnected, wherein the memory is adapted to store a computer program comprising program instructions, the processor being configured to invoke the program instructions to perform the method of any of claims 1-7.
9. A computer program product comprising instructions which, when run on an electronic device, cause the electronic device to perform the method of any of claims 1-7.
10. A computer readable storage medium, characterized in that the computer readable storage medium stores a computer program comprising program instructions which, when executed by a processor, cause the processor to perform the method of any of claims 1-7.
CN202310703061.3A 2023-06-14 2023-06-14 Image processing method, device, equipment and computer readable storage medium Active CN116452466B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310703061.3A CN116452466B (en) 2023-06-14 2023-06-14 Image processing method, device, equipment and computer readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310703061.3A CN116452466B (en) 2023-06-14 2023-06-14 Image processing method, device, equipment and computer readable storage medium

Publications (2)

Publication Number Publication Date
CN116452466A true CN116452466A (en) 2023-07-18
CN116452466B CN116452466B (en) 2023-10-20

Family

ID=87130537

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310703061.3A Active CN116452466B (en) 2023-06-14 2023-06-14 Image processing method, device, equipment and computer readable storage medium

Country Status (1)

Country Link
CN (1) CN116452466B (en)

Patent Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180225807A1 (en) * 2016-12-28 2018-08-09 Shenzhen China Star Optoelectronics Technology Co., Ltd. Single-frame super-resolution reconstruction method and device based on sparse domain reconstruction
CN108109109A (en) * 2017-12-22 2018-06-01 浙江大华技术股份有限公司 A kind of super-resolution image reconstruction method, device, medium and computing device
CN109359575A (en) * 2018-09-30 2019-02-19 腾讯科技(深圳)有限公司 Method for detecting human face, method for processing business, device, terminal and medium
CN111768354A (en) * 2020-08-05 2020-10-13 哈尔滨工业大学 Face image restoration system based on multi-scale face part feature dictionary
CN114119378A (en) * 2020-08-31 2022-03-01 华为技术有限公司 Image fusion method, and training method and device of image fusion model
CN112288664A (en) * 2020-09-25 2021-01-29 北京迈格威科技有限公司 High dynamic range image fusion method and device and electronic equipment
CN114697543A (en) * 2020-12-31 2022-07-01 华为技术有限公司 Image reconstruction method, related device and system
CN114764745A (en) * 2020-12-31 2022-07-19 华为技术有限公司 Image reconstruction method and related device
KR102402677B1 (en) * 2021-06-15 2022-05-26 (주)지큐리티 Method and apparatus for image convergence
CN114418919A (en) * 2022-03-25 2022-04-29 北京大甜绵白糖科技有限公司 Image fusion method and device, electronic equipment and storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
STEFAN HORMANN ET AL.: "A Coarse-to-Fine Dual Attention Network for Blind Face Completion", 2021 16th IEEE International Conference on Automatic Face and Gesture Recognition (FG 2021), pages 1-8 *

Also Published As

Publication number Publication date
CN116452466B (en) 2023-10-20

Similar Documents

Publication Publication Date Title
KR102663519B1 (en) Cross-domain image transformation techniques
CN109543714B (en) Data feature acquisition method and device, electronic equipment and storage medium
CA2934514C (en) System and method for identifying faces in unconstrained media
WO2021078001A1 (en) Image enhancement method and apparatus
CN113706414B (en) Training method of video optimization model and electronic equipment
CN116048244B (en) Gaze point estimation method and related equipment
CN113452969B (en) Image processing method and device
CN115661912B (en) Image processing method, model training method, electronic device, and readable storage medium
CN113538227B (en) Image processing method based on semantic segmentation and related equipment
CN114926351B (en) Image processing method, electronic device, and computer storage medium
CN116916151B (en) Shooting method, electronic device and storage medium
CN116630354B (en) Video matting method, electronic device, storage medium and program product
CN116452466B (en) Image processing method, device, equipment and computer readable storage medium
CN116311389B (en) Fingerprint identification method and device
CN115170455B (en) Image processing method and related device
WO2023045724A1 (en) Image processing method, electronic device, storage medium, and program product
WO2022261856A1 (en) Image processing method and apparatus, and storage medium
CN114399622A (en) Image processing method and related device
CN113763517B (en) Facial expression editing method and electronic equipment
CN113518172A (en) Image processing method and device
CN115623317B (en) Focusing method, device and storage medium
CN116740777B (en) Training method of face quality detection model and related equipment thereof
CN117689545B (en) Image processing method, electronic device, and computer-readable storage medium
CN116630355B (en) Video segmentation method, electronic device, storage medium and program product
CN117593611B (en) Model training method, image reconstruction method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant