CN116233626A - Image processing method and device and electronic equipment - Google Patents

Info

Publication number
CN116233626A
CN116233626A
Authority
CN
China
Prior art keywords
image
images
processed
clear
image processing
Prior art date
Legal status
Granted
Application number
CN202310495121.7A
Other languages
Chinese (zh)
Other versions
CN116233626B (en)
Inventor
邵扬
王宇
Current Assignee
Honor Device Co Ltd
Original Assignee
Honor Device Co Ltd
Priority date
Filing date
Publication date
Application filed by Honor Device Co Ltd filed Critical Honor Device Co Ltd
Priority to CN202310495121.7A priority Critical patent/CN116233626B/en
Publication of CN116233626A publication Critical patent/CN116233626A/en
Application granted granted Critical
Publication of CN116233626B publication Critical patent/CN116233626B/en
Legal status: Active

Landscapes

  • Image Processing (AREA)
  • Apparatus For Radiation Diagnosis (AREA)
  • Picture Signal Circuits (AREA)

Abstract

The embodiment of the application discloses an image processing method, an image processing device and an electronic device, which are applicable to the technical field of image processing. The method includes: acquiring a plurality of images to be processed; performing a clear feature extraction operation on each image to be processed multiple times in succession, wherein after the clear features of each image to be processed are extracted in each clear feature extraction operation, the clear features extracted from all the images to be processed in that operation are fused into a first fused image, and the first fused image is input into the next clear feature extraction operation on each image to be processed, until the last clear feature extraction operation on each image to be processed is completed; and generating an output image based on the target clear features of each image to be processed. The embodiment of the application can improve the definition and the signal-to-noise ratio of a motion blurred image.

Description

Image processing method and device and electronic equipment
Technical Field
The present disclosure relates to the field of image processing technologies, and in particular, to an image processing method, an image processing device, and an electronic device.
Background
When the electronic device takes a photograph, there may be relative motion between the electronic device and the photographed object, for example, when the electronic device shakes or the photographed scene contains a moving object. When there is relative motion between the electronic device and the object, motion blur appears in the final image, that is, the image captured by the electronic device contains regions with motion blur (hereinafter, an image containing motion blur is referred to as a motion blurred image).
Therefore, there is a need for an image processing method that can improve a motion blurred image in practical applications.
Disclosure of Invention
In view of this, the embodiments of the present application provide an image processing method, an image processing device, and an electronic device, which can improve the signal-to-noise ratio and the sharpness of a motion blurred image.
A first aspect of an embodiment of the present application provides an image processing method, including:
a plurality of images to be processed are acquired, wherein the plurality of images to be processed include at least one motion blurred image. A clear feature extraction operation is performed multiple times in succession on each image to be processed. After the clear features of each image to be processed are extracted in each clear feature extraction operation, the clear features extracted from all the images to be processed in that operation are fused into a first fused image, and the first fused image is input into the next clear feature extraction operation on each image to be processed, until the last clear feature extraction operation on each image to be processed is completed. Finally, an output image is generated based on the target clear features of each image to be processed, where the target clear features are the clear features extracted from the image to be processed in the last extraction operation.
As an embodiment of the present application, after outputting the first fused image to the next clear feature extraction operation performed on each image to be processed, the method further includes:
when the next clear feature extraction operation is performed on each image to be processed, the clear features of each image to be processed are extracted based on the clear features previously extracted from that image and the first fused image.
In the embodiment of the application, the clear features extracted from the images to be processed are fused after each clear feature extraction, and the resulting fused image is used as a reference for the next clear feature extraction. Because the fused image combines the clear features of all the images to be processed, it can guide the next clear feature extraction, realizing mutual reference of clear features among different images to be processed. This can effectively improve the accuracy of each clear feature extraction, thereby improving the definition and the signal-to-noise ratio of the final output image.
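As an illustrative aid (not part of the patent text), the following is a minimal sketch of the iterative extract-and-fuse flow described above. The function names extract_step and fuse, and the overall structure, are assumptions for illustration only.

```python
# Minimal sketch (assumed names): iteratively extract clear features from each
# image, fuse them after every round, and feed the fused result into the next round.
def multi_image_clear_feature_extraction(images, extract_step, fuse, num_rounds):
    """images: list of images to be processed.
    extract_step(feature_or_image, fused_guidance) -> clear feature for one image.
    fuse(list_of_features) -> the first fused image for this round."""
    features = list(images)          # round-1 input: the raw images to be processed
    fused = None                     # no fused guidance before the first round
    for _ in range(num_rounds):
        # extract clear features from every image, guided by the previous fused image
        features = [extract_step(f, fused) for f in features]
        # fuse this round's clear features into the first fused image
        fused = fuse(features)
    # the last round's features are the "target clear features"
    return features
```

In use, extract_step would correspond to one downsampling stage of an image processing branch, and fuse to the fusion network described later.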
In a first possible implementation manner of the first aspect, before performing the clear feature extraction operation on each image to be processed successively multiple times, the method further includes:
First definition information of each image to be processed is acquired.
And respectively carrying out clear characteristic extraction operation for each image to be processed for a plurality of times, wherein the clear characteristic extraction operation comprises the following steps:
and based on the first definition information, respectively carrying out continuous and repeated clear characteristic extraction operation on each image to be processed.
In the embodiment of the application, definition information of the image to be processed is introduced into the processing. By taking the image to be processed and its definition information as input data and using the definition information to guide the clear feature extraction process, the accuracy of the branch that processes the image to be processed in extracting clear features can be improved, thereby improving the definition and the signal-to-noise ratio of the finally generated output image.
As an embodiment of the present application, the first sharpness information includes sharpness scores of respective pixels in the corresponding image to be processed.
In the embodiment of the application, the definition information of each pixel point of the image to be processed is recorded in a fractional form.
In a second possible implementation manner of the first aspect, the operation of obtaining the first sharpness information of the target processing image includes:
inputting the target processing image into a pre-trained definition estimation model for processing to obtain corresponding first definition information. The definition estimation model is used for calculating definition information of the images, and is a neural network model obtained by training based on a plurality of motion blur images and second definition information corresponding to each image in the plurality of motion blur images.
In the embodiment of the application, a definition estimation model is used to estimate the definition information of each image to be processed. In this case, the definition information of a single image to be processed can be determined by processing that image alone, without acquiring additional reference images, so the acquisition of definition information is more convenient and compatible with more practical application scenarios. The definition estimation model is trained based on a plurality of motion blurred images and the corresponding definition information, so the obtained definition estimation model can estimate definition information more accurately and reliably. Therefore, the embodiment of the application can acquire definition information quickly, accurately, reliably and conveniently.
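A minimal usage sketch of such a definition estimation model is given below, assuming a PyTorch network whose input/output shapes and value range are hypothetical; it only illustrates the single-image inference described above.

```python
import torch

def estimate_first_definition_info(sharpness_model, image_tensor):
    """sharpness_model: a pre-trained definition (sharpness) estimation network that maps
    a (1, C, H, W) image to a (1, 1, H, W) per-pixel sharpness score map in [0, 1].
    Both the interface and the output range are assumptions for illustration."""
    sharpness_model.eval()
    with torch.no_grad():
        return sharpness_model(image_tensor)

# Usage sketch: definition_map = estimate_first_definition_info(model, blurred_frame)
```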
In a third possible implementation manner of the first aspect, the electronic device stores a pre-trained image processing model, where the image processing model includes a fusion network and the same number of image processing networks as there are images to be processed. Each image processing network includes the same number of downsampling layers, and the downsampling layers within a single image processing network are connected in sequence. The downsampling layer is used for performing a clear feature extraction operation on the image so as to extract the corresponding clear features.
Correspondingly, based on the first definition information, respectively performing a plurality of continuous clear feature extraction operations on each image to be processed, fusing the clear features extracted from all the images to be processed into a first fused image after the clear features of each image to be processed are extracted in each clear feature extraction operation, respectively inputting the first fused image into the next clear feature extraction operation on each image to be processed until the last clear feature extraction operation on each image to be processed is completed, and comprising the following steps:
based on the first definition information, each image processing network performs successive clear feature extraction operations on a single image to be processed using its plurality of downsampling layers, and each time a downsampling layer extracts clear features, the obtained clear features are input into the fusion network. The images processed by the different image processing networks are different. Each time the fusion network receives the clear features input by the downsampling layers of the image processing networks, it fuses the received clear features into a first fused image and inputs the first fused image to the next downsampling layer of each image processing network, until all downsampling layers have completed the clear feature extraction operation.
In the embodiment of the application, the image processing model capable of removing motion blur is trained in advance, the down sampling layer of each image processing network in the model is utilized to extract clear features, and meanwhile, the fusion network in the image processing model is utilized to realize the fusion and input of the clear features to the next down sampling layer, so that the image processing operation required by the embodiment of the application is realized. In practical applications, the electronic device may implement deblurring of the motion blurred image by storing the image processing model locally and invoking the model when needed. Therefore, the embodiment of the application has extremely high adaptability and can be suitable for various different electronic devices and different deblurring scene requirements.
In a fourth possible implementation manner of the first aspect, based on the first sharpness information, performing a sharpness feature extraction operation on each image to be processed successively multiple times, respectively, includes:
different images to be processed and first definition information corresponding to the input images to be processed are respectively input to each image processing network in the image processing model.
The 1 st downsampling layer in each image processing network performs 1 st downsampling on the image to be processed based on the input first definition information to obtain corresponding definition features, and the obtained definition features are input to the fusion network.
The fusion network fuses all the received clear features obtained by 1 st downsampling to obtain a corresponding 1 st first fusion image, and inputs the 1 st first fusion image to the 2 nd downsampling layer of each image processing network.
The i-th downsampling layer in each image processing network performs the i-th downsampling based on the data output by the (i-1)-th downsampling layer and the (i-1)-th first fused image input by the fusion network, so as to obtain the corresponding clear features, and inputs the obtained clear features to the fusion network. Here i is a positive integer greater than 1 and less than H1, H1 is the total number of downsampling layers in a single image processing network, and H1 is a positive integer greater than 2.
The fusion network fuses all the received clear features obtained by the i-th downsampling to obtain the corresponding i-th first fused image, and inputs the i-th first fused image to the (i+1)-th downsampling layer of each image processing network.
The H1-th downsampling layer in each image processing network performs the H1-th downsampling based on the data output by the (H1-1)-th downsampling layer and the (H1-1)-th first fused image input by the fusion network, so as to obtain the corresponding clear features.
In the embodiment of the application, each image to be processed is processed relatively independently through the image processing network. When the downsampling layer of the image processing network outputs clear features, the fusion network is used for fusing the images and inputting the images to the next downsampling layer of each image processing network, so that the mutual reference of each image processing network when the clear features are extracted is realized. Therefore, the embodiment of the application realizes the mutual reference of clear features among different images to be processed. The accuracy of extracting the clear features each time can be effectively improved, so that the definition and the signal-to-noise ratio of the final output image are improved.
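The following sketch mirrors the level-by-level flow above with placeholder callables; the names encoder_forward, down_branches and fusion_net are illustrative assumptions, not the patent's implementation.

```python
# Sketch of the multi-branch encoder flow: each branch (image processing network)
# has H1 downsampling layers; after every level except the last, the fusion network
# fuses the per-branch clear features and the result guides the next level.
def encoder_forward(down_branches, fusion_net, images, definition_maps):
    """down_branches: one list of downsampling layers per image to be processed;
    each layer is called as layer(input, fused_guidance).
    Returns the target clear features (the output of the last level per branch)."""
    num_levels = len(down_branches[0])
    # level-1 input of each branch: its image together with its first definition info
    feats = [(img, dmap) for img, dmap in zip(images, definition_maps)]
    fused = None                                     # no fused guidance before level 1
    for i in range(num_levels):
        feats = [branch[i](f, fused) for branch, f in zip(down_branches, feats)]
        if i < num_levels - 1:
            fused = fusion_net(feats)                # i-th first fused image
    return feats
```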
In a fifth possible implementation manner of the first aspect, generating an output image based on the target sharpness characteristics of each image to be processed includes:
successive upsampling operations are performed multiple times on each target clear feature. After each upsampling operation on each target clear feature generates a corresponding intermediate image, the intermediate images obtained from all the target clear features in that operation are synthesized into a second fused image, and the second fused image is input into the next upsampling operation on each target clear feature, until the last upsampling operation on each target clear feature is completed.
And fusing the intermediate images generated by the last upsampling operation on the clear features of each target to obtain an output image.
In the embodiment of the application, the second fusion image obtained by fusing the last upsampling result is used as a guide when upsampling is performed each time, so that the embodiment of the application effectively improves the upsampling effect to output the intermediate image with higher definition and higher signal-to-noise ratio, and accordingly the definition and the signal-to-noise ratio of the finally generated output image are improved.
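A corresponding sketch of the upsampling (decoding) flow is shown below, again with hypothetical callables; merge stands for the final fusion of the intermediate images into the output image.

```python
# Sketch of the decoder flow: each branch has several upsampling layers; after every
# level except the last, the intermediate images are fused into a second fused image
# that guides the next level, and the final intermediate images are merged.
def decoder_forward(up_branches, fusion_net, merge, target_features):
    """up_branches: one list of upsampling layers per branch; each layer is called as
    layer(input, fused_guidance). merge: splices the last intermediate images."""
    num_levels = len(up_branches[0])
    images, fused = list(target_features), None
    for i in range(num_levels):
        images = [branch[i](img, fused) for branch, img in zip(up_branches, images)]
        if i < num_levels - 1:
            fused = fusion_net(images)      # second fused image guiding the next level
    return merge(images)                    # fuse the last intermediate images into the output
```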
In a sixth possible implementation manner of the first aspect, the operation of acquiring a plurality of motion blur images used in training the sharpness estimation model includes:
a plurality of continuous images are selected from an image set, wherein the image set comprises a plurality of continuous shooting images. And then respectively generating intermediate state images for each group of adjacent images in the selected multiple images, and generating motion blur images corresponding to each group of adjacent images one by one based on the generated intermediate state images to obtain multiple motion blur images.
The embodiment of the application has at least the following beneficial effects:
1. the embodiment of the application can generate the required motion blurred image based on a small amount of images, so that the cost and difficulty of motion blurred image generation are low, and the practical application value is high.
2. The embodiment of the application generates the motion blurred image based on the continuous multi-frame image as a basis, and can effectively control the complexity of the motion scene involved in image generation, so as to prevent the situation that the generated motion blurred image is too low in effectiveness due to the fact that the motion scene is too complex. Therefore, the embodiment of the application can improve the effectiveness of the motion blur image.
In a seventh possible implementation manner of the first aspect, the operation of generating an intermediate state image for the target image group for any one of the adjacent groups of images and obtaining a motion blur image corresponding to the target image group based on the generated intermediate state image includes:
the target image group is processed by a frame interpolation method to obtain a plurality of corresponding intermediate state images.
And randomly selecting a plurality of intermediate state images from the generated plurality of intermediate state images, and synthesizing the selected intermediate state images to obtain a motion blurred image corresponding to the target image group.
In the embodiment of the application, the intermediate state image range used when the motion blurred image is generated is determined by firstly inserting frames to generate the intermediate state image and then randomly assigning the blur proportion, so that the authenticity and randomness of the generated motion blurred image can be effectively improved, the motion blurred image is more in line with the actual application situation, and the effectiveness of the motion blurred image is improved.
In an eighth possible implementation manner of the first aspect, the obtaining operation of the second sharpness information of the target blurred image, where the target blurred image is any one of a plurality of motion blurred images, includes:
and determining an image used for generating the target blurred image from the plurality of images, and calculating pixel displacement information of the target blurred image based on a first frame image and a last frame image in the determined images.
Second sharpness information of the target blurred image is determined based on the pixel displacement information.
In the embodiment of the application, the displacement condition of each pixel point in the target blurred image is determined by calculating the pixel displacement information. And calculating corresponding definition information based on the pixel displacement information, thereby realizing accurate quantization calculation of the definition information.
As an embodiment of the application, an optical flow algorithm may be used to process the first frame image and the last frame image, so as to obtain pixel displacement information of the target blurred image.
In a ninth possible implementation manner of the first aspect, determining second sharpness information of the target blurred image based on the pixel displacement information includes:
and calculating the motion amplitude of each pixel point in the target blurred image based on the pixel displacement information.
And carrying out normalization processing on the motion amplitude, and carrying out inverse operation on the motion amplitude after the normalization processing to obtain second definition information of the target blurred image.
In the embodiment of the application, the motion amplitude of each pixel point is quantized by using the pixel displacement information. The greater the motion amplitude, the greater the degree of blurring of its corresponding pixel point. Therefore, normalization and inverse operation are carried out, thereby realizing accurate quantification of the definition of each pixel point and obtaining accurate and reliable definition information of the target blurred image.
As an alternative embodiment of the present application, the following formula may be used to calculate the motion amplitude of the pixel point ab in the target blurred image, and normalize the motion amplitude. And finally, performing inverse operation on the motion amplitude after normalization processing, so as to obtain definition data of the pixel point ab. The pixel point ab is any pixel point in the target blurred image.
$$M1(a,b)=\sqrt{x(a,b)^2+y(a,b)^2},\qquad M2(a,b)=\frac{M1(a,b)}{M_{max}},\qquad M3(a,b)=1-M2(a,b)$$
Where M1 (a, b) is the motion amplitude of the pixel point ab, and x (a, b) and y (a, b) are the displacement degrees of the pixel point ab in the x direction and the y direction (for example, the displacement distance of several pixel points). M2 (a, b) is a value obtained by normalizing M1 (a, b), and Mmax is the maximum displacement amplitude. M3 (a, b) is sharpness data of the pixel point ab. The Mmax may be a preset constant term, or may be the largest motion amplitude among the motion amplitudes of all the pixels of the motion blur image.
The embodiment of the application provides a specific definition information calculation method, which is simple in calculation and easy to implement, and can realize quick and accurate calculation of definition information, so it has extremely strong practicability.
In a second aspect, an embodiment of the present application provides an image processing apparatus, including:
and the image acquisition module is used for acquiring a plurality of images to be processed. Wherein the plurality of images to be processed comprise at least one motion blurred image.
The feature extraction module is used for respectively carrying out clear feature extraction operation on each image to be processed for a plurality of times, wherein after the clear features of each image to be processed are extracted in the clear feature extraction operation, the clear features extracted from all the images to be processed at this time are fused into a first fused image, and the first fused image is respectively input into the next clear feature extraction operation on each image to be processed until the last clear feature extraction operation on each image to be processed is completed.
The image generation module is used for generating an output image based on the target clear characteristics of each image to be processed, wherein the target clear characteristics are the clear characteristics extracted from the image to be processed for the last time.
In an embodiment of the present application, the image processing apparatus may further implement a method as any one of the above first aspect.
In a third aspect, embodiments of the present application provide an electronic device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, the processor implementing a method according to any one of the first aspects described above when the computer program is executed by the processor.
In a fourth aspect, embodiments of the present application provide a computer readable storage medium storing a computer program which, when executed by a processor, implements a method as in any of the first aspects described above.
In a fifth aspect, embodiments of the present application provide a chip system, the chip system including a processor, the processor being coupled to a memory, the processor executing a computer program stored in the memory to implement a method as described in any one of the first aspects. The chip system can be a single chip or a chip module composed of a plurality of chips.
In a sixth aspect, embodiments of the present application provide a computer program product for, when run on an electronic device, causing the electronic device to perform the method of any one of the first aspects.
It will be appreciated that the advantages of the second to sixth aspects may be found in the relevant description of the first aspect, and are not described here again.
Drawings
Fig. 1A is a schematic flow chart of a motion blur image generating method according to an embodiment of the present application;
FIG. 1B is a schematic flow chart of generating motion blurred images for a single set of neighboring images according to an embodiment of the present application;
fig. 1C is a schematic diagram of a frame interpolation scene provided in an embodiment of the present application;
fig. 1D is a schematic diagram of another frame interpolation scene provided in an embodiment of the present application;
fig. 2A is a schematic flow chart of a sharpness calculation method according to an embodiment of the present application;
fig. 2B is a schematic view of a sharpness estimation model training scene provided in an embodiment of the present application;
FIG. 3A is a schematic diagram of a model architecture of an initial processing model according to an embodiment of the present application;
FIG. 3B is a schematic diagram of a model architecture of another initial process model according to an embodiment of the present application;
fig. 3C is a schematic architecture diagram of a fusion network according to an embodiment of the present application;
fig. 4A is a flowchart of an implementation of an image processing method according to an embodiment of the present application;
fig. 4B is a schematic flow chart of a clear feature extraction method according to an embodiment of the present application;
FIG. 4C is a flowchart illustrating a method for generating an output image according to an embodiment of the present disclosure;
fig. 5 is a schematic structural diagram of an image processing apparatus according to an embodiment of the present application;
fig. 6 is a schematic structural diagram of an electronic device according to an embodiment of the present application;
fig. 7A is a schematic structural diagram of a mobile phone according to an embodiment of the present application;
fig. 7B is a software structural block diagram of an electronic device according to an embodiment of the present application.
Detailed Description
In the following description, for purposes of explanation and not limitation, specific details are set forth, such as particular system configurations, techniques, etc. in order to provide a thorough understanding of the embodiments of the present application. It will be apparent, however, to one skilled in the art that the present application may be practiced in other embodiments that depart from these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so as not to obscure the description of the present application with unnecessary detail.
In practical applications, in order to improve the photographing effect or to achieve a specific photographing effect, a corresponding exposure time is often set for the camera of an electronic device. For example, in order to avoid power-frequency flicker interference from ambient light during photographing, the exposure time may be adjusted to 10 ms. For another example, in order to enhance the photographing effect when ambient light is insufficient, the exposure time may be extended.
During the exposure of a photograph taken by an electronic device, there may be relative movement between the electronic device and the photographic subject. For example, the user's hand may shake while holding the electronic device to take a picture, or the electronic device may shake because the user is moving. For another example, the subject may include a moving object (such as a running person or a small animal). When there is relative motion between the electronic device and the photographed object, the image captured by the electronic device has motion blur in a global or local region, resulting in a motion blurred image.
In order to improve the definition of the motion blurred image, the embodiment of the application firstly acquires a plurality of images to be processed with motion blur, and acquires definition information of each image to be processed. On the basis, clear characteristic extraction processing is respectively carried out on each image to be processed for a plurality of times. After the clear features are extracted each time, the clear features extracted from all the images to be processed can be fused to obtain corresponding fused images. On the basis, when clear characteristic extraction is carried out next time, adding the fused image into the operation of clear characteristic extraction of each image to be processed. And finally synthesizing an output image based on the finally extracted clear characteristics of each image to be processed.
Because the definition information of the images to be processed is introduced when performing clear feature extraction, the analysis and extraction of clear features can be better realized. Meanwhile, in each clear feature extraction, a fused image is obtained by fusing the clear features extracted in the previous extraction, and this fused image, which contains the clear features of all the images to be processed, is used as guidance when extracting the clear features of the current single image to be processed. Therefore, the embodiment of the application can simultaneously refer to the clear feature conditions of the other motion blurred images when performing clear feature extraction, thereby improving the effect of each clear feature extraction. Finally, the output image is synthesized from the clear features extracted from each image to be processed, so that the content contained in the output image is as clear as possible, and the sharpening of the motion blurred region is realized. An output image with a high signal-to-noise ratio and high definition can finally be obtained.
The following describes an application scenario of the embodiment of the present application:
the embodiment of the application can be applied to any scene needing to improve or remove the motion blur in the image, namely, when a user or electronic equipment has any requirement of improving or removing the motion blur in the image, the embodiment of the application can be realized by calling the mode of the embodiment of the application. For example, when the electronic device takes a picture, a plurality of pictures (for example, pictures are continuously taken in a continuous shooting mode of a camera) can be continuously taken, and then the electronic device or a user selects a plurality of pictures as motion blur images and performs deblurring processing by using the embodiment of the application, so that a clear picture with the motion blur removed is obtained. For another example, the electronic device or the user may select a plurality of pictures from a local library of the electronic device as motion blur images and process the motion blur images by using the embodiment of the present application, so as to obtain a clear picture after removing the motion blur. For another example, the electronic device may also receive a plurality of pictures sent by other devices and use the pictures as motion blur images, and process the motion blur images by using the embodiment of the present application, so as to obtain a clear picture after motion blur removal.
The image processing method provided by the embodiment of the application can be applied to electronic equipment such as mobile phones, tablet computers, cameras, video recorders, wearable equipment, personal computers, servers and the like, and the electronic equipment is the execution main body of the image processing method provided by the embodiment of the application, and the embodiment of the application does not limit the specific type of the electronic equipment. The electronic device may have a photographing function or may not have a photographing function. For the electronic equipment with the photographing function, the embodiment of the application can be utilized to realize the processing of the motion blur image shot or stored or received by the electronic equipment, and the image definition is improved. And for electronic equipment without a photographing function, the processing of the self-stored or received motion blurred image can be realized, and the image definition is improved.
In order to illustrate the technical solutions described in the present application, the following description is made by specific examples.
In the embodiment of the application, the processing of the motion blur image can be divided into two stages of model training and image processing. The model training stage is mainly used for training an image processing model (also called as a multi-frame image deblurring model or an image deblurring model) capable of carrying out motion blur image processing and removing motion blur, and the image processing stage is mainly used for processing the motion blur image by utilizing the image processing model and removing the motion blur to obtain a clear image. The details are as follows:
1. Model training stage:
a certain amount of sample data, namely a certain number of motion blurred images and corresponding definition information, is required in the training process of the image processing model. Therefore, the embodiment of the application may perform the sample data acquisition operation before training the image processing model. Based on this, in the embodiment of the present application, the model training phase can be further divided into three parts: a motion blurred image generation part, a sharpness calculation part and a model training part. The details are as follows:
A. a motion blur image generation section:
in order to obtain the motion blurred images required during model training (i.e., part of the sample data required for model training), in practical applications the motion blurred images may be captured directly or generated in a certain way, or the two approaches may be combined to obtain the required motion blurred images. In view of the large amount of work required to capture motion blurred images, in the embodiment of the present application the required motion blurred images may be generated by frame interpolation. Referring to fig. 1A, a flowchart of a motion blurred image generating method according to an embodiment of the present application, the method is described in detail below:
S101, selecting continuous m1 frame images from an image set, wherein the image set comprises a plurality of frames of images which are continuously shot.
In the embodiment of the present application, the image set in S101 includes a plurality of images that are continuous in time. For example, the image set may be a set of a plurality of continuous frames of images in a certain video or video stream, or may be a set of a plurality of continuous pictures, which may be specifically set by a technician. Based on the existing image set, the embodiment of the application can select continuous m1 frame images (the selected images can also be called original images hereinafter) from the existing image set for being used as images for reference in the subsequent motion blur image generation. In the embodiment of the application, the m1 frame image selection mode is not excessively limited. For example, in some alternative embodiments, it may be that consecutive m1 frame images are randomly selected, and S101 may be replaced by: successive m1 frame images are randomly selected from the image set. The image at the specific position may be selected as a start position (e.g., the first image, the middle image, or the m1 st image in the image set), and then m1 frame images may be sequentially selected from the start position.
Meanwhile, the size of the specific image number m1 selected in the embodiment of the present application corresponds to the complexity of the photographed motion scene and the corresponding calculation workload. The larger m1 is, the longer the time dimension spanned by the image set is, so that the more complex the corresponding covered motion scene is, the higher the complexity of the motion form of the shooting object is, and the larger the calculation workload is. Conversely, the smaller m1 is, the simpler the condition of shooting object movement is, the lower the complexity of the movement scene is, and the smaller the calculation workload is. Therefore, the skilled person can set the specific value of m1 according to the requirements at the time of actual application. For example, m1 may take any integer greater than 1 in some embodiments, such as any value from 5 to 10.
As an alternative embodiment of the present application, consider that when m1 is large, the motion scene covered by the m1 frame image is more complex. At this time, the shooting requirements of the user may be different from those of the user in practical application, and the calculation workload is large. For example, for a camera that can shoot 60 frames per minute, when m1=30, it means that the m1 frame image covers a motion scene of 30 seconds. In real life, the change of the object motion within 30 seconds can be very complex. Therefore, even if the photographs taken within 30 seconds are synthesized, the obtained new photographs are less reasonable with the actual life, and it is generally difficult to satisfy the actual demands of users. In the embodiment of the present application, the value of m1 is not excessively large, and may be any integer less than or equal to 10 and greater than 1, for example.
S102, respectively generating intermediate state images for m1-1 groups of adjacent images in m1 frame images, and generating a piece of motion blurred image corresponding to each group of adjacent images respectively based on the generated intermediate state images to obtain m1-1 pieces of motion blurred images.
And combining adjacent images in the m1 frame images two by two to obtain m1-1 group of adjacent images. In the embodiment of the present application, the processing logic of each set of adjacent images is the same, and thus only one set of adjacent images is exemplified. Referring to fig. 1B, a flowchart of generating a motion blurred image for a single group of adjacent images according to an embodiment of the present application is described in detail below:
s1021, for a group of adjacent images in the m1 frame images, image i_{j-1} and image i_j, generate intermediate state images to obtain a corresponding intermediate state image set {l_{j,1}, l_{j,2}, ..., l_{j,m2-1}, l_{j,m2}}, where j ∈ [2, m1] and m2 is any positive integer.
Let the j-th original image in the m1 frame images be image i_j, and let the adjacent image group composed of image i_{j-1} and image i_j be the j-th adjacent image group. j may be any value from 2 to m1, i.e. the j-th adjacent image group may be any group of adjacent images. For example, assuming m1=3, the j-th adjacent image group may be (1st frame image, 2nd frame image) or (2nd frame image, 3rd frame image). In this case, in the embodiment of the present application, frame interpolation is performed between the j-th group of adjacent images, so as to obtain a plurality of intermediate state images whose motion states lie between image i_{j-1} and image i_j. The embodiment of the application does not excessively limit the specific frame interpolation method or the number m2 of frames interpolated for a single group of adjacent images, which can be set by a technician. For example, in some alternative embodiments, frame interpolation models based on deep learning networks may be employed to generate the intermediate images, such as FLIM models, or flow-based, kernel-based or phase-based interpolation models. The number of interpolated frames m2 may be any integer greater than 1, for example m2 may be any value from 20 to 40, such as 33.
Reference may be made to fig. 1C, which is a schematic diagram of a frame interpolation scene provided in an embodiment of the present application. In the embodiment of the application, a frame interpolation model based on a deep learning network processes image i_{j-1} and image i_j and generates a plurality of intermediate images interposed between them. Reference may also be made to fig. 1D, which is a schematic diagram of another frame interpolation scene provided in an embodiment of the present application, where p1 and p2 are two convolution kernels, and i1 and i2 are the two frames of images to be processed. In the embodiment of the application, the interpolation model adopts a convolution-based interpolation method: a pair of 2D convolution kernels p1(x, y) and p2(x, y) is set, and the two convolution kernels are used to convolve and pool the two frames to be processed so as to calculate the color of each output image pixel. The convolution calculation formula (1) is as follows:
$$l(x,y)=p1(x,y)*i1(x,y)+p2(x,y)*i2(x,y)\tag{1}$$
where l(x, y) is the generated intermediate image, and i1(x, y) and i2(x, y) are the two frames of images to be processed, i.e., in the embodiment of the present application, image i_{j-1} and image i_j. The symbol "*" denotes the convolution operation.
Through the processing of the interpolation model, the intermediate state image set {l_{j,1}, l_{j,2}, ..., l_{j,m2-1}, l_{j,m2}} corresponding to image i_{j-1} and image i_j can be obtained.
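The following is a simplified illustration of formula (1) using SciPy; it applies one global kernel pair to whole frames, whereas practical kernel-based interpolation models predict a separate kernel pair per output pixel, so this is only a sketch of the formula, not of the interpolation model itself.

```python
import numpy as np
from scipy.signal import convolve2d

def interpolate_frame(i1, i2, p1, p2):
    """Simplified illustration of formula (1): l = p1 * i1 + p2 * i2.
    i1, i2: grayscale frames as 2-D float arrays; p1, p2: 2-D convolution kernels.
    A real kernel-based interpolation model predicts a kernel pair per pixel."""
    return (convolve2d(i1, p1, mode="same", boundary="symm")
            + convolve2d(i2, p2, mode="same", boundary="symm"))

# Example: identical averaging kernels simply blend the two frames.
# p1 = p2 = np.full((3, 3), 0.5 / 9.0)
```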
S1022, select a plurality of frames of intermediate state images from the intermediate state image set, and synthesize the selected intermediate state images to obtain the motion blurred image corresponding to the adjacent images image i_{j-1} and image i_j.
After obtaining the intermediate state images, the embodiment of the application selects a plurality of frames of intermediate state images from the intermediate state images, and synthesizes the selected intermediate state images into one frame of image. Since these images are intermediate-state images generated based on two consecutive frames of images, the motion state of the subject in the image content thereof is intermediate between the two consecutive frames of images. Therefore, when a moving object exists in the shooting object, the motion state of the moving object is in an overlapped state in the image synthesized based on the intermediate state images, so that a certain motion blur condition exists in the synthesized image at the moment, and a required motion blur image is obtained. The method for selecting the intermediate state images and the number g are not limited excessively, and can be set by a technician. For example, in some alternative embodiments, in order to simulate the randomness of the motion blur image that can be obtained in the real application as much as possible, a plurality of intermediate state images may be randomly selected and synthesized, so as to fit the actual application situation as much as possible, thereby improving the effectiveness of continuous model training. At this time S1022 may be replaced with: and randomly selecting a plurality of frames of intermediate state images from the intermediate state image set, and synthesizing the selected intermediate state images to perform image synthesis to obtain a motion blur image corresponding to the j-th group of adjacent images.
As an alternative embodiment of the present application, a blur proportion k and a corresponding formula (2) for the number of selected frames may be set:
$$g=k\cdot m2\tag{2}$$
after the m2 frames of intermediate state images are generated, k is assigned a random value, and the number g of intermediate state images to be selected is calculated based on the assigned k. When the g calculated by the formula is not an integer, the integer g can be determined by rounding up or down.
As an alternative embodiment of the application, on the basis of determining the number of intermediate state images to be selected, the embodiment of the application may first determine which of image i_{j-1} and image i_j has the latest shooting time, and then sequentially select the required number of intermediate state images starting from the intermediate state images adjacent to that latest image. Among the images actually captured, the motion state of the moving object in the most recently captured image best fits the real motion state of the moving object. Therefore, when selecting intermediate state images, the embodiment of the application preferentially selects the intermediate state images close to the most recently captured image, so as to improve the fit between the selected intermediate state images and the motion state of the real moving object, and further improve the credibility of the generated motion blurred image. As an example, assume that the image with the latest shooting time between image i_{j-1} and image i_j is image i_j, and that the number of intermediate state images to be selected is 2. In this case, the embodiment of the application may select image l_{j,m2-1} and image l_{j,m2} from the intermediate state image set {l_{j,1}, l_{j,2}, ..., l_{j,m2-1}, l_{j,m2}} to synthesize the corresponding motion blurred image.
As an alternative embodiment of the present application, the intermediate state images may be averaged or otherwise processed to synthesize the corresponding motion blurred image. For example, in some embodiments, the set of selected intermediate state images is {l_{j,m2-(g-1)}, ..., l_{j,m2-1}, l_{j,m2}}. After the g frames of intermediate state images are selected, they can be synthesized using the following formula (3) to obtain the corresponding motion blurred image l_j:
$$l_j=\frac{1}{g}\sum_{t=m2-(g-1)}^{m2}l_{j,t}\tag{3}$$
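A small sketch of formulas (2) and (3) combined is given below, selecting the g most recent intermediate frames and averaging them; the rounding rule and the random range for k are assumptions.

```python
import numpy as np

def synthesize_motion_blur(intermediate_frames, k):
    """intermediate_frames: list of m2 interpolated frames (float arrays), in time order.
    k: randomly assigned blur proportion in (0, 1].
    Returns the motion blurred image averaged from the last g frames (formulas (2)/(3))."""
    m2 = len(intermediate_frames)
    g = max(1, round(k * m2))                  # formula (2), rounded to an integer
    selected = intermediate_frames[m2 - g:]    # frames closest to the most recent original image
    return np.mean(np.stack(selected, axis=0), axis=0)   # formula (3): average the g frames

# Usage sketch: blur = synthesize_motion_blur(frames, k=np.random.uniform(0.3, 1.0))
```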
As one embodiment of the present application, by repeating the operations of S1021 to S1022 described above, corresponding motion blurred images can be generated for the m1-1 groups of adjacent images respectively, so that the required m1-1 frames of motion blurred images are obtained. Let the image set of the m1 frame images be {i_1, i_2, ..., i_{m1-1}, i_{m1}}; the set of motion blurred images generated at this time may then be represented as {l_2, ..., l_{m1-1}, l_{m1}}. As another alternative embodiment of the present application, a number of motion blurred images smaller than m1-1 may be generated. For example, only some of the adjacent image groups in the m1 frame images are processed, thereby obtaining fewer motion blurred images. As an illustration, assume m1=6, i.e. 6 frames of images are to be processed. In this case, combining adjacent frames two by two yields 5 groups of adjacent images. In some embodiments, a corresponding motion blurred image may be generated for each group of adjacent images, in which case 5 frames of motion blurred images are obtained. In other embodiments, only some of the adjacent image groups may be processed, for example only 3 of them, thereby obtaining 3 frames of motion blurred images.
As an alternative embodiment of the present application, when the m1 frame images are RGB images, whether to convert the motion blurred image into a RAW image may be chosen according to actual requirements. For example, when RAW images are supported by the electronic device in actual application, the generated motion blurred image may be converted into a RAW image. The embodiment of the application does not excessively limit the specific conversion method, which can be set by a technician. For example, each pixel of a conventional camera sensor contains only a single color filter (red, green or blue), and these filters are arranged in a Bayer pattern. For each frame of synthesized motion blurred image in RGB format, the other two colors at each pixel are discarded according to the Bayer filter pattern, and a RAW domain image is finally obtained.
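One possible conversion is sketched below, assuming an RGGB Bayer layout; the patent does not fix the exact pattern, so the layout and the absence of any inverse gamma or white-balance handling are simplifying assumptions.

```python
import numpy as np

def rgb_to_bayer_raw(rgb):
    """Convert an (H, W, 3) RGB image into a single-channel Bayer-mosaic RAW image,
    assuming an RGGB pattern (the actual pattern depends on the target sensor)."""
    h, w, _ = rgb.shape
    raw = np.zeros((h, w), dtype=rgb.dtype)
    raw[0::2, 0::2] = rgb[0::2, 0::2, 0]   # R at even rows, even columns
    raw[0::2, 1::2] = rgb[0::2, 1::2, 1]   # G at even rows, odd columns
    raw[1::2, 0::2] = rgb[1::2, 0::2, 1]   # G at odd rows, even columns
    raw[1::2, 1::2] = rgb[1::2, 1::2, 2]   # B at odd rows, odd columns
    return raw
```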
The embodiment of the application has at least the following beneficial effects:
1. the embodiment of the application can generate the required motion blurred image based on a small amount of images, so that the cost and difficulty of motion blurred image generation are low, and the practical application value is high.
2. The embodiment of the application generates the motion blurred image based on the continuous multi-frame image as a basis, and can effectively control the complexity of the motion scene involved in image generation, so as to prevent the situation that the generated motion blurred image is too low in effectiveness due to the fact that the motion scene is too complex. Therefore, the embodiment of the application can improve the effectiveness of the motion blur image.
3. The intermediate state image range used when the motion blurred image is generated is determined by randomly assigning the blur proportion, so that the randomness of the generated motion blurred image can be effectively improved, the motion blurred image is more in line with the actual application situation, and the effectiveness of the motion blurred image is improved.
B. A definition calculating section:
after the multi-frame motion blurred image is generated, the embodiment of the application starts to calculate the definition information of each frame of motion blurred image, so that sample data required by model training is perfected. Since the manner of processing each frame of motion blurred image is the same, a single frame of motion blurred image will be described as an example. Referring to fig. 2A, a flowchart of a sharpness calculation method according to an embodiment of the present application is shown. The details are as follows:
s201, determining a first frame image and a last frame image in images used for generating the motion blur image from m1 frame images, and calculating pixel displacement information of the motion blur image based on the first frame image and the last frame image.
In the definition calculating section, the motion blur image for which definition information calculation is performed is also referred to as a target blur image. Since the motion blur image is generated based on the intermediate state image of the two frames of original images, the displacement condition of each pixel point in the motion blur image can be determined based on the original image used in the generation. Based on this, the embodiment of the present application first determines the original image used when generating the motion blur image, and determines the first frame image (i.e., the first frame image) and the last frame image (i.e., the last frame image) from the original image, thereby determining the initial motion state and the final motion state of the moving object. And calculating displacement degree information (namely pixel displacement information) of each pixel point in the motion blur image in the x direction and the y direction based on the first frame image and the last frame image. The method for calculating the pixel displacement information is not limited excessively, and can be set by a technician according to actual requirements. For example, in some embodiments, optical flow (optical flow) algorithms or the like may be employed to calculate pixel displacement information.
As an illustrative example, assume that a motion blurred image l_j is generated based on the adjacent images image i_{j-1} and image i_j. In this case, the embodiment of the application may process image i_{j-1} and image i_j with an optical flow algorithm to obtain the pixel displacement information of the motion blurred image l_j.
And S202, calculating definition information of the motion blur image based on the pixel displacement information.
In the embodiments of the present application, the sharpness information may also be referred to as sharpness priori information or blur priori information. The definition information contains definition data of each pixel point of the motion blurred image. The sharpness data may be quantized in the form of a score or a level, for example, the sharpness score of each pixel.
After the pixel displacement information is obtained, the motion amplitude of each pixel point can be calculated according to the pixel displacement information. On the basis, in order to facilitate subsequent quantization calculation, the motion amplitude can be normalized. The greater the motion amplitude, the greater the blurring degree of the corresponding pixel point. Therefore, the motion amplitude after normalization processing can be subjected to inverse operation, so that definition data of each pixel point are obtained, and definition information consisting of the definition data of each pixel point is obtained. The definition information may be stored in a data matrix or the like, or a definition map having the same size as the motion blur image may be selectively generated, and corresponding definition data may be recorded in each pixel point in the definition map.
As an alternative embodiment of the present application, the following formula (4) may be used to calculate the motion amplitude of the pixel point ab in the motion blurred image, and the formula (5) may be used to normalize the motion amplitude. And finally, performing inverse operation on the motion amplitude after normalization processing by using a formula (6), so as to obtain definition data of the pixel point ab. The pixel point ab may be any pixel point in the motion blurred image.
$$M1(a,b)=\sqrt{x(a,b)^2+y(a,b)^2}\tag{4}$$
$$M2(a,b)=\frac{M1(a,b)}{M_{max}}\tag{5}$$
$$M3(a,b)=1-M2(a,b)\tag{6}$$
Where M1 (a, b) is the motion amplitude of the pixel point ab, and x (a, b) and y (a, b) are the displacement degrees of the pixel point ab in the x direction and the y direction (for example, the displacement distance of several pixel points). M2 (a, b) is a value obtained by normalizing M1 (a, b), and Mmax is the maximum displacement amplitude. M3 (a, b) is sharpness data of the pixel point ab. The Mmax may be a preset constant term, or may be the largest motion amplitude among the motion amplitudes of all the pixels of the motion blur image.
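A minimal sketch of formulas (4) to (6) is given below using OpenCV's Farneback optical flow; the patent only requires "an optical flow algorithm", so the choice of Farneback, its parameters, and normalizing by the per-image maximum amplitude are assumptions.

```python
import cv2
import numpy as np

def compute_definition_map(first_frame_gray, last_frame_gray):
    """first_frame_gray, last_frame_gray: uint8 grayscale original frames.
    Returns a per-pixel sharpness map M3 in [0, 1] (1 = sharp, 0 = most blurred)."""
    flow = cv2.calcOpticalFlowFarneback(first_frame_gray, last_frame_gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    x, y = flow[..., 0], flow[..., 1]
    m1 = np.sqrt(x ** 2 + y ** 2)              # formula (4): motion amplitude
    m_max = max(m1.max(), 1e-6)                # here: the image's own maximum amplitude
    m2 = m1 / m_max                            # formula (5): normalization
    return 1.0 - m2                            # formula (6): sharpness data
```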
By the method of the embodiment shown in fig. 2A, the embodiment of the application can implement calculation of definition information of a motion blurred image, so as to obtain sample data required by training an image processing model. In practical application, when the electronic device in the image processing stage improves or removes motion blur in an image, the sharpness information of the real motion blur image can be calculated by using front and rear frame images of the real motion blur image as the first frame image and the last frame image in the embodiment of the present application. However, in practical application, the electronic device may not necessarily acquire all the front and rear frame images of the real motion blurred image. Therefore, in order to better adapt to the requirements of practical applications, the embodiment of the present application may further train a model (hereinafter referred to as a sharpness estimation model) that can calculate the sharpness information of the image based on the motion blurred image generated by the motion blurred image generating section and the sharpness information of the motion blurred images calculated by the embodiment shown in fig. 2A.
Referring to fig. 2B, a schematic view of a scene of training a sharpness estimation model according to an embodiment of the present application is shown. The details are as follows:
in the embodiment of the present application, an initial neural network model (hereinafter referred to as an initial estimation model) is first constructed, and a loss function for training is set. The specific model parameters and the specific loss function of the initial estimation model are not limited here and may be set by a technician. For example, in some alternative embodiments, L1, L2, smooth L1 or similar loss functions may be employed. Based on this, the embodiment of the application may use the generated motion blurred images as sample data, and use the sharpness information of the motion blurred images calculated by the embodiment shown in fig. 2A as the corresponding label values, to perform iterative network optimization training on the initial estimation model until the loss function converges and training is completed, thereby obtaining a sharpness estimation model that can be used to calculate the sharpness information of an image. In the embodiment of the present application, the sharpness information used for training the sharpness estimation model may also be referred to as second sharpness information.
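A compact sketch of such a training loop is shown below, assuming a PyTorch model and an L1 loss; the optimizer, learning rate and data loader format are illustrative assumptions.

```python
import torch
import torch.nn as nn

# Sketch of the sharpness-estimation training loop: the model maps a motion blurred
# image to a per-pixel sharpness map, supervised by the computed second sharpness info.
def train_sharpness_estimator(model, data_loader, epochs=10, lr=1e-4):
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    criterion = nn.L1Loss()                     # one of the L1 / L2 / smooth L1 options
    model.train()
    for _ in range(epochs):
        for blurred, sharpness_label in data_loader:   # image + per-pixel sharpness map
            pred = model(blurred)
            loss = criterion(pred, sharpness_label)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return model
```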
In the embodiment of the present application, on one hand, the sharpness information of a motion blurred image can be accurately calculated, whether the motion blurred image is a generated one or a real one. On the other hand, a sharpness estimation model for the sharpness information of an image can be trained, so that in practical applications the sharpness information of a single frame can be calculated without referring to other images.
C. Model training part:
based on the generated multi-frame motion blurred images and the definition information of each frame of motion blurred images, the embodiment of the application can start training an image processing model. Accordingly, in the embodiment of the present application, an initial processing model is preset. Wherein the initial processing model is a deep learning network model.
Referring to fig. 3A, a schematic diagram of a model architecture of an initial processing model according to an embodiment of the present application is provided. In this example, the initial processing model comprises: a plurality of image processing network branches, each branch being an image processing network that contains a plurality of downsampling layers and a plurality of upsampling layers. The model also comprises a fusion network, a splicing layer and an output layer. The downsampling layers in a single image processing network are connected in sequence, and the upsampling layers in a single image processing network are connected in sequence. The downsampling layers are used to downsample the image to extract the clear features of the image, and the upsampling layers are used to upsample the clear features to restore the image resolution. On the one hand, the fusion network is configured to fuse the clear features extracted by the downsampling layers of the different image processing networks, so as to generate a fused image that contains the clear features of the multiple frames of different images. It is also used to input the generated fused image containing the clear features of the multiple frames of different images to the next downsampling layer of each image processing network, as a reference when that layer extracts clear features, so as to improve the accuracy of clear feature extraction. On the other hand, the fusion network can be used to fuse the images generated by the upsampling layers of the different image processing networks, so as to obtain a corresponding high-resolution, clear fused image. It can also be used to input the generated high-resolution, clear fused image to the next upsampling layer of each image processing network, as a reference during the processing of that layer, so as to improve the sharpness and the signal-to-noise ratio of the restored image. The splicing layer is used to splice and fuse the multiple frames of images to obtain a clear image. The output layer is used to output the clear image (also called the reconstructed image) fused by the splicing layer. The clear features refer to features with relatively high sharpness in an image, such as a region without motion blur in a motion blurred image. The criterion for dividing clear features is determined by the network parameters of the model after training is completed, and is not limited herein.
The number of fusion networks included in the initial processing model, and the specific numbers of downsampling layers and upsampling layers included in each image processing network, are not particularly limited herein and can be set by a technician according to actual requirements. For example, the number of fusion networks may be one or more; when the number of fusion networks is 1, all the fusion networks in fig. 3A are the same fusion network. Let the number of image processing networks be m3; m3 may be any positive integer greater than 1, for example, m3 may take any value from 2 to 10. Meanwhile, the embodiment of the present application does not particularly limit the network type of the image processing network, which can be set by a technician. For example, in some alternative embodiments, the image processing network may be a network with an encoder-decoder type structure; for example, a deconvolution network, SegNet, U-Net, UNet++, V-Net, or the like may be employed as the image processing network. The encoder is responsible for feature extraction (i.e., the downsampling layers extract the clear features), and the decoder is responsible for projecting the learned low-resolution features into pixel space to restore a high-resolution image (i.e., the upsampling layers restore the image resolution). As an alternative embodiment of the present application, a residual U-Net network may be employed as the image processing network, considering that a residual U-Net can extract features of an image at different scales and reconstruct different information features.
As another alternative embodiment of the present application, the number of downsampling layers included in each image processing network is the same, and the number of upsampling layers included is the same, e.g., in some alternative embodiments, the structure of the respective image processing networks is the same. At this time, the downsampling layer and the upsampling layer in each image processing network are in one-to-one correspondence. On this basis, as a further alternative embodiment of the present application, the number of fusion networks is the same as the total number of downsampling and upsampling layers contained in a single image processing network. Or as yet another alternative embodiment of the present application, the number of fusion networks is the same as the number of downsampling layers contained in a single image processing network.
As an alternative embodiment of the present application, each downsampling layer includes at least one convolution layer, and may further include any number of pooling layers. Wherein the convolution layer is used to extract the sharp features and the pooling layer is used to increase the receptive field to the features. For example, in some alternative embodiments, 2 convolutional layers and one max-pooling layer may be included in each downsampling layer.
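To make the layer structure concrete, the following is a minimal PyTorch sketch of one such downsampling layer (two convolution layers plus one max-pooling layer). It assumes, purely for illustration, that the guidance signals (the sharpness information and/or a fused image from the fusion network) are concatenated with the incoming features along the channel dimension; the application does not fix this mechanism, so treat it as one possible realization:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DownBlock(nn.Module):
    """One downsampling layer: 2 convolution layers + 1 max-pooling layer.

    `guide` carries the sharpness map and/or the first fused image; it is
    resized and concatenated with the input so it can steer the extraction
    of clear features (an illustrative choice, not mandated by the text).
    """
    def __init__(self, in_ch, guide_ch, out_ch):
        super().__init__()
        self.convs = nn.Sequential(
            nn.Conv2d(in_ch + guide_ch, out_ch, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
        )
        self.pool = nn.MaxPool2d(2)  # pooling enlarges the receptive field

    def forward(self, feat, guide):
        # Resize the guidance to the current feature resolution, then extract
        # clear features and downsample.
        guide = F.interpolate(guide, size=feat.shape[-2:], mode="bilinear",
                              align_corners=False)
        sharp_feat = self.convs(torch.cat([feat, guide], dim=1))
        return self.pool(sharp_feat)
```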
As an alternative embodiment of the present application, reference may be made to fig. 3B based on the embodiment shown in fig. 3A, which is a schematic model architecture diagram of another initial processing model provided in an embodiment of the present application. The embodiment of the present application is substantially the same as the embodiment shown in fig. 3A, and thus reference may be made to the description related to fig. 3A, and only differences will be described herein. In the embodiment of the application, the fusion network is used for fusing the clear features extracted by the sampling layers under different image processing networks, so as to generate a fused image containing the clear features of multiple frames of different images. And the method is also used for inputting the generated fusion image containing the clear features of the multiple frames of different images to the next sampling layer of each image processing network to be used as a reference when the next sampling layer extracts the clear features so as to improve the accuracy of the clear feature extraction. On this basis, the fusion network in the embodiment of the application may not fuse the images generated by the sampling layers on different image processing networks.
In practical applications, a technician may select the model architecture shown in fig. 3A or fig. 3B to set the initial processing model according to the requirements, which is not limited herein.
On the basis of the embodiments shown in fig. 3A and fig. 3B, referring to fig. 3C, as an embodiment of the present application, a schematic architecture diagram of a converged network is provided in an embodiment of the present application. In the embodiment of the application, each fusion network comprises a splicing layer and an output layer. The splicing layer is used for splicing and fusing the clear features to obtain a partially clear fused image or splicing and fusing the multi-frame images to obtain a clear fused image. The output layer is used for outputting the fused image fused by the splicing layer.
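The fusion network structure of fig. 3C (a splicing layer followed by an output layer) could be sketched as follows; the channel counts and the use of a single convolution as the output layer are assumptions made only to keep the example self-contained:

```python
import torch
import torch.nn as nn

class FusionNet(nn.Module):
    """Fusion network of Fig. 3C: a splicing (concatenation) layer followed by
    an output layer that merges the per-branch inputs into one fused image."""
    def __init__(self, branch_ch, num_branches, out_ch):
        super().__init__()
        self.output = nn.Conv2d(branch_ch * num_branches, out_ch,
                                kernel_size=3, padding=1)

    def forward(self, branch_feats):
        stitched = torch.cat(branch_feats, dim=1)  # splicing layer
        return self.output(stitched)               # output layer
```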
In the following, the training of the image processing model will be illustrated by taking the example that the initial processing model adopts the model structure shown in fig. 3A, and the network structure of each image processing network is the same, and the number of the fusion networks is the same as the total number of the downsampling layers and the upsampling layers included in the single image processing network. In the embodiment of the application, the sample data used for training includes: and one or more groups of samples, wherein each group of samples comprises a plurality of frames of motion blurred images and definition information corresponding to each frame of motion blurred images. And the number of multi-frame motion blurred images contained in each group of samples is the same as the number of image processing networks in the initial processing model. And meanwhile, corresponding clear reference images are arranged for each group of samples. In practical application, a technician can select a specific sample group according to practical requirements, and the specific sample group is not limited in the specification. The following is exemplified by a single set of samples as sample data:
Suppose the group of samples contains m3 frames of motion blurred images and the sharpness information corresponding to each frame of motion blurred image, and that a corresponding reference image is provided. In the embodiment of the present application, the source of the sample data is not particularly limited and may be set by a technician according to actual requirements. For example, in some alternative embodiments, really captured motion blurred images, or consecutive multi-frame images from a captured video, may be collected as the motion blurred images, and the corresponding sharpness information calculated, to serve as the sample data. Alternatively, the method of the "motion blurred image generation part" described above may be used to generate the required motion blurred images and calculate the corresponding sharpness information as the sample data. The method of calculating the sharpness information is likewise not particularly limited and can be set by the skilled person according to requirements. For example, in some alternative embodiments, the embodiment shown in fig. 2A in the "sharpness calculation section" described above may be used, or the sharpness information of a motion blurred image may be calculated using the sharpness estimation model. Other ways of calculating the sharpness information may also be used.
Continue to refer to fig. 3A, on the basis of the available sample data, the reference image, and the initial processing model. During training, the embodiment of the present application first inputs one frame of motion blurred image, together with the sharpness information of that motion blurred image, into each image processing network. Each image processing network receives a different motion blurred image; since the number of motion blurred images is the same as the number of image processing networks, each image processing network processes one frame of motion blurred image. Meanwhile, each image processing network performs processing operations such as downsampling and upsampling of its motion blurred image relatively independently, while the cross-network operations are realized through the fusion network, the splicing layer, and the like. The details are as follows:
since there may be differences in the blurred regions and the degrees of blur of different motion blurred images, after the motion blurred images are input, the 1st downsampling layer of each image processing network starts downsampling the motion blurred image based on the sharpness information. Using the sharpness information as a guide improves the downsampling layer's ability to discriminate clear features, so that the clear features of the motion blurred image can be extracted.
After the downsampling layer extracts clear features each time, in order to fully blend the clear features between the motion blurred images of each frame, each image processing network inputs the extracted clear features to a corresponding fusion network. And fusing the clear features by a fusion network, and recovering image details, thereby obtaining a fused image containing the clear features of the m3 frame motion blurred image.
On the basis of generating a fusion image containing clear features, the fusion network inputs the fusion image to the next downsampling layer of each image processing network.
For the N1-th downsampling layer in each image processing network (N1 is a positive integer greater than 1 and less than or equal to the total number of downsampling layers of a single image processing network), both the data output by the previous downsampling layer and the fused image input by the fusion network are available. Accordingly, the N1-th downsampling layer downsamples the data output by the previous downsampling layer (i.e., the (N1-1)-th downsampling layer) based on the fused image and the sharpness information, thereby continuing to extract clear features. Because the fused image contains the clear feature information of all the motion blurred images, using it to guide the downsampling of the N1-th downsampling layer effectively improves the ability to identify clear features, and thus the accuracy of clear feature extraction. For example, assume that each downsampling layer includes one convolution layer. In that case, for the 1st downsampling layer, the convolution layer convolves the motion blurred image to extract features, and the clear features among them are screened out based on the sharpness information. For the N1-th downsampling layer, the convolution layer convolves the data output by the (N1-1)-th downsampling layer to extract features, while the clear features are screened out based on the fused image and the sharpness information.
Similarly, after the N1-th downsampling layer of each image processing network extracts the clear features, the extracted clear features are input to the corresponding fusion network. The clear features are fused by the fusion network, and the fused image is input to the next downsampling layer of each image processing network.
After all the downsampling layers have finished downsampling, the image processing network begins to upsample the clear features with the upsampling layers to restore the image resolution. Specifically, the N2-th upsampling layer starts upsampling when it receives its input clear features (output by the downsampling stage or by the previous upsampling layer), thereby generating a higher-resolution image. If a fused image input by the fusion network is received together with the clear features, the fused image is used as a reference when upsampling the clear features. Here N2 is a positive integer greater than or equal to 1 and less than the total number of upsampling layers of a single image processing network. After the N2-th upsampling layer of each image processing network generates a high-resolution image, the generated image is input to the corresponding fusion network. The images are fused by the fusion network, and the fused image is input to the next upsampling layer of each image processing network. For the last upsampling layer of each image processing network, the generated image may be input not to the fusion network but to the splicing layer. The remaining operations are identical to those of the other upsampling layers and are not described here in detail.
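Putting the pieces together, the structural sketch below shows how one training-time forward pass through the fig. 3A architecture could be wired, reusing the DownBlock and FusionNet sketches above; the attribute names (branches[k].down, branches[k].up), the simplification of passing a single guidance tensor per layer, and the stitch callable are assumptions for illustration, not the application's exact interface:

```python
def forward_pass(blurred_imgs, sharpness_maps, branches, down_fusions, up_fusions, stitch):
    """One forward pass over m3 branches with per-layer fusion (Fig. 3A wiring)."""
    m3 = len(branches)
    # 1st downsampling: each branch sees its own motion blurred image, guided
    # by the corresponding sharpness information.
    feats = [branches[k].down[0](blurred_imgs[k], sharpness_maps[k]) for k in range(m3)]
    fused = down_fusions[0](feats)          # 1st first fused image
    # Remaining downsampling layers: previous output plus the latest fused image.
    for i in range(1, len(branches[0].down)):
        feats = [branches[k].down[i](feats[k], fused) for k in range(m3)]
        fused = down_fusions[i](feats)      # i-th first fused image
    # Upsampling layers: restore resolution, again sharing a fused image at each
    # step; the last layer feeds the splicing layer instead of the fusion network.
    for j in range(len(branches[0].up)):
        feats = [branches[k].up[j](feats[k], fused) for k in range(m3)]
        if j < len(branches[0].up) - 1:
            fused = up_fusions[j](feats)
    return stitch(feats)                    # reconstructed (output) image
```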
As an alternative embodiment of the present application, for the image processing model architecture shown in fig. 3B, each upsampling layer may not input the generated image to the fusion network when upsampling the clear features. Accordingly, the upsampling process does not need to refer to a fused image. In this case, each upsampling layer upsamples the received data in turn and inputs the generated image to the next upsampling layer (the last upsampling layer inputs the generated image to the splicing layer).
Since each image processing network generates one frame of high-resolution image, the splicing layer finally receives m3 frames of images. On this basis, the splicing layer splices and fuses these images to obtain a clear image, and inputs the clear image to the output layer. The output layer outputs the reconstructed image fused by the splicing layer.
After the reconstructed image is generated, the loss value between the reconstructed image and the reference image, i.e., the degree of difference or similarity between them, is calculated. The model parameters of the initial processing model are then iteratively trained and updated according to the loss value until a preset convergence condition is met, thereby obtaining the finally trained image processing model. The embodiment of the present application does not particularly limit the type of loss function used to calculate the loss value or the convergence condition, which can be set by a technician. For example, a norm-based loss function such as L1, L2, or smooth L1 may be used.
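As a hedged sketch of one such training iteration (assuming the model callable accepts the stacked blurred frames and their sharpness maps, and using smooth L1 purely as an example of the loss functions mentioned above):

```python
import torch.nn as nn

def training_step(model, optimizer, blurred_stack, sharpness_stack, reference,
                  criterion=nn.SmoothL1Loss()):
    """One iteration: forward pass, loss between the reconstructed image and the
    clear reference image, parameter update. The returned loss value lets the
    caller test its own convergence condition."""
    reconstructed = model(blurred_stack, sharpness_stack)
    loss = criterion(reconstructed, reference)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```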
The embodiment of the application has at least the following beneficial effects:
1. the sample data used in training contains multi-frame motion blurred images and corresponding definition information, and the effective reference and extraction of the clear features of each frame of motion blurred images can be realized through the participation of the definition information in the downsampling process.
2. Fusing the clear features extracted by each image processing network after each feature extraction, and taking the obtained fused image as a reference in the next feature extraction. Because the fusion image is fused with the clear features of the motion blurred images of each frame, the extraction of the clear features can be effectively guided in the next feature extraction, and the mutual reference of the clear features among different motion blurred images is realized. The accuracy of extracting the clear features each time can be effectively improved, so that the definition and the signal-to-noise ratio of the finally generated image are improved.
2. An image processing stage:
on the basis of completing training of an image processing model, the embodiment of the application can enter an image processing stage to realize deblurring processing of a motion blurred image. Referring to fig. 4A, a flowchart of an implementation of an image processing method provided in an embodiment of the present application is shown, and details are as follows:
S401, acquiring a plurality of images to be processed and definition information of each image to be processed.
Depending on the actual application scenario, the actual sources of the images to be processed may differ. For example, the images to be processed may be photographs continuously captured by the electronic device in real time, may be several pictures selected locally, or may be multiple frames selected or cut from a video, and so on. Therefore, the embodiment of the present application does not particularly limit the source of the images to be processed, which can be determined according to the actual application. Meanwhile, the number of images to be processed in each round of processing is greater than 1 and is the same as the number of image processing networks contained in the image processing model. On this basis, the embodiment of the present application does not particularly limit the number of images to be processed in each round, which can be determined according to the number of image processing network branches in the image processing model. For example, assume that the number of image processing networks included in the image processing model is m3, where m3 is a positive integer greater than 1; in that case, the number of images to be processed acquired in S401 is also m3.
Because the motion states of a moving object at different moments differ, there may be certain differences in the blur conditions (such as the blurred region and the degree of blur) of different images to be processed. For example, during continuous shooting, if the motion amplitude of the moving object is large in a certain exposure period, the captured image to be processed has severe motion blur. Conversely, if the motion amplitude of the moving object is small in a certain exposure period, or the moving object does not move at all, the captured image to be processed has only light motion blur, and may even be a clear image without motion blur. As another example, during continuous shooting the moving object may leave the shooting area, so that the captured image to be processed has no motion blur. Therefore, in practical applications, the plurality of images to be processed include at least one motion blurred image; that is, the plurality of images to be processed may all be motion blurred images, or only some of them may be motion blurred images.
The embodiment of the present application does not particularly limit the manner of acquiring the sharpness information of the images to be processed, which can be selected by a technician according to actual requirements. The sharpness information of an image to be processed may also be referred to as first sharpness information.
As an alternative embodiment of the present application, the embodiment shown in fig. 2A may be selected according to requirements, or the trained sharpness estimation model of the embodiment shown in fig. 2B may be used, to calculate the sharpness information of a real motion blurred image. The sharpness information of an image to be processed may also be obtained by a technician or a user in other ways and then input into the electronic device. When the trained sharpness estimation model is used to calculate the sharpness information, no additional reference images of the image to be processed need to be acquired, which makes the acquisition of the sharpness information more convenient and compatible with more practical application scenarios. For example, during shooting, the mobile phone may capture only 5 photos, which are selected as the images to be processed; at this time, no further real photos can be obtained as references for calculating the sharpness information of the images to be processed. When the sharpness information is calculated by the sharpness estimation model, no other reference images need to be acquired, so this practical requirement is handled well. Therefore, the embodiment of the present application can acquire the sharpness information quickly, accurately, and conveniently. In the embodiment of the present application, the image to be processed that is input to the sharpness estimation model each time is also referred to as a target processing image.
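For instance, assuming the sharpness estimation model trained earlier is available as a PyTorch module, the first sharpness information of each target processing image could be obtained with a sketch like the following (names are illustrative):

```python
import torch

@torch.no_grad()
def estimate_sharpness(sharpness_model, images_to_process, device="cuda"):
    """Return one sharpness map per target processing image; no neighbouring
    frames of the image to be processed are needed."""
    sharpness_model = sharpness_model.eval().to(device)
    return [sharpness_model(img.unsqueeze(0).to(device)).squeeze(0).cpu()
            for img in images_to_process]
```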
And S402, respectively carrying out a plurality of clear feature extraction operations on each image to be processed based on the definition information, wherein after the clear features of each image to be processed are extracted, the clear features extracted from all the images to be processed at this time are fused into a first fused image, and the first fused image is used as input data when the clear feature extraction operations are respectively carried out on each image to be processed next time until the last clear feature extraction operation is completed.
After obtaining the plurality of images to be processed and their corresponding sharpness information, the embodiment of the present application starts downsampling the images to be processed to extract their clear features. The clear feature extraction operations on the individual images to be processed are relatively independent. After the clear features of all the images to be processed have been extracted in one round, the next round of clear feature extraction is not started directly from the obtained clear features; instead, the currently extracted clear features are first fused to obtain a corresponding fused image (also called a first fused image). The first fused image is then used as one of the inputs to the next clear feature extraction of each image to be processed. In the next round of clear feature extraction on each image to be processed, new clear features are extracted based on the clear features extracted from that image in the current round and the first fused image obtained in the current round, and these new clear features are fused to obtain a new first fused image. That is, for the i-th clear feature extraction operation on a single image to be processed, clear feature extraction is performed based on the obtained input data. Meanwhile, the clear features extracted by the i-th clear feature extraction operations of all the images to be processed are fused into a first fused image, and this first fused image is input to the (i+1)-th clear feature extraction operation of each image to be processed. For the (i+1)-th clear feature extraction operation on a single image to be processed, the clear features and sharpness information output by the i-th clear feature extraction operation, together with the i-th first fused image, are processed as input data to obtain new clear features. Here i is a positive integer greater than 1 and less than H1, the total number of clear feature extraction operations performed on a single image to be processed. For the 1st clear feature extraction operation, the input data are the image to be processed and its sharpness data, and clear feature extraction is performed based on these data. For the i-th clear feature extraction operation, the input data are the data output by the (i-1)-th clear feature extraction operation and the first fused image generated at the (i-1)-th time. In the embodiment of the present application, the clear features obtained by the last downsampling of each image to be processed are also referred to as target clear features (target sharpness features).
As an alternative embodiment of the present application, the image processing model trained by the embodiment shown in fig. 3A or fig. 3B may be used to implement the clear feature extraction operation for each image to be processed. The specific model architecture of the image processing model may refer to the descriptions of the embodiments shown in fig. 3A and fig. 3B, and will not be described herein. At this time, the different images to be processed and the corresponding sharpness information can be input into different image processing networks for processing, and the sharp feature fusion is realized by using a fusion network in S402. Referring to fig. 4B, a flowchart of a clear feature extraction method according to an embodiment of the present application is shown, where S402 may be replaced by: s4021 to S4026.
S4021, inputting a piece of image to be processed and definition information corresponding to the input image to be processed into each image processing network in the image processing model. Wherein, the images to be processed input by each image processing network are different.
S4022, the 1 st downsampling layer in each image processing network starts to downsample the image to be processed for the 1 st time based on the input definition information, so as to obtain corresponding definition characteristics, and the obtained definition characteristics are input into the fusion network.
S4023, the fusion network fuses all the received clear features obtained by the 1 st downsampling to obtain a corresponding 1 st first fusion image, and inputs the 1 st first fusion image to the 2 nd downsampling layer of each image processing network.
S4024, the ith downsampling layer in each image processing network performs ith downsampling based on the data output by the ith-1 downsampling layer and the ith-1 first fusion image input by the fusion network to obtain corresponding clear features, and the obtained clear features are input to the fusion network. Wherein i is a positive integer, and i is greater than 1 and less than H1. H1 is the total number of downsampling layers in a single image processing network, and H1 is a positive integer greater than 2.
The i-1 first fusion image is obtained by fusing the fusion network based on the clear features obtained by all i-1 downsampling.
S4025, the fusion network fuses all the received clear features obtained by the ith downsampling to obtain a corresponding ith first fusion image, and the ith first fusion image is input to the (i+1) th downsampling layer of each image processing network.
Since the i-th downsampling may be any downsampling other than the 1st downsampling and the last downsampling, the operations of each downsampling between the 1st and the last, and the corresponding clear feature fusion, are not described in detail here.
S4026, the H1 down-sampling layer in each image processing network performs H1 down-sampling based on the data output by the H1-1 down-sampling layer and the H1-1 first fusion image input by the fusion network to obtain corresponding clear features.
For details of S4021 to S4026 in the embodiment of the present application, reference may be made to the related descriptions of the embodiment shown in fig. 3A and 3B, which are not repeated here.
S403, generating an output image based on the clear features of all the images to be processed extracted last time.
After the clear feature extraction operation of each image to be processed is completed, the embodiment of the application performs splicing and fusion based on the obtained clear features of different images to be processed, so that an output image with higher definition and higher signal-to-noise ratio is obtained. The embodiment of the application does not limit the splicing fusion method of the clear features too much, and can be set by technicians according to actual requirements. As an optional embodiment of the present application, after the last clear feature extraction operation is completed, the up-sampling operation may be performed on the clear features of each image to be processed, so as to restore the resolution of the image. And finally, fusing the images obtained based on the up-sampling operation, thereby obtaining an output image.
As an optional embodiment of the application, in the process of upsampling the clear features, the upsampled images can be fused each time, and the next upsampling operation is guided by using the fused images, so that the effect of the upsampling operation is improved, and the definition and the signal-to-noise ratio of the final output image are improved. Referring to fig. 4C, a flowchart of a method for generating an output image according to an embodiment of the present application may be provided, where S403 may be replaced with: s4031 to S4032.
S4031, respectively performing multiple upsampling operations on the clear features extracted from each image to be processed for the last time, wherein after the upsampling operation on each image to be processed is completed and a corresponding intermediate image is generated each time, synthesizing the intermediate images of all the images to be processed obtained at this time into a second fused image, and taking the second fused image as input data of the next upsampling operation on each image to be processed respectively until the last upsampling operation is completed.
S4032, fusing the intermediate image generated by the last upsampling operation to obtain an output image.
In the embodiment of the application, when the resolution of the image is restored through upsampling, the image with high resolution (i.e. the intermediate image) generated by upsampling is fused after upsampling operation of all the images to be processed each time, as in downsampling, so that a corresponding second fused image is obtained. The up-sampling is continued with the second fused image as one of the input data for the next up-sampling. Since the operation logic for generating and outputting the fused image is substantially the same as the operation logic for generating and outputting the fused image during downsampling, reference may be made to the description of S402 (such as the flowchart of the embodiment shown in fig. 4B) and the description of the embodiment shown in fig. 3A, which are not repeated herein. The second fusion image obtained by fusing the last upsampling result is used as a guide when upsampling is performed each time, so that the embodiment of the application effectively improves the upsampling effect to output the intermediate image with higher definition and higher signal-to-noise ratio, and accordingly improves the definition and the signal-to-noise ratio of the finally generated output image (also called as a reconstructed image).
After the last upsampling is completed and the corresponding intermediate images are generated, the embodiment of the application performs stitching and fusion on the intermediate images. Because the intermediate images contain the clear characteristics of different images to be processed, the motion blur parts in the images to be processed are basically or completely abandoned, and therefore, compared with the original images to be processed, the output image generated based on the motion blur parts is higher in definition and signal to noise ratio. Therefore, the embodiment of the application can realize effective improvement and removal of motion blur.
As an alternative embodiment of the present application, the image processing model trained by the embodiment shown in FIG. 3A may be used to implement deblurring of the image to be processed in the embodiment of the present application. At this time, the upsampling operation in S4031 may be implemented by using the respective upsampling layers in the image processing model, and the image fusion operation in S4031 may be implemented by using the fusion network. Details of the operation are substantially the same as those of the training in the embodiment shown in fig. 3A, so reference may be made to the description of the embodiment shown in fig. 3A, and details thereof are omitted here.
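As a usage-level sketch of the whole inference flow (S401 to S403), assuming the trained image processing model accepts the stacked frames plus their sharpness maps as two tensors and reusing the estimate_sharpness helper sketched above; this calling convention is an assumption, not the application's defined interface:

```python
import torch

@torch.no_grad()
def deblur(image_processing_model, sharpness_model, images_to_process, device="cuda"):
    """Estimate first sharpness information per frame, then run the trained
    image processing model to obtain the reconstructed output image."""
    image_processing_model = image_processing_model.eval().to(device)
    frames = torch.stack(list(images_to_process)).to(device)  # (m3, C, H, W)
    sharpness = torch.stack(
        estimate_sharpness(sharpness_model, images_to_process, device=device)
    ).to(device)
    output = image_processing_model(frames, sharpness)
    return output.cpu()                                        # output image
```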
As an alternative embodiment of the present application, no reference to sharpness information may be selected when training the initial processing model shown in FIG. 3A or FIG. 3B. Accordingly, in the embodiments shown in fig. 4A to 4C, the sharpness information may not be acquired and used to perform the operations such as the sharpness feature extraction on the image to be processed.
The embodiment of the application has at least the following beneficial effects:
1. The embodiment of the present application introduces the sharpness information of the images to be processed as an additional input. By using each image to be processed together with its sharpness information as input data, and using the sharpness information to guide the clear feature extraction process, the accuracy of the branch that extracts clear features from each image to be processed (such as a single image processing network in some embodiments) can be improved, thereby improving the sharpness and the signal-to-noise ratio of the finally generated output image.
2. Fusing the clear features extracted by each image processing network after each clear feature extraction, and taking the obtained fused image as a reference in the next feature extraction. Because the clear features of the images to be processed are fused in the fused image, the extraction of the clear features can be effectively guided in the next feature extraction, and the mutual reference of the clear features among different images to be processed is realized. The accuracy of extracting the clear features each time can be effectively improved, so that the definition and the signal-to-noise ratio of the final output image are improved.
3. The embodiment of the application also provides an image processing model (refer to a model architecture in the embodiment shown in fig. 3A or fig. 3B) capable of supporting the implementation of the image processing method in the embodiment of the application. On one hand, the image processing model is provided with a plurality of image processing networks which are respectively used for processing different images, so that the relative independent processing of different images to be processed is realized. On the other hand, a fusion network for connecting different downsampling layers is also arranged, fusion of output of each downsampling layer is realized through the fusion network, and fused images are output to the next downsampling layer, so that mutual reference of clear characteristics of different images to be processed is realized when different images to be processed are processed. Therefore, the accuracy of extracting the clear features each time can be effectively improved through the processing of the image processing model, so that the definition and the signal-to-noise ratio of the final output image are improved.
As an alternative embodiment of the present application, the execution subjects of the motion blur image generation section, the sharpness calculation section, the model training section, and the image processing stage described above may be the same or different. For example, in some alternative embodiments, the electronic device implementing the motion blurred image generating portion, the electronic device implementing the sharpness calculating portion, the electronic device implementing the model training portion, and the electronic device implementing the image processing stage may all be different, e.g., may be electronic device a, electronic device B, electronic device C, and electronic device D, respectively. In other alternative embodiments, the motion blur image generating part, the sharpness calculating part and the model training part may be implemented by the same electronic device (e.g. a server, etc.), while the image processing stage is implemented by another electronic device (e.g. a terminal device such as a mobile phone, a tablet computer, a camera, a computer, etc.). Therefore, the embodiment of the application does not limit the execution main body of each part and stage too much, and can be determined according to the actual application condition.
Fig. 5 shows a schematic structural diagram of an image processing apparatus provided in an embodiment of the present application, corresponding to the image processing method described in the above embodiment, and for convenience of explanation, only a portion related to the embodiment of the present application is shown.
Referring to fig. 5, the image processing apparatus includes:
the image acquisition module 51 is configured to acquire a plurality of images to be processed. Wherein the plurality of images to be processed comprise at least one motion blurred image.
The feature extraction module 52 is configured to perform a plurality of continuous clear feature extraction operations on each image to be processed, where after each clear feature of each image to be processed is extracted in the clear feature extraction operations, the clear features extracted from all the images to be processed at this time are fused into a first fused image, and the first fused image is input into the next clear feature extraction operation performed on each image to be processed, respectively, until the last clear feature extraction operation is completed on each image to be processed.
The image generating module 53 is configured to generate an output image based on the target sharpness characteristics of each image to be processed, where the target sharpness characteristics are sharpness characteristics extracted from the image to be processed for the last time.
The process of implementing respective functions by each module in the image processing apparatus provided in this embodiment of the present application may refer to the foregoing description of the embodiments shown in fig. 1A to fig. 4C and other related method embodiments, which are not repeated herein. As an embodiment of the present application, the image processing apparatus may implement the foregoing embodiments shown in fig. 1A to 4C and other related method embodiments as an execution subject.
It should be noted that, because the content of information interaction and execution process between the above devices/units is based on the same concept as the method embodiment of the present application, specific functions and technical effects thereof may be referred to in the method embodiment section, and will not be described herein again.
It should be understood that the sequence numbers of the steps in the foregoing embodiments do not imply an order of execution; the execution order of the processes should be determined by their functions and internal logic, and should not constitute any limitation on the implementation process of the embodiments of the present application.
It should be understood that the terms "comprises" and/or "comprising," when used in this specification and the appended claims, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
It should also be understood that the term "and/or" as used in this specification and the appended claims refers to any and all possible combinations of one or more of the associated listed items, and includes such combinations.
As used in this specification and the appended claims, the term "if" may be interpreted as "when", "once", "in response to determining", or "in response to detecting", depending on the context. Similarly, the phrase "if it is determined" or "if [a described condition or event] is detected" may be interpreted, depending on the context, as meaning "upon determining", "in response to determining", "upon detecting [the described condition or event]", or "in response to detecting [the described condition or event]".
In addition, in the description of the present application and the appended claims, the terms "first," "second," "third," and the like are used merely to distinguish between descriptions and are not to be construed as indicating or implying relative importance. It will also be understood that, although the terms "first," "second," etc. may be used in this document to describe various elements in some embodiments of the present application, these elements should not be limited by these terms. These terms are only used to distinguish one element from another element. For example, a first table may be named a second table, and similarly, a second table may be named a first table without departing from the scope of the various described embodiments. The first table and the second table are both tables, but they are not the same table.
Reference in the specification to "one embodiment" or "some embodiments" or the like means that a particular feature, structure, or characteristic described in connection with the embodiment is included in one or more embodiments of the application. Thus, appearances of the phrases "in one embodiment," "in some embodiments," "in other embodiments," and the like in the specification are not necessarily all referring to the same embodiment, but mean "one or more but not all embodiments" unless expressly specified otherwise. The terms "comprising," "including," "having," and variations thereof mean "including but not limited to," unless expressly specified otherwise.
The image processing method provided by the embodiment of the application can be applied to electronic devices such as mobile phones, tablet computers, wearable devices, vehicle-mounted devices, augmented reality (augmented reality, AR)/Virtual Reality (VR) devices, notebook computers, ultra-mobile personal computer (UMPC), netbooks, personal digital assistants (personal digital assistant, PDA) and the like, and the specific types of the electronic devices are not limited.
For example, the electronic device may be a cellular telephone, a cordless telephone, a Session Initiation Protocol (Session Initiation Protocol, SIP) telephone, a wireless local loop (Wireless Local Loop, WLL) station, a personal digital assistant (Personal Digital Assistant, PDA) device, a handheld device with wireless communication capabilities, a computing device or other processing device connected to a wireless modem, an in-vehicle device, a car networking terminal, a computer, a laptop computer, a handheld communication device, a handheld computing device, a satellite radio, a wireless modem card, a television set top box (STB), customer premise equipment (customer premise equipment, CPE), and/or other devices for communicating over a wireless system, as well as next generation communication systems, e.g., electronic devices in a 5G network or electronic devices in a future evolved public land mobile network (Public Land Mobile Network, PLMN) network, etc.
By way of example and not limitation, when the electronic device is a wearable device, the wearable device may also be a general term for devices developed by applying wearable technology to the intelligent design of daily wear, such as glasses, gloves, watches, apparel, and shoes. A wearable device is a portable device that is worn directly on the body or integrated into the user's clothing or accessories. A wearable device is not only a hardware device, but can also realize powerful functions through software support, data interaction, and cloud interaction. Generalized wearable intelligent devices include devices that are full-featured and large-sized and can realize all or part of their functions without relying on a smart phone, such as smart watches or smart glasses, as well as devices that focus only on a certain type of application function and need to be used together with other devices such as a smart phone, for example various smart bracelets and smart jewelry for physical sign monitoring.
Fig. 6 is a schematic structural diagram of an electronic device according to an embodiment of the present application. As shown in fig. 6, the electronic device 6 of this embodiment includes: at least one processor 60 (only one is shown in fig. 6), a memory 61, said memory 61 having stored therein a computer program 62 executable on said processor 60. The processor 60, when executing the computer program 62, implements the steps in the respective image processing method embodiments described above, such as steps S401 to S403 shown in fig. 4A. Alternatively, the processor 60, when executing the computer program 62, performs the functions of the modules/units of the apparatus embodiments described above, such as the functions of the modules 51 to 53 shown in fig. 5.
The electronic device 6 may be a computing device such as a desktop computer, a notebook computer, a palm computer, a cloud server, etc. The electronic device may include, but is not limited to, a processor 60, a memory 61. It will be appreciated by those skilled in the art that fig. 6 is merely an example of the electronic device 6 and is not meant to be limiting as the electronic device 6 may include more or fewer components than shown, or may combine certain components, or different components, e.g., the electronic device may further include an input transmitting device, a network access device, a bus, etc.
The processor 60 may be a central processing unit (Central Processing Unit, CPU), other general purpose processors, digital signal processors (Digital Signal Processor, DSP), application specific integrated circuits (Application Specific Integrated Circuit, ASIC), off-the-shelf programmable gate arrays (Field-Programmable Gate Array, FPGA) or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, or the like. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
The memory 61 may in some embodiments be an internal storage unit of the electronic device 6, such as a hard disk or a memory of the electronic device 6. The memory 61 may be an external storage device of the electronic device 6, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card) or the like, which are provided on the electronic device 6. Further, the memory 61 may also include both an internal storage unit and an external storage device of the electronic device 6. The memory 61 is used for storing an operating system, application programs, boot loader (BootLoader), data, other programs, etc., such as program codes of the computer program. The memory 61 may also be used for temporarily storing data that has been transmitted or is to be transmitted.
In addition, it will be clearly understood by those skilled in the art that, for convenience and brevity of description, only the above-described division of the functional units and modules is illustrated, and in practical application, the above-described functional allocation may be performed by different functional units and modules according to needs, i.e. the internal structure of the apparatus is divided into different functional units or modules to perform all or part of the above-described functions. The functional units in the embodiments of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.
Taking the example that the electronic device is a mobile phone, fig. 7A shows a schematic structural diagram of the mobile phone 100.
The handset 100 may include a processor 110, an external memory interface 120, an internal memory 121, a universal serial bus (universal serial bus, USB) interface 130, a charge management module 140, a power management module 141, a battery 142, an antenna 1, an antenna 2, a mobile communication module 150, a wireless communication module 160, an audio module 170, a speaker 170A, a receiver 170B, a microphone 170C, an earphone interface 170D, a sensor module 180, keys 190, a motor 191, an indicator 192, a camera 193, a display 194, and a SIM card interface 195, etc. The sensor module 180 may include a gyroscope sensor 180A, an acceleration sensor 180B, an air pressure sensor 180C, a magnetic sensor 180D, an ambient light sensor 180E, a distance sensor 180F, a proximity light sensor 180G, a fingerprint sensor 180H, a temperature sensor 180J, and a touch sensor 180K (of course, the mobile phone 100 may also include other sensors such as a temperature sensor, a pressure sensor, an air pressure sensor, a bone conduction sensor, etc., which are not shown).
It should be understood that the structure illustrated in the embodiments of the present application does not constitute a specific limitation on the mobile phone 100. In other embodiments of the present application, the handset 100 may include more or fewer components than shown, or certain components may be combined, or certain components may be split, or different arrangements of components may be provided. The illustrated components may be implemented in hardware, software, or a combination of software and hardware.
The processor 110 may include one or more processing units, such as: the processor 110 may include an application processor (application processor, AP), a modem processor, a graphics processor (graphics processing unit, GPU), an image signal processor (image signal processor, ISP), a controller, a memory, a video codec, a digital signal processor (digital signal processor, DSP), a baseband processor, and/or a Neural network processor (Neural-network Processing Unit, NPU), etc. Wherein the different processing units may be separate devices or may be integrated in one or more processors. The controller may be a neural center or a command center of the mobile phone 100. The controller can generate operation control signals according to the instruction operation codes and the time sequence signals to finish the control of instruction fetching and instruction execution.
A memory may also be provided in the processor 110 for storing instructions and data. In some embodiments, the memory in the processor 110 is a cache memory. The memory may hold instructions or data that the processor 110 has just used or recycled. If the processor 110 needs to reuse the instruction or data, it can be called directly from the memory. Repeated accesses are avoided and the latency of the processor 110 is reduced, thereby improving the efficiency of the system.
The processor 110 may operate the image processing method provided in the embodiments of the present application to improve the sharpness and the signal-to-noise ratio of the motion blurred image, and improve the user experience. The processor 110 may include different devices, for example, when the CPU and the GPU are integrated, the CPU and the GPU may cooperate to execute the image processing method provided in the embodiments of the present application, for example, a part of algorithms in the image processing method are executed by the CPU, and another part of algorithms are executed by the GPU, so as to obtain a faster processing efficiency.
The display screen 194 is used to display images, videos, and the like. The display screen 194 includes a display panel. The display panel may employ a liquid crystal display (liquid crystal display, LCD), an organic light-emitting diode (organic light-emitting diode, OLED), an active-matrix organic light-emitting diode (active-matrix organic light-emitting diode, AMOLED), a flexible light-emitting diode (flexible light-emitting diode, FLED), a Mini-LED, a Micro-LED, a Micro-OLED, a quantum dot light-emitting diode (quantum dot light emitting diodes, QLED), or the like. In some embodiments, the mobile phone 100 may include 1 or N display screens 194, where N is a positive integer greater than 1. The display screen 194 may be used to display information entered by or provided to the user, as well as various graphical user interfaces (graphical user interface, GUI). For example, the display screen 194 may display photographs, videos, web pages, files, or the like. As another example, the display screen 194 may display a graphical user interface including a status bar, a hidden navigation bar, time and weather widgets (widgets), and icons of applications, such as a browser icon. The status bar includes the name of the operator (e.g., China Mobile), the mobile network (e.g., 4G), the time, and the remaining battery level. The navigation bar includes a back key icon, a home screen key icon, and a forward key icon. Further, it can be understood that in some embodiments, the status bar may also include a Bluetooth icon, a Wi-Fi icon, an external device icon, and the like. It can also be understood that in other embodiments, the graphical user interface may include a Dock bar, and the Dock bar may include commonly used application icons and the like. When the processor detects a touch event of a user's finger (or a stylus or the like) on an application icon, in response to the touch event, the user interface of the application corresponding to the application icon is opened and displayed on the display screen 194.
In the embodiment of the present application, the display 194 may be an integral flexible display, or a tiled display formed of two rigid screens and a flexible screen located between the two rigid screens may be used. After the processor 110 runs the image processing method provided in the embodiment of the present application, the processor 110 may control the external audio output device to switch the output audio signal.
The camera 193 (front camera or rear camera, or one camera may be used as both front camera and rear camera) is used to capture still images or video. In general, the camera 193 may include a photosensitive element such as a lens group including a plurality of lenses (convex lenses or concave lenses) for collecting optical signals reflected by an object to be photographed and transmitting the collected optical signals to an image sensor. The image sensor generates an original image of the object to be photographed according to the optical signal.
The internal memory 121 may be used to store computer executable program code including instructions. The processor 110 executes various functional applications of the cellular phone 100 and data processing by executing instructions stored in the internal memory 121. The internal memory 121 may include a storage program area and a storage data area. The storage program area may store, among other things, code for an operating system, an application program (e.g., a camera application, a WeChat application, etc.), and so on. The storage data area may store data created during use of the handset 100 (e.g., images, video, etc. acquired by the camera application), etc.
The internal memory 121 may also store one or more computer programs corresponding to the image processing methods provided in the embodiments of the present application. The one or more computer programs, which are stored in the internal memory 121 and configured to be executed by the one or more processors 110, include instructions that may be used to perform the steps in the embodiments corresponding to fig. 1A-4C, and may include an account verification module, a priority comparison module, and a state synchronization module. The account verification module is used for authenticating the system authentication accounts of other electronic devices in the local area network; the priority comparison module may be used for comparing the priority of an audio output request service with the priority of the service currently output by the audio output device; and the state synchronization module may be used for synchronizing the device state of the audio output device currently accessed by the electronic device to other electronic devices, or synchronizing the device state of the audio output device currently accessed by another device to the local device. When the code of the image processing method stored in the internal memory 121 is executed by the processor 110, the processor 110 may control the electronic device to perform motion-blurred image processing.
In addition, the internal memory 121 may include a high-speed random access memory, and may further include a nonvolatile memory such as at least one magnetic disk storage device, a flash memory device, a universal flash memory (universal flash storage, UFS), and the like.
Of course, the code of the image processing method provided in the embodiments of the present application may also be stored in an external memory. In this case, the processor 110 may run the code of the image processing method stored in the external memory through the external memory interface 120, and the processor 110 may control the electronic device to perform motion-blurred image processing.
The function of the sensor module 180 is described below.
The fingerprint sensor 180H is used to collect a fingerprint. The mobile phone 100 can utilize the collected fingerprint characteristics to realize fingerprint unlocking, access an application lock, fingerprint photographing, fingerprint incoming call answering and the like.
The touch sensor 180K is also referred to as a "touch panel". The touch sensor 180K may be disposed on the display screen 194, and the touch sensor 180K and the display screen 194 form a touch screen, which is also called a "touch control screen". The touch sensor 180K is used to detect a touch operation acting on or near it. The touch sensor may pass the detected touch operation to the application processor to determine the type of the touch event. Visual output related to the touch operation may be provided through the display 194. In other embodiments, the touch sensor 180K may be disposed on the surface of the mobile phone 100 at a location different from that of the display 194.
Illustratively, the display 194 of the handset 100 displays a main interface that includes icons of a plurality of applications (e.g., a camera application, a WeChat application, etc.). The user clicks the icon of the camera application in the main interface through the touch sensor 180K, triggering the processor 110 to launch the camera application and open the camera 193. The display 194 then displays an interface of the camera application, such as a viewfinder interface.
The wireless communication function of the mobile phone 100 may be implemented by the antenna 1, the antenna 2, the mobile communication module 150, the wireless communication module 160, a modem processor, a baseband processor, and the like.
The antennas 1 and 2 are used for transmitting and receiving electromagnetic wave signals. Each antenna in the handset 100 may be used to cover a single or multiple communication bands. Different antennas may also be multiplexed to improve the utilization of the antennas. For example: the antenna 1 may be multiplexed into a diversity antenna of a wireless local area network. In other embodiments, the antenna may be used in conjunction with a tuning switch.
The mobile communication module 150 may provide a solution for wireless communication including 2G/3G/4G/5G, etc. applied to the handset 100. The mobile communication module 150 may include at least one filter, switch, power amplifier, low noise amplifier (low noise amplifier, LNA), etc. The mobile communication module 150 may receive electromagnetic waves from the antenna 1, perform processes such as filtering, amplifying, and the like on the received electromagnetic waves, and transmit the processed electromagnetic waves to the modem processor for demodulation. The mobile communication module 150 can amplify the signal modulated by the modem processor, and convert the signal into electromagnetic waves through the antenna 1 to radiate. In some embodiments, at least some of the functional modules of the mobile communication module 150 may be disposed in the processor 110. In some embodiments, at least some of the functional modules of the mobile communication module 150 may be provided in the same device as at least some of the modules of the processor 110. In the embodiment of the present application, the mobile communication module 150 may also be used for information interaction with other electronic devices.
The modem processor may include a modulator and a demodulator. The modulator is used for modulating the low-frequency baseband signal to be transmitted into a medium-high frequency signal. The demodulator is used for demodulating the received electromagnetic wave signal into a low-frequency baseband signal. The demodulator then transmits the demodulated low frequency baseband signal to the baseband processor for processing. The low frequency baseband signal is processed by the baseband processor and then transferred to the application processor. The application processor outputs sound signals through an audio device (not limited to the speaker 170A, the receiver 170B, etc.), or displays images or video through the display screen 194. In some embodiments, the modem processor may be a stand-alone device. In other embodiments, the modem processor may be provided in the same device as the mobile communication module 150 or other functional module, independent of the processor 110.
The wireless communication module 160 may provide solutions for wireless communication including wireless local area network (wireless local area networks, WLAN) (e.g., wireless fidelity (wireless fidelity, wi-Fi) network), bluetooth (BT), global navigation satellite system (global navigation satellite system, GNSS), frequency modulation (frequency modulation, FM), near field wireless communication technology (near field communication, NFC), infrared technology (IR), etc. applied to the handset 100. The wireless communication module 160 may be one or more devices that integrate at least one communication processing module. The wireless communication module 160 receives electromagnetic waves via the antenna 2, modulates the electromagnetic wave signals, filters the electromagnetic wave signals, and transmits the processed signals to the processor 110. The wireless communication module 160 may also receive a signal to be transmitted from the processor 110, frequency modulate it, amplify it, and convert it to electromagnetic waves for radiation via the antenna 2. In the embodiment of the present application, the wireless communication module 160 may be used for accessing an access point device, and sending and receiving messages to other electronic devices.
In addition, the mobile phone 100 may implement audio functions through the audio module 170, the speaker 170A, the receiver 170B, the microphone 170C, the earphone interface 170D, and the application processor, etc. Such as music playing, recording, etc. The handset 100 may receive key 190 inputs, generating key signal inputs related to user settings and function control of the handset 100. The cell phone 100 may generate a vibration alert (such as an incoming call vibration alert) using the motor 191. The indicator 192 in the mobile phone 100 may be an indicator light, which may be used to indicate a state of charge, a change in power, an indication message, a missed call, a notification, etc. The SIM card interface 195 in the handset 100 is used to connect to a SIM card. The SIM card may be inserted into the SIM card interface 195 or removed from the SIM card interface 195 to enable contact and separation with the handset 100.
It should be understood that in practical applications, the mobile phone 100 may include more or fewer components than shown in fig. 7A, and embodiments of the present application are not limited. The illustrated handset 100 is only one example, and the handset 100 may have more or fewer components than shown in the figures, may combine two or more components, or may have a different configuration of components. The various components shown in the figures may be implemented in hardware, software, or a combination of hardware and software, including one or more signal processing and/or application specific integrated circuits.
The software system of the electronic device may employ a layered architecture, an event driven architecture, a microkernel architecture, a microservice architecture, or a cloud architecture. In the embodiment of the application, taking an Android system with a layered architecture as an example, a software structure of an electronic device is illustrated. Fig. 7B is a software architecture block diagram of an electronic device according to an embodiment of the present application.
The layered architecture divides the software into several layers, each of which has a clear role and division of labor. The layers communicate with each other through software interfaces. In some embodiments, the Android system is divided into four layers, which are, from top to bottom, the application layer, the application framework layer, the Android runtime (Android runtime) and system libraries, and the kernel layer.
The application layer may include a series of application packages.
As shown in fig. 7B, the application package may include applications such as phone, camera, gallery, calendar, talk, map, navigation, WLAN, bluetooth, music, video, short message, etc.
The application framework layer provides an application programming interface (application programming interface, API) and programming framework for application programs of the application layer. The application framework layer includes a number of predefined functions.
As shown in fig. 7B, the application framework layer may include a window manager, a content provider, a view system, a phone manager, a resource manager, a notification manager, and the like.
The window manager is used for managing window programs. The window manager can acquire the size of the display screen, determine whether there is a status bar, lock the screen, capture screenshots, and the like.
The content provider is used to store and retrieve data and make such data accessible to applications. The data may include video, images, audio, calls made and received, browsing history and bookmarks, phonebooks, etc.
The view system includes visual controls, such as controls to display text, controls to display pictures, and the like. The view system may be used to build applications. The display interface may be composed of one or more views. For example, a display interface including a text message notification icon may include a view displaying text and a view displaying a picture.
The telephony manager is for providing communication functions of the electronic device. Such as the management of call status (including on, hung-up, etc.).
The resource manager provides various resources for the application program, such as localization strings, icons, pictures, layout files, video files, and the like.
The notification manager enables an application to display notification information in the status bar, and can be used to convey notification-type messages, which automatically disappear after a short stay without user interaction. For example, the notification manager is used to notify that a download is complete, to give message reminders, and the like. The notification manager may also present notifications in the form of a chart or scroll-bar text in the system top status bar, such as notifications of applications running in the background, or notifications that appear on the screen in the form of a dialog window. For example, text information is prompted in the status bar, a prompt tone is emitted, the electronic device vibrates, an indicator light blinks, and the like.
The Android runtime includes a core library and a virtual machine. The Android runtime is responsible for scheduling and management of the Android system.
The core library consists of two parts: one part is the functions that the Java language needs to call, and the other part is the core library of Android.
The application layer and the application framework layer run in a virtual machine. The virtual machine executes java files of the application program layer and the application program framework layer as binary files. The virtual machine is used for executing the functions of object life cycle management, stack management, thread management, security and exception management, garbage collection and the like.
The system library may include a plurality of functional modules. For example: surface manager (surface manager), media Libraries (Media Libraries), three-dimensional graphics processing Libraries (e.g., openGL ES), 2D graphics engines (e.g., SGL), etc.
The surface manager is used to manage the display subsystem and provides a fusion of 2D and 3D layers for multiple applications.
The media libraries support playback and recording of a variety of commonly used audio and video formats, as well as still image files, etc. The media libraries may support a variety of audio and video encoding formats, such as MPEG-4, H.264, MP3, AAC, AMR, JPG, PNG, etc.
The three-dimensional graphic processing library is used for realizing three-dimensional graphic drawing, image rendering, synthesis, layer processing and the like.
The 2D graphics engine is a drawing engine for 2D drawing.
The kernel layer is a layer between hardware and software. The inner core layer at least comprises a display driver, a camera driver, an audio driver and a sensor driver.
The workflow of the software and hardware of the mobile phone 100 is illustrated below in connection with an image processing scenario of the mobile phone 100.
When the camera 193 captures a plurality of images to be processed that include a motion blurred image, a corresponding hardware interrupt is sent to the kernel layer. The kernel layer processes the shooting event into a raw input event, which is stored in the kernel layer. The application framework layer acquires the raw input event from the kernel layer and, by calling the resource manager in the application framework layer, deblurs the images to be processed, thereby obtaining an output image that is clearer and has a higher signal-to-noise ratio.
The embodiment of the application also provides an electronic device, which comprises at least one memory, at least one processor and a computer program stored in the at least one memory and capable of running on the at least one processor, wherein the processor executes the computer program to enable the electronic device to realize the steps in any of the method embodiments.
Embodiments of the present application also provide a computer readable storage medium storing a computer program which, when executed by a processor, implements steps that may implement the various method embodiments described above.
Embodiments of the present application provide a computer program product which, when run on an electronic device, causes the electronic device to perform the steps of the method embodiments described above.
Embodiments of the present application also provide a chip system, where the chip system includes a processor, where the processor is coupled to a memory, and the processor executes a computer program stored in the memory to implement the steps in the foregoing method embodiments.
The integrated modules/units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer-readable storage medium. Based on such understanding, all or part of the flow of the methods in the above embodiments of the present application may also be completed by instructing related hardware through a computer program. The computer program may be stored in a computer-readable storage medium, and when the computer program is executed by a processor, the steps of each method embodiment described above may be implemented. The computer program comprises computer program code, which may be in the form of source code, object code, an executable file, some intermediate form, or the like. The computer-readable storage medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a read-only memory (Read-Only Memory, ROM), a random access memory (Random Access Memory, RAM), an electrical carrier signal, a telecommunications signal, a software distribution medium, and so forth.
In the foregoing embodiments, the description of each embodiment has its own emphasis. For parts that are not described or illustrated in detail in a particular embodiment, reference may be made to the related descriptions of other embodiments.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; they may be located in one place or distributed over a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
The above embodiments are only for illustrating the technical solution of the present application, and are not limiting; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present application, and are intended to be included in the scope of the present application.

Claims (13)

1. An image processing method, applied to an electronic device, comprising:
acquiring a plurality of images to be processed; wherein the plurality of images to be processed comprise at least one motion blurred image;
respectively carrying out clear feature extraction operation on each image to be processed continuously for a plurality of times, wherein after the clear features of each image to be processed are extracted in each clear feature extraction operation, the clear features extracted from all the images to be processed at this time are fused into a first fused image, and the first fused image is respectively input into the next clear feature extraction operation carried out on each image to be processed, until the last clear feature extraction operation on each image to be processed is completed;
and generating an output image based on the target clear features of each image to be processed, wherein the target clear features are the clear features extracted from the image to be processed for the last time.
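Purely as an illustrative sketch of the control flow in claim 1 (the extract and fuse callables and the fixed number of rounds are placeholders assumed for illustration, not the concrete operators of the claims), the repeated per-image extraction with cross-image fusion could be organized as follows:

```python
# Illustrative sketch of claim 1: several rounds of per-image clear-feature
# extraction, where the features of all images are fused after every round and
# the fused result is fed into the next round. `extract` and `fuse` are
# placeholder callables, not the application's exact operators.
def iterative_extraction(images, extract, fuse, rounds):
    fused = None
    features = []
    for t in range(rounds):
        # Extract clear features of every image; from the second round on, the
        # first fused image of the previous round is an additional input.
        features = [extract(img, fused, t) for img in images]
        if t < rounds - 1:
            # Fuse this round's features of all images into a first fused image.
            fused = fuse(features)
    # After the last round, `features` holds the target clear features.
    return features
```

The returned list corresponds to the target clear features from which the output image is then generated.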
2. The image processing method according to claim 1, further comprising, before the respectively carrying out clear feature extraction operation on each of the images to be processed continuously for a plurality of times:
acquiring first definition information of each image to be processed;
and the respectively carrying out clear feature extraction operation on each image to be processed for a plurality of times includes:
and based on the first definition information, respectively carrying out the clear feature extraction operation on each image to be processed continuously for a plurality of times.
3. The image processing method according to claim 2, wherein the operation of acquiring the first definition information of a target processing image, the target processing image being any one of the images to be processed, includes:
inputting the target processing image into a pre-trained definition estimation model for processing to obtain corresponding first definition information; the definition estimation model is used for calculating definition information of the images, and is a neural network model obtained by training based on a plurality of motion blurred images and second definition information corresponding to each image in the plurality of motion blurred images.
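A minimal sketch of what such a definition (sharpness) estimation model might look like is given below; the layer sizes, the per-pixel output map, and the class name SharpnessEstimator are assumptions for illustration only:

```python
# Hypothetical definition estimation model: a small CNN mapping an RGB image
# to a per-pixel definition map in [0, 1]. Architecture details are assumed,
# not taken from the application.
import torch.nn as nn

class SharpnessEstimator(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(32, 1, 3, padding=1), nn.Sigmoid(),  # definition in [0, 1]
        )

    def forward(self, x):        # x: (N, 3, H, W)
        return self.net(x)       # (N, 1, H, W) definition map
```

Training such a model against the second definition information of the motion blurred images could, for instance, use a simple L1 regression loss.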
4. The image processing method according to claim 2, wherein the electronic device stores a pre-trained image processing model, and the image processing model includes a fusion network and the same number of image processing networks as the image to be processed; each image processing network comprises a plurality of downsampling layers with the same quantity, and the downsampling layers in the single image processing network are sequentially connected; the downsampling layer is used for carrying out the clear feature extraction operation on the image so as to extract corresponding clear features;
correspondingly, based on the first definition information, performing a plurality of continuous clear feature extraction operations on each image to be processed, and after the clear features of each image to be processed are extracted in each clear feature extraction operation, fusing the clear features extracted from all the images to be processed into a first fused image, and inputting the first fused image into the clear feature extraction operation performed on each image to be processed next time until the last clear feature extraction operation on each image to be processed is completed, including:
each image processing network uses a plurality of downsampling layers of the image processing network to perform continuous and repeated clear feature extraction operations on a single image to be processed based on the first definition information; after the clear features are extracted by a downsampling layer each time, the obtained clear features are input into the fusion network; wherein the images to be processed by the respective image processing networks are different;
and after receiving the clear features input by the downsampling layers of the image processing networks, the fusion network fuses the received clear features into the first fused image each time, and inputs the first fused image to the next downsampling layer of the image processing networks, until all downsampling layers finish the clear feature extraction operation.
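The following sketch shows one possible realization of the image processing networks and the fusion network of claim 4; the use of stride-2 convolutions as downsampling layers, 1x1 convolutions as the fusion network, and the concatenation of each branch feature with the previous first fused image are all assumed design choices, not details fixed by the claim:

```python
# Illustrative sketch of claim 4: one downsampling branch per input image plus
# a shared fusion network applied after every downsampling level.
import torch
import torch.nn as nn

class MultiImageEncoder(nn.Module):
    def __init__(self, num_images, channels=(32, 64, 128)):
        super().__init__()
        self.branches = nn.ModuleList()
        for _ in range(num_images):
            layers = nn.ModuleList()
            in_ch = 4  # 3 RGB channels + 1 definition-map channel (assumed)
            for out_ch in channels:
                # "Downsampling layer": a stride-2 convolution halving resolution.
                layers.append(nn.Sequential(
                    nn.Conv2d(in_ch, out_ch, 3, stride=2, padding=1),
                    nn.ReLU(inplace=True)))
                # From the second layer on, the branch feature is concatenated
                # with the first fused image of the previous level (assumption).
                in_ch = out_ch * 2
            self.branches.append(layers)
        # Fusion network: concatenate all branch features, reduce with a 1x1 conv.
        self.fusion = nn.ModuleList(
            nn.Conv2d(num_images * c, c, kernel_size=1) for c in channels)

    def forward(self, images, definition_maps):
        # images / definition_maps: lists of (N, 3, H, W) and (N, 1, H, W) tensors.
        feats = [torch.cat([img, d], dim=1)
                 for img, d in zip(images, definition_maps)]
        fused = None
        for level, fuse in enumerate(self.fusion):
            feats = [branch[level](f if fused is None
                                   else torch.cat([f, fused], dim=1))
                     for branch, f in zip(self.branches, feats)]
            fused = fuse(torch.cat(feats, dim=1))  # the first fused image
        return feats, fused  # feats: target clear features of every image
```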
5. The image processing method according to claim 1, wherein the generating an output image based on the target clear features of each of the images to be processed includes:
respectively carrying out continuous and repeated up-sampling operations on each target clear feature, wherein after the up-sampling operation on each target clear feature is finished and a corresponding intermediate image is generated, the intermediate images obtained this time for all the target clear features are synthesized into a second fused image, and the second fused image is respectively input into the next up-sampling operation on each target clear feature, until the last up-sampling operation on each target clear feature is finished;
and fusing the intermediate images generated by the last up-sampling operation on each target clear feature to obtain the output image.
6. The image processing method according to claim 3, wherein the operation of acquiring the plurality of motion blur images used in training the sharpness estimation model includes:
selecting a plurality of continuous images from an image set, wherein the image set comprises a plurality of images which are continuously shot;
and respectively generating intermediate state images for each group of adjacent images in the selected multiple images, and generating motion blurred images corresponding one by one to each group of adjacent images based on the generated intermediate state images, to obtain the plurality of motion blurred images.
7. The image processing method according to claim 6, wherein the operation of generating the intermediate state images for a target image group, the target image group being any one group of the adjacent images, and obtaining a motion blurred image corresponding to the target image group based on the generated intermediate state images, includes:
processing the target image group by using a frame inserting method to obtain a plurality of corresponding intermediate state images;
and randomly selecting a plurality of intermediate state images from the generated intermediate state images, and synthesizing the selected intermediate state images to obtain a motion blurred image corresponding to the target image group.
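As an illustrative sketch of the training-data synthesis in claims 6 and 7 (the linear cross-fade below merely stands in for an actual frame interpolation method, and the frame counts are arbitrary assumptions), a motion blurred image could be built from a pair of adjacent sharp frames like this:

```python
# Hypothetical sketch of claims 6-7: synthesize a motion blurred training image
# from two adjacent sharp frames via interpolated intermediate states.
import random
import numpy as np

def synth_motion_blur(frame_a, frame_b, num_intermediate=8, num_selected=5):
    a = frame_a.astype(np.float32)
    b = frame_b.astype(np.float32)
    # Generate intermediate-state images between the adjacent frames
    # (linear blending is only a stand-in for a real frame-interpolation method).
    alphas = np.linspace(0.0, 1.0, num_intermediate + 2)[1:-1]
    intermediates = [(1.0 - t) * a + t * b for t in alphas]
    # Randomly select several intermediate states and synthesize (average) them.
    chosen = random.sample(intermediates, k=min(num_selected, len(intermediates)))
    blurred = np.mean(chosen, axis=0)
    return blurred.astype(frame_a.dtype)
```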
8. The image processing method according to claim 6 or 7, wherein the target blurred image is any one of the plurality of motion blurred images, and the operation of acquiring the second definition information of the target blurred image includes:
determining images used for generating the target blurred image from the plurality of images, and calculating pixel displacement information of the target blurred image based on a first frame image and a last frame image in the determined images;
and determining the second definition information of the target blurred image based on the pixel displacement information.
9. The image processing method according to claim 8, wherein the determining the second definition information of the target blurred image based on the pixel displacement information includes:
calculating the motion amplitude of each pixel point in the target blurred image based on the pixel displacement information;
and carrying out normalization processing on the motion amplitude, and carrying out inverse operation on the motion amplitude after the normalization processing to obtain the second definition information of the target blurred image.
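One way the pixel-displacement-based labelling of claims 8 and 9 might be sketched is with a dense optical flow between the first and last source frames; the choice of OpenCV's Farneback flow and of min-max normalization are assumptions made only for illustration:

```python
# Hypothetical sketch of claims 8-9: derive second definition information from
# the pixel displacement between the first and last frames used to build the
# blurred image. Inputs are grayscale uint8 frames.
import cv2
import numpy as np

def definition_label(first_frame_gray, last_frame_gray):
    flow = cv2.calcOpticalFlowFarneback(first_frame_gray, last_frame_gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    # Motion amplitude of every pixel.
    magnitude = np.sqrt(flow[..., 0] ** 2 + flow[..., 1] ** 2)
    # Normalize the amplitude to [0, 1] ...
    denom = magnitude.max() - magnitude.min()
    norm = ((magnitude - magnitude.min()) / denom if denom > 0
            else np.zeros_like(magnitude))
    # ... and invert it: larger motion means lower definition.
    return 1.0 - norm
```

Larger per-pixel motion amplitudes then map to lower second definition values, matching the inverse operation described in claim 9.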
10. An image processing apparatus, comprising:
the image acquisition module is used for acquiring a plurality of images to be processed; wherein the plurality of images to be processed comprise at least one motion blurred image;
the feature extraction module is used for respectively carrying out clear feature extraction operation on each image to be processed for a plurality of times, wherein after the clear features of each image to be processed are extracted in each clear feature extraction operation, the clear features extracted from all the images to be processed at this time are fused into a first fused image, and the first fused image is respectively input into the clear feature extraction operation carried out on each image to be processed next time, until the last clear feature extraction operation on each image to be processed is completed;
the image generation module is used for generating an output image based on the target clear features of each image to be processed, wherein the target clear features are the clear features extracted from the image to be processed for the last time.
11. An electronic device comprising a memory and a processor, the memory having stored thereon a computer program executable on the processor, wherein the processor, when executing the computer program, implements the image processing method according to any one of claims 1 to 9.
12. A computer-readable storage medium storing a computer program, characterized in that the computer program, when executed by a processor, implements the image processing method according to any one of claims 1 to 9.
13. A chip system comprising a processor coupled to a memory, the processor executing a computer program stored in the memory to implement the image processing method of any of claims 1 to 9.
CN202310495121.7A 2023-05-05 2023-05-05 Image processing method and device and electronic equipment Active CN116233626B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310495121.7A CN116233626B (en) 2023-05-05 2023-05-05 Image processing method and device and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310495121.7A CN116233626B (en) 2023-05-05 2023-05-05 Image processing method and device and electronic equipment

Publications (2)

Publication Number Publication Date
CN116233626A true CN116233626A (en) 2023-06-06
CN116233626B CN116233626B (en) 2023-09-15

Family

ID=86587552

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310495121.7A Active CN116233626B (en) 2023-05-05 2023-05-05 Image processing method and device and electronic equipment

Country Status (1)

Country Link
CN (1) CN116233626B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117152022A (en) * 2023-10-25 2023-12-01 荣耀终端有限公司 Image processing method and electronic equipment

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108376392A (en) * 2018-01-30 2018-08-07 复旦大学 A kind of image motion ambiguity removal method based on convolutional neural networks
CN111314733A (en) * 2020-01-20 2020-06-19 北京百度网讯科技有限公司 Method and apparatus for evaluating video sharpness
CN111784623A (en) * 2020-09-07 2020-10-16 腾讯科技(深圳)有限公司 Image processing method, image processing device, computer equipment and storage medium
US20200372618A1 (en) * 2018-05-09 2020-11-26 Tencent Technology (Shenzhen) Company Limited Video deblurring method and apparatus, storage medium, and electronic apparatus
CN113689383A (en) * 2021-07-27 2021-11-23 南京旭锐软件科技有限公司 Image processing method, device, equipment and storage medium
US20210407041A1 (en) * 2019-05-30 2021-12-30 Boe Technology Group Co., Ltd. Image processing method and device, training method of neural network, and storage medium
CN115035375A (en) * 2022-06-06 2022-09-09 大连理工大学 Method for feature extraction of chest CT image and related product
CN115409719A (en) * 2021-05-28 2022-11-29 武汉Tcl集团工业研究院有限公司 Data processing method and device, terminal equipment and computer readable storage medium
CN116055894A (en) * 2023-01-28 2023-05-02 荣耀终端有限公司 Image stroboscopic removing method and device based on neural network

Also Published As

Publication number Publication date
CN116233626B (en) 2023-09-15

Similar Documents

Publication Publication Date Title
CN113538273B (en) Image processing method and image processing apparatus
JP2022532102A (en) Screenshot method and electronic device
WO2021078001A1 (en) Image enhancement method and apparatus
CN113747048B (en) Image content removing method and related device
CN112333397B (en) Image processing method and electronic device
WO2022007862A1 (en) Image processing method, system, electronic device and computer readable storage medium
WO2023284715A1 (en) Object reconstruction method and related device
CN116233626B (en) Image processing method and device and electronic equipment
CN115359105B (en) Depth-of-field extended image generation method, device and storage medium
CN113538227A (en) Image processing method based on semantic segmentation and related equipment
CN117061861B (en) Shooting method, chip system and electronic equipment
CN113724151B (en) Image enhancement method, electronic equipment and computer readable storage medium
CN115115679A (en) Image registration method and related equipment
CN116916151B (en) Shooting method, electronic device and storage medium
CN117077703A (en) Image processing method and electronic equipment
CN114793283A (en) Image encoding method, image decoding method, terminal device, and readable storage medium
CN115460343B (en) Image processing method, device and storage medium
CN117499797B (en) Image processing method and related equipment
CN116051351B (en) Special effect processing method and electronic equipment
CN116095512B (en) Photographing method of terminal equipment and related device
CN117593611B (en) Model training method, image reconstruction method, device, equipment and storage medium
CN116757963B (en) Image processing method, electronic device, chip system and readable storage medium
CN117479008B (en) Video processing method, electronic equipment and chip system
WO2024046162A1 (en) Image recommendation method and electronic device
CN115908221B (en) Image processing method, electronic device and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant