WO2021018163A1 - Neural network search method and apparatus - Google Patents


Info

Publication number
WO2021018163A1
Authority
WO
WIPO (PCT)
Prior art keywords
resolution
feature map
image
super
network
Prior art date
Application number
PCT/CN2020/105369
Other languages
French (fr)
Chinese (zh)
Inventor
Dehua Song (宋德华)
Xu Jia (贾旭)
Yunhe Wang (王云鹤)
Chunjing Xu (许春景)
Original Assignee
Huawei Technologies Co., Ltd. (华为技术有限公司)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co., Ltd. (华为技术有限公司)
Publication of WO2021018163A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 3/00 Geometric image transformations in the plane of the image
    • G06T 3/40 Scaling of whole images or parts thereof, e.g. expanding or contracting
    • G06T 3/4053 Scaling of whole images or parts thereof, e.g. expanding or contracting based on super-resolution, i.e. the output image resolution being higher than the sensor resolution
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G06N 3/084 Backpropagation, e.g. using gradient descent
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G06N 3/086 Learning methods using evolutionary algorithms, e.g. genetic algorithms or genetic programming
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 3/00 Geometric image transformations in the plane of the image
    • G06T 3/40 Scaling of whole images or parts thereof, e.g. expanding or contracting
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N 19/50 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N 19/59 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving spatial sub-sampling or interpolation, e.g. alteration of picture size or resolution

Definitions

  • This application relates to the field of artificial intelligence, and more specifically, to a neural network search method and device.
  • Artificial intelligence is a theory, method, technology, and application system that uses digital computers, or machines controlled by digital computers, to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use knowledge to obtain the best results.
  • Artificial intelligence is a branch of computer science that attempts to understand the essence of intelligence and produce a new kind of intelligent machine that can respond in a manner similar to human intelligence.
  • Artificial intelligence studies the design principles and implementation methods of various intelligent machines, so that machines can perceive, reason, and make decisions.
  • Research in the field of artificial intelligence includes robotics, natural language processing, computer vision, decision-making and reasoning, human-computer interaction, recommendation and search, and basic AI theory.
  • Image super-resolution reconstruction technology refers to reconstructing low-resolution images to obtain high-resolution images.
  • Image super-resolution reconstruction through deep neural networks has obvious advantages.
  • However, as its performance improves, the scale of the deep neural network model also keeps growing.
  • Because the computing performance and storage space of mobile devices are very limited, this greatly restricts the application of super-resolution models on mobile devices. People are therefore committed to designing lightweight super-resolution network models that reduce the network scale as much as possible while ensuring a certain level of accuracy.
  • the neural architecture search (NAS) method is applied to the image super-resolution reconstruction technology.
  • the search space in the NAS method is usually a search space constructed by basic convolutional units.
  • the search space can include candidate neural network models constructed by multiple basic units.
  • The multiple basic units process the input feature map at the same scale.
  • The input feature map is only nonlinearly transformed without its size being changed, which results in the amount of calculation of the neural network model being proportional to its amount of parameters; that is, the larger the parameter amount, the greater the amount of calculation of the network model.
  • the present application provides a neural network search method and device, which can improve the accuracy of the super-resolution network in image super-resolution processing when the computing performance of the mobile device is limited.
  • In a first aspect, a neural network structure search method is provided, including: constructing a basic unit, which is a network structure obtained by connecting basic modules through basic operations of a neural network, the basic modules including a first module.
  • The first module is used to perform a dimensionality reduction operation and a residual connection operation on a first input feature map. The dimensionality reduction operation is used to transform the scale of the first input feature map from an original first scale to a second scale, the second scale being smaller than the first scale.
  • The residual connection operation is used to perform feature addition processing on the first input feature map and the feature map processed by the first module, and the scale of the feature map processed by the first module is the same as the scale of the first input feature map.
  • A search space is constructed according to the basic unit and network structure parameters, where the network structure parameters include the types of basic modules used to construct the basic unit, and the search space is used to search for an image super-resolution network structure. The image super-resolution network structure is then searched in the search space to determine a target image super-resolution network, which is used to perform super-resolution processing on an image to be processed.
  • the basic unit may be a network structure obtained by connecting basic modules through the basic operations of a neural network.
  • The above-mentioned network structure may include preset basic operations or combinations of basic operations in a convolutional neural network; these basic operations or combinations of basic operations can be collectively referred to as basic operations.
  • basic operations can refer to convolution operations, pooling operations, residual connections, etc.
  • connections between basic modules can be made to obtain the network structure of the basic unit.
  • The above feature addition may refer to adding the features of different channels for feature maps of the same scale.
  • The first module can perform a residual connection on the input feature map, that is, perform feature addition processing on the first input feature map and the feature map processed by the first module, so that more local detail information in the first input feature map is passed to subsequent convolutional layers.
  • The first module can also be used to reduce the dimensionality of the first input feature map.
  • By reducing the scale of the input feature map, the amount of model calculation can be reduced.
  • The residual connection operation can transfer information from earlier layers to later layers, compensating for the information loss caused by the dimensionality reduction operation.
  • The dimensionality reduction operation can also quickly expand the receptive field of the features, allowing the prediction of high-resolution pixels to take contextual information into better account, thereby improving super-resolution accuracy.
  • the above-mentioned basic unit is a basic module for constructing an image super-resolution network.
  • The dimensionality reduction operation includes at least one of a pooling operation and a convolution operation with a stride of Q, where Q is a positive integer greater than 1.
  • the pooling operation may be an average pooling operation, or the pooling operation may also be a maximum pooling operation.
  • the scale of the first input feature map can be reduced through the dimensionality reduction operation, thereby reducing the calculation amount of the target image super-resolution network under the condition that the parameter amount is unchanged.
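As a toy illustration of the dimensionality reduction described above (not code from the patent; plain Python with hypothetical names), average pooling with stride Q = 2 halves each spatial dimension of a feature map while introducing no learned parameters:

```python
def avg_pool2d(fmap, q=2):
    """Reduce an HxW feature map to (H//q)x(W//q) by average pooling.

    A minimal sketch of the dimensionality-reduction operation: the
    spatial scale shrinks by a factor of q, and since pooling has no
    learned weights, the parameter amount is unchanged while the
    downstream computation amount drops.
    """
    h, w = len(fmap), len(fmap[0])
    out = []
    for i in range(0, h - q + 1, q):
        row = []
        for j in range(0, w - q + 1, q):
            window = [fmap[i + di][j + dj] for di in range(q) for dj in range(q)]
            row.append(sum(window) / (q * q))
        out.append(row)
    return out

fmap = [[1, 2, 3, 4],
        [5, 6, 7, 8],
        [9, 10, 11, 12],
        [13, 14, 15, 16]]
print(avg_pool2d(fmap))  # [[3.5, 5.5], [11.5, 13.5]]
```

A strided convolution would achieve the same scale reduction with learned weights; the patent allows either.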
  • The feature map processed by the first module is the feature map after a dimension-raising operation, where the dimension-raising operation restores the scale of the feature map after dimensionality reduction to the first scale, and the residual connection operation refers to performing feature addition processing on the first input feature map and the feature map processed by the dimension-raising operation.
  • The above-mentioned dimension-raising operation may refer to an up-sampling operation, or it may refer to a deconvolution operation. The up-sampling operation may use an interpolation method, that is, inserting new pixels on the basis of the original image pixels.
  • The deconvolution operation can refer to the inverse process of the convolution operation, also known as transposed convolution.
  • Through the dimension-raising operation, the scale of the first input feature map after the dimensionality reduction operation can be transformed from the second scale back to the original first scale, ensuring that the residual connection operation is realized at the same scale.
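The dimension-raising and residual-connection steps can be sketched as follows (again a plain-Python illustration with hypothetical names; nearest-neighbour interpolation is just one of the up-sampling choices mentioned above):

```python
def upsample_nearest(fmap, q=2):
    """Restore a second-scale map to the first scale by nearest-neighbour
    interpolation (one possible dimension-raising operation)."""
    out = []
    for row in fmap:
        wide = [v for v in row for _ in range(q)]  # repeat each column q times
        for _ in range(q):                         # repeat each row q times
            out.append(list(wide))
    return out

def residual_add(x, y):
    """Feature addition at matching scale: elementwise sum of two maps."""
    return [[a + b for a, b in zip(rx, ry)] for rx, ry in zip(x, y)]

x = [[1, 2], [3, 4]]        # first-scale input (2x2)
low = [[10]]                # second-scale map after dimensionality reduction
restored = upsample_nearest(low, q=2)   # back to the first scale (2x2)
print(residual_add(x, restored))        # [[11, 12], [13, 14]]
```

The residual addition only type-checks because the dimension-raising step first brings both maps back to the same scale, which is exactly the constraint stated above.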
  • The first module is further configured to perform a dense connection operation on the first input feature map, where the dense connection operation means that the output feature maps of the preceding i-1 convolutional layers are feature-spliced with the first input feature map and used as the input feature map of the i-th convolutional layer, i being a positive integer greater than 1.
  • The aforementioned feature splicing may refer to splicing M feature maps of the same scale into a feature map with K channels, where K is a positive integer greater than M.
  • Dense connection operations achieve maximum information flow in the network: each layer is connected to all layers before it, that is, the input of each layer is the splicing of the outputs of all preceding layers.
  • Through the dense connection operation, the information in the input feature map is better preserved throughout the network, which better compensates for the information loss caused by the dimensionality reduction operation.
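A minimal sketch of the dense connection pattern (plain Python, hypothetical names; real implementations splice channel tensors, while here a feature map is simply a list of channels):

```python
def dense_forward(x, layers):
    """Dense connectivity sketch: the input to layer i is the channel-wise
    concatenation (feature splicing) of the original input and the outputs
    of all i-1 preceding layers."""
    features = [x]  # each element is one block of channels
    for layer in layers:
        inp = [ch for block in features for ch in block]  # splice all channels
        features.append(layer(inp))
    return [ch for block in features for ch in block]

# Toy "convolutional layer": emits one new channel = elementwise sum of its input channels.
layer = lambda inp: [[sum(vals) for vals in zip(*inp)]]

out = dense_forward([[1, 2, 3]], [layer, layer])
print(out)  # [[1, 2, 3], [1, 2, 3], [2, 4, 6]]
```

Note how the original channel [1, 2, 3] survives unchanged in the output: this is the information-preservation property the text credits to dense connections.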
  • The dense connection operation may be a cyclic dense connection operation, where the cyclic dense connection operation refers to performing feature splicing on the first input feature map after channel compression processing.
  • The depth of the target image super-resolution network can be deepened by adopting the cyclic operation, that is, the cyclic dense connection operation.
  • The first module is also used to perform a rearrangement operation, where the rearrangement operation means that multiple first channel features of the first input feature map are merged according to a preset rule to generate a second channel feature, the resolution of the second channel feature being higher than that of the first channel features.
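The rearrangement operation reads like the sub-pixel (pixel-shuffle) rearrangement used in many super-resolution networks; assuming that interpretation, a minimal sketch (plain Python, hypothetical names):

```python
def pixel_shuffle(channels, r=2):
    """Rearrangement sketch: merge r*r low-resolution channel features into
    one channel whose spatial resolution is r times higher in each axis.
    Channel c contributes the pixel at sub-position (c // r, c % r) of
    every r x r output block."""
    h, w = len(channels[0]), len(channels[0][0])
    out = [[0] * (w * r) for _ in range(h * r)]
    for c, ch in enumerate(channels):
        dy, dx = divmod(c, r)
        for i in range(h):
            for j in range(w):
                out[i * r + dy][j * r + dx] = ch[i][j]
    return out

# Four 1x1 first-channel features merge into one 2x2 second-channel feature.
print(pixel_shuffle([[[1]], [[2]], [[3]], [[4]]]))  # [[1, 2], [3, 4]]
```

The "preset rule" here is the fixed mapping from channel index to sub-pixel position; the output channel indeed has higher resolution than any input channel, as the text requires.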
  • The basic module further includes a second module and/or a third module, where the second module is used to perform a channel compression operation on a second input feature map, and the channel compression operation refers to a convolution operation with a 1×1 convolution kernel on the second input feature map.
  • The third module is used to perform a channel exchange operation, the residual connection operation, and the dense connection operation on a third input feature map. The third input feature map includes M sub-feature maps, each of which includes at least two adjacent channel features. The channel exchange operation refers to reordering the at least two adjacent channel features corresponding to the M sub-feature maps so that channel features corresponding to different sub-feature maps become adjacent, where M is an integer greater than 1. The first input feature map, the second input feature map, and the third input feature map correspond to the same image.
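Assuming the channel exchange operation follows the usual group-interleaving ("channel shuffle") pattern, it can be sketched as follows (plain Python, hypothetical names; each channel is represented by a label showing which sub-feature map it came from):

```python
def channel_shuffle(channels, m):
    """Channel-exchange sketch: given M groups (sub-feature maps) of
    adjacent channels, reorder the channels so that channels from
    different groups become adjacent -- equivalent to reshaping the
    channel axis to (m, n), transposing, and flattening."""
    n = len(channels) // m                              # channels per group
    groups = [channels[g * n:(g + 1) * n] for g in range(m)]
    return [groups[g][i] for i in range(n) for g in range(m)]

# Two sub-feature maps (a-group, b-group) of two adjacent channels each.
print(channel_shuffle(["a1", "a2", "b1", "b2"], m=2))  # ['a1', 'b1', 'a2', 'b2']
```

After the shuffle, every pair of adjacent channels mixes both sub-feature maps, which is the adjacency condition stated above.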
  • the image super-resolution network structure search in the search space to determine the target image super-resolution network includes:
  • the first image super-resolution network is determined by searching the image super-resolution network structure through an evolutionary algorithm
  • The training images may refer to sample images, that is, a low-resolution image and the sample super-resolution image corresponding to the low-resolution image.
  • The first image super-resolution network determined by the evolutionary algorithm can be trained a second time through the multi-level weighted joint loss function, and the parameters of the target image super-resolution network can finally be determined to obtain the target image super-resolution network, thereby improving the accuracy with which the target image super-resolution network processes images.
  • The multi-level weighted joint loss function is obtained according to the following equation:
  • $L = \sum_{k=1}^{N} \lambda_{k,t} L_{k}$
  • where L represents the multi-level weighted joint loss function; L_k represents the loss value of the k-th basic unit of the first image super-resolution network, the loss value being the image loss between the predicted super-resolution image corresponding to the output feature map of the k-th basic unit and the sample super-resolution image; λ_{k,t} represents the weight of the loss value of the k-th layer at time t; and N represents the number of basic units included in the first image super-resolution network, N being an integer greater than or equal to 1.
  • The weight of each intermediate-layer image loss in the multi-level weighted joint loss function may change with time (or with the number of iterations).
  • The loss function can combine the predicted image loss of each intermediate layer and reflect the importance of different layers through weighting. Because the weight of each intermediate-layer image loss can change over time, the parameters of the lower-level basic units can be trained more fully, improving the performance of the super-resolution network.
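The weighting scheme can be sketched as follows (plain Python; the weight schedules below are invented for illustration, not values from the patent):

```python
def joint_loss(unit_losses, weights_t):
    """Multi-level weighted joint loss sketch: the total loss is the
    weighted sum of the per-basic-unit losses L_1..L_N, where the weights
    lambda_{k,t} depend on the training time t and are supplied per step."""
    assert len(unit_losses) == len(weights_t)
    return sum(w * l for w, l in zip(weights_t, unit_losses))

losses = [0.8, 0.5, 0.2]    # L_1..L_N from N = 3 basic units (illustrative)
early = [0.5, 0.3, 0.2]     # early training: emphasise lower-level units
late = [0.1, 0.2, 0.7]      # late training: emphasise the final output
print(round(joint_loss(losses, early), 2))  # 0.59
print(round(joint_loss(losses, late), 2))   # 0.32
```

Shifting the weight mass from the lower units to the final output over training is one concrete way the time-varying weights could be used to first train the bottom basic units fully, in the spirit of the paragraph above.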
  • the search for the image super-resolution network structure through an evolutionary algorithm in the search space to determine the first image super-resolution network includes:
  • The performance parameters include the peak signal-to-noise ratio (PSNR), which is used to indicate the difference between the predicted super-resolution image obtained by each candidate network structure and the sample super-resolution image.
  • The first image super-resolution network is determined according to the performance parameters of the candidate networks.
  • Training images and the multi-level weighted joint loss function need to be used to train the candidate network structures, where the training images may refer to sample images, i.e., a low-resolution image and the sample super-resolution image corresponding to the low-resolution image.
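A minimal sketch of the candidate evaluation and selection step (plain Python; the PSNR formula is the standard one, and the candidates here are stand-in prediction tables rather than real network structures):

```python
import math

def psnr(pred, target, max_val=255.0):
    """Peak signal-to-noise ratio between a predicted super-resolution
    image and the sample (ground-truth) image; higher means closer."""
    flat_p = [v for row in pred for v in row]
    flat_t = [v for row in target for v in row]
    mse = sum((a - b) ** 2 for a, b in zip(flat_p, flat_t)) / len(flat_p)
    return float("inf") if mse == 0 else 10 * math.log10(max_val ** 2 / mse)

def select_best(candidates, evaluate):
    """Evolutionary-search sketch: score each randomly generated candidate
    network by validation PSNR and keep the best; a full search would then
    mutate/recombine the survivors and repeat."""
    return max(candidates, key=evaluate)

target = [[100, 120], [130, 140]]
preds = {"net_a": [[90, 110], [120, 130]],   # off by 10 everywhere
         "net_b": [[99, 121], [131, 139]]}   # off by 1 everywhere
best = select_best(list(preds), lambda name: psnr(preds[name], target))
print(best)  # net_b
```

This only illustrates the selection criterion; in the method described above, each candidate is first trained with the multi-level weighted joint loss before its PSNR is measured.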
  • In a second aspect, an image processing method is provided, including: acquiring an image to be processed; and performing super-resolution processing on the image to be processed according to the target image super-resolution network to obtain a target image of the image to be processed, where the target image is the super-resolution image corresponding to the image to be processed, the target image super-resolution network is a network determined by searching the image super-resolution network structure in the search space, the search space is constructed from basic units and network structure parameters, and the search space is used to search for the image super-resolution network structure.
  • the network structure parameters include the type of the basic module used to construct the basic unit.
  • the basic unit is a network structure obtained by connecting the basic modules through the basic operation of the neural network.
  • The basic module includes a first module, which is used to perform a residual connection operation and a dimensionality reduction operation on the first input feature map.
  • The residual connection operation refers to performing feature addition processing on the first input feature map and the feature map processed by the first module.
  • The dimensionality reduction operation is used to transform the scale of the first input feature map from the original first scale to a second scale, the second scale being smaller than the first scale.
  • The target image super-resolution network includes at least the first module, and the scale of the feature map processed by the first module is the same as the scale of the first input feature map.
  • the above-mentioned basic unit is a basic module for constructing an image super-resolution network.
  • The dimensionality reduction operation includes at least one of a pooling operation and a convolution operation with a stride of Q, where Q is a positive integer greater than 1.
  • The feature map processed by the first module is the feature map after a dimension-raising operation, where the dimension-raising operation restores the scale of the feature map after dimensionality reduction to the first scale, and the residual connection operation refers to performing feature addition processing on the first input feature map and the feature map processed by the dimension-raising operation.
  • The first module is also used to perform a dense connection operation on the first input feature map, where the dense connection operation means that the output feature maps of the preceding i-1 convolutional layers are feature-spliced with the first input feature map and used as the input feature map of the i-th convolutional layer, i being a positive integer greater than 1.
  • The dense connection operation may be a cyclic dense connection operation, where the cyclic dense connection operation refers to performing feature splicing on the first input feature map after channel compression processing.
  • The first module is also used to perform a rearrangement operation, where the rearrangement operation means that multiple first channel features of the first input feature map are merged according to a preset rule to generate a second channel feature, the resolution of the second channel feature being higher than that of the first channel features.
  • The basic module further includes a second module and/or a third module, where the second module is used to perform a channel compression operation on a second input feature map, and the channel compression operation refers to a convolution operation with a 1×1 convolution kernel on the second input feature map.
  • The third module is used to perform a channel exchange operation, the residual connection operation, and the dense connection operation on a third input feature map. The third input feature map includes M sub-feature maps, each of which includes at least two adjacent channel features. The channel exchange operation refers to reordering the at least two adjacent channel features corresponding to the M sub-feature maps so that channel features corresponding to different sub-feature maps become adjacent, where M is an integer greater than 1. The first input feature map, the second input feature map, and the third input feature map correspond to the same image.
  • the target image super-resolution network is a network determined by back-propagating iterative training of the first image super-resolution network through a multi-level weighted joint loss function, where The multi-level weighted joint loss function is determined according to the loss between the predicted super-resolution image and the sample super-resolution image corresponding to the feature map output by each basic unit in the first image super-resolution network.
  • The first image super-resolution network refers to a network determined by searching the image super-resolution network structure through an evolutionary algorithm in the search space.
  • The multi-level weighted joint loss function is obtained according to the following equation:
  • $L = \sum_{k=1}^{N} \lambda_{k,t} L_{k}$
  • where L represents the multi-level weighted joint loss function; L_k represents the loss value of the k-th basic unit of the first image super-resolution network, the loss value being the image loss between the predicted super-resolution image corresponding to the output feature map of the k-th basic unit and the sample super-resolution image; λ_{k,t} represents the weight of the loss value of the k-th layer at time t; and N represents the number of basic units included in the first image super-resolution network, N being an integer greater than or equal to 1.
  • The first image super-resolution network is determined based on the performance parameters of each of P candidate network structures, where the P candidate network structures are randomly generated according to the basic unit.
  • The performance parameter refers to a parameter that evaluates the performance of the P candidate network structures trained using the multi-level weighted joint loss function.
  • The performance parameters include the peak signal-to-noise ratio, which is used to indicate the difference between the predicted super-resolution image obtained through each candidate network structure and the sample super-resolution image, where P is an integer greater than 1.
  • In a third aspect, an image processing method is provided, applied to an electronic device with a display screen and a camera.
  • The method includes: detecting a first operation by which a user turns on the camera; in response to the first operation, displaying a photographing interface on the display screen, the photographing interface including a viewfinder frame, the viewfinder frame including a first image; detecting a second operation by which the user instructs the camera; and in response to the second operation, displaying a second image in the viewfinder frame.
  • The second image is an image obtained after super-resolution processing is performed on the first image collected by the camera, where the target image super-resolution network is applied in the super-resolution processing, and the target image super-resolution network is a network determined by searching for a network structure in the search space.
  • the search space is constructed by basic units and network structure parameters.
  • the search space is used to search for image super-resolution network structures.
  • The network structure parameters include the types of basic modules used to construct the basic unit.
  • The basic unit is a network structure obtained by connecting the basic modules through basic operations of a neural network.
  • The basic module includes a first module, which is used to perform a residual connection operation and a dimensionality reduction operation on the first input feature map.
  • The residual connection operation refers to performing feature addition processing on the first input feature map and the feature map processed by the first module, and the dimensionality reduction operation is used to transform the scale of the first input feature map from the original first scale to a second scale, the second scale being smaller than the first scale. The target image super-resolution network includes at least the first module, and the scale of the feature map processed by the first module is the same as that of the first input feature map.
  • the above-mentioned basic unit is a basic module for constructing an image super-resolution network.
  • The dimensionality reduction operation may include at least one of a pooling operation and a convolution operation with a stride of Q, where Q is a positive integer greater than 1.
  • The feature map processed by the first module is the feature map after a dimension-raising operation, where the dimension-raising operation restores the scale of the feature map after dimensionality reduction to the first scale, and the residual connection operation refers to performing feature addition processing on the first input feature map and the feature map processed by the dimension-raising operation.
  • The first module is also used to perform a dense connection operation on the first input feature map, where the dense connection operation means that the output feature maps of the preceding i-1 convolutional layers are feature-spliced with the first input feature map and used as the input feature map of the i-th convolutional layer, i being a positive integer greater than 1.
  • The dense connection operation may be a cyclic dense connection operation, where the cyclic dense connection operation refers to performing feature splicing on the first input feature map after channel compression processing.
  • The first module is also used to perform a rearrangement operation, where the rearrangement operation means that multiple first channel features of the first input feature map are merged according to a preset rule to generate a second channel feature, the resolution of the second channel feature being higher than that of the first channel features.
  • The basic module further includes a second module and/or a third module, where the second module is used to perform a channel compression operation on a second input feature map, and the channel compression operation refers to a convolution operation with a 1×1 convolution kernel on the second input feature map.
  • The third module is used to perform a channel exchange operation, the residual connection operation, and the dense connection operation on a third input feature map. The third input feature map includes M sub-feature maps, each of which includes at least two adjacent channel features. The channel exchange operation refers to reordering the at least two adjacent channel features corresponding to the M sub-feature maps so that channel features corresponding to different sub-feature maps become adjacent, where M is an integer greater than 1. The first input feature map, the second input feature map, and the third input feature map correspond to the same image.
  • the target image super-resolution network is a network determined by back-propagating iterative training of the first image super-resolution network through a multi-level weighted joint loss function, where The multi-level weighted joint loss function is determined according to the loss between the predicted super-resolution image and the sample super-resolution image corresponding to the feature map output by each basic unit in the first image super-resolution network.
  • The first image super-resolution network refers to a network determined by searching the image super-resolution network structure through an evolutionary algorithm in the search space.
  • The multi-level weighted joint loss function is obtained according to the following equation:
  • $L = \sum_{k=1}^{N} \lambda_{k,t} L_{k}$
  • where L represents the multi-level weighted joint loss function; L_k represents the loss value of the k-th basic unit of the first image super-resolution network, the loss value being the image loss between the predicted super-resolution image corresponding to the output feature map of the k-th basic unit and the sample super-resolution image; λ_{k,t} represents the weight of the loss value of the k-th layer at time t; and N represents the number of basic units included in the first image super-resolution network, N being an integer greater than or equal to 1.
  • The first image super-resolution network is determined based on the performance parameters of each of P candidate network structures, where the P candidate network structures are randomly generated according to the basic unit.
  • The performance parameter refers to a parameter that evaluates the performance of the P candidate network structures trained using the multi-level weighted joint loss function.
  • The performance parameters include the peak signal-to-noise ratio, which is used to indicate the difference between the predicted super-resolution image obtained through each candidate network structure and the sample super-resolution image, where P is an integer greater than 1.
  • In a fourth aspect, a neural network search device is provided, including: a memory for storing a program; and a processor for executing the program stored in the memory.
  • When the program stored in the memory is executed, the processor is configured to: construct a basic unit, which is a network structure obtained by connecting basic modules through basic operations of a neural network, the basic modules including a first module used to perform a dimensionality reduction operation and a residual connection operation on a first input feature map.
  • The dimensionality reduction operation is used to transform the scale of the first input feature map from the original first scale to a second scale, the second scale being smaller than the first scale, and the residual connection operation is used to perform feature addition processing on the first input feature map and the feature map processed by the first module, the scale of the feature map processed by the first module being the same as the scale of the first input feature map.
  • A search space is constructed from the basic unit and network structure parameters, where the network structure parameters include the types of basic modules used to construct the basic unit, and the search space is used to search for an image super-resolution network structure. The image super-resolution network structure search is performed in the search space to determine a target image super-resolution network, which is used to perform super-resolution processing on an image to be processed and includes at least the first module.
  • The target image super-resolution network is a network whose amount of calculation is less than a first preset threshold and whose image super-resolution accuracy is greater than a second preset threshold.
  • the processor included in the aforementioned neural network search device is further configured to execute the search method in any one implementation manner in the first aspect.
  • In a fifth aspect, an image processing device is provided, including: a memory for storing a program; and a processor for executing the program stored in the memory. When the program stored in the memory is executed, the processor is configured to: obtain an image to be processed; and perform super-resolution processing on the image to be processed according to a target image super-resolution network to obtain a target image of the image to be processed, where the target image is a super-resolution image corresponding to the image to be processed.
  • the target image super-resolution network is a network determined by an image super-resolution network structure search in a search space.
  • the search space is constructed from basic units and network structure parameters and is used to search for the image super-resolution network structure.
  • the network structure parameters include the type of the basic module used to construct the basic unit.
  • the basic unit is a network structure obtained by connecting the basic modules through the basic operations of a neural network.
  • the basic module includes a first module, which is used to perform a residual connection operation and a dimensionality reduction operation on a first input feature map.
  • the residual connection operation refers to performing feature addition processing on the first input feature map and the feature map processed by the first module, and the dimensionality reduction operation is used to transform the scale of the first input feature map from the original first scale to a second scale, the second scale being smaller than the first scale; the target image super-resolution network includes at least the first module, and the scale of the feature map processed by the first module is the same as the scale of the first input feature map.
  • the processor included in the foregoing image processing apparatus is further configured to execute the method in any implementation manner in the second aspect.
  • In a sixth aspect, an image processing device is provided, including: a memory for storing a program; and a processor for executing the program stored in the memory. When the program stored in the memory is executed, the processor is configured to: detect a user's first operation for turning on a camera; display, in response to the first operation, a photographing interface on a display screen, where the photographing interface includes a finder frame and the finder frame includes a first image; detect a second operation of the camera instructed by the user; and display, in response to the second operation, a second image in the finder frame, where the second image is an image obtained after super-resolution processing is performed on the first image collected by the camera, and a target image super-resolution network is used in the super-resolution processing.
  • the target image super-resolution network is a network determined by an image super-resolution network structure search in a search space, and the search space is constructed from the basic unit and network structure parameters.
  • the search space is used to search for the image super-resolution network structure.
  • the network structure parameters include the type of basic module used to construct the basic unit.
  • the basic unit is a network structure obtained by connecting the basic modules through the basic operations of the neural network.
  • the basic module includes a first module, which is used to perform a residual connection operation and a dimensionality reduction operation on a first input feature map; the residual connection operation refers to performing feature addition processing on the first input feature map and the feature map processed by the first module, and the dimensionality reduction operation is used to transform the scale of the first input feature map from the original first scale to a second scale, the second scale being smaller than the first scale.
  • the target image super-resolution network includes at least the first module, and the scale of the feature map processed by the first module is the same as the scale of the first input feature map.
  • the processor included in the foregoing image processing apparatus is further configured to execute the method in any one implementation manner in the third aspect.
  • In a seventh aspect, a computer-readable medium is provided, which stores program code for execution by a device, and the program code includes instructions for executing the method in any one of the implementations of the first aspect to the third aspect.
  • In an eighth aspect, a computer program product containing instructions is provided. When the computer program product runs on a computer, the computer is caused to execute the method in any one of the implementations of the first aspect to the third aspect.
  • In a ninth aspect, a chip is provided, which includes a processor and a data interface. The processor reads, through the data interface, instructions stored in a memory, and executes the method in any one of the implementations of the first aspect to the third aspect.
  • the chip may further include a memory in which instructions are stored, and the processor is configured to execute the instructions stored in the memory.
  • the processor is configured to execute the method in any one of the implementations of the foregoing first aspect to the third aspect.
  • FIG. 1 is a schematic diagram of an artificial intelligence main body framework provided by an embodiment of the present application
  • FIG. 2 is a schematic diagram of an application scenario provided by an embodiment of the present application.
  • FIG. 3 is a schematic diagram of another application scenario provided by an embodiment of the present application.
  • FIG. 4 is a schematic diagram of another application scenario provided by an embodiment of the present application.
  • FIG. 5 is a schematic diagram of yet another application scenario provided by an embodiment of the present application.
  • FIG. 6 is a schematic structural diagram of a system architecture provided by an embodiment of the present application.
  • FIG. 7 is a schematic diagram of a convolutional neural network structure provided by an embodiment of the present application.
  • FIG. 8 is a schematic diagram of another convolutional neural network structure provided by an embodiment of the present application.
  • FIG. 9 is a schematic diagram of a chip hardware structure provided by an embodiment of the present application.
  • FIG. 10 is a schematic diagram of a system architecture provided by an embodiment of the present application.
  • FIG. 11 is a schematic flowchart of a neural network search method provided by an embodiment of this application.
  • FIG. 12 is a schematic diagram of a target image super-resolution network provided by an embodiment of the present application.
  • FIG. 13 is a schematic structural diagram of a first module provided by an embodiment of the present application.
  • FIG. 14 is a schematic structural diagram of another first module provided by an embodiment of the present application.
  • FIG. 15 is a schematic structural diagram of still another first module provided by an embodiment of the present application.
  • FIG. 16 is a schematic diagram of a rearrangement operation provided by an embodiment of the present application.
  • FIG. 17 is a schematic structural diagram of a second module provided by an embodiment of the present application.
  • FIG. 18 is a schematic structural diagram of a third module provided by an embodiment of the present application.
  • FIG. 19 is a schematic diagram of channel exchange processing provided by an embodiment of the present application.
  • FIG. 20 is a schematic diagram of a search image super-resolution network provided by an embodiment of the present application.
  • FIG. 21 is a schematic diagram of network training through a multi-level weighted joint loss function provided by an embodiment of the present application.
  • FIG. 22 is a schematic diagram of a network structure search based on an evolutionary algorithm provided by an embodiment of the present application.
  • FIG. 23 is a schematic diagram of an effect after image processing is performed through the target super-resolution network of an embodiment of the present application.
  • FIG. 24 is a schematic diagram of an effect after image processing is performed through the target super-resolution network of an embodiment of the present application.
  • FIG. 25 is a schematic flowchart of an image processing method provided by an embodiment of the present application.
  • FIG. 26 is a schematic flowchart of an image processing method provided by an embodiment of the present application.
  • FIG. 27 is a schematic diagram of a group of display interfaces provided by an embodiment of the present application.
  • FIG. 28 is a schematic diagram of another set of display interfaces provided by an embodiment of the present application.
  • FIG. 29 is a schematic block diagram of a neural network search device according to an embodiment of the present application.
  • FIG. 30 is a schematic block diagram of an image processing device according to an embodiment of the present application.
  • Figure 1 shows a schematic diagram of an artificial intelligence main framework, which describes the overall workflow of the artificial intelligence system and is suitable for general artificial intelligence field requirements.
  • Intelligent Information Chain reflects a series of processes from data acquisition to processing. For example, it can be the general process of intelligent information perception, intelligent information representation and formation, intelligent reasoning, intelligent decision-making, intelligent execution and output. In this process, the data has gone through the condensing process of "data-information-knowledge-wisdom".
  • Infrastructure provides computing power support for the artificial intelligence system, realizes communication with the outside world, and provides support through the basic platform.
  • the infrastructure can communicate with the outside through sensors, and the computing power of the infrastructure can be provided by smart chips.
  • the smart chip here can be a hardware acceleration chip such as a central processing unit (CPU), a neural-network processing unit (NPU), a graphics processing unit (GPU), an application-specific integrated circuit (ASIC), or a field programmable gate array (FPGA).
  • the basic platform of infrastructure can include distributed computing framework and network and other related platform guarantees and support, and can include cloud storage and computing, interconnection networks, etc.
  • data can be obtained through sensors and external communication, and then these data can be provided to the smart chip in the distributed computing system provided by the basic platform for calculation.
  • the data in the upper layer of the infrastructure is used to represent the data source in the field of artificial intelligence.
  • This data involves graphics, images, voice, text, and IoT data of traditional devices, including business data of existing systems and sensory data such as force, displacement, liquid level, temperature, and humidity.
  • the above-mentioned data processing usually includes data training, machine learning, deep learning, search, reasoning, decision-making and other processing methods.
  • machine learning and deep learning can symbolize and formalize data for intelligent information modeling, extraction, preprocessing, training, etc.
  • Reasoning refers to the process of simulating human intelligent reasoning in a computer or intelligent system, using formal information to conduct machine thinking and solving problems based on reasoning control strategies.
  • the typical function is search and matching.
  • Decision-making refers to the decision-making process of intelligent information after reasoning, and usually provides functions such as classification, ranking, and prediction.
  • some general capabilities can be formed based on the results of the data processing, such as an algorithm or a general system, for example, translation, text analysis, computer vision processing, speech recognition, image recognition, and so on.
  • Intelligent products and industry applications refer to the products and applications of artificial intelligence systems in various fields. It is an encapsulation of the overall solution of artificial intelligence, productizing intelligent information decision-making and realizing landing applications. Its application fields mainly include: intelligent manufacturing, intelligent transportation, Smart home, smart medical, smart security, autonomous driving, safe city, smart terminal, etc.
  • Application scenario 1: the smart terminal camera field
  • the method for searching the neural network structure of the embodiment of the present application can be applied to a smart terminal device (for example, a mobile phone) for real-time image super-resolution technology.
  • the method for searching the neural network structure of the embodiment of the present application can determine the target image super-resolution network applied to the field of smart terminal shooting.
  • For example, when a user uses a smart terminal to photograph long-distance objects or small objects, the resolution of the captured image is relatively low and the details are not clear.
  • the user can use the target image super-resolution network provided by the embodiments of the present application to implement image super-resolution processing on the smart terminal, so that low-resolution images can be converted into high-resolution images, so that the photographed objects are clearer.
  • this application proposes an image processing method applied to an electronic device with a display screen and a camera.
  • the method includes: detecting a user's first operation for turning on the camera; in response to the first operation, A photographing interface is displayed on the display screen, the photographing interface includes a finder frame, and the finder frame includes a first image; a second operation instructed by the user to the camera is detected; in response to the second operation, A second image is displayed in the viewing frame, and the second image is an image after super-resolution processing is performed on the first image collected by the camera, wherein the target super-resolution neural network is applied to the super-resolution Rate in the process.
  • the above-mentioned target image super-resolution network is a network determined by searching for an image super-resolution network structure in a search space; the search space is constructed from basic units and network structure parameters and is used to search for the image super-resolution network structure.
  • the network structure parameters include the type of the basic module used to construct the basic unit; the basic unit is a network structure obtained by connecting the basic modules through the basic operations of the neural network; the basic module includes at least a first module, and the first module is used to perform a residual connection operation and a dimensionality reduction operation on a first input feature map.
  • the residual connection operation refers to performing feature addition processing on the first input feature map and the feature map processed by the first module, and the dimensionality reduction operation is used to transform the scale of the first input feature map from the original first scale to a second scale, the second scale being smaller than the first scale; the target image super-resolution network includes at least the first module, and the scale of the feature map processed by the first module is the same as the scale of the first input feature map.
  • the basic unit is a basic module for constructing an image super-resolution network.
  • the dimensionality reduction operation may include at least one of a pooling operation and a convolution operation with a stride of Q, where Q is an integer greater than 1.
  • the feature map processed by the first module is a feature map that has undergone a dimensionality upscaling operation.
  • the dimensionality upscaling operation refers to restoring the scale of the feature map that has undergone the dimensionality reduction processing to the first scale.
  • the residual connection operation refers to performing feature addition processing on the first input feature map and the feature map processed by the dimensionality upscaling operation.
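The reduce-process-restore-add pattern described above can be sketched in a few lines of numpy. This is an illustrative sketch, not the patented implementation: 2×2 average pooling stands in for the dimensionality reduction, nearest-neighbour upsampling for the dimensionality upscaling, and a ReLU for the feature processing done at the smaller scale.

```python
import numpy as np

def avg_pool2x2(x):
    # Dimensionality reduction: 2x2 average pooling with stride 2
    c, h, w = x.shape
    return x.reshape(c, h // 2, 2, w // 2, 2).mean(axis=(2, 4))

def upsample2x(x):
    # Dimensionality upscaling: nearest-neighbour upsampling back to
    # the first scale
    return x.repeat(2, axis=1).repeat(2, axis=2)

def first_module(x):
    # Sketch of the first module: reduce the scale, process features at
    # the smaller (cheaper) scale, restore the scale, then the residual
    # connection adds the result back onto the first input feature map.
    reduced = avg_pool2x2(x)            # first scale -> second scale
    processed = np.maximum(reduced, 0)  # stand-in for conv layers
    restored = upsample2x(processed)    # second scale -> first scale
    return x + restored                 # residual feature addition

x = np.random.rand(8, 16, 16)  # (channels, height, width)
y = first_module(x)
assert y.shape == x.shape      # output scale equals the first scale
```

The point of the design is visible in the shapes: the expensive feature processing happens at one quarter of the spatial resolution, yet the module's output keeps the scale of its input, so modules can be chained freely.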
  • the first module is further configured to perform a dense connection operation on the first input feature map, where the dense connection operation refers to feature-splicing the output feature maps of each of the first i-1 convolutional layers together with the first input feature map to form the input feature map of the i-th convolutional layer, and i is an integer greater than 1.
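The dense connection operation can be sketched as follows; this is an assumption-laden illustration (random 1×1 convolutions, ReLU activations, a hypothetical `growth` of 4 channels per layer), showing only how the input of each layer is the concatenation of the first input feature map with all earlier layer outputs.

```python
import numpy as np

rng = np.random.default_rng(0)

def conv1x1(x, out_ch):
    # A 1x1 convolution is a linear map over the channel dimension
    # (random weights here, purely for illustration)
    w = rng.standard_normal((out_ch, x.shape[0]))
    return np.einsum('oc,chw->ohw', w, x)

def densely_connected(x, num_layers=3, growth=4):
    # Dense connection: the input of the i-th layer is the feature
    # splicing (channel concatenation) of the first input feature map
    # and the outputs of all i-1 preceding convolutional layers.
    feats = [x]
    for _ in range(num_layers):
        inp = np.concatenate(feats, axis=0)        # feature splicing
        out = np.maximum(conv1x1(inp, growth), 0)  # conv + ReLU
        feats.append(out)
    return np.concatenate(feats, axis=0)

x = np.random.rand(8, 6, 6)
y = densely_connected(x)
assert y.shape[0] == 8 + 3 * 4  # input channels + growth per layer
```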
  • the dense connection operation is a cyclic dense connection operation, where the cyclic dense connection operation refers to performing the feature splicing processing on the first input feature map after channel compression processing.
  • the first module is also used to perform a rearrangement operation, where the rearrangement operation refers to merging multiple first channel features of the first input feature map according to a preset rule to generate a second channel feature, and the resolution of the second channel feature is higher than the resolution of the first channel features.
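One common preset rule matching this description is the sub-pixel (depth-to-space) rearrangement: r² low-resolution channel features are interleaved into a single channel feature whose height and width are r times larger. The sketch below assumes this rule for illustration; it mirrors the usual pixel-shuffle layout.

```python
import numpy as np

def rearrange(x, r):
    # Rearrangement: merge r*r first channel features into one second
    # channel feature at r-times-higher spatial resolution.
    c, h, w = x.shape
    assert c % (r * r) == 0
    x = x.reshape(c // (r * r), r, r, h, w)
    x = x.transpose(0, 3, 1, 4, 2)   # -> (c', h, r, w, r)
    return x.reshape(c // (r * r), h * r, w * r)

x = np.arange(36, dtype=float).reshape(4, 3, 3)  # 4 channel features, 3x3
y = rearrange(x, 2)
assert y.shape == (1, 6, 6)   # one channel feature at 2x resolution
# each 2x2 output patch interleaves one pixel from each input channel
assert y[0, 0, 0] == x[0, 0, 0] and y[0, 0, 1] == x[1, 0, 0]
```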
  • the basic module further includes a second module and/or a third module. The second module is used to perform a channel compression operation, the residual connection operation, and the dense connection operation on a second input feature map, where the channel compression operation refers to a convolution operation with a 1×1 convolution kernel on the second input feature map. The third module is used to perform a channel exchange operation, the residual connection operation, and the dense connection operation on a third input feature map, where the third input feature map includes M sub-feature maps, each of the M sub-feature maps includes at least two adjacent channel features, and the channel exchange processing refers to reordering the at least two adjacent channel features corresponding to the M sub-feature maps so that channel features corresponding to different sub-feature maps in the M sub-feature maps become adjacent; M is an integer greater than 1, and the first input feature map, the second input feature map, and the third input feature map correspond to the same image.
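The channel exchange operation described for the third module can be sketched with the familiar reshape-transpose-reshape trick (as in channel shuffle); assuming M = 2 sub-feature maps of 2 channels each, channels from different sub-feature maps end up adjacent:

```python
import numpy as np

def channel_exchange(x, m):
    # Reorder the channels of m sub-feature maps so that channel
    # features from different sub-feature maps become adjacent.
    c, h, w = x.shape
    return x.reshape(m, c // m, h, w).transpose(1, 0, 2, 3).reshape(c, h, w)

# 2 sub-feature maps of 2 channels each: channel order [0, 1 | 2, 3]
x = np.arange(4, dtype=float).reshape(4, 1, 1)
y = channel_exchange(x, 2)
# after the exchange the order is [0, 2, 1, 3]: channels from the two
# sub-feature maps are interleaved
assert list(y.flatten()) == [0.0, 2.0, 1.0, 3.0]
```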
  • the second module is used to perform a channel compression operation on the second input feature map, where the channel compression operation refers to a convolution operation with a 1×1 convolution kernel.
  • the target image super-resolution network is a network determined by performing back-propagation iterative training on a first image super-resolution network through a multi-level weighted joint loss function, where the multi-level weighted joint loss function is determined according to the loss between the predicted super-resolution image corresponding to the feature map output by each basic unit in the first image super-resolution network and the sample super-resolution image.
  • the first image super-resolution network refers to a network determined by an image super-resolution network structure search in the search space through an evolutionary algorithm.
  • the multi-level weighted joint loss function is obtained according to the following equation: L = Σ_{k=1}^{N} λ_{k,t} · L_k
  • L represents the multi-level weighted joint loss function.
  • L_k represents the loss value of the k-th basic unit of the first image super-resolution network; the loss value refers to the image loss between the predicted super-resolution image corresponding to the feature map output by the k-th basic unit and the sample super-resolution image.
  • λ_{k,t} represents the weight of the loss value of the k-th layer at time t.
  • N represents the number of the basic units included in the first image super-resolution network, and N is an integer greater than or equal to 1.
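Given the definitions above, the joint loss is a weighted sum of the per-basic-unit losses; a minimal sketch (the per-unit loss values and the time-dependent weights λ_{k,t} are assumed inputs computed elsewhere):

```python
import numpy as np

def multi_level_loss(per_unit_losses, weights):
    # L = sum_{k=1}^{N} lambda_{k,t} * L_k : weighted sum of the losses
    # between each basic unit's predicted super-resolution image and
    # the sample super-resolution image.
    assert len(per_unit_losses) == len(weights)
    return float(np.dot(weights, per_unit_losses))

# N = 3 basic units; lambda_{k,t} at some training time t
losses = [0.9, 0.6, 0.3]
weights = [0.2, 0.3, 0.5]
L = multi_level_loss(losses, weights)
assert abs(L - (0.2 * 0.9 + 0.3 * 0.6 + 0.5 * 0.3)) < 1e-9
```

Making λ_{k,t} depend on the training time t allows the supervision to shift between shallow and deep basic units as training progresses, which is the point of the "multi-level" weighting.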
  • the first image super-resolution network is determined by the performance parameters of each candidate network structure in the P candidate network structures, and the P candidate network structures are randomly generated based on the basic unit.
  • the performance parameter refers to a parameter that evaluates the performance of the P candidate network structures trained by using the multi-level weighted joint loss function.
  • the performance parameter includes a peak signal-to-noise ratio (PSNR), which is used to indicate the difference between the predicted super-resolution image obtained through each candidate network structure and the sample super-resolution image.
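PSNR has a standard definition, 10·log10(MAX²/MSE), where MAX is the maximum pixel value (255 for 8-bit images) and MSE is the mean squared error between the predicted and sample images; a higher PSNR means a smaller difference. A minimal numpy version:

```python
import numpy as np

def psnr(pred, target, max_val=255.0):
    # Peak signal-to-noise ratio: 10 * log10(MAX^2 / MSE).
    # Higher PSNR => predicted super-resolution image is closer to the
    # sample super-resolution image.
    diff = pred.astype(np.float64) - target.astype(np.float64)
    mse = np.mean(diff ** 2)
    return 10.0 * np.log10(max_val ** 2 / mse)

gt = np.zeros((16, 16))
pred = gt + 1.0  # every pixel off by exactly 1 -> MSE = 1
# with MSE = 1, PSNR reduces to 20 * log10(255) ~= 48.13 dB
assert abs(psnr(pred, gt) - 20 * np.log10(255.0)) < 1e-9
```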
  • The expansions, definitions, explanations, and descriptions of the target image super-resolution network in the related embodiments of FIGS. 10 to 22 below are also applicable to the target image super-resolution network applied to the smart terminal camera field provided by the embodiments of the present application, and are not repeated here.
  • FIG. 2 shows a schematic diagram of the target image super-resolution network applied to a smart terminal. The smart terminal 210 (for example, a mobile phone) obtains low-resolution images 220 and 230. The super-resolution (SR) network 240 shown in FIG. 2 may be the target image super-resolution network in the embodiment of the present application, and the target image can be obtained after processing by the target image super-resolution network: for example, after image 220 is subjected to super-resolution processing, super-resolution image 221 can be obtained; after image 230 is subjected to super-resolution processing, super-resolution image 231 can be obtained.
  • the smart terminal 210 may be an electronic device with a camera.
  • For example, the smart terminal may be a mobile phone with image processing functions, a tablet personal computer (TPC), a media player, a smart TV, a laptop computer, a personal digital assistant (PDA), a personal computer (PC), a camera, a video camera, a smart watch, a wearable device (WD), or the like.
  • the neural network search method of the embodiment of the present application can be applied to the security field.
  • pictures (or videos) collected by monitoring equipment in public places are often affected by factors such as weather and distance, and have problems such as blurred images and low resolution.
  • the target image super-resolution network can perform super-resolution reconstruction of the collected pictures, which can restore important information such as license plate numbers and clear faces for public security personnel, providing important clues for case detection.
  • For example, this application provides an image processing method, the method including: acquiring a street view image; performing super-resolution processing on the street view image according to the target image super-resolution network to obtain a super-resolution image of the street view image; and recognizing the information in the super-resolution image according to the super-resolution image of the street view image.
  • the above-mentioned target image super-resolution network is a network determined by searching for an image super-resolution network structure in a search space; the search space is constructed from basic units and network structure parameters and is used to search for the image super-resolution network structure.
  • the network structure parameters include the type of the basic module used to construct the basic unit; the basic unit is a network structure obtained by connecting the basic modules through the basic operations of the neural network; the basic module includes at least a first module, and the first module is used to perform a residual connection operation and a dimensionality reduction operation on a first input feature map.
  • the residual connection operation refers to performing feature addition processing on the first input feature map and the feature map processed by the first module, and the dimensionality reduction operation is used to transform the scale of the first input feature map from the original first scale to a second scale, the second scale being smaller than the first scale; the target image super-resolution network includes at least the first module, and the scale of the feature map processed by the first module is the same as the scale of the first input feature map.
  • The expansions, limitations, definitions, explanations, and descriptions of the target image super-resolution network in the related embodiments of FIGS. 10 to 22 below are also applicable to the target image super-resolution network applied to the security field provided by the embodiments of the present application, and are not repeated here.
  • the neural network search method of the embodiment of the present application can be applied to the field of medical imaging.
  • the target image super-resolution network can perform super-resolution reconstruction of medical images, which can reduce the requirements for the imaging environment without increasing the cost of high-resolution imaging technology and, through the restoration of clear medical images, realize accurate detection of cells, helping doctors make a better diagnosis of the patient's condition.
  • For example, this application provides an image processing method, the method including: acquiring a medical image; performing super-resolution processing on the medical image according to the target image super-resolution network to obtain a super-resolution image of the medical image; and identifying and analyzing the information in the super-resolution image according to the super-resolution image of the medical image.
  • the above-mentioned target image super-resolution network is a network determined by searching for an image super-resolution network structure in a search space; the search space is constructed from basic units and network structure parameters and is used to search for the image super-resolution network structure.
  • the network structure parameters include the type of the basic module used to construct the basic unit; the basic unit is a network structure obtained by connecting the basic modules through the basic operations of the neural network; the basic module includes a first module, and the first module is used to perform a residual connection operation and a dimensionality reduction operation on a first input feature map.
  • the residual connection operation refers to performing feature addition processing on the first input feature map and the feature map processed by the first module, and the dimensionality reduction operation is used to transform the scale of the first input feature map from the original first scale to a second scale, the second scale being smaller than the first scale; the target image super-resolution network includes at least the first module, and the scale of the feature map processed by the first module is the same as the scale of the first input feature map.
  • The expansions, limitations, explanations, and descriptions of the target image super-resolution network in the related embodiments of FIGS. 10 to 22 below are also applicable to the target image super-resolution network applied to the medical imaging field provided by the embodiments of the present application, and are not repeated here.
  • the neural network search method of the embodiment of the present application can be applied to the field of image compression.
  • the picture can be compressed in advance before transmission; after the transmission is completed, the receiving end decodes it through the super-resolution reconstruction technology of the target image super-resolution network to restore the original image sequence, which greatly reduces the space required for storage and the bandwidth required for transmission.
  • For example, the present application provides an image processing method, which includes: acquiring a compressed image; performing super-resolution processing on the compressed image according to the target image super-resolution network to obtain a super-resolution image of the compressed image; and identifying the information in the super-resolution image according to the super-resolution image of the compressed image.
  • the above-mentioned target image super-resolution network is a network determined by searching for an image super-resolution network structure in a search space; the search space is constructed from basic units and network structure parameters and is used to search for the image super-resolution network structure.
  • the network structure parameters include the type of the basic module used to construct the basic unit; the basic unit is a network structure obtained by connecting the basic modules through the basic operations of the neural network; the basic module includes at least a first module, and the first module is used to perform a residual connection operation and a dimensionality reduction operation on a first input feature map.
  • the residual connection operation refers to performing feature addition processing on the first input feature map and the feature map processed by the first module, and the dimensionality reduction operation is used to transform the scale of the first input feature map from the original first scale to a second scale, the second scale being smaller than the first scale; the target image super-resolution network includes at least the first module, and the scale of the feature map processed by the first module is the same as the scale of the first input feature map.
  • The expansions, limitations, explanations, and descriptions of the target image super-resolution network in the related embodiments of FIGS. 10 to 22 below are also applicable to the target image super-resolution network applied in the field of image compression provided by the embodiments of this application, and are not repeated here.
  • a neural network can be composed of neural units.
  • a neural unit can refer to an arithmetic unit that takes x s and intercept 1 as inputs.
• the output of the arithmetic unit can be: h_{W,b}(x) = f(W^T x) = f(∑_{s=1}^{n} W_s · x_s + b), where s = 1, 2, …, n, n is a natural number greater than 1, W_s is the weight of x_s, and b is the bias of the neural unit.
  • f is the activation function of the neural unit, which is used to introduce nonlinear characteristics into the neural network to convert the input signal in the neural unit into an output signal.
  • the output signal of the activation function can be used as the input of the next convolutional layer, and the activation function can be a sigmoid function.
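The arithmetic unit above can be written out in a few lines; a minimal sketch, assuming a sigmoid activation for f (the function name `neural_unit` is illustrative):

```python
import math

def neural_unit(xs, ws, b):
    """Single neural unit: weighted sum of the inputs x_s with weights
    W_s, plus bias b, passed through a sigmoid activation f."""
    z = sum(w * x for w, x in zip(ws, xs)) + b
    return 1.0 / (1.0 + math.exp(-z))  # sigmoid activation f

# Zero weights and bias give sigmoid(0) = 0.5
print(neural_unit([1.0, 2.0], [0.0, 0.0], 0.0))  # prints 0.5
```

The output of this unit can in turn serve as the input of the next unit, which is how the layered networks below are built.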
  • a neural network is a network formed by connecting multiple above-mentioned single neural units together, that is, the output of one neural unit can be the input of another neural unit.
  • the input of each neural unit can be connected with the local receptive field of the previous layer to extract the characteristics of the local receptive field.
  • the local receptive field can be a region composed of several neural units.
• A deep neural network (deep neural network, DNN), also known as a multi-layer neural network, can be understood as a neural network with multiple hidden layers.
• According to the positions of different layers, the layers inside a DNN can be divided into three categories: the input layer, hidden layers, and the output layer.
  • the first layer is the input layer
  • the last layer is the output layer
• the layers in the middle are all hidden layers.
• the layers are fully connected; that is to say, any neuron in the i-th layer must be connected to any neuron in the (i+1)-th layer.
• Although a DNN looks complicated, the work of each layer is not complicated. In simple terms, each layer computes the linear relationship expression y = α(Wx + b), where x is the input vector, y is the output vector, b is the offset vector, W is the weight matrix (also called coefficients), and α() is the activation function. Each layer simply applies this operation to the input vector x to obtain the output vector y. Because a DNN has many layers, the number of coefficients W and offset vectors b is also large.
• These parameters are defined in the DNN as follows, taking the coefficient W as an example: suppose that in a three-layer DNN, the linear coefficient from the fourth neuron in the second layer to the second neuron in the third layer is defined as W_{24}^3, where the superscript 3 represents the layer in which the coefficient W is located, and the subscripts correspond to the output third-layer index 2 and the input second-layer index 4.
• In general, the coefficient from the k-th neuron in the (L-1)-th layer to the j-th neuron in the L-th layer is defined as W_{jk}^L.
  • the input layer has no W parameter.
• More hidden layers make the network better able to model complex real-world situations. Theoretically speaking, a model with more parameters is more complex and has a greater "capacity", which means it can complete more complex learning tasks.
  • Training a deep neural network is also a process of learning a weight matrix, and its ultimate goal is to obtain the weight matrix of all layers of the trained deep neural network (a weight matrix formed by vectors W of many layers).
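The layer-by-layer computation y = α(Wx + b) described above can be sketched as follows; the 2-3-1 network shape and zero-initialized parameters are illustrative placeholders, not values from the embodiment:

```python
import math

def layer_forward(x, W, b):
    """One fully connected layer: y = sigmoid(W x + b)."""
    z = [sum(wij * xj for wij, xj in zip(row, x)) + bi
         for row, bi in zip(W, b)]
    return [1.0 / (1.0 + math.exp(-v)) for v in z]

def dnn_forward(x, params):
    """Chain layers: the output of each layer is the next layer's input."""
    for W, b in params:
        x = layer_forward(x, W, b)
    return x

# Hypothetical 2-3-1 network with zero weights: every sigmoid(0) = 0.5
params = [
    ([[0.0, 0.0]] * 3, [0.0] * 3),   # hidden layer: 3 units
    ([[0.0, 0.0, 0.0]], [0.0]),      # output layer: 1 unit
]
print(dnn_forward([1.0, -1.0], params))  # prints [0.5]
```

Training then amounts to adjusting each layer's W and b, i.e. learning the weight matrices of all layers.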
  • Convolutional neural network (convolutional neuron network, CNN) is a deep neural network with convolutional structure.
  • the convolutional neural network contains a feature extractor composed of a convolution layer and a sub-sampling layer.
  • the feature extractor can be regarded as a filter.
  • the convolutional layer refers to the neuron layer that performs convolution processing on the input signal in the convolutional neural network.
• in a convolutional layer, a neuron may be connected to only some of the neurons in adjacent layers.
• a convolutional layer usually contains several feature planes, and each feature plane can be composed of neural units arranged in a rectangle. Neural units in the same feature plane share weights, and the shared weights here are the convolution kernels.
• weight sharing can be understood as meaning that the way image information is extracted is independent of location.
  • the convolution kernel can be initialized in the form of a matrix of random size. During the training of the convolutional neural network, the convolution kernel can obtain reasonable weights through learning. In addition, the direct benefit of sharing weights is to reduce the connections between the layers of the convolutional neural network, while reducing the risk of overfitting.
• during training, the neural network can use the error back propagation (back propagation, BP) algorithm to revise the parameters in the initial neural network model, so that the reconstruction error loss of the neural network model becomes smaller and smaller. Specifically, forward propagation of the input signal to the output produces an error loss, and the parameters in the initial neural network model are updated by back-propagating the error loss information, so that the error loss converges.
• the back propagation algorithm is a back propagation process dominated by the error loss, aimed at obtaining optimal neural network model parameters, such as the weight matrix.
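A toy illustration of this forward-then-backward process, using a single sigmoid unit and a squared-error loss (the network, loss, and learning rate here are illustrative, not the embodiment's):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_step(w, b, x, target, lr=0.5):
    """One back-propagation step for a single sigmoid unit: forward
    pass, then propagate the error backward to update w and b."""
    y = sigmoid(w * x + b)            # forward pass
    err = y - target                  # output error
    grad = err * y * (1.0 - y)        # chain rule through the sigmoid
    return w - lr * grad * x, b - lr * grad

def loss(w, b, x, target):
    return 0.5 * (sigmoid(w * x + b) - target) ** 2

w, b = 0.0, 0.0
before = loss(w, b, x=1.0, target=1.0)
for _ in range(100):                  # repeated updates converge the loss
    w, b = train_step(w, b, x=1.0, target=1.0)
after = loss(w, b, x=1.0, target=1.0)
# the error loss after training is smaller than before
```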
• Neural architecture search (neural architecture search, NAS)
• the search space, the search strategy, and the performance evaluation strategy are the core elements of a NAS algorithm.
  • the search space can refer to the set of searched neural network structures, that is, the solution space. In order to improve search efficiency, sometimes the search space is limited or simplified.
  • the network is divided into basic units (cells, or blocks), and a more complex network is formed by stacking these units.
  • the basic unit is composed of multiple nodes (layers of the neural network), which appear repeatedly in the entire network but have different weight parameters.
  • the search strategy can refer to the process of finding the optimal network structure in the search space.
  • the search strategy defines how to find the optimal network structure. It is usually an iterative optimization process, which is essentially a hyperparameter optimization problem.
  • the performance evaluation strategy may refer to evaluating the performance of the searched network structure.
  • the goal of the search strategy is to find a neural network structure, and the performance of the searched network structure can be evaluated through the performance evaluation strategy.
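The three core elements can be illustrated with a toy random-search NAS loop; the search space, the scoring function, and all names here are hypothetical stand-ins for real candidate training and validation:

```python
import random

# Search space (the solution space): the set of choices per component.
SEARCH_SPACE = {"op": ["conv3x3", "conv5x5", "pool"], "width": [16, 32, 64]}

def sample_architecture(rng):
    """Search strategy (here: random sampling) draws one candidate."""
    return {k: rng.choice(v) for k, v in SEARCH_SPACE.items()}

def evaluate(arch):
    """Performance evaluation strategy: a placeholder score; a real NAS
    trains each candidate and measures validation accuracy."""
    score = {"conv3x3": 2, "conv5x5": 3, "pool": 1}[arch["op"]]
    return score + arch["width"] / 64.0

def search(n_iter=20, seed=0):
    """Iterative optimization: keep the best-scoring architecture."""
    rng = random.Random(seed)
    best_arch, best_score = None, float("-inf")
    for _ in range(n_iter):
        arch = sample_architecture(rng)
        s = evaluate(arch)
        if s > best_score:
            best_arch, best_score = arch, s
    return best_arch

best = search()
```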
  • Fig. 6 shows a system architecture 300 provided by an embodiment of the present application.
  • the data collection device 360 is used to collect training data.
• after the target image super-resolution network is determined by the neural network search method of the embodiment of this application, the target super-resolution network can be further trained with training images; that is, the training data collected by the data collection device 360 may be training images.
  • the training images may include sample images and super-resolution images corresponding to the sample images.
• the sample images may refer to low-resolution images, for example, images with unclear picture quality and blurry content.
  • the data collection device 360 stores the training data in the database 330, and the training device 320 obtains the target model/rule 301 based on the training data maintained in the database 330.
• the training device 320 processes the input original image and compares the output image with the original image until the difference between the output image of the training device 320 and the original image is less than a certain threshold, thereby completing the training of the target model/rule 301.
• the target image super-resolution network used for image super-resolution processing in the image processing method provided in this application can be obtained by training on the loss between the predicted super-resolution image of a sample image and the sample super-resolution image.
• the trained network makes the difference between the sample super-resolution image and the predicted super-resolution image, obtained by inputting the sample image into the target image super-resolution network, less than a certain threshold, thereby completing the training of the target image super-resolution network.
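A toy version of this train-until-the-difference-is-below-a-threshold loop, with a single weight standing in for the whole network (all names and values are illustrative):

```python
def train_until_threshold(x, target, lr=0.1, threshold=1e-3, max_iter=10_000):
    """Update a single weight w until the difference between the model
    output w*x and the target falls below the threshold, mirroring the
    stopping criterion described above."""
    w = 0.0
    for _ in range(max_iter):
        err = w * x - target          # compare output with target
        if abs(err) < threshold:      # difference below threshold: done
            break
        w -= lr * err * x             # gradient step on squared error
    return w

w = train_until_threshold(x=2.0, target=3.0)
# w*x is now within 1e-3 of the target 3.0
```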
  • the above-mentioned target model/rule 301 can be used to implement the image processing method of the embodiment of the present application.
  • the target model/rule 301 in the embodiment of the present application may specifically be a neural network.
  • the training data maintained in the database 330 may not all come from the collection of the data collection device 360, and may also be received from other devices.
• the training device 320 does not necessarily perform the training of the target model/rule 301 entirely based on the training data maintained by the database 330; it may also obtain training data from the cloud or elsewhere for model training. The above description should not be construed as a limitation on the embodiments of this application.
• the target model/rule 301 trained by the training device 320 can be applied to different systems or devices, such as the execution device 310 shown in FIG. 6, which can be a terminal, such as a mobile phone terminal, a tablet computer, a notebook computer, an augmented reality (AR)/virtual reality (VR) device, or a vehicle-mounted terminal, and can also be a server, a cloud, or the like.
  • the execution device 310 is configured with an input/output (input/output, I/O) interface 312 for data interaction with external devices.
  • the user can input data to the I/O interface 312 through the client device 340.
  • the input data in this embodiment of the application may include: the image to be processed input by the client device.
  • the preprocessing module 313 and the preprocessing module 314 are used for preprocessing according to the input data (such as the image to be processed) received by the I/O interface 312.
• when the execution device 310 preprocesses the input data, or when the calculation module 311 of the execution device 310 performs calculation and other related processing, the execution device 310 can call data, code, etc. in the data storage system 350 for the corresponding processing, and the data, instructions, etc. obtained by the corresponding processing may also be stored in the data storage system 350.
  • the I/O interface 312 returns the processing result, such as the predicted depth image obtained as described above, to the client device 340 to provide it to the user.
• the training device 320 can generate corresponding target models/rules 301 based on different training data for different goals or different tasks, and the corresponding target models/rules 301 can be used to achieve the above goals or complete the above tasks, thereby providing the user with the desired result.
  • the user can manually set input data, and the manual setting can be operated through the interface provided by the I/O interface 312.
• the client device 340 can automatically send input data to the I/O interface 312. If automatically sending the input data requires the user's authorization, the user can set the corresponding permission in the client device 340.
  • the user can view the result output by the execution device 310 on the client device 340, and the specific presentation form may be a specific manner such as display, sound, and action.
  • the client device 340 can also be used as a data collection terminal to collect the input data of the input I/O interface 312 and the output result of the output I/O interface 312 as new sample data, and store it in the database 330 as shown.
• alternatively, the I/O interface 312 may directly store the input data input to the I/O interface 312 and the output result of the I/O interface 312 in the database 330 as new sample data, as shown in the figure.
  • FIG. 6 is only a schematic diagram of a system architecture provided by an embodiment of the present application.
• the positional relationship between the devices, components, modules, etc. shown in the figure does not constitute any limitation.
• the data storage system 350 is an external memory relative to the execution device 310; in other cases, the data storage system 350 may also be placed in the execution device 310.
• the target model/rule 301 is obtained by training by the training device 320.
• the target model/rule 301 in the embodiment of this application may be the neural network in this application.
• specifically, the neural network provided in the embodiment of this application can be a CNN, a deep convolutional neural network (deep convolutional neural networks, DCNN), or the like.
  • CNN is a very common neural network
  • the structure of CNN will be introduced in detail below in conjunction with Figure 7.
• a convolutional neural network is a deep neural network with a convolutional structure and is a deep learning architecture. A deep learning architecture refers to performing multiple levels of learning at different abstraction levels through machine learning algorithms.
• as a deep learning architecture, a CNN is a feed-forward artificial neural network, in which each neuron can respond to the image input into it.
• a convolutional neural network (CNN) 400 may include an input layer 410, a convolutional layer/pooling layer 420 (where the pooling layer is optional), and a neural network layer 430.
  • the input layer 410 can obtain the image to be processed, and pass the obtained image to be processed to the convolutional layer/pooling layer 420 and the subsequent neural network layer 430 for processing, and the image processing result can be obtained.
  • the following describes the internal layer structure of CNN 400 in Fig. 7 in detail.
• the convolutional layer/pooling layer 420 may include layers 421 to 426. For example, in one implementation, layer 421 is a convolutional layer, layer 422 is a pooling layer, layer 423 is a convolutional layer, layer 424 is a pooling layer, layer 425 is a convolutional layer, and layer 426 is a pooling layer. In another implementation, layers 421 and 422 are convolutional layers, layer 423 is a pooling layer, layers 424 and 425 are convolutional layers, and layer 426 is a pooling layer. That is, the output of a convolutional layer can be used as the input of a subsequent pooling layer, or as the input of another convolutional layer to continue the convolution operation.
  • the convolution layer 421 can include many convolution operators.
• the convolution operator is also called a kernel. Its role in image processing is equivalent to a filter that extracts specific information from the input image matrix.
• the convolution operator can essentially be a weight matrix, which is usually predefined. In the process of convolving an image, the weight matrix is usually moved along the horizontal direction of the input image one pixel at a time (or two pixels at a time, etc., depending on the value of the stride) to extract specific features from the image.
• the size of the weight matrix should be related to the size of the image. It should be noted that the depth dimension of the weight matrix is the same as the depth dimension of the input image; during the convolution operation, the weight matrix extends to the entire depth of the input image. Therefore, convolution with a single weight matrix produces a convolution output with a single depth dimension, but in most cases a single weight matrix is not used; instead, multiple weight matrices of the same size (rows × columns), that is, multiple homogeneous matrices, are applied.
  • the output of each weight matrix is stacked to form the depth dimension of the convolutional image, where the dimension can be understood as determined by the "multiple" mentioned above.
• different weight matrices can be used to extract different features in the image. For example, one weight matrix is used to extract edge information of the image, another weight matrix is used to extract a specific color of the image, and yet another weight matrix is used to blur unwanted noise in the image, and so on.
  • the multiple weight matrices have the same size (row ⁇ column), the size of the convolution feature maps extracted by the multiple weight matrices of the same size are also the same, and then the multiple extracted convolution feature maps of the same size are combined to form The output of the convolution operation.
• in practical applications, the weight values in these weight matrices need to be obtained through extensive training.
• each weight matrix formed by the trained weight values can be used to extract information from the input image, so that the convolutional neural network 400 can make correct predictions.
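The sliding-window convolution with multiple weight matrices described above can be sketched as follows; the 3x3 image and the two 2x2 kernels are illustrative only:

```python
def conv2d(image, kernels, stride=1):
    """Slide each weight matrix (kernel) over the image with the given
    stride; stacking the per-kernel outputs forms the depth dimension
    of the convolution result."""
    kh, kw = len(kernels[0]), len(kernels[0][0])
    oh = (len(image) - kh) // stride + 1
    ow = (len(image[0]) - kw) // stride + 1
    out = []
    for k in kernels:
        plane = [[sum(k[i][j] * image[r * stride + i][c * stride + j]
                      for i in range(kh) for j in range(kw))
                  for c in range(ow)] for r in range(oh)]
        out.append(plane)
    return out  # shape: depth x oh x ow

image = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
edge = [[1, -1], [1, -1]]            # hypothetical edge-like kernel
blur = [[0.25, 0.25], [0.25, 0.25]]  # hypothetical averaging kernel
maps = conv2d(image, [edge, blur])
# two kernels -> output depth 2; each feature map is 2x2
```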
  • the initial convolutional layer (such as 421) often extracts more general features, which can also be called low-level features;
  • the features extracted by the subsequent convolutional layers (such as 426) become more complex, for example, features such as high-level semantics, and features with higher semantics are more suitable for the problem to be solved.
• a pooling layer can follow a convolutional layer; that is, one convolutional layer may be followed by one pooling layer, or multiple convolutional layers may be followed by one or more pooling layers.
• in image processing, the sole purpose of the pooling layer is to reduce the spatial size of the image.
  • the pooling layer may include an average pooling operator and/or a maximum pooling operator for sampling the input image to obtain a smaller size image.
  • the average pooling operator can calculate the pixel values in the image within a specific range to generate an average value as the result of average pooling.
  • the maximum pooling operator can take the pixel with the largest value within a specific range as the result of the maximum pooling.
  • the operators in the pooling layer should also be related to the image size.
  • the size of the image output after processing by the pooling layer can be smaller than the size of the image of the input pooling layer, and each pixel in the image output by the pooling layer represents the average value or the maximum value of the corresponding sub-region of the image input to the pooling layer.
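A minimal sketch of the two pooling operators just described (the 4x4 input and 2x2 window are illustrative):

```python
def pool2d(image, size=2, mode="max"):
    """Reduce spatial size: each output pixel is the maximum (or the
    average) of the corresponding size x size sub-region of the input."""
    oh, ow = len(image) // size, len(image[0]) // size
    def region(r, c):
        return [image[r * size + i][c * size + j]
                for i in range(size) for j in range(size)]
    agg = max if mode == "max" else (lambda v: sum(v) / len(v))
    return [[agg(region(r, c)) for c in range(ow)] for r in range(oh)]

img = [[1, 2, 3, 4],
       [5, 6, 7, 8],
       [9, 10, 11, 12],
       [13, 14, 15, 16]]
print(pool2d(img, mode="max"))  # prints [[6, 8], [14, 16]]
print(pool2d(img, mode="avg"))  # prints [[3.5, 5.5], [11.5, 13.5]]
```

As the text notes, the 4x4 input shrinks to 2x2, and each output pixel summarizes one sub-region.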
  • Neural network layer 430
• after processing by the convolutional layer/pooling layer 420, the convolutional neural network 400 is not yet able to output the required output information, because, as mentioned above, the convolutional layer/pooling layer 420 only extracts features and reduces the parameters brought by the input image. To generate the final output information (the required class information or other related information), the convolutional neural network 400 needs the neural network layer 430 to generate the output of one or a group of required classes. Therefore, the neural network layer 430 may include multiple hidden layers (431, 432, ..., 43n as shown in FIG. 7) and an output layer 440. The parameters contained in the multiple hidden layers can be obtained by pre-training based on relevant training data of a specific task type; for example, the task type may include image recognition, image classification, image detection, and image super-resolution reconstruction.
• after the multiple hidden layers in the neural network layer 430, the final layer of the entire convolutional neural network 400 is the output layer 440.
• the output layer 440 has a loss function similar to categorical cross-entropy, which is specifically used to calculate the prediction error.
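A sketch of a categorical cross-entropy loss of this kind (the probability vectors below are illustrative):

```python
import math

def cross_entropy(predicted, target):
    """Cross-entropy between a predicted probability distribution and a
    one-hot target; used as the prediction-error loss at the output
    layer."""
    eps = 1e-12  # avoid log(0)
    return -sum(t * math.log(p + eps) for p, t in zip(predicted, target))

good = cross_entropy([0.9, 0.05, 0.05], [1, 0, 0])
bad = cross_entropy([0.1, 0.8, 0.1], [1, 0, 0])
# a confident correct prediction yields a much smaller loss than a
# confident wrong one, which is what drives the weight updates
```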
• a convolutional neural network (CNN) 500 may include an input layer 510, a convolutional layer/pooling layer 520 (where the pooling layer is optional), and a neural network layer 530.
• unlike FIG. 7, the multiple convolutional layers/pooling layers in the convolutional layer/pooling layer 520 in FIG. 8 are parallel, and the separately extracted features are all input to the neural network layer 530 for processing.
• the convolutional neural networks shown in FIG. 7 and FIG. 8 are only two examples of possible convolutional neural networks for the image processing method of the embodiment of this application. In specific applications, the convolutional neural network used in the image processing method of the embodiment of this application can also exist in the form of other network models.
  • FIG. 9 is a hardware structure of a chip provided by an embodiment of the application.
  • the chip includes a neural network processor 600.
  • the chip can be set in the execution device 310 as shown in FIG. 6 to complete the calculation work of the calculation module 311.
  • the chip can also be set in the training device 320 as shown in FIG. 6 to complete the training work of the training device 320 and output the target model/rule 301.
  • the algorithms of each layer in the convolutional neural network as shown in FIG. 7 or FIG. 8 can all be implemented in the chip as shown in FIG. 9.
• the neural network processor (NPU) 600 is mounted on a main central processing unit (central processing unit, CPU) (host CPU) as a coprocessor, and the main CPU allocates tasks.
  • the core part of the NPU 600 is the arithmetic circuit 603.
  • the controller 604 controls the arithmetic circuit 603 to extract data from the memory (weight memory or input memory) and perform calculations.
• the arithmetic circuit 603 includes multiple processing units (process engine, PE). In some implementations, the arithmetic circuit 603 is a two-dimensional systolic array; it may also be a one-dimensional systolic array or another electronic circuit capable of performing mathematical operations such as multiplication and addition. In some implementations, the arithmetic circuit 603 is a general-purpose matrix processor.
  • the arithmetic circuit 603 fetches the data corresponding to matrix B from the weight memory 602 and caches it on each PE in the arithmetic circuit 603.
• the arithmetic circuit 603 fetches the data of matrix A from the input memory 601, performs a matrix operation on matrix A and matrix B, and stores the partial result or final result of the obtained matrix in the accumulator 608.
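A highly simplified software model of this data flow, with a plain nested loop standing in for the systolic array; the tiling scheme and all names are illustrative, not the NPU's actual behavior:

```python
def matmul_accumulate(A, B, tile=2):
    """Toy model of the flow above: weights B are fetched once, input
    tiles of A stream through, and partial products accumulate into
    the result (playing the role of the accumulator 608)."""
    n, k, m = len(A), len(B), len(B[0])
    acc = [[0.0] * m for _ in range(n)]   # accumulator holds partial results
    for k0 in range(0, k, tile):          # stream A tile by tile
        for i in range(n):
            for j in range(m):
                acc[i][j] += sum(A[i][kk] * B[kk][j]
                                 for kk in range(k0, min(k0 + tile, k)))
    return acc

A = [[1, 2], [3, 4]]   # "input memory" matrix
B = [[5, 6], [7, 8]]   # "weight memory" matrix
print(matmul_accumulate(A, B))  # prints [[19.0, 22.0], [43.0, 50.0]]
```

Note that the result is independent of the tile size: partial results simply accumulate until the final result is reached.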
  • the vector calculation unit 607 can perform further processing on the output of the arithmetic circuit 603, such as vector multiplication, vector addition, exponential operation, logarithmic operation, size comparison, and so on.
• the vector calculation unit 607 can be used for network calculations in the non-convolutional/non-FC layers of the neural network, such as pooling, batch normalization, and local response normalization.
  • the vector calculation unit 607 can store the processed output vector to the unified memory 606.
  • the vector calculation unit 607 may apply a nonlinear function to the output of the arithmetic circuit 603, such as a vector of accumulated values, to generate the activation value.
  • the vector calculation unit 607 generates a normalized value, a combined value, or both.
  • the processed output vector can be used as an activation input to the arithmetic circuit 603, for example for use in subsequent layers in a neural network.
  • the unified memory 606 is used to store input data and output data.
• a direct memory access controller (direct memory access controller, DMAC) 605 is used to transfer the input data in the external memory to the input memory 601 and/or the unified memory 606, store the weight data in the external memory into the weight memory 602, and store the data in the unified memory 606 into the external memory.
  • the bus interface unit (BIU) 610 is used to implement interaction between the main CPU, the DMAC, and the instruction fetch memory 609 through the bus.
  • An instruction fetch buffer 609 connected to the controller 604 is used to store instructions used by the controller 604.
  • the controller 604 is used to call the instructions cached in the instruction fetch memory 609 to control the working process of the computing accelerator.
  • the unified memory 606, the input memory 601, the weight memory 602, and the instruction fetch memory 609 are all on-chip memories.
  • the external memory is a memory external to the NPU.
• the external memory can be a double data rate synchronous dynamic random access memory (double data rate synchronous dynamic random access memory, DDR SDRAM), a high bandwidth memory (high bandwidth memory, HBM), or other readable and writable memory.
  • each layer in the convolutional neural network shown in FIG. 7 or FIG. 8 can be executed by the arithmetic circuit 603 or the vector calculation unit 607.
  • the execution device 310 in FIG. 6 introduced above can execute the neural network search method or image processing method of the embodiment of the present application.
• the CNN models shown in FIG. 7 and FIG. 8 and the chip shown in FIG. 9 can also be used to execute each step of the neural network search method or the image processing method of the embodiment of the present application.
  • an embodiment of the present application provides a system architecture 700.
  • the system architecture includes a local device 720, a local device 730, an execution device 710, and a data storage system 750.
  • the local device 720 and the local device 730 are connected to the execution device 710 through a communication network.
  • the execution device 710 may be implemented by one or more servers.
• the execution device 710 can be used in conjunction with other computing devices, such as data storage devices, routers, and load balancers.
  • the execution device 710 may be arranged on one physical site or distributed on multiple physical sites.
  • the execution device 710 may use the data in the data storage system 750 or call the program code in the data storage system 750 to implement the method for searching the neural network structure of the embodiment of the present application.
  • execution device 710 may also be referred to as a cloud device, and in this case, the execution device 710 may be deployed in the cloud.
• the execution device 710 may perform the following process: constructing a basic unit, where the basic unit is a network structure obtained by connecting basic modules through basic operations of a neural network, the basic modules include a first module, and the first module is used to perform a dimensionality reduction operation and a residual connection operation on a first input feature map;
• the dimensionality reduction operation is used to transform the scale of the first input feature map from an original first scale to a second scale, the second scale being smaller than the first scale; the residual connection operation is used to perform feature addition processing on the first input feature map and the feature map processed by the first module, and the scale of the feature map processed by the first module is the same as the scale of the first input feature map;
• constructing a search space according to the basic unit and network structure parameters, where the network structure parameters include the types of the basic modules used to construct the basic unit, the search space is used to search an image super-resolution network structure, and the basic unit is a basic module used to construct the image super-resolution network;
• performing the image super-resolution network structure search in the search space to determine a target image super-resolution network, where the target image super-resolution network is used to perform super-resolution processing on an image to be processed, the target image super-resolution network includes at least the first module, and the target image super-resolution network is a network whose amount of calculation is less than the first
  • a target neural network can be obtained through a network structure search (neural architecture search, NAS), and the target neural network can be used for image super-resolution processing.
  • the foregoing method for the execution device 710 to search the network structure may be an offline search method executed in the cloud.
  • the user can operate respective user devices (for example, the local device 720 and the local device 730) to interact with the execution device 710.
  • Each local device can represent any computing device, for example, a personal computer, a computer workstation, a smart phone, a tablet computer, a smart camera, a smart car or other types of cellular phones, a media consumption device, a wearable device, a set-top box, a game console, etc.
  • the local device of each user can interact with the execution device 710 through a communication network of any communication mechanism/communication standard.
  • the communication network can be a wide area network, a local area network, a point-to-point connection, or any combination thereof.
• the local device 720 and the local device 730 may obtain the relevant parameters of the target neural network from the execution device 710, deploy the target neural network on the local device 720 and the local device 730, and use the target neural network to perform image super-resolution processing and so on.
  • the target neural network can be directly deployed on the execution device 710.
  • the execution device 710 obtains the image to be processed from the local device 720 and the local device 730, and performs image super-resolution processing on the image to be processed according to the target neural network.
  • the aforementioned target neural network may be the target image super-resolution network in the embodiment of the present application.
  • the neural network search method of the embodiment of the present application will be described in detail below in conjunction with FIG. 11.
  • the method shown in FIG. 11 can be executed by a neural network search device.
  • the neural network search device can be a computer, a server, and other devices with sufficient computing power for neural network search.
  • the method 800 shown in FIG. 11 includes steps 810 to 830, which will be described in detail below.
  • Step 810 Construct a basic unit.
• the basic unit is a network structure obtained by connecting basic modules through basic operations of a neural network. The basic modules include a first module, and the first module is used to perform a dimensionality reduction operation and a residual connection operation on a first input feature map.
• the dimensionality reduction operation is used to transform the scale of the first input feature map from an original first scale to a second scale, the second scale being smaller than the first scale. The residual connection operation is used to perform feature addition processing on the first input feature map and the feature map processed by the first module, and the scale of the feature map processed by the first module is the same as the scale of the first input feature map.
  • the basic unit may be a network structure obtained by connecting basic modules through the basic operations of a neural network.
• the above-mentioned network structure may include preset basic operations or combinations of basic operations in a convolutional neural network; these basic operations or combinations of basic operations may be collectively referred to as basic operations.
  • basic operations can refer to convolution operations, pooling operations, residual connections, etc.
  • connections between basic modules can be made to obtain the network structure of the basic unit.
  • the above-mentioned basic unit may be a basic module used to construct an image super-resolution network.
• the target image super-resolution network may include three major parts: the feature extraction part, the nonlinear transformation part, and the reconstruction part.
  • the feature extraction module is used to obtain the image features of the image to be processed.
• the image to be processed may be a low-resolution (LR) image; the nonlinear transformation part is used to transform the image features of the input image, mapping them from the first feature space to the second feature space.
• the first feature space refers to the feature space in which the features of the image to be processed are extracted, and the super-resolution image is easier to reconstruct in the second, higher-dimensional feature space;
• the reconstruction part is used to perform up-sampling and convolution processing on the image features output by the nonlinear transformation part to obtain a super-resolution image corresponding to the image to be processed.
  • the non-linear transformation part of the network structure can be searched in the search space by means of NAS.
• the first input feature map input to the first module is at the first scale, and is transformed to the second scale by the dimensionality reduction operation; the second-scale feature map is transformed to a third scale by the dimension-raising operation, where the third scale lies between the first scale and the second scale.
• the feature map is then further processed so that its scale matches that of the first input feature map; that is, the scale of the feature map processed by the first module is the same as the scale of the first input feature map.
  • the above-mentioned basic unit cell may be a network obtained by connecting basic modules according to the basic operation of a neural network.
• the basic module may include a first module; the first module may be a scale module (contextual residual dense block, CRDB), which can be used to perform dimensionality reduction and residual connection operations on the first input feature map; that is, the scale module may include a pooling sub-module and a residual connection for processing the first input feature map.
• the dimensionality reduction operation can reduce the scale of the first input feature map, where the dimensionality reduction operation can refer to a pooling operation on the first input feature map, or to a convolution operation with a stride of Q performed on the first input feature map, where Q is a positive integer greater than 1.
• the above-mentioned residual connection operation is used to perform feature addition processing on the first input feature map and the feature map processed by the first module, where the feature map processed by the first module may refer to the feature map after the dimension-raising operation; the dimension-raising operation restores the scale of the feature map after dimensionality reduction processing to the original first scale, so the residual connection operation can refer to performing feature addition processing on the first input feature map and the feature map processed by the dimension-raising operation.
• the above-mentioned dimension-raising operation may refer to an up-sampling operation, or to a deconvolution operation (transposed convolution); the up-sampling operation may refer to interpolation, that is, inserting new elements between the pixels of the original image using a suitable interpolation algorithm, while the deconvolution operation can be regarded as the inverse process of the convolution operation, also known as transposed convolution.
• feature addition may refer to element-wise addition of feature maps of the same scale, adding the information of corresponding channel features.
• the scale module can perform a residual connection on the input feature map, that is, perform feature addition processing on the first input feature map and the feature map processed by the first module, so that more local details in the first input feature map are passed to the subsequent convolutional layers.
  • the scale module can be used to perform a dimensionality reduction operation on the first input feature map.
  • the dimensionality reduction operation can reduce the scale of the input feature map to reduce the amount of model calculation.
• the residual connection operation can transfer the information of earlier layers to later layers, compensating for the information loss caused by the dimensionality reduction operation.
  • the dimensionality reduction operation can also quickly expand the receptive field of features, allowing the prediction of high-resolution pixels to better consider contextual information, thereby improving the super-resolution accuracy.
• image super-resolution reconstruction refers to obtaining a high-resolution image by reconstructing a low-resolution image; therefore, image super-resolution processing needs more local information of the image features.
• commonly used image super-resolution network models do not use dimensionality reduction operations, mainly because dimensionality reduction operations lose part of the local information of the low-resolution input image.
• in the embodiment of the present application, the information in the input feature map is better preserved throughout the network through residual connection operations and/or dense connection operations; that is, the information of earlier layers can be well transmitted to later layers, which compensates for the information loss caused by dimensionality reduction operations.
• therefore, using dimensionality reduction operations can not only reduce the amount of model calculation, but also expand the receptive field of features and improve the accuracy of the image super-resolution network.
  • the specific form of the network structure of the scale module may be as shown in FIG. 13.
• Three scale modules are shown in FIG. 13: the (d-1)-th scale module, the d-th scale module, and the (d+1)-th scale module.
  • the d-th scale module may include a pooling sub-module, and the dimensionality reduction operation may be used to downsample the input feature map, thereby reducing the feature size.
  • the aforementioned dimensionality reduction operation may refer to a pooling operation, such as average pooling, or maximum pooling.
• the residual connection can refer to feature addition between the output feature map of the (d-1)-th CRDB module and the processed feature map, where the processed feature map is the feature map obtained after the input feature map passes through the pooling operation, a 3×3 convolution operation, a rectified linear unit (ReLU) activation, the dimension-raising operation, and a 1×1 convolution operation.
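• The pipeline just described (pooling → 3×3 convolution → ReLU → dimension-raising → 1×1 convolution → residual addition) can be sketched in NumPy; the nearest-neighbour upsampling, channel counts, and weight shapes below are illustrative assumptions rather than the patent's exact configuration:

```python
import numpy as np

def avg_pool2(x):
    # dimensionality reduction: 2x2 average pooling, (C, H, W) -> (C, H/2, W/2)
    c, h, w = x.shape
    return x.reshape(c, h // 2, 2, w // 2, 2).mean(axis=(2, 4))

def conv3x3(x, weight):
    # naive 3x3 "same" convolution; weight: (C_out, C_in, 3, 3)
    c_out = weight.shape[0]
    _, h, w = x.shape
    xp = np.pad(x, ((0, 0), (1, 1), (1, 1)))
    out = np.zeros((c_out, h, w))
    for i in range(h):
        for j in range(w):
            out[:, i, j] = np.tensordot(weight, xp[:, i:i + 3, j:j + 3], axes=3)
    return out

def upsample2(x):
    # dimension-raising via nearest-neighbour interpolation (an assumption)
    return x.repeat(2, axis=1).repeat(2, axis=2)

def conv1x1(x, weight):
    # 1x1 convolution = channel mixing; weight: (C_out, C_in)
    return np.tensordot(weight, x, axes=([1], [0]))

def crdb_sketch(x, w3, w1):
    y = avg_pool2(x)                      # pooling sub-module
    y = np.maximum(conv3x3(y, w3), 0.0)   # 3x3 convolution + ReLU
    y = upsample2(y)                      # dimension-raising operation
    y = conv1x1(y, w1)                    # 1x1 convolution
    return x + y                          # residual connection, same scale as input

x = np.arange(4 * 8 * 8, dtype=float).reshape(4, 8, 8)
rng = np.random.default_rng(0)
out = crdb_sketch(x, rng.normal(size=(4, 4, 3, 3)) * 0.1,
                  rng.normal(size=(4, 4)) * 0.1)
print(out.shape)  # (4, 8, 8): the module preserves the input scale
```

• With zero weights the residual path dominates and the module reduces to the identity, which is the property that lets local details pass through to later layers.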
  • the scale module can also be used to perform dense connection operations on the first input feature map.
• the dense connection operation may refer to splicing the output feature maps of each of the first i-1 convolutional layers with the input feature map, and using the result as the input feature map of the i-th convolutional layer.
  • the specific form of the network structure of the scale module may be as shown in FIG. 14.
• the dense connection operation can achieve the maximum information flow in the network by connecting each layer to all layers before it; that is, the input of each layer is the splicing of the outputs of all previous layers.
• through the dense connection operation, the information in the input feature map (for forward calculation) or the gradient (for backward calculation) is better preserved throughout the network, which can better compensate for the information loss of the dimensionality reduction operation; that is, when performing image super-resolution processing, the residual connection operation and the dense connection operation ensure that the information in the feature map is better transmitted to the later layers of the network structure.
• the input feature map is downsampled by the dimensionality reduction operation to reduce the feature size, so that the amount of model calculation can be reduced while ensuring the accuracy of the image super-resolution processing.
  • feature splicing may refer to splicing M feature maps of the same scale into a feature map with K channels, where K is a positive integer greater than M.
  • the dense connection operation refers to transferring the output feature map of each layer to the subsequent layers, and the input of the latter layer is obtained by splicing the feature maps of the output of the previous layers.
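• The difference between feature addition, feature splicing, and the dense connection can be illustrated in NumPy; the channel counts and the stand-in "layer" function are assumptions for illustration:

```python
import numpy as np

a = np.ones((16, 8, 8))   # a feature map with 16 channels
b = np.ones((16, 8, 8))

added = a + b                              # feature addition: channel count unchanged
spliced = np.concatenate([a, b], axis=0)   # feature splicing: channels accumulate

def layer(x):
    # stand-in for a convolutional layer that always emits 16 channels
    return x[:16] * 0.5

# dense connection: each layer's input is the splice of all earlier outputs
feats = [a]
for _ in range(3):
    feats.append(layer(np.concatenate(feats, axis=0)))
dense_input = np.concatenate(feats, axis=0)
print(added.shape, spliced.shape, dense_input.shape)
```

• Note how splicing grows the channel dimension (here 16 → 64 after three layers) while addition keeps it fixed, which is why splicing M same-scale maps yields K > M channels.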
  • the specific form of the network structure of the scale module may be as shown in FIG. 15.
• the scale module can be used to perform residual connection operations, dimensionality reduction operations, convolution operations, and cyclic dense connection operations on the input feature map; that is, the scale module can include a residual connection, a pooling sub-module, a convolution sub-module, and a cyclic dense connection.
  • the cyclic dense connection operation can increase the depth of the scale module network structure, thereby improving the accuracy of super-resolution processing.
• a recursive operation on a feature map at the normal scale quickly increases the amount of calculation, but a recursive operation on the feature map produced by the dimensionality reduction operation increases the amount of calculation much less.
• therefore, combining a certain number of recursive operations with the dimensionality reduction operation can improve the super-resolution accuracy without significantly increasing the amount of calculation and the number of parameters.
• the first module proposed in the embodiment of the application, namely the scale module, can reduce the amount of calculation, reduce the number of parameters, expand the receptive field, and decouple the parameter amount from the calculation amount.
  • the dimensionality reduction operation in the scale module can reduce the calculation amount of the network structure by reducing the scale of the feature map.
• the calculation amount of a network model can be expressed by its number of floating-point operations (FLOPs).
• FLOPs_ori represents the calculation amount of the network model with normal convolution.
• the pooling operation can reduce the calculation amount by 75%, and even if three loop (recursive) operations are added, the calculation amount only returns to the original amount.
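• The 75% figure follows from the fact that convolution FLOPs scale with the spatial area H×W, which a 2×2 pooling divides by four; a quick check in Python with assumed channel counts:

```python
# FLOPs of a convolution scale with the spatial area H*W, so 2x2 pooling
# (halving H and W) removes 75% of the computation; sizes are illustrative.
def conv_flops(c_in, c_out, k, h, w):
    return 2 * c_in * c_out * k * k * h * w   # multiply-add counted as 2 ops

h, w = 64, 64
full = conv_flops(32, 32, 3, h, w)
pooled = conv_flops(32, 32, 3, h // 2, w // 2)
print(pooled / full)        # 0.25, i.e. a 75% reduction
print(4 * pooled == full)   # three extra recursive passes (4 in total)
                            # only restore the original calculation amount
```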
  • the first module is also used to perform a rearrangement operation.
• the rearrangement operation refers to combining multiple first channel features of the first input feature map according to preset rules to generate a second channel feature, where the resolution of the second channel feature is higher than the resolution of the first channel feature.
• the rearrangement operation shown in FIG. 16 may refer to merging 4 different first channel feature maps, according to the rule of left to right and top to bottom, into one second channel feature map, where the resolution of the second channel feature map is higher than that of the first channel feature maps.
  • the rearrangement operation can be seen as converting multiple low-scale feature channels into one high-scale feature channel, thereby reducing the number of channels.
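• A minimal NumPy sketch of the rearrangement (depth-to-space) operation, merging r×r low-resolution channels into one channel at r times the resolution; the left-to-right, top-to-bottom fill order follows the rule described for FIG. 16:

```python
import numpy as np

def rearrange(x, r=2):
    # merge r*r low-resolution channels into one channel at r times the
    # resolution, filling each r x r block left-to-right, top-to-bottom
    c, h, w = x.shape
    x = x.reshape(c // (r * r), r, r, h, w)
    x = x.transpose(0, 3, 1, 4, 2)            # -> (C', H, r, W, r)
    return x.reshape(c // (r * r), h * r, w * r)

x = np.arange(16).reshape(4, 2, 2)   # 4 first-channel features, each 2x2
y = rearrange(x)
print(y.shape)   # (1, 4, 4): one second-channel feature at double resolution
```

• Each 2×2 block of the output holds the co-located pixels of the four input channels, so the channel count drops by a factor of r² while the resolution doubles in each direction.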
• the parameter quantity of normal convolution, Param_ori, is:
• Param_ori = N_conv × G × C_out;
• the parameter quantity of the standard module is Param_up:
  • the basic modules that construct the basic unit include a scale module.
• the scale module can expand the receptive field through the dimensionality reduction operation, so that the prediction of high-resolution pixels can better take contextual information into account.
• on the other hand, because common super-resolution methods do not use dimensionality reduction operations, the scale of the input feature map does not change throughout the nonlinear transformation part, resulting in a linear relationship between the parameter amount and the calculation amount.
• the scale module proposed in the embodiment of the present application uses a dimensionality reduction operation to make the parameter amount and the calculation amount relatively independent, giving the search algorithm in NAS more possibilities.
• the basic module that constructs the basic unit may include a second module and/or a third module in addition to the aforementioned first module (that is, the standard module).
  • the second module and the third module further included in the basic module will be described in detail below in conjunction with FIGS. 17 to 19.
  • the basic module may further include a second module, and the second module may be a compact module (shrink residual dense block, SRDB).
• the compact module may refer to channel compression processing performed on the basis of the residual dense block (RDB), which retains the dense connections while effectively reducing the amount of model parameters.
• the compact module is used to perform channel compression operations, residual connection operations, and dense connection operations on the second input feature map.
• the channel compression operation may refer to performing a convolution operation on the second input feature map with a 1×1 convolution kernel, thereby reducing the number of channels.
• when the second module is the first module in the basic unit, the second input feature map may refer to the feature map output by the previous basic unit; when the second module is not the first module in the basic unit, the second input feature map may refer to the feature map output after processing by the previous module.
  • the first input feature map, the second input feature map, and the third input feature map all correspond to the same image to be processed.
• the network structure of the compact module can be as shown in FIG. 17.
• FIG. 17 shows three compact modules: the (d-1)-th compact module, the d-th compact module, and the (d+1)-th compact module.
• in a compact module, a 1×1 convolution kernel can be used to compress the number of channels of the feature map, followed by a 3×3 convolution for feature transformation; the resulting compact residual dense module can significantly reduce the number of parameters while retaining the dense connections.
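• The saving from the 1×1 compression can be checked with the usual convolution parameter count k×k×C_in×C_out; the channel counts below are illustrative assumptions, not the patent's configuration:

```python
# parameter count of a convolution: k*k*C_in*C_out (biases ignored);
# the channel counts below are illustrative assumptions
def conv_params(k, c_in, c_out):
    return k * k * c_in * c_out

c_in, c_out, squeezed = 64, 64, 16
direct = conv_params(3, c_in, c_out)                          # plain 3x3 conv
compact = conv_params(1, c_in, squeezed) + conv_params(3, squeezed, c_out)
print(direct, compact)   # 36864 vs 10240: roughly 3.6x fewer parameters
```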
  • the basic module may further include a third module, and the third module may refer to a group residual dense block (GRDB).
• the grouping module may refer to dividing the convolution operation into multiple groups, on the basis of the residual dense module, so that each group is calculated separately, which helps reduce the model parameters.
  • the grouping module may be a module for performing channel switching operations, residual connection operations, and dense connection operations on the third input feature map.
• the third input feature map includes M sub-feature maps, each sub-feature map includes at least two adjacent channel features, and the channel exchange processing may refer to reordering the at least two adjacent channel features corresponding to the M sub-feature maps so that channel features corresponding to different sub-feature maps become adjacent, where M is an integer greater than 1.
• the network structure of the grouping module may be as shown in FIG. 18, which shows three grouping modules: the (d-1)-th grouping module, the d-th grouping module, and the (d+1)-th grouping module.
• if group convolution is used directly, a single channel feature of the output layer can only receive features from the previous convolutional layer of its own group, which is not conducive to collaboration between channel features; therefore, in the embodiment of the present application, a channel shuffle operation is added to the residual dense module, and the resulting grouped residual dense module may be referred to as the grouping module, which effectively reduces the amount of network parameters.
• for example, the third input feature map includes three sub-feature maps 1, 2, and 3, and each sub-feature map includes 3 adjacent channel features; the channel exchange can reorder the originally adjacent channel features within the same sub-feature map, so that the channel features corresponding to different sub-feature maps become adjacent.
  • the basic unit is a network structure obtained by connecting basic modules through the basic operation of a neural network.
• a cell as shown in FIG. 12 can be a basic unit, and the basic unit is used to construct the image super-resolution network.
• the basic module is used to construct the basic unit.
• each basic unit (cell) can be obtained by connecting different basic modules through the basic operations of the neural network, where the basic modules can include one or more of the above-mentioned first module, second module, and third module.
• Step 820: Construct a search space according to the basic unit and network structure parameters, where the network structure parameters include the type of the basic module used to construct the basic unit, and the search space is a search space for searching the image super-resolution network structure.
  • the network parameters may include:
  • the types of basic modules can include three different types: the first module, the second module, and the third module.
• C represents the first module, that is, the standard module;
• S represents the second module, that is, the compact module;
• G represents the third module, that is, the grouping module.
  • the number of convolutional layers may be ⁇ 4, 6, 8 ⁇ .
  • the number of channels can be ⁇ 16,24,32,48 ⁇ .
  • the number of output channels of a basic unit can be ⁇ 16,24,32,48 ⁇ .
• the state of the basic unit: 1 means that the current node is connected to the network, and 0 means that the current node is not connected to the network.
• the search space constructed from basic units of the given basic module types restricts candidate network structures to those module types, which is equivalent to discretizing a continuous search space and can effectively reduce the size of the search space.
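• As an illustration of how these choices discretize the search space, the per-unit configuration count can be enumerated directly; the assumption of four stacked basic units is for illustration only:

```python
from itertools import product

# per-basic-unit choices listed above; the four stacked basic units
# assumed at the end are for illustration only
module_types = ["C", "S", "G"]        # standard / compact / grouping
conv_layers  = [4, 6, 8]
channels     = [16, 24, 32, 48]
out_channels = [16, 24, 32, 48]
states       = [0, 1]                 # connected to the network or not

per_unit = len(list(product(module_types, conv_layers, channels,
                            out_channels, states)))
print(per_unit)        # 288 discrete configurations per basic unit
print(per_unit ** 4)   # ~6.9e9 candidate networks with four stacked units
```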
• Step 830: Perform an image super-resolution network structure search in the search space to determine the target image super-resolution network.
• the target image super-resolution network is used to perform super-resolution processing on the image to be processed; the target image super-resolution network includes at least the first module, and is a network whose calculation amount is less than a first preset threshold and whose image super-resolution accuracy is greater than a second preset threshold.
• performing an image super-resolution network structure search in the search space to determine the target image super-resolution network may refer to searching the search space with a search algorithm to determine a network structure that meets the constraints, or it may refer to manually selecting a network structure that meets the constraints from the search space.
• the constraint condition may mean that the calculation amount is less than the first preset threshold and the image super-resolution accuracy is greater than the second preset threshold, so that even when the computing capability of a mobile device is limited, the target image super-resolution network still achieves high super-resolution accuracy.
  • the constraint condition may mean that the amount of calculation is less than the first preset threshold, the image super-resolution accuracy is greater than the second preset threshold, and the parameter amount is less than the third preset threshold.
  • common search algorithms may include but are not limited to the following algorithms: random search, Bayesian optimization, evolutionary algorithm, reinforcement learning, gradient-based algorithm, and so on.
• for the specific process of each method for searching the image super-resolution network structure in the search space, reference may be made to the prior art; for brevity, detailed descriptions of all the search methods are omitted in this application.
• for example, an evolutionary algorithm can be used to search for a lightweight, fast, and high-precision super-resolution network structure by targeting the parameter amount, calculation amount, and model effect (peak signal-to-noise ratio, PSNR) of the network model.
  • the process of performing a network search in the search space to determine the target image super-resolution network includes the following steps: performing a network search in the search space through an evolutionary algorithm to determine the first image super-resolution network; using a multi-level weighted joint loss function Perform back-propagation iterative training on the first image super-resolution network to determine the target image super-resolution network, wherein the multi-level weighted joint loss function is based on the output of each basic unit in the first image super-resolution network The loss between the predicted super-resolution image and the sample super-resolution image corresponding to the feature map is determined.
  • the first image super-resolution network determined by the evolutionary algorithm can be subjected to secondary training through the multi-level weighted joint loss function, and finally the parameters of the target image super-resolution network can be determined to obtain the target Image super-resolution network.
• searching for the target image super-resolution network in the search space through the evolutionary algorithm includes the following steps: randomly generating P candidate network structures according to the basic unit; training the P candidate network structures using the multi-level weighted joint loss function; evaluating the performance parameters of each of the P trained candidate network structures, where the performance parameters include the peak signal-to-noise ratio, which is used to indicate the difference between the predicted super-resolution image obtained by each candidate network structure and the sample super-resolution image; and determining the first image super-resolution network according to the performance parameters of the candidate networks.
  • the evolutionary algorithm execution process may include the following steps:
• Step 1: Randomly generate P individuals (that is, candidate network structures); the P candidate network structures constitute the initial population.
• Step 2: Evaluate the fitness (that is, the performance parameters) of each network structure, including the parameter amount, calculation amount, and accuracy, where the accuracy can be measured by the peak signal-to-noise ratio (PSNR).
• Step 3: Select and update the elite individuals, which can be regarded as the network structures whose performance parameters meet preset conditions.
• Step 4: Generate the next generation of individuals through crossover and mutation.
• Step 5: Repeat steps 2 to 4 until the evolutionary algorithm converges, and return the final elite individual (that is, the first image super-resolution network).
  • the above-mentioned elite individuals may refer to a target network structure determined by an algorithm.
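• The five steps above can be sketched as a toy evolutionary loop in plain Python; the gene encoding, the placeholder fitness, and the crossover/mutation scheme are illustrative stand-ins for the patent's actual objectives (parameter amount, FLOPs, and PSNR):

```python
import random

# toy evolutionary search over per-unit choices; the encoding, the
# placeholder fitness, and the variation operators are illustrative
CHOICES = [["C", "S", "G"], [4, 6, 8], [16, 24, 32, 48]]

def random_individual():
    return [random.choice(options) for options in CHOICES]

def fitness(ind):
    # placeholder objective: prefer narrow (cheap) configurations
    return -CHOICES[2].index(ind[2])

def crossover(a, b):
    return [random.choice(pair) for pair in zip(a, b)]

def mutate(ind):
    i = random.randrange(len(CHOICES))
    out = list(ind)
    out[i] = random.choice(CHOICES[i])
    return out

def search(pop_size=8, generations=20, seed=0):
    random.seed(seed)
    pop = [random_individual() for _ in range(pop_size)]         # step 1
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)                      # step 2
        elites = pop[: pop_size // 2]                            # step 3
        pop = elites + [mutate(crossover(*random.sample(elites, 2)))
                        for _ in range(pop_size - len(elites))]  # step 4
    return max(pop, key=fitness)                                 # step 5

print(search())
```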
• the multi-level weighted joint loss function proposed in this application can be used to train each network structure to be evaluated, and the peak signal-to-noise ratio of the network structure is evaluated after training with the multi-level weighted joint loss function.
• the multi-level weighted joint loss function may be obtained according to the following equation: L = Σ_k λ_{k,t} · L_k;
• L can represent the multi-level weighted joint loss function, and L_k can represent the loss value of the k-th layer of the first image super-resolution network, where the loss value can refer to the image loss between the predicted super-resolution image corresponding to the output feature map of the k-th layer and the sample super-resolution image;
• λ_{k,t} can represent the weight of the loss value of the k-th layer at time t.
  • the training level of the underlying basic unit may vary.
• this embodiment of the application proposes a multi-level weighted joint loss function; that is, during training, a predicted super-resolution image can be obtained from the output feature map of each basic unit, the loss value between the predicted super-resolution image and the sample super-resolution image is calculated, and the image loss values of the basic units are weighted to train the network.
• the loss function can combine the predicted image loss of each intermediate layer and reflect the importance of different layers through the weighting, where the weight value of each intermediate-layer image loss can change over time (or with the number of iterations), which is conducive to more fully training the parameters of the bottom-layer basic units and improving the performance of the super-resolution network.
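• A sketch of the weighted combination L = Σ_k λ_{k,t} · L_k with a time-varying weight schedule follows; the specific schedule (moving weight from a uniform distribution toward the final layer as training progresses) is an assumption for illustration, not the patent's actual weighting:

```python
import numpy as np

# sketch of L = sum_k lambda_{k,t} * L_k; the schedule below, which moves
# weight from uniform toward the final layer over training, is assumed
def joint_loss(layer_losses, t, total_steps):
    k = len(layer_losses)
    frac = t / total_steps
    weights = np.array([(1 - frac) / k] * (k - 1) + [frac + (1 - frac) / k])
    weights = weights / weights.sum()
    return float(np.dot(weights, layer_losses))

losses = [0.9, 0.6, 0.4]   # per-basic-unit losses L_k at some training step
print(joint_loss(losses, t=0, total_steps=100))     # uniform weighting
print(joint_loss(losses, t=100, total_steps=100))   # only the final layer
```

• Early in training every intermediate layer contributes equally, which trains the bottom-layer basic units; later the final output dominates the objective.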
  • Table 1 shows the performance of the basic module of the basic unit proposed in this application by testing on the standard super-division data set.
  • Table 1 shows the experimental results of several image super-resolution network models constructed with the basic modules proposed in this application.
• the number of floating-point operations (FLOPs) represents the calculation amount of the network model and can be used to evaluate the computational efficiency of the model; the parameter amount describes the parameters included in the neural network and can be used to evaluate the size of the model; SET5, SET14, B100, and Urban100 are the names of different data sets.
  • the peak signal-to-noise ratio (PSNR) of the network model can be evaluated;
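• PSNR, the accuracy measure used throughout these tables, can be computed with the standard definition below (shown for reference; the peak value of 255 assumes 8-bit images):

```python
import numpy as np

def psnr(pred, target, max_val=255.0):
    # peak signal-to-noise ratio in dB between a predicted super-resolution
    # image and the ground-truth high-resolution image
    mse = np.mean((pred.astype(np.float64) - target.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")
    return 10.0 * np.log10(max_val ** 2 / mse)

a = np.zeros((4, 4))
b = np.full((4, 4), 16.0)
print(round(psnr(a, b), 2))   # about 24.05 dB for an MSE of 256
```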
  • Baseline represents a small residual dense network.
• the basic modules proposed in the embodiments of this application (for example, the standard-module network CRDN, the compact-module network SRDN, and the grouping-module network GRDN) and the target image super-resolution network, that is, the efficient super-resolution network (ESRN), can effectively improve the accuracy of the model without changing the parameter amount and calculation amount.
  • Table 2 is the test result of the multi-level weighted joint loss function proposed in the embodiment of the application.
  • Table 2 shows the experimental results of the deep convolutional network after applying the multi-level weighted joint loss function, where Joint loss represents the network model trained by the multi-level weighted loss function proposed in the embodiment of this application. It can be seen from Table 2 that training the image super-resolution network through the multi-level weighted joint loss function provided in the embodiment of the present application can effectively improve the accuracy of the image super-resolution network.
  • Table 3 is the result statistics of the image super-resolution network provided in the embodiment of the present application on the standard data set.
• type 1 means that the running time of the image super-resolution model is Fast, and type 2 means that the running time of the image super-resolution model is Very Fast; the compared models include the deep network with selection units (SelNet), the cascading residual network (CARN), the mini cascading residual network (CARN-M), and the fast, accurate, and lightweight super-resolution network (FALSR), where FALSR-A and FALSR-B represent different network models;
• ESRN represents the target image super-resolution network in the embodiment of this application, that is, the efficient super-resolution network; for example, it can be the fast efficient super-resolution network (ESRN-F) or the small efficient super-resolution network (ESRN-M). It can be seen from Table 3 that the target image super-resolution network provided by the embodiment of the present application is better than the other network models in terms of calculation amount and image super-resolution accuracy.
  • Table 4 is the test results of the target image super-resolution network provided by the embodiments of the present application on different super-resolution scales.
• a multiple of ×3 means a super-resolution test at 3 times the scale with an output super-resolution image of 720p (1280×720);
• a multiple of ×4 means a super-resolution test at 4 times the scale with an output super-resolution image of 720p (1280×720);
  • models include super-resolution convolutional neural network (SRCNN), deep super-resolution network (very deep convolutional super-resolution network, VDSR), SelNet, CARN , CARN-M, ESRN, ESRN-F, ESRN-M.
• the FLOPs in Tables 1 to 3 above are calculated for the ×2-scale image super-resolution test with an output super-resolution image of 720p (1280×720) as an example; it can be seen from the data in Tables 1 to 3 that the neural network search method provided in the embodiments of the application can find models with better super-resolution accuracy under different parameter amounts.
• because the dimensionality reduction operation is introduced in the image super-resolution network provided by the embodiment of this application, it is also possible to search for a fast medium-parameter model by constraining the calculation amount (FLOPs) of the model; while ensuring an image super-resolution effect higher than that of the FALSR-A model, the calculation amount can be reduced by nearly half.
  • Table 5 is the test result of the running time of the target image super-resolution network provided by the embodiment of the present application. It can be seen from Table 5 that the super-resolution network obtained by the neural network search method in the embodiment of the present application not only has high accuracy, but also has high operating efficiency.
  • FIG. 23 and FIG. 24 are effect diagrams of image super-resolution processing performed by the target image super-resolution network determined by the neural network search method of the embodiment of the present application.
  • FIG. 23 and FIG. 24 show the image effect after the image super-resolution network constructed by the basic module proposed in this application performs image resolution processing.
  • FIG. 23 shows the visual effect diagram of the images in the Set14 data set after the super-resolution processing.
  • Figure 24 shows the visual effect of the images in the Urban100 dataset after super-resolution processing.
• the methods compared in FIG. 23 and FIG. 24 include bicubic interpolation, LapSRN (deep Laplacian pyramid networks for super-resolution), CARN-M, CARN, VDSR, ESRN-M, and ESRN, where HR denotes the high-resolution ground-truth image.
  • the image super-resolution network obtained by the neural network search method proposed in this application can not only reduce the amount of network parameters and the amount of calculation, but also can effectively improve the visual effect of the image super-division, making the edge of the super-division image clearer.
  • FIG. 25 is a schematic flowchart of an image processing method provided by an embodiment of the present application.
  • the method 900 shown in FIG. 25 includes step 910 and step 920, and step 910 and step 920 will be described in detail below.
  • Step 910 Obtain an image to be processed.
  • the image to be processed may be an image captured by the electronic device through a camera, or the image to be processed may be an image obtained from within the electronic device (for example, an image stored in an album of the electronic device, or an image obtained by the electronic device from the cloud).
  • Step 920 Perform super-resolution processing on the image to be processed according to the target image super-resolution network to obtain a target image, where the target image is a super-resolution image corresponding to the image to be processed.
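  • as a rough illustration of steps 910 and 920, the following Python sketch treats the target image super-resolution network as a black box and substitutes a nearest-neighbour upscale for it; the function name `super_resolve` and the stand-in data are assumptions for illustration, not part of this application.

```python
import numpy as np

def super_resolve(image, scale=2):
    """Stand-in for the target image super-resolution network: a
    nearest-neighbour upscale is used purely as a placeholder so the
    two-step method (obtain image, then super-resolve) is runnable."""
    return image.repeat(scale, axis=0).repeat(scale, axis=1)

# Step 910: obtain the image to be processed (random stand-in data).
image = np.random.rand(32, 32, 3)
# Step 920: super-resolution processing to obtain the target image.
target = super_resolve(image)
```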
  • the aforementioned target image super-resolution network may be obtained according to the method shown in FIG. 11.
  • the above-mentioned target image super-resolution network is a network determined by searching the image super-resolution network structure in the search space.
  • the search space is constructed by basic units and network structure parameters.
  • the search space is used to search for the image super-resolution network structure.
  • the network structure parameters include the type of the basic module used to construct the basic unit.
  • the basic unit is a network structure obtained by connecting the basic modules through the basic operation of the neural network.
  • the basic module includes a first module, which is used to perform a residual connection operation and a dimensionality reduction operation on a first input feature map.
  • the residual connection operation refers to performing feature addition processing on the first input feature map and the feature map processed by the first module, and the dimensionality reduction operation is used to transform the scale of the first input feature map from the original first scale to a second scale, the second scale being smaller than the first scale; the target image super-resolution network includes at least the first module, and the scale of the feature map processed by the first module is the same as the scale of the first input feature map.
  • the basic unit is a basic module for constructing an image super-resolution network.
  • the dimensionality reduction operation may include at least one of a pooling operation and a convolution operation with a step size of Q, where Q is a positive integer greater than 1.
  • the feature map processed by the first module is a feature map that has undergone a dimensionality upgrade operation, where the dimensionality upgrade operation refers to restoring the scale of the feature map that has undergone the dimensionality reduction processing to the first scale; in this case, the residual connection operation refers to performing feature addition processing on the first input feature map and the feature map processed by the dimensionality upgrade operation.
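  • the behaviour of the first module described above (dimensionality reduction, processing at the smaller second scale, dimensionality upgrade, then residual feature addition) can be sketched as follows; average pooling with step size Q=2, a `tanh` placeholder for the convolution stack, and nearest-neighbour upsampling are illustrative assumptions rather than the patented operations.

```python
import numpy as np

def first_module(x, stride=2):
    """Sketch of the first module: reduce the spatial scale, process at the
    second (smaller) scale, restore the first scale, then add the residual."""
    h, w, c = x.shape
    # dimensionality reduction: average pooling with step size Q = stride
    pooled = x.reshape(h // stride, stride, w // stride, stride, c).mean(axis=(1, 3))
    processed = np.tanh(pooled)  # placeholder for the convolutional processing
    # dimensionality upgrade: restore the feature map to the first scale
    up = processed.repeat(stride, axis=0).repeat(stride, axis=1)
    # residual connection: feature addition with the first input feature map
    return x + up

out = first_module(np.random.rand(8, 8, 4))
```

  The output keeps the scale of the first input feature map, as the text requires, while the heavy processing happens at the reduced scale, which is what decouples the parameter count from the calculation amount.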
  • the first module is further configured to perform a dense connection operation on the first input feature map, where the dense connection operation refers to feature splicing the output feature maps of the first i-1 convolutional layers together with the first input feature map as the input feature map of the i-th convolutional layer, where i is a positive integer greater than 1.
  • the dense connection operation may be a cyclic dense connection operation, where the cyclic dense connection operation refers to performing feature splicing processing on the first input feature map after channel compression processing.
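  • a minimal sketch of the dense connection idea, in which the input of the i-th layer is the feature splicing (channel concatenation) of the first input feature map with the outputs of all earlier layers; the per-layer transform here is a placeholder, not the actual convolutional layer.

```python
import numpy as np

def dense_block(x, num_layers=3):
    """Dense connection sketch: layer i receives the concatenation of the
    original input and the outputs of layers 1..i-1 along the channel axis."""
    feats = [x]
    for _ in range(num_layers):
        inp = np.concatenate(feats, axis=-1)  # feature splicing on channels
        out = np.maximum(inp.mean(axis=-1, keepdims=True), 0.0)  # stand-in layer
        feats.append(out)
    return np.concatenate(feats, axis=-1)

result = dense_block(np.random.rand(4, 4, 2))
```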
  • the first module is also used to perform a rearrangement operation, where the rearrangement operation refers to merging multiple first channel features of the first input feature map according to a preset rule to generate a second channel feature, wherein the resolution of the second channel feature is higher than the resolution of the first channel features.
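  • assuming the "preset rule" behaves like the well-known pixel-shuffle rearrangement (an assumption for illustration; the application does not fix the rule), merging r×r first channel features into one second channel feature of r-times-higher resolution can be sketched as:

```python
import numpy as np

def rearrange(x, r=2):
    """Pixel-shuffle-style rearrangement sketch: merge r*r low-resolution
    channel features into one channel feature of higher spatial resolution."""
    h, w, c = x.shape
    out_c = c // (r * r)
    x = x.reshape(h, w, r, r, out_c)
    x = x.transpose(0, 2, 1, 3, 4)  # interleave the sub-pixel grids
    return x.reshape(h * r, w * r, out_c)

hi_res = rearrange(np.random.rand(4, 4, 4), r=2)
```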
  • the basic module further includes a second module and/or a third module, wherein the second module is used to perform a channel compression operation, the residual connection operation, and the dense connection operation on a second input feature map, and the channel compression operation refers to a convolution operation with a 1×1 convolution kernel on the second input feature map; the third module is used to perform a channel exchange operation, the residual connection operation, and the dense connection operation on a third input feature map, where the third input feature map includes M sub-feature maps, each of the M sub-feature maps includes at least two adjacent channel features, and the channel exchange operation refers to reordering the at least two adjacent channel features corresponding to the M sub-feature maps so that channel features corresponding to different sub-feature maps in the M sub-feature maps become adjacent; M is an integer greater than 1, and the first input feature map, the second input feature map, and the third input feature map correspond to the same image.
  • the second module is used to perform a channel compression operation on the second input feature map, where the channel compression operation refers to a convolution operation with a 1×1 convolution kernel on the second input feature map.
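  • the channel compression operation (a 1×1 convolution, i.e. a per-pixel linear map over channels) and a channel exchange of the kind described for the third module can be sketched as follows; the random weights, the group count M=2, and the function names are illustrative assumptions.

```python
import numpy as np

def compress_channels(x, out_c=1):
    """Channel compression sketch: a 1x1 convolution is equivalent to
    multiplying each pixel's channel vector by a (c_in, c_out) matrix."""
    w = np.random.rand(x.shape[-1], out_c)
    return x @ w

def channel_shuffle(x, groups=2):
    """Channel exchange sketch: split channels into `groups` sub-feature
    maps and reorder them so that channels from different sub-feature
    maps become adjacent."""
    h, w, c = x.shape
    x = x.reshape(h, w, groups, c // groups)
    return x.transpose(0, 1, 3, 2).reshape(h, w, c)

compressed = compress_channels(np.random.rand(2, 2, 4))
shuffled = channel_shuffle(np.arange(4.0).reshape(1, 1, 4))
```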
  • the target image super-resolution network is a network determined by performing back-propagation iterative training on the first image super-resolution network through a multi-level weighted joint loss function, where the multi-level weighted joint loss function is determined according to the loss between the predicted super-resolution image corresponding to the feature map output by each basic unit in the first image super-resolution network and the sample super-resolution image, and the first image super-resolution network refers to a network determined by searching the image super-resolution network structure in the search space through an evolutionary algorithm.
  • the multi-level weighted joint loss function may be obtained according to the following equation: L = Σ_{k=1}^{N} λ_{k,t}·L_k, where L represents the multi-level weighted joint loss function; L_k represents the loss value of the k-th basic unit of the first image super-resolution network, that is, the image loss between the predicted super-resolution image corresponding to the output feature map of the k-th basic unit and the sample super-resolution image; λ_{k,t} represents the weight of the loss value of the k-th layer at time t; and N represents the number of basic units included in the first image super-resolution network, where N is an integer greater than or equal to 1.
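  • under the assumption that the per-unit image loss L_k is an L1 loss (the application does not fix the loss type), the multi-level weighted joint loss L = Σ_{k=1}^{N} λ_{k,t}·L_k can be sketched as:

```python
import numpy as np

def unit_loss(pred, target):
    """L_k: image loss between the k-th unit's predicted super-resolution
    image and the sample super-resolution image (L1 loss assumed)."""
    return float(np.abs(pred - target).mean())

def joint_loss(unit_preds, target, weights):
    """L = sum over k of lambda_{k,t} * L_k, one term per basic unit."""
    return sum(w * unit_loss(p, target) for w, p in zip(weights, unit_preds))

sample = np.zeros((4, 4))
loss = joint_loss([sample, sample + 1.0], sample, weights=[0.5, 0.5])
```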
  • the first image super-resolution network may be determined by the performance parameters of each candidate network structure among P candidate network structures, where the P candidate network structures are randomly generated based on the basic unit, and the performance parameter refers to a parameter that evaluates the performance of the P candidate network structures trained by using the multi-level weighted joint loss function.
  • the performance parameter may include the peak signal-to-noise ratio (PSNR), which is used to indicate the difference between the predicted super-resolution image obtained through each candidate network structure and the sample super-resolution image.
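  • a sketch of using PSNR as the performance parameter to compare candidate network structures; the two candidate outputs here are synthetic stand-ins, and the selection of the best candidate by highest PSNR is an illustrative simplification of the evaluation step.

```python
import numpy as np

def psnr(pred, target, max_val=255.0):
    """Peak signal-to-noise ratio between a candidate's predicted
    super-resolution image and the sample super-resolution image."""
    mse = np.mean((pred.astype(np.float64) - target.astype(np.float64)) ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(max_val ** 2 / mse)

sample = np.full((8, 8), 100.0)
# stand-in outputs of two candidate network structures
candidates = {"net_a": sample + 16.0, "net_b": sample + 4.0}
best = max(candidates, key=lambda name: psnr(candidates[name], sample))
```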
  • FIG. 26 is a schematic flowchart of an image display method provided by an embodiment of the present application.
  • the method 1000 shown in FIG. 26 includes steps 1010 to 1040, and these steps will be described in detail below.
  • Step 1010 Detect a first operation used by the user to turn on the camera.
  • Step 1020 In response to the first operation, display a photographing interface on the display screen.
  • the photographing interface includes a viewfinder frame, and the viewfinder frame includes a first image.
  • the user's shooting behavior may include a first operation of the user to turn on the camera; in response to the first operation, displaying a shooting interface on the display screen.
  • FIG. 27 shows a graphical user interface (GUI) of the mobile phone, and the GUI is the desktop 1110 of the mobile phone.
  • when the electronic device detects that the user has clicked the icon 1120 of the camera application (application, APP) on the desktop 1110, it can start the camera application and display another GUI as shown in (b) in FIG. 27, which may be called the shooting interface 1130.
  • the shooting interface 1130 may include a viewfinder frame 1140; in the preview state, the preview image can be displayed in the viewfinder frame 1140 in real time.
  • a first image may be displayed in the view frame 1140, and the first image is a color image.
  • the shooting interface may also include a control 1150 for indicating the shooting mode, and other shooting controls.
  • the shooting interface may include a viewfinder frame. It is understandable that the size of the viewfinder frame may be different in the photo mode and the video mode.
  • the viewfinder frame may be the viewfinder frame in the photo mode. In video mode, the viewfinder frame can be the entire display screen.
  • in the preview state, that is, after the user turns on the camera but before pressing the photo/video button, the preview image can be displayed in the viewfinder frame in real time.
  • the preview image may be a color image
  • the preview image may be an image displayed when the camera is set to automatic resolution
  • Step 1030 Detect a second operation used by the user to instruct the camera.
  • the first processing mode may be a professional shooting mode (for example, a super-resolution shooting mode).
  • the shooting interface includes a shooting option 1160.
  • the electronic device displays a shooting mode interface.
  • when the electronic device detects that the user clicks the professional shooting mode 1161 on the shooting mode interface, the mobile phone enters the professional shooting mode.
  • the electronic device detects a second operation 1170 used by the user to instruct shooting in a low-light environment.
  • the second operation used by the user to instruct the shooting behavior may include pressing the shooting button in the camera of the electronic device, or may include the user instructing the electronic device through voice to perform the shooting behavior, or may include other operations by which the user instructs the electronic device to perform the shooting behavior.
  • Step 1040 In response to the second operation, display a second image in the viewfinder frame, where the second image is an image obtained after super-resolution processing is performed on the first image collected by the camera through the target image super-resolution network.
  • the aforementioned target image super-resolution network may be obtained according to the method shown in FIG. 11.
  • the above-mentioned target image super-resolution network is a network determined by searching the image super-resolution network structure in a search space, where the search space is constructed from basic units and network structure parameters and is used to search for the image super-resolution network structure; the network structure parameters include the type of the basic module used to construct the basic unit; the basic unit is a network structure obtained by connecting the basic modules through the basic operations of the neural network; the basic module includes a first module, which is used to perform a residual connection operation and a dimensionality reduction operation on a first input feature map; the residual connection operation refers to performing feature addition processing on the first input feature map and the feature map processed by the first module, and the dimensionality reduction operation is used to transform the scale of the first input feature map from the original first scale to a second scale, the second scale being smaller than the first scale; the target image super-resolution network includes at least the first module, and the scale of the feature map processed by the first module is the same as the scale of the first input feature map.
  • the second image is displayed in the viewfinder frame in FIG. 28(d), and the first image is displayed in the viewfinder frame in FIG. 28(c).
  • the content of the second image is the same as or substantially the same as that of the first image, but the quality of the second image is better than that of the first image; for example, the resolution of the second image is higher than that of the first image.
  • the basic unit is a basic module used to construct an image super-resolution network.
  • the dimensionality reduction operation may include at least one of a pooling operation and a convolution operation with a step size of Q, where Q is a positive integer greater than 1.
  • the feature map processed by the first module is a feature map that has undergone a dimensionality upgrade operation, where the dimensionality upgrade operation refers to restoring the scale of the feature map that has undergone the dimensionality reduction processing to the first scale; in this case, the residual connection operation refers to performing feature addition processing on the first input feature map and the feature map processed by the dimensionality upgrade operation.
  • the first module is further configured to perform a dense connection operation on the first input feature map, where the dense connection operation refers to feature splicing the output feature maps of the first i-1 convolutional layers together with the first input feature map as the input feature map of the i-th convolutional layer, where i is a positive integer greater than 1.
  • the dense connection operation may be a cyclic dense connection operation, where the cyclic dense connection operation refers to performing feature splicing processing on the first input feature map after channel compression processing.
  • the first module is also used to perform a rearrangement operation, where the rearrangement operation refers to merging multiple first channel features of the first input feature map according to a preset rule to generate a second channel feature, wherein the resolution of the second channel feature is higher than the resolution of the first channel features.
  • the basic module further includes a second module and/or a third module, wherein the second module is used to perform a channel compression operation, the residual connection operation, and the dense connection operation on a second input feature map, and the channel compression operation refers to a convolution operation with a 1×1 convolution kernel on the second input feature map; the third module is used to perform a channel exchange operation, the residual connection operation, and the dense connection operation on a third input feature map, where the third input feature map includes M sub-feature maps, each of the M sub-feature maps includes at least two adjacent channel features, and the channel exchange operation refers to reordering the at least two adjacent channel features corresponding to the M sub-feature maps so that channel features corresponding to different sub-feature maps in the M sub-feature maps become adjacent; M is an integer greater than 1, and the first input feature map, the second input feature map, and the third input feature map correspond to the same image.
  • the second module is used to perform a channel compression operation on the second input feature map, where the channel compression operation refers to a convolution operation with a 1×1 convolution kernel on the second input feature map.
  • the target image super-resolution network is a network determined by performing back-propagation iterative training on the first image super-resolution network through a multi-level weighted joint loss function, where the multi-level weighted joint loss function is determined according to the loss between the predicted super-resolution image corresponding to the feature map output by each basic unit in the first image super-resolution network and the sample super-resolution image, and the first image super-resolution network refers to a network determined by searching the image super-resolution network structure in the search space through an evolutionary algorithm.
  • the multi-level weighted joint loss function may be obtained according to the following equation: L = Σ_{k=1}^{N} λ_{k,t}·L_k, where L represents the multi-level weighted joint loss function; L_k represents the loss value of the k-th basic unit of the first image super-resolution network, that is, the image loss between the predicted super-resolution image corresponding to the output feature map of the k-th basic unit and the sample super-resolution image; λ_{k,t} represents the weight of the loss value of the k-th layer at time t; and N represents the number of basic units included in the first image super-resolution network, where N is an integer greater than or equal to 1.
  • the first image super-resolution network may be determined by the performance parameters of each candidate network structure among P candidate network structures, where the P candidate network structures are randomly generated based on the basic unit, and the performance parameter refers to a parameter that evaluates the performance of the P candidate network structures trained by using the multi-level weighted joint loss function.
  • the performance parameter may include the peak signal-to-noise ratio (PSNR), which is used to indicate the difference between the predicted super-resolution image obtained through each candidate network structure and the sample super-resolution image.
  • the neural network search device in the embodiment of this application can execute the various neural network search methods of the foregoing embodiments of this application, and the image processing device can execute the various image processing methods of the foregoing embodiments of this application; for the specific working process of these products, refer to the corresponding processes in the foregoing method embodiments.
  • FIG. 29 is a schematic diagram of the hardware structure of a neural network search device provided by an embodiment of the present application.
  • the neural network search device 1200 shown in FIG. 29 (the device 1200 may specifically be a computer device) includes a memory 1201, a processor 1202, a communication interface 1203, and a bus 1204. Among them, the memory 1201, the processor 1202, and the communication interface 1203 implement communication connections between each other through the bus 1204.
  • the memory 1201 may be a read only memory (ROM), a static storage device, a dynamic storage device, or a random access memory (RAM).
  • the memory 1201 may store a program.
  • the processor 1202 is configured to execute each step of the neural network search method of the embodiment of the present application, for example, execute each step shown in FIG. 11 .
  • the neural network search device shown in the embodiment of the present application may be a server, for example, it may be a cloud server, or may also be a chip configured in a cloud server.
  • the processor 1202 may adopt a general-purpose central processing unit (CPU), a microprocessor, an application-specific integrated circuit (ASIC), a graphics processing unit (GPU), or one or more integrated circuits, and is used to execute related programs to implement the neural network search method in the method embodiments of the present application.
  • the processor 1202 may also be an integrated circuit chip with signal processing capability.
  • the various steps of the neural network search method of the present application can be completed by hardware integrated logic circuits in the processor 1202 or instructions in the form of software.
  • the above-mentioned processor 1202 may also be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
  • the general-purpose processor may be a microprocessor or the processor may also be any conventional processor or the like.
  • the steps of the method disclosed in the embodiments of the present application may be directly embodied as being executed and completed by a hardware decoding processor, or executed and completed by a combination of hardware and software modules in the decoding processor.
  • the software module can be located in a mature storage medium in the field, such as a random access memory, a flash memory, a read-only memory, a programmable read-only memory, an electrically erasable programmable memory, or registers.
  • the storage medium is located in the memory 1201; the processor 1202 reads the information in the memory 1201 and, in combination with its hardware, completes the functions required by the units included in the neural network search device, or executes the neural network search method shown in FIG. 11 of the method embodiments of the present application.
  • the communication interface 1203 uses a transceiver device, such as but not limited to a transceiver, to implement communication between the device 1200 and other devices or a communication network.
  • the bus 1204 may include a path for transferring information between various components of the device 1200 (for example, the memory 1201, the processor 1202, and the communication interface 1203).
  • FIG. 30 is a schematic diagram of the hardware structure of an image processing apparatus according to an embodiment of the present application.
  • the image processing apparatus 1300 shown in FIG. 30 includes a memory 1301, a processor 1302, a communication interface 1303, and a bus 1304.
  • the memory 1301, the processor 1302, and the communication interface 1303 implement communication connections between each other through the bus 1304.
  • the memory 1301 may be a ROM, a static storage device, or a RAM.
  • the memory 1301 may store a program.
  • the processor 1302 and the communication interface 1303 are used to execute each step of the image processing method of the embodiments of the present application, for example, the steps of the image processing methods shown in FIG. 25 and FIG. 26.
  • the processor 1302 may adopt a general-purpose CPU, a microprocessor, an ASIC, a GPU, or one or more integrated circuits to execute related programs, so as to realize the functions required by the units in the image processing apparatus of the embodiment of the present application or execute the image processing method in the method embodiments of this application.
  • the processor 1302 may also be an integrated circuit chip with signal processing capability.
  • each step of the image processing method in the embodiment of the present application can be completed by an integrated logic circuit of hardware in the processor 1302 or instructions in the form of software.
  • the aforementioned processor 1302 may also be a general-purpose processor, DSP, ASIC, FPGA or other programmable logic device, discrete gate or transistor logic device, or discrete hardware component.
  • the methods, steps, and logical block diagrams disclosed in the embodiments of the present application can be implemented or executed.
  • the general-purpose processor may be a microprocessor or the processor may also be any conventional processor or the like.
  • the steps of the method disclosed in the embodiments of the present application may be directly embodied as being executed and completed by a hardware decoding processor, or executed and completed by a combination of hardware and software modules in the decoding processor.
  • the software module can be located in a mature storage medium in the field such as random access memory, flash memory, read-only memory, programmable read-only memory, or electrically erasable programmable memory, registers.
  • the storage medium is located in the memory 1301; the processor 1302 reads the information in the memory 1301 and, in combination with its hardware, completes the functions required by the units included in the image processing apparatus of the embodiment of the present application, or performs the image processing method of the method embodiments of the present application.
  • the communication interface 1303 uses a transceiver device, such as but not limited to a transceiver, to implement communication between the device 1300 and other devices or communication networks; for example, the image to be processed can be acquired through the communication interface 1303.
  • the bus 1304 may include a path for transferring information between various components of the device 1300 (for example, the memory 1301, the processor 1302, and the communication interface 1303).
  • although the above-mentioned apparatus 1200 and apparatus 1300 only show a memory, a processor, and a communication interface, in the specific implementation process, those skilled in the art should understand that the apparatus 1200 and apparatus 1300 may also include other devices necessary for normal operation. At the same time, according to specific needs, those skilled in the art should understand that the above-mentioned apparatus 1200 and apparatus 1300 may also include hardware devices that implement other additional functions. In addition, those skilled in the art should understand that the above-mentioned apparatus 1200 and apparatus 1300 may also include only the components necessary to implement the embodiments of the present application, and not necessarily all the components shown in FIG. 29 or FIG. 30.
  • the disclosed system, device, and method may be implemented in other ways.
  • the device embodiments described above are only illustrative.
  • the division of the units is only a logical function division, and there may be other divisions in actual implementation; for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not implemented.
  • the displayed or discussed mutual coupling or direct coupling or communication connection may be indirect coupling or communication connection through some interfaces, devices or units, and may be in electrical, mechanical or other forms.
  • the units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, they may be located in one place, or they may be distributed on multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
  • each unit in each embodiment of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit.
  • if the functions are implemented in the form of a software functional unit and sold or used as an independent product, they may be stored in a computer-readable storage medium.
  • the technical solution of this application, in essence, or the part that contributes to the existing technology, or a part of the technical solution, can be embodied in the form of a software product; the computer software product is stored in a storage medium and includes several instructions used to enable a computer device (which may be a personal computer, a server, or a network device, etc.) to execute all or part of the steps of the methods described in the embodiments of the present application.
  • the aforementioned storage media include: a USB flash drive, a mobile hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disc, or other media that can store program code.


Abstract

Disclosed are a neural network search method and apparatus in the field of computer vision in artificial intelligence. The search method comprises: building a basic unit, wherein the basic unit is a network structure obtained by connecting basic modules by means of a basic operation of a neural network, the basic modules comprise a first module, the first module is used for performing a dimension reduction operation and a residual connection operation on a first input feature map, the dimension reduction operation is used for converting the dimension of the first input feature map from an original first dimension into a second dimension, the second dimension is less than the first dimension, and the residual connection operation is used for performing feature summation processing on the first input feature map and a feature map processed by the first module; building a search space according to the basic unit and network structure parameters; and searching a network structure in the search space to determine a target image super-resolution network. The present application can improve the accuracy of a super-resolution network in the case of a certain computing performance.

Description

Neural network search method and device
This application claims priority to Chinese Patent Application No. 201910695706.7, filed with the Chinese Patent Office on July 30, 2019 and entitled "Neural Network Search Method and Apparatus", the entire content of which is incorporated into this application by reference.
Technical Field
This application relates to the field of artificial intelligence, and more specifically, to a neural network search method and device.
Background
Artificial intelligence (AI) is a theory, method, technology, and application system that uses digital computers or machines controlled by digital computers to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use knowledge to obtain the best results. In other words, artificial intelligence is a branch of computer science that attempts to understand the essence of intelligence and produce a new kind of intelligent machine that can react in a way similar to human intelligence. Artificial intelligence studies the design principles and implementation methods of various intelligent machines, so that the machines have the functions of perception, reasoning, and decision-making. Research in the field of artificial intelligence includes robotics, natural language processing, computer vision, decision-making and reasoning, human-computer interaction, recommendation and search, and basic AI theories.
随着人工智能技术的快速发展,神经网络(例如,深度神经网络)近年来在图像、视频以及语音等多种媒体信号的处理与分析中取得了很大的成就。图像超分辨率重构技术是指将低分辨率图像进行重构得到高分辨率图像,通过深度神经网络进行图像超分辨率重构处理具有明显的优势,随着图像超分辨率重构技术效果的提升深度神经网络模型也越来越大。由于移动设备的计算性能和存储空间都非常有限,这极大的限制了超分模型在移动设备上的应用,因此,人们致力于设计轻量级的超分网络模型,在保证一定的识别精度情况下,尽可能减少网络规模。With the rapid development of artificial intelligence technology, neural networks (for example, deep neural networks) have made great achievements in the processing and analysis of various media signals such as images, videos, and voices in recent years. Image super-resolution reconstruction technology refers to the reconstruction of low-resolution images to obtain high-resolution images. Image super-resolution reconstruction processing through deep neural networks has obvious advantages. With the effect of image super-resolution reconstruction technology The enhancement of the deep neural network model is also getting bigger. As the computing performance and storage space of mobile devices are very limited, this greatly limits the application of super-division models on mobile devices. Therefore, people are committed to designing lightweight super-division network models to ensure certain recognition accuracy. In this case, reduce the network scale as much as possible.
为了获取轻量级的超分网络模型,将神经网络结构搜索(neural architecture search,NAS)的方法应用于图像超分辨率重构技术中。目前,NAS方法中的搜索空间通常是由基本卷积单元构建的搜索空间,搜索空间中可以包括由多个基本单元构建的候选神经网络模型,多个基本单元基于输入特征图尺寸大小在相同特征尺寸上对输入特征图进行非线性变换,这导致神经网络模型中的参数量和计算量成正比,即参数量越大则网络模型的计算量越大。在移动设备的计算性能受限的情况下,只能通过减少参数量而降低计算量,从而限制了用于超分辨率重构的网络模型的性能。因此,在移动设备的计算性能受限制的情况下,如何提高超分辨率神经网络的性能成为一个亟需解决的问题。In order to obtain a lightweight super-division network model, the neural architecture search (NAS) method is applied to the image super-resolution reconstruction technology. At present, the search space in the NAS method is usually a search space constructed by basic convolutional units. The search space can include candidate neural network models constructed by multiple basic units. The multiple basic units are based on the size of the input feature map in the same feature. The size of the input feature map is nonlinearly transformed, which results in the amount of parameters in the neural network model being proportional to the amount of calculation, that is, the larger the parameter amount, the greater the amount of calculation of the network model. When the computing performance of mobile devices is limited, the amount of calculation can only be reduced by reducing the amount of parameters, thereby limiting the performance of the network model for super-resolution reconstruction. Therefore, when the computing performance of mobile devices is limited, how to improve the performance of super-resolution neural networks has become an urgent problem to be solved.
Summary

This application provides a neural network search method and apparatus, which can improve the accuracy of a super-resolution network in image super-resolution processing when the computing performance of a mobile device is limited.
According to a first aspect, a neural network structure search method is provided, including: constructing a basic unit, where the basic unit is a network structure obtained by connecting basic modules through basic operations of a neural network, the basic modules include a first module, the first module is configured to perform a dimension reduction operation and a residual connection operation on a first input feature map, the dimension reduction operation is used to transform the scale of the first input feature map from an original first scale to a second scale, the second scale is smaller than the first scale, the residual connection operation is used to perform feature addition on the first input feature map and a feature map processed by the first module, and the scale of the feature map processed by the first module is the same as the scale of the first input feature map; constructing a search space according to the basic unit and network structure parameters, where the network structure parameters include the types of basic modules used to construct the basic unit, and the search space is used to search for an image super-resolution network structure; and performing an image super-resolution network structure search in the search space to determine a target image super-resolution network, where the target image super-resolution network is used to perform super-resolution processing on a to-be-processed image, the target image super-resolution network includes at least the first module, and the target image super-resolution network is a network whose computation amount is less than a first preset threshold and whose image super-resolution accuracy is greater than a second preset threshold.
It should be noted that the basic unit may be a network structure obtained by connecting basic modules through basic operations of a neural network. The network structure may include preset basic computations of a convolutional neural network or combinations of such computations, which may be collectively referred to as basic operations.

For example, a basic operation may be a convolution operation, a pooling operation, a residual connection, or the like. Basic operations connect the basic modules to each other, thereby forming the network structure of the basic unit.

The foregoing feature addition may refer to adding the channel features element-wise for feature maps of the same scale.

In the embodiments of this application, the first module can apply a residual connection to the input feature map, that is, perform feature addition on the first input feature map and the feature map processed by the first module, so that more local detail information of the first input feature map is passed to subsequent convolutional layers. Provided that enough local detail information of the first input feature map is passed to subsequent convolutional layers, the first module can be used to reduce the dimension of the first input feature map. The dimension reduction operation reduces the scale of the input feature map and thereby reduces the model's computation, while the residual connection operation passes information from earlier layers on to later layers, compensating for the information loss caused by the dimension reduction operation. In addition, the dimension reduction operation quickly enlarges the receptive field of the features, so that the prediction of high-resolution pixels can take more contextual information into account, thereby improving super-resolution accuracy.
In a possible implementation, the foregoing basic unit is a building block used to construct an image super-resolution network.

With reference to the first aspect, in some implementations of the first aspect, the dimension reduction operation includes at least one of a pooling operation and a convolution operation with a stride of Q, where Q is a positive integer greater than 1.

Optionally, the pooling operation may be an average pooling operation, or the pooling operation may be a max pooling operation.

In the embodiments of this application, the dimension reduction operation reduces the scale of the first input feature map, thereby reducing the computation of the target image super-resolution network while keeping the number of parameters unchanged.
With reference to the first aspect, in some implementations of the first aspect, the feature map processed by the first module is a feature map after a dimension increase operation, where the dimension increase operation restores the scale of the feature map after dimension reduction to the first scale, and the residual connection operation performs feature addition on the first input feature map and the feature map processed by the dimension increase operation.

Optionally, the dimension increase operation may be an upsampling operation, or may be a deconvolution operation. The upsampling operation may use interpolation, that is, inserting new elements between the pixels of the original image with a suitable interpolation algorithm; the deconvolution operation is the inverse of the convolution operation, also known as transposed convolution.

In the embodiments of this application, the dimension increase operation transforms the scale of the first input feature map after dimension reduction from the second scale back to the original first scale, that is, it increases the scale of the reduced feature map, thereby ensuring that the residual connection operation is performed at the same scale.
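The downsample, process, upsample, and residual-add pipeline described above can be sketched with NumPy. This is an illustrative stand-in, not the patent's actual implementation: 2×2 average pooling serves as the dimension reduction, nearest-neighbor repetition as the dimension increase, a ReLU stands in for the nonlinear processing at the reduced scale, and the names `first_module` and `transform` are assumptions for illustration.

```python
import numpy as np

def avg_pool2x2(x):
    """Dimension reduction: 2x2 average pooling (stride 2) halves H and W."""
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def upsample2x(x):
    """Dimension increase: nearest-neighbor upsampling restores the scale."""
    return np.repeat(np.repeat(x, 2, axis=0), 2, axis=1)

def first_module(x, transform):
    """Downsample, apply a transform at the smaller scale (cheaper),
    upsample back to the original scale, then residual-add the input."""
    y = avg_pool2x2(x)   # first scale -> second (smaller) scale
    y = transform(y)     # processing at reduced scale costs less computation
    y = upsample2x(y)    # back to the first scale
    return x + y         # residual connection requires matching scales

x = np.arange(16, dtype=float).reshape(4, 4)
out = first_module(x, transform=lambda t: np.maximum(t, 0.0))  # ReLU stand-in
assert out.shape == x.shape
```

Note that the residual add is only well-defined because the dimension increase restores the feature map to the first scale, which is exactly the constraint stated above.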
With reference to the first aspect, in some implementations of the first aspect, the first module is further configured to perform a dense connection operation on the first input feature map, where the dense connection operation concatenates the output feature maps of each of the first i-1 convolutional layers and the first input feature map as the input feature map of the i-th convolutional layer, where i is a positive integer greater than 1.

The foregoing feature concatenation may refer to concatenating M feature maps of the same scale into one feature map with K channels, where K is a positive integer greater than M.

In the embodiments of this application, dense connections maximize information flow through the network: each layer is connected to all layers before it, that is, the input of each layer is the concatenation of the outputs of all preceding layers. Dense connections preserve the information of the input feature map better throughout the network, which further compensates for the information loss caused by the dimension reduction operation.
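The dense connection rule above can be sketched in a few lines of NumPy. This is a hypothetical illustration: the toy "layers" (channel mean and max) stand in for real convolutions, and the name `dense_block` is an assumption.

```python
import numpy as np

def dense_block(x, layers):
    """Dense connections: the input to layer i is the channel-wise
    concatenation of the original input and all earlier layer outputs."""
    features = [x]                              # each array: (C, H, W)
    for layer in layers:
        inp = np.concatenate(features, axis=0)  # feature concatenation
        features.append(layer(inp))
    return np.concatenate(features, axis=0)

# Toy "layer": maps any number of input channels to 2 output channels
# (a stand-in for a convolution that accepts a growing channel count).
layer = lambda t: np.stack([t.mean(axis=0), t.max(axis=0)])

x = np.ones((4, 8, 8))
out = dense_block(x, [layer, layer, layer])
# output channels: 4 (input) + 2 + 2 + 2 = 10
assert out.shape == (10, 8, 8)
```

Because the original input is part of every concatenation, its information reaches every later layer directly, which is how the dense connection compensates for the loss introduced by dimension reduction.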
With reference to the first aspect, in some implementations of the first aspect, the dense connection operation is a recurrent dense connection operation, where the recurrent dense connection operation performs feature concatenation on the first input feature map after channel compression.

In the embodiments of this application, the recurrent operation, that is, the recurrent dense connection operation, can deepen the target image super-resolution network. For a neural network structure, a deeper network means more convolutional layers, which improves the accuracy of the target image super-resolution network in processing images.

With reference to the first aspect, in some implementations of the first aspect, the first module is further configured to perform a rearrangement operation, where the rearrangement operation merges multiple first channel features of the first input feature map according to a preset rule to generate one second channel feature, and the resolution of the second channel feature is higher than the resolution of the first channel features.
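The rearrangement operation described above corresponds to what is commonly called pixel shuffle (the rearrangement step of sub-pixel convolution). A minimal NumPy sketch, assuming the preset rule merges r*r low-resolution channel features into one channel feature of r-times resolution:

```python
import numpy as np

def pixel_shuffle(x, r):
    """Rearrangement: merge r*r channel features of resolution (H, W)
    into one channel feature of resolution (H*r, W*r)."""
    c, h, w = x.shape
    assert c % (r * r) == 0
    x = x.reshape(c // (r * r), r, r, h, w)
    x = x.transpose(0, 3, 1, 4, 2)           # -> (C', H, r, W, r)
    return x.reshape(c // (r * r), h * r, w * r)

x = np.random.rand(4, 3, 3)                  # 4 low-resolution channels
y = pixel_shuffle(x, r=2)                    # 1 high-resolution channel
assert y.shape == (1, 6, 6)
# each 2x2 output patch interleaves one pixel from each input channel
assert y[0, 0, 0] == x[0, 0, 0] and y[0, 0, 1] == x[1, 0, 0]
```

This trades channel count for spatial resolution without creating new values, which is why it is a natural final upscaling step for a super-resolution network.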
With reference to the first aspect, in some implementations of the first aspect, the basic modules further include a second module and/or a third module, where the second module is configured to perform a channel compression operation, the residual connection operation, and the dense connection operation on a second input feature map, the channel compression operation being a convolution operation with a 1×1 kernel on the second input feature map; the third module is configured to perform a channel shuffle operation, the residual connection operation, and the dense connection operation on a third input feature map, where the third input feature map includes M sub-feature maps, each of the M sub-feature maps includes at least two adjacent channel features, and the channel shuffle operation reorders the at least two adjacent channel features corresponding to the M sub-feature maps so that the channel features corresponding to different sub-feature maps in the M sub-feature maps are adjacent, where M is an integer greater than 1, and the first input feature map, the second input feature map, and the third input feature map correspond to the same image.
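The third module's channel reordering can be sketched as the familiar channel-shuffle trick, assuming M = 2 sub-feature maps ("groups") of adjacent channels; the function name and the group interpretation are assumptions for illustration, not the patent's exact formulation.

```python
import numpy as np

def channel_shuffle(x, groups):
    """Reorder channels so that channels from different sub-feature maps
    (groups) become adjacent to each other."""
    c, h, w = x.shape
    return (x.reshape(groups, c // groups, h, w)
             .transpose(1, 0, 2, 3)
             .reshape(c, h, w))

# 2 sub-feature maps of 2 adjacent channels each: [0, 1 | 2, 3]
x = np.arange(4, dtype=float)[:, None, None] * np.ones((4, 2, 2))
y = channel_shuffle(x, groups=2)
# after the shuffle, channels from different groups interleave: 0, 2, 1, 3
assert [int(y[i, 0, 0]) for i in range(4)] == [0, 2, 1, 3]
```

Interleaving the groups lets subsequent (possibly group-wise) operations mix information across sub-feature maps at no parameter cost.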
With reference to the first aspect, in some implementations of the first aspect, the performing an image super-resolution network structure search in the search space to determine a target image super-resolution network includes:

performing an image super-resolution network structure search in the search space by using an evolutionary algorithm to determine a first image super-resolution network; and

performing back-propagation iterative training on the first image super-resolution network by using a multi-level weighted joint loss function to determine the target image super-resolution network, where the multi-level weighted joint loss function is determined according to the loss between the predicted super-resolution image corresponding to the feature map output by each basic unit in the first image super-resolution network and a sample super-resolution image.

It should be understood that training images are also required when training the first image super-resolution network with the multi-level weighted joint loss function, where the training images may refer to sample images, that is, low-resolution images and their corresponding sample super-resolution images.

In the embodiments of this application, the first image super-resolution network determined by the evolutionary algorithm can be trained a second time through the multi-level weighted joint loss function to finally determine the parameters of the target image super-resolution network, thereby improving the accuracy of the target image super-resolution network in processing images.

With reference to the first aspect, in some implementations of the first aspect, the multi-level weighted joint loss function is obtained according to the following equation:
$$L = \sum_{k=1}^{N} \lambda_{k,t} L_k$$
where L denotes the multi-level weighted joint loss function; L_k denotes the loss value of the k-th basic unit of the first image super-resolution network, the loss value being the image loss between the predicted super-resolution image corresponding to the output feature map of the k-th basic unit and the sample super-resolution image; λ_{k,t} denotes the weight of the loss value of the k-th level at time t; and N denotes the number of basic units included in the first image super-resolution network, where N is an integer greater than or equal to 1.

In the embodiments of this application, the weight of each intermediate-level image loss in the multi-level weighted joint loss function may change over time (or with the number of iterations). The loss function combines the predicted-image losses of the intermediate levels and reflects the importance of the different levels through the weights. Because the weight of each intermediate-level image loss can change over time, the parameters of the lower-level basic units can be trained more fully, thereby improving the performance of the super-resolution network.
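As a numeric sketch of the weighted joint loss, assuming the per-unit losses L_k have already been computed; the two weight schedules below are illustrative assumptions, showing how λ_{k,t} can shift emphasis from lower-level units early in training to the final output later:

```python
import numpy as np

def joint_loss(per_unit_losses, weights):
    """Multi-level weighted joint loss: L = sum_k lambda_{k,t} * L_k,
    where L_k is the image loss of the k-th basic unit's prediction and
    the weights lambda_{k,t} may be rescheduled as training progresses."""
    assert len(per_unit_losses) == len(weights)
    return float(np.dot(weights, per_unit_losses))

losses = [0.9, 0.5, 0.2]                       # L_k for N = 3 basic units
early = joint_loss(losses, [0.5, 0.3, 0.2])    # early t: emphasize lower units
late = joint_loss(losses, [0.1, 0.2, 0.7])     # later t: emphasize final output
assert abs(early - 0.64) < 1e-9
assert abs(late - 0.33) < 1e-9
```

Under such a schedule the same per-unit losses yield different totals at different times, which is the mechanism by which the lower-level basic units receive stronger gradients early on.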
With reference to the first aspect, in some implementations of the first aspect, the performing an image super-resolution network structure search in the search space by using an evolutionary algorithm to determine a first image super-resolution network includes:

randomly generating P candidate network structures according to the basic unit, where P is an integer greater than 1;

training the P candidate network structures by using the multi-level weighted joint loss function;

evaluating a performance parameter of each of the P candidate network structures after training, where the performance parameter includes a peak signal-to-noise ratio (PSNR), and the peak signal-to-noise ratio indicates the difference between the predicted super-resolution image obtained by each candidate network structure and the sample super-resolution image; and

determining the first image super-resolution network according to the performance parameters of the candidate networks.

It should be understood that when the performance parameters of the P candidate network structures are evaluated, the candidate network structures need to be trained with training images and the multi-level weighted joint loss function, where the training images may refer to sample images, that is, low-resolution images and their corresponding sample super-resolution images.
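The PSNR used to rank trained candidates can be sketched as follows; this is an illustrative evaluation snippet (the `max_val` default for 8-bit images and the candidate names are assumptions), not the patent's scoring code:

```python
import numpy as np

def psnr(pred, target, max_val=255.0):
    """Peak signal-to-noise ratio between a predicted super-resolution
    image and the sample super-resolution image; higher means closer."""
    mse = np.mean((pred.astype(float) - target.astype(float)) ** 2)
    if mse == 0:
        return float("inf")
    return 10.0 * np.log10(max_val ** 2 / mse)

target = np.full((8, 8), 128.0)   # sample super-resolution image
good = target + 1.0               # small pixel error  -> high PSNR
bad = target + 16.0               # larger pixel error -> lower PSNR
assert psnr(good, target) > psnr(bad, target)

# candidate selection: keep the network with the highest PSNR
candidates = {"net_a": psnr(good, target), "net_b": psnr(bad, target)}
best = max(candidates, key=candidates.get)
assert best == "net_a"
```

In the evolutionary loop this ranking decides which candidate structures survive into the next generation and, ultimately, which becomes the first image super-resolution network.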
According to a second aspect, an image processing method is provided, including: obtaining a to-be-processed image; and performing super-resolution processing on the to-be-processed image according to a target image super-resolution network to obtain a target image of the to-be-processed image, where the target image is a super-resolution image corresponding to the to-be-processed image, the target image super-resolution network is a network determined through an image super-resolution network structure search in a search space, the search space is constructed according to a basic unit and network structure parameters, the search space is used to search for an image super-resolution network structure, the network structure parameters include the types of basic modules used to construct the basic unit, the basic unit is a network structure obtained by connecting basic modules through basic operations of a neural network, the basic modules include a first module, the first module is configured to perform a residual connection operation and a dimension reduction operation on a first input feature map, the residual connection operation performs feature addition on the first input feature map and a feature map processed by the first module, the dimension reduction operation is used to transform the scale of the first input feature map from an original first scale to a second scale, the second scale is smaller than the first scale, the target image super-resolution network includes at least the first module, and the scale of the feature map processed by the first module is the same as the scale of the first input feature map.
In a possible implementation, the foregoing basic unit is a building block used to construct an image super-resolution network.

With reference to the second aspect, in some implementations of the second aspect, the dimension reduction operation includes at least one of a pooling operation and a convolution operation with a stride of Q, where Q is a positive integer greater than 1.

With reference to the second aspect, in some implementations of the second aspect, the feature map processed by the first module is a feature map after a dimension increase operation, where the dimension increase operation restores the scale of the feature map after dimension reduction to the first scale, and the residual connection operation performs feature addition on the first input feature map and the feature map processed by the dimension increase operation.

With reference to the second aspect, in some implementations of the second aspect, the first module is further configured to perform a dense connection operation on the first input feature map, where the dense connection operation concatenates the output feature maps of each of the first i-1 convolutional layers and the first input feature map as the input feature map of the i-th convolutional layer, where i is a positive integer greater than 1.

With reference to the second aspect, in some implementations of the second aspect, the dense connection operation is a recurrent dense connection operation, where the recurrent dense connection operation performs feature concatenation on the first input feature map after channel compression.

With reference to the second aspect, in some implementations of the second aspect, the first module is further configured to perform a rearrangement operation, where the rearrangement operation merges multiple first channel features of the first input feature map according to a preset rule to generate one second channel feature, and the resolution of the second channel feature is higher than the resolution of the first channel features.

With reference to the second aspect, in some implementations of the second aspect, the basic modules further include a second module and/or a third module, where the second module is configured to perform a channel compression operation, the residual connection operation, and the dense connection operation on a second input feature map, the channel compression operation being a convolution operation with a 1×1 kernel on the second input feature map; the third module is configured to perform a channel shuffle operation, the residual connection operation, and the dense connection operation on a third input feature map, where the third input feature map includes M sub-feature maps, each of the M sub-feature maps includes at least two adjacent channel features, and the channel shuffle operation reorders the at least two adjacent channel features corresponding to the M sub-feature maps so that the channel features corresponding to different sub-feature maps in the M sub-feature maps are adjacent, where M is an integer greater than 1, and the first input feature map, the second input feature map, and the third input feature map correspond to the same image.
With reference to the second aspect, in some implementations of the second aspect, the target image super-resolution network is a network determined by performing back-propagation iterative training on a first image super-resolution network through a multi-level weighted joint loss function, where the multi-level weighted joint loss function is determined according to the loss between the predicted super-resolution image corresponding to the feature map output by each basic unit in the first image super-resolution network and a sample super-resolution image, and the first image super-resolution network is a network determined through an image super-resolution network structure search performed in the search space by using an evolutionary algorithm.

With reference to the second aspect, in some implementations of the second aspect, the multi-level weighted joint loss function is obtained according to the following equation:
$$L = \sum_{k=1}^{N} \lambda_{k,t} L_k$$
where L denotes the multi-level weighted joint loss function; L_k denotes the loss value of the k-th basic unit of the first image super-resolution network, the loss value being the image loss between the predicted super-resolution image corresponding to the output feature map of the k-th basic unit and the sample super-resolution image; λ_{k,t} denotes the weight of the loss value of the k-th level at time t; and N denotes the number of basic units included in the first image super-resolution network, where N is an integer greater than or equal to 1.

With reference to the second aspect, in some implementations of the second aspect, the first image super-resolution network is determined according to a performance parameter of each of P candidate network structures, the P candidate network structures are randomly generated according to the basic unit, the performance parameter is a parameter that evaluates the performance of the P candidate network structures after training with the multi-level weighted joint loss function, the performance parameter includes a peak signal-to-noise ratio, the peak signal-to-noise ratio indicates the difference between the predicted super-resolution image obtained by each candidate network structure and the sample super-resolution image, and P is an integer greater than 1.
According to a third aspect, an image processing method is provided, applied to an electronic device having a display screen and a camera. The method includes: detecting a first operation by a user to turn on the camera; in response to the first operation, displaying a shooting interface on the display screen, where the shooting interface includes a viewfinder frame and the viewfinder frame includes a first image; detecting a second operation by the user to instruct the camera; and in response to the second operation, displaying a second image in the viewfinder frame, where the second image is an image obtained by performing super-resolution processing on the first image captured by the camera, a target image super-resolution network is applied in the super-resolution processing, the target image super-resolution network is a network determined through a network structure search in a search space, the search space is constructed according to a basic unit and network structure parameters, the search space is used to search for an image super-resolution network structure, the network structure parameters include the types of basic modules used to construct the basic unit, the basic unit is a network structure obtained by connecting basic modules through basic operations of a neural network, the basic modules include a first module, the first module is configured to perform a residual connection operation and a dimension reduction operation on a first input feature map, the residual connection operation performs feature addition on the first input feature map and a feature map processed by the first module, the dimension reduction operation is used to transform the scale of the first input feature map from an original first scale to a second scale, the second scale is smaller than the first scale, the target image super-resolution network includes at least the first module, and the scale of the feature map processed by the first module is the same as the scale of the first input feature map.
In a possible implementation, the foregoing basic unit is a building block used to construct an image super-resolution network.
With reference to the third aspect, in some implementations of the third aspect, the dimension reduction operation may include at least one of a pooling operation and a convolution operation with a stride of Q, where Q is a positive integer greater than 1.
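To make the two dimension-reduction variants concrete, the following NumPy sketch (illustrative only, not part of the claims; the window size and stride Q = 2 are assumed values) shows how both a pooling operation and a strided convolution reduce the spatial scale of a feature map:

```python
import numpy as np

def avg_pool2d(x, q=2):
    """Average pooling with window and stride q: (H, W) -> (H//q, W//q)."""
    h, w = x.shape
    return x[:h - h % q, :w - w % q].reshape(h // q, q, w // q, q).mean(axis=(1, 3))

def strided_conv2d(x, kernel, q=2):
    """Valid 2-D convolution with stride q; also reduces the spatial scale."""
    kh, kw = kernel.shape
    oh = (x.shape[0] - kh) // q + 1
    ow = (x.shape[1] - kw) // q + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(x[i * q:i * q + kh, j * q:j * q + kw] * kernel)
    return out

x = np.arange(16, dtype=float).reshape(4, 4)
print(avg_pool2d(x, 2).shape)                        # (2, 2)
print(strided_conv2d(x, np.ones((2, 2)), 2).shape)   # (2, 2)
```

Either operation maps the first scale (4×4 here) to a smaller second scale (2×2), which is the property the claim relies on.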
With reference to the third aspect, in some implementations of the third aspect, the feature map processed by the first module is a feature map obtained after a dimension restoration operation, where the dimension restoration operation restores the scale of the feature map obtained after the dimension reduction processing to the first scale, and the residual connection operation refers to performing feature addition on the first input feature map and the feature map obtained after the dimension restoration operation.
With reference to the third aspect, in some implementations of the third aspect, the first module is further configured to perform a dense connection operation on the first input feature map, where the dense connection operation refers to concatenating the output feature maps of each of the first i-1 convolutional layers together with the first input feature map to form the input feature map of the i-th convolutional layer, where i is a positive integer greater than 1.
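The dense connection operation described above can be sketched as channel-wise concatenation. In the illustrative NumPy snippet below (not the claimed implementation), a 1×1-convolution-like channel mixing stands in for the real convolutional layers; what matters is that the input of each layer is the concatenation of the original input and all earlier layer outputs:

```python
import numpy as np

def dense_block(x, weights):
    """Dense connection: the input of layer i is the channel-wise concatenation
    of the original input and the outputs of all i-1 previous layers."""
    feats = [x]                                       # x: (C, H, W)
    for w in weights:                                 # w: (C_out, C_in), a stand-in for a conv layer
        inp = np.concatenate(feats, axis=0)           # concatenate along the channel axis
        out = np.tensordot(w, inp, axes=([1], [0]))   # 1x1-conv-like channel mixing
        feats.append(out)
    return np.concatenate(feats, axis=0)

x = np.ones((2, 4, 4))
# Two layers: they see 2 and 4 input channels respectively and each emit 2 channels.
weights = [np.ones((2, 2)), np.ones((2, 4))]
print(dense_block(x, weights).shape)  # (6, 4, 4): 2 input channels + 2 + 2 layer outputs
```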
With reference to the third aspect, in some implementations of the third aspect, the dense connection operation is a recurrent dense connection operation, which refers to performing feature concatenation on the first input feature map after channel compression processing.
With reference to the third aspect, in some implementations of the third aspect, the first module is further configured to perform a rearrangement operation, where the rearrangement operation refers to merging multiple first channel features of the first input feature map according to a preset rule to generate one second channel feature, where the resolution of the second channel feature is higher than the resolution of the first channel features.
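This rearrangement corresponds to the commonly used pixel-shuffle (depth-to-space) transform. The NumPy sketch below (illustrative only; the upscale factor r = 2 is an assumed value) merges r×r lower-resolution channel features into one higher-resolution channel:

```python
import numpy as np

def pixel_shuffle(x, r):
    """Merge r*r channel features into one higher-resolution channel:
    (C*r*r, H, W) -> (C, H*r, W*r)."""
    crr, h, w = x.shape
    c = crr // (r * r)
    x = x.reshape(c, r, r, h, w)      # split channels into an r x r sub-pixel grid
    x = x.transpose(0, 3, 1, 4, 2)    # (c, h, r, w, r): interleave grid with spatial axes
    return x.reshape(c, h * r, w * r)

x = np.arange(16).reshape(4, 2, 2)    # four low-resolution channel features
y = pixel_shuffle(x, 2)
print(y.shape)  # (1, 4, 4): one channel at twice the resolution
```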
With reference to the third aspect, in some implementations of the third aspect, the basic modules further include a second module and/or a third module. The second module is configured to perform a channel compression operation, the residual connection operation, and the dense connection operation on a second input feature map, where the channel compression operation refers to a convolution operation with a 1×1 convolution kernel performed on the second input feature map. The third module is configured to perform a channel exchange operation, the residual connection operation, and the dense connection operation on a third input feature map, where the third input feature map includes M sub-feature maps, each of the M sub-feature maps includes at least two adjacent channel features, and the channel exchange processing refers to reordering the at least two adjacent channel features corresponding to the M sub-feature maps so that channel features corresponding to different sub-feature maps in the M sub-feature maps become adjacent, where M is an integer greater than 1. The first input feature map, the second input feature map, and the third input feature map correspond to the same image.
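The channel exchange operation of the third module can be sketched as a group-and-transpose reordering, in the style of ShuffleNet-type channel shuffles (an illustrative analogy, not a statement about the claimed implementation):

```python
import numpy as np

def channel_shuffle(x, m):
    """Reorder channels so that channels from different sub-feature maps
    become adjacent: split C channels into m groups and transpose the groups."""
    c, h, w = x.shape
    return x.reshape(m, c // m, h, w).transpose(1, 0, 2, 3).reshape(c, h, w)

# M = 2 sub-feature maps holding channels [0, 1] and [2, 3] respectively.
x = np.arange(4).reshape(4, 1, 1)
print(channel_shuffle(x, 2)[:, 0, 0])  # [0 2 1 3]: channels from different sub-maps interleaved
```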
With reference to the third aspect, in some implementations of the third aspect, the target image super-resolution network is a network determined by performing back-propagation iterative training on a first image super-resolution network using a multi-level weighted joint loss function, where the multi-level weighted joint loss function is determined based on the loss between the predicted super-resolution image corresponding to the feature map output by each basic unit in the first image super-resolution network and a sample super-resolution image, and the first image super-resolution network refers to a network determined by searching for an image super-resolution network structure in the search space using an evolutionary algorithm.
With reference to the third aspect, in some implementations of the third aspect, the multi-level weighted joint loss function is obtained according to the following equation:
$$L = \sum_{k=1}^{N} \lambda_{k,t} \, L_k$$
where L denotes the multi-level weighted joint loss function; L_k denotes the loss value of the k-th basic unit of the first image super-resolution network, that is, the image loss between the predicted super-resolution image corresponding to the output feature map of the k-th basic unit and the sample super-resolution image; λ_{k,t} denotes the weight of the loss value of the k-th level at time t; and N denotes the number of basic units included in the first image super-resolution network, where N is an integer greater than or equal to 1.
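As an illustrative sketch of the equation above (not the claimed training procedure; the per-unit L1 image loss and the particular weight schedule are assumed for the example), the multi-level weighted joint loss can be computed as:

```python
import numpy as np

def multilevel_loss(unit_outputs, target, weights):
    """L = sum_k weights[k] * L_k, where L_k is the image loss (L1 here, an
    assumed choice) between the k-th basic unit's predicted SR image and
    the sample SR image."""
    losses = [np.abs(out - target).mean() for out in unit_outputs]
    return sum(w * l for w, l in zip(weights, losses))

target = np.ones((8, 8))
unit_outputs = [np.zeros((8, 8)), np.full((8, 8), 0.5), np.ones((8, 8))]
weights_t = [0.2, 0.3, 0.5]   # lambda_{k,t} at some training step t (assumed schedule)
print(multilevel_loss(unit_outputs, target, weights_t))  # weighted sum, ≈ 0.35
```

Later basic units typically produce better reconstructions, so a schedule that shifts weight toward deeper units over time t would gradually emphasize the final output.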
With reference to the third aspect, in some implementations of the third aspect, the first image super-resolution network is determined based on the performance parameters of each of P candidate network structures, where the P candidate network structures are randomly generated based on the basic unit. The performance parameter is a parameter for evaluating the performance of the P candidate network structures after training with the multi-level weighted joint loss function, and includes a peak signal-to-noise ratio (PSNR), which indicates the difference between the predicted super-resolution image obtained through each candidate network structure and the sample super-resolution image, where P is an integer greater than 1.
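The PSNR performance parameter mentioned above can be computed in the standard way (a peak value of 255 for 8-bit images is assumed in this sketch):

```python
import numpy as np

def psnr(pred, target, peak=255.0):
    """Peak signal-to-noise ratio in dB; higher means the prediction
    is closer to the reference image."""
    mse = np.mean((pred.astype(np.float64) - target.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")
    return 10.0 * np.log10(peak ** 2 / mse)

a = np.full((4, 4), 100.0)
b = np.full((4, 4), 110.0)   # constant error of 10 -> MSE = 100
print(psnr(a, b))            # 10 * log10(255^2 / 100) ≈ 28.13 dB
```

During the evolutionary search, each trained candidate would be scored with such a metric on validation pairs, and higher-PSNR candidates kept for the next generation.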
It should be understood that the extensions, limitations, explanations, and descriptions of related content in the foregoing first aspect also apply to the same content in the second aspect and the third aspect.
In a fourth aspect, a neural network search apparatus is provided. The apparatus includes: a memory, configured to store a program; and a processor, configured to execute the program stored in the memory. When the program stored in the memory is executed, the processor is configured to: construct a basic unit, where the basic unit is a network structure obtained by connecting basic modules through basic neural-network operations, the basic modules include a first module, the first module is configured to perform a dimension reduction operation and a residual connection operation on a first input feature map, the dimension reduction operation is used to transform the scale of the first input feature map from an original first scale to a second scale, the second scale is smaller than the first scale, the residual connection operation is used to perform feature addition on the first input feature map and the feature map processed by the first module, and the scale of the feature map processed by the first module is the same as the scale of the first input feature map; construct a search space based on the basic unit and network structure parameters, where the network structure parameters include the types of basic modules used to construct the basic unit, and the search space is used to search for an image super-resolution network structure; and perform an image super-resolution network structure search in the search space to determine a target image super-resolution network, where the target image super-resolution network is used to perform super-resolution processing on a to-be-processed image, the target image super-resolution network includes at least the first module, and the target image super-resolution network is a network whose computation amount is less than a first preset threshold and whose image super-resolution accuracy is greater than a second preset threshold.
In a possible implementation, the processor included in the foregoing neural network search apparatus is further configured to perform the search method in any one of the implementations of the first aspect.
In a fifth aspect, an image processing apparatus is provided. The apparatus includes: a memory, configured to store a program; and a processor, configured to execute the program stored in the memory. When the program stored in the memory is executed, the processor is configured to: obtain a to-be-processed image; and perform super-resolution processing on the to-be-processed image based on a target image super-resolution network to obtain a target image of the to-be-processed image, where the target image is a super-resolution image corresponding to the to-be-processed image. The target image super-resolution network is a network determined through an image super-resolution network structure search in a search space; the search space is constructed from a basic unit and network structure parameters and is used to search for an image super-resolution network structure; the network structure parameters include the types of basic modules used to construct the basic unit; the basic unit is a network structure obtained by connecting basic modules through basic neural-network operations; the basic modules include a first module, configured to perform a residual connection operation and a dimension reduction operation on a first input feature map, where the residual connection operation refers to performing feature addition on the first input feature map and the feature map processed by the first module, and the dimension reduction operation is used to transform the scale of the first input feature map from an original first scale to a second scale, the second scale being smaller than the first scale; the target image super-resolution network includes at least the first module; and the scale of the feature map processed by the first module is the same as the scale of the first input feature map.
In a possible implementation, the processor included in the foregoing image processing apparatus is further configured to perform the method in any one of the implementations of the second aspect.
In a sixth aspect, an image processing apparatus is provided. The apparatus includes: a memory, configured to store a program; and a processor, configured to execute the program stored in the memory. When the program stored in the memory is executed, the processor is configured to: detect a first operation performed by a user to turn on a camera; in response to the first operation, display a shooting interface on a display screen, where the shooting interface includes a viewfinder frame and the viewfinder frame includes a first image; detect a second operation performed by the user to instruct the camera; and in response to the second operation, display a second image in the viewfinder frame, where the second image is obtained by performing super-resolution processing on the first image captured by the camera, and a target image super-resolution network is applied in the super-resolution processing. The target image super-resolution network is a network determined through an image super-resolution network structure search in a search space; the search space is constructed from a basic unit and network structure parameters and is used to search for an image super-resolution network structure; the network structure parameters include the types of basic modules used to construct the basic unit; the basic unit is a network structure obtained by connecting basic modules through basic neural-network operations; the basic modules include a first module, configured to perform a residual connection operation and a dimension reduction operation on a first input feature map, where the residual connection operation refers to performing feature addition on the first input feature map and the feature map processed by the first module, and the dimension reduction operation is used to transform the scale of the first input feature map from an original first scale to a second scale, the second scale being smaller than the first scale; the target image super-resolution network includes at least the first module; and the scale of the feature map processed by the first module is the same as the scale of the first input feature map.
In a possible implementation, the processor included in the foregoing image processing apparatus is further configured to perform the method in any one of the implementations of the third aspect.
In a seventh aspect, a computer-readable medium is provided. The computer-readable medium stores program code to be executed by a device, and the program code includes instructions for performing the method in the first aspect to the third aspect or in any one of the implementations of the first aspect to the third aspect.

In an eighth aspect, a computer program product including instructions is provided. When the computer program product runs on a computer, the computer is enabled to perform the method in the first aspect to the third aspect or in any one of the implementations of the first aspect to the third aspect.

In a ninth aspect, a chip is provided. The chip includes a processor and a data interface. The processor reads, through the data interface, instructions stored in a memory to perform the method in the first aspect to the third aspect or in any one of the implementations of the first aspect to the third aspect.

Optionally, in an implementation, the chip may further include a memory in which instructions are stored. The processor is configured to execute the instructions stored in the memory, and when the instructions are executed, the processor is configured to perform the method in the first aspect to the third aspect or in any one of the implementations of the first aspect to the third aspect.
Description of the Drawings

FIG. 1 is a schematic diagram of an artificial intelligence main framework according to an embodiment of this application;

FIG. 2 is a schematic diagram of an application scenario according to an embodiment of this application;

FIG. 3 is a schematic diagram of another application scenario according to an embodiment of this application;

FIG. 4 is a schematic diagram of still another application scenario according to an embodiment of this application;

FIG. 5 is a schematic diagram of yet another application scenario according to an embodiment of this application;

FIG. 6 is a schematic structural diagram of a system architecture according to an embodiment of this application;

FIG. 7 is a schematic diagram of a convolutional neural network structure according to an embodiment of this application;

FIG. 8 is a schematic diagram of another convolutional neural network structure according to an embodiment of this application;

FIG. 9 is a schematic diagram of a chip hardware structure according to an embodiment of this application;

FIG. 10 is a schematic diagram of a system architecture according to an embodiment of this application;

FIG. 11 is a schematic flowchart of a neural network search method according to an embodiment of this application;

FIG. 12 is a schematic diagram of a target image super-resolution network according to an embodiment of this application;

FIG. 13 is a schematic structural diagram of a first module according to an embodiment of this application;

FIG. 14 is a schematic structural diagram of another first module according to an embodiment of this application;

FIG. 15 is a schematic structural diagram of still another first module according to an embodiment of this application;

FIG. 16 is a schematic diagram of a rearrangement operation according to an embodiment of this application;

FIG. 17 is a schematic structural diagram of a second module according to an embodiment of this application;

FIG. 18 is a schematic structural diagram of a third module according to an embodiment of this application;

FIG. 19 is a schematic diagram of channel exchange processing according to an embodiment of this application;

FIG. 20 is a schematic diagram of searching for an image super-resolution network according to an embodiment of this application;

FIG. 21 is a schematic diagram of network training using a multi-level weighted joint loss function according to an embodiment of this application;

FIG. 22 is a schematic diagram of network structure search based on an evolutionary algorithm according to an embodiment of this application;

FIG. 23 is a schematic diagram of effects after image processing by a target super-resolution network according to an embodiment of this application;

FIG. 24 is a schematic diagram of effects after image processing by a target super-resolution network according to an embodiment of this application;

FIG. 25 is a schematic flowchart of an image processing method according to an embodiment of this application;

FIG. 26 is a schematic flowchart of an image processing method according to an embodiment of this application;

FIG. 27 is a schematic diagram of a group of display interfaces according to an embodiment of this application;

FIG. 28 is a schematic diagram of another group of display interfaces according to an embodiment of this application;

FIG. 29 is a schematic block diagram of a neural network search apparatus according to an embodiment of this application;

FIG. 30 is a schematic block diagram of an image processing apparatus according to an embodiment of this application.
Detailed Description

The technical solutions in the embodiments of this application are described below with reference to the accompanying drawings of the embodiments of this application. Clearly, the described embodiments are merely some rather than all of the embodiments of this application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of this application without creative effort shall fall within the protection scope of this application.
FIG. 1 shows a schematic diagram of an artificial intelligence main framework. The main framework describes the overall workflow of an artificial intelligence system and is applicable to general requirements in the artificial intelligence field.

The foregoing artificial intelligence framework 100 is described in detail below along two dimensions: the "intelligent information chain" (horizontal axis) and the "information technology (IT) value chain" (vertical axis).

The "intelligent information chain" reflects the series of processes from data acquisition to data processing, for example, the general process of intelligent information perception, intelligent information representation and formation, intelligent reasoning, intelligent decision-making, and intelligent execution and output. In this process, data undergoes a refinement process of "data, information, knowledge, wisdom".

The "IT value chain", spanning from the underlying infrastructure of artificial intelligence and information (providing and processing technical implementations) to the industrial ecology of the system, reflects the value that artificial intelligence brings to the information technology industry.
(1) Infrastructure 110

The infrastructure provides computing-capability support for the artificial intelligence system, enables communication with the external world, and provides support through a base platform.

The infrastructure can communicate with the outside through sensors, and the computing capability of the infrastructure can be provided by smart chips.

The smart chip here may be a hardware acceleration chip such as a central processing unit (CPU), a neural-network processing unit (NPU), a graphics processing unit (GPU), an application-specific integrated circuit (ASIC), or a field-programmable gate array (FPGA).

The base platform of the infrastructure may include related platform assurance and support such as a distributed computing framework and networks, and may include cloud storage and computing, interconnection networks, and the like.

For example, the infrastructure may obtain data through sensors and external communication, and then provide the data to smart chips in a distributed computing system provided by the base platform for computation.
(2) Data 120

Data at the layer above the infrastructure indicates data sources in the artificial intelligence field. The data involves graphics, images, speech, and text, and also involves Internet-of-Things data of conventional devices, including service data of existing systems and sensed data such as force, displacement, liquid level, temperature, and humidity.
(3) Data processing 130

The foregoing data processing usually includes processing modes such as data training, machine learning, deep learning, search, reasoning, and decision-making.

Machine learning and deep learning can perform symbolic and formalized intelligent information modeling, extraction, preprocessing, and training on data.

Reasoning refers to the process, in a computer or intelligent system, of simulating human intelligent reasoning and using formalized information to perform machine thinking and solve problems according to a reasoning control strategy; typical functions are search and matching.

Decision-making refers to the process of making decisions on intelligent information after reasoning, and usually provides functions such as classification, ranking, and prediction.
(4) General capabilities 140

After the foregoing data processing is performed on the data, some general capabilities can further be formed based on the data processing results, for example, an algorithm or a general system, such as translation, text analysis, computer vision processing, speech recognition, or image recognition.

(5) Smart products and industry applications 150

Smart products and industry applications are the products and applications of artificial intelligence systems in various fields. They encapsulate the overall artificial intelligence solution, productize intelligent information decision-making, and realize practical applications. The application fields mainly include smart manufacturing, smart transportation, smart home, smart healthcare, smart security, autonomous driving, safe city, and smart terminals.
Application scenario 1: photographing with a smart terminal

In an embodiment, as shown in FIG. 2, the neural network structure search method of the embodiments of this application can be applied to real-time image super-resolution on a smart terminal device (for example, a mobile phone), and can determine a target image super-resolution network applied to the field of smart-terminal photographing. When a user uses a smart terminal to photograph a distant object or a small object, the captured image has a relatively low resolution and unclear details. With the target image super-resolution network provided in the embodiments of this application, the smart terminal can perform image super-resolution processing to convert a low-resolution image into a high-resolution image, making the photographed object clearer.

For example, this application proposes an image processing method applied to an electronic device having a display screen and a camera. The method includes: detecting a first operation performed by a user to turn on the camera; in response to the first operation, displaying a shooting interface on the display screen, where the shooting interface includes a viewfinder frame and the viewfinder frame includes a first image; detecting a second operation performed by the user to instruct the camera; and in response to the second operation, displaying a second image in the viewfinder frame, where the second image is obtained by performing super-resolution processing on the first image captured by the camera, and a target super-resolution neural network is applied in the super-resolution processing.

The foregoing target image super-resolution network is a network determined through an image super-resolution network structure search in a search space. The search space is constructed from a basic unit and network structure parameters and is used to search for an image super-resolution network structure. The network structure parameters include the types of basic modules used to construct the basic unit, and the basic unit is a network structure obtained by connecting basic modules through basic neural-network operations. The basic modules include at least a first module, configured to perform a residual connection operation and a dimension reduction operation on a first input feature map, where the residual connection operation refers to performing feature addition on the first input feature map and the feature map processed by the first module, and the dimension reduction operation is used to transform the scale of the first input feature map from an original first scale to a second scale, the second scale being smaller than the first scale. The target image super-resolution network includes at least the first module, and the scale of the feature map processed by the first module is the same as the scale of the first input feature map.
Optionally, in a possible implementation, the basic unit is a building block used to construct an image super-resolution network.
Optionally, in a possible implementation, the dimensionality reduction operation may include at least one of a pooling operation and a convolution operation with a stride of Q, where Q is a positive integer greater than 1.
Optionally, in a possible implementation, the feature map processed by the first module is a feature map that has undergone a dimension-raising operation, where the dimension-raising operation restores the scale of the feature map after the dimensionality reduction processing to the first scale, and the residual connection operation refers to performing feature addition on the first input feature map and the feature map processed by the dimension-raising operation.
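The reduce-restore-add pattern described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the patented implementation: average pooling is used as the dimensionality-reduction operation (one of the options named in the text), nearest-neighbour upsampling stands in for the dimension-raising operation, and the name `first_module` and all values are hypothetical.

```python
import numpy as np

def first_module(x, scale=2):
    """Hypothetical sketch of the 'first module': reduce the spatial scale of
    the input feature map, restore it, then add a residual connection so the
    output scale equals the input (first) scale."""
    c, h, w = x.shape
    # dimensionality reduction: first scale -> smaller second scale (avg pool)
    pooled = x.reshape(c, h // scale, scale, w // scale, scale).mean(axis=(2, 4))
    # dimension-raising: restore to the first scale (nearest-neighbour)
    restored = pooled.repeat(scale, axis=1).repeat(scale, axis=2)
    # residual connection: feature addition with the original input
    return x + restored

feat = np.arange(2 * 4 * 4, dtype=float).reshape(2, 4, 4)
out = first_module(feat)
assert out.shape == feat.shape  # processed feature map keeps the first scale
```

The residual addition is only well-defined because the scale is restored first, which is why the text pairs the dimension-raising operation with the residual connection.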
Optionally, in a possible implementation, the first module is further configured to perform a dense connection operation on the first input feature map, where the dense connection operation refers to concatenating the output feature maps of each of the first i−1 convolutional layers together with the first input feature map to form the input feature map of the i-th convolutional layer, i being a positive integer greater than 1.
Optionally, in a possible implementation, the dense connection operation is a cyclic dense connection operation, which refers to performing the feature concatenation on the first input feature map after channel compression processing.
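The dense connection operation can be sketched as follows; a fixed channel-averaging transform stands in for the real convolutions, and the names `dense_block` and `growth` are hypothetical, not taken from the application.

```python
import numpy as np

def dense_block(x, num_layers=3, growth=2):
    """Hypothetical sketch of the dense connection: the input to layer i is
    the channel-wise concatenation of the original input feature map and the
    outputs of layers 1..i-1."""
    features = [x]
    for _ in range(num_layers):
        inp = np.concatenate(features, axis=0)  # feature splicing along channels
        # stand-in for a convolution producing `growth` output channels
        out = inp.mean(axis=0, keepdims=True).repeat(growth, axis=0)
        features.append(out)
    return np.concatenate(features, axis=0)

x = np.ones((4, 8, 8))
y = dense_block(x)
assert y.shape[0] == 4 + 3 * 2  # input channels plus each layer's output channels
```

Note how the channel count grows with every layer; this is what motivates the channel compression mentioned for the cyclic variant.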
Optionally, in a possible implementation, the first module is further configured to perform a rearrangement operation, where the rearrangement operation merges multiple first-channel features of the first input feature map according to a preset rule to generate one second-channel feature, the resolution of the second-channel feature being higher than the resolution of the first-channel features.
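The rearrangement described above matches the depth-to-space ("pixel shuffle") pattern commonly used in super-resolution networks. The NumPy sketch below shows that interpretation; it is one plausible reading of the preset rule, with hypothetical names, not the application's exact definition.

```python
import numpy as np

def rearrange(x, r=2):
    """Hypothetical sketch of the rearrangement operation: merge r*r
    low-resolution channel features into one channel feature of r-times
    the resolution (depth-to-space)."""
    c, h, w = x.shape
    assert c % (r * r) == 0
    x = x.reshape(c // (r * r), r, r, h, w)
    x = x.transpose(0, 3, 1, 4, 2)  # interleave the r*r channels spatially
    return x.reshape(c // (r * r), h * r, w * r)

x = np.arange(16.0).reshape(4, 2, 2)  # 4 first-channel features at 2x2
y = rearrange(x)
assert y.shape == (1, 4, 4)  # one second-channel feature at 4x4
```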
Optionally, in a possible implementation, the basic modules further include a second module and/or a third module. The second module is used to perform a channel compression operation, the residual connection operation, and the dense connection operation on a second input feature map, where the channel compression operation refers to a convolution operation with a 1×1 convolution kernel on the second input feature map. The third module is used to perform a channel exchange operation, the residual connection operation, and the dense connection operation on a third input feature map, where the third input feature map includes M sub-feature maps and each of the M sub-feature maps includes at least two adjacent channel features. The channel exchange operation reorders the at least two adjacent channel features corresponding to the M sub-feature maps so that channel features corresponding to different sub-feature maps become adjacent, M being an integer greater than 1. The first input feature map, the second input feature map, and the third input feature map correspond to the same image.
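The channel exchange operation of the third module can be illustrated with a group-wise channel shuffle, which reorders channels exactly as described: channels from different sub-feature maps become adjacent. This is a sketch of that reading (M = 2 sub-feature maps of three channels each), with hypothetical names.

```python
import numpy as np

def channel_shuffle(x, groups):
    """Hypothetical sketch of the channel exchange: split the channels into
    `groups` sub-feature maps, then reorder so that channels from different
    sub-feature maps are adjacent."""
    c, h, w = x.shape
    return x.reshape(groups, c // groups, h, w).transpose(1, 0, 2, 3).reshape(c, h, w)

# channel i is filled with the value i, so the ordering is easy to inspect
x = np.stack([np.full((2, 2), i, dtype=float) for i in range(6)])
y = channel_shuffle(x, groups=2)
# sub-feature maps {0,1,2} and {3,4,5} are interleaved into 0,3,1,4,2,5
assert [y[i, 0, 0] for i in range(6)] == [0, 3, 1, 4, 2, 5]
```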
Optionally, in a possible implementation, the target image super-resolution network is a network determined by performing back-propagation iterative training on a first image super-resolution network with a multi-level weighted joint loss function. The multi-level weighted joint loss function is determined from the loss between the sample super-resolution image and the predicted super-resolution image corresponding to the feature map output by each basic unit in the first image super-resolution network. The first image super-resolution network refers to a network determined by searching for an image super-resolution network structure in the search space through an evolutionary algorithm.
Optionally, in a possible implementation, the multi-level weighted joint loss function is obtained according to the following equation:

L = ∑_{k=1}^{N} λ_{k,t} · L_k

where L denotes the multi-level weighted joint loss function; L_k denotes the loss value of the k-th basic unit of the first image super-resolution network, the loss value being the image loss between the sample super-resolution image and the predicted super-resolution image obtained from the output feature map of the k-th basic unit; λ_{k,t} denotes the weight of the loss value of the k-th level at time t; and N denotes the number of basic units included in the first image super-resolution network, N being an integer greater than or equal to 1.
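The multi-level weighted joint loss, a weighted sum of the per-unit losses, can be sketched numerically as follows; the per-unit loss values and the weights λ_{k,t} below are hypothetical.

```python
import numpy as np

def multilevel_loss(per_unit_losses, weights):
    """Sketch of L = sum_k λ_{k,t} * L_k: each basic unit's predicted
    super-resolution image yields a loss L_k against the sample image,
    weighted by λ_{k,t} and summed over the N basic units."""
    assert len(per_unit_losses) == len(weights)
    return float(np.dot(weights, per_unit_losses))

L_k = [0.9, 0.5, 0.2]   # losses of N=3 basic units (hypothetical values)
lam = [0.2, 0.3, 0.5]   # λ_{k,t}: later units weighted more at this time t
L = multilevel_loss(L_k, lam)
assert abs(L - (0.2 * 0.9 + 0.3 * 0.5 + 0.5 * 0.2)) < 1e-9
```

Making λ_{k,t} depend on the training time t allows intermediate units to dominate early training and the final unit to dominate later, which is the point of supervising every basic unit rather than only the network output.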
Optionally, in a possible implementation, the first image super-resolution network is determined according to the performance parameter of each of P candidate network structures, the P candidate network structures being randomly generated from the basic units. The performance parameter is a parameter that evaluates the performance of the P candidate network structures after training with the multi-level weighted joint loss function, and it includes the peak signal-to-noise ratio (PSNR), which indicates the difference between the predicted super-resolution image obtained through each candidate network structure and the sample super-resolution image.
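The PSNR used as the performance parameter can be computed with its standard definition, PSNR = 10 · log10(MAX² / MSE), between the predicted and sample super-resolution images; the image values below are hypothetical.

```python
import numpy as np

def psnr(pred, target, max_val=255.0):
    """Standard peak signal-to-noise ratio between a predicted
    super-resolution image and the sample super-resolution image."""
    mse = np.mean((pred.astype(float) - target.astype(float)) ** 2)
    return 10.0 * np.log10(max_val ** 2 / mse)

target = np.full((8, 8), 100.0)
pred = target + 10.0  # constant error of 10 everywhere -> MSE = 100
value = psnr(pred, target)
assert abs(value - 10.0 * np.log10(255.0 ** 2 / 100.0)) < 1e-9
```

A higher PSNR means a smaller difference between the candidate's prediction and the sample, so the candidate with the highest PSNR would be kept as the first image super-resolution network.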
It should be noted that the target image super-resolution network applied to the photographing field of smart terminals provided by the embodiments of this application is likewise subject to the expansions, limitations, explanations, and descriptions of the content related to the target image super-resolution network in the related embodiments of FIG. 10 to FIG. 22 below, which are not repeated here.
Exemplarily, FIG. 2 is a schematic diagram of the target image super-resolution network applied in a smart terminal. When a user uses a smart terminal 210 (for example, a mobile phone) to photograph a distant object, the low-resolution image obtained is image 220 or image 230. The super-resolution (SR) module 240 shown in FIG. 2 may be the target image super-resolution network of the embodiments of this application, and the target image can be obtained after processing by the target image super-resolution network. For example, super-resolution image 221 can be obtained after image 220 undergoes super-resolution processing, and super-resolution image 231 can be obtained after image 230 undergoes super-resolution processing.
It should be noted that the smart terminal 210 may be an electronic device with a camera. For example, the smart terminal may be a mobile phone with image processing functions, a tablet personal computer (TPC), a media player, a smart TV, a laptop computer (LC), a personal digital assistant (PDA), a personal computer (PC), a camera, a video camera, a smart watch, a wearable device (WD), or a vehicle-mounted terminal in an autonomous vehicle, which is not limited in the embodiments of this application.
Application Scenario 2: Security Field
In an embodiment, as shown in FIG. 3, the neural network search method of the embodiments of this application can be applied to the security field. For example, pictures (or videos) captured by surveillance equipment in public places are often affected by factors such as weather and distance, and suffer from problems such as blur and low resolution. The target image super-resolution network can perform super-resolution reconstruction on the captured pictures, recovering important information such as license plate numbers and clear faces for public security personnel and providing important clues for case investigation.
Exemplarily, this application provides an image processing method including: acquiring a street-view picture; performing super-resolution processing on the street-view picture according to the target image super-resolution network to obtain a super-resolution image of the street-view picture; and recognizing information in the super-resolution image according to the super-resolution image of the street-view picture.
The above target image super-resolution network is a network determined by performing an image super-resolution network structure search in a search space. The search space is constructed from basic units and network structure parameters and is used for searching for an image super-resolution network structure. The network structure parameters include the types of the basic modules used to construct the basic units, and a basic unit is a network structure obtained by connecting basic modules through basic operations of a neural network. The basic modules include at least a first module, which is used to perform a residual connection operation and a dimensionality reduction operation on a first input feature map. The residual connection operation refers to performing feature addition on the first input feature map and the feature map processed by the first module; the dimensionality reduction operation is used to transform the scale of the first input feature map from an original first scale to a second scale, the second scale being smaller than the first scale. The target image super-resolution network includes at least the first module, and the scale of the feature map processed by the first module is the same as the scale of the first input feature map.
It should be noted that the target image super-resolution network applied to the security field provided by the embodiments of this application is likewise subject to the expansions, limitations, explanations, and descriptions of the content related to the target image super-resolution network in the related embodiments of FIG. 10 to FIG. 22 below, which are not repeated here.
Application Scenario 3: Medical Imaging Field
In an embodiment, as shown in FIG. 4, the neural network search method of the embodiments of this application can be applied to the medical imaging field. For example, the target image super-resolution network can perform super-resolution reconstruction on medical images, lowering the requirements on the imaging environment without increasing the cost of high-resolution imaging technology. The restored clear medical images enable precise detection of diseased cells, helping doctors make a better diagnosis of a patient's condition.
Exemplarily, this application provides an image processing method including: acquiring a medical image; performing super-resolution processing on the medical image according to the target image super-resolution network to obtain a super-resolution image of the medical image; and recognizing and analyzing information in the super-resolution image according to the super-resolution image of the medical image.
The above target image super-resolution network is a network determined by performing an image super-resolution network structure search in a search space. The search space is constructed from basic units and network structure parameters and is used for searching for an image super-resolution network structure. The network structure parameters include the types of the basic modules used to construct the basic units, and a basic unit is a network structure obtained by connecting basic modules through basic operations of a neural network. The basic modules include a first module, which is used to perform a residual connection operation and a dimensionality reduction operation on a first input feature map. The residual connection operation refers to performing feature addition on the first input feature map and the feature map processed by the first module; the dimensionality reduction operation is used to transform the scale of the first input feature map from an original first scale to a second scale, the second scale being smaller than the first scale. The target image super-resolution network includes at least the first module, and the scale of the feature map processed by the first module is the same as the scale of the first input feature map.
It should be noted that the target image super-resolution network applied to the medical imaging field provided by the embodiments of this application is likewise subject to the expansions, limitations, explanations, and descriptions of the content related to the target image super-resolution network in the related embodiments of FIG. 10 to FIG. 22 below, which are not repeated here.
Application Scenario 4: Image Compression Field
In an embodiment, as shown in FIG. 5, the neural network search method of the embodiments of this application can be applied to the image compression field. For example, in scenarios with high real-time requirements such as video conferencing, pictures can be compressed before transmission; after the transmission completes, the receiving end decodes them and restores the original image sequence through super-resolution reconstruction with the target image super-resolution network, greatly reducing the storage space and transmission bandwidth required.
Exemplarily, this application provides an image processing method including: acquiring a compressed image; performing super-resolution processing on the compressed image according to the target image super-resolution network to obtain a super-resolution image of the compressed image; and recognizing information in the super-resolution image according to the super-resolution image of the compressed image.
The above target image super-resolution network is a network determined by performing an image super-resolution network structure search in a search space. The search space is constructed from basic units and network structure parameters and is used for searching for an image super-resolution network structure. The network structure parameters include the types of the basic modules used to construct the basic units, and a basic unit is a network structure obtained by connecting basic modules through basic operations of a neural network. The basic modules include at least a first module, which is used to perform a residual connection operation and a dimensionality reduction operation on a first input feature map. The residual connection operation refers to performing feature addition on the first input feature map and the feature map processed by the first module; the dimensionality reduction operation is used to transform the scale of the first input feature map from an original first scale to a second scale, the second scale being smaller than the first scale. The target image super-resolution network includes at least the first module, and the scale of the feature map processed by the first module is the same as the scale of the first input feature map.
It should be noted that the target image super-resolution network applied to the image compression field provided by the embodiments of this application is likewise subject to the expansions, limitations, explanations, and descriptions of the content related to the target image super-resolution network in the related embodiments of FIG. 10 to FIG. 22 below, which are not repeated here.
It should be understood that the foregoing are examples of application scenarios and do not limit the application scenarios of this application in any way.
Since the embodiments of this application involve extensive use of neural networks, for ease of understanding, related terms and concepts of neural networks that may be involved in the embodiments of this application are first introduced below.
(1) Neural network
A neural network can be composed of neural units. A neural unit can be an operation unit that takes x_s and an intercept of 1 as inputs, and the output of the operation unit can be:

h_{W,b}(x) = f(W^T x) = f(∑_{s=1}^{n} W_s x_s + b)

where s = 1, 2, …, n, n is a natural number greater than 1, W_s is the weight of x_s, and b is the bias of the neural unit. f is the activation function of the neural unit, used to introduce nonlinear characteristics into the neural network so as to convert the input signal of the neural unit into an output signal. The output signal of the activation function can serve as the input of the next convolutional layer, and the activation function can be a sigmoid function. A neural network is a network formed by connecting many such single neural units, that is, the output of one neural unit can be the input of another neural unit. The input of each neural unit can be connected to the local receptive field of the previous layer to extract features of the local receptive field; the local receptive field can be a region composed of several neural units.
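The neural-unit computation above can be sketched as follows, taking f to be the sigmoid function mentioned in the text; the weights and bias values are hypothetical.

```python
import numpy as np

def neuron_output(x, w, b):
    """One neural unit: weighted sum of inputs x_s plus bias b, passed
    through a sigmoid activation f."""
    z = np.dot(w, x) + b
    return 1.0 / (1.0 + np.exp(-z))  # f = sigmoid

x = np.array([1.0, 2.0, 3.0])       # inputs x_s
w = np.array([0.5, -0.25, 0.1])     # weights W_s (hypothetical)
out = neuron_output(x, w, b=-0.3)   # z = 0.5 - 0.5 + 0.3 - 0.3 = 0
assert 0.0 < out < 1.0              # sigmoid outputs lie in (0, 1)
```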
(2) Deep neural network
A deep neural network (DNN), also called a multi-layer neural network, can be understood as a neural network with multiple hidden layers. Dividing the DNN by the positions of its layers, the layers inside the DNN can be classified into three categories: the input layer, the hidden layers, and the output layer. Generally, the first layer is the input layer, the last layer is the output layer, and all layers in between are hidden layers. The layers are fully connected, that is, any neuron in the i-th layer is connected to every neuron in the (i+1)-th layer.
Although a DNN looks complicated, the work of each layer is actually not complicated. In simple terms, it is the following linear-relationship expression: y = α(W·x + b), where x is the input vector, y is the output vector, b is the offset vector, W is the weight matrix (also called the coefficients), and α() is the activation function. Each layer simply performs this operation on the input vector x to obtain the output vector y. Since a DNN has many layers, the number of coefficient matrices W and offset vectors b is also large. These parameters are defined in the DNN as follows. Taking the coefficient W as an example: suppose that in a three-layer DNN, the linear coefficient from the 4th neuron of the second layer to the 2nd neuron of the third layer is defined as W^3_{24}. The superscript 3 represents the layer in which the coefficient W is located, and the subscripts correspond to the output index 2 in the third layer and the input index 4 in the second layer.

In summary, the coefficient from the k-th neuron of the (L−1)-th layer to the j-th neuron of the L-th layer is defined as W^L_{jk}.

It should be noted that the input layer has no W parameters. In a deep neural network, more hidden layers make the network better able to portray complex situations in the real world. Theoretically, a model with more parameters has higher complexity and larger "capacity", which means it can accomplish more complex learning tasks. Training a deep neural network is the process of learning the weight matrices, and its ultimate goal is to obtain the weight matrices of all layers of the trained deep neural network (the weight matrices formed by the vectors W of many layers).
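The per-layer expression y = α(W·x + b) and the coefficient indexing can be sketched as follows; sigmoid is assumed for α, and the all-zero weights are hypothetical values chosen only to make the result easy to check.

```python
import numpy as np

def dense_layer(x, W, b):
    """One fully connected layer: y = α(W·x + b), with α = sigmoid.
    Entry W[j, k] is the coefficient from input neuron k to output neuron j."""
    return 1.0 / (1.0 + np.exp(-(W @ x + b)))

# A three-layer DNN (input, one hidden layer, output). W2[1, 3] would be
# the coefficient from the 4th neuron of layer 2 to the 2nd neuron of layer 3.
x = np.array([1.0, -1.0])
W1, b1 = np.zeros((4, 2)), np.zeros(4)  # layer 1 -> layer 2
W2, b2 = np.zeros((2, 4)), np.zeros(2)  # layer 2 -> layer 3
h = dense_layer(x, W1, b1)              # hidden activations
y = dense_layer(h, W2, b2)              # output activations
assert h.shape == (4,) and y.shape == (2,)
```

With zero weights and biases every pre-activation is 0, so every sigmoid output is 0.5, which makes the forward pass easy to verify by hand.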
(3) Convolutional neural network
A convolutional neural network (CNN) is a deep neural network with a convolutional structure. A convolutional neural network contains a feature extractor composed of convolutional layers and sub-sampling layers; the feature extractor can be regarded as a filter. A convolutional layer is a layer of neurons that performs convolution processing on the input signal in a convolutional neural network. In a convolutional layer of a convolutional neural network, a neuron may be connected to only some of the neurons of the adjacent layers. A convolutional layer usually contains several feature planes, and each feature plane can be composed of some rectangularly arranged neural units. Neural units in the same feature plane share weights, and the shared weights are the convolution kernel. Weight sharing can be understood as meaning that the way image information is extracted is independent of position. A convolution kernel can be initialized as a matrix of random size, and during training of the convolutional neural network the kernel can learn reasonable weights. In addition, a direct benefit of weight sharing is that it reduces the connections between the layers of the convolutional neural network while also reducing the risk of overfitting.
(4) Loss function
In the process of training a deep neural network, because it is hoped that the output of the deep neural network is as close as possible to the value one really wants to predict, the predicted value of the current network can be compared with the truly desired target value, and the weight vectors of each layer of the neural network are then updated according to the difference between the two (of course, there is usually an initialization process before the first update, that is, parameters are pre-configured for each layer of the deep neural network). For example, if the network's predicted value is too high, the weight vectors are adjusted to make the prediction lower, and adjustment continues until the deep neural network can predict the truly desired target value or a value very close to it. Therefore, it is necessary to define in advance "how to compare the difference between the predicted value and the target value". This is the loss function or objective function, an important equation used to measure the difference between the predicted value and the target value. Taking the loss function as an example, a higher output value (loss) of the loss function indicates a greater difference, so training the deep neural network becomes a process of reducing this loss as much as possible.
(5) Back-propagation algorithm
A neural network can use the error back-propagation (BP) algorithm to correct the values of the parameters of the initial neural network model during training, so that the reconstruction error loss of the neural network model becomes smaller and smaller. Specifically, forward-passing the input signal up to the output produces an error loss, and the parameters of the initial neural network model are updated by back-propagating the error-loss information so that the error loss converges. The back-propagation algorithm is a back-propagation movement dominated by the error loss, aiming to obtain the optimal parameters of the neural network model, such as the weight matrices.
(6) Neural network architecture search
Neural architecture search (NAS) is a technique for automatically designing neural networks; through algorithms, high-performance network structures can be designed automatically based on a sample set.
The search space, the search strategy, and the performance evaluation strategy are the core elements of a NAS algorithm.
The search space can refer to the set of neural network structures to be searched, that is, the solution space. To improve search efficiency, the search space is sometimes restricted or simplified. In some NAS implementations, the network is divided into basic units (cells, or blocks), and more complex networks are formed by stacking these units. A basic unit is composed of multiple nodes (layers of the neural network), which appear repeatedly throughout the network but have different weight parameters.
The search strategy can refer to the process of finding the optimal network structure in the search space. The search strategy defines how to find the optimal network structure; it is usually an iterative optimization process and is essentially a hyperparameter optimization problem.
The performance evaluation strategy can refer to evaluating the performance of the searched network structures. The goal of the search strategy is to find a neural network structure, and the performance of the found network structure can be evaluated through the performance evaluation strategy.
FIG. 6 shows a system architecture 300 provided by an embodiment of this application. In FIG. 6, the data collection device 360 is used to collect training data. For the image processing method of the embodiments of this application, after the target image super-resolution network is determined through the neural network search method of the embodiments of this application, the target super-resolution network can be further trained with training images. That is, the training data collected by the data collection device 360 can be training images, and the training images can include sample images and the super-resolution images corresponding to the sample images, where a sample image can refer to a low-resolution image, for example, an image whose picture quality is unclear or whose picture is blurred.
在采集到训练数据之后,数据采集设备360将这些训练数据存入数据库330,训练设备320基于数据库330中维护的训练数据训练得到目标模型/规则301。After the training data is collected, the data collection device 360 stores the training data in the database 330, and the training device 320 obtains the target model/rule 301 based on the training data maintained in the database 330.
下面对训练设备320基于训练数据得到目标模型/规则301进行描述，训练设备320对输入的原始图像进行处理，将输出的图像与原始图像进行对比，直到训练设备320输出的图像与原始图像的差值小于一定的阈值，从而完成目标模型/规则301的训练。The following describes how the training device 320 obtains the target model/rule 301 based on the training data: the training device 320 processes the input original image and compares the output image with the original image, until the difference between the image output by the training device 320 and the original image is less than a certain threshold, thereby completing the training of the target model/rule 301.
例如，在本申请提供的图像处理方法中用于进行图像超分辨率处理的目标图像超分辨率网络可以是通过样本图像的预测超分辨率图像与样本超分辨率图像之间的损失进行训练得到的，训练后的网络使得将样本图像输入至目标图像超分辨率网络得到的预测超分辨率图像与样本超分辨率图像的差值小于一定的阈值，从而完成目标图像超分辨率网络的训练。For example, in the image processing method provided in this application, the target image super-resolution network used for image super-resolution processing may be obtained by training with a loss between the predicted super-resolution image of a sample image and the corresponding sample super-resolution image; after training, the difference between the predicted super-resolution image obtained by inputting the sample image into the target image super-resolution network and the sample super-resolution image is less than a certain threshold, thereby completing the training of the target image super-resolution network.
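The stopping criterion described above — train until the difference between the network output and the target falls below a threshold — can be illustrated with a deliberately minimal one-parameter sketch (purely illustrative; the actual network, loss, and optimizer are far more complex):

```python
def train_until_threshold(target, lr=0.1, threshold=1e-3, max_steps=10000):
    """Toy one-parameter 'network': the prediction is simply the weight w.
    Gradient descent on the squared difference to the target runs until
    |prediction - target| < threshold, mirroring the stopping criterion
    described in the text."""
    w = 0.0
    for step in range(max_steps):
        diff = w - target
        if abs(diff) < threshold:
            return w, step
        w -= lr * 2 * diff  # gradient of (w - target)**2 with respect to w
    return w, max_steps

w, steps = train_until_threshold(target=3.0)
```

Here each update shrinks the remaining difference by a constant factor, so the threshold is reached after a few dozen steps.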
上述目标模型/规则301能够用于实现本申请实施例的图像处理方法。本申请实施例中的目标模型/规则301具体可以为神经网络。The above-mentioned target model/rule 301 can be used to implement the image processing method of the embodiment of the present application. The target model/rule 301 in the embodiment of the present application may specifically be a neural network.
需要说明的是，在实际的应用中，所述数据库330中维护的训练数据不一定都来自于数据采集设备360的采集，也有可能是从其他设备接收得到的。另外需要说明的是，训练设备320也不一定完全基于数据库330维护的训练数据进行目标模型/规则301的训练，也有可能从云端或其他地方获取训练数据进行模型训练，上述描述不应该作为对本申请实施例的限定。It should be noted that, in practical applications, the training data maintained in the database 330 does not necessarily all come from the collection of the data collection device 360, and may also be received from other devices. In addition, it should be noted that the training device 320 does not necessarily train the target model/rule 301 entirely based on the training data maintained in the database 330; it may also obtain training data from the cloud or elsewhere for model training. The above description should not be construed as a limitation on the embodiments of this application.
根据训练设备320训练得到的目标模型/规则301可以应用于不同的***或设备中，如应用于图6所示的执行设备310，所述执行设备310可以是终端，如手机终端，平板电脑，笔记本电脑，增强现实（augmented reality，AR）/虚拟现实（virtual reality，VR），车载终端等，还可以是服务器，或者，云端等。在图6中，执行设备310配置输入/输出（input/output，I/O）接口312，用于与外部设备进行数据交互，用户可以通过客户设备340向I/O接口312输入数据，所述输入数据在本申请实施例中可以包括：客户设备输入的待处理图像。The target model/rule 301 obtained by training with the training device 320 can be applied to different systems or devices, such as the execution device 310 shown in FIG. 6. The execution device 310 may be a terminal, such as a mobile phone terminal, a tablet computer, a notebook computer, an augmented reality (AR)/virtual reality (VR) device, or a vehicle-mounted terminal, and may also be a server, the cloud, or the like. In FIG. 6, the execution device 310 is configured with an input/output (I/O) interface 312 for data interaction with external devices. A user can input data to the I/O interface 312 through the client device 340; in this embodiment of the application, the input data may include the image to be processed input by the client device.
预处理模块313和预处理模块314用于根据I/O接口312接收到的输入数据（如待处理图像）进行预处理，在本申请实施例中，也可以没有预处理模块313和预处理模块314（也可以只有其中的一个预处理模块），而直接采用计算模块311对输入数据进行处理。The preprocessing module 313 and the preprocessing module 314 are used to preprocess the input data (such as the image to be processed) received by the I/O interface 312. In this embodiment of the application, the preprocessing module 313 and the preprocessing module 314 may also be absent (or only one of them may be present), and the calculation module 311 may be used directly to process the input data.
在执行设备310对输入数据进行预处理，或者在执行设备310的计算模块311执行计算等相关的处理过程中，执行设备310可以调用数据存储***350中的数据、代码等以用于相应的处理，也可以将相应处理得到的数据、指令等存入数据存储***350中。When the execution device 310 preprocesses the input data, or when the calculation module 311 of the execution device 310 performs calculation and other related processing, the execution device 310 can call data, code, and the like in the data storage system 350 for the corresponding processing, and can also store the data, instructions, and the like obtained by the corresponding processing into the data storage system 350.
最后,I/O接口312将处理结果,如上述得到的预测深度处理后的深度图像返回给客户设备340,从而提供给用户。Finally, the I/O interface 312 returns the processing result, such as the predicted depth image obtained as described above, to the client device 340 to provide it to the user.
值得说明的是，训练设备320可以针对不同的目标或称不同的任务，基于不同的训练数据生成相应的目标模型/规则301，该相应的目标模型/规则301即可以用于实现上述目标或完成上述任务，从而为用户提供所需的结果。It is worth noting that the training device 320 can generate corresponding target models/rules 301 based on different training data for different goals or different tasks, and the corresponding target models/rules 301 can then be used to achieve the above goals or complete the above tasks, thereby providing the user with the desired result.
在图6中所示情况下，用户可以手动给定输入数据，该手动给定可以通过I/O接口312提供的界面进行操作。另一种情况下，客户设备340可以自动地向I/O接口312发送输入数据，如果要求客户设备340自动发送输入数据需要获得用户的授权，则用户可以在客户设备340中设置相应权限。用户可以在客户设备340查看执行设备310输出的结果，具体的呈现形式可以是显示、声音、动作等具体方式。客户设备340也可以作为数据采集端，采集如图所示输入I/O接口312的输入数据及输出I/O接口312的输出结果作为新的样本数据，并存入数据库330。当然，也可以不经过客户设备340进行采集，而是由I/O接口312直接将如图所示输入I/O接口312的输入数据及输出I/O接口312的输出结果，作为新的样本数据存入数据库330。In the case shown in FIG. 6, the user can manually specify the input data, and this manual specification can be operated through the interface provided by the I/O interface 312. In another case, the client device 340 can automatically send input data to the I/O interface 312; if the client device 340 is required to automatically send the input data, the user's authorization needs to be obtained, and the user can set the corresponding permission in the client device 340. The user can view the result output by the execution device 310 on the client device 340, and the specific presentation form may be display, sound, action, or another specific manner. The client device 340 can also serve as a data collection terminal, collecting the input data of the input I/O interface 312 and the output result of the output I/O interface 312 shown in the figure as new sample data and storing them in the database 330. Of course, the collection may also bypass the client device 340: the I/O interface 312 directly stores the input data of the input I/O interface 312 and the output result of the output I/O interface 312 shown in the figure into the database 330 as new sample data.
值得注意的是，图6仅是本申请实施例提供的一种***架构的示意图，图中所示设备、器件、模块等之间的位置关系不构成任何限制，例如，在图6中，数据存储***350相对执行设备310是外部存储器，在其它情况下，也可以将数据存储***350置于执行设备310中。It is worth noting that FIG. 6 is only a schematic diagram of a system architecture provided by an embodiment of the present application, and the positional relationship between the devices, components, modules, and the like shown in the figure does not constitute any limitation. For example, in FIG. 6, the data storage system 350 is an external memory relative to the execution device 310; in other cases, the data storage system 350 may also be placed in the execution device 310.
如图6所示，根据训练设备320训练得到目标模型/规则301，该目标模型/规则301在本申请实施例中可以是本申请中的神经网络，具体的，本申请实施例提供的神经网络可以是CNN，深度卷积神经网络（deep convolutional neural networks，DCNN）等。As shown in FIG. 6, the target model/rule 301 is obtained by training with the training device 320. In this embodiment of the application, the target model/rule 301 may be the neural network of this application; specifically, the neural network provided in the embodiment of this application may be a CNN, a deep convolutional neural network (DCNN), or the like.
由于CNN是一种非常常见的神经网络，下面结合图7重点对CNN的结构进行详细的介绍。如上文的基础概念介绍所述，卷积神经网络是一种带有卷积结构的深度神经网络，是一种深度学习（deep learning）架构，深度学习架构是指通过机器学习的算法，在不同的抽象层级上进行多个层次的学习。作为一种深度学习架构，CNN是一种前馈（feed-forward）人工神经网络，该前馈人工神经网络中的各个神经元可以对输入其中的图像作出响应。Since the CNN is a very common neural network, the structure of the CNN is described in detail below in conjunction with FIG. 7. As mentioned in the introduction to the basic concepts above, a convolutional neural network is a deep neural network with a convolutional structure and is a deep learning architecture. A deep learning architecture refers to learning at multiple levels of abstraction through machine learning algorithms. As a deep learning architecture, the CNN is a feed-forward artificial neural network in which each neuron can respond to the image input into it.
本申请实施例的图像处理方法具体采用的神经网络的结构可以如图7所示。在图7中,卷积神经网络(CNN)400可以包括输入层410,卷积层/池化层420(其中,池化层为可选的),以及神经网络430。其中,输入层410可以获取待处理图像,并将获取到的待处理图像交由卷积层/池化层420以及后面的神经网络层430进行处理,可以得到图像的处理结果。下面对图7中的CNN 400中内部的层结构进行详细的介绍。The structure of the neural network specifically adopted by the image processing method of the embodiment of the present application may be as shown in FIG. 7. In FIG. 7, a convolutional neural network (CNN) 400 may include an input layer 410, a convolutional layer/pooling layer 420 (wherein the pooling layer is optional), and a neural network 430. Wherein, the input layer 410 can obtain the image to be processed, and pass the obtained image to be processed to the convolutional layer/pooling layer 420 and the subsequent neural network layer 430 for processing, and the image processing result can be obtained. The following describes the internal layer structure of CNN 400 in Fig. 7 in detail.
卷积层/池化层420:Convolutional layer/pooling layer 420:
如图7所示卷积层/池化层420可以包括如示例421-426层，举例来说：在一种实现中，421层为卷积层，422层为池化层，423层为卷积层，424层为池化层，425为卷积层，426为池化层；在另一种实现方式中，421、422为卷积层，423为池化层，424、425为卷积层，426为池化层。即卷积层的输出可以作为随后的池化层的输入，也可以作为另一个卷积层的输入以继续进行卷积操作。As shown in FIG. 7, the convolutional layer/pooling layer 420 may include layers 421-426. For example: in one implementation, layer 421 is a convolutional layer, layer 422 is a pooling layer, layer 423 is a convolutional layer, layer 424 is a pooling layer, layer 425 is a convolutional layer, and layer 426 is a pooling layer; in another implementation, layers 421 and 422 are convolutional layers, layer 423 is a pooling layer, layers 424 and 425 are convolutional layers, and layer 426 is a pooling layer. That is, the output of a convolutional layer can be used as the input of a subsequent pooling layer, or as the input of another convolutional layer to continue the convolution operation.
下面将以卷积层421为例,介绍一层卷积层的内部工作原理。The following will take the convolutional layer 421 as an example to introduce the internal working principle of a convolutional layer.
卷积层421可以包括很多个卷积算子，卷积算子也称为核，其在图像处理中的作用相当于一个从输入图像矩阵中提取特定信息的过滤器，卷积算子本质上可以是一个权重矩阵，这个权重矩阵通常被预先定义，在对图像进行卷积操作的过程中，权重矩阵通常在输入图像上沿着水平方向一个像素接着一个像素（或两个像素接着两个像素等，这取决于步长stride的取值）的进行处理，从而完成从图像中提取特定特征的工作。该权重矩阵的大小应该与图像的大小相关，需要注意的是，权重矩阵的纵深维度（depth dimension）和输入图像的纵深维度是相同的，在进行卷积运算的过程中，权重矩阵会延伸到输入图像的整个深度。因此，和一个单一的权重矩阵进行卷积会产生一个单一纵深维度的卷积化输出，但是大多数情况下不使用单一权重矩阵，而是应用多个尺寸（行×列）相同的权重矩阵，即多个同型矩阵。每个权重矩阵的输出被堆叠起来形成卷积图像的纵深维度，这里的维度可以理解为由上面所述的“多个”来决定。The convolutional layer 421 can include many convolution operators. The convolution operator is also called a kernel; its role in image processing is equivalent to a filter that extracts specific information from the input image matrix. The convolution operator can essentially be a weight matrix, which is usually predefined. In the process of convolving the image, the weight matrix is usually processed one pixel after another (or two pixels after two pixels, depending on the value of the stride) along the horizontal direction on the input image, thereby completing the work of extracting specific features from the image. The size of the weight matrix should be related to the size of the image. It should be noted that the depth dimension of the weight matrix is the same as the depth dimension of the input image; during the convolution operation, the weight matrix extends to the entire depth of the input image. Therefore, convolution with a single weight matrix produces a convolved output with a single depth dimension, but in most cases a single weight matrix is not used; instead, multiple weight matrices of the same size (rows × columns), that is, multiple matrices of the same shape, are applied. The outputs of the weight matrices are stacked to form the depth dimension of the convolved image, where the dimension can be understood as being determined by the "multiple" mentioned above.
不同的权重矩阵可以用来提取图像中不同的特征，例如，一个权重矩阵用来提取图像边缘信息，另一个权重矩阵用来提取图像的特定颜色，又一个权重矩阵用来对图像中不需要的噪点进行模糊化等。该多个权重矩阵尺寸（行×列）相同，经过该多个尺寸相同的权重矩阵提取后的卷积特征图的尺寸也相同，再将提取到的多个尺寸相同的卷积特征图合并形成卷积运算的输出。Different weight matrices can be used to extract different features in the image. For example, one weight matrix is used to extract edge information of the image, another weight matrix is used to extract a specific color of the image, and yet another weight matrix is used to blur unwanted noise in the image, and so on. The multiple weight matrices have the same size (rows × columns), so the convolutional feature maps extracted by these weight matrices also have the same size; the multiple extracted convolutional feature maps of the same size are then combined to form the output of the convolution operation.
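The convolution mechanics described in the two paragraphs above — a weight matrix slid across the input with a given stride, and several weight matrices whose outputs are stacked into the depth dimension — can be sketched in plain Python. The two example kernels below are hypothetical illustrations, not taken from the patent:

```python
def conv2d_single(image, kernel, stride=1):
    """Valid 2-D convolution (cross-correlation) of a single channel:
    slide the kernel over the image pixel by pixel (or by `stride` pixels)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = (len(image) - kh) // stride + 1
    out_w = (len(image[0]) - kw) // stride + 1
    out = []
    for i in range(0, out_h * stride, stride):
        row = []
        for j in range(0, out_w * stride, stride):
            acc = 0.0
            for di in range(kh):
                for dj in range(kw):
                    acc += image[i + di][j + dj] * kernel[di][dj]
            row.append(acc)
        out.append(row)
    return out

def conv2d_multi(image, kernels, stride=1):
    # Each weight matrix produces one feature map; stacking the maps forms
    # the depth dimension of the output, as described in the text.
    return [conv2d_single(image, k, stride) for k in kernels]

img = [[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]]
edge_k = [[1, -1], [1, -1]]          # crude vertical-edge kernel (hypothetical)
avg_k = [[0.25, 0.25], [0.25, 0.25]]  # 2x2 averaging kernel (hypothetical)
maps = conv2d_multi(img, [edge_k, avg_k])
```

Here `maps` contains two 2×2 feature maps of identical size, one per kernel, matching the "stack outputs into depth" description above.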
这些权重矩阵中的权重值在实际应用中需要经过大量的训练得到，通过训练得到的权重值形成的各个权重矩阵可以用来从输入图像中提取信息，从而使得卷积神经网络400进行正确的预测。The weight values in these weight matrices need to be obtained through a great deal of training in practical applications. The weight matrices formed by the trained weight values can be used to extract information from the input image, so that the convolutional neural network 400 makes correct predictions.
当卷积神经网络400有多个卷积层的时候，初始的卷积层（例如421）往往提取较多的一般特征，该一般特征也可以称之为低级别的特征；随着卷积神经网络400深度的加深，越往后的卷积层（例如426）提取到的特征越来越复杂，比如，高级别的语义之类的特征，语义越高的特征越适用于待解决的问题。When the convolutional neural network 400 has multiple convolutional layers, the initial convolutional layer (for example, 421) often extracts more general features, which can also be called low-level features. As the depth of the convolutional neural network 400 increases, the features extracted by the later convolutional layers (for example, 426) become more and more complex, such as high-level semantic features; features with higher semantics are more suitable for the problem to be solved.
池化层:Pooling layer:
由于常常需要减少训练参数的数量，因此卷积层之后常常需要周期性的引入池化层，在如图7中420所示例的421-426各层，可以是一层卷积层后面跟一层池化层，也可以是多层卷积层后面接一层或多层池化层。在图像处理过程中，池化层的唯一目的就是减少图像的空间大小。池化层可以包括平均池化算子和/或最大池化算子，以用于对输入图像进行采样得到较小尺寸的图像。平均池化算子可以在特定范围内对图像中的像素值进行计算产生平均值作为平均池化的结果。最大池化算子可以在特定范围内取该范围内值最大的像素作为最大池化的结果。另外，就像卷积层中用权重矩阵的大小应该与图像尺寸相关一样，池化层中的运算符也应该与图像的大小相关。通过池化层处理后输出的图像尺寸可以小于输入池化层的图像的尺寸，池化层输出的图像中每个像素点表示输入池化层的图像的对应子区域的平均值或最大值。Since it is often necessary to reduce the number of training parameters, a pooling layer often needs to be introduced periodically after a convolutional layer. In the layers 421-426 illustrated by 420 in FIG. 7, one convolutional layer may be followed by one pooling layer, or multiple convolutional layers may be followed by one or more pooling layers. In image processing, the sole purpose of the pooling layer is to reduce the spatial size of the image. The pooling layer may include an average pooling operator and/or a maximum pooling operator for sampling the input image to obtain an image of a smaller size. The average pooling operator can compute the pixel values in the image within a specific range to produce an average value as the result of average pooling. The maximum pooling operator can take the pixel with the largest value within a specific range as the result of maximum pooling. In addition, just as the size of the weight matrix used in a convolutional layer should be related to the image size, the operators in the pooling layer should also be related to the image size. The size of the image output after processing by the pooling layer can be smaller than the size of the image input to the pooling layer, and each pixel in the image output by the pooling layer represents the average value or the maximum value of the corresponding sub-region of the image input to the pooling layer.
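A minimal sketch of the pooling operators just described: each output pixel summarises (maximum or average) one sub-region of the input, reducing the spatial size of the image:

```python
def pool2d(image, size=2, stride=2, mode="max"):
    """Max or average pooling: each output pixel summarises one size x size
    sub-region of the input, so the output is spatially smaller."""
    out = []
    for i in range(0, len(image) - size + 1, stride):
        row = []
        for j in range(0, len(image[0]) - size + 1, stride):
            window = [image[i + di][j + dj]
                      for di in range(size) for dj in range(size)]
            row.append(max(window) if mode == "max" else sum(window) / len(window))
        out.append(row)
    return out

img = [[1, 2, 5, 6],
       [3, 4, 7, 8],
       [9, 10, 13, 14],
       [11, 12, 15, 16]]
max_out = pool2d(img, mode="max")  # [[4, 8], [12, 16]]
avg_out = pool2d(img, mode="avg")  # [[2.5, 6.5], [10.5, 14.5]]
```

Note that the 4×4 input becomes a 2×2 output, and each output value is exactly the maximum or average of the corresponding 2×2 sub-region, as the paragraph states.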
神经网络层430:Neural network layer 430:
在经过卷积层/池化层420的处理后，卷积神经网络400还不足以输出所需要的输出信息。因为如前所述，卷积层/池化层420只会提取特征，并减少输入图像带来的参数。然而为了生成最终的输出信息（所需要的类信息或其他相关信息），卷积神经网络400需要利用神经网络层430来生成一个或者一组所需要的类的数量的输出。因此，在神经网络层430中可以包括多层隐含层（如图7所示的431、432至43n）以及输出层440，该多层隐含层中所包含的参数可以根据具体的任务类型的相关训练数据进行预先训练得到，例如该任务类型可以包括图像识别，图像分类，图像检测以及图像超分辨率重建等等。After the processing of the convolutional layer/pooling layer 420, the convolutional neural network 400 is not yet sufficient to output the required output information, because, as mentioned above, the convolutional layer/pooling layer 420 only extracts features and reduces the parameters brought by the input image. However, in order to generate the final output information (the required class information or other related information), the convolutional neural network 400 needs to use the neural network layer 430 to generate the output of one or a group of the required number of classes. Therefore, the neural network layer 430 can include multiple hidden layers (431, 432 to 43n as shown in FIG. 7) and an output layer 440. The parameters contained in the multiple hidden layers can be obtained by pre-training based on relevant training data of a specific task type; for example, the task type can include image recognition, image classification, image detection, image super-resolution reconstruction, and so on.
在神经网络层430中的多层隐含层之后，也就是整个卷积神经网络400的最后层为输出层440，该输出层440具有类似分类交叉熵的损失函数，具体用于计算预测误差，一旦整个卷积神经网络400的前向传播（如图7由410至440方向的传播为前向传播）完成，反向传播（如图7由440至410方向的传播为反向传播）就会开始更新前面提到的各层的权重值以及偏差，以减少卷积神经网络400的损失，及卷积神经网络400通过输出层输出的结果和理想结果之间的误差。After the multiple hidden layers in the neural network layer 430, that is, as the final layer of the entire convolutional neural network 400, comes the output layer 440. The output layer 440 has a loss function similar to the categorical cross entropy, which is specifically used to calculate the prediction error. Once the forward propagation of the entire convolutional neural network 400 (the propagation in the direction from 410 to 440 in FIG. 7) is completed, the back propagation (the propagation in the direction from 440 to 410 in FIG. 7) starts to update the weight values and biases of the aforementioned layers, so as to reduce the loss of the convolutional neural network 400 and the error between the result output by the convolutional neural network 400 through the output layer and the ideal result.
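The forward/backward cycle described above can be reduced to a single-neuron illustration: one forward pass computes the prediction and its squared-error loss, and one backward pass uses the gradients to update the weight and bias so the loss decreases. This is a toy analogue, not the patent's actual training procedure:

```python
def forward(w, b, x):
    # Forward propagation: prediction of a single linear neuron.
    return w * x + b

def loss(pred, target):
    # Squared-error loss standing in for the cross-entropy-like loss above.
    return (pred - target) ** 2

def backward_step(w, b, x, target, lr=0.05):
    """One forward pass followed by one backward pass: gradients of the loss
    with respect to the weight and bias are used to update them."""
    pred = forward(w, b, x)
    grad = 2 * (pred - target)  # dL/dpred
    w -= lr * grad * x          # dL/dw = dL/dpred * x
    b -= lr * grad              # dL/db = dL/dpred
    return w, b

w, b, x, target = 0.0, 0.0, 2.0, 1.0
before = loss(forward(w, b, x), target)
w, b = backward_step(w, b, x, target)
after = loss(forward(w, b, x), target)
```

A single update step already reduces the loss (here from 1.0 to 0.25), which is the whole purpose of the back propagation described in the paragraph.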
本申请实施例的图像处理方法具体采用的神经网络的结构可以如图8所示。在图8中，卷积神经网络（CNN）500可以包括输入层510，卷积层/池化层520（其中，池化层为可选的），以及神经网络530。与图7相比，图8中的卷积层/池化层520中的多个卷积层/池化层并行，将分别提取的特征均输入给全神经网络层530进行处理。The structure of the neural network specifically adopted by the image processing method of the embodiment of the present application may be as shown in FIG. 8. In FIG. 8, a convolutional neural network (CNN) 500 may include an input layer 510, a convolutional layer/pooling layer 520 (where the pooling layer is optional), and a neural network 530. Compared with FIG. 7, the multiple convolutional layers/pooling layers in the convolutional layer/pooling layer 520 in FIG. 8 are in parallel, and the separately extracted features are all input to the full neural network layer 530 for processing.
需要说明的是，图7和图8所示的卷积神经网络仅作为一种本申请实施例的图像处理方法的两种可能的卷积神经网络的示例，在具体的应用中，本申请实施例的图像处理方法所采用的卷积神经网络还可以以其他网络模型的形式存在。It should be noted that the convolutional neural networks shown in FIG. 7 and FIG. 8 are only examples of two possible convolutional neural networks for the image processing method of the embodiment of this application. In specific applications, the convolutional neural network used in the image processing method of the embodiment of this application may also exist in the form of other network models.
图9为本申请实施例提供的一种芯片的硬件结构,该芯片包括神经网络处理器600。该芯片可以被设置在如图6所示的执行设备310中,用以完成计算模块311的计算工作。该芯片也可以被设置在如图6所示的训练设备320中,用以完成训练设备320的训练工作并输出目标模型/规则301。如图7或图8所示的卷积神经网络中各层的算法均可在如图9所示的芯片中得以实现。FIG. 9 is a hardware structure of a chip provided by an embodiment of the application. The chip includes a neural network processor 600. The chip can be set in the execution device 310 as shown in FIG. 6 to complete the calculation work of the calculation module 311. The chip can also be set in the training device 320 as shown in FIG. 6 to complete the training work of the training device 320 and output the target model/rule 301. The algorithms of each layer in the convolutional neural network as shown in FIG. 7 or FIG. 8 can all be implemented in the chip as shown in FIG. 9.
神经网络处理器NPU 600作为协处理器挂载到主中央处理器(central processing unit,CPU)(host CPU)上,由主CPU分配任务。NPU 600的核心部分为运算电路603,控制器604控制运算电路603提取存储器(权重存储器或输入存储器)中的数据并进行运算。The neural network processor NPU 600 is mounted as a coprocessor to a main central processing unit (central processing unit, CPU) (host CPU), and the main CPU distributes tasks. The core part of the NPU 600 is the arithmetic circuit 603. The controller 604 controls the arithmetic circuit 603 to extract data from the memory (weight memory or input memory) and perform calculations.
在一些实现中,运算电路603内部包括多个处理单元(process engine,PE)。在一些实现中,运算电路603是二维脉动阵列。运算电路603还可以是一维脉动阵列或者能够执行例如乘法和加法这样的数学运算的其它电子线路。在一些实现中,运算电路603是通用的矩阵处理器。In some implementations, the arithmetic circuit 603 includes multiple processing units (process engines, PE). In some implementations, the arithmetic circuit 603 is a two-dimensional systolic array. The arithmetic circuit 603 may also be a one-dimensional systolic array or other electronic circuits capable of performing mathematical operations such as multiplication and addition. In some implementations, the arithmetic circuit 603 is a general-purpose matrix processor.
举例来说,假设有输入矩阵A,权重矩阵B,输出矩阵C。运算电路603从权重存储器602中取矩阵B相应的数据,并缓存在运算电路603中每一个PE上。运算电路603从输入存储器601中取矩阵A数据与矩阵B进行矩阵运算,得到的矩阵的部分结果或最终结果,保存在累加器608(accumulator)中。For example, suppose there is an input matrix A, a weight matrix B, and an output matrix C. The arithmetic circuit 603 fetches the data corresponding to matrix B from the weight memory 602 and caches it on each PE in the arithmetic circuit 603. The arithmetic circuit 603 fetches the matrix A data and matrix B from the input memory 601 to perform matrix operations, and the partial result or final result of the obtained matrix is stored in an accumulator 608 (accumulator).
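Functionally, what the arithmetic circuit and accumulator compute in this example is an ordinary matrix product: each output element of C accumulates the partial products of a row of A with a column of B. The plain-Python sketch below is illustrative only — the NPU performs this in parallel across its PEs rather than in nested loops:

```python
def matmul(A, B):
    """Matrix product C = A x B, computed the way the text describes: each
    output element accumulates partial results, playing the role of the
    accumulator 608 in the NPU."""
    n, k, m = len(A), len(B), len(B[0])
    C = [[0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            acc = 0  # partial results of this output element accumulate here
            for p in range(k):
                acc += A[i][p] * B[p][j]
            C[i][j] = acc
    return C

A = [[1, 2], [3, 4]]  # input matrix A (from the input memory)
B = [[5, 6], [7, 8]]  # weight matrix B (from the weight memory)
C = matmul(A, B)      # output matrix C (stored via the accumulator)
```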
向量计算单元607可以对运算电路603的输出做进一步处理，如向量乘，向量加，指数运算，对数运算，大小比较等等。例如，向量计算单元607可以用于神经网络中非卷积/非FC层的网络计算，如池化（pooling），批归一化（batch normalization），局部响应归一化（local response normalization）等。The vector calculation unit 607 can further process the output of the arithmetic circuit 603, such as vector multiplication, vector addition, exponential operations, logarithmic operations, size comparison, and so on. For example, the vector calculation unit 607 can be used for network calculations of the non-convolutional/non-FC layers of the neural network, such as pooling, batch normalization, local response normalization, and so on.
在一些实现中，向量计算单元607能将经处理的输出的向量存储到统一存储器606。例如，向量计算单元607可以将非线性函数应用到运算电路603的输出，例如累加值的向量，用以生成激活值。在一些实现中，向量计算单元607生成归一化的值、合并值，或二者均有。在一些实现中，处理过的输出的向量能够用作到运算电路603的激活输入，例如用于在神经网络中的后续层中的使用。In some implementations, the vector calculation unit 607 can store the processed output vector in the unified memory 606. For example, the vector calculation unit 607 can apply a nonlinear function to the output of the arithmetic circuit 603, such as a vector of accumulated values, to generate activation values. In some implementations, the vector calculation unit 607 generates normalized values, combined values, or both. In some implementations, the processed output vector can be used as an activation input to the arithmetic circuit 603, for example for use in a subsequent layer of the neural network.
统一存储器606用于存放输入数据以及输出数据。The unified memory 606 is used to store input data and output data.
权重数据直接通过存储单元访问控制器605（direct memory access controller，DMAC）将外部存储器中的输入数据搬运到输入存储器601和/或统一存储器606、将外部存储器中的权重数据存入权重存储器602，以及将统一存储器606中的数据存入外部存储器。A direct memory access controller (DMAC) 605 is used to transfer the input data in the external memory to the input memory 601 and/or the unified memory 606, to store the weight data in the external memory into the weight memory 602, and to store the data in the unified memory 606 into the external memory.
总线接口单元(bus interface unit,BIU)610,用于通过总线实现主CPU、DMAC和取指存储器609之间进行交互。The bus interface unit (BIU) 610 is used to implement interaction between the main CPU, the DMAC, and the instruction fetch memory 609 through the bus.
与控制器604连接的取指存储器(instruction fetch buffer)609,用于存储控制器604使用的指令。An instruction fetch buffer 609 connected to the controller 604 is used to store instructions used by the controller 604.
控制器604,用于调用取指存储器609中缓存的指令,实现控制该运算加速器的工作过程。The controller 604 is used to call the instructions cached in the instruction fetch memory 609 to control the working process of the computing accelerator.
一般地，统一存储器606，输入存储器601，权重存储器602以及取指存储器609均为片上（On-Chip）存储器，外部存储器为该NPU外部的存储器，该外部存储器可以为双倍数据率同步动态随机存储器（double data rate synchronous dynamic random access memory，DDR SDRAM）、高带宽存储器（high bandwidth memory，HBM）或其他可读可写的存储器。Generally, the unified memory 606, the input memory 601, the weight memory 602, and the instruction fetch memory 609 are all on-chip memories. The external memory is a memory external to the NPU, and the external memory can be a double data rate synchronous dynamic random access memory (DDR SDRAM), a high bandwidth memory (HBM), or other readable and writable memory.
其中,图7或图8所示的卷积神经网络中各层的运算可以由运算电路603或向量计算单元607执行。Among them, the operations of each layer in the convolutional neural network shown in FIG. 7 or FIG. 8 can be executed by the arithmetic circuit 603 or the vector calculation unit 607.
上文中介绍的图6中的执行设备310能够执行本申请实施例的神经网络的搜索方法或者图像处理方法的各个步骤，图7和图8所示的CNN模型和图9所示的芯片也可以用于执行本申请实施例的神经网络的搜索方法或者图像处理方法的各个步骤。The execution device 310 in FIG. 6 introduced above can execute each step of the neural network search method or the image processing method of the embodiment of the present application, and the CNN models shown in FIG. 7 and FIG. 8 and the chip shown in FIG. 9 can also be used to execute each step of the neural network search method or the image processing method of the embodiment of the present application.
下面先结合附图10至图24对本申请实施例的神经网络的搜索方法进行详细的介绍，需要说明的是，通过本申请实施例的神经网络的搜索方法确定的目标超分辨率网络可以用于执行本申请实施例的图像处理方法。The neural network search method of the embodiment of the application is first described in detail below with reference to FIG. 10 to FIG. 24. It should be noted that the target super-resolution network determined by the neural network search method of the embodiment of the application can be used to perform the image processing method of the embodiment of the present application.
如图10所示,本申请实施例提供了一种***架构700。该***架构包括本地设备720、本地设备730以及执行设备710和数据存储***750,其中,本地设备720和本地设备730通过通信网络与执行设备710连接。As shown in FIG. 10, an embodiment of the present application provides a system architecture 700. The system architecture includes a local device 720, a local device 730, an execution device 710, and a data storage system 750. The local device 720 and the local device 730 are connected to the execution device 710 through a communication network.
执行设备710可以由一个或多个服务器实现。可选的,执行设备710可以与其它计算设备配合使用,例如:数据存储器、路由器、负载均衡器等设备。执行设备710可以布置在一个物理站点上,或者分布在多个物理站点上。执行设备710可以使用数据存储***750中的数据,或者调用数据存储***750中的程序代码来实现本申请实施例的搜索神经网络结构的方法。The execution device 710 may be implemented by one or more servers. Optionally, the execution device 710 can be used in conjunction with other computing devices, such as data storage, routers, load balancers and other devices. The execution device 710 may be arranged on one physical site or distributed on multiple physical sites. The execution device 710 may use the data in the data storage system 750 or call the program code in the data storage system 750 to implement the method for searching the neural network structure of the embodiment of the present application.
需要说明的是,上述执行设备710也可以称为云端设备,此时执行设备710可以部署在云端。It should be noted that the above-mentioned execution device 710 may also be referred to as a cloud device, and in this case, the execution device 710 may be deployed in the cloud.
具体地，执行设备710可以执行以下过程：构建基本单元，所述基本单元是通过神经网络的基本操作将基本模块进行连接得到的一种网络结构，所述基本模块包括第一模块，所述第一模块用于对第一输入特征图进行降维操作和残差连接操作，所述降维操作用于将所述第一输入特征图的尺度从原始的第一尺度变换至第二尺度，所述第二尺度小于所述第一尺度，所述残差连接操作用于将所述第一输入特征图与经过所述第一模块处理后的特征图进行特征相加处理，所述第一模块处理后的特征图的尺度和所述第一输入特征图的尺度相同；根据所述基本单元和网络结构参数构建搜索空间，其中，所述网络结构参数包括构建所述基本单元使用的基本模块的类型，所述搜索空间用于搜索图像超分辨率网络结构，所述基本单元是用于构建图像超分辨率网络的基础模块；在所述搜索空间中进行图像超分辨率网络结构搜索确定目标图像超分辨率网络，所述目标图像超分辨率网络用于对待处理图像进行超分辨率处理，所述目标图像超分辨率网络中至少包括所述第一模块，所述目标图像超分辨率网络为计算量小于第一预设阈值且图像超分辨率精度大于第二预设阈值的网络。Specifically, the execution device 710 may perform the following process: construct a basic unit, where the basic unit is a network structure obtained by connecting basic modules through the basic operations of a neural network, the basic modules include a first module, the first module is used to perform a dimensionality reduction operation and a residual connection operation on a first input feature map, the dimensionality reduction operation is used to transform the scale of the first input feature map from an original first scale to a second scale, the second scale is smaller than the first scale, the residual connection operation is used to perform feature addition processing on the first input feature map and the feature map processed by the first module, and the scale of the feature map processed by the first module is the same as the scale of the first input feature map; construct a search space according to the basic unit and network structure parameters, where the network structure parameters include the types of basic modules used to construct the basic unit, the search space is used to search for an image super-resolution network structure, and the basic unit is the basic building block used to construct an image super-resolution network; and perform an image super-resolution network structure search in the search space to determine a target image super-resolution network, where the target image super-resolution network is used to perform super-resolution processing on an image to be processed, the target image super-resolution network includes at least the first module, and the target image super-resolution network is a network whose amount of calculation is less than a first preset threshold and whose image super-resolution accuracy is greater than a second preset threshold.
通过上述过程执行设备710能够通过网络结构搜索（neural architecture search，NAS）的方式获取一个目标神经网络，该目标神经网络可以用于图像超分辨率处理等。Through the above process, the execution device 710 can obtain a target neural network by means of neural architecture search (NAS), and the target neural network can be used for image super-resolution processing and the like.
在一种可能的实现方式中,上述执行设备710搜索网络结构的方法可以是在云端执行的离线搜索方法。In a possible implementation manner, the foregoing method for the execution device 710 to search the network structure may be an offline search method executed in the cloud.
用户可以操作各自的用户设备(例如,本地设备720和本地设备730)与执行设备710进行交互。每个本地设备可以表示任何计算设备,例如,个人计算机、计算机工作站、智能手机、平板电脑、智能摄像头、智能汽车或其他类型蜂窝电话、媒体消费设备、可穿戴设备、机顶盒、游戏机等。The user can operate respective user devices (for example, the local device 720 and the local device 730) to interact with the execution device 710. Each local device can represent any computing device, for example, a personal computer, a computer workstation, a smart phone, a tablet computer, a smart camera, a smart car or other types of cellular phones, a media consumption device, a wearable device, a set-top box, a game console, etc.
每个用户的本地设备可以通过任何通信机制/通信标准的通信网络与执行设备710进行交互,通信网络可以是广域网、局域网、点对点连接等方式,或它们的任意组合。The local device of each user can interact with the execution device 710 through a communication network of any communication mechanism/communication standard. The communication network can be a wide area network, a local area network, a point-to-point connection, or any combination thereof.
In one implementation, the local device 720 and the local device 730 may obtain the relevant parameters of the target neural network from the execution device 710, deploy the target neural network on the local device 720 and the local device 730, and use the target neural network to perform image super-resolution processing and the like.
In another implementation, the target neural network may be deployed directly on the execution device 710. The execution device 710 obtains the image to be processed from the local device 720 and the local device 730, and performs image super-resolution processing on the image to be processed according to the target neural network.
For example, the aforementioned target neural network may be the target image super-resolution network in the embodiments of this application.
The neural network search method of the embodiments of this application is described in detail below with reference to FIG. 11. The method shown in FIG. 11 may be executed by a neural network search apparatus, which may be a computer, a server, or another device with computing power sufficient for neural network search.
The method 800 shown in FIG. 11 includes steps 810 to 830, which are described in detail below.
Step 810: Construct a basic unit. The basic unit is a network structure obtained by connecting basic modules through basic operations of a neural network. The basic modules include a first module, which is used to perform a dimension-reduction operation and a residual connection operation on a first input feature map. The dimension-reduction operation is used to transform the scale of the first input feature map from an original first scale to a second scale, the second scale being smaller than the first scale. The residual connection operation is used to perform feature addition on the first input feature map and the feature map processed by the first module, where the scale of the feature map processed by the first module is the same as the scale of the first input feature map.
It should be noted that the basic unit may be a network structure obtained by connecting basic modules through basic operations of a neural network. The above network structure may contain preset basic operations of a convolutional neural network, or combinations of such basic operations; these basic operations and their combinations may be collectively referred to as basic operations.
For example, the basic operations may refer to convolution operations, pooling operations, residual connections, and so on. Through the basic operations, the basic modules can be connected to one another to obtain the network structure of the basic unit.
In a possible implementation, the above basic unit may be a building block used to construct an image super-resolution network. As shown in FIG. 12, the target image super-resolution network may include three major parts: a feature extraction part, a nonlinear transformation part, and a reconstruction part. The feature extraction part is used to obtain the image features of the image to be processed; in FIG. 12, the image to be processed may be a low-resolution (LR) image. The nonlinear transformation part is used to transform the image features of the input image, mapping them from a first feature space to a second feature space, where the first feature space refers to the feature space in which the features of the image to be processed are extracted; usually, the second, higher-dimensional feature space makes it easier to reconstruct the super-resolution image. The reconstruction part is used to perform upsampling and convolution processing on the image features output by the nonlinear transformation part to obtain the super-resolution image corresponding to the input image. In the embodiments of this application, the network structure of the nonlinear transformation part can be searched for in the search space by means of NAS.
In a possible implementation, the first input feature map input to the first module is at the first scale and is transformed to the second scale by the dimension-reduction operation; the first input feature map at the second scale is then transformed to a third scale by a dimension-raising operation, where the third scale lies between the first scale and the second scale. In this case, in order to implement the residual connection operation, that is, to connect feature maps at the same scale, the first input feature map at the first scale can again be subjected to a dimension-reduction operation to reduce it to the same scale as the first input feature map at the third scale; in other words, the feature map processed by the first module and the first input feature map then have the same scale.
The above basic unit (cell) may be a network obtained by connecting basic modules according to basic operations of a neural network. The basic modules may include a first module, which may be a scale module (contextual residual dense block, CRDB). The scale module may be used to perform the dimension-reduction operation and the residual connection operation on the first input feature map; that is, the scale module may include a pooling submodule and a residual connection for processing the first input feature map.
Exemplarily, the dimension-reduction operation can reduce the scale of the first input feature map, where the dimension-reduction operation may refer to performing a pooling operation on the first input feature map, or to performing a convolution operation with stride Q on the first input feature map, where Q is a positive integer greater than 1.
Exemplarily, the above residual connection operation is used to perform feature addition on the first input feature map and the feature map processed by the first module, where the feature map processed by the first module may refer to the feature map after a dimension-raising operation. The dimension-raising operation refers to restoring the scale of the dimension-reduced feature map to the original first scale, and the residual connection operation may refer to performing feature addition on the first input feature map and the feature map processed by the dimension-raising operation.
For example, the above dimension-raising operation may refer to an upsampling operation, or to a backwards strided convolution. The upsampling operation may refer to an interpolation method, that is, inserting new elements between the pixels of the original image using a suitable interpolation algorithm; the backwards strided convolution (deconvolution) operation may refer to the inverse process of the convolution operation, also known as transposed convolution.
It should be understood that feature addition may refer to adding the information of different channel features for feature maps of the same scale.
In the embodiments of this application, the scale module can apply a residual connection to the input feature map, that is, it can perform feature addition on the first input feature map and the feature map processed by the first module, so that more of the local detail information in the first input feature map is passed to the subsequent convolutional layers. When enough local detail information of the first input feature map is guaranteed to reach the subsequent convolutional layers, the scale module can be used to perform the dimension-reduction operation on the first input feature map. On the one hand, the dimension-reduction operation reduces the scale of the input feature map and thereby reduces the computation amount of the model, while the residual connection operation passes the information of earlier layers well to later layers, which compensates for the information loss caused by the dimension-reduction operation. On the other hand, the dimension-reduction operation can also quickly enlarge the receptive field of the features, allowing the prediction of high-resolution pixels to better take contextual information into account, thereby improving super-resolution accuracy.
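As a rough illustration of this reduce-scale, transform, restore-scale, and residually-add pattern, the pure-Python sketch below is a minimal stand-in (not the patent's actual CRDB implementation): the 3×3 convolution of the scale module is replaced by a placeholder identity transform, pooling is 2×2 averaging, and the dimension-raising step is nearest-neighbour upsampling.

```python
def avg_pool_2x2(fm):
    """2x2 average pooling on a 2-D feature map (list of lists)."""
    h, w = len(fm), len(fm[0])
    return [[(fm[r][c] + fm[r][c + 1] + fm[r + 1][c] + fm[r + 1][c + 1]) / 4.0
             for c in range(0, w, 2)] for r in range(0, h, 2)]

def upsample_2x_nearest(fm):
    """Nearest-neighbour upsampling back to the original scale."""
    out = []
    for row in fm:
        expanded = [v for v in row for _ in (0, 1)]
        out.append(expanded)
        out.append(list(expanded))
    return out

def scale_block(fm, transform=lambda x: x):
    """Reduce scale, apply a (placeholder) transform at the low scale,
    restore the scale, then residually add the original input."""
    low = avg_pool_2x2(fm)                       # dimension-reduction operation
    processed = [[transform(v) for v in row] for row in low]
    restored = upsample_2x_nearest(processed)    # dimension-raising operation
    # residual connection: feature addition at the same scale
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(fm, restored)]

fm = [[1.0, 2.0], [3.0, 4.0]]
out = scale_block(fm)
```

The residual add only makes sense because the restored map has the same scale as the input, which is exactly the constraint the paragraph above places on the feature map processed by the first module.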
It should be understood that image super-resolution reconstruction refers to obtaining a high-resolution image by reconstructing a low-resolution image; image super-resolution processing therefore needs more of the local information of the image features. Currently common image super-resolution network models do not use dimension-reduction operations, mainly because such operations lose part of the local information of the low-resolution input image. In the embodiments of this application, the residual connection operation and/or the dense connection operation keep the information of the input feature map better preserved throughout the network, that is, the information of earlier layers is passed well to later layers, which compensates for the information loss caused by the dimension-reduction operation. Using the dimension-reduction operation can therefore not only reduce the computation amount of the model, but also enlarge the receptive field of the features and improve the accuracy of the image super-resolution network.
In a possible implementation, the specific form of the network structure of the scale module may be as shown in FIG. 13. Three scale modules are shown in FIG. 13: the (d-1)-th, the d-th, and the (d+1)-th scale modules. The d-th scale module may include a pooling submodule, and the dimension-reduction operation may be used to downsample the input feature map, thereby reducing the feature size.
For example, the above dimension-reduction operation may refer to a pooling operation, such as average pooling or maximum pooling.
In the schematic diagram of the scale module network structure shown in FIG. 13, the residual connection may refer to feature addition of the output feature map of the CRDB d-1 module and the processed feature map, where the processed feature map refers to the feature map obtained after the input feature map is subjected, in turn, to a pooling operation, a 3×3 convolution operation, a rectified linear unit (ReLU), a dimension-raising operation, and a 1×1 convolution operation.
Further, in order to improve the performance of the target image super-resolution network, the scale module may also be used to perform a dense connection operation on the first input feature map. The dense connection operation may refer to concatenating the output feature maps of each of the first i-1 convolutional layers, together with the input feature map, as the input feature map of the i-th convolutional layer.
Exemplarily, the specific form of the network structure of the scale module may be as shown in FIG. 14. The dense connection operation achieves the maximum information flow in the network: each layer is connected to all layers before it, that is, the input of each layer is the concatenation of the outputs of all preceding layers. Through the dense connection operation, the information in the input feature map (in forward computation) or the gradients (in backward computation) are better preserved throughout the network, which better compensates for the information loss of the dimension-reduction operation. That is, when performing image super-resolution processing, the residual connection operation and the dense connection operation ensure that the information in the feature map is passed well to the later layers of the network structure, while the dimension-reduction operation downsamples the input feature map and reduces the feature size, so that the computation amount of the model can be reduced while ensuring the accuracy of image super-resolution processing.
It should be understood that feature concatenation may refer to concatenating M feature maps of the same scale into one feature map with K channels, where K is a positive integer greater than M.
For example, as shown in FIG. 14, the dense connection operation refers to passing the output feature map of each layer to each of the subsequent layers; the input of a later layer is obtained by concatenating the feature maps output by the preceding layers.
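To make the channel bookkeeping of such dense connections concrete, the sketch below (with hypothetical channel counts chosen for illustration) computes how many input channels each convolutional layer sees when every layer's output is concatenated onto the original input and all earlier outputs:

```python
def dense_input_channels(c0, growth, num_layers):
    """Input channel count of each layer under dense connections:
    layer i receives the original c0-channel input concatenated with
    the outputs (growth channels each) of all i preceding layers."""
    return [c0 + i * growth for i in range(num_layers)]

# e.g. a 16-channel input and 4 layers each emitting 16 channels
channels = dense_input_channels(16, 16, 4)
```

The linear growth of the input width is why the compact and grouping modules described later compress or group channels before convolving.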
For a neural network structure, the deeper the network, that is, the more convolutional layers the network structure has, the higher the accuracy of the processed image.
In a possible implementation, the specific form of the network structure of the scale module may be as shown in FIG. 15. The scale module may be used to perform a residual connection operation, a dimension-reduction operation, a convolution operation, and a recursive dense connection operation on the input feature map; that is, the scale module may include a residual connection, a pooling submodule, a convolution submodule, and a recursive dense connection. The recursive dense connection operation can increase the depth of the scale module network structure, thereby improving the accuracy of super-resolution processing.
It should be noted that a recursive operation performed on feature maps at the normal scale rapidly increases the computation amount, whereas a recursive operation performed on feature maps processed by the dimension-reduction operation adds relatively little computation. Combining a certain number of recursive operations with the dimension-reduction operation can improve super-resolution accuracy without increasing the computation amount or the parameter amount.
The first module proposed in the embodiments of this application, namely the scale module, can reduce the computation amount, reduce the parameters, enlarge the receptive field, and decouple the parameter amount from the computation amount. First, the dimension-reduction operation in the scale module reduces the computation amount of the network structure by reducing the scale of the feature map.
For example, taking 2×2 pooling as the dimension-reduction operation, assume that the size of the input feature map is C_in×W×H, the size of the convolution kernel is K×K, and the number of output feature channels is C_out. Then the computation amount of a normal convolution is:
FLOPs_ori = 2(C_in·K² + 1)·C_out·H·W
Here, the computation amount of a network model can be expressed in floating point operations (FLOPs), and FLOPs_ori denotes the computation amount of the network model with normal convolution.
The computation amount of the convolution after adding the pooling operation, FLOPs_pool, is:

FLOPs_pool = 2(C_in·K² + 1)·C_out·(H/2)·(W/2) = FLOPs_ori/4
Comparing the computation amount of the normal convolution above with that of the convolution after pooling, it can be seen that the pooling operation reduces the computation amount by 75%; even adding three recursive operations only restores it to the original computation amount.
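The 75% figure can be checked numerically. The sketch below plugs illustrative (hypothetical) sizes into the two formulas above; since 2×2 pooling halves each spatial dimension, the pooled convolution costs exactly a quarter of the original:

```python
def conv_flops(c_in, k, c_out, h, w):
    """FLOPs of a single K×K convolution on an H×W feature map,
    following FLOPs = 2(C_in*K^2 + 1) * C_out * H * W."""
    return 2 * (c_in * k * k + 1) * c_out * h * w

c_in, k, c_out, h, w = 32, 3, 32, 64, 64
flops_ori = conv_flops(c_in, k, c_out, h, w)
# after 2x2 pooling, the spatial size is halved in each dimension
flops_pool = conv_flops(c_in, k, c_out, h // 2, w // 2)
ratio = flops_pool / flops_ori  # a 75% reduction
```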
Optionally, in the embodiments of this application, the first module is also used to perform a rearrangement operation. The rearrangement operation refers to merging multiple first channel features of the first input feature map according to a preset rule to generate one second channel feature, where the resolution of the second channel feature is higher than the resolution of the first channel features.
For example, as shown in FIG. 16, the rearrangement operation may refer to merging four different first-channel feature maps according to a left-to-right, top-to-bottom rule into one second-channel feature map, where the resolution of the second-channel feature map is higher than that of the first-channel feature maps.
It should be noted that the above description uses four feature maps and a left-to-right, top-to-bottom preset rule only as an example, and does not limit this application in any way. The dimension-raising submodules shown in FIG. 13 to FIG. 15 can all be used to perform the rearrangement operation.
Through the dimension-raising submodules shown in FIG. 13 to FIG. 15, that is, by performing the rearrangement operation before the 1×1 convolution, a certain amount of parameters can be saved. The rearrangement operation can be regarded as converting multiple low-scale feature channels into one high-scale feature channel, thereby reducing the number of channels.
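A minimal sketch of this rearrangement (a pixel-shuffle-style merge of four H×W channels into one 2H×2W channel, assuming the left-to-right, top-to-bottom rule of FIG. 16; the tiny input is chosen for illustration only):

```python
def rearrange_4_to_1(ch):
    """Merge 4 low-resolution channels (each H×W) into one 2H×2W channel:
    channels 0..3 fill the top-left, top-right, bottom-left and
    bottom-right positions of each 2×2 output block."""
    h, w = len(ch[0]), len(ch[0][0])
    out = [[0] * (2 * w) for _ in range(2 * h)]
    for r in range(h):
        for c in range(w):
            out[2 * r][2 * c] = ch[0][r][c]
            out[2 * r][2 * c + 1] = ch[1][r][c]
            out[2 * r + 1][2 * c] = ch[2][r][c]
            out[2 * r + 1][2 * c + 1] = ch[3][r][c]
    return out

channels = [[[i]] for i in range(4)]   # four 1×1 channels holding 0..3
merged = rearrange_4_to_1(channels)    # one 2×2 channel
```

Since four input channels collapse into one, the subsequent 1×1 convolution sees a quarter of the channels, which is the source of the parameter saving quantified below.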
For example, taking 2×2 pooling as the dimension-reduction operation, assume that the number of convolutional layers is N_conv, the number of output channels of each convolutional layer is G, and the number of output channels of the 1×1 convolutional layer is C_out. Then the parameter amount param_ori of the normal convolution is:
param_ori = N_conv·G·C_out
The parameter amount of the scale module, param_up, is:

param_up = N_conv·G·C_out/4 = param_ori/4
Comparing the parameter amount of the normal convolution above with that of the scale module, it can be seen that upsampling with the rearrangement operation allows the scale module to reduce the parameter amount after the 1×1 convolutional layer by 75%.
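This comparison can again be sanity-checked with illustrative numbers (hypothetical layer counts and channel widths, not values from the patent):

```python
def param_ori(n_conv, g, c_out):
    """1x1-convolution parameters when all n_conv dense outputs
    (G channels each) feed the 1x1 layer directly."""
    return n_conv * g * c_out

def param_up(n_conv, g, c_out):
    """Same 1x1 layer after the rearrangement merges every 4 channels
    into 1, quartering the 1x1 layer's input channel count."""
    return n_conv * g * c_out // 4

n_conv, g, c_out = 8, 16, 32
saving = 1 - param_up(n_conv, g, c_out) / param_ori(n_conv, g, c_out)
```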
In the implementation of this application, the basic modules used to construct the basic unit include the scale module. On the one hand, the scale module can enlarge the receptive field through the dimension-reduction operation, allowing the prediction of high-resolution pixels to better take contextual information into account. On the other hand, since common super-resolution methods do not use dimension-reduction operations, the scale of the input feature map does not change throughout the nonlinear transformation part, which results in a linear relationship between the parameter amount and the computation amount. The scale module proposed in the embodiments of this application uses the dimension-reduction operation to make the parameter amount and the computation amount relatively independent, giving the search algorithm in NAS more possibilities.
In the embodiments of this application, in addition to the above first module, namely the scale module, the basic modules used to construct the basic unit further include a second module and/or a third module. The second module and the third module are described in detail below with reference to FIG. 17 to FIG. 19.
In a possible implementation, the basic modules may further include a second module, which may be a compact module (shrink residual dense block, SRDB). The compact module may refer to performing channel compression on the basis of a residual dense block (RDB), thereby retaining the dense connections while effectively reducing the number of model parameters.
Specifically, the compact module is used to perform a channel compression operation, a residual connection operation, and a dense connection operation on a second input feature map, where the channel compression operation may refer to performing a convolution operation with a 1×1 kernel on the second input feature map.
It should be understood that when the second module is the first module in a basic unit, the second input feature map may refer to the feature map output by the previous basic unit; when the second module is not the first module in a basic unit, the second input feature map may refer to the feature map output after processing by the preceding module. In the embodiments of this application, the first input feature map, the second input feature map, and the third input feature map all correspond to the same image to be processed.
For example, the network structure of the compact module may be as shown in FIG. 17. Three compact modules are shown in FIG. 17: the (d-1)-th, the d-th, and the (d+1)-th compact modules. A 1×1 convolution kernel can first be used to compress the number of channels of the feature map, followed by a 3×3 convolution for feature transformation. The resulting compact residual dense module, referred to simply as the compact module, can greatly reduce the number of parameters while retaining the dense connections.
In a possible implementation, the basic modules may further include a third module, which may refer to a grouping module (group residual dense block, GRDB). The grouping module may refer to dividing the convolution operation into multiple groups, computed separately, on the basis of the residual dense module, which helps reduce the model parameters.
Specifically, the grouping module may be a module used to perform a channel shuffle operation, a residual connection operation, and a dense connection operation on a third input feature map. The third input feature map includes M sub-feature maps, each of which includes at least two adjacent channel features. The channel shuffle processing may refer to reordering the at least two adjacent channel features corresponding to the M sub-feature maps, so that channel features corresponding to different sub-feature maps become adjacent, where M is an integer greater than 1.
For example, the network structure of the grouping module may be as shown in FIG. 18. Three grouping modules are shown in FIG. 18: the (d-1)-th, the d-th, and the (d+1)-th grouping modules. Since the input of a convolutional layer is directly concatenated from the feature maps of the preceding layers, directly using group convolution would cause a single channel feature of the output layer to receive features from only one preceding convolutional layer, which hinders cooperation among channel features. Therefore, in the embodiments of this application, a channel shuffle operation is added on the basis of the residual dense module; the resulting grouped residual dense module, referred to simply as the grouping module, effectively reduces the number of network parameters.
Exemplarily, as shown in FIG. 19, assume that the third input feature map includes three sub-feature maps 1, 2, and 3, each of which includes three adjacent channel features. The channel shuffle reorders the originally adjacent channel features within the same sub-feature map, so that channel features corresponding to different sub-feature maps become adjacent.
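A minimal sketch of this channel shuffle for M = 3 groups of 3 channels (the string labels are illustrative only): the flat channel list is viewed as an M×g grid and transposed, so that consecutive output channels come from different groups.

```python
def channel_shuffle(channels, m):
    """Reorder m groups of g adjacent channels so that consecutive
    output channels come from different groups (transpose of an m×g view)."""
    g = len(channels) // m
    groups = [channels[i * g:(i + 1) * g] for i in range(m)]
    return [groups[row][col] for col in range(g) for row in range(m)]

# three sub-feature maps, each contributing three adjacent channels
chs = ['1a', '1b', '1c', '2a', '2b', '2c', '3a', '3b', '3c']
shuffled = channel_shuffle(chs, m=3)
```

After the shuffle, a subsequent group convolution mixes features originating from all three sub-feature maps instead of just one.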
In the embodiments of this application, the basic unit is a network structure obtained by connecting basic modules through basic operations of a neural network; a cell as shown in FIG. 12 can be one basic unit, and the basic unit is the building block used to construct the image super-resolution network. The basic modules are used to construct the basic units. As shown in FIG. 20, each basic unit (cell) can be obtained by connecting different basic modules through basic operations of a neural network, and the basic modules can include one or more of the above first module, second module, and third module.
Step 820: Construct a search space according to the basic unit and network structure parameters, where the network structure parameters include the types of the basic modules used to construct the basic unit, and the search space is a search space used to search for the image super-resolution network structure.
The specific form of the basic unit may be any of the possible implementations in step 810 above.
Exemplarily, in the embodiments of this application, the network structure parameters may include:
(1) the number of basic units;
(2) the module type selected for each basic unit (cell);
For example, the types of basic modules may include three different types: the first module, the second module, and the third module; for instance, C may denote the first module (the scale module), S the second module (the compact module), and G the third module (the grouping module).
(3) the number of convolutional layers within a module in a basic unit;
For example, the number of convolutional layers may be {4, 6, 8}.
(4) the number of channels output by each convolutional layer within a module in a basic unit;
For example, the number of channels may be {16, 24, 32, 48}.
(5) the number of channels output by the entire basic unit;
For example, the number of output channels of a basic unit may be {16, 24, 32, 48}.
(6) the state of the basic unit: 1 indicates that the current node is connected to the network, and 0 indicates that the current node is not connected to the network.
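As a rough illustration of how these parameters span a discrete search space, the sketch below enumerates the per-cell candidate configurations using the option sets listed above (treating the choices as independent is a simplifying assumption; the number of basic units then multiplies this further):

```python
from itertools import product

module_types = ['C', 'S', 'G']      # scale / compact / grouping module
conv_layers = [4, 6, 8]             # convolutional layers per module
layer_chans = [16, 24, 32, 48]      # channels per convolutional layer
cell_chans = [16, 24, 32, 48]       # output channels of the whole cell
states = [0, 1]                     # cell connected to the network or not

candidates = list(product(module_types, conv_layers,
                          layer_chans, cell_chans, states))
num_per_cell = len(candidates)      # 3 * 3 * 4 * 4 * 2 combinations
```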
In the embodiments of this application, the search space obtained by constructing basic units from basic modules selects candidate network structures from the given types of basic modules, which is equivalent to discretizing a continuous search space and can effectively reduce the size of the search space.
步骤830:在搜索空间中进行图像超分辨率网络结构搜索确定目标图像超分辨率网络,目标图像超分辨率网络用于对待处理图像进行超分辨率处理,目标图像超分辨率网络中至少包括第一模块,目标图像超分辨率网络为计算量小于第一预设阈值且图像超分辨率精度大于第二预设阈值的网络。Step 830: Perform an image super-resolution network structure search in the search space to determine a target image super-resolution network. The target image super-resolution network is used to perform super-resolution processing on an image to be processed, includes at least the first module, and is a network whose calculation amount is less than a first preset threshold and whose image super-resolution accuracy is greater than a second preset threshold.
应理解,上述在搜索空间中进行图像超分辨率网络结构搜索确定目标图像超分辨率网络可以是指通过搜索算法在搜索空间中进行搜索确定满足约束条件的网络结构,或者,也可以是指通过人工搜索在搜索空间中选取满足约束条件的网络结构。It should be understood that searching the search space for an image super-resolution network structure to determine the target image super-resolution network may mean searching the search space with a search algorithm to determine a network structure that satisfies the constraint conditions, or it may mean manually selecting, from the search space, a network structure that satisfies the constraint conditions.
在一种可能的实现方式中,约束条件可以是指计算量小于第一预设阈值且图像超分辨率精度大于第二预设阈值,使得在移动设备的计算性能受限制的情况下,进行图像超分辨率处理的目标图像超分辨率网络的精度较高。In a possible implementation, the constraint condition may be that the calculation amount is less than the first preset threshold and the image super-resolution accuracy is greater than the second preset threshold, so that the target image super-resolution network used for image super-resolution processing retains high accuracy even when the computing performance of a mobile device is limited.
在一种可能的实现方式中,约束条件可以是指计算量小于第一预设阈值、图像超分辨率精度大于第二预设阈值以及参数量小于第三预设阈值。In a possible implementation manner, the constraint condition may mean that the amount of calculation is less than the first preset threshold, the image super-resolution accuracy is greater than the second preset threshold, and the parameter amount is less than the third preset threshold.
示例性地,常见的搜索算法可以包括但不限于以下算法:随机搜索、贝叶斯优化、进化算法、强化学习、基于梯度的算法等。在搜索空间中进行图像超分辨率网络结构搜索的方法的具体流程可以参考现有技术,为了简洁,本申请中省略对所有搜索方法的详细说明。Exemplarily, common search algorithms may include but are not limited to the following algorithms: random search, Bayesian optimization, evolutionary algorithm, reinforcement learning, gradient-based algorithm, and so on. For the specific process of the method for searching the image super-resolution network structure in the search space, reference may be made to the prior art. For brevity, detailed descriptions of all search methods are omitted in this application.
在一个实施例中,在本申请中可以通过进化算法,以网络模型的参数量、计算量和模型效果(PSNR)为目标,搜索轻量化、快速且精度高的超分网络结构。In one embodiment, in this application, an evolutionary algorithm may be used, with the parameter amount, calculation amount, and model quality (PSNR) of the network model as objectives, to search for a lightweight, fast, and high-accuracy super-resolution network structure.
例如,在所述搜索空间中进行网络搜索确定目标图像超分辨率网络的过程包括以下步骤:在搜索空间中通过进化算法进行网络搜索确定第一图像超分辨率网络;通过多级加权联合损失函数对第一图像超分辨率网络进行反向传播迭代训练确定目标图像超分辨率网络,其中,所述多级加权联合损失函数是根据第一图像超分辨率网络中的每个所述基本单元输出的特征图对应的预测超分辨率图像与样本超分辨率图像之间的损失确定的。For example, the process of performing a network search in the search space to determine the target image super-resolution network includes the following steps: performing a network search in the search space through an evolutionary algorithm to determine a first image super-resolution network; and performing back-propagation iterative training on the first image super-resolution network with a multi-level weighted joint loss function to determine the target image super-resolution network, where the multi-level weighted joint loss function is determined according to the loss between the sample super-resolution image and the predicted super-resolution image corresponding to the feature map output by each basic unit in the first image super-resolution network.
换而言之,在本申请的实施例中可以对进化算法确定的第一图像超分辨率网络通过多级加权联合损失函数进行二次训练,最终确定目标图像超分辨率网络的参数,得到目标图像超分辨率网络。In other words, in the embodiments of the present application, the first image super-resolution network determined by the evolutionary algorithm may be further trained with the multi-level weighted joint loss function, so that the parameters of the target image super-resolution network are finally determined and the target image super-resolution network is obtained.
具体地,通过进化算法在搜索空间中搜索目标图像超分辨率网络包括以下步骤:根据基本单元随机生成P个候选网络结构;采用多级加权联合损失函数训练所述P个候选网络结构;评估训练后的P个候选网络结构中每个候选网络结构的性能参数,性能参数包括峰值信噪比,峰值信噪比用于指示通过所述每个候选网络结构得到的预测超分图像与样本超分图像之间的差异;根据候选网络的性能参数确定第一图像超分辨率网络。Specifically, searching the search space for the target image super-resolution network through the evolutionary algorithm includes the following steps: randomly generating P candidate network structures according to the basic unit; training the P candidate network structures with the multi-level weighted joint loss function; evaluating a performance parameter of each of the P trained candidate network structures, where the performance parameters include a peak signal-to-noise ratio, which indicates the difference between the predicted super-resolution image obtained through each candidate network structure and the sample super-resolution image; and determining the first image super-resolution network according to the performance parameters of the candidate networks.
例如,图22所示,进化算法执行流程可以包括以下步骤:For example, as shown in Figure 22, the evolutionary algorithm execution process may include the following steps:
第一步:随机生成P个个体(即候选网络结构),该P个候选网络结构为初始化种群;The first step: randomly generate P individuals (ie candidate network structures), and the P candidate network structures are the initial population;
第二步:评价每一个网络结构的适应度(即性能参数),包括参数量、计算量和精度,精度可以通过峰值信噪比(peak signal-to-noise ratio,PSNR)进行度量;Step 2: Evaluate the fitness (ie performance parameters) of each network structure, including the amount of parameters, calculations, and accuracy. The accuracy can be measured by the peak signal-to-noise ratio (PSNR);
第三步:选择并更新精英个体,精英个体可以看作是性能参数满足预设条件的网络结构;Step 3: Select and update the elite individuals, where an elite individual can be regarded as a network structure whose performance parameters meet preset conditions;
第四步:通过交叉和变异产生下一代个体;Step 4: Generate the next generation of individuals through crossover and mutation;
第五步:重复步骤二至步骤四直到进化算法收敛,返回最后的精英个体(即第一图像超分辨率网络)。其中,上述精英个体可以是指通过进化算法确定的目标网络结构。Step 5: Repeat steps 2 to 4 until the evolutionary algorithm converges, and return the final elite individual (that is, the first image super-resolution network). Here, the elite individual may refer to the target network structure determined by the evolutionary algorithm.
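作为示意,上述第一步至第五步的进化搜索流程可以概括为如下Python伪实现,其中适应度函数、交叉与变异算子均为调用方提供的假设性参数,仅用于说明流程,并非本申请规定的具体实现。As an illustration, steps 1 to 5 of the evolutionary search above can be summarized in the following Python sketch, where the fitness function and the crossover/mutation operators are hypothetical parameters supplied by the caller, not the specific implementation prescribed by this application.

```python
import random

def evolve(init_population, fitness, n_generations, *,
           mutate, crossover, elite_k=4, rng=random):
    """Generic sketch of the five-step evolutionary search described above.

    `fitness` scores a candidate (higher is better, e.g. PSNR under
    parameter/FLOPs constraints); `crossover`/`mutate` produce offspring.
    """
    population = list(init_population)   # step 1: initial population
    pop_size = len(population)
    elites = []
    for _ in range(n_generations):
        # Step 2: evaluate the fitness of every individual.
        scored = sorted(population, key=fitness, reverse=True)
        # Step 3: select and update the elite individuals.
        elites = scored[:elite_k]
        # Step 4: produce the next generation via crossover and mutation.
        offspring = []
        while len(offspring) < pop_size:
            a, b = rng.sample(elites, 2)
            offspring.append(mutate(crossover(a, b)))
        population = offspring
    # Step 5: return the final elite individual.
    return max(elites, key=fitness)
```

该骨架与具体的网络编码无关:个体可以是上文描述的基本单元配置序列。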
在上述第二步中评估网络结构的性能参数时,可以使用本申请提出的多级加权联合损失函数训练待评估的网络结构,在通过多级加权联合损失函数训练后再评估该网络结构的峰值信噪比。When evaluating the performance parameters of a network structure in the second step above, the multi-level weighted joint loss function proposed in this application may be used to train the network structure to be evaluated, and the peak signal-to-noise ratio of the network structure is then evaluated after training with the multi-level weighted joint loss function.
例如,多级加权联合损失函数是根据以下等式得到的,For example, the multi-level weighted joint loss function is obtained according to the following equation,
L = ∑_{k=1}^{N} λ_{k,t}·L_k
其中,L可以表示多级加权联合损失函数,L k可以表示第一图像超分辨率网络的第k层的损失值,损失值可以是指第k层的输出特征图对应的预测超分辨率图像与样本超分辨率图像之间的图像损失,λ k,t可以表示在t时刻第k层的损失值的权重。Here, L may denote the multi-level weighted joint loss function, L k may denote the loss value of the k-th layer of the first image super-resolution network, where the loss value may refer to the image loss between the sample super-resolution image and the predicted super-resolution image corresponding to the output feature map of the k-th layer, and λ k,t may denote the weight of the loss value of the k-th layer at time t.
例如,如图21所示,由于基本单元中基本模块的数目不同,底层基本单元训练的程度可能有所差异。为了更加充分地学习底层基本单元的参数,提升搜索稳定性和模型的性能,本申请实施例提出了多级加权联合损失函数,即在训练时可以根据每一个基本单元的输出特征图得到预测超分辨率图像,并计算预测超分辨率图像与样本超分辨率图像之间的损失值,将各个基本单元的图像损失值进行加权处理后对网络进行训练。For example, as shown in Figure 21, because the number of basic modules in each basic unit differs, the lower-level basic units may be trained to different degrees. To learn the parameters of the lower-level basic units more fully and improve search stability and model performance, the embodiments of this application propose a multi-level weighted joint loss function: during training, a predicted super-resolution image is obtained from the output feature map of each basic unit, the loss value between the predicted super-resolution image and the sample super-resolution image is calculated, and the image loss values of the basic units are weighted to train the network.
需要说明的是,各个中间层图像损失的权重可以随着时间(或者,迭代次数)的变化而变化。该损失函数可以联合各个中间层的预测图像损失,并通过加权方式体现不同层的重要程度,其中,各个中间层图像损失的权重值可以随着时间变化而变化,这样有利于更加充分地训练底层基本单元的参数,从而提升超分辨率网络的性能。It should be noted that the weight of the image loss of each intermediate layer may change over time (or with the number of iterations). The loss function combines the predicted-image losses of the intermediate layers and reflects the importance of different layers through weighting, where the weight value of each intermediate layer's image loss may change over time; this helps train the parameters of the lower-level basic units more fully and thus improves the performance of the super-resolution network.
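该多级加权联合损失及其随时间变化的权重可用如下Python代码示意。其中的权重调度仅是一种假设性示例(权重从均匀分布逐渐向最终输出层偏移),并非本申请规定的具体调度方式。The multi-level weighted joint loss and its time-varying weights can be sketched in Python as follows; the weight schedule shown is only one hypothetical example (weight shifts gradually from uniform toward the final level), not the specific schedule prescribed by this application.

```python
def joint_loss(per_level_losses, weights):
    """L = sum_k lambda_{k,t} * L_k.

    per_level_losses[k] is the image loss between the SR image predicted
    from the k-th basic unit's output feature map and the sample SR image;
    weights[k] is lambda_{k,t}.
    """
    assert len(per_level_losses) == len(weights)
    return sum(w * l for w, l in zip(weights, per_level_losses))

def weight_schedule(n_levels, t, t_total):
    """Hypothetical schedule: early in training the weight is spread evenly
    over all levels (training the lower-level units fully); as t grows, the
    weight shifts toward the final level. Weights always sum to 1."""
    p = t / t_total
    base = (1.0 - p) / n_levels
    return [base] * (n_levels - 1) + [base + p]
```

训练开始时各中间层损失权重相同,训练结束时损失完全集中在最终输出,符合上文"权重随时间变化"的描述。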
表1Table 1
Figure PCTCN2020105369-appb-000018
Figure PCTCN2020105369-appb-000019
表1是在标准超分数据集上测试本申请提出的构建基本单元的基本模块的性能得到的。表1所示为几种用本申请所提出的基本模块构造的图像超分辨率网络模型的实验结果,其中,浮点运算次数(floating point operations,FLOPs)表示网络模型的计算量,可以用于描述神经网络的计算量,评价模型的计算效率;参数量(parameters)可以用于描述神经网络包含的参数量,用于评价模型的大小;SET5、SET14、B100、Urban100表示不同的数据集的名称,通过在数据集上的测试可以评估网络模型的图像超分辨率精度,例如可以评估网络模型的峰值信噪比PSNR;Baseline表示小型的残差密集网络。从表1可以看出,本申请实施例提出的基本模块(例如,包括尺度模块CRDN、紧致模块SRDN、分组模块GRDN)以及目标图像超分辨率网络即高效超分辨率网络(efficient super-resolution network,ESRN)可以在参数量和计算量不变的情况下有效提升模型精度。Table 1 shows the performance, tested on standard super-resolution data sets, of the basic modules proposed in this application for constructing basic units. Table 1 lists the experimental results of several image super-resolution network models constructed with the proposed basic modules, where the number of floating point operations (FLOPs) represents the calculation amount of a network model and can be used to evaluate its computational efficiency; the parameter amount (parameters) describes the number of parameters in a neural network and is used to evaluate the model size; SET5, SET14, B100, and Urban100 are the names of different data sets, on which the image super-resolution accuracy of a network model, for example its peak signal-to-noise ratio (PSNR), can be evaluated; Baseline denotes a small residual dense network. As can be seen from Table 1, the basic modules proposed in the embodiments of this application (for example, the scale module CRDN, the compact module SRDN, and the grouping module GRDN) and the target image super-resolution network, that is, the efficient super-resolution network (ESRN), can effectively improve model accuracy without increasing the parameter amount or calculation amount.
表2Table 2
Figure PCTCN2020105369-appb-000020
表2是本申请实施例所提出的多级加权联合损失函数的测试结果。表2所示为深度卷积网络在应用多级加权联合损失函数后的实验结果,其中,Joint loss表示通过本申请实施例提出的多级加权损失函数训练的网络模型。从表2可以看出,通过本申请实施例中提供的多级加权联合损失函数训练图像超分辨率网络可以有效提升图像超分辨率网络的精度。Table 2 presents the test results of the multi-level weighted joint loss function proposed in the embodiments of this application. Table 2 shows the experimental results of deep convolutional networks after applying the multi-level weighted joint loss function, where "Joint loss" denotes a network model trained with the multi-level weighted loss function proposed in the embodiments of this application. As can be seen from Table 2, training an image super-resolution network with the multi-level weighted joint loss function provided in the embodiments of this application can effectively improve its accuracy.
表3table 3
Figure PCTCN2020105369-appb-000021
Figure PCTCN2020105369-appb-000022
表3是本申请实施例中提供的图像超分辨率网络在标准数据集上的结果统计。其中,类型1表示图像超分辨率模型运行时间为Fast;类型2表示图像超分辨率模型运行时间为Very Fast;对比模型包括选择超分网络(deep networks with SUs,SelNet)、级联残差网络(cascading residual network,CARN)、小型级联残差网络(mini cascading residual network,CARN-M)、轻量级快速超分辨率网络(fast accurate and light super-resolution network,FALSR),FALSR-A与FALSR-B表示不同的网络模型;ESRN表示本申请实施例中的目标图像超分辨率网络即高效超分辨率网络,例如,可以是快速高效超分辨率网络(fast efficient super-resolution network,ESRN-F)、小型高效超分辨率网络(mini efficient super-resolution network,ESRN-M)。从表3可以看出,本申请实施例提供的目标图像超分辨率网络以及基本模块在计算量和图像超分辨率精度上优于其他网络模型。Table 3 presents the results of the image super-resolution networks provided in the embodiments of this application on standard data sets. Here, type 1 indicates that the running time of the image super-resolution model is Fast, and type 2 indicates that it is Very Fast; the compared models include the selection-unit super-resolution network (deep networks with SUs, SelNet), the cascading residual network (CARN), the mini cascading residual network (CARN-M), and the fast, accurate and light super-resolution network (FALSR), where FALSR-A and FALSR-B denote different network models; ESRN denotes the target image super-resolution network in the embodiments of this application, that is, the efficient super-resolution network, for example, the fast efficient super-resolution network (ESRN-F) and the mini efficient super-resolution network (ESRN-M). As can be seen from Table 3, the target image super-resolution network and basic modules provided in the embodiments of this application outperform the other network models in calculation amount and image super-resolution accuracy.
表4Table 4
Figure PCTCN2020105369-appb-000023
Figure PCTCN2020105369-appb-000024
表4是本申请实施例提供的目标图像超分辨率网络在不同超分辨率尺度上的测试结果。其中,倍数为×3表示在输出超分辨率图像为720p(1280×720)的基础上进行3倍尺度的超分辨率测试;倍数为×4表示在输出超分辨率图像为720p(1280×720)的基础上进行4倍尺度的超分辨率测试;对比模型包括超分辨率卷积神经网络(super resolution convolutional neural network,SRCNN)、深度超分网络(very deep convolutional super resolution network,VDSR)、SelNet、CARN、CARN-M、ESRN、ESRN-F、ESRN-M。Table 4 shows the test results of the target image super-resolution network provided by the embodiments of this application at different super-resolution scales. A scale factor of ×3 means a 3× super-resolution test with the output super-resolution image at 720p (1280×720); a scale factor of ×4 means a 4× super-resolution test with the output super-resolution image at 720p (1280×720). The compared models include the super-resolution convolutional neural network (SRCNN), the very deep convolutional super-resolution network (VDSR), SelNet, CARN, CARN-M, ESRN, ESRN-F, and ESRN-M.
上述表1至表3中FLOPs的计算是以输出超分图像为720p(1280x720)为例进行x2尺度的图像超分辨率处理得到的测试结果。从表1至表3中的数据可以看出,本申请实施例提供的神经网络的搜索方法在不同参数量下都可以找到超分精度更好的模型。除此之外,由于本申请实施例提供的图像超分辨率网络中引入了降维操作,还可以通过约束模型的计算量FLOPs来搜索一个快速的中等参数量模型,在保证图像超分辨率效果高于FALSR-A模型的情况下,可以降低接近一半的计算量。同时,还分别在x3和x4尺度上进行了图像超分辨率测试,测试结果如表4所示,通过本申请实施例的神经网络的搜索方法搜索到的轻量级网络(ESRN,ESRN-F,ESRN-M)在不同超分辨率尺度上的实验结果均超过了其他网络模型。The FLOPs in Tables 1 to 3 above are computed for x2-scale image super-resolution processing with the output super-resolution image at 720p (1280x720). As can be seen from the data in Tables 1 to 3, the neural network search method provided in the embodiments of this application can find models with better super-resolution accuracy under different parameter amounts. In addition, since a dimensionality reduction operation is introduced into the image super-resolution network provided in the embodiments of this application, a fast medium-parameter model can also be found by constraining the FLOPs of the model; while keeping the image super-resolution quality higher than that of the FALSR-A model, the calculation amount can be reduced by nearly half. Image super-resolution tests were also performed at the x3 and x4 scales; as shown in Table 4, the lightweight networks (ESRN, ESRN-F, ESRN-M) found by the neural network search method of the embodiments of this application surpass the other network models at different super-resolution scales.
表5table 5
模型model RDNRDN CARNCARN ESRNESRN ESRN-FESRN-F ESRN-MESRN-M
GPU运行时间(ms)GPU running time (ms) 181.5181.5 45.645.6 52.552.5 36.236.2 30.930.9
表5是本申请实施例提供的目标图像超分辨率网络的运行时间的测试结果。从表5中可以看出,通过本申请实施例中的神经网络的搜索方法得到到的超分辨率网络不仅精度高,同时运行效率也较高。Table 5 is the test result of the running time of the target image super-resolution network provided by the embodiment of the present application. It can be seen from Table 5 that the super-resolution network obtained by the neural network search method in the embodiment of the present application not only has high accuracy, but also has high operating efficiency.
图23和图24是通过本申请实施例的神经网络的搜索方法确定的目标图像超分辨率网络进行图像超分辨率处理的效果图。FIG. 23 and FIG. 24 are effect diagrams of image super-resolution processing performed by the target image super-resolution network determined by the neural network search method of the embodiment of the present application.
其中,图23和图24展示了利用本申请所提出的基础模块构建的图像超分辨率网络进行图像超分辨率处理后的图像效果。以尺度x3进行超分辨率处理为例说明,图23示出了Set14数据集中的图像进行超分辨率处理后的视觉效果图,图24示出了Urban100数据集中的图像进行超分辨率处理后的视觉效果图。其中,对比方法包括高分辨率图像(high resolution,HR)、金字塔超分网络(deep laplacian pyramid networks for super-resolution,LapSRN)、双三次插值、CARN-M、CARN、VDSR、ESRN-M、ESRN。在Urban100和Set14数据集上使用本申请实施例提供的轻型高效的深度卷积网络(例如,ESRN、ESRN-M)可以得到具有更高清晰度的图像。因此,本申请提出的神经网络的搜索方法得到的图像超分辨率网络不仅可以降低网络参数量和计算量,还可以有效提升图像超分的视觉效果,使得超分图像的边缘更加清晰。Figures 23 and 24 show the image results after image super-resolution processing by image super-resolution networks constructed with the basic modules proposed in this application. Taking super-resolution processing at scale x3 as an example, Figure 23 shows the visual results of images in the Set14 data set after super-resolution processing, and Figure 24 shows the visual results of images in the Urban100 data set after super-resolution processing. The compared methods include the high-resolution image (HR), the deep Laplacian pyramid network for super-resolution (LapSRN), bicubic interpolation, CARN-M, CARN, VDSR, ESRN-M, and ESRN. By using the lightweight and efficient deep convolutional networks (for example, ESRN and ESRN-M) provided by the embodiments of this application on the Urban100 and Set14 data sets, images with higher definition can be obtained. Therefore, the image super-resolution network obtained by the neural network search method proposed in this application can not only reduce the network parameter amount and calculation amount, but also effectively improve the visual quality of image super-resolution, making the edges of the super-resolution image clearer.
图25是本申请实施例提供的图像处理方法的示意性流程图。图25所示的方法900包括步骤910和步骤920,下面对步骤910和步骤920进行详细的说明。FIG. 25 is a schematic flowchart of an image processing method provided by an embodiment of the present application. The method 900 shown in FIG. 25 includes step 910 and step 920, and step 910 and step 920 will be described in detail below.
步骤910:获取待处理图像。Step 910: Obtain an image to be processed.
其中,待处理图像可以是电子设备通过摄像头拍摄到的图像,或者,该待处理图像还可以是从电子设备内部获得的图像(例如,电子设备的相册中存储的图像,或者,电子设备从云端获取的图片)。The image to be processed may be an image captured by the electronic device through a camera, or it may be an image obtained from within the electronic device (for example, an image stored in an album of the electronic device, or a picture obtained by the electronic device from the cloud).
步骤920:根据目标图像超分辨率网络对所述待处理图像进行超分辨率处理得到目标图像,目标图像为待处理图像对应的超分辨率图像。Step 920: Perform super-resolution processing on the image to be processed according to the target image super-resolution network to obtain a target image, where the target image is a super-resolution image corresponding to the image to be processed.
其中,上述目标图像超分辨率网络可以是根据图11所示的方法得到的。Wherein, the aforementioned target image super-resolution network may be obtained according to the method shown in FIG. 11.
上述目标图像超分辨率网络是在搜索空间中通过图像超分辨率网络结构搜索确定的网络,搜索空间是通过基本单元和网络结构参数构建的,搜索空间用于搜索图像超分辨率网络结构,网络结构参数包括构建所述基本单元使用的基本模块的类型,基本单元是通过神经网络的基本操作将基本模块进行连接得到的一种网络结构,基本模块包括第一模块,第一模块用于对第一输入特征图进行残差连接操作和降维操作,残差连接操作是指将所述第一输入特征图与经过所述第一模块处理后的特征图进行特征相加处理,降维操作用于将所述第一输入特征图的尺度从原始的第一尺度变换至第二尺度,第二尺度小于第一尺度,目标图像超分辨率网络中至少包括所述第一模块,所述第一模块处理后的特征图的尺度和所述第一输入特征图的尺度相同。The target image super-resolution network is a network determined by searching for an image super-resolution network structure in a search space, where the search space is constructed from basic units and network structure parameters and is used to search for the image super-resolution network structure. The network structure parameters include the types of the basic modules used to construct the basic unit, and a basic unit is a network structure obtained by connecting basic modules through basic neural network operations. The basic modules include a first module, which is used to perform a residual connection operation and a dimensionality reduction operation on a first input feature map; the residual connection operation refers to performing feature addition on the first input feature map and the feature map processed by the first module, and the dimensionality reduction operation is used to transform the scale of the first input feature map from an original first scale to a second scale smaller than the first scale. The target image super-resolution network includes at least the first module, and the scale of the feature map processed by the first module is the same as that of the first input feature map.
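第一模块的数据流(降维、处理、升维、残差相加)可用如下NumPy代码示意。其中以2×2平均池化代替具体的降维实现(池化或步长为Q的卷积)、以恒等映射代替模块内的卷积层,均为便于理解的假设性简化。The data flow of the first module (reduce scale, process, restore scale, residual add) can be illustrated with the following NumPy sketch, in which 2x2 average pooling stands in for the concrete dimensionality-reduction implementation (pooling or a stride-Q convolution) and an identity map stands in for the module's convolutional layers; both are simplifying assumptions.

```python
import numpy as np

def first_module(x):
    """Sketch of the first module on a feature map x of shape (C, H, W),
    with H and W even."""
    c, h, w = x.shape
    # Dimensionality reduction: first scale -> second (smaller) scale,
    # here via 2x2 average pooling.
    reduced = x.reshape(c, h // 2, 2, w // 2, 2).mean(axis=(2, 4))
    processed = reduced  # placeholder for the module's conv layers
    # Dimensionality raising: restore the first scale (nearest upsampling).
    restored = processed.repeat(2, axis=1).repeat(2, axis=2)
    # Residual connection: output scale equals the input scale.
    return x + restored
```

由于在低尺度上进行卷积计算,这种结构可以在保持输出尺度不变的同时降低计算量,与上文描述一致。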
可选地,在一种可能的实现方式中,所述基本单元是用于构建图像超分辨率网络的基础模块。Optionally, in a possible implementation manner, the basic unit is a basic module for constructing an image super-resolution network.
可选地,在一种可能的实现方式中,所述降维操作可以包括池化操作和步长为Q的卷积操作中的至少一项,Q为大于1的正整数。Optionally, in a possible implementation manner, the dimensionality reduction operation may include at least one of a pooling operation and a convolution operation with a step size of Q, where Q is a positive integer greater than 1.
可选地,在一种可能的实现方式中,所述第一模块处理后的特征图为经过升维操作后的特征图,所述升维操作是指将经过所述降维处理后的特征图的尺度恢复至所述第一尺度,所述残差连接操作是指将所述第一输入特征图与经过所述升维操作处理后的特征图进行特征相加处理。Optionally, in a possible implementation, the feature map processed by the first module is a feature map that has undergone a dimensionality-raising operation, where the dimensionality-raising operation restores the scale of the feature map after the dimensionality reduction processing to the first scale, and the residual connection operation refers to performing feature addition on the first input feature map and the feature map processed by the dimensionality-raising operation.
可选地,在一种可能的实现方式中,所述第一模块还用于对所述第一输入特征图进行密集连接操作,其中,所述密集连接操作是指将i-1个卷积层中各个卷积层的输出特征图以及所述第一输入特征图进行特征拼接作为第i个卷积层的输入特征图,i为大于1的正整数。Optionally, in a possible implementation, the first module is further configured to perform a dense connection operation on the first input feature map, where the dense connection operation refers to concatenating the output feature maps of each of the first i-1 convolutional layers and the first input feature map as the input feature map of the i-th convolutional layer, where i is a positive integer greater than 1.
可选地,在一种可能的实现方式中,所述密集连接操作为循环的密集连接操作,所述循环的密集连接操作是指对经过通道压缩处理后的所述第一输入特征图进行特征拼接处理。Optionally, in a possible implementation, the dense connection operation is a cyclic dense connection operation, which refers to performing feature concatenation on the first input feature map after channel compression processing.
可选地,在一种可能的实现方式中,所述第一模块还用于进行重排操作,所述重排操作是指将所述第一输入特征图的多个第一通道特征按照预设规则进行合并处理生成一个第二通道特征,其中,所述第二通道特征的分辨率高于所述第一通道特征的分辨率。Optionally, in a possible implementation, the first module is further configured to perform a rearrangement operation, where the rearrangement operation refers to merging multiple first-channel features of the first input feature map according to a preset rule to generate one second-channel feature, and the resolution of the second-channel feature is higher than that of the first-channel features.
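上述重排操作与亚像素卷积(pixel shuffle)中常用的深度到空间变换一致,可用如下NumPy代码示意,其中具体的排列规则为一种常见假设,并非本申请限定的预设规则。The rearrangement operation above matches the depth-to-space transform commonly used in sub-pixel (pixel shuffle) upsampling; a NumPy sketch follows, where the particular ordering rule is one common assumption, not the preset rule prescribed by this application.

```python
import numpy as np

def rearrange(x, r):
    """Merge r*r first-channel features of shape (H, W) into one
    second-channel feature of shape (r*H, r*W), i.e. higher resolution
    with fewer channels. x has shape (C * r * r, H, W)."""
    c2 = x.shape[0] // (r * r)
    h, w = x.shape[1], x.shape[2]
    y = x.reshape(c2, r, r, h, w)       # (C, r, r, H, W)
    y = y.transpose(0, 3, 1, 4, 2)      # (C, H, r, W, r)
    return y.reshape(c2, h * r, w * r)  # (C, r*H, r*W)
```

例如,4个低分辨率通道(r=2)被合并为1个分辨率放大2倍的通道。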
可选地,在一种可能的实现方式中,所述基本模块还包括第二模块和/或第三模块,其中,所述第二模块用于对第二输入特征图进行通道压缩操作、所述残差连接操作以及所述密集连接操作,所述通道压缩操作是指对所述第二输入特征图进行卷积核为1×1的卷积操作;所述第三模块用于对第三输入特征图进行通道交换操作、所述残差连接操作以及所述密集连接操作,所述第三输入特征图中包括M个子特征图,所述M个子特征图中每个子特征图包括至少两个相邻的通道特征,所述通道交换处理是指将所述M个子特征图对应的至少两个相邻的通道特征进行重新排序,使得所述M个子特征图中不同子特征图对应的通道特征相邻,M为大于1的整数,所述第一输入特征图、所述第二输入特征图以及所述第三输入特征图对应相同的图像。Optionally, in a possible implementation, the basic modules further include a second module and/or a third module. The second module is used to perform a channel compression operation, the residual connection operation, and the dense connection operation on a second input feature map, where the channel compression operation refers to a convolution operation with a 1×1 convolution kernel on the second input feature map. The third module is used to perform a channel exchange operation, the residual connection operation, and the dense connection operation on a third input feature map, where the third input feature map includes M sub-feature maps, each of which includes at least two adjacent channel features; the channel exchange processing refers to reordering the at least two adjacent channel features corresponding to the M sub-feature maps so that channel features corresponding to different sub-feature maps become adjacent, where M is an integer greater than 1, and the first input feature map, the second input feature map, and the third input feature map correspond to the same image.
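第三模块中的通道交换操作与分组卷积中常用的通道重排(channel shuffle)对应,可用如下NumPy代码示意,作为一种便于理解的假设性实现。The channel exchange operation of the third module corresponds to the channel shuffle commonly used with grouped convolutions; the following NumPy sketch is a hypothetical illustration.

```python
import numpy as np

def channel_shuffle(x, m):
    """Channel exchange on x of shape (C, H, W): split the C channels into
    m sub-feature-maps of adjacent channels, then reorder so that channels
    from different sub-maps become adjacent."""
    c, h, w = x.shape
    return x.reshape(m, c // m, h, w).transpose(1, 0, 2, 3).reshape(c, h, w)
```

例如,6个通道按M=2分为两组[0,1,2]与[3,4,5],交换后的通道顺序为[0,3,1,4,2,5],即不同子特征图的通道变为相邻。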
可选地,在一种可能的实现方式中,所述目标图像超分辨率网络是通过多级加权联合损失函数对第一图像超分辨率网络进行反向传播迭代训练确定的网络,其中,所述多级加权联合损失函数是根据所述第一图像超分辨率网络中的每个所述基本单元输出的特征图对应的预测超分辨率图像与样本超分辨率图像之间的损失确定的,所述第一图像超分辨率网络是指在所述搜索空间中通过进化算法进行图像超分辨率网络结构搜索确定的网络。Optionally, in a possible implementation, the target image super-resolution network is a network determined by performing back-propagation iterative training on a first image super-resolution network with a multi-level weighted joint loss function, where the multi-level weighted joint loss function is determined according to the loss between the sample super-resolution image and the predicted super-resolution image corresponding to the feature map output by each basic unit in the first image super-resolution network, and the first image super-resolution network refers to a network determined by searching the search space for an image super-resolution network structure through an evolutionary algorithm.
可选地,在一种可能的实现方式中,所述多级加权联合损失函数是根据以下等式得到的,Optionally, in a possible implementation manner, the multi-level weighted joint loss function is obtained according to the following equation:
L = ∑_{k=1}^{N} λ_{k,t}·L_k
其中,L表示所述多级加权联合损失函数,L k表示所述第一图像超分辨率网络的第k个所述基本单元的损失值,所述损失值是指所述第k个所述基本单元的输出特征图对应的预测超分辨率图像与所述样本超分辨率图像之间的图像损失,λ k,t表示在t时刻所述第k层的损失值的权重,N表示所述第一图像超分辨率网络包括的所述基本单元的数量,N为大于或等于1的整数。Here, L denotes the multi-level weighted joint loss function; L k denotes the loss value of the k-th basic unit of the first image super-resolution network, where the loss value refers to the image loss between the sample super-resolution image and the predicted super-resolution image corresponding to the output feature map of the k-th basic unit; λ k,t denotes the weight of the loss value of the k-th layer at time t; and N denotes the number of basic units included in the first image super-resolution network, where N is an integer greater than or equal to 1.
可选地,在一种可能的实现方式中,所述第一图像超分辨率网络是根据P个候选网络结构中每个候选网络结构的性能参数确定的,所述P个候选网络结构是根据所述基本单元随机生成的,所述性能参数是指评估通过采用所述多级加权联合损失函数训练后的所述P个候选网络结构的性能的参数,所述性能参数包括峰值信噪比,所述峰值信噪比用于指示通过所述每个候选网络结构得到的预测超分图像与样本超分图像之间的差异。Optionally, in a possible implementation, the first image super-resolution network is determined according to the performance parameters of each of P candidate network structures, where the P candidate network structures are randomly generated according to the basic unit; a performance parameter is a parameter for evaluating the performance of the P candidate network structures after training with the multi-level weighted joint loss function, and the performance parameters include a peak signal-to-noise ratio, which indicates the difference between the predicted super-resolution image obtained through each candidate network structure and the sample super-resolution image.
图26是本申请实施例的提供的图像显示方法的示意性流程图。图26所示的方法1000包括步骤1010至1040,下面分别对这些步骤进行详细的描述。FIG. 26 is a schematic flowchart of an image display method provided by an embodiment of the present application. The method 1000 shown in FIG. 26 includes steps 1010 to 1040, and these steps will be described in detail below.
步骤1010、检测到用户用于打开相机的第一操作。Step 1010: The first operation used by the user to turn on the camera is detected.
步骤1020、响应于所述第一操作,在所述显示屏上显示拍摄界面,在所述显示屏上显示拍摄界面,所述拍摄界面上包括取景框,所述取景框内包括第一图像。Step 1020: In response to the first operation, display a photographing interface on the display screen, and display a photographing interface on the display screen. The photographing interface includes a viewfinder frame, and the viewfinder frame includes a first image.
在一个示例中,用户的拍摄行为可以包括用户打开相机的第一操作;响应于所述第一操作,在显示屏上显示拍摄界面。In an example, the user's shooting behavior may include a first operation of the user to turn on the camera; in response to the first operation, displaying a shooting interface on the display screen.
图27中的(a)示出了手机的一种图形用户界面(graphical user interface,GUI),该GUI为手机的桌面1110。当电子设备检测到用户点击桌面1110上的相机应用(application,APP)的图标1120的操作后,可以启动相机应用,显示如图27中的(b)所示的另一GUI,该GUI可以称为拍摄界面1130。该拍摄界面1130上可以包括取景框1140。在预览状态下,该取景框1140内可以实时显示预览图像。(a) of Figure 27 shows a graphical user interface (GUI) of a mobile phone, where the GUI is the desktop 1110 of the mobile phone. After the electronic device detects that the user clicks the icon 1120 of the camera application (APP) on the desktop 1110, it can start the camera application and display another GUI as shown in (b) of Figure 27, which may be called the shooting interface 1130. The shooting interface 1130 may include a viewfinder frame 1140. In the preview state, a preview image may be displayed in the viewfinder frame 1140 in real time.
示例性的,参见图27中的(b),电子设备在启动相机后,取景框1140内可以显示有第一图像,该第一图像为彩色图像。拍摄界面上还可以包括用于指示拍照模式的控件 1150,以及其它拍摄控件。Exemplarily, referring to (b) in FIG. 27, after the electronic device starts the camera, a first image may be displayed in the view frame 1140, and the first image is a color image. The shooting interface may also include a control 1150 for indicating the shooting mode, and other shooting controls.
在一个示例中,用户的拍摄行为可以包括用户打开相机的第一操作;响应于所述第一操作,在显示屏上显示拍摄界面。例如,电子设备可以在检测到用户点击桌面上的相机应用(application,APP)的图标的第一操作后,启动相机应用,显示拍摄界面。拍摄界面上可以包括取景框,可以理解的是,在拍照模式和录像模式下,取景框的大小可以不同。例如,取景框可以为拍照模式下的取景框;在录像模式下,取景框可以为整个显示屏。在预览状态下,即用户打开相机且未按下拍照/录像按钮之前,该取景框内可以实时显示预览图像。In an example, the user's shooting behavior may include a first operation of the user to turn on the camera; in response to the first operation, a shooting interface is displayed on the display screen. For example, after the electronic device detects the first operation of the user clicking the icon of the camera application (APP) on the desktop, it can start the camera application and display the shooting interface. The shooting interface may include a viewfinder frame; it is understandable that the size of the viewfinder frame may differ between photo mode and video mode. For example, the viewfinder frame may be the one used in photo mode, while in video mode the viewfinder frame may be the entire display screen. In the preview state, that is, after the user has opened the camera and before the photo/video button is pressed, the preview image may be displayed in the viewfinder frame in real time.
在一个示例中,预览图像可以为彩色图像,预览图像可以是在相机设置为自动分辨率的情况下显示的图像。In one example, the preview image may be a color image, and the preview image may be an image displayed when the camera is set to automatic resolution.
步骤1030、检测到所述用户指示相机的第二操作。Step 1030: Detect a second operation by which the user instructs the camera.
例如，可以是检测到用户指示第一处理模式的第二操作。其中，第一处理模式可以是专业拍摄模式(例如，超分辨率拍摄模式)。参见图28(a)，拍摄界面上包括拍摄选项1160，在电子设备检测到用户点击拍摄选项1160后，参见图28(b)，电子设备显示拍摄模式界面。在电子设备检测到用户点击拍摄模式界面上用于指示专业拍摄模式1161后，手机进入专业拍摄模式。For example, a second operation by which the user indicates a first processing mode may be detected, where the first processing mode may be a professional shooting mode (for example, a super-resolution shooting mode). Referring to FIG. 28(a), the shooting interface includes a shooting option 1160. After the electronic device detects that the user taps the shooting option 1160, referring to FIG. 28(b), the electronic device displays a shooting mode interface. After the electronic device detects that the user taps the professional shooting mode 1161 on the shooting mode interface, the mobile phone enters the professional shooting mode.
例如,可以是检测到用户用于指示拍摄的第二操作,该第二操作为在拍摄远距离的物体或者拍摄微小的物体的情况下用于指示拍摄的操作。参见图28(c)中,电子设备在低照度环境下,检测到用户用于指示拍摄的第二操作1170。For example, it may be the detection of a second operation for instructing shooting by the user, and the second operation is an operation for instructing shooting in the case of shooting a distant object or shooting a tiny object. Referring to FIG. 28(c), the electronic device detects a second operation 1170 used by the user to instruct shooting in a low-light environment.
应理解，用户用于指示拍摄行为的第二操作可以包括按下电子设备的相机中的拍摄按钮，也可以包括用户设备通过语音指示电子设备进行拍摄行为，或者，还可以包括用户其它的指示电子设备进行拍摄行为。上述为举例说明，并不对本申请作任何限定。It should be understood that the second operation used by the user to indicate the shooting behavior may include pressing the shooting button in the camera of the electronic device, may include the user instructing the electronic device to perform the shooting behavior through voice, or may include other manners in which the user instructs the electronic device to perform the shooting behavior. The foregoing is an example and does not constitute any limitation on this application.
步骤1040、响应于所述第二操作，在所述取景框内显示第二图像，所述第二图像为针对所述摄像头采集到的所述第一图像进行超分辨率处理后的图像，其中，所述目标图像为所述待处理图像对应的超分辨率图像。Step 1040: In response to the second operation, display a second image in the viewfinder frame, where the second image is an image obtained after super-resolution processing is performed on the first image collected by the camera, and the target image is a super-resolution image corresponding to the to-be-processed image.
其中,上述目标图像超分辨率网络可以是根据图11所示的方法得到的。Wherein, the aforementioned target image super-resolution network may be obtained according to the method shown in FIG. 11.
上述目标图像超分辨率网络是在搜索空间中通过图像超分辨率网络结构搜索确定的网络，所述搜索空间是通过基本单元和网络结构参数构建的，所述搜索空间用于搜索图像超分辨率网络结构，所述网络结构参数包括构建所述基本单元使用的基本模块的类型，所述基本单元是通过神经网络的基本操作将基本模块进行连接得到的一种网络结构，所述基本模块包括第一模块，所述第一模块用于对第一输入特征图进行残差连接操作和降维操作，所述残差连接操作是指将所述第一输入特征图与经过所述第一模块处理后的特征图进行特征相加处理，所述降维操作用于将所述第一输入特征图的尺度从原始的第一尺度变换至第二尺度，所述第二尺度小于所述第一尺度，所述目标图像超分辨率网络中至少包括所述第一模块，所述第一模块处理后的特征图的尺度和所述第一输入特征图的尺度相同。The above target image super-resolution network is a network determined by performing an image super-resolution network structure search in a search space. The search space is constructed by using a basic unit and network structure parameters, and is used for searching for an image super-resolution network structure. The network structure parameters include the type of the basic module used to construct the basic unit, and the basic unit is a network structure obtained by connecting basic modules through basic operations of a neural network. The basic modules include a first module, and the first module is used to perform a residual connection operation and a dimensionality reduction operation on a first input feature map. The residual connection operation refers to performing feature addition processing on the first input feature map and the feature map processed by the first module, and the dimensionality reduction operation is used to transform the scale of the first input feature map from an original first scale to a second scale, where the second scale is smaller than the first scale. The target image super-resolution network includes at least the first module, and the scale of the feature map processed by the first module is the same as the scale of the first input feature map.
参见图28,图28(d)中取景框内显示的是第二图像,图28(c)中取景框内显示的是第一图像,第二图像和第一图像的内容相同或者实质上相同,但是第二图像的画质优于第一图像,例如,第二图像的分辨率高于第一图像的分辨率。Referring to Figure 28, the second image is displayed in the viewfinder frame in Figure 28(d), and the first image is displayed in the viewfinder frame in Figure 28(c). The content of the second image and the first image are the same or substantially the same , But the quality of the second image is better than that of the first image. For example, the resolution of the second image is higher than that of the first image.
可选地，在一种可能的实现方式中，所述基本单元是用于构建图像超分辨率网络的基础模块。Optionally, in a possible implementation, the basic unit is a basic module used to construct an image super-resolution network.
可选地,在一种可能的实现方式中,所述降维操作可以包括池化操作和步长为Q的卷积操作中的至少一项,Q为大于1的正整数。Optionally, in a possible implementation manner, the dimensionality reduction operation may include at least one of a pooling operation and a convolution operation with a step size of Q, where Q is a positive integer greater than 1.
可选地，在一种可能的实现方式中，所述第一模块处理后的特征图为经过升维操作后的特征图，所述升维操作是指将经过所述降维处理后的特征图的尺度恢复至所述第一尺度，所述残差连接操作是指将所述第一输入特征图与经过所述升维操作处理后的特征图进行特征相加处理。Optionally, in a possible implementation, the feature map processed by the first module is a feature map obtained after a dimension-raising operation, where the dimension-raising operation refers to restoring the scale of the feature map that has undergone the dimensionality reduction processing to the first scale, and the residual connection operation refers to performing feature addition processing on the first input feature map and the feature map processed by the dimension-raising operation.
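The downscale/process/upscale residual path of the first module described above can be sketched in plain Python. This is a toy single-channel illustration: using 2×2 average pooling as the dimensionality reduction, nearest-neighbour upsampling as the dimension-raising operation, and omitting any intermediate convolution are simplifying assumptions, not the exact operators of the embodiments.

```python
def avg_pool_2x2(fm):
    # Dimensionality reduction: halve the spatial scale with 2x2 average pooling.
    h, w = len(fm), len(fm[0])
    return [[(fm[i][j] + fm[i][j + 1] + fm[i + 1][j] + fm[i + 1][j + 1]) / 4.0
             for j in range(0, w, 2)] for i in range(0, h, 2)]

def upsample_2x_nearest(fm):
    # Dimension-raising: restore the original (first) scale by nearest-neighbour repetition.
    out = []
    for row in fm:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out

def first_module(fm):
    # Residual connection: feature-wise addition of the input feature map and the
    # processed (down- then up-scaled) feature map, which has the same scale.
    low = avg_pool_2x2(fm)                # first scale -> smaller second scale
    restored = upsample_2x_nearest(low)   # back to the first scale
    return [[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(fm, restored)]

fm = [[1.0, 2.0], [3.0, 4.0]]
out = first_module(fm)
```

Because the residual output keeps the input's scale, such modules can be stacked freely inside a basic unit.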
可选地，在一种可能的实现方式中，所述第一模块还用于对所述第一输入特征图进行密集连接操作，其中，所述密集连接操作是指将i-1个卷积层中各个卷积层的输出特征图以及所述第一输入特征图进行特征拼接作为第i个卷积层的输入特征图，i为大于1的正整数。Optionally, in a possible implementation, the first module is further configured to perform a dense connection operation on the first input feature map, where the dense connection operation refers to concatenating the output feature maps of each of i-1 convolutional layers and the first input feature map as the input feature map of the i-th convolutional layer, and i is a positive integer greater than 1.
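The dense connection rule above (layer i receives the concatenation of the original input and the outputs of layers 1 to i-1) can be sketched as follows; the scalar "channels" and the sum-based toy layers are illustrative assumptions, standing in for real convolutional layers.

```python
def dense_block(x, layers):
    # Dense connection: the input to layer i is the feature concatenation of the
    # original input feature map and the outputs of layers 1..i-1.
    feats = [x]                                    # each entry: a list of channel values
    for layer in layers:
        concat = [ch for f in feats for ch in f]   # feature concatenation over channels
        feats.append(layer(concat))
    return feats[-1]

# Toy "convolutional layers": each produces one new channel by summing its input channels.
def make_layer():
    return lambda chans: [sum(chans)]

x = [1.0, 2.0]                                     # two input channels
out = dense_block(x, [make_layer(), make_layer()])
```

Note how the second toy layer already sees three channels (the two inputs plus the first layer's output), which is exactly the growth pattern the dense connection describes.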
可选地，在一种可能的实现方式中，所述密集连接操作为循环的密集连接操作，所述循环的密集连接操作是指对经过通道压缩处理后的所述第一输入特征图进行特征拼接处理。Optionally, in a possible implementation, the dense connection operation is a cyclic dense connection operation, where the cyclic dense connection operation refers to performing feature concatenation processing on the first input feature map that has undergone channel compression processing.
可选地，在一种可能的实现方式中，所述第一模块还用于进行重排操作，所述重排操作是指将所述第一输入特征图的多个第一通道特征按照预设规则进行合并处理生成一个第二通道特征，其中，所述第二通道特征的分辨率高于所述第一通道特征的分辨率。Optionally, in a possible implementation, the first module is further configured to perform a rearrangement operation, where the rearrangement operation refers to merging multiple first channel features of the first input feature map according to a preset rule to generate one second channel feature, and the resolution of the second channel feature is higher than the resolution of the first channel feature.
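The rearrangement described above reads like the well-known pixel-shuffle (sub-pixel) layout: r·r first channel features of resolution h×w are merged into one second channel feature of resolution hr×wr. A minimal sketch, assuming the standard pixel-shuffle index rule as the "preset rule" (the embodiments do not fix this rule):

```python
def pixel_shuffle(channels, r):
    # Rearrangement: merge r*r first-channel features (each h x w) into one
    # second-channel feature of higher resolution (h*r) x (w*r).
    h, w = len(channels[0]), len(channels[0][0])
    out = [[0.0] * (w * r) for _ in range(h * r)]
    for i in range(h * r):
        for j in range(w * r):
            c = (i % r) * r + (j % r)        # which low-resolution channel to read
            out[i][j] = channels[c][i // r][j // r]
    return out

# Four 1x1 channel features merged into one 2x2 higher-resolution channel (r = 2).
chans = [[[1.0]], [[2.0]], [[3.0]], [[4.0]]]
hi = pixel_shuffle(chans, 2)
```

Each output pixel is copied, not interpolated, so the operation raises resolution purely by rearranging existing channel features.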
可选地，在一种可能的实现方式中，所述基本模块还包括第二模块和/或第三模块，其中，所述第二模块用于对第二输入特征图进行通道压缩操作、所述残差连接操作以及所述密集连接操作，所述通道压缩操作是指对所述第二输入特征图进行卷积核为1×1的卷积操作；所述第三模块用于对第三输入特征图进行通道交换操作、所述残差连接操作以及所述密集连接操作，所述第三输入特征图中包括M个子特征图，所述M个子特征图中每个子特征图包括至少两个相邻的通道特征，所述通道交换处理是指将所述M个子特征图对应的至少两个相邻的通道特征进行重新排序，使得所述M个子特征图中不同子特征图对应的通道特征相邻，所述M为大于1的整数，所述第一输入特征图、所述第二输入特征图以及所述第三输入特征图对应相同的图像。Optionally, in a possible implementation, the basic modules further include a second module and/or a third module. The second module is configured to perform a channel compression operation, the residual connection operation, and the dense connection operation on a second input feature map, where the channel compression operation refers to performing a convolution operation with a 1×1 convolution kernel on the second input feature map. The third module is configured to perform a channel exchange operation, the residual connection operation, and the dense connection operation on a third input feature map, where the third input feature map includes M sub-feature maps, each of the M sub-feature maps includes at least two adjacent channel features, and the channel exchange processing refers to reordering the at least two adjacent channel features corresponding to the M sub-feature maps, so that channel features corresponding to different sub-feature maps in the M sub-feature maps become adjacent. M is an integer greater than 1, and the first input feature map, the second input feature map, and the third input feature map correspond to the same image.
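The channel exchange operation of the third module resembles a channel shuffle over M groups of adjacent channels. A minimal sketch on a flat channel list follows; the exact interleaving order is an assumption, since the embodiments only require that channels from different sub-feature maps become adjacent.

```python
def channel_shuffle(channels, m):
    # Channel exchange: split the channel list into m sub-feature maps of
    # adjacent channels, then interleave them so that channels coming from
    # different sub-feature maps end up adjacent.
    n = len(channels) // m
    groups = [channels[g * n:(g + 1) * n] for g in range(m)]
    return [groups[g][i] for i in range(n) for g in range(m)]

# Two sub-feature maps of adjacent channels: [a1, a2] and [b1, b2].
shuffled = channel_shuffle(["a1", "a2", "b1", "b2"], 2)
```

After the shuffle, a1 sits next to b1 and a2 next to b2, so subsequent group-wise operations can mix information across the original sub-feature maps.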
可选地，在一种可能的实现方式中，所述目标图像超分辨率网络是通过多级加权联合损失函数对第一图像超分辨率网络进行反向传播迭代训练确定的网络，其中，所述多级加权联合损失函数是根据所述第一图像超分辨率网络中的每个所述基本单元输出的特征图对应的预测超分辨率图像与样本超分辨率图像之间的损失确定的，所述第一图像超分辨率网络是指在所述搜索空间中通过进化算法进行图像超分辨率网络结构搜索确定的网络。Optionally, in a possible implementation, the target image super-resolution network is a network determined by performing back-propagation iterative training on a first image super-resolution network by using a multi-level weighted joint loss function, where the multi-level weighted joint loss function is determined according to the loss between the predicted super-resolution image corresponding to the feature map output by each basic unit in the first image super-resolution network and a sample super-resolution image, and the first image super-resolution network refers to a network determined by performing an image super-resolution network structure search in the search space through an evolutionary algorithm.
可选地,在一种可能的实现方式中,所述多级加权联合损失函数是根据以下等式得到的,Optionally, in a possible implementation manner, the multi-level weighted joint loss function is obtained according to the following equation:
L = \sum_{k=1}^{N} \lambda_{k,t} L_k
其中，L表示所述多级加权联合损失函数，L_k表示所述第一图像超分辨率网络的第k个所述基本单元的损失值，所述损失值是指所述第k个所述基本单元的输出特征图对应的预测超分辨率图像与样本超分辨率图像之间的图像损失，λ_{k,t}表示在t时刻所述第k层的损失值的权重，N表示所述第一图像超分辨率网络包括的所述基本单元的数量，N为大于或等于1的整数。Here, L represents the multi-level weighted joint loss function, L_k represents the loss value of the k-th basic unit of the first image super-resolution network, where the loss value refers to the image loss between the predicted super-resolution image corresponding to the output feature map of the k-th basic unit and the sample super-resolution image, λ_{k,t} represents the weight of the loss value of the k-th layer at time t, and N represents the number of basic units included in the first image super-resolution network, where N is an integer greater than or equal to 1.
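Evaluating the multi-level weighted joint loss at a training step t is then a weighted sum over the N basic-unit losses; the numeric loss values and weight schedule below are hypothetical illustrations only.

```python
def joint_loss(unit_losses, weights):
    # Multi-level weighted joint loss: L = sum over k of lambda_{k,t} * L_k,
    # where L_k is the loss of the k-th basic unit's predicted super-resolution
    # image and weights holds lambda_{k,t} for the current training step t.
    assert len(unit_losses) == len(weights)
    return sum(w * l for w, l in zip(weights, unit_losses))

losses = [0.9, 0.6, 0.4]       # L_1..L_3 for N = 3 basic units (hypothetical)
weights_t = [0.2, 0.3, 0.5]    # lambda_{k,t} at step t (hypothetical schedule)
L = joint_loss(losses, weights_t)
```

Because the weights carry a time index t, the schedule can shift emphasis between shallow and deep basic units as training progresses.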
可选地，在一种可能的实现方式中，所述第一图像超分辨率网络是根据P个候选网络结构中每个候选网络结构的性能参数确定的，所述P个候选网络结构是根据所述基本单元随机生成的，所述性能参数是指评估通过采用所述多级加权联合损失函数训练后的所述P个候选网络结构的性能的参数，所述性能参数包括峰值信噪比，所述峰值信噪比用于指示通过所述每个候选网络结构得到的预测超分图像与样本超分图像之间的差异。Optionally, in a possible implementation, the first image super-resolution network is determined according to a performance parameter of each of P candidate network structures, where the P candidate network structures are randomly generated according to the basic unit, the performance parameter refers to a parameter for evaluating the performance of the P candidate network structures trained by using the multi-level weighted joint loss function, and the performance parameter includes a peak signal-to-noise ratio, which is used to indicate the difference between the predicted super-resolution image obtained through each candidate network structure and the sample super-resolution image.
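The peak signal-to-noise ratio used as the performance parameter is conventionally computed as 10·log10(peak²/MSE) between the predicted and sample super-resolution images; the 255 peak value and the toy 2×2 images below are assumptions for illustration.

```python
import math

def psnr(pred, target, peak=255.0):
    # Peak signal-to-noise ratio between a predicted super-resolution image and
    # the sample super-resolution image: 10 * log10(peak^2 / MSE). A higher
    # value indicates a smaller difference between the two images.
    n = 0
    se = 0.0
    for row_p, row_t in zip(pred, target):
        for p, t in zip(row_p, row_t):
            se += (p - t) ** 2
            n += 1
    mse = se / n
    return float("inf") if mse == 0 else 10.0 * math.log10(peak * peak / mse)

a = [[100.0, 110.0], [120.0, 130.0]]   # hypothetical predicted image
b = [[101.0, 111.0], [119.0, 131.0]]   # hypothetical sample image
score = psnr(a, b)
```

Candidate networks can then be ranked by this score when selecting the first image super-resolution network from the P candidates.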
应理解,上述举例说明是为了帮助本领域技术人员理解本申请实施例,而非要将本申请实施例限于所例示的具体数值或具体场景。本领域技术人员根据所给出的上述举例说明,显然可以进行各种等价的修改或变化,这样的修改或变化也落入本申请实施例的范围内。It should be understood that the foregoing illustration is intended to help those skilled in the art understand the embodiments of the present application, and is not intended to limit the embodiments of the present application to the specific numerical values or specific scenarios illustrated. Those skilled in the art can obviously make various equivalent modifications or changes based on the above examples given, and such modifications or changes also fall within the scope of the embodiments of the present application.
上文结合图1至图28，详细描述了本申请实施例提供的神经网络的搜索方法以及图像处理方法，下面将结合图29和图30，详细描述本申请的装置实施例。应理解，本申请实施例中的神经网络的搜索装置可以执行前述本申请实施例的各种神经网络的搜索方法，图像处理装置可以执行前述本申请实施例的各种图像处理方法，即以下各种产品的具体工作过程，可以参考前述方法实施例中的对应过程。The foregoing describes in detail the neural network search method and the image processing method provided in the embodiments of this application with reference to FIG. 1 to FIG. 28. The following describes the apparatus embodiments of this application in detail with reference to FIG. 29 and FIG. 30. It should be understood that the neural network search apparatus in the embodiments of this application can perform the various neural network search methods in the foregoing embodiments of this application, and the image processing apparatus can perform the various image processing methods in the foregoing embodiments of this application; that is, for the specific working processes of the following products, reference may be made to the corresponding processes in the foregoing method embodiments.
图29是本申请实施例提供的神经网络的搜索装置的硬件结构示意图。图29所示的神经网络的搜索装置1200(该装置1200具体可以是一种计算机设备)包括存储器1201、处理器1202、通信接口1203以及总线1204。其中,存储器1201、处理器1202、通信接口1203通过总线1204实现彼此之间的通信连接。FIG. 29 is a schematic diagram of the hardware structure of a neural network search device provided by an embodiment of the present application. The neural network search device 1200 shown in FIG. 29 (the device 1200 may specifically be a computer device) includes a memory 1201, a processor 1202, a communication interface 1203, and a bus 1204. Among them, the memory 1201, the processor 1202, and the communication interface 1203 implement communication connections between each other through the bus 1204.
存储器1201可以是只读存储器(read only memory,ROM)，静态存储设备，动态存储设备或者随机存取存储器(random access memory,RAM)。存储器1201可以存储程序，当存储器1201中存储的程序被处理器1202执行时，处理器1202用于执行本申请实施例的神经网络的搜索方法的各个步骤，例如，执行图11所示的各个步骤。The memory 1201 may be a read-only memory (ROM), a static storage device, a dynamic storage device, or a random access memory (RAM). The memory 1201 may store a program. When the program stored in the memory 1201 is executed by the processor 1202, the processor 1202 is configured to perform the steps of the neural network search method in the embodiments of this application, for example, the steps shown in FIG. 11.
应理解,本申请实施例所示的神经网络的搜索装置可以是服务器,例如,可以是云端的服务器,或者,也可以是配置于云端的服务器中的芯片。It should be understood that the neural network search device shown in the embodiment of the present application may be a server, for example, it may be a cloud server, or may also be a chip configured in a cloud server.
处理器1202可以采用通用的中央处理器(central processing unit,CPU)，微处理器，应用专用集成电路(application specific integrated circuit,ASIC)，图形处理器(graphics processing unit,GPU)或者一个或多个集成电路，用于执行相关程序，以实现本申请方法实施例的神经网络的搜索方法。The processor 1202 may be a general-purpose central processing unit (CPU), a microprocessor, an application-specific integrated circuit (ASIC), a graphics processing unit (GPU), or one or more integrated circuits, configured to execute related programs to implement the neural network search method in the method embodiments of this application.
处理器1202还可以是一种集成电路芯片,具有信号的处理能力。在实现过程中,本申请的神经网络的搜索方法的各个步骤可以通过处理器1202中的硬件的集成逻辑电路或者软件形式的指令完成。The processor 1202 may also be an integrated circuit chip with signal processing capability. In the implementation process, the various steps of the neural network search method of the present application can be completed by hardware integrated logic circuits in the processor 1202 or instructions in the form of software.
上述处理器1202还可以是通用处理器、数字信号处理器(digital signal processing,DSP)、专用集成电路(ASIC)、现场可编程门阵列(field programmable gate array,FPGA)或者其他可编程逻辑器件、分立门或者晶体管逻辑器件、分立硬件组件。可以实现或者执行本申请实施例中的公开的各方法、步骤及逻辑框图。通用处理器可以是微处理器或者该处理器也可以是任何常规的处理器等。结合本申请实施例所公开的方法的步骤可以直接体现为硬件译码处理器执行完成，或者用译码处理器中的硬件及软件模块组合执行完成。软件模块可以位于随机存储器，闪存、只读存储器，可编程只读存储器或者电可擦写可编程存储器、寄存器等本领域成熟的存储介质中。该存储介质位于存储器1201，处理器1202读取存储器1201中的信息，结合其硬件完成本神经网络的搜索装置中包括的单元所需执行的功能，或者，执行本申请方法实施例的图11所示的神经网络的搜索方法。The processor 1202 may alternatively be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component, and may implement or perform the methods, steps, and logical block diagrams disclosed in the embodiments of this application. The general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like. The steps of the methods disclosed in the embodiments of this application may be directly performed by a hardware decoding processor, or performed by a combination of hardware and software modules in a decoding processor. The software module may be located in a storage medium mature in the art, such as a random access memory, a flash memory, a read-only memory, a programmable read-only memory, an electrically erasable programmable memory, or a register. The storage medium is located in the memory 1201. The processor 1202 reads information in the memory 1201 and, in combination with its hardware, completes the functions to be performed by the units included in the neural network search apparatus, or performs the neural network search method shown in FIG. 11 in the method embodiments of this application.
通信接口1203使用例如但不限于收发器一类的收发装置,来实现装置1200与其他设备或通信网络之间的通信。The communication interface 1203 uses a transceiver device such as but not limited to a transceiver to implement communication between the device 1200 and other devices or a communication network.
总线1204可包括在装置1200各个部件(例如,存储器1201、处理器1202、通信接口1203)之间传送信息的通路。The bus 1204 may include a path for transferring information between various components of the device 1200 (for example, the memory 1201, the processor 1202, and the communication interface 1203).
图30是本申请实施例的图像处理装置的硬件结构示意图。图30所示的图像处理装置1300包括存储器1301、处理器1302、通信接口1303以及总线1304。其中,存储器1301、处理器1302、通信接口1303通过总线1304实现彼此之间的通信连接。FIG. 30 is a schematic diagram of the hardware structure of an image processing apparatus according to an embodiment of the present application. The image processing apparatus 1300 shown in FIG. 30 includes a memory 1301, a processor 1302, a communication interface 1303, and a bus 1304. Among them, the memory 1301, the processor 1302, and the communication interface 1303 implement communication connections between each other through the bus 1304.
存储器1301可以是ROM，静态存储设备和RAM。存储器1301可以存储程序，当存储器1301中存储的程序被处理器1302执行时，处理器1302和通信接口1303用于执行本申请实施例的图像处理方法的各个步骤，例如，可以执行图25和图26所示的图像处理方法的各个步骤。The memory 1301 may be a ROM, a static storage device, or a RAM. The memory 1301 may store a program. When the program stored in the memory 1301 is executed by the processor 1302, the processor 1302 and the communication interface 1303 are configured to perform the steps of the image processing method in the embodiments of this application, for example, the steps of the image processing methods shown in FIG. 25 and FIG. 26.
处理器1302可以采用通用的CPU，微处理器，ASIC，GPU或者一个或多个集成电路，用于执行相关程序，以实现本申请实施例的图像处理装置中的单元所需执行的功能，或者执行本申请方法实施例的图像处理方法。The processor 1302 may be a general-purpose CPU, a microprocessor, an ASIC, a GPU, or one or more integrated circuits, configured to execute related programs to implement the functions to be performed by the units in the image processing apparatus in the embodiments of this application, or to perform the image processing method in the method embodiments of this application.
处理器1302还可以是一种集成电路芯片,具有信号的处理能力。在实现过程中,本申请实施例的图像处理方法的各个步骤可以通过处理器1302中的硬件的集成逻辑电路或者软件形式的指令完成。The processor 1302 may also be an integrated circuit chip with signal processing capability. In the implementation process, each step of the image processing method in the embodiment of the present application can be completed by an integrated logic circuit of hardware in the processor 1302 or instructions in the form of software.
上述处理器1302还可以是通用处理器、DSP、ASIC、FPGA或者其他可编程逻辑器件、分立门或者晶体管逻辑器件、分立硬件组件。可以实现或者执行本申请实施例中的公开的各方法、步骤及逻辑框图。通用处理器可以是微处理器或者该处理器也可以是任何常规的处理器等。结合本申请实施例所公开的方法的步骤可以直接体现为硬件译码处理器执行完成，或者用译码处理器中的硬件及软件模块组合执行完成。软件模块可以位于随机存储器，闪存、只读存储器，可编程只读存储器或者电可擦写可编程存储器、寄存器等本领域成熟的存储介质中。该存储介质位于存储器1301，处理器1302读取存储器1301中的信息，结合其硬件完成本申请实施例的图像处理装置中包括的单元所需执行的功能，或者执行本申请方法实施例的图像处理方法。The processor 1302 may alternatively be a general-purpose processor, a DSP, an ASIC, an FPGA or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component, and may implement or perform the methods, steps, and logical block diagrams disclosed in the embodiments of this application. The general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like. The steps of the methods disclosed in the embodiments of this application may be directly performed by a hardware decoding processor, or performed by a combination of hardware and software modules in a decoding processor. The software module may be located in a storage medium mature in the art, such as a random access memory, a flash memory, a read-only memory, a programmable read-only memory, an electrically erasable programmable memory, or a register. The storage medium is located in the memory 1301. The processor 1302 reads information in the memory 1301 and, in combination with its hardware, completes the functions to be performed by the units included in the image processing apparatus in the embodiments of this application, or performs the image processing method in the method embodiments of this application.
通信接口1303使用例如但不限于收发器一类的收发装置,来实现装置1300与其他设备或通信网络之间的通信。例如,可以通过通信接口1303获取待处理图像。The communication interface 1303 uses a transceiving device such as but not limited to a transceiver to implement communication between the device 1300 and other devices or communication networks. For example, the image to be processed can be acquired through the communication interface 1303.
总线1304可包括在装置1300各个部件(例如,存储器1301、处理器1302、通信接口1303)之间传送信息的通路。The bus 1304 may include a path for transferring information between various components of the device 1300 (for example, the memory 1301, the processor 1302, and the communication interface 1303).
应注意，尽管上述装置1200和装置1300仅仅示出了存储器、处理器、通信接口，但是在具体实现过程中，本领域的技术人员应当理解，装置1200和装置1300还可以包括实现正常运行所必须的其他器件。同时，根据具体需要，本领域的技术人员应当理解，上述装置1200和装置1300还可包括实现其他附加功能的硬件器件。此外，本领域的技术人员应当理解，上述装置1200和装置1300也可仅仅包括实现本申请实施例所必须的器件，而不必包括图29或图30中所示的全部器件。It should be noted that although the apparatus 1200 and the apparatus 1300 each show only a memory, a processor, and a communication interface, in a specific implementation process, a person skilled in the art should understand that the apparatus 1200 and the apparatus 1300 may further include other devices necessary for normal operation. In addition, according to specific needs, a person skilled in the art should understand that the apparatus 1200 and the apparatus 1300 may further include hardware devices for implementing other additional functions. Furthermore, a person skilled in the art should understand that the apparatus 1200 and the apparatus 1300 may alternatively include only the devices necessary for implementing the embodiments of this application, and do not necessarily include all the devices shown in FIG. 29 or FIG. 30.
本领域普通技术人员可以意识到，结合本文中所公开的实施例描述的各示例的单元及算法步骤，能够以电子硬件、或者计算机软件和电子硬件的结合来实现。这些功能究竟以硬件还是软件方式来执行，取决于技术方案的特定应用和设计约束条件。专业技术人员可以对每个特定的应用来使用不同方法来实现所描述的功能，但是这种实现不应认为超出本申请的范围。A person of ordinary skill in the art may be aware that the units and algorithm steps of the examples described with reference to the embodiments disclosed herein can be implemented by electronic hardware or a combination of computer software and electronic hardware. Whether these functions are performed by hardware or software depends on the particular application and design constraints of the technical solution. A person skilled in the art may use different methods to implement the described functions for each particular application, but such implementation should not be considered beyond the scope of this application.
所属领域的技术人员可以清楚地了解到，为描述的方便和简洁，上述描述的系统、装置和单元的具体工作过程，可以参考前述方法实施例中的对应过程，在此不再赘述。Those skilled in the art can clearly understand that, for the convenience and conciseness of description, for the specific working processes of the system, apparatus, and units described above, reference may be made to the corresponding processes in the foregoing method embodiments, and details are not described herein again.
在本申请所提供的几个实施例中，应该理解到，所揭露的系统、装置和方法，可以通过其它的方式实现。例如，以上所描述的装置实施例仅仅是示意性的，例如，所述单元的划分，仅仅为一种逻辑功能划分，实际实现时可以有另外的划分方式，例如多个单元或组件可以结合或者可以集成到另一个系统，或一些特征可以忽略，或不执行。另一点，所显示或讨论的相互之间的耦合或直接耦合或通信连接可以是通过一些接口，装置或单元的间接耦合或通信连接，可以是电性，机械或其它的形式。In the several embodiments provided in this application, it should be understood that the disclosed system, apparatus, and method may be implemented in other manners. For example, the described apparatus embodiments are merely examples. For example, the division into units is merely a logical function division, and there may be other division manners in actual implementation; for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not performed. In addition, the displayed or discussed mutual couplings or direct couplings or communication connections may be indirect couplings or communication connections through some interfaces, apparatuses, or units, and may be in electrical, mechanical, or other forms.
所述作为分离部件说明的单元可以是或者也可以不是物理上分开的,作为单元显示的部件可以是或者也可以不是物理单元,即可以位于一个地方,或者也可以分布到多个网络单元上。可以根据实际的需要选择其中的部分或者全部单元来实现本实施例方案的目的。The units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, they may be located in one place, or they may be distributed on multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
另外,在本申请各个实施例中的各功能单元可以集成在一个处理单元中,也可以是各个单元单独物理存在,也可以两个或两个以上单元集成在一个单元中。In addition, the functional units in each embodiment of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit.
所述功能如果以软件功能单元的形式实现并作为独立的产品销售或使用时，可以存储在一个计算机可读取存储介质中。基于这样的理解，本申请的技术方案本质上或者说对现有技术做出贡献的部分或者该技术方案的部分可以以软件产品的形式体现出来，该计算机软件产品存储在一个存储介质中，包括若干指令用以使得一台计算机设备(可以是个人计算机，服务器，或者网络设备等)执行本申请各个实施例所述方法的全部或部分步骤。而前述的存储介质包括：U盘、移动硬盘、只读存储器(read-only memory,ROM)、随机存取存储器(random access memory,RAM)、磁碟或者光盘等各种可以存储程序代码的介质。When the functions are implemented in the form of a software functional unit and sold or used as an independent product, they may be stored in a computer-readable storage medium. Based on such an understanding, the technical solutions of this application essentially, or the part contributing to the prior art, or a part of the technical solutions, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for instructing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or some of the steps of the methods described in the embodiments of this application. The foregoing storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
以上所述，仅为本申请的具体实施方式，但本申请的保护范围并不局限于此，任何熟悉本技术领域的技术人员在本申请揭露的技术范围内，可轻易想到变化或替换，都应涵盖在本申请的保护范围之内。因此，本申请的保护范围应以所述权利要求的保护范围为准。The foregoing descriptions are merely specific implementations of this application, but the protection scope of this application is not limited thereto. Any variation or replacement readily figured out by a person skilled in the art within the technical scope disclosed in this application shall fall within the protection scope of this application. Therefore, the protection scope of this application shall be subject to the protection scope of the claims.

Claims (22)

  1. 一种神经网络的搜索方法,其特征在于,包括:A neural network search method, characterized in that it includes:
    构建基本单元，所述基本单元是通过神经网络的基本操作将基本模块进行连接得到的一种网络结构，所述基本模块包括第一模块，所述第一模块用于对第一输入特征图进行降维操作和残差连接操作，所述降维操作用于将所述第一输入特征图的尺度从原始的第一尺度变换至第二尺度，所述第二尺度小于所述第一尺度，所述残差连接操作用于将所述第一输入特征图与经过所述第一模块处理后的特征图进行特征相加处理，所述第一模块处理后的特征图的尺度和所述第一输入特征图的尺度相同；constructing a basic unit, where the basic unit is a network structure obtained by connecting basic modules through basic operations of a neural network, the basic modules include a first module, the first module is used to perform a dimensionality reduction operation and a residual connection operation on a first input feature map, the dimensionality reduction operation is used to transform the scale of the first input feature map from an original first scale to a second scale, the second scale is smaller than the first scale, the residual connection operation is used to perform feature addition processing on the first input feature map and the feature map processed by the first module, and the scale of the feature map processed by the first module is the same as the scale of the first input feature map;
    根据所述基本单元和网络结构参数构建搜索空间,其中,所述网络结构参数包括构建所述基本单元使用的基本模块的类型,所述搜索空间用于搜索图像超分辨率网络结构;Constructing a search space according to the basic unit and network structure parameters, wherein the network structure parameter includes the type of the basic module used to construct the basic unit, and the search space is used for searching the image super-resolution network structure;
    在所述搜索空间中进行图像超分辨率网络结构搜索确定目标图像超分辨率网络，所述目标图像超分辨率网络用于对待处理图像进行超分辨率处理，所述目标图像超分辨率网络中至少包括所述第一模块，所述目标图像超分辨率网络为计算量小于第一预设阈值且图像超分辨率精度大于第二预设阈值的网络。performing an image super-resolution network structure search in the search space to determine a target image super-resolution network, where the target image super-resolution network is used to perform super-resolution processing on a to-be-processed image, the target image super-resolution network includes at least the first module, and the target image super-resolution network is a network whose calculation amount is less than a first preset threshold and whose image super-resolution accuracy is greater than a second preset threshold.
  2. 如权利要求1所述的搜索方法,其特征在于,所述降维操作包括池化操作和步长为Q的卷积操作中的至少一项,Q为大于1的正整数。The search method according to claim 1, wherein the dimensionality reduction operation includes at least one of a pooling operation and a convolution operation with a step size of Q, where Q is a positive integer greater than 1.
  3. 如权利要求1或2所述的搜索方法，其特征在于，所述第一模块处理后的特征图为经过升维操作后的特征图，所述升维操作是指将经过所述降维处理后的特征图的尺度恢复至所述第一尺度，所述残差连接操作是指将所述第一输入特征图与经过所述升维操作处理后的特征图进行特征相加处理。The search method according to claim 1 or 2, wherein the feature map processed by the first module is a feature map obtained after a dimension-raising operation, the dimension-raising operation refers to restoring the scale of the feature map that has undergone the dimensionality reduction processing to the first scale, and the residual connection operation refers to performing feature addition processing on the first input feature map and the feature map processed by the dimension-raising operation.
  4. 如权利要求1至3中任一项所述的搜索方法，其特征在于，所述第一模块还用于对所述第一输入特征图进行密集连接操作，其中，所述密集连接操作是指将i-1个卷积层中各个卷积层的输出特征图以及所述第一输入特征图进行特征拼接作为第i个卷积层的输入特征图，i为大于1的正整数。The search method according to any one of claims 1 to 3, wherein the first module is further configured to perform a dense connection operation on the first input feature map, where the dense connection operation refers to concatenating the output feature maps of each of i-1 convolutional layers and the first input feature map as the input feature map of the i-th convolutional layer, and i is a positive integer greater than 1.
  5. The search method according to claim 4, wherein the dense connection operation is a cyclic dense connection operation, which performs feature concatenation on the first input feature map after channel compression processing.
  6. The search method according to any one of claims 1 to 5, wherein the first module is further configured to perform a rearrangement operation, which merges a plurality of first channel features of the first input feature map according to a preset rule to generate one second channel feature, wherein the resolution of the second channel feature is higher than the resolution of the first channel features.
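The rearrangement of claim 6 matches the familiar sub-pixel ("pixel shuffle") layout, in which r·r low-resolution channel features are merged into one channel feature at r times the resolution; treating the claimed preset rule as exactly that layout is an assumption, since the claim only fixes that several first channels merge into one higher-resolution second channel.

```python
import numpy as np

def pixel_shuffle(x, r):
    """Sketch of the claimed rearrangement: r*r channel features of
    resolution h x w are merged, by a fixed interleaving rule, into one
    channel feature of resolution (h*r) x (w*r)."""
    c, h, w = x.shape
    assert c % (r * r) == 0, "channel count must be divisible by r*r"
    out_c = c // (r * r)
    x = x.reshape(out_c, r, r, h, w)
    x = x.transpose(0, 3, 1, 4, 2)        # -> (out_c, h, r, w, r)
    return x.reshape(out_c, h * r, w * r)
```

Each output pixel block of size r×r draws one value from each of the r·r merged channels, so spatial resolution rises by r in both dimensions while the channel count drops by r·r.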
  7. The search method according to any one of claims 4 to 6, wherein the basic module further comprises a second module and/or a third module, wherein the second module is configured to perform a channel compression operation, the residual connection operation, and the dense connection operation on a second input feature map, the channel compression operation being a convolution operation with a 1×1 convolution kernel applied to the second input feature map; the third module is configured to perform a channel exchange operation, the residual connection operation, and the dense connection operation on a third input feature map, the third input feature map comprising M sub-feature maps, each of the M sub-feature maps comprising at least two adjacent channel features, and the channel exchange operation reorders the at least two adjacent channel features corresponding to the M sub-feature maps so that channel features corresponding to different sub-feature maps among the M sub-feature maps become adjacent, where M is an integer greater than 1, and the first input feature map, the second input feature map, and the third input feature map correspond to the same image.
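Claim 7's two extra operations can be sketched minimally: the third module's channel exchange (a group-shuffle-style reordering that makes channels from different sub-feature maps adjacent — the exact interleaving rule is an assumption) and the second module's 1×1-kernel channel compression (a per-pixel linear mix of channels).

```python
import numpy as np

def channel_exchange(x, m):
    """Sketch of the claimed channel exchange on M sub-feature maps: split
    the channels into m groups and interleave them, so channels that came
    from different sub-feature maps end up adjacent."""
    c, h, w = x.shape
    return x.reshape(m, c // m, h, w).transpose(1, 0, 2, 3).reshape(c, h, w)

def compress_channels(x, weights):
    """Sketch of the claimed channel compression: a 1x1 convolution, i.e. a
    per-pixel linear combination of input channels (weights: out x in)."""
    return np.einsum('oc,chw->ohw', weights, x)
```

With four constant channels [0, 1, 2, 3] split into M = 2 sub-feature maps {0, 1} and {2, 3}, the exchange yields the order [0, 2, 1, 3] — channels from different sub-feature maps are now adjacent, as the claim requires.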
  8. The search method according to any one of claims 1 to 7, wherein performing the image super-resolution network structure search in the search space to determine the target image super-resolution network comprises:
    performing an image super-resolution network structure search in the search space through an evolutionary algorithm to determine a first image super-resolution network;
    performing back-propagation iterative training on the first image super-resolution network through a multi-level weighted joint loss function to determine the target image super-resolution network, wherein the multi-level weighted joint loss function is determined according to the loss between the predicted super-resolution image corresponding to the feature map output by each basic unit in the first image super-resolution network and a sample super-resolution image.
  9. The search method according to claim 8, wherein the multi-level weighted joint loss function is obtained according to the following equation:
    L = ∑_{k=1}^{N} λ_{k,t} · L_k
    wherein L denotes the multi-level weighted joint loss function; L_k denotes the loss value of the k-th basic unit of the first image super-resolution network, that is, the image loss between the predicted super-resolution image corresponding to the output feature map of the k-th basic unit and the sample super-resolution image; λ_{k,t} denotes the weight of the loss value of the k-th layer at time t; and N denotes the number of basic units included in the first image super-resolution network, N being an integer greater than or equal to 1.
  10. The search method according to claim 8 or 9, wherein performing the image super-resolution network structure search in the search space through an evolutionary algorithm to determine the first image super-resolution network comprises:
    randomly generating P candidate network structures according to the basic unit, where P is an integer greater than 1;
    training the P candidate network structures using the multi-level weighted joint loss function;
    evaluating a performance parameter of each of the P trained candidate network structures, wherein the performance parameter includes a peak signal-to-noise ratio, and the peak signal-to-noise ratio is used to indicate the difference between the predicted super-resolution image obtained through each candidate network structure and the sample super-resolution image; and
    determining the first image super-resolution network according to the performance parameters of the candidate network structures.
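The evaluation and selection steps of claim 10 can be sketched with a PSNR scorer. The evolutionary bookkeeping (mutation, crossover, population updates) is omitted, and `select_best` is a hypothetical helper illustrating only the final selection by performance parameter; higher PSNR indicates a smaller difference from the sample super-resolution image.

```python
import numpy as np

def psnr(pred, ref, peak=1.0):
    """Peak signal-to-noise ratio in dB, the claimed performance parameter:
    10 * log10(peak^2 / MSE) between prediction and sample image."""
    mse = np.mean((pred - ref) ** 2)
    return float(10.0 * np.log10(peak ** 2 / mse))

def select_best(candidates, score_fn):
    """Selection step of the sketched evolutionary loop: keep the candidate
    whose evaluated score (here, PSNR) is highest."""
    return max(candidates, key=score_fn)
```

For a prediction that deviates from the reference by a constant 0.1 on a [0, 1] scale, MSE = 0.01 and PSNR = 10·log10(1/0.01) = 20 dB.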
  11. A neural network search apparatus, comprising:
    a memory, configured to store a program; and
    a processor, configured to execute the program stored in the memory, wherein, when the program stored in the memory is executed, the processor is configured to perform the following process:
    constructing a basic unit, the basic unit being a network structure obtained by connecting basic modules through basic operations of a neural network, wherein the basic modules include a first module configured to perform a dimensionality reduction operation and a residual connection operation on a first input feature map, the dimensionality reduction operation transforms the scale of the first input feature map from an original first scale to a second scale smaller than the first scale, the residual connection operation performs feature addition on the first input feature map and the feature map processed by the first module, and the scale of the feature map processed by the first module is the same as the scale of the first input feature map;
    constructing a search space according to the basic unit and network structure parameters, wherein the network structure parameters include the types of basic modules used to construct the basic unit, and the search space is used to search for an image super-resolution network structure; and
    performing an image super-resolution network structure search in the search space to determine a target image super-resolution network, wherein the target image super-resolution network is used to perform super-resolution processing on an image to be processed, the target image super-resolution network comprises at least the first module, and the target image super-resolution network is a network whose computation amount is less than a first preset threshold and whose image super-resolution accuracy is greater than a second preset threshold.
  12. The search apparatus according to claim 11, wherein the dimensionality reduction operation comprises at least one of a pooling operation and a convolution operation with a stride of Q, where Q is a positive integer greater than 1.
  13. The search apparatus according to claim 11 or 12, wherein the feature map processed by the first module is a feature map that has undergone a dimensionality increase operation, the dimensionality increase operation restores the scale of the feature map after the dimensionality reduction processing to the first scale, and the residual connection operation performs feature addition on the first input feature map and the feature map processed by the dimensionality increase operation.
  14. The search apparatus according to any one of claims 11 to 13, wherein the first module is further configured to perform a dense connection operation on the first input feature map, wherein the dense connection operation concatenates the output feature maps of each of the first i-1 convolutional layers together with the first input feature map to form the input feature map of the i-th convolutional layer, where i is a positive integer greater than 1.
  15. The search apparatus according to claim 14, wherein the dense connection operation is a cyclic dense connection operation, which performs feature concatenation on the first input feature map after channel compression processing.
  16. The search apparatus according to any one of claims 11 to 15, wherein the first module is further configured to perform a rearrangement operation, which merges a plurality of first channel features of the first input feature map according to a preset rule to generate one second channel feature, wherein the resolution of the second channel feature is higher than the resolution of the first channel features.
  17. The search apparatus according to any one of claims 14 to 16, wherein the basic module further comprises a second module and/or a third module, wherein the second module is configured to perform a channel compression operation, the residual connection operation, and the dense connection operation on a second input feature map, the channel compression operation being a convolution operation with a 1×1 convolution kernel applied to the second input feature map; the third module is configured to perform a channel exchange operation, the residual connection operation, and the dense connection operation on a third input feature map, the third input feature map comprising M sub-feature maps, each of the M sub-feature maps comprising at least two adjacent channel features, and the channel exchange operation reorders the at least two adjacent channel features corresponding to the M sub-feature maps so that channel features corresponding to different sub-feature maps among the M sub-feature maps become adjacent, where M is an integer greater than 1, and the first input feature map, the second input feature map, and the third input feature map correspond to the same image.
  18. The search apparatus according to any one of claims 11 to 17, wherein performing the image super-resolution network structure search in the search space to determine the target image super-resolution network comprises:
    performing an image super-resolution network structure search in the search space through an evolutionary algorithm to determine a first image super-resolution network;
    performing back-propagation iterative training on the first image super-resolution network through a multi-level weighted joint loss function to determine the target image super-resolution network, wherein the multi-level weighted joint loss function is determined according to the loss between the predicted super-resolution image corresponding to the feature map output by each basic unit in the first image super-resolution network and a sample super-resolution image.
  19. The search apparatus according to claim 18, wherein the multi-level weighted joint loss function is obtained according to the following equation:
    L = ∑_{k=1}^{N} λ_{k,t} · L_k
    wherein L denotes the multi-level weighted joint loss function; L_k denotes the loss value of the k-th basic unit of the first image super-resolution network, that is, the image loss between the predicted super-resolution image corresponding to the output feature map of the k-th basic unit and the sample super-resolution image; λ_{k,t} denotes the weight of the loss value of the k-th layer at time t; and N denotes the number of basic units included in the first image super-resolution network, N being an integer greater than or equal to 1.
  20. The search apparatus according to claim 18 or 19, wherein performing the image super-resolution network structure search in the search space through an evolutionary algorithm to determine the first image super-resolution network comprises:
    randomly generating P candidate network structures according to the basic unit, where P is an integer greater than 1;
    training the P candidate network structures using the multi-level weighted joint loss function;
    evaluating a performance parameter of each of the P trained candidate network structures, wherein the performance parameter includes a peak signal-to-noise ratio, and the peak signal-to-noise ratio is used to indicate the difference between the predicted super-resolution image obtained through each candidate network structure and the sample super-resolution image; and
    determining the first image super-resolution network according to the performance parameters of the candidate network structures.
  21. A computer-readable storage medium, wherein the computer-readable medium stores program code for execution by a device, and the program code comprises instructions for performing the search method according to any one of claims 1 to 10.
  22. A chip, wherein the chip comprises a processor and a data interface, and the processor reads, through the data interface, instructions stored in a memory to perform the search method according to any one of claims 1 to 10.
PCT/CN2020/105369 2019-07-30 2020-07-29 Neural network search method and apparatus WO2021018163A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910695706.7 2019-07-30
CN201910695706.7A CN112308200B (en) 2019-07-30 2019-07-30 Searching method and device for neural network

Publications (1)

Publication Number Publication Date
WO2021018163A1 true WO2021018163A1 (en) 2021-02-04

Family

ID=74230275

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/105369 WO2021018163A1 (en) 2019-07-30 2020-07-29 Neural network search method and apparatus

Country Status (2)

Country Link
CN (1) CN112308200B (en)
WO (1) WO2021018163A1 (en)


Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112580639B (en) * 2021-03-01 2021-08-13 四川大学 Early gastric cancer image identification method based on evolutionary neural network model compression
CN113706530A (en) * 2021-10-28 2021-11-26 北京矩视智能科技有限公司 Surface defect region segmentation model generation method and device based on network structure
CN115273129B (en) * 2022-02-22 2023-05-05 珠海数字动力科技股份有限公司 Lightweight human body posture estimation method and device based on neural architecture search
CN115601792A (en) * 2022-12-14 2023-01-13 长春大学(Cn) Cow face image enhancement method
CN117058000B (en) * 2023-10-10 2024-02-02 苏州元脑智能科技有限公司 Neural network architecture searching method and device for image super-resolution

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2016132147A1 (en) * 2015-02-19 2016-08-25 Magic Pony Technology Limited Enhancement of visual data
US20190095795A1 (en) * 2017-03-15 2019-03-28 Samsung Electronics Co., Ltd. System and method for designing efficient super resolution deep convolutional neural networks by cascade network training, cascade network trimming, and dilated convolutions
CN109862370A (en) * 2017-11-30 2019-06-07 北京大学 Video super-resolution processing method and processing device

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10803378B2 (en) * 2017-03-15 2020-10-13 Samsung Electronics Co., Ltd System and method for designing efficient super resolution deep convolutional neural networks by cascade network training, cascade network trimming, and dilated convolutions
CN108985457B (en) * 2018-08-22 2021-11-19 北京大学 Deep neural network structure design method inspired by optimization algorithm
CN109284820A (en) * 2018-10-26 2019-01-29 北京图森未来科技有限公司 A kind of search structure method and device of deep neural network


Cited By (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113065426A (en) * 2021-03-19 2021-07-02 浙江理工大学 Gesture image feature fusion method based on channel perception
CN113065426B (en) * 2021-03-19 2023-10-17 浙江理工大学 Gesture image feature fusion method based on channel perception
CN112990053B (en) * 2021-03-29 2023-07-25 腾讯科技(深圳)有限公司 Image processing method, device, equipment and storage medium
CN113033422A (en) * 2021-03-29 2021-06-25 中科万勋智能科技(苏州)有限公司 Face detection method, system, equipment and storage medium based on edge calculation
CN112990053A (en) * 2021-03-29 2021-06-18 腾讯科技(深圳)有限公司 Image processing method, device, equipment and storage medium
CN113869395A (en) * 2021-09-26 2021-12-31 大连理工大学 Light-weight underwater target detection method based on feature fusion and neural network search
CN113869395B (en) * 2021-09-26 2024-05-24 大连理工大学 Lightweight underwater target detection method based on feature fusion and neural network search
CN114998958A (en) * 2022-05-11 2022-09-02 华南理工大学 Face recognition method based on lightweight convolutional neural network
CN114998958B (en) * 2022-05-11 2024-04-16 华南理工大学 Face recognition method based on lightweight convolutional neural network
CN115131727B (en) * 2022-06-12 2024-03-15 西北工业大学 Pedestrian re-identification method based on residual unit structure search
CN115131727A (en) * 2022-06-12 2022-09-30 西北工业大学 Pedestrian re-identification method based on residual error unit structure search
CN115841587A (en) * 2022-10-24 2023-03-24 智慧眼科技股份有限公司 Feature extraction method, device and equipment for image classification task and storage medium
CN115841587B (en) * 2022-10-24 2023-11-24 智慧眼科技股份有限公司 Feature extraction method, device, equipment and storage medium for image classification task
CN116416468A (en) * 2023-04-11 2023-07-11 安徽中科星联信息技术有限公司 SAR target detection method based on neural architecture search
CN116416468B (en) * 2023-04-11 2023-10-03 安徽中科星联信息技术有限公司 SAR target detection method based on neural architecture search
CN116703729A (en) * 2023-08-09 2023-09-05 荣耀终端有限公司 Image processing method, terminal, storage medium and program product
CN116703729B (en) * 2023-08-09 2023-12-19 荣耀终端有限公司 Image processing method, terminal, storage medium and program product

Also Published As

Publication number Publication date
CN112308200B (en) 2024-04-26
CN112308200A (en) 2021-02-02

Similar Documents

Publication Publication Date Title
WO2021018163A1 (en) Neural network search method and apparatus
CN110188795B (en) Image classification method, data processing method and device
WO2020177651A1 (en) Image segmentation method and image processing device
US20220215227A1 (en) Neural Architecture Search Method, Image Processing Method And Apparatus, And Storage Medium
WO2021164731A1 (en) Image enhancement method and image enhancement apparatus
WO2021043273A1 (en) Image enhancement method and apparatus
WO2022042713A1 (en) Deep learning training method and apparatus for use in computing device
WO2022116856A1 (en) Model structure, model training method, and image enhancement method and device
CN111402130B (en) Data processing method and data processing device
EP3923233A1 (en) Image denoising method and apparatus
CN112236779A (en) Image processing method and image processing device based on convolutional neural network
WO2022001805A1 (en) Neural network distillation method and device
WO2021008206A1 (en) Neural architecture search method, and image processing method and device
US20220157041A1 (en) Image classification method and apparatus
CN111914997B (en) Method for training neural network, image processing method and device
CN112070664B (en) Image processing method and device
CN110222718B (en) Image processing method and device
CN113011562A (en) Model training method and device
WO2022021938A1 (en) Image processing method and device, and neutral network training method and device
WO2021018251A1 (en) Image classification method and device
WO2021103731A1 (en) Semantic segmentation method, and model training method and apparatus
CN113191489B (en) Training method of binary neural network model, image processing method and device
CN112561028A (en) Method for training neural network model, and method and device for data processing
WO2024002211A1 (en) Image processing method and related apparatus
CN113066018A (en) Image enhancement method and related device

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20848622

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20848622

Country of ref document: EP

Kind code of ref document: A1