WO2023272431A1 - Image processing method and device - Google Patents

Image processing method and device

Info

Publication number
WO2023272431A1
Authority
WO
WIPO (PCT)
Prior art keywords
image processing
module
task model
image
visual task
Prior art date
Application number
PCT/CN2021/102739
Other languages
English (en)
French (fr)
Inventor
伍玮翔
伍文龙
Original Assignee
华为技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司
Priority to CN202180099442.4A (CN117529725A)
Priority to PCT/CN2021/102739 (WO2023272431A1)
Publication of WO2023272431A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis

Definitions

  • the present application relates to the field of computer vision, and more specifically, to an image processing method and device.
  • Computer vision is an integral part of various intelligent/autonomous systems in application fields such as manufacturing, inspection, document analysis, medical diagnosis, and military. It is the study of how to use cameras/video cameras and computers to obtain the data and information we need about the subject being photographed. Figuratively speaking, it means installing eyes (cameras/video cameras) and a brain (algorithms) on a computer so that it can identify, track, and measure targets in place of human eyes and thereby perceive the environment.
  • Computer vision can be seen as the science of how to make artificial systems "perceive" from images or multidimensional data. In general, computer vision is to use various imaging systems to replace the visual organs to obtain input information, and then use the computer to replace the brain to complete the processing and interpretation of these input information.
  • Computer vision tasks include tasks such as image classification, object detection, object tracking, and object segmentation.
  • a series of image signal processing (ISP) operations
  • This visualized image can be used as an input image for computer vision tasks.
  • the purpose of ISP is usually to meet human visual needs.
  • the image obtained after a series of image signal processing can meet human visual needs, but performing visual tasks based on the image may not necessarily obtain ideal processing results.
  • the present application provides an image processing method and device, which can obtain an image processing flow suitable for a visual task and improve the performance of a visual task model.
  • an image processing method comprising: acquiring a first image; processing the first image through at least one image processing module to obtain a second image; inputting the second image into a visual task model for processing; and adjusting the at least one image processing module according to the processing result of the visual task model.
  • the image processing flow is adjusted according to the processing result of the visual task model, which is beneficial to obtain an image suitable for the visual task, so as to ensure the performance of the visual task model.
  • the solutions of the embodiments of the present application can adjust the image processing flow according to the requirements of different application scenarios, so as to adapt to different application scenarios.
  • the first image may be a raw image acquired by a sensor.
  • the image processing module is used for image signal processing on the input image.
  • the second image may be an RGB image.
  • processing the first image through at least one image processing module to obtain the second image includes: processing the first image through the at least one image processing module and the weight of the at least one image processing module to obtain the second image.
  • the processing result of the at least one image processing module is adjusted according to the weight of the at least one image processing module to obtain the second image.
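  • The weighted processing described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the method claimed in the application: the two toy modules (`black_level`, `gain`), the flattened-list image type, and in particular the blending rule (weight w mixes a module's output with its input) are all hypothetical choices made for the example.

```python
from typing import Callable, List, Sequence

Image = List[float]                 # flattened pixel values in [0, 1]
Module = Callable[[Image], Image]   # an image processing module


def black_level(img: Image) -> Image:
    """Hypothetical module: subtract a black level of 0.1 and clamp at 0."""
    return [max(p - 0.1, 0.0) for p in img]


def gain(img: Image) -> Image:
    """Hypothetical module: apply a 2x gain and clamp at 1."""
    return [min(p * 2.0, 1.0) for p in img]


def run_weighted_pipeline(img: Image,
                          modules: Sequence[Module],
                          weights: Sequence[float]) -> Image:
    """Run the modules in order, blending each output with its input.

    Assumption: weight w mixes the module output with the module input,
    so w = 1 applies the module fully and w = 0 bypasses it.
    """
    if len(modules) != len(weights):
        raise ValueError("need exactly one weight per module")
    out = list(img)
    for module, w in zip(modules, weights):
        processed = module(out)
        out = [w * p + (1.0 - w) * q for p, q in zip(processed, out)]
    return out


# "First image" -> "second image" with the gain module effectively bypassed.
first_image = [0.5, 0.2]
second_image = run_weighted_pipeline(first_image, [black_level, gain], [1.0, 0.0])
```

  • With this blending rule, a weight near zero means the module barely influences the second image, which is one way a small weight could mark a module as a candidate for deletion.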
  • the vision task includes: target detection, image classification, target segmentation, target tracking, or image recognition.
  • the visual task model is used to perform visual tasks. For example, if the vision task is target detection, then the vision task model is the target detection model. For another example, if the visual task is image recognition, then the visual task model is an image recognition model.
  • the vision task model can be a trained model.
  • the processing results of the vision task model may include performance indicators of the vision task model.
  • the performance index of the vision task model includes the accuracy of reasoning or the value of the loss function.
  • the loss function can be set as needed.
  • the loss function is used to indicate the difference between the inference result of the vision task model and the true value corresponding to the first image. It should be noted that the loss function here may be the loss function in the training process of the vision task model, or other forms of loss functions may also be used.
  • the processing result of the vision task model may include detection accuracy.
  • the processing result of the vision task model may include segmentation accuracy.
  • the visual task model may use a neural network model, or may also use a non-neural network model.
  • the at least one image processing module is adjusted according to the processing result of the visual task model, so that the processing result of the visual task model is as close to expectation as possible.
  • the at least one image adjustment module may be adjusted by means of a Bayesian optimization method, an RNN model, or a reinforcement learning algorithm.
  • adjusting at least one image processing module according to the processing result of the visual task model includes: adjusting the at least one image processing module according to the image processing time and the processing result of the visual task model.
  • the image processing time may be the processing time of the visual task model, the processing time of the at least one image processing module, or the sum of the processing time of the visual task model and the processing time of the at least one image processing module.
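  • One way to take both the processing time and the processing result into account is a single score that rewards accuracy and penalizes time. The sketch below is an assumption made for illustration: the linear form, the `time_penalty` coefficient, and the example numbers are hypothetical and do not come from the application.

```python
def total_processing_time(isp_time_s: float, model_time_s: float) -> float:
    """Image processing time taken as the sum of both stages (one of the options above)."""
    return isp_time_s + model_time_s


def combined_score(accuracy: float, isp_time_s: float, model_time_s: float,
                   time_penalty: float = 0.5) -> float:
    """Hypothetical objective: higher accuracy is better, longer time is worse."""
    return accuracy - time_penalty * total_processing_time(isp_time_s, model_time_s)


# Made-up numbers: a full pipeline versus a shorter one that loses a little accuracy.
full = combined_score(accuracy=0.90, isp_time_s=0.30, model_time_s=0.10)
pruned = combined_score(accuracy=0.88, isp_time_s=0.10, model_time_s=0.10)
prefer_pruned = pruned > full
```

  • Under this particular penalty the shorter pipeline scores higher, which illustrates how latency can be traded against a small accuracy loss.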
  • the processing speed can be improved and the time delay can be reduced under the premise of ensuring the performance of the visual task model.
  • At least one image processing module includes multiple image processing modules, and adjusting at least one image processing module according to the processing results of the visual task model includes: changing the at least one image processing module.
  • Changing the at least one image processing module may include: deleting some image processing modules in the at least one image processing module or/and adding other image processing modules.
  • the combination of image processing modules is changed according to the processing results of the visual task model, so that a combination of image processing modules more suitable for the visual task model can be obtained, which is conducive to improving the performance of the visual task model.
  • At least one image processing module includes multiple image processing modules, and adjusting at least one image processing module according to the processing results of the visual task model includes: deleting some of the multiple image processing modules according to the processing result of the visual task model.
  • some image processing modules are deleted according to the processing results of the visual task model, which can reduce the time required for image processing, increase the processing speed, and reduce the requirement for computing power.
  • deleting some of the multiple image processing modules according to the processing result of the visual task model includes: adjusting the weights of the multiple image processing modules according to the processing result of the visual task model, where the weights of the multiple image processing modules are used to process the processing results of the multiple image processing modules to obtain the second image; and deleting some of the multiple image processing modules according to the adjusted weights of the multiple image processing modules.
  • the image processing module to be deleted is determined according to the weight of each image processing module, and the image processing modules with relatively small weights are deleted. Such modules have little influence on the processing result of the visual task model, so the performance of the visual task model is less affected after the deletion. That is to say, the solutions of the embodiments of the present application can reduce unnecessary operations, reduce computing overhead, and improve processing speed on the premise of ensuring the performance of the visual task model.
  • the multiple image processing modules are m image processing modules.
  • m is an integer greater than 1.
  • the n image processing modules with the smallest adjusted weights are deleted from the m image processing modules.
  • n is an integer greater than 1 and less than m.
  • an image processing module whose adjusted weight is less than or equal to a weight threshold is deleted from the m image processing modules.
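  • The two deletion rules above (delete the n modules with the smallest adjusted weights, or delete every module whose adjusted weight is at or below a threshold) can be sketched as follows. The module names and weight values are made up for illustration, and the function names are hypothetical.

```python
from typing import Dict, List


def prune_smallest(weights: Dict[str, float], n: int) -> List[str]:
    """Delete the n modules with the smallest adjusted weights; return the kept names."""
    if not 0 < n < len(weights):
        raise ValueError("n must be positive and smaller than the module count m")
    doomed = set(sorted(weights, key=weights.get)[:n])
    return [name for name in weights if name not in doomed]


def prune_below_threshold(weights: Dict[str, float], threshold: float) -> List[str]:
    """Delete every module whose adjusted weight is <= threshold; return the kept names."""
    return [name for name, w in weights.items() if w > threshold]


# Made-up adjusted weights for m = 4 modules, listed in processing order.
adjusted = {"black_level": 0.9, "green_balance": 0.05, "demosaic": 0.8, "sharpen": 0.1}
kept_top = prune_smallest(adjusted, n=2)
kept_thr = prune_below_threshold(adjusted, threshold=0.1)
```

  • Both helpers keep the surviving modules in their original order, so the remaining pipeline can be run without re-sorting.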
  • adjusting at least one image processing module according to a processing result of the visual task model includes: adjusting parameters in at least one image processing module according to a processing result of the visual task model.
  • adjusting at least one image processing module according to the processing result of the visual task model includes: deleting some of the multiple image processing modules according to the processing result of the visual task model; processing a fifth image through the undeleted image processing modules among the multiple image processing modules to obtain a sixth image, and inputting the sixth image into the visual task model for processing; and adjusting the parameters of the undeleted image processing modules according to the processing result of the visual task model.
  • the performance indicators obtained by the visual task model are used to adjust the weights of multiple image processing modules, so as to retain the image processing modules that have a relatively large impact on the performance indicators of the visual task model, or in other words, the image processing modules that maintain or improve the performance indicators of the visual task model.
  • an image processing module suitable for the visual task model can be obtained, or in other words, the image processing module required by the visual task model can be obtained, which reduces the time required for image processing, saves computing overhead, reduces the demand for computing power, and is more hardware-friendly.
  • using the performance index obtained from the visual task model to adjust the parameters in the retained image processing modules, for example, using the performance index to search the design space of the image processing modules, is conducive to obtaining the optimal parameter configuration of each image processing module, thereby improving the performance of the vision task model.
  • adjusting the at least one image processing module according to the processing result of the visual task model includes: adjusting the processing sequence of the at least one image processing module according to the processing result of the visual task model.
  • the processing sequence of the image processing module is adjusted according to the processing result of the visual task model, so that an image processing flow more suitable for the visual task can be obtained, which is conducive to improving the accuracy of the visual task.
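  • Adjusting the processing sequence can be illustrated with a brute-force search over module orders. This is only a sketch: exhaustive search is a stand-in chosen for brevity (the number of orders grows factorially with the module count), the modules operate on a single number rather than an image, and `toy_score` is a hypothetical placeholder for the performance index returned by the visual task model.

```python
from itertools import permutations
from typing import Callable, Sequence, Tuple

Module = Callable[[float], float]


def run_in_order(value: float, order: Sequence[Module]) -> float:
    """Apply the modules one after another in the given order."""
    for module in order:
        value = module(value)
    return value


def best_order(modules: Sequence[Module],
               evaluate: Callable[[Tuple[Module, ...]], float]) -> Tuple[Module, ...]:
    """Try every processing order and keep the one with the highest score."""
    return max(permutations(modules), key=evaluate)


def add_offset(v: float) -> float:
    """Hypothetical module: add a constant offset."""
    return v + 1.0


def apply_gain(v: float) -> float:
    """Hypothetical module: multiply by a constant gain."""
    return v * 2.0


def toy_score(order: Tuple[Module, ...]) -> float:
    """Stand-in for the task model's performance index: closeness to a target of 4."""
    return -abs(run_in_order(1.0, order) - 4.0)


chosen = best_order([add_offset, apply_gain], toy_score)
chosen_names = [m.__name__ for m in chosen]
```

  • Because the two toy modules do not commute, the order changes the output, and only one order reaches the target value.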
  • At least one image processing module includes: a black level compensation module, a green balance module, a bad pixel correction module, a demosaic module, a Bayer noise reduction module, an automatic white balance module, color correction module, gamma correction module or noise reduction and sharpening module.
  • Any image processing module in the at least one image processing module may be implemented by a neural network algorithm, or may also be implemented by a non-neural network algorithm.
  • an image processing method comprising: acquiring a third image; determining at least one target image processing module according to a visual task model; processing the third image by the at least one target image processing module to obtain a fourth image; and processing the fourth image through the visual task model to obtain a processing result of the fourth image.
  • different visual task models correspond to different image processing module configurations.
  • the image processing module can adaptively match the visual task model, making the image processing flow more suitable for the visual task model, which is beneficial to improve the performance of the vision task model.
  • the third image may be a raw image acquired by the sensor.
  • the processing result of the fourth image can also be understood as the processing result of the third image.
  • the processing result of the fourth image is the reasoning result of the visual task model.
  • the at least one target image processing module is one or more image processing modules corresponding to the visual task model.
  • the vision task includes: target detection, image classification, target segmentation, target tracking, or image recognition.
  • the visual task model is used to perform visual tasks. For example, if the vision task is target detection, then the vision task model is the target detection model. For another example, if the visual task is image recognition, then the visual task model is an image recognition model.
  • the vision task model can be a trained model.
  • different visual task models may be used, and accordingly, at least one target image processing module matching the visual task model may be determined according to different visual task models. In this way, different image processing modules can be selected according to different application scenarios.
  • the configuration of the image processing module matching the current visual task model can be determined.
  • the configuration of the image processing modules includes at least one of the following: a combination of image processing modules, a weight of the image processing modules, a processing order of the image processing modules, or parameters in the image processing modules.
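  • The correspondence between a visual task model and its image processing module configuration can be pictured as a lookup table. The sketch below is an assumption made for illustration: the `IspConfig` structure, the table keys, and every stored value are hypothetical; only the four configuration elements (combination, weights, processing order, parameters) come from the text.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class IspConfig:
    """Configuration of the image processing modules for one visual task model."""
    modules: List[str]                       # combination, listed in processing order
    weights: Dict[str, float]                # weight of each module
    params: Dict[str, Dict[str, float]] = field(default_factory=dict)  # per-module parameters


# Hypothetical correspondence between visual task models and configurations.
CONFIG_BY_TASK_MODEL: Dict[str, IspConfig] = {
    "detector": IspConfig(
        modules=["black_level", "demosaic", "gamma"],
        weights={"black_level": 1.0, "demosaic": 1.0, "gamma": 0.6},
        params={"gamma": {"exponent": 0.45}},
    ),
    "classifier": IspConfig(
        modules=["demosaic", "white_balance"],
        weights={"demosaic": 1.0, "white_balance": 0.8},
    ),
}


def select_config(task_model_name: str) -> IspConfig:
    """Return the stored configuration matching the current visual task model."""
    try:
        return CONFIG_BY_TASK_MODEL[task_model_name]
    except KeyError:
        raise ValueError(f"no ISP configuration stored for {task_model_name!r}") from None


detector_cfg = select_config("detector")
```

  • Switching the visual task model then only requires a different key; the image processing flow itself is rebuilt from the returned configuration.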
  • determining at least one target image processing module according to the visual task model includes: determining at least one target image processing module from multiple candidate image processing modules according to the visual task model.
  • different visual task models correspond to different combinations of image processing modules.
  • the combination of image processing modules can adaptively match the visual task model, so that the current combination of image processing modules is more suitable for the current visual task model, which is beneficial to improve the performance of the visual task model.
  • the combination of image processing modules corresponding to the current visual task model can be determined, or in other words, the image processing modules required for the visual task model can be determined according to the corresponding relationship, that is, the at least one target image processing module.
  • determining at least one target image processing module according to the visual task model includes: determining the weight of the at least one target image processing module according to the visual task model, where the weight of the at least one target image processing module is used to process the processing result of the at least one target image processing module to obtain the fourth image.
  • different visual task models correspond to the weights of different image processing modules.
  • the weight of the image processing module can adaptively match the visual task model, so that the weight of the current image processing module is more suitable for the current visual task model, which is beneficial to improve the performance of the visual task model.
  • determining at least one target image processing module according to the visual task model includes: determining parameters in the at least one target image processing module according to the visual task model.
  • different visual task models correspond to different parameters in the image processing module.
  • the parameters in the image processing module can adaptively match the visual task model, so that the parameters in the current image processing module are more suitable for the current vision task model, which is beneficial to improve the performance of the vision task model.
  • parameters in the image processing module corresponding to the visual task model can be determined, that is, parameters in the at least one target image processing module.
  • determining at least one target image processing module according to the visual task model includes: determining a processing order of the at least one target image processing module according to the visual task model.
  • different visual task models correspond to different processing sequences of image processing modules.
  • the processing sequence of the image processing modules can adaptively match the visual task model, so that the processing order of the current image processing modules is more suitable for the current vision task model, which is beneficial to improve the performance of the vision task model.
  • the processing order of the image processing modules corresponding to the current visual task model can be determined, that is, the processing order of the at least one target image processing module.
  • At least one target image processing module includes: a black level compensation module, a green balance module, a dead pixel correction module, a demosaic module, a Bayer noise reduction module, an automatic white balance module, a color correction module, a gamma correction module, or a noise reduction and sharpening module.
  • an image processing apparatus includes a module or unit for executing the method in the first aspect or any one of the implementation manners of the first aspect.
  • an image processing device includes a module or unit for executing the method in the second aspect or any one of the implementation manners of the second aspect.
  • an image processing device comprising: a memory for storing a program; and a processor for executing the program stored in the memory, where, when the program stored in the memory is executed, the processor is configured to execute the method in the first aspect or any one of the implementation manners of the first aspect.
  • the processor in the fifth aspect above can be a central processing unit (CPU), or a combination of a CPU and a neural network computing processor, where the neural network computing processor can include a graphics processing unit (GPU), a neural-network processing unit (NPU), a tensor processing unit (TPU), and so on.
  • TPU is an artificial intelligence accelerator ASIC fully customized by Google for machine learning.
  • an image processing device, which includes: a memory for storing programs; and a processor for executing the programs stored in the memory, where, when the programs stored in the memory are executed, the processor is configured to execute the method in the second aspect or any one of the implementation manners of the second aspect.
  • the processor in the sixth aspect above can be a central processing unit, or a combination of a CPU and a neural network computing processor, where the neural network computing processor can include a graphics processing unit, a neural-network processing unit, a tensor processing unit, and so on.
  • TPU is Google's fully customized artificial intelligence accelerator ASIC for machine learning.
  • a computer-readable storage medium, where the computer-readable medium stores program code for execution by a device, and the program code is used to execute the method in any one of the implementation manners of the first aspect or the second aspect.
  • a computer program product containing instructions is provided, and when the computer program product is run on a computer, the computer is made to execute the method in any one of the above-mentioned first aspect or the second aspect.
  • the chip includes a processor and a data interface, and the processor reads the instructions stored in the memory through the data interface and executes the method in any one of the implementation manners of the first aspect or the second aspect.
  • the chip may further include a memory, the memory stores instructions, the processor is configured to execute the instructions stored in the memory, and when the instructions are executed, the processor is configured to execute the method in any one of the implementation manners of the first aspect or the second aspect.
  • the aforementioned chip may specifically be a field-programmable gate array (FPGA) or an application-specific integrated circuit (ASIC).
  • FIG. 1 is a schematic structural diagram of a system architecture provided in an embodiment of the present application.
  • FIG. 2 is a schematic diagram of an image processing flow provided by an embodiment of the present application.
  • FIG. 3 is a schematic flowchart of an image processing method provided in an embodiment of the present application.
  • FIG. 4 is a schematic diagram of another image processing flow provided by the embodiment of the present application.
  • FIG. 5 is a schematic diagram of another image processing flow provided by the embodiment of the present application.
  • FIG. 6 is a schematic flowchart of another image processing method provided by the embodiment of the present application.
  • FIG. 7 is a schematic block diagram of an image processing device provided by an embodiment of the present application.
  • FIG. 8 is a schematic block diagram of another image processing apparatus provided by an embodiment of the present application.
  • the embodiments of the present application can be applied in fields such as automatic driving, image classification, image retrieval, image semantic segmentation, image quality enhancement, image super-resolution, monitoring, object tracking, object detection, etc. that need to perform visual tasks.
  • the method in the embodiment of the present application can be applied in picture classification and monitoring scenarios, and the following two application scenarios are briefly introduced respectively.
  • When a user stores a large number of pictures on a terminal device (for example, a mobile phone) or a cloud disk, identifying the images in the album makes it convenient for the user or the system to classify and manage the album, thereby improving user experience.
  • an image suitable for performing a classification task can be obtained, and the accuracy of classification can be improved.
  • it can shorten the image processing flow, reduce hardware overhead, be friendlier to terminal devices, and speed up picture classification, which helps label pictures of different categories in real time and makes them easier for users to view and find.
  • the classification tags of these pictures can also be provided to the album management system for classification management, which saves management time for users, improves the efficiency of album management, and improves user experience.
  • Surveillance scenarios include: smart city, field surveillance, indoor surveillance, outdoor surveillance, in-vehicle surveillance, etc.
  • multiple attribute recognition is required, such as pedestrian attribute recognition and riding attribute recognition.
  • Deep neural networks play an important role in multiple attribute recognition by virtue of their powerful capabilities.
  • an image suitable for performing an attribute recognition task can be obtained, and the accuracy of recognition can be improved.
  • the image processing flow can be reduced, the hardware overhead can be reduced, and the processing efficiency can be improved, which is conducive to real-time processing of the input road picture and faster recognition of different attribute information in the road picture.
  • a neural network can be composed of neural units, and a neural unit can refer to an operation unit that takes $x_s$ and an intercept 1 as input, and the output of the operation unit can be: $h_{W,b}(x) = f(W^T x) = f\left(\sum_{s=1}^{n} W_s x_s + b\right)$
  • where $s = 1, 2, \ldots, n$, and n is a natural number greater than 1
  • $W_s$ is the weight of $x_s$
  • b is the bias of the neural unit.
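  • The computation of a single neural unit (an activation function applied to the weighted sum of the inputs plus the bias) can be sketched as follows. The function names are hypothetical, and the activation functions shown are common choices rather than ones prescribed by the application.

```python
import math
from typing import Callable, Sequence


def relu(z: float) -> float:
    """Rectified linear unit activation."""
    return max(0.0, z)


def sigmoid(z: float) -> float:
    """Logistic sigmoid activation."""
    return 1.0 / (1.0 + math.exp(-z))


def neural_unit(x: Sequence[float], w: Sequence[float], b: float,
                f: Callable[[float], float] = math.tanh) -> float:
    """Output of one neural unit: f(sum_s w_s * x_s + b)."""
    if len(x) != len(w):
        raise ValueError("x and w must have the same length")
    return f(sum(w_s * x_s for w_s, x_s in zip(w, x)) + b)


# Weighted sum is 1*1 + 1*2 - 1 = 2, which ReLU passes through unchanged.
out_relu = neural_unit([1.0, 2.0], [1.0, 1.0], -1.0, relu)
# Weighted sum is 0, and sigmoid(0) is 0.5.
out_sigmoid = neural_unit([1.0, 2.0], [0.5, -0.25], 0.0, sigmoid)
```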
  • f is the activation function of the neural unit, which is used to introduce nonlinear characteristics into the neural network to transform the input signal in the neural unit into an output signal.
  • the output signal of this activation function can be used as the input of the next layer.
  • the activation function can be a ReLU, tanh or sigmoid function.
  • a neural network is a network formed by connecting multiple above-mentioned single neural units, that is, the output of one neural unit can be the input of another neural unit.
  • the input of each neural unit can be connected with the local receptive field of the previous layer to extract the features of the local receptive field.
  • the local receptive field can be an area composed of several neural units.
  • A deep neural network (DNN), also known as a multi-layer neural network, can be understood as a neural network with multiple hidden layers.
  • According to the positions of the layers, the layers inside a DNN can be divided into three categories: input layer, hidden layer, and output layer.
  • the first layer is the input layer
  • the last layer is the output layer
  • the layers in the middle are all hidden layers.
  • the layers are fully connected, that is, any neuron in the i-th layer must be connected to any neuron in the i+1-th layer.
  • Although a DNN looks complicated, the work of each layer is actually not complicated.
  • In short, each layer computes the following linear relationship expression: $\vec{y} = \alpha(W\vec{x} + \vec{b})$, where $\vec{x}$ is the input vector, $\vec{y}$ is the output vector, $\vec{b}$ is the offset vector, W is the weight matrix (also called coefficients), and $\alpha()$ is the activation function.
  • Each layer simply performs this operation on the input vector $\vec{x}$ to obtain the output vector $\vec{y}$. Because a DNN has many layers, the number of coefficients W and offset vectors $\vec{b}$ is also large.
  • These parameters are defined in a DNN as follows, taking the coefficient W as an example: assume that in a three-layer DNN, the linear coefficient from the fourth neuron of the second layer to the second neuron of the third layer is defined as $W^3_{24}$. The superscript 3 represents the layer number of the coefficient W, and the subscript corresponds to the output third-layer index 2 and the input second-layer index 4.
  • In summary, the coefficient from the k-th neuron of layer L-1 to the j-th neuron of layer L is defined as $W^L_{jk}$.
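  • A single fully connected layer, in which each output neuron applies an activation function to a weighted sum of all inputs plus an offset, can be sketched as follows. The nested-list weight layout is a choice made for this example: `W[j][k]` holds the coefficient from input neuron k to output neuron j (zero-based), mirroring the output-index-first convention described above. ReLU is used only as an example activation.

```python
from typing import Callable, List, Sequence


def relu(z: float) -> float:
    """Rectified linear unit, used here as the example activation."""
    return max(0.0, z)


def dense_layer(W: Sequence[Sequence[float]], b: Sequence[float],
                x: Sequence[float],
                activation: Callable[[float], float] = relu) -> List[float]:
    """Compute y_j = activation(sum_k W[j][k] * x[k] + b[j]) for every output j."""
    if len(W) != len(b) or any(len(row) != len(x) for row in W):
        raise ValueError("shape mismatch between W, b and x")
    return [activation(sum(w_jk * x_k for w_jk, x_k in zip(row, x)) + b_j)
            for row, b_j in zip(W, b)]


# A layer with 2 inputs and 3 outputs; the values are made up.
W3 = [[1.0, 0.0],
      [0.0, 2.0],
      [1.0, 1.0]]
b3 = [0.0, 0.0, -1.0]
y = dense_layer(W3, b3, [1.0, 2.0])
```

  • Stacking several such calls, each feeding its output to the next, gives the forward pass of a multi-layer network.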
  • the input layer has no W parameter.
  • more hidden layers make the network more capable of describing complex situations in the real world. Theoretically speaking, a model with more parameters has a higher complexity and a greater "capacity", which means that it can complete more complex learning tasks.
  • Training the deep neural network is the process of learning the weight matrix, and its ultimate goal is to obtain the weight matrix of all layers of the trained deep neural network (the weight matrix formed by the vector W of many layers).
  • Convolutional neural network is a deep neural network with a convolutional structure.
  • the convolutional neural network contains a feature extractor composed of a convolutional layer and a subsampling layer, which can be regarded as a filter.
  • the convolutional layer refers to the neuron layer that performs convolution processing on the input signal in the convolutional neural network.
  • a neuron can only be connected to some adjacent neurons.
  • a convolutional layer usually contains several feature planes, and each feature plane can be composed of some rectangularly arranged neural units. Neural units of the same feature plane share weights, and the shared weights here are convolution kernels.
  • Shared weights can be understood as a way to extract image information that is independent of location.
  • the convolution kernel can be formalized as a matrix of random size, and the convolution kernel can obtain reasonable weights through learning during the training process of the convolutional neural network.
  • the direct benefit of sharing weights is to reduce the connections between the layers of the convolutional neural network, while reducing the risk of overfitting.
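  • Weight sharing can be made concrete with a small convolution in which one kernel is slid over every position of the input, so the same weights are reused everywhere. This is a minimal pure-Python sketch with no padding and a stride of 1; like most deep learning frameworks, it computes a cross-correlation (the kernel is not flipped). The image and kernel values are made up.

```python
from typing import List, Sequence

Matrix = Sequence[Sequence[float]]


def conv2d_valid(image: Matrix, kernel: Matrix) -> List[List[float]]:
    """Slide one shared kernel over the image (no padding, stride 1)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    if out_h <= 0 or out_w <= 0:
        raise ValueError("kernel is larger than the image")
    return [[sum(image[i + di][j + dj] * kernel[di][dj]
                 for di in range(kh) for dj in range(kw))
             for j in range(out_w)]
            for i in range(out_h)]


image = [[1.0, 2.0, 3.0],
         [4.0, 5.0, 6.0],
         [7.0, 8.0, 9.0]]
# The same four weights are applied at all four output positions.
kernel = [[1.0, 0.0],
          [0.0, -1.0]]
feature_map = conv2d_valid(image, kernel)
```

  • The layer has only four learnable weights regardless of image size, whereas a fully connected mapping from 9 inputs to 4 outputs would need 36.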
  • Recurrent neural networks (RNNs) are used to process sequence data.
  • In a traditional neural network model, the layers are fully connected, while the nodes within each layer are unconnected.
  • Although this ordinary neural network solves many problems, it is still powerless against many others. For example, to predict the next word in a sentence, you generally need the previous words, because the words in a sentence are not independent of each other. An RNN is called a recurrent neural network because the current output of a sequence also depends on the previous outputs.
  • RNN can process sequence data of any length.
  • the training of RNN is the same as that of traditional CNN or DNN.
  • the error backpropagation algorithm is also used, with one difference: if the RNN is unrolled over time, its parameters, such as W, are shared across the time steps, which is not the case in the traditional neural networks described above.
  • the output of each step depends not only on the network of the current step but also on the state of the network in the previous several steps. This learning algorithm is called back propagation through time (BPTT).
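  • The parameter sharing of an unrolled RNN can be sketched with a scalar recurrent cell. This is an illustration only: the single-unit cell, the tanh activation, and the parameter names `w_x`, `w_h`, and `b` are assumptions made for the example. It shows the forward pass; training with back propagation through time is not implemented here.

```python
import math
from typing import List, Sequence


def rnn_forward(xs: Sequence[float], w_x: float, w_h: float, b: float,
                h0: float = 0.0) -> List[float]:
    """Unroll a one-unit RNN over the sequence, reusing the same weights at every step."""
    h = h0
    states = []
    for x in xs:
        # The same w_x, w_h and b are used at every time step (shared parameters).
        h = math.tanh(w_x * x + w_h * h + b)
        states.append(h)
    return states


# The two inputs are identical, but the second state differs because it
# also depends on the first state through w_h.
states = rnn_forward([1.0, 1.0], w_x=1.0, w_h=1.0, b=0.0)
```

  • The loop runs for however many elements the sequence has, which is why the same cell can process sequences of any length.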
  • the loss function (or objective function)
  • the training of the deep neural network becomes a process of reducing the loss as much as possible.
  • the smaller the loss the higher the training quality of the deep neural network, and the larger the loss, the lower the training quality of the deep neural network.
  • the smaller the loss fluctuation the more stable the training; the larger the loss fluctuation, the more unstable the training.
  • the embodiment of the present application provides a system architecture 100.
  • the data collection device 170 is used to collect training data.
  • the training data may include training images and ground truths corresponding to the training images.
  • the vision task is an image classification task
  • the ground truth value corresponding to the training image may be the classification result corresponding to the training image
  • the classification result of the training image may be the result of manual pre-labeling.
  • after collecting the training data, the data collection device 170 stores the training data in the database 130, and the training device 120 obtains the target model/rule 101 based on the training data maintained in the database 130.
  • the target model/rule 101 is the model used by the vision task. For example, if the vision task is an image classification task, the target model/rule 101 may be a network model for image classification.
  • the training device 120 obtains the target model/rule 101 based on the training data.
  • the training device 120 processes the input raw data and compares the output value with the target value until the difference between the value output by the training device 120 and the target value is less than a certain threshold, thus completing the training of the target model/rule 101.
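  • The stopping rule above can be sketched as a loop that trains until the gap between outputs and target values falls below a threshold. The one-parameter model y = w · x, the learning rate and the function name are assumptions chosen for brevity, not the training procedure of the application.

```python
def train_until_threshold(samples, lr=0.1, threshold=1e-3, max_steps=10_000):
    """Fit y = w * x by gradient descent until the mean absolute difference
    between outputs and target values drops below `threshold`."""
    w = 0.0
    for step in range(max_steps):
        errors = [w * x - y for x, y in samples]
        mean_abs_error = sum(abs(e) for e in errors) / len(samples)
        if mean_abs_error < threshold:
            return w, step
        # Gradient of the mean squared error with respect to w.
        grad = sum(2 * e * x for e, (x, _) in zip(errors, samples)) / len(samples)
        w -= lr * grad
    return w, max_steps
```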
  • the target model/rule 101 in the embodiment of the present application may specifically be a neural network model.
  • for example, a convolutional neural network or a residual network.
  • the training data maintained in the database 130 may not all be collected by the data collection device 170, but may also be received from other devices.
  • the training device 120 does not necessarily train the target model/rule 101 entirely on the training data maintained in the database 130; it may also obtain training data from the cloud or elsewhere for model training. The above description should not be construed as a limitation on the embodiments of the present application.
  • the target model/rule 101 trained by the training device 120 can be applied to different systems or devices, such as the execution device 110 shown in the figure. The execution device 110 may be, for example, a laptop, an augmented reality (AR)/virtual reality (VR) device, or a vehicle terminal, and may also be a server, a cloud, etc.
  • the execution device 110 is configured with an input/output (I/O) interface 112 for data interaction with external devices, and the user can input data to the I/O interface 112 through the client device 140.
  • the input data may include: data to be processed input by the client device.
  • the input data may include a raw image in this embodiment of the application.
  • the preprocessing module 113 is used to perform preprocessing according to the input image received by the I/O interface 112. In the embodiment of the present application, the preprocessing module 113 may be used to perform a series of image signal processing on the input image.
  • the preprocessing module 113 may include one or more image processing modules.
  • when the execution device 110 preprocesses the input data, or when the calculation module 111 of the execution device 110 performs calculation and other related processing, the execution device 110 can call data, code, etc. in the data storage system 150 for the corresponding processing, and can also store the data and instructions obtained from that processing in the data storage system 150.
  • the I/O interface 112 returns the processing result, such as the processing result of the data obtained above, to the client device 140, thereby providing it to the user.
  • the training device 120 can generate corresponding target models/rules 101 from different training data for different goals or tasks, and these target models/rules 101 can be used to achieve the goals or complete the tasks, thereby providing the user with the desired result.
  • the user can manually specify the input data, and this can be done through the interface provided by the I/O interface 112.
  • the client device 140 can automatically send the input data to the I/O interface 112. If automatically sending the input data requires the user's authorization, the user can set the corresponding permission in the client device 140.
  • the user can view the results output by the execution device 110 on the client device 140, and the specific presentation form may be specific ways such as display, sound, and action.
  • the client device 140 can also serve as a data collection terminal, collecting the input data fed to the I/O interface 112 and the results output by the I/O interface 112, as shown in the figure, as new sample data and storing them in the database 130.
  • alternatively, the collection may bypass the client device 140: the I/O interface 112 directly stores the input data fed to the I/O interface 112 and the results output by the I/O interface 112, as shown in the figure, in the database 130 as new sample data.
  • FIG. 1 is only a schematic diagram of a system architecture provided by the embodiment of the present application, and the positional relationship between the devices, components, modules, etc. shown in the figure does not constitute any limitation.
  • the data storage system 150 is an external memory relative to the execution device 110; in other cases, the data storage system 150 may also be placed in the execution device 110.
  • the target model/rule 101 is obtained according to the training device 120.
  • the target model/rule 101 in the embodiment of the present application may be the neural network model in the present application; specifically, the neural network model in the embodiment of the present application may be a CNN, a residual network, etc.
  • the image signal processor outputs a visualized image after a series of processing on the raw image acquired by the sensor. These images can be used as input images for vision tasks. Specifically, a neural network algorithm or a non-neural network algorithm may be used to process an input image in a visual task to obtain relevant results of the visual task.
  • FIG. 2 shows a schematic diagram of an overall processing flow of a vision task.
  • the raw image is used as the input image, a series of image signal processing operations is performed on it, and an 8-bit visualized red green blue (RGB) image is output.
  • the RGB image is used as the input image of the vision task, and the processing result of the vision task is obtained.
  • the image signal processing module includes a black level compensation module, a green balance module, a bad pixel correction module, a demosaic module, a Bayer denoise module, an auto white balance module, a color correction module, a gamma correction module, a denoise and sharpness module, etc.
  • the image signal processing module can adopt non-neural network algorithm or neural network algorithm.
  • the input images of vision tasks are usually RGB images after image signal processing.
  • the purpose of traditional image signal processing is usually to meet human visual needs, and the results obtained by performing visual tasks based on the images are not necessarily optimal.
  • the embodiment of the present application provides an image processing method, which adjusts the image processing flow before the vision task according to the processing result of the vision task, so as to obtain an image processing flow that meets requirements.
  • FIG. 3 shows an image processing method 300 provided by an embodiment of the present application.
  • the method shown in FIG. 3 can be executed by a computing device, which can be a cloud service device or a terminal device, such as a computer, server, mobile phone, camera, vehicle, drone, or robot, or a system composed of cloud service devices and terminal devices.
  • the method 300 may be executed by a training device or an inference device; for example, the method 300 may be executed by a processor such as a CPU, a GPU, or an NPU.
  • the processor chip may be located on an FPGA, a chip emulator, or an evaluation board (EVB).
  • the method 300 may be executed by a tuning tool or a calibration tool for the ISP pipeline of a hardware device (e.g., a camera or a video camera).
  • the method 300 includes step S301 to step S304. Step S301 to step S304 will be described in detail below.
  • the first image may be a raw image acquired by a sensor.
  • the training data set includes multiple images, and the first image is any image in the training data set.
  • the method 300 may be executed multiple times based on multiple images in the training data set until the required image processing modules are obtained.
  • the training data set may use an open source data set.
  • the training data set can also be a self-made data set.
  • the training data set may be pre-stored.
  • the training data set may be the training data maintained in the database 130 shown in FIG. 1.
  • the training dataset can also be user-input data.
  • S302. Process the first image by at least one image processing module to obtain a second image.
  • the image processing module is used for image signal processing on the input image.
  • the at least one image processing module may be located on an image signal processor. That is to say, step S302 is executed by the image processing module in the image signal processor.
  • Any image processing module in the at least one image processing module may be implemented by a neural network algorithm, or may also be implemented by a non-neural network algorithm.
  • the embodiment of the present application does not limit the specific implementation manner of the image processing module.
  • the at least one image processing module may include: a black level compensation module, a green balance module, a bad pixel correction module, a demosaic module, a Bayer noise reduction module, an automatic white balance module, a color correction module, a gamma correction module, or a noise reduction and sharpening module.
  • the raw image is used as the first image
  • the at least one image processing module includes 9 image processing modules, which are respectively a black level compensation module, a green balance module, a bad pixel correction module, a demosaic module, Bayer noise reduction module, automatic white balance module, color correction module, gamma correction module, and noise reduction and sharpening module.
  • the nine image processing modules sequentially perform black level compensation, green balance processing, bad pixel correction, demosaicing, Bayer noise reduction, automatic white balance processing, color correction, gamma correction, and noise reduction and sharpening.
  • the black level compensation module, the green balance module, and the bad pixel correction module can be used to process the raw data.
  • the demosaic module and the Bayer noise reduction module can be used to perform demosaic processing.
  • An automatic white balance module, a color correction module, a gamma correction module, and a noise reduction and sharpening module can be used to perform image enhancement processing.
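  • A sequential pipeline of this kind can be sketched as an ordered list of functions, each consuming the output of the previous one. The two toy modules below, their default parameters (a black level of 16, a gamma of 2.2) and the flat list of pixel values are assumptions for illustration only; real ISP modules are considerably more involved.

```python
def black_level_compensation(pixels, black_level=16):
    # Subtract a fixed (assumed) black level and clamp at zero.
    return [max(p - black_level, 0) for p in pixels]


def gamma_correction(pixels, gamma=2.2, max_value=255):
    # Apply a power-law curve to values normalized to [0, 1].
    return [max_value * (p / max_value) ** (1.0 / gamma) for p in pixels]


def run_pipeline(pixels, modules):
    """Apply each module in order; the output of one module feeds the next."""
    for module in modules:
        pixels = module(pixels)
    return pixels
```

  • Representing the pipeline as a plain list makes it straightforward to reweight, remove, or reorder modules later, which is what the adjustment step of the method operates on.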
  • the second image may be an RGB image.
  • the second image may be an 8-bit RGB image. This is only an example, and the type of the second image may also be set according to the input requirements of the visual task model.
  • step S302 includes: processing the first image by using at least one image processing module and the weight of the at least one image processing module to obtain the second image.
  • the processing result of the at least one image processing module is adjusted according to the weight of the at least one image processing module to obtain the second image.
  • the image processing module processes the image input to the module, which may be to adjust the pixel values of all or part of the pixels of the image input to the module, that is, to change the pixel values of all or part of the pixels.
  • the variation of the pixel values of all or some pixels may be adjusted according to the weight of the image processing module.
  • the weight of the image processing module is multiplied by the variation of the pixel value to obtain the adjusted variation of the pixel, and then the output image of the module is obtained. If the weight of the image processing module is 0, it means that the image processing module does not participate in the image processing process.
  • the specific value of the weight can be set as required, for example, the weight can be a real number greater than or equal to 0 and less than or equal to 1.
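  • The weighting rule above (multiply the change in pixel value by the module weight) can be sketched as follows. The helper name `apply_weighted` and the list-of-floats image representation are assumptions.

```python
def apply_weighted(module, pixels, weight):
    """Scale the pixel change produced by `module` by `weight`.

    weight = 1 keeps the module's full effect; weight = 0 returns the input
    unchanged, so the module effectively drops out of the processing flow.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight is assumed to lie in [0, 1]")
    processed = module(pixels)
    # output = input + weight * (processed - input), pixel by pixel
    return [p + weight * (q - p) for p, q in zip(pixels, processed)]
```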
  • the weights of the at least one image processing module can be normalized, that is, the sum of the weights of the at least one image processing module is 1 or close to 1.
  • the weights of the nine image processing modules are w1, w2, w3, w4, w5, w6, w7, w8 and w9, respectively.
  • the value range of the weight is a real number greater than or equal to 0 and less than or equal to 1.
  • the largest possible sum of w1, w2, w3, w4, w5, w6, w7, w8 and w9 is nine.
  • the nine weights can also be normalized so that the sum of the nine weights can be 1.
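  • Normalizing the nine weights so that they sum to 1 can be sketched as a division by their total. The function name is an assumption, and dividing by the sum is only one possible normalization.

```python
def normalize_weights(weights):
    """Rescale non-negative weights so that they sum to 1."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("at least one weight must be positive")
    return [w / total for w in weights]
```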
  • the vision task includes: target detection, image classification, target segmentation, target tracking, or image recognition.
  • the visual task model is used to perform visual tasks. For example, if the vision task is target detection, then the vision task model is the target detection model. For another example, if the visual task is image recognition, then the visual task model is an image recognition model.
  • the vision task model can be a trained model.
  • the type of output of the visual task model is related to the type of visual task.
  • the output of the visual task model is the inference result of the visual task model.
  • the output of the vision task model may be a target frame on the second image and the category of the object in the target frame.
  • the output of the visual task model may be the category of the second image.
  • the processing results of the vision task model may include performance indicators of the vision task model.
  • the performance index of the vision task model includes the accuracy of reasoning or the value of the loss function.
  • the loss function can be set as needed.
  • the loss function is used to indicate the difference between the inference result of the vision task model and the true value corresponding to the first image. It should be noted that the loss function here may be the loss function in the training process of the vision task model, or other forms of loss functions may also be used.
  • the processing result of the vision task model may include detection accuracy.
  • the processing result of the vision task model may include segmentation accuracy.
  • the visual task model may use a neural network model, or may also use a non-neural network model.
  • the neural network model may be an existing neural network model, for example, a residual network.
  • the neural network model may also be a neural network model of other structures constructed by itself. This embodiment of the present application does not limit it.
  • the visual task model used in an overexposed scene may be the same as or different from the one used in an underexposed scene.
  • for example, if the current scene is recognized as overexposed, the first object detection model may be used, and if the current scene is recognized as underexposed, the second object detection model may be used.
  • the first object detection model and the second object detection model are different object detection models.
  • the processing of the visual task model can be executed by the calculation module 111 in FIG. 1 .
  • the vision task model can be deployed on the execution device of the method 300, or can be deployed on other devices. That is to say, the processing of the visual task model can be executed by the executing device of the method 300 or by other devices, and the processing result can be fed back to the executing device of the method 300.
  • the at least one image processing module is adjusted according to the processing result of the visual task model, so that the processing result of the visual task model is as close to expectation as possible.
  • the at least one image processing module is adjusted according to the performance index of the visual task model, so as to improve the performance of the visual task model.
  • the at least one image processing module is adjusted to improve the accuracy of inference of the model.
  • the at least one image processing module is adjusted to reduce the value of the loss function of the visual task model.
  • the method 300 may be executed based on multiple images in the training data set until a preset condition is met. That is to say, in practical applications, the image processing module can be adjusted iteratively based on multiple images.
  • the image processing module used in each iteration is the image processing module obtained after the previous iteration.
  • the preset conditions can be set as required, and examples will be given in Mode 1, Mode 2, Mode 3, and Mode 4 below.
  • the at least one image processing module can also be adjusted according to the image processing time and the processing result of the visual task model.
  • the image processing time may be the processing time of the visual task model, the processing time of the at least one image processing module, or the sum of the processing time of the visual task model and the processing time of the at least one image processing module.
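  • Measuring the summed processing time can be sketched with a wall-clock timer. Here `pipeline` and `model` are hypothetical callables standing in for the image processing modules and the visual task model.

```python
import time


def timed(fn, *args):
    """Return (result, elapsed_seconds) for one call of fn."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


def total_processing_time(pipeline, model, raw_image):
    """Sum of the image processing time and the visual task model time."""
    image, isp_time = timed(pipeline, raw_image)
    output, model_time = timed(model, image)
    return output, isp_time + model_time
```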
  • the processing speed can be improved and the time delay can be reduced under the premise of ensuring the performance of the visual task model.
  • the image processing flow is adjusted according to the processing result of the visual task model, which is beneficial to obtain an image suitable for the visual task, so as to ensure the performance of the visual task model.
  • the solutions of the embodiments of the present application can adjust the image processing flow according to the requirements of different application scenarios, so as to adapt to different application scenarios.
  • the visual task model used in an overexposed scene may be the same as or different from the one used in an underexposed scene.
  • the first object detection model may be used as the visual task model.
  • the second object detection model may be used as the vision task model.
  • the solution of the embodiment of the present application can adjust the image processing flow according to the processing results of the first object detection model and the second object detection model respectively, so as to obtain an image processing flow suitable for the first object detection model and an image suitable for the second object detection model processing flow.
  • Step S304 can be implemented in various ways, and the following four ways (mode 1, mode 2, mode 3 and mode 4) are taken as examples for illustration.
  • the at least one image processing module includes a plurality of image processing modules
  • step S304 includes: adjusting weights of the plurality of image processing modules according to a processing result of the visual task model.
  • the weights of the plurality of image processing modules are adjusted according to the processing results of the visual task model, so as to improve the performance of the visual task model.
  • the method 300 can be executed based on multiple images in the training data set to implement iterative adjustment of the weights of the multiple image processing modules until the preset conditions are met. Stop adjusting the weights of the plurality of image processing modules after the preset condition is met, or stop refreshing the weights of the plurality of image processing modules.
  • the preset condition may be that the weights of the plurality of image processing modules converge.
  • the method 300 is not executed any more, that is, the adjustment of the weights of the plurality of image processing modules is stopped.
  • weight convergence can also be understood as the weight gradient changing only slightly over multiple consecutive executions of the method 300. For example, when the change in the weight gradient obtained after executing the method 300 multiple times is less than or equal to the first threshold, the adjustment of the weights of the multiple image processing modules is stopped.
  • the preset condition may be that the accuracy of the visual task model is greater than or equal to the second threshold.
  • the method 300 is not executed, that is, the adjustment of the weights of the plurality of image processing modules is stopped.
  • the second threshold may be a preset value.
  • the second threshold may be the inference accuracy of the visual task model obtained without setting the weight of the image processing module.
  • the second threshold may be the inference accuracy of the visual task model when no weight is set for the nine image processing modules.
  • the second threshold may be the inference accuracy of the visual task model when the weight of the nine image processing modules is 1.
  • the image is input into the original image processing module for processing, the processed image is input into the visual task model for processing, and the inference accuracy is calculated and used as the second threshold.
  • executing the method 300 means inputting the image into the currently adjusted image processing module for processing, inputting the processed image into the visual task model for processing, calculating the inference accuracy, and comparing it with the second threshold. If the currently obtained inference accuracy is greater than or equal to the second threshold, the method 300 is not executed any more. In this way, processing images with the adjusted image processing module can maintain or improve the performance of the visual task model.
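  • The baseline comparison above can be sketched as follows, assuming a classification task whose accuracy is the fraction of correct predictions. The function names and the use of plain label lists are illustrative assumptions.

```python
def accuracy(predictions, labels):
    """Fraction of predictions that match the ground-truth labels."""
    if len(predictions) != len(labels) or not labels:
        raise ValueError("predictions and labels must be non-empty and of equal length")
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)


def meets_second_threshold(tuned_predictions, baseline_predictions, labels):
    """Use the accuracy obtained with the original modules as the second
    threshold and check whether the adjusted modules reach it."""
    second_threshold = accuracy(baseline_predictions, labels)
    return accuracy(tuned_predictions, labels) >= second_threshold
```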
  • the preset condition may be that the change amount of the loss function value of the visual task model obtained after performing the method 300 for multiple times is less than or equal to the third threshold.
  • the method 300 is not executed any more.
  • the preset condition may be that the number of iterations is greater than or equal to the fourth threshold.
  • the method 300 is not executed any more.
  • the preset condition may be that the accuracy of the visual task model is greater than or equal to the second threshold, and the number of iterations is greater than or equal to the fourth threshold.
  • the preset condition may be that the weights of the plurality of image processing modules converge, and the accuracy of the visual task model is greater than or equal to the second threshold.
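  • The preset conditions listed above can be sketched as small predicates. The threshold names mirror the text, while the function names are assumptions. As a simplification, convergence is measured here on the change in the weights themselves rather than on the weight gradient.

```python
def weights_converged(previous, current, first_threshold):
    """True when no weight moved by more than first_threshold."""
    return max(abs(a - b) for a, b in zip(previous, current)) <= first_threshold


def loss_settled(losses, third_threshold):
    """True when the last two recorded loss values differ by at most
    third_threshold."""
    return len(losses) >= 2 and abs(losses[-1] - losses[-2]) <= third_threshold


def should_stop(accuracy, iteration, second_threshold, fourth_threshold):
    """Combined condition: accuracy target met and iteration count reached."""
    return accuracy >= second_threshold and iteration >= fourth_threshold
```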
  • the weights of the plurality of image processing modules may be adjusted by means of Bayesian optimization method, RNN model, or reinforcement learning algorithm.
  • the Bayesian optimization method is taken as an example to illustrate below.
  • the vision task model is a target detection model
  • the performance index of the vision task model may be mean average precision (mAP).
  • the weights of the multiple image processing modules are adjusted by a Bayesian optimization method to improve the mAP of the object detection model. In other words, the weights of the multiple image processing modules are adjusted with the goal of maximizing the mAP of the target detection model.
  • the mean average precision refers to the mean of the detection accuracies over all target objects.
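  • As a sketch, mAP is commonly computed as the mean of the per-class average precision (AP) values. The per-class AP computation itself is outside this example, and the class names and values used in the test are made up.

```python
def mean_average_precision(ap_per_class):
    """Mean of the per-class average precision values."""
    if not ap_per_class:
        raise ValueError("at least one class is required")
    return sum(ap_per_class.values()) / len(ap_per_class)
```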
  • the detection accuracy of the image is input into the Bayesian optimization model, and the Bayesian optimization model adjusts the weight of each image processing module.
  • the detection accuracy of the image can be preserved in the Bayesian optimization model. That is to say, when other images in the training data set are input into the target detection model, the detection accuracy of other images is obtained.
  • the Bayesian optimization model can adjust the weight of each image processing module according to the detection accuracy of other images and the detection accuracy of previous images.
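  • The overall tuning loop can be sketched as below. Note that this uses plain random search as a stand-in for the Bayesian optimization model, which would instead use the stored history to propose the next weights. `evaluate` is a hypothetical callable that runs the weighted pipeline and the detection model and returns a score such as mAP.

```python
import random


def tune_weights(evaluate, num_modules, iterations=50, seed=0):
    """Random-search stand-in for the optimizer: try weight vectors in
    [0, 1] and keep the one with the highest score."""
    rng = random.Random(seed)
    best_weights = [1.0] * num_modules  # start with every module fully on
    best_score = evaluate(best_weights)
    # Scores of all earlier evaluations are kept, as in the text above.
    history = [(best_weights, best_score)]
    for _ in range(iterations):
        candidate = [rng.random() for _ in range(num_modules)]
        score = evaluate(candidate)
        history.append((candidate, score))
        if score > best_score:
            best_weights, best_score = candidate, score
    return best_weights, best_score, history
```

  • A Bayesian optimizer would replace the `rng.random()` proposal with one chosen from a surrogate model fitted to `history`; the surrounding loop would stay the same.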
  • the training data set in the embodiment of the present application is used to train each image processing module, which may be the same as or different from the training data set of the vision task model.
  • the training data set in the embodiment of the present application may use a verification data set or a test data set of the vision task model.
  • the weight of each image processing module is evaluated according to the processing results of the visual task model and then adjusted: the weights of image processing modules that correlate strongly with the performance of the visual task model are increased, and the weights of those that correlate weakly are reduced. This yields an image processing flow better suited to the vision task and helps improve the performance of the visual task model.
  • step S304 includes: modifying the at least one image processing module according to a processing result of the visual task model.
  • Changing the at least one image processing module may include: deleting some image processing modules in the at least one image processing module or/and adding other image processing modules.
  • step S304 may be to select a combination of image processing modules from multiple candidate image processing modules according to the processing results of the visual task model, and use the combination of image processing modules to replace the at least one image processing module. module.
  • the at least one image processing module can be changed by means of Bayesian optimization method or reinforcement learning algorithm.
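  • Selecting a combination of modules from a set of candidates can be sketched with an exhaustive search, which is only practical for a small number of candidates and is used here in place of Bayesian optimization or reinforcement learning. `evaluate` is a hypothetical scoring callable (for example, the accuracy of the visual task model with that combination).

```python
from itertools import combinations


def best_combination(candidates, evaluate):
    """Score every non-empty combination of candidate modules (keeping their
    original order) and return the best one with its score."""
    best, best_score = None, float("-inf")
    for size in range(1, len(candidates) + 1):
        for combo in combinations(candidates, size):
            score = evaluate(combo)
            if score > best_score:
                best, best_score = combo, score
    return best, best_score
```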
  • the method 300 can be executed based on multiple images in the training data set to realize iterative adjustment of the combination of the multiple image processing modules until the preset conditions are met. Stop adjusting the combination of the multiple image processing modules after the preset condition is met, or stop refreshing the combination of the multiple image processing modules.
  • the preset condition may be that the number of iterations is greater than or equal to the fourth threshold.
  • the execution of the method 300 is stopped, that is, the adjustment of the combination of the image processing modules is stopped.
  • the combination of image processing modules is changed according to the processing results of the visual task model, so that a combination of image processing modules more suitable for the visual task model can be obtained, which is conducive to improving the performance of the visual task model.
  • the at least one image processing module includes a plurality of image processing modules, and step S304 includes: deleting part of the image processing modules from the plurality of image processing modules according to the processing result of the visual task model.
  • the processing result of mode 1 may be used in mode 2.
  • step S304 includes: adjusting the weights of the multiple image processing modules according to the processing results of the visual task model; deleting part of the image processing modules from the multiple image processing modules according to the adjusted weights of the multiple image processing modules .
  • the multiple image processing modules are m image processing modules.
  • m is an integer greater than 1.
  • the n image processing modules with the smallest adjusted weights are deleted from the m image processing modules.
  • n is an integer greater than 1 and less than m.
  • an image processing module whose adjusted weight is less than or equal to a weight threshold is deleted from the m image processing modules.
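  • Both deletion rules above can be sketched on a mapping from module name to adjusted weight. The function names and the dictionary representation are assumptions; the bound on n follows the text.

```python
def drop_smallest(weights, n):
    """Remove the n modules with the smallest adjusted weights and return
    the names of the remaining modules in their original order."""
    if not 1 < n < len(weights):
        raise ValueError("n must be greater than 1 and less than m")
    ranked = sorted(weights, key=weights.get)
    dropped = set(ranked[:n])
    return [name for name in weights if name not in dropped]


def drop_below(weights, weight_threshold):
    """Remove every module whose adjusted weight is less than or equal to
    the weight threshold."""
    return [name for name, w in weights.items() if w > weight_threshold]
```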
  • some image processing modules are deleted according to the processing results of the visual task model, which can reduce the time required for image processing, increase the processing speed, and reduce the requirement for computing power.
  • image processing modules with higher weights have a stronger correlation with the vision task model, or in other words, image processing modules with higher weights have a greater impact on the performance of the vision task model.
  • the image processing modules to be deleted are determined according to the weight of each image processing module, and those with relatively small weights are deleted, so that the deletion has little impact on the processing result and the performance of the visual task model. That is to say, the solutions of the embodiments of the present application can reduce unnecessary operations, reduce computing overhead, and improve processing speed while maintaining the performance of the visual task model.
  • step S304 includes: deleting part of the image processing modules from the plurality of image processing modules according to the processing result of the visual task model and the processing speed of the plurality of image processing modules.
  • some image processing modules are deleted from the plurality of image processing modules according to the adjusted weights of the plurality of image processing modules and the processing speeds of the plurality of image processing modules.
  • an image processing module whose adjusted weight is less than or equal to a weight threshold and whose processing speed is less than or equal to a speed threshold is deleted from the plurality of image processing modules. That is, image processing modules that have slower processing speed and have less impact on the vision task model are deleted. In this way, the speed of image processing can be further increased.
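  • The combined rule can be sketched as a filter over module records. The record fields `name`, `weight` and `speed` (where a higher speed means faster processing) are assumptions about how the measurements might be stored.

```python
def drop_slow_and_unimportant(modules, weight_threshold, speed_threshold):
    """Keep a module unless both its weight and its processing speed are at
    or below their thresholds."""
    return [
        m["name"]
        for m in modules
        if not (m["weight"] <= weight_threshold and m["speed"] <= speed_threshold)
    ]
```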
  • the at least one image processing module includes a plurality of image processing modules, and step S304 includes: adjusting the processing order of the plurality of image processing modules according to the processing results of the visual task model.
  • the processing order of the plurality of image processing modules is adjusted according to the processing results of the visual task model, so as to improve the performance of the visual task model.
  • the processing order of the plurality of image processing modules may be adjusted by means of Bayesian optimization method, RNN model, or reinforcement learning algorithm.
  • the method 300 can be executed based on multiple images in the training data set until the preset conditions are met. Stop adjusting the processing sequence of the multiple image processing modules after the preset condition is satisfied, or stop refreshing the processing sequence of the multiple image processing modules.
  • the preset condition may be that the variation of the processing sequence of the plurality of image processing modules is less than or equal to the fifth threshold.
  • the amount of change in the processing order of the plurality of image processing modules may be the number of image processing modules whose processing order changes after the method 300 is executed.
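  • The amount of change in the processing order can be sketched as the number of positions at which two orderings differ. This assumes both orderings contain the same modules; the function name is illustrative.

```python
def order_change(previous_order, new_order):
    """Count the modules whose position in the processing order changed."""
    if sorted(previous_order) != sorted(new_order):
        raise ValueError("both orders must contain the same modules")
    return sum(1 for a, b in zip(previous_order, new_order) if a != b)
```

  • The adjustment would stop once this count is less than or equal to the fifth threshold.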
  • the preset condition may be that the inference accuracy of the visual task model is greater than or equal to the sixth threshold.
  • the method 300 is not executed again, that is, the adjustment of the processing sequence of the plurality of image processing modules is stopped.
  • the sixth threshold may be a preset value.
  • the sixth threshold may be the inference accuracy of the visual task model without adjusting the processing sequence of the image processing module.
  • the sixth threshold may be the inference accuracy of the visual task model when images are processed according to the processing order of the image processing module shown in FIG. 4 .
  • the image is input into the original image processing modules and processed in their original order, the processed image is input into the visual task model for processing, the inference accuracy is calculated, and this accuracy is used as the sixth threshold.
  • the method 300 is not executed any more. In this way, the images are processed according to the adjusted processing sequence of the image processing module, so that the performance of the visual task model can be guaranteed, or the performance of the visual task model can be improved.
  • the preset condition may be that the change amount of the loss function value of the visual task model obtained after performing the method 300 for multiple times is less than or equal to the third threshold.
  • the method 300 is not executed any more.
  • the preset condition may be that the number of iterations is greater than or equal to the fourth threshold.
  • the method 300 is not executed any more.
  • the above preset conditions may be used in combination.
  • the preset condition may be that the inference accuracy of the visual task model is greater than or equal to the sixth threshold, and the number of iterations is greater than or equal to the fourth threshold.
  • the preset condition may be that the variation of the processing order of the plurality of image processing modules is less than or equal to the fifth threshold, and the accuracy of the visual task model is greater than or equal to the sixth threshold.
  • the processing sequence of the image processing module is adjusted according to the processing result of the visual task model, so that an image processing flow more suitable for the visual task can be obtained, which is conducive to improving the accuracy of the visual task.
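  • the order adjustment and its preset conditions can be sketched as follows. This is a simplified illustration: it uses a random-swap search in place of the Bayesian optimization, RNN or reinforcement learning approaches named above, and `evaluate`, `toy_accuracy` and the module names are hypothetical. The loop stops as soon as the accuracy threshold or the iteration limit is reached, which is only one possible way of combining the conditions.

```python
import random


def order_change(old_order, new_order):
    """Number of positions at which the module differs between two orders."""
    return sum(1 for a, b in zip(old_order, new_order) if a != b)


def search_order(modules, evaluate, accuracy_threshold, max_iterations, seed=0):
    """Random-swap search over processing orders.

    evaluate(order) is a hypothetical callback returning the inference
    accuracy of the visual task model for images processed in that order.
    """
    rng = random.Random(seed)
    best = list(modules)
    best_acc = evaluate(best)
    iterations = 0
    while best_acc < accuracy_threshold and iterations < max_iterations:
        i, j = rng.sample(range(len(best)), 2)
        candidate = list(best)
        candidate[i], candidate[j] = candidate[j], candidate[i]
        acc = evaluate(candidate)
        if acc > best_acc:  # keep the swap only if accuracy improves
            best, best_acc = candidate, acc
        iterations += 1
    return best, best_acc, iterations


# Toy stand-in for the visual task model: accuracy is the fraction of
# modules that sit at the position of a hidden "ideal" order.
target = ["black_level", "demosaic", "white_balance", "gamma"]


def toy_accuracy(order):
    return sum(a == b for a, b in zip(order, target)) / len(target)


best, acc, used = search_order(
    list(reversed(target)), toy_accuracy, accuracy_threshold=1.0, max_iterations=500
)
```

  • `order_change` corresponds to the amount of change in the processing order described above (the number of modules whose position changed) and could be compared against the fifth threshold.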
  • step S304 includes: adjusting parameters in the at least one image processing module according to a processing result of the visual task model.
  • the parameters in the at least one image processing module are adjusted according to the processing results of the visual task model, so as to improve the performance of the visual task model.
  • the parameters in the image processing module are the parameters of the neural network model.
  • the parameters in the at least one image processing module may be adjusted by means of a Bayesian optimization method, an RNN model, a reinforcement learning algorithm, and the like.
  • the input image is processed based on the parameter combination in the current image processing module, and the processed result is input into the vision task model for processing, for example, the vision task is performed by CPU or GPU.
  • the parameter combination in the image processing module is optimized and updated according to the feedback of the performance of the visual task model, that is, the optimal parameter combination in the image processing module is found in the search space, so as to improve the performance of the visual task model.
  • the method 300 can be executed based on multiple images in the training data set until a preset condition is met. Stop adjusting the parameters in the at least one image processing module after the preset condition is met, or stop refreshing the parameters in the at least one image processing module.
  • the preset condition may be that the inference accuracy of the visual task model is greater than or equal to the seventh threshold.
  • the method 300 is not executed again, that is, the adjustment of the parameters in the at least one image processing module is stopped.
  • the seventh threshold may be a preset value.
  • the seventh threshold may be the processing accuracy of the visual task model obtained without adjusting the parameters in the at least one image processing module.
  • the seventh threshold may be the inference accuracy of the vision task model when the nine image processing modules do not adjust parameters.
  • the image is input into the original image processing modules, that is, the image processing modules whose parameters have not been adjusted, the processed image is input into the visual task model for processing, the inference accuracy is calculated, and this accuracy is used as the seventh threshold.
  • the method 300 is not executed any more. In this way, using the adjusted image processing module to process the image can ensure the performance of the visual task model, or can improve the performance of the visual task model.
  • the preset condition may be that the change amount of the loss function value of the visual task model obtained after performing the method 300 for multiple times is less than or equal to the third threshold.
  • the method 300 is not executed any more.
  • the preset condition may be that the number of iterations is greater than or equal to the fourth threshold.
  • the method 300 is not executed any more.
  • the above preset conditions may be used in combination.
  • the preset condition may be that the inference accuracy of the visual task model is greater than or equal to the seventh threshold, and the number of iterations is greater than or equal to the fourth threshold.
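  • the preset conditions above, and their use in combination, can be sketched as a small helper. The `StopCriteria` type and the function name are assumptions for illustration; measuring the loss change between the two most recent runs is also an assumption, since the text only says that the change over multiple executions is compared with the third threshold.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StopCriteria:
    accuracy_threshold: Optional[float] = None     # "seventh threshold"
    loss_change_threshold: Optional[float] = None  # "third threshold"
    max_iterations: Optional[int] = None           # "fourth threshold"


def should_stop(criteria, accuracy, loss_history, iteration):
    """Return True when every configured condition is satisfied."""
    checks = []
    if criteria.accuracy_threshold is not None:
        checks.append(accuracy >= criteria.accuracy_threshold)
    if criteria.loss_change_threshold is not None:
        if len(loss_history) < 2:
            checks.append(False)  # not enough runs to measure a change
        else:
            change = abs(loss_history[-1] - loss_history[-2])
            checks.append(change <= criteria.loss_change_threshold)
    if criteria.max_iterations is not None:
        checks.append(iteration >= criteria.max_iterations)
    # With no condition configured, never stop on this basis.
    return bool(checks) and all(checks)


# Combined condition: accuracy >= 0.9 AND at least 10 iterations.
combined = StopCriteria(accuracy_threshold=0.9, max_iterations=10)
print(should_stop(combined, accuracy=0.92, loss_history=[], iteration=10))
```

  • leaving a field as `None` disables that condition, so the same helper covers each single condition as well as the combined one.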
  • any two or more of the above modes 1, 2, 3 and 4 may be used in combination.
  • the modes may be executed at the same time, or each mode may be executed separately.
  • step S304 includes: deleting some image processing modules from the plurality of image processing modules according to the processing results of the visual task model; processing the fifth image through the image processing modules that have not been deleted from the plurality of image processing modules to obtain the sixth image; inputting the sixth image into the visual task model for processing; and adjusting the parameters of the image processing modules that have not been deleted according to the processing result of the visual task model.
  • the fifth image may be an image in the training data set.
  • the fifth image and the first image may be the same image or different images.
  • the sixth image may be an RGB image.
  • for a description of the sixth image, refer to the description of the second image above.
  • the performance indicators obtained by the visual task model are used to adjust the weights of the multiple image processing modules, so as to retain the image processing modules that have a relatively large impact on the performance indicators of the visual task model, or in other words, the image processing modules that maintain or improve the performance indicators of the visual task model.
  • in this way, the image processing modules suitable for the visual task model can be obtained, or in other words, the image processing modules required by the visual task model can be obtained, which reduces the time required for the image processing process, saves computing overhead, reduces the demand for computing power, and is more hardware friendly.
  • in addition, the performance indicators obtained by the visual task model are used to adjust the parameters in the retained image processing modules, for example, to search the design space of the image processing modules, which is conducive to obtaining the optimal parameter configuration of each image processing module and thereby improving the performance of the visual task model.
  • step S304 includes: adjusting the parameters of the multiple image processing modules and the weights of the multiple image processing modules according to the processing results of the visual task model, and deleting some image processing modules from the multiple image processing modules according to the adjusted weights of the multiple image processing modules.
  • step S304 includes: adjusting the parameters of the multiple image processing modules, the weights of the multiple image processing modules, and the processing order of the multiple image processing modules according to the processing results of the visual task model, and deleting some image processing modules from the multiple image processing modules according to the adjusted weights of the multiple image processing modules.
  • the embodiment of the present application provides an image processing method 400.
  • the method 400 can be regarded as a specific implementation of the method 300.
  • some descriptions are appropriately omitted when introducing the method 400 below.
  • the method 400 adopts a combination of mode 1, mode 2 and mode 4.
  • the method 400 includes step S401 to step S410. Steps S401 to S410 will be described below.
  • the method 400 can be regarded as two stages, the first stage includes steps S401 to S406, and the second stage includes steps S407 to S410.
  • the plurality of image processing modules may include nine image processing modules as shown in FIG. 5 .
  • the weights of the respective image processing modules are denoted as w1, w2, w3, w4, w5, w6, w7, w8 and w9.
  • the sum of the 9 weights is 1.
  • the input image is processed based on the weights of the plurality of image processing modules.
  • the processing results of the multiple image processing modules are adjusted based on the weights of the multiple image processing modules.
  • the input image is processed according to the image processing module and its corresponding weight shown in FIG. 5 .
  • the processing result may be an RGB image.
  • the processing result may be an 8-bit RGB image.
  • Step S402 corresponds to step S302, and for a specific description, refer to the description in step S302.
  • the vision task model can be a trained model.
  • the comparison result is fed back to the optimization algorithm, and the optimization algorithm is used to adjust the weights of the plurality of image processing modules.
  • the optimization algorithm includes a Bayesian optimization method, an RNN model, and a reinforcement learning algorithm.
  • in step S405, the adjusted weights of the image processing modules are used as the weights of the image processing modules in step S402, and steps S402 to S404 are repeated until the first preset condition is met.
  • step S405 may also be to perform normalization processing on the adjusted weights of the image processing modules, and use the normalized weights as the weights of the image processing modules in step S402.
  • step S402 to step S404 are terminated.
  • the currently obtained weight of the image processing module may be regarded as the weight of the image processing module obtained after satisfying the first preset condition.
  • step S402 to step S404 are terminated.
  • Steps S403 to S405 can be regarded as a specific implementation of mode 1.
  • Step S406 corresponds to step S304 in mode 2.
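  • the first stage (iterative weight adjustment with normalization, followed by deletion) can be sketched as below. This is an illustration under stated assumptions: a simple random perturbation search stands in for the optimization algorithm, `evaluate` and `toy_accuracy` are hypothetical stand-ins for running the visual task model and comparing its output with the labels, and the rule of deleting modules whose weight is at or below a threshold is assumed.

```python
import random


def normalize(weights):
    """Scale non-negative weights so that they sum to 1."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    return {name: w / total for name, w in weights.items()}


def tune_weights(weights, evaluate, target_accuracy, max_iterations, seed=0):
    """Perturb one weight at a time and keep changes that raise accuracy."""
    rng = random.Random(seed)
    best = normalize(weights)
    best_acc = evaluate(best)
    for _ in range(max_iterations):
        if best_acc >= target_accuracy:  # assumed first preset condition
            break
        candidate = dict(best)
        name = rng.choice(sorted(candidate))
        candidate[name] = max(0.0, candidate[name] + rng.uniform(-0.1, 0.1))
        if sum(candidate.values()) == 0:
            continue
        candidate = normalize(candidate)  # normalization as in step S405
        acc = evaluate(candidate)
        if acc > best_acc:
            best, best_acc = candidate, acc
    return best, best_acc


def delete_low_weight(weights, weight_threshold):
    """Deletion step: keep modules whose weight exceeds the threshold."""
    return {n: w for n, w in weights.items() if w > weight_threshold}


initial = {"black_level": 1.0, "denoise": 1.0, "demosaic": 1.0, "gamma": 1.0}


def toy_accuracy(w):
    # Pretend only demosaic and gamma matter to the downstream model.
    return w["demosaic"] + w["gamma"]


tuned, acc = tune_weights(initial, toy_accuracy, target_accuracy=0.9, max_iterations=200)
kept = delete_low_weight(tuned, weight_threshold=0.05)
```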
  • the image in step S407 and the image in step S402 may be the same image or different images.
  • the parameters in the image processing module that have not been deleted are used as tuning objects.
  • the parameters in the reserved image processing module are used as tuning objects.
  • normalization processing may also be performed on the weights of the image processing modules that have not been deleted.
  • the images in the training data set are input to the black level compensation module, the demosaic module, the automatic white balance module and the gamma correction module for processing. Further, before performing step S407, the weights of the four image processing modules may be normalized.
  • the comparison result is fed back to the optimization algorithm, and the parameters in the image processing module are adjusted using the optimization algorithm.
  • the optimization algorithm includes Bayesian optimization method, RNN model or reinforcement learning algorithm.
  • the optimization algorithm used in step S409 may be the same as or different from the optimization algorithm used in step S404.
  • step S407 to step S410 are terminated.
  • the currently obtained parameters in the image processing module may be regarded as parameters in the image processing module obtained after satisfying the second preset condition.
  • step S407 to step S410 are terminated.
  • Steps S407 to S410 can be regarded as a specific implementation of mode 4. For a specific description, refer to the description of mode 4, which is not repeated here.
  • for the setting of the second preset condition, refer to the preset conditions in mode 4.
  • the performance indicators obtained by the visual task model are used to adjust the weights of the multiple image processing modules, so as to retain the image processing modules that have a relatively large impact on the performance indicators of the visual task model, or in other words, the image processing modules that maintain or improve the performance indicators of the visual task model.
  • in this way, the image processing modules suitable for the visual task model can be obtained, or in other words, the image processing modules required by the visual task model can be obtained, which reduces the time required for the image processing process, saves computing overhead, reduces the demand for computing power, and is more hardware friendly.
  • in addition, the performance indicators obtained by the visual task model are used to adjust the parameters in the retained image processing modules, for example, to search the design space of the image processing modules, which is conducive to obtaining the optimal parameter configuration of each image processing module and thereby improving the performance of the visual task model.
  • the first stage and the second stage in the method 400 may be executed simultaneously. That is to say, the weight of the image processing module and the parameters in the image processing module are adjusted at the same time.
  • the manner in which the first phase and the second phase of the method 400 are executed simultaneously will be described below.
  • Method 400 may include the following steps. For the following steps, reference may be made to the description of the first stage and the second stage of the aforementioned method 400. For the sake of brevity, part of the description is appropriately omitted when describing the following steps.
  • the comparison result is fed back to the optimization algorithm, and the optimization algorithm is used to adjust the weights of the plurality of image processing modules.
  • An optimization algorithm is used to adjust parameters in the multiple image processing modules.
  • the optimization algorithm includes a Bayesian optimization method, an RNN model, and a reinforcement learning algorithm.
  • the optimization algorithm for adjusting the weights of the multiple image processing modules and the optimization algorithm for adjusting the parameters in the multiple image processing modules may be the same or different.
  • Step 5) The adjusted weights of the image processing modules are used as the weights of the image processing modules in step 2), and the adjusted parameters in the image processing modules are used as the parameters in the image processing modules in step 2); steps 2) to 4) are repeated until the training is completed.
  • in step 5), the adjusted weights of the image processing modules may also be normalized, and the normalized weights are used as the weights of the image processing modules in step 2).
  • the training is completed.
  • for example, when the inference accuracy of the current visual task model is greater than or equal to the inference accuracy of the visual task model before the method 400 is executed, the training is completed.
  • Step 6) Delete some of the image processing modules according to the weights of the image processing modules after training. Step 6) corresponds to step S304 in mode 2. For a specific description, refer to the description of mode 2, which is not repeated here.
  • executing the first stage and the second stage at the same time can prevent an image processing module from being deleted because of an unreasonable parameter configuration. The image processing modules process the image under a better parameter configuration, and the contribution of each image processing module to the performance indicators of the visual task model is then judged under that configuration, so that the image processing modules required by the visual task model are retained and the performance indicators of the visual task model can be further improved.
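  • the simultaneous variant can be sketched by perturbing a module's weight and its parameter in the same step and deleting modules only after the search ends. Everything named here is an assumption for illustration: the `Candidate` type, the single scalar parameter per module, the random perturbation search that replaces the optimization algorithm, and the `toy_score` function that stands in for evaluating the visual task model.

```python
import random
from dataclasses import dataclass
from typing import Dict


@dataclass
class Candidate:
    weights: Dict[str, float]  # one weight per module, summing to 1
    params: Dict[str, float]   # one tunable parameter per module (simplified)


def propose(current, rng, step=0.1):
    """Perturb the weight and the parameter of one module together."""
    name = rng.choice(sorted(current.weights))
    weights = dict(current.weights)
    params = dict(current.params)
    weights[name] = max(1e-6, weights[name] + rng.uniform(-step, step))
    params[name] = params[name] + rng.uniform(-step, step)
    total = sum(weights.values())
    weights = {n: w / total for n, w in weights.items()}  # renormalize
    return Candidate(weights, params)


def joint_search(start, evaluate, target_accuracy, max_iterations, seed=0):
    """Adjust weights and parameters in the same loop (steps 2 to 5)."""
    rng = random.Random(seed)
    best, best_acc = start, evaluate(start)
    for _ in range(max_iterations):
        if best_acc >= target_accuracy:
            break
        cand = propose(best, rng)
        acc = evaluate(cand)
        if acc > best_acc:
            best, best_acc = cand, acc
    return best, best_acc


start = Candidate(
    weights={"black_level": 0.5, "gamma": 0.5},
    params={"black_level": 64.0, "gamma": 1.0},
)


def toy_score(c):
    # Toy objective: reward weight on gamma and a gamma parameter near 2.2.
    return c.weights["gamma"] - abs(c.params["gamma"] - 2.2)


best, score = joint_search(start, toy_score, target_accuracy=0.9, max_iterations=300)
# Step 6: delete modules only after training, using the final weights.
kept = [n for n, w in best.weights.items() if w > 0.05]
```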
  • Method 400 is only an example of combining mode 1, mode 2 and mode 4.
  • Mode 1, mode 2, mode 3 and mode 4 can also be combined in other ways.
  • mode 1, mode 2 and mode 3 are combined.
  • step S304 may include: adjusting the weights of the multiple image processing modules and the processing order of the multiple image processing modules according to the processing results of the visual task model, and deleting some image processing modules from the multiple image processing modules according to the adjusted weights of the image processing modules.
  • step S304 may include: adjusting the weights of the multiple image processing modules according to the processing results of the visual task model, and deleting some image processing modules from the multiple image processing modules according to the adjusted weights of the image processing modules; and adjusting the processing order of the image processing modules that have not been deleted according to the processing results of the visual task model. That is, step S304 is divided into two stages. In the first stage, some image processing modules are deleted, and in the second stage, the processing order of the image processing modules that have not been deleted is adjusted.
  • the adjusted image processing module is an image processing module required by the visual task model. There is a corresponding relationship between the adjusted image processing module and the vision task model. Different vision task models can correspond to different image processing modules. In this way, an appropriate image processing flow can be selected according to the application scenario.
  • Figure 6 shows an image processing method 700 provided by the embodiment of the present application.
  • the method shown in Figure 6 can be executed by an image processing device. The device can be a cloud service device or a terminal device, such as a computer, a server, or another device with sufficient computing power to perform image processing, or it can be a system composed of cloud service equipment and terminal equipment.
  • the method 700 may be executed by the preprocessing module in FIG. 1 .
  • the target image processing module in method 700 is obtained by method 300 or method 400 .
  • repeated descriptions are appropriately omitted when introducing the method 700 below.
  • the method 700 includes steps S701 to S704. Steps S701 to S704 will be described in detail below.
  • the third image is an image to be processed.
  • the third image may be a raw image acquired by the sensor.
  • the third image may be an image captured through a camera by a terminal device (or another device or apparatus such as a computer or a server), or the third image may be an image obtained from inside the terminal device (or another device or apparatus such as a computer or a server), for example, an image stored in the photo album of the terminal device or an image obtained by the terminal device from the cloud, which is not limited in this embodiment of the present application.
  • S702. Determine at least one target image processing module according to the visual task model.
  • the at least one target image processing module is one or more image processing modules corresponding to the visual task model.
  • the vision task includes: target detection, image classification, target segmentation, target tracking, or image recognition.
  • the visual task model is used to perform visual tasks. For example, if the vision task is target detection, then the vision task model is the target detection model. For another example, if the visual task is image recognition, then the visual task model is an image recognition model.
  • the vision task model can be a trained model.
  • different visual task models may be used, and accordingly, at least one target image processing module matching the visual task model may be determined according to different visual task models. In this way, different image processing modules can be selected according to different application scenarios.
  • the visual task model employed may or may not be the same in overexposed and underexposed situations.
  • the first target detection model may be used as the visual task model, and at least one target image processing module corresponding to the first target detection model may be determined according to the first target detection model.
  • the second target detection model may be used as the visual task model, and at least one target image processing module corresponding to the second target detection model is determined according to the second target detection model.
  • the first target detection model and the second target detection model are different target detection models. In this way, different image processing processes can be selected according to different application scenarios to improve the performance of the vision task model.
  • one or more image processing modules corresponding to the visual task model are used to process the input third image to obtain the fourth image.
  • the fourth image may be an RGB image.
  • the fourth image may be an 8-bit RGB image. This is only an example, and the type of the fourth image can be set according to the input requirements of the visual task model.
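  • processing a raw input through an ordered list of target modules to obtain 8-bit output can be sketched as below. This is a simplified illustration on a flat list of single-channel samples: demosaicing and white balance are omitted, and the 10-bit raw range, the black level of 64 and the gamma of 2.2 are assumed values, not values from the application.

```python
def black_level_compensation(pixels, black_level):
    """Subtract the sensor black level and clamp at zero."""
    return [max(0.0, p - black_level) for p in pixels]


def gamma_correction(pixels, gamma, white_level):
    """Normalize to [0, 1] and apply a power-law curve."""
    return [min(1.0, p / white_level) ** (1.0 / gamma) for p in pixels]


def to_uint8(pixels):
    """Quantize values in [0, 1] to the 0..255 range."""
    return [int(round(p * 255)) for p in pixels]


def run_pipeline(pixels, steps):
    """Apply (function, keyword-arguments) steps in the given order."""
    for fn, kwargs in steps:
        pixels = fn(pixels, **kwargs)
    return pixels


# Hypothetical 10-bit raw samples (0..1023) with a black level of 64.
raw = [30, 64, 304, 1023]
steps = [
    (black_level_compensation, {"black_level": 64}),
    (gamma_correction, {"gamma": 2.2, "white_level": 1023 - 64}),
    (to_uint8, {}),
]
out = run_pipeline(raw, steps)
```

  • because the steps are plain data, the same `run_pipeline` function can run whichever combination, order and parameters were determined for the current visual task model.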
  • the processing result of the fourth image can also be understood as the processing result of the third image.
  • the processing result of the fourth image is the reasoning result of the visual task model.
  • the inference results of the visual task model are related to the type of visual task.
  • the inference result of the vision task model may be the target frame on the fourth image and the category of the object in the target frame.
  • the reasoning result of the vision task model may be the category of the fourth image.
  • the configuration of the image processing module matching the current visual task model can be determined.
  • the configuration of the image processing modules includes at least one of the following: a combination of image processing modules, a weight of the image processing modules, a processing order of the image processing modules, or parameters in the image processing modules.
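  • one way to picture the configuration and its correspondence with a visual task model is a lookup table keyed by model. The `PipelineConfig` type, the model names `detector_a` and `detector_b`, and all numeric values below are hypothetical; in the application the correspondence would come from the method 300 or the method 400.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class PipelineConfig:
    modules: List[str]  # combination of modules, listed in processing order
    weights: Dict[str, float] = field(default_factory=dict)
    params: Dict[str, Dict[str, float]] = field(default_factory=dict)


# Hypothetical correspondence between visual task models and configurations.
CONFIGS = {
    "detector_a": PipelineConfig(
        modules=["black_level", "demosaic", "white_balance", "gamma"],
        weights={"black_level": 0.2, "demosaic": 0.4, "white_balance": 0.1, "gamma": 0.3},
        params={"black_level": {"offset": 64.0}, "gamma": {"gamma": 2.2}},
    ),
    "detector_b": PipelineConfig(
        modules=["black_level", "white_balance", "demosaic", "gamma"],
        weights={"black_level": 0.3, "white_balance": 0.2, "demosaic": 0.3, "gamma": 0.2},
        params={"black_level": {"offset": 60.0}, "gamma": {"gamma": 1.8}},
    ),
}


def config_for(model_name):
    """Return the configuration recorded for a visual task model."""
    try:
        return CONFIGS[model_name]
    except KeyError:
        raise KeyError(f"no configuration recorded for {model_name!r}") from None


cfg = config_for("detector_a")
print(cfg.modules)
```

  • in this example the two models share the same combination of modules but differ in processing order, weights and parameters, which matches the case where only part of the configuration changes with the visual task model.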
  • step S702 includes: determining at least one target image processing module from multiple candidate image processing modules according to the visual task model.
  • a combination of image processing modules is determined from multiple candidate image processing modules according to the visual task model, and an image processing module in the combination of image processing modules is the at least one target image processing module.
  • the combination of image processing modules may also change accordingly.
  • according to the corresponding relationship, the combination of image processing modules corresponding to the current visual task model can be determined, or in other words, the image processing modules required by the visual task model, that is, the at least one target image processing module, can be determined.
  • the at least one target image processing module may be obtained through the method 300 or the method 400.
  • the correspondence between the visual task model and the combination of image processing modules is obtained through the method 300 or the method 400.
  • the at least one target image processing module includes: a black level compensation module, a demosaic module, an automatic white balance module and a gamma correction module.
  • the combination of image processing modules can adaptively match the visual task model, making the current combination of image processing modules more suitable for the current visual task model, which is beneficial to improving the performance of the visual task model.
  • step S702 includes: determining the weight of at least one target image processing module according to the visual task model.
  • the weight of the at least one target image processing module is used to process the processing result of the at least one target image processing module to obtain a fourth image.
  • the combinations of image processing modules corresponding to different visual task models are the same.
  • the weight of the image processing module may change accordingly.
  • the combination of image processing modules corresponding to different visual task models is the same, which may be understood to mean that the functions implemented by the image processing modules adopted by different visual task models are the same.
  • the weight of the image processing module corresponding to the current visual task model that is, the weight of the at least one target image processing module, can be determined.
  • the at least one target image processing module may be the nine image processing modules in FIG. 4, and the weights of the image processing modules may be the weights obtained in step S405.
  • the weights of the image processing modules can adaptively match the visual task model, making the weights of the current image processing modules more suitable for the current visual task model, which is beneficial to improving the performance of the visual task model.
  • the weight of the image processing module may also change, and other configurations of the image processing module may also change.
  • the combination of image processing modules may change.
  • the visual task model has a corresponding relationship with the weight of the image processing module and other configuration conditions of the image processing module.
  • the weight of the image processing module corresponding to the visual task model and other configurations of the image processing module can be determined according to the visual task model.
  • step S702 a combination of image processing modules corresponding to the visual task model and weights of image processing modules in the combination of image processing modules may be determined.
  • the at least one target image processing module corresponding to the visual task model may be obtained in step S406.
  • the at least one target image processing module includes a black level compensation module, a demosaic module, an automatic white balance module and a gamma correction module.
  • the weight of the at least one target image processing module may be the weight obtained in step S405.
  • step S702 includes: determining a processing sequence of at least one target image processing module according to the visual task model.
  • the combinations of image processing modules corresponding to different visual task models are the same.
  • the processing sequence of the image processing module may also change accordingly.
  • the processing order of the image processing modules corresponding to the current visual task model can be determined, that is, the processing order of the at least one target image processing module.
  • different visual task models correspond to the processing order of different image processing modules.
  • the processing order of the image processing modules can adaptively match the visual task model, making the processing order of the current image processing modules more suitable for the current visual task model, which is beneficial to improving the performance of the visual task model.
  • the processing order of the image processing module may change, and other configurations of the image processing module may also change.
  • the combination of image processing modules may change.
  • the visual task model has a corresponding relationship with the processing order of the image processing module and other configurations of the image processing module.
  • the processing sequence of the image processing module corresponding to the visual task model and other configurations of the image processing module can be determined according to the corresponding relationship.
  • there is a correspondence between the visual task model, the combination of image processing modules, and the processing order of the image processing modules.
  • the combination of image processing modules corresponding to the vision task model and the processing order of the image processing modules in the combination of image processing modules can be determined.
  • the combinations of image processing modules corresponding to different visual task models may be the same or different.
  • the combinations of image processing modules corresponding to the two visual task models are the same, but the processing orders of the image processing modules in the combination of image processing modules are different.
  • in step S702, the combination of image processing modules corresponding to the visual task model, the weights of the image processing modules, and the processing order of the image processing modules can be determined, that is, the target image processing modules, the weights of the target image processing modules, and the processing order of the target image processing modules are determined from multiple candidate image processing modules.
  • the combinations of image processing modules corresponding to different visual task models may be the same or different.
  • the weights of the image processing modules in the combination of image processing modules may be the same or different.
  • the processing order of the image processing modules in the combination of image processing modules may be the same or different.
  • step S702 includes: determining parameters in the at least one target image processing module according to the visual task model.
  • the combinations of image processing modules corresponding to different visual task models are the same.
  • the parameters in the image processing module may change accordingly.
  • the image processing module corresponding to the first visual task model includes: a black level compensation module and a demosaic module.
  • the parameters of the black level compensation module include parameter A1, and the parameters of the demosaic module include parameter B1.
  • the image processing module corresponding to the second visual task model includes: a black level compensation module and a demosaic module.
  • the parameters of the black level compensation module include parameter A2, and the parameters of the demosaic module include parameter B2.
  • the parameters used in the black level compensation processing and the demosaic processing performed before the first visual task model are different from those used in the black level compensation processing and the demosaic processing performed before the second visual task model.
  • According to the visual task model, the parameters in the image processing modules corresponding to the visual task model can be determined, that is, the parameters in the at least one target image processing module.
  • According to the solution of the embodiment of the present application, different visual task models correspond to different parameters in the image processing modules.
  • When the visual task model changes, the parameters in the image processing modules can adaptively match the visual task model, making the parameters in the current image processing modules better suited to the current visual task model, which helps improve the performance of the visual task model.
  • the visual task model has a corresponding relationship with parameters in the image processing module and other configurations of the image processing module. In this way, parameters in the image processing module corresponding to the current visual task model and other configurations of the image processing module can be determined according to the corresponding relationship.
  • For example, there is a correspondence between the visual task model and both the combination of image processing modules and the parameters in the image processing modules. According to this correspondence, the combination of image processing modules corresponding to the current visual task model and the parameters of the image processing modules in that combination can be determined.
  • the combinations of image processing modules corresponding to different visual task models may be the same or different.
  • the combinations of image processing modules corresponding to the two vision task models are the same, but the parameters of the image processing modules in the combination of image processing modules are different.
  • As another example, there is a correspondence between the visual task model and the combination of image processing modules, the weights of the image processing modules, and the parameters in the image processing modules.
  • According to this correspondence, the combination of image processing modules corresponding to the visual task model, the weights of the image processing modules, and the parameters in the image processing modules can be determined.
  • the combinations of image processing modules corresponding to different visual task models may be the same or different.
  • the weights of the image processing modules in the combination of image processing modules may be the same or different.
  • the parameters of the image processing modules in the combination of image processing modules may be the same or different.
  • According to the solution of the embodiment of the present application, different visual task models correspond to different image processing module configurations.
  • When the visual task model changes, the image processing modules can adaptively match the visual task model, making the image processing flow better suited to the visual task model, which helps improve the performance of the visual task model.
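The correspondence between a visual task model and its image processing module configuration can be pictured as a lookup table. The following is a minimal Python sketch under stated assumptions: the names `IspConfig`, `CONFIG_BY_MODEL`, and `select_config`, the model keys, and the weight values are hypothetical, and the parameter labels A1, B1, A2, and B2 are reused from the example above purely as placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IspConfig:
    """Hypothetical container for one image processing module configuration."""
    modules: tuple   # selected modules, listed in processing order
    weights: dict    # module name -> weight
    params: dict     # module name -> parameter label of that module


# Hypothetical correspondence between visual task models and configurations.
CONFIG_BY_MODEL = {
    "first_visual_task_model": IspConfig(
        modules=("black_level_compensation", "demosaic"),
        weights={"black_level_compensation": 1.0, "demosaic": 1.0},
        params={"black_level_compensation": "A1", "demosaic": "B1"},
    ),
    "second_visual_task_model": IspConfig(
        modules=("black_level_compensation", "demosaic"),
        weights={"black_level_compensation": 1.0, "demosaic": 1.0},
        params={"black_level_compensation": "A2", "demosaic": "B2"},
    ),
}


def select_config(model_name):
    """Return the configuration that corresponds to the given visual task model."""
    return CONFIG_BY_MODEL[model_name]
```

In this sketch both models share the same module combination and differ only in parameters, which mirrors the A1/B1 versus A2/B2 example; a real correspondence could equally differ in combination, weights, or processing order.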
  • The device of the embodiment of the present application is described below with reference to FIG. 7 and FIG. 8. It should be understood that the device described below can execute the method of the aforementioned embodiment of the present application. To avoid unnecessary repetition, repeated descriptions are omitted where appropriate when introducing the device below.
  • FIG. 7 is a schematic block diagram of an image processing device according to an embodiment of the present application.
  • The image processing device 4000 shown in FIG. 7 includes an acquisition unit 4010 and a processing unit 4020.
  • The acquisition unit 4010 and the processing unit 4020 may be used to execute the image processing method of the embodiment of the present application.
  • The device 4000 may be used to execute the method 300 or the method 400.
  • The acquisition unit 4010 is configured to acquire the first image.
  • The processing unit 4020 is configured to: process the first image through at least one image processing module to obtain a second image; input the second image into the visual task model for processing; and adjust the at least one image processing module according to the processing result of the visual task model.
  • The at least one image processing module includes multiple image processing modules, and the processing unit 4020 is specifically configured to: delete some of the multiple image processing modules according to the processing result of the visual task model.
  • The processing unit 4020 is specifically configured to: adjust the weights of the multiple image processing modules according to the processing result of the visual task model, where the weights of the multiple image processing modules are used to process the processing results of the multiple image processing modules to obtain the second image; and delete some of the multiple image processing modules according to the adjusted weights of the multiple image processing modules.
  • the processing unit 4020 is specifically configured to: adjust parameters in at least one image processing module according to a processing result of the visual task model.
  • the processing unit 4020 is specifically configured to: adjust a processing sequence of at least one image processing module according to a processing result of the visual task model.
  • The at least one image processing module includes: a black level compensation module, a green balance module, a bad pixel correction module, a demosaic module, a Bayer noise reduction module, an automatic white balance module, a color correction module, a gamma correction module, or a noise reduction and sharpening module.
  • The device 4000 may be used to execute the method 700.
  • the acquiring unit 4010 is configured to acquire a third image.
  • The processing unit 4020 is configured to: determine at least one target image processing module according to the visual task model; process the third image through the at least one target image processing module to obtain a fourth image; and process the fourth image through the visual task model to obtain a processing result of the fourth image.
  • the processing unit 4020 is specifically configured to: determine at least one target image processing module from multiple candidate image processing modules according to the visual task model.
  • the processing unit 4020 is specifically configured to: determine parameters in at least one target image processing module according to the visual task model.
  • the processing unit 4020 is specifically configured to: determine a processing sequence of at least one target image processing module according to the visual task model.
  • The at least one target image processing module includes: a black level compensation module, a green balance module, a bad pixel correction module, a demosaic module, a Bayer noise reduction module, an automatic white balance module, a color correction module, a gamma correction module, or a noise reduction and sharpening module.
  • The term "unit" here may be implemented in the form of software and/or hardware, which is not specifically limited.
  • For example, a "unit" may be a software program, a hardware circuit, or a combination of both that realizes the above functions.
  • The hardware circuit may include an application-specific integrated circuit (ASIC), an electronic circuit, a processor (such as a shared processor, a dedicated processor, or a group processor) and memory for executing one or more software or firmware programs, a combinational logic circuit, and/or other suitable components that support the described functionality.
  • the units of each example described in the embodiments of the present application can be realized by electronic hardware, or a combination of computer software and electronic hardware. Whether these functions are executed by hardware or software depends on the specific application and design constraints of the technical solution. Those skilled in the art may use different methods to implement the described functions for each specific application, but such implementation should not be regarded as exceeding the scope of the present application.
  • FIG. 8 is a schematic diagram of a hardware structure of an image processing device provided by an embodiment of the present application.
  • The image processing device 6000 shown in FIG. 8 (the device 6000 may specifically be a computer device) includes a memory 6001, a processor 6002, a communication interface 6003, and a bus 6004.
  • The memory 6001, the processor 6002, and the communication interface 6003 are connected to each other through the bus 6004.
  • the memory 6001 may be a read only memory (read only memory, ROM), a static storage device, a dynamic storage device or a random access memory (random access memory, RAM).
  • the memory 6001 may store programs, and when the programs stored in the memory 6001 are executed by the processor 6002, the processor 6002 is configured to execute various steps of the image processing method of the embodiment of the present application. Specifically, the processor 6002 may execute the method 300, the method 400 or the method 700 above.
  • The processor 6002 may be a general-purpose central processing unit (CPU), a microprocessor, an application-specific integrated circuit (ASIC), a graphics processing unit (GPU), or one or more integrated circuits, and is used to execute related programs to realize the image processing method of the method embodiment of the present application.
  • the processor 6002 may also be an integrated circuit chip with signal processing capabilities. During implementation, each step of the image processing method of the present application may be completed by an integrated logic circuit of hardware in the processor 6002 or instructions in the form of software.
  • The above-mentioned processor 6002 may also be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
  • It can implement or execute the methods, steps, and logical block diagrams disclosed in the embodiments of the present application.
  • a general-purpose processor may be a microprocessor, or the processor may be any conventional processor, or the like.
  • the steps of the method disclosed in connection with the embodiments of the present application may be directly implemented by a hardware decoding processor, or implemented by a combination of hardware and software modules in the decoding processor.
  • The software module can be located in a storage medium that is mature in the field, such as random access memory, flash memory, read-only memory, programmable read-only memory, electrically erasable programmable memory, or a register.
  • The storage medium is located in the memory 6001. The processor 6002 reads the information in the memory 6001 and, in combination with its hardware, completes the functions required by the units included in the device shown in FIG. 7, or executes the image processing method of the method embodiment of the present application.
  • The communication interface 6003 implements communication between the device 6000 and other devices or communication networks by using a transceiver device such as, but not limited to, a transceiver. For example, training data can be obtained through the communication interface 6003.
  • The bus 6004 may include pathways for transferring information between the components of the device 6000 (for example, the memory 6001, the processor 6002, and the communication interface 6003).
  • It should be noted that although the above device 6000 only shows a memory, a processor, and a communication interface, those skilled in the art should understand that, in a specific implementation, the device 6000 may also include other components necessary for normal operation. Depending on specific needs, the device 6000 may also include hardware components that implement other additional functions. In addition, the device 6000 may include only the components necessary to realize the embodiment of the present application, and does not necessarily include all the components shown in FIG. 8.
  • The embodiment of the present application also provides a computer-readable storage medium. The computer-readable medium stores program code for execution by a device, and the program code includes instructions for executing the image processing method in the embodiment of the present application.
  • the embodiment of the present application further provides a computer program product including instructions, and when the computer program product is run on a computer, the computer is made to execute the image processing method in the embodiment of the present application.
  • the embodiment of the present application also provides a chip, the chip includes a processor and a data interface, and the processor reads the instructions stored in the memory through the data interface, and executes the image processing method in the embodiment of the present application.
  • The chip may further include a memory that stores instructions. The processor is configured to execute the instructions stored in the memory, and when the instructions are executed, the processor is configured to execute the method in any one of the implementations of the first aspect or the second aspect.
  • the aforementioned chip may specifically be an FPGA or an ASIC.
  • The processor in the embodiment of the present application may be a central processing unit (CPU), or may be another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like.
  • a general-purpose processor may be a microprocessor, or the processor may be any conventional processor, or the like.
  • the memory in the embodiments of the present application may be a volatile memory or a nonvolatile memory, or may include both volatile and nonvolatile memories.
  • The non-volatile memory can be read-only memory (ROM), programmable read-only memory (programmable ROM, PROM), erasable programmable read-only memory (erasable PROM, EPROM), electrically erasable programmable read-only memory (electrically EPROM, EEPROM), or flash memory.
  • The volatile memory can be random access memory (RAM), which acts as an external cache. By way of example and not limitation, many forms of RAM are available, such as static random access memory (static RAM, SRAM), dynamic random access memory (DRAM), synchronous dynamic random access memory (synchronous DRAM, SDRAM), double data rate synchronous dynamic random access memory (double data rate SDRAM, DDR SDRAM), enhanced synchronous dynamic random access memory (enhanced SDRAM, ESDRAM), synchlink dynamic random access memory (synchlink DRAM, SLDRAM), and direct rambus random access memory (direct rambus RAM, DR RAM).
  • The above-mentioned embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof.
  • When implemented by software, the above-described embodiments may be implemented in whole or in part in the form of a computer program product.
  • The computer program product comprises one or more computer instructions or computer programs.
  • When the computer instructions or computer programs are loaded or executed on a computer, the processes or functions according to the embodiments of the present application are generated in whole or in part.
  • the computer may be a general purpose computer, a special purpose computer, a computer network, or other programmable devices.
  • The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another. For example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wired or wireless (for example, infrared, radio, or microwave) means.
  • the computer-readable storage medium may be any available medium that can be accessed by a computer, or a data storage device such as a server or a data center that includes one or more sets of available media.
  • the available media may be magnetic media (eg, floppy disk, hard disk, magnetic tape), optical media (eg, DVD), or semiconductor media.
  • the semiconductor medium may be a solid state drive.
  • "At least one" means one or more, and "multiple" means two or more.
  • "At least one of the following" or similar expressions refers to any combination of these items, including any combination of single or plural items.
  • For example, at least one item (piece) of a, b, or c can represent: a, b, c, a-b, a-c, b-c, or a-b-c, where a, b, and c can each be single or multiple.
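As a worked illustration of the "at least one of a, b, or c" definition, the short Python sketch below enumerates every non-empty combination of the three items. The function name `at_least_one` is hypothetical and the sketch treats each of a, b, and c as a single item.

```python
from itertools import combinations


def at_least_one(items):
    """List every non-empty combination of the given items, smallest first."""
    result = []
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            result.append("-".join(combo))
    return result


print(at_least_one(["a", "b", "c"]))
```

For three items this yields the seven combinations listed above.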
  • The sequence numbers of the above-mentioned processes do not imply an order of execution. The execution order of the processes should be determined by their functions and internal logic, and should not constitute any limitation on the implementation process of the embodiments of the present application.
  • the disclosed systems, devices and methods may be implemented in other ways.
  • the device embodiments described above are only illustrative.
  • the division of the units is only a logical function division. In actual implementation, there may be other division methods.
  • For example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not implemented.
  • In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, devices, or units, and may be electrical, mechanical, or in other forms.
  • the units described as separate components may or may not be physically separated, and the components shown as units may or may not be physical units, that is, they may be located in one place, or may be distributed to multiple network units. Part or all of the units can be selected according to actual needs to achieve the purpose of the solution of this embodiment.
  • each functional unit in each embodiment of the present application may be integrated into one processing unit, each unit may exist separately physically, or two or more units may be integrated into one unit.
  • If the functions described above are implemented in the form of software functional units and sold or used as independent products, they can be stored in a computer-readable storage medium.
  • Based on this understanding, the technical solution of the present application, in essence, or the part that contributes to the prior art, or a part of the technical solution, can be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or part of the steps of the methods described in the embodiments of the present application.
  • The aforementioned storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, read-only memory (ROM), random access memory (RAM), a magnetic disk, or an optical disc.

Abstract

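The present application provides an image processing method and device, relating to the field of artificial intelligence and, more specifically, to the field of computer vision. The method includes: processing an input image through at least one image processing module, using the processing result as the input of a visual task model, and adjusting the at least one image processing module according to the processing result of the visual task model. The solution of the present application can obtain an image processing flow suited to the visual task model, which helps improve the performance of the visual task model.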

Description

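Image processing method and device
Technical Field
The present application relates to the field of computer vision, and more specifically, to an image processing method and device.
Background
Computer vision is an integral part of various intelligent/autonomous systems in application fields such as manufacturing, inspection, document analysis, medical diagnosis, and the military. It is the study of how to use cameras/video cameras and computers to obtain the data and information we need about a photographed subject. Figuratively speaking, it equips a computer with eyes (cameras/video cameras) and a brain (algorithms) so that it can identify, track, and measure targets in place of the human eye, allowing the computer to perceive its environment. Computer vision can be regarded as the science of how to make artificial systems "perceive" from images or multidimensional data. In general, computer vision uses various imaging systems in place of the visual organs to obtain input information, and then uses a computer in place of the brain to process and interpret that information.
Computer vision tasks include image classification, object detection, object tracking, and object segmentation. In practice, a raw image usually first undergoes a series of image signal processing (ISP) steps that output a visualized image, which can then serve as the input image of a computer vision task. However, the purpose of ISP is usually to satisfy human visual needs. An image obtained through such processing can satisfy human visual needs, but performing a visual task on that image does not necessarily yield an ideal result.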
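Summary
The present application provides an image processing method and device that can obtain an image processing flow suited to a visual task and improve the performance of the visual task model.
In a first aspect, an image processing method is provided. The method includes: acquiring a first image; processing the first image through at least one image processing module to obtain a second image; inputting the second image into a visual task model for processing; and adjusting the at least one image processing module according to the processing result of the visual task model.
In the solution of the embodiment of the present application, the image processing flow is adjusted according to the processing result of the visual task model, which helps obtain images suited to the visual task and thus safeguards the performance of the visual task model. The solution can adjust the image processing flow according to the requirements of different application scenarios so as to adapt to them.
Exemplarily, the first image may be a raw image acquired by a sensor.
The image processing module is used to perform image signal processing on an input image.
Exemplarily, the second image may be an RGB image.
Optionally, processing the first image through at least one image processing module to obtain a second image includes: processing the first image through the at least one image processing module and the weights of the at least one image processing module to obtain the second image.
Specifically, the processing result of the at least one image processing module is adjusted according to the weights of the at least one image processing module to obtain the second image.
Exemplarily, visual tasks include object detection, image classification, object segmentation, object tracking, image recognition, and the like.
The visual task model is used to perform a visual task. For example, if the visual task is object detection, the visual task model is an object detection model. As another example, if the visual task is image recognition, the visual task model is an image recognition model.
The visual task model may be a trained model.
The processing result of the visual task model may include a performance indicator of the visual task model.
Exemplarily, the performance indicators of the visual task model include the inference accuracy, the value of a loss function, and the like. The loss function can be set as needed. It indicates the difference between the inference result of the visual task model and the ground truth corresponding to the first image. It should be noted that the loss function here may be the loss function used in training the visual task model, or a loss function of another form.
For example, if the visual task is object detection, the processing result of the visual task model may include the detection accuracy.
As another example, if the visual task is object segmentation, the processing result of the visual task model may include the segmentation accuracy.
The visual task model may be a neural network model or a non-neural-network model.
The at least one image processing module is adjusted according to the processing result of the visual task model so that the processing result is as close as possible to what is expected.
Exemplarily, the at least one image processing module may be adjusted by means of Bayesian optimization, an RNN model, a reinforcement learning algorithm, or the like.
With reference to the first aspect, in some implementations of the first aspect, adjusting the at least one image processing module according to the processing result of the visual task model includes: adjusting the at least one image processing module according to the image processing time and the processing result of the visual task model.
The image processing time may be the processing time of the visual task model, the processing time of the at least one image processing module, or the sum of the two.
In this way, processing speed can be increased and latency reduced while the performance of the visual task model is maintained.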
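The idea of adjusting the modules from both the task model's processing result and the image processing time can be sketched as scoring candidate configurations. The sketch below uses a plain exhaustive search in place of the Bayesian optimization, RNN, or reinforcement learning approaches named in the text, and the linear score `performance - time_weight * seconds` is an assumption for illustration only; `select_best`, `evaluate`, and the toy measurements are all hypothetical.

```python
def select_best(candidates, evaluate, time_weight=0.0):
    """Return the candidate configuration with the highest score.

    evaluate(candidate) must return (performance, seconds), where performance
    is a metric of the visual task model (higher is better) and seconds is
    the image processing time measured for that candidate.
    """
    best, best_score = None, float("-inf")
    for candidate in candidates:
        performance, seconds = evaluate(candidate)
        # Assumed trade-off: penalize slow configurations linearly.
        score = performance - time_weight * seconds
        if score > best_score:
            best, best_score = candidate, score
    return best


# Toy measurements standing in for a real visual task model and timer.
measurements = {
    "all_modules": (0.90, 0.050),
    "pruned_modules": (0.89, 0.020),
}
print(select_best(measurements, measurements.get))                   # performance only
print(select_best(measurements, measurements.get, time_weight=1.0))  # performance and time
```

With `time_weight=0.0` only the task model's metric matters; a positive `time_weight` lets a slightly less accurate but faster configuration win.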
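With reference to the first aspect, in some implementations of the first aspect, the at least one image processing module includes multiple image processing modules, and adjusting the at least one image processing module according to the processing result of the visual task model includes: changing the at least one image processing module.
Changing the at least one image processing module may include: deleting some of the at least one image processing module and/or adding other image processing modules.
In the solution of the embodiment of the present application, the combination of image processing modules is changed according to the processing result of the visual task model, so that a combination better suited to the visual task model can be obtained, which helps improve the performance of the visual task model.
With reference to the first aspect, in some implementations of the first aspect, the at least one image processing module includes multiple image processing modules, and adjusting the at least one image processing module according to the processing result of the visual task model includes: deleting some of the multiple image processing modules according to the processing result of the visual task model.
In the solution of the embodiment of the present application, deleting some image processing modules according to the processing result of the visual task model reduces the time required for image processing, increases processing speed, and lowers the demand for computing power.
With reference to the first aspect, in some implementations of the first aspect, deleting some of the multiple image processing modules according to the processing result of the visual task model includes: adjusting the weights of the multiple image processing modules according to the processing result of the visual task model, where the weights of the multiple image processing modules are used to process the processing results of the multiple image processing modules to obtain the second image; and deleting some of the multiple image processing modules according to the adjusted weights of the multiple image processing modules.
In the solution of the embodiment of the present application, the image processing modules to be deleted are determined according to the weights of the individual modules, and the modules with relatively small weights are deleted. Such modules have little influence on the processing result of the visual task model, so deleting them has little effect on its performance. In other words, the solution can cut unnecessary operations, lower computational overhead, and increase processing speed while maintaining the performance of the visual task model.
Exemplarily, the multiple image processing modules are m image processing modules, where m is an integer greater than 1. The n image processing modules with the smallest adjusted weights are deleted from the m image processing modules, where n is an integer greater than 1 and less than m.
Alternatively, the image processing modules whose adjusted weights are less than or equal to a weight threshold are deleted from the m image processing modules.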
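The two deletion rules just described (drop the n modules with the smallest adjusted weights, or drop every module at or below a weight threshold) can be sketched in a few lines of Python. The function names, module names, and weight values below are hypothetical, and the sketch also accepts n = 1 for convenience.

```python
def prune_smallest(weights, n):
    """Delete the n modules with the smallest adjusted weights."""
    if not 1 <= n < len(weights):
        raise ValueError("n must satisfy 1 <= n < number of modules")
    dropped = set(sorted(weights, key=weights.get)[:n])
    # Keep the surviving modules in their original processing order.
    return [name for name in weights if name not in dropped]


def prune_by_threshold(weights, threshold):
    """Delete every module whose adjusted weight is at or below the threshold."""
    return [name for name, weight in weights.items() if weight > threshold]


# Hypothetical adjusted weights, listed in processing order.
adjusted = {
    "black_level_compensation": 0.9,
    "green_balance": 0.05,
    "demosaic": 0.8,
    "gamma_correction": 0.1,
}
print(prune_smallest(adjusted, 2))
print(prune_by_threshold(adjusted, 0.1))
```

Both rules keep the surviving modules in their original order, so the pruned list can be used directly as the new processing flow.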
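With reference to the first aspect, in some implementations of the first aspect, adjusting the at least one image processing module according to the processing result of the visual task model includes: adjusting parameters in the at least one image processing module according to the processing result of the visual task model.
In the solution of the embodiment of the present application, the parameters in the image processing modules are adjusted according to the processing result of the visual task model, so that image processing modules better suited to the visual task can be obtained, which helps improve the accuracy of the visual task.
With reference to the first aspect, in some implementations of the first aspect, adjusting the at least one image processing module according to the processing result of the visual task model includes: deleting some image processing modules from multiple image processing modules according to the processing result of the visual task model; processing a fifth image through the image processing modules that have not been deleted to obtain a sixth image, and inputting the sixth image into the visual task model for processing; and adjusting the parameters of the image processing modules that have not been deleted according to the processing result of the visual task model.
According to the solution of the embodiment of the present application, performance indicators obtained from the visual task model, such as object detection accuracy or object segmentation accuracy, are used to adjust the weights of the multiple image processing modules, and the modules that have a greater influence on those indicators, or in other words the modules that can maintain or improve them, are retained. In this way, the image processing modules suited to (that is, required by) the visual task model are obtained, which shortens the image processing flow, saves computational overhead, lowers the demand for computing power, and is friendlier to hardware.
Moreover, the performance indicators obtained from the visual task model are used to adjust the parameters in the retained image processing modules, for example by searching the design space of the image processing modules using those indicators. This helps find the optimal parameter configuration of each image processing module and thus improves the performance of the visual task model.
With reference to the first aspect, in some implementations of the first aspect, adjusting the at least one image processing module according to the processing result of the visual task model includes: adjusting the processing order of the at least one image processing module according to the processing result of the visual task model.
In the solution of the embodiment of the present application, the processing order of the image processing modules is adjusted according to the processing result of the visual task model, so that an image processing flow better suited to the visual task can be obtained, which helps improve the accuracy of the visual task.
With reference to the first aspect, in some implementations of the first aspect, the at least one image processing module includes: a black level compensation module, a green balance module, a bad pixel correction module, a demosaic module, a Bayer noise reduction module, an automatic white balance module, a color correction module, a gamma correction module, or a noise reduction and sharpening module.
Any of the at least one image processing module may be implemented with a neural network algorithm or with a non-neural-network algorithm.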
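In a second aspect, an image processing method is provided. The method includes: acquiring a third image; determining at least one target image processing module according to a visual task model; processing the third image through the at least one target image processing module to obtain a fourth image; and processing the fourth image through the visual task model to obtain a processing result of the fourth image.
According to the solution of the embodiment of the present application, different visual task models correspond to different image processing module configurations. When the visual task model changes, the image processing modules can adaptively match the visual task model, making the image processing flow better suited to the visual task model, which helps improve the performance of the visual task model.
Exemplarily, the third image may be a raw image acquired by a sensor.
The processing result of the fourth image can also be understood as the processing result of the third image.
The processing result of the fourth image is the inference result of the visual task model.
The at least one target image processing module is the one or more image processing modules corresponding to the visual task model.
Exemplarily, visual tasks include object detection, image classification, object segmentation, object tracking, image recognition, and the like.
The visual task model is used to perform a visual task. For example, if the visual task is object detection, the visual task model is an object detection model. As another example, if the visual task is image recognition, the visual task model is an image recognition model.
The visual task model may be a trained model.
Different visual task models may be used in different application scenarios. Accordingly, the at least one target image processing module that matches a given visual task model can be determined from that model. In this way, different image processing modules can be selected for different application scenarios.
There is a correspondence between visual task models and image processing module configurations. Based on this correspondence, the image processing module configuration that matches the current visual task model can be determined.
Exemplarily, the configuration of the image processing modules includes at least one of the following: the combination of image processing modules, the weights of the image processing modules, the processing order of the image processing modules, or the parameters in the image processing modules.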
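With reference to the second aspect, in some implementations of the second aspect, determining the at least one target image processing module according to the visual task model includes: determining the at least one target image processing module from multiple candidate image processing modules according to the visual task model.
According to the solution of the embodiment of the present application, different visual task models correspond to different combinations of image processing modules. When the visual task model changes, the combination of image processing modules can adaptively match the visual task model, making the current combination better suited to the current visual task model, which helps improve the performance of the visual task model.
Moreover, because suitable image processing modules are selected from the multiple candidate image processing modules according to the visual task model, not all candidate modules need to be used to process the image, which shortens the processing flow and lowers the demand for computing power.
There is a correspondence between visual task models and combinations of image processing modules. Based on this correspondence, the combination of image processing modules corresponding to the current visual task model can be determined; in other words, the image processing modules required by the visual task model, that is, the at least one target image processing module, can be determined.
With reference to the second aspect, in some implementations of the second aspect, determining the at least one target image processing module according to the visual task model includes: determining the weights of the at least one target image processing module according to the visual task model, where the weights of the at least one target image processing module are used to process the processing results of the at least one target image processing module to obtain the fourth image.
According to the solution of the embodiment of the present application, different visual task models correspond to different weights of the image processing modules. When the visual task model changes, the weights of the image processing modules can adaptively match the visual task model, making the current weights better suited to the current visual task model, which helps improve the performance of the visual task model.
With reference to the second aspect, in some implementations of the second aspect, determining the at least one target image processing module according to the visual task model includes: determining the parameters in the at least one target image processing module according to the visual task model.
According to the solution of the embodiment of the present application, different visual task models correspond to different parameters in the image processing modules. When the visual task model changes, the parameters in the image processing modules can adaptively match the visual task model, making the current parameters better suited to the current visual task model, which helps improve the performance of the visual task model.
There is a correspondence between visual task models and the parameters in the image processing modules. According to the visual task model, the parameters in the image processing modules corresponding to the visual task model, that is, the parameters in the at least one target image processing module, can be determined.
With reference to the second aspect, in some implementations of the second aspect, determining the at least one target image processing module according to the visual task model includes: determining the processing order of the at least one target image processing module according to the visual task model.
According to the solution of the embodiment of the present application, different visual task models correspond to different processing orders of the image processing modules. When the visual task model changes, the processing order of the image processing modules can adaptively match the visual task model, making the current processing order better suited to the current visual task model, which helps improve the performance of the visual task model.
There is a correspondence between visual task models and the processing order of the image processing modules. Based on this correspondence, the processing order of the image processing modules corresponding to the current visual task model, that is, the processing order of the at least one target image processing module, can be determined.
With reference to the second aspect, in some implementations of the second aspect, the at least one target image processing module includes: a black level compensation module, a green balance module, a bad pixel correction module, a demosaic module, a Bayer noise reduction module, an automatic white balance module, a color correction module, a gamma correction module, or a noise reduction and sharpening module.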
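In a third aspect, an image processing device is provided. The device includes modules or units for executing the method in the first aspect or any implementation of the first aspect.
In a fourth aspect, an image processing device is provided. The device includes modules or units for executing the method in the second aspect or any implementation of the second aspect.
It should be understood that the extensions, limitations, explanations, and descriptions of the relevant content in the first aspect also apply to the same content in the second, third, and fourth aspects.
In a fifth aspect, an image processing device is provided. The device includes: a memory for storing a program; and a processor for executing the program stored in the memory. When the program stored in the memory is executed, the processor is configured to execute the method in the first aspect or any implementation of the first aspect.
The processor in the fifth aspect may be a central processing unit (CPU), or a combination of a CPU and a neural network computing processor. The neural network computing processor here may include a graphics processing unit (GPU), a neural-network processing unit (NPU), a tensor processing unit (TPU), and the like. The TPU is an artificial intelligence accelerator application-specific integrated circuit fully customized by Google for machine learning.
In a sixth aspect, an image processing device is provided. The device includes: a memory for storing a program; and a processor for executing the program stored in the memory. When the program stored in the memory is executed, the processor is configured to execute the method in the second aspect or any implementation of the second aspect.
The processor in the sixth aspect may be a central processing unit, or a combination of a CPU and a neural network computing processor. The neural network computing processor here may include a graphics processing unit, a neural-network processing unit, a tensor processing unit, and the like. The TPU is an artificial intelligence accelerator application-specific integrated circuit fully customized by Google for machine learning.
In a seventh aspect, a computer-readable storage medium is provided. The computer-readable medium stores program code for execution by a device, and the program code includes instructions for executing the method in any implementation of the first aspect or the second aspect.
In an eighth aspect, a computer program product containing instructions is provided. When the computer program product runs on a computer, the computer is caused to execute the method in any implementation of the first aspect or the second aspect.
In a ninth aspect, a chip is provided. The chip includes a processor and a data interface. The processor reads instructions stored in a memory through the data interface and executes the method in any implementation of the first aspect or the second aspect.
Optionally, as an implementation, the chip may further include a memory that stores instructions. The processor is configured to execute the instructions stored in the memory, and when the instructions are executed, the processor is configured to execute the method in any implementation of the first aspect or the second aspect.
The above chip may specifically be a field-programmable gate array (FPGA) or an application-specific integrated circuit (ASIC).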
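Brief Description of the Drawings
FIG. 1 is a schematic structural diagram of a system architecture provided by an embodiment of the present application;
FIG. 2 is a schematic diagram of an image processing flow provided by an embodiment of the present application;
FIG. 3 is a schematic flowchart of an image processing method provided by an embodiment of the present application;
FIG. 4 is a schematic diagram of another image processing flow provided by an embodiment of the present application;
FIG. 5 is a schematic diagram of yet another image processing flow provided by an embodiment of the present application;
FIG. 6 is a schematic flowchart of another image processing method provided by an embodiment of the present application;
FIG. 7 is a schematic block diagram of an image processing device provided by an embodiment of the present application;
FIG. 8 is a schematic block diagram of another image processing device provided by an embodiment of the present application.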
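Detailed Description
The technical solutions in the present application are described below with reference to the accompanying drawings.
The embodiments of the present application can be applied in fields that require visual tasks, such as autonomous driving, image classification, image retrieval, image semantic segmentation, image quality enhancement, image super-resolution, monitoring, object tracking, and object detection.
Specifically, the method of the embodiment of the present application can be applied to picture classification and monitoring scenarios, which are briefly introduced below.
Picture classification:
When a user stores a large number of pictures on a terminal device (for example, a mobile phone) or a cloud drive, recognizing the images in the album makes it easier for the user or the system to classify and manage the album, improving the user experience.
With the image processing method of the embodiment of the present application, images suited to the classification task can be obtained, improving classification accuracy. In addition, the image processing flow can be shortened and hardware overhead reduced, which is friendlier to terminal devices and speeds up picture classification. This helps label pictures of different categories in real time, making them easier for users to view and find. The classification labels can also be provided to an album management system for classified management, saving the user's management time, improving the efficiency of album management, and improving the user experience.
Monitoring:
Monitoring scenarios include smart cities, field monitoring, indoor monitoring, outdoor monitoring, in-vehicle monitoring, and the like. In the smart city scenario, multiple kinds of attribute recognition are required, such as pedestrian attribute recognition and cycling attribute recognition. Deep neural networks, by virtue of their powerful capabilities, play an important role in such multi-attribute recognition.
With the image processing method of the embodiment of the present application, images suited to the attribute recognition task can be obtained, improving recognition accuracy. In addition, the image processing flow can be shortened, hardware overhead reduced, and processing efficiency improved, which helps process input road images in real time and recognize the different attribute information in them more quickly.
Because the embodiments of the present application involve extensive use of neural networks, the relevant neural network terms and concepts that may be involved are introduced first for ease of understanding.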
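(1) Neural network
A neural network may be composed of neural units. A neural unit may be an operation unit that takes $x_s$ and an intercept of 1 as inputs, and the output of the operation unit may be:
$$\text{output} = f\left(\sum_{s=1}^{n} W_s x_s + b\right)$$
where s = 1, 2, ..., n, n is a natural number greater than 1, $W_s$ is the weight of $x_s$, and b is the bias of the neural unit.
f is the activation function of the neural unit, which introduces nonlinearity into the neural network and converts the input signal of the neural unit into an output signal. The output signal of the activation function can serve as the input of the next layer. For example, the activation function may be a ReLU, tanh, or sigmoid function.
A neural network is a network formed by connecting many such single neural units together; that is, the output of one neural unit can be the input of another. The input of each neural unit can be connected to the local receptive field of the previous layer to extract the features of that local receptive field, and the local receptive field can be a region composed of several neural units.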
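The output of a single neural unit described above, an activation function applied to the weighted sum of the inputs plus a bias, can be written directly in Python. This is a minimal sketch; the function name `neuron_output` and the sample numbers are hypothetical, and ReLU is just one of the activation functions the text mentions.

```python
import math


def relu(z):
    """ReLU activation: pass positive values, clamp negatives to zero."""
    return max(0.0, z)


def neuron_output(x, w, b, f=math.tanh):
    """Compute f(sum_s W_s * x_s + b) for one neural unit."""
    weighted_sum = sum(w_s * x_s for w_s, x_s in zip(w, x))
    return f(weighted_sum + b)


# Hypothetical inputs and weights.
print(neuron_output([1.0, 2.0], [0.5, -0.25], b=1.0, f=relu))
```

Swapping `f` for `math.tanh` or a sigmoid changes only the nonlinearity; the weighted sum and bias are computed the same way.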
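(2) Deep neural network
A deep neural network (DNN), also known as a multi-layer neural network, can be understood as a neural network with multiple hidden layers. Dividing a DNN by the position of its layers, the layers inside a DNN fall into three categories: the input layer, hidden layers, and the output layer. Generally, the first layer is the input layer, the last layer is the output layer, and all layers in between are hidden layers. The layers are fully connected; that is, any neuron in layer i is connected to any neuron in layer i+1.
Although a DNN looks complicated, the work of each layer is not. In short, each layer computes the following linear relationship expression:
$$\vec{y} = \alpha(W\vec{x} + \vec{b})$$
where $\vec{x}$ is the input vector, $\vec{y}$ is the output vector, $\vec{b}$ is the offset vector, $W$ is the weight matrix (also called the coefficients), and $\alpha()$ is the activation function. Each layer simply applies this operation to the input vector $\vec{x}$ to obtain the output vector $\vec{y}$. Because a DNN has many layers, there are also many coefficients $W$ and offset vectors $\vec{b}$. These parameters are defined in a DNN as follows, taking the coefficient $W$ as an example. In a three-layer DNN, the linear coefficient from the 4th neuron of the second layer to the 2nd neuron of the third layer is defined as $W^{3}_{24}$. The superscript 3 is the layer in which the coefficient $W$ resides, and the subscripts correspond to the output index 2 in the third layer and the input index 4 in the second layer.
In summary, the coefficient from the k-th neuron of layer L-1 to the j-th neuron of layer L is defined as $W^{L}_{jk}$.
Note that the input layer has no $W$ parameters. In a deep neural network, more hidden layers make the network better able to describe complex situations in the real world. In theory, a model with more parameters has higher complexity and greater "capacity", which means it can complete more complex learning tasks. Training a deep neural network is the process of learning the weight matrices, and its ultimate goal is to obtain the weight matrices of all layers of the trained deep neural network (weight matrices formed by the vectors W of many layers).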
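The per-layer operation, an activation applied to the weight matrix times the input vector plus the offset vector, is sketched below in plain Python. The indexing follows the convention in the text: `W[j][k]` is the coefficient from neuron k of the previous layer to neuron j of the current layer (zero-based here). The function name `dense_layer` and the sample values are hypothetical.

```python
def relu(z):
    """ReLU activation used as the example alpha()."""
    return max(0.0, z)


def dense_layer(W, x, b, alpha=relu):
    """Compute y = alpha(W x + b) for one fully connected layer.

    W[j][k] is the coefficient from input neuron k to output neuron j.
    """
    return [
        alpha(sum(W[j][k] * x[k] for k in range(len(x))) + b[j])
        for j in range(len(W))
    ]


# Hypothetical layer with 2 inputs and 2 outputs.
W = [[1.0, -1.0],
     [0.5, 0.5]]
print(dense_layer(W, x=[2.0, 1.0], b=[0.0, -2.0]))
```

Stacking several such calls, each with its own `W` and `b`, gives the multi-layer structure described above.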
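(3) Convolutional neural network
A convolutional neural network (CNN) is a deep neural network with a convolutional structure. It contains a feature extractor composed of convolutional layers and subsampling layers, and this feature extractor can be regarded as a filter. A convolutional layer is a layer of neurons in the CNN that performs convolution on the input signal. In a convolutional layer, a neuron may be connected to only some of the neurons in adjacent layers. A convolutional layer usually contains several feature planes, and each feature plane may be composed of neural units arranged in a rectangle. Neural units in the same feature plane share weights, and the shared weights are the convolution kernel. Weight sharing can be understood as meaning that the way image information is extracted is independent of position. The convolution kernel can be initialized as a matrix of random size, and during training of the CNN the kernel learns reasonable weights. In addition, a direct benefit of weight sharing is that it reduces the connections between the layers of the CNN while also lowering the risk of overfitting.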
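Weight sharing can be made concrete with a tiny one-dimensional example: the same kernel is applied at every position of the input, so the number of learned weights does not grow with the input size. The sketch below is an assumption-laden simplification (1D instead of 2D images, no padding or stride, and the sliding product without kernel flipping that CNN libraries conventionally call convolution); the name `conv1d` is hypothetical.

```python
def conv1d(signal, kernel):
    """Slide one shared kernel over the signal (valid positions only)."""
    width = len(kernel)
    return [
        # The same kernel weights are reused at every start position.
        sum(kernel[i] * signal[start + i] for i in range(width))
        for start in range(len(signal) - width + 1)
    ]


# A 3-tap difference kernel applied to a short ramp.
print(conv1d([1, 2, 3, 4], [1, 0, -1]))
```

Because the ramp rises uniformly, the kernel produces the same response at every position, which is the position independence that weight sharing is meant to capture.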
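(4) Recurrent neural network
Recurrent neural networks (RNNs) are used to process sequence data. In a traditional neural network model, data flows from the input layer to the hidden layer and then to the output layer; the layers are fully connected to each other, while the nodes within each layer are not connected. Although such ordinary neural networks have solved many problems, they remain powerless for many others. For example, predicting the next word of a sentence generally requires the preceding words, because the words in a sentence are not independent of each other. An RNN is called recurrent because the current output of a sequence also depends on the previous outputs. Concretely, the network memorizes previous information and applies it to the calculation of the current output; that is, the nodes within the hidden layer are no longer unconnected but connected, and the input of the hidden layer includes not only the output of the input layer but also the output of the hidden layer at the previous moment. In theory, an RNN can process sequence data of any length. An RNN is trained in the same way as a traditional CNN or DNN, using the error backpropagation algorithm, with one difference: if the RNN is unrolled, its parameters, such as W, are shared, which is not the case for the traditional neural networks described above. Moreover, with gradient descent, the output of each step depends not only on the network at the current step but also on the network states of several previous steps. This learning algorithm is called backpropagation through time (BPTT).
Why are recurrent neural networks needed when convolutional neural networks already exist? The reason is simple: a convolutional neural network assumes that elements are independent of each other and that inputs and outputs are independent too, such as cats and dogs. In the real world, however, many elements are interconnected, such as stock prices changing over time. As another example, someone says: "I like traveling, and my favorite place is Yunnan. I will definitely go to ___ when I get the chance." Humans know the blank should be filled with "Yunnan" because they infer from context, but how can a machine do this? RNNs emerged for this purpose. They are designed to give machines the ability to remember as humans do. Therefore, the output of an RNN depends on the current input information and on historical memory information.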
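The dependence of the current hidden state on both the current input and the previous hidden state can be sketched with a scalar recurrent unit. This is an illustrative simplification, not a formulation taken from the text: the update `h_t = activation(w_x * x_t + w_h * h_{t-1} + b)`, the scalar parameters, and the name `run_rnn` are all assumptions.

```python
import math


def run_rnn(inputs, w_x, w_h, b, activation=math.tanh):
    """Run a scalar recurrent unit over a sequence and return all hidden states."""
    hidden = 0.0  # initial memory
    states = []
    for x_t in inputs:
        # The same parameters w_x, w_h, and b are shared across all time steps.
        hidden = activation(w_x * x_t + w_h * hidden + b)
        states.append(hidden)
    return states


# Identity activation makes the effect of the memory term easy to follow.
print(run_rnn([1.0, 2.0, 3.0], w_x=1.0, w_h=0.5, b=0.0, activation=lambda z: z))
```

Each state carries half of the previous state forward, so later outputs reflect earlier inputs, which is the "memory" the text describes.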
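(5) Loss function
When training a deep neural network, the output of the network should be as close as possible to the value that is actually desired. The predicted value of the current network is therefore compared with the desired target value, and the weight vectors of each layer are updated according to the difference between the two (of course, there is usually an initialization process before the first update, in which parameters are preconfigured for each layer of the deep neural network). For example, if the predicted value of the network is too high, the weight vectors are adjusted to make it predict lower, and the adjustment continues until the deep neural network can predict the desired target value or a value very close to it. It is therefore necessary to define in advance how to compare the difference between the predicted value and the target value. This is the loss function or objective function, an important equation for measuring that difference. Taking the loss function as an example, a higher output value (loss) indicates a larger difference, so training the deep neural network becomes a process of reducing this loss as much as possible. Generally, the smaller the loss, the higher the training quality of the deep neural network, and the larger the loss, the lower the training quality. Similarly, the smaller the fluctuation of the loss, the more stable the training; the larger the fluctuation, the less stable the training.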
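The loop of "compare prediction with target, then adjust the weights to shrink the loss" can be shown with a one-parameter model. The sketch below assumes a mean squared error loss and plain gradient descent on `y = w * x`; the text does not prescribe either choice, and the names, learning rate, and data are hypothetical.

```python
def mse_loss(predictions, targets):
    """Mean squared error between predicted values and target values."""
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets)


def train_weight(xs, ys, w=0.0, learning_rate=0.05, steps=100):
    """Fit y = w * x by gradient descent and record the loss at every step."""
    losses = []
    for _ in range(steps):
        predictions = [w * x for x in xs]
        losses.append(mse_loss(predictions, ys))
        # Gradient of the mean squared error with respect to w.
        gradient = sum(2 * (p - y) * x for p, y, x in zip(predictions, ys, xs)) / len(xs)
        w -= learning_rate * gradient
    return w, losses


# Hypothetical data generated by y = 2 * x.
weight, history = train_weight([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
print(weight, history[0], history[-1])
```

The recorded losses fall as the weight approaches the value that generated the data, matching the description of training as a process of reducing the loss.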
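As shown in FIG. 1, an embodiment of the present application provides a system architecture 100. In FIG. 1, the data collection device 170 is used to collect training data. For example, for the image processing method of the embodiment of the present application, the training data may include training images and the ground truth corresponding to the training images. For example, if the visual task is an image classification task, the ground truth corresponding to a training image may be its classification result, which may be a manually pre-labeled result.
After collecting the training data, the data collection device 170 stores the training data in the database 130, and the training device 120 trains the target model/rule 101 based on the training data maintained in the database 130. The target model/rule 101 is the model used by the visual task. For example, if the visual task is an image classification task, the target model/rule 101 may be a network model for image classification.
The following describes how the training device 120 obtains the target model/rule 101 based on the training data. The training device 120 processes the input raw data and compares the output value with the target value until the difference between the value output by the training device 120 and the target value is less than a certain threshold, thereby completing the training of the target model/rule 101.
The target model/rule 101 in the embodiment of the present application may specifically be a neural network model, for example, a convolutional neural network or a residual network. It should be noted that, in practical applications, the training data maintained in the database 130 does not necessarily all come from the data collection device 170 and may also be received from other devices. It should also be noted that the training device 120 does not necessarily train the target model/rule 101 entirely on the training data maintained in the database 130; it may also obtain training data from the cloud or elsewhere for model training. The above description should not be taken as a limitation on the embodiments of the present application.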
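The target model/rule 101 trained by the training device 120 can be applied to different systems or devices, for example to the execution device 110 shown in Figure 1. The execution device 110 may be a terminal, such as a mobile phone, a tablet computer, a notebook computer, an augmented reality (AR)/virtual reality (VR) device, or a vehicle-mounted terminal, or it may be a server or the cloud. In Figure 1, the execution device 110 is configured with an input/output (I/O) interface 112 for exchanging data with external devices. A user can input data to the I/O interface 112 through the client device 140. In the embodiments of the present application, the input data may include data to be processed that is input by the client device. For example, the input data may include a raw image.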
预处理模块113用于根据I/O接口112接收到的输入图像进行预处理,在本申请实施例中,预处理模块113可以用于对输入图像进行一系列的图像信号处理。预处理模块113中可以包括一个或多个图像处理模块。
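While the execution device 110 preprocesses the input data, or while the computing module 111 of the execution device 110 performs computation or other related processing, the execution device 110 may call data, code, and the like in the data storage system 150 for the corresponding processing, and may also store the data, instructions, and the like obtained from that processing in the data storage system 150.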
最后,I/O接口112将处理结果,如上述得到的数据的处理结果返回给客户设备140,从而提供给用户。
值得说明的是,训练设备120可以针对不同的目标或不同的任务,基于不同的训练数据生成相应的目标模型/规则101,该相应的目标模型/规则101即可以用于实现上述目标或完成上述任务,从而为用户提供所需的结果。
在图1中所示情况下,用户可以手动给定输入数据,该手动给定可以通过I/O接口112提供的界面进行操作。另一种情况下,客户设备140可以自动地向I/O接口112发送输入数据,如果要求客户设备140自动发送输入数据需要获得用户的授权,则用户可以在客户设备140中设置相应权限。用户可以在客户设备140查看执行设备110输出的结果,具体的呈现形式可以是显示、声音、动作等具体方式。客户设备140也可以作为数据采集端,采集如图所示输入I/O接口112的输入数据及输出I/O接口112的输出结果作为新的样本数据,并存入数据库130。当然,也可以不经过客户设备140进行采集,而是由I/O接口112直接将如图所示输入I/O接口112的输入数据及输出I/O接口112的输出结果,作为新的样本数据存入数据库130。
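Note that Figure 1 is only a schematic diagram of a system architecture provided by an embodiment of the present application. The positional relationships among the devices, components, and modules shown in the figure do not constitute any limitation. For example, in Figure 1 the data storage system 150 is external memory relative to the execution device 110; in other cases, the data storage system 150 may be placed inside the execution device 110.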
如图1所示,根据训练设备120训练得到目标模型/规则101,该目标模型/规则101在本申请实施例中可以是本申请中的神经网络模型,具体的,本申请实施例的神经网络模型可以为CNN或残差网络等。
图像信号处理器对传感器获取的raw图像经过一系列的处理之后,输出可视化的图像。这些图像可以作为视觉任务的输入图像。具体地,在视觉任务中可以利用神经网络算法或者非神经网络算法对输入图像进行处理,得到视觉任务的相关结果。
图2示出了一种视觉任务的整体处理流程的示意图。将raw图作为输入图像,对该输入图像进行一系列的图像信号处理,输出8比特(bit)的可视化的红绿蓝(red green blue,RGB)图像。将该RGB图像作为视觉任务的输入图像,得到视觉任务的处理结果。例如,如图2所示,图像信号处理模块包括黑电平补偿(black level compensation)模块、绿平衡(green balance)模块、坏点修正(bad pixel correction)模块、去马赛克(demosaic)模块、拜耳降噪(bayer denoise)模块、自动白平衡(auto white balance)模块、色彩校正(color correction)模块、伽马校正(gamma correction)模块以及降噪和锐化(denoise sharpness)模块等。图像信号处理模块可以采用非神经网络算法,也可以采用神经网络算法。
视觉任务的输入图像通常为经过图像信号处理的RGB图像。传统的图像信号处理的目的通常是为了满足人的视觉需求,基于该图像执行视觉任务得到的结果不一定是最优的结果。
本申请实施例提供了一种图像处理方法,根据视觉任务的处理结果调整视觉任务之前的图像处理流程,以得到满足需要的图像处理流程。
下面结合图3至图6对本申请实施例中的图像处理方法进行详细的描述。
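Figure 3 shows an image processing method 300 provided by an embodiment of the present application. The method shown in Figure 3 may be executed by a computing apparatus. The apparatus may be a cloud service device or a terminal device, for example a computer, server, mobile phone, camera, vehicle, drone, or robot, or it may be a system composed of a cloud service device and a terminal device.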
示例性地,方法300可以由训练设备或推理设备执行,例如,方法300可以由CPU、GPU或NPU等加速器执行。进一步地,加速器芯片可以位于FPGA、芯片仿真器(Emulator)或开发板(evaluation board,EVB)上。
或者,方法300可以由硬件装置(例如,摄像头或相机)的ISP流水线(pipeline)的调优工具或校准工具执行。
方法300包括步骤S301至步骤S304。下面对步骤S301至步骤S304进行详细介绍。
S301,获取第一图像。
示例性地,第一图像可以为传感器获取的raw图。
训练数据集中包括多个图像,第一图像为训练数据集中的任一图像。在实际应用中,可以基于训练数据集中的多个图像多次执行方法300,直至得到需要的图像处理模块。
示例性地,训练数据集可以采用开源数据集。或者,训练数据集也可以是自行制作的数据集。
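For example, the training data set may be stored in advance. For instance, it may be the training data maintained in the database 130 shown in Figure 1. Alternatively, the training data set may be data input by a user.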
S302,通过至少一个图像处理模块对第一图像进行处理,得到第二图像。
图像处理模块用于对输入图像进行图像信号处理。
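For example, the at least one image processing module may be located on an image signal processor. That is, step S302 is performed by the image processing modules in the image signal processor.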
该至少一个图像处理模块中的任一图像处理模块可以采用神经网络算法实现,或者,也可以采用非神经网络算法实现。本申请实施例对图像处理模块的具体实现方式不做限定。
可选地,该至少一个图像处理模块可以包括:黑电平补偿模块、绿平衡模块、坏点修正模块、去马赛克模块、Bayer降噪模块、自动白平衡模块、色彩校正模块、伽马校正模块或降噪和锐化模块。
例如,如图4所示,将raw图作为第一图像,该至少一个图像处理模块包括9个图像处理模块,分别为黑电平补偿模块、绿平衡模块、坏点修正模块、去马赛克模块、Bayer降噪模块、自动白平衡模块、色彩校正模块、伽马校正模块以及降噪和锐化模块。该9个图像处理模块依次执行黑电平补偿、绿平衡处理、坏点修正、去马赛克、Bayer降噪、自动白平衡处理、色彩校正、伽马校正以及降噪和锐化。
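For example, the black level compensation module, the green balance module, and the bad pixel correction module may be used to process the raw data. The demosaic module and the Bayer denoise module may be used to perform demosaicing. The auto white balance module, the color correction module, the gamma correction module, and the denoise and sharpness module may be used to perform image enhancement.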
例如,如图4所示,第二图像可以为RGB图像。进一步地,第二图像可以为8bit的RGB图像。此处仅为示例,第二图像的类型也可以根据视觉任务模型的输入需要设置。
可选地,步骤S302包括:通过至少一个图像处理模块和该至少一个图像处理模块的权重对第一图像进行处理,得到第二图像。
具体地,根据该至少一个图像处理模块的权重对该至少一个图像处理模块的处理结果进行调整,得到第二图像。
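For example, when an image processing module processes the image input to it, it may adjust the pixel values of all or some of the pixels of that image, that is, change the pixel values of all or some of the pixels. In this case, the amount of change in those pixel values can be adjusted according to the weight of the image processing module.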
例如,将该图像处理模块的权重与像素值的变化量相乘,得到调整后的像素的变化量,进而得到该模块的输出图像。若该图像处理模块的权重为0,则相当于该图像处理模块没有参与图像处理流程。
权重的具体取值可以根据需要设定,例如,权重可以为大于或等于0,且小于等于1的实数。
进一步地,在设置权重时可以对该至少一个图像处理模块的权重进行归一化处理,也就是使该至少一个图像处理模块的权重的总和为1,或使该至少一个图像处理模块的权重总和接近1。
如图4所示,9个图像处理模块的权重分别为w1、w2、w3、w4、w5、w6、w7、w8和w9。权重的取值范围为大于或等于0,且小于等于1的实数。这样,w1、w2、w3、w4、w5、w6、w7、w8和w9的最大的可能的总和为9。或者,也可以对该9个权重进行归一化处理,这样可以使该9个权重的总和为1。
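The weighting rule described above (scale each module's pixel change by its weight, and optionally normalize the weights) can be sketched as follows. This is a minimal illustration, assuming an image is a flat list of pixel values; `subtract_black_level` and `double_gain` are hypothetical stand-ins for real ISP modules, and all function names are assumptions.

```python
def apply_weighted(image, module, weight):
    """Apply a module, then scale the per-pixel change by the module weight."""
    processed = module(image)
    # weight == 0 leaves the image unchanged; weight == 1 applies the module fully.
    return [p + weight * (q - p) for p, q in zip(image, processed)]


def run_pipeline(image, modules, weights):
    """Run the modules in order, each blended in by its own weight."""
    for module, weight in zip(modules, weights):
        image = apply_weighted(image, module, weight)
    return image


def normalize(weights):
    """Rescale weights so that they sum to 1."""
    total = sum(weights)
    if total == 0:
        raise ValueError("at least one weight must be non-zero")
    return [w / total for w in weights]


def subtract_black_level(image):  # hypothetical stand-in module
    return [max(p - 16.0, 0.0) for p in image]


def double_gain(image):  # hypothetical stand-in module
    return [p * 2.0 for p in image]


raw = [16.0, 80.0, 144.0]
out = run_pipeline(raw, [subtract_black_level, double_gain], [1.0, 0.5])
```

With both weights set to 0 the pipeline returns the input unchanged, which matches the statement that a module with weight 0 effectively does not take part in the processing flow.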
S303,将第二图像输入至视觉任务模型中进行处理。
示例性地,视觉任务包括:目标检测、图像分类、目标分割、目标跟踪或图像识别等。
视觉任务模型用于执行视觉任务。例如,视觉任务为目标检测,则视觉任务模型为目标检测模型。再如,视觉任务为图像识别,则视觉任务模型为图像识别模型。
视觉任务模型可以为训练好的模型。
视觉任务模型的输出的类型与视觉任务的类型有关。视觉任务模型的输出即为视觉任务模型的推理结果。
例如,视觉任务为目标检测,则视觉任务模型的输出可以为第二图像上的目标框以及该目标框中的物体的类别。再如,视觉任务为图像分类,则视觉任务模型的输出可以为第二图像的类别。
视觉任务模型的处理结果可以包括视觉任务模型的性能指标。
示例性地,视觉任务模型的性能指标包括推理的准确度或损失函数的值等。损失函数可以根据需要设置。损失函数用于指示视觉任务模型的推理结果与第一图像对应的真值之间的差异。需要说明的是,此处的损失函数可以采用视觉任务模型训练过程中的损失函数,或者,也可以采用其他形式的损失函数。
例如,视觉任务为目标检测,则视觉任务模型的处理结果可以包括检测准确度。
将第二图像输入视觉任务模型中进行处理,将得到的检测结果与第一图像对应的真值比较,得到两者之间的误差,根据两者之间的误差确定检测准确度。
再如,视觉任务为目标分割,则视觉任务模型的处理结果可以包括分割准确度。
将第二图像输入视觉任务模型中进行处理,将得到的分割结果与第一图像对应的真值比较,得到两者之间的误差,根据两者之间的误差确定分割准确度。
视觉任务模型可以采用神经网络模型,或者,也可以采用非神经网络模型。神经网络模型可以是现有的神经网络模型,例如,残差网络。或者,该神经网络模型也可以是自行构建的其他结构的神经网络模型。本申请实施例对此不作限定。
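Note that for the same visual task, different visual task models may be used in different application scenarios. For example, for an object detection task in a driving scenario, the visual task models used under overexposure and underexposure may be the same or different. During driving, if the current scene is identified as overexposed, a first object detection model may be used; if the current scene is identified as underexposed, a second object detection model may be used. The first object detection model and the second object detection model are different object detection models.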
示例性地,视觉任务模型的处理过程可以由图1中的计算模块111执行。
视觉任务模型可以部署于方法300的执行设备上,也可以部署于其他设备上。也就是说,视觉任务模型的处理过程可以由方法300的执行设备执行,也可以由其他设备执行,并将处理结果反馈至方法300的执行设备上。
S304,根据视觉任务模型的处理结果调整该至少一个图像处理模块。
根据视觉任务模型的处理结果调整该至少一个图像处理模块,以使视觉任务模型的处理结果尽可能接近预期。
或者说,根据视觉任务模型的性能指标调整该至少一个图像处理模块,以提高视觉任务模型的性能。
例如,视觉任务模型的性能指标为视觉任务模型的推理的准确度,则调整该至少一个图像处理模块,以提高模型的推理的准确度。
再如,视觉任务模型的性能指标为视觉任务模型的损失函数的值,则调整该至少一个图像处理模块,以减少视觉任务模型的损失函数的值。
在实际应用中可以基于训练数据集中多张图像执行方法300,直至满足预设条件。也就是说在实际应用中可以基于多张图像不断迭代调整图像处理模块。每一次迭代过程中采用的图像处理模块为上一次迭代后得到的图像处理模块。
预设条件可以根据需要设置,后文中会在方式1、方式2、方式3和方式4中举例说明。
进一步地,还可以根据图像处理的时间和视觉任务模型的处理结果调整该至少一个图像处理模块。
图像处理的时间可以为视觉任务模型的处理时间,或者,也可以为该至少一个图像处理模块的处理时间,或者,也可以为视觉任务模型的处理时间和该至少一个图像处理模块的处理时间的总和。
这样,能够在保证视觉任务模型的性能的前提下,提高处理速度,降低时延。
在本申请实施例的方案中,根据视觉任务模型的处理结果调整图像处理流程,有利于得到适合视觉任务的图像,以保证视觉任务模型的性能。
本申请实施例的方案能够根据不同的应用场景的需求调整图像处理流程,以适应不同的应用场景。
相同的视觉任务,在不同的应用场景下,可能采用不同的视觉任务模型。例如,对于驾驶场景中的目标检测任务,在曝光过度和曝光不足的情况下采用的视觉任务模型可能是相同的,也可能是不同的。在驾驶的过程中,若当前场景被识别为曝光过度,可以采用第一目标检测模型作为视觉任务模型。若当前场景被识别为曝光不足,可以采用第二目标检测模型作为视觉任务模型。本申请实施例的方案可以分别针对第一目标检测模型和第二目标检测模型的处理结果调整图像处理流程,以分别得到适合第一目标检测模型的图像处理流程和适合第二目标检测模型的图像处理流程。
步骤S304可以采用多种方式实现,下面以其中四种方式(方式1、方式2、方式3和方式4)为例进行说明。
方式1
可选地,该至少一个图像处理模块包括多个图像处理模块,步骤S304包括:根据视觉任务模型的处理结果调整该多个图像处理模块的权重。
根据视觉任务模型的处理结果调整该多个图像处理模块的权重,以提高视觉任务模型的性能。
如前所述,实际应用中可以基于训练数据集中多张图像执行方法300以实现对该多个图像处理模块的权重进行迭代调整,直至满足预设条件。满足预设条件后停止调整该多个图像处理模块的权重,或者说,停止刷新该多个图像处理模块的权重。
示例性地,预设条件可以为该多个图像处理模块的权重收敛。
在该多个图像处理模块的权重收敛的情况下,不再执行方法300,即停止调整该多个图像处理模块的权重。权重收敛也可以理解为在连续执行多次方法300后得到的权重梯度变化较小。例如,连续执行多次方法300后得到的权重梯度的变化量小于或等于第一阈值的情况下,停止调整该多个图像处理模块的权重。
可替换地,预设条件可以为视觉任务模型的准确度大于或等于第二阈值。
在视觉任务模型的准确度大于或等于第二阈值的情况下,不再执行方法300,即停止调整该多个图像处理模块的权重。
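The second threshold may be a preset value. Alternatively, the second threshold may be the inference accuracy of the visual task model obtained when no weights are set for the image processing modules. For example, as shown in Figure 4, the second threshold may be the inference accuracy of the visual task model when no weights are set for the 9 image processing modules. Equivalently, the second threshold may be the inference accuracy of the visual task model when the weights of the 9 image processing modules are all 1.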
也就是说,将图像输入原始的图像处理模块中进行处理,并将处理后的图像输入至视觉任务模型中进行处理,并计算推理的准确度,将该准确度作为第二阈值。执行方法300,即将图像输入当前调整过权重的图像处理模块中进行处理,并将处理后的图像输入至视觉任务模型中进行处理,并计算推理的准确度,将当前得到的推理的准确度与第二阈值进行比较,在当前得到的推理的准确度大于或等于第二阈值的情况下,不再执行方法300。这样,利用调整后的图像处理模块对图像进行处理,能够保证视觉任务模型的性能,或者能够提高视觉任务模型的性能。
可替换地,预设条件可以为连续执行多次方法300后得到的视觉任务模型的损失函数值的变化量小于或等于第三阈值。
也就是说,在视觉任务模型的损失函数值的变化趋于稳定的情况下,不再执行方法300。
可替换地,预设条件可以为迭代次数大于或等于第四阈值。
也就是说,在执行方法300的次数大于或等于第四阈值的情况下,不再执行方法300。
应理解,上述预设条件可以结合使用。例如,预设条件可以为视觉任务模型的准确度大于或等于第二阈值,且迭代次数大于或等于第四阈值。再如,预设条件可以为该多个图像处理模块的权重收敛,且视觉任务模型的准确度大于或等于第二阈值。
应理解,以上仅为示例,预设条件还可以为其他形式的条件,本申请对此不做限定。
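The preset conditions listed above can be expressed as small predicates. The sketch below is illustrative only: the function names, the `window` parameter, and the way conditions are combined are assumptions, while the threshold names follow the second, third, and fourth thresholds in the text.

```python
def accuracy_reached(accuracy, second_threshold):
    """Accuracy of the visual task model has reached the second threshold."""
    return accuracy >= second_threshold


def loss_stable(losses, third_threshold, window=3):
    """Loss varied by at most the third threshold over the last `window` runs."""
    if len(losses) < window:
        return False
    recent = losses[-window:]
    return max(recent) - min(recent) <= third_threshold


def iterations_exhausted(iteration, fourth_threshold):
    """Number of executions of the method has reached the fourth threshold."""
    return iteration >= fourth_threshold


def should_stop(accuracy, iteration, second_threshold, fourth_threshold):
    """One possible combination: accuracy reached AND enough iterations."""
    return (accuracy_reached(accuracy, second_threshold)
            and iterations_exhausted(iteration, fourth_threshold))
```

For instance, `should_stop(0.9, 5, 0.85, 10)` is false because only five iterations have run, while the same accuracy at iteration 10 stops the loop.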
示例性地,可以采用贝叶斯优化方法、RNN模型或强化学习算法等方式调整该多个图像处理模块的权重。
下面以贝叶斯优化方法为例进行说明。
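For example, if the visual task model is an object detection model, the performance indicator of the visual task model may be the mean average precision (mAP). The weights of the multiple image processing modules are adjusted by the Bayesian optimization method to improve the mAP of the object detection model. In other words, the weights of the multiple image processing modules are adjusted with the goal of maximizing the mAP of the object detection model.
The mean average precision is the mean of the detection precision over all target objects.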
将训练数据集中的图像输入目标检测模型中,得到该图像的检测准确度。将该图像的检测准确度输入贝叶斯优化模型中,贝叶斯优化模型调整各个图像处理模块的权重。
进一步地,该图像的检测准确度可以保留在贝叶斯优化模型中。也就是说,当训练数据集中的其他图像输入至目标检测模型中,得到其他图像的检测准确度。贝叶斯优化模型可以根据其他图像的检测准确度以及之前的图像的检测准确度调整各个图像处理模块的权重。
需要说明的是本申请实施例中的训练数据集用于训练各个图像处理模块,与视觉任务模型的训练数据集可以相同也可以不同。例如,本申请实施例中的训练数据集可以采用视觉任务模型的验证数据集或测试数据集等。
在本申请实施例的方案中,根据视觉任务模型的处理结果评估图像处理模块的权重,进而调整图像处理模块的权重,以增加与视觉任务模型的性能相关性较强的图像处理模块的权重,减少与视觉任务模型的性能相关性较弱的图像处理模块的权重,这样能够获得更适合视觉任务的图像处理流程,有利于提高视觉任务模型的性能。
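The feedback loop of manner 1 (方式1) can be sketched as below. To stay self-contained the sketch swaps in plain random search for the Bayesian optimization, RNN, or reinforcement learning methods named in the text, so it shows only the structure of the loop, not the optimizer the application describes. `evaluate` stands for "run the weighted pipeline, feed the result to the visual task model, return its performance indicator"; it and `toy_metric` are hypothetical.

```python
import random


def tune_weights(evaluate, num_modules, iterations=50, seed=0):
    """Search for module weights in [0, 1] that maximize `evaluate`."""
    rng = random.Random(seed)
    best_weights = [1.0] * num_modules  # baseline: every module fully applied
    best_score = evaluate(best_weights)
    for _ in range(iterations):
        candidate = [rng.random() for _ in range(num_modules)]
        score = evaluate(candidate)
        # Keep a candidate only if the task metric improves.
        if score > best_score:
            best_weights, best_score = candidate, score
    return best_weights, best_score


def toy_metric(weights):
    # Hypothetical stand-in for the task metric: module 0 helps, module 1 hurts.
    return weights[0] - weights[1]


weights, score = tune_weights(toy_metric, num_modules=2)
```

Because the baseline is the unweighted pipeline, the returned score is never worse than the score obtained with all weights equal to 1, which mirrors the second-threshold comparison described earlier.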
方式2
可选地,步骤S304包括:根据视觉任务模型的处理结果更改该至少一个图像处理模块。
更改该至少一个图像处理模块,可以包括:删除该至少一个图像处理模块中的部分图像处理模块或/和增加其他图像处理模块。
在一种可能的实现方式中,步骤S304可以为,根据视觉任务模型的处理结果从多个候选图像处理模块中选择一个图像处理模块的组合,利用该图像处理模块的组合替换该至少一个图像处理模块。
示例性地,可以采用贝叶斯优化方法或强化学习算法等方式更改该至少一个图像处理模块。
如前所述,实际应用中可以基于训练数据集中多张图像执行方法300以实现对该多个图像处理模块的组合进行迭代调整,直至满足预设条件。满足预设条件后停止调整该多个图像处理模块的组合,或者说,停止刷新该多个图像处理模块的组合。
示例性地,预设条件可以为迭代次数大于或等于第四阈值。
在执行方法300的次数大于或等于第四阈值的情况下,不再执行方法300,即停止调整该图像处理模块的组合。
应理解,此处仅为示例,其他的预设条件的设置可以参考方式1,此处不再赘述。
在本申请实施例的方案中,根据视觉任务模型的处理结果更改图像处理模块的组合,能够获得更适合视觉任务模型的图像处理模块的组合,有利于提高视觉任务模型的性能。
该至少一个图像处理模块包括多个图像处理模块,步骤S304包括:根据视觉任务模型的处理结果从该多个图像处理模块中删除部分图像处理模块。
在一种可能实现方式中,方式2可以采用方式1的处理结果。
可选地,步骤S304包括:根据视觉任务模型的处理结果调整该多个图像处理模块的权重;根据调整后的该多个图像处理模块的权重从该多个图像处理模块中删除部分图像处理模块。
示例性地,该多个图像处理模块为m个图像处理模块。m为大于1的整数。从该m个图像处理模块中删除调整后的权重最小的n个图像处理模块。n为大于1且小于m的整数。
可替换地,从该m个图像处理模块中删除调整后的权重小于或等于权重阈值的图像处理模块。
例如,如图4所示的9个图像处理模块中,绿平衡模块、坏点修正模块、bayer降噪模块、色彩校正模块以及降噪和锐化模块对应的权重小于或等于权重阈值,则删除该五个模块。
在本申请实施例的方案中,根据视觉任务模型的处理结果删除部分图像处理模块,能够减少图像处理所需的时间,提高处理速度,减少对计算力的要求。
此外,权重较高的图像处理模块与视觉任务模型的相关性较强,或者说,权重较高的图像处理模块对视觉任务模型的性能的影响较大。本申请实施例的方案中,根据各个图像处理模块的权重确定被删除的图像处理模块,删除权重值相对较小的图像处理模块,这样对视觉任务模型的处理结果影响较小,删除之后对视觉任务模型的性能的影响较小。也就是说,本申请实施例的方案能够在保证视觉任务模型的性能的前提下,减少不必要的运算,降低计算开销,提高处理速度。
可选地,步骤S304包括:根据视觉任务模型的处理结果和该多个图像处理模块的处理速度从该多个图像处理模块中删除部分图像处理模块。
示例性地,根据调整后的该多个图像处理模块的权重和该多个图像处理模块的处理速度从该多个图像处理模块中删除部分图像处理模块。
例如,从该多个图像处理模块中删除调整后的权重小于或等于权重阈值、且处理速度小于或等于速度阈值的图像处理模块。也就是说,删除处理速度较慢,且对视觉任务模型的影响较小的图像处理模块。这样,能进一步提高图像处理的速度。
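The three deletion rules described in this manner can be sketched as list filters. The weight values below are invented for illustration and chosen so that the result matches the five modules named in the Figure 4 example; the function names and the module identifiers are assumptions.

```python
def prune_by_threshold(names, weights, weight_threshold):
    """Delete modules whose weight is less than or equal to the threshold."""
    return [n for n, w in zip(names, weights) if w > weight_threshold]


def prune_smallest(names, weights, n):
    """Delete the n modules with the smallest weights, keeping pipeline order."""
    order = sorted(range(len(names)), key=lambda i: weights[i])
    dropped = set(order[:n])
    return [name for i, name in enumerate(names) if i not in dropped]


def prune_slow_and_unimportant(names, weights, speeds,
                               weight_threshold, speed_threshold):
    """Delete modules that are both low-weight and slow."""
    return [n for n, w, s in zip(names, weights, speeds)
            if not (w <= weight_threshold and s <= speed_threshold)]


names = ["black_level", "green_balance", "bad_pixel", "demosaic",
         "bayer_denoise", "awb", "color_correction", "gamma", "denoise_sharpen"]
weights = [0.30, 0.02, 0.03, 0.25, 0.04, 0.20, 0.01, 0.12, 0.03]  # made-up values

kept = prune_by_threshold(names, weights, weight_threshold=0.05)
```

Both `prune_by_threshold` with a threshold of 0.05 and `prune_smallest` with n = 5 keep the same four modules for these made-up weights.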
方式3
该至少一个图像处理模块包括多个图像处理模块,步骤S304包括:根据视觉任务模型的处理结果调整该多个图像处理模块的处理顺序。
根据视觉任务模型的处理结果调整该多个图像处理模块的处理顺序,以提高视觉任务模型的性能。
示例性地,可以采用贝叶斯优化方法、RNN模型或强化学习算法等方式调整该多个图像处理模块的处理顺序。
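As described above, in practice method 300 can be executed on multiple images in the training data set until a preset condition is met. Once the preset condition is met, adjustment of the processing order of the multiple image processing modules stops; in other words, the processing order is no longer refreshed.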
示例性地,预设条件可以为该多个图像处理模块的处理顺序的变化量小于或等于第五阈值。
例如,该多个图像处理模块的处理顺序的变化量可以为,在执行方法300后处理顺序发生变化的图像处理模块的数量。
可替换地,预设条件可以为视觉任务模型的推理的准确度大于或等于第六阈值。
在视觉任务模型的推理的准确度大于或等于第六阈值的情况下,不再执行方法300,即停止调整该多个图像处理模块的处理顺序。
第六阈值可以为预设值。或者,第六阈值可以是在没有调整图像处理模块的处理顺序的情况下,视觉任务模型的推理的准确度。例如,如图4所示,第六阈值可以为按照如图4所示的图像处理模块的处理顺序对图像进行处理的情况下,视觉任务模型的推理的准确度。
也就是说,将图像输入原始的图像处理模块,按照原始的图像处理模块的顺序进行处理,并将处理后的图像输入至视觉任务模型中进行处理,并计算推理的准确度,将该准确度作为第六阈值。将图像输入当前调整过处理顺序的图像处理模块中进行处理,并将处理后的图像输入至视觉任务模型中进行处理,并计算推理的准确度,将当前得到的推理的准确度与第六阈值进行比较,在当前得到的推理的准确度大于或等于第六阈值的情况下,不再执行方法300。这样,按照调整后的图像处理模块的处理顺序对图像进行处理,能够保证视觉任务模型的性能,或者能够提高视觉任务模型的性能。
可替换地,预设条件可以为连续执行多次方法300后得到的视觉任务模型的损失函数值的变化量小于或等于第三阈值。
也就是说,在视觉任务模型的损失函数值的变化趋于稳定的情况下,不再执行方法300。
可替换地,预设条件可以为迭代次数大于或等于第四阈值。
也就是说,在执行方法300的次数大于或等于第四阈值的情况下,不再执行方法300。
可替换地,上述预设条件可以结合使用。例如,预设条件可以为视觉任务模型的推理的准确度大于或等于第六阈值,且迭代次数大于或等于第四阈值。再如,预设条件可以为该多个图像处理模块的处理顺序的变化量小于或等于第五阈值,且视觉任务模型的准确度大于或等于第六阈值。
应理解,以上仅为示例,预设条件还可以为其他形式的条件,本申请对此不做限定。
在本申请实施例的方案中,根据视觉任务模型的处理结果调整图像处理模块的处理顺序,能够获得更适合视觉任务的图像处理流程,有利于提高视觉任务的准确度。
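For a small number of modules, the effect of processing order can be explored by simply trying every permutation. The sketch below uses exhaustive search rather than the Bayesian optimization, RNN, or reinforcement learning methods named in the text, and the modules and score function are hypothetical toys. The number of permutations grows factorially, so this approach is only practical for a handful of modules.

```python
from itertools import permutations


def best_order(modules, image, score):
    """Try every processing order and return the best (order, score)."""
    best = None
    for order in permutations(modules):
        out = image
        for module in order:
            out = module(out)
        s = score(out)
        if best is None or s > best[0]:
            best = (s, order)
    return best[1], best[0]


def add_ten(image):  # hypothetical module
    return [p + 10.0 for p in image]


def clip_to_100(image):  # hypothetical module
    return [min(p, 100.0) for p in image]


def toy_score(image):
    # Hypothetical task metric: a mean pixel value closer to 100 is better.
    return -abs(sum(image) / len(image) - 100.0)


order, s = best_order([clip_to_100, add_ten], [95.0, 120.0], toy_score)
```

Here clipping after the offset yields a mean of exactly 100, while clipping first yields 107.5, so the search prefers `add_ten` followed by `clip_to_100`.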
方式4
可选地,步骤S304包括:根据视觉任务模型的处理结果调整该至少一个图像处理模块中的参数。
根据视觉任务模型的处理结果调整该至少一个图像处理模块中的参数,以提高视觉任务模型的性能。
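For example, if an image processing module is implemented with a neural network model, the parameters of the image processing module are the parameters of that neural network model.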
示例性地,可以采用贝叶斯优化方法、RNN模型、强化学习算法等方式调整该至少一个图像处理模块中的参数。
基于当前的图像处理模块中的参数组合对输入图像进行处理,并将处理后的结果输入视觉任务模型中进行处理,例如由CPU或GPU执行视觉任务。根据视觉任务模型的性能的反馈优化更新该图像处理模块中的参数组合,即在搜索空间中寻找最优的图像处理模块中的参数组合,以提高视觉任务模型的性能。
如前所述,实际应用中可以基于训练数据集中多张图像执行方法300,直至满足预设条件。满足预设条件后停止调整该至少一个图像处理模块中的参数,或者说,停止刷新该至少一个图像处理模块中的参数。
示例性地,预设条件可以为视觉任务模型的推理的准确度大于或等于第七阈值。
在视觉任务模型的推理的准确度大于或等于第七阈值的情况下,不再执行方法300,即停止调整该至少一个图像处理模块中的参数。
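The seventh threshold may be a preset value. Alternatively, the seventh threshold may be the inference accuracy of the visual task model obtained without adjusting the parameters of the at least one image processing module. For example, as shown in Figure 4, the seventh threshold may be the inference accuracy of the visual task model when the parameters of the 9 image processing modules have not been adjusted.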
也就是说,将图像输入原始的图像处理模块,即没有调整参数的图像处理模块中进行处理,并将处理后的图像输入至视觉任务模型中进行处理,并计算推理的准确度,将该准确度作为第七阈值。将图像输入当前调整过参数的图像处理模块中进行处理,并将处理后的图像输入至视觉任务模型中进行处理,并计算推理的准确度,将当前得到的推理的准确度与第七阈值进行比较,在当前得到的推理的准确度大于或等于第七阈值的情况下,不再执行方法300。这样,利用调整后的图像处理模块对图像进行处理,能够保证视觉任务模型的性能,或者能够提高视觉任务模型的性能。
可替换地,预设条件可以为连续执行多次方法300后得到的视觉任务模型的损失函数值的变化量小于或等于第三阈值。
也就是说,在视觉任务模型的损失函数值的变化趋于稳定的情况下,不再执行方法300。
可替换地,预设条件可以为迭代次数大于或等于第四阈值。
也就是说,在执行方法300的次数大于或等于第四阈值的情况下,不再执行方法300。
可替换地,上述预设条件可以结合使用。例如,预设条件可以为视觉任务模型的推理的准确度大于或等于第七阈值,且迭代次数大于或等于第四阈值。
应理解,以上仅为示例,预设条件还可以为其他形式的条件,本申请对此不做限定。
在本申请实施例的方案中,根据视觉任务模型的处理结果调整图像处理模块中的参数,能够获得更适合视觉任务的图像处理模块,有利于提高视觉任务的准确度。
需要说明的是,上述方式1、方式2、方式3和方式4中任两种或两种以上的方式可以结合使用。在结合使用时,各个方式可以同时执行,或者,各个方式也可以分别执行。
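Optionally, step S304 includes: deleting some of the multiple image processing modules according to the processing result of the visual task model; processing a fifth image with the image processing modules that were not deleted to obtain a sixth image, and inputting the sixth image into the visual task model for processing; and adjusting the parameters of the image processing modules that were not deleted according to the processing result of the visual task model.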
示例性地,第五图像可以为训练数据集中的图像。第五图像的其他描述可以参考前文中的第一图像。第五图像和第一图像可以为相同的图像,也可以为不同的图像。
示例性地,第六图像可以为RGB图像。第六图像的描述可以参考前文中的第二图像。
根据本申请实施例的方案,利用视觉任务模型得到的性能指标,例如,目标检测的准确度、目标分割准确率等,调整多个图像处理模块的权重,保留对视觉任务模型的性能指标影响较大的图像处理模块,或者说,保留能够维持或提升视觉任务模型的性能指标的图像处理模块。这样,能够得到适合视觉任务模型的图像处理模块,或者说,得到视觉任务模型所需的图像处理模块,减少了图像处理流程所需的时间,节省计算开销,减少计算力的需求,对硬件更加友好。
而且,利用视觉任务模型得到的性能指标调整被保留的图像处理模块中的参数,例如,利用视觉任务模型得到的性能指标对图像处理模块进行设计空间的搜索,有利于得到各个图像处理模块最优的参数配置,以提升视觉任务模型的性能。
可选地,步骤S304包括:根据视觉任务模型的处理结果调整多个图像处理模块的参数以及该多个图像处理模块的权重,根据调整后的多个图像处理模块的权重从该多个图像处理模块中删除部分图像处理模块。
可选地,步骤S304包括:根据视觉任务模型的处理结果调整多个图像处理模块的参数、该多个图像处理模块的权重以及该多个图像处理模块的处理顺序,根据调整后的多个图像处理模块的权重从该多个图像处理模块中删除部分图像处理模块。
本申请实施例提供了一种图像处理方法400,方法400可以视为方法300的一种具体实现方式,具体描述参考前述方法300,为了描述简洁,下面在介绍方法400时适当省略部分描述。具体地,方法400采用方式1、方式2和方式4的组合的方式。
方法400包括步骤S401至步骤S410。下面对步骤S401至步骤S410进行说明。方法400可以视为两个阶段,第一阶段包括步骤S401至步骤S406,第二阶段包括步骤S407至步骤S410。
S401,为多个图像处理模块设置初始的权重。
例如,该多个图像处理模块可以包括如图5所示的9个图像处理模块。将各个图像处理模块的权重表示为w1、w2、w3、w4、w5、w6、w7、w8和w9。该9个权重的总和为1。
以上仅为示例,其他权重设置方法可以参考步骤S302中的描述。
S402,将训练数据集中的图像输入至该多个图像处理模块中进行处理。
也就是基于该多个图像处理模块的权重对输入的图像进行处理。或者说,基于该多个图像处理模块的权重对该多个图像处理模块的处理结果进行调整。
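For example, the input image is processed according to the image processing modules and their corresponding weights shown in Figure 5.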
示例性地,处理结果可以为RGB图像。
进一步地,处理结果可以为8bit的RGB图像。
步骤S402与步骤S302对应,具体描述可以参见步骤S302中的描述。
S403,该多个图像处理模块处理后的结果输入至视觉任务模型中进行推理,得到视觉任务模型的推理结果。
视觉任务模型可以为已经训练好的模型。
S404,将视觉任务模型的推理结果与训练数据集中的图像对应的真值进行对比,根据对比结果调整该多个图像处理模块的权重。
或者说,将对比结果反馈至优化算法中,利用优化算法调整该多个图像处理模块的权重。
示例性地,优化算法包括贝叶斯优化方法、RNN模型、强化学习算法。
S405,将调整后的图像处理模块的权重作为步骤S402中的图像处理模块的权重,重复步骤S402至步骤S404,直至满足第一预设条件。
或者,步骤S405也可以为,将调整后的图像处理模块的权重进行归一化处理,将归一化处理后的权重作为步骤S402中的图像处理模块的权重。
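That is, each time the weights of the image processing modules are adjusted, the adjusted weights are normalized so that the normalized weights sum to 1 or close to 1.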
满足第一预设条件后,终止步骤S402至步骤S404。示例性地,当前得到的图像处理模块的权重可以视为满足第一预设条件后得到的图像处理模块的权重。
例如,若当前视觉任务模型的准确度大于或等于没有设置图像处理模块的权重的情况下视觉任务模型的准确度,则终止步骤S402至步骤S404。
步骤S403至步骤S405可以视为方式1的具体实现方式,具体描述可以参考方式1中的描述,第一预设条件的设置方式可以参考方式1中的预设条件,此处不再赘述。
S406,根据满足第一预设条件后得到的图像处理模块的权重删除部分图像处理模块。
Step S406 corresponds to step S304 in manner 2 (方式2). For details, refer to the related description in manner 2, which is not repeated here.
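For example, as shown in Figure 5, the modules with small adjusted weights are deleted: the green balance module, the bad pixel correction module, the Bayer denoise module, the color correction module, and the denoise and sharpness module.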
S407,将训练数据集中的图像输入至该未被删除的图像处理模块中进行处理。
步骤S407中的图像与步骤S402中的图像可以为相同的图像,也可以为不同的图像。
也就是说,将未被删除的图像处理模块中的参数作为调优对象。或者说,将被保留的图像处理模块中的参数作为调优对象。
进一步地,在步骤S407之前,还可以对未被删除的图像处理模块的权重进行归一化处理。
例如,如图5所示,将训练数据集中的图像输入至黑电平补偿模块、去马赛克模块、自动白平衡模块和伽马校正模块中进行处理。进一步地,在执行步骤S407之前,可以对该4个图像处理模块的权重进行归一化处理。
S408,将未被删除的图像处理模块处理后的结果输入至视觉任务模型中进行推理,得到视觉任务模型的推理结果。
S409,将视觉任务模型的推理结果与训练数据集中的图像对应的真值进行对比,根据对比结果调整未被删除的图像处理模块中的参数。
或者说,将对比结果反馈至优化算法中,利用优化算法调整该图像处理模块中的参数。
示例性地,优化算法包括贝叶斯优化方法、RNN模型或强化学习算法。
It should be understood that the optimization algorithm used in step S409 and the one used in step S404 may be the same or different.
S410,将调整后的图像处理模块中的参数作为步骤S407中的图像处理模块中的参数,重复步骤S407至步骤S409,直至满足第二预设条件。
满足第二预设条件后,终止步骤S407至步骤S410。示例性地,当前得到的图像处理模块中的参数可以视为满足第二预设条件后得到的图像处理模块中的参数。
例如,若当前视觉任务模型的准确度大于或等于没有设置图像处理模块的权重的情况下视觉任务模型的准确度,则终止步骤S407至步骤S410。
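Steps S407 to S410 can be regarded as a specific implementation of manner 4 (方式4). For details, refer to the description in manner 4, which is not repeated here. The second preset condition can be set with reference to the preset conditions in manner 4.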
根据本申请实施例的方案,利用视觉任务模型得到的性能指标,例如,目标检测的准确度、目标分割准确率等,调整多个图像处理模块的权重,保留对视觉任务模型的性能指标影响较大的图像处理模块,或者说,保留能够维持或提升视觉任务模型的性能指标的图像处理模块。这样,能够得到适合视觉任务模型的图像处理模块,或者说,得到视觉任务模型所需的图像处理模块,减少了图像处理流程所需的时间,节省计算开销,减少计算力的需求,对硬件更加友好。
而且,在第一阶段完成后,利用视觉任务模型得到的性能指标调整被保留的图像处理模块中的参数,例如,利用视觉任务模型得到的性能指标对图像处理模块进行设计空间的搜索,有利于得到各个图像处理模块最优的参数配置,以提升视觉任务模型的性能。
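In another possible implementation, the first stage and the second stage of method 400 may be executed at the same time. That is, the weights of the image processing modules and the parameters in the image processing modules are adjusted simultaneously. The following describes how the first and second stages of method 400 are executed at the same time. Method 400 may include the following steps. For these steps, refer to the earlier description of the first and second stages of method 400; for brevity, some of the description is omitted below.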
1)为多个图像处理模块设置初始的权重。
2)将训练数据集中的图像输入至该多个图像处理模块中进行处理。
3)该多个图像处理模块处理后的结果输入至视觉任务模型中进行推理,得到视觉任务模型的推理结果。
4)将视觉任务模型的推理结果与训练数据集中的图像对应的真值进行对比,根据对比结果调整该多个图像处理模块的权重以及该多个图像处理模块中的参数。
或者说,将对比结果反馈至优化算法中,利用优化算法调整该多个图像处理模块的权重。利用优化算法调整该多个图像处理模块中的参数。
示例性地,优化算法包括贝叶斯优化方法、RNN模型、强化学习算法。
调整该多个图像处理模块的权重的优化算法和调整该多个图像处理模块中的参数的优化算法可以相同,也可以不同。
5)将调整后的图像处理模块的权重作为步骤2)中的图像处理模块的权重,将调整后的图像处理模块中的参数作为步骤2)中的图像处理模块中的参数,重复步骤2)至步骤4),直至训练完成。
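Alternatively, the adjusted weights of the image processing modules are normalized, and the normalized weights are used as the weights of the image processing modules in step 2).
That is, each time the weights of the image processing modules are adjusted, the adjusted weights are normalized so that the normalized weights sum to 1 or close to 1.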
例如,若当前视觉任务模型的准确度大于或等于没有设置图像处理模块的权重的情况下视觉任务模型的推理的准确度,则训练完成。或者说,若当前视觉任务模型的准确度大于或等于方法400执行之前的视觉任务模型的推理的准确度,则训练完成。
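6) Delete some of the image processing modules according to the weights of the image processing modules after training is complete. Step 6) corresponds to step S304 in manner 2 (方式2). For details, refer to the description in manner 2, which is not repeated here.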
这样,第一阶段和第二阶段同时执行,能够避免图像处理模块由于参数配置不合理而被删除,使得图像处理模块能够在较优的参数配置下对图像进行处理,进而判断较优的参数配置下的各个图像处理模块对视觉任务模型的性能指标的贡献程度,以保留视觉任务模型所需的图像处理模块,这样能够进一步提高视觉任务模型的性能指标。
方法400仅为将方式1、方式2和方式4进行结合的一种示例。方式1、方式2、方式3和方式4还可以以其他实现方式进行结合。
示例性地,将方式1、方式2和方式3进行结合。
例如,步骤S304可以包括:根据视觉任务模型的处理结果调整多个图像处理模块的权重和该多个图像处理模块的处理顺序,根据调整后的图像处理模块的权重从该多个图像处理模块中删除部分图像处理模块。
再如,步骤S304可以包括:根据视觉任务模型的处理结果调整多个图像处理模块的权重,根据调整后的图像处理模块的权重从该多个图像处理模块中删除部分图像处理模块;根据视觉任务模型的处理结果调整未被删除的图像处理模块的处理顺序。也就是将步骤S304分为两个阶段,在第一阶段中删除部分图像处理模块,在第二阶段中调整未被删除的图像处理模块的处理顺序。
具体的结合方式可以参考方法400,此处不再赘述。
应理解,以上结合方式均为示例,还可以对上述四种方式中的任两种及两种以上的方式进行结合,本申请实施例对此不做限定。
本申请实施例中,调整后的图像处理模块为视觉任务模型所需的图像处理模块。调整后的图像处理模块与视觉任务模型之间具有对应关系。不同的视觉任务模型可以对应不同的图像处理模块。这样,能够根据应用场景选择合适的图像处理流程。
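Figure 6 shows an image processing method 700 provided by an embodiment of the present application. The method shown in Figure 6 may be executed by an image processing apparatus. The apparatus may be a cloud service device or a terminal device, for example a computer, server, or other apparatus with enough computing power to perform image processing, or it may be a system composed of a cloud service device and a terminal device. For example, method 700 may be executed by the preprocessing module in Figure 1.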
方法700中的目标图像处理模块是由方法300或方法400得到的。为了避免不必要的重复,下面在介绍方法700时适当省略重复的描述。
方法700包括步骤S701至步骤S704。下面对步骤S701至步骤S704进行详细介绍。
S701,获取第三图像。
第三图像为待处理的图像。
示例性地,第三图像可以为传感器获取的raw图。
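For example, the third image may be an image captured through a camera by a terminal device (or another apparatus or device such as a computer or server), or the third image may be an image obtained from inside the terminal device (or another apparatus or device such as a computer or server), for example an image stored in the album of the terminal device or an image that the terminal device obtained from the cloud. The embodiments of the present application do not limit this.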
S702,根据视觉任务模型确定至少一个目标图像处理模块。
该至少一个目标图像处理模块是与视觉任务模型对应的一个或多个图像处理模块。
示例性地,视觉任务包括:目标检测、图像分类、目标分割、目标跟踪或图像识别等。
视觉任务模型用于执行视觉任务。例如,视觉任务为目标检测,则视觉任务模型为目标检测模型。再如,视觉任务为图像识别,则视觉任务模型为图像识别模型。
视觉任务模型可以为训练好的模型。
在不同的应用场景中,可以采用不同的视觉任务模型,相应地,根据不同的视觉任务模型即可确定与该视觉任务模型匹配的至少一个目标图像处理模块。这样,可以根据不同的应用场景选用不同的图像处理模块。
对于相同的视觉任务,在不同的应用场景下,可能采用不同的视觉任务模型。例如,对于驾驶场景中的目标检测任务,在曝光过度和曝光不足的情况下采用的视觉任务模型可能是相同的,也可能是不同的。在驾驶的过程中,若当前场景被识别为曝光过度,可以采用第一目标检测模型作为视觉任务模型,根据第一目标检测模型确定第一目标检测模型对应的至少一个目标图像处理模块。若当前场景被识别为曝光不足,可以采用第二目标检测模型作为视觉任务模型,根据第二目标检测模型确定与第二目标检测模型对应的至少一个目标图像处理模块。第一目标检测模型和第二目标检测模型为不同的目标检测模型。这样,可以根据不同的应用场景选择不同的图像处理流程,提高视觉任务模型的性能。
S703,通过该至少一个目标图像处理模块对第三图像进行处理,得到第四图像。
也就是说,利用与视觉任务模型对应的一个或多个图像处理模块对输入的第三图像进行处理,得到第四图像。
示例性地,第四图像可以为RGB图像。或者,第四图像可以为8bit的RGB图像。此处仅为示例,第四图像的类型可以根据视觉任务模型的输入需要设置。
S704,通过视觉任务模型对第四图像进行处理,得到第四图像的处理结果。
第四图像的处理结果也可以理解为第三图像的处理结果。
第四图像的处理结果即为视觉任务模型的推理结果。视觉任务模型的推理结果与视觉任务的类型有关。
例如,视觉任务为目标检测,则视觉任务模型的推理结果可以为第四图像上的目标框以及该目标框中的物体的类别。再如,视觉任务为图像分类,则视觉任务模型的推理结果可以为第四图像的类别。
视觉任务模型和图像处理模块的配置之间具有对应关系。根据视觉任务模型和图像处理模块的配置之间的对应关系可以确定与当前的视觉任务模型匹配的图像处理模块的配置。
示例性地,图像处理模块的配置包括以下至少一项:图像处理模块的组合、图像处理模块的权重、图像处理模块的处理顺序或者图像处理模块中的参数。
可选地,步骤S702包括:根据视觉任务模型从多个候选图像处理模块中确定至少一个目标图像处理模块。
也就是说,根据视觉任务模型从多个候选图像处理模块中确定一个图像处理模块的组合,该图像处理模块的组合中的图像处理模块即为该至少一个目标图像处理模块。
在该情况下,当视觉任务模型发生变化时,相应地,图像处理模块的组合也可能发生变化。
视觉任务模型和图像处理模块的组合之间具有对应关系。根据该对应关系即可确定当前视觉任务模型对应的图像处理模块的组合,或者说,根据该对应关系即可确定用于该视觉任务模型所需的图像处理模块,即该至少一个目标图像处理模块。该至少一个目标图像处理模块可以是通过方法300或者方法400得到的。或者,可以理解为,视觉任务模型和图像处理模块的组合之间的对应关系是通过方法300或方法400得到的。
例如,视觉任务模型为图5所示的模型,则该至少一个目标图像处理模块包括:黑电平补偿模块、去马赛克模块、自动白平衡模块和伽马校正模块。
这样,不同的视觉任务模型对应不同的图像处理模块的组合,当视觉任务模型发生变化时,图像处理模块的组合能够自适应匹配视觉任务模型,使得当前的图像处理模块的组合更适合当前的视觉任务模型,有利于提高视觉任务模型的性能。
而且,根据视觉任务模型从多个候选图像处理模块中选择适合的图像处理模块,无需使用所有的候选图像处理模块对图像进行处理,减少了处理流程,降低了对计算力的要求。
可选地,步骤S702包括:根据视觉任务模型确定至少一个目标图像处理模块的权重。至少一个目标图像处理模块的权重用于对至少一个目标图像处理模块的处理结果进行处理,得到第四图像。
在一种实现方式中,不同的视觉任务模型对应的图像处理模块的组合是相同的。当视觉任务模型发生变化时,相应地,图像处理模块的权重可能发生变化。
本申请实施例中,不同的视觉任务模型对应的图像处理模块的组合相同可以理解为不同的视觉任务模型所采用的图像处理模块所实现的功能是相同的。
视觉任务模型和图像处理模块的权重之间具有对应关系。根据该对应关系可以确定当前的视觉任务模型对应的图像处理模块的权重,即该至少一个目标图像处理模块的权重。
例如,视觉任务模型为图4所示的模型,则该至少一个目标图像处理模块可以为图4中的9个图像处理模块,图像处理模块的权重可以为步骤S405得到的权重。
这样,不同的视觉任务模型对应不同的图像处理模块的权重,当视觉任务模型发生变化时,图像处理模块的权重能够自适应匹配视觉任务模型,使得当前的图像处理模块的权重更适合当前的视觉任务模型,有利于提高视觉任务模型的性能。
在另一种实现方式中,当视觉任务模型发生变化时,相应地,图像处理模块的权重也可能发生变化,图像处理模块的其他配置可能也发生变化。例如,图像处理模块的组合可能发生变化。
示例性地,视觉任务模型和图像处理模块的权重以及图像处理模块的其他配置情况具有对应关系。这样,根据视觉任务模型可以确定视觉任务模型对应的图像处理模块的权重,以及图像处理模块的其他配置情况。
例如,视觉任务模型和图像处理模块的组合以及图像处理模块的权重之间具有对应关系。步骤S702中可以确定视觉任务模型对应的图像处理模块的组合,以及该图像处理模块的组合中的图像处理模块的权重。
若视觉任务模型为图5所示的模型,与该视觉任务模型对应的该至少一个目标图像处理模块可以是步骤S406得到的。该至少一个目标图像处理模块包括黑电平补偿模块、去马赛克模块、自动白平衡模块和伽马校正模块。该至少一个目标图像处理模块的权重可以是步骤S405得到的权重。
可选地,步骤S702包括:根据视觉任务模型确定至少一个目标图像处理模块的处理顺序。
在一种实现方式中,不同的视觉任务模型对应的图像处理模块的组合是相同的。在该情况下,当视觉任务模型发生变化时,相应地,图像处理模块的处理顺序也可能发生变化。
视觉任务模型和图像处理模块的处理顺序之间具有对应关系。根据该对应关系可以确定当前视觉任务模型对应的图像处理模块的处理顺序,即该至少一个目标图像处理模块的处理顺序。
这样,不同的视觉任务模型对应不同的图像处理模块的处理顺序,当视觉任务模型发生变化时,图像处理模块的处理顺序能够自适应匹配视觉任务模型,使得当前的图像处理模块的处理顺序更适合当前的视觉任务模型,有利于提高视觉任务模型的性能。
在另一种实现方式中,当视觉任务模型发生变化时,相应地,图像处理模块的处理顺序可能发生变化,图像处理模块的其他配置也可能发生变化。例如,图像处理模块的组合可能发生变化。
示例性地,视觉任务模型和图像处理模块的处理顺序以及图像处理模块的其他配置情况具有对应关系。这样,根据该对应关系可以确定视觉任务模型对应的图像处理模块的处理顺序,以及图像处理模块的其他配置情况。
例如,视觉任务模型和图像处理模块的组合以及图像处理模块的处理顺序之间具有对应关系。根据视觉任务模型可以确定视觉任务模型对应的图像处理模块的组合,以及该图像处理模块的组合中的图像处理模块的处理顺序。
在该情况下,不同的视觉任务模型对应的图像处理模块的组合可能是相同的,可能是不同的。例如,两个视觉任务模型对应的图像处理模块的组合是相同的,而该图像处理模块的组合中的图像处理模块的处理顺序是不同的。
再如,视觉任务模型和图像处理模块的组合、图像处理模块的权重以及图像处理模块的处理顺序之间具有对应关系。步骤S702中可以确定视觉任务模型对应的图像处理模块的组合、该图像处理模块的权重以及图像处理模块的处理顺序,即从多个候选图像处理模块中确定目标图像处理模块、目标图像处理模块的权重以及目标图像处理模块的处理顺序。
在该情况下,不同的视觉任务模型对应的图像处理模块的组合可能是相同的,也可能是不同的。在图像处理模块的组合相同的情况下,图像处理模块的组合中的图像处理模块的权重可能是相同的,也可能是不同的。在图像处理模块的组合相同的情况下,图像处理模块的组合中的图像处理模块的处理顺序可能是相同的,也可能是不同的。
可选地,步骤S702包括:根据视觉任务模型确定该至少一个目标图像处理模块中的参数。
在一种实现方式中,不同的视觉任务模型对应的图像处理模块的组合是相同的。当视觉任务模型发生变化时,相应地,图像处理模块中的参数可能发生变化。
例如,第一视觉任务模型对应的图像处理模块包括:黑电平补偿模块和去马赛克模块。其中,黑电平补偿模块的参数包括参数A1,去马赛克模块的参数包括参数B1。第二视觉任务模型对应的图像处理模块包括:黑电平补偿模块和去马赛克模块。其中,黑电平补偿模块的参数包括参数A2,去马赛克模块的参数包括参数B2。图像在输入第一视觉任务模型和第二视觉任务模型之前,均需要经过黑电平补偿处理和去马赛克处理。但第一视觉任务模型之前的黑电平补偿处理和去马赛克处理与第二视觉模型之前的黑电平补偿处理和去马赛克处理所采用的参数是不同的。
视觉任务模型和图像处理模块中的参数之间具有对应关系。根据视觉任务模型可以确定视觉任务模型对应的图像处理模块中的参数,即该至少一个目标图像处理模块中的参数。
这样,不同的视觉任务模型对应不同的图像处理模块中的参数,当视觉任务模型发生变化时,图像处理模块中的参数能够自适应匹配视觉任务模型,使得当前的图像处理模块中的参数更适合当前的视觉任务模型,有利于提高视觉任务模型的性能。
在另一种实现方式中,视觉任务模型和图像处理模块中的参数以及图像处理模块的其他配置情况具有对应关系。这样,根据该对应关系可以确定当前视觉任务模型对应的图像处理模块中的参数,以及图像处理模块的其他配置情况。
例如,视觉任务模型和图像处理模块的组合以及图像处理模块中的参数之间具有对应关系。根据该对应关系可以确定当前视觉任务模型对应的图像处理模块的组合以及该图像处理模块的组合中的图像处理模块中的参数。
在该情况下,不同的视觉任务模型对应的图像处理模块的组合可能是相同的,可能是不同的。例如,两个视觉任务模型对应的图像处理模块的组合是相同的,而该图像处理模块的组合中的图像处理模块中的参数是不同的。
再如,视觉任务模型和图像处理模块的组合、图像处理模块的权重以及图像处理模块中的参数之间具有对应关系。根据该对应关系可以确定视觉任务模型对应的图像处理模块的组合、该图像处理模块的权重以及图像处理模块中的参数。
在该情况下,不同的视觉任务模型对应的图像处理模块的组合可能是相同的,也可能是不同的。在图像处理模块的组合相同的情况下,图像处理模块的组合中的图像处理模块的权重可能是相同的,也可能是不同的。在图像处理模块的组合相同的情况下,图像处理模块的组合中的图像处理模块中的参数可能是相同的,也可能是不同的。
根据本申请实施例的方案,不同的视觉任务模型对应不同的图像处理模块的配置,当视觉任务模型发生变化时,图像处理模块能够自适应匹配视觉任务模型,使得图像处理流程更适合视觉任务模型,有利于提高视觉任务模型的性能。
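The correspondence between a visual task model and an image processing configuration can be held in a simple lookup table. The sketch below is an assumption about one possible representation, not the data structure of the application; the model names, module names, and weight values are all invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PipelineConfig:
    modules: tuple  # module names, listed in processing order
    weights: tuple  # one weight per module


# Hypothetical mapping, as could be produced by method 300 or method 400.
CONFIGS = {
    "detector_overexposed": PipelineConfig(
        modules=("black_level", "demosaic"),
        weights=(1.0, 0.8),
    ),
    "detector_underexposed": PipelineConfig(
        modules=("black_level", "demosaic", "gamma"),
        weights=(1.0, 0.9, 0.7),
    ),
}


def select_pipeline(model_name):
    """Return the configuration that corresponds to the given task model."""
    try:
        return CONFIGS[model_name]
    except KeyError:
        raise ValueError(f"no pipeline configured for model {model_name!r}") from None


config = select_pipeline("detector_underexposed")
```

Switching the visual task model then amounts to a second lookup, after which the selected modules are run in the listed order with the listed weights.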
下面结合图7至图8对本申请实施例的装置进行说明。应理解,下面描述的装置能够执行前述本申请实施例的方法,为了避免不必要的重复,下面在介绍本申请实施例的装置时适当省略重复的描述。
图7是本申请实施例的图像处理装置的示意性框图。图7所示的图像处理装置4000包括获取单元4010和处理单元4020。
获取单元4010和处理单元4020可以用于执行本申请实施例的图像处理方法。
在一种可能的实现方式中,装置4000可以用于执行方法300或方法400。
具体地,获取单元4010用于获取第一图像。
处理单元4020用于:通过至少一个图像处理模块对第一图像进行处理,得到第二图像;将第二图像输入至视觉任务模型中进行处理;根据视觉任务模型的处理结果调整至少一个图像处理模块。
可选地,作为一个实施例,至少一个图像处理模块包括多个图像处理模块,处理单元4020具体用于:
根据视觉任务模型的处理结果删除多个图像处理模块中的部分图像处理模块。
可选地,作为一个实施例,处理单元4020具体用于:根据视觉任务模型的处理结果调整多个图像处理模块的权重,多个图像处理模块的权重用于对多个图像处理模块的处理结果进行处理,得到第二图像;根据调整后的多个图像处理模块的权重删除多个图像处理模块中的部分图像处理模块。
可选地,作为一个实施例,处理单元4020具体用于:根据视觉任务模型的处理结果调整至少一个图像处理模块中的参数。
可选地,作为一个实施例,处理单元4020具体用于:根据视觉任务模型的处理结果调整至少一个图像处理模块的处理顺序。
可选地,作为一个实施例,至少一个图像处理模块包括:黑电平补偿模块、绿平衡模块、坏点修正模块、去马赛克模块、拜耳降噪模块、自动白平衡模块、色彩校正模块、伽马校正模块或降噪及锐化模块。
在另一种可能的实现方式中,装置4000可以用于执行方法700。
具体地,获取单元4010用于获取第三图像。
处理单元4020用于:根据视觉任务模型确定至少一个目标图像处理模块;通过至少一个目标图像处理模块对第三图像进行处理,得到第四图像;通过视觉任务模型对第四图像进行处理,得到第四图像的处理结果。
可选地,作为一个实施例,处理单元4020具体用于:根据视觉任务模型从多个候选图像处理模块中确定至少一个目标图像处理模块。
可选地,作为一个实施例,处理单元4020具体用于:根据视觉任务模型确定至少一个目标图像处理模块中的参数。
可选地,作为一个实施例,处理单元4020具体用于:根据视觉任务模型确定至少一个目标图像处理模块的处理顺序。
可选地,作为一个实施例,至少一个目标图像处理模块包括:黑电平补偿模块、绿平衡模块、坏点修正模块、去马赛克模块、拜耳降噪模块、自动白平衡模块、色彩校正模块、伽马校正模块或降噪及锐化模块。
需要说明的是,上述装置4000以功能单元的形式体现。这里的术语“单元”可以通过软件和/或硬件形式实现,对此不作具体限定。
例如,“单元”可以是实现上述功能的软件程序、硬件电路或二者结合。所述硬件电路可能包括应用特有集成电路(application specific integrated circuit,ASIC)、电子电路、用于执行一个或多个软件或固件程序的处理器(例如共享处理器、专有处理器或组处理器等)和存储器、合并逻辑电路和/或其它支持所描述的功能的合适组件。
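Therefore, the units of the examples described in the embodiments of the present application can be implemented by electronic hardware, or by a combination of computer software and electronic hardware. Whether these functions are performed in hardware or software depends on the specific application and design constraints of the technical solution. A skilled person may use different methods to implement the described functions for each specific application, but such implementation should not be considered beyond the scope of the present application.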
图8是本申请实施例提供的图像处理装置的硬件结构示意图。图8所示的图像处理装置6000(该装置6000具体可以是一种计算机设备)包括存储器6001、处理器6002、通信接口6003以及总线6004。其中,存储器6001、处理器6002、通信接口6003通过总线6004实现彼此之间的通信连接。
存储器6001可以是只读存储器(read only memory,ROM),静态存储设备,动态存储设备或者随机存取存储器(random access memory,RAM)。存储器6001可以存储程序,当存储器6001中存储的程序被处理器6002执行时,处理器6002用于执行本申请实施例的图像处理方法的各个步骤。具体地,处理器6002可以执行上文中的方法300、方法400或方法700。
处理器6002可以采用通用的中央处理器(central processing unit,CPU),微处理器,应用专用集成电路(application specific integrated circuit,ASIC),图形处理器(graphics processing unit,GPU)或者一个或多个集成电路,用于执行相关程序,以实现本申请方法实施例的图像处理方法。
处理器6002还可以是一种集成电路芯片,具有信号的处理能力。在实现过程中,本申请的图像处理方法的各个步骤可以通过处理器6002中的硬件的集成逻辑电路或者软件形式的指令完成。
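The processor 6002 may also be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component. It can implement or execute the methods, steps, and logical block diagrams disclosed in the embodiments of the present application. The general-purpose processor may be a microprocessor or any conventional processor. The steps of the methods disclosed in the embodiments of the present application may be performed directly by a hardware decoding processor, or by a combination of hardware and software modules in a decoding processor. The software module may be located in a storage medium that is mature in the art, such as random access memory, flash memory, read-only memory, programmable read-only memory, electrically erasable programmable memory, or a register. The storage medium is located in the memory 6001. The processor 6002 reads the information in the memory 6001 and, in combination with its hardware, performs the functions required of the units included in the apparatus shown in Figure 7, or executes the image processing method of the method embodiments of the present application.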
通信接口6003使用例如但不限于收发器一类的收发装置,来实现装置6000与其他设备或通信网络之间的通信。例如,可以通过通信接口6003获取训练数据。
总线6004可包括在装置6000各个部件(例如,存储器6001、处理器6002、通信接口6003)之间传送信息的通路。
应注意,尽管上述装置6000仅仅示出了存储器、处理器、通信接口,但是在具体实现过程中,本领域的技术人员应当理解,装置6000还可以包括实现正常运行所必须的其他器件。同时,根据具体需要,本领域的技术人员应当理解,装置6000还可包括实现其他附加功能的硬件器件。此外,本领域的技术人员应当理解,装置6000也可仅仅包括实现本申请实施例所必须的器件,而不必包括图8中所示的全部器件。
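An embodiment of the present application further provides a computer-readable storage medium. The computer-readable medium stores program code to be executed by a device, and the program code includes instructions for performing the image processing method in the embodiments of the present application.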
本申请实施例还提供一种包含指令的计算机程序产品,当该计算机程序产品在计算机上运行时,使得计算机执行本申请实施例中的图像处理方法。
本申请实施例还提供一种芯片,所述芯片包括处理器与数据接口,所述处理器通过所述数据接口读取存储器上存储的指令,执行本申请实施例中的图像处理方法。
可选地,作为一种实现方式,所述芯片还可以包括存储器,所述存储器中存储有指令,所述处理器用于执行所述存储器上存储的指令,当所述指令被执行时,所述处理器用于执行第一方面或第二方面中的任意一种实现方式中的方法。
上述芯片具体可以是FPGA或者ASIC。
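It should be understood that the processor in the embodiments of the present application may be a central processing unit (CPU), or another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. The general-purpose processor may be a microprocessor or any conventional processor.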
还应理解,本申请实施例中的存储器可以是易失性存储器或非易失性存储器,或可包括易失性和非易失性存储器两者。其中,非易失性存储器可以是只读存储器(read-only memory,ROM)、可编程只读存储器(programmable ROM,PROM)、可擦除可编程只读存储器(erasable PROM,EPROM)、电可擦除可编程只读存储器(electrically EPROM,EEPROM)或闪存。易失性存储器可以是随机存取存储器(random access memory,RAM),其用作外部高速缓存。通过示例性但不是限制性说明,许多形式的随机存取存储器(random access memory,RAM)可用,例如静态随机存取存储器(static RAM,SRAM)、动态随机存取存储器(DRAM)、同步动态随机存取存储器(synchronous DRAM,SDRAM)、双倍数据速率同步动态随机存取存储器(double data rate SDRAM,DDR SDRAM)、增强型同步动态随机存取存储器(enhanced SDRAM,ESDRAM)、同步连接动态随机存取存储器(synchlink DRAM,SLDRAM)和直接内存总线随机存取存储器(direct rambus RAM,DR RAM)。
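The foregoing embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination of these. When software is used, the embodiments may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions or computer programs. When the computer instructions or computer programs are loaded or executed on a computer, the processes or functions described in the embodiments of the present application are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium, or transmitted from one computer-readable storage medium to another. For example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wired means or by wireless means (for example, infrared, radio, or microwave). The computer-readable storage medium may be any available medium that a computer can access, or a data storage device such as a server or data center that contains one or more sets of available media. The available medium may be a magnetic medium (for example, a floppy disk, hard disk, or magnetic tape), an optical medium (for example, a DVD), or a semiconductor medium. The semiconductor medium may be a solid-state drive.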
应理解,本文中术语“和/或”,仅仅是一种描述关联对象的关联关系,表示可以存在三种关系,例如,A和/或B,可以表示:单独存在A,同时存在A和B,单独存在B这三种情况,其中A,B可以是单数或者复数。另外,本文中字符“/”,一般表示前后关联对象是一种“或”的关系,但也可能表示的是一种“和/或”的关系,具体可参考前后文进行理解。
本申请中,“至少一个”是指一个或者多个,“多个”是指两个或两个以上。“以下至少一项(个)”或其类似表达,是指的这些项中的任意组合,包括单项(个)或复数项(个)的任意组合。例如,a,b,或c中的至少一项(个),可以表示:a,b,c,a-b,a-c,b-c,或a-b-c,其中a,b,c可以是单个,也可以是多个。
应理解,在本申请的各种实施例中,上述各过程的序号的大小并不意味着执行顺序的先后,各过程的执行顺序应以其功能和内在逻辑确定,而不应对本申请实施例的实施过程构成任何限定。
本领域普通技术人员可以意识到,结合本文中所公开的实施例描述的各示例的单元及算法步骤,能够以电子硬件、或者计算机软件和电子硬件的结合来实现。这些功能究竟以硬件还是软件方式来执行,取决于技术方案的特定应用和设计约束条件。专业技术人员可以对每个特定的应用来使用不同方法来实现所描述的功能,但是这种实现不应认为超出本申请的范围。
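A person skilled in the art can clearly understand that, for convenience and brevity of description, the specific working processes of the systems, apparatuses, and units described above can be found in the corresponding processes in the foregoing method embodiments, and are not repeated here.
In the several embodiments provided in the present application, it should be understood that the disclosed systems, apparatuses, and methods may be implemented in other ways. For example, the apparatus embodiments described above are only illustrative. The division into units is only a division by logical function, and other divisions are possible in actual implementation; for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not performed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be indirect coupling or communication connection through some interfaces, apparatuses, or units, and may be electrical, mechanical, or in other forms.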
所述作为分离部件说明的单元可以是或者也可以不是物理上分开的,作为单元显示的部件可以是或者也可以不是物理单元,即可以位于一个地方,或者也可以分布到多个网络单元上。可以根据实际的需要选择其中的部分或者全部单元来实现本实施例方案的目的。
另外,在本申请各个实施例中的各功能单元可以集成在一个处理单元中,也可以是各个单元单独物理存在,也可以两个或两个以上单元集成在一个单元中。
所述功能如果以软件功能单元的形式实现并作为独立的产品销售或使用时,可以存储在一个计算机可读取存储介质中。基于这样的理解,本申请的技术方案本质上或者说对现有技术做出贡献的部分或者该技术方案的部分可以以软件产品的形式体现出来,该计算机软件产品存储在一个存储介质中,包括若干指令用以使得一台计算机设备(可以是个人计算机,服务器,或者网络设备等)执行本申请各个实施例所述方法的全部或部分步骤。而前述的存储介质包括:U盘、移动硬盘、只读存储器(read-only memory,ROM)、随机存取存储器(random access memory,RAM)、磁碟或者光盘等各种可以存储程序代码的介质。
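The foregoing is only a specific implementation of the present application, but the protection scope of the present application is not limited to it. Any variation or replacement that a person skilled in the art can readily conceive within the technical scope disclosed in the present application shall fall within the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.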

Claims (26)

  1. 一种图像处理方法,其特征在于,包括:
    获取第一图像;
    通过至少一个图像处理模块对所述第一图像进行处理,得到第二图像;
    将所述第二图像输入至视觉任务模型中进行处理;
    根据所述视觉任务模型的处理结果调整所述至少一个图像处理模块。
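  2. The method according to claim 1, characterized in that the at least one image processing module comprises multiple image processing modules, and the adjusting the at least one image processing module according to the processing result of the visual task model comprises: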
    根据所述视觉任务模型的处理结果删除所述多个图像处理模块中的部分图像处理模块。
  3. 根据权利要求2所述的方法,其特征在于,所述根据所述视觉任务模型的处理结果删除所述多个图像处理模块中的部分图像处理模块,包括:
    根据所述视觉任务模型的处理结果调整所述多个图像处理模块的权重,所述多个图像处理模块的权重用于对所述多个图像处理模块的处理结果进行处理,得到所述第二图像;
    根据调整后的所述多个图像处理模块的权重删除所述多个图像处理模块中的部分图像处理模块。
  4. 根据权利要求1至3中任一项所述的方法,其特征在于,所述根据所述视觉任务模型的处理结果调整所述至少一个图像处理模块,包括:
    根据所述视觉任务模型的处理结果调整所述至少一个图像处理模块中的参数。
  5. 根据权利要求1至4中任一项所述的方法,其特征在于,所述根据所述视觉任务模型的处理结果调整所述至少一个图像处理模块,包括:
    根据所述视觉任务模型的处理结果调整所述至少一个图像处理模块的处理顺序。
  6. 根据权利要求1至5中任一项所述的方法,其特征在于,所述至少一个图像处理模块包括:
    黑电平补偿模块、绿平衡模块、坏点修正模块、去马赛克模块、拜耳降噪模块、自动白平衡模块、色彩校正模块、伽马校正模块或降噪及锐化模块。
  7. 一种图像处理方法,其特征在于,包括:
    获取第三图像;
    根据视觉任务模型确定至少一个目标图像处理模块;
    通过所述至少一个目标图像处理模块对所述第三图像进行处理,得到第四图像;
    通过所述视觉任务模型对所述第四图像进行处理,得到所述第四图像的处理结果。
  8. 根据权利要求7所述的方法,其特征在于,所述根据视觉任务模型确定至少一个目标图像处理模块,包括:
    根据所述视觉任务模型从多个候选图像处理模块中确定所述至少一个目标图像处理模块。
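  9. The method according to claim 7 or 8, characterized in that the determining at least one target image processing module according to the visual task model comprises: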
    根据所述视觉任务模型确定所述至少一个目标图像处理模块中的参数。
  10. 根据权利要求7至9中任一项所述的方法,其特征在于,所述根据视觉任务模型确定至少一个目标图像处理模块,包括:
    根据所述视觉任务模型确定所述至少一个目标图像处理模块的处理顺序。
  11. 根据权利要求7至10中任一项所述的方法,其特征在于,所述至少一个目标图像处理模块包括:
    黑电平补偿模块、绿平衡模块、坏点修正模块、去马赛克模块、拜耳降噪模块、自动白平衡模块、色彩校正模块、伽马校正模块或降噪及锐化模块。
  12. 一种图像处理装置,其特征在于,包括:
    获取单元,用于获取第一图像;
    处理单元,用于:
    通过至少一个图像处理模块对所述第一图像进行处理,得到第二图像;
    将所述第二图像输入至视觉任务模型中进行处理;
    根据所述视觉任务模型的处理结果调整所述至少一个图像处理模块。
  13. 根据权利要求12所述的装置,其特征在于,所述至少一个图像处理模块包括多个图像处理模块,所述处理单元具体用于:
    根据所述视觉任务模型的处理结果删除所述多个图像处理模块中的部分图像处理模块。
  14. 根据权利要求13所述的装置,其特征在于,所述处理单元具体用于:
    根据所述视觉任务模型的处理结果调整所述多个图像处理模块的权重,所述多个图像处理模块的权重用于对所述多个图像处理模块的处理结果进行处理,得到所述第二图像;
    根据调整后的所述多个图像处理模块的权重删除所述多个图像处理模块中的部分图像处理模块。
  15. 根据权利要求12至14中任一项所述的装置,其特征在于,所述处理单元具体用于:
    根据所述视觉任务模型的处理结果调整所述至少一个图像处理模块中的参数。
  16. 根据权利要求12至15中任一项所述的装置,其特征在于,所述处理单元具体用于:
    根据所述视觉任务模型的处理结果调整所述至少一个图像处理模块的处理顺序。
  17. 根据权利要求12至16中任一项所述的装置,其特征在于,所述至少一个图像处理模块包括:
    黑电平补偿模块、绿平衡模块、坏点修正模块、去马赛克模块、拜耳降噪模块、自动白平衡模块、色彩校正模块、伽马校正模块或降噪及锐化模块。
  18. 一种图像处理装置,其特征在于,包括:
    获取单元,用于获取第三图像;
    处理单元,用于:
    根据视觉任务模型确定至少一个目标图像处理模块;
    通过所述至少一个目标图像处理模块对所述第三图像进行处理,得到第四图像;
    通过所述视觉任务模型对所述第四图像进行处理,得到所述第四图像的处理结果。
  19. 根据权利要求18所述的装置,其特征在于,所述处理单元具体用于:
    根据所述视觉任务模型从多个候选图像处理模块中确定所述至少一个目标图像处理模块。
  20. 根据权利要求18或19所述的装置,其特征在于,所述处理单元具体用于:
    根据所述视觉任务模型确定所述至少一个目标图像处理模块中的参数。
  21. 根据权利要求18至20中任一项所述的装置,其特征在于,所述处理单元具体用于:
    根据所述视觉任务模型确定所述至少一个目标图像处理模块的处理顺序。
  22. 根据权利要求18至21中任一项所述的装置,其特征在于,所述至少一个目标图像处理模块包括:
    黑电平补偿模块、绿平衡模块、坏点修正模块、去马赛克模块、拜耳降噪模块、自动白平衡模块、色彩校正模块、伽马校正模块或降噪及锐化模块。
  23. 一种图像处理装置,其特征在于,包括处理器和存储器,所述存储器用于存储程序指令,所述处理器用于调用所述程序指令以执行如权利要求1至6或权利要求7至11中任一项所述的方法。
  24. 一种计算机可读存储介质,其特征在于,所述计算机可读存储介质用于存储设备执行的程序代码,所述程序代码包括用于执行如权利要求1至6或权利要求7至11中任一项所述的方法。
  25. 一种包含指令的计算机程序产品,其特征在于,当所述计算机程序产品在计算机上运行时,使得所述计算机执行如权利要求1至6或权利要求7至11中任一项所述的方法。
  26. 一种芯片,其特征在于,所述芯片包括处理器与数据接口,所述处理器通过所述数据接口读取存储器上存储的指令以执行如权利要求1至6或权利要求7至11中任一项所述的方法。
PCT/CN2021/102739 2021-06-28 2021-06-28 图像处理方法及装置 WO2023272431A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202180099442.4A CN117529725A (zh) 2021-06-28 2021-06-28 图像处理方法及装置
PCT/CN2021/102739 WO2023272431A1 (zh) 2021-06-28 2021-06-28 图像处理方法及装置

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2021/102739 WO2023272431A1 (zh) 2021-06-28 2021-06-28 图像处理方法及装置

Publications (1)

Publication Number Publication Date
WO2023272431A1 true WO2023272431A1 (zh) 2023-01-05

Family

ID=84690936

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/102739 WO2023272431A1 (zh) 2021-06-28 2021-06-28 图像处理方法及装置

Country Status (2)

Country Link
CN (1) CN117529725A (zh)
WO (1) WO2023272431A1 (zh)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7235768B1 (en) * 2005-02-28 2007-06-26 United States Of America As Represented By The Secretary Of The Air Force Solid state vision enhancement device
CN102663745A (zh) * 2012-03-23 2012-09-12 北京理工大学 一种基于视觉任务的彩色融合图像质量评价方法
CN110348572A (zh) * 2019-07-09 2019-10-18 上海商汤智能科技有限公司 神经网络模型的处理方法及装置、电子设备、存储介质
CN111881785A (zh) * 2020-07-13 2020-11-03 北京市商汤科技开发有限公司 客流分析方法及装置、存储介质和***
CN111898638A (zh) * 2020-06-29 2020-11-06 北京大学 融合不同视觉任务的图像处理方法、电子设备及介质
CN111901594A (zh) * 2020-06-29 2020-11-06 北京大学 面向视觉分析任务的图像编码方法、电子设备及介质
CN112529150A (zh) * 2020-12-01 2021-03-19 华为技术有限公司 一种模型结构、模型训练方法、图像增强方法及设备

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7235768B1 (en) * 2005-02-28 2007-06-26 United States Of America As Represented By The Secretary Of The Air Force Solid state vision enhancement device
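CN102663745A (zh) * 2012-03-23 2012-09-12 北京理工大学 Color fusion image quality evaluation method based on visual tasks
CN110348572A (zh) * 2019-07-09 2019-10-18 上海商汤智能科技有限公司 Neural network model processing method and apparatus, electronic device, and storage medium
CN111898638A (zh) * 2020-06-29 2020-11-06 北京大学 Image processing method fusing different visual tasks, electronic device, and medium
CN111901594A (zh) * 2020-06-29 2020-11-06 北京大学 Image coding method for visual analysis tasks, electronic device, and medium
CN111881785A (zh) * 2020-07-13 2020-11-03 北京市商汤科技开发有限公司 Passenger flow analysis method and apparatus, storage medium, and ***
CN112529150A (zh) * 2020-12-01 2021-03-19 华为技术有限公司 Model structure, model training method, image enhancement method, and device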

Also Published As

Publication number Publication date
CN117529725A (zh) 2024-02-06

Similar Documents

Publication Publication Date Title
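WO2020253416A1 (zh) Object detection method and apparatus, and computer storage medium
WO2021043273A1 (zh) Image enhancement method and apparatus
WO2021043168A1 (zh) Training method for person re-identification network, and person re-identification method and apparatus
WO2021120719A1 (zh) Neural network model update method, and image processing method and apparatus
WO2020192736A1 (zh) Object recognition method and apparatus
CN111291809B (zh) Processing apparatus, method, and storage medium
WO2021057056A1 (zh) Neural network architecture search method, image processing method and apparatus, and storage medium
WO2022001805A1 (zh) Neural network distillation method and apparatus
WO2021147325A1 (zh) Object detection method and apparatus, and storage medium
CN111914997B (zh) Method for training neural network, and image processing method and apparatus
CN110222717B (zh) Image processing method and apparatus
WO2021018245A1 (zh) Image classification method and apparatus
WO2022052601A1 (zh) Neural network model training method, and image processing method and apparatus
US20220148291A1 (en) Image classification method and apparatus, and image classification model training method and apparatus
CN110222718B (zh) Image processing method and apparatus
WO2021018251A1 (zh) Image classification method and apparatus
CN111695673B (zh) Method for training neural network predictor, and image processing method and apparatus
CN113011562A (zh) Model training method and apparatus
CN112529146A (zh) Method and apparatus for training neural network model
CN113807183A (zh) Model training method and related device
WO2022267036A1 (zh) Neural network model training method and apparatus, and data processing method and apparatus
WO2022179606A1 (zh) Image processing method and related apparatus
CN110705564B (zh) Image recognition method and apparatus
WO2022156475A1 (zh) Neural network model training method, and data processing method and apparatus
WO2021057690A1 (zh) Method and apparatus for constructing neural network, and image processing method and apparatus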

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application (Ref document number: 21947395; Country of ref document: EP; Kind code of ref document: A1)
WWE WIPO information: entry into national phase (Ref document number: 202180099442.4; Country of ref document: CN)
NENP Non-entry into the national phase (Ref country code: DE)