WO2021022983A1 - Image processing method and apparatus, electronic device, and computer-readable storage medium - Google Patents

Image processing method and apparatus, electronic device, and computer-readable storage medium

Info

Publication number
WO2021022983A1
Authority
WO
WIPO (PCT)
Prior art keywords
subject
map
visible light
sub
confidence
Prior art date
Application number
PCT/CN2020/102023
Other languages
English (en)
French (fr)
Inventor
黄海东
Original Assignee
Oppo广东移动通信有限公司
Priority date
Filing date
Publication date
Application filed by Oppo广东移动通信有限公司 filed Critical Oppo广东移动通信有限公司
Publication of WO2021022983A1 publication Critical patent/WO2021022983A1/zh

Classifications

    • G06F 18/25 Fusion techniques (Pattern recognition; Analysing)
    • G06T 7/0002 Inspection of images, e.g. flaw detection (Image analysis)
    • G06T 7/50 Depth or shape recovery (Image analysis)
    • G06V 10/25 Determination of region of interest [ROI] or a volume of interest [VOI] (Image preprocessing)
    • G06T 2207/10024 Color image (Image acquisition modality)
    • G06T 2207/10028 Range image; Depth image; 3D point clouds (Image acquisition modality)
    • G06T 2207/20081 Training; Learning (Special algorithmic details)
    • G06T 2207/20084 Artificial neural networks [ANN] (Special algorithmic details)
    • G06T 2207/30196 Human being; Person (Subject of image; Context of image processing)
    • G06T 2207/30201 Face (Subject of image; Context of image processing)

Definitions

  • This application relates to the field of imaging, in particular to an image processing method and device, electronic equipment, and computer-readable storage media.
  • the embodiments of the present application provide an image processing method and device, electronic equipment, and computer-readable storage medium, which can improve the accuracy of subject detection.
  • An image processing method includes: obtaining a visible light image; inputting the visible light image into the subject recognition layer of a subject detection model to obtain a subject recognition map, wherein the subject detection model is a model obtained by training according to preset conditions of the same scene; inputting the visible light image into the depth prediction layer of the subject detection model to obtain a depth prediction map; fusing the subject recognition map and the depth prediction map to obtain a subject region confidence map; and determining the target subject in the visible light image according to the subject region confidence map.
  • An image processing device includes: an obtaining module, configured to obtain a visible light image;
  • the recognition module is used to input the visible light image into the subject recognition layer of the subject detection model to obtain the subject recognition map; wherein the subject detection model is a model obtained by training according to preset conditions of the same scene;
  • a prediction module configured to input the visible light image into the depth prediction layer of the subject detection model to obtain a depth prediction image
  • a fusion module for fusing the subject recognition map and the depth prediction map to obtain a confidence map of the subject area
  • the determining module is configured to determine the target subject in the visible light image according to the confidence map of the subject area.
  • An electronic device includes a memory and a processor, and a computer program is stored in the memory. When the computer program is executed by the processor, the processor performs the operations of the image processing method described above, wherein the subject detection model is a model obtained by training according to preset conditions of the same scene.
  • the above-mentioned image processing method and device, electronic equipment, and computer-readable storage medium obtain a visible light image, and input the visible light image into the subject recognition layer of the subject detection model to obtain the subject recognition map, thereby preliminarily identifying the subject in the visible light image.
  • the visible light map is input into the depth prediction layer of the subject detection model, and the depth map corresponding to the visible light map can be obtained.
  • Figure 1 is a block diagram of the internal structure of an electronic device in an embodiment
  • FIG. 2 is a flowchart of an image processing method in an embodiment
  • FIG. 3 is a flowchart of the operation of fusing the subject recognition map and the depth prediction map to obtain the confidence map of the subject area in an embodiment
  • FIG. 4 is a flowchart of the operation of determining the weighted confidence of the overlap region corresponding to each sub-block in an embodiment
  • Figure 5 is a schematic diagram of a network structure of a subject detection model in an embodiment
  • FIG. 6 is a flowchart of an image processing method in another embodiment
  • Figure 7 is a schematic diagram of image processing effects in an embodiment
  • Fig. 8 is a structural block diagram of a training device for a subject detection model
  • Fig. 9 is a block diagram of the internal structure of an electronic device in another embodiment.
  • the image processing method and the training method of the subject detection model in the embodiments of the present application can be applied to electronic devices.
  • the electronic device may be a computer device with a camera, a personal digital assistant, a tablet computer, a smart phone, a wearable device, etc.
  • When the camera in the electronic device takes an image, it will automatically focus to ensure that the captured image is clear.
  • the above electronic device may include an image processing circuit, which may be implemented by hardware and/or software components, and may include various processing units that define an ISP (Image Signal Processing, image signal processing) pipeline.
  • Fig. 1 is a schematic diagram of an image processing circuit in an embodiment. As shown in FIG. 1, for ease of description, only various aspects of the image processing technology related to the embodiments of the present application are shown.
  • the image processing circuit includes a first ISP processor 130, a second ISP processor 140, and a control logic 150.
  • the first camera 110 includes one or more first lenses 112 and a first image sensor 114.
  • the first image sensor 114 may include a color filter array (such as a Bayer filter).
  • The first image sensor 114 may acquire the light intensity and wavelength information captured by each imaging pixel of the first image sensor 114, and provide a set of raw image data that can be processed by the first ISP processor 130.
  • the second camera 120 includes one or more second lenses 122 and a second image sensor 124.
  • the second image sensor 124 may include a color filter array (such as a Bayer filter).
  • The second image sensor 124 may acquire the light intensity and wavelength information captured by each imaging pixel of the second image sensor 124, and provide a set of raw image data that can be processed by the second ISP processor 140.
  • the first image collected by the first camera 110 is transmitted to the first ISP processor 130 for processing.
  • The statistical data of the first image (such as image brightness, image contrast, image color, etc.) are sent to the control logic 150, and the control logic 150 can determine the control parameters of the first camera 110 according to the statistical data, so that the first camera 110 can perform operations such as auto focus and auto exposure according to the control parameters.
  • the first image may be stored in the image memory 160 after being processed by the first ISP processor 130, and the first ISP processor 130 may also read the image stored in the image memory 160 for processing.
  • the first image can be directly sent to the display 170 for display after being processed by the ISP processor 130, and the display 170 can also read the image in the image memory 160 for display.
  • the first ISP processor 130 processes image data pixel by pixel in multiple formats.
  • each image pixel may have a bit depth of 8, 10, 12, or 14 bits, and the first ISP processor 130 may perform one or more image processing operations on the image data and collect statistical information about the image data.
  • the image processing operations can be performed with the same or different bit depth accuracy.
  • the image memory 160 may be a part of a memory device, a storage device, or an independent dedicated memory in an electronic device, and may include DMA (Direct Memory Access) features.
  • the first ISP processor 130 may perform one or more image processing operations, such as temporal filtering.
  • the processed image data can be sent to the image memory 160 for additional processing before being displayed.
  • the first ISP processor 130 receives processing data from the image memory 160, and performs image data processing in the RGB and YCbCr color spaces on the processing data.
  • the image data processed by the first ISP processor 130 may be output to the display 170 for viewing by the user and/or further processed by a graphics engine or a GPU (Graphics Processing Unit, graphics processor).
  • the output of the first ISP processor 130 can also be sent to the image memory 160, and the display 170 can read image data from the image memory 160.
  • the image memory 160 may be configured to implement one or more frame buffers.
  • the statistical data determined by the first ISP processor 130 may be sent to the control logic 150.
  • the statistical data may include first image sensor 114 statistical information such as automatic exposure, automatic white balance, automatic focus, flicker detection, black level compensation, and shading correction of the first lens 112.
  • The control logic 150 may include a processor and/or microcontroller that executes one or more routines (such as firmware), and the one or more routines can determine the control parameters of the first camera 110 and the control parameters of the first ISP processor 130 based on the received statistical data.
  • the control parameters of the first camera 110 may include gain, integration time of exposure control, anti-shake parameters, flash control parameters, first lens 112 control parameters (for example, focal length for focusing or zooming), or a combination of these parameters.
  • the ISP control parameters may include gain levels and color correction matrices for automatic white balance and color adjustment (for example, during RGB processing), and the first lens 112 shading correction parameters.
  • the second image collected by the second camera 120 is transmitted to the second ISP processor 140 for processing.
  • The statistical data of the second image (such as image brightness, image contrast, image color, etc.) are sent to the control logic 150.
  • the control logic 150 can determine the control parameters of the second camera 120 according to the statistical data, so that the second camera 120 can perform automatic focusing, automatic exposure and other operations according to the control parameters.
  • the second image can be stored in the image memory 160 after being processed by the second ISP processor 140, and the second ISP processor 140 can also read the image stored in the image memory 160 for processing.
  • the second image can be directly sent to the display 170 for display after being processed by the ISP processor 140, and the display 170 can also read the image in the image memory 160 for display.
  • the second camera 120 and the second ISP processor 140 may also implement the processing procedures described by the first camera 110 and the first ISP processor 130.
  • the first camera 110 may be a color camera
  • the second camera 120 may be a TOF (Time Of Flight) camera or a structured light camera.
  • A TOF camera can obtain a TOF depth map, and a structured light camera can obtain a structured light depth map.
  • The first camera 110 and the second camera 120 may both be color cameras; a binocular depth map can then be obtained through the two color cameras.
  • the first ISP processor 130 and the second ISP processor 140 may be the same ISP processor.
  • the first camera 110 and the second camera 120 capture the same scene to obtain a visible light map and a depth map, respectively, and send the visible light map and the depth map to the ISP processor.
  • the ISP processor can train the subject detection model according to the visible light map, the depth map and the corresponding labeled subject mask map to obtain a trained model.
  • the ISP processor obtains the visible light image; inputs the visible light image into the subject recognition layer of the subject detection model to obtain the subject recognition map; wherein the subject detection model is a model obtained by training according to the preset conditions of the same scene;
  • The visible light map is input into the depth prediction layer of the subject detection model to obtain a depth prediction map; the subject recognition map and the depth prediction map are merged to obtain a subject region confidence map; and the target subject in the visible light image is determined according to the subject region confidence map.
  • the depth map and the subject recognition map are obtained through two-way network recognition, and then the subject recognition map and the depth prediction map are merged to obtain the confidence map of the subject area. According to the confidence map of the subject area, the target subject in the visible light image can be determined more accurately.
  • FIG. 2 is a flowchart of an image processing method in an embodiment. As shown in Figure 2, the image processing method includes:
  • Subject detection refers to automatically processing the region of interest while selectively ignoring the regions that are not of interest when facing a scene.
  • The region of interest is called the subject region.
  • the visible light image refers to an RGB (Red, Green, Blue) image. Color images can be obtained by shooting any scene with a color camera, that is, RGB images.
  • the visible light image may be stored locally by the electronic device, may also be stored by other devices, may also be stored on the network, or may be captured by the electronic device in real time, but is not limited to this.
  • the ISP processor or central processing unit of the electronic device can obtain a visible light image from a local or other device or network, or obtain a visible light image by shooting a scene through a camera.
  • the visible light image is input into the subject recognition layer of the subject detection model to obtain the subject recognition map.
  • the subject detection model is a model obtained by training according to the preset conditions of the same scene.
  • the preset condition refers to obtaining different training data according to the same scene, and training the subject detection model according to different training data.
  • the training data acquired according to the same scene may include the visible light map, the depth map and the corresponding labeled subject mask map of the same scene.
  • the subject detection model is obtained by inputting the visible light map, the depth map and the corresponding labeled subject mask map of the same scene into the subject detection model including the initial network weight for training.
  • the visible light image is used as the input of the trained subject detection model
  • the depth map and the labeled subject mask image are used as the ground truth of the trained subject detection model.
  • the subject mask map is an image filter template used to identify the subject in the image, which can block other parts of the image and filter out the subject in the image.
  • the subject detection model can be trained to recognize and detect various subjects, such as people, flowers, cats, dogs, backgrounds, etc.
  • the training data acquired according to the same scene may include the visible light map, the center weight map, the depth map, and the labeled subject mask map corresponding to the same scene.
  • the visible light map and the center weight map are used as the input of the trained subject detection model
  • the depth map and the labeled subject mask map are used as the ground truth that the trained subject detection model expects to output.
  • the subject detection model includes a subject recognition layer and a depth prediction layer.
  • The ISP processor or central processing unit can input the visible light image into the subject recognition layer of the subject detection model, and the subject recognition layer processes the visible light image to obtain the corresponding subject recognition map.
  • the visible light image is input into the depth prediction layer of the subject detection model to obtain the depth prediction image.
  • the depth prediction layer of the subject detection model is used to detect the visible light image to obtain the depth prediction image corresponding to the visible light image.
  • the ISP processor or the central processor can input the visible light image into the depth prediction layer in the subject detection model, and process the visible light image through the depth prediction layer to obtain the depth prediction image corresponding to the visible light image.
  • the subject recognition map and the depth prediction map are merged to obtain the subject region confidence map.
  • image fusion refers to a technology that extracts the most favorable information in the channel from image data about the same image collected by multiple source channels to synthesize a high-quality image.
  • the ISP processor or the central processing unit may perform fusion processing on the subject recognition map and the depth prediction map through a fusion algorithm to obtain the subject region confidence map.
  • The subject region confidence map is used to record the probability that each pixel belongs to a recognizable subject. For example, for a certain pixel, the probability of belonging to a person is 0.8, the probability of belonging to a flower is 0.1, and the probability of belonging to the background is 0.1.
  • Operation 210 Determine the target subject in the visible light image according to the confidence map of the subject area.
  • the subject refers to various objects, such as people, flowers, cats, dogs, cows, blue sky, white clouds, background, etc.
  • the target subject refers to the subject in need, which can be selected according to needs.
  • The ISP processor or central processing unit can select the subject with the highest or second highest confidence in the visible light image according to the subject region confidence map. If there is one subject, that subject is regarded as the target subject; if there are multiple subjects, one or more of them can be selected as the target subject as needed.
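  • As an illustration of this selection step, the following minimal Python sketch (an assumption-laden illustration, not code from the patent) picks the class with the highest mean confidence from an H x W x C per-class confidence map and returns its pixel mask as the target subject; the class names, toy random data, and the helper name pick_target_subject are all illustrative.

```python
import numpy as np

def pick_target_subject(confidence_map, class_names, background_idx=None):
    """Pick the target subject from an H x W x C subject region confidence map.

    confidence_map: per-pixel probabilities for each candidate class.
    class_names:    labels of the C channels, e.g. ["person", "flower", "background"].
    background_idx: optional channel index to ignore when choosing the subject.
    """
    labels = confidence_map.argmax(axis=-1)   # most likely class per pixel

    best_class, best_score, target_mask = None, -1.0, None
    for idx, name in enumerate(class_names):
        if idx == background_idx:
            continue
        mask = labels == idx
        if not mask.any():
            continue
        # Score a class by its mean confidence over the pixels it claims.
        score = confidence_map[..., idx][mask].mean()
        if score > best_score:
            best_class, best_score, target_mask = name, score, mask

    return best_class, target_mask

# Usage with a toy 4x4 confidence map over three classes.
rng = np.random.default_rng(0)
conf = rng.random((4, 4, 3))
conf /= conf.sum(axis=-1, keepdims=True)
subject, mask = pick_target_subject(conf, ["person", "flower", "background"], background_idx=2)
```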
  • a visible light image is obtained, and the visible light image is input into the subject recognition layer of the subject detection model to obtain the subject recognition map, thereby preliminarily identifying the subject in the visible light image.
  • the visible light map is input into the depth prediction layer of the subject detection model, and the depth map corresponding to the visible light map can be obtained.
  • the depth map and the subject recognition map are obtained through two-way network recognition, and then the subject recognition map and the depth prediction map are merged to obtain the confidence map of the subject area.
  • The target subject in the visible light map can then be determined. Using a subject detection model trained with the visible light map, depth map, and subject mask map, or a subject detection model trained with the visible light map, center weight map, depth map, and subject mask map, the target subject in the visible light map can be identified more accurately.
  • the fusion of the subject recognition map and the depth prediction map to obtain the subject region confidence map includes:
  • Operation 302 Perform block processing on the depth prediction map to obtain at least two sub-blocks.
  • the ISP processor or the central processor may divide the depth prediction map into connected domain blocks. Further, the depth prediction map can be divided into different sub-blocks according to different depths, and at least two sub-blocks can be obtained.
  • Operation 304 Determine the overlap area between each of the at least two sub-blocks and the subject identification map, and determine the weighted confidence of the overlap area corresponding to each sub-block.
  • Confidence, also called reliability, confidence level, or confidence coefficient, refers to the probability that the overall parameter value falls within a certain range of the sample statistic value.
  • the weighted confidence refers to the confidence after the weighting factor is assigned.
  • The ISP processor or central processing unit determines the overlapping area between each of the at least two sub-blocks and the subject recognition map; it can perform an AND operation between each sub-block and the subject recognition map and retain, for each sub-block, the region that falls within the subject recognition map.
  • The retained region is the overlapping area.
  • The ISP processor or central processing unit can then calculate the weighted confidence of each retained region, that is, the weighted confidence of the overlapping area corresponding to each sub-block.
  • a confidence map of the subject region is generated according to the weighted confidence.
  • the ISP processor or the central processing unit may generate the body region confidence map according to the weighted confidence of the overlapping region corresponding to each sub-block.
  • The image processing method in this embodiment performs block processing on the depth prediction map to obtain at least two sub-blocks, determines the overlapping area between each of the at least two sub-blocks and the subject recognition map, determines the weighted confidence of the overlapping area corresponding to each sub-block, and generates the subject region confidence map according to the weighted confidence, thereby achieving the fusion of the depth prediction map and the subject recognition map. Combining the depth prediction map and the subject recognition map to identify the subject of the image improves the precision and accuracy of subject recognition.
  • the determining the weighted confidence of the overlapping area corresponding to each sub-block includes: determining the area of the overlapping area corresponding to each sub-block and the depth of each sub-block; obtaining a weighting factor, and according to the weighting factor , The area of the overlapping area corresponding to each sub-block and the depth of each sub-block, to obtain the weighted confidence of the overlapping area corresponding to each sub-block.
  • The ISP processor or central processing unit may determine the area of each sub-block retained in the subject recognition map, that is, the area of the overlapping area corresponding to each sub-block. The ISP processor or central processing unit can then obtain the depth of each sub-block and obtain the weighting factor. According to the weighting factor, the depth of a sub-block, and the area of the overlapping area corresponding to that sub-block, the weighted confidence of the overlapping area corresponding to the sub-block can be calculated. Further, the weighted confidence of the overlapping area corresponding to each sub-block can be calculated in the same way.
  • The weighted confidence of the overlapping area corresponding to each sub-block is positively correlated with the area of that overlapping area: the larger the area of the overlapping area corresponding to a sub-block, the larger the weighted confidence of that overlapping area.
  • The weighted confidence of the overlapping area corresponding to each sub-block is also positively correlated with the depth of the sub-block: the greater the depth of a sub-block, the greater the weighted confidence of the corresponding overlapping area.
  • the ISP processor or the central processing unit can calculate the product of the area of the overlapping area corresponding to each sub-block and the weighting factor, and add the corresponding product of each sub-block to the depth of each sub-block. The weighted confidence of the overlapping area corresponding to each sub-block can be obtained.
  • the ISP processor or the central processing unit can calculate the weighted confidence of the overlap region corresponding to each sub-block according to the fusion algorithm.
  • By determining the area of the overlapping area corresponding to each sub-block and the depth of each sub-block, obtaining a weighting factor, and computing the weighted confidence of the overlapping area corresponding to each sub-block according to the weighting factor, the area, and the depth, the subject region becomes more finely controllable.
  • the fusion of the depth map and the subject detection map can more accurately identify the target subject in the visible light image. This solution can be applied to scenes such as blurring of monocular camera images or assisting autofocus.
  • the ISP processor or the central processing unit may obtain the first weighting factor corresponding to the area of the overlapping area of the sub-block, and the second weighting factor corresponding to the depth of the sub-block.
  • the weighted confidence of the overlapping area corresponding to each sub-block is positively correlated with the area of the overlapping area corresponding to each sub-block, and is also positively correlated with the depth of each sub-block.
  • the ISP processor or the central processor can calculate the product of the area of each sub-block and the first weighting factor, and calculate the product of the depth of each sub-block and the second weighting factor, and The two products corresponding to each sub-block are added together to obtain the weighted confidence of the overlapping area corresponding to each sub-block.
  • That is, the weighted confidence may be expressed as f = w1 × s + w2 × d, where w1 is the first weighting factor, w2 is the second weighting factor, s is the area of the overlapping region corresponding to the sub-block, and d is the depth of the sub-block.
  • the ISP processor or the central processing unit can calculate the weighted confidence of the overlap region corresponding to each sub-block according to the fusion algorithm.
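  • The fusion described in the preceding paragraphs can be sketched roughly as follows, assuming the depth prediction map is quantized into a few depth bins, each bin is split into connected-domain sub-blocks, and the subject recognition map is a binary mask; the weighting factors w1 and w2, the number of bins, and the helper name fuse_maps are illustrative assumptions rather than values taken from the patent.

```python
import numpy as np
from scipy import ndimage

def fuse_maps(depth_pred, subject_mask, w1=1.0, w2=1.0, n_bins=4):
    """Fuse a depth prediction map with a binary subject recognition map.

    Each connected region of similar depth becomes a sub-block; the weighted
    confidence of its overlap with the subject mask is w1 * area + w2 * depth.
    """
    confidence = np.zeros_like(depth_pred, dtype=np.float32)

    # Block processing: quantize depth into bins and split each bin into
    # connected-domain sub-blocks.
    edges = np.linspace(depth_pred.min(), depth_pred.max(), n_bins + 1)[1:-1]
    bins = np.digitize(depth_pred, edges)
    for b in range(n_bins):
        labeled, n_blocks = ndimage.label(bins == b)
        for i in range(1, n_blocks + 1):
            sub_block = labeled == i
            # Overlap area: logical AND of the sub-block and the subject mask.
            overlap = sub_block & (subject_mask > 0)
            area = overlap.sum()
            if area == 0:
                continue
            depth = depth_pred[sub_block].mean()
            # Weighted confidence, positively correlated with area and depth.
            confidence[overlap] = w1 * area + w2 * depth

    # Normalize to [0, 1] so the result can be used as a confidence map.
    if confidence.max() > 0:
        confidence /= confidence.max()
    return confidence
```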
  • the determining the target subject in the visible light image according to the confidence map of the subject area includes:
  • the subject region confidence map is processed to obtain a subject mask map.
  • the subject region confidence map can be filtered by the ISP processor or the central processing unit to obtain the subject mask map.
  • The filtering process may use a configured confidence threshold to filter out pixels whose confidence values are lower than the confidence threshold in the subject region confidence map.
  • the confidence threshold may be an adaptive confidence threshold, a fixed threshold, or a corresponding threshold configured by region.
  • the visible light image is detected, and a highlight area in the visible light image is determined.
  • the highlight area refers to an area where the brightness value is greater than the brightness threshold.
  • the ISP processor or the central processing unit performs highlight detection on the visible light image, filters to obtain target pixels with a brightness value greater than the brightness threshold, and applies connected domain processing to the target pixels to obtain the highlight area.
  • a target subject for eliminating highlights in the visible light image is determined according to the highlight area in the visible light image and the subject mask image.
  • the ISP processor or the central processing unit can perform a difference calculation or a logical AND calculation between the highlight area in the visible light image and the main body mask image to obtain the target subject for eliminating the highlight in the visible light image.
  • Filtering the subject region confidence map to obtain the subject mask map improves the reliability of the subject region confidence map.
  • The visible light image is detected to obtain the highlight area, which is then processed together with the subject mask map to obtain the target subject with the highlight eliminated.
  • A separate filter is used to process the highlights and highlight areas that affect the precision of subject recognition, which improves the precision and accuracy of subject recognition.
  • processing the subject region confidence map to obtain the subject mask map includes: performing adaptive confidence threshold filtering processing on the subject region confidence map to obtain the subject mask map.
  • the adaptive confidence threshold refers to the confidence threshold.
  • the adaptive confidence threshold may be a local adaptive confidence threshold.
  • The local adaptive confidence threshold determines the binarization confidence threshold at the position of a pixel according to the pixel value distribution of the neighborhood block of that pixel.
  • An image area with higher brightness is configured with a higher binarization confidence threshold, and an image area with lower brightness is configured with a lower binarization confidence threshold.
  • The process of configuring the adaptive confidence threshold includes: when the brightness value of a pixel is greater than the first brightness value, configuring the first confidence threshold; when the brightness value of the pixel is less than the second brightness value, configuring the second confidence threshold;
  • and when the brightness value of the pixel is between the second brightness value and the first brightness value, configuring a third confidence threshold, where the second brightness value is less than or equal to the first brightness value, the second confidence threshold is less than the third confidence threshold, and the third confidence threshold is less than the first confidence threshold.
  • Alternatively, the process of configuring the adaptive confidence threshold includes: when the brightness value of the pixel is greater than the first brightness value, configuring the first confidence threshold; when the brightness value of the pixel is less than or equal to the first brightness value, configuring the second confidence threshold, where the second confidence threshold is less than the first confidence threshold.
  • The confidence value of each pixel in the subject region confidence map is compared with the corresponding confidence threshold; a pixel is retained if its confidence value is greater than or equal to the confidence threshold and removed if it is less than the confidence threshold. In this way, unnecessary information can be removed and key information retained.
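  • The brightness-dependent threshold configuration described above might look like the following sketch; the brightness breakpoints and the three threshold values are placeholders chosen for illustration, not values specified in the patent.

```python
import numpy as np

def adaptive_confidence_threshold(brightness, bright_high=180, bright_low=80,
                                  th_high=0.6, th_mid=0.5, th_low=0.4):
    """Return a per-pixel confidence threshold based on pixel brightness.

    Brighter regions get a higher binarization confidence threshold and
    darker regions a lower one, as described above.
    """
    thresholds = np.full(brightness.shape, th_mid, dtype=np.float32)
    thresholds[brightness > bright_high] = th_high   # first confidence threshold
    thresholds[brightness < bright_low] = th_low     # second confidence threshold
    return thresholds

def filter_confidence(conf_map, brightness):
    """Keep pixels whose confidence is at least the local adaptive threshold."""
    th = adaptive_confidence_threshold(brightness)
    return np.where(conf_map >= th, conf_map, 0.0)
```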
  • performing adaptive confidence threshold filtering processing on the subject region confidence map to obtain the subject mask map includes:
  • The ISP processor or central processing unit filters the subject region confidence map according to the adaptive confidence threshold, then represents the confidence values of the retained pixels with 1 and the confidence values of the removed pixels with 0 to obtain the binarized mask map.
  • Morphological processing can include erosion and dilation. The erosion operation can be performed on the binarized mask map first, followed by the dilation operation, to remove noise; guided filtering is then performed on the morphologically processed binarized mask map to realize the edge filtering operation and obtain the subject mask map with extracted edges.
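  • A compact sketch of this post-processing chain using OpenCV is given below; guidedFilter lives in the opencv-contrib ximgproc module, and the kernel size, binarization threshold, radius, and eps values are illustrative assumptions.

```python
import cv2
import numpy as np

def postprocess_confidence(conf_map, guide_rgb, threshold=0.5):
    """Binarize the subject region confidence map, denoise it with
    erosion/dilation, and soften edges with guided filtering."""
    # Binarized mask map: retained pixels -> 1, removed pixels -> 0.
    mask = (conf_map >= threshold).astype(np.uint8)

    # Morphological processing: erode first, then dilate, to remove noise.
    kernel = np.ones((5, 5), np.uint8)
    mask = cv2.erode(mask, kernel, iterations=1)
    mask = cv2.dilate(mask, kernel, iterations=1)

    # Guided filtering (requires opencv-contrib); the visible light image is
    # used as the guide so the mask edges follow real image edges.
    mask = cv2.ximgproc.guidedFilter(
        guide=guide_rgb, src=mask.astype(np.float32), radius=8, eps=1e-3)
    return mask
```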
  • Determining the target subject for eliminating the highlight in the visible light image based on the highlight area in the visible light image and the subject mask map includes: performing difference processing on the highlight area in the visible light image and the subject mask map to obtain the target subject with the highlight eliminated.
  • The ISP processor or central processing unit performs difference processing on the highlight area in the visible light image and the subject mask map, that is, the corresponding pixel values in the visible light image and the subject mask map are subtracted to obtain the target subject in the visible light image. The target subject with the highlight removed is obtained through difference processing, and the calculation is simple.
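  • A minimal sketch of this highlight handling is given below, assuming a grayscale brightness image and a boolean subject mask; the brightness threshold and minimum region size are illustrative assumptions.

```python
import numpy as np
from scipy import ndimage

def remove_highlights(subject_mask, brightness, bright_thresh=230, min_area=20):
    """Detect highlight regions and remove them from the subject mask.

    Highlight area: connected regions of pixels brighter than the threshold.
    The target subject is obtained by differencing the mask and the highlights.
    """
    bright_pixels = brightness > bright_thresh
    labeled, n = ndimage.label(bright_pixels)        # connected-domain processing
    highlight = np.zeros_like(bright_pixels)
    for i in range(1, n + 1):
        region = labeled == i
        if region.sum() >= min_area:                 # keep only sizeable regions
            highlight |= region

    # Difference processing: subtract highlight pixels from the subject mask.
    target_subject = subject_mask & ~highlight
    return target_subject, highlight
```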
  • the training method of the subject detection model includes:
  • The depth prediction layer of the subject detection model uses the depth map and the labeled subject mask map as the ground truth output by the subject detection model, and the subject detection model containing the initial network weight is trained to obtain the target network weight of the subject detection model.
  • A large number of visible light images can be collected, and then, based on the foreground target images and simple background images in the COCO data set, a large number of images with pure-color or simple backgrounds can be obtained as training visible light images.
  • The COCO data set contains a large number of foreground targets.
  • the network structure of the subject detection model adopts the architecture based on mobile-Unet, and the bridge between the layers is added in the decoder part, so that the high-level semantic features are more fully transmitted during upsampling.
  • The center weight map acts on the output layer of the subject detection model and introduces a center attention mechanism, making it easier for an object in the center of the frame to be detected as the subject.
  • the subject detection model includes an input layer, a subject recognition layer, a depth prediction layer and an output layer.
  • the network structure of the subject recognition layer includes a convolution layer (conv), a pooling layer (pooling), a bilinear interpolation layer (Bilinear Up sampling), a convolution feature connection layer (concat+conv), an output layer, etc.
  • the deconvolution+add (deconvolution feature superposition) operation is used to bridge between the bilinear interpolation layer and the convolution feature connection layer, so that high-level semantic features are more fully transferred during upsampling.
  • Convolutional layer, pooling layer, bilinear interpolation layer, convolution feature connection layer, etc. can be the middle layer of the subject detection model.
  • the network structure of the depth prediction layer includes convolutional layer (conv), pooling layer and so on.
  • the initial network weight refers to the initial weight of each layer of the initialized deep learning network model.
  • the initial network weight is continuously updated iteratively, so as to obtain the target network weight.
  • the target network weight refers to the weight of each layer of the deep learning network model trained to detect the main body of the image.
  • the initial network weight is the initial weight of each layer in the initialized subject detection model.
  • the target network weight refers to the weight of each layer in the subject detection model that is trained to detect the subject of the image.
  • the target network weight can be obtained by preset training times, and the loss function of the deep learning network model can also be set. When the training loss function value is less than the loss threshold, the current network weight of the subject detection model is used as the target network weight.
  • Fig. 5 is a schematic diagram of the network structure of the subject detection model in an embodiment.
  • The network structure of the subject recognition layer of the subject detection model includes convolutional layers, pooling layers, bilinear interpolation layers, and convolution feature connection layers; the convolutional layer 502 serves as the input layer of the subject recognition layer, and the convolution feature connection layer 542 serves as the output layer of the subject recognition layer.
  • the coding part of the subject detection model includes convolutional layer 502, pooling layer 504, convolutional layer 506, pooling layer 508, convolutional layer 510, pooling layer 512, convolutional layer 514, pooling layer 516, convolutional layer Layer 518, the decoding part includes convolution layer 520, bilinear interpolation layer 522, convolution layer 524, bilinear interpolation layer 526, convolution layer 528, convolution feature connection layer 530, bilinear interpolation layer 532, convolution Build-up layer 534, convolution feature connection layer 536, bilinear interpolation layer 538, convolution layer 540, convolution feature connection layer 542.
  • the convolutional layer 506 and the convolutional layer 534 are concatenated (Concatenation), the convolutional layer 510 and the convolutional layer 528 are concatenated, and the convolutional layer 514 is concatenated with the convolutional layer 524.
  • the bilinear interpolation layer 522 and the convolution feature connection layer 530 are bridged by deconvolution feature stacking (Deconvolution+add).
  • the bilinear interpolation layer 532 and the convolution feature connection layer 536 adopt deconvolution feature overlay bridge.
  • the bilinear interpolation layer 538 and the convolution feature connection layer 542 are bridged by deconvolution feature overlay.
  • the network structure of the depth prediction layer of the subject detection model includes convolutional layer 552, pooling layer 554, convolutional layer 556, pooling layer 558, convolutional layer 560, pooling layer 562, convolutional layer 564, and pooling layer 566 , Convolutional layer 568, Pooling layer 570, Convolutional layer 572, Pooling layer 574, Convolutional layer 576, Pooling layer 578.
  • the convolutional layer 552 is used as the input layer of the depth prediction layer
  • the pooling layer 578 is used as the output layer of the depth prediction layer.
  • the output feature sizes of the convolution layer 564, the pooling layer 566, the convolution layer 568, the pooling layer 570, the convolution layer 572, the pooling layer 574, the convolution layer 576, and the pooling layer 578 are the same.
  • the network structure of the subject recognition layer and the network structure of the depth prediction layer of the subject detection model in this embodiment are only examples, and are not intended to limit the application. It is understandable that the convolutional layer, pooling layer, bilinear interpolation layer, convolutional feature connection layer, etc. in the network structure of the subject detection model can be set in multiples as required.
  • the original image 500 (such as the visible light image) is input to the convolution layer 502 of the subject recognition layer of the subject detection model, and the original image 500 (such as the visible light image) is input into the convolution layer 552 of the depth prediction layer of the subject detection model.
  • the convolution feature connection layer 542 of the subject recognition layer outputs a subject recognition map 580
  • the pooling layer 578 of the depth prediction layer outputs a depth prediction map 590.
  • In the training process, dropout with a preset probability is applied to the depth map.
  • The preset value can be 50%.
  • Probability dropout is introduced in the training process of the depth map, so that the subject detection model can fully mine the information of the depth map. When the subject detection model cannot obtain the depth map, it can still output accurate results. Using the dropout method for the depth map input makes the subject detection model more robust to the depth map, and can accurately segment the subject area even if there is no depth map.
  • The depth map is assigned a dropout probability of 50% during training, which ensures that the subject detection model can still perform detection normally when there is no depth information.
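  • The depth-map dropout described above might be implemented along the following lines during training; the 50% probability follows the description, while the batch layout and tensor shapes are assumptions.

```python
import torch

def drop_depth(depth_batch, p=0.5, training=True):
    """Randomly zero out the depth map of each sample with probability p.

    This forces the subject detection model to mine depth information when it
    is present, yet still segment the subject correctly when no depth map is
    available at inference time.
    """
    if not training:
        return depth_batch
    # One keep/drop decision per sample in the batch (N, 1, H, W).
    keep = (torch.rand(depth_batch.shape[0], 1, 1, 1,
                       device=depth_batch.device) >= p).float()
    return depth_batch * keep
```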
  • a dual deep learning network structure is designed.
  • One of the deep learning network structures is used to process the RGB map to obtain the depth prediction map, and the other deep learning network structure is used to process the RGB map to obtain the subject recognition map.
  • the output of the two deep learning network structures is connected by convolutional features, that is, the depth prediction map and the subject recognition map are merged and then output, which can accurately identify the target subject in the visible light image.
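  • The dual-branch idea, one branch producing the subject recognition map and another producing the depth prediction map, with a convolution over their concatenated outputs, can be illustrated with the heavily simplified PyTorch sketch below; the layer widths and depths are placeholders and bear no relation to the actual mobile-Unet architecture shown in FIG. 5.

```python
import torch
import torch.nn as nn

class TinyDualBranchModel(nn.Module):
    """Toy two-branch subject detection model: an RGB image goes through a
    subject recognition branch and a depth prediction branch, and the two
    outputs are fused by a convolution over their concatenated features."""

    def __init__(self):
        super().__init__()
        def branch(out_ch):
            return nn.Sequential(
                nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(inplace=True),
                nn.Conv2d(16, out_ch, 3, padding=1))
        self.subject_branch = branch(1)   # subject recognition map
        self.depth_branch = branch(1)     # depth prediction map
        self.fuse = nn.Conv2d(2, 1, 1)    # convolution feature connection

    def forward(self, rgb):
        subject_map = self.subject_branch(rgb)
        depth_map = self.depth_branch(rgb)
        fused = self.fuse(torch.cat([subject_map, depth_map], dim=1))
        return torch.sigmoid(fused), subject_map, depth_map

# Usage: a single 128x128 RGB image.
model = TinyDualBranchModel()
confidence, subject_map, depth_map = model(torch.randn(1, 3, 128, 128))
```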
  • In an embodiment, training to obtain the subject detection model according to preset conditions of the same scene includes: obtaining the visible light map, depth map, and labeled subject mask map of the same scene; generating a center weight map corresponding to the visible light map, where the weight values represented by the center weight map gradually decrease from the center to the edge; applying the visible light map to the input layer of the subject detection model containing the initial network weight, applying the depth map and the center weight map to the output layer of the initial subject detection model, using the labeled subject mask map as the ground truth output by the subject detection model, and training the subject detection model containing the initial network weight to obtain the target network weight of the subject detection model.
  • In an embodiment, when the subject detection model is a model obtained by training in advance according to the visible light map, center weight map, depth map, and corresponding labeled subject mask map of the same scene, the method also includes:
  • a center weight map corresponding to the visible light map is generated, wherein the weight value represented by the center weight map gradually decreases from the center to the edge.
  • the central weight map refers to a map used to record the weight value of each pixel in the visible light image.
  • the weight value recorded in the center weight map gradually decreases from the center to the four sides, that is, the center weight is the largest, and the weight gradually decreases toward the four sides.
  • The center weight map indicates that the weight value gradually decreases from the center pixel of the visible light image to the edge pixels of the image.
  • the ISP processor or the central processing unit can generate a corresponding central weight map according to the size of the visible light map.
  • the weight value represented by the center weight map gradually decreases from the center to the four sides.
  • the center weight map can be generated using a Gaussian function, a first-order equation, or a second-order equation.
  • the Gaussian function may be a two-dimensional Gaussian function.
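  • A center weight map whose weights decay from the center to the edges can, for example, be generated with a two-dimensional Gaussian function as sketched below; the sigma value is an illustrative assumption.

```python
import numpy as np

def center_weight_map(height, width, sigma=0.5):
    """Generate a center weight map whose values decay from the image
    center toward the edges using a 2-D Gaussian function."""
    ys = np.linspace(-1.0, 1.0, height)
    xs = np.linspace(-1.0, 1.0, width)
    yy, xx = np.meshgrid(ys, xs, indexing="ij")
    weights = np.exp(-(xx ** 2 + yy ** 2) / (2.0 * sigma ** 2))
    return weights / weights.max()   # largest weight (1.0) at the center
```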
  • the center weight map is applied to the output layer of the subject detection model.
  • the fusion of the subject recognition map and the depth prediction map to obtain a confidence map of the subject area includes:
  • the center weight map, the subject recognition map, and the depth prediction map are merged to obtain a subject region confidence map.
  • the subject recognition layer of the subject detection model outputs the subject recognition map
  • the ISP processor or central processing unit applies the central weight map to the output layer of the subject detection model
  • the central weight map, the subject recognition map and the depth prediction map are merged through the output layer to obtain the subject region confidence map.
  • The image processing method in this embodiment obtains the visible light image, generates the center weight map corresponding to the visible light image, and then inputs the visible light image into the subject recognition layer and the depth prediction layer of the subject detection model for detection to obtain the subject recognition map and the depth prediction map.
  • the central weight map is applied to the output layer of the subject detection model, and combined with the subject recognition map and the depth prediction map for processing, the subject area confidence map can be obtained, and the target subject in the visible light map can be determined according to the subject area confidence map.
  • Using the center weight map can make the object in the center of the image easier to be detected.
  • Using the subject detection model trained with the visible light map, center weight map, and subject mask map, the target subject in the visible light map can be identified more accurately.
  • the above-mentioned image processing method further includes: when there are multiple subjects, according to the priority of each subject's category, the area occupied by each subject in the visible light image, and each subject in the visible light image At least one of the positions in, determine the target subject.
  • the category refers to the category of the subject, such as portraits, flowers, animals, landscapes and other categories.
  • the position refers to the position in the visible light map, and can be expressed in coordinates.
  • the priority of each subject's category is obtained, and the subject with the highest priority or the second highest priority is selected as the target subject.
  • the area occupied by each subject in the visible light image is obtained, and the subject with the largest or second largest area in the visible light image is selected as the target subject.
  • the position of each subject in the visible light map is obtained, and the subject with the smallest distance between the position of the subject in the visible light map and the center point of the visible light map is selected as the target subject.
  • the area occupied by the multiple subjects in the visible light image is obtained, and the subject with the largest or second largest area in the visible light image is selected as the target subject.
  • By screening according to one or at least two of the priority of the subject's category, the area occupied by the subject in the visible light map, and the position of the subject in the visible light map, the target subject can be accurately determined.
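  • A possible sketch of this selection among multiple subjects is given below; the Subject structure, the category priority table, and the tie-breaking order (priority first, then area, then distance to the image center) are assumptions for illustration.

```python
from dataclasses import dataclass
import math

# Illustrative category priorities; a higher value means a higher priority.
CATEGORY_PRIORITY = {"portrait": 3, "animal": 2, "flower": 1, "landscape": 0}

@dataclass
class Subject:
    category: str
    area: float        # area occupied in the visible light image
    center: tuple      # (x, y) position of the subject

def pick_by_priority_area_position(subjects, image_center):
    """Pick the target subject: highest category priority first, then largest
    area, then smallest distance to the image center."""
    def key(s):
        dist = math.hypot(s.center[0] - image_center[0], s.center[1] - image_center[1])
        return (CATEGORY_PRIORITY.get(s.category, -1), s.area, -dist)
    return max(subjects, key=key)

# Usage with two candidate subjects in a 640x480 image.
subjects = [Subject("flower", 5000, (100, 120)), Subject("portrait", 3000, (320, 240))]
target = pick_by_priority_area_position(subjects, image_center=(320, 240))
```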
  • the aforementioned image processing method further includes: when it is determined that there are multiple subjects, and the multiple subjects are all human faces, judging whether the multiple human faces are on the same plane;
  • the face with the largest area is selected as the target subject.
  • The depth information of each face can be obtained, and whether the multiple faces are on the same plane can be determined by comparing whether their depth information is the same. When the depth information is the same, the faces are on the same plane; when the depth information is different, they are not on the same plane.
  • the depth information of the face can be represented by the average, median, or weighted value of the depth information of each pixel in the area where the face is located.
  • the depth information of the face can also be calculated by using each pixel in the area where the face is located according to a preset function.
  • the preset function can be a linear function, an exponential function, or a power function.
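  • The same-plane check for multiple faces can be sketched as follows, representing each face's depth by the median of the depth pixels within its region (the mean or a weighted value could be used instead, as described above); the tolerance value is an illustrative assumption.

```python
import numpy as np

def face_depth(depth_map, face_mask):
    """Represent the depth of a face by the median depth of its region."""
    return float(np.median(depth_map[face_mask]))

def faces_on_same_plane(depth_map, face_masks, tolerance=0.05):
    """Judge whether multiple faces are on the same plane by comparing
    their depth information."""
    depths = [face_depth(depth_map, m) for m in face_masks]
    return max(depths) - min(depths) <= tolerance
```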
  • Figure 7 is a schematic diagram of image processing effects in an embodiment. As shown in Figure 7, there is a butterfly in the RGB image 702. After the RGB image is input into the subject detection model, the subject region confidence map 704 is obtained; the subject region confidence map 704 is then filtered and binarized to obtain the binarized mask map 706; morphological processing and guided filtering are then performed on the binarized mask map 706 to achieve edge enhancement, and the subject mask map 708 is obtained.
  • an image processing method including:
  • Operation (a1) is to obtain the visible light map, the depth map and the marked subject mask map of the same scene.
  • Operation (a2) is to apply the visible light map to the subject recognition layer of the subject detection model containing the initial network weights, apply the visible light map to the depth prediction layer of the subject detection model containing the initial network weights, use the depth map and the labeled subject mask map as the ground truth output by the subject detection model, and train the subject detection model containing the initial network weights to obtain the target network weight of the subject detection model.
  • Operation (a3) is to obtain a visible light image.
  • Operation (a4) input the visible light image into the subject recognition layer of the subject detection model to obtain a subject recognition map.
  • the subject detection model is a model obtained by training in advance according to the visible light map, the depth map and the corresponding labeled subject mask map of the same scene.
  • Operation (a5) input the visible light image into the depth prediction layer of the subject detection model to obtain a depth prediction image.
  • Operation (a6) is to perform block processing on the depth prediction map to obtain at least two sub-blocks.
  • Operation (a7) is to determine the overlapping area between each of the at least two sub-blocks and the subject identification map, and determine the area of the overlapping area corresponding to each sub-block and the depth of each sub-block.
  • Operation (a8) is to obtain a weighting factor, and according to the weighting factor, the area of the overlapping area corresponding to each sub-block, and the depth of each sub-block, the weighted confidence of the overlapping area corresponding to each sub-block is obtained.
  • Operation (a9) is to generate a confidence map of the subject region according to the weighted confidence.
  • Operation (a10) is to perform adaptive confidence threshold filtering processing on the subject region confidence map to obtain a binarized mask map.
  • Operation (a11) is to perform morphological processing and guided filtering processing on the binarized mask map to obtain the subject mask map.
  • Operation (a12) is to detect the visible light image and determine the highlight area in the visible light image.
  • Operation (a13) according to the highlight area in the visible light image and the subject mask image, determine the target subject for eliminating the highlight in the visible light image.
  • The RGB image is recognized through a two-way network, and a center weight map is introduced, which enhances the depth feature and the center attention feature. This not only accurately segments simple scenes, such as a scene with a single subject and low contrast in the background area, but also greatly improves the accuracy of target subject recognition in complex scenes.
  • the introduction of depth map can solve the problem of poor robustness of traditional target detection methods to the ever-changing targets of natural images. Aiming at the highlight and highlight areas that affect the accuracy of subject recognition, highlight detection is used to identify the highlight areas in the RGB image, and then a separate filter is used for filtering.
  • FIG. 8 is a structural block diagram of an image processing apparatus according to an embodiment.
  • an image processing device includes: an acquisition module 802, an identification module 804, a prediction module 806, a fusion module 808, and a determination module 810. among them,
  • the obtaining module 802 is used to obtain a visible light image.
  • the recognition module 804 is configured to input the visible light image into the subject recognition layer of the subject detection model to obtain the subject recognition map; wherein, the subject detection model is a model obtained by training according to preset conditions of the same scene.
  • the prediction module 806 is configured to input the visible light image into the depth prediction layer of the subject detection model to obtain a depth prediction image.
  • the fusion module 808 is used for fusing the subject recognition map and the depth prediction map to obtain the subject region confidence map.
  • the determining module 810 is configured to determine the target subject in the visible light image according to the confidence map of the subject area.
  • the image processing device in this embodiment obtains the visible light image, and inputs the visible light image into the subject recognition layer of the subject detection model to obtain the subject recognition map, thereby preliminarily identifying the subject in the visible light image.
  • the visible light map is input into the depth prediction layer of the subject detection model, and the depth map corresponding to the visible light map can be obtained.
  • the depth map and the subject recognition map are obtained through two-way network recognition, and then the subject recognition map and the depth prediction map are merged to obtain the confidence map of the subject area.
  • The target subject in the visible light map can then be determined. Using a subject detection model trained with the visible light map, depth map, and subject mask map, or a subject detection model trained with the visible light map, center weight map, depth map, and subject mask map, the target subject in the visible light map can be identified more accurately.
  • the fusion module 808 is further configured to: perform block processing on the depth prediction map to obtain at least two sub-blocks; determine the overlap area between each of the at least two sub-blocks and the subject identification map, and Determine the weighted confidence of the overlapping area corresponding to each sub-block; generate a confidence map of the subject area according to the weighted confidence.
  • the image processing device in this embodiment performs block processing on the depth prediction map to obtain at least two sub-blocks, determines the overlap area between each sub-block of the at least two sub-blocks and the subject identification map, and determines that each sub-block corresponds to Based on the weighted confidence of the overlapping area, the subject area confidence map is generated according to the weighted confidence, and the fusion of the depth prediction map and the subject identification map can be obtained. Combining the depth prediction map and the subject recognition map to identify the subject of the image, the accuracy and accuracy of subject recognition are improved.
  • the fusion module 808 is further configured to: determine the area of the overlapping area corresponding to each sub-block and the depth of each sub-block; obtain a weighting factor, according to the weighting factor, the overlapping area corresponding to each sub-block The area and the depth of each sub-block are used to obtain the weighted confidence of the overlapping area corresponding to each sub-block.
  • By obtaining the weighting factor and determining the weighted confidence of the overlapping area corresponding to each sub-block according to the weighting factor, the area of the overlapping area, and the depth of each sub-block, the subject region becomes more finely controllable.
  • the fusion of the depth map and the subject detection map can more accurately identify the target subject in the visible light image.
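  • The fusion step might be sketched as follows, assuming the weighting rule F = λS + d given later in the description, a simple depth quantization for the connected-domain blocking, and illustrative values for λ and the number of depth bins; none of these choices are mandated by the model.

```python
# Minimal sketch of fusing the depth prediction map with the subject recognition map.
# Assumptions: sub-blocks are connected components of quantized depth, the weighted
# confidence is F = lam * S + d, and lam / n_depth_bins are illustrative values.
import numpy as np
from scipy import ndimage

def fuse(subject_map, depth_map, lam=1e-3, n_depth_bins=8):
    subject_bin = subject_map > 0.5                      # binarized subject recognition map
    edges = np.linspace(depth_map.min(), depth_map.max(), n_depth_bins)
    bins = np.digitize(depth_map, edges)                 # quantize depth for blocking
    confidence = np.zeros_like(depth_map, dtype=np.float32)
    for b in np.unique(bins):
        labels, n = ndimage.label(bins == b)             # connected sub-blocks at this depth
        for i in range(1, n + 1):
            block = labels == i
            overlap = block & subject_bin                # overlap with the subject recognition map
            if not overlap.any():
                continue
            S = float(overlap.sum())                     # area of the overlapping region
            d = float(depth_map[block].mean())           # depth of the sub-block
            confidence[overlap] = lam * S + d            # weighted confidence F = λS + d
    if confidence.max() > 0:
        confidence /= confidence.max()                   # normalize to a confidence map
    return confidence
```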
  • the determining module 810 is further configured to: process the subject region confidence map to obtain a subject mask map; detect the visible light image to determine the highlight region in the visible light image; and determine, according to the highlight region in the visible light image and the subject mask map, the target subject with highlights eliminated in the visible light image. Filtering the subject region confidence map to obtain the subject mask map improves the reliability of the subject region confidence map.
  • the visible light image is detected to obtain the highlight region, which is then processed together with the subject mask map to obtain the target subject with highlights eliminated.
  • a separate filter is used to process the highlight and high-brightness regions that affect the accuracy of subject recognition, which improves the precision and accuracy of subject recognition.
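  • A minimal sketch of this highlight-elimination step is given below; the brightness threshold and the minimum blob size are illustrative assumptions, and the difference operation is only one of the options (a logical AND could be used instead).

```python
# Minimal sketch: threshold brightness to find the highlight region, keep coherent blobs
# via connected components, then subtract the highlight region from the subject mask.
import cv2
import numpy as np

def eliminate_highlights(rgb_image, subject_mask, brightness_thresh=220, min_area=20):
    gray = cv2.cvtColor(rgb_image, cv2.COLOR_RGB2GRAY)
    highlight = ((gray > brightness_thresh).astype(np.uint8)) * 255   # candidate highlight pixels
    n, labels, stats, _ = cv2.connectedComponentsWithStats(highlight)
    for i in range(1, n):                                             # connected-domain processing
        if stats[i, cv2.CC_STAT_AREA] < min_area:
            highlight[labels == i] = 0                                # drop tiny speckles
    return cv2.subtract(subject_mask, highlight)                      # target subject without highlights
```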
  • the determining module 810 is further configured to: perform adaptive confidence threshold filtering processing on the subject region confidence map to obtain a subject mask map.
  • the confidence value of each pixel in the subject region confidence map is compared with the corresponding confidence threshold; the pixel is retained if its confidence is greater than or equal to the threshold and removed if it is below the threshold. In this way unnecessary information is removed and key information is retained.
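  • A minimal sketch of such adaptive confidence-threshold filtering is shown below, assuming a per-pixel threshold chosen from the local brightness of the visible light image; the brightness cut-offs and threshold values are illustrative, not prescribed values.

```python
# Minimal sketch: brighter neighbourhoods get a higher confidence threshold, darker ones a
# lower threshold; pixels whose confidence falls below the local threshold are removed.
import cv2
import numpy as np

def adaptive_confidence_filter(confidence_map, rgb_image,
                               bright_level=170, dark_level=85,
                               high_t=0.6, mid_t=0.5, low_t=0.4):
    gray = cv2.cvtColor(rgb_image, cv2.COLOR_RGB2GRAY)
    local_brightness = cv2.blur(gray, (15, 15))               # neighbourhood brightness
    thresh = np.full(gray.shape, mid_t, dtype=np.float32)
    thresh[local_brightness > bright_level] = high_t
    thresh[local_brightness < dark_level] = low_t
    return (confidence_map >= thresh).astype(np.uint8) * 255   # binarized mask map
```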
  • the determining module 810 is further configured to: perform adaptive confidence threshold filtering processing on the subject region confidence map to obtain a binarized mask map; and perform morphological processing and guided filtering processing on the binarized mask map to obtain the subject mask map. Morphological processing and guided filtering ensure that the resulting subject mask map has little or no noise and softer edges.
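  • This refinement step might look like the sketch below: erosion followed by dilation removes noise, and a guided filter softens the edges. The use of opencv-contrib's ximgproc guided filter and the kernel size, radius, and eps values are assumptions for illustration.

```python
# Minimal sketch: morphological opening (erode then dilate) to remove noise, followed by
# guided filtering with the visible light image as guide for soft, edge-aware boundaries.
import cv2

def refine_subject_mask(binary_mask, rgb_image):
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5, 5))
    cleaned = cv2.erode(binary_mask, kernel)    # erosion removes isolated noise
    cleaned = cv2.dilate(cleaned, kernel)       # dilation restores the subject body
    # edge-preserving smoothing; requires opencv-contrib-python (cv2.ximgproc)
    return cv2.ximgproc.guidedFilter(rgb_image, cleaned, 8, 1e2)
```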
  • the image processing device further includes: a training module.
  • the training module is configured to: obtain a visible light image, a depth map, and a labeled subject mask map of the same scene; apply the visible light image to the subject recognition layer of the subject detection model containing initial network weights and to the depth prediction layer of that model; take the depth map and the labeled subject mask map as the ground truth of the subject detection model's output; and train the subject detection model containing the initial network weights to obtain the target network weights of the subject detection model.
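  • A minimal PyTorch-style training sketch under this setup is given below. The loss functions, the loss weight, and the assumption that the model returns a (subject logits, depth prediction) pair are illustrative choices, not the patent's prescribed training procedure.

```python
# Minimal sketch of one training step: the visible light image feeds both branches; the
# labeled subject mask supervises the recognition branch and the depth map supervises the
# depth branch, driving the initial network weights toward the target network weights.
import torch.nn.functional as F

def train_step(model, optimizer, rgb, depth_gt, mask_gt, depth_weight=0.5):
    optimizer.zero_grad()
    subject_logits, depth_pred = model(rgb)                        # two-branch forward pass
    loss = (F.binary_cross_entropy_with_logits(subject_logits, mask_gt)
            + depth_weight * F.l1_loss(depth_pred, depth_gt))
    loss.backward()
    optimizer.step()
    return loss.item()
```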
  • when the subject detection model is a model obtained by pre-training based on the visible light image, center weight map, depth map, and corresponding labeled subject mask map of the same scene, the device further includes a generation module.
  • the generation module is configured to: generate a center weight map corresponding to the visible light image, wherein the weight values represented by the center weight map gradually decrease from the center to the edge; and apply the center weight map to the output layer of the subject detection model;
  • the fusion module is also used to fuse the center weight map, the subject identification map, and the depth prediction map to obtain a confidence map of the subject area.
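  • The center weight map might be generated with a two-dimensional Gaussian as sketched below, so that the weight is largest at the image center and decays toward the edges; the sigma scale is an illustrative assumption.

```python
# Minimal sketch of a center weight map: a 2D Gaussian centered on the image, normalized
# so the center weight is 1.0 and the weights fall off toward the edges.
import numpy as np

def center_weight_map(height, width, sigma_scale=0.5):
    ys = np.arange(height) - (height - 1) / 2.0
    xs = np.arange(width) - (width - 1) / 2.0
    yy, xx = np.meshgrid(ys, xs, indexing="ij")
    sig_y, sig_x = sigma_scale * height, sigma_scale * width
    w = np.exp(-(yy ** 2 / (2 * sig_y ** 2) + xx ** 2 / (2 * sig_x ** 2)))
    return w / w.max()

# e.g. the map can then be combined with the subject recognition map and the depth
# prediction map at the output layer: confidence = center_weight_map(h, w) * fused
```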
  • a dual deep learning network structure is designed.
  • one of the deep learning network structures is used to process the RGB image to obtain the depth prediction map, and the other deep learning network structure is used to process the RGB image to obtain the subject recognition map.
  • the output of the two deep learning network structures is connected by convolutional features, that is, the depth prediction map and the subject recognition map are merged and then output, which can accurately identify the target subject in the visible light image.
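  • The dual-network idea might be sketched as below: two convolutional branches share the same RGB input, and their features are connected by a final convolution. The layer widths are illustrative placeholders, not the mobile-Unet based architecture described in the patent.

```python
# Minimal sketch of the dual deep learning network structure: a subject recognition branch
# and a depth prediction branch whose features are fused by a convolutional connection.
import torch
import torch.nn as nn

class DualBranchSubjectNet(nn.Module):
    def __init__(self):
        super().__init__()
        def branch():
            return nn.Sequential(
                nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(inplace=True),
                nn.Conv2d(16, 8, 3, padding=1))
        self.subject_branch = branch()            # produces subject recognition features
        self.depth_branch = branch()              # produces depth prediction features
        self.fuse = nn.Conv2d(16, 1, 1)           # convolutional feature connection + output

    def forward(self, rgb):
        s = self.subject_branch(rgb)
        d = self.depth_branch(rgb)
        fused = torch.cat([s, d], dim=1)          # merge the two branches' features
        return torch.sigmoid(self.fuse(fused))    # subject region confidence map
```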
  • the division of the modules in the above image processing apparatus is only for illustration; in other embodiments, the image processing apparatus may be divided into different modules as required to complete all or part of the functions of the above-mentioned image processing apparatus.
  • Fig. 9 is a schematic diagram of the internal structure of an electronic device in an embodiment.
  • the electronic device includes a processor and a memory connected through a system bus.
  • the processor is used to provide computing and control capabilities to support the operation of the entire electronic device.
  • the memory may include a non-volatile storage medium and internal memory.
  • the non-volatile storage medium stores an operating system and a computer program.
  • the computer program can be executed by a processor to implement an image processing method provided in the following embodiments.
  • the internal memory provides a cached running environment for the operating system and the computer program in the non-volatile storage medium.
  • the electronic device can be a mobile phone, a tablet computer or a personal digital assistant or a wearable device.
  • each module in the image processing apparatus provided in the embodiment of the present application may be in the form of a computer program.
  • the computer program can be run on a terminal or server.
  • the program module composed of the computer program can be stored in the memory of the terminal or server.
  • the embodiments of the present application also provide a computer-readable storage medium: one or more non-volatile computer-readable storage media containing computer-executable instructions which, when executed by one or more processors, cause the processors to perform the operations of the image processing method.
  • a computer program product containing instructions, when run on a computer, causes the computer to execute the image processing method.
  • Non-volatile memory may include read only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory.
  • Volatile memory may include random access memory (RAM), which acts as external cache memory.
  • RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchronous link (Synchlink) DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).

Abstract

An image processing method includes: acquiring a visible light image; inputting the visible light image into a subject recognition layer of a subject detection model to obtain a subject recognition map, wherein the subject detection model is a model obtained by training according to preset conditions of the same scene; inputting the visible light image into a depth prediction layer of the subject detection model to obtain a depth prediction map; fusing the subject recognition map and the depth prediction map to obtain a subject region confidence map; and determining a target subject in the visible light image according to the subject region confidence map.

Description

图像处理方法和装置、电子设备、计算机可读存储介质
相关申请的交叉引用
本申请要求于2019年08月07日提交中国专利局、申请号为2019107267853、发明名称为“图像处理方法和装置、电子设备、计算机可读存储介质”的中国专利申请的优先权,其全部内容通过引用结合在本申请中。
技术领域
本申请涉及影像领域,特别是涉及一种图像处理方法和装置、电子设备、计算机可读存储介质。
背景技术
随着影像技术的发展,人们越来越习惯通过电子设备上的摄像头等图像采集设备拍摄图像或视频,记录各种信息。摄像头在采集图像过程中有时需要检测到主体,传统的主体检测方式无法准确的检测出图像中的主体。
发明内容
本申请实施例提供一种图像处理方法和装置、电子设备、计算机可读存储介质,能够提高主体检测的准确性。
一种图像处理方法,包括:
获取可见光图;
将所述可见光图输入主体检测模型的主体识别层中,得到主体识别图;其中,所述主体检测模型是根据同一场景的预设条件进行训练得到的模型;
将所述可见光图输入所述主体检测模型的深度预测层中,得到深度预测图;
融合所述主体识别图和所述深度预测图,得到主体区域置信度图;
根据所述主体区域置信度图确定所述可见光图中的目标主体。
一种图像处理装置,包括:
获取模块,用于获取可见光图;
识别模块,用于将所述可见光图输入主体检测模型的主体识别层中,得到主体识别图;其中,所述主体检测模型是根据同一场景的预设条件进行训练得到的模型;
预测模块,用于将所述可见光图输入所述主体检测模型的深度预测层中,得到深度预测图;
融合模块,用于融合所述主体识别图和所述深度预测图,得到主体区域置信度图;
确定模块,用于根据所述主体区域置信度图确定所述可见光图中的目标主体。
一种电子设备,包括存储器及处理器,所述存储器中储存有计算机程序,所述计算机程序被所述处理器执行时,使得所述处理器执行如下操作:
获取可见光图;
将所述可见光图输入主体检测模型的主体识别层中,得到主体识别图;其中,所述主体检测模型是根据同一场景的预设条件进行训练得到的模型;
将所述可见光图输入所述主体检测模型的深度预测层中,得到深度预测图;
融合所述主体识别图和所述深度预测图,得到主体区域置信度图;
根据所述主体区域置信度图确定所述可见光图中的目标主体。
一种计算机可读存储介质,其上存储有计算机程序,所述计算机程序被处理器执行时实现如下操作:
获取可见光图;
将所述可见光图输入主体检测模型的主体识别层中,得到主体识别图;其中,所述主体检测模型是根据同一场景的预设条件进行训练得到的模型;
将所述可见光图输入所述主体检测模型的深度预测层中,得到深度预测图;
融合所述主体识别图和所述深度预测图,得到主体区域置信度图;
根据所述主体区域置信度图确定所述可见光图中的目标主体。
上述图像处理方法和装置、电子设备、计算机可读存储介质,获取可见光图,将可见光图输入主体检测模型的主体识别层中,可以得到主体识别图,从而初步识别出可见光图中的主体。将可见光图输入主体检测模型的深度预测层中,可以得到可见光图对应的深度图。通过双路网络识别得到深度图和主体识别图,再融合主体识别图和深度预测图,得到主体区域置信度图,根据主体区域置信度图确定可见光图中的目标主体,从而更加准确的识别出可见光图中的目标主体。
附图说明
为了更清楚地说明本申请实施例或现有技术中的技术方案,下面将对实施例或现有技术描述中所需要使用的附图作简单地介绍,显而易见地,下面描述中的附图仅仅是本申请的一些实施例,对于本领域普通技术人员来讲,在不付出创造性劳动的前提下,还可以根据这些附图获得其他的附图。
图1为一个实施例中电子设备的内部结构框图;
图2为一个实施例中图像处理方法的流程图;
图3为一个实施例中融合主体识别图和深度预测图,得到主体区域置信度图的操作的流程图;
图4为一个实施例中确定每个子块对应的重叠区域的加权置信度的操作的流程图;
图5为一个实施例中主体检测模型的网络结构示意图;
图6为另一个实施例中图像处理方法的流程图;
图7为一个实施例中图像处理效果示意图;
图8为一种主体检测模型的训练装置的结构框图;
图9为另一个实施例中电子设备的内部结构框图。
具体实施方式
为了使本申请的目的、技术方案及优点更加清楚明白,以下结合附图及实施例,对本申请进行进一步详细说明。应当理解,此处所描述的具体实施例仅仅用以解释本申请,并不用于限定本申请。
本申请实施例中的图像处理方法、主体检测模型的训练方法可应用于电子设备。该电子设备可为带有摄像头的计算机设备、个人数字助理、平板电脑、智能手机、穿戴式设备等。电子设备中的摄像头在拍摄图像时,会进行自动对焦,以保证拍摄的图像清晰。
在一个实施例中,上述电子设备中可包括图像处理电路,图像处理电路可以利用硬件和/或软件组件实现,可包括定义ISP(Image Signal Processing,图像信号处理)管线的各种处理单元。图1为一个实施例中图像处理电路的示意图。如图1所示,为便于说明,仅示出与本申请实施例相关的图像处理技术的各个方面。
如图1所示,图像处理电路包括第一ISP处理器130、第二ISP处理器140和控制逻辑器150。第一摄像头110包括一个或多个第一透镜112和第一图像传感器114。第一图像传感器114可包括色彩滤镜阵列(如Bayer滤镜),第一图像传感器114可获取用第一图像传感器114的每个成像像素捕捉的光强度和波长信息,并提供可由第一ISP处理器130处理的一组图像数据。第二摄像头120包括一个或多个第二透镜122和第二图像传感器 124。第二图像传感器124可包括色彩滤镜阵列(如Bayer滤镜),第二图像传感器124可获取用第二图像传感器124的每个成像像素捕捉的光强度和波长信息,并提供可由第二ISP处理器140处理的一组图像数据。
第一摄像头110采集的第一图像传输给第一ISP处理器130进行处理,第一ISP处理器130处理第一图像后,可将第一图像的统计数据(如图像的亮度、图像的反差值、图像的颜色等)发送给控制逻辑器150,控制逻辑器150可根据统计数据确定第一摄像头110的控制参数,从而第一摄像头110可根据控制参数进行自动对焦、自动曝光等操作。第一图像经过第一ISP处理器130进行处理后可存储至图像存储器160中,第一ISP处理器130也可以读取图像存储器160中存储的图像以对进行处理。另外,第一图像经过ISP处理器130进行处理后可直接发送至显示器170进行显示,显示器170也可以读取图像存储器160中的图像以进行显示。
其中,第一ISP处理器130按多种格式逐个像素地处理图像数据。例如,每个图像像素可具有8、10、12或14比特的位深度,第一ISP处理器130可对图像数据进行一个或多个图像处理操作、收集关于图像数据的统计信息。其中,图像处理操作可按相同或不同的位深度精度进行。
图像存储器160可为存储器装置的一部分、存储设备、或电子设备内的独立的专用存储器，并可包括DMA(Direct Memory Access,直接存储器存取)特征。
当接收到来自第一图像传感器114接口时,第一ISP处理器130可进行一个或多个图像处理操作,如时域滤波。处理后的图像数据可发送给图像存储器160,以便在被显示之前进行另外的处理。第一ISP处理器130从图像存储器160接收处理数据,并对所述处理数据进行RGB和YCbCr颜色空间中的图像数据处理。第一ISP处理器130处理后的图像数据可输出给显示器170,以供用户观看和/或由图形引擎或GPU(Graphics Processing Unit,图形处理器)进一步处理。此外,第一ISP处理器130的输出还可发送给图像存储器160,且显示器170可从图像存储器160读取图像数据。在一个实施例中,图像存储器160可被配置为实现一个或多个帧缓冲器。
第一ISP处理器130确定的统计数据可发送给控制逻辑器150。例如,统计数据可包括自动曝光、自动白平衡、自动聚焦、闪烁检测、黑电平补偿、第一透镜112阴影校正等第一图像传感器114统计信息。控制逻辑器150可包括执行一个或多个例程(如固件)的处理器和/或微控制器,一个或多个例程可根据接收的统计数据,确定第一摄像头110的控制参数及第一ISP处理器130的控制参数。例如,第一摄像头110的控制参数可包括增益、曝光控制的积分时间、防抖参数、闪光控制参数、第一透镜112控制参数(例如聚焦或变焦用焦距)、或这些参数的组合等。ISP控制参数可包括用于自动白平衡和颜色调整(例如,在RGB处理期间)的增益水平和色彩校正矩阵,以及第一透镜112阴影校正参数。
同样地,第二摄像头120采集的第二图像传输给第二ISP处理器140进行处理,第二ISP处理器140处理第一图像后,可将第二图像的统计数据(如图像的亮度、图像的反差值、图像的颜色等)发送给控制逻辑器150,控制逻辑器150可根据统计数据确定第二摄像头120的控制参数,从而第二摄像头120可根据控制参数进行自动对焦、自动曝光等操作。第二图像经过第二ISP处理器140进行处理后可存储至图像存储器160中,第二ISP处理器140也可以读取图像存储器160中存储的图像以对进行处理。另外,第二图像经过ISP处理器140进行处理后可直接发送至显示器170进行显示,显示器170也可以读取图像存储器160中的图像以进行显示。第二摄像头120和第二ISP处理器140也可以实现如第一摄像头110和第一ISP处理器130所描述的处理过程。
在一个实施例中,第一摄像头110可为彩色摄像头,第二摄像头120可为TOF(Time Of Flight,飞行时间)摄像头或结构光摄像头。TOF摄像头可获取TOF深度图,结构光摄像头可获取结构光深度图。第一摄像头110和第二摄像头120可均为彩色摄像头。通过两 个彩色摄像头获取双目深度图。第一ISP处理器130和第二ISP处理器140可为同一ISP处理器。
第一摄像头110和第二摄像头120拍摄同一场景分别得到可见光图和深度图,将可见光图和深度图发送给ISP处理器。ISP处理器可根据可见光图和深度图及对应的已标注的主体掩膜图对主体检测模型进行训练,得到训练好的模型。ISP处理器获取可见光图;将所述可见光图输入主体检测模型的主体识别层中,得到主体识别图;其中,所述主体检测模型是根据同一场景的预设条件进行训练得到的模型;将所述可见光图输入所述主体检测模型的深度预测层中,得到深度预测图;融合所述主体识别图和所述深度预测图,得到主体区域置信度图;根据所述主体区域置信度图确定所述可见光图中的目标主体。通过双路网络识别得到深度图和主体识别图,再融合主体识别图和深度预测图,得到主体区域置信度图,根据主体区域置信度图确定可见光图中的目标主体,可以更加准确的识别出可见光图中的目标主体。
图2为一个实施例中图像处理方法的流程图。如图2所示,该图像处理方法包括:
操作202,获取可见光图。
其中,主体检测(salient object detection)是指面对一个场景时,自动地对感兴趣区域进行处理而选择性的忽略不感兴趣区域。感兴趣区域称为主体区域。可见光图是指RGB(Red、Green、Blue)图像。可通过彩色摄像头拍摄任意场景得到彩色图像,即RGB图像。该可见光图可为电子设备本地存储的,也可为其他设备存储的,也可以为从网络上存储的,还可为电子设备实时拍摄的,不限于此。
具体地,电子设备的ISP处理器或中央处理器可从本地或其他设备或网络上获取可见光图,或者通过摄像头拍摄一场景得到可见光图。
操作204,将可见光图输入主体检测模型的主体识别层中,得到主体识别图。其中,主体检测模型是根据同一场景的预设条件进行训练得到的模型。
其中,预设条件是指根据同一场景获取不同的训练数据,根据不同的训练数据训练主体检测模型。根据同一场景获取的训练数据可包括同一场景的可见光图、深度图及对应的已标注的主体掩膜图。该主体检测模型是将同一场景的可见光图、深度图及对应的已标注的主体掩膜图输入到包含有初始网络权重的主体检测模型进行训练得到的。其中,可见光图作为训练的主体检测模型的输入,深度图和已标注的主体掩膜(mask)图作为训练的主体检测模型期望输出得到的真实值(ground truth)。主体掩膜图是用于识别图像中主体的图像滤镜模板,可以遮挡图像的其他部分,筛选出图像中的主体。主体检测模型可训练能够识别检测各种主体,如人、花、猫、狗、背景等。
在本实施例中,根据同一场景获取的训练数据可以包括同一场景对应的可见光图、中心权重图、深度图及已标注的主体掩膜图。其中,可见光图和中心权重图作为训练的主体检测模型的输入,深度图和已标注的主体掩膜(mask)图作为训练的主体检测模型期望输出得到的真实值(ground truth)。
具体地,该主体检测模型包括主体识别层和深度预测层,ISP处理器或中央处理器可将该可见光图输入该主体检测模型中的主体识别层,主体识别层对该可见光图进行处理,可得到对应的主体识别图。
操作206,将可见光图输入主体检测模型的深度预测层中,得到深度预测图。
具体地,主体检测模型的深度预测层用于对可见光图进行检测,得到可见光图对应的深度预测图。ISP处理器或中央处理器可将该可见光图输入该主体检测模型中的深度预测层,通过深度预测层对该可见光图进行处理,可得到该可见光图对应的深度预测图。
操作208,融合主体识别图和深度预测图,得到主体区域置信度图。
其中,图像融合是指将多源信道所采集到的关于同一图像的图像数据最大限度地提取信道中的有利信息合成高质量图像的技术。
具体地,ISP处理器或中央处理器可将该主体识别图和深度预测图通过融合算法进行融合处理,得到主体区域置信度图。主体区域置信度图是用于记录主体属于哪种能识别的主体的概率,例如某个像素点属于人的概率是0.8,花的概率是0.1,背景的概率是0.1等。
操作210,根据主体区域置信度图确定该可见光图中的目标主体。
其中,主体是指各种对象,如人、花、猫、狗、牛、蓝天、白云、背景等。目标主体是指需要的主体,可根据需要选择。
具体地,ISP处理器或中央处理器可根据主体区域置信度图选取置信度最高或次高等作为可见光图中的主体,若存在一个主体,则将该主体作为目标主体;若存在多个主体,可根据需要选择其中一个或多个主体作为目标主体。
本实施例中的图像处理方法,获取可见光图,将可见光图输入主体检测模型的主体识别层中,可以得到主体识别图,从而初步识别出可见光图中的主体。将可见光图输入主体检测模型的深度预测层中,可以得到可见光图对应的深度图。通过双路网络识别得到深度图和主体识别图,再融合主体识别图和深度预测图,得到主体区域置信度图,根据主体区域置信度图可以确定可见光图中的目标主体,利用可见光图、深度图和主体掩膜图等训练得到的主体检测模型,或者利用可见光图、中心权重图、深度图和主体掩膜图等训练得到的主体检测模型,可以更加准确的识别出可见光图中的目标主体。
在一个实施例中,如图3所示,该融合该主体识别图和该深度预测图,得到主体区域置信度图,包括:
操作302,对深度预测图进行分块处理,得到至少两个子块。
具体地,ISP处理器或中央处理器可将深度预测图进行连通域分块。进一步地,可将深度预测图按照不同的深度将连通域分为不同的子块,可得到至少两个子块。
操作304,确定至少两个子块中的每个子块与该主体识别图的重叠区域,并确定该每个子块对应的重叠区域的加权置信度。
其中,重叠区域是指子块和该主体识别图中相同的区域。置信度也称可靠度、置信水平或置信系数,是指总体参数值落在样本统计值某一区内的概率。加权置信度是指赋予加权因子之后的置信度。
具体地,ISP处理器或中央处理器确定至少两个子块中的每个子块与该主体识别图的重叠区域,可将每个子块与该主体识别图做与运算,并保留每个子块在该主体识别图中的区域,即重叠区域。接着,ISP处理器或中央处理器可计算出每个子块保留在主体识别图中的区域,即重叠区域的加权置信度,得到每个子块对应的重叠区域的加权置信度。
操作306,根据加权置信度生成主体区域置信度图。
具体地,ISP处理器或中央处理器可根据每个子块对应的重叠区域的加权置信度生成主体区域置信度图。
本实施例中的图像处理方法,对深度预测图进行分块处理,得到至少两个子块,确定至少两个子块中的每个子块与该主体识别图的重叠区域,并确定该每个子块对应的重叠区域的加权置信度,根据加权置信度生成主体区域置信度图,可以得到深度预测图和主体识别图的融合后的主体区域置信度图。结合深度预测图和主体识别图识别图像的主体,提高了主体识别的精度和准确性。
在一个实施例中,该确定该每个子块对应的重叠区域的加权置信度,包括:确定该每个子块对应的重叠区域的面积和该每个子块的深度;获取加权因子,根据该加权因子、该每个子块对应的重叠区域的面积和该每个子块的深度,得到该每个子块对应的重叠区域的加权置信度。
具体地,ISP处理器或中央处理器可确定该每个子块保留在主体识别图中的区域的面积,即每个子块和该主体识别图对应的重叠区域的面积。接着,ISP处理器或中央处理器可获取该每个子块的深度,并获取加权因子,根据加权因子、一个子块的深度和该子块对 应的重叠区域的面积,计算得到该子块对应的重叠区域的加权置信度。进一步地,按照相同的方式可计算出每个子块对应的重叠区域的加权置信度。
在本实施例中,每个子块对应的重叠区域的加权置信度与每个子块对应的重叠区域的面积呈正相关。当子块对应的重叠区域的面积越大,则计算得到的该子块对应的重叠区域的加权置信度也越大。
在本实施例中,每个子块对应的重叠区域的加权置信度与每个子块的深度呈正相关。当子块深度越大,则计算得到的该子块对应的重叠区域的加权置信度也越大。
在本实施例中,ISP处理器或中央处理器可计算出每个子块对应的重叠区域的面积分别与加权因子的乘积,并将每个子块对应的乘积与每个子块的深度对应相加,可得到每个子块对应的重叠区域的加权置信度。
例如,融合算法为F=λS+d,其中,F为加权置信度,λ为加权因子,S为一个子块和主体识别图的重叠区域的面积,d为子块的深度。ISP处理器或中央处理器可根据该融合算法计算得到每个子块对应的重叠区域的加权置信度。
本实施例中,通过确定该每个子块对应的重叠区域的面积和该每个子块的深度,获取加权因子,根据该加权因子、该每个子块对应的重叠区域的面积和该每个子块的深度,得到该每个子块对应的重叠区域的加权置信度,使得主体区域变得更精细可控。通过深度图和主体检测图融合可以更加准确的识别出可见光图中的目标主体。该方案可应用于单目相机图像虚化或辅助自动对焦等场景。
在一个实施例中,ISP处理器或中央处理器可获取子块的重叠区域的面积对应的第一加权因子,及子块的深度对应的第二加权因子。每个子块对应的重叠区域的加权置信度与每个子块对应的重叠区域的面积呈正相关,与每个子块的深度也呈正相关。
进一步地,ISP处理器或中央处理器可计算出每个子块对应的重叠区域的面积分别与第一加权因子的乘积,并计算出每个子块的深度分别与第二加权因子的乘积,并将每个子块相对应的两个乘积相加,可得到每个子块对应的重叠区域的加权置信度。例如,融合算法为F=λ 1S+λ 2d,其中,F为加权置信度,λ 1为第一加权因子,λ 2为第一加权因子,S为一个子块和主体识别图的重叠区域的面积,d为一个子块的深度。ISP处理器或中央处理器可根据该融合算法计算得到每个子块对应的重叠区域的加权置信度。
在一个实施例中,该根据该主体区域置信度图确定该可见光图中的目标主体,包括:
操作402,对该主体区域置信度图进行处理,得到主体掩膜图。
具体地,主体区域置信度图中存在一些置信度较低、零散的点,可通过ISP处理器或中央处理器对主体区域置信度图进行过滤处理,得到主体掩膜图。该过滤处理可采用配置置信度阈值,将主体区域置信度图中置信度值低于置信度阈值的像素点过滤。该置信度阈值可采用自适应置信度阈值,也可以采用固定阈值,也可以采用分区域配置对应的阈值。
操作404,检测该可见光图,确定该可见光图中的高光区域。
其中,高光区域是指亮度值大于亮度阈值的区域。
具体地,ISP处理器或中央处理器对可见光图进行高光检测,筛选得到亮度值大于亮度阈值的目标像素点,对目标像素点采用连通域处理得到高光区域。
操作406,根据该可见光图中的高光区域与该主体掩膜图,确定该可见光图中消除高光的目标主体。
具体地,ISP处理器或中央处理器可将可见光图中的高光区域与该主体掩膜图做差分计算或逻辑与计算得到可见光图中消除高光的目标主体。
本实施例中,对主体区域置信度图做过滤处理得到主体掩膜图,提高了主体区域置信度图的可靠性,对可见光图进行检测得到高光区域,然后与主体掩膜图进行处理,可得到 消除了高光的目标主体,针对影响主体识别精度的高光、高亮区域单独采用滤波器进行处理,提高了主体识别的精度和准确性。
在一个实施例中,该对该主体区域置信度图进行处理,得到主体掩膜图,包括:对该主体区域置信度图进行自适应置信度阈值过滤处理,得到主体掩膜图。
其中,自适应置信度阈值是指置信度阈值。自适应置信度阈值可为局部自适应置信度阈值。该局部自适应置信度阈值是根据像素点的领域块的像素值分布来确定该像素点位置上的二值化置信度阈值。亮度较高的图像区域的二值化置信度阈值配置的较高,亮度较低的图像区域的二值化阈值置信度配置的较低。
可选地,自适应置信度阈值的配置过程包括:当像素点的亮度值大于第一亮度值,则配置第一置信度阈值,当像素点的亮度值小于第二亮度值,则配置第二置信度阈值,当像素点的亮度值大于第二亮度值且小于第一亮度值,则配置第三置信度阈值,其中,第二亮度值小于或等于第一亮度值,第二置信度阈值小于第三置信度阈值,第三置信度阈值小于第一置信度阈值。
可选地,自适应置信度阈值的配置过程包括:当像素点的亮度值大于第一亮度值,则配置第一置信度阈值,当像素点的亮度值小于或等于第一亮度值,则配置第二置信度阈值,其中,第二亮度值小于或等于第一亮度值,第二置信度阈值小于第一置信度阈值。
对主体区域置信度图进行自适应置信度阈值过滤处理时,将主体区域置信度图中各像素点的置信度值与对应的置信度阈值比较,大于或等于置信度阈值则保留该像素点,小于置信度阈值则去掉该像素点,可去除不必要的信息,保留关键信息。
在一个实施例中,该对该主体区域置信度图进行自适应置信度阈值过滤处理,得到主体掩膜图,包括:
对该主体区域置信度图进行自适应置信度阈值过滤处理,得到二值化掩膜图;对该二值化掩膜图进行形态学处理和引导滤波处理,得到主体掩膜图。
具体地,ISP处理器或中央处理器将主体区域置信度图按照自适应置信度阈值过滤处理后,将保留的像素点的置信度值采用1表示,去掉的像素点的置信度值采用0表示,得到二值化掩膜图。
形态学处理可包括腐蚀和膨胀。可先对二值化掩膜图进行腐蚀操作,再进行膨胀操作,去除噪声;再对形态学处理后的二值化掩膜图进行引导滤波处理,实现边缘滤波操作,得到边缘提取的主体掩膜图。
通过形态学处理和引导滤波处理可以保证得到的主体掩膜图的噪点少或没有噪点,边缘更加柔和。
在一个实施例中,该根据该可见光图中的高光区域与该主体掩膜图,确定该可见光图中消除高光的目标主体,包括:将该可见光图中的高光区域与该主体掩膜图做差分处理,得到消除高光的目标主体。
具体地,ISP处理器或中央处理器将该可见光图中的高光区域与该主体掩膜图做差分处理,即可见光图和主体掩膜图中对应的像素值相减,得到该可见光图中的目标主体。通过差分处理得到去除高光的目标主体,计算方式简单。
在一个实施例中,该主体检测模型的训练方式,包括:
获取同一场景的可见光图、深度图和已标注的主体掩膜图;将该可见光图作用于包含初始网络权重的主体检测模型的主体识别层,并将该可见光图作用于该包含初始网络权重的主体检测模型的深度预测层,将该深度图和该已标注的主体掩膜图作为该主体检测模型输出的真实值,对该包含初始网络权重的主体检测模型进行训练,得到该主体检测模型的目标网络权重。
可收集一个场景的可见光图、深度图和对应的已标注的主体掩膜图。对可见光图和深度图进行语义级的标注,标注里面的主体。可收集大量的可见光图,然后基于COCO数 据集中的前景目标图和简单的背景图进行融合得到大量的纯色背景或简单背景的图像,作为训练的可见光图。COCO数据集中包含数量众多的前景目标。
主体检测模型的网络结构采用基于mobile-Unet的架构,并在decoder部分增加层之间的桥接,使高级语义特征在上采样时更充分的传递。中心权重图作用于主体监测模型的输出层,引入中心注意力机制,让处于画面中心的对象更容易被检测为主体。
主体检测模型包括输入层、主体识别层、深度预测层和输出层。主体识别层的网络结构包括卷积层(conv)、池化层(pooling)、双线性插值层(Bilinear Up sampling)、卷积特征连接层(concat+conv)、输出层等。在双线性插值层和卷积特征连接层之间采用deconvolution+add(反卷积特征叠加)操作实现桥接,使得高级语义特征在上采样时更充分的传递。卷积层、池化层、双线性插值层、卷积特征连接层等可为主体检测模型的中间层。深度预测层的网络结构包括卷积层(conv)、池化层(pooling)等。
初始网络权重是指初始化的深度学习网络模型的每一层的初始权重。在模型训练过程中,该初始网络权重不断迭代更新,从而得到目标网络权重。目标网络权重是指训练得到的能够检测图像主体的深度学习网络模型的每一层的权重。在本实施例中,该初始网络权重为初始化的主体检测模型中每一层的初始权重。该目标网络权重是指训练得到的能够检测图像主体的主体检测模型中每一层的权重。可通过预设训练次数得到目标网络权重,也可以设置深度学习网络模型的损失函数。当训练得到损失函数值小于损失阈值时,将主体检测模型的当前网络权重作为目标网络权重。
图5为一个实施例中主体检测模型的网络结构示意图。如图5所示,主体检测模型的主体识别层的网络结构包括卷积层502、池化层504、卷积层506、池化层508、卷积层510、池化层512、卷积层514、池化层516、卷积层518、卷积层520、双线性插值层522、卷积层524、双线性插值层526、卷积层528、卷积特征连接层530、双线性插值层532、卷积层534、卷积特征连接层536、双线性插值层538、卷积层540、卷积特征连接层542等,卷积层502作为主体识别层的输入层,卷积特征连接层542作为主体识别层的输出层。
该主体检测模型的编码部分包括卷积层502、池化层504、卷积层506、池化层508、卷积层510、池化层512、卷积层514、池化层516、卷积层518,解码部分包括卷积层520、双线性插值层522、卷积层524、双线性插值层526、卷积层528、卷积特征连接层530、双线性插值层532、卷积层534、卷积特征连接层536、双线性插值层538、卷积层540、卷积特征连接层542。卷积层506和卷积层534级联(Concatenation),卷积层510和卷积层528级联,卷积层514与卷积层524级联。双线性插值层522和卷积特征连接层530采用反卷积特征叠加(Deconvolution+add)桥接。双线性插值层532和卷积特征连接层536采用反卷积特征叠加桥接。双线性插值层538和卷积特征连接层542采用反卷积特征叠加桥接。
主体检测模型的深度预测层的网络结构包括卷积层552、池化层554、卷积层556、池化层558、卷积层560、池化层562、卷积层564、池化层566、卷积层568、池化层570、卷积层572、池化层574、卷积层576、池化层578。其中,卷积层552作为深度预测层的输入层,池化层578作为深度预测层的输出层。卷积层564、池化层566、卷积层568、池化层570、卷积层572、池化层574、卷积层576、池化层578的输出的特征大小相同。
可以理解的是，本实施例中的主体检测模型的主体识别层的网络结构和深度预测层的网络结构仅为示例，不作为对本申请的限制。可以理解的是，主体检测模型的网络结构中的卷积层、池化层、双线性插值层、卷积特征连接层等均可以根据需要设置多个。
原图500(如可见光图)输入到主体检测模型的主体识别层的卷积层502,同时将原图500(如可见光图)输入到主体检测模型的深度预测层的卷积层552。经过处理,主体识别层的卷积特征连接层542输出主体识别图580,深度预测层的池化层578输出深度预测图590。
该主体检测模型的训练过程中对深度图采用预设数值的丢失率。该预设数值可为50%。深度图的训练过程中引入概率的dropout,让主体检测模型可以充分的挖掘深度图的信息,当主体检测模型无法获取深度图时,仍然可以输出准确结果。对深度图输入采用dropout的方式,让主体检测模型对深度图的鲁棒性更好,即使没有深度图也可以准确分割主体区域。
此外,因正常的电子设备拍摄过程中,深度图的拍摄和计算都相当耗时耗力,难以获取,在训练时深度图设计为50%的dropout概率,能够保证没有深度信息的时候主体检测模型依然可以正常检测。
本实施例通过设计一个双深度学习网络结构,其中一个深度学习网络结构用于对RGB图进行处理得到深度预测图,另一个深度学习网络结构用于对RGB图进行处理,得到主体识别图,然后将两个深度学习网络结构的输出进行卷积特征连接,即将深度预测图和主体识别图进行融合然后再输出,可准确识别可见光图像中的目标主体。
在一个实施例中,根据同一场景的预设条件训练得到主体检测模型,包括:获取同一场景的可见光图、深度图和已标注的主体掩膜图;生成与该可见光图对应的中心权重图,其中,该中心权重图所表示的权重值从中心到边缘逐渐减小;将该可见光图作用于包含初始网络权重的主体检测模型的输入层,将该深度图和该中心权重图作用于初始的主体检测模型的输出层,将该已标注的主体掩膜图作为该主体检测模型输出的真实值,对该包含初始网络权重的主体检测模型进行训练,得到该主体检测模型的目标网络权重。
在一个实施例中,如图6所示,当该主体检测模型是预先根据同一场景的可见光图、中心权重图、深度图及对应的已标注的主体掩膜图进行训练得到的模型时,该方法还包括:
操作602,生成与该可见光图对应的中心权重图,其中,该中心权重图所表示的权重值从中心到边缘逐渐减小。
其中,中心权重图是指用于记录可见光图中各个像素点的权重值的图。中心权重图中记录的权重值从中心向四边逐渐减小,即中心权重最大,向四边权重逐渐减小。通过中心权重图表征可见光图的图像中心像素点到图像边缘像素点的权重值逐渐减小。
ISP处理器或中央处理器可以根据可见光图的大小生成对应的中心权重图。该中心权重图所表示的权重值从中心向四边逐渐减小。中心权重图可采用高斯函数、或采用一阶方程、或二阶方程生成。该高斯函数可为二维高斯函数。
操作606,将该中心权重图作用于该主体检测模型的输出层。
该融合该主体识别图和该深度预测图,得到主体区域置信度图,包括:
操作608,对该中心权重图、该主体识别图和该深度预测图进行融合,得到主体区域置信度图。
具体地,主体检测模型的主体识别层输出主体识别图,主体检测模型的深度预测层输出该深度预测图后,ISP处理器或中央处理器将该中心权重图作用于该主体检测模型的输出层,通过输出层对该中心权重图、该主体识别图和该深度预测图进行融合,得到主体区域置信度图。
本实施例中的图像处理方法,获取可见光图,并生成与可见光图对应的中心权重图后,将可见光图输入到主体检测模型的主体识别层和深度预测层中检测,得到主体识别图和深度预测图。将中心权重图作用于该主体检测模型的输出层,与主体识别图及深度预测图结合进行处理,可以得到主体区域置信度图,根据主体区域置信度图可以确定得到可见光图中的目标主体,利用中心权重图可以让图像中心的对象更容易被检测,利用训练好的利用可见光图、中心权重图和主体掩膜图等训练得到的主体检测模型,可以更加准确的识别出可见光图中的目标主体。
在一个实施例中,上述图像处理方法还包括:当存在多个主体时,根据每个主体所属类别的优先级、每个主体在可见光图中所占的面积、每个主体在所述可见光图中的位置中 的至少一种,确定目标主体。
其中,类别是指对主体所分的类,如人像、花、动物、风景等类别。位置是指在可见光图中的位置,可以采用坐标表示。
具体地,当存在多个主体时,获取每个主体所属类别的优先级,选取优先级最高或次高等的主体作为目标主体。
当存在多个主体时,获取每个主体在可见光图中所占的面积,选取在可见光图中所占面积最大或次大等的主体作为目标主体。
当存在多个主体时,获取每个主体在可见光图中的位置,选取主体在可见光图中的位置与该可见光图的中心点之间的距离最小的主体为目标主体。
当存在多个主体所属类别的优先级相同且最高时,获取该多个主体在可见光图中所占的面积,选取在可见光图中所占面积最大或次大的主体作为目标主体。
当存在多个主体所属类别的优先级相同且最高时,获取该优先级相同且最高的多个主体中每个主体在可见光图中所占的面积,选取在可见光图中所占面积最大或次大的主体作为目标主体。
当存在多个主体所属类别的优先级相同且最高时,获取该优先级相同且最高的多个主体中每个主体在可见光图中的位置,选取主体在可见光图中的位置与该可见光图的中心点之间的距离最小的主体为目标主体。
当存在多个主体所属类别的优先级相同且最高,获取该优先级相同且最高的多个主体中每个主体在可见光图中所占的面积,存在多个主体在可见光图中所占的面积相同时,获取面积相同的多个主体在可见光图中的位置,选取主体在可见光图中的位置与该可见光图的中心点之间的距离最小的主体为目标主体。
当存在多个主体时,可以获取每个主体所属类别的优先级、每个主体在可见光图中所占的面积、每个主体在可见光图中的位置,可以按照优先级、面积和位置三个维度筛选,优先级、面积和位置筛选的顺序可根据需要设定,不作限定。
本实施例中,当存在多个主体时,根据主体所属类别的优先级、主体在可见光图中面积和主体在可见光图中位置中的一种或至少两种进行筛选确定目标主体,可以准确确定目标主体。
在一个实施例中,上述图像处理方法还包括:当确定存在多个主体,且该多个主体均为人脸时,判断多个人脸是否在同一平面;
当该多个人脸处于同一平面时,将该多个人脸作为目标主体;
当该多个人脸处于不同平面时,选择面积最大的人脸作为目标主体。
具体地,可获取每个人脸的深度信息,通过比较深度信息是否相同来确定多个人脸是否在同一平面上,当深度信息相同时,则在同一平面,当深度信息不同时,则不在同一平面。人脸的深度信息可采用人脸所在区域的每个像素点的深度信息的平均值、中值或加权值等表示。人脸的深度信息也可采用人脸所在区域的每个像素点按照预设函数计算得到深度信息。该预设函数可为线性函数、指数函数或幂函数等。
图7为一个实施例中图像处理效果示意图。如图7所示,RGB图702中存在一只蝴蝶,将RGB图输入到主体检测模型后得到主体区域置信度图704,然后对主体区域置信度图704进行滤波和二值化得到二值化掩膜图706,再对二值化掩膜图706进行形态学处理和引导滤波实现边缘增强,得到主体掩膜图708。
在一个实施例中,提供了一种图像处理方法,包括:
操作(a1),获取同一场景的可见光图、深度图和已标注的主体掩膜图。
操作(a2),将该可见光图作用于包含初始网络权重的主体检测模型的主体识别层,并将该可见光图作用于该包含初始网络权重的主体检测模型的深度预测层,将该深度图和该已标注的主体掩膜图作为该主体检测模型输出的真实值,对该包含初始网络权重的主体 检测模型进行训练,得到该主体检测模型的目标网络权重。
操作(a3),获取可见光图。
操作(a4),将该可见光图输入主体检测模型的主体识别层中,得到主体识别图。其中,该主体检测模型是预先根据同一场景的可见光图、深度图及对应的已标注的主体掩膜图进行训练得到的模型。
操作(a5),将该可见光图输入该主体检测模型的深度预测层中,得到深度预测图。
操作(a6),对该深度预测图进行分块处理,得到至少两个子块。
操作(a7),确定该至少两个子块中的每个子块与该主体识别图的重叠区域,确定该每个子块对应的重叠区域的面积和该每个子块的深度。
操作(a8),获取加权因子,根据该加权因子、该每个子块对应的重叠区域的面积和该每个子块的深度,得到该每个子块对应的重叠区域的加权置信度。
操作(a9),根据该加权置信度生成主体区域置信度图。
操作(a10),对该主体区域置信度图进行自适应置信度阈值过滤处理,得到二值化掩膜图。
操作(a11),对该二值化掩膜图进行形态学处理和引导滤波处理,得到主体掩膜图。
操作(a12),检测该可见光图,确定该可见光图中的高光区域。
操作(a13),根据该可见光图中的高光区域与该主体掩膜图,确定该可见光图中消除高光的目标主体。
本实施例中的图像处理方法,对RGB图像进行主体检测时,通过双路网络对该RGB图像进行识别,并引入了中心权重图,使得深度特征增强和中心注意力特征增强,不仅可以准确的分割简单场景,如主体单一,背景区域对比度不高的场景下的主体,更大大提高了复杂场景下的目标主体识别准确度。引入深度图可以解决传统目标检测方法对自然图像***的目标鲁棒性较差的问题。针对影响主体识别精度的高光、高亮区域,采用了高光检测识别出RGB图像中的高光区域,然后采用单独的滤波器进行过滤处理。
应该理解的是,虽然图2-图6的流程图中的各个操作按照箭头的指示依次显示,但是这些操作并不是必然按照箭头指示的顺序依次执行。除非本文中有明确的说明,这些操作的执行并没有严格的顺序限制,这些操作可以以其它的顺序执行。而且,图2-图6中的至少一部分操作可以包括多个子操作或者多个阶段,这些子操作或者阶段并不必然是在同一时刻执行完成,而是可以在不同的时刻执行,这些子操作或者阶段的执行顺序也不必然是依次进行,而是可以与其它操作或者其它操作的子操作或者阶段的至少一部分轮流或者交替地执行。
图8为一个实施例的图像处理装置的结构框图。如图8所示,一种图像处理装置,包括:获取模块802、识别模块804、预测模块806、融合模块808和确定模块810。其中,
获取模块802,用于获取可见光图。
识别模块804,用于将该可见光图输入主体检测模型的主体识别层中,得到主体识别图;其中,该主体检测模型是根据同一场景的预设条件进行训练得到的模型。
预测模块806,用于将该可见光图输入该主体检测模型的深度预测层中,得到深度预测图。
融合模块808,用于融合该主体识别图和该深度预测图,得到主体区域置信度图。
确定模块810,用于根据该主体区域置信度图确定该可见光图中的目标主体。
本实施例中的图像处理装置,获取可见光图,将可见光图输入主体检测模型的主体识别层中,可以得到主体识别图,从而初步识别出可见光图中的主体。将可见光图输入主体检测模型的深度预测层中,可以得到可见光图对应的深度图。通过双路网络识别得到深度图和主体识别图,再融合主体识别图和深度预测图,得到主体区域置信度图,根据主体区域置信度图可以确定可见光图中的目标主体,利用可见光图、深度图和主体掩膜图等训练 得到的主体检测模型,或者利用可见光图、中心权重图、深度图和主体掩膜图等训练得到的主体检测模型,可以更加准确的识别出可见光图中的目标主体。
在一个实施例中,融合模块808还用于:对该深度预测图进行分块处理,得到至少两个子块;确定该至少两个子块中的每个子块与该主体识别图的重叠区域,并确定该每个子块对应的重叠区域的加权置信度;根据该加权置信度生成主体区域置信度图。
本实施例中的图像处理装置,对深度预测图进行分块处理,得到至少两个子块,确定至少两个子块中的每个子块与该主体识别图的重叠区域,并确定该每个子块对应的重叠区域的加权置信度,根据加权置信度生成主体区域置信度图,可以得到深度预测图和主体识别图的融合后的主体区域置信度图。结合深度预测图和主体识别图识别图像的主体,提高了主体识别的精度和准确性。
在一个实施例中,融合模块808还用于:确定该每个子块对应的重叠区域的面积和该每个子块的深度;获取加权因子,根据该加权因子、该每个子块对应的重叠区域的面积和该每个子块的深度,得到该每个子块对应的重叠区域的加权置信度。通过确定该每个子块对应的重叠区域的面积和该每个子块的深度,获取加权因子,根据该加权因子、该每个子块对应的重叠区域的面积和该每个子块的深度,得到该每个子块对应的重叠区域的加权置信度,使得主体区域变得更精细可控。通过深度图和主体检测图融合可以更加准确的识别出可见光图中的目标主体。
在一个实施例中,确定模块810还用于:对该主体区域置信度图进行处理,得到主体掩膜图;检测该可见光图,确定该可见光图中的高光区域;根据该可见光图中的高光区域与该主体掩膜图,确定该可见光图中消除高光的目标主体。对主体区域置信度图做过滤处理得到主体掩膜图,提高了主体区域置信度图的可靠性,对可见光图进行检测得到高光区域,然后与主体掩膜图进行处理,可得到消除了高光的目标主体,针对影响主体识别精度的高光、高亮区域单独采用滤波器进行处理,提高了主体识别的精度和准确性。
在一个实施例中,确定模块810还用于:对该主体区域置信度图进行自适应置信度阈值过滤处理,得到主体掩膜图。对主体区域置信度图进行自适应置信度阈值过滤处理时,将主体区域置信度图中各像素点的置信度值与对应的置信度阈值比较,大于或等于置信度阈值则保留该像素点,小于置信度阈值则去掉该像素点,可去除不必要的信息,保留关键信息。
在一个实施例中,确定模块810还用于:对该主体区域置信度图进行自适应置信度阈值过滤处理,得到二值化掩膜图;对该二值化掩膜图进行形态学处理和引导滤波处理,得到主体掩膜图。通过形态学处理和引导滤波处理可以保证得到的主体掩膜图的噪点少或没有噪点,边缘更加柔和。
在一个实施例中,该图像处理装置还包括:训练模块。该训练模块用于:获取同一场景的可见光图、深度图和已标注的主体掩膜图;将该可见光图作用于包含初始网络权重的主体检测模型的主体识别层,并将该可见光图作用于该包含初始网络权重的主体检测模型的深度预测层,将该深度图和该已标注的主体掩膜图作为该主体检测模型输出的真实值,对该包含初始网络权重的主体检测模型进行训练,得到该主体检测模型的目标网络权重。
在一个实施例中,当该主体检测模型是预先根据同一场景的可见光图、中心权重图、深度图及对应的已标注的主体掩膜图进行训练得到的模型时,该装置还包括:生成模块。
该生成模块用于:生成与该可见光图对应的中心权重图,其中,该中心权重图所表示的权重值从中心到边缘逐渐减小;将该中心权重图作用于该主体检测模型的输出层;
该融合模块还用于:对该中心权重图、该主体识别图和该深度预测图进行融合,得到主体区域置信度图。
本实施例通过设计一个双深度学习网络结构,其中一个深度学习网络结构用于对RGB图进行处理得到深度预测图,另一个深度学习网络结构用于对RGB图进行处理,得到主 体识别图,然后将两个深度学习网络结构的输出进行卷积特征连接,即将深度预测图和主体识别图进行融合然后再输出,可准确识别可见光图像中的目标主体。
上述图像处理装置中各个模块的划分仅用于举例说明,在其他实施例中,可将图像处理装置按照需要划分为不同的模块,以完成上述图像处理装置的全部或部分功能。
图9为一个实施例中电子设备的内部结构示意图。如图9所示,该电子设备包括通过***总线连接的处理器和存储器。其中,该处理器用于提供计算和控制能力,支撑整个电子设备的运行。存储器可包括非易失性存储介质及内存储器。非易失性存储介质存储有操作***和计算机程序。该计算机程序可被处理器所执行,以用于实现以下各个实施例所提供的一种图像处理方法。内存储器为非易失性存储介质中的操作***计算机程序提供高速缓存的运行环境。该电子设备可以是手机、平板电脑或者个人数字助理或穿戴式设备等。
本申请实施例中提供的图像处理装置中的各个模块的实现可为计算机程序的形式。该计算机程序可在终端或服务器上运行。该计算机程序构成的程序模块可存储在终端或服务器的存储器上。该计算机程序被处理器执行时,实现本申请实施例中所描述方法的操作。
本申请实施例还提供了一种计算机可读存储介质。一个或多个包含计算机可执行指令的非易失性计算机可读存储介质,当所述计算机可执行指令被一个或多个处理器执行时,使得所述处理器执行图像处理方法的操作。
一种包含指令的计算机程序产品,当其在计算机上运行时,使得计算机执行图像处理方法。
本申请实施例所使用的对存储器、存储、数据库或其它介质的任何引用可包括非易失性和/或易失性存储器。合适的非易失性存储器可包括只读存储器(ROM)、可编程ROM(PROM)、电可编程ROM(EPROM)、电可擦除可编程ROM(EEPROM)或闪存。易失性存储器可包括随机存取存储器(RAM),它用作外部高速缓冲存储器。作为说明而非局限,RAM以多种形式可得,诸如静态RAM(SRAM)、动态RAM(DRAM)、同步DRAM(SDRAM)、双数据率SDRAM(DDRSDRAM)、增强型SDRAM(ESDRAM)、同步链路(Synchlink)DRAM(SLDRAM)、存储器总线(Rambus)直接RAM(RDRAM)、直接存储器总线动态RAM(DRDRAM)、以及存储器总线动态RAM(RDRAM)。
以上所述实施例仅表达了本申请的几种实施方式,其描述较为具体和详细,但并不能因此而理解为对本申请专利范围的限制。应当指出的是,对于本领域的普通技术人员来说,在不脱离本申请构思的前提下,还可以做出若干变形和改进,这些都属于本申请的保护范围。因此,本发明专利的保护范围应以所附权利要求为准。

Claims (20)

  1. 一种图像处理方法,其特征在于,包括:
    获取可见光图;
    将所述可见光图输入主体检测模型的主体识别层中,得到主体识别图;其中,所述主体检测模型是根据同一场景的预设条件进行训练得到的模型;
    将所述可见光图输入所述主体检测模型的深度预测层中,得到深度预测图;
    融合所述主体识别图和所述深度预测图,得到主体区域置信度图;
    根据所述主体区域置信度图确定所述可见光图中的目标主体。
  2. 根据权利要求1所述的方法,其特征在于,所述融合所述主体识别图和所述深度预测图,得到主体区域置信度图,包括:
    对所述深度预测图进行分块处理,得到至少两个子块;
    确定所述至少两个子块中的每个子块与所述主体识别图的重叠区域,并确定所述每个子块对应的重叠区域的加权置信度;
    根据所述加权置信度生成主体区域置信度图。
  3. 根据权利要求2所述的方法,其特征在于,所述确定所述每个子块对应的重叠区域的加权置信度,包括:
    确定所述每个子块对应的重叠区域的面积和所述每个子块的深度;
    获取加权因子,根据所述加权因子、所述每个子块对应的重叠区域的面积和所述每个子块的深度,得到所述每个子块对应的重叠区域的加权置信度。
  4. 根据权利要求3所述的方法,其特征在于,所述获取加权因子,根据所述加权因子、所述每个子块对应的重叠区域的面积和所述每个子块的深度,得到所述每个子块对应的重叠区域的加权置信度,包括:
    获取所述每个子块的重叠区域的面积对应的第一加权因子,及所述每个子块的深度对应的第二加权因子;
    根据所述第一加权因子、所述第二加权因子、所述每个子块对应的重叠区域的面积和所述每个子块的深度,得到所述每个子块对应的重叠区域的加权置信度。
  5. 根据权利要求3所述的方法,其特征在于,所述子块对应的重叠区域的加权置信度与所述子块对应的重叠区域的面积呈正相关。
  6. 根据权利要求3所述的方法,其特征在于,所述子块对应的重叠区域的加权置信度与所述子块深度呈正相关。
  7. 根据权利要求1所述的方法,其特征在于,所述根据所述主体区域置信度图确定所述可见光图中的目标主体,包括:
    对所述主体区域置信度图进行处理,得到主体掩膜图;
    检测所述可见光图,确定所述可见光图中的高光区域;
    根据所述可见光图中的高光区域与所述主体掩膜图,确定所述可见光图中消除高光的目标主体。
  8. 根据权利要求7所述的方法,其特征在于,所述对所述主体区域置信度图进行处理,得到主体掩膜图,包括:
    对所述主体区域置信度图进行自适应置信度阈值过滤处理,得到主体掩膜图。
  9. 根据权利要求8所述的方法,其特征在于,所述对所述主体区域置信度图进行自适应置信度阈值过滤处理,得到主体掩膜图,包括:
    对所述主体区域置信度图进行自适应置信度阈值过滤处理,得到二值化掩膜图;
    对所述二值化掩膜图进行形态学处理和引导滤波处理,得到主体掩膜图。
  10. 根据权利要求7所述的方法,其特征在于,所述根据所述可见光图中的高光区域 与所述主体掩膜图,确定所述可见光图中消除高光的目标主体,包括:
    将所述可见光图中的高光区域与所述主体掩膜图做差分处理,得到消除高光的目标主体。
  11. 根据权利要求1至10中任一项所述的方法,其特征在于,所述主体检测模型的训练方式,包括:
    获取同一场景的可见光图、深度图和已标注的主体掩膜图;
    将所述可见光图作用于包含初始网络权重的主体检测模型的主体识别层,并将所述可见光图作用于所述包含初始网络权重的主体检测模型的深度预测层,将所述深度图和所述已标注的主体掩膜图作为所述主体检测模型输出的真实值,对所述包含初始网络权重的主体检测模型进行训练,得到所述主体检测模型的目标网络权重,所述初始网络权重为初始化的主体检测模型中每层的初始权重。
  12. 根据权利要求1所述的方法,其特征在于,当所述主体检测模型是预先根据同一场景的可见光图、中心权重图、深度图及对应的已标注的主体掩膜图进行训练得到的模型时,所述方法还包括:
    生成与所述可见光图对应的中心权重图,其中,所述中心权重图所表示的权重值从中心到边缘逐渐减小;
    将所述中心权重图作用于所述主体检测模型的输出层;
    所述融合所述主体识别图和所述深度预测图,得到主体区域置信度图,包括:
    对所述中心权重图、所述主体识别图和所述深度预测图进行融合,得到主体区域置信度图。
  13. 一种图像处理装置,其特征在于,包括:
    获取模块,用于获取可见光图;
    识别模块,用于将所述可见光图输入主体检测模型的主体识别层中,得到主体识别图;其中,所述主体检测模型是根据同一场景的预设条件进行训练得到的模型;
    预测模块,用于将所述可见光图输入所述主体检测模型的深度预测层中,得到深度预测图;
    融合模块,用于融合所述主体识别图和所述深度预测图,得到主体区域置信度图;
    确定模块,用于根据所述主体区域置信度图确定所述可见光图中的目标主体。
  14. 一种电子设备,包括存储器及处理器,所述存储器中储存有计算机程序,所述计算机程序被所述处理器执行时,使得所述处理器执行如下步骤:
    获取可见光图;
    将所述可见光图输入主体检测模型的主体识别层中,得到主体识别图;其中,所述主体检测模型是根据同一场景的预设条件进行训练得到的模型;
    将所述可见光图输入所述主体检测模型的深度预测层中,得到深度预测图;
    融合所述主体识别图和所述深度预测图,得到主体区域置信度图;
    根据所述主体区域置信度图确定所述可见光图中的目标主体。
  15. 根据权利要求14所述的移动终端,其特征在于,所述处理器执行所述融合所述主体识别图和所述深度预测图,得到主体区域置信度图时,还执行如下操作:
    对所述深度预测图进行分块处理,得到至少两个子块;
    确定所述至少两个子块中的每个子块与所述主体识别图的重叠区域,并确定所述每个子块对应的重叠区域的加权置信度;
    根据所述加权置信度生成主体区域置信度图。
  16. 根据权利要求15所述的移动终端,其特征在于,所述处理器执行所述确定所述每个子块对应的重叠区域的加权置信度时,还执行如下操作:
    确定所述每个子块对应的重叠区域的面积和所述每个子块的深度;
    获取加权因子,根据所述加权因子、所述每个子块对应的重叠区域的面积和所述每个子块的深度,得到所述每个子块对应的重叠区域的加权置信度。
  17. 根据权利要求16所述的移动终端,其特征在于,所述处理器执行所述获取加权因子,根据所述加权因子、所述每个子块对应的重叠区域的面积和所述每个子块的深度,得到所述每个子块对应的重叠区域的加权置信度时,还执行如下操作:
    获取所述每个子块的重叠区域的面积对应的第一加权因子,及所述每个子块的深度对应的第二加权因子;
    根据所述第一加权因子、所述第二加权因子、所述每个子块对应的重叠区域的面积和所述每个子块的深度,得到所述每个子块对应的重叠区域的加权置信度。
  18. 根据权利要求15所述的移动终端,其特征在于,所述处理器执行所述根据所述主体区域置信度图确定所述可见光图中的目标主体时,还执行如下操作:
    对所述主体区域置信度图进行处理,得到主体掩膜图;
    检测所述可见光图,确定所述可见光图中的高光区域;
    根据所述可见光图中的高光区域与所述主体掩膜图,确定所述可见光图中消除高光的目标主体。
  19. 根据权利要求18所述的移动终端,其特征在于,所述处理器执行所述对所述主体区域置信度图进行处理,得到主体掩膜图时,还执行如下操作:
    对所述主体区域置信度图进行自适应置信度阈值过滤处理,得到主体掩膜图。
  20. 一种计算机可读存储介质,其上存储有计算机程序,其特征在于,所述计算机程序被处理器执行时实现如权利要求1至12中任一项所述的方法的步骤。
PCT/CN2020/102023 2019-08-07 2020-07-15 图像处理方法和装置、电子设备、计算机可读存储介质 WO2021022983A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910726785.3 2019-08-07
CN201910726785.3A CN110473185B (zh) 2019-08-07 2019-08-07 图像处理方法和装置、电子设备、计算机可读存储介质

Publications (1)

Publication Number Publication Date
WO2021022983A1 true WO2021022983A1 (zh) 2021-02-11

Family

ID=68511544

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/102023 WO2021022983A1 (zh) 2019-08-07 2020-07-15 图像处理方法和装置、电子设备、计算机可读存储介质

Country Status (2)

Country Link
CN (1) CN110473185B (zh)
WO (1) WO2021022983A1 (zh)

Families Citing this family (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110473185B (zh) * 2019-08-07 2022-03-15 Oppo广东移动通信有限公司 图像处理方法和装置、电子设备、计算机可读存储介质
CN111008604A (zh) * 2019-12-09 2020-04-14 上海眼控科技股份有限公司 预测图像获取方法、装置、计算机设备和存储介质
CN111368698B (zh) * 2020-02-28 2024-01-12 Oppo广东移动通信有限公司 主体识别方法、装置、电子设备及介质
CN111311520B (zh) * 2020-03-12 2023-07-18 Oppo广东移动通信有限公司 图像处理方法、装置、终端及存储介质
CN113705285A (zh) * 2020-05-22 2021-11-26 珠海金山办公软件有限公司 主体识别方法、装置、及计算机可读存储介质
CN111709886B (zh) * 2020-05-27 2023-04-18 杭州电子科技大学 一种基于u型空洞残差网络的图像去高光方法
CN112184700B (zh) * 2020-10-21 2022-03-18 西北民族大学 基于单目相机的农业无人车障碍物感知方法及装置
CN112258528B (zh) * 2020-11-02 2024-05-14 Oppo广东移动通信有限公司 图像处理方法和装置、电子设备
CN112801076B (zh) * 2021-04-15 2021-08-03 浙江大学 基于自注意力机制的电子商务视频高光检测方法及***
CN113066115B (zh) * 2021-04-28 2022-03-25 北京的卢深视科技有限公司 深度预测网络训练方法、装置、服务器和可读存储介质
CN116778431B (zh) * 2023-08-25 2023-11-10 青岛娄山河水务有限公司 基于计算机视觉的污泥处理自动监测方法

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090129674A1 (en) * 2007-09-07 2009-05-21 Yi-Chun Lin Device and method for obtaining clear image
CN108307116A (zh) * 2018-02-07 2018-07-20 腾讯科技(深圳)有限公司 图像拍摄方法、装置、计算机设备和存储介质
CN108900769A (zh) * 2018-07-16 2018-11-27 Oppo广东移动通信有限公司 图像处理方法、装置、移动终端及计算机可读存储介质
CN110473185A (zh) * 2019-08-07 2019-11-19 Oppo广东移动通信有限公司 图像处理方法和装置、电子设备、计算机可读存储介质

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8498453B1 (en) * 2009-09-30 2013-07-30 Lifetouch, Inc. Evaluating digital images using head points
US11461912B2 (en) * 2016-01-05 2022-10-04 California Institute Of Technology Gaussian mixture models for temporal depth fusion
CN107301380A (zh) * 2017-06-01 2017-10-27 华南理工大学 一种用于视频监控场景中行人重识别的方法
CN108334830B (zh) * 2018-01-25 2022-10-04 南京邮电大学 一种基于目标语义和深度外观特征融合的场景识别方法
CN108520219B (zh) * 2018-03-30 2020-05-12 台州智必安科技有限责任公司 一种卷积神经网络特征融合的多尺度快速人脸检测方法
CN110046599A (zh) * 2019-04-23 2019-07-23 东北大学 基于深度融合神经网络行人重识别技术的智能监控方法
CN110097568B (zh) * 2019-05-13 2023-06-09 中国石油大学(华东) 一种基于时空双分支网络的视频对象检测与分割方法

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114118127A (zh) * 2021-10-15 2022-03-01 北京工业大学 一种视觉场景标志检测与识别方法及装置
CN114118127B (zh) * 2021-10-15 2024-05-21 北京工业大学 一种视觉场景标志检测与识别方法及装置
CN116664567A (zh) * 2023-07-26 2023-08-29 山东艾迈科思电气有限公司 一种固体绝缘开关柜质量评估方法及***
CN116664567B (zh) * 2023-07-26 2023-09-29 山东艾迈科思电气有限公司 一种固体绝缘开关柜质量评估方法及***

Also Published As

Publication number Publication date
CN110473185A (zh) 2019-11-19
CN110473185B (zh) 2022-03-15

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 20850540; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: pct application non-entry in european phase (Ref document number: 20850540; Country of ref document: EP; Kind code of ref document: A1)