WO2019184604A1 - Method and apparatus for detecting a target image - Google Patents

Method and apparatus for detecting a target image

Info

Publication number
WO2019184604A1
Authority
WO
WIPO (PCT)
Prior art keywords
configuration information
target
image
candidate
picture
Prior art date
Application number
PCT/CN2019/074761
Other languages
English (en)
French (fr)
Inventor
白博
朱博
毛坤
Original Assignee
Huawei Technologies Co., Ltd. (华为技术有限公司)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co., Ltd.
Publication of WO2019184604A1

Links

Images

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/20 Analysis of motion
    • G06T 7/246 Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G06T 7/251 Analysis of motion using feature-based methods, e.g. the tracking of corners or segments, involving models
    • G06T 7/60 Analysis of geometric attributes
    • G06T 7/62 Analysis of geometric attributes of area, perimeter, diameter or volume
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/10 Image acquisition modality
    • G06T 2207/10016 Video; Image sequence
    • G06T 2207/20 Special algorithmic details
    • G06T 2207/20021 Dividing image into blocks, subimages or windows
    • G06T 2207/20081 Training; Learning
    • G06T 2207/20084 Artificial neural networks [ANN]
    • G06T 2207/30 Subject of image; Context of image processing
    • G06T 2207/30196 Human being; Person
    • G06T 2207/30232 Surveillance

Definitions

  • the present application relates to the field of communications, and in particular, to a method and apparatus for detecting a target image.
  • the target image may be a human body image and/or a vehicle image in a motion state in the surveillance video.
  • a rectangular frame is used in the picture to frame a target image in a moving state, such as a human body image or a vehicle image, so as to facilitate tracking a certain person or a certain vehicle.
  • the target image in the picture can be detected by inputting the picture into a Convolutional Neural Network (CNN); after multiple convolution operations, the CNN obtains a first feature picture and a second feature picture, where the number of convolution operations that the first feature picture passes through is less than the number that the second feature picture passes through.
  • the first feature picture is then input into a Region Proposal Network (RPN) to obtain at least one candidate frame in the feature picture.
  • Each candidate frame in the feature picture frames a target image; the location information of a candidate frame includes the positions of a pair of diagonal points of the candidate frame, and the confidence score of a candidate frame indicates the probability that the state of the target image framed by it is a motion state.
  • According to the location information of each of the N candidate frames with the largest confidence scores and the second feature picture, a detection frame corresponding to each such candidate frame, together with the type of the target image included in the detection frame, is added to the picture.
  • However, the target image framed by some of the N candidate frames with the highest confidence scores is not a complete target image.
  • For example, some candidate frames may frame only part of a human body image or part of a vehicle image, so that the target image in the detection frame added to the picture is also an incomplete target image, which reduces the detection accuracy.
  • the embodiment of the present application provides a method and apparatus for detecting a target image.
  • the technical solution is as follows:
  • the present application provides a method for detecting a target image: acquiring a foreground moving image corresponding to a picture to be detected and acquiring a first feature picture obtained by convolving the picture to be detected, where the foreground moving image includes the target image in a moving state in the to-be-detected picture and a background image other than the target image; detecting the target image in the first feature picture to obtain a first candidate frame configuration information set, which includes configuration information of each candidate frame in at least one candidate frame, where each candidate frame includes at least one target image and the target image in the first feature picture is the same as the target image in the to-be-detected picture; filtering, according to the foreground moving image, the configuration information of candidate frames whose included target image is an incomplete target image from the first candidate frame configuration information set to obtain a second candidate frame configuration information set; and adding detection frames to the to-be-detected picture according to the second candidate frame configuration information set.
  • Since the second candidate frame configuration information set is obtained by filtering the configuration information of candidate frames that include an incomplete target image from the first candidate frame configuration information set, adding detection frames to the to-be-detected picture according to the second candidate frame configuration information set can improve the detection accuracy.
  • In a possible implementation, the foreground moving image corresponding to the to-be-detected picture is obtained by performing mixed Gaussian background modeling on the to-be-detected picture.
  • the foreground moving image can be used to implement the filtering of the configuration information of the candidate box including the incomplete target object from the first candidate box configuration information set.
  • In a possible implementation, an integral map corresponding to the foreground moving image is obtained, and the configuration information in the first candidate frame configuration information set is filtered according to the integral map; the integral map increases the filtering speed, thereby improving the detection efficiency.
  • In a possible implementation, the ratio between the target image area within a target candidate frame and the area of the target candidate frame is calculated; when the ratio is less than a preset ratio threshold, the configuration information of the target candidate frame is filtered from the first candidate frame configuration information set. Since the target image area is obtained from the integral map area corresponding to the target candidate frame, the amount of computation needed to calculate the ratio is reduced, thereby improving the calculation speed.
  • the calculation amount required for calculating the target image area is small, so that the target image area can be quickly calculated, and the calculation efficiency is improved.
  • the present application provides an apparatus for detecting a target image for performing the method of the first aspect or any one of the possible implementations of the first aspect.
  • the apparatus comprises means for performing the method of the first aspect or any one of the possible implementations of the first aspect.
  • the present application provides an apparatus for detecting a target image, the apparatus comprising: at least one processor and at least one memory; the at least one memory stores one or more programs configured to be executed by the at least one processor, the one or more programs comprising instructions for performing the method of the first aspect or any one of the possible implementations of the first aspect.
  • the present application provides an apparatus for detecting a target image, the apparatus comprising a transceiver, a processor, and a memory.
  • the transceiver, the processor and the memory can be connected by a bus system.
  • the memory is for storing a program, an instruction or a code; the processor is for executing the program, instruction or code in the memory to complete the method of the first aspect or any possible implementation of the first aspect.
  • the present application provides a computer program product comprising a computer program stored in a computer readable storage medium; the computer program is loaded by a processor to implement the method of the first aspect or any possible implementation of the first aspect.
  • the present application provides a non-transitory computer readable storage medium for storing a computer program, the computer program being loaded by a processor to perform the method of the first aspect or any possible implementation of the first aspect.
  • the embodiment of the present application provides a chip, the chip including programmable logic circuitry and/or program instructions for implementing, when the chip is in operation, the method of the first aspect or any possible implementation of the first aspect.
  • FIG. 1 is a schematic diagram of a network architecture provided by an embodiment of the present application;
  • FIG. 2-1 is a flowchart of a method for detecting a target image according to an embodiment of the present application;
  • FIG. 2-2 is a block diagram of an RPN device provided by an embodiment of the present application;
  • FIG. 2-3 is a schematic diagram of adding a sliding window of an RPN device according to an embodiment of the present application;
  • FIGS. 2-5 are schematic diagrams of an integral map area provided by an embodiment of the present application;
  • FIGS. 2-7 are block diagrams of a Fast Rcnn device provided by an embodiment of the present application;
  • FIGS. 2-8 are block diagrams of software systems for detecting target images provided by embodiments of the present application;
  • FIG. 3-1 is a schematic structural diagram of an apparatus for detecting a target image according to an embodiment of the present application;
  • FIG. 3-2 is a schematic structural diagram of another apparatus for detecting a target image according to an embodiment of the present application;
  • FIG. 3-3 is a schematic structural diagram of another apparatus for detecting a target image according to an embodiment of the present application;
  • FIG. 4 is a schematic structural diagram of another apparatus for detecting a target image according to an embodiment of the present application.
  • an embodiment of the present application provides a network architecture, including:
  • a network connection is established between the camera device and the server; the network connection may be a wireless connection or a wired connection.
  • Camera equipment can be installed in places such as shopping malls and roads to take pictures and send pictures to the server.
  • the network architecture can be applied to a scenario such as video surveillance.
  • the camera device can capture pictures frame by frame and send each captured picture to the server.
  • the picture captured by the imaging device includes a foreground moving image in a moving state and a background image in a stationary state.
  • the foreground moving image in a moving state may be a human body image and/or a vehicle image or the like in a moving state
  • the background image in a stationary state may be a building image, a tree image, and/or a vehicle image in a stationary state or the like.
  • the target image in the picture may be detected, and the target image may be one or more of the foreground moving images in the picture in motion.
  • after the target image in the picture is detected, the target image is framed in the picture using a detection frame, so that it is convenient to follow a certain target.
  • if the target is a certain person or a certain vehicle, the person or the vehicle may be tracked according to the pictures to which detection frames have been added.
  • the processing for detecting the target image in the picture may be performed by the imaging device, that is, after the camera device captures a frame of picture, it may perform the process of detecting the target image in that picture.
  • in this case, the imaging device may need to be configured with higher computing resources, which may include at least one of central processing unit (CPU) resources, graphics processing unit (GPU) resources, and memory capacity.
  • alternatively, the detection may be performed not by the camera device but by the server, that is, after receiving a frame of picture sent by the camera device, the server may perform detection on the picture.
  • the server reads a frame of the picture from the memory and performs a process of detecting the target image in the picture, and the picture in the memory may be a picture taken by the imaging device.
  • the server may first store the picture in the memory when receiving the picture sent by the camera device.
  • the camera device can be a surveillance camera or a mobile phone with a camera.
  • the embodiment of the present application provides a method for detecting a target image, which may be applied to the network architecture provided by the embodiment shown in FIG. 1.
  • the execution body of the method may be a camera device or a server in the network architecture. The method includes:
  • Step 201 Acquire a foreground moving image corresponding to the to-be-detected picture and obtain a first feature picture obtained by convolving the picture to be detected.
  • the foreground moving image includes the target image in a moving state in the to-be-detected picture and a background image other than the target image.
  • the picture to be detected may be any picture in the video taken by the camera device.
  • if the execution subject of this embodiment is an image capturing device, when the image capturing device captures a frame of picture, that picture may be used as the picture to be detected.
  • if the execution subject of this embodiment is a server, the server may use a picture as the picture to be detected when receiving it from the camera device, or may read a frame of picture from the memory as the picture to be detected; pictures received from the camera device may first be stored in the memory by the server.
  • the mixed Gaussian background modeling process may be performed on the picture to be detected to obtain the foreground moving image corresponding to the picture to be detected.
  • a preset mixed Gaussian model device and a Fast Region-based Convolutional Neural Network (Fast Rcnn) device are included, and the Fast Rcnn device includes a CNN.
  • the to-be-detected picture can be input into the mixed Gaussian model device and into the CNN of the Fast Rcnn device respectively; the mixed Gaussian model device then performs mixed Gaussian background modeling processing on the to-be-detected picture to obtain the corresponding foreground moving image, and the CNN performs convolution operation processing on the to-be-detected picture to obtain the first feature picture corresponding to the to-be-detected picture.
  • the foreground moving image corresponding to the picture to be detected is a black and white picture, and the pixel value of each pixel in the foreground moving image is 1 or 0.
  • the foreground moving image corresponding to the picture to be detected is a picture of equal size. For each pixel of the picture to be detected, there is a corresponding pixel in the foreground moving image: if a pixel in the picture to be detected belongs to the target image in the moving state, the pixel value of the corresponding pixel in the foreground moving image is 1; if it belongs to the background image, the pixel value of the corresponding pixel is 0.
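The binary foreground moving image described above can be sketched in a few lines; `is_moving` stands in for the per-pixel motion classification, and the brightness test in the toy example is purely illustrative, not the patent's classifier:

```python
# Minimal sketch: build the binary foreground moving image -- same size as
# the picture, 1 for pixels classified as part of a moving target, 0 for
# background pixels.

def foreground_mask(picture, is_moving):
    """picture: 2-D list of pixel values; is_moving: predicate pixel -> bool."""
    return [[1 if is_moving(px) else 0 for px in row] for row in picture]

# Toy example: treat any pixel brighter than 128 as "moving".
frame = [[200, 10], [130, 90]]
mask = foreground_mask(frame, lambda px: px > 128)  # → [[1, 0], [1, 0]]
```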
  • the target image may be a human body image and/or a vehicle image or the like in the picture to be detected.
  • the operation of performing mixed Gaussian background modeling processing on the picture to be detected may be divided into the following operations 2011 to 2014:
  • the probability that the j-th pixel point in the picture to be detected belongs to the target image in the motion state is calculated as
  • P(x_j) = Σ_{i=1}^{K} ŵ_{i,t} · η(x_j; μ̂_{i,t}, Σ̂_{i,t})  (1)
  • where x_j = [x_jR, x_jG, x_jB] is the pixel value of the j-th pixel point, with x_jR, x_jG and x_jB the pixel values of the R, G and B channels; t is the time corresponding to the picture to be detected (when implemented, the frame number of the picture to be detected may be used as the time t); K is a preset value giving the number of Gaussian distributions; η denotes the Gaussian density; and ŵ_{i,t}, μ̂_{i,t} and Σ̂_{i,t} are, respectively, the estimated weight coefficients, the mean vectors and the covariance matrices of the K Gaussian distributions at time t, estimated by the preset mixed Gaussian model device from the probabilities of the j-th pixel point in the pictures at times 0 through t-1 before formula (1) is applied.
  • if the calculated probability is less than or equal to the preset probability threshold, the pixel is determined to be a pixel of the background image in a still state in the picture to be detected, and, according to the position of the pixel in the picture to be detected, a pixel with a pixel value of 0 is filled at the corresponding position of the created foreground moving image. For each pixel in the picture to be detected, the corresponding pixel of the created foreground image is filled in the above manner, and the foreground moving image corresponding to the picture to be detected is obtained.
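The per-pixel classification of formula (1) can be sketched as below, assuming diagonal covariance matrices for simplicity (the patent does not restrict the covariance form; all function names here are illustrative):

```python
import math

# Sketch of the per-pixel test: evaluate the K-component Gaussian mixture
# density for one pixel and label it foreground (1) if the density exceeds
# the preset probability threshold, background (0) otherwise.

def gaussian(x, mean, var):
    """Diagonal-covariance Gaussian density at vector x."""
    d = len(x)
    norm = (2 * math.pi) ** (d / 2) * math.sqrt(math.prod(var))
    expo = -0.5 * sum((xi - mi) ** 2 / vi for xi, mi, vi in zip(x, mean, var))
    return math.exp(expo) / norm

def pixel_label(x, weights, means, variances, threshold):
    p = sum(w * gaussian(x, m, v)
            for w, m, v in zip(weights, means, variances))
    return 1 if p > threshold else 0
```
A pixel whose RGB value lies close to one of the component means gets a high mixture density and is kept as foreground; a pixel far from all means falls below the threshold and is filled with 0.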
  • the CNN includes a plurality of convolution layers, and the first convolution layer is used to perform a convolution operation on the to-be-detected picture input to the CNN.
  • the input of each convolution layer in the CNN except the first is the output of its adjacent previous convolution layer, and each of these layers performs a convolution operation on that output.
  • each convolution layer in the CNN outputs a feature picture corresponding to the picture to be detected; the deeper a convolution layer is located, the more abstract the feature picture it outputs.
  • the process of performing convolution processing on the to-be-detected picture by the CNN may be: the picture to be detected is input into the CNN, and the first convolution layer performs a convolution operation to obtain a feature picture corresponding to the picture to be detected; the second convolution layer then performs a convolution operation on that feature picture, again obtaining a feature picture corresponding to the picture to be detected, whose degree of abstraction is greater than that of the feature picture output by the first convolution layer; and so on through the remaining layers.
  • the feature image of the to-be-detected picture outputted by the first target convolutional layer is obtained as the first feature picture
  • the first target convolution layer is a convolution layer other than the first convolution layer and the last convolution layer in the CNN.
  • a convolution layer located at a middle position of the CNN may be selected as the first target convolution layer, and the feature picture corresponding to the to-be-detected picture outputted by the first target convolution layer is obtained as the first feature picture.
  • a second feature picture obtained by convolving the picture to be detected is also acquired; the number of convolution operations that the first feature picture passes through is smaller than the number that the second feature picture passes through.
  • a convolution layer located at a later position in the CNN may be selected as the second target convolution layer, and the feature picture corresponding to the to-be-detected picture output by the second target convolution layer is obtained as the second feature picture; the number of layers at which the second target convolution layer is located is greater than that of the first target convolution layer.
  • for a convolution layer at a later position in the CNN, one of the last N convolution layers in the CNN may be selected as the second target convolution layer.
  • N is a preset value.
  • N can be a value of 5, 4, 3, 2, or 1.
  • the last layer convolution layer in the CNN may be selected as the second target convolution layer, that is, the feature picture corresponding to the to-be-detected picture outputted by the last layer convolution layer of the CNN is used as the second feature picture.
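The layer-selection logic above can be sketched with the CNN modelled as a plain list of layer functions; the toy layers and all names below are illustrative, not the patent's implementation:

```python
# Sketch: run an input through every "layer", keep all intermediate outputs,
# and return the output of a middle layer (first feature picture) and of the
# last layer (second feature picture).

def run_cnn(layers, picture, first_idx, second_idx):
    outputs = []
    x = picture
    for layer in layers:
        x = layer(x)
        outputs.append(x)
    return outputs[first_idx], outputs[second_idx]

# Toy "layers" that just count how many convolutions were applied.
layers = [lambda x: x + 1 for _ in range(6)]
first_feat, second_feat = run_cnn(layers, 0, first_idx=2, second_idx=5)
# first_feat passed through 3 layers, second_feat through all 6.
```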
  • Step 202 Detect the target image in the first feature picture to obtain a first candidate frame configuration information set, where the first candidate frame configuration information set includes configuration information of each candidate frame in the at least one candidate frame.
  • at least one target image is included in each candidate frame in the first feature picture, and the target image in the first feature picture is the same as the target image in the picture to be detected.
  • the configuration information of the candidate box includes at least the location information and the confidence score of the candidate box.
  • the location information of a candidate frame may include the positions of a pair of diagonal points of the candidate frame; the pair may be the two diagonal points on any diagonal of the candidate frame, and the position of a diagonal point is its position in the first feature picture. Alternatively, the location information of a candidate frame may include the position of a vertex of the candidate frame and the size of the candidate frame; the vertex may be any vertex of the frame, the position of the vertex is its position in the first feature picture, and the size of the candidate frame may include the width and height of the candidate frame.
  • the candidate box may be a rectangular box, and the confidence score of the candidate box may indicate a probability that the state of the target object in the candidate box is a motion state.
  • the RPN device is preset.
  • the first feature picture may be input to the RPN device, and the RPN device processes the first feature picture to obtain the configuration information of each of the at least one candidate frame; the configuration information of all the candidate frames constitutes the first candidate frame configuration information set.
  • the first feature image may include at least one target image
  • the target image may be a human body image and/or a vehicle image or the like.
  • when receiving the input first feature picture, the RPN device adds, through its Proposals layer, candidate frames for framing the target images in the first feature picture, obtains the location information of each candidate frame in the first feature picture, and estimates a confidence score indicating the probability that the state of the framed target image is a motion state, thereby obtaining the configuration information of the candidate frame.
  • the RPN device when the first feature picture is input to the RPN device, the RPN device adds a sliding window to the first feature picture, moves the position of the sliding window, and enlarges or reduces the size of the sliding window. Obtaining a plurality of different sliding windows, encoding the feature vector of each sliding window through the convolution layer, and outputting the position information of the at least one candidate box and the at least one candidate box according to the feature vector of each sliding window through the fully connected layer Confidence score.
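The sliding-window enumeration described above can be sketched as follows; the stride, sizes and aspect ratios are assumed tuning values, not taken from the patent:

```python
# Sketch: slide a window over the feature picture and emit one box per
# position, scale and aspect ratio (the feature-vector encoding and the
# fully connected scoring head are omitted).

def sliding_boxes(height, width, stride, sizes, ratios):
    boxes = []
    for cy in range(0, height, stride):       # window centre, vertical
        for cx in range(0, width, stride):    # window centre, horizontal
            for s in sizes:                   # enlarge / reduce the window
                for r in ratios:              # change the aspect ratio
                    h, w = s * r, s / r
                    boxes.append((cy - h / 2, cx - w / 2,
                                  cy + h / 2, cx + w / 2))
    return boxes

# 2x2 centre positions, 2 sizes, 1 ratio -> 8 candidate windows.
boxes = sliding_boxes(8, 8, stride=4, sizes=(2, 4), ratios=(1.0,))
```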
  • there may be candidate frames among these whose included target image is an incomplete target image and which also include a relatively large background image.
  • Step 203: According to the foreground moving image, filter the configuration information of candidate frames whose included target image is an incomplete target image from the first candidate frame configuration information set to obtain a second candidate frame configuration information set.
  • there are multiple filtering methods for implementing this step: for example, filtering the first candidate frame configuration information set according to the integral map of the foreground moving image, or filtering it directly according to the foreground moving image. Other filtering methods are not listed one by one.
  • the process of filtering the first candidate frame configuration information set according to the integral map of the foreground moving image may be performed by the following operations 2031 to 2034:
  • for each pixel of the foreground moving image, an integral value is calculated by formula (2) and, based on the position of the pixel in the foreground moving image, filled into the created integral map, that is, at the position of the Mth row and the Nth column of the created integral map:
  • Integral(M, N) = Σ_{i=1}^{M} Σ_{j=1}^{N} image(i, j)  (2)
  • where Integral(M, N) is the integral value of the pixel of the Mth row and the Nth column, and image(i, j) is the pixel value of the pixel of the i-th row and the j-th column in the foreground moving image.
  • the integral value of each pixel is filled into the created integral map in the above manner, and the integral map corresponding to the foreground moving image is obtained.
  • since every foreground pixel has value 1, the integral value of the pixel of the Mth row and the Nth column equals the target image area within the image region of size M×N whose corners are the pixel of the first row, first column and the pixel of the Mth row, Nth column of the foreground moving image.
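Formula (2) can be sketched in a few lines of Python; `integral_map` is a hypothetical helper name, and a running row sum keeps the construction linear in the number of pixels:

```python
# Sketch of formula (2): build the integral map of a binary foreground image.
# integral[m][n] holds the sum of all pixel values in rows 0..m, columns 0..n.

def integral_map(image):
    h, w = len(image), len(image[0])
    integral = [[0] * w for _ in range(h)]
    for i in range(h):
        row_sum = 0
        for j in range(w):
            row_sum += image[i][j]
            integral[i][j] = row_sum + (integral[i - 1][j] if i > 0 else 0)
    return integral

fg = [[1, 0, 1],
      [1, 1, 0],
      [0, 1, 1]]
ii = integral_map(fg)  # ii[2][2] is the total foreground area, 6
```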
  • then the configuration information of candidate frames in the first candidate frame configuration information set whose included target image is an incomplete target image is filtered; the detailed implementation process may include the following operations 2032 to 2034.
  • according to the configuration information of the target candidate frame, the integral map area corresponding to the target candidate frame is obtained in the integral map, where the configuration information of the target candidate frame is the configuration information of any candidate frame in the first candidate frame configuration information set, and the integral map area is the region determined by a pair of diagonal points of the target candidate frame.
  • assume that the location information of the target candidate frame includes the positions of a pair of diagonal points of the target candidate frame, one at row i1, column j1 and the other at row i2, column j2; the corresponding integral map area is then obtained in the integral map shown in FIG. 2-5.
  • the integral values of the pixels located at the four vertex positions of the integral map area may be acquired, and the target image area located within the target candidate frame is calculated from the obtained integral values as
  • target image area = Integral(i2, j2) - Integral(i2, j1) - Integral(i1, j2) + Integral(i1, j1)
  • where Integral(i2, j2) is the integral value of the pixel at row i2, column j2; Integral(i2, j1) that of the pixel at row i2, column j1; Integral(i1, j2) that of the pixel at row i1, column j2; and Integral(i1, j1) that of the pixel at row i1, column j1.
  • the area of the target candidate frame may be calculated according to its location information: if the location information includes the positions of a pair of diagonal points, the area is calculated from those positions; if it includes the position of a vertex and the size of the frame, the area is calculated from the size.
  • when the ratio between the target image area and the area of the target candidate frame is less than the preset ratio threshold, the configuration information of the target candidate frame is filtered from the first candidate frame configuration information set; otherwise, the configuration information of the target candidate frame is retained in the first candidate frame configuration information set.
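The integral-map lookup and ratio test above can be sketched as follows; coordinates are 0-based inclusive corners (i1, j1)..(i2, j2), the hardcoded integral map is precomputed from a toy foreground image, and `ratio_threshold` is an assumed tuning value:

```python
# Sketch: read the foreground area inside a candidate box off the integral
# map with four corner lookups, then keep the box only if the foreground
# ratio reaches the threshold.

def box_foreground_area(ii, i1, j1, i2, j2):
    total = ii[i2][j2]
    left = ii[i2][j1 - 1] if j1 > 0 else 0
    top = ii[i1 - 1][j2] if i1 > 0 else 0
    corner = ii[i1 - 1][j1 - 1] if i1 > 0 and j1 > 0 else 0
    return total - left - top + corner

def keep_box(ii, box, ratio_threshold):
    i1, j1, i2, j2 = box
    area = (i2 - i1 + 1) * (j2 - j1 + 1)
    return box_foreground_area(ii, i1, j1, i2, j2) / area >= ratio_threshold

# Integral map of the toy foreground image [[1,1,0],[1,1,0],[0,0,0]].
ii = [[1, 2, 2],
      [2, 4, 4],
      [2, 4, 4]]
# Box over the 2x2 foreground block is kept; box over the empty column is not.
```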
  • the process of filtering the first candidate frame configuration information set according to the foreground moving image may be performed by the following operations 2131 to 2134, respectively:
  • the image area corresponding to the target candidate frame in the foreground moving image may be obtained according to the location information of the target candidate frame: if the location information includes the positions of a pair of diagonal points, the image area is acquired according to those positions; if it includes the position of a vertex of the target candidate frame and the size of the target candidate frame, the image area is acquired according to that position and size.
  • a ratio between the area of the target image located in the target candidate box and the area of the target candidate frame may be calculated according to the image area, and the implementation process may include the following operations of 2132 to 2134.
  • the pixels in the foreground moving image fall into two categories: pixels that belong to the moving target image, each with a pixel value of 1, and pixels that belong to the stationary background image, each with a pixel value of 0.
  • the number of pixels in the image area with a pixel value of 1 may be counted, and the number of pixels belonging to the target image in the image area is obtained.
  • 2133: Calculate the ratio between the number of those pixels and the total number of pixels, obtaining the ratio between the area of the target image located in the target candidate box and the area of the target candidate box.
  • the location information of the target candidate box is retained in the first candidate box configuration information set.
  • the configuration information of each candidate box in the second candidate box configuration information set may be sorted according to the confidence score of each candidate box in the set, to obtain a first configuration information sequence.
  • the configuration information of the preset number of candidate boxes with the highest confidence scores may be selected from the first configuration information sequence, and a detection box for each selected candidate box is added to the to-be-detected picture according to the location information of that candidate box; a candidate box and its detection box are equal in size.
  • alternatively, a non-maximum suppression operation may be performed on the first configuration information sequence to obtain a second configuration information sequence, in which the number of candidate frame configuration information entries is less than or equal to that in the first configuration information sequence.
  • the configuration information of the preset number of candidate boxes with the highest confidence scores may then be selected from the second configuration information sequence, and a detection box for each selected candidate box is added to the to-be-detected picture according to the location information of that candidate box; a candidate box and its detection box are equal in size.
  • the non-maximum value suppression operation is to identify any two candidate boxes whose overlapping area exceeds a preset threshold from the first configuration information sequence, and filter the configuration information of one of the candidate boxes from the two candidate boxes. Alternatively, the two candidate boxes are combined into one candidate box, and the configuration information of the synthesized candidate box is obtained.
  • since, in step 203, a large number of candidate frame configuration information entries are filtered out of the first candidate frame configuration information set, performing non-maximum suppression on the configuration information of the candidate frames in the first configuration information sequence requires processing fewer candidate box configuration entries, which improves the efficiency of the operation and further improves the efficiency of detecting the target image.
  • when a detection box is added, the type of the target image in the detection box may also be added; in implementation, the detection box and the type of the target image in it are added according to the second feature picture and the configuration information of each selected candidate frame.
  • the second feature picture and the selected configuration information of each candidate frame may be input to the Region of Interest (RoI) pooling layer of the Fast Rcnn device, which outputs the type of the target image in each selected candidate box; the detection box and the type of the target image in it are then added to the image to be detected according to the position information of each candidate frame.
  • the above processing may be performed for each frame of the picture, thereby realizing the addition of the detection frame and the type of the target image in the detection frame in each frame of the picture.
  • the Fast Rcnn device includes a shared convolutional layer, a unique convolutional layer, a RoI pooling layer, and a fully connected layer.
  • after the image to be detected passes through the shared convolutional layer and the unique convolutional layer, the second feature picture and the second candidate frame configuration information set are input to the RoI pooling layer, and after processing by the RoI pooling layer and the fully connected layer, the detection boxes and the type of the target image in each detection box are output.
  • the foregoing method may be implemented as a software system executed by a device, which may be the camera device or the server in the embodiment shown in FIG. 1.
  • the software system may include a filtering device, a Gaussian mixture model device, an RPN device, and a Fast Rcnn device.
  • the to-be-detected picture is input both to the CNN in the Fast Rcnn device and to the Gaussian mixture model device; the CNN in the Fast Rcnn device inputs the first feature picture to the RPN device; the Gaussian mixture model device outputs the foreground moving image to the filtering device, and the RPN device inputs the first candidate frame configuration information set to the filtering device; the filtering device obtains the second candidate frame configuration information set through the operation of the foregoing step 203 and inputs it to the Fast Rcnn device, which adds the detection boxes and the types of the target images in them to the to-be-detected picture.
  • the foreground moving image of the to-be-detected picture is obtained, and according to it the configuration information of candidate boxes containing incomplete target objects is filtered out of the first candidate box configuration information set to obtain the second candidate box configuration information set; a detection box is added to the to-be-detected picture according to the second set, thereby improving detection accuracy.
  • since the configuration information of a large number of candidate frames has been filtered out of the second candidate frame configuration information set, performing the non-maximum suppression operation on it requires processing fewer candidate frame configuration entries, which reduces the amount of calculation, increases the processing speed, and thereby improves the detection efficiency.
  • an embodiment of the present application provides an apparatus 300 for detecting a target image, where the apparatus 300 can be used to implement the embodiment shown in FIG. 2-1 and can also implement the functions of the server or the camera device in the embodiment shown in FIG. 1, and includes:
  • the acquiring unit 301 is configured to acquire a foreground moving image corresponding to the to-be-detected picture and obtain a first feature picture obtained by performing a convolution operation on the to-be-detected picture, where the foreground moving image includes a target image in a moving state in the image to be detected and the target image a background image outside the image;
  • the detecting unit 302 is further configured to detect the target image in the first feature image to obtain a first candidate frame configuration information set, where the first candidate frame configuration information set includes configuration information of each candidate frame in the at least one candidate frame. At least one target image is included in each candidate frame in the first feature picture, and the target image in the first feature picture is the same as the target image in the picture to be detected;
  • the filtering unit 303 is further configured to filter, from the first candidate frame configuration information set according to the foreground moving image, the configuration information of candidate frames whose contained target image is an incomplete target image, to obtain the second candidate frame configuration information set;
  • the adding unit 304 is further configured to add a detection frame to the to-be-detected picture according to the second candidate frame configuration information set, where the detection frame includes at least one target image in the to-be-detected picture.
  • the apparatus 300 further includes: at least one of a transceiver unit 305 and a storage unit 306;
  • the picture to be detected may be a picture received by the transceiver unit 305, or the picture to be detected may be a picture stored in the storage unit 306.
  • when the device 300 is used to implement the functions of the image capturing device, the device 300 may further include an image capturing unit 307, which may be a camera or the like, and the image to be detected may be a picture taken by the camera unit 307.
  • the device 300 may further include a transceiver unit 305 and/or a storage unit 306; the transceiver unit 305 may be used to send pictures taken by the camera unit 307, and the storage unit 306 may be used to store pictures taken by the camera unit 307.
  • when the apparatus 300 is used to implement the functions of a server, the apparatus 300 may include a transceiving unit 305 and/or a storage unit 306.
  • the obtaining unit 301 is configured to perform Gaussian mixture background modeling on the picture to be detected, to obtain the foreground moving image corresponding to the to-be-detected picture.
  • the filtering unit 303 is configured to:
  • the filtering unit 303 is configured to:
  • obtain, according to the configuration information of the target candidate box, the integral map area corresponding to the target candidate box in the integral map, where the configuration information of the target candidate box is the configuration information of any candidate box in the first candidate box configuration information set;
  • the configuration information of the target candidate box is filtered from the first candidate box configuration information set.
  • the filtering unit 303 is configured to:
  • the filtering unit 303 is configured to:
  • obtain, according to the configuration information of the target candidate box, the image area corresponding to the target candidate box in the foreground moving image, where the configuration information of the target candidate frame is the configuration information of any candidate frame in the first candidate frame configuration information set;
  • the configuration information of the target candidate box is filtered from the first candidate box configuration information set.
  • the filtering unit 303 is configured to:
  • the adding unit 304 is configured to:
  • the foreground moving image of the to-be-detected picture is obtained, so that the configuration information of candidate boxes containing incomplete target objects can be filtered out of the first candidate box configuration information set according to the foreground moving image, to obtain the second candidate frame configuration information set; a detection box is added to the to-be-detected picture according to the second candidate frame configuration information set, which can improve the detection precision.
  • FIG. 4 is a schematic diagram of an apparatus 400 for detecting a target image according to an embodiment of the present application.
  • the apparatus 400 includes at least one processor 401, a bus system 402, a memory 403, and at least one transceiver 404.
  • the device 400 is a hardware structured device that can be used to implement the functional modules in the device described in FIG. 3-1.
  • the obtaining unit 301, the detecting unit 302, the filtering unit 303, and/or the adding unit 304 in the apparatus 300 shown in FIG. 3-1 may be implemented by the at least one processor 401 calling the code in the memory 403.
  • the transceiver unit 305 in the device 300 shown in FIG. 3-1 can be implemented by the at least one transceiver 404.
  • the device 400 can also be used to implement the functions of the imaging device in the embodiment as described in FIG. 1 or to implement the functions of the server in the embodiment shown in FIG. 1.
  • the device 400 may further include a camera 407 through which the imaging unit 307 in the device 300 shown in FIG. 3-1 can be implemented.
  • the processor 401 may be a general-purpose central processing unit (CPU), a microprocessor, an application-specific integrated circuit (ASIC), or one or more integrated circuits for controlling the execution of the program of the solution of the present application.
  • the bus system 402 described above can include a path for communicating information between the components.
  • the transceiver 404 is configured to communicate with other devices or communication networks, such as an Ethernet, a radio access network (RAN), a wireless local area network (WLAN), and the like.
  • the above memory 403 may be a read-only memory (ROM) or another type of static storage device that can store static information and instructions, or a random access memory (RAM) or another type of dynamic storage device that can store information and instructions; it may also be an electrically erasable programmable read-only memory (EEPROM), a compact disc read-only memory (CD-ROM) or other optical disc storage, optical disc storage (including compact discs, laser discs, optical discs, digital versatile discs, Blu-ray discs, etc.), magnetic disk storage media or other magnetic storage devices, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer, but is not limited thereto.
  • the memory can exist independently and be connected to the processor via a bus.
  • the memory can also be integrated with the processor.
  • the memory 403 is used to store application code for executing the solution of the present application, and is controlled by the processor 401 for execution.
  • the processor 401 is configured to execute application code stored in the memory 403 to implement the functions in the method of the present patent.
  • the processor 401 may include one or more CPUs, such as CPU0 and CPU1 in FIG. 4.
  • the apparatus 400 can include a plurality of processors, such as the processor 401 and the processor 408 in FIG. 4. Each of these processors can be a single-core (single-CPU) processor or a multi-core (multi-CPU) processor.
  • a processor herein may refer to one or more devices, circuits, and/or processing cores for processing data, such as computer program instructions.
  • when the apparatus 400 is used to implement the functions of a server, the apparatus 400 may further include an output device 405 and an input device 406.
  • Output device 405 is in communication with processor 401 and can display information in a variety of ways.
  • the output device 405 can be a liquid crystal display (LCD), a light emitting diode (LED) display device, a cathode ray tube (CRT) display device, or a projector, etc.
  • Input device 406 is in communication with processor 401 and can accept user input in a variety of ways.
  • input device 406 can be a mouse, keyboard, touch screen device, or sensing device, and the like.
  • a person skilled in the art may understand that all or part of the steps of implementing the above embodiments may be completed by hardware, or may be instructed by a program to execute related hardware, and the program may be stored in a computer readable storage medium.
  • the storage medium mentioned may be a read only memory, a magnetic disk or an optical disk or the like.


Abstract

The present application discloses a method and apparatus for detecting a target image, belonging to the field of communications. The method includes: obtaining a foreground moving image corresponding to a picture to be detected, and obtaining a first feature picture produced by performing a convolution operation on the picture to be detected; detecting the target image in the first feature picture to obtain a first candidate box configuration information set, which includes the configuration information of each of at least one candidate box; filtering, from the first candidate box configuration information set according to the foreground moving image, the configuration information of candidate boxes whose contained target image is an incomplete target image, to obtain a second candidate box configuration information set; and adding, according to the second candidate box configuration information set, a detection box to the picture to be detected, the detection box containing at least one target image in the picture to be detected. The present application can improve detection accuracy.

Description

Method and apparatus for detecting a target image
This application claims priority to Chinese Patent Application No. 201810258574.7, filed with the China National Intellectual Property Administration on March 27, 2018 and entitled "Method and apparatus for detecting a target image", which is incorporated herein by reference in its entirety.
Technical Field
This application relates to the field of communications, and in particular to a method and apparatus for detecting a target image.
Background
With the construction of safe cities, a large number of surveillance cameras have been deployed to capture surveillance video. For the video captured by each surveillance camera, the target image in every frame must be detected; the target image may be, for example, a moving human body image and/or vehicle image in the surveillance video. After a target-image detection operation is performed on a frame, rectangular boxes are drawn in the frame around moving target images such as human body images or vehicle images, so that a particular person or vehicle can subsequently be tracked.
A picture can currently be detected as follows: the picture is input into a convolutional neural network (CNN), where after multiple convolution operations a first feature picture and a second feature picture are obtained; the number of convolution operations undergone by the first feature picture is smaller than that undergone by the second feature picture. The first feature picture is input into a region proposal network (RPN), which outputs the position information and confidence score of each of at least one candidate box in the first feature picture. Each candidate box in the first feature picture encloses one target image; the position information of a candidate box includes the positions of a pair of diagonal corner points of the box, and the confidence score represents the probability that the state of the enclosed target image is a moving state. According to the position information of each of the N candidate boxes with the highest confidence scores and the second feature picture, a detection box corresponding to each candidate box, together with the type of the target image it contains, is added to the picture.
In the process of implementing this application, the inventors found that the prior art has at least the following problems:
Among the N candidate boxes with the highest confidence scores, some boxes enclose target images that are not complete target images; for example, some candidate boxes may enclose only part of a human body image or part of a vehicle image. Detection boxes added to the picture from such candidate boxes therefore also contain incomplete target images, which reduces detection accuracy.
Summary
To improve detection accuracy, embodiments of this application provide a method and apparatus for detecting a target image. The technical solutions are as follows:
According to a first aspect, this application provides a method for detecting a target image. The method obtains a foreground moving image corresponding to a picture to be detected and a first feature picture obtained by performing a convolution operation on the picture to be detected, the foreground moving image including the target image in a moving state in the picture to be detected and the background image other than the target image; detects the target image in the first feature picture to obtain a first candidate box configuration information set, the first candidate box configuration information set including the configuration information of each of at least one candidate box, each candidate box in the first feature picture containing at least one target image, the target image in the first feature picture being the same as the target image in the picture to be detected; filters, from the first candidate box configuration information set according to the foreground moving image, the configuration information of candidate boxes whose contained target image is an incomplete target image, to obtain a second candidate box configuration information set; and adds, according to the second candidate box configuration information set, a detection box to the picture to be detected, the detection box containing at least one target image in the picture to be detected. Because the configuration information of candidate boxes containing incomplete target objects is filtered out of the first set to obtain the second set, adding detection boxes to the picture to be detected according to the second set can improve detection accuracy.
In a possible implementation of the first aspect, the foreground moving image corresponding to the picture to be detected is obtained by performing Gaussian mixture background modeling on the picture to be detected. The foreground moving image can then be used to filter, out of the first candidate box configuration information set, the configuration information of candidate boxes that contain incomplete target objects.
In a possible implementation of the first aspect, an integral image corresponding to the foreground moving image is computed from the foreground moving image, and the configuration information of candidate boxes whose contained target image is an incomplete target image is filtered out of the first candidate box configuration information set according to the integral image. Filtering the configuration information in the first set via the integral image increases the filtering speed and thus the detection efficiency.
In a possible implementation of the first aspect, according to the configuration information of a target candidate box (the configuration information of any candidate box in the first candidate box configuration information set), the integral-image area corresponding to the target candidate box is obtained in the integral image; the ratio between the area of the target image inside the target candidate box and the area of the target candidate box is computed from the integral-image area; and when the ratio is smaller than a preset ratio threshold, the configuration information of the target candidate box is filtered out of the first candidate box configuration information set. Obtaining the integral-image area corresponding to the target candidate box reduces the amount of computation needed for the ratio, increasing the computation speed.
In a possible implementation of the first aspect, the integral values of the pixels at the four vertex positions of the integral-image area are obtained; the area of the target image inside the target candidate box is computed from the obtained integral values; the area of the target candidate box is computed from its configuration information; and the ratio between the target-image area and the target candidate box area is computed. Computing the target-image area from the integral values of only the four vertex pixels requires little computation, so the target-image area can be computed quickly, improving computational efficiency.
In a possible implementation of the first aspect, according to the configuration information of a target candidate box (the configuration information of any candidate box in the first candidate box configuration information set), the image area corresponding to the target candidate box in the foreground moving image is obtained; the ratio between the area of the target image inside the target candidate box and the area of the target candidate box is computed from the image area; and when the ratio is smaller than a preset ratio threshold, the configuration information of the target candidate box is filtered out of the first candidate box configuration information set. Whether to filter out the target candidate box can thus be decided directly from the image area, simplifying the implementation logic.
In a possible implementation of the first aspect, the number of pixels belonging to the target image in the image area and the total number of pixels of the image area are counted; the ratio between the two counts yields the ratio between the area of the target image inside the target candidate box and the area of the target candidate box.
In a possible implementation of the first aspect, a second feature picture obtained by performing convolution operations on the picture to be detected is obtained, the number of convolution operations applied to the first feature picture being smaller than that applied to the second feature picture; the detection box and the type of the target image in the detection box are added to the picture to be detected according to the second feature picture and the second candidate box configuration information set. Because a large amount of candidate box configuration information has already been filtered out of the second set, the amount of computation when adding detection boxes to the picture to be detected according to the second set is reduced, improving detection efficiency.
According to a second aspect, this application provides an apparatus for detecting a target image, configured to perform the method of the first aspect or any possible implementation thereof. Specifically, the apparatus includes modules for performing the method of the first aspect or any possible implementation thereof.
According to a third aspect, this application provides an apparatus for detecting a target image, including at least one processor and at least one memory; the at least one memory stores one or more programs configured to be executed by the at least one processor, the one or more programs containing instructions for performing the method of the first aspect or any possible implementation thereof.
According to a fourth aspect, this application provides an apparatus for detecting a target image, including a transceiver, a processor and a memory, where the transceiver, the processor and the memory may be connected by a bus system; the memory is used to store programs, instructions or code, and the processor is used to execute the programs, instructions or code in the memory to complete the method of the first aspect or any possible implementation thereof.
According to a fifth aspect, this application provides a computer program product, including a computer program stored in a computer-readable storage medium, the computer program being loaded by a processor to implement the method of the first aspect or any possible implementation thereof.
According to a sixth aspect, this application provides a non-volatile computer-readable storage medium for storing a computer program, the computer program being loaded by a processor to execute the instructions of the method of the first aspect or any possible implementation thereof.
According to a seventh aspect, an embodiment of this application provides a chip, including a programmable logic circuit and/or program instructions, the chip, when running, implementing the method of the first aspect or any possible implementation thereof.
Brief Description of the Drawings
FIG. 1 is a schematic diagram of a network architecture according to an embodiment of this application;
FIG. 2-1 is a flowchart of a method for detecting a target image according to an embodiment of this application;
FIG. 2-2 is a block diagram of an RPN device according to an embodiment of this application;
FIG. 2-3 is a schematic diagram of an RPN device adding sliding windows according to an embodiment of this application;
FIG. 2-4 is a flowchart of a method for filtering configuration information according to an embodiment of this application;
FIG. 2-5 is a schematic diagram of an integral-image area according to an embodiment of this application;
FIG. 2-6 is a flowchart of another method for filtering configuration information according to an embodiment of this application;
FIG. 2-7 is a block diagram of a Fast Rcnn device according to an embodiment of this application;
FIG. 2-8 is a module diagram of a software system for detecting a target image according to an embodiment of this application;
FIG. 3-1 is a schematic structural diagram of an apparatus for detecting a target image according to an embodiment of this application;
FIG. 3-2 is a schematic structural diagram of another apparatus for detecting a target image according to an embodiment of this application;
FIG. 3-3 is a schematic structural diagram of another apparatus for detecting a target image according to an embodiment of this application;
FIG. 4 is a schematic structural diagram of another apparatus for detecting a target image according to an embodiment of this application.
Detailed Description
The embodiments of this application are described in further detail below with reference to the accompanying drawings.
Referring to FIG. 1, an embodiment of this application provides a network architecture, including:
a camera device and a server, with a network connection established between them; the network connection may be a wireless connection or a wired connection.
The camera device may be installed in places such as shopping malls and roads, to take pictures and send the captured pictures to the server.
Optionally, the network architecture may be applied to scenarios such as video surveillance; for example, in a video surveillance scenario, the camera device captures pictures frame by frame and may send the captured pictures to the server.
A picture captured by the camera device includes a foreground moving image in a moving state and a background image in a stationary state. The moving foreground image may be, for example, a moving human body image and/or vehicle image; the stationary background image may be, for example, a building image, a tree image and/or a stationary vehicle image.
When the camera device captures a frame, the target image in that frame can be detected; the target image may be one or more of the moving foreground images in the frame. When the target image of a picture is detected, a detection box is also drawn around the target image in the picture, which facilitates subsequent tracking of a target. For example, when the target is a particular person or vehicle, that person or vehicle can be tracked from the frames to which detection boxes have been added.
Optionally, the process of detecting the target image in a picture may be performed by the camera device, that is, after capturing a frame, the camera device performs the process of detecting the target image in that frame.
To improve the detection efficiency of the camera device, the camera device may be provided with relatively high computing resources, which may be at least one of a central processing unit (CPU), a graphics processing unit (GPU), and memory capacity.
Optionally, the process of detecting the target image in a picture may instead be performed not by the camera device but by the server, that is, after receiving a frame sent by the camera device, the server performs the process of detecting the target image in that frame; or the server reads a frame from a memory and performs the detection process on it, where the pictures in the memory may have been captured by the camera device.
When the server receives a picture sent by the camera device, it may first store the picture in the memory. The camera device may be a surveillance camera, a mobile phone with a camera, or a similar device.
Referring to FIG. 2-1, an embodiment of this application provides a method for detecting a target image. The method may be applied to the network architecture of the embodiment shown in FIG. 1, and may be executed by the camera device or the server in that network architecture. It includes:
Step 201: Obtain the foreground moving image corresponding to the picture to be detected and the first feature picture obtained by performing a convolution operation on the picture to be detected; the foreground moving image includes the target image in a moving state in the picture to be detected and the background image other than the target image.
The picture to be detected may be any picture in the video captured by the camera device. When this embodiment is executed by the camera device, the camera device may use a captured frame as the picture to be detected. When this embodiment is executed by the server, the server may use a frame received from the camera device as the picture to be detected, or read a frame from the memory as the picture to be detected; when receiving a picture sent by the camera device, the server may store it in the memory.
The foreground moving image corresponding to the picture to be detected can be obtained by performing Gaussian mixture background modeling on the picture to be detected.
In this embodiment of the application, a Gaussian mixture model device and a Fast Region-based Convolutional Neural Network (Fast Rcnn) device are preset; the Fast Rcnn device includes a CNN. In this step, the picture to be detected may be input separately to the Gaussian mixture model device and to the CNN of the Fast Rcnn device; the Gaussian mixture model device then performs Gaussian mixture background modeling on the picture to be detected to obtain the corresponding foreground moving image, and the CNN performs convolution operations on the picture to be detected to obtain the corresponding first feature picture.
The foreground moving image corresponding to the picture to be detected is a black-and-white picture in which every pixel has the value 1 or 0.
The foreground moving image is a picture of the same size as the picture to be detected. Every pixel of the picture to be detected has a corresponding pixel in the foreground moving image. If a pixel of the picture to be detected belongs to the moving target image in the picture, the corresponding pixel in the foreground moving image has the value 1. If a pixel belongs to the stationary background image, the corresponding pixel in the foreground moving image has the value 0. In this embodiment, the target image may be a human body image and/or vehicle image in the picture to be detected.
Optionally, the Gaussian mixture background modeling of the picture to be detected can be divided into the following operations 2011 to 2014:
2011: Create a blank foreground moving image with the same size as the picture to be detected.
2012: Read the pixel value of a pixel from the picture to be detected, the pixel value comprising the R-channel value, the G-channel value and the B-channel value, and compute, according to the following formula (1), the probability that the pixel belongs to the moving target image.
P(x_j) = Σ_{i=1..K} w_{i,t} · η(x_j, μ_{i,t}, Σ_{i,t}) …… (1)
In formula (1), P(x_j) is the probability of the j-th pixel of the picture to be detected, that is, the probability that the j-th pixel belongs to the moving target image; x_j is the pixel value of the j-th pixel, x_j = [x_jR, x_jG, x_jB], where x_jR, x_jG and x_jB are the pixel values of the R, G and B channels; t is the time corresponding to the picture to be detected (in implementation, the frame number of the picture may be used as the time t); w_{i,t} denotes the estimated weight coefficient of the i-th Gaussian distribution in the Gaussian mixture model device at time t; μ_{i,t} and Σ_{i,t} denote respectively the mean vector and the covariance matrix of the i-th Gaussian distribution in the Gaussian mixture model device at time t (the red, green and blue components of a pixel are assumed here to be mutually independent); and η denotes the Gaussian probability density function.
K is a preset value. Before formula (1) is used to compute the probability of the pixel, the preset Gaussian mixture model device obtains, from the probabilities already computed for the j-th pixel in the pictures at time 0, time 1, ..., time t-1, the estimated weight coefficients w_{1,t}, ..., w_{K,t} of the K Gaussian distributions at time t, the K mean vectors μ_{1,t}, ..., μ_{K,t}, and the K covariance matrices Σ_{1,t}, ..., Σ_{K,t}.
2013: If the computed probability is greater than a preset probability threshold, determine that the pixel belongs to the moving target image in the picture to be detected, and, according to the pixel's position in the picture to be detected, fill a pixel with value 1 at the corresponding position in the created foreground moving image.
2014: If the computed probability is less than or equal to the preset probability threshold, determine that the pixel belongs to the stationary background image in the picture to be detected, and, according to the pixel's position in the picture to be detected, fill a pixel with value 0 at the corresponding position in the created foreground moving image. For every pixel of the picture to be detected, the corresponding pixel of the created foreground moving image is filled in this way, yielding the foreground moving image corresponding to the picture to be detected.
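As a concrete illustration of operations 2011 to 2014, the following minimal Python sketch simplifies the K-Gaussian mixture of formula (1) to a single running Gaussian per pixel of a grayscale frame; the class name, learning rate, initial variance and deviation threshold are illustrative assumptions, not the patented implementation.

```python
import numpy as np

class RunningGaussianBackground:
    """Per-pixel background model with one Gaussian per pixel, a deliberate
    simplification of the K-Gaussian mixture of formula (1)."""

    def __init__(self, alpha=0.05, thresh=2.5):
        self.alpha = alpha    # learning rate for the running estimates
        self.thresh = thresh  # deviation threshold, in standard deviations
        self.mean = None
        self.var = None

    def apply(self, frame):
        """Return a 0/1 foreground mask for a grayscale frame."""
        frame = frame.astype(np.float64)
        if self.mean is None:  # the first frame initializes the background
            self.mean = frame.copy()
            self.var = np.full(frame.shape, 15.0 ** 2)
            return np.zeros(frame.shape, dtype=np.uint8)
        # foreground (value 1) where the pixel deviates too far from its
        # Gaussian, background (value 0) otherwise, mirroring 2013/2014
        d2 = (frame - self.mean) ** 2 / self.var
        mask = (d2 > self.thresh ** 2).astype(np.uint8)
        # update the model only at pixels classified as background
        bg = mask == 0
        self.mean[bg] = (1 - self.alpha) * self.mean[bg] + self.alpha * frame[bg]
        self.var[bg] = (1 - self.alpha) * self.var[bg] + self.alpha * (frame[bg] - self.mean[bg]) ** 2
        return mask
```

A production system would more likely use a full mixture-of-Gaussians background subtractor (for example OpenCV's MOG2 family); the single-Gaussian version above only shows the shape of the per-pixel decision.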
The CNN includes multiple convolutional layers. The first convolutional layer performs a convolution operation on the picture to be detected that is input to the CNN. The input of every convolutional layer other than the first is the output of the adjacent previous convolutional layer, on which it performs a convolution operation.
The result output by each convolutional layer of the CNN is a feature picture corresponding to the picture to be detected; for each convolutional layer, the feature picture it outputs is more abstract than the feature picture output by the adjacent previous convolutional layer.
In this step, the CNN processes the picture to be detected as follows: the picture to be detected is input to the CNN; the first convolutional layer convolves it to obtain a corresponding feature picture, which is input to the second convolutional layer; the second convolutional layer convolves that feature picture, again obtaining a feature picture corresponding to the picture to be detected whose degree of abstraction is greater than that of the feature picture output by the first convolutional layer, and inputs it to the third convolutional layer; this continues until the last convolutional layer of the CNN outputs the feature picture corresponding to the picture to be detected.
In this step, the feature picture of the picture to be detected output by a first target convolutional layer is taken as the first feature picture; the first target convolutional layer is a convolutional layer of the CNN other than the first and the last.
Optionally, a convolutional layer in the middle of the CNN may be chosen as the first target convolutional layer, and the feature picture it outputs taken as the first feature picture.
Optionally, in this step, a second feature picture of the picture to be detected may also be obtained by convolution; the number of convolution operations undergone by the first feature picture is smaller than that undergone by the second feature picture.
Optionally, a convolutional layer near the end of the CNN may be chosen as the second target convolutional layer, and the feature picture it outputs taken as the second feature picture; the layer number of the second target convolutional layer is greater than that of the first target convolutional layer.
Optionally, "a convolutional layer near the end of the CNN" means that one of the last N convolutional layers of the CNN may be chosen as the second target convolutional layer, N being a preset value such as 5, 4, 3, 2 or 1.
Optionally, the last convolutional layer of the CNN may be chosen as the second target convolutional layer, that is, the feature picture corresponding to the picture to be detected output by the last convolutional layer of the CNN is taken as the second feature picture.
Step 202: Detect the target image in the first feature picture to obtain the first candidate box configuration information set, the first candidate box configuration information set including the configuration information of each of at least one candidate box.
In the first feature picture, each candidate box contains at least one target image, and the target image in the first feature picture is the same as the target image in the picture to be detected.
The configuration information of a candidate box includes at least the position information and the confidence score of the candidate box. The position information may include the positions of a pair of diagonal corner points of the candidate box, which may be any two corner points on one diagonal of the box, each position being that corner point's position in the first feature picture; alternatively, the position information may include the position of one vertex of the candidate box and the size of the candidate box, where the vertex may be any vertex of the box, its position is its position in the first feature picture, and the size may include the width and height of the candidate box.
Optionally, a candidate box may be a rectangular box, and its confidence score may represent the probability that the state of the target object inside it is a moving state.
Optionally, in this embodiment of the application, an RPN device is preset. In this step, the first feature picture may be input to the RPN device, which processes the first feature picture to obtain the configuration information of each of at least one candidate box; the configuration information of all the candidate boxes forms the first candidate box configuration information set.
It should be noted that the first feature picture may include at least one target image, which may be a human body image and/or vehicle image. On receiving the input first feature picture, the RPN device adds, through its Proposals layer, a candidate box enclosing a target image in the first feature picture, obtains the position information of the candidate box in the first feature picture, and estimates a confidence score representing the probability that the state of the target image is a moving state, thereby obtaining the configuration information of the candidate box.
Referring to the block diagram of the RPN device in FIG. 2-2: when the first feature picture is input to the RPN device, the device adds a sliding window to the first feature picture; by moving the position of the sliding window and enlarging or shrinking its size, multiple different sliding windows are obtained; a convolutional layer encodes the feature vector of each sliding window, and a fully connected layer outputs, from the feature vectors of the sliding windows, the position information of at least one candidate box and the confidence score of each.
Referring to FIG. 2-3, after the sliding window is added to the first feature picture, moving the sliding window and enlarging or shrinking it yields multiple candidate boxes and the confidence score of each.
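The multi-scale sliding-window enumeration that produces candidate boxes can be sketched as follows; the function name, base size, scales and stride are illustrative assumptions, and a real RPN scores learned proposals rather than exhaustively enumerating windows.

```python
def generate_candidate_boxes(h, w, base=8, scales=(1, 2, 4), stride=4):
    """Enumerate square sliding-window positions at several scales over an
    h-by-w feature picture, the way the RPN slides and resizes its window.
    Returns boxes as (i1, j1, i2, j2) corner pairs kept inside the picture."""
    boxes = []
    for s in scales:
        size = base * s                       # enlarged/shrunk window size
        for i in range(0, h - size + 1, stride):
            for j in range(0, w - size + 1, stride):
                boxes.append((i, j, i + size, j + size))
    return boxes
```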
Among the candidate boxes obtained in this step, there are some whose contained target image is an incomplete target image and which also contain a relatively large area of background image.
Step 203: According to the foreground moving image, filter, out of the first candidate box configuration information set, the configuration information of candidate boxes whose contained target image is an incomplete target image, to obtain the second candidate box configuration information set.
There are several ways to implement this filtering: for example, filtering the first candidate box configuration information set according to the integral image of the foreground moving image; or filtering the first candidate box configuration information set according to the foreground moving image directly. Other filtering methods are not enumerated one by one here.
Referring to FIG. 2-4, the process of filtering the first candidate box configuration information set according to the integral image of the foreground moving image can be completed through the following operations 2031 to 2034:
2031: Compute, from the foreground moving image, the integral image corresponding to the foreground moving image.
First, create a blank integral image with the same size as the foreground moving image. For any pixel of the foreground moving image, say the pixel in row M, column N, its integral value can be computed by the following formula (2); according to the pixel's position in the foreground moving image, the integral value is filled in at the corresponding position of the created integral image, that is, at row M, column N.
Integral(M, N) = Σ_{i=1..M} Σ_{j=1..N} image(i, j) …… (2)
In formula (2), Integral(M, N) is the integral value of the pixel in row M, column N, and image(i, j) is the pixel value of the pixel in row i, column j of the foreground moving image.
For every other pixel of the foreground moving image, its integral value is filled into the created integral image in the same way, yielding the integral image corresponding to the foreground moving image.
Since, in the foreground moving image, the pixels of the moving foreground image have the value 1 and the pixels of the stationary background image have the value 0, the integral value of the pixel in row M, column N equals the area of the foreground image within an image region of the foreground moving image; that region includes the pixel in the first row, first column and the pixel in row M, column N, and its size is M×N.
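The integral image of formula (2) reduces to two cumulative sums over the binary mask; this short sketch assumes a numpy array and 0-based indexing, so entry (M-1, N-1) holds Integral(M, N).

```python
import numpy as np

def integral_image(mask):
    """Integral image per formula (2): each entry is the sum of all mask
    values above and to the left of it, inclusive (0-based indices)."""
    return mask.cumsum(axis=0).cumsum(axis=1)
```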
Next, the configuration information of candidate boxes whose contained target image is an incomplete target image can be filtered out of the first candidate box configuration information set according to the integral image; the detailed implementation may include the following operations 2032 to 2034.
2032: According to the configuration information of a target candidate box, obtain the integral-image area corresponding to the target candidate box in the integral image; the configuration information of the target candidate box is the configuration information of any candidate box in the first candidate box configuration information set.
Optionally, the integral-image area corresponding to the target candidate box may be obtained according to the position information of the target candidate box.
When the position information of the target candidate box includes the positions of a pair of diagonal corner points, the integral-image area corresponding to the target candidate box is obtained in the integral image according to the position of each of the corner points.
When the position information of the target candidate box includes the position of one vertex of the box and the size of the box, the corresponding integral-image area is obtained in the integral image according to that vertex position and the size.
For example, referring to FIG. 2-5, suppose the position information of the target candidate box includes the positions of a pair of diagonal corner points, one at row i_1, column j_1 and the other at row i_2, column j_2. According to the positions of these two corner points, the integral-image area corresponding to the target candidate box is obtained in the integral image shown in FIG. 2-5.
2033: Compute, from the integral-image area, the ratio between the area of the target image inside the target candidate box and the area of the target candidate box.
Optionally, this step can be implemented by obtaining the integral values of the pixels at the four vertex positions of the integral-image area and computing, from the obtained integral values, the area of the target image inside the target candidate box.
Referring to FIG. 2-5, the pixel at the top-left vertex of the integral-image area is the pixel in row i_1, column j_1; the pixel at the bottom-right vertex is in row i_2, column j_2; the pixel at the bottom-left vertex is in row i_2, column j_1; and the pixel at the top-right vertex is in row i_1, column j_2. From the integral values of these four pixels, the area Area of the target image inside the target candidate box is computed by the following formula (3):
Area = Integral(i_2, j_2) - Integral(i_1, j_2) - Integral(i_2, j_1) + Integral(i_1, j_1) …… (3)
In formula (3), Integral(i_2, j_2) is the integral value of the pixel in row i_2, column j_2; Integral(i_1, j_2) is the integral value of the pixel in row i_1, column j_2; Integral(i_2, j_1) is the integral value of the pixel in row i_2, column j_1; and Integral(i_1, j_1) is the integral value of the pixel in row i_1, column j_1.
Then compute the area of the target candidate box from its configuration information, and compute the ratio between the target-image area and the area of the target candidate box.
Optionally, the area of the target candidate box may be computed from its position information.
When the position information of the target candidate box includes the positions of a pair of diagonal corner points, the area of the target candidate box is computed from the position of each of the corner points.
When the position information of the target candidate box includes the position of one vertex of the box and the size of the box, the area of the target candidate box is computed from the size.
In this step, the target-image area can be computed from the integral values of only the four vertex pixels, so the required computation is small; this reduces the computation needed for the filtering operation, increases the filtering speed, and thus improves the efficiency of detecting the target image.
2034: When the ratio is smaller than the preset ratio threshold, filter the configuration information of the target candidate box out of the first candidate box configuration information set.
When the ratio is greater than or equal to the preset ratio threshold, the position information of the target candidate box is retained in the first candidate box configuration information set.
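Operations 2032 to 2034 can be sketched as below. This is an illustration under stated assumptions, not the patented code: box corners are 0-based with an exclusive lower-right corner, the integral image is zero-padded so the four-corner lookup of formula (3) also works for boxes touching the first row or column, and the 0.5 ratio threshold is an arbitrary example value.

```python
import numpy as np

def filter_boxes_by_foreground(boxes, mask, ratio_thresh=0.5):
    """Keep only boxes whose foreground area covers at least ratio_thresh
    of the box area, using the four-corner lookup of formula (3).
    boxes: list of (i1, j1, i2, j2); mask: 0/1 foreground moving image."""
    # zero-padded integral image: integral[M, N] = sum of mask[:M, :N]
    integral = np.pad(mask.cumsum(0).cumsum(1), ((1, 0), (1, 0)))
    kept = []
    for (i1, j1, i2, j2) in boxes:
        # formula (3): foreground area inside the box from four corners
        area_fg = (integral[i2, j2] - integral[i1, j2]
                   - integral[i2, j1] + integral[i1, j1])
        area_box = (i2 - i1) * (j2 - j1)
        if area_box > 0 and area_fg / area_box >= ratio_thresh:
            kept.append((i1, j1, i2, j2))
    return kept
```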
Referring to FIG. 2-6, the process of filtering the first candidate box configuration information set according to the foreground moving image directly can be completed through the following operations 2131 to 2134:
2131: According to the configuration information of a target candidate box, obtain the image area corresponding to the target candidate box in the foreground moving image; the configuration information of the target candidate box is the configuration information of any candidate box in the first candidate box configuration information set.
Optionally, the image area corresponding to the target candidate box in the foreground moving image may be obtained according to the position information of the target candidate box.
When the position information of the target candidate box includes the positions of a pair of diagonal corner points, the image area corresponding to the target candidate box in the foreground moving image is obtained according to the position of each of the corner points.
When the position information of the target candidate box includes the position of one vertex of the box and the size of the box, the corresponding image area in the foreground moving image is obtained according to that vertex position and the size.
Next, the ratio between the area of the target image inside the target candidate box and the area of the target candidate box can be computed from the image area; the implementation may include the following operations 2132 to 2134.
2132: Count the number of pixels in the image area that belong to the target image, and the total number of pixels of the image area.
The pixels of the foreground moving image fall into two categories: pixels belonging to the moving target image, each with pixel value 1, and pixels belonging to the stationary background image, each with pixel value 0.
Optionally, the number of pixels with value 1 in the image area may be counted, giving the number of pixels in the image area that belong to the target image.
2133: Compute the ratio between that pixel count and the total pixel count, obtaining the ratio between the area of the target image inside the target candidate box and the area of the target candidate box.
2134: When the ratio is smaller than the preset ratio threshold, filter the configuration information of the target candidate box out of the first candidate box configuration information set.
When the ratio is greater than or equal to the preset ratio threshold, the position information of the target candidate box is retained in the first candidate box configuration information set.
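The direct pixel-counting variant of operations 2132 and 2133 reduces to a slice sum over the foreground moving image; box coordinates here are assumed 0-based with an exclusive lower-right corner, purely for illustration.

```python
import numpy as np

def foreground_ratio(mask, box):
    """Ratio from operations 2132-2133: the count of value-1 pixels in the
    box's image area divided by the area's total pixel count.
    mask: 0/1 foreground moving image; box: (i1, j1, i2, j2)."""
    i1, j1, i2, j2 = box
    region = mask[i1:i2, j1:j2]
    return region.sum() / region.size
```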
Step 204: According to the second candidate box configuration information set, add a detection box to the picture to be detected; the detection box contains at least one target image in the picture to be detected.
In this embodiment, the configuration information of each candidate box in the second candidate box configuration information set may be sorted according to the confidence score of each candidate box in the set, to obtain a first configuration information sequence.
Optionally, the configuration information of a preset number of candidate boxes with the highest confidence scores may be selected from the first configuration information sequence, and a detection box for each selected candidate box added to the picture to be detected according to its position information; a candidate box and its detection box are equal in size.
Optionally, a non-maximum suppression operation may instead be performed on the first configuration information sequence to obtain a second configuration information sequence, in which the number of candidate box configuration information entries is less than or equal to that in the first configuration information sequence. The configuration information of a preset number of candidate boxes with the highest confidence scores may then be selected from the second configuration information sequence, and a detection box for each selected candidate box added to the picture to be detected according to its position information, a candidate box and its detection box being equal in size.
The non-maximum suppression operation identifies, in the first configuration information sequence, any two candidate boxes whose overlapping area exceeds a preset threshold, and either filters out the configuration information of one of the two candidate boxes, or merges the two candidate boxes into one candidate box and obtains the configuration information of the merged box.
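A minimal sketch of the greedy non-maximum suppression described here; it measures overlap as intersection-over-union and drops (rather than merges) overlapping boxes, both common choices assumed for illustration only.

```python
def nms(boxes, scores, iou_thresh=0.5):
    """Greedy NMS: keep the highest-scoring box, drop any remaining box
    whose overlap with a kept box exceeds the threshold, and repeat.
    boxes: (i1, j1, i2, j2) with exclusive lower-right corners.
    Returns the indices of the retained boxes, highest score first."""
    order = sorted(range(len(boxes)), key=lambda k: scores[k], reverse=True)
    kept = []
    for k in order:
        i1, j1, i2, j2 = boxes[k]
        suppressed = False
        for m in kept:
            a1, b1, a2, b2 = boxes[m]
            ih = max(0, min(i2, a2) - max(i1, a1))   # intersection height
            iw = max(0, min(j2, b2) - max(j1, b1))   # intersection width
            inter = ih * iw
            union = (i2 - i1) * (j2 - j1) + (a2 - a1) * (b2 - b1) - inter
            if union > 0 and inter / union > iou_thresh:
                suppressed = True
                break
        if not suppressed:
            kept.append(k)
    return kept
```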
Because a large amount of candidate box configuration information was filtered out of the first candidate box configuration information set in step 203, performing the non-maximum suppression operation on the configuration information of the candidate boxes in the first configuration information sequence requires processing fewer candidate box configuration entries, which improves the efficiency of the operation and further improves the efficiency of detecting the target image.
Optionally, when a detection box is added to the picture to be detected, the type of the target image in the detection box may also be added; in implementation, the detection box and the type of the target image in it are added to the picture to be detected according to the second feature picture and the configuration information of each selected candidate box.
In implementation, the second feature picture and the configuration information of each selected candidate box may be input to the Region of Interest (RoI) pooling layer of the Fast Rcnn device; the RoI pooling layer of the Fast Rcnn device outputs the type of the target image in each selected candidate box, and the detection box and the type of the target image in it are added to the picture to be detected according to the position information of each candidate box.
The above processing may be performed for every frame, thereby adding the detection box and the type of the target image in the detection box to each frame.
Referring to the block diagram of the Fast Rcnn device in FIG. 2-7, the Fast Rcnn device includes a shared convolutional layer, a unique convolutional layer, a RoI pooling layer and a fully connected layer. After the picture to be detected is processed by the shared convolutional layer and the unique convolutional layer, the result, together with the second feature picture and the second candidate box configuration information set, is input to the RoI pooling layer; after processing by the RoI pooling layer and the fully connected layer, the detection boxes and the type of the target image in each detection box are output.
Referring to FIG. 2-8, from the above flow it can be seen that this embodiment of the application applies to the following software system, which can be executed by a device to perform the flow of the above method; the device may be the camera device or the server in the embodiment shown in FIG. 1. The software system may include a filtering device, a Gaussian mixture model device, an RPN device and a Fast Rcnn device.
The picture to be detected is input separately to the CNN in the Fast Rcnn device and to the Gaussian mixture model device; the CNN in the Fast Rcnn device inputs the first feature picture to the RPN device; the Gaussian mixture model device outputs the foreground moving image to the filtering device, and the RPN device inputs the first candidate box configuration information set to the filtering module; the filtering device obtains the second candidate box configuration information set through the operation of step 203 above and inputs it to the Fast Rcnn device, which adds the detection boxes and the types of the target images in them to the picture to be detected.
In this embodiment of the application, the foreground moving image of the picture to be detected is obtained; according to it, the configuration information of candidate boxes containing incomplete target objects is filtered out of the first candidate box configuration information set to obtain the second candidate box configuration information set, and detection boxes are added to the picture to be detected according to the second set, which improves detection accuracy. Because the configuration information of a large number of candidate boxes has been filtered out of the second candidate box configuration information set, performing the non-maximum suppression operation on it requires processing fewer candidate box configuration entries, which reduces the amount of computation, increases the processing speed, and thereby improves the detection efficiency.
Referring to FIG. 3-1, an embodiment of this application provides an apparatus 300 for detecting a target image. The apparatus 300 can be used to implement the embodiment shown in FIG. 2-1, and can also implement the functions of the server or the camera device in the embodiment shown in FIG. 1. It includes:
an obtaining unit 301, configured to obtain the foreground moving image corresponding to the picture to be detected and the first feature picture obtained by performing a convolution operation on the picture to be detected, the foreground moving image including the target image in a moving state in the picture to be detected and the background image other than the target image;
a detecting unit 302, further configured to detect the target image in the first feature picture to obtain the first candidate box configuration information set, the first candidate box configuration information set including the configuration information of each of at least one candidate box; in the first feature picture each candidate box contains at least one target image, and the target image in the first feature picture is the same as the target image in the picture to be detected;
a filtering unit 303, further configured to filter, from the first candidate box configuration information set according to the foreground moving image, the configuration information of candidate boxes whose contained target image is an incomplete target image, to obtain the second candidate box configuration information set;
an adding unit 304, further configured to add, according to the second candidate box configuration information set, a detection box to the picture to be detected, the detection box containing at least one target image in the picture to be detected.
Optionally, referring to FIG. 3-2, the apparatus 300 further includes at least one of a transceiver unit 305 and a storage unit 306;
the picture to be detected may be a picture received by the transceiver unit 305, or a picture stored in the storage unit 306.
Optionally, referring to FIG. 3-3, when the apparatus 300 is used to implement the functions of the camera device, the apparatus 300 may further include a camera unit 307, which may be a camera or the like; the picture to be detected may be a picture taken by the camera unit 307. The apparatus 300 may further include a transceiver unit 305 and/or a storage unit 306; the transceiver unit 305 may be used to send pictures taken by the camera unit 307, and the storage unit 306 may be used to store pictures taken by the camera unit 307.
Optionally, when the apparatus 300 is used to implement the functions of the server, the apparatus 300 may include a transceiver unit 305 and/or a storage unit 306.
Optionally, the obtaining unit 301 is configured to obtain the foreground moving image corresponding to the picture to be detected by performing Gaussian mixture background modeling on the picture to be detected.
Optionally, the filtering unit 303 is configured to:
compute, from the foreground moving image, the integral image corresponding to the foreground moving image;
filter, from the first candidate box configuration information set according to the integral image, the configuration information of candidate boxes whose contained target image is an incomplete target image.
Optionally, the filtering unit 303 is configured to:
obtain, according to the configuration information of a target candidate box, the integral-image area corresponding to the target candidate box in the integral image, the configuration information of the target candidate box being the configuration information of any candidate box in the first candidate box configuration information set;
compute, from the integral-image area, the ratio between the area of the target image inside the target candidate box and the area of the target candidate box;
when the ratio is smaller than the preset ratio threshold, filter the configuration information of the target candidate box out of the first candidate box configuration information set.
Optionally, the filtering unit 303 is configured to:
obtain the integral values of the pixels at the four vertex positions of the integral-image area;
compute, from the obtained integral values, the area of the target image inside the target candidate box;
compute, from the configuration information of the target candidate box, the area of the target candidate box;
compute the ratio between the target-image area and the area of the target candidate box.
Optionally, the filtering unit 303 is configured to:
obtain, according to the configuration information of a target candidate box, the image area corresponding to the target candidate box in the foreground moving image, the configuration information of the target candidate box being the configuration information of any candidate box in the first candidate box configuration information set;
compute, from the image area, the ratio between the area of the target image inside the target candidate box and the area of the target candidate box;
when the ratio is smaller than the preset ratio threshold, filter the configuration information of the target candidate box out of the first candidate box configuration information set.
Optionally, the filtering unit 303 is configured to:
count the number of pixels in the image area that belong to the target image and the total number of pixels of the image area;
compute the ratio between the two counts, obtaining the ratio between the area of the target image inside the target candidate box and the area of the target candidate box.
Optionally, the adding unit 304 is configured to:
obtain the second feature picture obtained by performing convolution operations on the picture to be detected, the number of convolution operations applied to the first feature picture being smaller than that applied to the second feature picture;
add, according to the second feature picture and the second candidate box configuration information set, a detection box and the type of the target image in the detection box to the picture to be detected.
In this embodiment of the application, because the foreground moving image of the picture to be detected is obtained, the configuration information of candidate boxes containing incomplete target objects can be filtered out of the first candidate box configuration information set according to the foreground moving image, yielding the second candidate box configuration information set; adding detection boxes to the picture to be detected according to the second set can improve detection accuracy.
参见图4,图4所示为本申请实施例提供的一种检测目标图像的装置400示意图。该装置400包括至少一个处理器401,总线***402,存储器403以及至少一个收发器404。
该装置400是一种硬件结构的装置,可以用于实现图3-1所述的装置中的功能模块。例如,本领域技术人员可以想到图3-1所示的装置300中的获取单元301、检测单元302、过滤单元303和/或添加单元304可以通过该至少一个处理器401调用存储器403中的代码来实现,图3-1所示的装置300中的收发单元305可以通过该至少一个收发器404来实现。
可选的,该装置400还可用于实现如图1所述的实施例中摄像设备的功能,或者实现图1所示的实施例中服务器的功能。
该装置400用于摄像设备的功能时,该装置400还可以包括摄像头407,图3-1所示的装置300中的摄像单元307可以通过该摄像头407来实现。
Optionally, the processor 401 may be a general-purpose central processing unit (CPU), a microprocessor, an application-specific integrated circuit (ASIC), or one or more integrated circuits for controlling the execution of the programs of the solutions of this application.
The bus system 402 may include a path for transferring information between the above components.
The transceiver 404 is configured to communicate with other devices or communication networks, such as Ethernet, a radio access network (RAN), or a wireless local area network (WLAN).
The memory 403 may be a read-only memory (ROM) or another type of static storage device capable of storing static information and instructions, a random access memory (RAM) or another type of dynamic storage device capable of storing information and instructions, an electrically erasable programmable read-only memory (EEPROM), a compact disc read-only memory (CD-ROM) or other optical disc storage, optical disc storage (including compact discs, laser discs, optical discs, digital versatile discs, Blu-ray discs, and the like), a magnetic disk storage medium or another magnetic storage device, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer, but is not limited thereto. The memory may exist independently and be connected to the processor through a bus, or may be integrated with the processor.
The memory 403 is configured to store the application program code for executing the solutions of this application, and execution is controlled by the processor 401. The processor 401 is configured to execute the application program code stored in the memory 403, thereby implementing the functions of the method of this patent.
In specific implementation, as an embodiment, the processor 401 may include one or more CPUs, for example CPU0 and CPU1 in FIG. 4.
In specific implementation, as an embodiment, the apparatus 400 may include multiple processors, for example the processor 401 and the processor 408 in FIG. 4. Each of these processors may be a single-core (single-CPU) processor or a multi-core (multi-CPU) processor. A processor here may refer to one or more devices, circuits, and/or processing cores for processing data (for example, computer program instructions).
In specific implementation, as an embodiment, when the apparatus 400 is used to implement the functions of a server, the apparatus 400 may further include an output device 405 and an input device 406. The output device 405 communicates with the processor 401 and may display information in multiple ways; for example, the output device 405 may be a liquid crystal display (LCD), a light-emitting diode (LED) display device, a cathode-ray tube (CRT) display device, or a projector. The input device 406 communicates with the processor 401 and may accept user input in multiple ways; for example, the input device 406 may be a mouse, a keyboard, a touchscreen device, or a sensing device.
The sequence numbers of the above embodiments of this application are for description only and do not indicate the relative merits of the embodiments.
A person of ordinary skill in the art will understand that all or some of the steps for implementing the above embodiments may be completed by hardware, or by a program instructing relevant hardware; the program may be stored in a computer-readable storage medium, and the storage medium mentioned above may be a read-only memory, a magnetic disk, an optical disc, or the like.
The above are merely optional embodiments of this application and are not intended to limit this application. Any modification, equivalent replacement, improvement, and the like made within the spirit and principles of this application shall fall within the protection scope of this application.

Claims (18)

  1. A method for detecting a target image, wherein the method comprises:
    obtaining a foreground motion image corresponding to a picture to be detected and obtaining a first feature picture obtained by performing convolution operations on the picture to be detected, wherein the foreground motion image comprises a target image in a moving state in the picture to be detected and a background image other than the target image;
    detecting the target image in the first feature picture to obtain a first candidate-box configuration information set, wherein the first candidate-box configuration information set comprises configuration information of each of at least one candidate box, each candidate box in the first feature picture contains at least one target image, and the target image in the first feature picture is the same as the target image in the picture to be detected;
    filtering, from the first candidate-box configuration information set according to the foreground motion image, configuration information of candidate boxes whose included target image is an incomplete target image, to obtain a second candidate-box configuration information set; and
    adding a detection box to the picture to be detected according to the second candidate-box configuration information set, wherein the detection box contains at least one target image in the picture to be detected.
  2. The method according to claim 1, wherein the obtaining a foreground motion image corresponding to a picture to be detected comprises:
    performing Gaussian-mixture background modeling on the picture to be detected to obtain the foreground motion image corresponding to the picture to be detected.
  3. The method according to claim 1 or 2, wherein the filtering, from the first candidate-box configuration information set according to the foreground motion image, configuration information of candidate boxes whose included target image is an incomplete target image comprises:
    computing, according to the foreground motion image, an integral image corresponding to the foreground motion image; and
    filtering, from the first candidate-box configuration information set according to the integral image, the configuration information of candidate boxes whose included target image is an incomplete target image.
  4. The method according to claim 3, wherein the filtering, from the first candidate-box configuration information set according to the integral image, the configuration information of candidate boxes whose included target image is an incomplete target image comprises:
    obtaining, according to configuration information of a target candidate box, an integral-image region corresponding to the target candidate box in the integral image, wherein the configuration information of the target candidate box is the configuration information of any candidate box in the first candidate-box configuration information set;
    computing, according to the integral-image region, a ratio between an area of the target image located in the target candidate box and an area of the target candidate box; and
    filtering the configuration information of the target candidate box from the first candidate-box configuration information set when the ratio is less than a preset ratio threshold.
  5. The method according to claim 4, wherein the computing, according to the integral-image region, a ratio between an area of the target image located in the target candidate box and an area of the target candidate box comprises:
    obtaining integral values of the pixels located at the four vertex positions of the integral-image region;
    computing, according to the obtained integral values of the pixels, the area of the target image located in the target candidate box;
    computing the area of the target candidate box according to the configuration information of the target candidate box; and
    computing the ratio between the target-image area and the area of the target candidate box.
  6. The method according to claim 1 or 2, wherein the filtering, from the first candidate-box configuration information set according to the foreground motion image, configuration information of candidate boxes whose included target image is an incomplete target image comprises:
    obtaining, according to configuration information of a target candidate box, an image region corresponding to the target candidate box in the foreground motion image, wherein the configuration information of the target candidate box is the configuration information of any candidate box in the first candidate-box configuration information set;
    computing, according to the image region, a ratio between an area of the target image located in the target candidate box and an area of the target candidate box; and
    filtering the configuration information of the target candidate box from the first candidate-box configuration information set when the ratio is less than a preset ratio threshold.
  7. The method according to claim 6, wherein the computing, according to the image region, a ratio between an area of the target image located in the target candidate box and an area of the target candidate box comprises:
    counting the number of pixels belonging to the target image in the image region and the total number of pixels in the image region; and
    computing the ratio between the number of target-image pixels and the total number of pixels to obtain the ratio between the area of the target image located in the target candidate box and the area of the target candidate box.
  8. The method according to any one of claims 1 to 7, wherein the adding a detection box to the picture to be detected according to the second candidate-box configuration information set comprises:
    obtaining a second feature picture obtained by performing convolution operations on the picture to be detected, wherein the number of convolution operations performed to obtain the first feature picture is less than the number of convolution operations performed to obtain the second feature picture; and
    adding, according to the second feature picture and the second candidate-box configuration information set, the detection box to the picture to be detected together with a type of the target image within the detection box.
  9. An apparatus for detecting a target image, wherein the apparatus comprises:
    an obtaining unit, configured to obtain a foreground motion image corresponding to a picture to be detected and obtain a first feature picture obtained by performing convolution operations on the picture to be detected, wherein the foreground motion image comprises a target image in a moving state in the picture to be detected and a background image other than the target image;
    a detection unit, configured to detect the target image in the first feature picture to obtain a first candidate-box configuration information set, wherein the first candidate-box configuration information set comprises configuration information of each of at least one candidate box, each candidate box in the first feature picture contains at least one target image, and the target image in the first feature picture is the same as the target image in the picture to be detected;
    a filtering unit, configured to filter, from the first candidate-box configuration information set according to the foreground motion image, configuration information of candidate boxes whose included target image is an incomplete target image, to obtain a second candidate-box configuration information set; and
    an adding unit, configured to add a detection box to the picture to be detected according to the second candidate-box configuration information set, wherein the detection box contains at least one target image in the picture to be detected.
  10. The apparatus according to claim 9, wherein
    the obtaining unit is configured to obtain the foreground motion image corresponding to the picture to be detected by performing Gaussian-mixture background modeling on the picture to be detected.
  11. The apparatus according to claim 9 or 10, wherein the filtering unit is configured to:
    compute, according to the foreground motion image, an integral image corresponding to the foreground motion image; and
    filter, from the first candidate-box configuration information set according to the integral image, the configuration information of candidate boxes whose included target image is an incomplete target image.
  12. The apparatus according to claim 11, wherein the filtering unit is configured to:
    obtain, according to configuration information of a target candidate box, an integral-image region corresponding to the target candidate box in the integral image, wherein the configuration information of the target candidate box is the configuration information of any candidate box in the first candidate-box configuration information set;
    compute, according to the integral-image region, a ratio between an area of the target image located in the target candidate box and an area of the target candidate box; and
    filter the configuration information of the target candidate box from the first candidate-box configuration information set when the ratio is less than a preset ratio threshold.
  13. The apparatus according to claim 12, wherein the filtering unit is configured to:
    obtain integral values of the pixels located at the four vertex positions of the integral-image region;
    compute, according to the obtained integral values of the pixels, the area of the target image located in the target candidate box;
    compute the area of the target candidate box according to the configuration information of the target candidate box; and
    compute the ratio between the target-image area and the area of the target candidate box.
  14. The apparatus according to claim 9 or 10, wherein the filtering unit is configured to:
    obtain, according to configuration information of a target candidate box, an image region corresponding to the target candidate box in the foreground motion image, wherein the configuration information of the target candidate box is the configuration information of any candidate box in the first candidate-box configuration information set;
    compute, according to the image region, a ratio between an area of the target image located in the target candidate box and an area of the target candidate box; and
    filter the configuration information of the target candidate box from the first candidate-box configuration information set when the ratio is less than a preset ratio threshold.
  15. The apparatus according to claim 14, wherein the filtering unit is configured to:
    count the number of pixels belonging to the target image in the image region and the total number of pixels in the image region; and
    compute the ratio between the number of target-image pixels and the total number of pixels to obtain the ratio between the area of the target image located in the target candidate box and the area of the target candidate box.
  16. The apparatus according to any one of claims 9 to 15, wherein the adding unit is configured to:
    obtain a second feature picture obtained by performing convolution operations on the picture to be detected, wherein the number of convolution operations performed to obtain the first feature picture is less than the number of convolution operations performed to obtain the second feature picture; and
    add, according to the second feature picture and the second candidate-box configuration information set, the detection box to the picture to be detected together with a type of the target image within the detection box.
  17. An apparatus for detecting a target image, wherein the apparatus comprises:
    at least one processor; and
    at least one memory;
    wherein the at least one memory stores one or more programs, the one or more programs are configured to be executed by the at least one processor, and the one or more programs contain instructions for performing the method according to any one of claims 1 to 8.
  18. A non-volatile computer-readable storage medium, configured to store a computer program, wherein the computer program is loaded by a processor to execute the instructions of the method according to any one of claims 1 to 8.
PCT/CN2019/074761 2018-03-27 2019-02-11 Method and apparatus for detecting a target image WO2019184604A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201810258574.7 2018-03-27
CN201810258574.7A CN110310301B (zh) 2018-03-27 Method and apparatus for detecting a target object

Publications (1)

Publication Number Publication Date
WO2019184604A1 (zh) 2019-10-03

Family

ID=68062170

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/074761 WO2019184604A1 (zh) 2018-03-27 2019-02-11 Method and apparatus for detecting a target image

Country Status (2)

Country Link
CN (1) CN110310301B (zh)
WO (1) WO2019184604A1 (zh)


Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110852321B (zh) * 2019-11-11 2022-11-22 Beijing Baidu Netcom Science and Technology Co., Ltd. Candidate box filtering method and apparatus, and electronic device
CN111815570A (zh) * 2020-06-16 2020-10-23 Zhejiang Dahua Technology Co., Ltd. Region intrusion detection method and related apparatus

Citations (4)

Publication number Priority date Publication date Assignee Title
CN106557778A (zh) * 2016-06-17 2017-04-05 Beijing SenseTime Technology Development Co., Ltd. General object detection method and apparatus, data processing apparatus, and terminal device
US20170124415A1 (en) * 2015-11-04 2017-05-04 Nec Laboratories America, Inc. Subcategory-aware convolutional neural networks for object detection
CN107133974A (zh) * 2017-06-02 2017-09-05 Nanjing University Vehicle type classification method combining Gaussian background modeling with recurrent neural networks
CN107590489A (zh) * 2017-09-28 2018-01-16 Academy of Broadcasting Science, State Administration of Press, Publication, Radio, Film and Television Target detection method based on cascaded convolutional neural networks

Family Cites Families (5)

Publication number Priority date Publication date Assignee Title
US10002313B2 (en) * 2015-12-15 2018-06-19 Sighthound, Inc. Deeply learned convolutional neural networks (CNNS) for object localization and classification
CN106709447A (zh) * 2016-12-21 2017-05-24 South China University of Technology Abnormal behavior detection method in video based on target localization and feature fusion
CN107256225B (zh) * 2017-04-28 2020-09-01 Jinan Zhongwei Century Technology Co., Ltd. Heat map generation method and apparatus based on video analysis
CN107833213B (zh) * 2017-11-02 2020-09-22 Harbin Institute of Technology Weakly supervised object detection method based on a pseudo-ground-truth adaptation approach
CN107730553B (zh) * 2017-11-02 2020-09-15 Harbin Institute of Technology Weakly supervised object detection method based on a pseudo-ground-truth search approach


Cited By (4)

Publication number Priority date Publication date Assignee Title
CN113836985A (zh) * 2020-06-24 2021-12-24 Fujitsu Limited Image processing apparatus, image processing method, and computer-readable storage medium
CN114511694A (zh) * 2022-01-28 2022-05-17 Beijing Baidu Netcom Science and Technology Co., Ltd. Image recognition method and apparatus, electronic device, and medium
CN114511694B (zh) * 2022-01-28 2023-05-12 Beijing Baidu Netcom Science and Technology Co., Ltd. Image recognition method and apparatus, electronic device, and medium
CN114550062A (zh) * 2022-02-25 2022-05-27 Jingdong Technology Information Technology Co., Ltd. Method and apparatus for determining a moving object in an image, electronic device, and storage medium

Also Published As

Publication number Publication date
CN110310301B (zh) 2021-07-16
CN110310301A (zh) 2019-10-08


Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application (Ref document number: 19774687; Country of ref document: EP; Kind code of ref document: A1)
NENP: non-entry into the national phase (Ref country code: DE)
122 EP: PCT application non-entry into the European phase (Ref document number: 19774687; Country of ref document: EP; Kind code of ref document: A1)