WO2022000862A1 - Method and apparatus for detecting object in fisheye image, and storage medium - Google Patents

Method and apparatus for detecting object in fisheye image, and storage medium Download PDF

Info

Publication number
WO2022000862A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
frame
detection
cropping
fisheye
Application number
PCT/CN2020/121513
Other languages
French (fr)
Chinese (zh)
Inventor
王程
毛晓蛟
章勇
曹李军
Original Assignee
苏州科达科技股份有限公司
Application filed by 苏州科达科技股份有限公司 (Suzhou Keda Technology Co., Ltd.)
Publication of WO2022000862A1 publication Critical patent/WO2022000862A1/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00Geometric image transformations in the plane of the image
    • G06T3/04Context-preserving transformations, e.g. by using an importance map
    • G06T3/047Fisheye or wide-angle transformations
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00Geometric image transformations in the plane of the image
    • G06T3/40Scaling of whole images or parts thereof, e.g. expanding or contracting
    • G06T3/4038Image mosaicing, e.g. composing plane images from plane sub-images
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10004Still image; Photographic image
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20112Image segmentation details
    • G06T2207/20132Image cropping
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07Target detection

Definitions

  • the present application relates to an object detection method, device and storage medium in a fisheye image, belonging to the technical field of image processing.
  • Object detection (e.g., face detection) is widely used in fields such as smart cities, security, media authentication, and banking. With the large-scale growth of computing power and data, object detection methods based on deep learning have gradually become mainstream.
  • compared with conventional images, objects in fisheye images not only have the characteristics of objects in conventional images but can also appear at any angle in the plane.
  • existing object detection methods include: performing object detection on objects in images based on a Single Shot MultiBox Detector (SSD).
  • however, when a single-step detector is applied directly to fisheye images, the accuracy and recall rate of object detection are relatively low, and the angle information of the object cannot be obtained, which hinders further use of the object information to build upper-layer applications such as object recognition and object tracking.
  • the present application provides an object detection method, device and storage medium in a fisheye image, which can solve the problem that the existing object detection model cannot detect the fisheye image.
  • This application provides the following technical solutions:
  • an object detection method in a fisheye image comprising:
  • the fisheye image includes a plurality of object regions with different angles in the plane, and the angle of the object region refers to the angle of the object in the object region relative to the center of the fisheye image;
  • the object detection frame is mapped back to the fisheye image according to the corresponding cropping angle to obtain an object detection result.
  • before performing object detection on the stitched image using the object detection model to obtain the object detection frame, the method further includes:
  • the training data includes a plurality of object images with different sizes and an object labeling frame corresponding to each object image;
  • the neural network structure includes a feature detection network and a single-step detection network; the feature detection network is used for extracting object features, and the single-step detection network is used for determining an object anchor frame based on each object feature;
  • the neural network structure is trained to obtain the object detection model.
  • the feature detection network includes a first-stage feature pyramid and a second-stage feature pyramid;
  • the first-stage feature pyramid is used for bottom-up feature extraction of the input object image to obtain a multi-layer feature map
  • the second-stage feature pyramid is used to extract features from the input feature map from top to bottom, and combine the extracted features with the feature maps of the corresponding layers of the first-stage feature pyramid to obtain a multi-layer feature map.
  • performing sample matching between the multiple object anchor frames and the corresponding object annotation frames to obtain target object anchor frames, including:
  • for each object annotation frame, determining the object anchor frame with the highest intersection-over-union ratio with the object annotation frame as the target object anchor frame matched with that object annotation frame;
  • for each of the first n layers of feature maps, determining the unmatched object anchor frames whose intersection-over-union ratio is greater than a first threshold as target object anchor frames of the corresponding object annotation frames, where n is a positive integer;
  • for each layer of feature maps below the first n layers, doing the same with a second threshold, wherein the first threshold is greater than the second threshold.
  • the anchor frame size of the object anchor frame is determined based on the step size of the feature map to which the object anchor frame belongs relative to the original image, and the feature map is an image output by the feature detection network.
  • the obtaining training data includes:
  • the original object image includes an object annotation frame
  • the augmentation processing includes at least one of the following manners: randomly expanding the original object image; randomly cropping the original object image; randomly cropping the expanded object image; and horizontally flipping the original object image, the randomly expanded object image, and/or the randomly cropped object image.
  • the loss function includes a cross-entropy loss function and a smoothL1 loss function;
  • the cross-entropy loss function is represented by:
  • $L_{cls} = -\left[ y \log f + (1-y)\log(1-f) \right]$
  • where f is the object confidence output by the neural network structure, and y is the category of the object: y = 1 indicates an object and y = 0 indicates a non-object;
  • the smoothL1 loss function is represented by:
  • $\mathrm{smooth}_{L1}(x) = \begin{cases} 0.5x^2, & |x| < 1 \\ |x| - 0.5, & \text{otherwise} \end{cases}$
  • where x is the difference between the target object anchor frame and the corresponding object annotation result.
  • mapping the object detection frame back to the fisheye image according to the corresponding cropping angle to obtain an object detection result may include: screening multiple object detection frames based on a non-maximum suppression algorithm, and mapping the screened object detection frames back to the fisheye image;
  • it may alternatively include: for multiple object detection frames located at an image stitching position of the stitched image, mapping the object detection frame with the largest area back to the fisheye image to obtain the object detection result.
  • the multiple object regions are distributed with the center of the circle as the center point, and performing image cropping on the fisheye image according to multiple cropping angles based on the center of the fisheye image to obtain cropped images includes: determining the upper edge of a cropping area at a preset vertical distance below the center of the circle; obtaining the cropping area based on the upper edge and a preset cropping size; rotating the cropping area with the center of the circle as the center point to obtain rotated cropping areas; and performing image cropping on the fisheye image according to the cropping area and the rotated cropping areas to obtain the cropped images.
  • an object detection device in a fisheye image comprising:
  • the image acquisition module is used to acquire a fisheye image
  • the fisheye image includes a plurality of object areas with different angles in the plane, and the angle of an object area refers to the angle of the object in the object area relative to the center of the fisheye image;
  • an image cropping module configured to perform image cropping on the fisheye image according to a plurality of cropping angles based on the center of the fisheye image to obtain cropped images; the cropping angles include the angles corresponding to the plurality of object regions;
  • an image stitching module for stitching the cropped images to obtain a stitched image
  • an object detection module configured to perform object detection on the spliced image by using an object detection model to obtain an object detection frame
  • the result mapping module is used for mapping the object detection frame back to the fisheye image according to the corresponding cropping angle to obtain the object detection result.
  • a third aspect provides an object detection device in a fisheye image; the device includes a processor and a memory; the memory stores a program, and the program is loaded and executed by the processor to implement the object detection method in a fisheye image according to the first aspect.
  • a fourth aspect provides a computer-readable storage medium; a program is stored in the storage medium, and the program is loaded and executed by a processor to implement the object detection method in a fisheye image according to the first aspect.
  • the beneficial effects of the present application are: a fisheye image is acquired, the fisheye image including a plurality of object regions with different angles in the plane; image cropping is performed on the fisheye image according to a plurality of cropping angles based on the center of the fisheye image to obtain cropped images;
  • the cropping angles include the angles corresponding to the multiple object areas; the cropped images are stitched to obtain a stitched image; object detection is performed on the stitched image using an object detection model to obtain an object detection frame; and the object detection frame is mapped back to the fisheye image according to the corresponding cropping angle to obtain an object detection result. This solves the problem that existing object detection models cannot detect objects in fisheye images: because the objects in the stitched image obtained by stitching the cropped images are oriented in the positive (upright) direction, the object detection model can produce a detection result, and the angle of each object can be obtained from its cropping angle, so that both the object position and the object angle in the fisheye image can be detected.
  • FIG. 1 is a schematic diagram of feature extraction of a feature pyramid network provided by an embodiment of the present application.
  • FIG. 2 is a flowchart of an object detection method in a fisheye image provided by an embodiment of the present application
  • FIG. 3 is a schematic diagram of a fisheye image provided by an embodiment of the present application.
  • FIG. 4 is a schematic diagram of an acquisition process of a cropped image provided by an embodiment of the present application.
  • FIG. 5 is a schematic diagram of a stitched image provided by an embodiment of the present application.
  • FIG. 6 is a flowchart of a training neural network structure provided by an embodiment of the present application.
  • FIG. 7 is a block diagram of an object detection apparatus in a fisheye image provided by an embodiment of the present application.
  • FIG. 8 is a block diagram of an apparatus for detecting objects in a fisheye image provided by an embodiment of the present application.
  • in the Single Shot MultiBox Detector (SSD), "Single Shot" refers to performing target localization and classification in a single forward pass of the network; "MultiBox" refers to the bounding-box regression technique; and "Detector" refers to classifying the detected objects.
  • the original architecture of SSD is based on the VGG-16 architecture, with only some adjustments to VGG-16, such as using auxiliary convolutional layers from the Conv6 layer onward instead of fully connected layers.
  • by using these auxiliary convolutional layers, whose sizes decrease progressively layer by layer, features can be extracted at multiple scales.
  • FPN is short for Feature Pyramid Networks for Object Detection.
  • FPN contains a bottom-up and a top-down path.
  • the bottom-up path is the usual feature extraction process of a convolutional network: going up, the spatial resolution decreases while higher-level structures are detected, so the semantic information of each layer increases.
  • the top-down path reconstructs semantically rich layers at higher resolution. Because the reconstructed layers are semantically strong but have been down-sampled and up-sampled, object localization in them is not very accurate; lateral connections between each reconstructed layer and the corresponding bottom-up feature map are therefore added for more precise localization.
  • the FPN includes a bottom-up path 11 and a top-down path 12.
  • P3, P4, P5, P6 and P7 are feature maps for object detection.
  • in the bottom-up path, features are extracted from the input image layer by layer; in the top-down path, the features of each layer are fused with the feature map of the corresponding bottom-up layer (DownSample denoting the downsampling operation used in the fusion), and the result of the fusion is the fused feature map of the top-down feature pyramid.
  • a 7-layer feature pyramid is taken as an example for illustration.
  • in actual implementation, the feature pyramid may have more or fewer layers, and this application does not limit the number of layers of the feature pyramid.
  • because the features extracted from the first and second layers of the feature pyramid are shallow and do not carry enough semantic information, and because the feature maps of these layers are large and would introduce a larger amount of computation, the features of the first and second layers are not extracted in FIG. 1. In actual implementation, the features of the first and second layers may also be extracted; this application does not limit the feature extraction process.
  • Non-Maximum Suppression (NMS) refers to suppressing elements that are not maxima.
  • the non-maximum suppression algorithm can be understood as a local maximum search.
  • the principle of the non-maximum suppression algorithm is as follows: taking 6 detection boxes (or rectangular boxes) corresponding to the same object as an example, the 6 detection boxes are sorted according to the classification probability of the classifier. Assuming the probabilities from small to large are A, B, C, D, E, and F, the algorithm includes at least the following steps: keep F, the box with the highest probability; compute the overlap (intersection-over-union, IoU) of each remaining box with F and discard the boxes whose overlap exceeds a set threshold; then repeat the process with the highest-probability box among the remaining boxes until no box is left.
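  • As an illustration, the following is a minimal NumPy sketch of the greedy procedure just described (an editorial example, not code from this application); the IoU threshold of 0.5 and the (x1, y1, x2, y2) box format are assumptions made for the example.

```python
import numpy as np

def nms(boxes, scores, iou_threshold=0.5):
    """Greedy non-maximum suppression.

    boxes:  (N, 4) array of [x1, y1, x2, y2]
    scores: (N,) classification probabilities
    Returns the indices of the boxes that are kept.
    """
    x1, y1, x2, y2 = boxes[:, 0], boxes[:, 1], boxes[:, 2], boxes[:, 3]
    areas = (x2 - x1) * (y2 - y1)
    order = scores.argsort()[::-1]  # highest probability first (F in the example)
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)
        # IoU of the kept box with all remaining boxes
        xx1 = np.maximum(x1[i], x1[order[1:]])
        yy1 = np.maximum(y1[i], y1[order[1:]])
        xx2 = np.minimum(x2[i], x2[order[1:]])
        yy2 = np.minimum(y2[i], y2[order[1:]])
        inter = np.clip(xx2 - xx1, 0, None) * np.clip(yy2 - yy1, 0, None)
        iou = inter / (areas[i] + areas[order[1:]] - inter)
        # suppress boxes overlapping the kept box above the threshold
        order = order[1:][iou <= iou_threshold]
    return keep
```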
  • Fisheye image refers to the image captured by the fisheye lens.
  • a fisheye lens is an extreme wide-angle lens whose short-focal-length front element protrudes parabolically toward the front of the lens, similar to a fish's eye. Since a shorter focal length yields a larger viewing angle, the distortion caused by the optical principle is stronger. To achieve a large 360-degree viewing angle, the fisheye image collected by a fisheye lens exhibits barrel distortion; that is, in the fisheye image, except for objects at the center of the picture, objects that should be horizontal or vertical are correspondingly distorted.
  • the present application is described by taking an electronic device as the execution subject of each embodiment.
  • the electronic device may be a terminal, a server, or another device with image processing capability; the terminal may be a mobile phone, a computer, a tablet computer, a video conferencing terminal, etc., and the type of the electronic device is not limited in this embodiment.
  • the application scenarios of the object detection method in the fisheye image provided by this application include but are not limited to the following scenarios:
  • the fisheye image includes multiple face regions corresponding to conference participants, and the object detection method is used to detect faces in the fisheye image;
  • the fisheye image includes vehicle areas corresponding to multiple vehicles, and the object detection method is used to detect vehicles in the fisheye image;
  • the fisheye image includes personnel areas corresponding to multiple persons, and the object detection method is used to detect persons in the fisheye image.
  • the object detection method in the fisheye image proposed in this application can also be used in other scenes.
  • the fisheye image corresponding to the scene has multiple object regions with different angles, and the objects corresponding to the object regions may be people, vehicles, animals, obstacles, etc.; this embodiment does not limit the type of the object or the application scenario of the object detection method.
  • FIG. 2 is a flowchart of an object detection method in a fisheye image provided by an embodiment of the present application. The method includes at least the following steps:
  • Step 201 Obtain a fisheye image, where the fisheye image includes multiple object regions with different angles in the plane.
  • the angle of the object area refers to the angle of the object in the object area with respect to the center of the fisheye image.
  • the angle of the object relative to the circle center refers to the angle between the line connecting the object and the circle center and a coordinate axis of the two-dimensional coordinate system established based on the circle center of the fisheye image.
  • the two-dimensional coordinate system established based on the center of the fisheye image takes the center of the fisheye image as the origin, the horizontal direction of the fisheye image as the x-axis, and the vertical direction of the fisheye image as the y-axis.
  • the angle of the object area is the angle, relative to the x-axis or the y-axis, of the line connecting the center point of the object in the object area and the origin.
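  • For illustration, a small sketch of this angle computation (an editorial example, not taken from the application); it assumes the object is represented by its center point, measures the angle against the x-axis, and flips the sign of y because image coordinates grow downward.

```python
import math

def object_angle(cx, cy, ox, oy):
    """Angle (degrees) of the line from the fisheye-image circle center
    (cx, cy) to an object's center point (ox, oy), measured against the
    x-axis of a coordinate system whose origin is the circle center.
    The y difference is negated because image y-coordinates grow downward.
    """
    return math.degrees(math.atan2(-(oy - cy), ox - cx))
```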
  • the fisheye image includes a plurality of object regions corresponding to conference participants, and at least two object regions have different angles in the plane.
  • Step 202 Perform image cropping on the fisheye image according to multiple cropping angles based on the center of the fisheye image to obtain a cropped image.
  • the center of the fisheye image refers to the point with the smallest pixel distortion in the fisheye image.
  • the cropping angle includes angles corresponding to multiple object regions.
  • the directions of the objects in the obtained cropped images are all positive, i.e., the objects appear upright.
  • the plurality of object regions are distributed around the center of the circle.
  • image cropping is performed on the fisheye image according to multiple cropping angles based on the center of the fisheye image to obtain cropped images as follows: a horizontal line at a preset vertical distance below the center of the circle is taken as the upper edge of the cropping area; the cropping area is obtained based on the upper edge and a preset cropping size; the cropping area is rotated with the center of the circle as the center point to obtain rotated cropping areas; and image cropping is performed on the fisheye image according to the cropping area and the rotated cropping areas to obtain the cropped images.
  • the preset crop size and preset distance can ensure that both the cropped area and the rotated cropped area are located within the fisheye image.
  • the clipping area may be a shape such as a rectangle or a hexagon, and the shape of the clipping area is not limited in this embodiment.
  • the number of preset distances may be more than one and/or the number of preset cropping sizes may be more than one; that is, the electronic device may determine multiple upper edges of cropping areas according to multiple preset distances, where each upper edge corresponds to a cropping area, and/or the electronic device may determine multiple cropping areas according to multiple preset cropping sizes, where each preset cropping size corresponds to a cropping area.
  • a horizontal line is determined below the center of the fisheye image at the preset vertical distance L from the center of the circle, giving the upper edge of the cropping area 41; the preset cropping size and this upper edge define the cropping area 41. The cropping area 41 is then rotated counterclockwise or clockwise about the center of the circle several times to obtain the rotated cropping areas 42, as sketched below.
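  • The following hedged sketch shows one way to realize this cropping step with OpenCV: instead of rotating the cropping area itself, the whole image is rotated by the opposite angle about the circle center and the fixed rectangle below the center is cut out, which is equivalent. The parameter names (L for the preset distance, crop_w/crop_h for the preset crop size) and the horizontal centering of the crop are assumptions for the example.

```python
import cv2

def crop_at_angle(fisheye, cx, cy, angle_deg, L, crop_w, crop_h):
    # Rotating the cropping area by angle_deg about the circle center
    # (cx, cy) is equivalent to rotating the whole image by -angle_deg
    # and cutting the fixed rectangle whose upper edge lies at vertical
    # distance L below the center. The crop is assumed to be horizontally
    # centered below the circle center.
    M = cv2.getRotationMatrix2D((cx, cy), -angle_deg, 1.0)
    h, w = fisheye.shape[:2]
    rotated = cv2.warpAffine(fisheye, M, (w, h))
    x0 = int(cx - crop_w / 2)
    y0 = int(cy + L)
    return rotated[y0:y0 + crop_h, x0:x0 + crop_w]
```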
  • Step 203 stitching the cropped images to obtain a stitched image.
  • since the directions of the objects in the cropped images are all positive, the directions of the objects in the stitched image obtained from the cropped images are also positive.
  • the cropped images of the same fisheye image may correspond to one stitched image or to multiple stitched images; this embodiment does not limit the number of stitched images corresponding to the cropped images of the same fisheye image.
  • stitching the cropped images to obtain a stitched image includes: stitching the cropped images in cropping order; or stitching the cropped images randomly; or stitching the cropped images according to the order of their identification names in a preset dictionary.
  • the identification name of the cropped image is used to uniquely identify the cropped image.
  • the manner in which the electronic device performs image stitching may also be other manners, which are not listed one by one in this embodiment.
  • the cropped images are arranged in an n ⁇ m array in the stitched image, where both n and m are integers greater than or equal to 1.
  • the values of n and m may be fixed values; alternatively, they may be determined based on the number of cropped images.
  • the stitched image includes 4 cropped images, and the 4 cropped images are arranged in a 4-square grid, that is, a 2 ⁇ 2 array.
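  • A minimal sketch of the grid stitching described above, assuming all cropped images share the same size (an illustrative example, not this application's implementation):

```python
import numpy as np

def stitch_grid(crops, n, m):
    """Arrange n*m equally sized cropped images into an n-by-m stitched
    image (e.g. n = m = 2 for the 4-square grid mentioned above)."""
    assert len(crops) == n * m
    rows = [np.concatenate(crops[i * m:(i + 1) * m], axis=1) for i in range(n)]
    return np.concatenate(rows, axis=0)
```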
  • Step 204 using an object detection model to perform object detection on the stitched image to obtain an object detection frame.
  • the object detection model is used to detect objects in the input image, and the detection results are represented by object detection boxes.
  • the object detection model may be a single-shot multi-box detector (SSD); or a neural network model obtained by improving the single-shot multi-box detector; or another object detection model built on a neural network model. This embodiment does not limit the type of the object detection model.
  • the object detection model is obtained by training the preset neural network structure using multiple object images and the object annotation frame corresponding to each object image.
  • the process of training the neural network structure by the electronic device at least includes the following steps:
  • Step 61 Acquire training data, where the training data includes multiple object images with different sizes and an object labeling frame corresponding to each object image.
  • the multiple object images in the training data are obtained by performing augmentation processing based on the original object images.
  • acquiring the training data includes: acquiring an original object image, which includes an object labeling frame; and performing image augmentation processing on the original object image to obtain the object image in the training data.
  • the augmentation processing includes at least one of the following methods: randomly expanding the original object image; randomly cropping the original object image; randomly cropping the expanded object image; and horizontally flipping the original object image, the randomly expanded object image, and/or the randomly cropped object image.
  • the method for random expansion includes: filling a padding area composed of the image mean around the object image, and expanding the padded object image to a preset multiple of the original image (for example, two to four times); this reduces the scale of the object area relative to the entire image and thus increases the proportion of small-sized object samples.
  • because the random expansion is performed around the object image, the object area itself is not deformed, and the coverage of object areas at different positions in the object image is increased.
  • the random cropping method includes: cropping according to a preset aspect ratio on the original object image or the augmented object image.
  • the cropped object image retains only the object annotation boxes whose center points still lie within the cropped object image.
  • the range of the preset aspect ratio may be [0.5, 2], of course, it may also be other ranges, and this embodiment does not limit the value range of the preset aspect ratio.
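  • For illustration, a sketch of the random-expansion augmentation under stated assumptions: boxes is an (N, 4) array in (x1, y1, x2, y2) form, rng is a NumPy Generator (e.g. np.random.default_rng()), and the expansion ratio is drawn from the two-to-four-times range mentioned above.

```python
import numpy as np

def random_expand(image, boxes, rng, max_ratio=4.0):
    """Pad the image into a larger canvas filled with the image mean,
    placing the original at a random offset; annotation boxes are shifted
    accordingly. The 2x-4x expansion range follows the text above."""
    h, w = image.shape[:2]
    ratio = rng.uniform(2.0, max_ratio)
    H, W = int(h * ratio), int(w * ratio)
    canvas = np.full((H, W) + image.shape[2:], image.mean(), dtype=image.dtype)
    top = rng.integers(0, H - h + 1)
    left = rng.integers(0, W - w + 1)
    canvas[top:top + h, left:left + w] = image
    # boxes are assumed to be (x1, y1, x2, y2); shift them by the offset
    return canvas, boxes + np.array([left, top, left, top])
```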
  • the object image is an image with only Y-channel (luminance) pixel values.
  • the electronic device calculates the pixel mean and the pixel standard deviation of the object image and performs a normalization operation on the object image to obtain a preprocessed object image. Since the object image has only Y-channel pixel values, color data augmentation such as color perturbation is unnecessary, which reduces the complexity of model training.
  • the stitched image input to the object detection model is likewise an image with only Y-channel pixel values.
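  • A one-function sketch of the Y-channel normalization described above (illustrative; the epsilon guard against a zero standard deviation is an added assumption):

```python
import numpy as np

def preprocess_y(image):
    """Zero-mean, unit-variance normalization of a Y-channel-only image."""
    y = image.astype(np.float32)
    return (y - y.mean()) / (y.std() + 1e-6)
```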
  • Step 62 Obtain a preset neural network structure; the neural network structure includes a feature detection network and a single-step detection network, the feature detection network is used to extract object features, and the single-step detection network is used to determine an object anchor frame based on each object feature.
  • since the development and deployment of the single-step detection network are simple and its training difficulty is low, using it reduces the deployment difficulty of the object detection model and improves training efficiency.
  • the feature detection network is FPN
  • the FPN includes a first-stage feature pyramid and a second-stage feature pyramid.
  • the first-stage feature pyramid is used to extract features from the input object image from bottom to top to obtain a multi-layer feature map
  • the second-stage feature pyramid is used to extract features from the input feature map from top to bottom, and the The extracted features are combined with the feature maps of the corresponding layers of the first-stage feature pyramid to obtain multi-layer feature maps.
  • the multi-layer feature map output by the feature pyramid of the second stage is used for object detection by the single-step detection network, and the object anchor frame is obtained.
  • although the first-stage pyramid can extract the features in the object image from the bottom up, if the feature map of each layer were used directly for prediction, the prediction results could be inaccurate because the features of the shallow layers are not robust.
  • the FPN, that is, constructing the second-stage pyramid on the basis of the first-stage pyramid so that the low-level features are accumulated with the processed high-level features, can combine the more accurate position information of the shallow layers with the more accurate semantic information of the deep layers for prediction, and the obtained prediction result is more accurate.
  • Step 63 Input the object image into the neural network structure to obtain a plurality of object anchor boxes.
  • the feature detection network outputs multiple layers of feature maps, each of which includes at least one object anchor box.
  • the object anchor box refers to the bounding box determined with each feature point (object feature) as the center.
  • the anchor frame size of the object anchor frame is determined based on the step size of the associated feature map relative to the original image, where the feature map is an image output by the feature detection network.
  • the object anchor frame is an anchor frame with an aspect ratio of 1:1, and the anchor frame sizes are fixed multiples (2 times, among others) of the step size of the feature map relative to the original image. For example: if the step size of the feature map relative to the original image is 8, one of the anchor box sizes is 16.
  • setting anchor frame sizes at dense, equal intervals in this way improves the recall rate of the finally trained object detection model.
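  • The following sketch derives per-layer 1:1 anchor sizes from the feature-map strides. The strides 8-128 (matching the P3-P7 layers mentioned earlier) and the second multiplier 2√2 are assumptions chosen to produce the dense, evenly spaced sizes the text describes; only the 2x multiplier is stated in the source.

```python
import math

def anchor_sizes(strides=(8, 16, 32, 64, 128),
                 multipliers=(2.0, 2.0 * math.sqrt(2))):
    """Per-layer 1:1 anchor sizes derived from each feature map's stride.
    With the assumed 2 and 2*sqrt(2) multipliers, the sizes form a dense
    geometric sequence across layers: 16, 22.6, 32, 45.3, 64, ...
    """
    return {s: [s * m for m in multipliers] for s in strides}
```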
  • Step 64 Perform sample matching between the multiple object anchor frames and the corresponding object annotation frames to obtain target object anchor frames.
  • the object image corresponds to multiple layers of feature maps, and each layer of feature maps includes at least one object anchor frame; here, the object annotation frame corresponding to an object anchor frame refers to the object annotation box that overlaps the object anchor box in the object image corresponding to the feature map to which the object anchor frame belongs.
  • performing sample matching between the multiple object anchor boxes and the corresponding object annotation boxes to obtain target object anchor boxes includes: determining the intersection-over-union (IoU) ratio between each object anchor box and the corresponding object annotation box in each layer of feature maps; for each object annotation frame, determining the object anchor frame with the highest IoU with that annotation frame as the target object anchor frame matched with it; for each of the first n layers of feature maps, comparing the IoU of the object anchor frames on that feature map that were not matched to an object annotation frame with a first threshold, and determining the object anchor frames whose IoU is greater than the first threshold as target object anchor frames of the corresponding object annotation frames; and for each layer of feature maps below the first n layers, comparing the IoU of the unmatched object anchor frames with a second threshold, and determining the object anchor frames whose IoU is greater than the second threshold as target object anchor frames of the corresponding object annotation frames; wherein the first threshold is greater than the second threshold.
  • n is a positive integer; its value may be set according to the network design, and this embodiment does not limit it.
  • since the feature maps output by the shallow layers of the feature pyramid have large resolution and a large number of object anchor boxes and are mainly responsible for detecting small targets, setting a higher positive-sample matching threshold for them improves the precision and recall of the finally trained object detection model; in addition, this reduces low-quality small-scale samples, making the neural network model easier to converge.
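  • A schematic NumPy implementation of this two-step, per-layer matching rule; the concrete thresholds (0.5 and 0.35) and the number of shallow layers are illustrative assumptions, the source only requiring that the first threshold exceed the second.

```python
import numpy as np

def match_anchors(iou, layer_ids, shallow_layers=2, t_high=0.5, t_low=0.35):
    """Per-layer positive-sample matching.

    iou:       (A, G) IoU matrix between A anchors and G annotation boxes
    layer_ids: (A,) index of the feature-map layer each anchor belongs to
    Returns matched[a] = index of the annotation box anchor a is assigned
    to, or -1 if unmatched.
    """
    A, G = iou.shape
    matched = np.full(A, -1, dtype=int)
    # 1) each annotation box claims its single best-IoU anchor
    best_anchor = iou.argmax(axis=0)
    matched[best_anchor] = np.arange(G)
    # 2) remaining anchors match by threshold: stricter on shallow layers
    thresholds = np.where(layer_ids < shallow_layers, t_high, t_low)
    best_gt = iou.argmax(axis=1)
    best_iou = iou.max(axis=1)
    free = matched == -1
    take = free & (best_iou > thresholds)
    matched[take] = best_gt[take]
    return matched
```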
  • Step 65 Determine the difference between the target object anchor frame and the corresponding object annotation result based on the preset loss function.
  • the single-step detection network includes classification and regression branches.
  • the classification and regression branches include a classification and regression branch corresponding to each feature extraction layer in the FPN, and weights are shared among these branches. Since each feature layer corresponds to a different object scale, weight sharing allows similar features to be extracted from object images of different scales, improving the robustness of object detection.
  • the loss function includes the cross-entropy loss function and the smoothL1 loss function.
  • the classification branch is trained with the cross-entropy loss function, and the regression branch is trained with the smoothL1 loss function.
  • the cross-entropy loss function is represented by:
  • $L_{cls} = -\left[ y \log f + (1-y)\log(1-f) \right]$
  • where f is the object confidence output by the neural network structure, and y is the category of the object: y = 1 indicates an object and y = 0 indicates a non-object;
  • the smoothL1 loss function is represented by:
  • $\mathrm{smooth}_{L1}(x) = \begin{cases} 0.5x^2, & |x| < 1 \\ |x| - 0.5, & \text{otherwise} \end{cases}$
  • where x is the difference between the target object anchor box and the corresponding object annotation result.
  • when calculating the difference between the target object anchor frame and the corresponding object annotation result, the electronic device encodes the object annotation result to obtain the regression target of the regression branch; the difference between the output of the regression network (the target object anchor frame) and the encoded regression target is x.
  • Step 66 Train the neural network structure according to the difference between the target object anchor frame and the corresponding object labeling result to obtain an object detection model.
  • the stitched image is input into the object detection model, and the object detection frame of each object is obtained in the stitched image.
  • Step 205 Map the object detection frame back to the fisheye image according to the corresponding cropping angle to obtain the object detection result.
  • the electronic device records the cropping angle of each cropped image in the stitched image to indicate the position of the cropped image in the fisheye image; in this way, after obtaining an object detection frame, the electronic device can rotate the object detection frame according to the cropping angle to map it back to the fisheye image and obtain the object detection result.
  • the following situations may occur during the object detection frame mapping process:
  • Case 1: the same object corresponds to multiple object detection frames. In this case, the multiple object detection frames are screened based on the non-maximum suppression algorithm, and the screened object detection frames are mapped back to the fisheye image.
  • Case 2: multiple object detection frames are located at an image stitching position of the stitched image, that is, an object detection frame spans two cropped images. In this case, among the multiple object detection frames located at the image stitching position of the stitched image, the object detection frame with the largest area is mapped back to the fisheye image to obtain the object detection result.
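  • A geometric sketch of the mapping-back step: the corners of a detection box are rotated about the circle center by the cropping angle and re-enclosed in an axis-aligned box. It assumes the box has already been translated from stitched-image coordinates into the coordinates of the unrotated cropping area within the fisheye image; the exact tile layout of the stitched image is not specified here.

```python
import math

def map_box_back(box, angle_deg, cx, cy):
    """Rotate a detection box back into the fisheye image by the cropping
    angle about the circle center (cx, cy), undoing the crop rotation.
    Returns the axis-aligned bounding box of the rotated corners."""
    a = math.radians(angle_deg)
    cos_a, sin_a = math.cos(a), math.sin(a)
    x1, y1, x2, y2 = box
    corners = []
    for x, y in [(x1, y1), (x2, y1), (x2, y2), (x1, y2)]:
        dx, dy = x - cx, y - cy  # corner relative to the circle center
        corners.append((cx + dx * cos_a - dy * sin_a,
                        cy + dx * sin_a + dy * cos_a))
    xs, ys = zip(*corners)
    return min(xs), min(ys), max(xs), max(ys)
```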
  • in summary, a fisheye image is acquired, the fisheye image including multiple object regions with different angles in the plane; image cropping is performed on the fisheye image according to multiple cropping angles based on the center of the fisheye image to obtain cropped images, the cropping angles including the angles corresponding to the multiple object regions; the cropped images are stitched to obtain a stitched image; object detection is performed on the stitched image using the object detection model to obtain object detection frames; and the object detection frames are mapped back to the fisheye image according to the corresponding cropping angles to obtain the object detection result. This solves the problem that existing object detection models cannot detect objects in fisheye images: because the objects in the stitched image obtained by stitching the cropped images are oriented in the positive direction, the object detection model can produce a detection result, and the angle of each object can be obtained from its cropping angle, so that both the object position and the object angle in the fisheye image are detected.
  • the cropping method provided in this embodiment ensures that the obtained cropped images include an upright image of each object in the fisheye image; in this way, the angle of the object does not need to be adjusted during object detection, which reduces the difficulty of object detection.
  • FIG. 7 is a block diagram of an object detection apparatus in a fisheye image provided by an embodiment of the present application.
  • the apparatus at least includes the following modules: an image acquisition module 710 , an image cropping module 720 , an image stitching module 730 , an object detection module 740 and a result mapping module 750 .
  • an image acquisition module 710 configured to acquire a fisheye image, where the fisheye image includes a plurality of object regions with different angles in the plane;
  • An image cropping module 720 configured to perform image cropping on the fisheye image according to the center of the fisheye image and according to multiple cropping angles to obtain a cropped image; the cropping angles include angles corresponding to the plurality of object regions;
  • an image stitching module 730 configured to stitch the cropped images to obtain a stitched image
  • an object detection module 740 configured to use an object detection model to perform object detection on the stitched image to obtain an object detection frame
  • the result mapping module 750 is configured to map the object detection frame back to the fisheye image according to the corresponding cropping angle to obtain an object detection result.
  • when the object detection device in the fisheye image provided by the above embodiment performs object detection in a fisheye image, the division of the above functional modules is only used as an example for illustration; in practical applications, the above functions may be allocated to different functional modules as needed, that is, the internal structure of the object detection device in the fisheye image may be divided into different functional modules to complete all or part of the functions described above.
  • the apparatus for detecting objects in fisheye images provided by the above embodiments and the method embodiments for detecting objects in fisheye images belong to the same concept, and the specific implementation process is detailed in the method embodiments, which will not be repeated here.
  • FIG. 8 is a block diagram of an object detection device in a fisheye image provided by an embodiment of the present application.
  • the device may be a smartphone, a tablet computer, a notebook computer, a desktop computer, a server, or the like; the device type of the object detection device is not limited in this embodiment.
  • the apparatus includes at least a processor 801 and a memory 802 .
  • the processor 801 may include one or more processing cores, such as a 4-core processor, an 8-core processor, and the like.
  • the processor 801 may be implemented in at least one hardware form of DSP (Digital Signal Processing), FPGA (Field-Programmable Gate Array), and PLA (Programmable Logic Array).
  • the processor 801 may also include a main processor and a coprocessor.
  • the main processor is a processor used to process data in the awake state, also called a CPU (Central Processing Unit); the coprocessor is a low-power processor used to process data in the standby state.
  • the processor 801 may be integrated with a GPU (Graphics Processing Unit), and the GPU is used for rendering and drawing the content that needs to be displayed on the display screen.
  • the processor 801 may further include an AI (Artificial Intelligence) processor, where the AI processor is used to process computing operations related to machine learning.
  • Memory 802 may include one or more computer-readable storage media, which may be non-transitory. Memory 802 may also include high-speed random access memory and non-volatile memory, such as one or more disk storage devices or flash storage devices. In some embodiments, a non-transitory computer-readable storage medium in the memory 802 stores at least one instruction, and the at least one instruction is executed by the processor 801 to implement the object detection method in a fisheye image provided by the method embodiments of this application.
  • the object detection apparatus in the fisheye image may optionally further include: a peripheral device interface and at least one peripheral device.
  • the processor 801, the memory 802 and the peripheral device interface can be connected through a bus or a signal line.
  • Each peripheral device can be connected to the peripheral device interface through bus, signal line or circuit board.
  • peripheral devices include, but are not limited to, radio frequency circuits, touch display screens, audio circuits, and power supplies.
  • the object detection apparatus in the fisheye image may further include fewer or more components, which is not limited in this embodiment.
  • the present application further provides a computer-readable storage medium, where a program is stored in the computer-readable storage medium, and the program is loaded and executed by a processor to implement the object detection method in a fisheye image of the above method embodiments.
  • the present application further provides a computer product, the computer product including a computer-readable storage medium in which a program is stored, the program being loaded and executed by a processor to implement the object detection method in a fisheye image of the above method embodiments.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Image Analysis (AREA)

Abstract

A method and apparatus for detecting an object in a fisheye image, and a storage medium, which belong to the technical field of image processing. The method comprises: acquiring a fisheye image, wherein the fisheye image comprises a plurality of object regions at different angles in a plane, and the angle of the object region refers to an angle of an object in the object region with respect to the circle center of the fisheye image; performing image cropping on the fisheye image on the basis of the circle center of the fisheye image and according to a plurality of cropping angles, so as to obtain cropped images, wherein the cropping angles comprise angles corresponding to the plurality of object regions; splicing the cropped images to obtain a spliced image; performing object detection on the spliced image by using an object detection model, so as to obtain an object detection box; and mapping the object detection box back to the fisheye image according to the corresponding cropping angle, so as to obtain an object detection result. The problem whereby an existing object detection model cannot perform detection on a fisheye image can be solved, and the detection of an object position and an object angle in a fisheye image can thus be realized.

Description

Object detection method, device and storage medium in fisheye image
This application claims priority to the prior Chinese patent application No. CN202010603240.6, filed with the China National Intellectual Property Administration on June 29, 2020, the contents of which are incorporated into this application in their entirety by reference.
Technical Field
The present application relates to an object detection method, device and storage medium in a fisheye image, belonging to the technical field of image processing.
Background Art
Object detection (e.g., face detection) is widely used in fields such as smart cities, security, media authentication, and banking. With the large-scale growth of computing power and data, object detection methods based on deep learning have gradually become mainstream. Compared with conventional images, objects in fisheye images not only have the characteristics of objects in conventional images but can also appear at any angle in the plane.
Existing object detection methods include: performing object detection on objects in images based on a Single Shot MultiBox Detector (SSD).
However, when a single-step detector is applied directly to fisheye images, the accuracy and recall rate of object detection are relatively low, and the angle information of the object cannot be obtained, which hinders further use of the object information to build upper-layer applications such as object recognition and object tracking.
SUMMARY OF THE INVENTION
The present application provides an object detection method, device and storage medium in a fisheye image, which can solve the problem that existing object detection models cannot detect objects in fisheye images. This application provides the following technical solutions:
In a first aspect, an object detection method in a fisheye image is provided, the method comprising:
acquiring a fisheye image, where the fisheye image includes a plurality of object regions with different angles in the plane, and the angle of an object region refers to the angle of the object in the object region relative to the center of the fisheye image;
performing image cropping on the fisheye image according to a plurality of cropping angles based on the center of the fisheye image to obtain cropped images, the cropping angles including the angles corresponding to the plurality of object regions;
stitching the cropped images to obtain a stitched image;
performing object detection on the stitched image using an object detection model to obtain an object detection frame;
mapping the object detection frame back to the fisheye image according to the corresponding cropping angle to obtain an object detection result.
Optionally, before performing object detection on the stitched image using the object detection model to obtain the object detection frame, the method further includes:
acquiring training data, the training data including a plurality of object images with different sizes and an object annotation frame corresponding to each object image;
acquiring a preset neural network structure, the neural network structure including a feature detection network and a single-step detection network, the feature detection network being used for extracting object features and the single-step detection network being used for determining an object anchor frame based on each object feature;
inputting the object images into the neural network structure to obtain a plurality of object anchor frames;
performing sample matching between the plurality of object anchor frames and the corresponding object annotation frames to obtain target object anchor frames;
determining the difference between each target object anchor frame and the corresponding object annotation result based on a preset loss function;
training the neural network structure according to the difference between the target object anchor frames and the corresponding object annotation results to obtain the object detection model.
Optionally, the feature detection network includes a first-stage feature pyramid and a second-stage feature pyramid;
the first-stage feature pyramid is used for bottom-up feature extraction of the input object image to obtain multi-layer feature maps;
the second-stage feature pyramid is used for top-down feature extraction of the input feature maps, and combines the extracted features with the feature maps of the corresponding layers of the first-stage feature pyramid to obtain multi-layer feature maps.
Optionally, performing sample matching between the plurality of object anchor frames and the corresponding object annotation frames to obtain the target object anchor frames includes:
determining the intersection-over-union ratio between each object anchor frame and the corresponding object annotation frame in each layer of feature maps;
for each object annotation frame, determining the object anchor frame with the highest intersection-over-union ratio with the object annotation frame as the target object anchor frame matched with the object annotation frame;
for each of the first n layers of feature maps, comparing the intersection-over-union ratios of the object anchor frames on the feature map that are not matched to an object annotation frame with a first threshold, and determining the object anchor frames whose intersection-over-union ratio is greater than the first threshold as target object anchor frames of the corresponding object annotation frames, n being a positive integer;
for each layer of feature maps below the first n layers of feature maps, comparing the intersection-over-union ratios of the object anchor frames on the feature map that are not matched to an object annotation frame with a second threshold, and determining the object anchor frames whose intersection-over-union ratio is greater than the second threshold as target object anchor frames of the corresponding object annotation frames;
wherein the first threshold is greater than the second threshold.
Optionally, the anchor frame size of an object anchor frame is determined based on the step size, relative to the original image, of the feature map to which the object anchor frame belongs, the feature map being an image output by the feature detection network.
Optionally, acquiring the training data includes:
acquiring an original object image, the original object image including an object annotation frame;
performing image augmentation processing on the original object image to obtain the training data;
wherein the augmentation processing includes at least one of the following: randomly expanding the original object image; randomly cropping the original object image; randomly cropping the expanded object image; and horizontally flipping the original object image, the randomly expanded object image, and/or the randomly cropped object image.
Optionally, the loss function includes a cross-entropy loss function and a smoothL1 loss function;
the cross-entropy loss function is represented by:
$L_{cls} = -\left[ y \log f + (1-y)\log(1-f) \right]$
where f is the object confidence output by the neural network structure, and y is the category of the object: y = 1 indicates an object and y = 0 indicates a non-object;
the smoothL1 loss function is represented by:
$\mathrm{smooth}_{L1}(x) = \begin{cases} 0.5x^2, & |x| < 1 \\ |x| - 0.5, & \text{otherwise} \end{cases}$
where x is the difference between the target object anchor frame and the corresponding object annotation result.
Optionally, mapping the object detection frame back to the fisheye image according to the corresponding cropping angle to obtain the object detection result includes:
screening a plurality of object detection frames based on a non-maximum suppression algorithm;
mapping the screened object detection frames back to the fisheye image.
Optionally, mapping the object detection frame back to the fisheye image according to the corresponding cropping angle to obtain the object detection result includes:
for a plurality of object detection frames located at an image stitching position of the stitched image, mapping the object detection frame with the largest area back to the fisheye image to obtain the object detection result.
Optionally, the plurality of object regions are distributed with the center of the circle as the center point, and performing image cropping on the fisheye image according to the plurality of cropping angles based on the center of the fisheye image to obtain the cropped images includes:
determining the upper edge of a cropping area at a preset vertical distance below the center of the circle;
obtaining the cropping area based on the upper edge and a preset cropping size;
rotating the cropping area with the center of the circle as the center point to obtain a rotated cropping area;
performing image cropping on the fisheye image according to the cropping area and the rotated cropping area to obtain the cropped images.
In a second aspect, an apparatus for detecting an object in a fisheye image is provided. The apparatus includes:
an image acquisition module, configured to acquire a fisheye image, where the fisheye image includes a plurality of object regions with different in-plane angles, and the angle of an object region refers to the angle of the object in the object region relative to the circle center of the fisheye image;
an image cropping module, configured to perform image cropping on the fisheye image according to a plurality of cropping angles based on the circle center of the fisheye image to obtain cropped images, where the cropping angles include the angles corresponding to the plurality of object regions;
an image stitching module, configured to stitch the cropped images to obtain a stitched image;
an object detection module, configured to perform object detection on the stitched image by using an object detection model to obtain object detection frames; and
a result mapping module, configured to map the object detection frames back to the fisheye image according to the corresponding cropping angles to obtain an object detection result.
In a third aspect, an apparatus for detecting an object in a fisheye image is provided. The apparatus includes a processor and a memory; the memory stores a program, and the program is loaded and executed by the processor to implement the method for detecting an object in a fisheye image according to the first aspect.
In a fourth aspect, a computer-readable storage medium is provided. The storage medium stores a program, and the program is loaded and executed by a processor to implement the method for detecting an object in a fisheye image according to the first aspect.
The beneficial effects of the present application are as follows. A fisheye image is acquired, where the fisheye image includes a plurality of object regions with different in-plane angles; image cropping is performed on the fisheye image according to a plurality of cropping angles based on the circle center of the fisheye image to obtain cropped images, where the cropping angles include the angles corresponding to the plurality of object regions; the cropped images are stitched to obtain a stitched image; object detection is performed on the stitched image by using an object detection model to obtain object detection frames; and the object detection frames are mapped back to the fisheye image according to the corresponding cropping angles to obtain an object detection result. This solves the problem that existing object detection models cannot detect objects in fisheye images. Because the objects in the stitched image obtained by stitching the cropped images are upright, the object detection result can be obtained by the object detection model and the angle of each object can be obtained from its cropping angle, so that both the position and the angle of an object in the fisheye image can be detected.
The above description is only an overview of the technical solutions of the present application. In order to understand the technical means of the present application more clearly and implement them in accordance with the contents of the specification, preferred embodiments of the present application are described in detail below with reference to the accompanying drawings.
Description of the Drawings
FIG. 1 is a schematic diagram of feature extraction of a feature pyramid network provided by an embodiment of the present application;
FIG. 2 is a flowchart of a method for detecting an object in a fisheye image provided by an embodiment of the present application;
FIG. 3 is a schematic diagram of a fisheye image provided by an embodiment of the present application;
FIG. 4 is a schematic diagram of a process of obtaining cropped images provided by an embodiment of the present application;
FIG. 5 is a schematic diagram of a stitched image provided by an embodiment of the present application;
FIG. 6 is a flowchart of training a neural network structure provided by an embodiment of the present application;
FIG. 7 is a block diagram of an apparatus for detecting an object in a fisheye image provided by an embodiment of the present application;
FIG. 8 is a block diagram of an apparatus for detecting an object in a fisheye image provided by an embodiment of the present application.
Detailed Description
Specific implementations of the present application are described in further detail below with reference to the accompanying drawings and embodiments. The following embodiments are intended to illustrate the present application but not to limit its scope.
First, several terms involved in the present application are introduced:
Single Shot MultiBox Detector (SSD): "single shot" means that object localization and classification are completed in a single forward pass of the network; "MultiBox" refers to the bounding-box regression technique; the detector classifies the detected objects.
The original SSD architecture is built on the VGG-16 architecture with some adjustments, for example, auxiliary convolutional layers are used above the Conv6 layer instead of fully connected layers. The auxiliary convolutional layers allow features to be extracted at multiple scales while gradually reducing the size of each subsequent layer.
Feature Pyramid Networks for Object Detection (FPN): a feature extractor designed based on the concept of a feature pyramid.
An FPN contains a bottom-up path and a top-down path. The bottom-up path is the usual feature-extraction process of a convolutional network: going up, the spatial resolution decreases, and as higher-level structures are detected, the semantic information of each layer increases. The top-down path reconstructs a semantically rich layer into a layer of higher resolution. Because the reconstructed layers carry strong semantic information but have been down-sampled and up-sampled, object localization from them alone is not very accurate; the reconstructed layers are therefore laterally connected to the corresponding feature maps to obtain more precise localization.
Refer to the FPN shown in FIG. 1, where the FPN includes a bottom-up path 11 and a top-down path 12. P3, P4, P5, P6 and P7 are the feature maps used for object detection. The bottom-up feature extraction process is:

P'_l = Conv(P_l) + upsample(P'_{l+1})

where P_l is the feature map of the l-th layer of the feature pyramid (3 ≤ l ≤ 7), P'_l is the fused feature map, Conv is a 1×1 convolution operation, and upsample is bilinear-interpolation up-sampling.
The top-down feature extraction process is:

P''_l = DownSample(P''_{l-1}) + P'_l

where DownSample is down-sampling and P''_l is the fused feature map of the top-down feature pyramid.
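As an illustrative sketch only, not the patented implementation, the two fusion formulas above can be written in PyTorch as follows; the use of 1×1 lateral convolution modules, max-pooling as the down-sampling operation, and exact factor-of-two level sizes are assumptions for illustration:

import torch.nn.functional as F

def fuse_with_upsample(feats, lateral_convs):
    # feats[i]: feature map of pyramid layer l = 3 + i (P3 ... P7)
    # lateral_convs[i]: an assumed 1x1 torch.nn.Conv2d lateral connection
    fused = [None] * len(feats)
    fused[-1] = lateral_convs[-1](feats[-1])  # topmost layer: nothing above it
    for i in range(len(feats) - 2, -1, -1):
        up = F.interpolate(fused[i + 1], size=feats[i].shape[-2:],
                           mode="bilinear", align_corners=False)
        # P'_l = Conv(P_l) + upsample(P'_{l+1})
        fused[i] = lateral_convs[i](feats[i]) + up
    return fused

def fuse_with_downsample(fused):
    # Second pass: P''_l = DownSample(P''_{l-1}) + P'_l
    out = [fused[0]]
    for i in range(1, len(fused)):
        out.append(F.max_pool2d(out[-1], kernel_size=2) + fused[i])
    return out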
It should be added that FIG. 1 takes a seven-layer feature pyramid as an example; in actual implementations the feature pyramid may have more or fewer layers, and the present application does not limit the number of layers. In addition, because the features extracted at the first and second layers of the pyramid are shallow and do not carry enough semantic information, and because their feature maps are large and would introduce a larger amount of computation, the features of the first and second layers are not extracted in FIG. 1. In actual implementations these features may also be extracted; the present application does not limit the feature extraction process.
Non-Maximum Suppression (NMS): suppressing elements that are not local maxima. The non-maximum suppression algorithm can be understood as a local maximum search.
The principle of the algorithm, taking six detection frames (rectangular frames) corresponding to the same object as an example, is to sort the six frames by the classification probability of the classifier. Assuming the probabilities in ascending order are A, B, C, D, E and F, the algorithm includes at least the following steps:
1. Starting from the frame F with the maximum probability, determine whether the overlap of each of A to E with F exceeds a set threshold.
2. If the overlaps of B and D with F exceed the threshold, delete B and D, and mark and retain the first frame F.
3. From the remaining frames A, C and E, select E, the one with the highest probability.
4. Determine the overlaps of A and C with E, delete the frames whose overlap exceeds the threshold, and mark and retain frame E. Repeat this cycle until all detection frames have been traversed.
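The following is a minimal NumPy sketch of this procedure, measuring overlap as intersection-over-union; the threshold value and the [x1, y1, x2, y2] frame format are assumptions:

import numpy as np

def nms(boxes, scores, iou_thresh=0.5):
    # boxes: (N, 4) array of [x1, y1, x2, y2]; scores: (N,) probabilities
    order = scores.argsort()[::-1]  # start from the highest-probability frame
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)
        # intersection of the retained frame with the remaining frames
        xx1 = np.maximum(boxes[i, 0], boxes[order[1:], 0])
        yy1 = np.maximum(boxes[i, 1], boxes[order[1:], 1])
        xx2 = np.minimum(boxes[i, 2], boxes[order[1:], 2])
        yy2 = np.minimum(boxes[i, 3], boxes[order[1:], 3])
        inter = np.clip(xx2 - xx1, 0, None) * np.clip(yy2 - yy1, 0, None)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        areas = (boxes[order[1:], 2] - boxes[order[1:], 0]) * \
                (boxes[order[1:], 3] - boxes[order[1:], 1])
        iou = inter / (area_i + areas - inter)
        order = order[1:][iou <= iou_thresh]  # drop frames that overlap too much
    return keep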
Fisheye image: an image captured by a fisheye lens. A fisheye lens is an extreme wide-angle lens whose front element has a short diameter and bulges parabolically toward the front of the lens, similar to a fish's eye. The shorter the focal length, the larger the viewing angle and the stronger the distortion produced by the optics. To achieve an ultra-wide viewing angle of 360 degrees, the fisheye image captured by a fisheye lens exhibits barrel distortion; that is, except for objects at the center of the picture, objects that should be horizontal or vertical are correspondingly distorted.
Optionally, the present application is described by taking an electronic device as the execution subject of each embodiment. The electronic device may be a terminal, a server, or another device with image processing capabilities, and the terminal may be a mobile phone, a computer, a tablet computer, a video conference terminal, or the like; this embodiment does not limit the type of the electronic device.
Optionally, the application scenarios of the method for detecting an object in a fisheye image provided by the present application include but are not limited to the following:
1. Video conference scenario: the fisheye image includes face regions corresponding to a plurality of participants, and the object detection method is used to detect faces in the fisheye image.
2. Vehicle monitoring scenario: the fisheye image includes vehicle regions corresponding to a plurality of vehicles, and the object detection method is used to detect vehicles in the fisheye image.
3. Personnel monitoring scenario: the fisheye image includes person regions corresponding to a plurality of persons, and the object detection method is used to detect persons in the fisheye image.
Of course, the method for detecting an object in a fisheye image proposed by the present application may also be used in other scenarios in which the fisheye image has a plurality of object regions with different angles; the objects corresponding to the object regions may be persons, vehicles, animals, obstacles, and so on. This embodiment does not limit the type of the objects or the application scenario of the object detection method.
FIG. 2 is a flowchart of a method for detecting an object in a fisheye image provided by an embodiment of the present application. The method includes at least the following steps.
Step 201: acquire a fisheye image, where the fisheye image includes a plurality of object regions with different in-plane angles.
The angle of an object region refers to the angle of the object in the object region relative to the circle center of the fisheye image, that is, the angle between either coordinate axis and the line connecting the object to the circle center, in a two-dimensional coordinate system established at the circle center. For example, if the coordinate system takes the circle center of the fisheye image as the origin, the horizontal direction of the image as the x-axis, and the vertical direction as the y-axis, the angle of an object region is the angle between the x-axis (or the y-axis) and the line connecting the center point of the object in the region to the origin.
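As a small illustrative sketch, assuming the x-axis convention (the text allows either axis) and the usual image coordinates in which y grows downward, the angle of an object region can be computed as:

import math

def object_angle_deg(center, obj):
    # center: (cx, cy) circle center of the fisheye image
    # obj: (ox, oy) center point of the object region
    # Returns the angle of the center-to-object line against the x-axis.
    return math.degrees(math.atan2(obj[1] - center[1], obj[0] - center[0]))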
Taking a video conference scenario as an example, refer to the fisheye image shown in FIG. 3. The fisheye image includes object regions corresponding to a plurality of participants, and at least two of the object regions have different in-plane angles.
Step 202: perform image cropping on the fisheye image according to a plurality of cropping angles based on the circle center of the fisheye image to obtain cropped images.
The circle center of the fisheye image refers to the point with the smallest pixel distortion in the fisheye image.
The cropping angles include the angles corresponding to the plurality of object regions. In this embodiment, because the fisheye image is cropped according to cropping angles that include the angles of the object regions, the objects in the obtained cropped images are all upright.
In one example, the plurality of object regions are distributed around the circle center as the center point. In this case, the image cropping includes: determining the upper edge of a cropping region at a preset vertical distance below the circle center; obtaining the cropping region based on the upper edge and a preset cropping size; rotating the cropping region around the circle center as the center point to obtain rotated cropping regions; and performing image cropping on the fisheye image according to the cropping region and the rotated cropping regions to obtain the cropped images.
When the cropping region is rotated multiple times around the circle center, the rotation angles of two adjacent rotations may be the same or different.
The preset cropping size and the preset distance ensure that the cropping region and the rotated cropping regions are all located within the fisheye image. The cropping region may be a rectangle, a hexagon, or another shape; this embodiment does not limit the shape of the cropping region.
Optionally, there may be a plurality of preset distances and/or a plurality of preset cropping sizes. That is, the electronic device may determine a plurality of upper edges of cropping regions according to the plurality of preset distances, each upper edge corresponding to one cropping region; and/or the electronic device may determine a plurality of cropping regions according to the plurality of preset cropping sizes, each preset cropping size corresponding to one cropping region.
Referring to the process of obtaining cropped images shown in FIG. 4, a horizontal line is determined below the circle center of the fisheye image at the preset vertical distance L from the circle center, giving the upper edge of the cropping region 41; the cropping region 41 is determined from the preset cropping size and this upper edge. The cropping region 41 is then rotated counterclockwise or clockwise around the circle center multiple times to obtain the rotated cropping regions 42.
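A minimal OpenCV sketch of this cropping process, assuming a rectangular cropping region of width crop_w and height crop_h, and rotating the whole image instead of the region (equivalent up to the sign of the angle):

import cv2

def rotated_crops(img, cx, cy, dist, crop_w, crop_h, angles):
    # dist: preset vertical distance L from the circle center to the upper
    # edge of the base cropping region; angles: cropping angles in degrees.
    h, w = img.shape[:2]
    top, left = int(cy + dist), int(cx - crop_w / 2)
    crops = []
    for a in angles:
        m = cv2.getRotationMatrix2D((cx, cy), a, 1.0)  # rotate about the center
        rotated = cv2.warpAffine(img, m, (w, h))
        crops.append(rotated[top:top + crop_h, left:left + crop_w])
    return crops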
Because the plurality of object regions are distributed around the circle center and have different angles relative to it, and an object located below the circle center is usually upright, a cropped image obtained by cropping with the region below the circle center contains upright objects. And because the object regions are distributed around the circle center, the cropped images obtained by subsequently rotating this cropping region around the circle center also contain upright objects. The cropping manner provided by this embodiment therefore ensures that the cropped images include an upright image of every object in the fisheye image, so that the angle of an object does not need to be adjusted during object detection, which reduces the difficulty of object detection.
Step 203: stitch the cropped images to obtain a stitched image.
Since the objects in the cropped images are all upright, the objects in the stitched image obtained from the cropped images are also upright.
Optionally, the cropped images from one fisheye image may correspond to a single stitched image or to multiple stitched images; this embodiment does not limit the number of stitched images corresponding to the cropped images of one fisheye image.
Optionally, the stitching may be performed in the cropping order; or the cropped images may be stitched randomly; or the cropped images may be stitched in the order of their identification names in a preset dictionary, where the identification name uniquely identifies a cropped image. Of course, the electronic device may also stitch the images in other manners, which are not listed one by one in this embodiment.
Optionally, the cropped images are arranged in the stitched image as an n×m array, where n and m are integers greater than or equal to 1. The values of n and m may be fixed, or may be determined based on the number of cropped images.
Referring to the stitched image shown in FIG. 5, the stitched image includes four cropped images arranged as a four-cell grid, that is, a 2×2 array.
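A minimal NumPy sketch of arranging cropped images into an n×m stitched image; the requirement that all crops be the same size is an assumption for this sketch:

import numpy as np

def stitch(crops, n, m):
    # crops: list of n*m cropped images with identical shapes
    rows = [np.hstack(crops[i * m:(i + 1) * m]) for i in range(n)]
    return np.vstack(rows)

# e.g. the 2x2 grid of FIG. 5: stitched = stitch(crops, 2, 2)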
Step 204: perform object detection on the stitched image by using an object detection model to obtain object detection frames.
The object detection model detects objects in the input image, and the detection results are represented by object detection frames.
Optionally, the object detection model may be a single shot multibox detector, a neural network model obtained by improving a single shot multibox detector, or another object detection model built on a neural network model; this embodiment does not limit the type of the object detection model.
The object detection model is obtained by training a preset neural network structure using a plurality of object images and the object annotation frame corresponding to each object image.
Referring to FIG. 6, the process in which the electronic device trains the neural network structure includes at least the following steps.
Step 61: acquire training data, where the training data includes a plurality of object images of different sizes and the object annotation frame corresponding to each object image.
Optionally, the plurality of object images in the training data are obtained by augmenting original object images. In this case, acquiring the training data includes: acquiring an original object image that carries an object annotation frame, and performing image augmentation processing on the original object image to obtain the object images in the training data. The augmentation processing includes at least one of the following: randomly expanding the original object image; randomly cropping the original object image; randomly cropping the expanded object image; and horizontally flipping the original object image, the randomly expanded object image, and/or the randomly cropped object image.
Optionally, the random expansion is performed as follows: a filling region composed of the image mean is padded around the object image, and the padded object image is expanded to a preset multiple of the original image (for example, two to four times); the proportion of an object region relative to the whole image is thereby reduced, which increases the proportion of small-sized object regions. Alternatively, the image is randomly expanded around the object image while keeping the proportion of the object region relative to the whole image unchanged, which increases the coverage of object regions at different positions in the object image.
Optionally, the random cropping is performed on the original object image or the expanded object image according to a preset aspect ratio. The cropped object image retains the object annotation frames whose center points are still within the cropped image. The range of the preset aspect ratio may be [0.5, 2] or another range; this embodiment does not limit the value range of the preset aspect ratio. Random cropping increases the number of large-sized object annotation frames on the one hand, and enriches the position distribution of object regions in the object images on the other.
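The two augmentation operations can be sketched as follows; the sampling ranges are taken from the text where given, and the crop-height range is an assumption:

import random
import numpy as np

def random_expand(img, ratio_range=(2.0, 4.0)):
    # Pad with the per-channel image mean so the padded image is a preset
    # multiple ("two to four times") of the original; object regions then
    # occupy a smaller fraction of the image.
    h, w, c = img.shape
    r = random.uniform(*ratio_range)
    canvas = np.full((int(h * r), int(w * r), c), img.mean(axis=(0, 1)),
                     dtype=img.dtype)
    top = random.randint(0, canvas.shape[0] - h)
    left = random.randint(0, canvas.shape[1] - w)
    canvas[top:top + h, left:left + w] = img
    return canvas

def random_crop(img, aspect_range=(0.5, 2.0)):
    # Crop a window whose aspect ratio is drawn from the preset range;
    # the caller keeps only annotation frames whose centers stay inside.
    h, w = img.shape[:2]
    ch = random.randint(h // 2, h)  # assumed height range
    cw = min(w, max(1, int(ch * random.uniform(*aspect_range))))
    top = random.randint(0, h - ch)
    left = random.randint(0, w - cw)
    return img[top:top + ch, left:left + cw]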
Optionally, the object image is an image having only y-channel pixel values. In this case, after acquiring the object image, the electronic device calculates the pixel mean and pixel standard deviation of the object image and normalizes the image to obtain a preprocessed object image. Because the object image has only y-channel pixel values, color-related data augmentation such as color jittering is unnecessary, which reduces the complexity of model training. In this case, the stitched image input to the object detection model is also an image having only y-channel pixel values.
Step 62: acquire a preset neural network structure, where the neural network structure includes a feature detection network and a single-step detection network; the feature detection network extracts object features, and the single-step detection network determines object anchor frames based on each object feature.
In this embodiment, because a single-step detection network is simple to develop and deploy and has a low training difficulty, using a single-step detection network for object detection reduces the deployment difficulty of the object detection model and improves training efficiency.
Optionally, the feature detection network is an FPN, and the FPN includes a first-stage feature pyramid and a second-stage feature pyramid. The first-stage feature pyramid extracts features from the input object image bottom-up to obtain multi-layer feature maps; the second-stage feature pyramid extracts features top-down from the input feature maps and combines the extracted features with the feature maps of the corresponding layers of the first-stage feature pyramid to obtain multi-layer feature maps.
The multi-layer feature maps output by the second-stage feature pyramid are used by the single-step detection network for object detection to obtain object anchor frames.
Because the first-stage pyramid extracts the features of the object image bottom-up, if the feature map of each layer were used directly for prediction, the prediction results could be inaccurate, since shallow features are not robust. In this embodiment, by using an FPN, that is, by building the second-stage pyramid on the basis of the first-stage pyramid so that low-layer features and processed high-layer features are accumulated, the more accurate position information of the shallow layers can be combined with the more accurate feature information of the deep layers for prediction, and the prediction results are more accurate.
Step 63: input the object images into the neural network structure to obtain a plurality of object anchor frames.
In one example, the feature detection network outputs multi-layer feature maps, and each feature map includes at least one object anchor frame.
An object anchor frame is a bounding box determined with each feature point (object feature) as its center. Optionally, the anchor frame size of an object anchor frame is determined based on the stride, relative to the original image, of the feature map to which it belongs, where the feature map is an image output by the feature detection network. Illustratively, the object anchor frame has an aspect ratio of 1:1, and the anchor frame size is 2 times and/or 2√2 times the stride of the feature map relative to the original image. For example, if the stride of the feature map relative to the original image is 8, the anchor frame sizes are 16 and 16√2.
In this embodiment, setting densely and geometrically spaced anchor frame sizes improves the recall rate of the finally trained object detection model.
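Under the reading above (factors 2 and 2√2 of the stride, which is a reconstruction from the worked example), the per-level anchor sizes can be derived as in this sketch:

import math

def anchor_sizes(stride):
    # Geometrically spaced (ratio sqrt(2)) anchor sizes for a feature map
    # whose stride relative to the original image is `stride`.
    return [2 * stride, 2 * math.sqrt(2) * stride]

# e.g. anchor_sizes(8) -> [16, 22.62...]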
Step 64: perform sample matching between the plurality of object anchor frames and the corresponding object annotation frames to obtain target object anchor frames.
Each object image corresponds to multi-layer feature maps, and each layer of feature map includes at least one object anchor frame. The object annotation frame corresponding to an object anchor frame refers to the object annotation frame, in the object image corresponding to the feature map to which the anchor frame belongs, that overlaps with the anchor frame.
In one example, the sample matching includes: determining the intersection-over-union (IoU) between each object anchor frame in each layer of feature map and the corresponding object annotation frame; for each object annotation frame, determining the object anchor frame with the highest IoU with the annotation frame as the target object anchor frame matching that annotation frame; for each of the first n layers of feature maps, comparing the IoU of the anchor frames on the feature map that are not matched to an annotation frame with a first threshold, and determining the anchor frames whose IoU exceeds the first threshold as target object anchor frames of the corresponding annotation frames; and for each layer of feature map below the first n layers, comparing the IoU of the unmatched anchor frames with a second threshold, and determining the anchor frames whose IoU exceeds the second threshold as target object anchor frames of the corresponding annotation frames. The first threshold is greater than the second threshold, and n is a positive integer; the value of n may be 1 or another value, and this embodiment does not limit the value of n.
Because the feature maps output by the shallow layers of the feature pyramid have high resolution and a large number of object anchor frames and are mainly responsible for detecting small targets, matching them with a higher positive-sample threshold improves the precision and recall of the finally trained object detection model. In addition, it reduces low-quality small-scale samples, which makes the neural network model easier to converge.
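A simplified sketch of this matching scheme for a single feature level; the threshold values and the IoU matrix layout are assumptions:

import numpy as np

def match_anchors(iou, shallow, t_first=0.5, t_second=0.35):
    # iou: (num_anchors, num_gt) IoU matrix for one feature level;
    # shallow: True for the first n levels, which use the higher threshold.
    matched = np.full(iou.shape[0], -1)
    if iou.shape[1] == 0:
        return matched
    for g in range(iou.shape[1]):  # rule 1: best anchor per annotation frame
        matched[iou[:, g].argmax()] = g
    t = t_first if shallow else t_second  # rule 2: threshold matching
    for a in np.where(matched == -1)[0]:
        if iou[a].max() > t:
            matched[a] = iou[a].argmax()
    return matched  # -1 marks unmatched (background) anchors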
Step 65: determine the difference between the target object anchor frames and the corresponding object annotation results based on a preset loss function.
The single-step detection network includes classification and regression branches. These include a classification and regression branch for each feature extraction layer of the FPN, and the weights are shared among the branches. Because each feature layer corresponds to a different object scale, weight sharing allows similar features to be extracted from object images of different scales, improving the robustness of object detection.
In this case, the loss function includes a cross-entropy loss function and a smooth L1 loss function: the classification branch is trained with the cross-entropy loss function, and the regression branch is trained with the smooth L1 loss function.
The cross-entropy loss function is expressed as:

L_cls = -[y log f + (1 - y) log(1 - f)]

where f is the object confidence output by the neural network structure and y is the object category, y = 1 indicating an object and y = 0 indicating a non-object.
The smooth L1 loss function is expressed as:

smooth_L1(x) = 0.5x^2, if |x| < 1
smooth_L1(x) = |x| - 0.5, otherwise

where x is the difference between a target object anchor frame and the corresponding object annotation result.
When computing the difference between a target object anchor frame and the corresponding object annotation result, the electronic device encodes the object annotation result to obtain the regression target of the regression branch; x is the difference between the output of the regression network (the target object anchor frame) and the encoded regression target.
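A NumPy sketch of the two losses as written above, computed elementwise before any reduction:

import numpy as np

def cross_entropy(f, y):
    # f: predicted object confidence in (0, 1); y: 1 for object, 0 otherwise
    return -(y * np.log(f) + (1 - y) * np.log(1 - f))

def smooth_l1(x):
    # x: difference between the regression output and the encoded target
    ax = np.abs(x)
    return np.where(ax < 1, 0.5 * ax ** 2, ax - 0.5)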
Step 66: train the neural network structure according to the difference between the target object anchor frames and the corresponding object annotation results to obtain the object detection model.
After the object detection model is obtained through the above training process, inputting the stitched image into the model yields the object detection frame of each object in the stitched image.
Step 205: map the object detection frames back to the fisheye image according to the corresponding cropping angles to obtain the object detection result.
The electronic device records the cropping angle of each cropped image in the stitched image to indicate the position of the cropped image in the fisheye image. In this way, after obtaining an object detection frame, the electronic device can rotate the frame according to that cropping angle, thereby mapping it back to the fisheye image to obtain the object detection result.
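A minimal sketch of this inverse mapping, assuming the frame coordinates have already been translated from the stitched image into the coordinates of the (rotated) fisheye image:

import math

def map_box_back(box, angle_deg, cx, cy):
    # Rotate the four corners of the detection frame about the circle
    # center by the cropping angle; the axis-aligned frame becomes a
    # tilted quadrilateral on the fisheye image.
    a = math.radians(angle_deg)
    ca, sa = math.cos(a), math.sin(a)
    x1, y1, x2, y2 = box
    corners = [(x1, y1), (x2, y1), (x2, y2), (x1, y2)]
    return [(cx + (x - cx) * ca - (y - cy) * sa,
             cy + (x - cx) * sa + (y - cy) * ca) for x, y in corners]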
Optionally, the following situations may occur during the mapping of the object detection frames.
Situation 1: one object corresponds to multiple object detection frames. In this case, the multiple frames are screened based on the non-maximum suppression algorithm, and the screened frames are mapped back to the fisheye image.
Situation 2: multiple object detection frames are located at an image stitching position of the stitched image, that is, a detection frame spans two cropped images. In this case, among the multiple frames located at the stitching position, the frame with the largest area is mapped back to the fisheye image to obtain the object detection result.
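For situation 2, a short helper suffices; the [x1, y1, x2, y2] frame format is an assumption for this sketch:

def largest_frame(frames):
    # Keep only the largest-area frame among those at a stitching seam.
    return max(frames, key=lambda b: max(0, b[2] - b[0]) * max(0, b[3] - b[1]))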
In summary, in the method for detecting an object in a fisheye image provided by this embodiment, a fisheye image including a plurality of object regions with different in-plane angles is acquired; image cropping is performed on the fisheye image according to a plurality of cropping angles based on the circle center of the fisheye image, the cropping angles including the angles corresponding to the plurality of object regions; the cropped images are stitched into a stitched image; object detection is performed on the stitched image by an object detection model to obtain object detection frames; and the frames are mapped back to the fisheye image according to the corresponding cropping angles to obtain the object detection result. This solves the problem that existing object detection models cannot detect objects in fisheye images: because the objects in the stitched image obtained by stitching the cropped images are upright, the detection result can be obtained by the object detection model and the angle of each object can be obtained from its cropping angle, so that both the position and the angle of objects in the fisheye image can be detected.
In addition, because the object detection model is built on a single-step detector, which is simple to develop and deploy and whose single-frame processing time does not vary with the image size, the range of object scales, or the number of objects, object detection efficiency is improved compared with existing fisheye image detection that uses cascaded detectors.
In addition, because the object regions are distributed around the circle center with different angles relative to it, and an object below the circle center is usually upright, cropping with the region below the circle center yields cropped images whose objects are upright; rotating that region around the circle center for subsequent crops likewise yields upright objects. The cropping manner of this embodiment therefore ensures that the cropped images include an upright image of every object in the fisheye image, so the angle of an object does not need to be adjusted during detection, which reduces the difficulty of object detection.
In addition, because the first-stage pyramid extracts features bottom-up, predicting directly from each layer's feature map could be inaccurate, since shallow features are not robust. By using an FPN, that is, by building the second-stage pyramid on top of the first so that low-layer features and processed high-layer features are accumulated, the more accurate position information of the shallow layers is combined with the more accurate feature information of the deep layers, and the prediction results are more accurate.
FIG. 7 is a block diagram of an apparatus for detecting an object in a fisheye image provided by an embodiment of the present application. The apparatus includes at least the following modules: an image acquisition module 710, an image cropping module 720, an image stitching module 730, an object detection module 740, and a result mapping module 750.
The image acquisition module 710 is configured to acquire a fisheye image, where the fisheye image includes a plurality of object regions with different in-plane angles.
The image cropping module 720 is configured to perform image cropping on the fisheye image according to a plurality of cropping angles based on the circle center of the fisheye image to obtain cropped images, where the cropping angles include the angles corresponding to the plurality of object regions.
The image stitching module 730 is configured to stitch the cropped images to obtain a stitched image.
The object detection module 740 is configured to perform object detection on the stitched image by using an object detection model to obtain object detection frames.
The result mapping module 750 is configured to map the object detection frames back to the fisheye image according to the corresponding cropping angles to obtain an object detection result.
For related details, refer to the above method embodiments.
It should be noted that when the apparatus provided in the above embodiment detects objects in a fisheye image, the division into the above functional modules is only an example. In practical applications, the above functions may be allocated to different functional modules as needed; that is, the internal structure of the apparatus may be divided into different functional modules to complete all or part of the functions described above. In addition, the apparatus embodiment and the method embodiment for detecting an object in a fisheye image provided above belong to the same concept; for the specific implementation process, refer to the method embodiment, and details are not repeated here.
FIG. 8 is a block diagram of an apparatus for detecting an object in a fisheye image provided by an embodiment of the present application. The apparatus may be a smartphone, a tablet computer, a notebook computer, a desktop computer, a server, or the like; this embodiment does not limit the device type of the object detection apparatus. The apparatus includes at least a processor 801 and a memory 802.
The processor 801 may include one or more processing cores, for example a 4-core or 8-core processor. The processor 801 may be implemented in at least one hardware form of DSP (Digital Signal Processing), FPGA (Field-Programmable Gate Array), or PLA (Programmable Logic Array). The processor 801 may also include a main processor and a coprocessor: the main processor, also called a CPU (Central Processing Unit), processes data in the awake state, and the coprocessor is a low-power processor for processing data in the standby state. In some embodiments, the processor 801 may be integrated with a GPU (Graphics Processing Unit) responsible for rendering and drawing the content to be displayed on the display screen. In some embodiments, the processor 801 may further include an AI (Artificial Intelligence) processor for processing computing operations related to machine learning.
The memory 802 may include one or more computer-readable storage media, which may be non-transitory. The memory 802 may also include high-speed random access memory and non-volatile memory, such as one or more disk storage devices or flash storage devices. In some embodiments, the non-transitory computer-readable storage medium in the memory 802 stores at least one instruction, and the at least one instruction is executed by the processor 801 to implement the method for detecting an object in a fisheye image provided by the method embodiments of the present application.
In some embodiments, the apparatus may optionally further include a peripheral device interface and at least one peripheral device. The processor 801, the memory 802, and the peripheral device interface may be connected by a bus or signal lines, and each peripheral device may be connected to the peripheral device interface by a bus, a signal line, or a circuit board. Illustratively, the peripheral devices include but are not limited to a radio frequency circuit, a touch display screen, an audio circuit, and a power supply.
Of course, the apparatus may include fewer or more components, which is not limited in this embodiment.
Optionally, the present application further provides a computer-readable storage medium storing a program, where the program is loaded and executed by a processor to implement the method for detecting an object in a fisheye image of the above method embodiments.
Optionally, the present application further provides a computer product including a computer-readable storage medium storing a program, where the program is loaded and executed by a processor to implement the method for detecting an object in a fisheye image of the above method embodiments.
The technical features of the above embodiments may be combined arbitrarily. For brevity, not all possible combinations of the technical features are described; however, as long as a combination of these technical features is not contradictory, it should be regarded as falling within the scope of this specification.
The above embodiments represent only several implementations of the present application, and their description is relatively specific and detailed, but they should not be construed as limiting the scope of the invention patent. It should be pointed out that those of ordinary skill in the art may make several variations and improvements without departing from the concept of the present application, all of which belong to the protection scope of the present application. Therefore, the protection scope of the patent of the present application shall be subject to the appended claims.

Claims (12)

  1. A method for detecting an object in a fisheye image, wherein the method comprises:
    acquiring a fisheye image, wherein the fisheye image comprises a plurality of object regions with different in-plane angles, and the angle of an object region refers to the angle of the object in the object region relative to the circle center of the fisheye image;
    performing image cropping on the fisheye image according to a plurality of cropping angles based on the circle center of the fisheye image to obtain cropped images, wherein the cropping angles comprise a plurality of angles corresponding to the plurality of object regions;
    stitching the cropped images to obtain a stitched image;
    performing object detection on the stitched image by using an object detection model to obtain object detection frames; and
    mapping the object detection frames back to the fisheye image according to the corresponding cropping angles to obtain an object detection result.
  2. The method according to claim 1, wherein before the performing object detection on the stitched image by using an object detection model to obtain object detection frames, the method further comprises:
    acquiring training data, wherein the training data comprises a plurality of object images of different sizes and an object annotation frame corresponding to each object image;
    acquiring a preset neural network structure, wherein the neural network structure comprises a feature detection network and a single-step detection network, the feature detection network is configured to extract object features, and the single-step detection network is configured to determine object anchor frames based on each object feature;
    inputting the object images into the neural network structure to obtain a plurality of object anchor frames;
    performing sample matching between the plurality of object anchor frames and the corresponding object annotation frames to obtain target object anchor frames;
    determining a difference between the target object anchor frames and corresponding object annotation results based on a preset loss function; and
    training the neural network structure according to the difference between the target object anchor frames and the corresponding object annotation results to obtain the object detection model.
  3. The method according to claim 2, wherein the feature detection network comprises a first-stage feature pyramid and a second-stage feature pyramid;
    the first-stage feature pyramid is configured to perform bottom-up feature extraction on an input object image to obtain multi-layer feature maps; and
    the second-stage feature pyramid is configured to perform top-down feature extraction on the input feature maps and combine the extracted features with the feature maps of the corresponding layers of the first-stage feature pyramid to obtain multi-layer feature maps.
  4. The method according to claim 3, wherein the performing sample matching between the plurality of object anchor frames and the corresponding object annotation frames to obtain target object anchor frames comprises:
    determining an intersection-over-union between each object anchor frame and the corresponding object annotation frame in each layer of feature map;
    for each object annotation frame, determining the object anchor frame with the highest intersection-over-union with the object annotation frame as the target object anchor frame matching the object annotation frame;
    for each of the first n layers of feature maps, comparing the intersection-over-union of the object anchor frames on the feature map that are not matched to an object annotation frame with a first threshold, and determining the object anchor frames whose intersection-over-union is greater than the first threshold as target object anchor frames of the corresponding object annotation frames, wherein n is a positive integer; and
    for each layer of feature map below the first n layers of feature maps, comparing the intersection-over-union of the object anchor frames on the feature map that are not matched to an object annotation frame with a second threshold, and determining the object anchor frames whose intersection-over-union is greater than the second threshold as target object anchor frames of the corresponding object annotation frames;
    wherein the first threshold is greater than the second threshold.
  5. The method according to claim 2, wherein the anchor frame size of an object anchor frame is determined based on the stride, relative to the original image, of the feature map to which the object anchor frame belongs, and the feature map is an image output by the feature detection network.
  6. The method according to claim 2, wherein acquiring the training data comprises:
    acquiring original object images, each carrying object annotation boxes;
    performing image augmentation on the original object images to obtain the training data;
    wherein the augmentation comprises at least one of: randomly expanding the original object image; randomly cropping the original object image; randomly cropping the expanded object image; and horizontally flipping the original object image, the randomly expanded object image, and/or the randomly cropped object image.
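One plausible reading of claim 6's augmentation chain is sketched below with NumPy. All probabilities, expansion ratios, and crop fractions are assumed values, and boxes are [N,4] arrays in (x1,y1,x2,y2) order.

```python
import random
import numpy as np

def augment(image, boxes):
    """Sketch of the claim-6 augmentation chain; parameters are assumptions."""
    h, w = image.shape[:2]
    # Random expansion: paste the image onto a larger mean-filled canvas.
    if random.random() < 0.5:
        ratio = random.uniform(1.0, 2.0)
        canvas = np.full((int(h * ratio), int(w * ratio), 3),
                         image.mean(axis=(0, 1)), dtype=image.dtype)
        top = random.randint(0, canvas.shape[0] - h)
        left = random.randint(0, canvas.shape[1] - w)
        canvas[top:top + h, left:left + w] = image
        image, boxes = canvas, boxes + [left, top, left, top]
        h, w = image.shape[:2]
    # Random crop (of the original or the expanded image); boxes are
    # clipped to the crop, degenerate boxes would be filtered downstream.
    if random.random() < 0.5:
        ch, cw = int(h * 0.8), int(w * 0.8)
        top, left = random.randint(0, h - ch), random.randint(0, w - cw)
        image = image[top:top + ch, left:left + cw]
        boxes = np.clip(boxes - [left, top, left, top], 0, [cw, ch, cw, ch])
        h, w = ch, cw
    # Horizontal flip: mirror the image and swap/reflect the x-coordinates.
    if random.random() < 0.5:
        image = image[:, ::-1]
        boxes = boxes[:, [2, 1, 0, 3]].copy()
        boxes[:, [0, 2]] = w - boxes[:, [0, 2]]
    return image, boxes
```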
  7. The method according to claim 2, wherein the loss function comprises a cross-entropy loss function and a smooth L1 loss function;
    the cross-entropy loss function is expressed as:
    L_cls = -[y·log(f) + (1 - y)·log(1 - f)]
    where f is the object confidence output by the neural network structure and y is the object label: y = 1 indicates an object and y = 0 indicates a non-object;
    the smooth L1 loss function is expressed as:
    smoothL1(x) = 0.5·x^2, if |x| < 1; |x| - 0.5, otherwise
    where x is the difference between a target object anchor box and the corresponding object annotation result.
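Both losses are standard and can be written compactly. The sketch below uses NumPy, with the conventional leading minus sign on the cross-entropy term (so the loss is minimized) and a small numerical guard against log(0); the eps value is an assumption.

```python
import numpy as np

def cross_entropy(f, y, eps=1e-7):
    """Binary cross-entropy for object confidence f and label y in {0, 1}."""
    f = np.clip(f, eps, 1 - eps)            # guard against log(0)
    return -(y * np.log(f) + (1 - y) * np.log(1 - f))

def smooth_l1(x):
    """Smooth L1 on the anchor/annotation difference x."""
    x = np.abs(x)
    return np.where(x < 1, 0.5 * x ** 2, x - 0.5)
```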
  8. The method according to any one of claims 1 to 7, wherein mapping the object detection boxes back to the fisheye image according to the corresponding cropping angles to obtain the object detection result comprises:
    filtering the plurality of object detection boxes based on a non-maximum suppression algorithm;
    mapping the filtered object detection boxes back to the fisheye image.
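Claim 8's filtering step is ordinary non-maximum suppression. A textbook sketch follows; the 0.5 IoU threshold is an assumed value.

```python
import numpy as np

def nms(boxes, scores, iou_thresh=0.5):
    """Textbook NMS; boxes are [N,4] (x1,y1,x2,y2), scores are [N].
    Returns the indices of the boxes that survive suppression."""
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    order = scores.argsort()[::-1]          # highest score first
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        rest = order[1:]
        # IoU of the best remaining box against all others still in play.
        x1 = np.maximum(boxes[i, 0], boxes[rest, 0])
        y1 = np.maximum(boxes[i, 1], boxes[rest, 1])
        x2 = np.minimum(boxes[i, 2], boxes[rest, 2])
        y2 = np.minimum(boxes[i, 3], boxes[rest, 3])
        inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
        iou = inter / (areas[i] + areas[rest] - inter)
        order = rest[iou <= iou_thresh]     # drop heavily overlapping boxes
    return keep
```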
  9. The method according to any one of claims 1 to 7, wherein mapping the object detection boxes back to the fisheye image according to the corresponding cropping angles to obtain the object detection result comprises:
    for a plurality of object detection boxes located at an image stitching position of the stitched image, mapping the object detection box with the largest area back to the fisheye image to obtain the object detection result.
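Once the detection boxes at a given stitching position have been grouped (grouping is assumed to happen elsewhere), claim 9's seam rule reduces to an area-based argmax:

```python
# Sketch of the claim-9 seam rule: among detection boxes at the same
# stitching seam, keep the one with the largest area.
def pick_largest(seam_boxes):
    return max(seam_boxes, key=lambda b: (b[2] - b[0]) * (b[3] - b[1]))
```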
  10. The method according to any one of claims 1 to 7, wherein the plurality of object regions are distributed around the circle center of the fisheye image, and cropping the fisheye image at a plurality of cropping angles based on the circle center to obtain cropped images comprises:
    determining, as the upper edge of a cropping region, the line located below the circle center at a preset vertical distance from the circle center;
    obtaining the cropping region based on the upper edge and a preset cropping size;
    rotating the cropping region about the circle center to obtain rotated cropping regions;
    cropping the fisheye image according to the cropping region and the rotated cropping regions to obtain the cropped images.
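A sketch of claim 10's cropping scheme using OpenCV is given below. The preset distance, crop size, and angle set are assumed values, and rotating the full image about the circle center stands in for rotating the cropping region (the two are equivalent up to the direction of rotation).

```python
import cv2

def crop_at_angles(fisheye, center, preset_dist=50, crop_size=(400, 400),
                   angles=(0, 90, 180, 270)):
    """Sketch of claim 10 with assumed parameters. The crop's upper edge
    sits preset_dist below the circle center; the other cropping angles
    are realized by rotating the image about the center and re-cutting."""
    cx, cy = center
    w, h = crop_size
    top = int(cy + preset_dist)             # upper edge below the center
    left = int(cx - w / 2)                  # region centered horizontally
    crops = []
    for angle in angles:
        # Rotating the image about the circle center is equivalent to
        # rotating the cropping region the opposite way about the center.
        M = cv2.getRotationMatrix2D((cx, cy), angle, 1.0)
        rotated = cv2.warpAffine(fisheye, M, fisheye.shape[1::-1])
        crops.append(rotated[top:top + h, left:left + w])
    return crops
```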
  11. An apparatus for detecting objects in a fisheye image, wherein the apparatus comprises a processor and a memory; the memory stores a program that is loaded and executed by the processor to implement the method for detecting objects in a fisheye image according to any one of claims 1 to 10.
  12. A computer-readable storage medium, wherein the storage medium stores a program that, when executed by a processor, implements the method for detecting objects in a fisheye image according to any one of claims 1 to 10.
PCT/CN2020/121513 2020-06-29 2020-10-16 Method and apparatus for detecting object in fisheye image, and storage medium WO2022000862A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010603240.6 2020-06-29
CN202010603240.6A CN111754394B (en) 2020-06-29 2020-06-29 Method and device for detecting object in fisheye image and storage medium

Publications (1)

Publication Number Publication Date
WO2022000862A1 true WO2022000862A1 (en) 2022-01-06

Family

ID=72677873

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/121513 WO2022000862A1 (en) 2020-06-29 2020-10-16 Method and apparatus for detecting object in fisheye image, and storage medium

Country Status (2)

Country Link
CN (1) CN111754394B (en)
WO (1) WO2022000862A1 (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111754394B (en) * 2020-06-29 2022-06-10 苏州科达科技股份有限公司 Method and device for detecting object in fisheye image and storage medium
CN112101361B (en) * 2020-11-20 2021-04-23 深圳佑驾创新科技有限公司 Target detection method, device and equipment for fisheye image and storage medium
CN114616586A (en) * 2020-12-15 2022-06-10 深圳市大疆创新科技有限公司 Image annotation method and device, electronic equipment and computer-readable storage medium
CN113657174A (en) * 2021-07-21 2021-11-16 北京中科慧眼科技有限公司 Vehicle pseudo-3D information detection method and device and automatic driving system
CN113791055B (en) * 2021-08-17 2024-05-14 北京农业信息技术研究中心 Fish freshness detection method and system
CN114004986A (en) * 2021-10-29 2022-02-01 北京百度网讯科技有限公司 Image processing method, training method, device, equipment and medium for detection model

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107491762B (en) * 2017-08-23 2018-05-15 珠海安联锐视科技股份有限公司 A kind of pedestrian detection method
CN110349077B (en) * 2018-04-02 2023-04-07 杭州海康威视数字技术股份有限公司 Panoramic image synthesis method and device and electronic equipment

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2011061511A (en) * 2009-09-10 2011-03-24 Dainippon Printing Co Ltd Fish-eye monitoring system
JP2012226645A (en) * 2011-04-21 2012-11-15 Sony Corp Image processing apparatus, image processing method, recording medium, and program
CN102831386A (en) * 2011-04-26 2012-12-19 日立信息通讯工程有限公司 Object recognition method and recognition apparatus
CN111260539A (en) * 2020-01-13 2020-06-09 魔视智能科技(上海)有限公司 Fisheye pattern target identification method and system
CN111754394A (en) * 2020-06-29 2020-10-09 苏州科达科技股份有限公司 Method and device for detecting object in fisheye image and storage medium

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115063838A (en) * 2022-06-15 2022-09-16 北京市地铁运营有限公司 Method and system for detecting fisheye distortion image
CN116012721A (en) * 2023-03-28 2023-04-25 浙江大学湖州研究院 Deep learning-based rice leaf spot detection method
CN117455940A (en) * 2023-12-25 2024-01-26 四川汉唐云分布式存储技术有限公司 Cloud-based customer behavior detection method, system, equipment and storage medium
CN117455940B (en) * 2023-12-25 2024-02-27 四川汉唐云分布式存储技术有限公司 Cloud-based customer behavior detection method, system, equipment and storage medium
CN117649737A (en) * 2024-01-30 2024-03-05 云南电投绿能科技有限公司 Method, device, equipment and storage medium for monitoring equipment in park
CN117649737B (en) * 2024-01-30 2024-04-30 云南电投绿能科技有限公司 Method, device, equipment and storage medium for monitoring equipment in park
CN117876822A (en) * 2024-03-11 2024-04-12 盛视科技股份有限公司 Target detection migration training method applied to fish eye scene
CN117876822B (en) * 2024-03-11 2024-05-28 盛视科技股份有限公司 Target detection migration training method applied to fish eye scene

Also Published As

Publication number Publication date
CN111754394B (en) 2022-06-10
CN111754394A (en) 2020-10-09

Similar Documents

Publication Publication Date Title
WO2022000862A1 (en) Method and apparatus for detecting object in fisheye image, and storage medium
WO2020119527A1 (en) Human action recognition method and apparatus, and terminal device and storage medium
TWI766201B (en) Methods and devices for biological testing and storage medium thereof
US20220230282A1 (en) Image processing method, image processing apparatus, electronic device and computer-readable storage medium
AU2019268184B2 (en) Precise and robust camera calibration
CN111583097A (en) Image processing method, image processing device, electronic equipment and computer readable storage medium
CN111667030B (en) Method, system and storage medium for realizing remote sensing image target detection based on deep neural network
CN110276831B (en) Method and device for constructing three-dimensional model, equipment and computer-readable storage medium
CN109754461A (en) Image processing method and related product
CN111523439B (en) Method, system, device and medium for target detection based on deep learning
CN112711034B (en) Object detection method, device and equipment
CN111079739A (en) Multi-scale attention feature detection method
CN114255197B (en) Infrared and visible light image self-adaptive fusion alignment method and system
KR20210029692A (en) Method and storage medium for applying bokeh effect to video images
CN111325107A (en) Detection model training method and device, electronic equipment and readable storage medium
CN114399781A (en) Document image processing method and device, electronic equipment and storage medium
CN108229281B (en) Neural network generation method, face detection device and electronic equipment
Chen et al. Coupled global–local object detection for large VHR aerial images
CN113228105A (en) Image processing method and device and electronic equipment
CN116760937B (en) Video stitching method, device, equipment and storage medium based on multiple machine positions
US20230053952A1 (en) Method and apparatus for evaluating motion state of traffic tool, device, and medium
CN116682105A (en) Millimeter wave radar and visual feature attention fusion target detection method
CN116310899A (en) YOLOv 5-based improved target detection method and device and training method
CN116310105A (en) Object three-dimensional reconstruction method, device, equipment and storage medium based on multiple views
CN115527082A (en) Deep learning small target detection method based on image multi-preprocessing

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20942609

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20942609

Country of ref document: EP

Kind code of ref document: A1