CN112561889A - Target detection method and device, electronic equipment and storage medium - Google Patents

Target detection method and device, electronic equipment and storage medium

Info

Publication number
CN112561889A
CN112561889A CN202011510808.6A
Authority
CN
China
Prior art keywords
picture
standard
candidate frame
frame
feature
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011510808.6A
Other languages
Chinese (zh)
Inventor
吴晓东 (Wu Xiaodong)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Saiante Technology Service Co Ltd
Original Assignee
Shenzhen Saiante Technology Service Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Saiante Technology Service Co Ltd filed Critical Shenzhen Saiante Technology Service Co Ltd
Priority to CN202011510808.6A priority Critical patent/CN112561889A/en
Publication of CN112561889A publication Critical patent/CN112561889A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/0002 Inspection of images, e.g. flaw detection
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/24 Classification techniques
    • G06F 18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/10 Segmentation; Edge detection
    • G06T 7/187 Segmentation; Edge detection involving region growing; involving region merging; involving connected component labelling
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/40 Extraction of image or video features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/10 Image acquisition modality
    • G06T 2207/10004 Still image; Photographic image
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/20 Special algorithmic details
    • G06T 2207/20081 Training; Learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/20 Special algorithmic details
    • G06T 2207/20084 Artificial neural networks [ANN]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/20 Special algorithmic details
    • G06T 2207/20112 Image segmentation details
    • G06T 2207/20132 Image cropping
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 2201/00 Indexing scheme relating to image or video recognition or understanding
    • G06V 2201/07 Target detection

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Biophysics (AREA)
  • General Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computational Linguistics (AREA)
  • Evolutionary Biology (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Multimedia (AREA)
  • Quality & Reliability (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to the field of image detection, and discloses a target object detection method, which comprises the following steps: performing size standardization processing on a picture to be detected to obtain a standard picture, and extracting picture features from the standard picture through a feature extraction network to obtain a feature picture; performing target detection on the feature picture by using a region generation network to generate candidate frames, and pooling the candidate frames to a fixed size by using a region feature aggregation algorithm to obtain standard candidate frames; performing regression and classification on the standard candidate frames to obtain target object candidate frames; and performing coordinate mapping on the picture to be detected according to the target object candidate frames, and marking the target object detection result in the picture to be detected. The invention also relates to blockchain technology: the picture to be detected can be stored in a blockchain node. The invention further provides a target object detection device, an electronic device and a storage medium. The embodiment of the invention addresses the problem of inaccurate target detection results when the target object is blurred.

Description

Target detection method and device, electronic equipment and storage medium
Technical Field
The present invention relates to the field of image detection technologies, and in particular, to a method and an apparatus for detecting a target object, an electronic device, and a computer-readable storage medium.
Background
The emergency lane is a special lane reserved for vehicles handling emergencies, such as engineering rescue, medical rescue, and police performing urgent official duties. Traffic law stipulates that, except under special circumstances, a motor vehicle may not occupy the emergency lane while driving; violations are penalized. At present, emergency lane detection can adopt deep learning methods based on region convolutional neural networks. Although such deep learning methods achieve high accuracy in simple scenes, their accuracy drops considerably in scenes with haze, rain, night-time conditions, or blurred emergency lane markings.
Disclosure of Invention
The invention provides a target detection method, a target detection device, electronic equipment and a computer readable storage medium, and mainly aims to improve the accuracy of target detection.
In order to achieve the above object, the present invention provides a target detection method, including:
acquiring a picture to be detected, performing size standardization processing on the picture to be detected to obtain a standard picture, and extracting picture features from the standard picture through a pre-constructed feature extraction network to obtain a feature picture;
performing target detection on the feature picture by using a region generation network, generating a candidate frame according to a detection result, and pooling the candidate frame into a fixed size by using a region feature aggregation algorithm to obtain a standard candidate frame;
performing regression and classification on the standard candidate frame to obtain a target object candidate frame;
and performing coordinate mapping on the picture to be detected according to the target object candidate frame, and marking a target object detection result in the picture to be detected.
Optionally, the performing size standardization processing on the picture to be detected to obtain a standard picture includes:
judging whether the size of the picture to be detected is larger than the size of a standard picture input by a user;
when the size of the picture to be detected is larger than the size of the standard picture, performing cutting processing on the picture to be detected according to the size of the standard picture to obtain a standard picture;
and when the size of the picture to be detected is smaller than the size of the standard picture, performing filling processing on the picture to be detected according to the size of the standard picture to obtain the standard picture.
Optionally, the performing target detection on the feature picture by using the region generation network, and generating candidate frames according to the detection result, includes:
generating a preset number of anchor frames with different scales and aspect ratios for each point on the feature picture;
inputting the anchor frames into a detection frame classification layer of the region generation network for classification, and judging whether the feature map in each anchor frame belongs to the foreground or the background;
inputting the anchor frames into a detection frame regression layer of the region generation network to obtain the coordinate information of the anchor frames;
and selecting anchor frames whose feature maps belong to the foreground as candidate frames, and displaying the candidate frames on the feature picture according to the corresponding coordinate values.
Optionally, the pooling of the candidate frames into a fixed size by using a region feature aggregation algorithm to obtain standard candidate frames includes:
dividing each of the candidate boxes into n x n fixed-size cells;
determining sampling points in each unit according to a preset rule, calculating pixel values of the sampling points by using a bilinear interpolation method, and performing maximum pooling operation on the pixel values of the sampling points to select pixel points with maximum pixel values in the sampling points;
and obtaining a standard candidate frame corresponding to each candidate frame according to the selected pixel points.
Optionally, the performing regression and classification on the standard candidate frame to obtain a target candidate frame includes:
obtaining an offset predicted value of the standard candidate frame relative to the actual position by using a frame regression function so as to correct the standard candidate frame;
inputting the standard candidate frame into a fully-connected layer and a softmax function in a pre-trained neural network, calculating the category to which the feature map in the standard candidate frame belongs, outputting the score of the category, and obtaining the target detection frame according to the score.
Optionally, before the picture features in the standard picture are extracted through the pre-constructed feature extraction network to obtain the feature picture, the method further includes:
constructing a first convolution layer according to convolution operation, normalization operation and activation operation;
constructing a second convolution layer by using the combination function and the addition function;
and constructing the feature extraction network according to the first convolution layer and the second convolution layer.
Optionally, the target object is an emergency lane.
In order to solve the above problems, the present invention also provides a target detection apparatus, comprising:
the image feature extraction module is used for performing size standardization processing on the image to be detected to obtain a standard image, and extracting image features in the standard image through a pre-constructed feature extraction network to obtain a feature image;
the candidate frame generation module is used for carrying out target detection on the feature picture by utilizing a region generation network, generating a candidate frame according to a detection result, and pooling the candidate frame into a fixed size by utilizing a region feature aggregation algorithm to obtain a standard candidate frame;
the classification regression module is used for performing regression and classification on the standard candidate frame to obtain a target object candidate frame;
and the candidate frame mapping module is used for executing coordinate mapping on the picture to be detected according to the target object candidate frame and marking a target object detection result in the picture to be detected.
In order to solve the above problem, the present invention also provides an electronic device, including:
at least one processor; and
a memory communicatively connected to the at least one processor; wherein
the memory stores computer program instructions executable by the at least one processor, so that the at least one processor can perform the target detection method described above.
In order to solve the above problem, the present invention further provides a computer-readable storage medium comprising a storage data area and a storage program area, wherein the storage data area stores created data, and the storage program area stores a computer program; wherein the computer program, when executed by a processor, implements a method of object detection as described above.
The embodiment of the invention extracts features from the picture to be detected through a pre-constructed feature extraction network, which enhances feature expression capability in difficult scenes and thereby improves the overall accuracy of target object detection, such as emergency lane detection. At the same time, the candidate frames generated by the region generation network and the region feature aggregation algorithm are regressed and classified, which effectively alleviates the problem of pixel deviation and improves the regression positioning of candidate frames for target object detection, further improving the overall accuracy. Therefore, the target detection method, the target detection device and the computer-readable storage medium provided by the embodiments of the invention can improve the accuracy of target detection.
Drawings
Fig. 1 is a schematic flow chart of a target detection method according to an embodiment of the present invention;
FIG. 2 is a schematic diagram illustrating a detailed implementation process of one step in the method for detecting a target object provided in FIG. 1;
FIG. 3 is a schematic diagram illustrating a detailed implementation of another step in the method for detecting a target object provided in FIG. 1;
FIG. 4 is a schematic view of another detailed implementation of another step in the method for detecting a target object provided in FIG. 1;
FIG. 5 is a schematic view of another detailed implementation of another step in the method for detecting a target object provided in FIG. 1;
fig. 6 is a schematic block diagram of a target detection apparatus according to an embodiment of the present invention;
fig. 7 is a schematic diagram of an internal structure of an electronic device for implementing a target detection method according to an embodiment of the present invention;
the implementation, functional features and advantages of the objects of the present invention will be further explained with reference to the accompanying drawings.
Detailed Description
It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
The embodiment of the application provides a target object detection method. The execution subject of the target object detection method includes, but is not limited to, at least one electronic device, such as a server or a terminal, that can be configured to execute the method provided by the embodiments of the present application. In other words, the target object detection method may be performed by software or hardware installed in a terminal device or a server device, and the software may be a blockchain platform. The server includes, but is not limited to: a single server, a server cluster, a cloud server, a cloud server cluster, and the like.
Fig. 1 is a schematic flow chart of a target detection method according to an embodiment of the present invention. In this embodiment, the target detection method includes:
S1, obtaining a picture to be detected, performing size standardization processing on the picture to be detected to obtain a standard picture, and extracting picture features from the standard picture through a pre-constructed feature extraction network to obtain a feature picture.
In the embodiment of the invention, various optical camera devices can be used to acquire the picture to be detected. For example, a traffic monitoring camera photographs the emergency lane at specified time intervals and uploads the captured emergency lane pictures to a database; the emergency lane pictures in the database are then collected to obtain the pictures to be detected.
In detail, referring to fig. 2, the performing of the size normalization process on the picture to be detected includes:
s10, judging whether the size of the picture to be detected is larger than the size of the standard picture input by the user;
s11, when the size of the picture to be detected is larger than the size of the standard picture, cutting the picture to be detected according to the size of the standard picture to obtain the standard picture;
and S12, when the size of the picture to be detected is smaller than the size of the standard picture, performing filling processing on the picture to be detected according to the size of the standard picture to obtain the standard picture.
For example, the standard picture size may be set to 1000 × 600. When the size of the picture to be detected is 1200 × 1200, a standard picture can be obtained by center cropping, taking the picture center as the origin and cropping to a length of 1000 and a width of 600. When the size of the picture to be detected is 800 × 400, edge-expansion filling is used: taking the picture frame as the boundary, the picture is expanded outward with a preset pixel value until the expanded size reaches a length of 1000 and a width of 600, thereby obtaining a standard picture.
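As an illustrative sketch of this step (the center-crop and edge-pad conventions here are assumptions drawn from the example above, not the patent's reference implementation), the size standardization may be written as:

```python
import numpy as np

STD_H, STD_W = 600, 1000  # standard picture size 1000 x 600 (width x height)

def standardize(img: np.ndarray, pad_value: int = 0) -> np.ndarray:
    """Center-crop oversized dimensions, edge-pad undersized ones."""
    h, w = img.shape[:2]
    # Center crop any dimension that exceeds the standard size.
    if h > STD_H:
        top = (h - STD_H) // 2
        img = img[top:top + STD_H]
    if w > STD_W:
        left = (w - STD_W) // 2
        img = img[:, left:left + STD_W]
    # Pad any dimension that falls short, expanding outward from the frame.
    h, w = img.shape[:2]
    pad_h, pad_w = STD_H - h, STD_W - w
    if pad_h > 0 or pad_w > 0:
        pads = ((pad_h // 2, pad_h - pad_h // 2),
                (pad_w // 2, pad_w - pad_w // 2)) + ((0, 0),) * (img.ndim - 2)
        img = np.pad(img, pads, mode="constant", constant_values=pad_value)
    return img
```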
In an embodiment of the present invention, the feature extraction network may be a DarkNet63 network. Further, in one embodiment, before extracting the picture features from the standard picture through the pre-constructed feature extraction network to obtain the feature picture, the method further includes constructing the feature extraction network.
In detail, the feature extraction network is constructed by the following method: constructing a first convolution layer from a convolution operation (Conv), a normalization operation (BN) and an activation operation (Leaky ReLU); combining first convolution layers using a combination function (Concat) and an addition function (Add) to construct a second convolution layer; and constructing the feature extraction network from the first convolution layer and the second convolution layer.
The convolution operation is a 2D convolution used to obtain feature maps from the standard picture by convolving with 2D convolution kernels of different effects; the normalization operation uses a normalization function to reduce the pixel values of the pixel points in the feature map; the activation operation uses an activation function to reduce the area of the feature map; the combination function is used to connect two or more first convolution layers, and the addition function adds the outputs of first convolution layers element-wise into the processing flow.
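For illustration, a minimal PyTorch sketch of the two layer types follows; the channel counts, kernel sizes and the residual-style wiring are assumptions, since the text specifies only the operations involved:

```python
import torch
import torch.nn as nn

class FirstConvLayer(nn.Module):
    """Convolution (Conv) + normalization (BN) + activation (Leaky ReLU)."""
    def __init__(self, c_in, c_out, k=3, s=1):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(c_in, c_out, k, s, padding=k // 2, bias=False),
            nn.BatchNorm2d(c_out),
            nn.LeakyReLU(0.1, inplace=True),
        )
    def forward(self, x):
        return self.block(x)

class SecondConvLayer(nn.Module):
    """Combines first convolution layers with the Add and Concat functions."""
    def __init__(self, channels):
        super().__init__()
        self.branch_a = FirstConvLayer(channels, channels // 2, k=1)
        self.branch_b = FirstConvLayer(channels // 2, channels)
    def forward(self, x):
        y = self.branch_b(self.branch_a(x))
        out = x + y                      # Add: element-wise addition of layer outputs
        return torch.cat([x, out], 1)    # Concat: channel-wise combination
```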
And S2, performing target detection on the feature picture by using the region generation network, generating candidate frames according to the detection result, and pooling the candidate frames into a fixed size by using a region feature aggregation algorithm to obtain standard candidate frames.
In the embodiment of the present invention, an RPN (Region Proposal Network) is configured to select candidate frames from the feature picture.
Referring to fig. 3, the performing target detection on the feature picture by using the region generation network and generating candidate frames according to the detection result includes:
s20, generating a preset number of anchor boxes with different scales and aspect ratios for each point on the feature picture;
s21, inputting the anchor boxes into the detection frame classification layer of the region generation network for classification, and judging whether the feature map in each anchor box belongs to the foreground or the background;
s22, inputting the anchor boxes into the detection frame regression layer of the region generation network to obtain the coordinate information of the anchor boxes;
s23, selecting anchor boxes whose feature maps belong to the foreground as candidate frames, and displaying the candidate frames on the feature picture according to the corresponding coordinate values.
In the embodiment of the present invention, for each point on the feature picture, 9 anchor boxes with different scales and aspect ratios may be generated. The 9 anchor boxes are obtained from 3 different sizes and 3 different ratios, for example: the 3 sizes are 8, 16 and 32 (other sizes may be set), and the 3 ratios are 1:1, 1:2 and 2:1 (other ratios may be set), so that the resulting 9 anchor boxes are (8 × 8, 8 × 16, 16 × 8, 16 × 16, 16 × 32, 32 × 16, 32 × 32, 32 × 64 and 64 × 32).
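A short sketch of this anchor enumeration (the sizes and ratios are taken from the example above; the function name is illustrative):

```python
def make_anchors(sizes=(8, 16, 32), ratios=((1, 1), (1, 2), (2, 1))):
    """Enumerate the 3 sizes x 3 aspect ratios = 9 anchor boxes per point."""
    anchors = []
    for s in sizes:
        for rw, rh in ratios:
            anchors.append((s * rw, s * rh))  # (width, height)
    return anchors

print(make_anchors())
# [(8, 8), (8, 16), (16, 8), (16, 16), (16, 32), (32, 16),
#  (32, 32), (32, 64), (64, 32)]
```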
Further, in the embodiment of the present invention, the detection frame classification layer and the detection frame regression layer of the region generation network classify and identify the feature map content framed by all anchor boxes, judge whether it belongs to the foreground or the background, and obtain the coordinate information of each anchor box; anchor boxes whose feature maps belong to the foreground are then selected as candidate frames and displayed on the feature picture according to their coordinate values.
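A hedged sketch of the two heads of the region generation network described here; the channel counts and the example feature-map size are assumptions:

```python
import torch
import torch.nn as nn

k = 9                                          # anchor boxes per feature-map point
rpn_conv = nn.Conv2d(512, 512, 3, padding=1)   # shared 3x3 conv (channel count assumed)
cls_layer = nn.Conv2d(512, 2 * k, 1)           # classification layer: fg/bg score per anchor
reg_layer = nn.Conv2d(512, 4 * k, 1)           # regression layer: (x, y, w, h) offsets per anchor

feat = torch.randn(1, 512, 38, 63)             # e.g. a 1000 x 600 input at stride 16
h = torch.relu(rpn_conv(feat))
fg_bg_scores, coords = cls_layer(h), reg_layer(h)
```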
Further, referring to fig. 4, in an embodiment of the present invention, the pooling of the candidate frames into a fixed size by using a region feature aggregation algorithm to obtain standard candidate frames includes:
s24, dividing each candidate frame into n x n units with fixed size;
s25, determining sampling points in each unit according to a preset rule, calculating the pixel values of the sampling points by bilinear interpolation, and performing a maximum pooling operation on each unit to select the pixel point with the maximum pixel value among its sampling points;
and S26, obtaining a standard candidate frame corresponding to each candidate frame according to the selected pixel points.
In the embodiment of the invention, the maximum pooling operation is performed on each unit to select the pixel point with the maximum pixel value among the sampling points; candidate frames containing such maximum-value pixel points are retained, and candidate frames without them are eliminated, yielding the standard candidate frames.
Bilinear interpolation performs linear interpolation in two directions respectively; the intersection points produced by the two linear interpolations serve as the sampling points, whose pixel values are thereby determined.
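The following numpy sketch illustrates this pooling under stated assumptions (a 2D single-channel feature map, sampling points on a fixed grid inside each unit, and points lying within the map); it is not the patent's reference implementation:

```python
import numpy as np

def bilinear(feat, y, x):
    """Pixel value at fractional (y, x) via linear interpolation in two directions."""
    y0, x0 = int(np.floor(y)), int(np.floor(x))
    y1, x1 = min(y0 + 1, feat.shape[0] - 1), min(x0 + 1, feat.shape[1] - 1)
    dy, dx = y - y0, x - x0
    return (feat[y0, x0] * (1 - dy) * (1 - dx) + feat[y0, x1] * (1 - dy) * dx
            + feat[y1, x0] * dy * (1 - dx) + feat[y1, x1] * dy * dx)

def region_feature_aggregation(feat, box, n=7, samples=2):
    """Pool one candidate frame into a fixed n x n output."""
    x1, y1, x2, y2 = box                   # candidate frame on the feature map
    ch, cw = (y2 - y1) / n, (x2 - x1) / n  # unit height / width
    out = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            vals = [bilinear(feat,
                             y1 + (i + (a + 0.5) / samples) * ch,
                             x1 + (j + (b + 0.5) / samples) * cw)
                    for a in range(samples) for b in range(samples)]
            out[i, j] = max(vals)  # max pooling over the unit's sampling points
    return out
```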
And S3, performing regression and classification on the standard candidate frame to obtain a target object candidate frame.
In detail, referring to fig. 5, the performing regression and classification on the standard candidate frame to obtain the target candidate frame includes:
s30, obtaining a predicted offset value of the standard candidate frame relative to the actual position by using a frame regression function so as to correct the standard candidate frame;
s31, inputting the standard candidate frame into a fully-connected layer and a softmax function in a pre-trained neural network, and calculating the category to which the feature map in the standard candidate frame belongs to obtain the target detection frame.
In the embodiment of the present invention, the standard candidate frame is generally represented by a four-dimensional vector (x, y, w, h), denoting its center coordinates (x, y), width w and height h respectively. Using A to represent the standard candidate frame, the embodiment of the present invention seeks a transformation F through the frame regression function, such that the standard candidate frame is corrected to approach the actual candidate frame G, namely:
Given A = (Ax, Ay, Aw, Ah) and G = (Gx, Gy, Gw, Gh), find F such that F(Ax, Ay, Aw, Ah) = (Gx, Gy, Gw, Gh);
F(A) = G is achieved by translation and scaling;
Translation: Gx = Ax + Aw·dx(A), Gy = Ay + Ah·dy(A);
Scaling: Gw = Aw·exp(dw(A)), Gh = Ah·exp(dh(A));
In the embodiment of the invention, dx(A), dy(A), dw(A) and dh(A) are obtained through the frame regression function, thereby correcting the standard candidate frame.
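A minimal sketch of applying the learned offsets to correct a standard candidate frame, following the translation and scaling formulas above:

```python
import numpy as np

def apply_deltas(A, d):
    """Correct standard candidate frame A with regression offsets d(A)."""
    ax, ay, aw, ah = A          # center x, center y, width, height
    dx, dy, dw, dh = d          # offsets predicted by the regression layer
    gx = ax + aw * dx           # translation
    gy = ay + ah * dy
    gw = aw * np.exp(dw)        # scaling
    gh = ah * np.exp(dh)
    return gx, gy, gw, gh
```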
Further, the corrected standard candidate frame is input into a fully-connected layer and a softmax function in a pre-trained neural network, and the category to which the feature map in the standard candidate frame belongs is calculated to obtain the target detection frame. For example, in one embodiment of the present invention, the feature maps in the standard candidate frames are classified into types such as automobile, street lamp, indicator, normal driving lane and emergency lane, and the standard candidate frames classified as emergency lane are taken as the target detection frames.
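For illustration, a sketch of such a classification head; the category list, layer sizes and pooled-feature shape are assumptions:

```python
import torch
import torch.nn as nn

CLASSES = ["car", "street lamp", "indicator", "normal lane", "emergency lane"]

head = nn.Sequential(nn.Flatten(), nn.Linear(256 * 7 * 7, len(CLASSES)))
roi_feat = torch.randn(1, 256, 7, 7)           # pooled standard candidate frame
scores = torch.softmax(head(roi_feat), dim=1)  # per-category scores
best = CLASSES[int(scores.argmax())]           # e.g. keep frames scored "emergency lane"
```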
And S4, performing coordinate mapping of the picture to be detected according to the target object candidate frame, and marking a target object detection result in the picture to be detected.
The embodiment of the invention performs coordinate mapping to map the target object candidate frame into the picture to be detected, so as to mark the identified target, such as an emergency lane, in the picture to be detected.
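A minimal sketch of this coordinate mapping, assuming a single feature-map stride (the actual factor depends on the feature extraction network):

```python
def map_to_image(box, stride=16):
    """Scale a frame from feature-map coordinates back to the picture to be detected."""
    x1, y1, x2, y2 = box
    return x1 * stride, y1 * stride, x2 * stride, y2 * stride
```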
The embodiment of the invention extracts features from the picture to be detected through a pre-constructed feature extraction network, which enhances feature expression capability in difficult scenes and thereby improves the overall accuracy of target object detection, such as emergency lane detection. At the same time, the candidate frames generated by the region generation network and the region feature aggregation algorithm are regressed and classified, which effectively alleviates the problem of pixel deviation and improves the regression positioning of candidate frames for target object detection, further improving the overall accuracy. Therefore, the embodiment of the invention can improve the accuracy of target detection.
Fig. 6 is a schematic block diagram of a target detection device according to the present invention.
The object detection device 100 according to the present invention may be installed in an electronic apparatus. According to the implemented functions, the object detection device 100 may include a picture feature extraction module 101, a candidate frame generation module 102, a classification regression module 103, and a candidate frame mapping module 104. A module according to the present invention, which may also be referred to as a unit, refers to a series of computer program segments that can be executed by a processor of an electronic device, can perform a fixed function, and are stored in a memory of the electronic device.
In the present embodiment, the functions regarding the respective modules/units are as follows:
the picture feature extraction module 101 is configured to perform size standardization processing on the picture to be detected to obtain a standard picture, and extract picture features in the standard picture through a pre-constructed feature extraction network to obtain a feature picture.
In the embodiment of the present invention, the picture feature extraction module 101 may use various optical cameras to obtain the picture to be detected. For example, a traffic monitoring camera photographs the emergency lane at specified time intervals and uploads the captured emergency lane pictures to a database; the emergency lane pictures in the database are then collected to obtain the pictures to be detected.
In detail, the picture feature extraction module 101 performs size normalization on the picture to be detected by:
step A, judging whether the size of the picture to be detected is larger than the size of a standard picture input by a user;
b, when the size of the picture to be detected is larger than the size of the standard picture, performing cutting processing on the picture to be detected according to the size of the standard picture to obtain the standard picture;
and C, when the size of the picture to be detected is smaller than the size of the standard picture, filling the picture to be detected according to the size of the standard picture to obtain the standard picture.
For example, the standard picture size may be set to 1000 × 600. When the size of the picture to be detected is 1200 × 1200, the picture feature extraction module 101 may obtain a standard picture by center cropping, taking the picture center as the origin and cropping to a length of 1000 and a width of 600; when the size of the picture to be detected is 800 × 400, the module may use edge-expansion filling, taking the picture frame as the boundary and expanding outward with a preset pixel value until the expanded size reaches a length of 1000 and a width of 600, thereby obtaining a standard picture.
In an embodiment of the present invention, the feature extraction network may be a DarkNet63 network. Further, in one embodiment, before extracting the picture features from the standard picture through the pre-constructed feature extraction network to obtain the feature picture, the method further includes constructing the feature extraction network.
In detail, the feature extraction network is constructed by the following method: constructing a first convolution layer from a convolution operation (Conv), a normalization operation (BN) and an activation operation (Leaky ReLU); combining first convolution layers using a combination function (Concat) and an addition function (Add) to construct a second convolution layer; and constructing the feature extraction network from the first convolution layer and the second convolution layer.
The convolution operation is a 2D convolution used to obtain feature maps from the standard picture by convolving with 2D convolution kernels of different effects; the normalization operation uses a normalization function to reduce the pixel values of the pixel points in the feature map; the activation operation uses an activation function to reduce the area of the feature map; the combination function is used to connect two or more first convolution layers, and the addition function adds the outputs of first convolution layers element-wise into the processing flow.
The candidate frame generation module 102 is configured to perform target detection on the feature picture by using a region generation network, generate a candidate frame according to a detection result, and pool the candidate frame into a fixed size by using a region feature aggregation algorithm to obtain a standard candidate frame.
In the embodiment of the present invention, an RPN (Region Proposal Network) is configured to select candidate frames from the feature picture.
In detail, the candidate frame generation module 102 performs target detection on the feature picture and generates candidate frames according to the detection result through the following operations:
step a, generating a preset number of anchor boxes with different scales and aspect ratios for each point on the feature picture;
step b, inputting the anchor boxes into the detection frame classification layer of the region generation network for classification, and judging whether the feature map in each anchor box belongs to the foreground or the background;
step c, inputting the anchor boxes into the detection frame regression layer of the region generation network to obtain the coordinate information of the anchor boxes;
step d, selecting anchor boxes whose feature maps belong to the foreground as candidate frames, and displaying the candidate frames on the feature picture according to the corresponding coordinate values.
In this embodiment of the present invention, for each point on the feature picture, the candidate frame generation module 102 may generate 9 anchor boxes with different scales and aspect ratios. The 9 anchor boxes are obtained from 3 different sizes and 3 different ratios, for example: the 3 sizes are 8, 16 and 32 (other sizes may be set), and the 3 ratios are 1:1, 1:2 and 2:1 (other ratios may be set), so that the resulting 9 anchor boxes are (8 × 8, 8 × 16, 16 × 8, 16 × 16, 16 × 32, 32 × 16, 32 × 32, 32 × 64 and 64 × 32).
Further, in the embodiment of the present invention, the candidate frame generation module 102 classifies and identifies the feature map content framed by all anchor boxes through the detection frame classification layer and the detection frame regression layer of the region generation network, judges whether it belongs to the foreground or the background, and obtains the coordinate information of each anchor box; it then selects anchor boxes whose feature maps belong to the foreground as candidate frames and displays the candidate frames on the feature picture according to their coordinate values.
Further, in this embodiment of the present invention, the pooling of the candidate frames into a fixed size by using a region feature aggregation algorithm to obtain standard candidate frames includes:
step e, dividing each candidate frame into n × n units with fixed size;
step f, determining sampling points in each unit according to a preset rule, calculating the pixel values of the sampling points by bilinear interpolation, and performing a maximum pooling operation on each unit to select the pixel point with the maximum pixel value among its sampling points;
and g, obtaining a standard candidate frame corresponding to each candidate frame according to the selected pixel points.
In the embodiment of the invention, the maximum pooling operation is performed on each unit to select the pixel point with the maximum pixel value among the sampling points; candidate frames containing such maximum-value pixel points are retained, and candidate frames without them are eliminated, yielding the standard candidate frames.
Bilinear interpolation performs linear interpolation in two directions respectively; the intersection points produced by the two linear interpolations serve as the sampling points, whose pixel values are thereby determined.
The classification regression module 103 is configured to perform regression and classification on the standard candidate frames to obtain target candidate frames.
In detail, the classification regression module 103 performs regression and classification on the standard candidate frames through the following method to obtain the target candidate frames: obtaining a predicted offset value of the standard candidate frame relative to the actual position by using a frame regression function, so as to correct the standard candidate frame; and inputting the standard candidate frame into a fully-connected layer and a softmax function in a pre-trained neural network, and calculating the category to which the feature map in the standard candidate frame belongs to obtain the target detection frame.
In the embodiment of the present invention, the standard candidate frame is generally represented by a four-dimensional vector (x, y, w, h), denoting its center coordinates (x, y), width w and height h respectively. Using A to represent the standard candidate frame, the classification regression module 103 seeks a transformation F through the frame regression function, such that the standard candidate frame is corrected to approach the actual candidate frame G, namely:
Given A = (Ax, Ay, Aw, Ah) and G = (Gx, Gy, Gw, Gh), find F such that F(Ax, Ay, Aw, Ah) = (Gx, Gy, Gw, Gh);
F(A) = G is achieved by translation and scaling;
Translation: Gx = Ax + Aw·dx(A), Gy = Ay + Ah·dy(A);
Scaling: Gw = Aw·exp(dw(A)), Gh = Ah·exp(dh(A));
The classification regression module 103 obtains dx(A), dy(A), dw(A) and dh(A) through the frame regression function, thereby correcting the standard candidate frame.
Further, the classification regression module 103 inputs the corrected standard candidate frame into a fully-connected layer and a softmax function in a pre-trained neural network, and calculates the category to which the feature map in the standard candidate frame belongs to obtain the target detection frame. For example, in one embodiment of the present invention, the feature maps in the standard candidate frames are classified into types such as automobile, street lamp, indicator, normal driving lane and emergency lane, and the standard candidate frames classified as emergency lane are taken as the target detection frames.
The candidate frame mapping module 104 is configured to perform coordinate mapping on the picture to be detected according to the target object candidate frame, and mark a target object detection result in the picture to be detected.
The candidate frame mapping module 104 according to the embodiment of the present invention performs coordinate mapping to map the target candidate frame to the to-be-detected picture, so as to mark an identified target, such as an emergency lane, in the to-be-detected picture.
Fig. 7 is a schematic structural diagram of an electronic device for implementing the target detection method according to the present invention.
The electronic device 1 may comprise a processor 10, a memory 11 and a bus, and may further comprise a computer program, such as an object detection program 12, stored in the memory 11 and executable on the processor 10.
The memory 11 includes at least one type of readable storage medium, which includes flash memory, removable hard disk, multimedia card, card-type memory (e.g., SD or DX memory, etc.), magnetic memory, magnetic disk, optical disk, etc. The memory 11 may in some embodiments be an internal storage unit of the electronic device 1, such as a removable hard disk of the electronic device 1. The memory 11 may also be an external storage device of the electronic device 1 in other embodiments, such as a plug-in mobile hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), and the like, which are provided on the electronic device 1. Further, the memory 11 may also include both an internal storage unit and an external storage device of the electronic device 1. The memory 11 may be used not only to store application software installed in the electronic device 1 and various types of data, such as a code of the object detection program 12, but also to temporarily store data that has been output or is to be output.
The processor 10 may be composed of an integrated circuit in some embodiments, for example, a single packaged integrated circuit, or may be composed of a plurality of integrated circuits packaged with the same or different functions, including one or more Central Processing Units (CPUs), microprocessors, digital processing chips, graphics processors, and combinations of various control chips. The processor 10 is the control unit of the electronic device: it connects the various components of the electronic device by using various interfaces and lines, and executes various functions of the electronic device 1 and processes its data by running or executing programs or modules stored in the memory 11 (for example, executing the object detection program) and calling data stored in the memory 11.
The bus may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, etc. The bus is arranged to enable connection communication between the memory 11 and at least one processor 10 or the like.
Fig. 7 only shows an electronic device with components, and it will be understood by a person skilled in the art that the structure shown in fig. 7 does not constitute a limitation of the electronic device 1, and may comprise fewer or more components than shown, or a combination of certain components, or a different arrangement of components.
For example, although not shown, the electronic device 1 may further include a power supply (such as a battery) for supplying power to each component, and preferably, the power supply may be logically connected to the at least one processor 10 through a power management device, so as to implement functions of charge management, discharge management, power consumption management, and the like through the power management device. The power supply may also include any component of one or more dc or ac power sources, recharging devices, power failure detection circuitry, power converters or inverters, power status indicators, and the like. The electronic device 1 may further include various sensors, a bluetooth module, a Wi-Fi module, and the like, which are not described herein again.
Further, the electronic device 1 may further include a network interface, and optionally, the network interface may include a wired interface and/or a wireless interface (such as a WI-FI interface, a bluetooth interface, etc.), which are generally used for establishing a communication connection between the electronic device 1 and other electronic devices.
Optionally, the electronic device 1 may further comprise a user interface, which may be a Display (Display), an input unit (such as a Keyboard), and optionally a standard wired interface, a wireless interface. Alternatively, in some embodiments, the display may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, an OLED (Organic Light-Emitting Diode) touch device, or the like. The display, which may also be referred to as a display screen or display unit, is suitable for displaying information processed in the electronic device 1 and for displaying a visualized user interface, among other things.
It is to be understood that the described embodiments are for purposes of illustration only and that the scope of the appended claims is not limited to such structures.
The object detection program 12 stored in the memory 11 of the electronic device 1 is a combination of instructions that, when executed by the processor 10, can implement:
acquiring a picture to be detected, performing size standardization processing on the picture to be detected to obtain a standard picture, and extracting picture features from the standard picture through a pre-constructed feature extraction network to obtain a feature picture;
performing target detection on the feature picture by using a region generation network, generating a candidate frame according to a detection result, and pooling the candidate frame into a fixed size by using a region feature aggregation algorithm to obtain a standard candidate frame;
performing regression and classification on the standard candidate frame to obtain a target object candidate frame;
and performing coordinate mapping on the picture to be detected according to the target object candidate frame, and marking a target object detection result in the picture to be detected.
Specifically, the specific implementation method of the processor 10 for the instruction may refer to the description of the relevant steps in the embodiments corresponding to fig. 1 to fig. 5, which is not repeated herein.
Further, if the integrated modules/units of the electronic device 1 are implemented in the form of software functional units and sold or used as separate products, they may be stored in a computer-readable storage medium. The computer-readable storage medium may be volatile or non-volatile. For example, the computer-readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, or a Read-Only Memory (ROM).
The present invention also provides a computer-readable storage medium, storing a computer program which, when executed by a processor of an electronic device, may implement:
acquiring a picture to be detected, performing size standardization processing on the picture to be detected to obtain a standard picture, and extracting picture features from the standard picture through a pre-constructed feature extraction network to obtain a feature picture;
performing target detection on the feature picture by using a region generation network, generating a candidate frame according to a detection result, and pooling the candidate frame into a fixed size by using a region feature aggregation algorithm to obtain a standard candidate frame;
performing regression and classification on the standard candidate frame to obtain a target object candidate frame;
and performing coordinate mapping on the picture to be detected according to the target object candidate frame, and marking a target object detection result in the picture to be detected.
In the embodiments provided in the present invention, it should be understood that the disclosed apparatus, device and method can be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the modules is only one logical functional division, and other divisions may be realized in practice.
The modules described as separate parts may or may not be physically separate, and parts displayed as modules may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment.
In addition, functional modules in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, or in a form of hardware plus a software functional module.
It will be evident to those skilled in the art that the invention is not limited to the details of the foregoing illustrative embodiments, and that the present invention may be embodied in other specific forms without departing from the spirit or essential attributes thereof.
The present embodiments are therefore to be considered in all respects as illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference signs in the claims shall not be construed as limiting the claim concerned.
The blockchain is a novel application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanisms and encryption algorithms. A blockchain is essentially a decentralized database: a series of data blocks associated by cryptographic methods, each data block containing information on a batch of network transactions, used to verify the validity (anti-counterfeiting) of the information and to generate the next block. The blockchain may include a blockchain underlying platform, a platform product service layer, an application service layer, and the like.
Furthermore, it is obvious that the word "comprising" does not exclude other elements or steps, and the singular does not exclude the plural. A plurality of units or means recited in the system claims may also be implemented by one unit or means in software or hardware. The terms first, second, etc. are used to denote names and do not denote any particular order.
Finally, it should be noted that the above embodiments are only for illustrating the technical solutions of the present invention and not for limiting, and although the present invention is described in detail with reference to the preferred embodiments, it should be understood by those skilled in the art that modifications or equivalent substitutions may be made on the technical solutions of the present invention without departing from the spirit and scope of the technical solutions of the present invention.

Claims (10)

1. A method of detecting a target, the method comprising:
acquiring a picture to be detected, performing size standardization processing on the picture to be detected to obtain a standard picture, and extracting picture features from the standard picture through a pre-constructed feature extraction network to obtain a feature picture;
performing target detection on the feature picture by using a region generation network, generating a candidate frame according to a detection result, and pooling the candidate frame into a fixed size by using a region feature aggregation algorithm to obtain a standard candidate frame;
performing regression and classification on the standard candidate frame to obtain a target object candidate frame;
and performing coordinate mapping on the picture to be detected according to the target object candidate frame, and marking a target object detection result in the picture to be detected.
2. The method for detecting the target object according to claim 1, wherein the step of performing size normalization processing on the picture to be detected to obtain a standard picture comprises:
judging whether the size of the picture to be detected is larger than the size of a standard picture input by a user;
when the size of the picture to be detected is larger than the size of the standard picture, performing cutting processing on the picture to be detected according to the size of the standard picture to obtain a standard picture;
and when the size of the picture to be detected is smaller than the size of the standard picture, performing filling processing on the picture to be detected according to the size of the standard picture to obtain the standard picture.
3. The method for detecting the target object according to claim 1, wherein the performing target detection on the feature picture by using the region generation network and generating candidate frames according to the detection result comprises:
generating a preset number of anchor frames with different scales and aspect ratios for each point on the feature picture;
inputting the anchor frames into a detection frame classification layer of the region generation network for classification, and judging whether the feature map in each anchor frame belongs to the foreground or the background;
inputting the anchor frames into a detection frame regression layer of the region generation network to obtain the coordinate information of the anchor frames;
and selecting anchor frames whose feature maps belong to the foreground as candidate frames, and displaying the candidate frames on the feature picture according to the corresponding coordinate values.
4. The method of claim 3, wherein the pooling of the candidate frames into a fixed size by using a region feature aggregation algorithm to obtain standard candidate frames comprises:
dividing each of the candidate boxes into n x n fixed-size cells;
determining sampling points in each unit according to a preset rule, calculating pixel values of the sampling points by using a bilinear interpolation method, and performing maximum pooling operation on the pixel values of the sampling points to select pixel points with maximum pixel values in the sampling points;
and obtaining a standard candidate frame corresponding to each candidate frame according to the selected pixel points.
5. The method of claim 1, wherein the step of performing regression and classification on the standard candidate frames to obtain target candidate frames comprises:
obtaining an offset predicted value of the standard candidate frame relative to the actual position by using a frame regression function so as to correct the standard candidate frame;
inputting the standard candidate frame into a fully-connected layer and a softmax function in a pre-trained neural network, calculating the category to which the feature map in the standard candidate frame belongs, outputting the score of the category, and obtaining the target detection frame according to the score.
6. The method for detecting the target object according to any one of claims 1 to 5, wherein before the extracting the picture features in the standard picture through the pre-constructed feature extraction network to obtain the feature picture, the method further comprises:
constructing a first convolution layer according to convolution operation, normalization operation and activation operation;
constructing a second convolution layer by using the combination function and the addition function;
and constructing the feature extraction network according to the first convolution layer and the second convolution layer.
7. The target object detection method according to any one of claims 1 to 5, wherein the target object is an emergency lane.
8. A target object detection apparatus, characterized by comprising:
the image feature extraction module is used for performing size standardization processing on the image to be detected to obtain a standard image, and extracting image features in the standard image through a pre-constructed feature extraction network to obtain a feature image;
the candidate frame generation module is used for carrying out target detection on the feature picture by utilizing a region generation network, generating a candidate frame according to a detection result, and pooling the candidate frame into a fixed size by utilizing a region feature aggregation algorithm to obtain a standard candidate frame;
the classification regression module is used for performing regression and classification on the standard candidate frame to obtain a target object candidate frame;
and the candidate frame mapping module is used for executing coordinate mapping on the picture to be detected according to the target object candidate frame and marking a target object detection result in the picture to be detected.
9. An electronic device, characterized in that the electronic device comprises:
at least one processor; and
a memory communicatively connected to the at least one processor; wherein
the memory stores instructions executable by the at least one processor, to enable the at least one processor to perform the target detection method according to any one of claims 1 to 7.
10. A computer-readable storage medium comprising a storage data area and a storage program area, wherein the storage data area stores created data, and the storage program area stores a computer program; wherein the computer program, when executed by a processor, implements a method of object detection as claimed in any one of claims 1 to 7.
CN202011510808.6A 2020-12-18 2020-12-18 Target detection method and device, electronic equipment and storage medium Pending CN112561889A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011510808.6A CN112561889A (en) 2020-12-18 2020-12-18 Target detection method and device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011510808.6A CN112561889A (en) 2020-12-18 2020-12-18 Target detection method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN112561889A (en) 2021-03-26

Family

ID=75030490

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011510808.6A Pending CN112561889A (en) 2020-12-18 2020-12-18 Target detection method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN112561889A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113420684A (en) * 2021-06-29 2021-09-21 深圳壹账通智能科技有限公司 Report recognition method and device based on feature extraction, electronic equipment and medium


Similar Documents

Publication Publication Date Title
CN111652845A (en) Abnormal cell automatic labeling method and device, electronic equipment and storage medium
WO2021151277A1 (en) Method and apparatus for determining severity of damage on target object, electronic device, and storage medium
CN112699775A (en) Certificate identification method, device and equipment based on deep learning and storage medium
CN112137591B (en) Target object position detection method, device, equipment and medium based on video stream
CN111476225B (en) In-vehicle human face identification method, device, equipment and medium based on artificial intelligence
CN111311010A (en) Vehicle risk prediction method and device, electronic equipment and readable storage medium
CN112132216B (en) Vehicle type recognition method and device, electronic equipment and storage medium
CN112200189B (en) Vehicle type recognition method and device based on SPP-YOLOv and computer readable storage medium
CN111931729B (en) Pedestrian detection method, device, equipment and medium based on artificial intelligence
CN112541902A (en) Similar area searching method, similar area searching device, electronic equipment and medium
CN112528908A (en) Living body detection method, living body detection device, electronic apparatus, and storage medium
CN113487621A (en) Medical image grading method and device, electronic equipment and readable storage medium
CN111985449A (en) Rescue scene image identification method, device, equipment and computer medium
CN114708461A (en) Multi-modal learning model-based classification method, device, equipment and storage medium
CN114723636A (en) Model generation method, device, equipment and storage medium based on multi-feature fusion
CN113887439A (en) Automatic early warning method, device, equipment and storage medium based on image recognition
CN113190703A (en) Intelligent retrieval method and device for video image, electronic equipment and storage medium
CN112561889A (en) Target detection method and device, electronic equipment and storage medium
CN112528903A (en) Face image acquisition method and device, electronic equipment and medium
CN117197227A (en) Method, device, equipment and medium for calculating yaw angle of target vehicle
CN111652226B (en) Picture-based target identification method and device and readable storage medium
CN112434601B (en) Vehicle illegal detection method, device, equipment and medium based on driving video
CN115049836A (en) Image segmentation method, device, equipment and storage medium
CN114463685A (en) Behavior recognition method and device, electronic equipment and storage medium
CN113627394A (en) Face extraction method and device, electronic equipment and readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination