CN115861316A - Pedestrian detection model training method and device and pedestrian detection method


Info

Publication number
CN115861316A
CN115861316A (application CN202310166416.XA)
Authority
CN
China
Prior art keywords
pedestrian, leg, grounding point, characteristic, grounding
Prior art date
Legal status
Granted
Application number
CN202310166416.XA
Other languages
Chinese (zh)
Other versions
CN115861316B (en)
Inventor
居聪
刘国清
杨广
王启程
郑伟
Current Assignee
Shenzhen Minieye Innovation Technology Co Ltd
Original Assignee
Shenzhen Minieye Innovation Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Shenzhen Minieye Innovation Technology Co Ltd
Priority to CN202310166416.XA
Publication of CN115861316A
Application granted
Publication of CN115861316B
Legal status: Active
Anticipated expiration

Landscapes

  • Image Processing (AREA)

Abstract

The application relates to a training method and apparatus for a pedestrian detection model, and to a pedestrian detection method. The method comprises: acquiring a plurality of surround-view images of the environment around the vehicle body and stitching them into a bird's-eye view image; acquiring the pedestrian leg grounding points labeled in the bird's-eye view image; inputting the bird's-eye view image into a pedestrian detection model to be trained for processing, and outputting a response feature map and a characterization feature map of the bird's-eye view image; obtaining, from the response feature map and the characterization feature map, the predicted pedestrian leg grounding points and the pedestrian to which each grounding point belongs; and adjusting the pedestrian detection model according to the difference between the labeled and predicted pedestrian leg grounding points, a feature distance constraint on the grounding points of the same pedestrian, and a feature distance constraint on the grounding points of different pedestrians, to obtain a trained pedestrian detection model. The method reduces the complexity of pedestrian detection and improves its accuracy.

Description

Training method and device of pedestrian detection model and pedestrian detection method
Technical Field
The application relates to the technical field of pedestrian detection, and in particular to a training method and apparatus for a pedestrian detection model and a pedestrian detection method.
Background
In the field of autonomous driving, detection and segmentation tasks in bird's-eye view (BEV) space have received much attention. With devices such as surround-view cameras mounted on the ego vehicle, and through data acquisition, calibration and labeling, deep learning model training, and test deployment, the vehicle gains the ability to perceive its surroundings. In a parking lot, vehicles move slowly and occlusion is severe, so pedestrian detection is required to guarantee driving safety.
In the related art, monocular 3D pedestrian detection is used to detect pedestrians. However, monocular 3D detection requires labeling pedestrians with 3D boxes, so the dimensionality of the regressed information is high, the 3D box representation is complex, the detection complexity is excessive, and the detection accuracy is low. How to reduce the complexity of pedestrian detection while improving its accuracy has therefore become an urgent technical problem for those skilled in the art.
Disclosure of Invention
In view of the above, it is necessary to provide a training method and apparatus for a pedestrian detection model, and a pedestrian detection method, that can improve the accuracy of pedestrian detection while reducing its complexity.
In a first aspect, the present application provides a method for training a pedestrian detection model. The method comprises the following steps:
acquiring a plurality of surround-view images of the environment around the vehicle body;
performing inverse perspective transformation on the plurality of surround-view images and stitching them to obtain a bird's-eye view image;
acquiring the pedestrian leg grounding points labeled in the bird's-eye view image;
inputting the bird's-eye view image into a pedestrian detection model to be trained for processing, and outputting a response feature map and a characterization feature map of the bird's-eye view image;
obtaining the predicted pedestrian leg grounding points and the pedestrians to which the grounding points belong according to the response feature map and the characterization feature map;
adjusting the pedestrian detection model according to the difference between the labeled and predicted pedestrian leg grounding points, the feature distance constraint on the grounding points of the same pedestrian, and the feature distance constraint on the grounding points of different pedestrians, until the pedestrian detection model converges, to obtain a trained pedestrian detection model; the pedestrian detection model is used for pedestrian detection.
In one embodiment, obtaining the predicted pedestrian leg grounding points and the pedestrians to which they belong according to the response feature map and the characterization feature map comprises:
obtaining the predicted pedestrian leg grounding points according to the response feature map;
acquiring the grounding point feature vector of each pedestrian leg grounding point on the characterization feature map;
calculating the feature distance between any two pedestrian leg grounding points according to their grounding point feature vectors;
and determining whether the two pedestrian leg grounding points belong to the same pedestrian according to the feature distance between them.
In one embodiment, adjusting the pedestrian detection model according to the difference between the labeled and predicted pedestrian leg grounding points, the feature distance constraint on the grounding points of the same pedestrian, and the feature distance constraint on the grounding points of different pedestrians comprises:
calculating a grounding point position regression loss according to the difference between the labeled and predicted pedestrian leg grounding points;
calculating a first distance loss over the feature distances of the grounding points of the same pedestrian, the first distance loss aiming to reduce the feature distance between the two leg grounding points of the same pedestrian;
calculating a second distance loss over the feature distances of the grounding points of different pedestrians, the second distance loss aiming to enlarge the feature distance between the leg grounding points of different pedestrians;
and obtaining a total loss from the grounding point position regression loss, the first distance loss and the second distance loss, and adjusting the pedestrian detection model according to the total loss.
In a second aspect, embodiments of the present application provide a pedestrian detection method. The method comprises the following steps:
acquiring a plurality of surround-view images of the environment around the vehicle body;
performing inverse perspective transformation on the plurality of surround-view images and stitching them to obtain a bird's-eye view image;
inputting the bird's-eye view image into a pre-trained pedestrian detection model, and outputting a response feature map and a characterization feature map through the pedestrian detection model;
determining the pedestrian leg grounding points according to the response feature map;
acquiring the grounding point feature vector of each pedestrian leg grounding point on the characterization feature map;
and determining the pedestrians in the bird's-eye view image and the leg grounding points of each pedestrian according to the grounding point feature vectors.
In one embodiment, determining the pedestrians in the bird's-eye view image and the leg grounding points of each pedestrian according to the grounding point feature vectors comprises:
calculating the distances between the grounding point feature vectors to obtain feature distances;
and if a feature distance is smaller than a distance threshold, determining that the two pedestrian leg grounding points corresponding to that feature distance belong to the same pedestrian.
In one embodiment, the method further comprises:
if a feature distance is larger than the distance threshold, determining that the two pedestrian leg grounding points corresponding to that feature distance belong to different pedestrians.
In one embodiment, the method further comprises:
mapping the leg grounding points of each pedestrian to obtain a corresponding pedestrian detection result, the pedestrian detection result including position information.
In a third aspect, the application further provides a training apparatus for the pedestrian detection model. The apparatus comprises:
a surround-view image acquisition module, configured to acquire a plurality of surround-view images of the environment around the vehicle body;
an image stitching module, configured to perform inverse perspective transformation on the plurality of surround-view images and stitch them to obtain a bird's-eye view image;
a grounding point acquisition module, configured to acquire the pedestrian leg grounding points labeled in the bird's-eye view image;
a first processing module, configured to input the bird's-eye view image into a pedestrian detection model to be trained for processing, and output a response feature map and a characterization feature map of the bird's-eye view image;
a prediction module, configured to obtain the predicted pedestrian leg grounding points and the pedestrians to which they belong according to the response feature map and the characterization feature map;
and a training module, configured to adjust the pedestrian detection model according to the difference between the labeled and predicted pedestrian leg grounding points, the feature distance constraint on the grounding points of the same pedestrian, and the feature distance constraint on the grounding points of different pedestrians, until the pedestrian detection model converges, to obtain a trained pedestrian detection model; the pedestrian detection model is used for pedestrian detection.
In a fourth aspect, an embodiment of the present application provides a pedestrian detection apparatus, comprising:
a surround-view image acquisition module, configured to acquire a plurality of surround-view images of the environment around the vehicle body;
an image stitching module, configured to perform inverse perspective transformation on the plurality of surround-view images and stitch them to obtain a bird's-eye view image;
a second processing module, configured to input the bird's-eye view image into a pre-trained pedestrian detection model and output a response feature map and a characterization feature map through the pedestrian detection model;
a pedestrian leg grounding point determining module, configured to determine the pedestrian leg grounding points according to the response feature map;
a vector acquisition module, configured to acquire the grounding point feature vector of each pedestrian leg grounding point on the characterization feature map;
and a pedestrian determining module, configured to determine the pedestrians in the bird's-eye view image and the leg grounding points of each pedestrian according to the grounding point feature vectors.
In a fifth aspect, the application further provides a computer device. The computer device comprises a memory and a processor, the memory storing a computer program; when executing the computer program, the processor implements the steps of the above training method of the pedestrian detection model, or the steps of the pedestrian detection method.
In a sixth aspect, the application further provides a computer-readable storage medium on which a computer program is stored; when executed by a processor, the computer program implements the steps of the above training method of the pedestrian detection model, or the steps of the pedestrian detection method.
In a seventh aspect, the application further provides a computer program product comprising a computer program; when executed by a processor, the computer program implements the steps of the above training method of the pedestrian detection model, or the steps of the pedestrian detection method.
According to the training method and apparatus of the pedestrian detection model and the pedestrian detection method, only pedestrian leg grounding points are labeled in the bird's-eye view image stitched from the plurality of surround-view images, which reduces labeling complexity and cost as well as the complexity of subsequent processing. The bird's-eye view image is input into the pedestrian detection model to be trained, which outputs a response feature map and a characterization feature map; the predicted pedestrian leg grounding points and the pedestrians to which they belong are then obtained from these two maps, and the pedestrian detection model is adjusted according to the labeled and predicted grounding points to obtain a model for pedestrian detection, improving the accuracy of pedestrian detection.
Drawings
FIG. 1 is a diagram of an application environment of a training method of a pedestrian detection model or a pedestrian detection method in one embodiment;
FIG. 2 is a schematic flow chart diagram illustrating a method for training a pedestrian detection model in one embodiment;
FIG. 3 is a diagram illustrating a plurality of surround-view images in one embodiment;
FIG. 4 is a schematic diagram of a bird's eye view image in one embodiment;
FIG. 5 is a schematic diagram of a bird's eye view image after the pedestrian leg grounding points have been labeled, according to an embodiment;
FIG. 6 is a schematic diagram of a response feature map in one embodiment;
FIG. 7 is a diagram illustrating pedestrian detection results in one embodiment;
FIG. 8 is a flowchart illustrating the steps for determining whether the grounding points of two pedestrian legs belong to the same pedestrian according to one embodiment;
FIG. 9 is a flowchart illustrating steps for adjusting a pedestrian detection model in one embodiment;
FIG. 10 is a flow diagram illustrating a pedestrian detection method in accordance with one embodiment;
FIG. 11 is a flowchart illustrating the steps of determining whether the grounding points of two legs of a pedestrian belong to the same pedestrian according to an embodiment;
FIG. 12 is a flow chart illustrating a pedestrian detection method according to another embodiment;
FIG. 13 is a block diagram showing the construction of a training apparatus for a pedestrian detection model in one embodiment;
FIG. 14 is a block diagram showing the construction of a pedestrian detection apparatus in one embodiment;
FIG. 15 is a diagram of an internal structure of a computer device in one embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the application and are not intended to limit it.
The training method of the pedestrian detection model and the pedestrian detection method provided by the embodiments of the application can be applied in the environment shown in fig. 1. The smart car 102 is a vehicle capable of automatic driving and includes a plurality of vehicle-mounted devices, such as a vehicle-mounted terminal, image capturing devices, and a vehicle-mounted radar. The smart car 102 communicates with the server 104 over a network. Both methods may be executed by the vehicle-mounted terminal of the smart car; alternatively, the acquired data may be uploaded to the server 104 over the network and both methods executed by the server 104; alternatively, the server 104 may execute the training method while the vehicle-mounted terminal executes the detection method. This is not particularly limited here. A data storage system may store the data that the server 104 needs to process, such as the surround-view images and the bird's-eye view images; it may be integrated on the server 104 or located on the cloud or another network server. When the training method is executed, the vehicle-mounted terminal acquires the surround-view images of the environment around the vehicle body from the server 104 over the network, processes them to obtain the bird's-eye view image, acquires the pedestrian leg grounding points labeled in the bird's-eye view image, inputs the bird's-eye view image into the pedestrian detection model to be trained, and outputs the response feature map and the characterization feature map of the bird's-eye view image. The predicted pedestrian leg grounding points and the pedestrians to which they belong are obtained according to the response feature map and the characterization feature map, and the pedestrian detection model is adjusted according to the difference between the labeled and predicted grounding points, the feature distance constraint on the grounding points of the same pedestrian, and the feature distance constraint on the grounding points of different pedestrians, until the model converges, yielding the trained pedestrian detection model used for pedestrian detection. The server 104 may be implemented as an independent server or as a server cluster composed of a plurality of servers.
In one embodiment, as shown in fig. 2, a method for training a pedestrian detection model is provided. The method is described by taking its application to the vehicle-mounted terminal of fig. 1 as an example, and includes the following steps:
Step 202, acquiring a plurality of surround-view images of the environment around the vehicle body.
In some embodiments, image acquisition devices such as pinhole cameras or fisheye cameras may be installed on each of the four sides of the vehicle; the surround-view images of the vehicle body's surroundings are then captured by these devices and stored in the vehicle-mounted terminal or the server.
Step 204, performing inverse perspective transformation on the plurality of surround-view images and stitching them to obtain the bird's-eye view image.
The bird's-eye view image is the image used for model training; it is an image in bird's-eye view space, obtained by fetching the plurality of surround-view images from the vehicle-mounted terminal or the server over the network, applying inverse perspective transformation, and stitching. The bird's-eye view image is stored in the vehicle-mounted terminal or the server.
In some embodiments, the bird's-eye view image may be obtained by stitching a plurality of images representing the environment around the vehicle body. For example, image acquisition devices such as pinhole or fisheye cameras may be installed on each face of the vehicle; the surround-view images are captured by these devices, and the captured images are then inverse-perspective-transformed and stitched into the bird's-eye view image.
In a parking lot scene, for example, 4 images (each with a resolution of 800 × 1280, 800 high and 1280 wide) are acquired by fisheye cameras installed on the front, rear, left and right sides of the vehicle, and are stitched by an Inverse Perspective Mapping (IPM) algorithm into a bird's-eye view image with a resolution of 3360 × 2464 (3360 high, 2464 wide). Each pixel in the bird's-eye view image represents 0.974 cm in real space (the parking lot), so the image covers about 16.36 meters in front of and behind the vehicle and about 12 meters to each side. Referring to fig. 3, front denotes the image captured by the acquisition device mounted at the front of the vehicle body, rear the image captured from the rear, left the image from the left side, and right the image from the right side. The stitched bird's-eye view image is shown in fig. 4, where the frame indicates the position of the vehicle body.
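By way of illustration only, the stitching step can be sketched as follows; the per-camera homography matrices are assumed to come from prior calibration, and the function names and the simple overwrite blending are illustrative assumptions rather than part of this application:

```python
import cv2
import numpy as np

BEV_H, BEV_W = 3360, 2464  # bird's-eye view resolution used in this example

def stitch_bev(images, homographies):
    """Warp each surround-view image onto the bird's-eye plane and overlay it.

    images: dict with keys 'front', 'rear', 'left', 'right' (800 x 1280 inputs)
    homographies: dict of 3x3 IPM matrices obtained from calibration (assumed)
    """
    bev = np.zeros((BEV_H, BEV_W, 3), dtype=np.uint8)
    for view in ('front', 'rear', 'left', 'right'):
        warped = cv2.warpPerspective(images[view], homographies[view],
                                     (BEV_W, BEV_H))
        mask = warped.any(axis=2)   # pixels actually covered by this camera
        bev[mask] = warped[mask]    # simple overwrite in overlap regions
    return bev
```

In practice the fisheye images would first be undistorted and the overlapping regions blended; the overwrite above only illustrates the data flow.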
Step 206, acquiring the pedestrian leg grounding points labeled in the bird's-eye view image.
A pedestrian leg grounding point is a point where a pedestrian's leg contacts the ground on which the pedestrian stands.
For example, the pedestrian leg grounding points of each pedestrian may be labeled manually in the bird's-eye view image, or semi-automatically by a pedestrian detection model already trained on the vehicle-mounted terminal; pre-labeled grounding points stored on the server may also be fetched over the network. Compared with labeling each of the original surround-view images separately, labeling the bird's-eye view image directly and uniformly avoids the need for pedestrian association between images (i.e., marking that the same pedestrian appears in two images) and greatly reduces labeling cost. In addition, labeling the bird's-eye view image directly allows the model to be trained in bird's-eye view space, so that information such as the direction and distance of pedestrians around the ego vehicle can be obtained, which cannot be achieved directly in the original image space.
Specifically, the bird's-eye view image after the pedestrian leg grounding points have been labeled can be as shown in fig. 5, where each grounding point is represented by a single point. Labeling pedestrians this way reduces both the labeling complexity and the amount of data to be processed.
Step 208, inputting the bird's-eye view image into the pedestrian detection model to be trained for processing, and outputting the response feature map and the characterization feature map of the bird's-eye view image.
The pedestrian detection model is an algorithmic model for pedestrian detection; a neural network model may be selected as the model to be trained.
The response feature map is a feature map that responds to all pedestrian leg grounding points in the bird's-eye view image. It contains a number of response points, each representing a grounding point predicted by the pedestrian detection model. If there are n pedestrians to be detected in one bird's-eye view image, at most 2n response points are needed in the response feature map. In the response feature map, greater brightness indicates a greater probability that a pedestrian grounding point is detected at the corresponding position; a response feature map may look as shown in fig. 6.
The characterization feature map is a feature map that holds the feature vectors corresponding to all response points in the response feature map.
Illustratively, a neural network model is selected as the pedestrian detection model. The bird's-eye view image is first preprocessed, then the preprocessed image is passed through the neural network model in a forward pass, and the model's outputs are taken as the characterization feature map and the response feature map of the bird's-eye view image.
For example, the bird's-eye view image is preprocessed down to a resolution of 480 × 352, then fed into the neural network model for the forward pass; the model produces a response feature map of shape B × 1 × 480 × 352 and a characterization feature map of shape B × 64 × 480 × 352, where B is the batch size.
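The two-headed output can be sketched as follows; only the output shapes B × 1 × 480 × 352 and B × 64 × 480 × 352 come from this example, while the backbone is a hypothetical stand-in (the application does not fix a network architecture):

```python
import torch
import torch.nn as nn

class PedestrianDetector(nn.Module):
    def __init__(self, embed_dim: int = 64):
        super().__init__()
        # hypothetical minimal backbone; any feature extractor could be used
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
        )
        self.response_head = nn.Conv2d(64, 1, 1)           # B x 1 x H x W
        self.embedding_head = nn.Conv2d(64, embed_dim, 1)  # B x 64 x H x W

    def forward(self, bev):  # bev: B x 3 x 480 x 352
        feat = self.backbone(bev)
        response = torch.sigmoid(self.response_head(feat))  # response feature map
        embedding = self.embedding_head(feat)  # characterization feature map
        return response, embedding
```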
Step 210, obtaining the predicted pedestrian leg grounding points and the pedestrians to which they belong according to the response feature map and the characterization feature map.
Exemplarily, each response point in the response feature map is taken as a predicted pedestrian leg grounding point; the grounding point feature vector of each predicted grounding point is then read off the characterization feature map, and the feature distances between grounding points are computed from these vectors in order to predict which pedestrian each grounding point belongs to.
Step 212, adjusting the pedestrian detection model according to the difference between the labeled and predicted pedestrian leg grounding points, the feature distance constraint on the grounding points of the same pedestrian, and the feature distance constraint on the grounding points of different pedestrians, until the pedestrian detection model converges, to obtain the trained pedestrian detection model; the pedestrian detection model is used for pedestrian detection.
Specifically, a loss value is calculated from the actually labeled and the predicted pedestrian leg grounding points, together with a loss value over the grounding points of the same pedestrian and a loss value over the grounding points of different pedestrians; the pedestrian detection model is then adjusted according to all the calculated loss values, thereby training the pedestrian detection model.
The trained pedestrian detection model can be used for pedestrian detection to obtain the corresponding detection results. A pedestrian detection result may look as shown in fig. 7, where leg grounding points of the same color represent the same pedestrian.
In the above technical solution, only the pedestrian leg grounding points are labeled in the bird's-eye view image, which reduces labeling complexity and cost as well as the complexity of subsequent processing. The bird's-eye view image is input into the pedestrian detection model to be trained, which outputs the response feature map and the characterization feature map; the predicted pedestrian leg grounding points and the pedestrians to which they belong are obtained from these maps, and the model is adjusted according to the labeled and predicted grounding points, yielding a pedestrian detection model whose accuracy is improved while the labeling and computation costs of the algorithm are reduced. Meanwhile, the detection result is expressed directly in bird's-eye view space and contains position information such as the direction and distance of each pedestrian relative to the ego vehicle, enabling the vehicle to better perceive the pedestrians around it; an ordinary 2D pedestrian detector in image space cannot provide direction and distance information, and a 3D detector in image space can hardly obtain the labeling information needed for training.
In some embodiments, as shown in fig. 8, step 210 includes, but is not limited to, the following steps:
Step 802, obtaining the predicted pedestrian leg grounding points according to the response feature map.
For example, the response value of each response point in the response feature map may be computed: a response point whose response value exceeds a response threshold is taken as a predicted pedestrian leg grounding point, while a response point whose value is less than or equal to the threshold is not.
The response threshold is preset.
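A minimal sketch of this thresholding step follows; the threshold value and the local-maximum suppression are illustrative assumptions:

```python
import torch
import torch.nn.functional as F

def extract_ground_points(response, thresh=0.5):
    """Keep local peaks of the response map whose value exceeds the threshold.

    response: 1 x H x W tensor of predicted response values in [0, 1].
    Returns a list of (row, col) predicted pedestrian leg grounding points.
    """
    pooled = F.max_pool2d(response, kernel_size=3, stride=1, padding=1)
    peaks = (response == pooled) & (response > thresh)  # suppress non-maxima
    return [tuple(p.tolist()) for p in peaks.squeeze(0).nonzero()]
```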
Step 804, acquiring the grounding point feature vector of each pedestrian leg grounding point on the characterization feature map.
For example, after all pedestrian leg grounding points have been determined, the grounding point feature vector of each one can be read off the characterization feature map: the feature vector along the channel dimension at the corresponding position of the characterization feature map is taken as the grounding point feature vector.
Step 806, calculating the feature distance between any two pedestrian leg grounding points according to their grounding point feature vectors.
Illustratively, the feature distance between two grounding points can be obtained by computing the Euclidean distance between their grounding point feature vectors.
Step 808, determining whether the two pedestrian leg grounding points belong to the same pedestrian according to the feature distance between them.
Specifically, a distance threshold may be set: if the feature distance between two pedestrian leg grounding points is smaller than the threshold, the two points match and belong to the same pedestrian; if the feature distance is greater than or equal to the threshold, they do not match and do not belong to the same pedestrian. After all pedestrian leg grounding points in the bird's-eye view image have been traversed, any grounding point that could not be matched is taken to indicate that the corresponding leg is occluded, and that pedestrian is represented by the single grounding point.
For example, if there are 5 pedestrians to be detected in the bird's-eye view image and 8 pedestrian leg grounding points, and 6 of the grounding points are matched into 3 pedestrians, the remaining two grounding points cannot be matched; the legs of the remaining two pedestrians are considered occluded, and each of those grounding points represents one pedestrian.
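The matching just described can be sketched as a greedy pairing over pairwise Euclidean distances; the greedy strategy is an assumption, since the embodiment only requires that matched points have a feature distance below the threshold:

```python
import numpy as np

def group_ground_points(vectors, dist_thresh):
    """Greedily pair grounding points whose feature distance is below threshold.

    vectors: list of grounding point feature vectors (numpy arrays).
    Returns a list of pedestrians, each a tuple of one or two point indices.
    """
    unused, pedestrians = set(range(len(vectors))), []
    for i in range(len(vectors)):
        if i not in unused:
            continue                       # already matched to an earlier point
        unused.discard(i)
        best, best_d = None, dist_thresh
        for j in unused:                   # nearest unused partner in feature space
            d = np.linalg.norm(vectors[i] - vectors[j])
            if d < best_d:
                best, best_d = j, d
        if best is not None:
            unused.discard(best)
            pedestrians.append((i, best))  # two visible leg grounding points
        else:
            pedestrians.append((i,))       # unmatched: leg considered occluded
    return pedestrians
```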
In the technical solution of this embodiment, the feature distance between pedestrian leg grounding points is obtained by computing the Euclidean distance between their grounding point feature vectors, so as to judge whether two grounding points belong to the same pedestrian. This facilitates the subsequent training of the pedestrian detection model and thereby improves the accuracy with which the model detects pedestrians.
Referring to fig. 9, in some embodiments, the step of adjusting the pedestrian detection model according to the difference between the labeled and predicted pedestrian leg grounding points, the feature distance constraint on the grounding points of the same pedestrian, and the feature distance constraint on the grounding points of different pedestrians, until the pedestrian detection model converges to obtain the trained pedestrian detection model, includes but is not limited to the following steps:
Step 902, calculating the grounding point position regression loss according to the difference between the labeled and predicted pedestrian leg grounding points.
Illustratively, a position regression loss may be designed over the labeled and predicted pedestrian leg grounding points. The position regression loss can take a focal-style form as in formula (1):
$$L_{\text{pos}} = -\frac{1}{N}\sum_{i=1}^{H}\sum_{j=1}^{W}\begin{cases}\left(1-\hat{y}_{ij}\right)^{\alpha}\log \hat{y}_{ij}, & y_{ij}=1\\\left(1-y_{ij}\right)^{\beta}\,\hat{y}_{ij}^{\alpha}\log\left(1-\hat{y}_{ij}\right), & \text{otherwise}\end{cases} \qquad (1)$$
In formula (1), H = 480, W = 352, N is the number of labeled pedestrian leg grounding points, $\hat{y}_{ij}$ is the predicted response value at pixel coordinate (i, j) of the bird's-eye view image (the predicted pedestrian leg grounding point), $y_{ij}$ is the label at the corresponding pixel (the labeled pedestrian leg grounding point), and $\alpha$ and $\beta$ are parameters that can be tuned according to the actual effect.
The labeled pedestrian leg grounding points of the bird's-eye view image and the predicted grounding points are substituted into formula (1) to calculate the grounding point position regression loss.
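A sketch of the loss as reconstructed in formula (1); the focal form and the default values of α and β are assumptions consistent with the variable definitions above:

```python
import torch

def position_regression_loss(pred, target, alpha=2.0, beta=4.0, eps=1e-6):
    """Formula (1): focal-style loss over the 480 x 352 response map.

    pred, target: B x 1 x 480 x 352; target is 1 at labeled grounding points.
    """
    pos = target.eq(1).float()
    neg = 1.0 - pos
    n = pos.sum().clamp(min=1)  # N, the number of labeled grounding points
    pos_loss = pos * (1 - pred).pow(alpha) * torch.log(pred + eps)
    neg_loss = (neg * (1 - target).pow(beta) * pred.pow(alpha)
                * torch.log(1 - pred + eps))
    return -(pos_loss + neg_loss).sum() / n
```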
Step 904, calculating the first distance loss over the feature distances of the grounding points of the same pedestrian; the first distance loss aims to reduce the feature distance between the two leg grounding points of the same pedestrian.
Specifically, the first distance loss over the leg grounding points of the same pedestrian can be calculated by formula (2):
$$L_{\text{pull}} = \frac{1}{N}\sum_{k}\left[\left(e_k^{l}-\bar{e}_k\right)^2+\left(e_k^{r}-\bar{e}_k\right)^2\right] \qquad (2)$$
In formula (2), N is the number of pedestrian leg grounding points, $e_k^{l}$ and $e_k^{r}$ are the grounding point feature vectors of the left and right leg grounding points of the same pedestrian k, and $\bar{e}_k$ is the mean of the two grounding point feature vectors of that pedestrian. $L_{\text{pull}}$ draws the two leg grounding points of the same pedestrian closer in feature distance, so that their grounding point feature vectors aggregate.
The first distance loss is calculated by formula (2), so as to reduce the feature distance between the two leg grounding points of the same pedestrian.
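A sketch of formula (2); the N x D tensor layout is an assumption:

```python
import torch

def pull_loss(left_vecs, right_vecs):
    """Formula (2): pull each pedestrian's two leg grounding point feature
    vectors toward their mean.

    left_vecs, right_vecs: N x D tensors; row k holds the left / right leg
    grounding point feature vector of pedestrian k.
    """
    mean = (left_vecs + right_vecs) / 2  # mean embedding per pedestrian
    return ((left_vecs - mean).pow(2).sum(dim=1)
            + (right_vecs - mean).pow(2).sum(dim=1)).mean()
```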
Step 906, calculating the second distance loss over the feature distances of the grounding points of different pedestrians; the second distance loss aims to enlarge the feature distance between the leg grounding points of different pedestrians.
Illustratively, the second distance loss may be calculated by formula (3):
$$L_{\text{push}} = \frac{1}{N\left(N-1\right)}\sum_{k}\sum_{j\neq k}\max\left(0,\;\Delta-\left\lVert \bar{e}_k-e_j\right\rVert\right) \qquad (3)$$
In formula (3), N is the number of pedestrian leg grounding points, $e_j$ is the grounding point feature vector at point j, $\bar{e}_k$ is the mean of the two grounding point feature vectors of the same pedestrian k, and $\Delta$ is a margin. $L_{\text{push}}$ enlarges the feature distance between the leg grounding points of different pedestrians, so that the grounding point feature vectors of different pedestrians are pushed apart.
The second distance loss is calculated by formula (3), so as to enlarge the feature distance between the leg grounding points of two different pedestrians.
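A sketch of formula (3); the hinge margin Δ (here `margin`) is an assumption:

```python
import torch

def push_loss(mean_vecs, margin=1.0):
    """Formula (3): push the mean grounding point feature vectors of
    different pedestrians apart.

    mean_vecs: M x D tensor; row k is the mean of pedestrian k's two
    grounding point feature vectors.
    """
    m = mean_vecs.size(0)
    if m < 2:
        return mean_vecs.new_zeros(())
    dists = torch.cdist(mean_vecs, mean_vecs)      # pairwise Euclidean distances
    hinge = torch.clamp(margin - dists, min=0)
    hinge = hinge - torch.diag(torch.diag(hinge))  # drop the k == j terms
    return hinge.sum() / (m * (m - 1))
```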
Step 908, obtaining the total loss from the grounding point position regression loss, the first distance loss and the second distance loss, and adjusting the pedestrian detection model according to the total loss.
For example, the sum of the grounding point position regression loss, the first distance loss and the second distance loss may be used as the total loss, and the pedestrian detection model is then adjusted via the back-propagation algorithm according to the total loss.
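Putting the three terms together in one hypothetical training step; the unweighted sum follows step 908, while the optimizer choice, learning rate, and batch size of 1 are assumptions:

```python
import torch

model = PedestrianDetector()  # the sketch defined earlier
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

def training_step(bev, target_response, left_pts, right_pts):
    """One training step; left_pts / right_pts are the labeled (row, col)
    leg grounding points of each pedestrian."""
    pred_response, embedding = model(bev)  # bev: 1 x 3 x 480 x 352
    emb = embedding[0]                     # 64 x 480 x 352
    left_vecs = torch.stack([emb[:, r, c] for r, c in left_pts])
    right_vecs = torch.stack([emb[:, r, c] for r, c in right_pts])
    total = (position_regression_loss(pred_response, target_response)
             + pull_loss(left_vecs, right_vecs)
             + push_loss((left_vecs + right_vecs) / 2))
    optimizer.zero_grad()
    total.backward()  # back-propagation adjusts the pedestrian detection model
    optimizer.step()
    return total.item()
```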
Referring to fig. 10, some embodiments of the present application further provide a pedestrian detection method, which performs pedestrian detection with the trained pedestrian detection model. Taking its application to the vehicle-mounted terminal of fig. 1 as an example, the pedestrian detection method includes, but is not limited to, the following steps:
Step 1002, acquiring a plurality of surround-view images of the environment around the vehicle body.
The environment around the vehicle body refers to the vehicle's surroundings, such as the parking lot where the vehicle is parked.
Illustratively, the surround-view images of the vehicle body's surroundings can be acquired by image acquisition devices such as fisheye, pinhole or wide-angle cameras. For example, 4 images around the vehicle can be acquired by fisheye cameras arranged at the front, rear, left and right of the vehicle.
Each acquired image may have a resolution of 800 × 1280, where 800 is the height and 1280 the width.
Step 1004, performing inverse perspective transformation on the plurality of surround-view images and stitching them to obtain the bird's-eye view image.
The inverse perspective transformation is the inverse of camera imaging under a ground-plane assumption. The bird's-eye view image is an image in bird's-eye view space seen from above, and it is proportional to the real space in which the vehicle body is located.
For example, the surround-view images may be inverse-perspective-transformed with the ground taken as the assumed plane, and then stitched together into the corresponding bird's-eye view image.
For example, 4 images with a resolution of 800 × 1280 are each inverse-perspective-transformed and then stitched into a bird's-eye view image with a resolution of 3360 × 2464. Each pixel in the bird's-eye view image corresponds to 0.974 cm in real space, so the 3360 × 2464 image, centered on the vehicle body, covers about 16.36 meters in front of and behind the vehicle and about 12 meters to each side.
Step 1006, inputting the bird's-eye view image into the pre-trained pedestrian detection model, and outputting the response feature map and the characterization feature map through the pedestrian detection model.
The pedestrian detection model is the model obtained through the aforementioned training method of the pedestrian detection model.
The response feature map is a feature map that responds to all pedestrian leg grounding points in the bird's-eye view image; it contains a number of response points, each representing a grounding point predicted by the pedestrian detection model.
The characterization feature map is a feature map that holds the feature vectors corresponding to all response points in the response feature map.
The bird's-eye view image is input into the pedestrian detection model for processing, and the model's outputs are taken as the corresponding response feature map and characterization feature map.
Step 1008, determining the pedestrian leg grounding points according to the response feature map.
For example, a response threshold can be set; the response value of each of the response points in the response feature map is then computed, and a response point whose response value exceeds the threshold is determined to be a pedestrian leg grounding point.
Step 1010, acquiring the grounding point feature vector of each pedestrian leg grounding point on the characterization feature map.
Illustratively, after a pedestrian leg grounding point has been determined, its position is located on the characterization feature map, and the feature vector along the channel dimension at that position is taken as its grounding point feature vector.
Step 1012, determining the pedestrians in the bird's-eye view image and the leg grounding points of each pedestrian according to the grounding point feature vectors.
After the grounding point feature vectors of all pedestrian leg grounding points have been determined, the pedestrians in the bird's-eye view image and the leg grounding points of each pedestrian can be determined from the feature distances between the vectors, thereby accomplishing pedestrian detection.
In the above pedestrian detection method, the bird's-eye view image is input into the trained pedestrian detection model, which reduces the amount of computation and hence the complexity of pedestrian detection. The model's outputs give the response feature map and the characterization feature map; the pedestrian leg grounding points are determined from the response feature map, their grounding point feature vectors are read off the characterization feature map, and the pedestrians in the bird's-eye view image together with each pedestrian's leg grounding points are determined from those vectors, improving the accuracy of pedestrian detection.
In some embodiments, referring to fig. 11, step 1012 includes, but is not limited to, the following steps:
Step 1102, calculating the distances between the grounding point feature vectors to obtain the feature distances.
The feature distance is the distance between the grounding point feature vectors of two pedestrian leg grounding points, and may be obtained as the Euclidean distance.
Illustratively, the Euclidean distances between all grounding point feature vectors are calculated to obtain the feature distances between the pedestrian leg grounding points.
Step 1104, if a feature distance is smaller than the distance threshold, determining that the two pedestrian leg grounding points corresponding to that feature distance belong to the same pedestrian.
The distance threshold is a preset threshold.
Illustratively, a feature distance smaller than the distance threshold indicates that the two corresponding pedestrian leg grounding points can be matched and belong to the same pedestrian.
In some embodiments, the pedestrian detection method further comprises the following step: if a feature distance is larger than the distance threshold, determining that the two pedestrian leg grounding points corresponding to that feature distance belong to different pedestrians.
Specifically, when the feature distance is larger than the distance threshold, the corresponding pedestrian leg grounding points cannot be matched, and the two grounding points corresponding to that feature distance are determined to belong to different pedestrians.
After all pedestrian leg grounding points have been traversed, any grounding point that could not be matched is taken to indicate that the corresponding leg is occluded, and the pedestrian is represented by that single grounding point.
In some embodiments, the pedestrian detection method further comprises, but is not limited to, the following step: mapping the leg grounding points of each pedestrian to obtain the corresponding pedestrian detection result; the pedestrian detection result includes position information.
The position information includes information such as the direction and distance of the pedestrian relative to the vehicle.
Specifically, according to a preset mapping relationship, the pedestrian leg grounding points are mapped into the environment around the vehicle body to obtain the corresponding coordinate positions, thereby determining the position of each pedestrian in bird's-eye view space and obtaining the corresponding detection result. The mapping relationship is the transformation between a pedestrian leg grounding point in the bird's-eye view image and the corresponding point in real space (the space around the vehicle body).
For example, the originally stitched bird's-eye view image has a resolution of 3360 × 2464, while the preprocessed image input to the pedestrian detection model has a resolution of 480 × 352, i.e., it is downscaled by a factor of 7. The current position of a pedestrian leg grounding point is therefore multiplied by 7 to restore the 3360 × 2464 resolution of the stitched image, and the actual position of the pedestrian is then calculated from the scale relationship between the bird's-eye view image and the space around the vehicle body (one pixel in the bird's-eye view image represents 0.974 cm of that space), determining the pedestrian detection result.
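A sketch of this mapping with the numbers from the embodiment; the convention that the image center coincides with the vehicle is an assumption:

```python
def ground_point_to_vehicle_frame(row, col, scale=7, cm_per_px=0.974,
                                  bev_h=3360, bev_w=2464):
    """Map a grounding point from the 480 x 352 model output back to meters
    in the space around the vehicle body.

    row, col: grounding point position in the 480 x 352 response map.
    Returns (dx_m, dy_m) offsets from the vehicle in meters.
    """
    px_row, px_col = row * scale, col * scale        # restore 3360 x 2464 pixels
    # offsets from the image center, assumed to coincide with the vehicle
    dy_m = (px_row - bev_h / 2) * cm_per_px / 100.0  # longitudinal offset
    dx_m = (px_col - bev_w / 2) * cm_per_px / 100.0  # lateral offset
    return dx_m, dy_m
```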
In some embodiments, the pedestrian detection result may further include the pedestrian's direction, the pedestrian's distance (which can be calculated from the relative relationship between the pedestrian's position and the vehicle body's position), and the like.
In some embodiments, as shown in fig. 12, the present application provides a pedestrian detection method, including but not limited to the following steps:
step 1202, a plurality of images of the environment around the vehicle body are obtained.
And 1204, performing inverse perspective transformation on the plurality of annular vision images, splicing to obtain a bird-eye view image, and acquiring the grounding point of the leg of the pedestrian marked in the bird-eye view image.
And step 1206, inputting the aerial view image into a pedestrian detection model to be trained for processing, and outputting a response characteristic map and a representation characteristic map of the aerial view image.
And 1208, obtaining the predicted grounding point of the leg of the pedestrian according to the response characteristic diagram, and obtaining the grounding point characteristic vector of the grounding point of the leg of the pedestrian on the characteristic diagram.
And step 1210, calculating the characteristic distance of the grounding points of any two pedestrian legs according to the grounding point characteristic vectors of the grounding points of any two pedestrian legs.
At step 1212, it is determined whether any two pedestrian leg grounding points belong to the same pedestrian.
If the characteristic distance of the grounding points of any two pedestrian legs is greater than the distance threshold value, the grounding points of the two pedestrian legs are not considered to belong to the same pedestrian, and if the characteristic distance of the grounding points of any two pedestrian legs is less than or equal to the distance threshold value, the grounding points of the two pedestrian legs are considered to belong to the same pedestrian.
Step 1214, calculating the grounding point position regression loss according to the difference between the labeled and predicted pedestrian leg grounding points.
Step 1216, calculating the first distance loss over the feature distances of the grounding points of the same pedestrian; the first distance loss aims to reduce the feature distance between the two leg grounding points of the same pedestrian.
Step 1218, calculating the second distance loss over the feature distances of the grounding points of different pedestrians; the second distance loss aims to enlarge the feature distance between the leg grounding points of different pedestrians.
Step 1220, obtaining the total loss from the grounding point position regression loss, the first distance loss and the second distance loss, and adjusting the pedestrian detection model according to the total loss.
Step 1222, acquiring a plurality of surround-view images of the environment around the vehicle body, performing inverse perspective transformation on them, and stitching them into the bird's-eye view image.
Step 1224, inputting the bird's-eye view image into the pre-trained pedestrian detection model, and outputting the response feature map and the characterization feature map through the pedestrian detection model.
Step 1226, determining the pedestrian leg grounding points according to the response feature map, and acquiring the grounding point feature vector of each grounding point on the characterization feature map.
Step 1228, calculating the distances between the grounding point feature vectors to obtain the feature distances, and judging whether each feature distance is greater than the distance threshold.
The feature distance is obtained by computing the Euclidean distance between grounding point feature vectors. If the feature distance is greater than the distance threshold, execution jumps to step 1230; if it is less than or equal to the distance threshold, execution jumps to step 1232.
Step 1230, the two pedestrian leg grounding points corresponding to the feature distance do not belong to the same pedestrian.
For two pedestrian leg grounding points that do not belong to the same pedestrian, the feature distances to the remaining grounding points must be computed by traversal and re-judged against the distance threshold. If, after all grounding points have been traversed, a grounding point still cannot be matched with any other, the corresponding leg of the pedestrian is considered occluded, and that single grounding point is used to represent the pedestrian.
Step 1232, the two pedestrian leg grounding points corresponding to the feature distance belong to the same pedestrian.
Step 1234, mapping the leg grounding points of each pedestrian to obtain the corresponding pedestrian detection results, the pedestrian detection results including position information.
Steps 1202 to 1220 correspond to the training phase of the pedestrian detection model (see the embodiments shown in figs. 1 to 9), and steps 1222 to 1234 correspond to the application phase of the pedestrian detection model (see the embodiments shown in figs. 10 to 11); details are not repeated here.
It should be understood that, although the steps in the flowcharts of the above embodiments are shown in the sequence indicated by the arrows, they are not necessarily executed in that sequence. Unless explicitly stated otherwise, the steps are not strictly ordered and may be performed in other orders. Moreover, at least some of the steps in these flowcharts may comprise multiple sub-steps or stages, which are not necessarily performed at the same time but may be performed at different times, and whose order of execution is not necessarily sequential; they may be performed in turn or alternately with other steps, or with at least some of the sub-steps or stages of other steps.
Based on the same inventive concept, an embodiment of the present application further provides a training apparatus for a pedestrian detection model, which is used to implement the above training method of the pedestrian detection model. The implementation scheme provided by the apparatus is similar to that described in the above method, so details are not repeated here.
In one embodiment, as shown in fig. 13, there is provided a training apparatus for a pedestrian detection model, including: an image acquisition module 1302, an image stitching module 1304, a grounding point acquisition module 1306, a first processing module 1308, a prediction module 1310, and a training module 1312, wherein:
the image acquisition module 1302 is configured to acquire a plurality of around-view images of the environment around the vehicle body.

The image stitching module 1304 is configured to perform inverse perspective transformation on the plurality of around-view images and stitch the results to obtain the bird's-eye view image.

The grounding point acquisition module 1306 is configured to acquire the pedestrian leg grounding points labeled in the bird's-eye view image.

The first processing module 1308 is configured to input the bird's-eye view image into the pedestrian detection model to be trained for processing, and to output a response feature map and a characterization feature map of the bird's-eye view image.

The prediction module 1310 is configured to obtain the predicted pedestrian leg grounding points and the pedestrian to which each pedestrian leg grounding point belongs according to the response feature map and the characterization feature map.

The training module 1312 is configured to adjust the pedestrian detection model according to the difference between the labeled pedestrian leg grounding points and the predicted pedestrian leg grounding points, the characteristic distance constraint on the pedestrian leg grounding points of the same pedestrian, and the characteristic distance constraint on the pedestrian leg grounding points of different pedestrians, until the pedestrian detection model converges, to obtain the trained pedestrian detection model; the pedestrian detection model is used for pedestrian detection.
In some embodiments, the prediction module 1310 is further configured to obtain the predicted pedestrian leg grounding points according to the response feature map; acquire the grounding point feature vector of each pedestrian leg grounding point on the characterization feature map; calculate the characteristic distance between any two pedestrian leg grounding points according to their grounding point feature vectors; and determine whether the two pedestrian leg grounding points belong to the same pedestrian according to that characteristic distance.
In some embodiments, the training module 1312 is further configured to calculate the position regression loss of the grounding points according to the difference between the labeled pedestrian leg grounding points and the predicted pedestrian leg grounding points; calculate a first distance loss on the characteristic distance of the pedestrian leg grounding points of the same pedestrian, where the first distance loss aims to reduce the characteristic distance between the two pedestrian leg grounding points of the same pedestrian; calculate a second distance loss on the characteristic distance of the pedestrian leg grounding points of different pedestrians, where the second distance loss aims to enlarge the characteristic distance between the pedestrian leg grounding points of two different pedestrians; and obtain the total loss from the grounding point position regression loss, the first distance loss, and the second distance loss, and adjust the pedestrian detection model according to the total loss.
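A minimal sketch of this training objective, assuming an L1 regression term on the grounding point coordinates and a margin hinge for the push term; this application names the three loss terms but not their exact forms, so the loss shapes, the margin, and the equal weighting below are assumptions.

    import torch

    def total_loss(pred_pts, gt_pts, same_pairs, diff_pairs, margin=1.0):
        # Grounding point position regression loss between predictions and labels.
        reg = torch.nn.functional.l1_loss(pred_pts, gt_pts)
        # First distance loss: pull together the two grounding point feature
        # vectors of the same pedestrian.
        pull = (torch.stack([(a - b).pow(2).sum() for a, b in same_pairs]).mean()
                if same_pairs else torch.tensor(0.0))
        # Second distance loss: push grounding point feature vectors of
        # different pedestrians at least `margin` apart.
        push = (torch.stack([torch.clamp(margin - (a - b).norm(), min=0).pow(2)
                             for a, b in diff_pairs]).mean()
                if diff_pairs else torch.tensor(0.0))
        return reg + pull + push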
In some embodiments, as shown in fig. 14, an embodiment of the present application further provides a pedestrian detection apparatus, including: an around-view image acquisition module 1402, an image stitching module 1404, a second processing module 1406, a pedestrian leg grounding point determination module 1408, a vector acquisition module 1410, and a pedestrian determination module 1412, wherein:
the around-view image acquisition module 1402 is configured to acquire a plurality of around-view images of the environment around the vehicle body.

The image stitching module 1404 is configured to perform inverse perspective transformation on the plurality of around-view images and stitch the results to obtain the bird's-eye view image.

The second processing module 1406 is configured to input the bird's-eye view image into the pre-trained pedestrian detection model and to output a response feature map and a characterization feature map through the pedestrian detection model.

The pedestrian leg grounding point determination module 1408 is configured to determine the pedestrian leg grounding points according to the response feature map.

The vector acquisition module 1410 is configured to acquire the grounding point feature vector of each pedestrian leg grounding point on the characterization feature map.

The pedestrian determination module 1412 is configured to determine the pedestrians in the bird's-eye view image and the pedestrian leg grounding points of each pedestrian according to the grounding point feature vectors.
In some embodiments, the pedestrian determination module 1412 is further configured to calculate the characteristic distance between the grounding point feature vectors, and, if the characteristic distance is smaller than the distance threshold, determine that the two pedestrian leg grounding points corresponding to the characteristic distance belong to the same pedestrian.
In some embodiments, the pedestrian determination module 1412 is further configured to determine that the two pedestrian leg grounding points corresponding to the characteristic distance belong to different pedestrians if the characteristic distance is greater than the distance threshold.
In some embodiments, the pedestrian detection apparatus further includes a mapping module configured to map the pedestrian leg grounding points corresponding to each pedestrian to obtain the corresponding pedestrian detection result; the pedestrian detection result includes position information.
Each module in the above training apparatus of the pedestrian detection model or in the above pedestrian detection apparatus may be implemented in whole or in part by software, by hardware, or by a combination thereof. Each module may be embedded in hardware form in, or independent of, a processor in the computer device, or may be stored in software form in a memory of the computer device, so that the processor can invoke and execute the operations corresponding to the module.
In one embodiment, a computer device is provided. The computer device may be a terminal, and its internal structure may be as shown in fig. 15. The computer device includes a processor, a memory, an input/output interface, a communication interface, a display unit, and an input device. The processor, the memory, and the input/output interface are connected through a system bus, and the communication interface, the display unit, and the input device are connected to the system bus through the input/output interface. The processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The internal memory provides an environment for running the operating system and the computer program in the non-volatile storage medium. The input/output interface of the computer device is used for exchanging information between the processor and external devices. The communication interface of the computer device is used for wired or wireless communication with an external terminal; the wireless communication may be implemented through Wi-Fi, a mobile cellular network, near field communication (NFC), or other technologies. The computer program, when executed by the processor, implements a training method of a pedestrian detection model or a pedestrian detection method. The display unit of the computer device is used to form a visible picture and may be a display screen, a projection device, or a virtual reality imaging device; the display screen may be a liquid crystal display screen or an electronic ink display screen. The input device of the computer device may be a touch layer covering the display screen, a key, a trackball, or a touchpad provided on the housing of the computer device, or an external keyboard, touchpad, or mouse.
Those skilled in the art will appreciate that the structure shown in fig. 15 is merely a block diagram of part of the structure related to the solution of the present application and does not constitute a limitation on the computer device to which the solution of the present application is applied; a specific computer device may include more or fewer components than shown in the figure, combine certain components, or have a different arrangement of components.
In one embodiment, a computer device is provided, which comprises a memory and a processor, wherein the memory stores a computer program, and the processor implements the steps of the training method of the pedestrian detection model or the steps of the pedestrian detection method when executing the computer program.
In one embodiment, a computer-readable storage medium is provided, on which a computer program is stored; the computer program, when executed by a processor, implements the steps of the above training method of a pedestrian detection model or the steps of the above pedestrian detection method.
In one embodiment, a computer program product is provided, comprising a computer program; the computer program, when executed by a processor, implements the steps of the above training method of a pedestrian detection model or the steps of the above pedestrian detection method.
It will be understood by those skilled in the art that all or part of the processes of the methods of the above embodiments may be implemented by a computer program instructing the relevant hardware; the computer program may be stored in a non-volatile computer-readable storage medium and, when executed, may include the processes of the embodiments of the above methods. Any reference to memory, database, or other medium used in the embodiments provided in the present application may include at least one of non-volatile and volatile memory. Non-volatile memory may include read-only memory (ROM), magnetic tape, floppy disk, flash memory, optical memory, high-density embedded non-volatile memory, resistive random access memory (ReRAM), magnetoresistive random access memory (MRAM), ferroelectric random access memory (FRAM), phase-change memory (PCM), graphene memory, and the like. Volatile memory may include random access memory (RAM), external cache memory, and the like. By way of illustration and not limitation, RAM may take many forms, such as static random access memory (SRAM) or dynamic random access memory (DRAM). The databases involved in the embodiments provided in the present application may include at least one of relational and non-relational databases; non-relational databases may include, but are not limited to, blockchain-based distributed databases and the like. The processors involved in the embodiments provided in the present application may be, but are not limited to, general-purpose processors, central processing units, graphics processing units, digital signal processors, programmable logic devices, data processing logic devices based on quantum computing, and the like.
For the sake of brevity, not all possible combinations of the technical features in the above embodiments are described; however, as long as there is no contradiction between the combinations of these technical features, they should all be considered to be within the scope of this specification.

The above embodiments only express several implementations of the present application, and their descriptions are specific and detailed, but they should not therefore be construed as limiting the scope of the present application. It should be noted that several variations and improvements may be made by those of ordinary skill in the art without departing from the concept of the present application, and these all fall within the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the appended claims.

Claims (10)

1. A method of training a pedestrian detection model, the method comprising:
acquiring a plurality of around-view images of the environment around the vehicle body;

performing inverse perspective transformation on the plurality of around-view images, and stitching the results to obtain a bird's-eye view image;

acquiring pedestrian leg grounding points labeled in the bird's-eye view image;

inputting the bird's-eye view image into a pedestrian detection model to be trained for processing, and outputting a response feature map and a characterization feature map of the bird's-eye view image;

obtaining predicted pedestrian leg grounding points and the pedestrian to which each pedestrian leg grounding point belongs according to the response feature map and the characterization feature map;

and adjusting the pedestrian detection model according to the difference between the labeled pedestrian leg grounding points and the predicted pedestrian leg grounding points, the characteristic distance constraint on the pedestrian leg grounding points of the same pedestrian, and the characteristic distance constraint on the pedestrian leg grounding points of different pedestrians, until the pedestrian detection model converges, to obtain a trained pedestrian detection model; the pedestrian detection model is used for pedestrian detection.
2. The method of claim 1, wherein the obtaining of the predicted pedestrian leg grounding points and the pedestrian to which each pedestrian leg grounding point belongs according to the response feature map and the characterization feature map comprises:

obtaining the predicted pedestrian leg grounding points according to the response feature map;

acquiring the grounding point feature vector of each pedestrian leg grounding point on the characterization feature map;

calculating the characteristic distance between any two pedestrian leg grounding points according to the grounding point feature vectors of the two pedestrian leg grounding points;

and determining whether the two pedestrian leg grounding points belong to the same pedestrian according to the characteristic distance between the two pedestrian leg grounding points.
3. The method of claim 2, wherein the adjusting of the pedestrian detection model according to the difference between the labeled pedestrian leg grounding points and the predicted pedestrian leg grounding points, the characteristic distance constraint on the pedestrian leg grounding points of the same pedestrian, and the characteristic distance constraint on the pedestrian leg grounding points of different pedestrians comprises:

calculating the position regression loss of the grounding points according to the difference between the labeled pedestrian leg grounding points and the predicted pedestrian leg grounding points;

calculating a first distance loss on the characteristic distance of the pedestrian leg grounding points of the same pedestrian, wherein the first distance loss aims to reduce the characteristic distance between the two pedestrian leg grounding points of the same pedestrian;

calculating a second distance loss on the characteristic distance of the pedestrian leg grounding points of different pedestrians, wherein the second distance loss aims to enlarge the characteristic distance between the pedestrian leg grounding points of two different pedestrians;

and obtaining a total loss according to the grounding point position regression loss, the first distance loss, and the second distance loss, and adjusting the pedestrian detection model according to the total loss.
4. A pedestrian detection method, characterized in that the method comprises:
acquiring a plurality of around-view images of the environment around the vehicle body;

performing inverse perspective transformation on the plurality of around-view images, and stitching the results to obtain a bird's-eye view image;

inputting the bird's-eye view image into a pre-trained pedestrian detection model, and outputting a response feature map and a characterization feature map through the pedestrian detection model;

determining pedestrian leg grounding points according to the response feature map;

acquiring the grounding point feature vector of each pedestrian leg grounding point on the characterization feature map;

and determining the pedestrians in the bird's-eye view image and the pedestrian leg grounding points of each pedestrian according to the grounding point feature vectors.
5. The method of claim 4, wherein the determining of the pedestrians in the bird's-eye view image and the pedestrian leg grounding points of each pedestrian according to the grounding point feature vectors comprises:

calculating the distance between the grounding point feature vectors to obtain a characteristic distance;

and if the characteristic distance is smaller than a distance threshold, determining that the two pedestrian leg grounding points corresponding to the characteristic distance belong to the same pedestrian.
6. The method of claim 5, further comprising:
and if the characteristic distance is greater than the distance threshold value, determining that the grounding points of the legs of the two pedestrians corresponding to the characteristic distance belong to different pedestrians.
7. The method of claim 5 or 6, further comprising:
mapping the pedestrian leg grounding points corresponding to each pedestrian to obtain a corresponding pedestrian detection result; the pedestrian detection result includes position information.
8. A training apparatus for a pedestrian detection model, the apparatus comprising:
an around-view image acquisition module, configured to acquire a plurality of around-view images of the environment around the vehicle body;

an image stitching module, configured to perform inverse perspective transformation on the plurality of around-view images and stitch the results to obtain a bird's-eye view image;

a grounding point acquisition module, configured to acquire pedestrian leg grounding points labeled in the bird's-eye view image;

a first processing module, configured to input the bird's-eye view image into a pedestrian detection model to be trained for processing, and to output a response feature map and a characterization feature map of the bird's-eye view image;

a prediction module, configured to obtain predicted pedestrian leg grounding points and the pedestrian to which each pedestrian leg grounding point belongs according to the response feature map and the characterization feature map;

and a training module, configured to adjust the pedestrian detection model according to the difference between the labeled pedestrian leg grounding points and the predicted pedestrian leg grounding points, the characteristic distance constraint on the pedestrian leg grounding points of the same pedestrian, and the characteristic distance constraint on the pedestrian leg grounding points of different pedestrians, until the pedestrian detection model converges, to obtain a trained pedestrian detection model; the pedestrian detection model is used for pedestrian detection.
9. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor, when executing the computer program, implements the steps of the method of any of claims 1 to 3, or implements the steps of the method of any of claims 4 to 7.
10. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method of any one of claims 1 to 3, or carries out the steps of the method of any one of claims 4 to 7.
CN202310166416.XA 2023-02-27 2023-02-27 Training method and device for pedestrian detection model and pedestrian detection method Active CN115861316B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310166416.XA CN115861316B (en) 2023-02-27 2023-02-27 Training method and device for pedestrian detection model and pedestrian detection method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310166416.XA CN115861316B (en) 2023-02-27 2023-02-27 Training method and device for pedestrian detection model and pedestrian detection method

Publications (2)

Publication Number Publication Date
CN115861316A true CN115861316A (en) 2023-03-28
CN115861316B CN115861316B (en) 2023-09-29

Family

ID=85658986

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310166416.XA Active CN115861316B (en) 2023-02-27 2023-02-27 Training method and device for pedestrian detection model and pedestrian detection method

Country Status (1)

Country Link
CN (1) CN115861316B (en)

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5555512A (en) * 1993-08-19 1996-09-10 Matsushita Electric Industrial Co., Ltd. Picture processing apparatus for processing infrared pictures obtained with an infrared ray sensor and applied apparatus utilizing the picture processing apparatus
US20140348382A1 (en) * 2013-05-22 2014-11-27 Hitachi, Ltd. People counting device and people trajectory analysis device
CN111160379A (en) * 2018-11-07 2020-05-15 北京嘀嘀无限科技发展有限公司 Training method and device of image detection model and target detection method and device
CN110956069A (en) * 2019-05-30 2020-04-03 初速度(苏州)科技有限公司 Pedestrian 3D position detection method and device and vehicle-mounted terminal
JP2021163401A (en) * 2020-04-03 2021-10-11 戸田建設株式会社 Person detection system, person detection program, learned model generation program and learned model
CN112749653A (en) * 2020-12-31 2021-05-04 平安科技(深圳)有限公司 Pedestrian detection method, device, electronic equipment and storage medium
CN114495056A (en) * 2022-01-14 2022-05-13 广州小鹏自动驾驶科技有限公司 Parking lot pillar detection method, detection device, vehicle and storage medium
CN114359976A (en) * 2022-03-18 2022-04-15 武汉北大高科软件股份有限公司 Intelligent security method and device based on person identification
CN114734989A (en) * 2022-04-14 2022-07-12 江苏新通达电子科技股份有限公司 Auxiliary parking device and method based on around vision

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
CHEN Yun et al., "Efficient multi-scale pedestrian detection algorithm with feature map aggregation", Journal of Zhejiang University (Engineering Science), no. 06, pages 205-211 *

Also Published As

Publication number Publication date
CN115861316B (en) 2023-09-29

Similar Documents

Publication Publication Date Title
CN109683699B (en) Method and device for realizing augmented reality based on deep learning and mobile terminal
US10757395B2 (en) Camera parameter set calculation method, recording medium, and camera parameter set calculation apparatus
CN108520536B (en) Disparity map generation method and device and terminal
CN107038723B (en) Method and system for estimating rod-shaped pixels
WO2022165809A1 (en) Method and apparatus for training deep learning model
CN109407547A (en) Multi-camera in-loop simulation test method and system for panoramic visual perception
EP3676796A1 (en) Systems and methods for correcting a high-definition map based on detection of obstructing objects
JP2019096072A (en) Object detection device, object detection method and program
US20180181816A1 (en) Handling Perspective Magnification in Optical Flow Processing
WO2023016271A1 (en) Attitude determining method, electronic device, and readable storage medium
CN109741241B (en) Fisheye image processing method, device, equipment and storage medium
CN111295667A (en) Image stereo matching method and driving assisting device
CN114913506A (en) 3D target detection method and device based on multi-view fusion
CN114339185A (en) Image colorization for vehicle camera images
CN115147328A (en) Three-dimensional target detection method and device
CN113743163A (en) Traffic target recognition model training method, traffic target positioning method and device
CN116012805B (en) Target perception method, device, computer equipment and storage medium
CN116363185B (en) Geographic registration method, geographic registration device, electronic equipment and readable storage medium
CN117726747A (en) Three-dimensional reconstruction method, device, storage medium and equipment for complementing weak texture scene
CN116486351A (en) Driving early warning method, device, equipment and storage medium
CN114648639B (en) Target vehicle detection method, system and device
CN114897987B (en) Method, device, equipment and medium for determining vehicle ground projection
CN115861316B (en) Training method and device for pedestrian detection model and pedestrian detection method
CN116012609A (en) Multi-target tracking method, device, electronic equipment and medium for looking around fish eyes
CN116958195A (en) Object tracking integration method and integration device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: Floor 25, Block A, Zhongzhou Binhai Commercial Center Phase II, No. 9285, Binhe Boulevard, Shangsha Community, Shatou Street, Futian District, Shenzhen, Guangdong 518000

Applicant after: Shenzhen Youjia Innovation Technology Co.,Ltd.

Address before: 518048 401, building 1, Shenzhen new generation industrial park, No. 136, Zhongkang Road, Meidu community, Meilin street, Futian District, Shenzhen, Guangdong Province

Applicant before: SHENZHEN MINIEYE INNOVATION TECHNOLOGY Co.,Ltd.

GR01 Patent grant