CN118155186A - Pedestrian crossing intention prediction method, device, system and storage medium - Google Patents


Info

Publication number
CN118155186A
CN118155186A
Authority
CN
China
Prior art keywords
pedestrian
vehicle
crossing
intention
road
Prior art date
Legal status
Pending
Application number
CN202410330596.5A
Other languages
Chinese (zh)
Inventor
朱达
Current Assignee
Guangzhou Xiaopeng Autopilot Technology Co Ltd
Original Assignee
Guangzhou Xiaopeng Autopilot Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Guangzhou Xiaopeng Autopilot Technology Co Ltd filed Critical Guangzhou Xiaopeng Autopilot Technology Co Ltd
Priority to CN202410330596.5A
Publication of CN118155186A


Landscapes

  • Traffic Control Systems (AREA)

Abstract

The invention discloses a pedestrian crossing intention prediction method, device, system, and storage medium. The method comprises: acquiring video data from the driving view angle of a vehicle and the ego-vehicle motion information of the vehicle; extracting pedestrian walking features and/or road spatiotemporal features from the video data; extracting ego-vehicle motion features from the ego-vehicle motion information; and inputting the pedestrian walking features and/or road spatiotemporal features, together with the ego-vehicle motion features, into a pre-trained pedestrian crossing intention prediction model for prediction, obtaining a crossing intention result for the pedestrian. The method avoids the problem that trajectory-only prediction cannot meet the high-precision requirements that intelligent vehicles place on pedestrian intention prediction, improves the accuracy and robustness of intention prediction in the automated driving system, and meets that system's requirements for safety and reliability.

Description

Pedestrian crossing intention prediction method, device, system and storage medium
Technical Field
The invention relates to the technical field of intelligent driving, and in particular to a pedestrian crossing intention prediction method, device, and system, and a storage medium.
Background
Existing automated driving systems typically predict pedestrian crossing intention based only on the trajectory features of pedestrian motion. Pedestrians, however, walk with a large degree of freedom, randomly and dynamically, and with flexible walking postures, which increases the errors produced by trajectory detection and tracking, so trajectory-only prediction cannot meet the high-precision requirements that intelligent vehicles place on pedestrian intention prediction. Meanwhile, the limited interpretability and robustness of existing machine learning models pose challenges for pedestrian crossing intention prediction, making it difficult to meet the safety and reliability requirements of automated driving systems.
Disclosure of Invention
The main purpose of the embodiments of the invention is to provide a pedestrian crossing intention prediction method, device, system, and storage medium, with the aim of improving the accuracy and robustness of intention prediction in automated driving systems and meeting their requirements for safety and reliability.
To achieve the above object, an embodiment of the invention provides a pedestrian crossing intention prediction method, comprising:
acquiring video data from the driving view angle of a vehicle and the ego-vehicle motion information of the vehicle;
extracting pedestrian walking features and/or road spatiotemporal features from the video data;
extracting ego-vehicle motion features from the ego-vehicle motion information;
and inputting the pedestrian walking features and/or road spatiotemporal features, together with the ego-vehicle motion features, into a pre-trained pedestrian crossing intention prediction model for prediction, obtaining a crossing intention result for the pedestrian.
Optionally, the pedestrian walking features include pedestrian motion features and pedestrian pose features, and the step of extracting pedestrian walking features and road spatiotemporal features from the video data comprises:
extracting pedestrian motion features from the video data through a pedestrian detector and a pedestrian tracker;
extracting pedestrian pose features from the video data through the pedestrian detector and a skeleton simulation module;
and detecting the positions of lane lines, traffic lights, and sidewalks in the video data through a road detection module, and extracting road spatiotemporal features from them.
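The three extraction branches above can be sketched as a per-frame pipeline. The `Stub*` classes below are simplified stand-ins, since the patent names the modules (detector, tracker, skeleton simulation, road detection) but not their concrete models; all names and data shapes are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class FrameFeatures:
    boxes: List[Tuple[float, float, float, float]]  # tracked pedestrian boxes
    keypoints: List[List[Tuple[float, float]]]      # skeleton joints per box
    lane_lines: List[float]                         # lateral lane-line positions
    traffic_light: str                              # "red" / "green" / "none"

class StubDetector:
    def detect(self, frame):         # a trained pedestrian detector in practice
        return frame.get("boxes", [])

class StubTracker:
    def update(self, boxes):         # would associate detections across frames
        return boxes

class StubSkeleton:
    def estimate(self, frame, box):  # would regress body keypoints inside the box
        cx, cy = (box[0] + box[2]) / 2, (box[1] + box[3]) / 2
        return [(cx, cy)]            # one "center" joint as a placeholder

class StubRoad:
    def detect(self, frame):         # would locate lanes / sidewalks / lights
        return frame.get("lanes", []), frame.get("light", "none")

def extract_features(frames, detector, tracker, skeleton, road):
    """Run the motion, pose, and road branches on each video frame."""
    out = []
    for frame in frames:
        boxes = tracker.update(detector.detect(frame))      # motion branch
        kps = [skeleton.estimate(frame, b) for b in boxes]  # pose branch
        lanes, light = road.detect(frame)                   # road branch
        out.append(FrameFeatures(boxes, kps, lanes, light))
    return out

frames = [{"boxes": [(10, 20, 30, 60)], "lanes": [0.0, 3.5], "light": "red"}]
feats = extract_features(frames, StubDetector(), StubTracker(), StubSkeleton(), StubRoad())
```

In a real system each stub would wrap a trained model, but the per-frame fan-out into the three feature streams would look much the same.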
Optionally, the pedestrian crossing intention prediction model includes a pedestrian crossing intention classifier, and the step of inputting the pedestrian walking features and/or road spatiotemporal features, together with the ego-vehicle motion features, into the pre-trained pedestrian crossing intention prediction model to obtain a crossing intention result for the pedestrian comprises:
inputting the pedestrian walking features and/or road spatiotemporal features, together with the ego-vehicle motion features, into a pre-trained pedestrian crossing intention classifier;
and predicting on the input features through the pedestrian crossing intention classifier to obtain a crossing intention result for the pedestrian.
Optionally, the pedestrian crossing intention prediction model further includes a spatiotemporal interaction module, and before the step of inputting the pedestrian walking features and/or road spatiotemporal features, together with the ego-vehicle motion features, into the pre-trained pedestrian crossing intention classifier, the method further comprises:
performing feature association and fusion on the pedestrian walking features and/or road spatiotemporal features and the ego-vehicle motion features through the spatiotemporal interaction module to obtain fused spatiotemporal features;
the step of inputting the pedestrian walking features and/or road spatiotemporal features, together with the ego-vehicle motion features, into a pre-trained pedestrian crossing intention classifier then comprises:
inputting the fused spatiotemporal features into the pre-trained pedestrian crossing intention classifier.
Optionally, the pedestrian crossing intention classifier includes a feature extractor, a pooling layer, and a fully connected layer, and the step of predicting on the input features through the pedestrian crossing intention classifier to obtain a crossing intention result for the pedestrian comprises:
extracting the input features through the feature extractor and pooling them through the pooling layer to obtain an extracted and pooled feature map;
and mapping the extracted and pooled feature map to the output layer of the target classes through the fully connected layer, classifying to obtain a crossing intention result for the pedestrian.
Optionally, the step of performing feature association and fusion on the pedestrian walking features and/or road spatiotemporal features and the ego-vehicle motion features through the spatiotemporal interaction module to obtain fused spatiotemporal features comprises:
fusing implicit features of pedestrian motion and pose based on the pedestrian motion features and the pedestrian pose features;
fusing the pedestrian-road interaction relationship based on the pedestrian motion features, the pedestrian pose features, and the road spatiotemporal features;
fusing the pedestrian-vehicle interaction relationship based on the pedestrian motion features, the pedestrian pose features, and the ego-vehicle motion features;
and obtaining the fused spatiotemporal features based on the implicit features of pedestrian motion and pose, the pedestrian-road interaction relationship, and the pedestrian-vehicle interaction relationship.
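The four fusion steps above can be illustrated structurally with plain feature vectors. Here list concatenation stands in for whatever learned association the spatiotemporal interaction module actually performs (e.g. attention or an MLP), so this is only a sketch of the data flow, not of the learned operation.

```python
def fuse(a, b):
    """Placeholder association: concatenate two feature vectors."""
    return a + b  # list concatenation stands in for a learned fusion

def spatiotemporal_fusion(motion, pose, road, ego):
    ped_implicit = fuse(motion, pose)      # step 1: implicit motion + pose features
    ped_road = fuse(ped_implicit, road)    # step 2: pedestrian-road interaction
    ped_vehicle = fuse(ped_implicit, ego)  # step 3: pedestrian-vehicle interaction
    # step 4: combine all three into the fused spatiotemporal features
    return fuse(fuse(ped_implicit, ped_road), ped_vehicle)

fused = spatiotemporal_fusion([0.1, 0.2], [0.3], [0.5], [0.9])
```

The important point is the wiring: the pedestrian-road and pedestrian-vehicle branches both build on the implicit pedestrian features before everything is merged for the classifier.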
Optionally, the method further comprises:
Post-processing the crossing intention result through an intention holder created based on preset rules.
Optionally, the preset rules include one or more of: pedestrian priority rules, traffic light rules, pedestrian action analysis rules, safe distance rules, and traffic flow rules.
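A minimal sketch of such an intention holder follows, with illustrative conditions for a few of the rule families. The patent names the rules but not their logic, so every threshold, field name, and condition here is an assumption.

```python
def hold_intention(pred_crossing, ctx):
    """Keep or override the model's crossing prediction using preset rules.

    ctx is a dict of scene facts; all keys below are hypothetical examples.
    """
    # Pedestrian-priority rule: near a zebra crossing with no signal,
    # hold the intention as "crossing" so the vehicle yields.
    if ctx.get("on_zebra") and ctx.get("traffic_light") == "none":
        return True
    # Pedestrian-action rule: a pedestrian already moving toward the road
    # against a red light is held as "crossing" and actively avoided.
    if ctx.get("traffic_light") == "red" and ctx.get("moving_toward_road"):
        return True
    # Safe-distance rule: treat very close pedestrians as crossing.
    if ctx.get("distance_m", 1e9) < 5.0:
        return True
    return pred_crossing
```

The holder only ever escalates toward the safer "crossing" decision; otherwise it passes the classifier's output through unchanged.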
Optionally, before the step of inputting the pedestrian walking features and/or road spatiotemporal features, together with the ego-vehicle motion features, into a pre-trained pedestrian crossing intention prediction model for prediction to obtain a crossing intention result for the pedestrian, the method further comprises:
training a neural network on pre-collected historical data to obtain the trained pedestrian crossing intention prediction model.
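As a stand-in for that training step, the sketch below fits a single logistic unit by gradient descent on a toy "history". The real model would be a deep network, and the feature/label layout here is an assumption made purely for illustration.

```python
import math

def train_classifier(history, epochs=200, lr=0.5):
    """Fit one logistic unit with SGD; `history` is a list of (features, label)."""
    dim = len(history[0][0])
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        for x, y in history:
            z = sum(wi * xi for wi, xi in zip(w, x)) + b
            p = 1.0 / (1.0 + math.exp(-z))  # predicted crossing probability
            g = p - y                       # gradient of the log loss w.r.t. z
            w = [wi - lr * g * xi for wi, xi in zip(w, x)]
            b -= lr * g
    def predict(x):
        z = sum(wi * xi for wi, xi in zip(w, x)) + b
        return 1.0 / (1.0 + math.exp(-z))
    return predict

# Toy "pre-collected history": feature = [fraction of motion directed at the road],
# label = whether the pedestrian actually crossed.
history = [([1.0], 1), ([0.9], 1), ([0.1], 0), ([0.0], 0)]
model = train_classifier(history)
```

After training, `model` maps a feature vector to a crossing probability, which is the interface the rest of the pipeline expects from the prediction model.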
The invention further proposes a pedestrian crossing intention prediction apparatus, comprising:
an acquisition module for acquiring video data from the driving view angle of a vehicle and the ego-vehicle motion information of the vehicle;
an extraction module for extracting pedestrian walking features and/or road spatiotemporal features from the video data and extracting ego-vehicle motion features from the ego-vehicle motion information;
and a prediction module for inputting the pedestrian walking features and/or road spatiotemporal features, together with the ego-vehicle motion features, into a pre-trained pedestrian crossing intention prediction model for prediction, obtaining a crossing intention result for the pedestrian.
The invention further proposes a pedestrian crossing intention prediction system comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the computer program, when executed by the processor, implements the pedestrian crossing intention prediction method described above.
The invention further proposes a computer-readable storage medium storing a computer program which, when executed by a processor, implements the pedestrian crossing intention prediction method described above.
The embodiments of the invention provide a pedestrian crossing intention prediction method, device, system, and storage medium, in which video data from the driving view angle of a vehicle and the ego-vehicle motion information of the vehicle are acquired; pedestrian walking features and/or road spatiotemporal features are extracted from the video data; ego-vehicle motion features are extracted from the ego-vehicle motion information; and the pedestrian walking features and/or road spatiotemporal features, together with the ego-vehicle motion features, are input into a pre-trained pedestrian crossing intention prediction model for prediction, obtaining a crossing intention result for the pedestrian. Because video data from the vehicle's driving view angle and the ego-vehicle motion information are considered, and multi-source information such as pedestrian walking features, road spatiotemporal features, and ego-vehicle motion features is combined, pedestrian walking features, road traffic features, and ego-vehicle motion features are fused at multiple levels. This realizes more robust pedestrian crossing intention prediction, avoids the problem that trajectory-only prediction cannot meet the high-precision requirements that intelligent vehicles place on pedestrian intention prediction, improves the accuracy and robustness of intention prediction in the automated driving system, and meets that system's requirements for safety and reliability.
Drawings
FIG. 1 is a schematic diagram of functional modules of an apparatus to which a pedestrian crossing intention prediction device of the present invention belongs;
FIG. 2 is a flowchart of a first embodiment of a pedestrian crossing intention prediction method according to the present invention;
FIG. 3 is a schematic diagram of a refinement flow for extracting pedestrian walking features and/or road space-time features from the video data in an embodiment of the invention;
FIG. 4 is a flowchart of a fifth embodiment of the pedestrian crossing intention prediction method of the present invention;
FIG. 5 is a schematic overall flow chart of an embodiment of a pedestrian crossing intent prediction method of the present invention;
FIG. 6 is a flowchart of a sixth embodiment of the pedestrian crossing intention prediction method of the present invention;
FIG. 7 is a functional block diagram of an embodiment of a pedestrian crossing intention prediction apparatus according to the present invention.
The achievement of the objects, functional features and advantages of the present invention will be further described with reference to the accompanying drawings, in conjunction with the embodiments.
Detailed Description
It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.
The main solution of the embodiments of the invention is as follows: acquiring video data from the driving view angle of a vehicle and the ego-vehicle motion information of the vehicle; extracting pedestrian walking features and/or road spatiotemporal features from the video data; extracting ego-vehicle motion features from the ego-vehicle motion information; and inputting the pedestrian walking features and/or road spatiotemporal features, together with the ego-vehicle motion features, into a pre-trained pedestrian crossing intention prediction model for prediction, obtaining a crossing intention result for the pedestrian. Because video data from the vehicle's driving view angle and the ego-vehicle motion information are considered, and multi-source information such as pedestrian walking features, road traffic features, and ego-vehicle motion features is combined, these features are fused at multiple levels. This realizes more robust pedestrian crossing intention prediction, avoids the problem that trajectory-only prediction cannot meet the high-precision requirements that intelligent vehicles place on pedestrian intention prediction, improves the accuracy and robustness of intention prediction in the automated driving system, and meets that system's requirements for safety and reliability.
The embodiments of the invention recognize the following. Existing automated driving systems typically predict pedestrian crossing intention based only on the trajectory features of pedestrian motion; yet pedestrians walk with a large degree of freedom, randomly and dynamically, while upper-body cues, such as head orientation and body posture when a pedestrian makes eye contact with the driver or turns toward the street, can indicate crossing intention more accurately, so trajectory-only prediction cannot meet the high-precision requirements that intelligent vehicles place on pedestrian intention prediction. In addition, pedestrians encountered by an intelligent vehicle in traffic scenes are constrained by both traffic rules and the environment: for example, the distance between pedestrians and vehicles, zebra crossings in street-crossing areas, and the positions of lane lines all provide important cues for crossing, but existing techniques fail to exploit this information effectively. Pedestrian walking postures are also flexible, and a pedestrian's facing direction may not match the walking direction, which increases the errors of orientation and trajectory detection and tracking and thus degrades the accuracy of crossing prediction. Moreover, existing methods rarely consider the interaction between pedestrians and vehicles, and the limited interpretability and robustness of existing machine learning models pose challenges for pedestrian crossing intention prediction, making it difficult to meet the safety and reliability requirements of automated driving systems, for example that pedestrians should be given priority near zebra crossings without traffic lights, and that pedestrians running a red light should be actively avoided.
The embodiments of the invention therefore provide a solution. By using a camera at the driving view angle as the information source, the cost of sensing pedestrian motion from a top-down view is avoided, while information such as pedestrian pose and orientation is sensed effectively. On top of pedestrians' historical trajectory information, multi-source information such as pedestrian pose, the traffic environment, the ego-vehicle motion state, and actual traffic rules is added, and pedestrian motion and pose features, road traffic features, and ego-vehicle motion features are fused at multiple levels, realizing more robust prediction of pedestrian crossing intention. After the end-to-end pedestrian crossing intention prediction, a post-processing module based on driving preferences and traffic rules can hold the crossing intention in special cases such as pedestrians running a red light or crossing mid-road, ensuring the safety of the prediction system for intelligent vehicles and improving the accuracy and robustness of intention prediction in the automated driving system.
Specifically, referring to FIG. 1, FIG. 1 is a schematic diagram of the functional modules of a device to which the pedestrian crossing intention prediction apparatus of the invention belongs. The pedestrian crossing intention prediction apparatus may be an arrangement, independent of the device, that is capable of data processing and may be carried on the device in the form of hardware or software. The device may be an intelligent mobile terminal with data processing capability such as a mobile phone or tablet computer, or a fixed device (such as an in-vehicle device) or a server with data processing capability.
In this embodiment, the apparatus to which the pedestrian crossing intention prediction device belongs includes at least an output module 110, a processor 120, a memory 130, and a communication module 140.
The memory 130 stores therein an operating system and a pedestrian crossing intention prediction program; the output module 110 may be a display screen or the like. The communication module 140 may include a WIFI module, a bluetooth module, and the like, and communicate with an external device or a server through the communication module 140.
The pedestrian crossing intention prediction program in the memory 130, when executed by the processor, implements the following steps:
acquiring video data from the driving view angle of a vehicle and the ego-vehicle motion information of the vehicle;
extracting pedestrian walking features and/or road spatiotemporal features from the video data;
extracting ego-vehicle motion features from the ego-vehicle motion information;
and inputting the pedestrian walking features and/or road spatiotemporal features, together with the ego-vehicle motion features, into a pre-trained pedestrian crossing intention prediction model for prediction, obtaining a crossing intention result for the pedestrian.
Further, the pedestrian crossing intent prediction program in the memory 130 when executed by the processor also implements the steps of:
Extracting pedestrian motion features from the video data through a pedestrian detector and a pedestrian tracker;
extracting pedestrian pose features from the video data through the pedestrian detector and a skeleton simulation module;
and detecting the positions of lane lines, traffic lights, and sidewalks in the video data through a road detection module, and extracting road spatiotemporal features from them.
Further, the pedestrian crossing intent prediction program in the memory 130 when executed by the processor also implements the steps of:
Inputting the pedestrian walking features and/or road spatiotemporal features, together with the ego-vehicle motion features, into a pre-trained pedestrian crossing intention classifier;
and predicting on the input features through the pedestrian crossing intention classifier to obtain a crossing intention result for the pedestrian.
Further, the pedestrian crossing intent prediction program in the memory 130 when executed by the processor also implements the steps of:
Performing feature association and fusion on the pedestrian walking features and/or road spatiotemporal features and the ego-vehicle motion features through the spatiotemporal interaction module to obtain fused spatiotemporal features;
the step of inputting the pedestrian walking features and/or road spatiotemporal features, together with the ego-vehicle motion features, into a pre-trained pedestrian crossing intention classifier then comprises:
inputting the fused spatiotemporal features into the pre-trained pedestrian crossing intention classifier.
Further, the pedestrian crossing intent prediction program in the memory 130 when executed by the processor also implements the steps of:
Extracting the input features through a feature extractor and pooling them through a pooling layer to obtain an extracted and pooled feature map;
and mapping the extracted and pooled feature map to the output layer of the target classes through a fully connected layer, classifying to obtain a crossing intention result for the pedestrian.
Further, the pedestrian crossing intent prediction program in the memory 130 when executed by the processor also implements the steps of:
Fusing implicit features of pedestrian motion and pose based on the pedestrian motion features and the pedestrian pose features;
fusing the pedestrian-road interaction relationship based on the pedestrian motion features, the pedestrian pose features, and the road spatiotemporal features;
fusing the pedestrian-vehicle interaction relationship based on the pedestrian motion features, the pedestrian pose features, and the ego-vehicle motion features;
and obtaining the fused spatiotemporal features based on the implicit features of pedestrian motion and pose, the pedestrian-road interaction relationship, and the pedestrian-vehicle interaction relationship.
Further, the pedestrian crossing intention prediction program in the memory 130, when executed by the processor, also implements the following step:
post-processing the crossing intention result through an intention holder created based on preset rules.
Further, the pedestrian crossing intent prediction program in the memory 130 when executed by the processor also implements the steps of:
Training a neural network on pre-collected historical data to obtain the trained pedestrian crossing intention prediction model.
According to the above scheme, video data from the driving view angle of a vehicle and the ego-vehicle motion information of the vehicle are acquired; pedestrian walking features and/or road spatiotemporal features are extracted from the video data; ego-vehicle motion features are extracted from the ego-vehicle motion information; and the pedestrian walking features and/or road spatiotemporal features, together with the ego-vehicle motion features, are input into a pre-trained pedestrian crossing intention prediction model for prediction, obtaining a crossing intention result for the pedestrian. Because video data from the vehicle's driving view angle and the ego-vehicle motion information are considered, and multi-source information such as pedestrian walking features, road spatiotemporal features, and ego-vehicle motion features is combined, pedestrian walking features, road traffic features, and ego-vehicle motion features are fused at multiple levels. This realizes more robust pedestrian crossing intention prediction, avoids the problem that trajectory-only prediction cannot meet the high-precision requirements that intelligent vehicles place on pedestrian intention prediction, improves the accuracy and robustness of intention prediction in the automated driving system, and meets that system's requirements for safety and reliability.
The method embodiments of the invention are presented based on, but not limited to, the above device architecture.
The execution body of the method of this embodiment may be a pedestrian crossing intention prediction apparatus, which may be an arrangement, independent of the device, that is capable of data processing and may be carried on a device in the form of hardware or software. The device may be a navigation device or an in-vehicle device; the navigation device may be an in-vehicle navigation device or a mobile terminal with a navigation function. Taking an in-vehicle navigation device installed on a vehicle as an example, pedestrian crossing intention prediction is realized through the in-vehicle navigation device, improving the accuracy and robustness of intention prediction in the automated driving system and meeting that system's requirements for safety and reliability.
Referring to fig. 2, fig. 2 is a flowchart illustrating a first embodiment of the pedestrian crossing intention prediction method according to the present invention. The pedestrian crossing intention prediction method comprises the following steps:
Step S101: acquiring video data from the driving view angle of the vehicle and the ego-vehicle motion information of the vehicle;
The scheme of this embodiment is applicable to intelligent vehicles, and in particular to automated driving scenarios of intelligent vehicles.
Pedestrian crossing intention refers to whether a pedestrian intends to cross the road or the street.
An automated driving system places high-precision requirements on the prediction of pedestrian crossing intention. To improve the accuracy and robustness of the system's intention prediction and meet its requirements for safety and reliability, this embodiment first acquires video data from the driving view angle of the vehicle and the ego-vehicle motion information of the vehicle.
By considering this video data and the ego-vehicle motion information, and combining multi-source information such as pedestrian walking features and/or road spatiotemporal features and ego-vehicle motion features, pedestrian walking features, road traffic features, and ego-vehicle motion features are fused at multiple levels, realizing more robust pedestrian crossing intention prediction and avoiding the problem that trajectory-only prediction cannot meet the high-precision requirements that intelligent vehicles place on pedestrian intention prediction.
In one embodiment, the video data of the driving view angle and the ego-vehicle motion information may be acquired through a driving-view camera on the intelligent vehicle. Using a camera at the driving view angle as the information source reduces the cost, compared with sensing pedestrian motion from a top-down view, while effectively sensing information such as pedestrian pose and orientation.
Step S102: extracting pedestrian walking features and/or road spatiotemporal features from the video data;
The video data of the vehicle driving view angle contains information such as pedestrians' motion trajectories, poses, and orientations while walking, from which the pedestrian walking features and road spatiotemporal features can be extracted, so that the pedestrian's crossing intention can be judged by combining the pedestrian walking features with the road spatiotemporal features.
Step S103: extracting the ego-vehicle motion features from the ego-vehicle motion information;
The ego-vehicle motion information of the vehicle may include information such as the vehicle's driving trajectory, speed, and road position. After the ego-vehicle motion features are extracted from this motion information, the pedestrian's crossing intention can be judged by combining the ego-vehicle motion features with the pedestrian walking features and road spatiotemporal features.
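For illustration, simple ego-motion features such as per-step speed and heading could be derived from a sequence of vehicle poses as below; the patent does not fix a concrete feature set, so the (x, y, t) layout and the chosen features are assumptions.

```python
import math

def ego_motion_features(track):
    """track: list of (x, y, t) poses; returns per-step (speed, heading) pairs."""
    feats = []
    for (x0, y0, t0), (x1, y1, t1) in zip(track, track[1:]):
        dt = t1 - t0
        speed = math.hypot(x1 - x0, y1 - y0) / dt  # metres per second
        heading = math.atan2(y1 - y0, x1 - x0)     # direction of travel, radians
        feats.append((speed, heading))
    return feats

# A vehicle moving straight along x at 10 m/s, sampled every 0.1 s:
feats = ego_motion_features([(0.0, 0.0, 0.0), (1.0, 0.0, 0.1), (2.0, 0.0, 0.2)])
```

Derived quantities like acceleration or yaw rate could be added in the same way before the features are passed to the fusion stage.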
In this way, on the basis of pedestrians' motion trajectory information, pedestrian pose, the traffic environment, and the ego-vehicle motion state are added, and multi-source information such as actual traffic rules can further be combined; pedestrian motion and pose features, road traffic features, and ego-vehicle motion features are fused at multiple levels, realizing more robust prediction of pedestrian crossing intention.
Step S104: inputting the pedestrian walking features and/or road spatiotemporal features, together with the ego-vehicle motion features, into the pre-trained pedestrian crossing intention prediction model for prediction, obtaining a crossing intention result for the pedestrian.
In this embodiment, a neural network is trained in advance on collected historical data to obtain the trained pedestrian crossing intention prediction model; the input features can then be predicted on by this model to obtain a crossing intention result for the pedestrian.
Wherein the pedestrian crossing intent prediction model may be constructed based on a neural network.
As one implementation, the pedestrian crossing intention prediction model may employ a pedestrian crossing intention classifier trained in advance on a neural network.
The architecture of the pedestrian crossing intention classifier may adopt a convolutional neural network (CNN) as the feature extractor, followed by several pooling layers and fully connected layers, finally outputting the classification result for the pedestrian crossing intention.
As another implementation, the pedestrian crossing intent prediction model may employ an architecture of a pedestrian crossing intent classifier in combination with a spatio-temporal interaction module.
The pedestrian crossing intention classifier and the space-time interaction module together form one large neural network, which is trained to obtain the pedestrian crossing intention prediction model.
The space-time interaction module mainly performs operations such as data integration, space-time feature extraction, space-time association analysis, and feature fusion on the input pedestrian walking features and/or road space-time features and the own-vehicle motion features, obtaining fused space-time features. These are fed as a whole into the pedestrian crossing intention classifier, which fuses the implicit features of pedestrian motion and posture, the pedestrian-road interaction relationship, and the pedestrian-vehicle interaction relationship, performs reasoning, and finally classifies the pedestrians' crossing intention.
In the above scheme, the video data of the vehicle's driving view angle and the own-vehicle motion information are obtained; pedestrian walking features and/or road space-time features are extracted from the video data; own-vehicle motion features are extracted from the own-vehicle motion information; and these features are input into a pre-trained pedestrian crossing intention prediction model to obtain the pedestrian's crossing intention result. Because both the driving-view video data and the own-vehicle motion information are considered, and multi-source information such as the pedestrian walking features, road space-time features, and own-vehicle motion features is fused at multiple levels, a more robust prediction of pedestrian crossing intention is achieved. At the same time, the problem that a single trajectory prediction cannot meet the high-precision requirement of intelligent vehicles for pedestrian intention prediction is avoided, the intention prediction accuracy and robustness of the automatic driving system are improved, and the safety and reliability requirements of the automatic driving system are met.
Referring to fig. 3, fig. 3 is a schematic diagram of a refinement flow of the present invention for extracting pedestrian walking characteristics and/or road space-time characteristics from the video data.
As shown in fig. 3, the present embodiment refines the above step S102 based on the embodiment shown in fig. 2.
In this embodiment, the pedestrian walking features include pedestrian motion features and pedestrian posture features, and extracting the pedestrian walking features and/or road space-time features from the video data in step S102 may include:
Step S1021: extract pedestrian motion features from the video data through a pedestrian detector and a pedestrian tracker;
Step S1022: extract pedestrian posture features from the video data through the pedestrian detector and a skeleton simulation module;
Step S1023: detect the positions of lane lines, traffic lights, and sidewalks from the video data through a road detection module, and extract the road space-time features from them.
In the scheme of this embodiment, a pedestrian detection algorithm detects whether pedestrians are present in the video data; a pedestrian tracking algorithm tracks pedestrian trajectories in the video stream, from which the pedestrian motion features are obtained; and a pedestrian skeleton detection algorithm detects pedestrian postures in the video stream, from which the pedestrian posture features are obtained.
In addition, road space-time features such as lane lines and traffic lights can be detected by corresponding road detection algorithms (such as lane line and traffic light detection algorithms).
Specifically, as one embodiment, referring to fig. 6, the pedestrian crossing intent prediction system architecture may include a pedestrian detector, a pedestrian tracker, a skeleton simulation module, and a road detection module, wherein:
The pedestrian detector detects whether pedestrians are present in the video data.
The pedestrian tracker tracks the detected pedestrian trajectories in the video stream, from which the pedestrian motion features are obtained.
The skeleton simulation module detects pedestrian postures in the video stream, obtaining the pedestrian posture features.
The road detection module detects lane lines, traffic lights, and the like in the video stream, obtaining the road space-time features.
It should be noted that, in this embodiment, detection of pedestrians in the video data may be implemented using different pedestrian detection algorithms. The pedestrian detection algorithm used in the invention can be replaced or improved according to actual requirements, for example using a YOLO-series detector as the pedestrian detector. For an intelligent-vehicle driving-view pedestrian crossing intention prediction system, the accuracy and real-time performance of the pedestrian detection algorithm are important. Besides the YOLO-series algorithms mentioned in this invention, other advanced deep learning detectors such as Faster R-CNN, SSD, and RetinaNet may also be considered. These algorithms perform feature extraction and target detection on the image data input by the camera through convolutional neural network structures, and can achieve efficient and accurate pedestrian detection in different scenes.
Specifically, in the present system, the Faster R-CNN algorithm may be employed as the pedestrian detector. The algorithm extracts candidate frames through a regional suggestion network (Region Proposal Network), classifies each candidate frame by utilizing a convolutional neural network and carries out bounding box regression, so that accurate positioning and identification of pedestrians are realized. In addition, in order to better adapt to the requirements of pedestrian crossing intention prediction, an attention mechanism or a multi-scale information fusion technology can be combined to improve the robustness and accuracy of pedestrian detection.
In addition, in the scheme of this embodiment, different pedestrian tracking algorithms can be used to track pedestrian trajectories in the video stream and thereby obtain the pedestrian motion features.
For example, in the invention, a Kalman-filtering-based method is used to estimate the target motion state, and the Hungarian matching algorithm is used for position matching. The choice of pedestrian tracking algorithm is critical for real-time performance and robustness; besides the Kalman-filtering-based method and the Hungarian matching algorithm mentioned in the invention, other tracking algorithms such as SORT (Simple Online and Realtime Tracking) and DeepSORT may also be considered.
Specifically, in the present system, the Kalman-filtering-based method is used to estimate the target motion state, which is a common and effective tracking technique. Kalman filtering estimates the position and speed of the target by fusing sensor data with a system model, and continuously updates the target state according to observations, thereby tracking the target. This method can, to a certain extent, handle the uncertainty and noise interference in the target's motion, and is suitable for pedestrian tracking tasks in complex traffic environments.
Meanwhile, to improve position matching and tracking accuracy, the system also adopts the Hungarian matching algorithm. The Hungarian matching algorithm determines the best match between targets and detection boxes by minimizing the overall matching cost, thereby achieving accurate positioning and tracking of each target. The algorithm effectively solves the multi-target tracking problem, has high matching accuracy and computational efficiency, and is suitable for scenarios requiring real-time target tracking.
Therefore, by combining the Kalman-filtering-based method with the Hungarian matching algorithm, accurate estimation and continuous tracking of the pedestrian motion state can be achieved, providing reliable data support for the subsequent pedestrian crossing intention prediction. This combination makes the system more stable and efficient in tracking and predicting pedestrians, and improves the driving safety and automation level of the intelligent vehicle.
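The Kalman-plus-Hungarian tracking scheme described above can be sketched as follows. The constant-velocity state model, the noise covariances, and the frame interval are illustrative assumptions rather than values from this invention, and `scipy.optimize.linear_sum_assignment` is used here as the Hungarian solver.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

DT = 0.1  # frame interval in seconds (assumed)
F = np.array([[1, 0, DT, 0],          # constant-velocity model,
              [0, 1, 0, DT],          # state: [x, y, vx, vy]
              [0, 0, 1, 0],
              [0, 0, 0, 1]], float)
H = np.array([[1, 0, 0, 0],
              [0, 1, 0, 0]], float)   # only position is observed
Q = np.eye(4) * 1e-3                  # process noise (assumed)
R = np.eye(2) * 1e-2                  # measurement noise (assumed)

def predict(x, P):
    """Kalman prediction step."""
    return F @ x, F @ P @ F.T + Q

def update(x, P, z):
    """Kalman update step with a position measurement z."""
    S = H @ P @ H.T + R
    K = P @ H.T @ np.linalg.inv(S)    # Kalman gain
    return x + K @ (z - H @ x), (np.eye(4) - K @ H) @ P

def match(pred_states, detections):
    """Hungarian matching on Euclidean distance between predictions and detections."""
    cost = np.linalg.norm(pred_states[:, None, :2] - detections[None, :, :], axis=2)
    rows, cols = linear_sum_assignment(cost)
    return [(int(r), int(c)) for r, c in zip(rows, cols)]

# Two tracked pedestrians; detections for the next frame arrive out of order.
x1, P1 = np.array([0.0, 0.0, 1.0, 0.0]), np.eye(4)
x2, P2 = np.array([5.0, 5.0, 0.0, 1.0]), np.eye(4)
x1, P1 = predict(x1, P1)
x2, P2 = predict(x2, P2)
dets = np.array([[5.02, 5.11], [0.12, 0.01]])
pairs = match(np.stack([x1, x2]), dets)
x1, P1 = update(x1, P1, dets[dict(pairs)[0]])  # feed track 0 its matched detection
print(pairs)  # [(0, 1), (1, 0)]: track 0 pairs with detection 1
```

In a full tracker, unmatched detections would spawn new tracks and long-unmatched tracks would be dropped; that bookkeeping is omitted from the sketch.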
In addition, regarding the pedestrian skeleton detection algorithm: to understand the actions and intentions of pedestrians in the traffic scene more comprehensively, this embodiment outputs the pedestrian posture features through a pedestrian skeleton detection algorithm (or pedestrian skeleton detection module), which the intention classification system uses to understand and predict pedestrian behavior. Pedestrian skeleton detection is generally based on deep learning, mainly implemented with architectures such as Convolutional Neural Networks (CNN) or Recurrent Neural Networks (RNN). Typical skeleton detection algorithms include OpenPose and AlphaPose, which can accurately extract the skeleton keypoint coordinates of pedestrians from the input image data and thereby describe their posture information.
The detection principle is as follows:
First, image or video data containing pedestrians is prepared, and the position of each pedestrian keypoint is annotated as a training sample. An existing skeleton detection algorithm is adopted as the base model, and in the training stage the network parameters are optimized through the back-propagation algorithm so that the network can accurately predict the pedestrian skeleton keypoints. After training is completed, the pedestrian image to be detected is input into the trained model, and the coordinates of the pedestrian skeleton keypoints are extracted as posture features. Once the skeleton keypoints are obtained, posture representations such as angles and distances can be constructed from the relative positions and connection relationships of the keypoints. The high-dimensional keypoint features output by the pooling layers and fully connected layers can be used to identify pedestrian actions, postures, walking directions, and so on; as an important feature of the crossing prediction system, they provide richer semantic information.
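As a toy illustration of the posture-representation step just described (keypoints to angles), the angle at a joint can be computed directly from three keypoint coordinates. The keypoint layout and pixel values below are invented for the example.

```python
import numpy as np

def joint_angle(a, b, c):
    """Angle (degrees) at joint b formed by the segments b-a and b-c."""
    v1 = np.asarray(a, float) - np.asarray(b, float)
    v2 = np.asarray(c, float) - np.asarray(b, float)
    cos = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2))
    return float(np.degrees(np.arccos(np.clip(cos, -1.0, 1.0))))

# Hypothetical 2-D keypoints (pixels) for one leg: hip, knee, ankle.
hip, knee, ankle = (100, 200), (105, 250), (100, 300)
angle = joint_angle(hip, knee, ankle)
print(round(angle, 1))  # near 180 degrees: an almost straight leg
```

Tracked over time, such joint angles give a simple gait signal: large periodic variation suggests walking, while a near-constant value suggests standing.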
As for the road detection algorithm, different lane line and traffic light detection algorithms can be adopted in this embodiment, for example semantic segmentation algorithms based on deep learning, the Hough transform, or convolutional neural networks.
Specifically, in the system, the detection of lane lines and traffic lights by adopting a semantic segmentation algorithm based on deep learning is an effective method. The algorithm realizes pixel level segmentation of different types of objects (such as lane lines, traffic lights and the like) in the image by training the neural network, so that the target area is accurately extracted and the type of the target area is identified. By further processing and analyzing the extracted information, the system can monitor the states of the lane lines and the traffic lights in real time, and provide accurate navigation and traffic signal information for the intelligent automobile.
In addition, the Hough transform algorithm may also be used to detect lane lines and traffic lights. The Hough transform is a common image processing technique for detecting geometric shapes such as straight lines or circles, which makes it suitable for linear or circular targets such as lane lines and traffic lights. By setting appropriate parameters and thresholds, the system can detect the positions of lane lines and traffic lights from feature points in the image and monitor and identify their states.
By combining the semantic segmentation algorithm based on deep learning and the Hough transformation algorithm, the method can realize efficient detection and identification of the lane lines and the traffic lights, and provides reliable support for automatic driving and traffic planning of intelligent automobiles. Such a combination would help to improve the perceptibility and decision making capabilities of the system, thereby enhancing the driving safety and reliability of intelligent automobiles in complex traffic environments.
In the above scheme, the video data of the vehicle's driving view angle and the own-vehicle motion information are obtained; the pedestrian walking features and/or road space-time features are extracted from the video data through the corresponding detection algorithms; the own-vehicle motion features are extracted from the own-vehicle motion information; and these features are input into a pre-trained pedestrian crossing intention prediction model to obtain the pedestrian's crossing intention result. By extracting, through the corresponding detection algorithms, the pedestrian motion features, the pedestrian posture features, and road space-time features such as the positions of lane lines, traffic lights, and sidewalks from the video data, the system can track and predict pedestrians more stably and efficiently, enhancing the driving safety and reliability of the intelligent vehicle in complex traffic environments.
In this embodiment, because the driving-view video data and the own-vehicle motion information are both considered, and multi-source information such as the pedestrian walking features, road space-time features, and own-vehicle motion features is fused at multiple levels, a more robust prediction of pedestrian crossing intention is achieved. At the same time, the problem that a single trajectory prediction cannot meet the high-precision requirement of intelligent vehicles for pedestrian intention prediction is avoided, the intention prediction accuracy and robustness of the automatic driving system are improved, and the safety and reliability requirements of the automatic driving system are met.
Further, a third embodiment of the pedestrian crossing intention prediction method of the invention is presented; in this embodiment, the scheme of predicting the pedestrian's crossing intention based on a pedestrian crossing intention classifier is explained in detail.
As one embodiment, the pedestrian crossing intention prediction model includes a pedestrian crossing intention classifier, and the step of inputting the pedestrian walking features and/or road space-time features and the own-vehicle motion features into the pre-trained pedestrian crossing intention prediction model to obtain the pedestrian's crossing intention result includes:
Inputting the pedestrian walking characteristics and/or road space-time characteristics and the self-vehicle movement characteristics into a pre-trained pedestrian crossing intention classifier;
and predicting the input characteristics through the pedestrian crossing intention classifier to obtain a crossing intention result of the pedestrian.
Wherein, as an embodiment, the pedestrian crossing intention classifier includes a feature extractor, a pooling layer, and a fully connected layer, and the step of predicting the input features through the pedestrian crossing intention classifier to obtain the pedestrian's crossing intention result includes the following steps:
extracting the input features by the feature extractor, and pooling the input features by the pooling layer to obtain feature images subjected to feature extraction and pooling;
And mapping the feature map subjected to feature extraction and pooling to an output layer of a target class through the full connection layer, and classifying to obtain a crossing intention result of the pedestrian.
Specifically, the architecture of the pedestrian crossing intention classifier in this embodiment may be selected according to the actual situation.
In this embodiment, the pedestrian crossing intention classifier may include: the system comprises a feature extractor, a pooling layer and a fully connected layer, wherein a pedestrian crossing intention classifier framework can adopt a Convolutional Neural Network (CNN) as the feature extractor, and the feature extractor is connected with a plurality of pooling layers and fully connected layers.
For the pooling layer (Pooling Layer): pooling layers are typically used to reduce the spatial dimensions of feature maps in convolutional neural networks in order to reduce computational complexity and increase the robustness of the model. Typical pooling operations include maximum pooling (Max Pooling) and average pooling (Average Pooling), where features are extracted by taking the maximum or average value over a region. The pooling layer can effectively preserve important features in the image and help reduce the risk of overfitting.
For the fully connected layer (Fully Connected Layer): the full-connection layer is a basic structure in the deep learning network, and connects all neurons of the previous layer with all neurons of the next layer, so that global transmission and integration of information are realized. In pedestrian crossing intent classifiers, a fully connected layer is typically used to map feature extracted and pooled feature graphs to the output layer of the target class to complete the final classification task. The fully connected layer can perform nonlinear transformation on the features and learn higher-level abstract feature representations, so that more accurate classification and prediction are realized.
In the training stage, the loss function is computed through forward propagation, and the network parameters are then updated through the back-propagation algorithm, so that the classifier gradually adjusts its weights and biases and optimizes the network structure to improve classification accuracy. With the pooling layers and fully connected layers designed together, the intention classifier can effectively classify the various high-dimensional input features and finally output whether the pedestrian will cross.
Therefore, through a pedestrian crossing intention classifier with this architecture, the pedestrian walking features, road traffic features, and own-vehicle motion features are combined at multiple levels, achieving a more robust prediction of pedestrian crossing intention. At the same time, the problem that a single trajectory prediction cannot meet the high-precision requirement of intelligent vehicles for pedestrian intention prediction is avoided, the intention prediction accuracy and robustness of the automatic driving system are improved, and the safety and reliability requirements of the automatic driving system are met.
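A minimal NumPy forward-pass sketch of this classifier architecture (convolutional feature extractor, pooling, fully connected layer, softmax output) is given below. The layer sizes and random weights are illustrative assumptions, and the forward/back-propagation training loop described above is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)

def conv1d(x, w):
    """Valid 1-D convolution with ReLU: x (T, C_in), w (K, C_in, C_out)."""
    k, _, c_out = w.shape
    t_out = x.shape[0] - k + 1
    out = np.zeros((t_out, c_out))
    for t in range(t_out):
        out[t] = np.tensordot(x[t:t + k], w, axes=([0, 1], [0, 1]))
    return np.maximum(out, 0.0)

def max_pool(x, size=2):
    """Non-overlapping temporal max pooling."""
    t = (x.shape[0] // size) * size
    return x[:t].reshape(-1, size, x.shape[1]).max(axis=1)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# Fused input: 16 time steps of a 10-dim feature vector
# (pedestrian motion + posture + road + own-vehicle features, concatenated).
features = rng.normal(size=(16, 10))

w_conv = rng.normal(scale=0.1, size=(3, 10, 8))   # feature extractor weights
w_fc = rng.normal(scale=0.1, size=(7 * 8, 2))     # fully connected -> 2 classes
h = max_pool(conv1d(features, w_conv))            # shape (7, 8)
probs = softmax(h.reshape(-1) @ w_fc)             # crossing vs. not crossing
print(probs.shape)  # (2,)
```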
Further, a fourth embodiment of the pedestrian crossing intention prediction method is provided; in this embodiment, the scheme of predicting the pedestrian's crossing intention based on the pedestrian crossing intention classifier combined with the space-time interaction module is described in detail.
As one embodiment, the pedestrian crossing intention prediction model includes a pedestrian crossing intention classifier and a space-time interaction module, and before the step of inputting the pedestrian walking feature and/or road space-time feature, and the self-vehicle movement feature into the pre-trained pedestrian crossing intention classifier, the method further includes:
And carrying out feature association fusion processing on the pedestrian walking features and/or the road space-time features and the self-vehicle movement features through the space-time interaction module to obtain fusion space-time features.
Specifically, as an implementation manner, the step of performing feature association fusion processing on the pedestrian walking feature and/or the road space-time feature and the self-vehicle movement feature through the space-time interaction module to obtain a fused space-time feature includes:
fusing the implicit features of pedestrian motion and posture based on the pedestrian motion features and the pedestrian posture features;
fusing the pedestrian-road interaction relationship based on the pedestrian motion features, the pedestrian posture features, and the road space-time features;
fusing the pedestrian-vehicle interaction relationship based on the pedestrian motion features, the pedestrian posture features, and the own-vehicle motion features;
obtaining the fused space-time features based on the implicit features of pedestrian motion and posture, the pedestrian-road interaction relationship, and the pedestrian-vehicle interaction relationship.
The fused spatiotemporal features may then be input into a pre-trained pedestrian crossing intent classifier.
That is, a pedestrian space-time interaction module can be added before the intention classifier to further analyze the space-time features of the multi-way interactions among pedestrians, the road, and other vehicles.
The space-time interaction module mainly performs operations such as data integration, space-time feature extraction, space-time association analysis, space-time model establishment, and space-time interaction feature fusion on the input pedestrian walking features and/or road space-time features and the own-vehicle motion features, obtaining fused space-time features. These are fed as a whole into the pedestrian crossing intention classifier, which fuses the implicit features of pedestrian motion and posture, the pedestrian-road interaction relationship, and the pedestrian-vehicle interaction relationship, performs reasoning, and finally classifies the pedestrians' crossing intention.
Specifically, the data integration is to combine the characteristics of pedestrian detection, the characteristics of pedestrian track tracking, the characteristics of lane lines and the characteristics of pedestrian gestures to form comprehensive input data.
The space-time feature extraction is to extract space-time features of pedestrians, riders and road environments, including information on actions, postures, speeds and the like, by using the 2D posture estimation result and the PIE data set.
The space-time correlation analysis is to analyze the space-time correlation between different objects, including the interaction relation of pedestrians or cyclists with vehicles and road lines, and the evolution rule of the behavior mode and the traffic intention.
The space-time model establishment builds, from the extracted space-time features and the association analysis results, a space-time model describing the complex interaction relationships among pedestrians, cyclists, and the surrounding environment, so that their behavioral intentions can be better understood.
The space-time interaction feature fusion is to fuse the space-time features from the 2D gesture estimation and PIE data set to form a more comprehensive and rich feature representation, and provide more diversified input information for the subsequent intention classifier.
Finally, the space-time features extracted by the pedestrian space-time interaction module are integrated into the intention classifier, improving the accuracy with which the system recognizes and predicts the intentions of pedestrians and cyclists. Through the above process, by combining 2D pose estimation with models on the large-scale PIE dataset, the intelligent-vehicle system can analyze the behavioral intentions of pedestrians and cyclists deeply and comprehensively, providing more accurate and intelligent environment perception and decision support for the automatic driving system.
Therefore, the space-time features are fused through the space-time interaction module and fed as a whole into the pedestrian crossing intention classifier, which fuses the implicit features of pedestrian motion and posture, the pedestrian-road interaction relationship, and the pedestrian-vehicle interaction relationship, performs reasoning, and finally classifies the pedestrian's crossing intention, thereby improving the intention prediction accuracy and robustness of the automatic driving system and meeting its safety and reliability requirements.
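The fusion performed by the space-time interaction module can be caricatured as concatenating per-frame feature streams together with simple interaction terms. The dimensions and the relative-difference interaction terms below are illustrative assumptions for the sketch; the actual module learns such associations rather than hand-coding them.

```python
import numpy as np

rng = np.random.default_rng(0)

T = 16                                # observation window in frames (assumed)
motion = rng.random((T, 4))           # e.g. box centre x, y and velocity vx, vy
pose = rng.random((T, 34))            # e.g. 17 keypoints x 2 coordinates
road = rng.random((T, 6))             # e.g. lane offsets, traffic-light state
ego = rng.random((T, 3))              # e.g. speed, yaw rate, acceleration

# Per-frame concatenation yields the fused feature sequence; toy placeholder
# interaction terms (pedestrian-road, pedestrian-vehicle) are appended the
# same way before the sequence is fed to the intention classifier.
ped = np.concatenate([motion, pose], axis=1)   # pedestrian motion + posture
ped_road = ped[:, :2] - road[:, :2]            # toy pedestrian-road term
ped_ego = ped[:, :2] - ego[:, :2]              # toy pedestrian-vehicle term
fused = np.concatenate([ped, road, ego, ped_road, ped_ego], axis=1)
print(fused.shape)  # (16, 51)
```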
Referring to fig. 4, fig. 4 is a flowchart illustrating a fifth embodiment of the pedestrian crossing intention prediction method according to the present invention.
As shown in fig. 4, the present embodiment is based on the embodiment shown in fig. 2 or fig. 3, and the method further includes:
Step S105: post-process the crossing intention result with an intention holder created based on preset rules.
The preset rules may be set based on driving style and traffic rules, for example, the preset rules may include: one or more of pedestrian priority rules, traffic light rules, pedestrian action analysis rules, safe distance rules, traffic flow rules.
In this embodiment, to compensate for the lack of interpretability of the deep learning model and to supplement the robustness of the end-to-end intention prediction, an intention holder based on driving style and traffic rules is placed after the intention classifier to post-process the output of the intention model.
For example, when the traffic light changes from red to green while pedestrians are still on the crosswalk, or in special cases such as pedestrians running a red light, the pedestrians will generally carry on crossing to the end, so the crossing intention should be held. In addition, more conservative or more aggressive intention predictions can be selected according to the driving style of the intelligent-vehicle driver and the pedestrian-vehicle interaction, thereby influencing the driving style.
Specifically, the intention holder based on the driving style and the traffic rules may formulate a corresponding rule according to the driving style or the actual demand of the intelligent vehicle.
For example, some relevant rules based on driving style and traffic rules may be formulated as a post-processing module of the intent holder to stabilize the deep learning model output results. Some specific example rules are set forth below:
Pedestrian priority rule: if the deep learning model cannot accurately determine a pedestrian's intention to cross the road, but pedestrians are nearby with potential signs of crossing (for example, a pedestrian standing at the crosswalk edge), the holder rule should guide the intelligent vehicle to slow down or stop, giving priority to the pedestrian.
Traffic light rule: when the traffic light shows red, the holder rule should require the intelligent vehicle to stop and wait, adhering to the traffic rules even if the deep learning model predicts that a pedestrian is likely to continue crossing the road.
Pedestrian action analysis rule: by monitoring a pedestrian's body posture and motion, if the pedestrian is found to be accelerating or looking in the direction of the vehicle, the holder rule may consider the pedestrian to intend to cross the road and prompt the intelligent vehicle to pay attention to the pedestrian's behavior.
Safe distance rule: when a pedestrian is crossing the road, the holder rule can set an appropriate safe distance according to the vehicle's speed and the pedestrian's position, so that the intelligent vehicle can stop or take avoiding action in time while the pedestrian crosses.
Traffic flow rule: when vehicles on the road are dense or the traffic situation is complex, the holder rule should make the intelligent vehicle wait for the pedestrian to completely cross the road before driving on, ensuring driving safety.
State retention rule: a pedestrian's crossing intention is held for at least 1 s, and clearing the intention likewise requires at least 1 s of continuous non-crossing predictions.
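The state retention rule above can be sketched as a small hysteresis post-processor: the held intention only flips after the raw classifier output has disagreed with it continuously for the hold time. The 10 Hz frame rate and the symmetric treatment of setting and clearing the intention are assumptions made for this sketch.

```python
class IntentionHolder:
    """Post-processes raw crossing/not-crossing predictions with a 1 s hold."""

    HOLD_S = 1.0  # minimum time of continuous disagreement before switching

    def __init__(self, dt=0.1):
        self.dt = dt            # seconds per frame (assumed 10 Hz)
        self.state = False      # held crossing intention
        self.pending = 0.0      # time the raw prediction has disagreed

    def step(self, raw_crossing: bool) -> bool:
        if raw_crossing == self.state:
            self.pending = 0.0  # agreement resets the disagreement timer
        else:
            self.pending += self.dt
            if self.pending >= self.HOLD_S:
                self.state = raw_crossing
                self.pending = 0.0
        return self.state

holder = IntentionHolder()
# A single-frame flicker to "crossing" is suppressed...
out = [holder.step(p) for p in [False, True, False, False]]
print(out)  # [False, False, False, False]
# ...but a sustained crossing prediction (>= 1 s) switches the held state.
out = [holder.step(True) for _ in range(12)]
print(out[-1])  # True
```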
Through the formulation and implementation of the rules, the stability and the safety of the pedestrian crossing intention classification system can be effectively improved, and the intelligent vehicle is ensured to accord with traffic rules and respect the pedestrian interests when recognizing and understanding the pedestrian crossing intention, so that the safety of pedestrians and drivers is better ensured.
Therefore, after the end-to-end pedestrian crossing intention prediction, post-processing by the intention holder based on driving preference and traffic rules allows the crossing intention to be held in special situations such as pedestrians running a red light or pedestrians mid-crossing, ensuring the safety of this prediction system for intelligent vehicles. The intention holder increases the robustness and interpretability of the pedestrian crossing intention, which is valuable for practical commercialization and the specific application fields and scenarios of the invention.
The overall flow of an embodiment of the pedestrian crossing intention prediction method of the present invention can be shown with reference to fig. 5.
By considering the vehicle's driving-view video data and the own-vehicle motion information, and combining multi-source information such as the pedestrian walking features, road space-time features, and own-vehicle motion features, the pedestrian walking features, road traffic features, and own-vehicle motion features are fused at multiple levels, achieving a more robust prediction of pedestrian crossing intention. This avoids the problem that a single trajectory prediction cannot meet the high-precision requirement of intelligent vehicles for pedestrian intention prediction, improves the intention prediction accuracy and robustness of the automatic driving system, and meets its safety and reliability requirements. In addition, to compensate for the lack of interpretability of the deep learning model and to supplement the robustness of the end-to-end intention prediction, an intention holder based on driving style and traffic rules is placed after the intention classifier to post-process the output of the intention model; this post-processing allows the crossing intention to be held in special situations such as pedestrians running a red light or pedestrians mid-crossing, ensuring the safety of the prediction system for intelligent vehicles and increasing the robustness and interpretability of the pedestrian crossing intention.
Referring to fig. 6, fig. 6 is a flowchart illustrating a sixth embodiment of the pedestrian crossing intention prediction method of the present invention.
As shown in fig. 6, this embodiment is based on the embodiment shown in fig. 2, 3 or 4. Before the step S104 of inputting the pedestrian walking characteristics and/or the road space-time characteristics and the self-vehicle movement characteristics into a pre-trained pedestrian crossing intention prediction model for prediction to obtain a crossing intention result of the pedestrian, the method further includes:
Step S100, training a neural network through pre-collected historical data to obtain a trained pedestrian crossing intention prediction model.
Wherein the pedestrian crossing intention prediction model may adopt an architecture composed of the space-time interaction module and the intention classifier. The space-time interaction module and the intention classifier together form one neural network, and the pedestrian crossing intention prediction model is obtained by training this neural network.
The pedestrian crossing intention classifier can also adopt a convolutional neural network (CNN) as a feature extractor, followed by several pooling layers and fully connected layers, and finally output the classification result of the pedestrian crossing intention. In the training stage, the loss function is computed by forward propagation, and the network parameters are then updated by the backpropagation algorithm, so that the classifier gradually adjusts its weights and biases and the network structure is optimized to improve classification accuracy. With the integrated design of the pooling layers and fully connected layers, the intention classifier can effectively classify various high-dimensional input features and finally output whether the pedestrian will cross.
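The pooling-plus-fully-connected mapping described above can be illustrated with a minimal forward pass. The 1-D max pooling, the weights and the decision threshold here are all invented for the sketch; a real classifier would operate on learned CNN feature maps:

```python
from typing import List

def max_pool_1d(features: List[float], window: int = 2) -> List[float]:
    """Downsample a 1-D feature sequence by taking the max over each window."""
    return [max(features[i:i + window]) for i in range(0, len(features), window)]

def fully_connected(features: List[float], weights: List[float], bias: float) -> float:
    """Map pooled features to a single crossing-intent logit."""
    return sum(w * f for w, f in zip(weights, features)) + bias

def classify_crossing(features: List[float], weights: List[float],
                      bias: float, threshold: float = 0.0) -> bool:
    """Forward pass: pooling, then a fully connected layer, then thresholding."""
    pooled = max_pool_1d(features)
    return fully_connected(pooled, weights, bias) > threshold
```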
The basic principle of constructing the pedestrian crossing intention prediction model based on the neural network training in this embodiment can refer to the training principle of various neural network models, and will not be described here again.
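As a heavily simplified stand-in for the training step S100, the sketch below fits a single-logit classifier by forward propagation and gradient updates on toy "historical" samples. The data, learning rate and epoch count are invented for illustration; a real model would be a CNN trained with a deep-learning framework:

```python
import math
from typing import List, Tuple

def train_intent_classifier(samples: List[List[float]], labels: List[int],
                            epochs: int = 200, lr: float = 0.5) -> Tuple[List[float], float]:
    """Fit a logistic model: forward pass, log-loss gradient, parameter update."""
    w = [0.0] * len(samples[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            z = sum(wi * xi for wi, xi in zip(w, x)) + b
            p = 1.0 / (1.0 + math.exp(-z))                   # forward propagation
            g = p - y                                        # log-loss gradient w.r.t. z
            w = [wi - lr * g * xi for wi, xi in zip(w, x)]   # update weights
            b -= lr * g                                      # update bias
    return w, b

def predict_crossing(w: List[float], b: float, x: List[float]) -> bool:
    """Apply the trained model to a new feature vector."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z)) >= 0.5
```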
It should be noted that the pedestrian crossing intention prediction model in the embodiments of the present invention may be used alone as the pedestrian crossing intention prediction model of an automatic driving system, or may be used as a complement to other pedestrian trajectory prediction systems, in which case the crossing intention is used to generate a predicted trajectory; this is not described further here.
As shown in fig. 7, an embodiment of the present invention proposes a pedestrian crossing intention prediction apparatus including:
the acquisition module is used for acquiring video data of a vehicle driving visual angle and vehicle motion information of the vehicle;
The extraction module is used for extracting pedestrian walking characteristics and/or road space-time characteristics from the video data and extracting vehicle motion characteristics from the vehicle motion information;
The prediction module is used for inputting the pedestrian walking characteristics and/or the road space-time characteristics and the vehicle movement characteristics into a pre-trained pedestrian crossing intention prediction model to predict so as to obtain a crossing intention result of the pedestrian.
For the principle and implementation process of the pedestrian crossing intention prediction realized by this embodiment, please refer to the above embodiments; the description is not repeated here.
In addition, the embodiment of the invention also provides a pedestrian crossing intention prediction system, which comprises a memory, a processor and a computer program stored on the memory and capable of running on the processor, wherein the computer program is executed by the processor to realize the pedestrian crossing intention prediction method.
Since all the technical solutions of the above embodiments are adopted when the pedestrian crossing intention prediction program is executed by the processor, at least all the beneficial effects brought by those technical solutions are obtained; these will not be described in detail here.
In addition, the embodiment of the invention also provides a computer readable storage medium, wherein a computer program is stored on the computer readable storage medium, and the computer program realizes the pedestrian crossing intention prediction method when being executed by a processor.
Since all the technical solutions of the above embodiments are adopted when the pedestrian crossing intention prediction program is executed by the processor, at least all the beneficial effects brought by those technical solutions are obtained; these will not be described in detail here.
Compared with the prior art, the pedestrian crossing intention prediction method, device, system and storage medium provided by the embodiments of the present invention acquire video data of the vehicle's driving perspective and the own vehicle movement information of the vehicle; extract pedestrian walking characteristics and/or road space-time characteristics from the video data; extract self-vehicle movement characteristics from the own vehicle movement information; and input the pedestrian walking characteristics and/or the road space-time characteristics and the self-vehicle movement characteristics into a pre-trained pedestrian crossing intention prediction model for prediction, obtaining a crossing intention result of the pedestrian. Because the video data of the vehicle's driving perspective and the own vehicle movement information are considered, and multi-source information such as pedestrian walking characteristics, road space-time characteristics and self-vehicle movement characteristics is combined, the pedestrian walking characteristics, road traffic characteristics and self-vehicle movement characteristics are fused at multiple levels, realizing more robust pedestrian crossing intention prediction. At the same time, this avoids the problem that a single trajectory prediction cannot meet the high-precision requirements of intelligent vehicles for pedestrian intention prediction, improves the intention prediction accuracy and robustness of the automatic driving system, and meets the automatic driving system's requirements for safety and reliability. In addition, to compensate for the lack of interpretability of the deep learning model and to supplement the robustness of end-to-end intention prediction, an intention holder based on driving style and traffic rules is arranged after the intention classifier to post-process the output of the intention model. Through this post-processing based on driving preference and traffic rules, the crossing intention can be held for special cases such as pedestrians running a red light or pedestrians already crossing the road, ensuring the safety of the prediction system for intelligent vehicles and increasing the robustness and interpretability of the pedestrian crossing intention.
It should be noted that, in this document, the terms "comprises", "comprising" or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article or apparatus. Without further limitation, an element defined by the phrase "comprising a(n) ..." does not exclude the presence of other identical elements in the process, method, article or apparatus that comprises the element.
The foregoing embodiment numbers of the present invention are merely for the purpose of description, and do not represent the advantages or disadvantages of the embodiments.
From the above description of the embodiments, it will be clear to those skilled in the art that the methods of the above embodiments may be implemented by software plus a necessary general-purpose hardware platform, or by hardware, although in many cases the former is preferred. Based on this understanding, the technical solution of the present invention, in essence or in the part contributing to the prior art, may be embodied in the form of a software product stored in a storage medium (e.g. ROM/RAM, magnetic disk, optical disk) as above, comprising instructions for causing a terminal device (which may be a mobile phone, a computer, a server, a controlled terminal, a network device, etc.) to perform the method of each embodiment of the present invention.
The foregoing description is only of the preferred embodiments of the present invention and is not intended to limit the patent scope of the invention. Any equivalent structure or equivalent process transformation made using the contents of the description and drawings of the present invention, whether applied directly or indirectly in other related technical fields, is likewise included within the patent protection scope of the present invention.

Claims (12)

1. A pedestrian crossing intent prediction method, characterized in that the method comprises:
Acquiring video data of a driving view angle of a vehicle and own vehicle movement information of the vehicle;
Extracting pedestrian walking characteristics and/or road space-time characteristics from the video data;
Extracting own vehicle motion characteristics from the own vehicle motion information;
And inputting the pedestrian walking characteristics and/or the road space-time characteristics and the vehicle movement characteristics into a pre-trained pedestrian crossing intention prediction model to predict, so as to obtain a crossing intention result of the pedestrian.
2. The method of claim 1, wherein the pedestrian walking features include pedestrian motion features and pedestrian pose features, and wherein the step of extracting pedestrian walking features and road spatiotemporal features from the video data comprises:
Extracting pedestrian motion characteristics from the video data by a pedestrian detector and a pedestrian tracker;
Extracting pedestrian pose features from the video data through a pedestrian detector and a skeleton simulation module;
Detecting positions of lane lines, traffic lights and sidewalks from the video data through a road detection module, and extracting the road space-time characteristics therefrom.
3. The method of claim 2, wherein the pedestrian crossing intent prediction model comprises a pedestrian crossing intent classifier, and the step of inputting the pedestrian walking characteristics and/or road spatiotemporal characteristics, and the self-vehicle movement characteristics into a pre-trained pedestrian crossing intent prediction model for prediction, obtaining crossing intent results of pedestrians comprises:
Inputting the pedestrian walking characteristics and/or road space-time characteristics and the self-vehicle movement characteristics into a pre-trained pedestrian crossing intention classifier;
and predicting the input characteristics through the pedestrian crossing intention classifier to obtain a crossing intention result of the pedestrian.
4. The method of claim 3, wherein the pedestrian crossing intention prediction model further comprises a space-time interaction module, and before the step of inputting the pedestrian walking characteristics and/or road space-time characteristics and the self-vehicle movement characteristics into a pre-trained pedestrian crossing intention classifier, the method further comprises:
Performing feature association fusion processing on the pedestrian walking characteristics and/or road space-time characteristics and the self-vehicle movement characteristics through the space-time interaction module to obtain fused space-time characteristics;
The step of inputting the pedestrian walking characteristics and/or road space-time characteristics and the self-vehicle movement characteristics into a pre-trained pedestrian crossing intention classifier comprises:
Inputting the fused space-time characteristics into the pre-trained pedestrian crossing intention classifier.
5. The method of claim 3 or 4, wherein the pedestrian crossing intention classifier comprises a feature extractor, a pooling layer and a full connection layer, and the step of predicting the input characteristics through the pedestrian crossing intention classifier to obtain the crossing intention result of the pedestrian comprises:
Extracting the input features by the feature extractor, and pooling the input features by the pooling layer to obtain a feature map after feature extraction and pooling;
And mapping the feature map subjected to feature extraction and pooling to an output layer of a target class through the full connection layer, and classifying to obtain a crossing intention result of the pedestrian.
6. The method of claim 4, wherein the step of performing feature association fusion processing on the pedestrian walking characteristics and/or the road space-time characteristics and the self-vehicle movement characteristics through the space-time interaction module to obtain fused space-time characteristics comprises:
Fusing implicit characteristics of pedestrian motion and pose based on the pedestrian motion characteristics and the pedestrian pose characteristics;
Fusing the pedestrian-road interaction relationship based on the pedestrian motion characteristics, the pedestrian pose characteristics and the road space-time characteristics;
Fusing the pedestrian-vehicle interaction relationship based on the pedestrian motion characteristics, the pedestrian pose characteristics and the self-vehicle movement characteristics;
Obtaining the fused space-time characteristics based on the implicit characteristics of pedestrian motion and pose, the pedestrian-road interaction relationship and the pedestrian-vehicle interaction relationship.
7. The method according to claim 1, wherein the method further comprises:
Post-processing the crossing intention result by an intention holder created based on a preset rule.
8. The method of claim 7, wherein the preset rule comprises: one or more of pedestrian priority rules, traffic light rules, pedestrian action analysis rules, safe distance rules, traffic flow rules.
9. The method of claim 1, wherein the step of inputting the pedestrian walking characteristics and/or road spatiotemporal characteristics, and the self-vehicle movement characteristics into a pre-trained pedestrian crossing intent prediction model for prediction, further comprises, prior to the step of obtaining a crossing intent result of the pedestrian:
Training a neural network through pre-collected historical data to obtain a trained pedestrian crossing intention prediction model.
10. A pedestrian crossing intention prediction apparatus, characterized in that the apparatus comprises:
the acquisition module is used for acquiring video data of a vehicle driving visual angle and vehicle motion information of the vehicle;
The extraction module is used for extracting pedestrian walking characteristics and/or road space-time characteristics from the video data and extracting vehicle motion characteristics from the vehicle motion information;
The prediction module is used for inputting the pedestrian walking characteristics and/or the road space-time characteristics and the vehicle movement characteristics into a pre-trained pedestrian crossing intention prediction model to predict so as to obtain a crossing intention result of the pedestrian.
11. A pedestrian crossing intent prediction system, characterized in that the system comprises a memory, a processor and a computer program stored on the memory and executable on the processor, which computer program, when executed by the processor, implements the pedestrian crossing intent prediction method as claimed in any one of claims 1-9.
12. A computer readable storage medium, characterized in that the computer readable storage medium has stored thereon a computer program which, when executed by a processor, implements the pedestrian crossing intention prediction method according to any one of claims 1 to 9.
CN202410330596.5A 2024-03-21 2024-03-21 Pedestrian crossing intention prediction method, device, system and storage medium Pending CN118155186A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410330596.5A CN118155186A (en) 2024-03-21 2024-03-21 Pedestrian crossing intention prediction method, device, system and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202410330596.5A CN118155186A (en) 2024-03-21 2024-03-21 Pedestrian crossing intention prediction method, device, system and storage medium

Publications (1)

Publication Number Publication Date
CN118155186A true CN118155186A (en) 2024-06-07

Family

ID=91290191

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410330596.5A Pending CN118155186A (en) 2024-03-21 2024-03-21 Pedestrian crossing intention prediction method, device, system and storage medium

Country Status (1)

Country Link
CN (1) CN118155186A (en)

Similar Documents

Publication Publication Date Title
Bachute et al. Autonomous driving architectures: insights of machine learning and deep learning algorithms
US10803328B1 (en) Semantic and instance segmentation
US11726477B2 (en) Methods and systems for trajectory forecasting with recurrent neural networks using inertial behavioral rollout
US10740658B2 (en) Object recognition and classification using multiple sensor modalities
US9767368B2 (en) Method and system for adaptive ray based scene analysis of semantic traffic spaces and vehicle equipped with such system
US11371851B2 (en) Method and system for determining landmarks in an environment of a vehicle
US11840261B2 (en) Ground truth based metrics for evaluation of machine learning based models for predicting attributes of traffic entities for navigating autonomous vehicles
US11755917B2 (en) Generating depth from camera images and known depth data using neural networks
EP4145398A1 (en) Systems and methods for vehicle camera obstruction detection
Verma et al. Vehicle detection, tracking and behavior analysis in urban driving environments using road context
Kastner et al. Task-based environment interpretation and system architecture for next generation ADAS
Nejad et al. Vehicle trajectory prediction in top-view image sequences based on deep learning method
Wang et al. Deep understanding of big geospatial data for self-driving: Data, technologies, and systems
Arthi et al. Object detection of autonomous vehicles under adverse weather conditions
Tewari et al. AI-based autonomous driving assistance system
CN118155186A (en) Pedestrian crossing intention prediction method, device, system and storage medium
CN112180913A (en) Special vehicle identification method
US12046013B2 (en) Using relevance of objects to assess performance of an autonomous vehicle perception system
US20220382284A1 (en) Perception system for assessing relevance of objects in an environment of an autonomous vehicle
Reyes-Cocoletzi et al. Obstacle Detection and Trajectory Estimation in Vehicular Displacements based on Computational Vision
Thal et al. Clustering-Based Trajectory Prediction of Vehicles Interacting with Vulnerable Road Users
Nino Context and Behavioral Analysis for Pedestrians in the Domain of Self-Driving
Völz Learning to Predict Pedestrians for Urban Automated Driving
Bhatia et al. Road Image Segmentation for Autonomous Car
Pool Context-based Cyclist Path Prediction: Crafted and Learned Models for Intelligent Vehicles

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination