CN114818707A - Automatic driving decision method and system based on knowledge graph - Google Patents
Automatic driving decision method and system based on knowledge graph
- Publication number: CN114818707A
- Application number: CN202210201601.3A
- Authority: CN (China)
- Prior art keywords: entity, knowledge, data, decision, module
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
- G06F40/295—Named entity recognition
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/28—Databases characterised by their database models, e.g. relational or object models
- G06F16/284—Relational databases
- G06F16/288—Entity relationship models
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/36—Creation of semantic tools, e.g. ontology or thesauri
- G06F16/367—Ontology
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/951—Indexing; Web crawling techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
Abstract
An automatic driving decision system based on a knowledge graph comprises a knowledge graph library and a decision reinforcement learning module. The knowledge graph library maps driving knowledge from the Internet into triple form and uses the driving knowledge graph for knowledge representation and reasoning, so that massive knowledge can be classified and managed, the time spent on traditional rule-case matching is reduced, and the real-time performance of knowledge retrieval is improved. The knowledge graph library acquires a driving scene and, through the driving-scene experience samples it stores, provides expert experience to the decision reinforcement learning module, which in turn outputs a high-confidence driving strategy to the decision module and guides it to adapt to a complex and changeable traffic environment. This ensures vehicle safety, makes automatic driving decision information interpretable through the knowledge graph, and increases both the credibility of the automatic driving decision system and passengers' confidence in automatic driving vehicles.
Description
Technical Field
The invention relates to the technical field of unmanned driving, and in particular to an automatic driving decision method and an automatic driving decision system based on a knowledge graph.
Background
As the volume of knowledge grows, the use of knowledge graphs is becoming more common. A knowledge graph uses a suitable knowledge representation method to mine connections between data through a relation layer, so that knowledge can circulate and be processed cooperatively between computers more easily. A knowledge graph represents the relationships between entities with a graph data structure; compared with plain text, the graph representation is easier to understand and accept. A knowledge graph is formed by connecting edges of entity-relationship-entity or entity-attribute-value triples, representing relationships between entities or between entities and their attributes.
Relation search based on knowledge graphs is widely applied to entity matching and question-answering systems in various industries; it can process and store known knowledge and quickly perform knowledge matching and answer retrieval. Meanwhile, as Internet technology matures, knowledge can be acquired not only from literature and structured data but also from unstructured data; knowledge sources have broadened greatly, which effectively increases the capacity of a knowledge graph. For example, relationships between patients and drugs can be extracted from massive unstructured medical texts, or risk values of financial activities can be obtained from financial knowledge. But such applications are sensitive to the source and volume of knowledge. Many researchers have therefore attempted to obtain rich knowledge from the web using Internet technology, or to accurately identify relationships between entities using techniques such as deep learning.
In the field of automatic driving, knowledge acquired by a knowledge graph can effectively increase the knowledge capacity of a decision system, giving the automatic driving system the decision-making ability of an experienced driver, enabling it to decide quickly in complex traffic environments and enhancing the safety of the automatic driving vehicle. At present there is little research in this area: some work only uses a simple rule base to provide a knowledge guarantee for the decision system, while other work trains the decision system on large manually labelled datasets using techniques such as reinforcement learning. However, knowledge acquisition should not consist only of simple expert knowledge, or of uninterpretable models trained on labelled data; it must combine broad knowledge capacity with interpretability that expresses the causal relationships among pieces of knowledge. Only then does the automatic driving system have a complete and interpretable "driving brain", allowing the automatic driving vehicle to produce fast and accurate decision strategies in a complex and changeable traffic environment.
Chinese patent CN110288847A, entitled "An automatic driving decision method, device, system, storage medium and terminal", sends road-condition information to the processing module of the corresponding driving function, and the processing module generates the corresponding control information; the control information is sent to decision modules at different levels for step-by-step decision, decomposition or synthesis, until the final vehicle control information is obtained.
Chinese patent CN106940933A discloses an intelligent vehicle lane-change decision method based on an intelligent transportation system. Similarly, that invention only outputs lane-change decisions through simple rules, does not consider mixed situations of multiple states, and is only suitable for simple scenes.
Disclosure of Invention
The invention provides an automatic driving decision system based on a knowledge graph and an operation method thereof, which can increase the accuracy of the automatic driving decision system in handling complex environments, ensure vehicle safety, make automatic driving decision information interpretable through the knowledge graph, and increase both the credibility of the automatic driving decision system and passengers' trust in automatic driving vehicles.
The technical scheme of the invention is as follows:
An automatic driving decision system based on a knowledge graph, characterized by comprising a knowledge graph library and a decision reinforcement learning module; the knowledge graph library acquires a driving scene, provides expert experience to the decision reinforcement learning module through the driving-scene experience samples stored in it, and further outputs a high-confidence driving strategy to the decision module;
the knowledge graph library comprises a data acquisition module, an entity identification module, a relation extraction module and a graph storage module;
the data acquisition module comprises a data crawling sub-module and a data cleaning sub-module; the data crawling sub-module crawls massive raw traffic data from the Internet, and the data cleaning sub-module removes duplicate and empty data from the raw traffic data through data cleaning to obtain effective data;
the entity identification module stably identifies entity names in the effective data through an entity-identification deep learning model;
the relation extraction module labels and selects features of the identified entity names through a relation extraction model to obtain the entity relationships between the entities corresponding to those names;
the graph storage module stores the knowledge graph comprising each entity name and entity relationship in a Neo4j graph database, and uses a property graph model for real-time storage and query of the knowledge graph;
the decision reinforcement learning module comprises a driving-scene-complexity calculation sub-module and an optimal-driving-behavior-strategy calculation sub-module; the driving-scene-complexity calculation sub-module calculates the complexity of the driving scene based on the knowledge graph library, and the optimal-driving-behavior-strategy calculation sub-module introduces this complexity into a decision reinforcement learning model to screen out an optimal driving behavior strategy.
Preferably, the driving-scene-complexity calculation sub-module performs a comprehensive safety evaluation on the categories of entity relationships in the driving scene to obtain a scene-complexity comprehensive value E; E participates in the construction of the knowledge graph as the "value" element of a triple of the automatic driving decision system. The comprehensive safety evaluation means that entity-relationship categories with a larger influence on the safety of the ego vehicle are assigned to a higher safety level i and given a safety-evaluation score Xᵢ. The scene-complexity comprehensive value is E = α₁X₁ + α₂X₂ + … + αᵢXᵢ, where the scene-complexity variable αᵢ is obtained by acquiring the environment elements of the driving scene through the perception layer and querying the mapped safety-evaluation values in the knowledge graph library: αᵢ is 1 if a relationship of safety level i is present, otherwise αᵢ is 0.
Preferably, the entity identification module comprises an entity-identification deep-learning-model building sub-module, which comprises an entity small-sample labelled dataset, an entity data preprocessing unit and an entity deep learning unit; the entity small-sample labelled dataset is a manually labelled entity dataset; the entity data preprocessing unit segments the effective data into continuous word sequences according to the entity dataset and removes words irrelevant to recognition with a stop-word dictionary to obtain preprocessed entity data; the entity deep learning unit adopts a bidirectional long short-term memory network and a conditional random field model to effectively identify the driving-scene-element entities in the preprocessed entity data.
Preferably, the entity-identification deep learning model comprises a tested and trained BiLSTM-CRF model.
Preferably, the relation extraction module comprises a relation-extraction-model building sub-module, which comprises a relation small-sample labelled dataset, a data-relation preprocessing unit and a relation deep learning unit; the relation small-sample labelled dataset is a manually labelled entity-relationship dataset; the data-relation preprocessing unit forms entity pairs from the identified entity names according to the entity-relationship dataset, and stores the positions of the entity pairs in the effective data, the entity categories, and the parts of speech and modifiers around the entity pairs, to obtain a labelled entity-relationship dataset; the relation extraction model associates each identified entity name with the corresponding entity relationship according to the labelled entity-relationship dataset to obtain relationship features; the relationship features comprise entity features, entity-category features, context features, part-of-speech features, position features and modifier features.
Preferably, the relation extraction model comprises a tested and trained BiLSTM-Attention model.
Preferably, the decision-making reinforcement learning model adopts a Q learning algorithm to obtain an optimal value function, and a value function V designed by the Q learning algorithm π (s t ) The expression is as follows:
value function V π (s t ) And Q function Q t (s t ,a t ) The relationship of (1) is:
wherein action a is performed at time t t Obtaining a reward R from the environment; the current environmental state is s t The state of the environment after the action is finished is s t+1 The return signal obtained by this action is R(s) t ,a t ) (ii) a The policy π (a | s) is a function that determines the next action a of the agent, V, based on the environment state s π (s t ) Is Q t (s t ,a t ) A desire for action a;
The Q-learning algorithm obtains the optimal strategy π by iterating the Q-value function and accumulating the discounted reinforcement signal of the optimized action sequence during execution, namely:
Q_{t+1}(s_t, a_t) = R(s_t, a_t) + γ max_{a_{t+1}} Q(s_{t+1}, a_{t+1}) + εE
The initial value of the Q function can be chosen arbitrarily; after each action is completed and the return is obtained, the Q function is updated, where α is the learning factor:
Q_t(s_t, a_t) = (1 − α) Q_{t−1}(s_t, a_t) + α [ R(s_t, a_t) + γ max_{a_{t+1}} Q_{t−1}(s_{t+1}, a_{t+1}) + εE ].
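A minimal sketch of the update rule above, assuming a discrete state-action space stored as a Q-value table; the state and action names, reward values and hyper-parameters are illustrative, and `complexity` stands for the scene-complexity value E supplied by the knowledge graph library:

```python
from collections import defaultdict

def q_update(q, s, a, reward, s_next, actions,
             alpha=0.1, gamma=0.9, eps=0.05, complexity=0.0):
    """One update following the formula above:
    Q_t(s,a) = (1-alpha)*Q_{t-1}(s,a)
               + alpha*[R(s,a) + gamma*max_a' Q_{t-1}(s',a') + eps*E],
    where E is the scene-complexity value from the knowledge graph library."""
    best_next = max(q[(s_next, a2)] for a2 in actions)
    q[(s, a)] = (1 - alpha) * q[(s, a)] + alpha * (
        reward + gamma * best_next + eps * complexity)
    return q[(s, a)]

# Toy usage: two hypothetical states and actions, one update step.
q = defaultdict(float)
actions = ["keep_lane", "brake"]
q_update(q, "clear_road", "keep_lane", reward=1.0,
         s_next="red_light", actions=actions, complexity=6.0)
```

With all Q values initialised to zero, the first update yields (1 − 0.1)·0 + 0.1·(1.0 + 0.9·0 + 0.05·6.0) = 0.13, illustrating how the εE term lets a more complex scene raise the learned value.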
Preferably, the categories of entity relationship include intervention triggering, behavior limitation, warning reminding, influence present and mutual non-interference; the corresponding safety levels are 1 to 5 respectively, and the corresponding safety-evaluation scores are 2, 4, 6, 8 and 10 respectively.
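A small sketch of the scene-complexity evaluation E = α₁X₁ + … + αᵢXᵢ using the category scores listed above; the dictionary keys are hypothetical English identifiers for the five relationship categories:

```python
# Safety-evaluation scores X_i per relationship category (levels 1-5 above);
# the English key names are assumptions for illustration only.
SAFETY_SCORES = {
    "intervention_trigger": 2,
    "behavior_limitation": 4,
    "warning_reminder": 6,
    "influence_present": 8,
    "no_interference": 10,
}

def scene_complexity(detected_categories):
    """E = sum(alpha_i * X_i): alpha_i is 1 when the perception layer reports
    at least one relationship of category i in the scene, else 0."""
    present = set(detected_categories)
    return sum(x for cat, x in SAFETY_SCORES.items() if cat in present)

# A scene containing a triggered intervention and a warning: E = 2 + 6 = 8.
e = scene_complexity(["intervention_trigger", "warning_reminder"])
```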
Preferably, the data crawling sub-module acquires data using the Scrapy distributed crawler framework.
The operation method of the automatic driving decision system based on the knowledge graph comprises the following steps:
Step 1, crawling raw traffic data from the Internet through the data acquisition module;
Step 2, performing word segmentation on the crawled data through the jieba library, removing spaces, commas and useless symbols, and labelling the entity categories and entity relationships of the small-sample scenes;
Step 3, initializing the parameters of the entity recognition model and the relation extraction model, and inputting the entity small-sample labelled dataset to train the entity-recognition deep learning model; after the entity recognition model passes the model test, inputting the recognized entities into the relation extraction model, training it according to the relation small-sample labelled dataset, and outputting the relationships between entities after the test-set test;
Step 4, storing the identified entity categories and the relationships among them in the Neo4j graph database in the form of triples;
step 5, training and updating a decision-making reinforcement learning model according to the collected empirical data and the complexity of the driving scene;
Step 6, judging whether the decision reinforcement learning model has learned reliable decisions: if reliable decision capability is reached, outputting the high-confidence action and feeding low-confidence actions into the knowledge graph for storage to increase the amount of knowledge; otherwise, repeating Step 5.
Compared with the prior art, the invention has the advantages that:
1. According to the automatic driving decision system based on the knowledge graph and its operation method, driving knowledge on the Internet is mapped into triple form and the driving knowledge graph is used for knowledge representation and reasoning, so that massive knowledge can be classified and managed, the time spent on traditional rule-case matching is reduced, and the real-time performance of knowledge retrieval is improved. The introduction of knowledge units fuses symbol-based and connection-based knowledge and can alleviate the combinatorial-explosion problem.
2. According to the automatic driving decision system based on the knowledge graph and its operation method, the knowledge graph transfers knowledge to the decision reinforcement learning model in a teaching-learning manner; the transfer process has a definite theoretical expression, and because the knowledge exists as triples, the decision system has the same rapid learning capability as humans, and its computational real-time performance does not degrade as the state space grows. The source knowledge of the knowledge graph comes from the abundant rules and driving scenes on the Internet, which makes reinforcement learning more practical, renders the black-box problem of reinforcement learning transparent, clearly displays the driving tendency in different driving scenes, and enhances riding confidence.
Drawings
FIG. 1 is a schematic workflow diagram of a knowledge-graph based automated driving decision system according to the present invention;
FIG. 2 is a flow chart of the construction of a knowledge graph library for a knowledge graph-based automated driving decision system of the present invention;
FIG. 3 is a decision-making reinforcement learning module workflow diagram of the knowledge-graph based automated driving decision system of the present invention;
FIG. 4 is a flow chart of the operation of the data acquisition module of the knowledge-graph based automated driving decision system of the present invention.
Detailed Description
In order to facilitate an understanding of the invention, the invention is described in more detail below with reference to the accompanying drawings and specific examples.
As shown in FIG. 1, the knowledge-graph-based automatic driving decision system of the present invention comprises a knowledge graph library and a decision reinforcement learning module. The knowledge graph library acquires a driving scene, provides expert experience to the decision reinforcement learning module through the driving-scene experience samples it stores, further outputs a high-confidence driving strategy to the decision module, and guides the decision module to adapt to a complex and changeable traffic environment.
The knowledge graph library comprises a data acquisition module, an entity identification module, a relation extraction module and a graph storage module.
The workflow of the data acquisition module is shown in FIG. 4; the module comprises a data crawling sub-module and a data cleaning sub-module. The data crawling sub-module crawls massive raw traffic data from the Internet; the knowledge in the raw traffic data mainly comes from Baidu websites, including road traffic regulations, a traffic-examination question bank and driving experience, providing data sources for subsequently enriching driving resources. The raw traffic data also include pictures, videos and the like of driving scenes acquired in real time by vehicles. The data crawling sub-module acquires data using the Scrapy distributed crawler framework: Scrapy first takes a link out of the scheduler queue for subsequent crawling; the Scrapy engine then encapsulates the link and the crawling parameters into a request and passes it to the downloader; the downloader downloads the resource at the obtained link and encapsulates it into a response packet for the crawler to parse. If an entity is parsed out, it is handed to the item pipeline for further processing; if a link is parsed out, the link is handed back to the scheduler to wait to be crawled.
After the massive data are acquired, the data cleaning sub-module removes duplicate and empty data from the raw traffic data through data cleaning to obtain effective data. Specifically, the data cleaning sub-module divides long sentences into continuous word sequences using the Chinese dictionary in the jieba library; for example, "the vehicle should stop when meeting a red light on a straight road" can be divided into "vehicle / straight road / meet / red light / should / stop". Stop words such as symbols, modal particles and names are then removed after segmentation using a stop-word dictionary to obtain the effective data.
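The cleaning step above can be sketched as follows; the stop-word set is illustrative, and the segmentation result is taken as given (jieba would produce it in practice):

```python
STOP_WORDS = {"，", "。", " ", "的", "了"}  # illustrative stop dictionary

def clean_records(raw_records):
    """Remove duplicate and empty entries from the crawled raw traffic data."""
    seen, valid = set(), []
    for rec in raw_records:
        rec = rec.strip()
        if rec and rec not in seen:
            seen.add(rec)
            valid.append(rec)
    return valid

def drop_stop_words(tokens):
    """Filter a segmented word sequence with the stop dictionary."""
    return [t for t in tokens if t not in STOP_WORDS]

# Duplicates and the empty string are removed; the segmented sentence keeps
# only tokens relevant to recognition.
raw = ["车辆直行遇红灯应当停车", "", "车辆直行遇红灯应当停车", "前方学校减速慢行"]
valid = clean_records(raw)
tokens = drop_stop_words(["车辆", "直行", "遇", "红灯", "，", "应当", "停车"])
```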
The entity identification module stably identifies entity names in the effective data through an entity-identification deep learning model. Specifically, the entity identification module further comprises an entity-identification deep-learning-model building sub-module, which comprises an entity small-sample labelled dataset, an entity data preprocessing unit and an entity deep learning unit.
The entity small-sample labelled dataset is a manually labelled entity dataset; the entity data preprocessing unit segments the effective data into continuous word sequences according to the entity dataset and removes words irrelevant to recognition with a stop-word dictionary to obtain preprocessed entity data; the entity deep learning unit adopts a bidirectional long short-term memory network and a conditional random field model to effectively identify the driving-scene-element entities in the preprocessed entity data.
The entity small-sample labelled dataset comprises 500 samples; the categories to be labelled mainly include drivers, motor vehicles, pedestrians, intersections, traffic lights, traffic signs, non-motor vehicles, weather, road surfaces, motor-vehicle behaviors, non-motor-vehicle behaviors and the like. The 500 labelled samples are divided into a training set and a test set, and the entity deep learning unit uses the BiLSTM-CRF model to train and test entities.
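One labelled sample might look as follows; the patent does not specify its tagging scheme, so BIO tagging (the common choice for BiLSTM-CRF training data) and the tag names are assumptions:

```python
# One hypothetical sample from the small-sample dataset, in BIO format.
sample = {
    "tokens": ["vehicle", "straight road", "meet", "red light", "should", "stop"],
    "tags":   ["B-VEH",   "B-ROAD",        "O",    "B-LIGHT",   "O",      "B-ACT"],
}

def entities(tokens, tags):
    """Recover (text, category) entities from a BIO-tagged token sequence."""
    out, cur, cat = [], [], None
    for tok, tag in zip(tokens, tags):
        if tag.startswith("B-"):          # a new entity begins
            if cur:
                out.append((" ".join(cur), cat))
            cur, cat = [tok], tag[2:]
        elif tag.startswith("I-") and cur:  # continuation of the entity
            cur.append(tok)
        else:                              # outside any entity
            if cur:
                out.append((" ".join(cur), cat))
            cur, cat = [], None
    if cur:
        out.append((" ".join(cur), cat))
    return out
```

Decoding `sample` this way yields the four labelled entities with their categories, which is the form the correction workflow below compares against the standard corpus.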
The entity deep learning unit takes the manually labelled entity small-sample dataset as the standard corpus, and the entities obtained from the BiLSTM-CRF model test as the corpus to be corrected. While the BiLSTM-CRF model is not yet optimized, the output corpus contains errors and must be corrected against the standard corpus. When both the entity and the entity category in the corpus to be corrected equal those in the standard corpus, the entity identified by the BiLSTM-CRF test is completely correct and needs no correction. When the entity is equal but its category is not, the entity identified by the test is inconsistent with the standard corpus, and manual assistance is needed to correct it. When the entity itself differs from the standard corpus, the BiLSTM-CRF test has made an error or encountered an out-of-vocabulary word, and manual review and proofreading are likewise needed.
The corrected corpus is added to the small-sample labelled dataset, the model is trained and tested again to obtain a new corpus to be corrected, which is corrected by the same method and added to the dataset; this process is repeated over many iterations, continuously optimizing the BiLSTM-CRF model so that the test results for entities and entity categories become more and more accurate, finally producing a complete, high-quality entity-labelled corpus. When the model's recognition precision, recall and accuracy are maintained at a high level, the BiLSTM-CRF model can both correctly identify the labelled entities and correctly classify them; the BiLSTM-CRF model obtained at this point is the finally established entity-identification deep learning model.
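The three correction cases above can be sketched as a simple triage function; the case names and the (entity, category) pair representation are hypothetical:

```python
def triage(predicted, gold):
    """Compare an (entity, category) pair from the model's output corpus
    against the hand-labelled standard corpus, per the three cases above."""
    p_ent, p_cat = predicted
    g_ent, g_cat = gold
    if p_ent == g_ent and p_cat == g_cat:
        return "correct"        # case 1: no correction needed
    if p_ent == g_ent:
        return "fix_category"   # case 2: entity right, class wrong: manual fix
    return "manual_review"      # case 3: wrong entity or out-of-vocabulary word

# Example: the model tagged "红灯" as a traffic sign instead of a traffic light,
# so this pair would be routed to manual category correction.
verdict = triage(("红灯", "traffic_sign"), ("红灯", "traffic_light"))
```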
The relation extraction module labels and selects features of the identified entity names through a relation extraction model to obtain the entity relationships between the entities corresponding to those names.
The relation extraction module comprises a relation-extraction-model building sub-module, which comprises a relation small-sample labelled dataset, a data-relation preprocessing unit and a relation deep learning unit. The relation small-sample labelled dataset is a manually labelled entity-relationship dataset. The data-relation preprocessing unit forms entity pairs from the identified entity names according to the entity-relationship dataset, stores the positions of the entity pairs in the effective data, the entity categories, and the parts of speech and modifiers around the entity pairs, and tags the entity relationships by manual labelling to obtain the labelled entity-relationship dataset. The relation extraction model associates each identified entity name with the corresponding entity relationship according to the labelled entity-relationship dataset to obtain relationship features, and expresses each entity pair and its relationship as a feature vector, which serves as the input of the subsequent BiLSTM-Attention model. The relationship categories are: warning reminding, behavior limitation, intervention triggering, complementary intervention, and interaction. The relationship features comprise entity features, entity-category features, context features, part-of-speech features, position features and modifier features.
The BiLSTM-Attention model is trained with the labelled entity-relationship dataset, and during testing it judges whether a relationship exists between a pair of entities identified from the text. Positive examples are extracted directly from the relationship categories included in the labelled corpus, and negative examples are generated from the relationships between unlabelled entities; the identified relationships are labelled with category tags. When the relationships between entities can be stably identified, the knowledge graph is stored; otherwise, debugging and optimization of the model continue.
The use and workflow of the knowledge graph library are shown in FIG. 2: the acquired driving scene is segmented and denoised to obtain effective data. The stably identified entities are then processed through the relation small-sample labelled dataset and the tested and trained BiLSTM-Attention model (the relation extraction model) to obtain stably identified entity relationships. The graph storage module stores the knowledge graph of each stably identified entity name and entity relationship in the Neo4j graph database, and uses the property graph model for real-time storage and query of the knowledge graph. Specifically, the identified knowledge graph is stored in Neo4j, and the extracted entity-relationship data are stored and integrated. Data are imported in batches using LOAD CSV statements: the data are converted to CSV format, placed in the import directory of the Neo4j installation directory, and read through LOAD CSV. Neo4j stores the data through relationship edges; the relationship edges between nodes are established before the knowledge graph is visualized, and at query time the data meeting the query conditions are returned by the fastest path. The main query statements are: the MATCH statement, which matches data in the database to obtain data meeting the query conditions; the WHERE statement, used together with MATCH to set the query conditions; and the RETURN statement, which specifies what the query returns and retrieves nodes, relationships and attributes.
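A sketch of the batch-import path described above; the text does not give the Cypher schema, so the `Entity`/`REL` labels and the CSV column names are assumptions, and only the statement composition is shown (executing them would require a running Neo4j instance):

```python
import csv
import io

# (head entity, relationship, tail entity) triples from the extraction stage.
TRIPLES = [
    ("车辆", "行为限制", "红灯"),
    ("行人", "警告提醒", "机动车"),
]

def to_csv(triples):
    """Serialise triples to the CSV form read by Neo4j's LOAD CSV."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["head", "rel", "tail"])
    writer.writerows(triples)
    return buf.getvalue()

# LOAD CSV reads from Neo4j's import directory; MERGE avoids duplicate nodes,
# and the relationship becomes a connecting edge between them.
LOAD_STMT = (
    "LOAD CSV WITH HEADERS FROM 'file:///triples.csv' AS row "
    "MERGE (h:Entity {name: row.head}) "
    "MERGE (t:Entity {name: row.tail}) "
    "MERGE (h)-[:REL {type: row.rel}]->(t)"
)

# Query pattern from the text: MATCH to match data, WHERE to set the
# condition, RETURN to specify what comes back.
QUERY = ("MATCH (h:Entity)-[r:REL]->(t:Entity) "
         "WHERE r.type = '行为限制' RETURN h.name, t.name")
```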
In the decision reinforcement learning module, the complexity of the driving scene is introduced into the reinforcement learning model. The model executes an action at time t and obtains a reward R from the environment; the algorithm completes one state-action value update and stores it in the Q-value table, continuing until a termination state is reached. The agent is then reset to the initial state, and the Q-value table is learned and updated many times until it finally converges. The values stored in the Q-value table are the experience the agent has obtained from the environment; they are continuously modified during training, the final Q values tend to converge, and the agent selects the action with the highest value in the final Q-value table to obtain an optimal decision.
Specifically, the decision reinforcement learning module comprises a driving-scene-complexity calculation sub-module and an optimal-driving-behavior-strategy calculation sub-module; the driving-scene-complexity calculation sub-module calculates the complexity of the driving scene based on the knowledge graph library, and the optimal-driving-behavior-strategy calculation sub-module introduces this complexity into the decision reinforcement learning model to screen out an optimal driving behavior strategy.
The driving scene complexity calculation sub-module performs a comprehensive safety evaluation on the categories of the entity relationships in the driving scene to obtain a scene complexity comprehensive value E; the scene complexity comprehensive value E participates in the construction of the automatic driving decision system as the 'value' element of a knowledge graph triple. The comprehensive safety evaluation means that categories of entity relationships with a larger influence on the safety of the host vehicle are classified into higher safety levels, and each category is assigned a safety evaluation score X_i according to its safety level i. The scene complexity comprehensive value is E = α_1·X_1 + α_2·X_2 + … + α_i·X_i, where α_i is a scene complexity variable: the environment elements in the driving scene are obtained through the perception layer and their mapped safety evaluation values are queried in the knowledge graph library; α_i is 1 if the corresponding safety level is present, otherwise α_i is 0. The relationship categories include the following five types: intervention triggering, behavior limitation, warning reminding, influence existence, and mutual noninterference, corresponding to safety levels 1-5 and safety evaluation values of 2, 4, 6, 8, and 10, respectively.
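A minimal sketch of computing the comprehensive value E: the five relationship categories, their safety levels, and the scores 2-10 come from the description above, while the membership test standing in for the perception layer's query of the knowledge graph library is a simplification.

```python
# Scene-complexity value E = α1·X1 + α2·X2 + ... + αi·Xi.
# Category names, levels and scores X_i follow the description; the
# perception-layer query is reduced to a membership test for illustration.
SAFETY_SCORE = {                      # category -> score X_i (levels 1..5)
    "intervention triggering": 2,
    "behavior limitation": 4,
    "warning reminding": 6,
    "influence existence": 8,
    "mutual noninterference": 10,
}

def scene_complexity(present_categories) -> float:
    # α_i = 1 when a relationship of safety level i is present in the
    # perceived driving scene, otherwise α_i = 0.
    return float(sum(SAFETY_SCORE[c] for c in set(present_categories)))

print(scene_complexity(["intervention triggering", "warning reminding"]))  # 8.0
```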
The decision reinforcement learning model adopts the comprehensive and practical Q learning algorithm to obtain its optimal value function; the specific process is shown in FIG. 3. Q learning is a tabular value-learning algorithm: a state-action Q-value table is built during the interaction between the agent and the environment, and the rewards obtained influence the Q values. Q is defined as the sum of the returns that will be obtained after the associated action is performed according to a policy. The optimal Q value, denoted Q*, is defined as the sum of the returns that will be obtained after the action is performed according to the optimal policy. By continuously exploring the state space, the Q value gradually approaches Q*. Correct behaviors generate positive return values that gradually increase the Q value, while the Q values corresponding to wrong behaviors are reduced by negative return values; finally, the optimal action is screened out by the action selection strategy, so that the agent obtains the optimal action policy.
Starting from the initial state, the Q learning algorithm performs the following process: at time t it performs action a_t, obtains the reward R from the environment, completes one state-action value update, and stores it in the Q-value table, until a terminal state is reached. The agent is then reset to the initial state, and the Q-value table is learned and updated repeatedly until it converges. The values of the value function stored in the Q-value table are the experience the agent has obtained from the environment; the values in the table are continuously modified during training, the final Q values tend to converge, and the agent selects the action with the highest value in the final Q-value table to obtain the optimal decision π*.
Specifically, assume the current environment state is s_t; the action taken by the agent, the environment state after the action, and the return signal obtained by the action are a_t, s_{t+1}, and R(s_t, a_t), respectively. The value function designed for the Q learning algorithm is:

V^π(s_t) = E_π[Σ_{k=0}^∞ γ^k · R(s_{t+k}, a_{t+k})]

The relationship between the value function V^π(s_t) and the Q function Q_t(s_t, a_t) is:

V^π(s_t) = E_{a~π(a|s_t)}[Q_t(s_t, a)]

where action a_t is performed at time t and the reward R is obtained from the environment; the current environment state is s_t, the environment state after the action is s_{t+1}, and the return signal obtained by this action is R(s_t, a_t); the policy π(a|s) is a function that determines the agent's next action a from the environment state s; and V^π(s_t) is the expectation of Q_t(s_t, a_t) over action a.
the Q learning algorithm obtains an environment model not by predicting the state, and obtains an optimal strategy pi by iterating a Q value function and accumulating and strengthening signals according to discount of an optimized action sequence during execution; let all possible action sets for an agent be:
Q t+1 (s t ,a t )=R(s t ,a t )+γmaxQ(s t+1 ,a t )+εE
the initial value of the Q function can be selected at will, after each action is completed and a return is obtained, the Q function is updated, wherein alpha is a learning factor:
Q t (s t ,a t )=(1-α)Q t-1 (s t ,a t )+α[R(s t ,a t )+γmaxQ t-1 (s t+1 ,a t+1 )+εE]。
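The update above can be sketched as tabular Q learning with the εE term added to the update target. The 5-state corridor environment, its reward, and the random exploratory policy are invented for illustration and are not part of the patent.

```python
# Tabular Q-learning sketch with the scene-complexity term εE in the
# update target, as in the formula above. The 5-state corridor (move left
# or right, reward 1 at the rightmost state) is a toy environment assumed
# for illustration.
import random

def q_learning(n_states=5, n_actions=2, episodes=500,
               alpha=0.5, gamma=0.9, eps_E=0.0, seed=0):
    random.seed(seed)
    Q = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(episodes):
        s = 0                                    # reset to initial state
        while s != n_states - 1:                 # until terminal state
            a = random.randrange(n_actions)      # exploratory policy
            s_next = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
            R = 1.0 if s_next == n_states - 1 else 0.0
            # Q_t = (1-α)·Q_{t-1} + α·[R + γ·max_a Q_{t-1}(s', a) + εE]
            Q[s][a] = (1 - alpha) * Q[s][a] + alpha * (
                R + gamma * max(Q[s_next]) + eps_E)
            s = s_next
    return Q

Q = q_learning()
greedy = [row.index(max(row)) for row in Q[:-1]]
print(greedy)  # converged greedy policy moves right toward the reward
```

Here `eps_E` would be set from the scene-complexity value E queried from the knowledge graph library; it is left at 0 so the toy run stays easy to verify.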
the operation method of the automatic driving decision-making system based on the knowledge graph comprises the following steps:
Step 2, perform word segmentation on the crawled data with the jieba library, remove spaces, commas, and useless symbols, and label the entity categories and entity relationships of the small-sample scenes;
Step 3, initialize the parameters of the entity recognition model and the relation extraction model; input the entity small-sample labeled data set and train the entity recognition deep learning model; after the entity recognition model passes the model test, input the recognized entities into the relation extraction model, train the relation extraction model according to the relation small-sample labeled data set, and output the relationships between entities after testing on the test set;
Step 4, store the identified entity categories and the relationships among the categories in the Neo4j graph database in the form of triples;
Step 5, train and update the decision reinforcement learning model according to the collected experience data and the complexity of the driving scene;
Step 6, judge whether the decision reinforcement learning model has learned reliable decisions: if it has reached reliable decision capability, output the high-confidence action and input the low-confidence action into the knowledge graph for storage to increase the knowledge quantity; if it has not, repeat Step 5.
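The cleaning stage of Step 2 can be sketched as follows. For Chinese crawled text the patent uses the jieba library for word segmentation; a standard-library regex tokenizer stands in here so the sketch is self-contained, and the English sample sentence is invented for illustration.

```python
# Sketch of Step 2's segmentation and denoising. jieba.lcut would perform
# the word segmentation for Chinese crawled text; a stdlib regex tokenizer
# stands in here, and \w+ drops the spaces, commas and other useless
# symbols mentioned in the step.
import re

def clean_and_tokenize(text: str) -> list:
    return re.findall(r"\w+", text)

print(clean_and_tokenize("red light, stop; green light, go"))
# → ['red', 'light', 'stop', 'green', 'light', 'go']
```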
It should be noted that the above-described embodiments may enable those skilled in the art to more fully understand the present invention, but do not limit the present invention in any way. Therefore, although the present invention has been described in detail with reference to the drawings and examples, it will be understood by those skilled in the art that various changes and modifications can be made therein without departing from the spirit and scope of the invention.
Claims (10)
1. A knowledge-graph-based automatic driving decision system, characterized by comprising a knowledge graph library and a decision reinforcement learning module; the knowledge graph library acquires driving scenes and, through the driving-scene experience samples stored in it, provides expert experience to the decision reinforcement learning module, which in turn outputs a high-confidence driving strategy to the decision module;
the knowledge graph library comprises a data acquisition module, an entity recognition module, a relation extraction module, and a graph storage module;
the data acquisition module comprises a data crawling sub-module and a data cleaning sub-module; the data crawling sub-module crawls a large volume of raw traffic data from the Internet, and the data cleaning sub-module removes duplicate data and null data from the raw traffic data through data cleaning to obtain effective data;
the entity recognition module stably recognizes entities in the effective data through an entity recognition deep learning model to obtain each entity name;
the relation extraction module labels the recognized entity names and selects their features through a relation extraction model to obtain the entity relationships between the entities corresponding to the entity names;
the graph storage module stores the knowledge graph comprising each entity name and entity relationship in a Neo4j graph database, and uses the property-graph model for real-time storage and query of the knowledge graph;
the decision reinforcement learning module comprises a driving scene complexity calculation sub-module and an optimal driving behavior strategy calculation sub-module; the driving scene complexity calculation sub-module calculates the complexity of the driving scene based on the knowledge graph library; and the optimal driving behavior strategy calculation sub-module introduces the complexity of the driving scene into the decision reinforcement learning model to screen out the optimal driving behavior strategy.
2. The knowledge-graph-based automatic driving decision system according to claim 1, wherein the driving scene complexity calculation sub-module performs a comprehensive safety evaluation on the categories of the entity relationships in the driving scene to obtain a scene complexity comprehensive value E; the scene complexity comprehensive value E participates in the construction of the automatic driving decision system as the 'value' element of a knowledge graph triple; the comprehensive safety evaluation means that categories of entity relationships with a larger influence on the safety of the host vehicle are classified into higher safety levels, and each category is assigned a safety evaluation score X_i according to its safety level i; the scene complexity comprehensive value is E = α_1·X_1 + α_2·X_2 + … + α_i·X_i, where α_i is a scene complexity variable: the environment elements in the driving scene are obtained through the perception layer and their mapped safety evaluation values are queried in the knowledge graph library; α_i is 1 if the corresponding safety level is present, otherwise α_i is 0.
3. The knowledge-graph-based automatic driving decision system according to claim 1, wherein the entity recognition module comprises an entity recognition deep learning model building sub-module; the entity recognition deep learning model building sub-module comprises an entity small-sample labeled data set, an entity data preprocessing unit, and an entity deep learning unit; the entity small-sample labeled data set is a manually labeled entity data set; the entity data preprocessing unit recombines the continuous character sequences in the effective data into word sequences according to the entity data set, and removes words irrelevant to recognition using a stop-word dictionary to obtain preprocessed entity data; and the entity deep learning unit adopts a bidirectional long short-term memory network and a conditional random field model to effectively recognize the driving-scene element entities in the preprocessed entity data.
4. The knowledge-graph-based automatic driving decision system according to claim 3, wherein the entity recognition deep learning model comprises a tested and trained BiLSTM-CRF model.
5. The knowledge-graph-based automatic driving decision system according to claim 1, wherein the relation extraction module comprises a relation extraction model building sub-module comprising a relation small-sample labeled data set, a data relation preprocessing unit, and a relation deep learning unit; the relation small-sample labeled data set is a manually labeled entity relationship data set; the data relation preprocessing unit forms entity pairs from the recognized entity names according to the entity relationship data set, and stores the positions of the entity pairs in the effective data, the entity categories, the parts of speech, and the modifiers around the entity pairs to obtain a labeled entity relationship data set; the relation extraction model associates each recognized entity name with its corresponding entity relationship according to the labeled entity relationship data set to obtain relationship features; and the relationship features comprise entity features, entity category features, context features, part-of-speech features, position features, and modifier features.
6. The knowledge-graph-based automatic driving decision system according to claim 5, wherein the relation extraction model comprises a tested and trained BiLSTM-Attention model.
7. The knowledge-graph-based automatic driving decision system according to claim 1, wherein the decision reinforcement learning model adopts a Q learning algorithm to obtain its optimal value function, and the value function V^π(s_t) designed for the Q learning algorithm is:

V^π(s_t) = E_π[Σ_{k=0}^∞ γ^k · R(s_{t+k}, a_{t+k})]

the relationship between the value function V^π(s_t) and the Q function Q_t(s_t, a_t) is:

V^π(s_t) = E_{a~π(a|s_t)}[Q_t(s_t, a)]

where action a_t is performed at time t and the reward R is obtained from the environment; the current environment state is s_t, the environment state after the action is s_{t+1}, and the return signal obtained by this action is R(s_t, a_t); the policy π(a|s) is a function that determines the agent's next action a from the environment state s; and V^π(s_t) is the expectation of Q_t(s_t, a_t) over action a;
the Q learning algorithm obtains the optimal policy π* by iterating the Q-value function and accumulating, during execution, the reinforcement signals discounted along the optimized action sequence; that is:

Q_{t+1}(s_t, a_t) = R(s_t, a_t) + γ·max_a Q(s_{t+1}, a) + εE

the initial value of the Q function can be chosen arbitrarily; after each action is completed and its return is obtained, the Q function is updated, where α is the learning factor:

Q_t(s_t, a_t) = (1 − α)·Q_{t−1}(s_t, a_t) + α·[R(s_t, a_t) + γ·max_a Q_{t−1}(s_{t+1}, a) + εE].
8. The knowledge-graph-based automatic driving decision system according to claim 2, wherein the categories of entity relationships include intervention triggering, behavior limitation, warning reminding, influence existence, and mutual noninterference, corresponding to safety levels 1-5 and safety evaluation values of 2, 4, 6, 8, and 10, respectively.
9. The knowledge-graph-based automatic driving decision system according to claim 1, wherein the data crawling sub-module obtains data using the Scrapy distributed crawler framework.
10. A method of operating a knowledge-graph-based automatic driving decision system, comprising the steps of:
Step 1, crawl data about traffic rules and driving accidents on the Internet using the Scrapy distributed crawler framework;
Step 2, perform word segmentation on the crawled data with the jieba library, remove spaces, commas, and useless symbols, and label the entity categories and entity relationships of the small-sample scenes;
Step 3, initialize the parameters of the entity recognition model and the relation extraction model; input the entity small-sample labeled data set and train the entity recognition deep learning model; after the entity recognition model passes the model test, input the recognized entities into the relation extraction model, train the relation extraction model according to the relation small-sample labeled data set, and output the relationships between entities after testing on the test set;
Step 4, store the identified entity categories and the relationships among the categories in the Neo4j graph database in the form of triples;
Step 5, train and update the decision reinforcement learning model according to the collected experience data and the complexity of the driving scene;
Step 6, judge whether the decision reinforcement learning model has learned reliable decisions: if it has reached reliable decision capability, output the high-confidence action and input the low-confidence action into the knowledge graph for storage to increase the knowledge quantity; if it has not, repeat Step 5.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210201601.3A CN114818707A (en) | 2022-03-02 | 2022-03-02 | Automatic driving decision method and system based on knowledge graph |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114818707A true CN114818707A (en) | 2022-07-29 |
Family
ID=82527997
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210201601.3A Pending CN114818707A (en) | 2022-03-02 | 2022-03-02 | Automatic driving decision method and system based on knowledge graph |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114818707A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114969018A (en) * | 2022-08-01 | 2022-08-30 | 太极计算机股份有限公司 | Data monitoring method and system |
CN117150090A (en) * | 2023-10-30 | 2023-12-01 | 合肥综合性国家科学中心人工智能研究院(安徽省人工智能实验室) | Attribute graph model building and data management method and device for automatic driving data |
Legal Events

Date | Code | Title | Description
---|---|---|---
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |