CN108791302A - Driving behavior modeling - Google Patents

Driving behavior modeling

Info

Publication number
CN108791302A
CN108791302A (application CN201810662040.0A)
Authority
CN
China
Prior art keywords
driving
reward function
state
feature
neural network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201810662040.0A
Other languages
Chinese (zh)
Other versions
CN108791302B (en)
Inventor
邹启杰
李昊宇
裴腾达
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Dalian University
Original Assignee
Dalian University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Dalian University filed Critical Dalian University
Priority to CN201810662040.0A priority Critical patent/CN108791302B/en
Publication of CN108791302A publication Critical patent/CN108791302A/en
Application granted granted Critical
Publication of CN108791302B publication Critical patent/CN108791302B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • B: PERFORMING OPERATIONS; TRANSPORTING
    • B60: VEHICLES IN GENERAL
    • B60W: CONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE OR DIFFERENT FUNCTION; CONTROL SYSTEMS SPECIALLY ADAPTED FOR HYBRID VEHICLES; ROAD VEHICLE DRIVE CONTROL SYSTEMS FOR PURPOSES NOT RELATED TO THE CONTROL OF A PARTICULAR SUB-UNIT
    • B60W40/00: Estimation or calculation of non-directly measurable driving parameters for road vehicle drive control systems not related to the control of a particular sub unit, e.g. by using mathematical models
    • B60W40/08: Estimation or calculation of non-directly measurable driving parameters for road vehicle drive control systems not related to the control of a particular sub unit, e.g. by using mathematical models, related to drivers or passengers
    • B: PERFORMING OPERATIONS; TRANSPORTING
    • B60: VEHICLES IN GENERAL
    • B60W: CONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE OR DIFFERENT FUNCTION; CONTROL SYSTEMS SPECIALLY ADAPTED FOR HYBRID VEHICLES; ROAD VEHICLE DRIVE CONTROL SYSTEMS FOR PURPOSES NOT RELATED TO THE CONTROL OF A PARTICULAR SUB-UNIT
    • B60W50/00: Details of control systems for road vehicle drive control not related to the control of a particular sub-unit, e.g. process diagnostic or vehicle driver interfaces
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/21: Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214: Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G06N3/048: Activation functions
    • B: PERFORMING OPERATIONS; TRANSPORTING
    • B60: VEHICLES IN GENERAL
    • B60W: CONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE OR DIFFERENT FUNCTION; CONTROL SYSTEMS SPECIALLY ADAPTED FOR HYBRID VEHICLES; ROAD VEHICLE DRIVE CONTROL SYSTEMS FOR PURPOSES NOT RELATED TO THE CONTROL OF A PARTICULAR SUB-UNIT
    • B60W50/00: Details of control systems for road vehicle drive control not related to the control of a particular sub-unit, e.g. process diagnostic or vehicle driver interfaces
    • B60W2050/0001: Details of the control system
    • B60W2050/0019: Control system elements or transfer functions
    • B60W2050/0028: Mathematical models, e.g. for simulation
    • B60W2050/0029: Mathematical model of the driver

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Mechanical Engineering (AREA)
  • Evolutionary Computation (AREA)
  • Transportation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Automation & Control Theory (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Biophysics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Human Computer Interaction (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Evolutionary Biology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Feedback Control In General (AREA)

Abstract

The invention discloses a driver behavior modeling system, which specifically includes: a feature extractor, which extracts the features used to construct the reward function; a reward function generator, which obtains the reward function needed to construct the driving strategy; a driving strategy getter, which completes the construction of the driving strategy; and a judging device, which judges whether the optimal driving strategy constructed by the getter meets the judgment criterion. If not, the reward function is rebuilt and the optimal driving strategy is constructed again, iterating until the criterion is met, so that a driving strategy describing the real driving demonstrations is finally obtained. The application can handle new state scenes and obtain their corresponding actions, which greatly improves the generalization ability of the driver behavior model that is established; the applicable scenes are wider and the robustness is stronger.

Description

Driving behavior modeling
Technical field
The present invention relates to a modeling method, and specifically to driver behavior modeling.
Background technology
Autonomous driving is an important part of the intelligent transportation field. Owing to the limits of current technology, autonomous vehicles still require the intelligent driving system (intelligent driver assistance system) and the human driver to cooperate to complete the driving task. In this process, whether the goal is to better quantify driver information for the decisions of the intelligent system, or to provide personalized services by distinguishing between drivers, driver modeling is an essential step.
Among current driver modeling methods, reinforcement learning handles complicated sequential decision problems such as vehicle driving, with their large continuous spaces and multiple optimization objectives, particularly well, and is therefore also an effective approach to modeling driving behavior. Reinforcement learning, as a way of solving problems formulated as MDPs, requires interaction with the environment: actions are taken to obtain an evaluative feedback signal (the reward) from the environment, and the long-term return is maximized.
A survey of the existing literature shows that, in existing models of driving behavior, the reward function is set mainly in two ways: the traditional approach, in which researchers configure it manually for different scene states, and the approach in which it is set by inverse reinforcement learning. The traditional approach relies heavily on the subjectivity of the researcher; the quality of the reward function depends on the researcher's skill and experience. Moreover, to set the reward function correctly during vehicle driving, a large number of decision variables must be balanced; these variables are often incommensurable or even contradictory, and researchers frequently cannot design a reward function that balances every requirement.
Inverse reinforcement learning, by contrast, assigns suitable weights to the various driving features from the driving demonstration data, so it can automatically learn the required reward function and thereby overcomes the shortcomings of manual design. However, traditional inverse reinforcement learning methods can only learn from the scene states already present in the demonstration data, whereas in actual driving the true driving scenes often go beyond the range of the demonstrations because of differences in weather, scenery, and other factors. The inverse reinforcement learning approach therefore suffers from insufficient generalization when relating the scenes in the demonstration data to the decision actions.
Existing driving behavior modeling methods based on reinforcement learning theory follow two main lines of thought. In the first, the traditional reinforcement learning approach, setting the reward function relies on the researcher to analyze, organize, screen, and summarize the scenes and thereby obtain a series of features relevant to driving decisions, such as the headway to the vehicle ahead, whether the vehicle keeps clear of the curb, whether it keeps clear of pedestrians, a reasonable speed, the lane-change frequency, and so on; then, according to the demands of the driving scene, a series of experiments is designed to determine the weights these features should receive in the reward function under the corresponding scene environment, finally completing the overall design of the reward function, which serves as the model describing the driver's behavior. The second line of thought models on the basis of a probabilistic model, using maximum entropy inverse reinforcement learning to solve for the driving behavior function. It first assumes that there is one underlying, specific probability distribution that generates the demonstrated driving trajectories; the task is then to find a probability distribution that fits the driving demonstrations, and the problem of finding this distribution can be converted into a nonlinear programming problem, namely:
max −Σ p log p
s.t. Σ p = 1

where p denotes the probability distribution over the demonstration trajectories. After the distribution is obtained by solving the problem above, the relevant parameters can be sought, and the reward function r = θᵀf(s_t) can be acquired.
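As a concrete illustration of this linear reward form, the short sketch below evaluates r = θᵀf(s_t) for one state; the feature values and weights are hypothetical numbers chosen only for the example and are not taken from the patent.

```python
import numpy as np

# Hypothetical example: k = 4 scene features for one state s_t,
# e.g. headway, distance to curb, distance to pedestrians, speed deviation.
f_st = np.array([0.8, 0.6, 0.9, 0.2])

# Hypothetical weights theta, as would be learned by inverse reinforcement learning.
theta = np.array([1.5, 0.7, 2.0, -0.5])

# Linear reward of the form r = theta^T f(s_t).
r = theta @ f_st
print(r)
```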
Traditional driver behavior models analyze known driving data to describe and reason about driving behavior; however, the collected driving data cannot fully cover the inexhaustible variety of driving behavior, and the corresponding action cannot possibly be obtained for every state. In a practical driving scene, because of differences in weather, scenery, and objects, the driving state has countless possibilities, and traversing all states is impossible. Traditional driver behavior models therefore generalize weakly, rest on many model assumptions, and are not robust.
Secondly, in actual driving problems, if the reward function is set by the researcher alone, too many demands on the various features must be balanced; everything depends entirely on the researcher's experience and on repeated manual tuning, which is time-consuming and laborious and, more fatally, overly subjective. Under different scenes and environments the researcher must face a great many scene states; moreover, even for one fixed scene state, a change in the requirements also changes the driving behavior. To describe such a driving task accurately, a series of weights must be assigned so that these factors are captured precisely. Among existing methods, inverse reinforcement learning based on a probabilistic model starts mainly from the existing demonstration data, treats it as the available data, seeks the distribution of that data, and only on this basis can the action choice in a given state be obtained. But the distribution of the known data cannot represent the distribution of all data; obtaining the distribution correctly would require the corresponding action for every state.
Invention content
To solve the weak generalization of driver modeling in the prior art, namely the technical problem that no corresponding reward function can be established, and hence driving behavior cannot be modeled, for driving scenes that have no demonstration, the present application provides a driver behavior modeling system. It can handle new state scenes and obtain their corresponding actions, which greatly improves the generalization ability of the driver behavior model that is established; the applicable scenes are wider and the robustness is stronger.
To achieve the above goals, the technical essentials of the present solution are as follows. The driver behavior modeling system specifically includes:
a feature extractor, which extracts the features used to construct the reward function;
a reward function generator, which obtains the reward function needed to construct the driving strategy;
a driving strategy getter, which completes the construction of the driving strategy;
a judging device, which judges whether the optimal driving strategy constructed by the getter meets the judgment criterion; if not, the reward function is rebuilt and the optimal driving strategy is constructed again, iterating until the criterion is met, so that a driving strategy describing the real driving demonstrations is finally obtained.
Further, the specific implementation process by which the feature extractor extracts the features for constructing the reward function is:
S11. While the vehicle is travelling, the driving video captured by a camera placed behind the vehicle's windshield is sampled to obtain N groups of pictures of different vehicle driving environments and road conditions; the corresponding driving operation data, i.e. the steering angle under each road environment, are collected at the same time, and the two are jointly assembled into the training data;
S12. The collected pictures are translated, cropped, and adjusted in brightness to simulate scenes with different illumination and weather;
S13. A convolutional neural network is built, taking the processed pictures as input and the operation data corresponding to each picture as the label value; it is trained by seeking the optimum of a mean squared error loss with an optimization method based on the Nadam optimizer, thereby optimizing the weight parameters of the neural network;
S14. The network structure and weights of the trained convolutional neural network are preserved in order to establish a new convolutional neural network, completing the state feature extractor.
Further, the convolutional neural network established in step S13 comprises 1 input layer, 3 convolutional layers, 3 pooling layers and 4 fully connected layers; the input layer is connected in sequence to the first convolutional layer and the first pooling layer, then to the second convolutional layer and the second pooling layer, then to the third convolutional layer and the third pooling layer, and finally in sequence to the first, second, third and fourth fully connected layers.
Further, the trained convolutional neural network in step S14 does not include the output layer.
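A minimal sketch of such a feature extractor (steps S11 to S14) is given below, assuming TensorFlow/Keras. The filter counts, kernel sizes and input resolution are assumptions made only for illustration, since the text above fixes only the layer counts, the Nadam optimizer, the mean squared error loss, and the removal of the output layer.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_cnn(input_shape=(66, 200, 3)):
    # 3 convolutional + 3 pooling layers followed by 4 fully connected layers,
    # matching the layer counts given in the patent (sizes are assumptions).
    model = models.Sequential([
        layers.Input(shape=input_shape),
        layers.Conv2D(24, 5, activation='relu'), layers.MaxPooling2D(),
        layers.Conv2D(36, 5, activation='relu'), layers.MaxPooling2D(),
        layers.Conv2D(48, 3, activation='relu'), layers.MaxPooling2D(),
        layers.Flatten(),
        layers.Dense(100, activation='relu'),
        layers.Dense(50, activation='relu'),
        layers.Dense(10, activation='relu'),
        layers.Dense(1),                     # label value: steering angle
    ])
    # Nadam optimizer with mean squared error loss, as in S13.
    model.compile(optimizer=tf.keras.optimizers.Nadam(), loss='mse')
    return model

def to_feature_extractor(trained):
    # Keep the structure and weights but drop the final output layer (S14);
    # the remaining top layer's activations serve as the state features f(s_t).
    return tf.keras.Model(inputs=trained.input, outputs=trained.layers[-2].output)
```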
Further, the specific implementation process of the reward function generator in obtaining the driving strategy is:
S21. The expert's driving demonstration data are obtained: the demonstration data are extracted by sampling the demonstration driving video; a continuous segment of driving video is sampled at a fixed frequency to obtain one demonstration trajectory. One set of expert demonstration data contains several trajectories and is denoted as a whole by D_E = {(s_1,a_1), (s_2,a_2), ..., (s_M,a_M)}, where D_E denotes all the driving demonstration data, (s_j,a_j) denotes the data pair formed by state j and the decision instruction corresponding to that state, M is the total number of driving demonstration data, N_T is the number of demonstration trajectories, and L_i is the number of state–decision-instruction pairs (s_j,a_j) contained in the i-th demonstration trajectory;
S22. The feature expectation of the driving demonstrations is sought;
First, each state s_t describing the driving environment situation in the demonstration data D_E is input into the state feature extractor to obtain the feature values f(s_t,a_t) for that state; f(s_t,a_t) denotes the group of driving-environment scene feature values, corresponding to s_t, that influence the driving decision result. The feature expectation of the driving demonstrations is then calculated as the γ-discounted accumulation of these feature values, where γ is the discount factor and is configured according to the problem at hand;
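The feature-expectation formula itself is not reproduced above; the sketch below assumes the standard discounted accumulation over the demonstration trajectories, averaged over the N_T trajectories, which is consistent with the discount factor described here.

```python
import numpy as np

def feature_expectation(trajectories, feature_fn, gamma=0.65):
    """Discounted feature expectation of the demonstrations (S22).

    Assumes the standard form mu_E = (1/N_T) * sum_i sum_t gamma^t * f(s_t, a_t);
    `trajectories` is a list of [(s_t, a_t), ...] lists and `feature_fn`
    wraps the state feature extractor.
    """
    mu = None
    for traj in trajectories:
        acc = None
        for t, (s, a) in enumerate(traj):
            f = np.asarray(feature_fn(s, a), dtype=float)
            acc = f * gamma**t if acc is None else acc + f * gamma**t
        mu = acc if mu is None else mu + acc
    return mu / len(trajectories)
```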
S23. The state–action set under the greedy strategy is sought;
S24. The weights of the reward function are sought.
Further, the concrete steps for seeking the state–action set under the greedy strategy are as follows. The reward function generator and the driving strategy getter are the two parts of one loop. First, the neural network in the driving strategy getter is obtained: the state features f(s_t,a_t) describing the environment situation, extracted from the driving demonstration data D_E, are input into the neural network to obtain the output g_w(s_t); g_w(s_t) is the set of Q values describing state s_t, i.e. [Q(s_t,a_1), ..., Q(s_t,a_n)]ᵀ, where Q(s_t,a_i) is the state–action value describing how good it is, in the current driving scene state s_t, to choose the decision driving action a_i; it is obtained from the formula Q(s,a) = θ μ(s,a), in which θ denotes the weights of the current reward function and μ(s,a) denotes the feature expectation.
Then, based on the ε-greedy strategy, the driving decision action corresponding to the driving scene state s_t is chosen: with a probability determined by ε, the decision action that maximizes the Q value in the Q-value set for the current driving scene s_t is chosen; otherwise an action is chosen at random. After the action a_t has been chosen, the corresponding Q(s_t,a_t) is recorded.
Thus, for the state feature f(s_t,a_t) of every state in the driving demonstrations D_E, the neural network is queried, and M state–action pairs (s_t,a_t) are obtained in total, each describing the driving decision action a_t chosen in the driving scene state s_t at time t; at the same time, from these action choices, the Q values of the M corresponding state–action pairs are obtained and denoted Q.
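A small sketch of this ε-greedy collection step follows; `q_network` stands for any callable wrapping the policy network g_w, and the default ε of 0.5 follows the value used later in the embodiment.

```python
import numpy as np

def epsilon_greedy_pairs(state_features, q_network, epsilon=0.5, rng=None):
    """For each demonstrated state feature f(s_t), query the policy network for
    its Q-value vector g_w(s_t), pick an action epsilon-greedily (with probability
    epsilon the argmax, otherwise a random action, following the convention in the
    embodiment), and record the chosen pair and its Q value (S23)."""
    rng = rng or np.random.default_rng()
    pairs, q_values = [], []
    for f_st in state_features:
        q = np.asarray(q_network(f_st))
        a = int(np.argmax(q)) if rng.random() < epsilon else int(rng.integers(len(q)))
        pairs.append((f_st, a))
        q_values.append(q[a])
    return pairs, np.array(q_values)
```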
Further, the concrete steps for seeking the weights of the reward function are:
An objective function J(θ) is first built from the following quantities: a loss term that is 0 if the current state–action pair appears among the driving demonstrations and 1 otherwise; the corresponding state–action values recorded above; the product of the driving demonstration feature expectation sought in S22 with the reward function weights θ; and a regularization term.
The objective function is then minimized by gradient descent, i.e. t = min_θ J(θ), and the variable θ that minimizes it is obtained; this θ is the sought weight vector of the required reward function.
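The exact objective is not reproduced above, so the sketch below is only an illustrative reconstruction from the listed ingredients (a margin-style loss term, the recorded state–action values treated as θᵀμ(s_t,a_t), the demonstration feature expectation, and an L2 regularizer); the functional form and hyperparameters are assumptions, not the claimed formula.

```python
import numpy as np

def fit_reward_weights(mu_pairs, in_demo, mu_E, reg=0.9, lr=0.01, steps=500):
    """Gradient descent on an assumed margin-style objective
        J(theta) = mean_t[ theta^T mu(s_t,a_t) + l(s_t,a_t) ] - theta^T mu_E
                   + (reg/2) * ||theta||^2,
    where l(s_t,a_t) = 0 if the pair occurs in the demonstrations and 1 otherwise.
    Since l does not depend on theta, only the other terms drive the gradient.

    mu_pairs : (M, k) array of feature expectations mu(s_t, a_t)
    in_demo  : (M,) boolean array, True if the pair appears in D_E
    mu_E     : (k,) demonstration feature expectation from S22
    """
    mu_pairs = np.asarray(mu_pairs, dtype=float)
    mu_E = np.asarray(mu_E, dtype=float)
    theta = np.zeros(mu_pairs.shape[1])
    for _ in range(steps):
        grad = mu_pairs.mean(axis=0) - mu_E + reg * theta   # dJ/dtheta
        theta -= lr * grad
    loss = (mu_pairs @ theta + (~np.asarray(in_demo)).astype(float)).mean() \
           - theta @ mu_E + 0.5 * reg * theta @ theta
    return theta, loss
```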
Further, the implementation process of the reward function generator in obtaining the driving strategy also includes: S25. Based on the obtained reward function weights θ, the reward function generator is built according to the formula r(s,a) = θᵀf(s,a).
Further, the specific implementation process by which the driving strategy getter completes the construction of the driving strategy is:
S31. The training data of the driving strategy getter are built.
The training data are obtained; each datum comprises two parts: one is the driving decision feature f(s_t) obtained by inputting the driving scene state at time t into the driving state extractor, and the other is a target value computed from the following quantities: r_θ(s_t,a_t), the reward function generated by the reward function generator based on the driving demonstration data, and Q^π(s_t,a_t) and Q^π(s_{t+1},a_{t+1}), which come from the Q values recorded in S23, namely the Q value of the driving scene s_t at time t and the Q value of the driving scene s_{t+1} at time t+1;
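The target formula is not reproduced above; the sketch below assumes a SARSA-style target y_t = r_θ(s_t,a_t) + γ·Q^π(s_{t+1},a_{t+1}), which is one plausible reading of the quantities listed (it does not use Q^π(s_t,a_t) directly):

```python
def policy_targets(rewards, q_next, gamma=0.9):
    """Training targets for the driving strategy getter (S31), assuming a
    SARSA-style form y_t = r_theta(s_t, a_t) + gamma * Q^pi(s_{t+1}, a_{t+1}).
    `rewards` holds r_theta(s_t, a_t) and `q_next` holds Q^pi(s_{t+1}, a_{t+1}),
    both indexed by t; gamma is the discount factor."""
    return [r + gamma * q for r, q in zip(rewards, q_next)]
```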
S32. The neural network is established.
The neural network has three layers. The first layer is the input layer; its number of neurons equals the number k of feature types output by the feature extractor, and it takes the driving scene features f(s_t,a_t) as input. The second layer is a hidden layer with 10 neurons. The number of neurons in the third layer equals the number n of driving actions available for decision in the action space. The activation function of the input layer and of the hidden layer is the sigmoid function, sigmoid(x) = 1/(1+e^(−x)), so that:
z = w^(1) x = w^(1) [1, f_t]ᵀ
h = sigmoid(z)
g_w(s_t) = sigmoid(w^(2) [1, h]ᵀ)
where w^(1) are the weights of the hidden layer; f_t is the feature of the driving scene state s_t at time t, i.e. the input of the neural network; z is the output of the hidden layer before the sigmoid activation is applied; h is the hidden layer output after the sigmoid activation; and w^(2) are the weights of the output layer;
The network output g_w(s_t) is the Q set of the driving scene state s_t at time t, i.e. [Q(s_t,a_1), ..., Q(s_t,a_n)]ᵀ; the Q^π(s_t,a_t) in S31 is obtained by inputting the state s_t into the neural network and selecting the entry for a_t in the output;
S33. The neural network is optimized.
The loss function established for optimizing this neural network is a cross-entropy cost function whose ingredients are: N, the number of training data; Q^π(s_t,a_t), the value obtained by inputting the state s_t describing the driving scene at time t into the neural network and selecting the entry of the corresponding driving decision action a_t in the output; the target value obtained in S31; and a regularization term over W = {w^(1), w^(2)}, the weights of the above neural network;
The training data obtained in S31 are fed into the neural network to optimize this cost function; the minimization of the cross-entropy cost function is completed by gradient descent, yielding the optimized neural network and hence the driving strategy getter.
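The following sketch mirrors the forward pass written out above and one assumed form of the S33 cost; since the exact cross-entropy formula is not reproduced, the loss below (binary cross-entropy between the selected network output and the S31 target, plus an L2 penalty on W) is an assumption made for illustration.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def policy_forward(f_t, w1, w2):
    """Forward pass of the three-layer policy network of S32, following
    z = w1 [1, f_t]^T, h = sigmoid(z), g_w(s_t) = sigmoid(w2 [1, h]^T).
    w1 has shape (10, k+1) and w2 has shape (n, 11); the leading 1 is the bias input.
    Returns the Q-value vector [Q(s_t,a_1), ..., Q(s_t,a_n)]."""
    x = np.concatenate(([1.0], np.asarray(f_t, dtype=float)))
    h = sigmoid(w1 @ x)
    return sigmoid(w2 @ np.concatenate(([1.0], h)))

def policy_loss(batch, w1, w2, reg=0.9):
    """Assumed cross-entropy cost of S33: binary cross-entropy between the
    network output Q^pi(s_t, a_t) and the target y_t from S31, plus an L2
    penalty on W = {w1, w2}. `batch` is a list of (f_t, a_t, y_t) tuples.
    Minimizing this by gradient descent (e.g. with an autodiff library)
    yields the driving strategy getter."""
    ce = 0.0
    for f_t, a_t, y_t in batch:
        q = policy_forward(f_t, w1, w2)[a_t]
        ce += -(y_t * np.log(q) + (1.0 - y_t) * np.log(1.0 - q))
    return ce / len(batch) + 0.5 * reg * (np.sum(w1**2) + np.sum(w2**2))
```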
Further, the implementation process of the judging device includes:
The current reward function generator and driving strategy getter are regarded as a whole, and the current value of t from S22 is checked to see whether it satisfies t < ε, where ε is the threshold for judging whether the objective function meets the demand, i.e. for judging whether the reward function currently used to obtain the driving strategy is satisfactory; its value is set differently according to specific needs;
When the value of t does not satisfy this inequality, the reward function generator must be rebuilt: the neural network needed in the current S23 is replaced by the new neural network that has already been optimized in S33, i.e. the network used to generate the values Q(s_t,a_i) describing how good the chosen decision driving action a_i is in driving scene state s_t is replaced by the new network structure optimized in S33 by gradient descent; the reward function generator is then rebuilt, the driving strategy getter is obtained, and whether the value of t meets the demand is judged again;
When the inequality is satisfied, the current θ is the weight vector of the required reward function; the reward function generator then meets the requirements, and so does the driving strategy getter. The driving data of the driver for whom a model is to be established, i.e. the environment scene images during driving and the corresponding operation data, are then collected and input into the driving environment feature extractor to obtain the decision features for the current scene; the extracted features are input into the reward function generator to obtain the reward function for the corresponding scene state; finally, the obtained decision features and the computed reward function are input into the driving strategy getter to obtain that driver's corresponding driving strategy.
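An illustrative outer loop tying the four modules together is sketched below; the function names and control flow are assumptions drawn from this description of the judging device, not code from the patent.

```python
def train_driver_model(demos, feature_fn, build_policy, fit_reward, fit_policy,
                       eps=1e-3, max_iters=50):
    """Assumed control flow of the reward-generator / strategy-getter loop.

    fit_reward(demos, feature_fn, policy) -> (theta, t)     # S21 to S25
    fit_policy(demos, feature_fn, theta, policy) -> policy  # S31 to S33
    The loop stops once the objective value t drops below the threshold eps.
    """
    policy = build_policy()                       # freshly initialized network (S32)
    theta, t = fit_reward(demos, feature_fn, policy)
    iteration = 0
    while t >= eps and iteration < max_iters:
        policy = fit_policy(demos, feature_fn, theta, policy)  # optimize by gradient descent
        theta, t = fit_reward(demos, feature_fn, policy)       # rebuild the reward generator
        iteration += 1
    return theta, policy
```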
The advantageous effects of the present invention compared with the prior art are as follows. In the present invention, the method of describing the driver's decisions and establishing the driver behavior model uses a neural network to describe the strategy; once the network parameters are determined, states and actions correspond one to one, so the possible state–action pairs are no longer confined to the demonstration trajectories. In actual driving, where weather, scenery and other factors create a very large state space of driving scenes, the outstanding ability of neural networks to approximate arbitrary functions makes it possible to treat this strategy representation approximately as a black box: the feature values of the input state yield the corresponding state–action values, and the action is then chosen according to these output values, so that the corresponding action is obtained. This greatly broadens the applicability of driving behavior modeling by inverse reinforcement learning. Because conventional methods try to fit the demonstration trajectories with a particular probability distribution, the optimal strategy they obtain remains limited to the states present in the demonstration trajectories, whereas the present invention can handle new state scenes and obtain their corresponding actions, greatly improving the generalization ability of the driver behavior model that is established; the applicable scenes are wider and the robustness is stronger.
Description of the drawings
Fig. 1 is the new deep convolutional neural network;
Fig. 2 is a sample image from the driving video;
Fig. 3 is the workflow block diagram of this system;
Fig. 4 is the structure diagram of the neural network established in step S32.
Specific implementation mode
The invention will be further described below in conjunction with the accompanying drawings. The following embodiment serves only to illustrate the technical solution of the present invention clearly and is not intended to limit its scope of protection.
The present embodiment provides a driver behavior modeling system, comprising:
1. A feature extractor, which extracts the features used to construct the reward function, in the following way:
S11. While the vehicle is travelling, the driving video obtained by the camera placed behind the vehicle's windshield is sampled; a sample image is shown in Fig. 2.
N groups of pictures of different vehicle driving road environments and road conditions are obtained, together with the corresponding steering angles, comprising N1 straight-road samples and N2 curve samples, where the values may be N1 ≥ 300 and N2 ≥ 3000; together with the corresponding driving operation data, the training data are jointly constructed.
S12. The collected images are translated, cropped, adjusted in brightness and so on, to simulate scenes with different illumination and weather.
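A small sketch of these augmentations follows; the shift, crop and brightness ranges are assumptions chosen for illustration, since the text fixes only the kinds of operations.

```python
import numpy as np

def augment(image, rng=None, max_shift=20, crop=10, brightness=0.3):
    """Sketch of the S12 augmentations (translation, cropping, brightness change)
    used to simulate different illumination and weather. `image` is an HxWx3
    float array with values in [0, 1]; the parameter ranges are assumptions."""
    rng = rng or np.random.default_rng()
    # Horizontal translation by a random number of pixels.
    shift = int(rng.integers(-max_shift, max_shift + 1))
    image = np.roll(image, shift, axis=1)
    # Random crop, then pad back to the original height and width.
    h, w = image.shape[:2]
    top, left = rng.integers(0, crop + 1, size=2)
    cropped = image[top:h - crop + top, left:w - crop + left]
    image = np.pad(cropped, ((top, crop - top), (left, crop - left), (0, 0)), mode='edge')
    # Random brightness change.
    image = np.clip(image * (1.0 + rng.uniform(-brightness, brightness)), 0.0, 1.0)
    return image
```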
S13. A convolutional neural network is built, taking the processed pictures as input and the operation data of each corresponding picture as the label value, and is trained; the weight parameters of the neural network are optimized by seeking the optimum of the mean squared error loss with an optimization method based on the Nadam optimizer.
The convolutional neural network comprises 1 input layer, 3 convolutional layers, 3 pooling layers and 4 fully connected layers. The input layer is connected in sequence to the first convolutional layer and the first pooling layer, then to the second convolutional layer and the second pooling layer, then to the third convolutional layer and the third pooling layer, and finally in sequence to the first, second, third and fourth fully connected layers.
S14. The network structure and weights of the trained convolutional neural network, excluding the final output layer, are preserved in order to establish a new convolutional neural network, completing the state feature extractor.
2. A reward function generator, which obtains the reward function used for the driving strategy, in the following way:
In the process of obtaining the driving strategy, the reward function serves as the criterion for action selection in the reinforcement learning method. Its quality is decisive: it directly determines the quality of the obtained driving strategy and whether that strategy matches the strategy corresponding to the real driving demonstration data. The formula of the reward function is reward = θᵀf(s_t,a_t), where f(s_t,a_t) denotes the group of feature values that influence the driving decision result for the state s_t at time t under the driving environment scene (the vehicle's surroundings) and is used to describe the situation of those surroundings, and θ denotes the group of weights on the features that influence the driving decision; the magnitude of a weight indicates the proportion of the corresponding environmental feature in the reward function and embodies its importance. On the basis of the state feature extractor, this weight vector θ must be solved for in order to build the reward function that shapes the driving strategy.
S21. The expert's driving demonstration data are obtained.
The demonstration data are extracted by sampling the demonstration driving video data (which differ from the data used earlier for the driving environment feature extractor); a continuous segment of driving video can be sampled at a frequency of 10 Hz to obtain one demonstration trajectory. One expert demonstration should contain several trajectories, denoted as a whole by D_E = {(s_1,a_1), (s_2,a_2), ..., (s_M,a_M)}, where D_E denotes all the driving demonstration data, (s_j,a_j) denotes the data pair formed by the state j (the video frame of the driving environment at sampling time j) and the decision instruction corresponding to that state (such as the steering angle in the steering instruction), M is the total number of driving demonstration data, N_T is the number of demonstration trajectories, and L_i is the number of state–decision-instruction pairs (s_j,a_j) contained in the i-th demonstration trajectory.
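The sketch below illustrates this sampling step; the video frame rate is an assumption, and `frames` and `steering` stand for any time-aligned sequences of images and steering instructions.

```python
def build_demonstrations(frames, steering, fps=30.0, sample_hz=10.0):
    """Sketch of S21: sample a continuous driving video at 10 Hz and pair each
    sampled frame (state s_j) with its steering-angle instruction (action a_j)
    to form one demonstration trajectory. `frames` and `steering` are
    equal-length, time-aligned sequences; fps is the assumed video frame rate."""
    step = max(1, int(round(fps / sample_hz)))
    return [(frames[i], steering[i]) for i in range(0, len(frames), step)]
```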
S22. The feature expectation of the driving demonstrations is sought.
First, each state s_t describing the driving environment situation in the demonstration data D_E is input into the state feature extractor to obtain the feature values f(s_t,a_t) for that state; f(s_t,a_t) denotes the group of driving-environment scene feature values, corresponding to s_t, that influence the driving decision result. The feature expectation of the driving demonstrations is then calculated as the γ-discounted accumulation of these feature values, where γ is the discount factor, configured according to the problem at hand; a reference value may be 0.65.
S23. The state–action set under the greedy strategy is sought.
First, the neural network in the driving strategy getter of S32 is obtained. (The reward function generator and the driving strategy getter are the two parts of one loop; at the very beginning this neural network is the freshly initialized network of S32. As the loop proceeds, each iteration consists of completing one construction of the reward function that shapes the driving decision, then obtaining the corresponding optimal driving strategy in the driving strategy getter based on the current reward function, and judging whether the criterion for ending the loop is met; if it is not, the neural network optimized in the current step S33 is fed back into the rebuilding of the reward function.)
The state features f(s_t,a_t) describing the environment situation, extracted from the driving demonstration data D_E, are input into the neural network to obtain the output g_w(s_t); g_w(s_t) is the set of Q values describing state s_t, i.e. [Q(s_t,a_1), ..., Q(s_t,a_n)]ᵀ, where Q(s_t,a_i) is the state–action value describing how good it is, in the current driving scene state s_t, to choose the decision driving action a_i; it can be obtained from the formula Q(s,a) = θ μ(s,a), in which θ denotes the weights of the current reward function and μ(s,a) denotes the feature expectation.
Then, based on the ε-greedy strategy, with ε set for example to 0.5, the driving decision action corresponding to the driving scene state s_t is chosen: with a probability of fifty percent the decision action that maximizes the Q value in the Q-value set for the current driving scene s_t is chosen, and otherwise an action is chosen at random; after the action a_t has been chosen, the corresponding Q(s_t,a_t) is recorded.
Thus, for the state feature f(s_t,a_t) of every state in the driving demonstrations D_E, the neural network is queried, and M state–action pairs (s_t,a_t) are obtained in total, each describing the driving decision action a_t chosen in the driving scene state s_t at time t; at the same time, from these action choices, the Q values of the M corresponding state–action pairs are obtained and denoted Q.
S24. The weights of the reward function are sought.
An objective function J(θ) is first built from the following quantities: a loss term that is 0 if the current state–action pair appears among the driving demonstrations and 1 otherwise; the corresponding state–action values recorded above; the product of the driving demonstration feature expectation sought in S22 with the reward function weights θ; and a regularization term included to prevent overfitting, whose coefficient γ may be set to 0.9.
The objective function is minimized by gradient descent, i.e. t = min_θ J(θ), and the variable θ that minimizes it is obtained; this θ is the sought weight vector of the required reward function.
S25. Based on the obtained reward function weights θ, the reward function generator is built according to the formula r(s,a) = θᵀf(s,a).
3. A driving strategy getter, which completes the construction of the driving strategy, in the following way:
S31. The training data of the driving strategy getter are built.
The training data are obtained. They come from the earlier sampling of the demonstration data, but are processed into a new type of data, N in total. Each datum comprises two parts: one is the driving decision feature f(s_t) obtained by inputting the driving scene state at time t into the driving state extractor, and the other is a target value computed from the following quantities: r_θ(s_t,a_t), the reward function generated by the reward function generator based on the driving demonstration data, and Q^π(s_t,a_t) and Q^π(s_{t+1},a_{t+1}), which come from the group of Q values recorded in S23, namely the Q value of the driving scene s_t at time t and the Q value of the driving scene s_{t+1} at time t+1.
S32. The neural network is established.
The neural network has three layers. The first layer is the input layer; its number of neurons equals the number k of feature types output by the feature extractor, and it takes the driving scene features f(s_t,a_t) as input. The second layer is a hidden layer with 10 neurons. The number of neurons in the third layer equals the number n of driving actions available for decision in the action space. The activation function of the input layer and of the hidden layer is the sigmoid function, sigmoid(x) = 1/(1+e^(−x)), so that:
z = w^(1) x = w^(1) [1, f_t]ᵀ
h = sigmoid(z)
g_w(s_t) = sigmoid(w^(2) [1, h]ᵀ)
where w^(1) denotes the weights of the hidden layer; f_t denotes the feature of the driving scene state s_t at time t, i.e. the input of the neural network; z denotes the output of the hidden layer before the sigmoid activation is applied; h denotes the hidden layer output after the sigmoid activation; and w^(2) denotes the weights of the output layer. The network structure is shown in Fig. 4.
The network output g_w(s_t) is the Q set of the driving scene state s_t at time t, i.e. [Q(s_t,a_1), ..., Q(s_t,a_n)]ᵀ; the Q^π(s_t,a_t) in S31 is obtained by inputting the state s_t into the neural network and selecting the entry for a_t in the output.
S33. The neural network is optimized.
The loss function established for optimizing this neural network is a cross-entropy cost function whose ingredients are: N, the number of training data; Q^π(s_t,a_t), the value obtained by inputting the state s_t describing the driving scene at time t into the neural network and selecting the entry of the corresponding driving decision action a_t in the output; the target value obtained in S31; and, likewise to prevent overfitting, a regularization term over W = {w^(1), w^(2)}, the weights of the above neural network, whose coefficient γ may also be 0.9.
The training data obtained in S31 are fed into the neural network to optimize this cost function. The minimization of the cross-entropy cost function is completed by gradient descent, and the optimized neural network is obtained, giving the driving strategy getter.
4. A judging device, which judges whether the optimal driving strategy constructed by the getter meets the judgment criterion; if not, the reward function is rebuilt and the optimal driving strategy is constructed again, iterating until the criterion is met, so that the driving strategy describing the real driving demonstrations is finally obtained.
The current reward function generator and driving strategy getter are regarded as a whole, and the current value of t from S22 is checked to see whether it satisfies t < ε, where ε is the threshold for judging whether the objective function meets the demand, i.e. for judging whether the reward function currently used to obtain the driving strategy is satisfactory. Its value is set differently according to specific needs.
When the value of t does not satisfy this inequality, the reward function generator must be rebuilt: the neural network needed in the current S23 is replaced by the new neural network that has already been optimized in S33, i.e. the network used to generate the values Q(s_t,a_i) describing how good the chosen decision driving action a_i is in driving scene state s_t is replaced by the new network structure optimized in S33 by gradient descent. The reward function generator is then rebuilt, the driving strategy getter is obtained, and whether the value of t meets the demand is judged again.
When the inequality is satisfied, the current θ is the weight vector of the required reward function. The reward function generator then meets the requirements, and so does the driving strategy getter. One can then proceed as follows: the driving data of the driver for whom a model is to be established, i.e. the environment scene images during driving and the corresponding operation data such as the steering angle, are collected and input into the driving environment feature extractor to obtain the decision features for the current scene. The extracted features are then input into the reward function generator to obtain the reward function for the corresponding scene state. Finally, the obtained decision features and the computed reward function are input into the driving strategy getter to obtain that driver's corresponding driving strategy.
In a Markov decision process, a strategy must connect each state to its corresponding action. But when the state space is large, it is difficult to depict and express a definite strategy for the regions that have not been traversed. Traditional methods also ignore the description of this part: based only on the demonstration trajectories, they build a probability model of the whole trajectory distribution and give no concrete strategy representation for new states, i.e. no concrete way of determining which action to take in a new state. In the present invention the strategy is described by a neural network, which can approximate an arbitrary function to any precision and has outstanding generalization ability. Representing states by features makes it possible, on the one hand, to represent states that are not contained in the demonstration trajectories; on the other hand, by inputting the corresponding state features into the neural network, the corresponding action values can be sought, and hence the appropriate action according to the strategy. The problem that conventional methods cannot generalize from the driving demonstration data to driving scene states that were never traversed is thereby resolved.
The above is only a preferred embodiment of the invention, but the scope of protection of the invention is not confined to it; any equivalent substitution or change that a person skilled in the art makes, within the technical scope disclosed by the invention, to the technical solution of the invention and its inventive concept shall be covered by the scope of protection of the invention.

Claims (10)

1. A driver behavior modeling system, characterized in that it specifically comprises:
a feature extractor, which extracts the features used to construct the reward function;
a reward function generator, which obtains the reward function needed to construct the driving strategy;
a driving strategy getter, which completes the construction of the driving strategy;
a judging device, which judges whether the optimal driving strategy constructed by the getter meets the judgment criterion; if not, the reward function is rebuilt and the optimal driving strategy is constructed again, iterating until the criterion is met.
2. The driver behavior modeling system according to claim 1, characterized in that the specific implementation process by which the feature extractor extracts the features for constructing the reward function is:
S11. While the vehicle is travelling, the driving video captured by a camera placed behind the vehicle's windshield is sampled to obtain N groups of pictures of different vehicle driving environments and road conditions together with the corresponding steering angles; the corresponding driving operation data are collected at the same time, and the training data are jointly constructed;
S12. The collected pictures are translated, cropped, and adjusted in brightness to simulate scenes with different illumination and weather;
S13. A convolutional neural network is built, taking the processed pictures as input and the operation data corresponding to each picture as the label value; it is trained by seeking the optimum of a mean squared error loss with an optimization method based on the Nadam optimizer, thereby optimizing the weight parameters of the neural network;
S14. The network structure and weights of the trained convolutional neural network are preserved in order to establish a new convolutional neural network, completing the state feature extractor.
3. The driver behavior modeling system according to claim 2, characterized in that the convolutional neural network established in step S13 comprises 1 input layer, 3 convolutional layers, 3 pooling layers and 4 fully connected layers; the input layer is connected in sequence to the first convolutional layer and the first pooling layer, then to the second convolutional layer and the second pooling layer, then to the third convolutional layer and the third pooling layer, and finally in sequence to the first, second, third and fourth fully connected layers.
4. The driver behavior modeling system according to claim 2, characterized in that the trained convolutional neural network in step S14 does not include the output layer.
5. The driver behavior modeling system according to claim 1, characterized in that the specific implementation process of the reward function generator in obtaining the driving strategy is:
S21. The expert's driving demonstration data are obtained: the demonstration data are extracted by sampling the demonstration driving video; a continuous segment of driving video is sampled at a fixed frequency to obtain one demonstration trajectory; one set of expert demonstration data contains several trajectories and is denoted as a whole by:
D_E = {(s_1,a_1), (s_2,a_2), ..., (s_M,a_M)}, where D_E denotes all the driving demonstration data, (s_j,a_j) denotes the data pair formed by state j and the decision instruction corresponding to that state, M is the total number of driving demonstration data, N_T is the number of demonstration trajectories, and L_i is the number of state–decision-instruction pairs (s_j,a_j) contained in the i-th demonstration trajectory;
S22. The feature expectation of the driving demonstrations is sought;
first, each state s_t describing the driving environment situation in the demonstration data D_E is input into the state feature extractor to obtain the feature values f(s_t,a_t) for that state; f(s_t,a_t) denotes the group of driving-environment scene feature values, corresponding to s_t, that influence the driving decision result; the feature expectation of the driving demonstrations is then calculated as the γ-discounted accumulation of these feature values,
where γ is the discount factor and is configured according to the problem at hand;
S23. The state–action set under the greedy strategy is sought;
S24. The weights of the reward function are sought.
6. The driver behavior modeling system according to claim 5, characterized in that the concrete steps for seeking the state–action set under the greedy strategy are as follows: the reward function generator and the driving strategy getter are the two parts of one loop; first, the neural network in the driving strategy getter is obtained: the state features f(s_t,a_t) describing the environment situation, extracted from the driving demonstration data D_E, are input into the neural network to obtain the output g_w(s_t); g_w(s_t) is the set of Q values describing state s_t, i.e. [Q(s_t,a_1), ..., Q(s_t,a_n)]ᵀ, where Q(s_t,a_i) is the state–action value describing how good it is, in the current driving scene state s_t, to choose the decision driving action a_i; it is obtained from the formula Q(s,a) = θ μ(s,a), in which θ denotes the weights of the current reward function and μ(s,a) denotes the feature expectation.
Then, based on the ε-greedy strategy, the driving decision action corresponding to the driving scene state s_t is chosen: with a probability determined by ε, the decision action that maximizes the Q value in the Q-value set for the current driving scene s_t is chosen; otherwise an action is chosen at random; after the action a_t has been chosen, the corresponding Q(s_t,a_t) is recorded.
Thus, for the state feature f(s_t,a_t) of every state in the driving demonstrations D_E, the neural network is queried, and M state–action pairs (s_t,a_t) are obtained in total, each describing the driving decision action a_t chosen in the driving scene state s_t at time t; at the same time, from these action choices, the Q values of the M corresponding state–action pairs are obtained and denoted Q.
7. The driver behavior modeling system according to claim 5, characterized in that the concrete steps for seeking the weights of the reward function are: an objective function J(θ) is first built from the following quantities:
a loss term that is 0 if the current state–action pair appears among the driving demonstrations and 1 otherwise; the corresponding state–action values recorded above; the product of the driving demonstration feature expectation sought in S22 with the reward function weights θ; and a regularization term;
the objective function is minimized by gradient descent, i.e. t = min_θ J(θ), and the variable θ that minimizes it is obtained; this θ is the sought weight vector of the required reward function.
8. The driver behavior modeling system according to claim 5, characterized in that the implementation process of the reward function generator in obtaining the driving strategy further includes: S25. Based on the obtained reward function weights θ, the reward function generator is built according to the formula r(s,a) = θᵀf(s,a).
9. The driver behavior modeling system according to claim 1, characterized in that the specific implementation process by which the driving strategy getter completes the construction of the driving strategy is:
S31. The training data of the driving strategy getter are built;
the training data are obtained; each datum comprises two parts: one is the driving decision feature f(s_t) obtained by inputting the driving scene state at time t into the driving state extractor, and the other is a target value computed from the following quantities:
r_θ(s_t,a_t), the reward function generated by the reward function generator based on the driving demonstration data, and Q^π(s_t,a_t) and Q^π(s_{t+1},a_{t+1}), which come from the Q values recorded in S23, namely the Q value of the driving scene s_t at time t and the Q value of the driving scene s_{t+1} at time t+1;
S32. The neural network is established;
the neural network has three layers: the first layer is the input layer, whose number of neurons equals the number k of feature types output by the feature extractor and which takes the driving scene features f(s_t,a_t) as input; the second layer is a hidden layer with 10 neurons; the number of neurons in the third layer equals the number n of driving actions available for decision in the action space; the activation function of the input layer and of the hidden layer is the sigmoid function, sigmoid(x) = 1/(1+e^(−x)), so that:
z = w^(1) x = w^(1) [1, f_t]ᵀ
h = sigmoid(z)
g_w(s_t) = sigmoid(w^(2) [1, h]ᵀ)
where w^(1) are the weights of the hidden layer; f_t is the feature of the driving scene state s_t at time t, i.e. the input of the neural network; z is the output of the hidden layer before the sigmoid activation is applied; h is the hidden layer output after the sigmoid activation; and w^(2) are the weights of the output layer;
the network output g_w(s_t) is the Q set of the driving scene state s_t at time t, i.e. [Q(s_t,a_1), ..., Q(s_t,a_n)]ᵀ; the Q^π(s_t,a_t) in S31 is obtained by inputting the state s_t into the neural network and selecting the entry for a_t in the output;
S33. The neural network is optimized;
the loss function established for optimizing this neural network is a cross-entropy cost function whose ingredients are: N, the number of training data; Q^π(s_t,a_t), the value obtained by inputting the state s_t describing the driving scene at time t into the neural network and selecting the entry of the corresponding driving decision action a_t in the output; the target value obtained in S31; and a regularization term over W = {w^(1), w^(2)}, the weights of the above neural network;
the training data obtained in S31 are fed into the neural network to optimize this cost function; the minimization of the cross-entropy cost function is completed by gradient descent, yielding the optimized neural network and hence the driving strategy getter.
10. The driver behavior modeling system according to claim 1, characterized in that the implementation process of the judging device includes:
the current reward function generator and driving strategy getter are regarded as a whole, and the current value of t from S22 is checked to see whether it satisfies t < ε, where ε is the threshold for judging whether the objective function meets the demand, i.e. for judging whether the reward function currently used to obtain the driving strategy is satisfactory; its value is set differently according to specific needs;
when the value of t does not satisfy this inequality, the reward function generator must be rebuilt: the neural network needed in the current S23 is replaced by the new neural network that has already been optimized in S33, i.e. the network used to generate the values Q(s_t,a_i) describing how good the chosen decision driving action a_i is in driving scene state s_t is replaced by the new network structure optimized in S33 by gradient descent; the reward function generator is then rebuilt, the driving strategy getter is obtained, and whether the value of t meets the demand is judged again;
when the inequality is satisfied, the current θ is the weight vector of the required reward function; the reward function generator then meets the requirements, and so does the driving strategy getter; the driving data of the driver for whom a model is to be established, i.e. the environment scene images during driving and the corresponding operation data, are then collected and input into the driving environment feature extractor to obtain the decision features for the current scene; the extracted features are then input into the reward function generator to obtain the reward function for the corresponding scene state; finally, the obtained decision features and the computed reward function are input into the driving strategy getter to obtain that driver's corresponding driving strategy.
CN201810662040.0A 2018-06-25 2018-06-25 Driver behavior modeling system Active CN108791302B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810662040.0A CN108791302B (en) 2018-06-25 2018-06-25 Driver behavior modeling system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810662040.0A CN108791302B (en) 2018-06-25 2018-06-25 Driver behavior modeling system

Publications (2)

Publication Number Publication Date
CN108791302A true CN108791302A (en) 2018-11-13
CN108791302B CN108791302B (en) 2020-05-19

Family

ID=64070795

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810662040.0A Active CN108791302B (en) 2018-06-25 2018-06-25 Driver behavior modeling system

Country Status (1)

Country Link
CN (1) CN108791302B (en)

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103381826A (en) * 2013-07-31 2013-11-06 中国人民解放军国防科学技术大学 Adaptive cruise control method based on approximate policy iteration
CN105955930A (en) * 2016-05-06 2016-09-21 天津科技大学 Guidance-type policy search reinforcement learning algorithm
CN107168303A (en) * 2017-03-16 2017-09-15 中国科学院深圳先进技术研究院 A kind of automatic Pilot method and device of automobile
CN107229973A (en) * 2017-05-12 2017-10-03 中国科学院深圳先进技术研究院 The generation method and device of a kind of tactful network model for Vehicular automatic driving
CN107203134A (en) * 2017-06-02 2017-09-26 浙江零跑科技有限公司 A kind of front truck follower method based on depth convolutional neural networks
CN107480726A (en) * 2017-08-25 2017-12-15 电子科技大学 A kind of Scene Semantics dividing method based on full convolution and shot and long term mnemon
CN107679557A (en) * 2017-09-19 2018-02-09 平安科技(深圳)有限公司 Driving model training method, driver's recognition methods, device, equipment and medium
CN108108657A (en) * 2017-11-16 2018-06-01 浙江工业大学 A kind of amendment local sensitivity Hash vehicle retrieval method based on multitask deep learning

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
邹启杰 et al. cite: 王勇鑫, 钱徽, 金卓军, 朱淼良: "Autonomous navigation performance evaluation method based on trajectory analysis" (基于轨迹分析的自主导航性能评估方法), 《计算机工程》 (Computer Engineering) *

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111923928A (en) * 2019-05-13 2020-11-13 长城汽车股份有限公司 Decision making method and system for automatic vehicle
CN110481561A (en) * 2019-08-06 2019-11-22 北京三快在线科技有限公司 Automatic driving vehicle automatic control signal generation method and device
WO2021093011A1 (en) * 2019-11-14 2021-05-20 深圳大学 Unmanned vehicle driving decision-making method, unmanned vehicle driving decision-making device, and unmanned vehicle
CN112052776A (en) * 2020-09-01 2020-12-08 中国人民解放军国防科技大学 Unmanned vehicle autonomous driving behavior optimization method and device and computer equipment
CN112373482A (en) * 2020-11-23 2021-02-19 浙江天行健智能科技有限公司 Driving habit modeling method based on driving simulator
CN112373482B (en) * 2020-11-23 2021-11-05 浙江天行健智能科技有限公司 Driving habit modeling method based on driving simulator
CN112997128A (en) * 2021-04-19 2021-06-18 华为技术有限公司 Method, device and system for generating automatic driving scene
CN112997128B (en) * 2021-04-19 2022-08-26 华为技术有限公司 Method, device and system for generating automatic driving scene

Also Published As

Publication number Publication date
CN108791302B (en) 2020-05-19

Similar Documents

Publication Publication Date Title
CN108819948A (en) Driving behavior modeling method based on reverse intensified learning
CN108791302A (en) Driving behavior modeling
CN108920805A (en) Driving behavior modeling with state feature extraction functions
CN111079561B (en) Robot intelligent grabbing method based on virtual training
CN112232490B (en) Visual-based depth simulation reinforcement learning driving strategy training method
CN106920243A (en) The ceramic material part method for sequence image segmentation of improved full convolutional neural networks
CN108891421A (en) A method of building driving strategy
CN110458060A (en) A kind of vehicle image optimization method and system based on confrontation study
CN106444379A (en) Intelligent drying remote control method and system based on internet of things recommendation
CN111136659A (en) Mechanical arm action learning method and system based on third person scale imitation learning
Li et al. Facial feedback for reinforcement learning: a case study and offline analysis using the TAMER framework
CN108944940A (en) Driving behavior modeling method neural network based
CN109726676A (en) The planing method of automated driving system
CN110321956A (en) A kind of herbage pest management method and device based on artificial intelligence
CN110110794A (en) The image classification method that neural network parameter based on characteristic function filtering updates
CN113779289A (en) Drawing step reduction system based on artificial intelligence
CN116957866A (en) Individualized teaching device of digital man teacher
CN116353623A (en) Driving control method based on self-supervision imitation learning
Hafez et al. Improving robot dual-system motor learning with intrinsically motivated meta-control and latent-space experience imagination
CN103793054B (en) A kind of action identification method simulating declarative memory process
CN108875555A (en) Video interest neural network based region and well-marked target extraction and positioning system
CN110990589A (en) Knowledge graph automatic generation method based on deep reinforcement learning
CN112329498A (en) Street space quality quantification method based on machine learning
CN110222822A (en) The construction method of black box prediction model internal feature cause-and-effect diagram
CN108791308A (en) The system for building driving strategy based on driving environment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
OL01 Intention to license declared
OL01 Intention to license declared