CN107169567A - Method and device for generating a decision network model for autonomous vehicle driving - Google Patents

Method and device for generating a decision network model for autonomous vehicle driving Download PDF

Info

Publication number
CN107169567A
CN107169567A (application CN201710201086.8A); granted as CN107169567B
Authority
CN
China
Prior art keywords
sample
default
training
experience database
vehicle
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201710201086.8A
Other languages
Chinese (zh)
Other versions
CN107169567B (en)
Inventor
夏伟
李慧云
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Institute of Advanced Technology of CAS
Original Assignee
Shenzhen Institute of Advanced Technology of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Institute of Advanced Technology of CAS filed Critical Shenzhen Institute of Advanced Technology of CAS
Priority to CN201710201086.8A priority Critical patent/CN107169567B/en
Publication of CN107169567A publication Critical patent/CN107169567A/en
Application granted granted Critical
Publication of CN107169567B publication Critical patent/CN107169567B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00 Computing arrangements using knowledge-based models
    • G06N5/02 Knowledge representation; Symbolic representation
    • G06N5/022 Knowledge engineering; Knowledge acquisition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Evolutionary Computation (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Artificial Intelligence (AREA)
  • Medical Informatics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computational Linguistics (AREA)
  • Traffic Control Systems (AREA)
  • Feedback Control In General (AREA)

Abstract

The present invention, applicable to the field of computer technology, provides a method and device for generating a decision network model for autonomous vehicle driving. The method includes: generating the sample triple corresponding to each trial moment from the vehicle state information collected at that moment, a preset vehicle action set, and a preset reward function; storing all sample triples as sample data in a pre-built experience database and performing cluster analysis on all of the sample data; uniformly sampling training samples from each cluster obtained by the cluster analysis of the experience database according to a preset sampling ratio, and computing the cumulative return value of each training sample; and training the decision network model for autonomous driving from all training samples, the cumulative return value of each training sample, and a preset deep learning algorithm. The training efficiency of the decision network model and the generalization ability of the policy network model are thereby effectively improved.

Description

Method and device for generating a decision network model for autonomous vehicle driving
Technical field
The invention belongs to the field of computer technology, and more particularly to a method and device for generating a decision network model for autonomous vehicle driving.
Background technology
With economic growth and advancing urbanization, the global number of vehicles in use and the total road mileage keep rising, and a series of problems that conventional vehicles cannot properly resolve, such as traffic congestion, accidents, pollution, and scarce land resources, become increasingly prominent. Driverless vehicle technology is regarded as an effective solution to these problems, and its development has attracted wide attention.
A driverless vehicle travels on the road without a human driver, controlled by its own driving assistance system, and possesses environment perception capability. At present, the control method of such driving assistance systems is mainly rule-based decision making: an expert decision system that outputs a control decision for each situation is built from known driving experience. Expert-rule systems of this kind can be regarded as shallow learning, i.e., discovering rules from labeled data; when the rules are difficult to express as formulas or simple logic, shallow learning fails.
With the rapid development of deep reinforcement learning, some research institutions have proposed "end-to-end" autonomous driving algorithms, in which the control decision model of the driving assistance system is built as a deep network. The inputs of the deep network are state data such as camera images, lidar, GPS position, and speed, and the output of the network directly serves as the actuation signal that controls the vehicle. This approach requires no rule-based interpretation of the vehicle state, but training a deep network usually requires a large number of data samples. As the dimensionality of vehicle sensor data and the complexity of the network structure grow, the computational cost of model training increases sharply, and this huge consumption of computing resources is regarded as a major obstacle to training deep network models.
Summary of the invention
The object of the present invention is to provide a method and device for generating a decision network model for autonomous vehicle driving, aiming to solve the problems that training the decision model for autonomous driving is inefficient and that the learning ability of the vehicle decision model is weak, so that it cannot adapt well to different routes and scenes.
In one aspect, the invention provides a method for generating a decision network model for autonomous vehicle driving, the method comprising the following steps:
generating, according to the vehicle state information collected at each trial moment, a preset vehicle action set, and a preset reward function, the sample triple corresponding to each trial moment;
storing all sample triples as sample data in a pre-built experience database, and performing cluster analysis on all sample data in the experience database;
uniformly sampling training samples from each cluster obtained by the cluster analysis of the experience database according to a preset sampling ratio, and computing the cumulative return value corresponding to each training sample;
training the decision network model for the current vehicle's autonomous driving according to all training samples, the cumulative return value of each training sample, and a preset deep learning algorithm.
In another aspect, the invention provides a device for generating a decision network model for autonomous vehicle driving, the device comprising:
a sample generation module, configured to generate the sample triple corresponding to each trial moment from the vehicle state information collected at that moment, a preset vehicle action set, and a preset reward function;
a cluster analysis module, configured to store all sample triples as sample data in a pre-built experience database and to perform cluster analysis on all sample data in the experience database;
a training sampling module, configured to uniformly sample training samples from each cluster obtained by the cluster analysis of the experience database according to a preset sampling ratio, and to compute the cumulative return value corresponding to each training sample; and
a model generation module, configured to train the decision network model for the current vehicle's autonomous driving from all training samples, the cumulative return value of each training sample, and a preset deep learning algorithm.
The present invention obtains the sample data of the experience database, i.e. the sample triples, from the collected vehicle state information, vehicle action set, and reward function; classifies the sample data in the experience database by cluster analysis; samples uniformly, in proportion, from each cluster to obtain the training samples of the deep network; and finally trains the decision network model for autonomous driving. Cluster analysis thus streamlines the training samples, and the decision network model is trained with return values and deep learning, so that the model can be trained quickly on the reduced data set while the learning and generalization abilities of the decision network model in the control loop are effectively improved.
Brief description of the drawings
Fig. 1 is a flow chart of the implementation of the method for generating a decision network model for autonomous vehicle driving provided by Embodiment one of the present invention;
Fig. 2 is a schematic structural diagram of the device for generating a decision network model for autonomous vehicle driving provided by Embodiment two of the present invention; and
Fig. 3 is a schematic structural diagram of the device for generating a decision network model for autonomous vehicle driving provided by Embodiment three of the present invention.
Detailed description of the embodiments
In order to make the object, technical solution, and advantages of the present invention clearer, the invention is further described below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here merely illustrate the invention and are not intended to limit it.
The implementation of the invention is described in detail below with reference to specific embodiments:
Embodiment one:
Fig. 1 shows the implementation flow of the method for generating a decision network model for autonomous vehicle driving provided by Embodiment one of the present invention. For convenience of description, only the parts related to the embodiment are shown; the details are as follows:
In step S101, the sample triple corresponding to each trial moment is generated according to the vehicle state information collected at that moment, a preset vehicle action set, and a preset reward function.
The present invention applies to an interaction platform built on a racing simulation platform or race simulator (for example the open race simulator TORCS, The Open Racing Car Simulator), on which the driving interaction trials of the driverless vehicle are carried out. During the current interaction trial, the vehicle state information is collected by multiple preset sensors on the vehicle; the vehicle state information may include the distance of the vehicle from the road center line, the angle between the vehicle heading and the road tangent, the distance readings of the vehicle's front laser range finder, the velocity component of the vehicle along the road tangent, and so on. Except at the initial trial moment, the vehicle state at each trial moment is the result, or a function, of the vehicle state and vehicle action at the previous moment. For example, if S_t denotes the vehicle state information at trial moment t, then the state information at trial moment t+1 is S_{t+1} = f(S_t, a_t) = f(f(S_{t-1}, a_{t-1})) = ..., where a_t is the vehicle action information at trial moment t. Specifically, vehicle actions may include going straight, braking, and so on.
In the embodiment of the invention, after the vehicle state information of the current trial moment is collected, the vehicle action that yields the maximum return value is found by traversing the preset vehicle action set according to the preset reward function, and that action is sent to the vehicle. For ease of distinction, this vehicle action is called the maximum-return action. The vehicle state information, the maximum-return action, and the return value of the maximum-return action are combined into a sample triple; for example, the sample triple at trial moment t may be expressed as (S_t, a_t, r_t), where r_t is the return value of the maximum-return action at trial moment t.
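As a minimal sketch of this step, the following code traverses a toy action set and assembles the sample triple (S_t, a_t, r_t). The `reward` callable, the action names, and the one-dimensional state are illustrative assumptions, not the patent's own interfaces:

```python
# Sketch of per-moment sample-triple generation; `reward(state, action)`
# stands in for the patent's preset reward function (an assumption).

def make_sample_triple(state, action_set, reward):
    """Traverse the preset action set, pick the maximum-return action,
    and combine (state, action, return value) into one sample triple."""
    best_action = max(action_set, key=lambda a: reward(state, a))
    return (state, best_action, reward(state, best_action))

# Toy usage: the state is the distance from the road center line, and the
# toy reward favors the action that steers back toward the center.
actions = ["left", "straight", "right"]
toy_reward = lambda s, a: -abs(s + {"left": -0.1, "straight": 0.0, "right": 0.1}[a])
triple = make_sample_triple(0.1, actions, toy_reward)
# triple == (0.1, "left", 0.0)
```

In a real trial loop, the chosen action would also be sent to the simulator before the next state is observed.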
As an example, considering the current travel position of the vehicle, the formula of the reward function may be:
r = Δdis * cos(α * angle) * sgn(trackPos - threshold), where r is the return value of the reward function, Δdis is the distance the vehicle covered between adjacent trial moments, α is a preset weight scaling factor, angle is the angle between the vehicle's current heading and the road tangent, trackPos is the distance of the vehicle from the road center line, and threshold is a preset threshold value. When trackPos is greater than threshold, r takes a negatively infinite value, which may represent a penalty applied to the vehicle for getting too close to the road boundary. In addition, travel speed, fuel consumption, smoothness, and so on may also be taken into account in the reward function.
In step S102, all sample triples are stored as sample data in the pre-built experience database, and cluster analysis is performed on all sample data in the experience database.
In the embodiment of the invention, at the end of the current interaction trial, the sample triples corresponding to every trial moment of the interaction trial are all stored as sample data in the experience database. The class of each sample datum (i.e. each sample triple) is computed by an initialized classification model, so that all sample data are assigned to the corresponding clusters. Cluster analysis thus discovers intrinsic properties or regularities of the samples in the experience database, so as to reduce the dimensionality or quantity of the training samples of the decision network model and achieve the purpose of streamlining the training samples.
In the embodiment of the invention, the classification model is initialized with driving data collected from professional drivers, referred to as professional driving data. Professional driving data consists of "state-action" 2-tuples composed of the vehicle state information and vehicle action information recorded while a professional driver drives. Cluster analysis is performed on the professional driving data by a preset clustering algorithm to initialize the classification model. The clustering algorithm may be K-means, principal component analysis (PCA), or another clustering algorithm.
As an example, when the classification model is initialized with the K-means algorithm, multiple cluster centers are randomly selected from the professional driving data, the class of each "state-action" 2-tuple in the professional driving data is computed, and the cluster center of each class is updated. The class-assignment formula may be:
c^(i) = argmin_j ||x^(i) - u_j||^2, where x^(i) is the i-th "state-action" 2-tuple in the professional driving data and u_j is the j-th cluster center.
The formula for updating the cluster center of each class may be:
u_j = (Σ_i 1{c^(i) = j} x^(i)) / (Σ_i 1{c^(i) = j}), i.e., each center is set to the mean of the 2-tuples currently assigned to class j.
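A minimal K-means sketch of this initialization, assuming the "state-action" 2-tuples are plain numeric feature vectors; the function names and the squared Euclidean distance are illustrative choices:

```python
import random

def kmeans_init(tuples, k, iters=20, seed=0):
    """Initialize the classification model by K-means over professional
    'state-action' 2-tuples (plain feature vectors). Returns the cluster
    centers and an assign() giving the class argmin_j ||x - u_j||^2."""
    rng = random.Random(seed)
    centers = rng.sample(tuples, k)  # random initial cluster centers
    dist2 = lambda x, u: sum((a - b) ** 2 for a, b in zip(x, u))
    assign = lambda x: min(range(k), key=lambda j: dist2(x, centers[j]))
    for _ in range(iters):
        # Assignment step: compute the class of each 2-tuple.
        groups = [[] for _ in range(k)]
        for x in tuples:
            groups[assign(x)].append(x)
        # Update step: move each center to the mean of its assigned samples.
        centers = [
            [sum(col) / len(g) for col in zip(*g)] if g else centers[j]
            for j, g in enumerate(groups)
        ]
    return centers, assign

data = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 4.9)]
centers, assign = kmeans_init(data, k=2)
# the two near-origin points share one class, the two far points the other
```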
Preferably, whether the current interaction trial has ended is determined by detecting whether an accident happens to the vehicle during the driving of the interaction trial or whether the vehicle has completed the preset trial driving task. When an accident happens during driving or the preset trial driving task is completed, the current interaction task is determined to have ended, and a series of time-continuous sample triples is obtained. Specifically, accidents during driving may include leaving the road, colliding, running out of fuel, and the like.
In step S103, training samples are uniformly sampled from each cluster obtained by the cluster analysis of the experience database according to the preset sampling ratio, and the cumulative return value corresponding to each training sample is computed.
In the embodiment of the invention, uniform sampling within each cluster of the experience database selects representative training samples and keeps the training samples independent and identically distributed, so that training the policy network model with these samples effectively improves its training efficiency. From the vehicle state information, maximum-return action, and maximum return value in each training sample, the corresponding cumulative return value can be computed. Since each vehicle state is the result of the previous vehicle state and action, the optimal policy can be determined through the cumulative return value. The cumulative return value is also the output of the policy network model.
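The per-cluster uniform sampling may be sketched as follows, assuming the cluster analysis has already grouped the sample data into a dict keyed by cluster id; the at-least-one-sample-per-cluster rule is an illustrative assumption:

```python
import random

def sample_per_cluster(clusters, ratio, seed=0):
    """Uniformly draw the same preset sampling ratio from every cluster,
    so the training set stays representative of all clusters.
    `clusters` maps cluster id -> list of sample data."""
    rng = random.Random(seed)
    training = []
    for cid, samples in clusters.items():
        n = max(1, int(len(samples) * ratio))  # at least one per cluster
        training.extend(rng.sample(samples, n))  # without replacement
    return training

clusters = {0: list(range(100)), 1: list(range(100, 120))}
train = sample_per_cluster(clusters, ratio=0.1)
# 10 samples from cluster 0 and 2 from cluster 1 -> 12 in total
```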
Specifically, the cumulative return value Q(s_t, a_t) may be computed as r_0 + γr_1 + γ²r_2 + ..., where γ is a preset parameter and 0 ≤ γ < 1.
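This discounted sum can be computed right-to-left in one pass; a minimal sketch, assuming the per-moment return values are given as a list:

```python
def cumulative_return(rewards, gamma=0.9):
    """Discounted cumulative return Q = r0 + gamma*r1 + gamma^2*r2 + ...,
    folded right-to-left, assuming 0 <= gamma < 1."""
    q = 0.0
    for r in reversed(rewards):
        q = r + gamma * q
    return q

q = cumulative_return([1.0, 1.0, 1.0], gamma=0.5)
# 1 + 0.5*1 + 0.25*1 = 1.75
```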
In step S104, the decision network model for the current vehicle's autonomous driving is trained according to all training samples, the cumulative return value of each training sample, and the preset deep learning algorithm.
In the embodiment of the invention, the vehicle state information S_t and vehicle action information a_t in the training samples, together with the cumulative return value Q of the training samples, are stored in a preset data set, and the decision network model for autonomous driving is trained from this data set and the preset deep learning algorithm. Deep learning algorithms such as resilient backpropagation (Rprop), the backpropagation algorithm, or long short-term memory (LSTM) may be used.
Preferably, the interaction trial is repeated and the decision network model is trained multiple times. After each training round, whether the experience database satisfies a preset constraint condition is checked; when it does not, the sample data in the experience data that do not meet the requirement are removed, so that the experience data is updated in time and the quality of the training samples is improved.
Specifically, the constraint condition may be len(DS_h) < μ_rms && num(DS_h) < K_num, where DS_h is the experience database, len() computes the number of sample data in the experience database, num() counts the number of interaction trials, μ_rms is the maximum quantity of sample data, and K_num is the maximum number of interaction trials.
Specifically, whether a sample datum meets the requirement can be judged from the time interval between adjacent sample data in the experience data: when the time interval between the current sample datum and the previous one is smaller than a preset interval threshold, the current sample datum is determined not to meet the requirement.
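The constraint check and pruning described above may be sketched as follows; the (timestamp, triple) entry layout and the parameter names are illustrative assumptions:

```python
def update_experience(db, trial_count, max_samples, max_trials, min_dt):
    """Sketch of the experience-database update: when the constraint
    len(DS_h) < mu_rms && num(DS_h) < K_num fails, drop samples whose
    time interval to the previous kept sample is below the preset
    threshold. Each db entry is (timestamp, sample_triple)."""
    if len(db) < max_samples and trial_count < max_trials:
        return db  # constraint satisfied: keep the database as-is
    kept, last_t = [], None
    for t, triple in db:
        if last_t is None or t - last_t >= min_dt:
            kept.append((t, triple))
            last_t = t
    return kept

db = [(0.0, "a"), (0.05, "b"), (0.5, "c"), (0.52, "d"), (1.0, "e")]
pruned = update_experience(db, trial_count=10, max_samples=3,
                           max_trials=5, min_dt=0.1)
# "b" and "d" lie closer than 0.1 s to their predecessors -> dropped
```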
In the embodiment of the invention, the vehicle state information at different trial moments, the maximum-return action corresponding to that state information, and the return value corresponding to that action compose the sample data in the experience database. Cluster analysis is performed on all sample data, each resulting cluster is uniformly sampled, and the training samples for training the autonomous-driving policy network model are obtained; the experience database is updated according to the constraint condition after each training round to remove sample data that do not meet the requirement. The representativeness of the training samples is thereby improved and their dimensionality reduced, and the decision network model is trained with reward returns and deep learning, effectively improving the training efficiency, learning ability, and generalization ability of the decision network model.
One of ordinary skill in the art will appreciate that all or part of the steps of the method in the above embodiment can be completed by instructing the relevant hardware through a program, and the program can be stored in a computer-readable storage medium, such as ROM/RAM, a magnetic disk, or an optical disc.
Embodiment two:
Fig. 2 shows the structure of the device for generating a decision network model for autonomous vehicle driving provided by Embodiment two of the present invention. For convenience of description, only the parts related to the embodiment are shown, including:
A sample generation module 21, configured to generate the sample triple corresponding to each trial moment from the vehicle state information collected at that moment, a preset vehicle action set, and a preset reward function.
In the embodiment of the invention, after the vehicle state information of the current trial moment is collected, the action that yields the maximum return value, i.e. the maximum-return action, is found by traversing the preset vehicle action set according to the preset reward function, and the vehicle state information, the maximum-return action, and the return value of the maximum-return action are combined into a sample triple.
A cluster analysis module 22, configured to store all sample triples as sample data in the pre-built experience database, and to perform cluster analysis on all sample data in the experience database.
In the embodiment of the invention, the class of each sample datum (i.e. each sample triple) is computed by the classification model initialized with the preset clustering algorithm and the professional driving data, so that all sample data are assigned to the corresponding clusters. Cluster analysis thus discovers intrinsic properties or regularities of the samples in the experience database, so as to reduce the dimensionality or quantity of the training samples of the decision network model and streamline the training samples.
A training sampling module 23, configured to uniformly sample training samples from each cluster obtained by the cluster analysis of the experience database according to the preset sampling ratio, and to compute the cumulative return value corresponding to each training sample.
In the embodiment of the invention, uniform sampling within each cluster of the experience database selects representative training samples and keeps them independent and identically distributed, so that training the policy network model with these samples effectively improves its training efficiency. From the vehicle state information, maximum-return action, and maximum return value in each training sample, the corresponding cumulative return value can be computed; specifically, the cumulative return value Q(s_t, a_t) may be computed as r_0 + γr_1 + γ²r_2 + ..., where γ is a preset parameter and 0 ≤ γ < 1.
A model generation module 24, configured to train the decision network model for the current vehicle's autonomous driving from all training samples, the cumulative return value of each training sample, and the preset deep learning algorithm.
In the embodiment of the invention, the vehicle state information S_t and vehicle action information a_t in the training samples, together with the cumulative return value Q of the training samples, are stored in a preset data set, and the decision network model for autonomous driving is trained from this data set and the preset deep learning algorithm.
In the embodiment of the invention, the vehicle state information at different trial moments, the maximum-return action corresponding to that state information, and the return value corresponding to that action compose the sample data in the experience database; representative training samples are chosen from the sample data by cluster analysis and uniform sampling; and the decision network model is obtained by training the samples with reward returns and deep learning. Through the streamlining of the training samples, the reward returns, and deep learning, the training efficiency, learning ability, and generalization ability of the decision network model are effectively improved.
Embodiment three:
Fig. 3 shows the structure of the device for generating a decision network model for autonomous vehicle driving provided by Embodiment three of the present invention, including:
A sample generation module 31, configured to generate the sample triple corresponding to each trial moment from the vehicle state information collected at that moment, a preset vehicle action set, and a preset reward function.
In the embodiment of the invention, after the vehicle state information of the current trial moment is collected, the action that yields the maximum return value, i.e. the maximum-return action, is found by traversing the preset vehicle action set according to the preset reward function, and the vehicle state information, the maximum-return action, and the return value of the maximum-return action are combined into a sample triple.
A trial-end module 32, configured to end the current interaction trial when it is detected that an accident happens to the vehicle during the trial driving or that the vehicle completes the preset trial driving task, and to obtain the sample triple corresponding to each trial moment of the interaction trial.
In the embodiment of the invention, accidents during driving may include leaving the road, colliding, running out of fuel, and the like.
A cluster analysis module 33, configured to store all sample triples as sample data in the pre-built experience database, and to perform cluster analysis on all sample data in the experience database.
In the embodiment of the invention, the class of each sample datum (i.e. each sample triple) is computed by the classification model initialized with the preset clustering algorithm and the professional driving data, so that all sample data are assigned to the corresponding clusters. Cluster analysis thus discovers intrinsic properties or regularities of the samples in the experience database, so as to reduce the dimensionality or quantity of the training samples of the decision network model and streamline the training samples.
A training sampling module 34, configured to uniformly sample training samples from each cluster obtained by the cluster analysis of the experience database according to the preset sampling ratio, and to compute the cumulative return value corresponding to each training sample.
In the embodiment of the invention, uniform sampling within each cluster of the experience database selects representative training samples and keeps them independent and identically distributed, so that training the policy network model with these samples effectively improves its training efficiency. The corresponding cumulative return value can be computed from the vehicle state information, maximum-return action, and maximum return value in each training sample.
A model generation module 35, configured to train the decision network model for the current vehicle's autonomous driving from all training samples, the cumulative return value of each training sample, and the preset deep learning algorithm.
In the embodiment of the invention, the vehicle state information S_t and vehicle action information a_t in the training samples, together with the cumulative return value Q of the training samples, are stored in a preset data set, and the decision network model for autonomous driving is trained from this data set and the preset deep learning algorithm.
An experience update module 36, configured to detect whether the experience database satisfies the preset constraint condition, and to update the sample data in the experience database when the experience database does not satisfy the constraint condition.
In the embodiment of the invention, the interaction trial is repeated and the decision network model is trained multiple times. After each training round, whether the experience database satisfies the preset constraint condition is checked; when it does not, the sample data in the experience data that do not meet the requirement are removed, so that the experience data is updated in time and the quality of the training samples is improved.
Specifically, the constraint condition may be len(DS_h) < μ_rms && num(DS_h) < K_num, where DS_h is the experience database, len() computes the number of sample data in the experience database, num() counts the number of interaction trials, μ_rms is the maximum quantity of sample data, and K_num is the maximum number of interaction trials.
Specifically, whether a sample datum meets the requirement can be judged from the time interval between adjacent sample data in the experience data: when the time interval between the current sample datum and the previous one is smaller than a preset interval threshold, the current sample datum is determined not to meet the requirement.
Preferably, the sample generation module 31 includes a vehicle state collection module 311, a vehicle action search module 312, and a sample generation submodule 313, wherein:
the vehicle state collection module 311 is configured to collect the vehicle state information of the current trial moment;
the vehicle action search module 312 is configured to search the vehicle action set for the maximum-return action according to the vehicle state information of the current trial moment and the reward function, and to order the vehicle to perform the maximum-return action; and
the sample generation submodule 313 is configured to combine the vehicle state information of the current trial moment, the maximum-return action, and the return value of the maximum-return action into the sample triple corresponding to the current trial moment.
Preferably, the cluster analysis module 33 includes a classification model initialization module 331 and a class generation module 332, wherein:
the classification model initialization module 331 is configured to initialize, according to the preset clustering algorithm and the professional driving data collected in advance, the classification model used for cluster analysis of the experience database; and
the class generation module 332 is configured to compute, by the classification model, the class to which the vehicle state information in each sample datum of the experience database belongs, so as to obtain the classes of all sample data in the experience database.
In embodiments of the present invention, it is corresponding most by the car status information at different tests moment, the car status information Big return value action and the maximal rewards value act the sample data in corresponding return value composition experience database, and pass through Clustering and uniform sampling choose representative training sample from sample data, pass through prize payouts and deep learning Training sample is trained and obtains decision networks model, carrying out test of many times to decision-making network model repeatedly trains, and every Experience database is updated after secondary training, so as to by the simplifying of training sample, prize payouts and deep learning, be effectively improved Training effectiveness, learning ability and the generalization ability of decision networks model.
In the embodiments of the present invention, each module of the apparatus for generating a decision network model for automatic vehicle driving may be implemented by a corresponding hardware or software module; each module may be an independent software or hardware module, or the modules may be integrated into a single software or hardware module, which is not intended to limit the present invention.
The foregoing is merely a description of preferred embodiments of the present invention and is not intended to limit the invention. Any modifications, equivalent substitutions, and improvements made within the spirit and principles of the present invention shall be included within its scope of protection.

Claims (10)

1. A method for generating a decision network model for automatic vehicle driving, characterized in that the method comprises the following steps:
generating a sample triple corresponding to each test moment according to vehicle state information collected at that moment, a preset vehicle action set, and a preset reward function;
storing all sample triples as sample data in a pre-built experience database, and performing cluster analysis on all of the sample data in the experience database;
uniformly collecting training samples from each cluster obtained by the cluster analysis of the experience database according to a preset sampling ratio, and calculating a cumulative return value corresponding to each training sample;
training a decision network model for automatic driving of the current vehicle according to all of the training samples, the cumulative return value of each training sample, and a preset deep learning algorithm.
2. The method according to claim 1, characterized in that after the step of training the decision network model for automatic vehicle driving, the method further comprises:
detecting whether the experience database satisfies a preset constraint condition, and updating the sample data in the experience database when the experience database does not satisfy the constraint condition.
3. The method according to claim 1, characterized in that the step of generating the sample triple corresponding to each test moment according to the vehicle state information collected at that moment, the preset vehicle action set, and the preset reward function comprises:
collecting the vehicle state information at the current test moment;
searching the vehicle action set for the action with the maximum reward value according to the vehicle state information at the current test moment and the reward function, and sending the maximum-reward action to the vehicle;
combining the vehicle state information at the current test moment, the maximum-reward action, and the return value of that action to obtain the sample triple corresponding to the current test moment.
4. The method according to claim 1, characterized in that after the step of generating the sample triple corresponding to each test moment and before the step of storing all sample triples as sample data in the pre-built experience database, the method further comprises:
terminating the current interaction test when it is detected that the vehicle has an accident during the test drive or completes a preset test driving task, and obtaining the sample triple corresponding to each test moment in the interaction process of that test.
5. The method according to claim 1, characterized in that the step of performing cluster analysis on all of the sample data in the experience database comprises:
initializing, according to a preset clustering algorithm and professional driving data collected in advance, a classification model for performing cluster analysis on the experience database;
computing, via the classification model, the category to which the vehicle state information in each sample datum in the experience database belongs, thereby obtaining the categories of all sample data in the experience database.
6. An apparatus for generating a decision network model for automatic vehicle driving, characterized in that the apparatus comprises:
a sample generation module, configured to generate a sample triple corresponding to each test moment according to vehicle state information collected at that moment, a preset vehicle action set, and a preset reward function;
a cluster analysis module, configured to store all sample triples as sample data in a pre-built experience database, and to perform cluster analysis on all of the sample data in the experience database;
a training sampling module, configured to uniformly collect training samples from each cluster obtained by the cluster analysis of the experience database according to a preset sampling ratio, and to calculate a cumulative return value corresponding to each training sample; and
a model generation module, configured to train a decision network model for automatic driving of the current vehicle according to all of the training samples, the cumulative return value of each training sample, and a preset deep learning algorithm.
7. The apparatus according to claim 6, characterized in that the apparatus further comprises:
an experience update module, configured to detect whether the experience database satisfies a preset constraint condition, and to update the sample data in the experience database when the experience database does not satisfy the constraint condition.
8. The apparatus according to claim 6, characterized in that the sample generation module comprises:
a vehicle state acquisition module, configured to collect the vehicle state information at the current test moment;
a vehicle action search module, configured to search the vehicle action set for the action with the maximum reward value according to the vehicle state information at the current test moment and the reward function, and to instruct the vehicle to execute that action; and
a sample generation submodule, configured to combine the vehicle state information at the current test moment, the maximum-reward action, and the return value of that action to obtain the sample triple corresponding to the current test moment.
9. The apparatus according to claim 6, characterized in that the apparatus further comprises:
a test termination module, configured to terminate the current interaction test when it is detected that the vehicle has an accident during the test drive or completes a preset test driving task, and to obtain the sample triple corresponding to each test moment in the interaction process of that test.
10. The apparatus according to claim 6, characterized in that the cluster analysis module comprises:
a classification model initialization module, configured to initialize, according to a preset clustering algorithm and professional driving data collected in advance, a classification model for performing cluster analysis on the experience database; and
a category generation module, configured to compute, via the classification model, the category to which the vehicle state information in each sample datum in the experience database belongs, thereby obtaining the categories of all sample data in the experience database.
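Taken together, the method claims describe a collect, cluster, sample, and train loop with a constraint-triggered update of the experience database. A schematic sketch of one such round follows; every callable is supplied by the caller, and all names are purely illustrative rather than drawn from the patent:

```python
def generate_model(experience_db, classify, sample, fit, constraint_ok):
    """One round of the claimed method: cluster the experience database,
    draw a training set, train a model, and update (here: reset) the
    database when it no longer satisfies the preset constraint."""
    clusters = {}
    for triple in experience_db:
        clusters.setdefault(classify(triple), []).append(triple)
    training_set = sample(clusters)          # uniform per-cluster sampling
    model = fit(training_set)                # stands in for deep learning
    if not constraint_ok(experience_db):
        experience_db.clear()                # simplest possible "update"
    return model

# Toy stand-ins for demonstration only.
db = [((0.0,), "keep", 1.0), ((5.0,), "brake", 0.5)]
model = generate_model(
    db,
    classify=lambda t: "low" if t[0][0] < 3 else "high",
    sample=lambda clusters: [s for v in clusters.values() for s in v],
    fit=len,                                 # "model" = training-set size
    constraint_ok=lambda d: len(d) < 10,
)
```

The point of the sketch is the control flow, not the components: clustering, sampling, training, and the database constraint are all pluggable, which mirrors how the claims keep the clustering algorithm, sampling ratio, and deep learning algorithm "preset" rather than fixed.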
CN201710201086.8A 2017-03-30 2017-03-30 Method and device for generating decision network model for automatic vehicle driving Active CN107169567B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710201086.8A CN107169567B (en) 2017-03-30 2017-03-30 Method and device for generating decision network model for automatic vehicle driving

Publications (2)

Publication Number Publication Date
CN107169567A true CN107169567A (en) 2017-09-15
CN107169567B CN107169567B (en) 2020-04-07

Family

ID=59849244

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710201086.8A Active CN107169567B (en) 2017-03-30 2017-03-30 Method and device for generating decision network model for automatic vehicle driving

Country Status (1)

Country Link
CN (1) CN107169567B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102109821A (en) * 2010-12-30 2011-06-29 中国科学院自动化研究所 System and method for controlling adaptive cruise of vehicles
CN103381826A (en) * 2013-07-31 2013-11-06 中国人民解放军国防科学技术大学 Adaptive cruise control method based on approximate policy iteration
CN105109485A (en) * 2015-08-24 2015-12-02 奇瑞汽车股份有限公司 Driving method and system
CN106295637A (en) * 2016-07-29 2017-01-04 电子科技大学 A kind of vehicle identification method based on degree of depth study with intensified learning

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
WEI XIA ET AL: "A Control Strategy of Autonomous Vehicles Based on Deep Reinforcement Learning", 2016 9th International Symposium on Computational Intelligence and Design *
MAO, Zhe: "Research on Recognition Methods for Motor Vehicle Fatigue Driving Behavior", China Doctoral Dissertations Full-text Database, Engineering Science and Technology II *

Cited By (61)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107544516A (en) * 2017-10-11 2018-01-05 苏州大学 Automated driving system and method based on relative entropy depth against intensified learning
CN107826105A (en) * 2017-10-31 2018-03-23 清华大学 Translucent automatic Pilot artificial intelligence system and vehicle
CN109747655A (en) * 2017-11-07 2019-05-14 北京京东尚科信息技术有限公司 Steering instructions generation method and device for automatic driving vehicle
CN109747655B (en) * 2017-11-07 2021-10-15 北京京东乾石科技有限公司 Driving instruction generation method and device for automatic driving vehicle
CN109752952B (en) * 2017-11-08 2022-05-13 华为技术有限公司 Method and device for acquiring multi-dimensional random distribution and strengthening controller
CN109752952A (en) * 2017-11-08 2019-05-14 华为技术有限公司 Method and device for acquiring multi-dimensional random distribution and strengthening controller
CN107862346A (en) * 2017-12-01 2018-03-30 驭势科技(北京)有限公司 A kind of method and apparatus for carrying out driving strategy model training
CN107862346B (en) * 2017-12-01 2020-06-30 驭势科技(北京)有限公司 Method and equipment for training driving strategy model
CN109901446A (en) * 2017-12-08 2019-06-18 广州汽车集团股份有限公司 Controlling passing of road junction, apparatus and system
US11348455B2 (en) 2017-12-08 2022-05-31 Guangzhou Automobile Group Co., Ltd. Intersection traffic control method, apparatus and system
CN109901446B (en) * 2017-12-08 2020-07-07 广州汽车集团股份有限公司 Intersection passage control method, device and system
CN110196587A (en) * 2018-02-27 2019-09-03 中国科学院深圳先进技术研究院 Vehicular automatic driving control strategy model generating method, device, equipment and medium
CN110378460B (en) * 2018-04-13 2022-03-08 北京智行者科技有限公司 Decision making method
CN110378460A (en) * 2018-04-13 2019-10-25 北京智行者科技有限公司 Decision-making technique
CN108647789B (en) * 2018-05-15 2022-04-19 浙江大学 Intelligent body depth value function learning method based on state distribution sensing sampling
CN108647789A (en) * 2018-05-15 2018-10-12 浙江大学 A kind of intelligent body deep value function learning method based on the sampling of state distributed awareness
CN108995655A (en) * 2018-07-06 2018-12-14 北京理工大学 A kind of driver's driving intention recognition methods and system
CN110764496A (en) * 2018-07-09 2020-02-07 株式会社日立制作所 Automatic driving assistance device and method thereof
CN110764496B (en) * 2018-07-09 2023-10-17 株式会社日立制作所 Automatic driving assistance device and method thereof
CN108944944A (en) * 2018-07-09 2018-12-07 深圳市易成自动驾驶技术有限公司 Automatic Pilot model training method, terminal and readable storage medium storing program for executing
CN108932840A (en) * 2018-07-17 2018-12-04 北京理工大学 Automatic driving vehicle urban intersection passing method based on intensified learning
CN110738221B (en) * 2018-07-18 2024-04-26 华为技术有限公司 Computing system and method
CN110738221A (en) * 2018-07-18 2020-01-31 华为技术有限公司 operation system and method
CN110850861A (en) * 2018-07-27 2020-02-28 通用汽车环球科技运作有限责任公司 Attention-based hierarchical lane change depth reinforcement learning
CN110850861B (en) * 2018-07-27 2023-05-23 通用汽车环球科技运作有限责任公司 Attention-based hierarchical lane-changing depth reinforcement learning
CN110824912A (en) * 2018-08-08 2020-02-21 华为技术有限公司 Method and apparatus for training a control strategy model for generating an autonomous driving strategy
CN111091020A (en) * 2018-10-22 2020-05-01 百度在线网络技术(北京)有限公司 Automatic driving state distinguishing method and device
CN109344969B (en) * 2018-11-01 2022-04-08 石家庄创天电子科技有限公司 Neural network system, training method thereof, and computer-readable medium
CN109344969A (en) * 2018-11-01 2019-02-15 石家庄创天电子科技有限公司 Nerve network system and its training method and computer-readable medium
CN111325230A (en) * 2018-12-17 2020-06-23 上海汽车集团股份有限公司 Online learning method and online learning device of vehicle lane change decision model
CN111325230B (en) * 2018-12-17 2023-09-12 上海汽车集团股份有限公司 Online learning method and online learning device for vehicle lane change decision model
CN109871010A (en) * 2018-12-25 2019-06-11 南方科技大学 method and system based on reinforcement learning
CN109739216A (en) * 2019-01-25 2019-05-10 深圳普思英察科技有限公司 The test method and system of the practical drive test of automated driving system
CN109934171B (en) * 2019-03-14 2020-03-17 合肥工业大学 Online perception method for passive driving state of driver based on hierarchical network model
CN109934171A (en) * 2019-03-14 2019-06-25 合肥工业大学 Driver's passiveness driving condition online awareness method based on layered network model
US11704554B2 (en) 2019-05-06 2023-07-18 Baidu Usa Llc Automated training data extraction method for dynamic models for autonomous driving vehicles
CN111899594A (en) * 2019-05-06 2020-11-06 百度(美国)有限责任公司 Automated training data extraction method for dynamic models of autonomous vehicles
CN112100787B (en) * 2019-05-28 2023-12-08 深圳市丰驰顺行信息技术有限公司 Vehicle motion prediction method, device, electronic equipment and storage medium
CN112100787A (en) * 2019-05-28 2020-12-18 顺丰科技有限公司 Vehicle motion prediction method, device, electronic device, and storage medium
CN110160804A (en) * 2019-05-31 2019-08-23 中国科学院深圳先进技术研究院 A kind of test method of automatic driving vehicle, apparatus and system
CN110160804B (en) * 2019-05-31 2020-07-31 中国科学院深圳先进技术研究院 Test method, device and system for automatically driving vehicle
CN110363295A (en) * 2019-06-28 2019-10-22 电子科技大学 A kind of intelligent vehicle multilane lane-change method based on DQN
CN110478911A (en) * 2019-08-13 2019-11-22 苏州钛智智能科技有限公司 The unmanned method of intelligent game vehicle and intelligent vehicle, equipment based on machine learning
CN110673602A (en) * 2019-10-24 2020-01-10 驭势科技(北京)有限公司 Reinforced learning model, vehicle automatic driving decision method and vehicle-mounted equipment
CN110673602B (en) * 2019-10-24 2022-11-25 驭势科技(北京)有限公司 Reinforced learning model, vehicle automatic driving decision method and vehicle-mounted equipment
CN110991095A (en) * 2020-03-05 2020-04-10 北京三快在线科技有限公司 Training method and device for vehicle driving decision model
CN110991095B (en) * 2020-03-05 2020-07-03 北京三快在线科技有限公司 Training method and device for vehicle driving decision model
CN111426933A (en) * 2020-05-19 2020-07-17 浙江巨磁智能技术有限公司 Safety type power electronic module and safety detection method thereof
CN111443621A (en) * 2020-06-16 2020-07-24 深圳市城市交通规划设计研究中心股份有限公司 Model generation method, model generation device and electronic equipment
CN111443621B (en) * 2020-06-16 2020-10-27 深圳市城市交通规划设计研究中心股份有限公司 Model generation method, model generation device and electronic equipment
CN112201070A (en) * 2020-09-29 2021-01-08 上海交通大学 Deep learning-based automatic driving expressway bottleneck section behavior decision method
CN112924177A (en) * 2021-04-02 2021-06-08 哈尔滨理工大学 Rolling bearing fault diagnosis method for improved deep Q network
CN113511222A (en) * 2021-08-27 2021-10-19 清华大学 Scene self-adaptive vehicle interactive behavior decision and prediction method and device
CN113511222B (en) * 2021-08-27 2023-09-26 清华大学 Scene self-adaptive vehicle interaction behavior decision and prediction method and device
CN114332500A (en) * 2021-09-14 2022-04-12 腾讯科技(深圳)有限公司 Image processing model training method and device, computer equipment and storage medium
CN113807503B (en) * 2021-09-28 2024-02-09 中国科学技术大学先进技术研究院 Autonomous decision making method, system, device and terminal suitable for intelligent automobile
CN113807503A (en) * 2021-09-28 2021-12-17 中国科学技术大学先进技术研究院 Autonomous decision making method, system, device and terminal suitable for intelligent automobile
CN114624645B (en) * 2022-03-10 2022-09-30 扬州宇安电子科技有限公司 Miniature rotor unmanned aerial vehicle radar reconnaissance system based on micro antenna array
CN114624645A (en) * 2022-03-10 2022-06-14 扬州宇安电子科技有限公司 Miniature rotor unmanned aerial vehicle radar reconnaissance system based on micro antenna array
CN114880938A (en) * 2022-05-16 2022-08-09 重庆大学 Method for realizing decision of automatically driving automobile behavior
CN116757272A (en) * 2023-07-03 2023-09-15 西湖大学 Continuous motion control reinforcement learning framework and learning method

Also Published As

Publication number Publication date
CN107169567B (en) 2020-04-07

Similar Documents

Publication Publication Date Title
CN107169567A (en) The generation method and device of a kind of decision networks model for Vehicular automatic driving
Li et al. Humanlike driving: Empirical decision-making system for autonomous vehicles
CN107229973A (en) The generation method and device of a kind of tactful network model for Vehicular automatic driving
CN109709956B (en) Multi-objective optimized following algorithm for controlling speed of automatic driving vehicle
CN103364006B (en) For determining the system and method for vehicle route
CN106991251B (en) Cellular machine simulation method for highway traffic flow
CN109466543A (en) Plan autokinetic movement
CN113044064B (en) Vehicle self-adaptive automatic driving decision method and system based on meta reinforcement learning
CN110196587A (en) Vehicular automatic driving control strategy model generating method, device, equipment and medium
CN107310550A (en) Road vehicles travel control method and device
Scheel et al. Situation assessment for planning lane changes: Combining recurrent models and prediction
Bayar et al. Impact of different spacing policies for adaptive cruise control on traffic and energy consumption of electric vehicles
CN115601954B (en) Lane change judgment method, device, equipment and medium for intelligent networked fleet
CN113715842B (en) High-speed moving vehicle control method based on imitation learning and reinforcement learning
Koenig et al. Bridging the gap between open loop tests and statistical validation for highly automated driving
CN117668413A (en) Automatic driving comprehensive decision evaluation method and device considering multiple types of driving elements
Jia et al. An LSTM-based speed predictor based on traffic simulation data for improving the performance of energy-optimal adaptive cruise control
Wen et al. Modeling human driver behaviors when following autonomous vehicles: An inverse reinforcement learning approach
CN108839655A (en) A kind of cooperating type self-adaptation control method based on minimum safe spacing
CN114954498A (en) Reinforced learning lane change behavior planning method and system based on simulated learning initialization
Jebessa et al. Analysis of reinforcement learning in autonomous vehicles
CN115096305A (en) Intelligent driving automobile path planning system and method based on generation of countermeasure network and simulation learning
Mao et al. Deep learning based vehicle position estimation for human drive vehicle at connected freeway
Tang et al. Research on decision-making of lane-changing of automated vehicles in highway confluence area based on deep reinforcement learning
Zhang et al. Lane Change Decision Algorithm Based on Deep Q Network for Autonomous Vehicles

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant