CN109255442B - Training method, device and readable medium for control decision module based on artificial intelligence - Google Patents

Training method, device and readable medium for control decision module based on artificial intelligence

Info

Publication number
CN109255442B
Authority
CN
China
Prior art keywords
intervention
intelligent equipment
data
decision module
control decision
Prior art date
Legal status
Active
Application number
CN201811132192.6A
Other languages
Chinese (zh)
Other versions
CN109255442A (en)
Inventor
王凡
周波
陈科
来杰
周古月
Current Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd filed Critical Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN201811132192.6A
Publication of CN109255442A
Application granted
Publication of CN109255442B

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00: Computing arrangements using knowledge-based models
    • G06N5/02: Knowledge representation; Symbolic representation
    • G06N5/022: Knowledge engineering; Knowledge acquisition
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02P: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P90/00: Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P90/02: Total factory control, e.g. smart factories, flexible manufacturing systems [FMS] or integrated manufacturing systems [IMS]

Landscapes

  • Engineering & Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Computational Linguistics (AREA)
  • Computing Systems (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Feedback Control In General (AREA)
  • Testing And Monitoring For Control Systems (AREA)

Abstract

The invention provides a training method, a device, and a readable medium for a control decision module based on artificial intelligence. The method comprises the following steps: collecting intervention data of an intelligent device in a field test scenario; and training a control decision module in the intelligent device according to the intervention data of the intelligent device. The training method is an intervention learning process; through the intervention learning, the control decision module can be trained more effectively, which improves the control and decision-making capability of the control decision module in the intelligent device and enhances the intelligence of the control decision module.

Description

Training method, device and readable medium for control decision module based on artificial intelligence
[ technical field ]
The invention relates to the technical field of computer application, and in particular to a training method, a device, and a readable medium for a control decision module based on artificial intelligence.
[ background of the invention ]
Artificial Intelligence (AI) is a new technical science that researches and develops theories, methods, techniques, and application systems for simulating, extending, and expanding human intelligence. Artificial intelligence is a branch of computer science that attempts to understand the essence of intelligence and to produce a new kind of intelligent machine that can react in a manner similar to human intelligence. Research in this field includes robotics, speech recognition, image recognition, natural language processing, and expert systems, among others.
With the development of artificial intelligence, many intelligent devices need a control decision module, which is trained so that it learns to perform hardware control and decision-making for the intelligent device. For example, intelligent devices such as unmanned aerial vehicles, unmanned vehicles, and robots are provided with control decision modules that learn control and decision-making. In the prior art, a control decision module realizes hardware control and decision-making in two main ways: one is classical control, in which the control signal is obtained through physical modeling and precise calculation based on a mathematical model; the other is intelligent control, which learns human operation or improves directly from feedback signals. The learning method corresponding to the former is usually Supervised Learning, and the learning method corresponding to the latter is usually Reinforcement Learning. The former has a major drawback in application: it relies on expensive expert data which, besides being costly to acquire, often cannot cover the whole required state space; once a state absent from the training data occurs, control may fail and become very unstable. Reinforcement learning is more effective in practical application and can learn autonomously and more stably. However, for some hardware, such as unmanned vehicles, a great obstacle exists, namely the problem of training cost. Typically, reinforcement learning requires going through failures and learning from these experiences. Taking an unmanned aerial vehicle learning to avoid obstacles as an example, during training it would need to accumulate failure experience through collisions, a cost that is generally unacceptable.
From the above it can be seen that training a control decision module by reinforcement learning in the prior art cannot be realized in practical application, while the training mode of supervised learning has weak generalization capability, so that the control decision module in the trained intelligent device cannot respond when a state outside the training data occurs and control failure results; the intelligence of the control decision module of the intelligent device is therefore poor.
[ summary of the invention ]
The invention provides a training method, a device, and a readable medium for a control decision module based on artificial intelligence, which are used for improving the intelligence of the control decision module of an intelligent device.
The invention provides a training method of a control decision module based on artificial intelligence, wherein the control decision module is arranged in intelligent equipment, and the method comprises the following steps:
acquiring intervention data of the intelligent equipment in a field test scene;
and training the control decision module in the intelligent equipment according to the intervention data of the intelligent equipment.
Further optionally, in the method described above, in a field test scenario, collecting intervention data of the smart device specifically includes:
and in the field test scene, acquiring the intervention data of the intelligent equipment corresponding to the intervention operation of an operator on the intelligent equipment.
Further optionally, in the method as described above, the intervention data includes status data of the smart device at the time of intervention, and output signals and/or status data of the smart device in response to the intervention operation.
Further optionally, in the method described above, in a field test scenario, collecting intervention data of the smart device specifically includes:
in a field test scene, acquiring intervention data generated by the intelligent equipment according to a preset guarantee rule.
Further optionally, in the method as described above, the intervention data includes intervention conditions in the guarantee rules, and output signals and/or status data of the smart device in response to the intervention conditions.
Further optionally, in the method, training the control decision module in the smart device according to the intervention data of the smart device specifically includes:
and training the control decision module in the intelligent equipment by adopting a reinforcement learning training mode or a supervision learning training mode according to the intervention data of the intelligent equipment.
The invention provides a training device of a control decision module based on artificial intelligence, wherein the control decision module is arranged in intelligent equipment, and the training device comprises:
the acquisition module is used for acquiring intervention data of the intelligent equipment in a field test scene;
and the training module is used for training the control decision module in the intelligent equipment according to the intervention data of the intelligent equipment.
Further optionally, in the apparatus as described above, the acquisition module is specifically configured to:
and in the field test scene, acquiring the corresponding intervention data of the intelligent equipment when an operator performs intervention operation on the intelligent equipment.
Further optionally, in the apparatus as described above, the intervention data includes state data of the smart device at the time of intervention, and output signals and/or state data of the smart device in response to the intervention operation.
Further optionally, in the apparatus as described above, the acquisition module is specifically configured to:
in a field test scene, acquiring intervention data generated by the intelligent equipment according to a preset guarantee rule.
Further optionally, in the apparatus as described above, the intervention data includes intervention conditions in the guarantee rules, and output signals and/or status data of the smart device in response to the intervention conditions.
Further optionally, in the apparatus as described above, the training module is specifically configured to:
and training the control decision module in the intelligent equipment by adopting a reinforcement learning training mode or a supervision learning training mode according to the intervention data of the intelligent equipment.
The present invention also provides a computer apparatus, the apparatus comprising:
one or more processors;
a memory for storing one or more programs;
when the one or more programs are executed by the one or more processors, the one or more processors are caused to implement the training method of the artificial intelligence based control decision module as described above.
The present invention also provides a computer readable medium having stored thereon a computer program which, when executed by a processor, implements the artificial intelligence based control decision module training method as described above.
The training method, device and readable medium of the control decision module based on artificial intelligence collect intervention data of the intelligent device in a field test scenario, and train the control decision module in the intelligent device according to the intervention data of the intelligent device. The training method is an intervention learning process; through the intervention learning, the control decision module can be trained more effectively, which improves the control and decision-making capability of the control decision module in the intelligent device and enhances the intelligence of the control decision module.
[ description of the drawings ]
FIG. 1 is a flowchart of an embodiment of a method for training an artificial intelligence-based control decision module according to the present invention.
Fig. 2 is a schematic diagram of controlling an intelligent device by using the artificial intelligence-based control decision module according to this embodiment.
FIG. 3 is an exemplary diagram of the expert intervention scenario of the present invention.
FIG. 4 is a block diagram of an embodiment of an artificial intelligence based control decision module training apparatus of the present invention.
FIG. 5 is a block diagram of an embodiment of a computer device of the present invention.
Fig. 6 is an exemplary diagram of a computer device provided by the present invention.
[ detailed description ]
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be described in detail with reference to the accompanying drawings and specific embodiments.
In the field test scenarios of intelligent devices such as unmanned vehicles or unmanned aerial vehicles, there also exists a relatively important class of data: test data. Such test data is typically collected under human monitoring and intervention, for example the drive test data of an unmanned vehicle. Its characteristic is as follows: most of the time, the intelligent device relies on its control decision module to decide and act autonomously, but an operator monitors its state at all times and intervenes as soon as a problem occurs. Such test data usually exists in large quantities in real engineering, but is rarely used effectively. The invention provides a training scheme for a control decision module based on artificial intelligence that conveniently exploits the test data produced in this practical scenario. In this scenario, a special kind of reinforcement learning can be performed on such data, which may be called Learning From Intervention (LFI). The purpose of such learning is to use the test data effectively to strengthen the control decision module of the intelligent device in return, so that the control decision module can be further improved during the test process.
FIG. 1 is a flowchart of an embodiment of a training method for an artificial intelligence based control decision module according to the present invention. As shown in fig. 1, the training method of the control decision module based on artificial intelligence in this embodiment may specifically include the following steps:
100. acquiring intervention data of intelligent equipment in a field test scene;
101. and training a control decision module in the intelligent equipment according to the intervention data of the intelligent equipment.
The execution subject of the training method of the control decision module based on artificial intelligence of the embodiment can be a training device of the control decision module based on artificial intelligence, and the control decision module is arranged in the intelligent device. The training device can also be arranged in the intelligent equipment to realize the training of a control decision module in the intelligent equipment.
In particular, in practical applications, in order to test an intelligent device effectively, the device needs to be placed in the field and tested there. Before the field test, the control decision module in the intelligent device can be trained through supervised learning, or through reinforcement learning in a simulated environment; in this case the control decision module in the intelligent device is warm-started. In a field test scenario, the control decision module can then control the intelligent device to operate while an expert monitors the whole test process; when the expert observes an undesirable situation, the expert can intervene in the control exercised by the control decision module and thereby influence the operation of the intelligent device. Alternatively, in a field test scenario such as that of an unmanned vehicle, a tester can sit in the vehicle; during the test, for example when the vehicle fails to avoid an obstacle or violates a traffic rule, the tester can intervene in the decision control module so that the vehicle successfully avoids the obstacle or complies with the rule. Furthermore, the intelligent device may store some preset guarantee rules that guarantee its normal operation. For an unmanned vehicle, a preset guarantee rule may be, for example: when an obstacle exists within a preset surrounding distance, the speed must not exceed a preset speed threshold; or when an obstacle is detected within 50 meters ahead, the vehicle should decelerate. Thus, while the intelligent device operates according to the control decision module, if the current state is detected to meet an intervention condition in a guarantee rule, the device is intervened so that it operates according to the guarantee rule, as sketched below.
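As an illustration of how such a preset guarantee rule might be checked against the device state, the following is a minimal Python sketch. The rule thresholds, the state and control field names, and the `GuaranteeRule`/`check_rules` helpers are assumptions introduced here for illustration, not part of the patented scheme.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Optional

State = Dict[str, Any]      # hypothetical sensor snapshot of the device
Control = Dict[str, float]  # hypothetical low-level control signal

@dataclass
class GuaranteeRule:
    """A preset guarantee rule: when condition(state) holds (the
    intervention condition), action(state) overrides the agent."""
    name: str
    condition: Callable[[State], bool]
    action: Callable[[State], Control]

# Illustrative rules mirroring the examples in the text; thresholds are assumed.
RULES: List[GuaranteeRule] = [
    GuaranteeRule(
        name="speed cap near obstacles",
        condition=lambda s: s["nearest_obstacle_m"] < 10.0 and s["speed_mps"] > 5.0,
        action=lambda s: {"throttle": 0.0, "brake": 0.5},
    ),
    GuaranteeRule(
        name="decelerate for obstacle within 50 m ahead",
        condition=lambda s: s["nearest_obstacle_m"] < 50.0,
        action=lambda s: {"throttle": 0.0, "brake": 0.2},
    ),
]

def check_rules(state: State) -> Optional[Control]:
    """Return the overriding control of the first triggered rule, else None."""
    for rule in RULES:
        if rule.condition(state):
            return rule.action(state)
    return None
```

Every time a rule fires, the triggering state and the overriding control can be logged; this is exactly the rule-generated intervention data discussed below.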
Correspondingly, the step 100 may specifically include the following two cases:
in the first case, in a field test scenario, intervention data of the corresponding intelligent device is collected when an operator performs an intervention operation on the intelligent device.
The operator in the first case may be the expert in the above embodiment, or a tester of the intelligent device. The intervention data of the smart device in this case may include the state data of the smart device at the time of intervention, and the output signals and/or state data of the smart device in response to the intervention operation.
For example, the state data of the intelligent device can be its speed, position and the like at the time it is intervened. Such data can be acquired by the speedometer, camera, radar, and various sensors installed on the intelligent device. The output signal of the intelligent device in response to the intervention operation can be the control signal output while the device is being intervened, such as a brake signal or a steering signal. The state data of the intelligent device in response to the intervention operation identifies the state of the device after responding to the intervention, such as the speed being reduced to 0 or another value after the intervention, or the lane being changed after the intervention, and so on. A record combining these pieces is sketched below.
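To make the composition of such intervention data concrete, the following is a minimal sketch of a record type that could be logged per intervention; all field names are illustrative assumptions, not terms from the patent.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class InterventionRecord:
    """One logged intervention (illustrative; field names are assumed)."""
    # State data of the smart device at the moment of intervention,
    # e.g. speed, position, raw camera/radar/sensor readings.
    state_at_intervention: dict
    # Output signal of the device in response to the intervention,
    # e.g. a brake or steering signal.
    output_signal: Optional[dict] = None
    # State data of the device after responding to the intervention,
    # e.g. speed reduced to 0, or lane changed.
    state_after_intervention: Optional[dict] = None

# Example: a braking intervention that brings the speed to 0.
record = InterventionRecord(
    state_at_intervention={"speed_mps": 12.0, "position": (3.1, 4.2)},
    output_signal={"brake": 1.0},
    state_after_intervention={"speed_mps": 0.0, "position": (3.5, 4.2)},
)
```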
Specifically, after the intervention data of the intelligent device is collected, according to the intervention data of the intelligent device, a process of training a control decision module in the intelligent device may be referred to as intervention learning on the control decision module, and the intervention learning process may be based on supervised learning or based on reinforcement learning.
For example, the state data when the intelligent device is intervened and the output signal of the intelligent device in response to the intervention operation can be used as supervised learning data, and the control decision module can undergo intervention learning through a supervised learning training mode. Alternatively, the state data when the intelligent device is intervened and the state data of the intelligent device in response to the intervention operation can be used as reinforcement learning data, with the control decision module undergoing intervention learning through a reinforcement learning training mode. In practical application, the intervention learning process can include both the supervised learning training mode and the reinforcement learning training mode, so that the control decision module is trained more effectively, the control and decision-making capability of the control decision module in the intelligent device is improved, and its intelligence is enhanced.
In the second case, in the field test scenario, the intervention data generated by the intelligent device according to the preset assurance rules is collected. Specifically, when the state data of the intelligent device reaches the intervention condition in the guarantee rule, the intelligent device is intervened, so that the intervention data of the intelligent device can be collected.
The corresponding intervention data may then comprise the intervention conditions in the guarantee rules, as well as the output signals of the smart device in response to the intervention conditions and/or the state data of the smart device in response to the intervention conditions. An intervention condition in the guarantee rules may specifically be the state data that triggers an intervention on the intelligent device.
Similarly, after the intervention data of the intelligent equipment is collected, the intervention conditions in the guarantee rule and the output signals of the intelligent equipment responding to the intervention conditions can be used as supervised learning data, and the control decision module is subjected to intervention learning by adopting a supervised learning training mode. Or the intervention condition in the guarantee rule and the state data of the intelligent equipment responding to the intervention condition can be used as the reinforcement learning data, and the reinforcement learning training mode is adopted to perform the intervention learning on the control decision module. Similarly, in practical application, the process of intervention learning may include the above training mode of supervised learning and the training mode of reinforcement learning at the same time, so as to enhance the intelligence of the control decision module.
In addition, the intelligent device may also receive no training before being tested in the field test scenario; in that case, the control decision module in the intelligent device is cold-started, and the control decision module is trained mainly through the intervention learning process. Compared with the warm start described above, this process requires collecting more intervention data for learning.
Based on the above, the intelligent device is subject to three kinds of control during operation: control by the control decision module, control by the preset guarantee rules, and control by the operator. In normal driving, neither the preset guarantee rules nor the operator's control is triggered, and the device is controlled mainly by the control decision module. When either the preset guarantee rules or the operator's control is triggered, the control right of the control decision module is preempted by that control. If the control of the preset guarantee rules and the control of the operator are triggered simultaneously, the operator's control takes priority over that of the preset guarantee rules.
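The three-way arbitration just described can be sketched as follows; this is an illustration of the priority ordering (operator > preset guarantee rules > control decision module) under assumed signal representations, not the patented implementation.

```python
from typing import Optional, Tuple

def merge_control(agent_output: dict,
                  operator_signal: Optional[dict],
                  rule_signal: Optional[dict]) -> Tuple[dict, str]:
    """Arbitrate the control actually sent to the device.

    operator_signal / rule_signal are None when the corresponding
    control is not triggered.
    """
    if operator_signal is not None:      # operator control preempts everything
        return operator_signal, "operator"
    if rule_signal is not None:          # guarantee rule preempts the agent
        return rule_signal, "guarantee_rule"
    return agent_output, "agent"         # normal driving: the agent decides
```

Logging which source won at each time step also yields the per-step intervention indicator used in the merge formula later in this description.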
Let s_t denote the current state of the intelligent device; s_t may include data indexes from the device's current sensors (such as the speedometer and position) together with the raw data acquired by cameras, radar, ultrasound and the like. The time step is denoted by t, where a time step is one period of observation sampling or control. The control signal output by the control system of the smart device can be written as u_t = f(s_t, θ), where u_t may include the speed and attitude of an aircraft's rotors, or the steering wheel rotation and throttle/brake amplitude of an unmanned vehicle. θ denotes the adjustable parameters of the control system, and f is some manually specified functional form.
In general, in classical control, f is based on some classical mathematical model, or on artificial experience and rules, such as PID control. For intelligent control, the processes of supervised learning and reinforcement learning are explained separately below. In supervised learning, control is first performed by an expert, such as driving a vehicle or operating an aircraft over a distance where obstacles are present. During this process, data pairs of states and expert operations (s_t, ũ_t) can be collected, where the symbol "~" indicates data corresponding to the expert operation. f is then fitted on the collected data so that u_t is as close as possible to ũ_t; to this end an objective function is set up:

L(θ) = Σ_t ||f(s_t, θ) − ũ_t||²

This objective function L(θ) is generally called the mean square error; other errors are possible but are not listed here.
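A minimal sketch of fitting f on the collected expert pairs (s_t, ũ_t) by minimizing the mean square error above; the linear form of f and the NumPy gradient-descent loop are assumptions chosen for brevity, not the invention's prescribed model.

```python
import numpy as np

def fit_supervised(states: np.ndarray, expert_actions: np.ndarray,
                   lr: float = 1e-2, epochs: int = 200) -> np.ndarray:
    """Minimize L(theta) = sum_t ||f(s_t, theta) - u~_t||^2 with f(s, theta) = s @ theta.

    states: (N, d_s) array of s_t; expert_actions: (N, d_u) array of u~_t.
    """
    n = states.shape[0]
    theta = np.zeros((states.shape[1], expert_actions.shape[1]))
    for _ in range(epochs):
        pred = states @ theta                              # f(s_t, theta) for all t
        grad = 2.0 * states.T @ (pred - expert_actions) / n
        theta -= lr * grad                                 # gradient step on L(theta)
    return theta
```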
In reinforcement learning, expert operation is generally not necessary; instead, the system explores entirely on its own, and during self-operation a feedback (reward) is determined by the currently observed state and certain rules. The feedback is usually based on the target to be achieved: for example, negative feedback is given when a collision occurs, positive feedback is given when the target point is reached, and negative feedback can also be given for high energy consumption (such as rapid acceleration and rapid deceleration). During autonomous operation of the intelligent device, each time step records (s_t, u_t, s_{t+1}, r_t), where r_t is the feedback of the current time step, and learning then proceeds through a reinforcement learning training mode. There are many kinds of reinforcement learning techniques; for continuous control problems (u_t in a continuous space), a broad class of usable methods is Policy Gradient, the simplest of which generates a policy by the following formula:

u_t = f(s_t, θ) + ε

where ε is additional noise used for exploration, while the parameter θ is optimized by maximizing the expected return over the episode:

θ* = argmax_θ E[ Σ_{t=0..T} r_t ]

where T denotes the time length of the whole episode. Policy Gradient can also be replaced by strategies such as DDPG and TRPO, which are not listed here.
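A minimal sketch of the exploration policy u_t = f(s_t, θ) + ε and a plain REINFORCE-style update of θ; the `env` interface (reset/step), the linear policy, and the absence of a baseline are simplifying assumptions standing in for the Policy Gradient family (or DDPG/TRPO) mentioned above.

```python
import numpy as np

def rollout(env, theta: np.ndarray, sigma: float = 0.1, T: int = 100):
    """Run one episode with u_t = f(s_t, theta) + eps, eps ~ N(0, sigma^2 I)."""
    traj, s = [], env.reset()
    for _ in range(T):
        eps = sigma * np.random.randn(theta.shape[1])
        u = s @ theta + eps                 # exploratory control signal
        s_next, r, done = env.step(u)       # assumed environment interface
        traj.append((s, eps, r))
        s = s_next
        if done:
            break
    return traj

def reinforce_update(theta: np.ndarray, traj, lr: float = 1e-3,
                     sigma: float = 0.1) -> np.ndarray:
    """Ascend E[sum_t r_t]: for a linear-Gaussian policy,
    grad_theta log pi(u_t|s_t) = outer(s_t, eps_t) / sigma^2."""
    ret = sum(r for _, _, r in traj)        # episode return, sum over t = 0..T
    for s, eps, _ in traj:
        theta += lr * ret * np.outer(s, eps) / sigma ** 2
    return theta
```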
Fig. 2 is a schematic diagram of the artificial intelligence based control decision module of this embodiment controlling an intelligent device. As shown in Fig. 2, the artificial intelligence based control decision module is represented by the Agent. Here s_t represents the information produced by the observer, mainly the sensor information input that starts the work of the artificial intelligence based control decision module, and f(s_t, θ) is the action output of the Agent when there is no intervention. After the sensor information input s_t starts the control decision module of the smart device, an intervention signal may additionally be fed in to intervene on the output of the control decision module in this embodiment. Specifically, the Merge combines the Agent's action output without intervention with the expert's intervention control signal, and finally obtains the control signal u_t actually received by the intelligent device, which acts on the corresponding Controller of the smart device.
As shown in Fig. 2, in intervention learning an indicator is added, written here as d_t, denoting whether there is expert intervention at time t: d_t = 1 if there is an intervention, and d_t = 0 otherwise. The control signal of the expert intervention is denoted ũ_t. The control signal actually received by the smart device may then be expressed as:

u_t = d_t · ũ_t + (1 − d_t) · f(s_t, θ)
That is, when an intervention is present the expert signal is used, and in the absence of intervention the output of the artificial intelligence based control decision module is used. FIG. 3 is an exemplary diagram of the expert intervention scenario of the present invention. As shown in fig. 3, for a complete episode, expert interventions segment it into segments, each delimited by two time points: a starting time point (or the ending time point of the previous intervention round) and an intervention starting time point, such as T(k) and T'(k), T(k+1) and T'(k+1), T(k+2) and T'(k+2), and so forth.
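A small sketch of recovering these segment boundaries from a logged episode, pairing each autonomous-start time T(k) with the next intervention-start time T'(k); the 0/1 flag representation of interventions is an assumption.

```python
from typing import List, Tuple

def segment_episode(flags: List[int]) -> List[Tuple[int, int]]:
    """Split an episode into (T_k, T_prime_k) pairs.

    flags: one 0/1 entry per time step (1 = expert intervening).
    T_k is where autonomous control (re)starts, i.e. the episode start or
    the end of the previous intervention round; T_prime_k is where the
    next intervention begins (or the episode end if none follows).
    """
    segments, t_start, prev = [], 0, 0
    for t, flag in enumerate(flags):
        if flag == 1 and prev == 0:    # intervention begins: close the segment
            segments.append((t_start, t))
        if flag == 0 and prev == 1:    # intervention ends: a new segment starts
            t_start = t
        prev = flag
    if prev == 0:                      # trailing autonomous segment
        segments.append((t_start, len(flags)))
    return segments

# e.g. segment_episode([0, 0, 0, 1, 1, 0, 0]) -> [(0, 3), (5, 7)]
```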
On this premise, the intervention learning process can be adjusted in two respects: on the one hand, a reward penalty is imposed on the intervention process; on the other hand, imitation learning is performed on the expert signals recorded during the intervention.
That is, with the intervention learning of this embodiment, the smart device can avoid dangerous situations once a human or rule intervention has occurred. The intervention can then be learned into the network model of the control decision module in two ways. On the one hand, the intervention generates negative feedback, which is learned into the network model so as to avoid subsequent interventions; that is, the intervention informs the control decision module of certain dangers in advance, so that its output actions keep away from any dangerous state. On the other hand, the signal during the intervention is learned by the control decision module as a supervisory signal, to speed up the convergence of the network model of the control decision module.
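A minimal sketch combining the two mechanisms on a logged episode: a negative reward added where an intervention begins (the reinforcement side) and the expert's signals during intervention used as supervised targets (the imitation side). The penalty constant, the linear policy, and the per-step log format are assumptions for illustration.

```python
import numpy as np

INTERVENTION_PENALTY = -1.0  # assumed negative feedback for triggering intervention

def intervention_learning_step(theta: np.ndarray, log, lr: float = 1e-3,
                               sigma: float = 0.1) -> np.ndarray:
    """One pass over a logged episode of tuples (s, eps, r, intervened, expert_u).

    1) Steps where an intervention begins receive extra negative reward,
       steering the policy away from dangerous states.
    2) During intervention, expert_u is used as a supervised target to
       speed up convergence of the policy network.
    """
    ret, prev = 0.0, False
    for s, eps, r, intervened, expert_u in log:
        if intervened and not prev:
            r += INTERVENTION_PENALTY          # penalize the onset of intervention
        ret += r
        prev = intervened
    for s, eps, r, intervened, expert_u in log:
        if intervened:
            pred = s @ theta                   # imitation: regress toward expert_u
            theta -= lr * np.outer(s, pred - expert_u)
        else:
            # REINFORCE-style update on autonomous steps
            theta += lr * ret * np.outer(s, eps) / sigma ** 2
    return theta
```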
In the training method of the control decision module based on artificial intelligence of this embodiment, the intervention data of the intelligent device is collected in a field test scenario, and the control decision module in the intelligent device is trained according to that intervention data. The training method of this embodiment is an intervention learning process; through this intervention learning the control decision module can be trained more effectively, which improves the control and decision-making capability of the control decision module in the intelligent device and enhances the intelligence of the control decision module.
FIG. 4 is a block diagram of an embodiment of an artificial intelligence based control decision module training apparatus of the present invention. As shown in fig. 4, in the training apparatus for a control decision module based on artificial intelligence according to this embodiment, the control decision module is disposed in an intelligent device, and specifically may include:
the acquisition module 10 is used for acquiring intervention data of the intelligent device in a field test scene;
the training module 11 is configured to train a control decision module in the intelligent device according to the intervention data of the intelligent device acquired by the acquisition module 10.
In the training apparatus for controlling a decision module based on artificial intelligence according to this embodiment, the implementation principle and technical effect of implementing the training process of the control decision module based on artificial intelligence by using the above modules are the same as those of the related method embodiments, and reference may be made to the description of the related method embodiments in detail, which is not repeated herein.
Further optionally, in the training apparatus for controlling a decision module based on artificial intelligence according to the embodiment shown in fig. 4, the acquisition module 10 is specifically configured to:
in a field test scene, acquiring corresponding intervention data of the intelligent equipment when an operator performs intervention operation on the intelligent equipment.
Further optionally, in the training apparatus of the artificial intelligence based control decision module according to the foregoing embodiment, the intervention data includes state data when the intelligent device is intervened, and an output signal and/or state data of the intelligent device in response to the intervention operation.
Further optionally, in the training apparatus for controlling a decision module based on artificial intelligence in the embodiment of fig. 4, the acquisition module 10 is specifically configured to:
in a field test scene, intervention data generated by intelligent equipment according to preset guarantee rules are collected.
Further optionally, in the training apparatus of the artificial intelligence-based control decision module according to the foregoing embodiment, the intervention data includes intervention conditions in the guarantee rules, and output signals and/or status data of the intelligent device in response to the intervention conditions.
Further optionally, in the training apparatus of the artificial intelligence based control decision module in the embodiment of fig. 4, the training module 11 is specifically configured to:
according to the intervention data of the intelligent device acquired by the acquisition module 10, a control decision module in the intelligent device is trained by adopting a reinforcement learning training mode or a supervision learning training mode.
The implementation principle and technical effect of the training device for the artificial intelligence-based control decision module according to the embodiment are the same as those of the related method embodiment, and reference may be made to the description of the related method embodiment for details, which is not repeated herein.
FIG. 5 is a block diagram of an embodiment of a computer device of the present invention. As shown in fig. 5, the computer device of the present embodiment includes one or more processors 30 and a memory 40 for storing one or more programs; when the one or more programs stored in the memory 40 are executed by the one or more processors 30, the one or more processors 30 are caused to implement the training method of the artificial intelligence based control decision module of the embodiment shown in fig. 1 above. The embodiment shown in fig. 5 is exemplified by including a plurality of processors 30. The computer device of the present embodiment may be an intelligent device.
For example, fig. 6 is an exemplary diagram of a computer device provided by the present invention. FIG. 6 illustrates a block diagram of an exemplary computer device 12a suitable for use in implementing embodiments of the present invention. The computer device 12a shown in FIG. 6 is only an example and should not bring any limitations to the functionality or scope of use of embodiments of the present invention.
As shown in FIG. 6, computer device 12a is in the form of a general purpose computing device. The components of computer device 12a may include, but are not limited to: one or more processors 16a, a system memory 28a, and a bus 18a that couples various system components including the system memory 28a and the processors 16a.
Bus 18a represents one or more of any of several types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, and a processor or local bus using any of a variety of bus architectures. By way of example, such architectures include, but are not limited to, Industry Standard Architecture (ISA) bus, Micro Channel Architecture (MCA) bus, Enhanced ISA (EISA) bus, Video Electronics Standards Association (VESA) local bus, and Peripheral Component Interconnect (PCI) bus.
Computer device 12a typically includes a variety of computer system readable media. Such media may be any available media that is accessible by computer device 12a and includes both volatile and nonvolatile media, removable and non-removable media.
The system memory 28a may include computer system readable media in the form of volatile memory, such as Random Access Memory (RAM) 30a and/or cache memory 32a. Computer device 12a may further include other removable/non-removable, volatile/nonvolatile computer system storage media. By way of example only, storage system 34a may be used to read from and write to non-removable, nonvolatile magnetic media (not shown in FIG. 6, and commonly referred to as a "hard drive"). Although not shown in FIG. 6, a magnetic disk drive for reading from and writing to a removable, nonvolatile magnetic disk (e.g., a "floppy disk") and an optical disk drive for reading from or writing to a removable, nonvolatile optical disk (e.g., a CD-ROM, DVD-ROM, or other optical media) may be provided. In these cases, each drive may be connected to bus 18a by one or more data media interfaces. System memory 28a may include at least one program product having a set (e.g., at least one) of program modules that are configured to carry out the functions of the various embodiments of the invention described above in figs. 1-4.
A program/utility 40a having a set (at least one) of program modules 42a may be stored, for example, in system memory 28a, such program modules 42a including, but not limited to, an operating system, one or more application programs, other program modules, and program data, each of which examples or some combination thereof may include an implementation of a network environment. Program modules 42a generally perform the functions and/or methodologies described above in connection with the various embodiments of fig. 1-4 of the present invention.
Computer device 12a may also communicate with one or more external devices 14a (e.g., keyboard, pointing device, display 24a, etc.), with one or more devices that enable a user to interact with computer device 12a, and/or with any devices (e.g., network card, modem, etc.) that enable computer device 12a to communicate with one or more other computing devices. Such communication may occur through an input/output (I/O) interface 22a. Also, computer device 12a may communicate with one or more networks (e.g., a Local Area Network (LAN), a Wide Area Network (WAN) and/or a public network, such as the Internet) through network adapter 20a. As shown, network adapter 20a communicates with the other modules of computer device 12a via bus 18a. It should be understood that although not shown in the figures, other hardware and/or software modules may be used in conjunction with computer device 12a, including but not limited to: microcode, device drivers, redundant processors, external disk drive arrays, RAID systems, tape drives, and data backup storage systems, among others.
The processor 16a executes programs stored in the system memory 28a to perform various functional applications and data processing, such as implementing the training method of the artificial intelligence based control decision module shown in the above-described embodiment.
The present invention also provides a computer-readable medium on which a computer program is stored, which program, when executed by a processor, implements the artificial intelligence based control decision module training method as shown in the above embodiments.
The computer-readable media of this embodiment may include RAM30a, and/or cache memory 32a, and/or storage system 34a in system memory 28a in the embodiment illustrated in fig. 6 described above.
With the development of technology, the propagation path of computer programs is no longer limited to tangible media, and the computer programs can be directly downloaded from a network or acquired by other methods. Accordingly, the computer-readable medium in the present embodiment may include not only tangible media but also intangible media.
The computer-readable medium of the present embodiments may take any combination of one or more computer-readable media. The computer readable medium may be a computer readable signal medium or a computer readable storage medium. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium would include the following: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
A computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Computer program code for carrying out operations for aspects of the present invention may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, C++ or the like, and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
In the embodiments provided in the present invention, it should be understood that the disclosed system, apparatus and method may be implemented in other ways. For example, the above-described device embodiments are merely illustrative, and for example, the division of the units is only one logical functional division, and other divisions may be realized in practice.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit may be implemented in the form of hardware, or in the form of hardware plus a software functional unit.
The integrated unit implemented in the form of a software functional unit may be stored in a computer readable storage medium. The software functional unit is stored in a storage medium and includes several instructions to enable a computer device (which may be a personal computer, a server, or a network device) or a processor to execute some steps of the methods according to the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like made within the spirit and principle of the present invention should be included in the scope of the present invention.

Claims (14)

1. A training method of a control decision module based on artificial intelligence is disclosed, wherein the control decision module is arranged in intelligent equipment, and the intelligent equipment comprises an unmanned aerial vehicle; the method comprises the following steps:
acquiring intervention data of the intelligent equipment in a field test scene; the control decision module in the intelligent equipment can be warm-started in the field test scene and controls the intelligent equipment to operate; the intervention data is test data generated after intervention of an operator or intervention of a preset guarantee rule when the intelligent equipment is tested in the field test scene; when the intelligent equipment operates normally in the field test scene, the control decision module controls the intelligent equipment to operate; when the intelligent equipment is in danger, the intelligent equipment is controlled to operate by the operator or the preset guarantee rule, and the intervention data is generated at that moment; if the control of the operator and the control of the preset guarantee rule are triggered simultaneously, the priority of the control of the operator is greater than the priority of the control of the preset guarantee rule; and training the control decision module in the intelligent equipment according to the intervention data of the intelligent equipment.
2. The method of claim 1, wherein collecting intervention data of the smart device in a field test scenario specifically comprises:
and in the field test scene, acquiring the corresponding intervention data of the intelligent equipment when the operator performs the intervention operation on the intelligent equipment.
3. The method of claim 2, wherein the intervention data comprises status data when the smart device is being intervened, and output signals and/or status data of the smart device in response to the intervention operation.
4. The method of claim 1, wherein collecting intervention data of the smart device in a field test scenario specifically comprises:
and in a field test scene, acquiring intervention data generated by the intelligent equipment according to the preset guarantee rule.
5. The method of claim 4, wherein the intervention data comprises intervention conditions in the guarantee rules, and output signals and/or status data of the smart device in response to the intervention conditions.
6. The method according to any one of claims 1 to 5, wherein training the control decision module in the smart device based on intervention data of the smart device comprises:
and training the control decision module in the intelligent equipment by adopting a reinforcement learning training mode or a supervision learning training mode according to the intervention data of the intelligent equipment.
7. A training device based on an artificial intelligence control decision module, wherein the control decision module is arranged in intelligent equipment, and is characterized in that the intelligent equipment comprises an unmanned aerial vehicle; the device comprises:
the acquisition module is used for acquiring intervention data of the intelligent equipment in a field test scene; the control decision module in the intelligent equipment can be warm-started in the field test scene and controls the intelligent equipment to operate; the intervention data is test data generated after intervention of an operator or intervention of a preset guarantee rule when the intelligent equipment is tested in the field test scene; when the intelligent equipment operates normally in the field test scene, the control decision module controls the intelligent equipment to operate; when the intelligent equipment is in danger, the intelligent equipment is controlled to operate by the operator or the preset guarantee rule, and the intervention data is generated at that moment; if the control of the operator and the control of the preset guarantee rule are triggered simultaneously, the priority of the control of the operator is greater than the priority of the control of the preset guarantee rule;
and the training module is used for training the control decision module in the intelligent equipment according to the intervention data of the intelligent equipment.
8. The apparatus according to claim 7, wherein the acquisition module is specifically configured to:
and in the field test scene, acquiring the intervention data of the intelligent equipment corresponding to the intervention operation of the operator on the intelligent equipment.
9. The apparatus of claim 8, wherein the intervention data comprises status data of the smart device at the time of intervention, and output signals and/or status data of the smart device in response to the intervention operation.
10. The apparatus according to claim 7, wherein the acquisition module is specifically configured to:
and acquiring intervention data generated by the intelligent equipment according to the preset guarantee rule in a field test scene.
11. The apparatus of claim 10, wherein the intervention data comprises intervention conditions in the guarantee rules, and output signals and/or status data of the smart device in response to the intervention conditions.
12. The apparatus according to any one of claims 7-11, wherein the training module is specifically configured to:
and training the control decision module in the intelligent equipment by adopting a reinforcement learning training mode or a supervision learning training mode according to the intervention data of the intelligent equipment.
13. A computer device, the device comprising:
one or more processors;
a memory for storing one or more programs;
when the one or more programs are executed by the one or more processors, the one or more processors are caused to implement the method of any one of claims 1-6.
14. A computer-readable medium, on which a computer program is stored which, when being executed by a processor, carries out the method according to any one of claims 1-6.
CN201811132192.6A 2018-09-27 2018-09-27 Training method, device and readable medium for control decision module based on artificial intelligence Active CN109255442B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811132192.6A CN109255442B (en) 2018-09-27 2018-09-27 Training method, device and readable medium for control decision module based on artificial intelligence

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811132192.6A CN109255442B (en) 2018-09-27 2018-09-27 Training method, device and readable medium for control decision module based on artificial intelligence

Publications (2)

Publication Number Publication Date
CN109255442A (en) 2019-01-22
CN109255442B (en) 2022-08-23

Family

ID=65047906

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811132192.6A Active CN109255442B (en) 2018-09-27 2018-09-27 Training method, device and readable medium for control decision module based on artificial intelligence

Country Status (1)

Country Link
CN (1) CN109255442B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP7188279B2 (en) * 2019-05-29 2022-12-13 トヨタ自動車株式会社 Machine learning methods and mobile robots
CN110853389B (en) * 2019-11-21 2022-03-18 白犀牛智达(北京)科技有限公司 Drive test monitoring system suitable for unmanned commodity circulation car
CN112114592B (en) * 2020-09-10 2021-12-17 南京大学 Method for realizing autonomous crossing of movable frame-shaped barrier by unmanned aerial vehicle
CN112327768B (en) * 2020-10-27 2023-09-19 深圳Tcl新技术有限公司 Intelligent scene building system, method and computer readable storage medium

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106529673A (en) * 2016-11-17 2017-03-22 北京百度网讯科技有限公司 Deep learning network training method and device based on artificial intelligence
CN107169573A (en) * 2017-05-05 2017-09-15 第四范式(北京)技术有限公司 Using composite machine learning model come the method and system of perform prediction

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180239352A1 (en) * 2016-08-31 2018-08-23 Faraday&Future Inc. System and method for operating vehicles at different degrees of automation
CN106347359B (en) * 2016-09-14 2019-03-12 北京百度网讯科技有限公司 Method and apparatus for operating automatic driving vehicle
CN107506830A (en) * 2017-06-20 2017-12-22 同济大学 Towards the artificial intelligence training platform of intelligent automobile programmed decision-making module
CN107862346B (en) * 2017-12-01 2020-06-30 驭势科技(北京)有限公司 Method and equipment for training driving strategy model
CN108009587B (en) * 2017-12-01 2021-04-16 驭势科技(北京)有限公司 Method and equipment for determining driving strategy based on reinforcement learning and rules
CN108021754A (en) * 2017-12-06 2018-05-11 北京航空航天大学 A kind of unmanned plane Autonomous Air Combat Decision frame and method
CN108319173B (en) * 2018-02-12 2019-07-05 苏州车付通信息科技有限公司 Decision making device, system and method
CN108319286B (en) * 2018-03-12 2020-09-22 西北工业大学 Unmanned aerial vehicle air combat maneuver decision method based on reinforcement learning

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106529673A (en) * 2016-11-17 2017-03-22 北京百度网讯科技有限公司 Deep learning network training method and device based on artificial intelligence
CN107169573A (en) * 2017-05-05 2017-09-15 第四范式(北京)技术有限公司 Using composite machine learning model come the method and system of perform prediction

Also Published As

Publication number Publication date
CN109255442A (en) 2019-01-22

Similar Documents

Publication Publication Date Title
CN109255442B (en) Training method, device and readable medium for control decision module based on artificial intelligence
JP7174063B2 (en) Obstacle avoidance method and device for driverless vehicle
US11829896B2 (en) Uncertainty-based data filtering in a vehicle
US9361795B2 (en) Regional driving trend modification using autonomous vehicles
EP3837633A2 (en) Driving scenarios for autonomous vehicles
US11558483B2 (en) Value-based data transmission in an autonomous vehicle
Valls et al. Design of an autonomous racecar: Perception, state estimation and system integration
CN109131340A (en) Active vehicle adjusting performance based on driving behavior
US11176007B2 (en) Redundant processing fabric for autonomous vehicles
CN116108717B (en) Traffic transportation equipment operation prediction method and device based on digital twin
US11875177B1 (en) Variable access privileges for secure resources in an autonomous vehicle
CN110824912A (en) Method and apparatus for training a control strategy model for generating an autonomous driving strategy
WO2022245916A1 (en) Device health code broadcasting on mixed vehicle communication networks
CN113635896B (en) Driving behavior determination method and related equipment thereof
US11592810B2 (en) Systems and methods for injecting faults into an autonomy system
CN116461507A (en) Vehicle driving decision method, device, equipment and storage medium
US20230001950A1 (en) Using predictive visual anchors to control an autonomous vehicle
WO2023187121A1 (en) Simulation-based testing for robotic systems
US20220371530A1 (en) Device-level fault detection
CN112700001A (en) Authentication countermeasure robustness for deep reinforcement learning
US20220402499A1 (en) Detecting operator contact with a steering wheel
WO2023214159A1 (en) Controlling an autonomous vehicle
GB2618419A (en) Controlling an autonomous vehicle
CN117585022A (en) PyQt 5-based data processing method, device, equipment and storage medium
WO2023286097A1 (en) Drone and relative command method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant