CN109726808A - Neural network training method and device, storage medium and electronic device - Google Patents
Neural network training method and device, storage medium and electronic device
- Publication number
- CN109726808A (application CN201711037964.3A, also published as CN201711037964A)
- Authority
- CN
- China
- Prior art keywords
- neural network
- interaction
- training
- human
- computer interaction
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- General Health & Medical Sciences (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- Computational Linguistics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Evolutionary Computation (AREA)
- Artificial Intelligence (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Health & Medical Sciences (AREA)
- User Interface Of Digital Computer (AREA)
Abstract
The invention discloses a neural network training method and apparatus, a storage medium, and an electronic device. The method comprises: obtaining an offline sample set for training a neural network in a human-computer interaction application, wherein the offline sample set contains offline samples that satisfy a predetermined configuration condition; training an initial neural network offline with the offline sample set to obtain an object neural network, wherein, within the human-computer interaction application, the processing capability of the object neural network is higher than that of the initial neural network; and connecting the object neural network to the online running environment of the human-computer interaction application for online training, thereby obtaining a target neural network. The invention solves the technical problem of low training efficiency in the neural network training methods provided by the related art.
Description
Technical field
The present invention relates to the field of computers, and in particular to a neural network training method and apparatus, a storage medium, and an electronic device.
Background art
The Deep Q Network (DQN) algorithm is a method that fuses a convolutional neural network with Q-Learning and is applied in Deep Reinforcement Learning (DRL). DRL combines deep learning with reinforcement learning to form a new class of algorithms that learn end to end, from perception to action: perceptual information is fed into a deep neural network, which directly outputs an action, giving a robot the potential to learn one or even many skills fully autonomously and thereby realizing Artificial Intelligence (AI) operation. To let robots complete autonomous learning well and apply it skillfully in different scenarios, training neural networks rapidly and accurately has become an urgent problem.
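As a rough illustration of the Q-learning rule that DQN approximates with a neural network (a minimal sketch for background only, not part of the patented method; all names here are ours):

```python
import random

def q_target(reward, next_q_values, gamma=0.99, terminal=False):
    """Bellman target used to train a DQN: r + gamma * max_a' Q(s', a')."""
    if terminal:
        return reward
    return reward + gamma * max(next_q_values)

def epsilon_greedy(q_values, epsilon):
    """With probability epsilon explore randomly; otherwise act greedily."""
    if random.random() < epsilon:
        return random.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])
```

In DRL, `q_values` would come from a convolutional network applied to the perceived state; the network's weights are fitted so its output approaches `q_target`.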
At present, the sample object that accesses the online training environment to train a neural network usually has a very low level. In the early stage of training, it acts randomly with very high probability; although this explores the state space of the training environment well, it prolongs the training time. Furthermore, because its level is very low, it generally needs to keep exploring and learning in the training environment for a long time before it can reach a given training goal.

That is, the training time required by the neural network training methods provided in the related art is long, which leads to low neural network training efficiency.

No effective solution to the above problem has yet been proposed.
Summary of the invention
Embodiments of the present invention provide a neural network training method and apparatus, a storage medium, and an electronic device, so as to at least solve the technical problem of low training efficiency in the neural network training methods provided by the related art.
According to one aspect of the embodiments of the present invention, a neural network training method is provided, comprising: obtaining an offline sample set for training a neural network in a human-computer interaction application, wherein the offline sample set contains offline samples that satisfy a predetermined configuration condition; training an initial neural network offline with the offline sample set to obtain an object neural network, wherein, within the human-computer interaction application, the processing capability of the object neural network is higher than that of the initial neural network; and connecting the object neural network to the online running environment of the human-computer interaction application for online training, thereby obtaining a target neural network.
According to another aspect of the embodiments of the present invention, a neural network training apparatus is also provided, comprising: an obtaining unit configured to obtain an offline sample set for training a neural network in a human-computer interaction application, wherein the offline sample set contains offline samples that satisfy a predetermined configuration condition; an offline training unit configured to train an initial neural network offline with the offline sample set to obtain an object neural network, wherein, within the human-computer interaction application, the processing capability of the object neural network is higher than that of the initial neural network; and an online training unit configured to connect the object neural network to the online running environment of the human-computer interaction application for online training, thereby obtaining a target neural network.
According to yet another aspect of the embodiments of the present invention, a storage medium is also provided. The storage medium stores a program that, when run, executes the method described above.

According to yet another aspect of the embodiments of the present invention, an electronic device is also provided, comprising a memory, a processor, and a computer program stored in the memory and runnable on the processor, wherein the processor executes the method described above by means of the computer program.
In the embodiments of the present invention, the obtained offline sample set for training a neural network in a human-computer interaction application is used to train an initial neural network offline, yielding an object neural network whose processing capability is higher than that of the initial neural network. The object neural network is then connected to the online running environment of the human-computer interaction application for online training, so as to obtain a target neural network matched to the human-computer interaction application. In other words, by first obtaining an offline sample set that satisfies a predetermined configuration condition and performing offline training on the initial neural network to obtain an object neural network with higher processing capability, rather than connecting the initial neural network directly to the online running environment for online training, the embodiments overcome the problem in the current related art that a target neural network can only be obtained through online training, which makes training take a long time and training efficiency low. Moreover, obtaining the object neural network by offline training on the offline sample set broadens the range of samples available for neural network training, making it easier to obtain higher-quality or differently graded offline samples and further ensuring training efficiency. This in turn solves the technical problem of low training efficiency in the neural network training methods provided by the related art.
Brief description of the drawings
The drawings described herein are provided for further understanding of the present invention and constitute a part of this application. The illustrative embodiments of the present invention and their descriptions are used to explain the present invention and do not constitute an improper limitation of the present invention. In the drawings:
Fig. 1 is a schematic diagram of a hardware environment of an optional neural network training method according to an embodiment of the present invention;
Fig. 2 is a flowchart of an optional neural network training method according to an embodiment of the present invention;
Fig. 3 is a schematic diagram of an application of an optional neural network training method according to an embodiment of the present invention;
Fig. 4 is a schematic diagram of an optional neural network training method according to an embodiment of the present invention;
Fig. 5 is a schematic diagram of another optional neural network training method according to an embodiment of the present invention;
Fig. 6 is a flowchart of another optional neural network training method according to an embodiment of the present invention;
Fig. 7 is a flowchart of another optional neural network training method according to an embodiment of the present invention;
Fig. 8 is a schematic diagram of an optional neural network training apparatus according to an embodiment of the present invention;
Fig. 9 is a schematic diagram of another optional neural network training method according to an embodiment of the present invention;
Fig. 10 is a schematic diagram of an optional electronic device according to an embodiment of the present invention.
Detailed description of the embodiments
To enable those skilled in the art to better understand the solutions of the present invention, the technical solutions in the embodiments of the present invention are described below clearly and completely with reference to the accompanying drawings. Obviously, the described embodiments are only some, rather than all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present invention without creative effort shall fall within the scope of protection of the present invention.

It should be noted that the terms "first", "second", and the like in the description, the claims, and the above drawings are used to distinguish similar objects and are not necessarily used to describe a particular order or sequence. It should be understood that data so used are interchangeable where appropriate, so that the embodiments of the present invention described herein can be implemented in orders other than those illustrated or described herein. In addition, the terms "comprise" and "have" and any variations thereof are intended to cover non-exclusive inclusion; for example, a process, method, system, product, or device that contains a series of steps or units is not necessarily limited to the steps or units expressly listed, but may include other steps or units not expressly listed or inherent to such a process, method, product, or device.
Embodiment 1
An embodiment of the above neural network training method is provided in this embodiment of the present invention. As an optional implementation, the neural network training method may be, but is not limited to being, applied in the application environment shown in Fig. 1. A client of a human-computer interaction application is installed on the terminal 102. Taking a game application as an example of the human-computer interaction application, object A is an object manipulated by a user and object B is an object manipulated by the machine. Offline samples are obtained by running the human-computer interaction application and are stored in the database 104, where the database 104 may be, but is not limited to being, located in a training control server, or in an independent third-party server. Further, the offline samples that satisfy the predetermined configuration condition constitute the offline sample set for training the neural network. The offline sample set is used on the terminal 106 to train an initial neural network offline to obtain an object neural network, whose processing capability is higher than that of the initial neural network. The object neural network obtained by offline training on the terminal 106 is then connected, via the network 108, to the online running environment of the human-computer interaction application to carry out online training, so as to obtain a target neural network matched to the human-computer interaction application.
In this embodiment, the obtained offline sample set for training a neural network in a human-computer interaction application is used to train an initial neural network offline, yielding an object neural network whose processing capability is higher than that of the initial neural network. The object neural network is then connected to the online running environment of the human-computer interaction application to carry out online training, so as to obtain a target neural network matched to the application. That is, by first obtaining an offline sample set that satisfies a predetermined configuration condition and performing offline training on the initial neural network to obtain an object neural network with higher processing capability, rather than connecting the initial neural network directly to the online running environment for online training, this embodiment overcomes the problem in the current related art that a target neural network can only be obtained through online training, which makes training take a long time and training efficiency low. Moreover, obtaining the object neural network by offline training on the offline sample set broadens the range of samples available for neural network training, making it easier to obtain higher-quality or differently graded offline samples and further ensuring the training efficiency of neural network training.
Optionally, in this embodiment, the terminal may include, but is not limited to, at least one of the following: a mobile phone, a tablet computer, a laptop, a desktop PC, a digital TV, and other hardware devices that can run the human-computer interaction application. The network may include, but is not limited to, at least one of the following: a wide area network, a metropolitan area network, and a local area network. The above is only an example, and this embodiment imposes no limitation on this.
According to an embodiment of the present invention, a neural network training method is provided. As shown in Fig. 2, the method comprises:

S202: obtaining an offline sample set for training a neural network in a human-computer interaction application, wherein the offline sample set contains offline samples that satisfy a predetermined configuration condition;

S204: training an initial neural network offline with the offline sample set to obtain an object neural network, wherein, within the human-computer interaction application, the processing capability of the object neural network is higher than that of the initial neural network;

S206: connecting the object neural network to the online running environment of the human-computer interaction application for online training, thereby obtaining a target neural network.
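Steps S202-S206 can be sketched as a training pipeline (function names here are hypothetical; the patent does not prescribe an implementation):

```python
def train_pipeline(offline_samples, initial_network, env,
                   meets_condition, offline_train, online_train):
    """Sketch of S202-S206: filter offline samples, pre-train offline,
    then fine-tune the resulting network in the live environment."""
    # S202: keep only samples that satisfy the predetermined configuration condition
    sample_set = [s for s in offline_samples if meets_condition(s)]
    # S204: offline training yields the intermediate "object" neural network
    object_network = offline_train(initial_network, sample_set)
    # S206: connect to the online running environment for further training
    target_network = online_train(object_network, env)
    return target_network
```

The key point of the scheme is visible in the structure: `online_train` starts from `object_network`, not from `initial_network`, which is what shortens the online phase.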
Optionally, in this embodiment, the above neural network training method may be, but is not limited to being, applied in the following human-computer interaction scenarios: 1) in a human-versus-machine battle application, the trained target neural network is used to carry out the battle process against an online account; 2) in an idle ("on-hook") battle application, the trained target neural network can take the place of an online account and continue the subsequent human-versus-machine battle. That is, through the offline training on the offline sample set and the online training provided in this embodiment, a target neural network with multiple skills is obtained to perform intelligent operation in the human-computer interaction application.
It should be noted that, in this embodiment, an offline sample set that satisfies a predetermined configuration condition is obtained in advance to train the initial neural network offline and obtain an object neural network with higher processing capability, instead of connecting the initial neural network directly to the online running environment for online training. This overcomes the problem in the current related art that a target neural network can only be obtained through online training, which makes training take a long time and training efficiency low. In addition, obtaining the object neural network by offline training on the offline sample set broadens the range of samples available for neural network training, making it easier to obtain higher-quality or differently graded offline samples and further ensuring the training efficiency of neural network training.
Optionally, in this embodiment, the target neural networks in the different application scenarios above may include, but are not limited to, ones obtained by the following online training modes:

1) connecting the object neural network to the online running environment of the human-computer interaction application and carrying out online adversarial training against an online account of the application; or

2) connecting the object neural network to the online running environment of the human-computer interaction application, substituting it for a first online account in the application, and continuing online adversarial training against a second online account.
It should be noted that an online account may be, but is not limited to, a user-controlled account in the human-computer interaction application. Taking the example shown in Fig. 3, object A may be a user-manipulated object and object B a machine-manipulated object; the object neural network used to obtain the target neural network may be, but is not limited to, object B, and online adversarial training further improves the weight values in the object neural network, yielding the corresponding target neural network. Still taking Fig. 3 as an example, object A may be a user-manipulated object and object B may also be a user-manipulated object; after object A has run for a period of time and an on-hook operation is selected, object A may be, but is not limited to being, replaced with the object neural network, which continues the human-versus-machine battle against object B, further improving the weight values in the object neural network and yielding the corresponding target neural network.
Optionally, in this embodiment, training the initial neural network offline with the offline sample set to obtain the object neural network includes:

1) in the case where the predetermined configuration condition indicates obtaining a high-grade object neural network, training with a high-grade offline sample set to obtain the high-grade object neural network, wherein the operation results, in the human-computer interaction application, of the offline samples in the high-grade offline sample set are higher than a predetermined threshold; or

2) in the case where the predetermined configuration condition indicates obtaining object neural networks of multiple grades, training with the offline sample set of each grade to obtain the object neural network of the corresponding grade, wherein the operation results of the offline samples in the offline sample sets of the multiple grades fall within different target threshold ranges, and wherein the object neural networks of the multiple grades include at least a first-grade object network and a second-grade object network, the processing capability of the first-grade object network being higher than that of the second-grade object network.
It should be noted that, in this embodiment, the above target neural networks may be, but are not limited to, neural networks trained to different interaction levels according to the interaction levels of the offline samples in the different offline sample sets. For example, in mode 1) above, high-quality offline samples whose operation results are higher than the predetermined threshold are selected from the offline samples, and offline training yields a high-grade object neural network, thereby raising the winning rate of the machine in human-versus-machine battles and attracting more user accounts to the human-computer interaction application. In mode 2) above, offline sample sets of multiple grades, whose operation results fall within different target threshold ranges, are selected from the offline samples, and offline training yields object neural networks of multiple grades, thereby enriching the battle levels available in human-computer interaction.
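The two selection modes above can be sketched minimally, assuming each offline sample carries a numeric operation result (the `result` field, thresholds, and band values are illustrative, not from the patent):

```python
def high_grade_set(samples, threshold):
    """Mode 1: keep only samples whose operation result exceeds a threshold."""
    return [s for s in samples if s["result"] > threshold]

def graded_sets(samples, bands):
    """Mode 2: split samples into per-grade sets by result ranges.
    `bands` is a list of (low, high) tuples, one per grade."""
    return [[s for s in samples if low <= s["result"] < high]
            for low, high in bands]
```

Each list returned by `graded_sets` would then train the object neural network of the corresponding grade.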
Optionally, in this embodiment, the offline samples may be, but are not limited to being, obtained as follows: while the human-computer interaction application is run with a training account, the parameter values of the interaction parameters of the training account in each status frame are collected, wherein the interaction parameters include: the interaction state, the interaction action, and the interaction feedback reward; the offline samples are then obtained from the parameter values of the interaction parameters.

It should be noted that this may, but is not limited to, mean that while the human-computer interaction application runs, the status frames are displayed frame by frame in order of frame number, and the parameter values of the interaction parameters in each status frame are collected to obtain a frame sequence of parameter values for each interaction parameter, from which the offline samples are obtained. The interaction state may be, but is not limited to being, determined from the interaction picture of the human-computer interaction application; the interaction action may be, but is not limited to being, determined from the interaction operations received by the application; and the interaction feedback reward may be, but is not limited to being, determined from the parameter value of the interaction feedback reward parameter matched to the application type of the human-computer interaction application.
Through the embodiments provided in the present application, an offline sample set that satisfies a predetermined configuration condition is obtained in advance to train the initial neural network offline and obtain an object neural network with higher processing capability, instead of connecting the initial neural network directly to the online running environment for online training. This overcomes the problem in the current related art that a target neural network can only be obtained through online training, which makes training take a long time and training efficiency low. In addition, obtaining the object neural network by offline training on the offline sample set broadens the range of samples available for neural network training, making it easier to obtain higher-quality or differently graded offline samples and further ensuring the training efficiency of neural network training.
As an optional scheme, obtaining the offline sample set for training the neural network in the human-computer interaction application includes:

S1: obtaining the offline samples produced by running the human-computer interaction application with a training account;

S2: screening the obtained offline samples according to the predetermined configuration condition to obtain the offline sample set.

Optionally, in this embodiment, obtaining the offline samples produced by running the human-computer interaction application with a training account includes:

S11: while the human-computer interaction application is run with the training account, collecting the parameter values of the interaction parameters of the training account in each status frame, wherein the interaction parameters include: the interaction state, the interaction action, and the interaction feedback reward;

S12: obtaining the offline samples from the parameter values of the interaction parameters.
It should be noted that, in this embodiment, the interaction feedback reward is the feedback value that the DQN algorithm computes for an action in the current state, according to the change of the interaction state in the human-computer interaction application, which yields the parameter value of the interaction feedback reward. The specific calculation formula may be, but is not limited to being, set differently for different types of human-computer interaction applications. For example, taking a multiplayer battle game application as an example, the parameter of the interaction feedback reward may be, but is not limited to, the blood volume (HP) of each character object: when the blood volume of the training account obtained during training is higher, a positive feedback value can be configured, and otherwise a negative feedback value is configured. As another example, taking a racing application as an example, the parameter of the interaction feedback reward may be, but is not limited to, the completed mileage: the farther the mileage completed by the training account during training, the larger the configured feedback value, and otherwise the smaller. The above are only examples; this embodiment imposes no limitation on this. In addition, in this embodiment, the parameter of the interaction feedback reward may be, but is not limited to being, recorded in order of the frame numbers of the status frames.
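The two per-application reward rules in the examples above might be configured roughly as follows (field names, signs, and scales are assumptions for illustration only):

```python
def battle_reward(prev_hp, cur_hp):
    """Multiplayer battle example: positive feedback when the training
    account's blood volume (HP) rises, negative when it falls."""
    if cur_hp > prev_hp:
        return 1.0
    if cur_hp < prev_hp:
        return -1.0
    return 0.0

def racing_reward(mileage, scale=0.01):
    """Racing example: the farther the completed mileage, the larger the reward."""
    return mileage * scale
```

A reward value of this kind would be computed once per status frame and appended to the reward frame sequence by frame number.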
A specific illustration is given with the example shown in Fig. 4. While the human-computer interaction application runs, the interaction state st is collected and recorded to obtain the state frame sequence (s0, s1, ..., st); the action output is collected to obtain the interaction action at, recorded as the action frame sequence (a0, a1, ..., at); and the parameter value of the interaction feedback reward parameter is further calculated to determine the reward value rt, recorded as the reward frame sequence (r0, r1, ..., rt). The intermediate samples collected above are then combined into offline samples, and the offline samples determined by the combination are stored in an offline sample database.

In this embodiment, the collected data of the three parts (interaction state, interaction action, and interaction feedback reward) are combined synchronously by the frame numbers of the status frames to generate offline samples, such as DQN samples, and the generated DQN samples are further saved into the offline sample database.
As an optional scheme, obtaining the offline samples from the parameter values of the interaction parameters includes:

S1: combining the parameter values of the interaction parameters in the i-th status frame with the parameter values of the interaction parameters in the (i+1)-th status frame to determine an offline sample, wherein i is greater than or equal to 1 and less than or equal to N, and N is the total number of frames in the run of the human-computer interaction application.

A specific illustration is given with reference to Fig. 5. An offline sample may be, but is not limited to, a four-tuple (s, a, r, s'), whose elements mean the following:

s: the interaction state (state) in the i-th status frame;
a: the interaction action (action) in the i-th status frame;
r: the interaction feedback reward (reward) obtained in the i-th status frame after action a is taken in interaction state s;
s': the interaction state (next state) in the (i+1)-th status frame.

As shown in Fig. 5, the parameter values of the interaction parameters in the i-th status frame at the current moment are combined with the parameter values of the interaction parameters in the (i+1)-th status frame at the next moment to obtain the group of offline samples shown on the right; in effect, the interaction parameter values of the current status frame are combined with those of the next status frame.

In this embodiment, by combining the parameter values of the interaction parameters in the i-th status frame with those in the (i+1)-th status frame to determine the offline samples, accurate offline sample data can be generated, accelerating the convergence of the neural network.
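The frame-pairing rule above can be sketched as a small helper (the function is hypothetical; the sequence names follow the text):

```python
def build_samples(states, actions, rewards):
    """Pair frame i with frame i+1 to form (s, a, r, s') DQN samples
    from the state, action, and reward frame sequences."""
    return [(states[i], actions[i], rewards[i], states[i + 1])
            for i in range(len(states) - 1)]
```

The last status frame contributes only as the s' of the penultimate sample, since it has no successor frame of its own.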
As a kind of optional scheme, acquire interaction parameter of the training account in each status frames parameter value include with
It is at least one lower:
1) status indicator for acquiring the interaction mode in each status frames obtains answering using training account operation human-computer interaction
State frame sequence during;
2) action identification for acquiring the interactive action in each status frames obtains answering using training account operation human-computer interaction
Movement frame sequence during;
3) the matched interaction feedback excitation parameters of application type with human-computer interaction application are obtained;Calculate interaction feedback excitation
The parameter value of parameter obtains the feedback excitation frame sequence during using training account operation human-computer interaction to apply.
Taking the example shown in Figure 4: while the human-computer interaction application runs, the interaction state st is acquired and recorded to obtain the state frame sequence (s0, s1, ..., st); the action output is acquired, that is, the interactive action at, and recorded to obtain the action frame sequence (a0, a1, ..., at); the parameter value rt of the interaction feedback excitation is further calculated from the interaction feedback excitation parameter and recorded to obtain the feedback excitation frame sequence (r0, r1, ..., rt).
In the present embodiment, the interaction state and the interactive action in each status frame are obtained, and the parameter value of the interaction feedback excitation parameter is calculated, so that the corresponding state frame sequence, action frame sequence and feedback excitation frame sequence in the human-computer interaction application process are obtained; these can then be combined into offline samples for the DQN (Deep Q-Network) neural network.
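The per-frame recording described above can be sketched as a small recorder that keeps the three sequences synchronized by frame; the class and method names are assumptions for illustration.

```python
# Sketch of the per-frame collection: at every status frame the collector
# appends the interaction state s_t, the interactive action a_t, and the
# computed feedback excitation r_t to three synchronized sequences.

class FrameSequenceRecorder:
    def __init__(self):
        self.state_frames = []   # (s0, s1, ..., st)
        self.action_frames = []  # (a0, a1, ..., at)
        self.reward_frames = []  # (r0, r1, ..., rt)

    def record(self, state, action, reward):
        # One call per status frame keeps the three sequences aligned.
        self.state_frames.append(state)
        self.action_frames.append(action)
        self.reward_frames.append(reward)

rec = FrameSequenceRecorder()
for t in range(3):
    rec.record(f"s{t}", f"a{t}", t * 0.5)
print(rec.state_frames)   # -> ['s0', 's1', 's2']
print(rec.reward_frames)  # -> [0.0, 0.5, 1.0]
```

Because the three sequences share the same index, entry i of each can later be combined by frame number into one offline sample.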
As an optional scheme, acquiring the status identifier of the interaction state in each status frame includes:
S1, taking a screenshot of the status screen of the interaction state in each status frame;
S2, determining the status identifier of the interaction state according to the status screen.
A specific illustration is given with reference to Figure 6. Acquiring the status identifiers of the interaction states in the status frames specifically includes the following steps:
S602, starting the real-time screen capture module in the terminal;
S604, running the human-computer interaction application;
S606, taking screenshots of the status screens in the status frames in real time while the human-computer interaction application runs;
S608, obtaining multiple status screens and storing them by frame number to obtain the state frame sequence.
In the present embodiment, the status screen of the interaction state of each status frame is captured, and the status identifier of the interaction state is then determined according to the status screen, so that the status identifiers of the interaction states in the status frames are acquired in real time while the human-computer interaction application runs.
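Steps S602 to S608 can be sketched as a capture-and-identify loop; the screen-capture function below is a stand-in stub, since a real implementation would call a platform screenshot API, and the identifier scheme (a short pixel hash) is likewise an assumption.

```python
# Sketch of steps S602-S608: capture the screen each status frame, derive a
# compact state identifier from the pixels, and store identifiers by frame
# number to form the state frame sequence.
import hashlib

def capture_screen(frame_no):
    # Placeholder for the real-time screen capture module; returns pixel bytes.
    return f"pixels-of-frame-{frame_no}".encode()

def state_identifier(pixels):
    # Reduce a screenshot to a short identifier for the state frame sequence.
    return hashlib.md5(pixels).hexdigest()[:8]

state_frame_sequence = [state_identifier(capture_screen(n)) for n in range(4)]
print(len(state_frame_sequence))  # one identifier per captured status frame
```

Hashing is only one way to turn a screen into a state identifier; a production system would more likely feed the raw or downsampled pixels into the network directly.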
As an optional scheme, acquiring the action identifier of the interactive action in each status frame includes:
1) acquiring a touch operation, and obtaining the action identifier of the interactive action in the human-computer interaction application corresponding to the touch operation; or
2) acquiring an input event of an external device, wherein the input event includes at least one of: a keyboard input event, a somatosensory input event, a sensing device input event; and obtaining the action identifier of the interactive action in the human-computer interaction application corresponding to the input event.
Acquiring a touch operation and acquiring an input event of an external device are described in detail below:
(1) Acquiring a touch operation is illustrated first. A touch operation is usually acquired on a mobile terminal. A human-computer interaction application on a mobile terminal usually offers several operation modes: touch keys, a universal wheel on the touch screen, gyroscope operation, touch operations on the electronic screen of the terminal, and the like. Interactive actions are mainly mapped to the touch keys, the universal wheel on the touch screen, the touch screen itself, and so on; an action acquisition module in the mobile terminal or in the interactive application listens for these events, and after a corresponding event is captured, the action corresponding to that event is recorded, so that the action frame sequence is saved.
(2) External devices usually include a keyboard, infrared sensing, temperature sensors, and the like; such a device feeds events into the interactive application according to the corresponding operations. Taking a keyboard as the external device, and as shown in Figure 7, acquiring the input events of the external device includes the following steps:
S702, first mapping the interactive actions required by the human-computer interaction application onto the keyboard, thereby establishing the keyboard events (KeyEvent);
S704, then listening for the keyboard events through the action acquisition module;
S706, capturing a keyboard event;
S708, recording the action corresponding to that keyboard event, so that the action frame sequence is saved.
In the present embodiment, acquiring the action identifiers of the interactive actions in the status frames covers both acquiring touch operations on the terminal and acquiring input events of external devices. Multiple ways of acquiring action identifiers are thus provided, which widens the range over which the interactive application can acquire action identifiers.
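The keyboard mapping of steps S702 to S708 can be sketched as follows; the key names and action identifiers are assumptions for illustration, not mappings defined by the patent.

```python
# Sketch of steps S702-S708: map the interactive actions required by the
# application onto keys, then record the action identifier whenever a mapped
# key event is observed by the action acquisition module.

KEY_TO_ACTION = {
    "W": "move_forward",
    "A": "move_left",
    "D": "move_right",
    "SPACE": "jump",
}

def on_key_event(key, action_frame_sequence):
    action = KEY_TO_ACTION.get(key)
    if action is not None:  # only mapped keys produce recorded actions
        action_frame_sequence.append(action)

actions = []
for key in ["W", "W", "SPACE", "ESC"]:  # "ESC" is unmapped, so it is ignored
    on_key_event(key, actions)
print(actions)  # -> ['move_forward', 'move_forward', 'jump']
```

The same lookup-and-record shape applies to touch events: only the event source and the mapping table change.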
It should be noted that, for the sake of simple description, each of the foregoing method embodiments is expressed as a series of action combinations. However, those skilled in the art should understand that the present invention is not limited by the described order of actions, because according to the present invention some steps may be performed in other orders or simultaneously. Secondly, those skilled in the art should also know that the embodiments described in this specification are all preferred embodiments, and the actions and modules involved are not necessarily required by the present invention.
Through the above description of the embodiments, those skilled in the art can clearly understand that the methods of the above embodiments may be implemented by means of software plus a necessary general-purpose hardware platform, and certainly also by hardware, although in many cases the former is the better implementation. Based on this understanding, the technical solution of the present invention, in essence or the part contributing to the prior art, may be embodied in the form of a software product. The computer software product is stored in a storage medium (such as ROM/RAM, a magnetic disk, or an optical disc) and includes several instructions for causing a terminal device (which may be a mobile phone, a computer, a server, a network device, or the like) to perform the methods described in the embodiments of the present invention.
Embodiment 2
According to an embodiment of the present invention, a neural network training apparatus for implementing the above neural network training method is further provided. As shown in Figure 8, the apparatus includes:
1) an acquiring unit 802, configured to obtain an offline sample set for training a neural network in a human-computer interaction application, wherein the offline sample set includes offline samples meeting a predetermined configuration condition;
2) an offline training unit 804, configured to train an initial neural network offline using the offline sample set to obtain an object neural network, wherein in the human-computer interaction application the processing capability of the object neural network is higher than that of the initial neural network;
3) an online training unit 806, configured to connect the object neural network to the online running environment of the human-computer interaction application for online training, to obtain the target neural network.
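One way to picture how the three units might be composed is the following minimal sketch; every class, method and field name here is an illustrative assumption, not an interface defined by the patent.

```python
# Minimal sketch of the training apparatus: acquiring unit 802 supplies the
# offline sample set, offline training unit 804 produces the object neural
# network, and online training unit 806 produces the target neural network.

class AcquiringUnit:  # unit 802
    def get_offline_samples(self):
        return [("s0", "a0", 1.0, "s1")]  # samples meeting the preset condition

class OfflineTrainingUnit:  # unit 804
    def train(self, network, samples):
        network["offline_steps"] = len(samples)
        return network  # the "object" neural network

class OnlineTrainingUnit:  # unit 806
    def train(self, network):
        network["online_trained"] = True
        return network  # the target neural network

samples = AcquiringUnit().get_offline_samples()
net = OfflineTrainingUnit().train({"offline_steps": 0}, samples)
target = OnlineTrainingUnit().train(net)
print(target)  # -> {'offline_steps': 1, 'online_trained': True}
```

The point of the composition is the ordering: offline training runs to completion before the network is connected to the online environment.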
Optionally, in the present embodiment, the above neural network training method may be, but is not limited to being, applied in the following human-computer interaction scenarios: 1) in a human-machine confrontation application, the trained target neural network carries out the human-machine confrontation process against online accounts; 2) in an on-hook (idle) confrontation application, the trained target neural network can take the place of an online account and continue the subsequent human-machine confrontation process. That is, the target neural network with multiple skills, obtained through offline training on the offline sample set and subsequent online training as provided in this embodiment, completes the intelligent operations in the human-computer interaction application.
It should be noted that in the present embodiment, an offline sample set meeting the predetermined configuration condition is obtained in advance and used to train the initial neural network offline, yielding an object neural network with higher processing capability; the initial neural network is no longer connected directly to the online running environment for online training. This overcomes the problems in the current related art that, since the target neural network can only be obtained through online training, the training takes long and the training efficiency is low. In addition, obtaining the object neural network through offline training on the offline sample set also widens the sample range for neural network training, making it easier to obtain higher-quality or differently graded offline samples, which further guarantees the training efficiency of the neural network training.
Optionally, in the present embodiment, the target neural networks in the different application scenarios above may be, but are not limited to being, obtained through the following online training modes:
1) connecting the object neural network to the online running environment of the human-computer interaction application, and conducting online adversarial training against online accounts of the human-computer interaction application; or
2) connecting the object neural network to the online running environment of the human-computer interaction application in place of a first online account of the human-computer interaction application, and continuing the online adversarial training against a second online account.
It should be noted that an online account may be, but is not limited to, a user-controlled account in the human-computer interaction application. Taking the example shown in Figure 3, object A may be a user-manipulated object and object B a machine-manipulated object; the object neural network used for obtaining the target neural network may be, but is not limited to, object B, and through online adversarial training the weight values in the object neural network are further refined to obtain the corresponding target neural network. Again taking the example shown in Figure 3, object A may be a user-manipulated object and object B may also be a user-manipulated object; after object A has run for a period of time and the on-hook operation is selected, object A may be, but is not limited to being, replaced with the object neural network, and by continuing the human-machine confrontation process against object B, the weight values in the object neural network are further refined to obtain the corresponding target neural network.
Optionally, in the present embodiment, training the initial neural network offline using the offline sample set to obtain the object neural network includes:
1) in the case where the predetermined configuration condition indicates obtaining a high-grade object neural network, training with a high-grade offline sample set to obtain the high-grade object neural network, wherein the operation results of the offline samples in the high-grade offline sample set within the human-computer interaction application are higher than a predetermined threshold; or
2) in the case where the predetermined configuration condition indicates obtaining object neural networks of multiple grades, training with the offline sample set of each grade respectively to obtain the object neural network of the corresponding grade, wherein the operation results of the offline samples in the offline sample sets of the multiple grades within the human-computer interaction application fall within different target threshold ranges respectively; the object neural networks of the multiple grades include at least a first-grade object network and a second-grade object network, the processing capability of the first-grade object network being higher than that of the second-grade object network.
It should be noted that in the present embodiment, the above target neural network may be, but is not limited to being, trained according to the interaction levels of the offline samples in the different offline sample sets, to obtain neural networks with interaction levels of different grades. For example, in mode 1) above, high-quality offline samples whose operation results are higher than the predetermined threshold are selected from the offline samples, and a high-grade object neural network is obtained through offline training, thereby raising the winning rate of the machine in human-machine confrontation and attracting more user accounts to the human-computer interaction application; in mode 2) above, offline sample sets of multiple grades whose operation results fall within different target threshold ranges are selected from the offline samples, and object neural networks of multiple grades are obtained through offline training, thereby enriching the confrontation levels in human-computer interaction.
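The grading of offline sample sets described above can be sketched as a partition by operation result; the grade names and threshold ranges below are assumptions chosen for illustration.

```python
# Sketch: partition offline samples into grade-specific training sets by
# their operation result, so each grade of object neural network is trained
# on its own target threshold range.

GRADE_RANGES = {
    "first_grade": (80, 101),  # high-grade samples: result in [80, 100]
    "second_grade": (50, 80),  # lower-grade samples: result in [50, 80)
}

def split_by_grade(samples):
    graded = {grade: [] for grade in GRADE_RANGES}
    for sample, result in samples:
        for grade, (lo, hi) in GRADE_RANGES.items():
            if lo <= result < hi:
                graded[grade].append(sample)
    return graded

samples = [("s_a", 95), ("s_b", 60), ("s_c", 30)]  # (sample, operation result)
print(split_by_grade(samples))
# -> {'first_grade': ['s_a'], 'second_grade': ['s_b']}  (s_c falls below both)
```

Training one network per bucket is what yields object neural networks of distinguishable grades.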
Optionally, in the present embodiment, the above offline samples may be, but are not limited to being, obtained as follows: while the training account operates the human-computer interaction application, the parameter values of the interaction parameters of the training account in each status frame are acquired, wherein the interaction parameters include: interaction state, interactive action, and interaction feedback excitation; the offline samples are then obtained according to the parameter values of the interaction parameters.
It should be noted that this may, but is not limited to, mean that while the human-computer interaction application runs, each status frame is displayed frame by frame in order of frame number, and the parameter values of the interaction parameters in each status frame are acquired, so that a frame sequence of parameter values is obtained for each interaction parameter, from which the offline samples are then derived. The interaction state may be, but is not limited to being, determined from the interactive picture of the human-computer interaction application; the interactive action may be, but is not limited to being, determined from the interactive operations received by the human-computer interaction application; and the interaction feedback excitation may be, but is not limited to being, determined from the parameter value of the interaction feedback excitation parameter matched with the application type of the human-computer interaction application.
Through the embodiments provided in this application, an offline sample set meeting the predetermined configuration condition is obtained in advance and used to train the initial neural network offline, yielding an object neural network with higher processing capability; the initial neural network is no longer connected directly to the online running environment for online training. This overcomes the problems in the current related art that, since the target neural network can only be obtained through online training, the training takes long and the training efficiency is low. In addition, obtaining the object neural network through offline training on the offline sample set also widens the sample range for neural network training, making it easier to obtain higher-quality or differently graded offline samples, which further guarantees the training efficiency of the neural network training.
As an optional scheme, as shown in Figure 9, the acquiring unit 802 includes:
1) an obtaining module 902, configured to obtain the offline samples recorded after the training account operates the human-computer interaction application;
2) a screening module 904, configured to screen the obtained offline samples according to the predetermined configuration condition to obtain the offline sample set.
As an optional scheme, the obtaining module includes:
1) an acquisition submodule, configured to acquire, while the training account operates the human-computer interaction application, the parameter values of the interaction parameters of the training account in each status frame, wherein the interaction parameters include: interaction state, interactive action, interaction feedback excitation;
2) an obtaining submodule, configured to obtain the offline samples according to the parameter values of the interaction parameters.
It should be noted that in the present embodiment, the interaction feedback excitation is the feedback excitation value of an action in the current state, calculated in the human-computer interaction application by the DQN algorithm according to the change of the interaction state; the parameter value of the above interaction feedback excitation is obtained from it. The specific calculation formula may be, but is not limited to being, set differently and disclosed for different types of human-computer interaction applications. For example, taking a multi-player interactive game application, the parameter of the above interaction feedback excitation may be, but is not limited to, the blood volume (health) of each character object: when the blood volume of the training account acquired during training is higher, a positive excitation feedback value can be configured; otherwise, a negative excitation feedback value is configured. As another example, taking a distance sports application, the parameter of the above interaction feedback excitation may be, but is not limited to, the completed mileage: the farther the completed mileage of the training account acquired during training, the larger the configured excitation feedback value; otherwise, the configured excitation feedback value is smaller. The above are merely examples, and the present embodiment imposes no limitation on this. In addition, in the present embodiment, the parameters of the above interaction feedback excitation may be, but are not limited to being, recorded successively by the frame numbers of the status frames.
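The two feedback-excitation examples above (blood volume in a game, completed mileage in a distance sports application) can be sketched as simple reward functions; the exact formulas are assumptions, since the patent leaves the calculation open per application type.

```python
# Sketch of two application-specific feedback excitations: a sign-based
# excitation driven by the change in the training account's blood volume,
# and a mileage-proportional excitation for a distance sports application.

def battle_reward(prev_health, curr_health):
    # Positive feedback when health rises, negative when it falls.
    delta = curr_health - prev_health
    return 1.0 if delta > 0 else (-1.0 if delta < 0 else 0.0)

def racing_reward(completed_mileage, total_mileage):
    # The farther the completed mileage, the larger the excitation value.
    return completed_mileage / total_mileage

print(battle_reward(70, 85))     # health increased -> positive excitation
print(racing_reward(750, 1000))  # 75% of the track completed -> 0.75
```

Whatever the formula, the computed value r_t is appended per status frame so that the feedback excitation frame sequence stays aligned with the state and action sequences.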
A specific illustration is given with the example shown in Figure 4: while the human-computer interaction application runs, the interaction state st is acquired and recorded to obtain the state frame sequence (s0, s1, ..., st); the action output is acquired, that is, the interactive action at, and recorded to obtain the action frame sequence (a0, a1, ..., at); the parameter value rt of the interaction feedback excitation is further calculated from the interaction feedback excitation parameter and recorded to obtain the feedback excitation frame sequence (r0, r1, ..., rt). The intermediate samples collected above are then combined to obtain the offline samples, and the offline samples determined by the combination are stored into the offline sample database.
In the present embodiment, the acquired data of the three parts, namely the interaction state, the interactive action and the interaction feedback excitation, are combined synchronously by the frame numbers of the status frames to generate offline samples, such as DQN samples, and the generated DQN samples are further saved into the offline sample database.
As an optional scheme, the obtaining submodule obtains the offline samples according to the parameter values of the interaction parameters through the following step:
1) determining an offline sample by combining the parameter values of the interaction parameters in the i-th status frame with the parameter values of the interaction parameters in the (i+1)-th status frame, wherein i is greater than or equal to 1 and less than or equal to N, and N is the total number of frames in the run of the human-computer interaction application.
A specific illustration is given with reference to Figure 5. The above offline sample may be, but is not limited to, a quadruple (s, a, r, s'), whose elements mean:
s: the interaction state (state, abbreviated s) in the i-th status frame;
a: the interactive action (action, abbreviated a) in the i-th status frame;
r: the interaction feedback excitation (reward, abbreviated r) obtained in the i-th status frame after the action a is made in the interaction state s;
s': the interaction state (next state, abbreviated s') in the (i+1)-th status frame.
As shown in Figure 5, the parameter values of the interaction parameters in the i-th status frame at the current moment are combined with the parameter values of the interaction parameters in the (i+1)-th status frame at the next moment, so as to obtain one group of offline samples shown on the right side. In effect, the parameter values of the interaction parameters of the current status frame are combined with those of the next status frame.
In the present embodiment, an offline sample is determined by combining the parameter values of the interaction parameters in the i-th status frame with those in the (i+1)-th status frame. Accurate offline sample data can thus be generated, which accelerates the convergence of the neural network.
As an optional scheme, the acquisition submodule acquires the parameter values of the interaction parameters of the training account in each status frame in at least one of the following ways:
1) acquiring the status identifier of the interaction state in each status frame, to obtain the state frame sequence recorded while the training account operates the human-computer interaction application;
2) acquiring the action identifier of the interactive action in each status frame, to obtain the action frame sequence recorded while the training account operates the human-computer interaction application;
3) obtaining the interaction feedback excitation parameter matched with the application type of the human-computer interaction application, and calculating the parameter value of the interaction feedback excitation parameter, to obtain the feedback excitation frame sequence recorded while the training account operates the human-computer interaction application.
Taking the example shown in Figure 4: while the human-computer interaction application runs, the interaction state st is acquired and recorded to obtain the state frame sequence (s0, s1, ..., st); the action output is acquired, that is, the interactive action at, and recorded to obtain the action frame sequence (a0, a1, ..., at); the parameter value rt of the interaction feedback excitation is further calculated from the interaction feedback excitation parameter and recorded to obtain the feedback excitation frame sequence (r0, r1, ..., rt).
In the present embodiment, the interaction state and the interactive action in each status frame are obtained, and the parameter value of the interaction feedback excitation parameter is calculated, so that the corresponding state frame sequence, action frame sequence and feedback excitation frame sequence in the human-computer interaction application process are obtained; these can then be combined into offline samples for the DQN (Deep Q-Network) neural network.
As an optional scheme, the acquisition submodule acquires the status identifier of the interaction state in each status frame through the following steps:
S1, taking a screenshot of the status screen of the interaction state in each status frame;
S2, determining the status identifier of the interaction state according to the status screen.
A specific illustration is given with reference to Figure 6. Acquiring the status identifiers of the interaction states in the status frames specifically includes the following steps:
S602, starting the real-time screen capture module in the terminal;
S604, running the human-computer interaction application;
S606, taking screenshots of the status screens in the status frames in real time while the human-computer interaction application runs;
S608, obtaining multiple status screens and storing them by frame number to obtain the state frame sequence.
In the present embodiment, the status screen of the interaction state of each status frame is captured, and the status identifier of the interaction state is then determined according to the status screen, so that the status identifiers of the interaction states in the status frames are acquired in real time while the human-computer interaction application runs.
As an optional scheme, the acquisition submodule acquires the action identifier of the interactive action in each status frame through the following steps:
1) acquiring a touch operation, and obtaining the action identifier of the interactive action in the human-computer interaction application corresponding to the touch operation; or
2) acquiring an input event of an external device, wherein the input event includes at least one of: a keyboard input event, a somatosensory input event, a sensing device input event; and obtaining the action identifier of the interactive action in the human-computer interaction application corresponding to the input event.
Acquiring a touch operation and acquiring an input event of an external device are described in detail below:
(1) Acquiring a touch operation is illustrated first. A touch operation is usually acquired on a mobile terminal. A human-computer interaction application on a mobile terminal usually offers several operation modes: touch keys, a universal wheel on the touch screen, gyroscope operation, touch operations on the electronic screen of the terminal, and the like. Interactive actions are mainly mapped to the touch keys, the universal wheel on the touch screen, the touch screen itself, and so on; an action acquisition module in the mobile terminal or in the interactive application listens for these events, and after a corresponding event is captured, the action corresponding to that event is recorded, so that the action frame sequence is saved.
(2) External devices usually include a keyboard, infrared sensing, temperature sensors, and the like; such a device feeds events into the interactive application according to the corresponding operations. Taking a keyboard as the external device, and as shown in Figure 7, acquiring the input events of the external device includes the following steps:
S702, first mapping the interactive actions required by the human-computer interaction application onto the keyboard, thereby establishing the keyboard events (KeyEvent);
S704, then listening for the keyboard events through the action acquisition module;
S706, capturing a keyboard event;
S708, recording the action corresponding to that keyboard event, so that the action frame sequence is saved.
In the present embodiment, acquiring the action identifiers of the interactive actions in the status frames covers both acquiring touch operations on the terminal and acquiring input events of external devices. Multiple ways of acquiring action identifiers are thus provided, which widens the range over which the interactive application can acquire action identifiers.
Embodiment 3
According to an embodiment of the present invention, an electronic device for implementing the above neural network training method is further provided. As shown in Figure 10, the electronic device includes: one or more processors 1002 (only one is shown in the figure), a memory 1004, a display 1006, a user interface 1008, and a transmission device 1010. The memory 1004 may be used to store software programs and modules, such as the program instructions/modules corresponding to the neural network training method and apparatus in the embodiments of the present invention; by running the software programs and modules stored in the memory 1004, the processor 1002 executes various functional applications and data processing, that is, implements the above neural network training method. The memory 1004 may include high-speed random access memory, and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, the memory 1004 may further include memory located remotely from the processor 1002, and such remote memories may be connected to the terminal through a network. Examples of the above network include, but are not limited to, the Internet, an intranet, a local area network, a mobile communication network, and combinations thereof.
The above transmission device 1010 is used to receive or send data via a network. Specific examples of the above network may include wired networks and wireless networks. In one example, the transmission device 1010 includes a network interface controller (Network Interface Controller, NIC), which can be connected with other network devices and a router through a cable so as to communicate with the Internet or a local area network. In another example, the transmission device 1010 is a radio frequency (Radio Frequency, RF) module, which is used to communicate with the Internet wirelessly.
Specifically, the memory 1004 is used to store the preset action conditions, the information of users with preset permissions, and the application programs.
Optionally, for specific examples in this embodiment, reference may be made to the examples described in Embodiment 1 and Embodiment 2 above; details are not repeated here.
Those skilled in the art can understand that the structure shown in Figure 10 is only illustrative. The electronic device may also be a terminal device such as a smartphone (for example, an Android phone or an iOS phone), a tablet computer, a palmtop computer, a mobile Internet device (Mobile Internet Devices, MID), or a PAD. Figure 10 does not limit the structure of the above electronic device. For example, the electronic device may also include more or fewer components than shown in Figure 10 (such as a network interface or a display device), or have a configuration different from that shown in Figure 10.
Those of ordinary skill in the art can understand that all or part of the steps in the various methods of the above embodiments may be completed by a program instructing the hardware related to the terminal device. The program may be stored in a computer-readable storage medium, and the storage medium may include: a flash disk, a read-only memory (Read-Only Memory, ROM), a random access memory (Random Access Memory, RAM), a magnetic disk, an optical disc, and the like.
Embodiment 4
An embodiment of the present invention further provides a storage medium. Optionally, in the present embodiment, the above storage medium may be located on at least one of multiple network devices in a network.
Optionally, in the present embodiment, the storage medium is arranged to store program code for performing the following steps:
S1, obtaining an offline sample set for training a neural network in a human-computer interaction application, wherein the offline sample set includes offline samples meeting a predetermined configuration condition;
S2, training an initial neural network offline using the offline sample set to obtain an object neural network, wherein in the human-computer interaction application the processing capability of the object neural network is higher than that of the initial neural network;
S3, connecting the object neural network to the online running environment of the human-computer interaction application for online training, to obtain the target neural network.
Optionally, the storage medium is further arranged to store program code for performing the following steps:
S1, obtaining the offline samples recorded after the training account operates the human-computer interaction application;
S2, screening the obtained offline samples according to the predetermined configuration condition to obtain the offline sample set.
Optionally, the storage medium is further arranged to store program code for performing the following steps:
S1, acquiring, while the training account operates the human-computer interaction application, the parameter values of the interaction parameters of the training account in each status frame, wherein the interaction parameters include: interaction state, interactive action, interaction feedback excitation;
S2, obtaining the offline samples according to the parameter values of the interaction parameters.
Optionally, in the present embodiment, the above storage medium may include, but is not limited to: a USB flash disk, a read-only memory (ROM, Read-Only Memory), a random access memory (RAM, Random Access Memory), a removable hard disk, a magnetic disk, an optical disc, or various other media that can store program code.
Optionally, for specific examples in this embodiment, reference may be made to the examples described in Embodiment 1 and Embodiment 2 above; details are not repeated here.
The serial number of the above embodiments of the invention is only for description, does not represent the advantages or disadvantages of the embodiments.
If the integrated units in the above embodiments are implemented in the form of software functional units and sold or used as independent products, they may be stored in the above computer-readable storage medium. Based on this understanding, the essence of the technical solution of the present invention, or the part that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing one or more computer devices (which may be a personal computer, a server, a network device, or the like) to execute all or some of the steps of the methods described in the embodiments of the present invention.
In the above embodiments of the present invention, the description of each embodiment has its own emphasis; for parts not described in detail in one embodiment, reference may be made to the related descriptions of other embodiments.
In the several embodiments provided in this application, it should be understood that the disclosed client may be implemented in other ways. The device embodiments described above are merely illustrative. For example, the division into units is only a division by logical function; in actual implementation there may be other division schemes: multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed. Furthermore, the couplings, direct couplings, or communication connections shown or discussed between components may be indirect couplings or communication connections through interfaces, units, or modules, and may be electrical or take other forms.
Units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; they may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional units in the embodiments of the present invention may be integrated into one processing unit, each unit may exist alone physically, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
The above are only preferred embodiments of the present invention. It should be noted that a person of ordinary skill in the art may make various improvements and modifications without departing from the principles of the present invention, and these improvements and modifications shall also fall within the protection scope of the present invention.
Claims (18)
1. A neural network training method, comprising:
obtaining an offline sample set for training a neural network in a human-computer interaction application, wherein the offline sample set includes offline samples that satisfy a predetermined configuration condition;
performing offline training on an initial neural network using the offline sample set to obtain an object neural network, wherein, in the human-computer interaction application, the processing capability of the object neural network is higher than that of the initial neural network;
connecting the object neural network to the online operating environment of the human-computer interaction application for online training, to obtain a target neural network.
2. The method according to claim 1, wherein obtaining the offline sample set for training the neural network in the human-computer interaction application comprises:
obtaining offline samples produced by running the human-computer interaction application with a training account;
filtering the obtained offline samples according to the predetermined configuration condition to obtain the offline sample set.
3. The method according to claim 2, wherein obtaining the offline samples produced by running the human-computer interaction application with the training account comprises:
during the running of the human-computer interaction application with the training account, collecting the parameter values of interaction parameters of the training account in each status frame, wherein the interaction parameters include an interaction state, an interaction action, and an interaction feedback reward;
obtaining the offline samples according to the parameter values of the interaction parameters.
4. The method according to claim 3, wherein obtaining the offline samples according to the parameter values of the interaction parameters comprises:
determining an offline sample from the combination of the parameter values of the interaction parameters in an i-th status frame and in an (i+1)-th status frame, wherein i is greater than or equal to 1 and less than N, and N is the total number of frames in one run of the human-computer interaction application.
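Pairing the interaction parameters of frame i with those of frame i+1, as claim 4 describes, yields what reinforcement learning usually calls transition tuples (state, action, reward, next state). A minimal sketch with invented field names:

```python
# Build (state_i, action_i, reward_i, state_{i+1}) tuples by pairing
# each status frame with its successor. Field names are illustrative.

def build_transitions(frames):
    """frames: per-frame dicts with state/action/reward keys."""
    return [
        (frames[i]["state"], frames[i]["action"],
         frames[i]["reward"], frames[i + 1]["state"])
        for i in range(len(frames) - 1)  # the last frame has no successor
    ]

frames = [
    {"state": "s1", "action": "a1", "reward": 0.0},
    {"state": "s2", "action": "a2", "reward": 0.5},
    {"state": "s3", "action": "a3", "reward": 1.0},
]
print(build_transitions(frames)[0])  # → ('s1', 'a1', 0.0, 's2')
```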
5. The method according to claim 3, wherein collecting the parameter values of the interaction parameters of the training account in each status frame includes at least one of the following:
collecting the status identifier of the interaction state in each status frame, to obtain the status frame sequence produced while running the human-computer interaction application with the training account;
collecting the action identifier of the interaction action in each status frame, to obtain the action frame sequence produced while running the human-computer interaction application with the training account;
obtaining interaction feedback reward parameters matched to the application type of the human-computer interaction application, and computing the parameter values of the interaction feedback reward parameters, to obtain the feedback reward frame sequence produced while running the human-computer interaction application with the training account.
6. The method according to claim 5, wherein collecting the status identifier of the interaction state in each status frame comprises:
capturing a screenshot of the interaction state in each status frame;
determining the status identifier of the interaction state from the screenshot.
7. The method according to claim 5, wherein collecting the action identifier of the interaction action in each status frame comprises:
collecting a touch operation, and obtaining the action identifier of the interaction action corresponding to the touch operation in the human-computer interaction application; or
collecting an input event of an external device, wherein the input event includes at least one of the following: a keyboard input event, a motion-sensing input event, or a sensing-device input event, and obtaining the action identifier of the interaction action corresponding to the input event in the human-computer interaction application.
8. The method according to claim 1, wherein performing offline training on the initial neural network using the offline sample set to obtain the object neural network comprises:
in a case where the predetermined configuration condition indicates obtaining a high-grade object neural network, training with a high-grade offline sample set to obtain the high-grade object neural network, wherein the operation results in the human-computer interaction application of the offline samples in the high-grade offline sample set are higher than a predetermined threshold; or
in a case where the predetermined configuration condition indicates obtaining object neural networks of multiple grades, training with the offline sample set of each grade respectively to obtain the object neural network of the corresponding grade, wherein the operation results in the human-computer interaction application of the offline samples in the offline sample sets of the multiple grades fall within different threshold ranges, and wherein the object neural networks of the multiple grades include at least a first-grade object network and a second-grade object network, the processing capability of the first-grade object network being higher than that of the second-grade object network.
9. The method according to claim 1, wherein connecting the object neural network to the online operating environment of the human-computer interaction application for online training to obtain the target neural network comprises:
connecting the object neural network to the online operating environment of the human-computer interaction application, and performing online adversarial training against an online account in the human-computer interaction application; or
connecting the object neural network to the online operating environment of the human-computer interaction application, substituting it for a first online account in the human-computer interaction application, and continuing online adversarial training against a second online account.
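In both branches of claim 9 the network keeps learning from live matches, either as a new participant or by taking over an existing account's seat. The toy loop below only illustrates that shape; the match model and update rule are entirely invented:

```python
# Toy online-adversarial-training loop: play matches against a live
# opponent and update a scalar 'strength' after each game. Everything
# here is illustrative, not the patent's training procedure.
import random

def play_match(strength, opponent, rng):
    """Return 1 if the network wins this match, else 0."""
    return 1 if rng.random() < strength / (strength + opponent) else 0

def online_adversarial_training(strength, opponent, games, rng):
    for _ in range(games):
        if play_match(strength, opponent, rng):
            strength += 0.01  # small update after a win
        else:
            strength += 0.02  # learn more from a loss
    return strength

rng = random.Random(0)
final = online_adversarial_training(1.0, 1.5, games=200, rng=rng)
print(final > 1.0)  # strength grows over the online phase → True
```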
10. A neural network training device, comprising:
an acquiring unit, configured to obtain an offline sample set for training a neural network in a human-computer interaction application, wherein the offline sample set includes offline samples that satisfy a predetermined configuration condition;
an offline training unit, configured to perform offline training on an initial neural network using the offline sample set to obtain an object neural network, wherein, in the human-computer interaction application, the processing capability of the object neural network is higher than that of the initial neural network;
an online training unit, configured to connect the object neural network to the online operating environment of the human-computer interaction application for online training, to obtain a target neural network.
11. The device according to claim 10, wherein the acquiring unit comprises:
an obtaining module, configured to obtain offline samples produced by running the human-computer interaction application with a training account;
a screening module, configured to filter the obtained offline samples according to the predetermined configuration condition to obtain the offline sample set.
12. The device according to claim 11, wherein the obtaining module comprises:
a collection submodule, configured to collect, during the running of the human-computer interaction application with the training account, the parameter values of interaction parameters of the training account in each status frame, wherein the interaction parameters include an interaction state, an interaction action, and an interaction feedback reward;
an acquisition submodule, configured to obtain the offline samples according to the parameter values of the interaction parameters.
13. The device according to claim 12, wherein the acquisition submodule obtains the offline samples according to the parameter values of the interaction parameters through the following step:
determining an offline sample from the combination of the parameter values of the interaction parameters in an i-th status frame and in an (i+1)-th status frame, wherein i is greater than or equal to 1 and less than N, and N is the total number of frames in one run of the human-computer interaction application.
14. The device according to claim 12, wherein the collection submodule collects the parameter values of the interaction parameters of the training account in each status frame in at least one of the following ways:
collecting the status identifier of the interaction state in each status frame, to obtain the status frame sequence produced while running the human-computer interaction application with the training account;
collecting the action identifier of the interaction action in each status frame, to obtain the action frame sequence produced while running the human-computer interaction application with the training account;
obtaining interaction feedback reward parameters matched to the application type of the human-computer interaction application, and computing the parameter values of the interaction feedback reward parameters, to obtain the feedback reward frame sequence produced while running the human-computer interaction application with the training account.
15. The device according to claim 14, wherein the collection submodule collects the status identifier of the interaction state in each status frame through the following steps:
capturing a screenshot of the interaction state in each status frame;
determining the status identifier of the interaction state from the screenshot.
16. The device according to claim 15, wherein the collection submodule collects the action identifier of the interaction action in each status frame through the following steps:
collecting a touch operation, and obtaining the action identifier of the interaction action corresponding to the touch operation in the human-computer interaction application; or
collecting an input event of an external device, wherein the input event includes at least one of the following: a keyboard input event, a motion-sensing input event, or a sensing-device input event, and obtaining the action identifier of the interaction action corresponding to the input event in the human-computer interaction application.
17. A storage medium, wherein the storage medium stores a program which, when run, executes the method according to any one of claims 1 to 9.
18. An electronic device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor executes, by means of the computer program, the method according to any one of claims 1 to 9.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711037964.3A CN109726808B (en) | 2017-10-27 | 2017-10-27 | Neural network training method and device, storage medium and electronic device |
PCT/CN2018/111914 WO2019080900A1 (en) | 2017-10-27 | 2018-10-25 | Neural network training method and device, storage medium, and electronic device |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711037964.3A CN109726808B (en) | 2017-10-27 | 2017-10-27 | Neural network training method and device, storage medium and electronic device |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109726808A true CN109726808A (en) | 2019-05-07 |
CN109726808B CN109726808B (en) | 2022-12-09 |
Family
ID=66246220
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201711037964.3A Active CN109726808B (en) | 2017-10-27 | 2017-10-27 | Neural network training method and device, storage medium and electronic device |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN109726808B (en) |
WO (1) | WO2019080900A1 (en) |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110796248A (en) * | 2019-08-27 | 2020-02-14 | 腾讯科技(深圳)有限公司 | Data enhancement method, device, equipment and storage medium |
CN110610169B (en) * | 2019-09-20 | 2023-12-15 | 腾讯科技(深圳)有限公司 | Picture marking method and device, storage medium and electronic device |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101630144A (en) * | 2009-08-18 | 2010-01-20 | 湖南大学 | Self-learning inverse model control method of electronic throttle |
CN105637540A (en) * | 2013-10-08 | 2016-06-01 | 谷歌公司 | Methods and apparatus for reinforcement learning |
US20170024643A1 (en) * | 2015-07-24 | 2017-01-26 | Google Inc. | Continuous control with deep reinforcement learning |
CN106462801A (en) * | 2014-10-07 | 2017-02-22 | 谷歌公司 | Training neural networks on partitioned training data |
CN106650721A (en) * | 2016-12-28 | 2017-05-10 | 吴晓军 | Industrial character identification method based on convolution neural network |
CN106940801A (en) * | 2016-01-04 | 2017-07-11 | 中国科学院声学研究所 | A kind of deeply for Wide Area Network learns commending system and method |
CN107209872A (en) * | 2015-02-06 | 2017-09-26 | 谷歌公司 | The distributed training of reinforcement learning system |
CN107291232A (en) * | 2017-06-20 | 2017-10-24 | 深圳市泽科科技有限公司 | A kind of somatic sensation television game exchange method and system based on deep learning and big data |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP6471934B2 (en) * | 2014-06-12 | 2019-02-20 | パナソニックIpマネジメント株式会社 | Image recognition method, camera system |
2017
- 2017-10-27 CN CN201711037964.3A patent/CN109726808B/en active Active
2018
- 2018-10-25 WO PCT/CN2018/111914 patent/WO2019080900A1/en active Application Filing
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111104925A (en) * | 2019-12-30 | 2020-05-05 | 上海商汤临港智能科技有限公司 | Image processing method, image processing apparatus, storage medium, and electronic device |
WO2021135424A1 (en) * | 2019-12-30 | 2021-07-08 | 上海商汤临港智能科技有限公司 | Image processing method and apparatus, storage medium, and electronic device |
Also Published As
Publication number | Publication date |
---|---|
WO2019080900A1 (en) | 2019-05-02 |
CN109726808B (en) | 2022-12-09 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Brown et al. | Finding waldo: Learning about users from their interactions | |
CN110339569B (en) | Method and device for controlling virtual role in game scene | |
CN107340944B (en) | The display methods and device of interface | |
TW201814445A (en) | Performing operations based on gestures | |
CN109726808A (en) | Neural network training method and device, storage medium and electronic device | |
CN109508789A (en) | Predict method, storage medium, processor and the equipment of hands | |
CN109905696A (en) | A kind of recognition methods of the Video service Quality of experience based on encryption data on flows | |
CN107404656A (en) | Live video recommends method, apparatus and server | |
CN108579086A (en) | Processing method, device, storage medium and the electronic device of object | |
CN109600336A (en) | Store equipment, identifying code application method and device | |
CN109806590A (en) | Object control method and apparatus, storage medium and electronic device | |
CN109993308A (en) | Learning system and method, shared platform and method, medium are shared based on cloud platform | |
CN115064020B (en) | Intelligent teaching method, system and storage medium based on digital twin technology | |
CN112748941B (en) | Method and device for updating target application program based on feedback information | |
CN109214330A (en) | Video Semantic Analysis method and apparatus based on video timing information | |
CN110339563A (en) | The generation method and device of virtual objects, storage medium and electronic device | |
CN108319974A (en) | Data processing method, device, storage medium and electronic device | |
CN107273869A (en) | Gesture identification control method and electronic equipment | |
CN111124902A (en) | Object operating method and device, computer-readable storage medium and electronic device | |
CN113633983A (en) | Method, device, electronic equipment and medium for controlling expression of virtual character | |
CN110300089A (en) | Processing method, device, storage medium and the electronic device of target account number | |
CN112269943B (en) | Information recommendation system and method | |
CN110325965B (en) | Object processing method, device and storage medium in virtual scene | |
CN109731338A (en) | Artificial intelligence training method and device, storage medium and electronic device in game | |
CN109165347A (en) | Data push method and device, storage medium and electronic device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||