CN109726808A - Neural network training method and device, storage medium and electronic device - Google Patents

Neural network training method and device, storage medium and electronic device

Info

Publication number
CN109726808A
CN109726808A (application CN201711037964.3A)
Authority
CN
China
Prior art keywords
neural network
interaction
training
human
computer interaction
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201711037964.3A
Other languages
Chinese (zh)
Other versions
CN109726808B (en)
Inventor
杨夏
张力柯
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd filed Critical Tencent Technology Shenzhen Co Ltd
Priority to CN201711037964.3A (patent CN109726808B)
Priority to PCT/CN2018/111914 (publication WO2019080900A1)
Publication of CN109726808A
Application granted
Publication of CN109726808B
Legal status: Active
Anticipated expiration: legal status not analyzed

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

The invention discloses a neural network training method and device, a storage medium, and an electronic device. The method includes: obtaining an offline sample set for training a neural network in a human-computer interaction application, the offline sample set including offline samples that meet a predetermined configuration condition; training an initial neural network offline using the offline sample set to obtain an object neural network, where, in the human-computer interaction application, the processing capability of the object neural network is higher than that of the initial neural network; and connecting the object neural network to the online running environment of the human-computer interaction application for online training, to obtain a target neural network. The invention solves the technical problem of low training efficiency in the neural network training methods provided by the related art.

Description

Neural network training method and device, storage medium and electronic device
Technical field
The present invention relates to the field of computers, and in particular to a neural network training method and device, a storage medium, and an electronic device.
Background art
The Deep Q Network (DQN) algorithm is a method that fuses a convolutional neural network with Q-Learning and is applied in Deep Reinforcement Learning (DRL). DRL combines deep learning with reinforcement learning to form a new class of algorithms that learn end-to-end from perception to action: after perception information is input to a deep neural network, an action is output directly, so that a robot can fully autonomously learn one or even several skills, thereby realizing Artificial Intelligence (AI) operation. To enable robots to complete autonomous learning well and apply their skills proficiently in different scenarios, obtaining a neural network through rapid and accurate training has become an urgent problem.
At present, the sample objects used to train a neural network in an online training environment usually have a very low skill level. In the early stage of training they act randomly with very high probability; although this explores the state space of the training environment well, it prolongs the training time. Furthermore, because the skill level is very low, continual exploratory learning in the training environment is generally required before a given training goal can be reached.
That is, the neural network training methods provided in the related art require a long training time, which results in low neural network training efficiency.
For the above problem, no effective solution has yet been proposed.
Summary of the invention
Embodiments of the present invention provide a neural network training method and device, a storage medium, and an electronic device, so as to at least solve the technical problem of low training efficiency in the neural network training methods provided by the related art.
According to one aspect of the embodiments of the present invention, a neural network training method is provided, including: obtaining an offline sample set for training a neural network in a human-computer interaction application, where the offline sample set includes offline samples that meet a predetermined configuration condition; training an initial neural network offline using the offline sample set to obtain an object neural network, where, in the human-computer interaction application, the processing capability of the object neural network is higher than that of the initial neural network; and connecting the object neural network to the online running environment of the human-computer interaction application for online training, to obtain a target neural network.
According to another aspect of the embodiments of the present invention, a neural network training device is further provided, including: an obtaining unit, configured to obtain an offline sample set for training a neural network in a human-computer interaction application, where the offline sample set includes offline samples that meet a predetermined configuration condition; an offline training unit, configured to train an initial neural network offline using the offline sample set to obtain an object neural network, where, in the human-computer interaction application, the processing capability of the object neural network is higher than that of the initial neural network; and an online training unit, configured to connect the object neural network to the online running environment of the human-computer interaction application for online training, to obtain a target neural network.
According to another aspect of the embodiments of the present invention, a storage medium is further provided. The storage medium includes a stored program, and the program, when run, performs the above method.
According to another aspect of the embodiments of the present invention, an electronic device is further provided, including a memory, a processor, and a computer program stored in the memory and runnable on the processor, where the processor performs the above method through the computer program.
In the embodiments of the present invention, the obtained offline sample set for training a neural network in a human-computer interaction application is used to train an initial neural network offline, so as to obtain an object neural network whose processing capability is higher than that of the initial neural network. The object neural network is then connected to the online running environment of the human-computer interaction application for online training, to obtain a target neural network matched to the human-computer interaction application. That is, by obtaining in advance an offline sample set that meets a predetermined configuration condition and performing offline training on the initial neural network, an object neural network with higher processing capability is obtained, and the initial neural network is no longer connected to the online running environment directly for online training. This overcomes the problems of long training time and low training efficiency in the related art, where the target neural network can be obtained only through online training. In addition, obtaining the object neural network by offline training on the offline sample set also broadens the range of samples available for neural network training, making it easier to obtain higher-quality offline samples or offline samples of different grades, which further guarantees training efficiency. This in turn solves the technical problem of low training efficiency in the neural network training methods provided by the related art.
Brief description of the drawings
The drawings described herein are used to provide a further understanding of the present invention and constitute a part of this application. The illustrative embodiments of the present invention and their description are used to explain the present invention and do not constitute an improper limitation on the present invention. In the drawings:
Fig. 1 is a schematic diagram of a hardware environment of an optional neural network training method according to an embodiment of the present invention;
Fig. 2 is a flowchart of an optional neural network training method according to an embodiment of the present invention;
Fig. 3 is a schematic diagram of an application of an optional neural network training method according to an embodiment of the present invention;
Fig. 4 is a schematic diagram of an optional neural network training method according to an embodiment of the present invention;
Fig. 5 is a schematic diagram of another optional neural network training method according to an embodiment of the present invention;
Fig. 6 is a flowchart of another optional neural network training method according to an embodiment of the present invention;
Fig. 7 is a flowchart of another optional neural network training method according to an embodiment of the present invention;
Fig. 8 is a schematic diagram of an optional neural network training device according to an embodiment of the present invention;
Fig. 9 is a schematic diagram of another optional neural network training method according to an embodiment of the present invention;
Fig. 10 is a schematic diagram of an optional electronic device according to an embodiment of the present invention.
Detailed description of the embodiments
To enable those skilled in the art to better understand the solutions of the present invention, the technical solutions in the embodiments of the present invention are described clearly and completely below in conjunction with the accompanying drawings. Obviously, the described embodiments are only some, rather than all, of the embodiments of the present invention. Based on the embodiments of the present invention, all other embodiments obtained by those of ordinary skill in the art without creative work shall fall within the protection scope of the present invention.
It should be noted that the terms "first", "second", and the like in the specification, the claims, and the above drawings are used to distinguish similar objects and are not necessarily used to describe a particular order or sequence. It should be understood that data used in this way are interchangeable under appropriate circumstances, so that the embodiments of the present invention described herein can be implemented in orders other than those illustrated or described herein. In addition, the terms "include" and "have" and any variants thereof are intended to cover non-exclusive inclusion; for example, a process, method, system, product, or device that contains a series of steps or units is not necessarily limited to the steps or units expressly listed, but may include other steps or units that are not expressly listed or that are inherent to the process, method, product, or device.
Embodiment 1
In the embodiments of the present invention, an embodiment of the above neural network training method is provided. As an optional embodiment, the neural network training method can be, but is not limited to being, applied in the application environment shown in Fig. 1. A client of a human-computer interaction application is installed in a terminal 102; taking a game application as an example of the human-computer interaction application, object A is a user-controlled object and object B is a machine-controlled object. Offline samples are obtained by running the human-computer interaction application and are stored in a database 104, where the database 104 can be, but is not limited to being, located in a training control server, or in an independent third-party server. Further, the offline samples that meet a predetermined configuration condition constitute an offline sample set for training the neural network. The offline sample set is used in a terminal 106 to train an initial neural network offline, to obtain an object neural network whose processing capability is higher than that of the initial neural network. The object neural network obtained by offline training in the terminal 106 is then connected, through a network 108, to the online running environment of the human-computer interaction application, so as to perform online training and obtain a target neural network matched to the human-computer interaction application.
In this embodiment, the obtained offline sample set for training a neural network in a human-computer interaction application is used to train an initial neural network offline, so as to obtain an object neural network whose processing capability is higher than that of the initial neural network. The object neural network is then connected to the online running environment of the human-computer interaction application, so as to perform online training and obtain a target neural network matched to the human-computer interaction application. That is, by obtaining in advance an offline sample set that meets a predetermined configuration condition, offline training is performed on the initial neural network to obtain an object neural network with higher processing capability, and the initial neural network is no longer connected to the online running environment directly for online training. This overcomes the problems of long training time and low training efficiency in the related art, where the target neural network can be obtained only through online training. In addition, obtaining the object neural network by offline training on the offline sample set also broadens the range of samples available for neural network training, making it easier to obtain higher-quality offline samples or offline samples of different grades, which further guarantees the training efficiency of neural network training.
Optionally, in this embodiment, the above terminal can include, but is not limited to, at least one of the following: a mobile phone, a tablet computer, a laptop computer, a desktop computer, a digital television, and other hardware devices capable of running the human-computer interaction application. The above network can include, but is not limited to, at least one of the following: a wide area network, a metropolitan area network, and a local area network. The above is only an example, and this embodiment does not impose any limitation on this.
According to an embodiment of the present invention, a neural network training method is provided. As shown in Fig. 2, the method includes:
S202: obtain an offline sample set for training a neural network in a human-computer interaction application, where the offline sample set includes offline samples that meet a predetermined configuration condition;
S204: train an initial neural network offline using the offline sample set to obtain an object neural network, where, in the human-computer interaction application, the processing capability of the object neural network is higher than that of the initial neural network;
S206: connect the object neural network to the online running environment of the human-computer interaction application for online training, to obtain a target neural network.
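The two-phase procedure of steps S202 to S206 can be illustrated with a deliberately toy sketch. Everything below is an illustrative assumption: the "network" is reduced to a single scalar skill value, "training" to a trivial update rule, and the names (`update`, `train_offline`, `train_online`) are invented for this sketch; the patent does not define such an API.

```python
def update(network, sample):
    # One hypothetical training step: nudge the scalar "skill" value by the reward.
    state, action, reward, next_state = sample
    return network + 0.1 * reward

def train_offline(network, offline_samples):
    """S204: offline training on a pre-collected offline sample set."""
    for sample in offline_samples:
        network = update(network, sample)
    return network  # the "object neural network"

def train_online(network, interact, steps):
    """S206: continue training against the live online environment."""
    for _ in range(steps):
        network = update(network, interact(network))
    return network  # the "target neural network"

offline_set = [(0, 1, 1.0, 1), (1, 0, 0.5, 2)]  # (s, a, r, s') four-tuples
obj_net = train_offline(0.0, offline_set)
target_net = train_online(obj_net, lambda n: (2, 1, 1.0, 3), steps=5)
print(round(target_net, 2))  # → 0.65
```

The point of the structure, not the arithmetic, is what matters: online training starts from the offline-trained object network instead of from scratch, which is the efficiency claim made above.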
Optionally, in this embodiment, the above neural network training method can be, but is not limited to being, applied in the following human-computer interaction application scenarios: 1) in a human-machine confrontation application, the target neural network obtained by training is used to carry out the human-machine confrontation process with an online account; 2) in a hang-up confrontation application, the target neural network obtained by training can replace an online account and continue the subsequent human-machine confrontation process. That is, the target neural network with multiple skills, obtained through offline training on the offline sample set followed by online training as provided in this embodiment, completes intelligent operation in the human-computer interaction application.
It should be noted that in this embodiment, by obtaining in advance an offline sample set that meets a predetermined configuration condition, offline training is performed on the initial neural network to obtain an object neural network with higher processing capability, and the initial neural network is no longer connected to the online running environment directly for online training. This overcomes the problems of long training time and low training efficiency in the related art, where the target neural network can be obtained only through online training. In addition, obtaining the object neural network by offline training on the offline sample set also broadens the range of samples available for neural network training, making it easier to obtain higher-quality offline samples or offline samples of different grades, which further guarantees the training efficiency of neural network training.
Optionally, in this embodiment, the target neural network in the above different application scenarios can be obtained, but is not limited to being obtained, in the following online training manners:
1) connect the object neural network to the online running environment of the human-computer interaction application and perform online adversarial training with an online account of the human-computer interaction application; or
2) connect the object neural network to the online running environment of the human-computer interaction application, replace a first online account in the human-computer interaction application, and continue online adversarial training with a second online account.
It should be noted that the online account can be, but is not limited to being, a user-controlled account in the human-computer interaction application. Taking the example shown in Fig. 3, object A may be a user-controlled object and object B a machine-controlled object; the object neural network used to obtain the target neural network can be, but is not limited to being, object B, whose weight values are further improved through online adversarial training to obtain the corresponding target neural network. Still taking Fig. 3 as an example, object A may be a user-controlled object and object B may also be a user-controlled object; after object A has run for a period of time and a hang-up operation is selected, object A can be, but is not limited to being, replaced with the object neural network, whose weight values are further improved by continuing the human-machine confrontation process with object B, to obtain the corresponding target neural network.
Optionally, in this embodiment, training the initial neural network offline using the offline sample set to obtain the object neural network includes:
1) in the case where the predetermined configuration condition indicates obtaining a high-grade object neural network, training with a high-grade offline sample set to obtain the high-grade object neural network, where the operation results of the offline samples in the high-grade offline sample set in the human-computer interaction application are higher than a predetermined threshold; or
2) in the case where the predetermined configuration condition indicates obtaining object neural networks of multiple grades, training with the offline sample set of each grade to obtain the object neural network of the corresponding grade, where the operation results of the offline samples in the offline sample sets of the multiple grades in the human-computer interaction application fall within different target threshold ranges, and where the object neural networks of the multiple grades include at least a first-grade object network and a second-grade object network, the processing capability of the first-grade object network being higher than that of the second-grade object network.
It should be noted that in this embodiment, the above target neural network can be, but is not limited to being, trained into neural networks with different grades of interaction level, according to the interaction level of the offline samples in different offline sample sets. For example, in manner 1) above, high-quality offline samples whose operation results are higher than the predetermined threshold are selected from the offline samples, and a high-grade object neural network is obtained through offline training, so as to raise the winning rate of the machine in human-machine confrontation and thereby attract more user accounts to the human-computer interaction application. In manner 2) above, offline sample sets of multiple grades, whose operation results fall within different target threshold ranges, are selected from the offline samples, and object neural networks of multiple grades are obtained through offline training, so as to enrich the confrontation levels in the human-computer interaction.
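The grade-based selection in manners 1) and 2) amounts to partitioning offline samples by their operation result. A minimal sketch follows; the threshold ranges, the `"result"` field, and the function name are assumptions made for illustration, not details given in the patent.

```python
def split_by_grade(samples, thresholds):
    """Partition offline samples into grade buckets by operation result.

    `thresholds` maps a grade name to an inclusive (low, high) result range.
    """
    buckets = {grade: [] for grade in thresholds}
    for sample in samples:
        for grade, (low, high) in thresholds.items():
            if low <= sample["result"] <= high:
                buckets[grade].append(sample)
    return buckets

samples = [{"result": 95}, {"result": 60}, {"result": 30}]
grades = split_by_grade(
    samples, {"high": (80, 100), "mid": (50, 79), "low": (0, 49)}
)
print({g: len(v) for g, v in grades.items()})  # → {'high': 1, 'mid': 1, 'low': 1}
```

Manner 1) corresponds to keeping only the `"high"` bucket; manner 2) trains one object network per bucket.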
Optionally, in this embodiment, the above offline samples can be, but are not limited to being, obtained in the following manner: during operation of the human-computer interaction application with a training account, collect the parameter values of the interaction parameters of the training account in each status frame, where the interaction parameters include: interaction state, interactive action, and interaction feedback excitation; and obtain the offline samples according to the parameter values of the interaction parameters.
It should be noted that this can refer to, but is not limited to, displaying each status frame frame by frame according to frame number during operation of the human-computer interaction application, and collecting the parameter values of the interaction parameters in each status frame, so as to obtain a frame sequence of the parameter values of each interaction parameter, from which the offline samples are then obtained. The interaction state can be, but is not limited to being, determined according to the interactive picture of the human-computer interaction application; the interactive action can be, but is not limited to being, determined according to the interactive operation received by the human-computer interaction application; and the interaction feedback excitation can be, but is not limited to being, determined according to the parameter value of an interaction feedback excitation parameter matched to the application type of the human-computer interaction application.
Through the embodiments provided in this application, by obtaining in advance an offline sample set that meets a predetermined configuration condition, offline training is performed on the initial neural network to obtain an object neural network with higher processing capability, and the initial neural network is no longer connected to the online running environment directly for online training. This overcomes the problems of long training time and low training efficiency in the related art, where the target neural network can be obtained only through online training. In addition, obtaining the object neural network by offline training on the offline sample set also broadens the range of samples available for neural network training, making it easier to obtain higher-quality offline samples or offline samples of different grades, which further guarantees the training efficiency of neural network training.
As an optional scheme, obtaining the offline sample set for training the neural network in the human-computer interaction application includes:
S1: obtain offline samples produced by operating the human-computer interaction application with a training account;
S2: screen the obtained offline samples according to the predetermined configuration condition to obtain the offline sample set.
Optionally, in this embodiment, obtaining the offline samples produced by operating the human-computer interaction application with the training account includes:
S11: during operation of the human-computer interaction application with the training account, collect the parameter values of the interaction parameters of the training account in each status frame, where the interaction parameters include: interaction state, interactive action, and interaction feedback excitation;
S12: obtain the offline samples according to the parameter values of the interaction parameters.
It should be noted that in this embodiment, the interaction feedback excitation is the feedback excitation value of an action in the current state, calculated by the DQN algorithm in the human-computer interaction application according to the change of the interaction state, so as to obtain the parameter value of the above interaction feedback excitation. The specific calculation formula can be, but is not limited to being, set differently according to different types of human-computer interaction applications. For example, taking a multi-player battle game application as an example, the parameter of the above interaction feedback excitation can be, but is not limited to being, the blood volume of each character object: when the collected blood volume of the training account is higher during training, a positive feedback excitation value is configured; otherwise, a negative feedback excitation value is configured. For another example, taking a distance-based sports application as an example, the parameter of the above interaction feedback excitation can be, but is not limited to being, the completed mileage: the farther the collected mileage completed by the training account during training, the larger the configured feedback excitation value; otherwise, the smaller the configured feedback excitation value. The above are only examples, and this embodiment does not impose any limitation on this. In addition, in this embodiment, the parameters of the above interaction feedback excitation can be, but are not limited to being, recorded sequentially according to the frame numbers of the status frames.
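Following the two examples above (blood volume for a battle game, completed mileage for a distance sports application), application-type-specific feedback excitation functions might look like the following. The exact formulas and coefficients are not given in the text and are assumed here purely for illustration.

```python
def battle_reward(hp_before, hp_after):
    # Positive excitation when the training account's blood volume rose, else negative.
    return 1.0 if hp_after > hp_before else -1.0

def mileage_reward(completed_mileage):
    # Larger completed mileage yields a larger excitation value (assumed linear scale).
    return 0.01 * completed_mileage

print(battle_reward(50, 70), battle_reward(70, 50))  # → 1.0 -1.0
print(mileage_reward(500))                           # → 5.0
```

The design choice matching the text is that the reward signal is derived from the state change, not hand-labeled, so it can be recorded automatically frame by frame.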
A specific illustration is given with the example shown in Fig. 4. During operation of the human-computer interaction application, the interaction state st is collected, and a state frame sequence (s0, s1, ..., st) is recorded; the action output is collected to obtain the interactive action at, and an action frame sequence (a0, a1, ..., at) is recorded; the parameter value of the interaction feedback excitation parameter is further calculated to determine the parameter value rt of the interaction feedback excitation, and a feedback excitation frame sequence (r0, r1, ..., rt) is recorded. The intermediate samples collected above are further combined to obtain offline samples, and the offline samples determined by the combination are stored in an offline sample database.
In this embodiment, the collected data of the above three parts, namely the interaction state, the interactive action, and the interaction feedback excitation, are synchronously combined by the frame numbers of the status frames to generate offline samples, such as DQN samples, and the generated DQN samples are further saved in the offline sample database.
As an optional scheme, obtaining the offline samples according to the parameter values of the interaction parameters includes:
S1: determine an offline sample by combining the parameter values of the interaction parameters in the i-th status frame with the parameter values of the interaction parameters in the (i+1)-th status frame, where i is greater than or equal to 1 and less than or equal to N, and N is the total number of frames in the operation of the human-computer interaction application.
A specific illustration is given in conjunction with Fig. 5. The above offline sample can be, but is not limited to being, a four-tuple (s, a, r, s'), whose elements have the following meanings:
s: the interaction state (state, abbreviated s) in the i-th status frame;
a: the interactive action (action, abbreviated a) in the i-th status frame;
r: the interaction feedback excitation (reward, abbreviated r) obtained after the action a is taken in the interaction state s of the i-th status frame;
s': the interaction state (next state, abbreviated s') in the (i+1)-th status frame.
As shown in Fig. 5, the parameter values of the interaction parameters in the i-th status frame at the current moment are combined with the parameter values of the interaction parameters in the (i+1)-th status frame at the next moment, to obtain one group of offline samples on the right side. In effect, the interaction parameter values of the current status frame are combined with those of the next status frame.
In this embodiment, by combining the parameter values of the interaction parameters in the i-th status frame with those in the (i+1)-th status frame to determine the offline sample, accurate offline sample data can be generated, which accelerates the convergence process of the neural network.
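The pairing of frame i with frame i+1 into a four-tuple can be sketched as follows. Modeling the frame sequences as plain Python lists indexed by frame number is an assumption of this illustration; the patent does not prescribe a data layout.

```python
def build_offline_samples(states, actions, rewards):
    """Pair each frame's (s, a, r) with the next frame's state s'."""
    assert len(states) == len(actions) == len(rewards)
    return [
        (states[i], actions[i], rewards[i], states[i + 1])
        for i in range(len(states) - 1)
    ]

states = ["s0", "s1", "s2"]   # state frame sequence
actions = ["a0", "a1", "a2"]  # action frame sequence
rewards = [0.0, 1.0, 0.5]     # feedback excitation frame sequence
samples = build_offline_samples(states, actions, rewards)
print(samples)  # → [('s0', 'a0', 0.0, 's1'), ('s1', 'a1', 1.0, 's2')]
```

Note that N frames yield N-1 samples, since the last frame has no successor to supply s'.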
As an optional scheme, acquiring the parameter values of the interaction parameters of the training account in each state frame includes at least one of the following:
1) acquiring a state identifier of the interaction state in each state frame, to obtain a state frame sequence recorded while the training account runs the human-computer interaction application;
2) acquiring an action identifier of the interaction action in each state frame, to obtain an action frame sequence recorded while the training account runs the human-computer interaction application;
3) obtaining interaction feedback reward parameters matched with the application type of the human-computer interaction application, and calculating the parameter values of those parameters, to obtain a feedback reward frame sequence recorded while the training account runs the human-computer interaction application.
An illustration is given with the example shown in Fig. 4. While the human-computer interaction application is running, the interaction state st is acquired and recorded to obtain the state frame sequence (s0, s1, ..., st); the action output is acquired to obtain the interaction action at, recorded as the action frame sequence (a0, a1, ..., at); and the parameter value rt of the interaction feedback reward is further calculated and recorded to obtain the feedback reward frame sequence (r0, r1, ..., rt).
In this embodiment, the interaction state and interaction action in each state frame are obtained, and the parameter values of the interaction feedback reward parameters are computed, so that the corresponding state frame sequence, action frame sequence, and feedback reward frame sequence produced during the running of the human-computer interaction application are obtained and can then be combined into DQN (neural network) offline samples.
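The three per-frame sequences above can be maintained by a simple recorder, sketched below under the assumption (not stated in the patent) that state, action, and reward are all available at the same frame boundary; the class and method names are illustrative.

```python
class FrameRecorder:
    """Keeps the state, action, and reward frame sequences in parallel,
    indexed by frame number, as in the Fig. 4 example."""
    def __init__(self):
        self.states, self.actions, self.rewards = [], [], []

    def record(self, state, action, reward):
        self.states.append(state)
        self.actions.append(action)
        self.rewards.append(reward)
        return len(self.states) - 1  # frame number of the recorded frame

rec = FrameRecorder()
for t in range(3):
    rec.record(f"s{t}", f"a{t}", float(t))
print(rec.states)  # ['s0', 's1', 's2']
```

Because all three lists are appended together, the sequences stay synchronized by frame number, which is what the later combination step relies on.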
As an optional scheme, acquiring the state identifier of the interaction state in each state frame includes:
S1: capturing a screenshot of the state picture of the interaction state in each state frame;
S2: determining the state identifier of the interaction state according to the state picture.
A specific illustration is given with reference to Fig. 6. Acquiring the state identifier of the interaction state in each state frame specifically includes the following steps:
S602: start the real-time screen-capture module in the terminal;
S604: run the human-computer interaction application;
S606: capture the state picture in each state frame in real time while the human-computer interaction application is running;
S608: obtain multiple state pictures and store them by frame number to obtain the state frame sequence.
In this embodiment, the state picture of the interaction state in each state frame is captured, and the state identifier of the interaction state is then determined from the state picture, so that the state identifiers of the interaction states in the state frames are acquired in real time while the human-computer interaction application is running.
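Steps S602 to S608 can be sketched as a capture loop. The `grab_screen` and `is_running` callbacks stand in for the terminal's real-time screen-capture module and the application's run state; both, along with the frame rate, are assumptions for illustration only.

```python
import time

def capture_state_frames(grab_screen, is_running, fps=10):
    """Real-time screenshot loop (S602-S608): capture one state picture per
    iteration while the application runs; the list order is the frame order."""
    state_frames = []
    while is_running():
        state_frames.append(grab_screen())  # S606: capture current state picture
        time.sleep(1.0 / fps)               # pace the real-time captures
    return state_frames                     # S608: state frame sequence

# Stub demo with three fake "screenshots".
fake_frames = iter(["img0", "img1", "img2"])
left = [3]
def fake_grab():
    left[0] -= 1
    return next(fake_frames)
print(capture_state_frames(fake_grab, lambda: left[0] > 0, fps=1000))
# ['img0', 'img1', 'img2']
```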
As an optional scheme, acquiring the action identifier of the interaction action in each state frame includes:
1) acquiring a touch operation, and obtaining the action identifier of the interaction action in the human-computer interaction application corresponding to the touch operation; or
2) acquiring an input event of an external device, where the input event includes at least one of: a keyboard input event, a somatosensory input event, or a sensing-device input event, and obtaining the action identifier of the interaction action in the human-computer interaction application corresponding to the input event.
Acquiring a touch operation and acquiring an input event of an external device are described below:
(1) Acquiring a touch operation is illustrated first. Touch operations are usually acquired on a mobile terminal. A human-computer interaction application on a mobile terminal typically offers several operation modes: touch keys, a virtual joystick (universal wheel) on the touch screen, gyroscope operation, touch operations on the electronic screen of the terminal, and so on. The interaction actions are mainly mapped to the touch keys, the virtual joystick, the touch screen, and the like on the mobile terminal; an action acquisition module in the mobile terminal or the interactive application listens for the corresponding key or touch events, and after an event is captured, the action corresponding to that event is recorded, so that the action frame sequence is saved.
(2) External devices usually include a keyboard, an infrared sensor, a temperature sensor, and so on; such a device feeds input events to the interactive application according to the corresponding operations. Taking a keyboard as the external device for illustration, as shown in Fig. 7, acquiring the input events of the external device includes the following steps:
S702: first map the interaction actions required by the human-computer interaction application to the keyboard, establishing the key events (KeyEvents);
S704: then listen for KeyEvents through the action acquisition module;
S706: capture a KeyEvent;
S708: record the action corresponding to the KeyEvent, so as to save the action frame sequence.
In this embodiment, acquiring the action identifier of the interaction action in each state frame includes both acquiring touch operations on the terminal and acquiring input events of external devices. This provides multiple ways of acquiring the action identifiers of interaction actions and broadens the range of action identifiers the interactive application can acquire.
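Steps S702 to S708 amount to a key-to-action mapping plus an event listener. The sketch below assumes an illustrative key table and action names; the patent does not specify any particular mapping.

```python
# S702: map the interaction actions required by the application onto keys.
KEY_TO_ACTION = {"w": "move_up", "s": "move_down", "j": "attack"}

def on_key_events(key_events):
    """S704-S708: for each monitored key event, record the mapped action
    into the action frame sequence; unmapped keys are ignored."""
    action_frames = []
    for key in key_events:
        action = KEY_TO_ACTION.get(key)   # S706: a KeyEvent is captured
        if action is not None:
            action_frames.append(action)  # S708: record corresponding action
    return action_frames

print(on_key_events(["w", "x", "j"]))  # ['move_up', 'attack']
```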
It should be noted that for the various method embodiments described above, for simple description, therefore, it is stated as a series of Combination of actions, but those skilled in the art should understand that, the present invention is not limited by the sequence of acts described because According to the present invention, some steps may be performed in other sequences or simultaneously.Secondly, those skilled in the art should also know It knows, the embodiments described in the specification are all preferred embodiments, and related actions and modules is not necessarily of the invention It is necessary.
Through the above description of the embodiments, those skilled in the art can clearly understand that the methods of the above embodiments may be implemented by software plus a necessary general-purpose hardware platform, or of course by hardware, but in many cases the former is the better implementation. Based on this understanding, the technical solution of the present invention, or the part contributing to the related art, may essentially be embodied in the form of a software product. The computer software product is stored in a storage medium (such as ROM/RAM, a magnetic disk, or an optical disc) and includes several instructions for causing a terminal device (which may be a mobile phone, a computer, a server, a network device, or the like) to perform the methods described in the embodiments of the present invention.
Embodiment 2
According to an embodiment of the present invention, a neural network training apparatus for implementing the above neural network training method is further provided. As shown in Fig. 8, the apparatus includes:
1) an acquiring unit 802, configured to obtain an offline sample set for training the neural network in the human-computer interaction application, where the offline sample set includes offline samples that meet a predetermined configuration condition;
2) an offline training unit 804, configured to train an initial neural network offline using the offline sample set to obtain an object neural network, where, in the human-computer interaction application, the processing capability of the object neural network is higher than that of the initial neural network;
3) an online training unit 806, configured to connect the object neural network to the online running environment of the human-computer interaction application for online training, to obtain a target neural network.
Optionally, in this embodiment, the above neural network training method may be, but is not limited to being, applied in the following human-computer interaction scenarios: 1) in human-machine adversarial applications, the trained target neural network carries out the human-machine adversarial process against online accounts; 2) in hang-up (idle) adversarial applications, the trained target neural network can replace an online account and continue the subsequent human-machine adversarial process. That is, this embodiment provides a target neural network with multiple skills, obtained through offline training on the offline sample set followed by online training, to complete intelligent operations in the human-computer interaction application.
It should be noted that, in this embodiment, an offline sample set meeting the predetermined configuration condition is obtained in advance and used to train the initial neural network offline, yielding an object neural network with higher processing capability, instead of connecting the initial neural network directly to the online running environment for online training. This overcomes the problems in the current related art, where the target neural network can only be obtained through online training, resulting in a long training time and low training efficiency. In addition, obtaining the object neural network through offline training on the offline sample set also expands the range of samples available for neural network training, making it easier to obtain higher-quality or differently graded offline samples and further ensuring the training efficiency of the neural network training.
Optionally, in this embodiment, the target neural network in the above different application scenarios may be obtained through, but is not limited to, the following online training modes:
1) connecting the object neural network to the online running environment of the human-computer interaction application and conducting online adversarial training against an online account of the application; or
2) connecting the object neural network to the online running environment of the human-computer interaction application to replace a first online account in the application, and continuing online adversarial training against a second online account.
It should be noted that an online account may be, but is not limited to, a user-controlled account in the human-computer interaction application. Taking the scenario shown in Fig. 3 as an example, object A may be a user-controlled object and object B a machine-controlled object; the object neural network used to obtain the target neural network may be, but is not limited to, object B, and the online adversarial training further refines the weight values in the object neural network to obtain the corresponding target neural network. Still taking Fig. 3 as an example, object A and object B may both be user-controlled objects; after object A has run for a period of time and the hang-up operation is selected, object A may be, but is not limited to being, replaced with the object neural network, and by continuing the human-machine adversarial process against object B, the weight values in the object neural network are further refined to obtain the corresponding target neural network.
Optionally, in this embodiment, training the initial neural network offline using the offline sample set to obtain the object neural network includes:
1) when the predetermined configuration condition indicates obtaining a high-grade object neural network, training with a high-grade offline sample set to obtain the high-grade object neural network, where the operation results of the offline samples in the high-grade offline sample set in the human-computer interaction application are higher than a predetermined threshold; or
2) when the predetermined configuration condition indicates obtaining object neural networks of multiple grades, training with the offline sample set of each grade to obtain the object neural network of the corresponding grade, where the operation results of the offline samples in the offline sample sets of the multiple grades in the human-computer interaction application fall within different target threshold ranges. The object neural networks of the multiple grades include at least a first-grade object network and a second-grade object network, and the processing capability of the first-grade object network is higher than that of the second-grade object network.
It should be noted that, in this embodiment, the above neural networks may be, but are not limited to being, trained to interaction levels of different grades according to the interaction level of the offline samples in different offline sample sets. For example, in mode 1) above, high-quality offline samples whose operation results are higher than the predetermined threshold are selected from the offline samples, and offline training then yields a high-grade object neural network, which improves the machine's winning rate in human-machine confrontation and thus attracts more user accounts to the interactive application. In mode 2) above, offline samples whose operation results fall within different target threshold ranges are selected to form offline sample sets of multiple grades, and offline training yields object neural networks of multiple grades, enriching the confrontation levels in human-computer interaction.
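The grading of offline samples by operation result can be sketched as a bucketing step. The pairing of samples with numeric results, the ascending thresholds, and the grading rule below are illustrative assumptions; the patent only specifies that samples are grouped by target threshold ranges.

```python
def grade_samples(samples, thresholds):
    """Split offline samples into grade buckets by their operation result.
    `samples` are (sample, result) pairs; `thresholds` are ascending grade
    boundaries, so len(thresholds) + 1 grades exist in total."""
    buckets = {grade: [] for grade in range(len(thresholds) + 1)}
    for sample, result in samples:
        grade = sum(result >= t for t in thresholds)  # higher result, higher grade
        buckets[grade].append(sample)
    return buckets

data = [("s_low", 10), ("s_mid", 55), ("s_high", 90)]
print(grade_samples(data, thresholds=[50, 80]))
# {0: ['s_low'], 1: ['s_mid'], 2: ['s_high']}
```

Mode 1) of the text corresponds to keeping only the top bucket; mode 2) trains one object neural network per bucket.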
Optionally, in this embodiment, the offline samples may be obtained, but are not limited to being obtained, in the following manner: while a training account runs the human-computer interaction application, the parameter values of the interaction parameters of the training account in each state frame are acquired, where the interaction parameters include the interaction state, the interaction action, and the interaction feedback reward; the offline samples are then obtained according to these parameter values.
It should be noted that this may, but is not limited to, mean that while the human-computer interaction application is running, the state frames are displayed frame by frame in order of frame number, and the parameter values of the interaction parameters in each state frame are acquired, so that a frame sequence of parameter values is obtained for each interaction parameter, and the offline samples are then obtained from these frame sequences. The interaction state may be, but is not limited to being, determined from the interaction picture of the human-computer interaction application; the interaction action may be, but is not limited to being, determined from the interaction operations received by the application; and the interaction feedback reward may be, but is not limited to being, determined from the parameter values of the interaction feedback reward parameters matched with the application type of the human-computer interaction application.
According to the embodiments provided in this application, an offline sample set meeting the predetermined configuration condition is obtained in advance and used to train the initial neural network offline, yielding an object neural network with higher processing capability, instead of connecting the initial neural network directly to the online running environment for online training. This overcomes the problems in the current related art, where the target neural network can only be obtained through online training, resulting in a long training time and low training efficiency. In addition, obtaining the object neural network through offline training on the offline sample set also expands the range of samples available for neural network training, making it easier to obtain higher-quality or differently graded offline samples and further ensuring the training efficiency of the neural network training.
As an optional scheme, as shown in Fig. 9, the acquiring unit 802 includes:
1) an obtaining module 902, configured to obtain the offline samples produced while the training account runs the human-computer interaction application;
2) a screening module 904, configured to screen the obtained offline samples according to the predetermined configuration condition to obtain the offline sample set.
As an optional scheme, the obtaining module includes:
1) an acquisition submodule, configured to acquire, while the training account runs the human-computer interaction application, the parameter values of the interaction parameters of the training account in each state frame, where the interaction parameters include the interaction state, the interaction action, and the interaction feedback reward;
2) an obtaining submodule, configured to obtain the offline samples according to the parameter values of the interaction parameters.
It should be noted that, in this embodiment, the interaction feedback reward is the feedback reward value of an action in the current state, computed by the DQN algorithm in the human-computer interaction application from the change of the interaction state, so as to obtain the parameter value of the above interaction feedback reward. The specific calculation formula may be, but is not limited to being, configured differently for different types of human-computer interaction applications. For example, in a multiplayer interactive game application, the parameter of the interaction feedback reward may be, but is not limited to, the health (blood volume) of each character object: when the health of the training account acquired during training is higher, a positive feedback reward value can be configured; otherwise, a negative feedback reward value is configured. As another example, in a distance-based sports application, the parameter of the interaction feedback reward may be, but is not limited to, the completed mileage: the farther the mileage completed by the training account during training, the larger the configured feedback reward value; otherwise, the smaller the configured feedback reward value. The above are merely examples, and no limitation is imposed in this embodiment. In addition, in this embodiment, the parameters of the interaction feedback reward may be, but are not limited to being, recorded in order of the frame numbers of the state frames.
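The two reward examples above (health change and completed mileage) can be sketched as follows. The specific values and scaling are illustrative assumptions; as the text notes, the calculation formula is configured per application type rather than fixed by the patent.

```python
def health_reward(prev_health, curr_health):
    """Reward from the change in the character's health (blood volume):
    positive feedback when health rose, negative when it fell."""
    delta = curr_health - prev_health
    if delta > 0:
        return 1.0   # positive feedback reward value
    if delta < 0:
        return -1.0  # negative feedback reward value
    return 0.0

def mileage_reward(completed_km, scale=0.5):
    """Reward proportional to completed mileage: farther means larger."""
    return scale * completed_km

print(health_reward(80, 95), health_reward(95, 80), mileage_reward(42))
# 1.0 -1.0 21.0
```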
A specific illustration is given with the example shown in Fig. 4. While the human-computer interaction application is running, the interaction state st is acquired and recorded to obtain the state frame sequence (s0, s1, ..., st); the action output is acquired to obtain the interaction action at, recorded as the action frame sequence (a0, a1, ..., at); and the parameter value rt of the interaction feedback reward is further calculated and recorded to obtain the feedback reward frame sequence (r0, r1, ..., rt). The intermediate samples collected above are then combined to obtain the offline samples, and the offline samples determined by the combination are stored in the offline sample database.
In this embodiment, the collected data for the three parts, namely the interaction state, the interaction action, and the interaction feedback reward, are synchronously combined according to the frame number of each state frame to generate offline samples, for example DQN samples, and the generated DQN samples are further saved into the offline sample database.
As an optional scheme, the obtaining submodule obtains the offline samples according to the parameter values of the interaction parameters through the following step:
1) combining the parameter values of the interaction parameters in the i-th state frame with the parameter values of the interaction parameters in the (i+1)-th state frame to determine an offline sample, where i is greater than or equal to 1 and less than or equal to N, and N is the total number of frames during the running of the human-computer interaction application.
A specific illustration is given with reference to Fig. 5. The offline sample may be, but is not limited to, a four-tuple (s, a, r, s'), whose elements have the following meanings:
s: the interaction state (state, abbreviated s) in the i-th state frame;
a: the interaction action (action, abbreviated a) in the i-th state frame;
r: the interaction feedback reward (reward, abbreviated r) obtained after action a is performed under interaction state s in the i-th state frame;
s': the interaction state (next state, abbreviated s') in the (i+1)-th state frame.
As shown in Fig. 5, the parameter values of the interaction parameters in the i-th state frame at the current moment are combined with the parameter values of the interaction parameters in the (i+1)-th state frame at the next moment, so as to obtain one group of offline samples shown on the right side; that is, the interaction parameter values of the current state frame are combined with those of the next state frame.
In this embodiment, by combining the parameter values of the interaction parameters in the i-th state frame with those in the (i+1)-th state frame to determine offline samples, accurate offline sample data can be generated, which accelerates the convergence of the neural network.
As an optional scheme, the acquisition submodule acquires the parameter values of the interaction parameters of the training account in each state frame in at least one of the following ways:
1) acquiring a state identifier of the interaction state in each state frame, to obtain a state frame sequence recorded while the training account runs the human-computer interaction application;
2) acquiring an action identifier of the interaction action in each state frame, to obtain an action frame sequence recorded while the training account runs the human-computer interaction application;
3) obtaining interaction feedback reward parameters matched with the application type of the human-computer interaction application, and calculating the parameter values of those parameters, to obtain a feedback reward frame sequence recorded while the training account runs the human-computer interaction application.
An illustration is given with the example shown in Fig. 4. While the human-computer interaction application is running, the interaction state st is acquired and recorded to obtain the state frame sequence (s0, s1, ..., st); the action output is acquired to obtain the interaction action at, recorded as the action frame sequence (a0, a1, ..., at); and the parameter value rt of the interaction feedback reward is further calculated and recorded to obtain the feedback reward frame sequence (r0, r1, ..., rt).
In this embodiment, the interaction state and interaction action in each state frame are obtained, and the parameter values of the interaction feedback reward parameters are computed, so that the corresponding state frame sequence, action frame sequence, and feedback reward frame sequence produced during the running of the human-computer interaction application are obtained and can then be combined into DQN (neural network) offline samples.
As an optional scheme, the acquisition submodule acquires the state identifier of the interaction state in each state frame through the following steps:
S1: capturing a screenshot of the state picture of the interaction state in each state frame;
S2: determining the state identifier of the interaction state according to the state picture.
A specific illustration is given with reference to Fig. 6. Acquiring the state identifier of the interaction state in each state frame specifically includes the following steps:
S602: start the real-time screen-capture module in the terminal;
S604: run the human-computer interaction application;
S606: capture the state picture in each state frame in real time while the human-computer interaction application is running;
S608: obtain multiple state pictures and store them by frame number to obtain the state frame sequence.
In this embodiment, the state picture of the interaction state in each state frame is captured, and the state identifier of the interaction state is then determined from the state picture, so that the state identifiers of the interaction states in the state frames are acquired in real time while the human-computer interaction application is running.
As an optional scheme, the acquisition submodule acquires the action identifier of the interaction action in each state frame through the following steps:
1) acquiring a touch operation, and obtaining the action identifier of the interaction action in the human-computer interaction application corresponding to the touch operation; or
2) acquiring an input event of an external device, where the input event includes at least one of: a keyboard input event, a somatosensory input event, or a sensing-device input event, and obtaining the action identifier of the interaction action in the human-computer interaction application corresponding to the input event.
Acquiring a touch operation and acquiring an input event of an external device are described below:
(1) Acquiring a touch operation is illustrated first. Touch operations are usually acquired on a mobile terminal. A human-computer interaction application on a mobile terminal typically offers several operation modes: touch keys, a virtual joystick (universal wheel) on the touch screen, gyroscope operation, touch operations on the electronic screen of the terminal, and so on. The interaction actions are mainly mapped to the touch keys, the virtual joystick, the touch screen, and the like on the mobile terminal; an action acquisition module in the mobile terminal or the interactive application listens for the corresponding key or touch events, and after an event is captured, the action corresponding to that event is recorded, so that the action frame sequence is saved.
(2) External devices usually include a keyboard, an infrared sensor, a temperature sensor, and so on; such a device feeds input events to the interactive application according to the corresponding operations. Taking a keyboard as the external device for illustration, as shown in Fig. 7, acquiring the input events of the external device includes the following steps:
S702: first map the interaction actions required by the human-computer interaction application to the keyboard, establishing the key events (KeyEvents);
S704: then listen for KeyEvents through the action acquisition module;
S706: capture a KeyEvent;
S708: record the action corresponding to the KeyEvent, so as to save the action frame sequence.
In this embodiment, acquiring the action identifier of the interaction action in each state frame includes both acquiring touch operations on the terminal and acquiring input events of external devices. This provides multiple ways of acquiring the action identifiers of interaction actions and broadens the range of action identifiers the interactive application can acquire.
Embodiment 3
According to an embodiment of the present invention, an electronic device for implementing the above neural network training method is further provided. As shown in Fig. 10, the electronic device includes: one or more processors 1002 (only one is shown in the figure), a memory 1004, a display 1006, a user interface 1008, and a transmission device 1010. The memory 1004 can be used to store software programs and modules, such as the program instructions/modules corresponding to the neural network training method and apparatus in the embodiments of the present invention. By running the software programs and modules stored in the memory 1004, the processor 1002 performs various functional applications and data processing, that is, implements the above neural network training method. The memory 1004 may include high-speed random access memory, and may also include non-volatile memory, such as one or more magnetic storage devices, flash memories, or other non-volatile solid-state memories. In some examples, the memory 1004 may further include memories remotely located relative to the processor 1002, and these remote memories may be connected to the terminal over a network. Examples of such a network include, but are not limited to, the Internet, an intranet, a local area network, a mobile communication network, and combinations thereof.
The transmission device 1010 is configured to receive or send data via a network. Specific examples of the network may include wired and wireless networks. In one example, the transmission device 1010 includes a network interface controller (Network Interface Controller, NIC), which can be connected to other network devices and a router via a cable so as to communicate with the Internet or a local area network. In another example, the transmission device 1010 is a radio frequency (Radio Frequency, RF) module, which is configured to communicate with the Internet wirelessly.
Specifically, the memory 1004 is configured to store the information of the preset action conditions and the preset-permission users, as well as the application program.
Optionally, for specific examples in this embodiment, reference may be made to the examples described in Embodiment 1 and Embodiment 2 above, and details are not described here again.
Those skilled in the art will understand that the structure shown in Fig. 10 is merely illustrative. The electronic device may alternatively be a terminal device such as a smartphone (for example an Android phone or an iOS phone), a tablet computer, a palmtop computer, a mobile Internet device (Mobile Internet Devices, MID), or a PAD. Fig. 10 does not limit the structure of the above electronic device. For example, the electronic device may include more or fewer components than shown in Fig. 10 (such as a network interface or a display device), or have a configuration different from that shown in Fig. 10.
Those of ordinary skill in the art will understand that all or some of the steps in the various methods of the above embodiments can be completed by a program instructing the hardware related to the terminal device. The program may be stored in a computer-readable storage medium, and the storage medium may include a flash disk, a read-only memory (Read-Only Memory, ROM), a random access memory (Random Access Memory, RAM), a magnetic disk, an optical disc, and the like.
Embodiment 4
An embodiment of the present invention further provides a storage medium. Optionally, in this embodiment, the storage medium may be located on at least one of multiple network devices in a network.
Optionally, in this embodiment, the storage medium is configured to store program code for performing the following steps:
S1: obtaining an offline sample set for training the neural network in the human-computer interaction application, where the offline sample set includes offline samples that meet a predetermined configuration condition;
S2: training an initial neural network offline using the offline sample set to obtain an object neural network, where, in the human-computer interaction application, the processing capability of the object neural network is higher than that of the initial neural network;
S3: connecting the object neural network to the online running environment of the human-computer interaction application for online training, to obtain a target neural network.
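The three steps above can be sketched as a small training pipeline. The dictionary "network", the stand-in capability score, and all function names are illustrative assumptions for exposition; real offline and online training would perform DQN gradient updates rather than the placeholder additions shown here.

```python
def offline_train(network, offline_samples):
    """S2: one illustrative offline pass over (s, a, r, s') samples; each
    sample nudges a stand-in 'capability' score instead of a real update."""
    for _s, _a, r, _s_next in offline_samples:
        network["capability"] += r  # placeholder for a gradient step
    return network

def online_train(network, env_rewards):
    """S3: continue training in the online running environment."""
    for r in env_rewards:
        network["capability"] += r
    return network

# S1: an offline sample set meeting the predetermined configuration condition.
samples = [("s0", "a0", 1.0, "s1"), ("s1", "a1", 2.0, "s2")]
initial = {"capability": 0.0}
object_net = offline_train(initial, samples)       # S2: initial -> object
target_net = online_train(object_net, [0.5, 0.5])  # S3: object -> target
print(target_net["capability"])  # 4.0
```

The point of the split mirrors the text: the offline phase raises the network's capability before it ever touches the online environment, so the online phase starts from the object network rather than the initial one.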
Optionally, the storage medium is further configured to store program code for executing the following steps:
S1: obtain offline samples obtained after running the human-computer interaction application using a training account;
S2: screen the obtained offline samples according to the predetermined configuration condition to obtain the offline sample set.
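A minimal sketch of the screening step above, assuming hypothetical episode records that carry a run score: samples are kept only when the episode's result meets the predetermined configuration condition, and may likewise be bucketed into grades by score range, as in the multi-grade variant described in this document:

```python
# Illustrative only; the episode record layout ({"score": ..., "samples": [...]})
# and function names are assumptions, not the patented implementation.

def screen_samples(episodes, min_score):
    """Keep only samples from episodes whose run result meets the condition."""
    return [sample
            for ep in episodes if ep["score"] >= min_score
            for sample in ep["samples"]]

def grade_samples(episodes, grade_ranges):
    """Split episodes into per-grade sample sets by score range [lo, hi)."""
    graded = {name: [] for name in grade_ranges}
    for ep in episodes:
        for name, (lo, hi) in grade_ranges.items():
            if lo <= ep["score"] < hi:
                graded[name].extend(ep["samples"])
    return graded
```

`screen_samples` corresponds to the single-threshold case (a high-grade sample set), while `grade_samples` corresponds to training object networks of multiple grades from disjoint score ranges.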
Optionally, the storage medium is further configured to store program code for executing the following steps:
S1: during running of the human-computer interaction application using the training account, collect parameter values of interaction parameters of the training account in each status frame, wherein the interaction parameters include an interaction state, an interaction action, and an interaction feedback excitation;
S2: obtain the offline samples according to the parameter values of the interaction parameters.
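The per-frame collection above can be illustrated as follows (a sketch under assumed data shapes, not the patented implementation): each status frame contributes a state identifier, an action identifier, and a feedback excitation, and consecutive frames i and i+1 are combined into one offline sample:

```python
# Hypothetical sketch: three parallel per-frame sequences (state frame sequence,
# action frame sequence, feedback excitation frame sequence) are zipped so that
# frame i and frame i+1 form one offline sample (s_i, a_i, r_i, s_{i+1}).

def frames_to_samples(states, actions, rewards):
    """Combine frame i with frame i+1 into offline samples."""
    n = len(states)  # N = total number of frames in one run of the application
    assert len(actions) == n and len(rewards) == n
    return [(states[i], actions[i], rewards[i], states[i + 1])
            for i in range(n - 1)]
```

With N frames this yields N-1 samples, matching the constraint that i ranges from 1 to N over one run of the application.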
Optionally, in this embodiment, the above storage medium may include, but is not limited to, various media that can store program code, such as a USB flash disk, a read-only memory (ROM), a random access memory (RAM), a removable hard disk, a magnetic disk, or an optical disc.
Optionally, for specific examples in this embodiment, reference may be made to the examples described in the above Embodiment 1 and Embodiment 2; details are not described herein again.
The serial numbers of the above embodiments of the present invention are only for description and do not represent the advantages or disadvantages of the embodiments.
If the integrated unit in the above embodiments is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in the above computer-readable storage medium. Based on this understanding, the technical solution of the present invention, in essence, or the part contributing to the existing technology, or all or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for enabling one or more computer devices (which may be a personal computer, a server, a network device, or the like) to execute all or part of the steps of the methods described in the embodiments of the present invention.
In the above embodiments of the present invention, the description of each embodiment has its own emphasis. For a part not described in detail in one embodiment, reference may be made to the related descriptions of other embodiments.
In the several embodiments provided in this application, it should be understood that the disclosed client may be implemented in other ways. The device embodiments described above are merely exemplary. For example, the division of the units is only a logical function division, and there may be other division manners in actual implementation; multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed. In addition, the shown or discussed mutual couplings, direct couplings, or communication connections may be implemented through some interfaces; the indirect couplings or communication connections between units or modules may be electrical or in other forms.
The units described as separate components may or may not be physically separated, and the components shown as units may or may not be physical units; they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
In addition, the functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit. The above integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
The above are only preferred embodiments of the present invention. It should be noted that those of ordinary skill in the art may make various improvements and modifications without departing from the principle of the present invention, and these improvements and modifications should also be regarded as falling within the protection scope of the present invention.

Claims (18)

1. A neural network training method, comprising:
obtaining an offline sample set for training a neural network in a human-computer interaction application, wherein the offline sample set includes offline samples meeting a predetermined configuration condition;
training an initial neural network offline using the offline sample set to obtain an object neural network, wherein in the human-computer interaction application, the processing capability of the object neural network is higher than that of the initial neural network;
connecting the object neural network to an online running environment of the human-computer interaction application for online training, to obtain a target neural network.
2. The method according to claim 1, wherein obtaining the offline sample set for training the neural network in the human-computer interaction application comprises:
obtaining offline samples obtained after running the human-computer interaction application using a training account;
screening the obtained offline samples according to the predetermined configuration condition to obtain the offline sample set.
3. The method according to claim 2, wherein obtaining the offline samples obtained after running the human-computer interaction application using the training account comprises:
during running of the human-computer interaction application using the training account, collecting parameter values of interaction parameters of the training account in each status frame, wherein the interaction parameters include an interaction state, an interaction action, and an interaction feedback excitation;
obtaining the offline samples according to the parameter values of the interaction parameters.
4. The method according to claim 3, wherein obtaining the offline samples according to the parameter values of the interaction parameters comprises:
determining an offline sample from a combination of the parameter values of the interaction parameters in an i-th status frame and the parameter values of the interaction parameters in an (i+1)-th status frame, wherein i is greater than or equal to 1 and less than or equal to N, and N is the total number of frames in one run of the human-computer interaction application.
5. The method according to claim 3, wherein collecting the parameter values of the interaction parameters of the training account in each status frame comprises at least one of the following:
collecting a state identifier of the interaction state in each status frame, to obtain a state frame sequence during running of the human-computer interaction application using the training account;
collecting an action identifier of the interaction action in each status frame, to obtain an action frame sequence during running of the human-computer interaction application using the training account;
obtaining an interaction feedback excitation parameter matching the application type of the human-computer interaction application, and calculating the parameter value of the interaction feedback excitation parameter, to obtain a feedback excitation frame sequence during running of the human-computer interaction application using the training account.
6. The method according to claim 5, wherein collecting the state identifier of the interaction state in each status frame comprises:
capturing a screenshot of the interaction state in each status frame;
determining the state identifier of the interaction state according to the screenshot.
7. The method according to claim 5, wherein collecting the action identifier of the interaction action in each status frame comprises:
collecting a touch operation, and obtaining the action identifier of the interaction action corresponding to the touch operation in the human-computer interaction application; or
collecting an input event of an external device, wherein the input event includes at least one of a keyboard input event, a motion-sensing input event, and a sensing device input event, and obtaining the action identifier of the interaction action corresponding to the input event in the human-computer interaction application.
8. The method according to claim 1, wherein training the initial neural network offline using the offline sample set to obtain the object neural network comprises:
in a case where the predetermined configuration condition indicates obtaining a high-grade object neural network, training with a high-grade offline sample set to obtain the high-grade object neural network, wherein running results of the offline samples in the high-grade offline sample set in the human-computer interaction application are higher than a predetermined threshold; or
in a case where the predetermined configuration condition indicates obtaining object neural networks of multiple grades, training with the offline sample set of each grade respectively to obtain the object neural network of the corresponding grade, wherein running results of the offline samples in the offline sample sets of the multiple grades in the human-computer interaction application fall within different target threshold ranges respectively, the object neural networks of the multiple grades include at least a first-grade object network and a second-grade object network, and the processing capability of the first-grade object network is higher than that of the second-grade object network.
9. The method according to claim 1, wherein connecting the object neural network to the online running environment of the human-computer interaction application for online training, to obtain the target neural network, comprises:
connecting the object neural network to the online running environment of the human-computer interaction application to perform online adversarial training with an online account in the human-computer interaction application; or
connecting the object neural network to the online running environment of the human-computer interaction application to substitute for a first online account in the human-computer interaction application and continue online adversarial training with a second online account.
10. A neural network training apparatus, comprising:
an acquiring unit, configured to obtain an offline sample set for training a neural network in a human-computer interaction application, wherein the offline sample set includes offline samples meeting a predetermined configuration condition;
an offline training unit, configured to train an initial neural network offline using the offline sample set to obtain an object neural network, wherein in the human-computer interaction application, the processing capability of the object neural network is higher than that of the initial neural network;
an online training unit, configured to connect the object neural network to an online running environment of the human-computer interaction application for online training, to obtain a target neural network.
11. The apparatus according to claim 10, wherein the acquiring unit comprises:
an obtaining module, configured to obtain offline samples obtained after running the human-computer interaction application using a training account;
a screening module, configured to screen the obtained offline samples according to the predetermined configuration condition to obtain the offline sample set.
12. The apparatus according to claim 11, wherein the obtaining module comprises:
a collecting submodule, configured to collect, during running of the human-computer interaction application using the training account, parameter values of interaction parameters of the training account in each status frame, wherein the interaction parameters include an interaction state, an interaction action, and an interaction feedback excitation;
an obtaining submodule, configured to obtain the offline samples according to the parameter values of the interaction parameters.
13. The apparatus according to claim 12, wherein the obtaining submodule obtains the offline samples according to the parameter values of the interaction parameters through the following step:
determining an offline sample from a combination of the parameter values of the interaction parameters in an i-th status frame and the parameter values of the interaction parameters in an (i+1)-th status frame, wherein i is greater than or equal to 1 and less than or equal to N, and N is the total number of frames in one run of the human-computer interaction application.
14. The apparatus according to claim 12, wherein the collecting submodule collects the parameter values of the interaction parameters of the training account in each status frame in at least one of the following ways:
collecting a state identifier of the interaction state in each status frame, to obtain a state frame sequence during running of the human-computer interaction application using the training account;
collecting an action identifier of the interaction action in each status frame, to obtain an action frame sequence during running of the human-computer interaction application using the training account;
obtaining an interaction feedback excitation parameter matching the application type of the human-computer interaction application, and calculating the parameter value of the interaction feedback excitation parameter, to obtain a feedback excitation frame sequence during running of the human-computer interaction application using the training account.
15. The apparatus according to claim 14, wherein the collecting submodule collects the state identifier of the interaction state in each status frame through the following steps:
capturing a screenshot of the interaction state in each status frame;
determining the state identifier of the interaction state according to the screenshot.
16. The apparatus according to claim 15, wherein the collecting submodule collects the action identifier of the interaction action in each status frame through the following steps:
collecting a touch operation, and obtaining the action identifier of the interaction action corresponding to the touch operation in the human-computer interaction application; or
collecting an input event of an external device, wherein the input event includes at least one of a keyboard input event, a motion-sensing input event, and a sensing device input event, and obtaining the action identifier of the interaction action corresponding to the input event in the human-computer interaction application.
17. A storage medium, comprising a stored program, wherein when the program runs, the method according to any one of claims 1 to 9 is executed.
18. An electronic device, comprising a memory, a processor, and a computer program stored on the memory and runnable on the processor, wherein the processor executes the method according to any one of claims 1 to 9 through the computer program.
CN201711037964.3A 2017-10-27 2017-10-27 Neural network training method and device, storage medium and electronic device Active CN109726808B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201711037964.3A CN109726808B (en) 2017-10-27 2017-10-27 Neural network training method and device, storage medium and electronic device
PCT/CN2018/111914 WO2019080900A1 (en) 2017-10-27 2018-10-25 Neural network training method and device, storage medium, and electronic device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201711037964.3A CN109726808B (en) 2017-10-27 2017-10-27 Neural network training method and device, storage medium and electronic device

Publications (2)

Publication Number Publication Date
CN109726808A true CN109726808A (en) 2019-05-07
CN109726808B CN109726808B (en) 2022-12-09

Family

ID=66246220

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711037964.3A Active CN109726808B (en) 2017-10-27 2017-10-27 Neural network training method and device, storage medium and electronic device

Country Status (2)

Country Link
CN (1) CN109726808B (en)
WO (1) WO2019080900A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111104925A (en) * 2019-12-30 2020-05-05 上海商汤临港智能科技有限公司 Image processing method, image processing apparatus, storage medium, and electronic device

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110796248A (en) * 2019-08-27 2020-02-14 腾讯科技(深圳)有限公司 Data enhancement method, device, equipment and storage medium
CN110610169B (en) * 2019-09-20 2023-12-15 腾讯科技(深圳)有限公司 Picture marking method and device, storage medium and electronic device

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101630144A (en) * 2009-08-18 2010-01-20 湖南大学 Self-learning inverse model control method of electronic throttle
CN105637540A (en) * 2013-10-08 2016-06-01 谷歌公司 Methods and apparatus for reinforcement learning
US20170024643A1 (en) * 2015-07-24 2017-01-26 Google Inc. Continuous control with deep reinforcement learning
CN106462801A (en) * 2014-10-07 2017-02-22 谷歌公司 Training neural networks on partitioned training data
CN106650721A (en) * 2016-12-28 2017-05-10 吴晓军 Industrial character identification method based on convolution neural network
CN106940801A (en) * 2016-01-04 2017-07-11 中国科学院声学研究所 A kind of deeply for Wide Area Network learns commending system and method
CN107209872A (en) * 2015-02-06 2017-09-26 谷歌公司 The distributed training of reinforcement learning system
CN107291232A (en) * 2017-06-20 2017-10-24 深圳市泽科科技有限公司 A kind of somatic sensation television game exchange method and system based on deep learning and big data

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP6471934B2 (en) * 2014-06-12 2019-02-20 パナソニックIpマネジメント株式会社 Image recognition method, camera system


Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111104925A (en) * 2019-12-30 2020-05-05 上海商汤临港智能科技有限公司 Image processing method, image processing apparatus, storage medium, and electronic device
WO2021135424A1 (en) * 2019-12-30 2021-07-08 上海商汤临港智能科技有限公司 Image processing method and apparatus, storage medium, and electronic device

Also Published As

Publication number Publication date
WO2019080900A1 (en) 2019-05-02
CN109726808B (en) 2022-12-09

Similar Documents

Publication Publication Date Title
Brown et al. Finding waldo: Learning about users from their interactions
CN110339569B (en) Method and device for controlling virtual role in game scene
CN107340944B (en) The display methods and device of interface
TW201814445A (en) Performing operations based on gestures
CN109726808A (en) Neural network training method and device, storage medium and electronic device
CN109508789A (en) Predict method, storage medium, processor and the equipment of hands
CN109905696A (en) A kind of recognition methods of the Video service Quality of experience based on encryption data on flows
CN107404656A (en) Live video recommends method, apparatus and server
CN108579086A (en) Processing method, device, storage medium and the electronic device of object
CN109600336A (en) Store equipment, identifying code application method and device
CN109806590A (en) Object control method and apparatus, storage medium and electronic device
CN109993308A (en) Learning system and method, shared platform and method, medium are shared based on cloud platform
CN115064020B (en) Intelligent teaching method, system and storage medium based on digital twin technology
CN112748941B (en) Method and device for updating target application program based on feedback information
CN109214330A (en) Video Semantic Analysis method and apparatus based on video timing information
CN110339563A (en) The generation method and device of virtual objects, storage medium and electronic device
CN108319974A (en) Data processing method, device, storage medium and electronic device
CN107273869A (en) Gesture identification control method and electronic equipment
CN111124902A (en) Object operating method and device, computer-readable storage medium and electronic device
CN113633983A (en) Method, device, electronic equipment and medium for controlling expression of virtual character
CN110300089A (en) Processing method, device, storage medium and the electronic device of target account number
CN112269943B (en) Information recommendation system and method
CN110325965B (en) Object processing method, device and storage medium in virtual scene
CN109731338A (en) Artificial intelligence training method and device, storage medium and electronic device in game
CN109165347A (en) Data push method and device, storage medium and electronic device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant