CN109726826A - Random forest training method and apparatus, storage medium, and electronic device - Google Patents

Random forest training method and apparatus, storage medium, and electronic device

Info

Publication number
CN109726826A
Authority
CN
China
Prior art keywords
tree
prediction result
voting
data
training
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201811557766.4A
Other languages
Chinese (zh)
Other versions
CN109726826B (en)
Inventor
Gao Rui (高睿)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Neusoft Corp
Original Assignee
Neusoft Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Neusoft Corp filed Critical Neusoft Corp
Priority to CN201811557766.4A
Publication of CN109726826A
Application granted
Publication of CN109726826B
Legal status: Active (granted)

Landscapes

  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The present disclosure relates to a random forest training method and apparatus, a storage medium, and an electronic device. The method comprises: determining n training data subsets within first training data; evaluating the n trees trained on those subsets against the description data of the first training data, to obtain n prediction results; deleting trees from the n trees according to the accuracy of the n prediction results and a preset threshold, leaving m trees; performing a weighted vote over the m trees according to each tree's voting weight, to obtain a goal tree; combining the goal tree's prediction results with the description data into second training data; and, taking the second training data as the first training data, looping through the above steps until the accuracy of all n prediction results reaches or exceeds the preset threshold, thereby obtaining the random forest. The full training data is continuously optimized across the repeated training passes, the proliferation of trees with a single dominant feature during training is avoided, and the accuracy of classification prediction is improved.

Description

Random forest training method and apparatus, storage medium, and electronic device
Technical field
The present disclosure relates to the field of machine learning, and in particular to a random forest training method and apparatus, a storage medium, and an electronic device.
Background Art
A random forest is a classifier comprising multiple decision trees; its output is the mode of the prediction results output by the individual trees. A decision tree is a tree-structured model for supervised learning. In supervised learning, a set of samples is given, each sample comprising a set of attributes (description data) and a class (prediction result); the classes are predetermined. By learning from this set of samples, a decision tree with classification capability is obtained, which can assign the correct class (output a prediction result) to a newly arriving object. In the related art, when a random forest is trained, each decision tree in the random forest is usually trained once on a portion of the full training data, and when classifying new data the prediction result with the most votes is taken. This classification scheme avoids over-fitting in classification prediction and improves the generalization of the classifier. However, a decision tree that has undergone only a single round of training has low prediction accuracy and cannot cope with unbalanced data features in the training data (where some classes have far more samples than others), which in turn reduces the accuracy of the overall classification prediction.
Summary of the invention
To overcome the problems in the related art, an object of the present disclosure is to provide a random forest training method and apparatus, a storage medium, and an electronic device.
To achieve the above object, according to a first aspect of the embodiments of the present disclosure, there is provided a random forest training method, the method comprising:
determining n training data subsets within first training data, the first training data comprising description data corresponding to events of the same class as an event to be predicted, together with the prediction results of those events;
evaluating, using the description data, the n trees trained on the n training data subsets, to obtain the n prediction results corresponding to the n trees;
performing a delete operation on the n trees according to the accuracy of the n prediction results and a preset threshold, to obtain a tree set, the tree set comprising m trees, where m is less than or equal to n;
performing a first voting operation on the m trees according to the voting weight of each of the m trees, to obtain a goal tree;
combining the prediction results corresponding to the goal tree with the description data into second training data; and
taking the second training data as the first training data, and looping through the steps from determining the n training data subsets within the full training data to combining the goal tree's prediction results with the description data into the second training data, until the accuracy of all n prediction results is greater than or equal to the preset threshold, to obtain a random forest, the random forest comprising all tree sets obtained after performing the delete operation during one or more loop iterations.
Optionally, the method further comprises:
taking the description data corresponding to the event to be predicted as input to the random forest, to obtain multiple prediction results output by the trees in the random forest;
determining, by a second voting operation, the most frequently occurring prediction result among the multiple prediction results, as the prediction result of the event to be predicted.
Optionally, evaluating, using the description data, the n trees trained on the n training data subsets, to obtain the n prediction results corresponding to the n trees, comprises:
training the n trees on the n training data subsets;
taking the description data as input to each of the n trees, to obtain the n prediction results output by the n trees.
Optionally, performing the delete operation on the n trees according to the accuracy of the n prediction results and the preset threshold, to obtain the tree set, comprises:
when there are u prediction results among the n prediction results whose accuracy is below the preset threshold, deleting the u trees corresponding to those u prediction results, to obtain a tree set comprising m trees, where m = n − u; or,
when the accuracy of all n prediction results is greater than or equal to the preset threshold, obtaining a tree set comprising m trees, where m = n.
Optionally, performing the first voting operation on the m trees according to the voting weight of each of the m trees, to obtain the goal tree, comprises:
determining the error rate of each tree according to the accuracy of its prediction result;
taking the error rate of each tree as input to a preset voting weight calculation formula, to obtain the voting weight of each tree output by the formula;
dividing the m trees into multiple voting groups, wherein each voting group comprises the trees that share an identical prediction result, the number of those trees being the vote count of the group;
obtaining the product of the vote count and the voting weight of any one tree in each voting group, as the vote share of that group;
selecting any one tree from the voting group with the highest vote share, as the goal tree.
According to a second aspect of the embodiments of the present disclosure, there is provided a random forest training apparatus, the apparatus comprising:
a data set determining module, configured to determine n training data subsets within first training data, the first training data comprising description data corresponding to events of the same class as an event to be predicted, together with the prediction results of those events;
a random forest evaluation module, configured to evaluate, using the description data, the n trees trained on the n training data subsets, to obtain the n prediction results corresponding to the n trees;
a random forest deletion module, configured to perform a delete operation on the n trees according to the accuracy of the n prediction results and a preset threshold, to obtain a tree set, the tree set comprising m trees, where m is less than or equal to n;
a goal tree obtaining module, configured to perform a first voting operation on the m trees according to the voting weight of each of the m trees, to obtain a goal tree;
a data combining module, configured to combine the prediction results corresponding to the goal tree with the description data into second training data; and
a loop execution module, configured to take the second training data as the first training data and loop through the steps from determining the n training data subsets within the full training data to combining the goal tree's prediction results with the description data into the second training data, until the accuracy of all n prediction results is greater than or equal to the preset threshold, to obtain a random forest, the random forest comprising all tree sets obtained after performing the delete operation during one or more loop iterations.
Optionally, the apparatus further comprises:
a data input module, configured to take the description data corresponding to the event to be predicted as input to the random forest, to obtain multiple prediction results output by the trees in the random forest;
a result determining module, configured to determine, by a second voting operation, the most frequently occurring prediction result among the multiple prediction results, as the prediction result of the event to be predicted.
Optionally, the random forest evaluation module comprises:
a random forest training submodule, configured to train the n trees on the n training data subsets;
a random forest evaluation submodule, configured to take the description data as input to each of the n trees, to obtain the n prediction results output by the n trees.
Optionally, the random forest deletion module is configured to:
when there are u prediction results among the n prediction results whose accuracy is below the preset threshold, delete the u trees corresponding to those u prediction results, to obtain a tree set comprising m trees, where m = n − u; or,
when the accuracy of all n prediction results is greater than or equal to the preset threshold, obtain a tree set comprising m trees, where m = n.
Optionally, the goal tree obtaining module comprises:
an error rate determining submodule, configured to determine the error rate of each tree according to the accuracy of its prediction result;
a weight calculation submodule, configured to take the error rate of each tree as input to a preset voting weight calculation formula, to obtain the voting weight of each tree output by the formula;
a voting group dividing submodule, configured to divide the m trees into multiple voting groups, wherein each voting group comprises the trees that share an identical prediction result, the number of those trees being the vote count of the group;
a vote share obtaining submodule, configured to obtain the product of the vote count and the voting weight of any one tree in each voting group, as the vote share of that group;
a goal tree obtaining submodule, configured to select any one tree from the voting group with the highest vote share, as the goal tree.
According to a third aspect of the embodiments of the present disclosure, there is provided a computer-readable storage medium having a computer program stored thereon; when the computer program is executed by a processor, the steps of the random forest training method provided by the first aspect of the embodiments of the present disclosure are implemented.
According to a fourth aspect of the embodiments of the present disclosure, there is provided an electronic device, comprising:
a memory having a computer program stored thereon; and
a processor, configured to execute the computer program in the memory, to implement the steps of the random forest training method provided by the first aspect of the embodiments of the present disclosure.
Through the above technical solution, the present disclosure determines n training data subsets within first training data, the first training data comprising the description data of events of the same class as an event to be predicted together with their prediction results; evaluates, using the description data, the n trees trained on the n training data subsets, to obtain the n prediction results corresponding to the n trees; performs a delete operation on the n trees according to the accuracy of the n prediction results and a preset threshold, to obtain a tree set comprising m trees, where m is less than or equal to n; performs a first voting operation on the m trees according to the voting weight of each tree, to obtain a goal tree; combines the goal tree's prediction results with the description data into second training data; and, taking the second training data as the first training data, loops through the above steps until the accuracy of all n prediction results is greater than or equal to the preset threshold, to obtain a random forest comprising all tree sets obtained after the delete operation during one or more loop iterations. The full training data is continuously optimized across the repeated training passes, an excess of trees with a single dominant feature during training is avoided, and the accuracy of classification prediction is improved.
Other features and advantages of the present disclosure will be described in detail in the following detailed description.
Brief Description of the Drawings
The accompanying drawings are provided for a further understanding of the present disclosure and constitute a part of the specification; together with the following detailed description, they serve to explain the present disclosure but do not limit it. In the drawings:
Fig. 1 is a flowchart of a random forest training method according to an exemplary embodiment;
Fig. 2 is a flowchart of another random forest training method according to the embodiment shown in Fig. 1;
Fig. 3 is a flowchart of a random forest evaluation method according to the embodiment shown in Fig. 2;
Fig. 4 is a flowchart of a goal tree obtaining method according to the embodiment shown in Fig. 2;
Fig. 5 is a block diagram of a random forest training apparatus according to an exemplary embodiment;
Fig. 6 is a block diagram of another random forest training apparatus according to the embodiment shown in Fig. 5;
Fig. 7 is a block diagram of a random forest evaluation module according to the embodiment shown in Fig. 6;
Fig. 8 is a block diagram of a goal tree obtaining module according to the embodiment shown in Fig. 6;
Fig. 9 is a block diagram of an electronic device according to an exemplary embodiment.
Detailed Description
Exemplary embodiments are described in detail here, examples of which are illustrated in the accompanying drawings. Where the following description refers to the drawings, the same numerals in different drawings denote the same or similar elements unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all implementations consistent with the present disclosure; rather, they are merely examples of apparatuses and methods consistent with some aspects of the present disclosure as recited in the appended claims.
Fig. 1 is a flowchart of a random forest training method according to an exemplary embodiment. As shown in Fig. 1, the method comprises:
Step 101: determine n training data subsets within first training data.
The first training data comprises the description data corresponding to events of the same class as the event to be predicted, together with the prediction results (classification results) of those events. In principle, the first training data should describe the event class in as much detail and as completely as possible. The n training data subsets are n subsets randomly drawn from the first training data; the subsets may consist of entirely distinct examples, or may partially overlap with one another.
Take the classification prediction of a fruit as the event to be predicted: the event to be predicted is a fruit classification event, so the description data and prediction results of as many kinds of fruit as possible (with as many examples as possible) should be collected, so that the whole classification event is described completely. The first training data may be as shown in Table 1 below.
Table 1
A            B            C                D               E
Yellow skin  White flesh  Crescent-shaped  Sweet           Banana
Green skin   Red flesh    Spherical        Sweet           Watermelon
Red skin     White flesh  Spherical        Sweet and sour  Apple
Each row in Table 1 is one prediction event (or example); the data in columns A, B, C and D form the description data part, and the data in column E form the prediction result part. Note that the first training data contains a large number (e.g., 100,000) of examples with their corresponding description data and prediction results; Table 1 shows only the three examples of banana, watermelon, and apple. Each of the n training data subsets selected from the first training data has the same form as the first training data, i.e., it also contains the above description data part and prediction result part.
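To make the sampling of step 101 concrete, the following is a minimal sketch of drawing n possibly overlapping training data subsets from the first training data (illustrative only; the row encoding, subset size, and function name are assumptions, not part of the patent):

```python
import random

def sample_subsets(first_training_data, n, subset_size):
    """Step 101: randomly draw n training data subsets.

    Each row is (description_data, prediction_result). Rows inside one subset
    are distinct, but different subsets may overlap, matching the
    overlapping-subset behavior described above."""
    return [random.sample(first_training_data, subset_size) for _ in range(n)]

# Toy rows in the shape of Table 1: columns A-D form the description data, E the label.
rows = [
    (("yellow skin", "white flesh", "crescent-shaped", "sweet"), "banana"),
    (("green skin", "red flesh", "spherical", "sweet"), "watermelon"),
    (("red skin", "white flesh", "spherical", "sweet and sour"), "apple"),
]
subsets = sample_subsets(rows, n=2, subset_size=2)
```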
Step 102: evaluate, using the description data, the n trees trained on the n training data subsets, to obtain the n prediction results corresponding to the n trees.
Here, each tree is a decision tree (also called a classification tree), an established tree-structured machine learning model; a random forest is composed of multiple decision trees.
Illustratively, in step 102, n decision trees are first trained on the n training data subsets selected in step 101, forming an initial random forest. After the n decision trees are obtained, each tree can be evaluated against the description data of the first training data; that is, all prediction results in the first training data are removed, the remaining description data are fed into every trained decision tree, each decision tree outputs a new prediction result for the description data, and the accuracy corresponding to each decision tree is thereby obtained.
Taking Table 1 as an example, the description data in columns A, B, C and D are fed into tree A as input, yielding one set of prediction results. That prediction result is in fact itself a column containing many individual predictions, which can be compared with the data in column E to obtain the accuracy of the prediction. With the three examples shown in Table 1: if the prediction results are banana, watermelon, and litchi, then, compared against column E, the accuracy of the prediction is 2/3.
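The training and evaluation of step 102 can be sketched as follows, assuming scikit-learn's DecisionTreeClassifier, numerically encoded description data, and subsets given as (X_sub, y_sub) pairs (the helper name is illustrative, not from the patent):

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def train_and_evaluate(subsets, X_full, y_full):
    """Train one decision tree per subset, then score each tree on the
    description data of the full first training data (step 102)."""
    trees, accuracies = [], []
    for X_sub, y_sub in subsets:
        tree = DecisionTreeClassifier().fit(X_sub, y_sub)
        predictions = tree.predict(X_full)  # the new prediction result column
        # compare against column E to get the tree's accuracy
        accuracies.append(float(np.mean(predictions == np.asarray(y_full))))
        trees.append(tree)
    return trees, accuracies
```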
Step 103: perform a delete operation on the n trees according to the accuracy of the n prediction results and a preset threshold, to obtain a tree set.
The tree set comprises m trees, where m is less than or equal to n.
Illustratively, step 103 may comprise: when there are u prediction results among the n prediction results whose accuracy is below the preset threshold, deleting the u trees corresponding to those u prediction results, to obtain a tree set comprising m trees, where m = n − u; or, when the accuracy of all n prediction results is greater than or equal to the preset threshold, obtaining a tree set comprising m trees, where m = n. In other words, whenever the accuracy of any prediction result among the n prediction results is below the preset threshold, the trees whose prediction accuracy is below the threshold are deleted from the n trees, the remaining m trees are retained as the above tree set, and the first voting operation in step 104 below (a vote with attached weights) selects from them the prediction result output by the decision tree with the highest accuracy and the strongest generalization. When the accuracy of all n prediction results is greater than or equal to the preset threshold, the goal of optimizing the decision trees that make up the random forest is considered achieved; the scheme then stops at this step 103, and the n trees (together with the one or more tree sets retained before this accuracy check) are taken as the trained random forest.
Illustratively, step 106 of this scheme in fact loops over steps 101 to 105. During each loop iteration, when there are u prediction results with accuracy below the preset threshold among the n prediction results, and m trees are consequently retained, those m trees can be regarded as one tree set. If, for example, the accuracy of all n prediction results is determined to be greater than or equal to the preset threshold after 5 loop iterations, then, in addition to the n trees corresponding to the current n prediction results, the 5 tree sets retained during those 5 loop iterations are also kept. It will be understood that the number of decision trees contained in each tree set generally differs. The random forest finally determined by this scheme is thus composed of these n trees (in fact also a tree set obtained in this step 103) together with all trees in the other 5 tree sets.
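A minimal sketch of the delete operation of step 103, under the same assumptions as the previous sketch:

```python
def delete_weak_trees(trees, accuracies, threshold):
    """Step 103: keep only the trees whose accuracy meets the preset threshold.

    Returns the surviving tree set of m trees and their accuracies
    (m = n - u, or m = n when no tree falls below the threshold)."""
    kept = [(t, a) for t, a in zip(trees, accuracies) if a >= threshold]
    return [t for t, _ in kept], [a for _, a in kept]
```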
Step 104: perform a first voting operation on the m trees according to the voting weight of each of the m trees, to obtain a goal tree.
Illustratively, the voting weight describes the importance of the votes each tree obtains in the first voting operation. For example, consider four trees A(1), B(2), C(4) and D(10), where the number in parentheses is each tree's weight. Suppose the first voting operation yields the following results: tree A obtains 5 votes, tree B obtains 3 votes, tree C obtains 2 votes, and tree D obtains 1 vote. Although tree A obtains more votes and tree D fewer, tree D's weight is large, so the 'winner' of this vote is tree D.
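Worked out in code, the toy example reads as follows (the weighted score of each tree is simply its voting weight multiplied by its raw votes):

```python
# (voting weight, raw votes) for trees A, B, C and D from the example above
candidates = {"A": (1, 5), "B": (2, 3), "C": (4, 2), "D": (10, 1)}

scores = {name: w * v for name, (w, v) in candidates.items()}
winner = max(scores, key=scores.get)  # "D": 10*1 = 10 beats A's 5, B's 6 and C's 8
```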
Step 105: combine the prediction results corresponding to the goal tree with the description data into second training data.
Step 106: taking the second training data as the first training data, loop through the steps from determining the n training data subsets within the full training data to combining the goal tree's prediction results with the description data into the second training data, until the accuracy of all n prediction results is greater than or equal to the preset threshold, to obtain the random forest.
The random forest comprises all tree sets obtained after performing the delete operation during one or more loop iterations.
In conclusion the disclosure can determine n group training dataset in the first training data, the first training data packet Include the corresponding description data of similar event of event to be predicted and the prediction result of the similar event;Data are described by this The n tree trained by the n group training dataset is judged, to obtain the corresponding n prediction result of this n tree;According to The accuracy and preset threshold of the n prediction result execute delete operation to this n tree, to obtain tree set, the tree set packet It is set containing m, wherein m is less than or equal to n;First is carried out to this m tree according to the corresponding ballot weight of each tree in this m tree Ballot operation, to obtain goal tree;It is the second training data that the corresponding prediction result of the goal tree, which is described Data Synthesis with this,; Using second training data as first training data, circulation is executed determines the training of n group from above-mentioned in full dose training data The corresponding prediction result of the goal tree is described the step of Data Synthesis is the second training data with this to above-mentioned by data set, until The accuracy of the n prediction result is both greater than or equal to the preset threshold, and to obtain random forest, which is included in one All tree set got after the delete operation are executed in a or multiple circulation implementation procedures.It can be to the more of random forest Persistently whole training data is optimized in secondary training process, the tree for having single features in avoiding random forest is excessive, While guaranteeing the generalization of random forest classification prediction, the accuracy of classification prediction is improved.
Fig. 2 is a flowchart of another random forest training method according to the embodiment shown in Fig. 1. As shown in Fig. 2, after step 106 the method may further comprise:
Step 107: take the description data corresponding to the event to be predicted as input to the random forest, to obtain multiple prediction results output by the trees in the random forest.
Step 108: determine, by a second voting operation, the most frequently occurring prediction result among the multiple prediction results, as the prediction result of the event to be predicted.
Illustratively, after the random forest is obtained, the description data of an actual event to be predicted can be fed through it, with every decision tree in the random forest outputting one prediction result. Among these multiple prediction results, the one occurring most frequently is selected by the second voting operation of the random forest (an unweighted vote) as the final prediction result of the event to be predicted.
Continuing the fruit classification example: suppose the random forest contains 30 trees, and the description data corresponding to the event to be predicted are green skin, green flesh, spherical, and sweet. From these description data the random forest outputs 30 prediction results, of which 25 are grape, 3 are green apple, and 2 are kiwi. The prediction with the highest vote share (the most occurrences), grape, is then taken as the final prediction result.
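A sketch of the second voting operation of steps 107 and 108 (reusing a trained forest list from the earlier sketches; the feature encoding is again an assumption):

```python
from collections import Counter

def predict_event(forest, description_data):
    """Steps 107-108: run the description data through every tree and take
    the most frequent prediction (an unweighted second voting operation)."""
    votes = [tree.predict([description_data])[0] for tree in forest]
    return Counter(votes).most_common(1)[0][0]
```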
Fig. 3 is a flowchart of a random forest evaluation method according to the embodiment shown in Fig. 2. As shown in Fig. 3, step 102 above may comprise:
Step 1021: train the n trees on the n training data subsets.
Illustratively, this step may be called the pre-training step of the random forest. The initial random forest obtained after pre-training still has certain shortcomings in classification precision; therefore, in the following steps, drawing on the idea of the AdaBoost method, each tree in the random forest is trained repeatedly, and the first training data (i.e., the full training data) is continuously optimized during training, so as to appropriately reinforce the effect of the key training data and improve the precision of the random forest.
Step 1022: take the description data as input to each of the n trees, to obtain the n prediction results output by the n trees.
Fig. 4 is a flowchart of a goal tree obtaining method according to the embodiment shown in Fig. 2. As shown in Fig. 4, step 104 above may comprise:
Step 1041: determine the error rate of each tree according to the accuracy of its prediction result.
Illustratively, the error rate of a decision tree is 1 minus its accuracy.
Step 1042: take the error rate of each tree as input to a preset voting weight calculation formula, to obtain the voting weight of each tree output by the formula.
Illustratively, the voting weight calculation formula is given as equation (1) of the original publication (rendered there as an image and not reproduced in this text), where W denotes the voting weight and e_i denotes the error rate of the i-th tree among the m trees.
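Since the formula itself is not recoverable from this text and the description invokes the AdaBoost idea, the sketch below substitutes the standard AdaBoost classifier weight purely as a labeled assumption, not as the patented equation (1):

```python
import math

def voting_weight(error_rate):
    """Assumed AdaBoost-style weight W = 0.5 * ln((1 - e_i) / e_i), standing in
    for the patent's equation (1); lower error rates give larger weights."""
    e = min(max(error_rate, 1e-9), 1 - 1e-9)  # guard against log(0) and division by zero
    return 0.5 * math.log((1 - e) / e)
```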
Step 1043: divide the m trees into multiple voting groups.
Each voting group comprises the trees that share an identical prediction result, and the number of those trees is the vote count of the group.
Step 1044: obtain the product of the vote count and the voting weight of any one tree in each voting group, as the vote share of that group.
Step 1045: select any one tree from the voting group with the highest vote share, as the goal tree.
Illustratively, in the weighted first voting operation, the vote share of each voting group is the product of the voting weight and the vote count actually obtained. It will also be understood that the multiple prediction results within one voting group are identical, and these prediction results correspond to multiple identical trees; therefore, any one tree may be selected from the voting group with the highest vote share as the goal tree.
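Steps 1041 to 1045 combine into the goal-tree selection sketched below (same assumptions as the earlier sketches; voting_weight is the assumed formula above, and voting groups are keyed on each tree's full prediction column):

```python
from collections import defaultdict

def select_goal_tree(tree_set, accuracies, X_full):
    """Step 104: group trees by identical prediction results, score each group
    by vote count x voting weight, and return any one tree of the top group."""
    groups = defaultdict(list)
    for tree, acc in zip(tree_set, accuracies):
        key = tuple(tree.predict(X_full))  # trees with identical predictions share a group
        groups[key].append((tree, acc))
    best_share, goal_tree = float("-inf"), None
    for members in groups.values():
        votes = len(members)                      # step 1043: group size = vote count
        _, acc = members[0]                       # any one tree of the group
        share = votes * voting_weight(1.0 - acc)  # steps 1041, 1042 and 1044
        if share > best_share:
            best_share, goal_tree = share, members[0][0]
    return goal_tree                              # step 1045
```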
In conclusion the disclosure can determine n group training dataset in the first training data, the first training data packet Include the corresponding description data of similar event of event to be predicted and the prediction result of the similar event;Data are described by this The n tree trained by the n group training dataset is judged, to obtain the corresponding n prediction result of this n tree;According to The accuracy and preset threshold of the n prediction result execute delete operation to this n tree, to obtain tree set, the tree set packet It is set containing m, wherein m is less than or equal to n;First is carried out to this m tree according to the corresponding ballot weight of each tree in this m tree Ballot operation, to obtain goal tree;It is the second training data that the corresponding prediction result of the goal tree, which is described Data Synthesis with this,; Using second training data as first training data, circulation is executed determines the training of n group from above-mentioned in full dose training data The corresponding prediction result of the goal tree is described the step of Data Synthesis is the second training data with this to above-mentioned by data set, until The accuracy of the n prediction result is both greater than or equal to the preset threshold, and to obtain random forest, which is included in one All tree set got after the delete operation are executed in a or multiple circulation implementation procedures.It can be to the more of random forest Persistently whole training data is optimized in secondary training process, the tree for having single features in avoiding random forest is excessive, While guaranteeing the generalization of random forest classification prediction, the accuracy of classification prediction is improved.
Fig. 5 is a block diagram of a random forest training apparatus according to an exemplary embodiment. As shown in Fig. 5, the apparatus 500 comprises:
a data set determining module 510, configured to determine n training data subsets within first training data, the first training data comprising description data corresponding to events of the same class as an event to be predicted, together with the prediction results of those events;
a random forest evaluation module 520, configured to evaluate, using the description data, the n trees trained on the n training data subsets, to obtain the n prediction results corresponding to the n trees;
a random forest deletion module 530, configured to perform a delete operation on the n trees according to the accuracy of the n prediction results and a preset threshold, to obtain a tree set, the tree set comprising m trees, where m is less than or equal to n;
a goal tree obtaining module 540, configured to perform a first voting operation on the m trees according to the voting weight of each of the m trees, to obtain a goal tree;
a data combining module 550, configured to combine the prediction results corresponding to the goal tree with the description data into second training data; and
a loop execution module 560, configured to take the second training data as the first training data and loop through the steps from determining the n training data subsets within the full training data to combining the goal tree's prediction results with the description data into the second training data, until the accuracy of all n prediction results is greater than or equal to the preset threshold, to obtain a random forest, the random forest comprising all tree sets obtained after performing the delete operation during one or more loop iterations.
Fig. 6 is a block diagram of another random forest training apparatus according to the embodiment shown in Fig. 5. As shown in Fig. 6, the apparatus 500 further comprises:
a data input module 570, configured to take the description data corresponding to the event to be predicted as input to the random forest, to obtain multiple prediction results output by the trees in the random forest;
a result determining module 580, configured to determine, by a second voting operation, the most frequently occurring prediction result among the multiple prediction results, as the prediction result of the event to be predicted.
Fig. 7 is a block diagram of a random forest evaluation module according to the embodiment shown in Fig. 6. As shown in Fig. 7, the random forest evaluation module 520 comprises:
a random forest training submodule 521, configured to train the n trees on the n training data subsets;
a random forest evaluation submodule 522, configured to take the description data as input to each of the n trees, to obtain the n prediction results output by the n trees.
Optionally, the random forest deletion module 530 is configured to:
when there are u prediction results among the n prediction results whose accuracy is below the preset threshold, delete the u trees corresponding to those u prediction results, to obtain a tree set comprising m trees, where m = n − u; or,
when the accuracy of all n prediction results is greater than or equal to the preset threshold, obtain a tree set comprising m trees, where m = n.
Fig. 8 is a block diagram of a goal tree obtaining module according to the embodiment shown in Fig. 6. As shown in Fig. 8, the goal tree obtaining module 540 comprises:
an error rate determining submodule 541, configured to determine the error rate of each tree according to the accuracy of its prediction result;
a weight calculation submodule 542, configured to take the error rate of each tree as input to a preset voting weight calculation formula, to obtain the voting weight of each tree output by the formula;
a voting group dividing submodule 543, configured to divide the m trees into multiple voting groups, wherein each voting group comprises the trees that share an identical prediction result, the number of those trees being the vote count of the group;
a vote share obtaining submodule 544, configured to obtain the product of the vote count and the voting weight of any one tree in each voting group, as the vote share of that group;
a goal tree obtaining submodule 545, configured to select any one tree from the voting group with the highest vote share, as the goal tree.
In conclusion the disclosure can determine n group training dataset in the first training data, the first training data packet Include the corresponding description data of similar event of event to be predicted and the prediction result of the similar event;Data are described by this The n tree trained by the n group training dataset is judged, to obtain the corresponding n prediction result of this n tree;According to The accuracy and preset threshold of the n prediction result execute delete operation to this n tree, to obtain tree set, the tree set packet It is set containing m, wherein m is less than or equal to n;First is carried out to this m tree according to the corresponding ballot weight of each tree in this m tree Ballot operation, to obtain goal tree;It is the second training data that the corresponding prediction result of the goal tree, which is described Data Synthesis with this,; Using second training data as first training data, circulation is executed determines the training of n group from above-mentioned in full dose training data The corresponding prediction result of the goal tree is described the step of Data Synthesis is the second training data with this to above-mentioned by data set, until The accuracy of the n prediction result is both greater than or equal to the preset threshold, and to obtain random forest, which is included in one All tree set got after the delete operation are executed in a or multiple circulation implementation procedures.It can be to the more of random forest Persistently whole training data is optimized in secondary training process, the tree for having single features in avoiding random forest is excessive, While guaranteeing the generalization of random forest classification prediction, the accuracy of classification prediction is improved.
With regard to the apparatus in the above embodiments, the specific manner in which each module performs its operations has been described in detail in the embodiments of the related method and will not be elaborated here.
Fig. 9 is a block diagram of an electronic device 900 according to an exemplary embodiment. As shown in Fig. 9, the electronic device 900 may comprise a processor 901, a memory 902, a multimedia component 903, an input/output (I/O) interface 904, and a communication component 905.
The processor 901 is configured to control the overall operation of the electronic device 900 to perform all or part of the steps of the random forest training method described above. The memory 902 is configured to store various types of data to support operation on the electronic device 900; such data may include, for example, instructions for any application or method operating on the electronic device 900, as well as application-related data such as contact data, sent and received messages, pictures, audio, and video. The memory 902 may be implemented by any type of volatile or non-volatile storage device, or a combination thereof, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic disk, or optical disk. The multimedia component 903 may include a screen and an audio component. The screen may be, for example, a touch screen, and the audio component is configured to output and/or input audio signals. For example, the audio component may include a microphone configured to receive external audio signals; the received audio signals may be further stored in the memory 902 or sent via the communication component 905. The audio component further includes at least one loudspeaker configured to output audio signals. The I/O interface 904 provides an interface between the processor 901 and other interface modules, such as a keyboard, a mouse, or buttons, which may be virtual buttons or physical buttons. The communication component 905 is configured for wired or wireless communication between the electronic device 900 and other devices. The wireless communication may be, for example, Wi-Fi, Bluetooth, near field communication (NFC), 2G, 3G or 4G, or a combination of one or more of them; accordingly, the communication component 905 may include a Wi-Fi module, a Bluetooth module, or an NFC module.
In an exemplary embodiment, the electronic device 900 may be implemented by one or more application-specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field-programmable gate arrays (FPGAs), controllers, microcontrollers, microprocessors, or other electronic elements, for performing the random forest training method described above.
In another exemplary embodiment, there is also provided a computer-readable storage medium including program instructions, for example the memory 902 including program instructions, which are executable by the processor 901 of the electronic device 900 to perform the random forest training method described above.
The preferred embodiments of the present disclosure have been described in detail above with reference to the accompanying drawings; however, the present disclosure is not limited to the specific details of the above embodiments. Within the scope of the technical concept of the present disclosure, other embodiments will be readily apparent to those skilled in the art upon consideration of the specification and practice of the disclosure, and such embodiments fall within the protection scope of the present disclosure.
It should further be noted that the specific technical features described in the above specific embodiments may, where not contradictory, be combined in any suitable manner. Any combination among the various different embodiments of the present disclosure may likewise be made and should equally be regarded as content disclosed by the present disclosure, provided it does not depart from the idea of the present disclosure. The present disclosure is not limited to the precise structures described above, and the scope of the present disclosure is limited only by the appended claims.

Claims (10)

1. A random forest training method, characterized in that the method comprises:
determining n training data subsets within first training data, the first training data comprising description data corresponding to events of the same class as an event to be predicted, together with the prediction results of those events;
evaluating, using the description data, the n trees trained on the n training data subsets, to obtain the n prediction results corresponding to the n trees;
performing a delete operation on the n trees according to the accuracy of the n prediction results and a preset threshold, to obtain a tree set, the tree set comprising m trees, where m is less than or equal to n;
performing a first voting operation on the m trees according to the voting weight of each of the m trees, to obtain a goal tree;
combining the prediction results corresponding to the goal tree with the description data into second training data; and
taking the second training data as the first training data, and looping through the steps from determining the n training data subsets within the full training data to combining the goal tree's prediction results with the description data into the second training data, until the accuracy of all n prediction results is greater than or equal to the preset threshold, to obtain a random forest, the random forest comprising all tree sets obtained after performing the delete operation during one or more loop iterations.
2. The method according to claim 1, characterized in that the method further comprises:
taking the description data corresponding to the event to be predicted as input to the random forest, to obtain multiple prediction results output by the trees in the random forest;
determining, by a second voting operation, the most frequently occurring prediction result among the multiple prediction results, as the prediction result of the event to be predicted.
3. The method according to claim 1, characterized in that evaluating, using the description data, the n trees trained on the n training data subsets, to obtain the n prediction results corresponding to the n trees, comprises:
training the n trees on the n training data subsets;
taking the description data as input to each of the n trees, to obtain the n prediction results output by the n trees.
4. The method according to claim 1, characterized in that performing the delete operation on the n trees according to the accuracy of the n prediction results and the preset threshold, to obtain the tree set, comprises:
when there are u prediction results among the n prediction results whose accuracy is below the preset threshold, deleting the u trees corresponding to those u prediction results, to obtain a tree set comprising m trees, where m = n − u; or,
when the accuracy of all n prediction results is greater than or equal to the preset threshold, obtaining a tree set comprising m trees, where m = n.
5. The method according to claim 1, characterized in that performing the first voting operation on the m trees according to the voting weight of each of the m trees, to obtain the goal tree, comprises:
determining the error rate of each tree according to the accuracy of its prediction result;
taking the error rate of each tree as input to a preset voting weight calculation formula, to obtain the voting weight of each tree output by the formula;
dividing the m trees into multiple voting groups, wherein each voting group comprises the trees that share an identical prediction result, the number of those trees being the vote count of the group;
obtaining the product of the vote count and the voting weight of any one tree in each voting group, as the vote share of that group;
selecting any one tree from the voting group with the highest vote share, as the goal tree.
6. A random forest training apparatus, characterized in that the apparatus comprises:
a data set determining module, configured to determine n training data subsets within first training data, the first training data comprising description data corresponding to events of the same class as an event to be predicted, together with the prediction results of those events;
a random forest evaluation module, configured to evaluate, using the description data, the n trees trained on the n training data subsets, to obtain the n prediction results corresponding to the n trees;
a random forest deletion module, configured to perform a delete operation on the n trees according to the accuracy of the n prediction results and a preset threshold, to obtain a tree set, the tree set comprising m trees, where m is less than or equal to n;
a goal tree obtaining module, configured to perform a first voting operation on the m trees according to the voting weight of each of the m trees, to obtain a goal tree;
a data combining module, configured to combine the prediction results corresponding to the goal tree with the description data into second training data; and
a loop execution module, configured to take the second training data as the first training data and loop through the steps from determining the n training data subsets within the full training data to combining the goal tree's prediction results with the description data into the second training data, until the accuracy of all n prediction results is greater than or equal to the preset threshold, to obtain a random forest, the random forest comprising all tree sets obtained after performing the delete operation during one or more loop iterations.
7. The apparatus according to claim 6, characterized in that the apparatus further comprises:
a data input module, configured to take the description data corresponding to the event to be predicted as input to the random forest, to obtain multiple prediction results output by the trees in the random forest;
a result determining module, configured to determine, by a second voting operation, the most frequently occurring prediction result among the multiple prediction results, as the prediction result of the event to be predicted.
8. The apparatus according to claim 6, characterized in that the goal tree obtaining module comprises:
an error rate determining submodule, configured to determine the error rate of each tree according to the accuracy of its prediction result;
a weight calculation submodule, configured to take the error rate of each tree as input to a preset voting weight calculation formula, to obtain the voting weight of each tree output by the formula;
a voting group dividing submodule, configured to divide the m trees into multiple voting groups, wherein each voting group comprises the trees that share an identical prediction result, the number of those trees being the vote count of the group;
a vote share obtaining submodule, configured to obtain the product of the vote count and the voting weight of any one tree in each voting group, as the vote share of that group;
a goal tree obtaining submodule, configured to select any one tree from the voting group with the highest vote share, as the goal tree.
9. A computer-readable storage medium having a computer program stored thereon, characterized in that the computer program, when executed by a processor, implements the steps of the method according to any one of claims 1 to 5.
10. An electronic device, characterized by comprising:
a memory having a computer program stored thereon; and
a processor, configured to execute the computer program in the memory, to implement the steps of the method according to any one of claims 1 to 5.
CN201811557766.4A 2018-12-19 2018-12-19 Training method and device for random forest, storage medium and electronic equipment Active CN109726826B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811557766.4A CN109726826B (en) 2018-12-19 2018-12-19 Training method and device for random forest, storage medium and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811557766.4A CN109726826B (en) 2018-12-19 2018-12-19 Training method and device for random forest, storage medium and electronic equipment

Publications (2)

Publication Number Publication Date
CN109726826A 2019-05-07
CN109726826B 2021-08-13

Family

ID=66296251

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811557766.4A Active CN109726826B (en) 2018-12-19 2018-12-19 Training method and device for random forest, storage medium and electronic equipment

Country Status (1)

Country Link
CN (1) CN109726826B (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110264342A * 2019-06-19 2019-09-20 WeBank Co., Ltd. (深圳前海微众银行股份有限公司) Business audit method and device based on machine learning
CN111160647A * 2019-12-30 2020-05-15 4Paradigm (Beijing) Technology Co., Ltd. Money laundering behavior prediction method and device
CN113516173A * 2021-05-27 2021-10-19 Jiangxi Isuzu Motor Co., Ltd. Evaluation method for static and dynamic interference of a whole vehicle based on random forest and decision tree

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104391970A * 2014-12-04 2015-03-04 Shenzhen Institutes of Advanced Technology Attribute subspace weighted random forest data processing method
WO2015066564A1 * 2013-10-31 2015-05-07 Cancer Prevention And Cure, Ltd. Methods of identification and diagnosis of lung diseases using classification systems and kits thereof
CN105844300A * 2016-03-24 2016-08-10 Henan Normal University Optimized classification method and optimized classification device based on random forest algorithm
US9519868B2 * 2012-06-21 2016-12-13 Microsoft Technology Licensing, Llc Semi-supervised random decision forests for machine learning using mahalanobis distance to identify geodesic paths
CN108846338A * 2018-05-29 2018-11-20 Nanjing Forestry University Polarization characteristic selection and classification method based on object-oriented random forest

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9519868B2 * 2012-06-21 2016-12-13 Microsoft Technology Licensing, Llc Semi-supervised random decision forests for machine learning using mahalanobis distance to identify geodesic paths
WO2015066564A1 * 2013-10-31 2015-05-07 Cancer Prevention And Cure, Ltd. Methods of identification and diagnosis of lung diseases using classification systems and kits thereof
CN104391970A * 2014-12-04 2015-03-04 Shenzhen Institutes of Advanced Technology Attribute subspace weighted random forest data processing method
CN105844300A * 2016-03-24 2016-08-10 Henan Normal University Optimized classification method and optimized classification device based on random forest algorithm
CN108846338A * 2018-05-29 2018-11-20 Nanjing Forestry University Polarization characteristic selection and classification method based on object-oriented random forest

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
WEISHI MAN et al.: "Image classification based on improved random forest algorithm", 2018 IEEE 3rd International Conference on Cloud Computing and Big Data Analysis (ICCCBDA) *
FENG Kaiping et al.: "Expression recognition method based on weighted KNN and random forest", Software Guide (软件导刊) *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110264342A * 2019-06-19 2019-09-20 WeBank Co., Ltd. (深圳前海微众银行股份有限公司) Business audit method and device based on machine learning
CN111160647A * 2019-12-30 2020-05-15 4Paradigm (Beijing) Technology Co., Ltd. Money laundering behavior prediction method and device
CN111160647B * 2019-12-30 2023-08-22 4Paradigm (Beijing) Technology Co., Ltd. Money laundering behavior prediction method and device
CN113516173A * 2021-05-27 2021-10-19 Jiangxi Isuzu Motor Co., Ltd. Evaluation method for static and dynamic interference of a whole vehicle based on random forest and decision tree

Also Published As

Publication number Publication date
CN109726826B (en) 2021-08-13

Similar Documents

Publication Publication Date Title
CN112101190B (en) Remote sensing image classification method, storage medium and computing device
Fuentes et al. High-performance deep neural network-based tomato plant diseases and pests diagnosis system with refinement filter bank
CN105701120B (en) The method and apparatus for determining semantic matching degree
US20180365525A1 (en) Multi-sampling model training method and device
CN108304921A (en) The training method and image processing method of convolutional neural networks, device
CN107844784A (en) Face identification method, device, computer equipment and readable storage medium storing program for executing
WO2020155300A1 (en) Model prediction method and device
CN107230108A (en) The processing method and processing device of business datum
CN109948680B (en) Classification method and system for medical record data
CN111461168A (en) Training sample expansion method and device, electronic equipment and storage medium
CN109726826A (en) Training method, device, storage medium and the electronic equipment of random forest
CN107909141A (en) A kind of data analysing method and device based on grey wolf optimization algorithm
CN112232944B (en) Method and device for creating scoring card and electronic equipment
CN106803039A (en) The homologous decision method and device of a kind of malicious file
Nimmagadda et al. Cricket score and winning prediction using data mining
CN111178656A (en) Credit model training method, credit scoring device and electronic equipment
CN109670623A (en) Neural net prediction method and device
CN109670567A (en) Neural net prediction method and device
CN109829471A (en) Training method, device, storage medium and the electronic equipment of random forest
CN110413682A (en) A kind of the classification methods of exhibiting and system of data
CN107135402A (en) A kind of method and device for recognizing TV station's icon
CN115762530A (en) Voiceprint model training method and device, computer equipment and storage medium
CN114742221A (en) Deep neural network model pruning method, system, equipment and medium
CN107403199A (en) Data processing method and device
Dang-Nhu Evaluating disentanglement of structured representations

Legal Events

Code Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant