CN109255368A - Method, apparatus, electronic device, and storage medium for randomly selecting features - Google Patents

Method, apparatus, electronic device, and storage medium for randomly selecting features

Info

Publication number
CN109255368A
CN109255368A · CN201810892174.1A · CN109255368B
Authority
CN
China
Prior art keywords
metric
feature
value set
measurement value
cluster
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201810892174.1A
Other languages
Chinese (zh)
Other versions
CN109255368B (en)
Inventor
叶俊锋
赖云辉
罗先贤
孙成
龙觉刚
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Technology Shenzhen Co Ltd filed Critical Ping An Technology Shenzhen Co Ltd
Priority to CN201810892174.1A
Publication of CN109255368A
Application granted
Publication of CN109255368B
Active legal status (Current)
Anticipated expiration

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00: Pattern recognition
    • G06F 18/20: Analysing
    • G06F 18/24: Classification techniques
    • G06F 18/241: Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F 18/2415: Classification techniques relating to the classification model based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00: Pattern recognition
    • G06F 18/20: Analysing
    • G06F 18/21: Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/211: Selection of the most significant subset of features

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Image Analysis (AREA)

Abstract

Embodiments of the present application provide a method, an apparatus, an electronic device, and a storage medium for randomly selecting features. The method comprises: determining the metric value of each candidate feature to obtain a first metric value set; normalizing the first metric value set; enlarging, by a preset algorithm, the differences between the metric values in the normalized first metric value set to obtain a second metric value set; and inputting the metric values in the second metric value set into a roulette-wheel model as the fitness of each candidate feature, the feature output by the roulette-wheel model being taken as the selected feature. By enlarging the differences between the features' metric values, the embodiments enlarge the differences between the features' selection probabilities, so that features with high metric values are selected with markedly higher probability than features with low metric values. This raises the probability that effective features are selected, so that an algorithm using the selected features can make full use of the effective feature information, improving the algorithm's accuracy.

Description

Method, apparatus, electronic device, and storage medium for randomly selecting features
Technical field
The present application relates to the technical field of data processing, and in particular to a method, an apparatus, an electronic device, and a storage medium for randomly selecting features.
Background
Feature selection, also known as feature subset selection or attribute selection, refers to selecting N features from M existing features so that a specific index of the system is optimized. Through feature selection, some of the most effective features can be selected from the original features to reduce the dimensionality of the data set, which is an important means of improving the performance of learning algorithms.
Existing feature selection methods compute a metric value for each feature, such as classification accuracy or AUC (Area Under the Curve) and other indexes for evaluating classification-algorithm performance, and then substitute the metric value of each feature into a roulette-wheel algorithm as its weight; the randomly output feature is the selected feature. In these methods, the weights of the features are not clearly differentiated, so features with high metric values and features with low metric values are selected with nearly the same probability. The selection probability of effective features cannot be raised, the algorithm cannot make full use of the information in the effective features, and the algorithm's accuracy is reduced.
Summary of the invention
The present application provides a method, an apparatus, an electronic device, and a computer-readable storage medium for randomly selecting features, which can solve the problem that the selection probability of effective features cannot be raised because the feature weights are not clearly differentiated. The technical solution is as follows:
In a first aspect, the present application provides a method for randomly selecting features, the method comprising:
determining the metric value of each candidate feature to obtain a first metric value set;
normalizing the first metric value set;
enlarging, by a preset algorithm, the differences between the metric values in the normalized first metric value set to obtain a second metric value set; and
inputting the metric values in the second metric value set into a roulette-wheel model as the fitness of each candidate feature, and taking the feature output by the roulette-wheel model as the selected feature.
Optionally, normalizing the first metric value set comprises: performing min-max normalization on the first metric value set.
Optionally, enlarging, by the preset algorithm, the differences between the metric values in the normalized first metric value set to obtain the second metric value set comprises: squaring each metric value in the normalized first metric value set to enlarge the differences between the metric values, obtaining the second metric value set.
Optionally, enlarging, by the preset algorithm, the differences between the metric values in the normalized first metric value set to obtain the second metric value set comprises:
clustering the metric values in the normalized first metric value set to obtain a plurality of clusters, each cluster including at least one metric value; and
performing difference-enlarging processing on the metric values in each cluster according to a preset strategy, obtaining the second metric value set.
Optionally, performing difference-enlarging processing on the metric values in each cluster according to the preset strategy comprises:
determining the boundary points of each cluster and the number of metric values each cluster includes;
determining the density of each cluster from its boundary points and the number of metric values it includes; and
judging whether the density of each cluster exceeds a preset density, and performing difference-enlarging processing on the metric values in the clusters whose density exceeds the preset density.
Optionally, performing difference-enlarging processing on the metric values in a cluster whose density exceeds the preset density comprises:
expanding the boundary of the cluster to be processed, the cluster to be processed being a cluster whose density exceeds the preset density;
determining a scale factor from the cluster's boundary before and after the expansion; and
enlarging the distances between the metric values in the cluster to be processed according to the scale factor.
Optionally, before inputting the metric values in the second metric value set into the roulette-wheel model as the fitness of each candidate feature, the method further comprises: performing Laplace smoothing on the second metric value set.
Inputting the metric values in the second metric value set into the roulette-wheel model as the fitness of each candidate feature then comprises: inputting the metric values in the Laplace-smoothed second metric value set into the roulette-wheel model as the fitness of each candidate feature.
In a second aspect, the present application provides an apparatus for randomly selecting features, the apparatus comprising:
a metric value determining module, configured to determine the metric value of each candidate feature and obtain a first metric value set;
a normalization module, configured to normalize the first metric value set;
a difference enlarging module, configured to enlarge, by a preset algorithm, the differences between the metric values in the normalized first metric value set and obtain a second metric value set; and
a feature selection module, configured to input the metric values in the second metric value set into a roulette-wheel model as the fitness of each candidate feature and take the feature output by the roulette-wheel model as the selected feature.
In a third aspect, the present application provides an electronic device, comprising: one or more processors;
a memory; and
one or more application programs, wherein the one or more application programs are stored in the memory and configured to be executed by the one or more processors, the one or more application programs being configured to perform the method for randomly selecting features shown in the first aspect of the present application.
In a fourth aspect, the present application provides a computer-readable storage medium on which a computer program is stored; when executed by a processor, the program implements the method for randomly selecting features shown in the first aspect of the present application.
The technical solutions provided by the embodiments of the present application have the following beneficial effect: by enlarging the differences between the features' metric values, the differences between the features' selection probabilities are enlarged, so that features with high metric values and features with low metric values are selected with clearly different probabilities. While the randomness of feature selection is preserved, the probability that effective features are selected is raised, so that an algorithm using the selected features can make full use of the effective feature information, improving the algorithm's accuracy.
Brief Description of the Drawings

To explain the technical solutions in the embodiments of the present application more clearly, the drawings needed in the description of the embodiments are briefly introduced below.
Fig. 1 is a schematic flow diagram of a method for randomly selecting features provided by an embodiment of the present application;
Fig. 2 is a schematic flow diagram of another method for randomly selecting features provided by an embodiment of the present application;
Fig. 3 is a schematic flow diagram of yet another method for randomly selecting features provided by an embodiment of the present application;
Fig. 4 is a schematic structural diagram of an apparatus for randomly selecting features provided by an embodiment of the present application;
Fig. 5 is a schematic structural diagram of another apparatus for randomly selecting features provided by an embodiment of the present application;
Fig. 6 is a schematic structural diagram of an electronic device for randomly selecting features provided by an embodiment of the present application.
Detailed Description

Embodiments of the present application are described in detail below, and examples of the embodiments are shown in the drawings, where identical or similar reference numbers throughout denote identical or similar elements, or elements with identical or similar functions. The embodiments described below with reference to the drawings are exemplary; they are only used to explain the present application and cannot be construed as limiting it.
Those skilled in the art will appreciate that, unless expressly stated otherwise, the singular forms "a", "an", "said", and "the" used herein may also include the plural. It should be further understood that the wording "comprising" used in the description of the present application means that the stated features, integers, steps, operations, elements, and/or components are present, but does not exclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. It should be understood that when an element is said to be "connected" or "coupled" to another element, it can be directly connected or coupled to the other element, or intermediate elements may also be present. In addition, "connection" or "coupling" as used herein may include a wireless connection or wireless coupling. The wording "and/or" used herein includes all of, or any unit of, and all combinations of, one or more of the associated listed items.
How the technical solutions of the present application solve the above technical problems is described in detail below with specific embodiments. The specific embodiments below may be combined with one another, and identical or similar concepts or processes may not be repeated in some embodiments. Embodiments of the present application are described below with reference to the drawings.
Embodiment one
An embodiment of the present application provides a method for randomly selecting features. As shown in Fig. 1, the method comprises steps S101, S102, S103, and S104.
Step S101: determine the metric value of each candidate feature, obtaining the first metric value set.
Candidate features are features determined in advance to influence the output of a learning algorithm, classification algorithm, or similar algorithm. For example, to predict a user's banking business, the following candidate features may be chosen: user attribute features describing basic personal characteristics, credit features describing the user's earning potential and income, consumption features describing the customer's main consumption habits and preferences, interest features describing the customer's hobbies, and social features describing the user's activity on social media. The user's banking business is predicted with the above features.
The metric value is an index for evaluating the performance of a classification or prediction algorithm, such as classification accuracy, the ROC curve (receiver operating characteristic curve), AUC, and so on.
For example, suppose the chosen metric is AUC; the AUC of each candidate feature can be computed by the following method.
Step 1: collect sample data, determine the candidate features, and build an attribute prediction model.
Collect the actual value of a certain attribute for each sample and classify the samples by this actual value. For example, samples whose actual value is 1 are positive samples and samples whose actual value is 0 are negative samples; a sample set is obtained in this way.
Determine multiple features for evaluating the attribute as the candidate features, and acquire each sample's feature value for each feature. For example, a person's morality attribute (good or bad) may be evaluated based on the person's credit features.
The attribute prediction model is in fact a pre-trained classifier: inputting the feature values corresponding to a candidate feature into the attribute prediction model yields the predicted probability value of the attribute.
Step 2: compute the AUC of each candidate feature.
Choose a feature A from the candidate features and input the feature value of feature A for every sample in the sample set into the attribute prediction model, obtaining the predicted probability value (score) of each sample's attribute, where the score indicates the probability that the sample is a positive sample.
Compute the probability that, within the sample set, a positive sample's score is greater than a negative sample's score. Specifically: form the N*M (positive, negative) sample pairs, where N is the number of positive samples and M is the number of negative samples; count the number L of pairs in which the positive sample's score is greater than the negative sample's score, with a pair counting 0.5 when the positive and negative scores are equal; and compute the AUC of feature A by the formula AUC = L / (N*M).
The AUC of every candidate feature is obtained by the above method.
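As a minimal sketch (the function and variable names are illustrative, not from the patent), the pairwise AUC computation described above can be written as:

```python
def pairwise_auc(pos_scores, neg_scores):
    """AUC = L / (N * M): the fraction of (positive, negative) score pairs
    in which the positive sample's score exceeds the negative sample's,
    with ties counted as 0.5."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))
```

For instance, with positive scores [0.9, 0.8] and negative scores [0.1, 0.8], three of the four pairs are wins and one is a tie, giving AUC = 3.5 / 4 = 0.875.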
Step S102: normalize the first metric value set.
The normalization method can be min-max normalization, log-function transformation, standard-deviation (z-score) normalization, an extremum method, and so on. Normalization removes the unit limitation of the metric values, converting them into dimensionless pure numbers so that indexes of different units or magnitudes can be compared and weighted.
Step S103: enlarge, by a preset algorithm, the differences between the metric values in the normalized first metric value set, obtaining the second metric value set.
The preset algorithm can be a square operation, a cube operation, proportional scaling of the first metric value set, and so on. For example, squaring each metric value in the normalized first metric value set enlarges the differences between the metric values: if the first metric value set is {0.5, 0.6, 0.8, 0.9}, the second metric value set obtained after squaring is {0.25, 0.36, 0.64, 0.81}. Clearly the square operation enlarges the differences between the metric values, which increases the proportional differences between the feature weights in the roulette-wheel model.
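The squaring example above can be sketched as follows (the helper name is illustrative):

```python
def widen_by_square(values):
    # Squaring values in [0, 1] keeps their order but stretches the gaps:
    # {0.5, 0.6, 0.8, 0.9} becomes {0.25, 0.36, 0.64, 0.81}.
    return [v * v for v in values]
```

The ratio of largest to smallest weight grows from 0.9/0.5 = 1.8 to 0.81/0.25 = 3.24, which is exactly the enlarged proportional difference the roulette-wheel step relies on.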
Step S104: input the metric values in the second metric value set into the roulette-wheel model as the fitness of each candidate feature, and take the feature output by the roulette-wheel model as the selected feature.
The detailed process of selecting a feature with the roulette-wheel model is as follows. First, take the metric values in the second metric value set as the fitness F_i (i.e., the weight) of each feature in the roulette-wheel algorithm; the probability that a feature is selected is proportional to its fitness. Suppose there are n features in total and the fitness of feature i is F_i. Then compute the selection probability q_i of each feature from its fitness: q_i = F_i / (F_1 + F_2 + ... + F_n). Next, obtain the cumulative probability of each feature from the selection probabilities: the cumulative probability of feature j is P_j = q_1 + q_2 + ... + q_j. Each feature's probability interval then follows from the cumulative probabilities: the interval of feature j is [P_{j-1}, P_j], where P_0 = 0. Table 1 gives an example of the selection probabilities and cumulative probabilities computed by a roulette-wheel model. Finally, generate a random number between 0 and 1 and determine the probability interval into which it falls. For example, if the generated random number is 0.01, then, with reference to Table 1, it falls into the probability interval [0, 0.18], so feature 1, to which this interval corresponds, is the feature selected by the roulette-wheel model.
Table 1
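The roulette-wheel selection described above can be sketched as a minimal Python illustration (the function name and the final fallback line, which guards against floating-point round-off, are assumptions, not part of the patent):

```python
import random

def roulette_select(fitness, rng=random.random):
    """Pick an index with probability proportional to its fitness:
    q_i = F_i / sum(F); feature j owns the interval [P_{j-1}, P_j)."""
    total = sum(fitness)
    r = rng() * total          # random point on the wheel
    cumulative = 0.0
    for i, f in enumerate(fitness):
        cumulative += f
        if r < cumulative:
            return i
    return len(fitness) - 1    # guard against floating-point round-off
```

With fitness values [3, 7, 10], the three features own the intervals [0, 0.15), [0.15, 0.5), and [0.5, 1.0) of the wheel, so a random draw of 0.1 selects feature 0, 0.4 selects feature 1, and 0.7 selects feature 2.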
In the method for randomly selecting features of this embodiment, enlarging the differences between the features' metric values enlarges the differences between the features' selection probabilities, so that features with high metric values and features with low metric values are selected with clearly different probabilities. While the randomness of feature selection is preserved, the probability that effective features are selected is raised, so that an algorithm using the selected features can make full use of the effective feature information, improving the algorithm's accuracy.
Steps S101-S104 select one feature. In practical applications, multiple features usually need to be selected for an algorithm. As shown in Fig. 2, a method for randomly selecting multiple features, built on steps S101-S104, is given, comprising:
Step S101: determine the metric value of each candidate feature, obtaining the first metric value set.
Step S102: normalize the first metric value set.
Step S103: enlarge, by a preset algorithm, the differences between the metric values in the normalized first metric value set, obtaining the second metric value set.
Step S104: input the metric values in the second metric value set into the roulette-wheel model as the fitness of each candidate feature, and take the feature output by the roulette-wheel model as a selected feature.
Step S105: put the feature output by the roulette-wheel model into a result set, and delete the metric value corresponding to this feature from the first metric value set.
Step S106: judge whether the number of features in the result set reaches a preset value. If the preset value is not reached, return to step S102; if it is reached, execute step S107 and output the result set.
The preset value is the number of features that need to be selected. The features in the output result set are the finally selected features.
By the above method, multiple features can be selected automatically from the candidate features.
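The full loop of steps S101-S107 might be sketched as below. This is an illustrative composite, not the patent's reference implementation: it assumes min-max normalization for S102, squaring for S103, and a crude add-one smoothing so that no weight is zero; all names are hypothetical.

```python
import random

def select_features(metrics, k, rng=random.random):
    """Illustrative loop over steps S101-S107: normalize (S102), widen by
    squaring (S103), smooth, roulette-pick one feature (S104), move it to
    the result set (S105), and repeat until k features are chosen (S106)."""
    remaining = dict(metrics)               # feature name -> raw metric value
    chosen = []
    while len(chosen) < k and remaining:
        names = list(remaining)
        vals = [remaining[n] for n in names]
        lo, hi = min(vals), max(vals)
        norm = [(v - lo) / (hi - lo) if hi > lo else 1.0 for v in vals]
        fitness = [v * v + 1.0 for v in norm]   # squared, plus add-one smoothing
        r, acc, idx = rng() * sum(fitness), 0.0, len(names) - 1
        for i, f in enumerate(fitness):
            acc += f
            if r < acc:
                idx = i
                break
        chosen.append(names[idx])
        del remaining[names[idx]]            # remove the picked feature (S105)
    return chosen
```

Note that the weights are recomputed from the shrinking candidate pool on every round, matching the return to step S102 after each pick.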
Embodiment two
An embodiment of the present application provides another possible implementation, which, on the basis of Embodiment One, further includes the method shown in Embodiment Two.
Optionally, step S102 specifically comprises: performing min-max normalization on the first metric value set.
Min-max normalization maps the metric values onto the interval [0, 1]. It converts the metric values of the candidate features into a unified form so that, besides serving as the weights of the roulette-wheel algorithm, they also have their mutual differences enlarged. For example, when the chosen metric is AUC, whose value range is [0.5, 1], the differences between the features' metric values are not obvious; used directly as weights, their proportional differences in the total weight are too small. After min-max normalization, the AUC values in the first metric value set are mapped onto the interval [0, 1]; compared with the original range [0.5, 1], the differences between the metric values are enlarged to a certain extent.
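A min-max normalization sketch (the helper name is illustrative), mapping e.g. AUC values from [0.5, 1] onto [0, 1]:

```python
def min_max(values):
    # Map values linearly onto [0, 1]: the minimum becomes 0, the maximum 1.
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]
```

For AUC values {0.5, 0.75, 1.0} the result is {0.0, 0.5, 1.0}: the smallest weight drops from 0.5 to 0, so the proportional spread between weights grows.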
Optionally, step S103 specifically comprises:
Step S1031: cluster the metric values in the normalized first metric value set, obtaining a plurality of clusters, each cluster including at least one metric value.
The clustering method can be the K-means algorithm, the K-medoids algorithm, the CLARANS algorithm, and so on, which are not repeated here.
Step S1032: perform difference-enlarging processing on the metric values in each cluster according to a preset strategy, obtaining the second metric value set.
The differences between the metric values in some intervals are already large enough and need no further difference-enlarging processing, while the differences between the metric values in other intervals are small and need to be enlarged. Therefore, the method of this embodiment clusters similar metric values into the same cluster and enlarges the differences between the metric values with different strategies for different clusters.
Further, step S1032 specifically comprises steps S201, S202, and S203.
Step S201: determine the boundary points of each cluster and the number of metric values each cluster includes.
The boundary points of a cluster are the maximum and minimum of its metric values.
Step S202: determine the density of each cluster from its boundary points and the number of metric values it includes.
The density of a cluster is ρ = num / (A_max - A_min), where A_max is the maximum value in the cluster, A_min is the minimum value in the cluster, and num is the number of metric values the cluster includes.
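The density formula can be sketched as follows (the helper name is illustrative):

```python
def cluster_density(cluster):
    # density = num / (A_max - A_min): metric values per unit of interval width.
    return len(cluster) / (max(cluster) - min(cluster))
```

A cluster such as {0.0, 0.25, 0.5} spans a width of 0.5 and contains 3 values, so its density is 6.0; a cluster packing the same 3 values into a narrower interval would have a higher density and be a candidate for difference-enlarging processing.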
Step S203: judge whether the density of each cluster exceeds the preset density, and perform difference-enlarging processing on the metric values in the clusters whose density exceeds the preset density.
Further, step S203 specifically comprises:
Step S2031: expand the boundary of the cluster to be processed, the cluster to be processed being a cluster whose density exceeds the preset density.
Step S2032: determine a scale factor from the cluster's boundary before and after the expansion.
Step S2033: enlarge the distances between the metric values in the cluster to be processed according to the scale factor.
Suppose the boundary points of the cluster to be processed are A_max and A_min and the boundary points after expansion are C_max and C_min, where C_max > A_max and C_min < A_min. The metric values in the cluster are stretched in equal proportion toward the boundary points C_max and C_min at both ends; the scale factor is k = (C_max - C_min) / (A_max - A_min), and the i-th metric value after stretching is C_i = C_min + k(A_i - A_min), where A_i is the i-th metric value before stretching.
The expanded boundary points of each cluster can be determined from its density: the larger a cluster's density, the larger its expansion, i.e., the spacings between C_max and A_max and between C_min and A_min can be set larger, but it must be ensured that the expanded ranges of the clusters do not overlap.
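The equal-proportion stretch of steps S2031-S2033 might look like this minimal sketch (names are illustrative; choosing C_min and C_max from the cluster density is left out):

```python
def stretch_cluster(values, c_min, c_max):
    """Stretch a cluster's metric values in equal proportion onto the wider
    interval [c_min, c_max]: k = (C_max - C_min) / (A_max - A_min) and
    C_i = C_min + k * (A_i - A_min)."""
    a_min, a_max = min(values), max(values)
    k = (c_max - c_min) / (a_max - a_min)
    return [c_min + k * (a - a_min) for a in values]
```

For example, the dense cluster {0.25, 0.5, 0.75} stretched onto [0.0, 1.0] becomes {0.0, 0.5, 1.0}: the spacing between neighbouring metric values doubles while their order is preserved.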
Optionally, as shown in Fig. 3, a step S105 is further included between steps S103 and S104: perform Laplace smoothing on the second metric value set.
Correspondingly, step S104 specifically comprises: inputting the metric values in the Laplace-smoothed second metric value set into the roulette-wheel model as the fitness of each candidate feature, and taking the feature output by the roulette-wheel model as the selected feature.
The zero-probability problem arises when, in computing the probability of an instance, some quantity x has never appeared in the observed sample base or training set, causing the probability of the whole instance to be 0. For example, in text classification, when a word has not appeared in the training samples, its probability is 0, and the occurrence probability of a text computed by multiplication is then also 0. This is unreasonable: the probability of an event cannot be dogmatically taken as 0 merely because the event has not been observed.
Laplace smoothing estimates the probability of an unobserved phenomenon by adding 1 to the numerator when computing the probability. When the training sample is very large, the change each component's estimated probability undergoes from the incremented count is negligible, yet the zero-probability problem is conveniently and effectively avoided.
Therefore, by performing Laplace smoothing on the second metric value set, the method of this embodiment avoids the zero-probability problem during feature selection and guarantees that every feature has a selection probability, so that no feature is lost.
Embodiment three
Based on the same inventive concept as Embodiments One and Two, an embodiment of the present application provides an apparatus for randomly selecting features. As shown in Fig. 4, the apparatus 40 may comprise a metric value determining module 401, a normalization module 402, a difference enlarging module 403, and a feature selection module 404.
The metric value determining module 401 is configured to determine the metric value of each candidate feature and obtain the first metric value set.
The normalization module 402 is configured to normalize the first metric value set.
The difference enlarging module 403 is configured to enlarge, by a preset algorithm, the differences between the metric values in the normalized first metric value set and obtain the second metric value set.
The feature selection module 404 is configured to input the metric values in the second metric value set into the roulette-wheel model as the fitness of each candidate feature, and take the feature output by the roulette-wheel model as the selected feature.
In the apparatus for randomly selecting features provided by this embodiment, enlarging the differences between the features' metric values enlarges the differences between the features' selection probabilities, so that features with high metric values and features with low metric values are selected with clearly different probabilities. While the randomness of feature selection is preserved, the probability that effective features are selected is raised, so that an algorithm using the selected features can make full use of the effective feature information, improving the algorithm's accuracy.
Optionally, the normalization module 402 is specifically configured to perform min-max normalization on the first metric value set.
Optionally, the difference enlarging module 403 is specifically configured to square each metric value in the normalized first metric value set to enlarge the differences between the metric values, obtaining the second metric value set.
Optionally, as shown in Fig. 5, the difference enlarging module 403 includes a clustering unit 501 and a difference processing unit 502.
The clustering unit 501 is configured to cluster the metric values in the normalized first metric value set, obtaining a plurality of clusters, each cluster including at least one metric value.
The difference processing unit 502 is configured to perform difference-enlarging processing on the metric values in each cluster according to a preset strategy, obtaining the second metric value set.
Further, as shown in Fig. 5, the difference processing unit 502 includes a density computing subunit 5021 and a difference processing subunit 5022.
The density computing subunit 5021 is configured to determine the boundary points of each cluster and the number of metric values each cluster includes, and to determine the density of each cluster from its boundary points and the number of metric values it includes.
The difference processing subunit 5022 is configured to judge whether the density of each cluster exceeds the preset density, and to perform difference-enlarging processing on the metric values in the clusters whose density exceeds the preset density.
Further, difference processing subelement 5022 is specifically used for: expanding the boundary of cluster to be processed, wherein cluster to be processed It is greater than the cluster of pre-set density for density;Boundary before being expanded according to cluster to be processed and the boundary after expansion determine sampling factor;Root Expand the distance between each metric in cluster to be processed according to sampling factor.
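A hedged sketch of the density test and boundary expansion described above; the density threshold, the widening factor, and rescaling about the cluster centre are illustrative assumptions, since the patent leaves these parameters open:

```python
def expand_dense_cluster(cluster, density_threshold=5.0, widen=1.5):
    """Density = number of metric values / boundary width.

    If the cluster is denser than the threshold, widen its boundary and
    rescale each metric value about the cluster centre by the sampling
    factor (new width / old width), spreading the values apart.
    """
    lo, hi = min(cluster), max(cluster)
    width = hi - lo or 1e-9           # guard degenerate one-point clusters
    density = len(cluster) / width
    if density <= density_threshold:
        return list(cluster)          # sparse enough: leave unchanged
    new_width = width * widen         # expanded boundary
    factor = new_width / width        # sampling factor
    centre = (lo + hi) / 2
    return [centre + (v - centre) * factor for v in cluster]
```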
Optionally, the device 40 of this embodiment further includes a smoothing module 405, configured to apply Laplace smoothing to the second measurement value set.
Correspondingly, the feature selection module 404 is specifically configured to input the metric values in the Laplace-smoothed second measurement value set into the roulette model as the fitness of each candidate feature.
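Laplace smoothing here can be read as adding a small constant to every metric value so that zero-fitness features keep a non-zero selection probability in the roulette model; the constant `alpha` is an illustrative assumption:

```python
def laplace_smooth(values, alpha=1e-3):
    """Add a small constant to each metric value.

    This keeps features whose metric squared to (near) zero from being
    permanently excluded by fitness-proportionate selection.
    """
    return [v + alpha for v in values]
```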
The device for randomly selecting features of this embodiment is based on the same inventive concept as the methods for randomly selecting features of Embodiments One and Two, and achieves the same technical effects, which are not repeated here.
Embodiment Four
An embodiment of the present application provides an electronic device. As shown in Fig. 6, the electronic device 600 includes a processor 601 and a memory 603, the processor 601 being connected to the memory 603, for example via a bus 602. Optionally, the electronic device 600 may further include a transceiver 604. Note that in practical applications the transceiver 604 is not limited to one, and the structure of the electronic device 600 does not limit the embodiments of the present application.
The processor 601 is applied in the embodiments of the present application to realize the functions of the metric value determining module 401, the standardization module 402, the difference expansion module 403 and the feature selection module 404 shown in Fig. 4. The transceiver 604 includes a receiver and a transmitter.
The processor 601 may be a CPU, a general-purpose processor, a DSP, an ASIC, an FPGA or other programmable logic device, a transistor logic device, a hardware component, or any combination thereof. It may implement or execute the various illustrative logic blocks, modules and circuits described in the present disclosure. The processor 601 may also be a combination realizing computing functions, for example a combination of one or more microprocessors, or a combination of a DSP and a microprocessor.
The bus 602 may include a path for transferring information between the above components. The bus 602 may be a PCI bus, an EISA bus, etc., and may be divided into an address bus, a data bus, a control bus, etc. For ease of representation, only one thick line is shown in Fig. 6, but this does not mean that there is only one bus or only one type of bus.
The memory 603 may be a ROM or other static storage device capable of storing static information and instructions, a RAM or other dynamic storage device capable of storing information and instructions, an EEPROM, a CD-ROM or other optical disc storage (including compact discs, laser discs, digital versatile discs, Blu-ray discs, etc.), a magnetic disk storage medium or other magnetic storage device, or any other medium that can carry or store desired program code in the form of instructions or data structures and can be accessed by a computer, but is not limited thereto.
Optionally, the memory 603 stores the application code for executing the solution of the present application, and execution is controlled by the processor 601. The processor 601 executes the application code stored in the memory 603 to realize the actions of the device for randomly selecting features provided by the embodiment shown in Fig. 4.
Compared with the prior art, the electronic device provided by the embodiments of the present application expands the differences between the metric values of the features, and thereby the differences between their corresponding selection probabilities, so that the gap between the selection probabilities of high-metric and low-metric features becomes larger. While the randomness of feature selection is preserved, the probability that good features are selected is increased, so that the algorithm that finally uses the selected features can make full use of effective feature information, improving its accuracy.
Optionally, the processor 601 executes the application code stored in the memory 603 to realize the actions of the device for randomly selecting features provided by the embodiment shown in Fig. 5, which are not repeated here.
Embodiment Five
An embodiment of the present application provides a computer-readable storage medium on which a computer program is stored; when the program is executed by a processor, the method for randomly selecting features of Embodiment One is realized.
Compared with the prior art, the computer-readable storage medium provided by this embodiment expands the differences between the metric values of the features, and thereby the differences between their corresponding selection probabilities, so that the gap between the selection probabilities of high-metric and low-metric features becomes larger. While the randomness of feature selection is preserved, the probability that good features are selected is increased, so that the algorithm that finally uses the selected features can make full use of effective feature information, improving its accuracy.
Optionally, an embodiment of the present application also provides a computer-readable storage medium storing a computer program; when the program is executed by a processor, the method for randomly selecting features of Embodiment Two is realized, which is not repeated here.
It should be understood that although the steps in the flowcharts of the drawings are shown in the order indicated by the arrows, they are not necessarily executed in that order. Unless expressly stated herein, there is no strict restriction on the order of their execution, and they may be executed in other orders. Moreover, at least some of the steps in the flowcharts may include multiple sub-steps or stages, which are not necessarily completed at the same moment but may be executed at different times; their execution order is also not necessarily sequential, and they may be executed in turn or alternately with other steps or with at least part of the sub-steps or stages of other steps.
The above are only some embodiments of the present application. It should be noted that persons of ordinary skill in the art may make several improvements and modifications without departing from the principles of the present application, and these improvements and modifications should also be regarded as falling within the protection scope of the present application.

Claims (10)

1. A method for randomly selecting features, characterized by comprising:
determining a metric value for each candidate feature to obtain a first measurement value set;
normalizing the first measurement value set;
expanding, by a preset algorithm, the differences between the metric values in the normalized first measurement value set to obtain a second measurement value set;
inputting the metric values in the second measurement value set into a roulette model as the fitness of each candidate feature, and taking the feature output by the roulette model as the selected feature.
2. The method according to claim 1, wherein normalizing the first measurement value set comprises:
performing min-max normalization on the first measurement value set.
3. The method according to claim 1, wherein expanding, by a preset algorithm, the differences between the metric values in the normalized first measurement value set to obtain a second measurement value set comprises:
squaring each metric value in the normalized first measurement value set to expand the differences between the metric values, obtaining the second measurement value set.
4. The method according to claim 1, wherein expanding, by a preset algorithm, the differences between the metric values in the normalized first measurement value set to obtain a second measurement value set comprises:
clustering the metric values in the normalized first measurement value set to obtain multiple clusters, each cluster containing at least one metric value;
applying difference-expanding processing to the metric values in each cluster according to a preset strategy to obtain the second measurement value set.
5. The method according to claim 4, wherein applying difference-expanding processing to the metric values in each cluster according to a preset strategy comprises:
determining the boundary points of each cluster and the number of metric values each cluster contains;
determining the density of each cluster from its boundary points and the number of metric values it contains;
judging whether the density of each cluster exceeds a preset density, and applying difference-expanding processing to the metric values in clusters whose density exceeds the preset density.
6. The method according to claim 5, wherein applying difference-expanding processing to the metric values in a cluster whose density exceeds the preset density comprises:
expanding the boundary of a cluster to be processed, wherein the cluster to be processed is a cluster whose density exceeds the preset density;
determining a sampling factor from the boundary of the cluster to be processed before expansion and the boundary after expansion;
expanding the distances between the metric values in the cluster to be processed according to the sampling factor.
7. The method according to any one of claims 1 to 6, wherein before inputting the metric values in the second measurement value set into the roulette model as the fitness of each candidate feature, the method further comprises: applying Laplace smoothing to the second measurement value set;
and wherein inputting the metric values in the second measurement value set into the roulette model as the fitness of each candidate feature comprises: inputting the metric values in the Laplace-smoothed second measurement value set into the roulette model as the fitness of each candidate feature.
8. A device for randomly selecting features, characterized by comprising:
a metric value determining module, configured to determine a metric value for each candidate feature to obtain a first measurement value set;
a standardization module, configured to normalize the first measurement value set;
a difference expansion module, configured to expand, by a preset algorithm, the differences between the metric values in the normalized first measurement value set to obtain a second measurement value set;
a feature selection module, configured to input the metric values in the second measurement value set into a roulette model as the fitness of each candidate feature, and to take the feature output by the roulette model as the selected feature.
9. An electronic device, characterized by comprising:
one or more processors;
a memory;
one or more application programs, wherein the one or more application programs are stored in the memory and configured to be executed by the one or more processors, the one or more application programs being configured to execute the method for randomly selecting features according to any one of claims 1 to 7.
10. A computer-readable storage medium, characterized in that the computer-readable storage medium is configured to store computer instructions which, when run on a computer, cause the computer to execute the method for randomly selecting features according to any one of claims 1 to 7.
CN201810892174.1A 2018-08-07 2018-08-07 Method, device, electronic equipment and storage medium for randomly selecting characteristics Active CN109255368B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810892174.1A CN109255368B (en) 2018-08-07 2018-08-07 Method, device, electronic equipment and storage medium for randomly selecting characteristics

Publications (2)

Publication Number Publication Date
CN109255368A true CN109255368A (en) 2019-01-22
CN109255368B CN109255368B (en) 2023-12-22

Family

ID=65049766

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810892174.1A Active CN109255368B (en) 2018-08-07 2018-08-07 Method, device, electronic equipment and storage medium for randomly selecting characteristics

Country Status (1)

Country Link
CN (1) CN109255368B (en)

Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080219498A1 (en) * 2007-03-05 2008-09-11 Siemens Corporate Research, Inc. Visual discrimination model for single image applications
CN101853485A (en) * 2010-06-04 2010-10-06 浙江工业大学 Non-uniform point cloud simplification processing method based on neighbor communication cluster type
CN101853526A (en) * 2010-06-04 2010-10-06 浙江工业大学 Density self-adapting non-uniform point cloud simplifying treatment method
CN103049750A (en) * 2013-01-11 2013-04-17 广州广电运通金融电子股份有限公司 Character recognition method
CN103942318A (en) * 2014-04-25 2014-07-23 湖南化工职业技术学院 Parallel AP propagating XML big data clustering integration method
CN105139035A (en) * 2015-08-31 2015-12-09 浙江工业大学 Mixed attribute data flow clustering method for automatically determining clustering center based on density
AU2014277847A1 (en) * 2014-12-22 2016-07-07 Canon Kabushiki Kaisha A method or computing device for configuring parameters of a feature extractor
CN106503731A (en) * 2016-10-11 2017-03-15 南京信息工程大学 A kind of based on conditional mutual information and the unsupervised feature selection approach of K means
CN106778814A (en) * 2016-11-24 2017-05-31 郑州航空工业管理学院 A kind of method of the removal SAR image spot based on projection spectral clustering
CN106874923A (en) * 2015-12-14 2017-06-20 阿里巴巴集团控股有限公司 A kind of genre classification of commodity determines method and device
CN107943830A (en) * 2017-10-20 2018-04-20 西安电子科技大学 A kind of data classification method suitable for higher-dimension large data sets
CN108038500A (en) * 2017-12-07 2018-05-15 东软集团股份有限公司 Clustering method, device, computer equipment, storage medium and program product

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109816034A (en) * 2019-01-31 2019-05-28 清华大学 Signal characteristic combines choosing method, device, computer equipment and storage medium
CN109816034B (en) * 2019-01-31 2021-08-27 清华大学 Signal characteristic combination selection method and device, computer equipment and storage medium
CN111782734A (en) * 2019-04-04 2020-10-16 华为技术服务有限公司 Data compression and decompression method and device
CN111782734B (en) * 2019-04-04 2024-04-12 华为技术服务有限公司 Data compression and decompression method and device
CN110223264A (en) * 2019-04-26 2019-09-10 中北大学 Image difference characteristic attribute fusion availability distributed structure and synthetic method based on intuition possibility collection
CN110223264B (en) * 2019-04-26 2022-03-25 中北大学 Image difference characteristic attribute fusion validity distribution structure based on intuition possibility set and synthesis method

Also Published As

Publication number Publication date
CN109255368B (en) 2023-12-22

Similar Documents

Publication Publication Date Title
CN107103171B (en) Modeling method and device of machine learning model
CN104239351B (en) A kind of training method and device of the machine learning model of user behavior
CN106646158A (en) Transformer fault diagnosis improving method based on multi-classification support vector machine
CN109255368A (en) Randomly select method, apparatus, electronic equipment and the storage medium of feature
CN108051660A (en) A kind of transformer fault combined diagnosis method for establishing model and diagnostic method
CN105224872A (en) 2016-01-06 A kind of user's anomaly detection method based on neural network clustering
CN111198938A (en) Sample data processing method, sample data processing device and electronic equipment
CN105354595A (en) Robust visual image classification method and system
Zhou et al. Convolutional neural networks based pornographic image classification
CN110334356A (en) Article matter method for determination of amount, article screening technique and corresponding device
CN108846097A (en) The interest tags representation method of user, article recommended method and device, equipment
CN110096499A (en) A kind of the user object recognition methods and system of Behavior-based control time series big data
CN103617146B (en) A kind of machine learning method and device based on hardware resource consumption
CN109918444A (en) Training/verifying/management method/system, medium and equipment of model result
CN106980639B (en) Short text data aggregation system and method
CN109272312A (en) Method and apparatus for transaction risk detecting real-time
CN109472307A (en) A kind of method and apparatus of training image disaggregated model
CN111626360B (en) Method, apparatus, device and storage medium for detecting boiler fault type
CN105224954A (en) A kind of topic discover method removing the impact of little topic based on Single-pass
CN116611911A (en) Credit risk prediction method and device based on support vector machine
CN109766333A (en) Data processing empty value method, apparatus and terminal device
CN112597699B (en) Social network rumor source identification method integrated with objective weighting method
CN109241146A (en) Student's intelligence aid method and system under cluster environment
CN108280224A (en) Ten thousand grades of dimension data generation methods, device, equipment and storage medium
CN104331398B (en) Generate the method and device of synonymous word alignment dictionary

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant