CN109635955A - Feature combination method, apparatus, and device - Google Patents

Feature combination method, apparatus, and device

Publication number
CN109635955A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201811430613.3A
Other languages
Chinese (zh)
Inventor
何博睿
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Integrity Information Co Ltd
Original Assignee
China Integrity Information Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Integrity Information Co Ltd
Priority to CN201811430613.3A
Publication of CN109635955A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/12 Computing arrangements based on biological models using genetic models
    • G06N3/126 Evolutionary algorithms, e.g. genetic algorithms or genetic programming

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biophysics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Theoretical Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Molecular Biology (AREA)
  • Biomedical Technology (AREA)
  • Genetics & Genomics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Physiology (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

Embodiments of the present invention provide a feature combination method, apparatus, and device. The method comprises: representing each feature with a one-hot code to obtain the one-hot feature corresponding to that feature; selecting features from the one-hot features and combining the selected features with a logical operation algorithm to obtain a first preset number of combined features; determining, among the obtained combined features, the features whose correlation degree with a target feature has an absolute value greater than a preset threshold, as first combined features; selecting a second preset number of features from the obtained combined features as second combined features; and performing crossover and mutation on the first combined features and the second combined features with a genetic algorithm to obtain new combined features, repeating the determination of first combined features until a preset termination condition is met, and taking the third preset number of combined features with the highest correlation degree among the first combined features determined in each round as the feature combination result. The scheme provided by the embodiments of the present invention can reduce the workload of modeling personnel.

Description

Feature combination method, apparatus, and device
Technical field
The present invention relates to the field of computer technology, and in particular to a feature combination method, apparatus, and device.
Background
In machine learning modeling, the features obtained through feature engineering and used for model training are the key factor in how well the model performs. Feature engineering plays a very important role in machine learning and generally comprises three parts: feature combination, feature extraction, and feature selection. Feature combination derives new features by combining existing ones, thereby increasing the number of features available for model training.
In practice, feature combination is usually done manually by modeling personnel based on experience. Manual combination, however, consumes a great deal of their time and effort and increases their workload. How to reduce the workload of modeling personnel during feature combination is therefore a problem that urgently needs to be solved.
Summary of the invention
The purpose of the embodiments of the present invention is to provide a feature combination method, apparatus, and device that reduce the workload of modeling personnel. The specific technical solutions are as follows:
In one aspect, an embodiment of the present invention provides a feature combination method, the method comprising:
obtaining the features to be combined;
representing each obtained feature with a one-hot code to obtain the one-hot feature corresponding to that feature;
selecting features from the one-hot features and combining the selected features with a preset logical operation algorithm to obtain a first preset number of combined features;
determining, among the obtained combined features, the features whose correlation degree with a target feature has an absolute value greater than a preset threshold, as first combined features;
selecting a second preset number of features from the obtained combined features as second combined features;
performing crossover and mutation on the first combined features and the second combined features with a genetic algorithm to obtain new combined features, and returning to the step of determining, among the obtained combined features, the features whose correlation degree with the target feature has an absolute value greater than the preset threshold, until a preset termination condition is met; and taking, as the output result, the third preset number of combined features with the highest correlation degree among the first combined features determined in each round.
Optionally, representing each obtained feature with a one-hot code comprises:
representing each obtained feature with a one-hot code as follows:
when the feature is a continuous feature, discretizing the feature by binning to obtain a discrete feature, and representing the resulting discrete feature with a one-hot code as the one-hot feature corresponding to the feature;
when the feature is a discrete feature, representing the feature directly with a one-hot code as the one-hot feature corresponding to the feature.
Optionally, the step of selecting features from the one-hot features and combining the selected features with a preset logical operation algorithm to obtain a first preset number of combined features comprises:
randomly selecting n features from the one-hot features, where n >= 2;
combining the selected features with an AND, OR, or XOR logical operation to obtain a combined feature;
returning to the step of randomly selecting n features from the one-hot features until the first preset number of combined features has been accumulated.
Optionally, after the step of determining, among the obtained combined features, the features whose correlation degree with the target feature has an absolute value greater than the preset threshold, as first combined features, the method further comprises:
deduplicating the first combined features;
and performing crossover and mutation on the first combined features and the second combined features with a genetic algorithm to obtain new combined features comprises:
performing crossover and mutation, with a genetic algorithm, on the deduplicated first combined features and the second combined features to obtain new combined features.
Optionally, the correlation degree is an information value (IV) or a correlation coefficient.
Optionally, the step of performing crossover and mutation on the first combined features and the second combined features with a genetic algorithm to obtain new combined features comprises:
selecting one combined feature from the first combined features and one combined feature from the second combined features;
performing crossover and mutation on the two selected combined features with a genetic algorithm to obtain a new combined feature;
returning to the step of selecting one combined feature from the first combined features and one combined feature from the second combined features, until the first combined features have been traversed.
In another aspect, an embodiment of the present invention further provides a feature combination apparatus, the apparatus comprising:
an obtaining module, configured to obtain the features to be combined;
a representation module, configured to represent each obtained feature with a one-hot code to obtain the one-hot feature corresponding to that feature;
a combination module, configured to select features from the one-hot features and combine the selected features with a preset logical operation algorithm to obtain a first preset number of combined features;
a first determining module, configured to determine, among the obtained combined features, the features whose correlation degree with a target feature has an absolute value greater than a preset threshold, as first combined features;
a selection module, configured to select a second preset number of features from the obtained combined features as second combined features;
a mutation module, configured to perform crossover and mutation on the first combined features and the second combined features with a genetic algorithm to obtain new combined features, to trigger the first determining module until a preset termination condition is met, and to take, as the output result, the third preset number of combined features with the highest correlation degree among the first combined features determined in each round.
Optionally, the representation module comprises:
a first representation submodule, configured to, when the feature is a continuous feature, discretize the feature by binning to obtain a discrete feature, and represent the resulting discrete feature with a one-hot code as the one-hot feature corresponding to the feature;
a second representation submodule, configured to, when the feature is a discrete feature, represent the feature directly with a one-hot code as the one-hot feature corresponding to the feature.
Optionally, the combination module comprises:
a first selection submodule, configured to randomly select n features from the one-hot features, where n >= 2;
a combination submodule, configured to combine the selected features with an AND, OR, or XOR logical operation to obtain a combined feature, and to trigger the first selection submodule until the first preset number of combined features has been accumulated.
Optionally, the apparatus further comprises:
a deduplication module, configured to deduplicate the first combined features;
the mutation module being specifically configured to perform crossover and mutation, with a genetic algorithm, on the deduplicated first combined features and the second combined features to obtain new combined features.
Optionally, the correlation degree is an information value (IV) or a correlation coefficient.
Optionally, the mutation module comprises:
a second selection submodule, configured to select one combined feature from the first combined features and one combined feature from the second combined features;
a mutation submodule, configured to perform crossover and mutation on the two selected combined features with a genetic algorithm to obtain a new combined feature, and to trigger the second selection submodule until the first combined features have been traversed.
In another aspect, an embodiment of the present invention further provides an electronic device comprising a processor, a communication interface, a memory, and a communication bus, wherein the processor, the communication interface, and the memory communicate with one another via the communication bus;
the memory is configured to store a computer program;
the processor is configured to implement any of the feature combination methods described above when executing the program stored in the memory.
In another aspect, an embodiment of the present invention further provides a computer-readable storage medium storing instructions that, when run on a computer, cause the computer to execute any of the feature combination methods described above.
In another aspect, an embodiment of the present invention further provides a computer program product comprising instructions that, when run on a computer, cause the computer to execute any of the feature combination methods described above.
The feature combination method, apparatus, and device provided by the embodiments of the present invention represent each obtained feature with a one-hot code to obtain the corresponding one-hot feature; select features from the one-hot features and combine them with a preset logical operation algorithm to obtain a first preset number of combined features; select, among the obtained combined features, those whose correlation degree with the target feature has an absolute value greater than a preset threshold, as first combined features; select a second preset number of features from the obtained combined features as second combined features, which undergo crossover and mutation with the first combined features under a genetic algorithm to produce new combined features; and continue selecting, among the obtained combined features, those whose correlation degree with the target feature has an absolute value greater than the preset threshold, until a preset termination condition is met. The first combined features determined in each round are then ranked by their correlation degree with the target feature, and the third preset number of combined features with the highest correlation degree are taken as the final feature combination result. With the scheme provided by the embodiments of the present invention, the obtained features can be combined automatically to yield a combination result, which reduces the workload of modeling personnel.
Brief description of the drawings
To explain the technical solutions of the embodiments of the present invention or of the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. The drawings described below are only some embodiments of the present invention; a person of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is a flowchart of a feature combination method provided by an embodiment of the present invention;
Fig. 2 is a schematic structural diagram of a feature combination apparatus provided by an embodiment of the present invention;
Fig. 3 is a schematic structural diagram of an electronic device provided by an embodiment of the present invention.
Detailed description
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings in the embodiments of the present invention. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
Referring to Fig. 1, which shows a flowchart of a feature combination method provided by an embodiment of the present invention, the method comprises:
S100: obtain the features to be combined.
Features generally include continuous features and discrete features. A continuous feature is one whose values are continuous, such as age or salary. A discrete feature is one whose values are not continuous, such as gender or color.
S110: represent each obtained feature with a one-hot code to obtain the one-hot feature corresponding to that feature.
A one-hot code is, intuitively, a code that uses as many bits as there are states, in which exactly one bit is 1 and all other bits are 0. One-hot encoding also has the effect of expanding the feature set.
To make it easier to combine features with a logical operation algorithm later, each obtained feature can be represented with a one-hot code.
A continuous feature must be discretized before it can be represented with a one-hot code. Therefore, when the feature is continuous, binning can be used to discretize it into a discrete feature, which is then represented with a one-hot code as the one-hot feature corresponding to the feature. For example, the continuous feature age can be discretized into three bins: juvenile (0-18), middle-aged (19-60), and old (over 60), with 001 representing juvenile, 010 representing middle-aged, and 100 representing old.
Binning is a common way to discretize a feature and also has a smoothing effect on the feature.
When the feature is discrete, it can be represented directly with a one-hot code as the one-hot feature corresponding to the feature. For example, the discrete feature gender has only two values, male and female, so male can be represented by 01 and female by 10.
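The encoding in S110 can be illustrated with a minimal Python sketch. The helper names (one_hot, encode_continuous, encode_discrete) and the AGE_BOUNDS constant are hypothetical and do not come from the patent; the bit order (first bin in the first position) is also an arbitrary choice and differs from the 001/010/100 ordering used in the example above.

```python
from bisect import bisect_left


def one_hot(index, size):
    """Return a list of `size` bits with a single 1 at position `index`."""
    return [1 if i == index else 0 for i in range(size)]


def encode_continuous(value, upper_bounds):
    """Bin a continuous value, then one-hot encode the bin index.

    `upper_bounds` holds the inclusive upper edge of every bin except the
    last one, so k bounds produce k + 1 bins.
    """
    bin_index = bisect_left(upper_bounds, value)
    return one_hot(bin_index, len(upper_bounds) + 1)


def encode_discrete(value, categories):
    """One-hot encode a discrete value by its position in `categories`."""
    return one_hot(categories.index(value), len(categories))


# Assumed bin edges for the age example: juvenile <= 18, middle-aged 19-60, old > 60.
AGE_BOUNDS = [18, 60]
```

Calling encode_continuous(45, AGE_BOUNDS) places the value in the middle-aged bin, and encode_discrete("female", ["male", "female"]) sets the second bit.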
S120: select features from the one-hot features and combine the selected features with a preset logical operation algorithm, accumulating a first preset number of combined features.
The first preset number can be set as needed, for example 10000 or 20000.
In one embodiment of the present invention, the preset logical operation algorithm may be an algorithm based on preset logical operators, where the preset logical operators may be AND, OR, XOR, and the like.
Specifically, n features can be randomly selected from the one-hot features, where n >= 2; the selected features are then combined with an AND, OR, or XOR logical operation to obtain a combined feature; and the random selection of n features and their combination with an AND, OR, or XOR operation are repeated until the first preset number of combined features has been accumulated.
Because a single round of randomly selecting n features from the one-hot features and combining them may yield fewer combined features than the first preset number, the process of randomly selecting n features and combining them must be repeated until the number of accumulated combined features reaches the first preset number.
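The random combination loop of S120 might look like the following Python sketch. It assumes that each one-hot feature is stored as a column of 0/1 values over the samples and that one operator is drawn at random per combination; the patent does not fix either detail, and the names combine and generate_combinations are hypothetical.

```python
import random
from functools import reduce
from operator import and_, or_, xor

# Assumed operator set, following the AND / OR / XOR operators named in the text.
OPERATORS = {"AND": and_, "OR": or_, "XOR": xor}


def combine(columns, op):
    """Apply a bitwise operator element-wise across several 0/1 columns."""
    return [reduce(op, bits) for bits in zip(*columns)]


def generate_combinations(one_hot_columns, first_preset_quantity, n=2, rng=None):
    """Accumulate combined features until the first preset quantity is reached.

    `one_hot_columns` maps a feature name to its 0/1 column. Each result is a
    tuple (chosen feature names, operator name, combined column).
    """
    rng = rng or random.Random()
    names = sorted(one_hot_columns)
    combos = []
    while len(combos) < first_preset_quantity:
        chosen = rng.sample(names, n)             # n distinct features, n >= 2
        op_name = rng.choice(sorted(OPERATORS))   # one operator per combination
        column = combine([one_hot_columns[c] for c in chosen], OPERATORS[op_name])
        combos.append((tuple(chosen), op_name, column))
    return combos
```

Because the draws are independent, the same combination can appear more than once, which is why the method deduplicates later.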
S130: determine, among the obtained combined features, the features whose correlation degree with the target feature has an absolute value greater than a preset threshold, as first combined features.
In one embodiment of the present invention, the target feature may be a preset feature. For example, to obtain combined features related to credit rating, credit rating is the target feature.
Specifically, after the combined features are obtained, the correlation degree between each combined feature and the target feature can be calculated, and the features whose correlation degree has an absolute value greater than the preset threshold are then selected from the obtained combined features as first combined features.
When the correlation degree is a correlation coefficient, it can be positive or negative, and a larger absolute value indicates that the feature discriminates better. A threshold can therefore be set to select the combined features that are needed. For example, when the correlation degree takes values in the range (-1, 1), the preset threshold can be set to 0.6.
In one implementation, a fitness function can be used to evaluate the correlation coefficient between each obtained combined feature and the target feature, and this coefficient serves as the correlation degree between that combined feature and the target feature.
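As a sketch of the threshold filter in S130, the Python below uses the Pearson correlation coefficient as the fitness function. The patent only says "correlation coefficient", so the choice of Pearson, the treatment of constant columns as uncorrelated, and the names pearson and select_first_combined are assumptions.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric columns."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # assumption: a constant column carries no signal
    return cov / sqrt(var_x * var_y)


def select_first_combined(columns, target, threshold=0.6):
    """Keep the combined features whose |correlation| with the target exceeds the threshold."""
    return {
        name: col
        for name, col in columns.items()
        if abs(pearson(col, target)) > threshold
    }
```

Taking the absolute value means that a strongly negatively correlated column passes the filter just as a strongly positively correlated one does.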
In another implementation, the information value (IV) of each obtained combined feature with respect to the target feature can be calculated and used as the correlation degree between that combined feature and the target feature.
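The patent does not give a formula for IV. The sketch below uses the common weight-of-evidence definition, IV = Σ (p_i − q_i) · ln(p_i / q_i), where p_i and q_i are the shares of positive and negative samples falling in level i of the feature. It assumes a binary 0/1 target and a binary combined feature, and the additive smoothing term eps is an assumption introduced only to avoid division by zero.

```python
from math import log


def information_value(feature, target, eps=0.5):
    """Information value of a 0/1 feature with respect to a 0/1 target.

    `eps` is an assumed smoothing constant added to every count so that an
    empty cell does not produce log(0) or a division by zero.
    """
    total_pos = sum(target)
    total_neg = len(target) - total_pos
    iv = 0.0
    for level in (0, 1):
        pos = sum(1 for f, t in zip(feature, target) if f == level and t == 1)
        neg = sum(1 for f, t in zip(feature, target) if f == level and t == 0)
        pos_rate = (pos + eps) / (total_pos + 2 * eps)
        neg_rate = (neg + eps) / (total_neg + 2 * eps)
        iv += (pos_rate - neg_rate) * log(pos_rate / neg_rate)
    return iv
```

Under this definition IV is never negative, so a feature and its bitwise complement receive the same score.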
Because features are selected at random from the one-hot features for combination, two selections may pick the same features, in which case combining them produces the same result. The first combined features may therefore contain identical features, so they need to be deduplicated; the deduplicated first combined features are then used in the subsequent crossover and mutation with the second combined features.
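A minimal deduplication sketch in Python follows. It assumes each combined feature is recorded as a tuple (feature names, operator name, column), a hypothetical layout, and treats two combinations as identical when they use the same set of features and the same operator, which is valid because AND, OR, and XOR are commutative.

```python
def deduplicate(combos):
    """Drop repeated combinations, keeping the first occurrence of each.

    Two combinations are considered equal when they use the same feature set
    (order ignored) and the same operator.
    """
    seen = set()
    unique = []
    for features, op_name, column in combos:
        key = (frozenset(features), op_name)
        if key not in seen:
            seen.add(key)
            unique.append((features, op_name, column))
    return unique
```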
S140: select a second preset number of features from the obtained combined features as second combined features.
The second preset number of features may be a subset of the obtained combined features, in which case the subset can be picked at random from the obtained combined features; the second preset number of features may also be all of the obtained combined features.
S150: perform crossover and mutation on the first combined features and the second combined features with a genetic algorithm to obtain new combined features, and return to S130 until a preset termination condition is met; then take, as the output result, the third preset number of combined features with the highest correlation degree among the first combined features determined in each round.
The preset termination condition may be that the number of combined features whose correlation degree exceeds the preset threshold reaches a fourth preset number, or that the number of iterations of determining the first combined features reaches a preset number of times. The fourth preset number and the preset number of times can be set according to how many combined features are needed. Reaching the fourth preset number allows the computation to stop early; the larger the preset number of times, the more combined features are finally obtained.
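The outer loop of S130 to S150, with both termination conditions, can be sketched in Python as below. The function evolve and its callable parameters score and breed are hypothetical placeholders for the correlation measure and the crossover-and-mutation step; candidates are assumed to be hashable, and the whole current population is used as the second combined features, which is one of the two options the text allows.

```python
def evolve(population, score, breed, threshold,
           fourth_preset_quantity, max_rounds, third_preset_quantity):
    """Iterate selection and breeding, then return the best candidates.

    score(candidate) -> correlation degree with the target feature.
    breed(first_set, second_set) -> list of new candidates.
    """
    collected = {}  # candidate -> |correlation degree|, gathered over all rounds
    for _ in range(max_rounds):                      # condition 2: preset number of rounds
        first_set = [c for c in population if abs(score(c)) > threshold]
        for candidate in first_set:
            collected[candidate] = abs(score(candidate))
        if len(collected) >= fourth_preset_quantity:  # condition 1: enough good features
            break
        population = breed(first_set, population)
    ranked = sorted(collected, key=collected.get, reverse=True)
    return ranked[:third_preset_quantity]
```

Keeping a dictionary of every first combined feature seen so far lets the final ranking draw on all rounds rather than only the last one.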
In one implementation, when performing crossover and mutation on the first combined features and the second combined features with a genetic algorithm, one combined feature can be selected from the first combined features and one from the second combined features; crossover and mutation are then performed on the two selected combined features with the genetic algorithm to obtain a new combined feature; and the selection of one combined feature from each set and the crossover and mutation of the two selected combined features are repeated until the first combined features have been traversed.
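The patent does not specify how a combined feature is encoded as a chromosome or which crossover and mutation operators are used. The Python sketch below is one possible reading under stated assumptions: a combined feature is a pair (feature names, operator name), crossover is single-point on the name lists, and mutation replaces a name or the operator with a small probability. All names here (crossover, mutate, breed, OPERATOR_NAMES, rate) are hypothetical.

```python
import random

OPERATOR_NAMES = ("AND", "OR", "XOR")  # assumed operator set


def crossover(parent_a, parent_b, rng):
    """Single-point crossover on the feature-name lists (each parent needs >= 2 names)."""
    feats_a, op_a = parent_a
    feats_b, op_b = parent_b
    cut = rng.randint(1, min(len(feats_a), len(feats_b)) - 1)
    child_feats = list(feats_a[:cut]) + list(feats_b[cut:])
    return child_feats, rng.choice((op_a, op_b))  # operator inherited from either parent


def mutate(child, feature_pool, rng, rate=0.1):
    """With probability `rate`, swap each feature name and the operator for a random one."""
    feats, op = child
    feats = [rng.choice(feature_pool) if rng.random() < rate else f for f in feats]
    if rng.random() < rate:
        op = rng.choice(OPERATOR_NAMES)
    return feats, op


def breed(first_set, second_set, feature_pool, rng=None, rate=0.1):
    """Traverse the first combined features, pairing each with a random second one."""
    rng = rng or random.Random()
    children = []
    for parent_a in first_set:
        parent_b = rng.choice(second_set)
        feats, op = mutate(crossover(parent_a, parent_b, rng), feature_pool, rng, rate)
        feats = list(dict.fromkeys(feats))  # drop repeated names, keep order
        if len(feats) >= 2:                 # a combination needs at least two features
            children.append((tuple(feats), op))
    return children
```

Each child's column would then be recomputed from its feature names and operator before its correlation degree with the target feature is evaluated in the next round.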
In each scheme provided by the embodiments of the present invention, the feature combination method can automatically combine the obtained features to yield a combination result, which reduces the workload of modeling personnel.
Referring to Fig. 2, which shows a schematic structural diagram of a feature combination apparatus provided by an embodiment of the present invention, the apparatus comprises:
an obtaining module 200, configured to obtain the features to be combined;
a representation module 210, configured to represent each obtained feature with a one-hot code to obtain the one-hot feature corresponding to that feature;
a combination module 220, configured to select features from the one-hot features and combine the selected features with a preset logical operation algorithm to obtain a first preset number of combined features;
a first determining module 230, configured to determine, among the obtained combined features, the features whose correlation degree with a target feature has an absolute value greater than a preset threshold, as first combined features;
a selection module 240, configured to select a second preset number of features from the obtained combined features as second combined features;
a mutation module 250, configured to perform crossover and mutation on the first combined features and the second combined features with a genetic algorithm to obtain new combined features, to trigger the first determining module 230 until a preset termination condition is met, and to take, as the output result, the third preset number of combined features with the highest correlation degree among the first combined features determined in each round.
In one implementation of the embodiments of the present invention, the representation module 210 comprises:
a first representation submodule, configured to, when the feature is a continuous feature, discretize the feature by binning to obtain a discrete feature, and represent the resulting discrete feature with a one-hot code as the one-hot feature corresponding to the feature;
a second representation submodule, configured to, when the feature is a discrete feature, represent the feature directly with a one-hot code as the one-hot feature corresponding to the feature.
In one implementation of the embodiments of the present invention, the combination module 220 comprises:
a first selection submodule, configured to randomly select n features from the one-hot features, where n >= 2;
a combination submodule, configured to combine the selected features with an AND, OR, or XOR logical operation to obtain a combined feature, and to trigger the first selection submodule until the first preset number of combined features has been accumulated.
In one implementation of the embodiments of the present invention, the apparatus further comprises:
a deduplication module, configured to deduplicate the first combined features;
the mutation module 250 being specifically configured to perform crossover and mutation, with a genetic algorithm, on the deduplicated first combined features and the second combined features to obtain new combined features.
In one implementation of the embodiments of the present invention, the correlation degree is an information value (IV) or a correlation coefficient.
In one implementation of the embodiments of the present invention, the mutation module 250 comprises:
a second selection submodule, configured to select one combined feature from the first combined features and one combined feature from the second combined features;
a mutation submodule, configured to perform crossover and mutation on the two selected combined features with a genetic algorithm to obtain a new combined feature, and to trigger the second selection submodule until the first combined features have been traversed.
In each scheme provided by the embodiments of the present invention, the feature combination apparatus can automatically combine the obtained features to yield a combination result, which reduces the workload of modeling personnel.
The embodiment of the invention also provides a kind of electronic equipment, as shown in figure 3, include processor 001, communication interface 002, Memory 003 and communication bus 004, wherein processor 001, communication interface 002, memory 003 are complete by communication bus 004 At mutual communication,
Memory 003, for storing computer program;
Processor 001 when for executing the program stored on memory 003, realizes spy provided in an embodiment of the present invention Levy combined method.
Specifically, features described above combined method includes:
Obtain the feature for carrying out feature combination;
Acquired each feature is indicated using one-hot encoding, as the corresponding one-hot encoding feature of each feature;
The selected characteristic from one-hot encoding feature carries out feature group to selected feature using preset logical operation algorithm It closes, obtains the first preset quantity assemblage characteristic;
Determine that degree of correlation absolute value is greater than the feature of preset threshold between target signature in obtained assemblage characteristic, as First assemblage characteristic;
The second preset quantity feature is chosen from obtained assemblage characteristic, as the second assemblage characteristic;
Using genetic algorithm, cross and variation is carried out to the first assemblage characteristic and the second assemblage characteristic, it is special to obtain new combination Sign returns in the assemblage characteristic that the determination obtains the feature that between target signature degree of correlation absolute value is greater than preset threshold Step, until meet preset termination condition, and by the degree of correlation in the first assemblage characteristic determined each time highest the Three preset quantity assemblage characteristics are as output result.
It should be noted that the other embodiments of the feature combination method implemented by the processor 001 executing the program stored in the memory 003 are the same as the embodiments provided in the foregoing method embodiment section, and are not repeated here.
In each scheme provided by the embodiments of the present invention, the electronic device can automatically combine the acquired features to obtain a combination result, thereby reducing the workload of modeling personnel.
The communication bus mentioned for the above electronic device may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The communication bus may be divided into an address bus, a data bus, a control bus and so on. For ease of representation, only one thick line is used in the figure, but this does not mean that there is only one bus or only one type of bus.
The communication interface is used for communication between the above electronic device and other devices.
The memory may include a random access memory (Random Access Memory, RAM), and may also include a non-volatile memory (Non-Volatile Memory, NVM), for example at least one disk memory. Optionally, the memory may also be at least one storage device located remotely from the aforementioned processor.
The above processor may be a general-purpose processor, including a central processing unit (Central Processing Unit, CPU), a network processor (Network Processor, NP) and the like; it may also be a digital signal processor (Digital Signal Processor, DSP), an application-specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field-programmable gate array (Field-Programmable Gate Array, FPGA) or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
In another embodiment provided by the present invention, a computer-readable storage medium is further provided. The computer-readable storage medium stores instructions which, when run on a computer, cause the computer to execute the feature combination method provided by the embodiments of the present invention.
Specifically, the above feature combination method includes:
Obtaining the features to be combined;
Representing each obtained feature with a one-hot code, as the one-hot feature corresponding to that feature;
Selecting features from the one-hot features and combining the selected features using a preset logical operation algorithm, to obtain a first preset number of combined features;
Determining, among the obtained combined features, the features whose absolute correlation with a target feature is greater than a preset threshold, as first combined features;
Selecting a second preset number of features from the obtained combined features, as second combined features;
Performing crossover and mutation on the first combined features and the second combined features using a genetic algorithm to obtain new combined features, and returning to the step of determining, among the obtained combined features, the features whose absolute correlation with the target feature is greater than the preset threshold, until a preset termination condition is met; and taking, as the output result, a third preset number of combined features having the highest correlation among the first combined features determined in each iteration.
It should be noted that the other embodiments of the feature combination method implemented by the above computer-readable storage medium are the same as the embodiments provided in the foregoing method embodiment section, and are not repeated here.
In each scheme provided by the embodiments of the present invention, by running the instructions stored in the above computer-readable storage medium, the acquired features can be combined automatically to obtain a combination result, thereby reducing the workload of modeling personnel.
In another embodiment provided by the present invention, a computer program product containing instructions is further provided which, when run on a computer, implements the feature combination method provided by the embodiments of the present invention.
Specifically, the above feature combination method includes:
Obtaining the features to be combined;
Representing each obtained feature with a one-hot code, as the one-hot feature corresponding to that feature;
Selecting features from the one-hot features and combining the selected features using a preset logical operation algorithm, to obtain a first preset number of combined features;
Determining, among the obtained combined features, the features whose absolute correlation with a target feature is greater than a preset threshold, as first combined features;
Selecting a second preset number of features from the obtained combined features, as second combined features;
Performing crossover and mutation on the first combined features and the second combined features using a genetic algorithm to obtain new combined features, and returning to the step of determining, among the obtained combined features, the features whose absolute correlation with the target feature is greater than the preset threshold, until a preset termination condition is met; and taking, as the output result, a third preset number of combined features having the highest correlation among the first combined features determined in each iteration.
It should be noted that the other embodiments of the feature combination method implemented by the above computer program product are the same as the embodiments provided in the foregoing method embodiment section, and are not repeated here.
In each scheme provided by the embodiments of the present invention, by running the above computer program product containing instructions, the acquired features can be combined automatically to obtain a combination result, thereby reducing the workload of modeling personnel.
It should be noted that, in this document, relational terms such as "first" and "second" are used merely to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between these entities or operations. Moreover, the terms "include", "comprise" or any other variant thereof are intended to cover a non-exclusive inclusion, so that a process, method, article or device that includes a series of elements includes not only those elements but also other elements not explicitly listed, or further includes elements inherent to such a process, method, article or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the existence of other identical elements in the process, method, article or device that includes the element.
The embodiments in this specification are described in a related manner; for the same or similar parts among the embodiments, reference may be made to one another, and each embodiment focuses on its differences from the other embodiments. In particular, the apparatus, electronic device, computer-readable storage medium and computer program product embodiments are described relatively briefly because they are substantially similar to the method embodiments; for relevant details, refer to the description of the method embodiments.
The foregoing describes merely preferred embodiments of the present invention and is not intended to limit the protection scope of the present invention. Any modification, equivalent replacement or improvement made within the spirit and principles of the present invention shall fall within the protection scope of the present invention.

Claims (14)

1. A feature combination method, characterized in that the method comprises:
Obtaining the features to be combined;
Representing each obtained feature with a one-hot code, as the one-hot feature corresponding to that feature;
Selecting features from the one-hot features and combining the selected features using a preset logical operation algorithm, to obtain a first preset number of combined features;
Determining, among the obtained combined features, the features whose absolute correlation with a target feature is greater than a preset threshold, as first combined features;
Selecting a second preset number of features from the obtained combined features, as second combined features;
Performing crossover and mutation on the first combined features and the second combined features using a genetic algorithm to obtain new combined features, and returning to the step of determining, among the obtained combined features, the features whose absolute correlation with the target feature is greater than the preset threshold, until a preset termination condition is met; and taking, as the output result, a third preset number of combined features having the highest correlation among the first combined features determined in each iteration.
2. The method according to claim 1, characterized in that the representing each obtained feature with a one-hot code comprises:
Representing each obtained feature with a one-hot code in the following manner:
Where the feature is a continuous feature, classifying the feature by a binning method to obtain a discrete feature, and representing the obtained discrete feature with a one-hot code, as the one-hot feature corresponding to the feature;
Where the feature is a discrete feature, directly representing the feature with a one-hot code, as the one-hot feature corresponding to the feature.
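The two encoding cases above can be illustrated with a short sketch. The source only says "binning"; equal-width bins, the default of four bins, and the names `one_hot` and `encode_feature` are assumptions made for this example.

```python
from typing import List, Sequence

def one_hot(codes: Sequence[int], n_categories: int) -> List[List[int]]:
    """Return one 0/1 column per category; column k is 1 where the code equals k."""
    return [[1 if c == k else 0 for c in codes] for k in range(n_categories)]

def encode_feature(values: Sequence, is_continuous: bool, n_bins: int = 4) -> List[List[int]]:
    """Encode one raw feature as a list of one-hot columns."""
    if is_continuous:
        # Continuous feature: equal-width binning (an assumption), then one-hot.
        lo, hi = min(values), max(values)
        width = (hi - lo) / n_bins or 1.0  # fall back to 1.0 for a constant feature
        codes = [min(int((v - lo) / width), n_bins - 1) for v in values]
        return one_hot(codes, n_bins)
    # Discrete feature: map each distinct value to a code and encode directly.
    categories = sorted(set(values))
    index = {cat: k for k, cat in enumerate(categories)}
    return one_hot([index[v] for v in values], len(categories))
```

For instance, `encode_feature(["a", "b", "a"], is_continuous=False)` yields one column for "a" and one for "b"; each column can then be treated as a separate one-hot feature in the combination step.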
3. The method according to claim 1, characterized in that the step of selecting features from the one-hot features and combining the selected features using a preset logical operation algorithm, to obtain a first preset number of combined features, comprises:
Randomly selecting n features from the one-hot features, where n >= 2;
Combining the selected features using an AND, OR or XOR logical operation algorithm to obtain a combined feature;
Returning to the step of randomly selecting n features from the one-hot features, until the first preset number of combined features has been accumulated.
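A minimal sketch of this random combination step is given below. It assumes that a single operator is applied across all n selected columns and that the operator itself is chosen at random; the source does not specify either point. The names `combine`, `random_combinations` and `LOGIC_OPS` are hypothetical.

```python
import random
from functools import reduce
from operator import and_, or_, xor
from typing import Dict, List, Optional, Tuple

LOGIC_OPS = {"and": and_, "or": or_, "xor": xor}

def combine(columns: List[List[int]], op_name: str) -> List[int]:
    """Fold the chosen 0/1 columns row by row with one logical operator."""
    op = LOGIC_OPS[op_name]
    return [reduce(op, row) for row in zip(*columns)]

def random_combinations(
    one_hot: Dict[str, List[int]],
    count: int,
    n: int = 2,
    rng: Optional[random.Random] = None,
) -> List[Tuple[Tuple[str, ...], str, List[int]]]:
    """Accumulate `count` combined features from randomly chosen one-hot columns."""
    rng = rng or random.Random()
    names = sorted(one_hot)
    results = []
    while len(results) < count:
        chosen = tuple(rng.sample(names, n))      # n distinct one-hot features
        op_name = rng.choice(sorted(LOGIC_OPS))   # assumption: operator picked at random
        values = combine([one_hot[c] for c in chosen], op_name)
        results.append((chosen, op_name, values))
    return results
```

Each result records which columns were combined and with which operator, so that a combined feature can be reproduced on new data.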
4. The method according to claim 1, characterized in that, after the step of determining, among the obtained combined features, the features whose absolute correlation with the target feature is greater than the preset threshold, as the first combined features, the method further comprises:
Performing deduplication on the first combined features;
The performing crossover and mutation on the first combined features and the second combined features using a genetic algorithm to obtain new combined features comprises:
Performing, using the genetic algorithm, crossover and mutation on the deduplicated first combined features and the second combined features to obtain new combined features.
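The deduplication step can be sketched as below. The source does not define when two combined features count as duplicates; this example assumes they are duplicates when their 0/1 value columns are identical, and the name `deduplicate` is hypothetical.

```python
from typing import Dict, List

def deduplicate(features: Dict[str, List[int]]) -> Dict[str, List[int]]:
    """Keep the first combined feature for each distinct value column."""
    seen = set()
    unique: Dict[str, List[int]] = {}
    for name, column in features.items():
        key = tuple(column)  # assumption: identical columns mean duplicate features
        if key not in seen:
            seen.add(key)
            unique[name] = column
    return unique
```

Deduplicating before crossover avoids breeding the same parent several times, which keeps the number of evaluations per iteration down.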
5. The method according to claim 1, characterized in that the correlation is: an information value (IV) or a correlation coefficient.
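Both correlation measures can be computed for a 0/1 combined feature against a 0/1 target as sketched below. The choice of the Pearson coefficient as the correlation coefficient, the additive smoothing term `eps` in the IV calculation, and the function names are assumptions made for this example.

```python
import math
from typing import Sequence

def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)

def information_value(feature: Sequence[int], target: Sequence[int], eps: float = 0.5) -> float:
    """IV of a binary feature: sum over its two levels of (p - q) * ln(p / q),
    where p and q are the smoothed shares of positives and negatives at that level."""
    pos_total = sum(target)
    neg_total = len(target) - pos_total
    iv = 0.0
    for level in (0, 1):
        pos = sum(1 for f, t in zip(feature, target) if f == level and t == 1)
        neg = sum(1 for f, t in zip(feature, target) if f == level and t == 0)
        p = (pos + eps) / (pos_total + 2 * eps)  # eps avoids log(0); an assumption
        q = (neg + eps) / (neg_total + 2 * eps)
        iv += (p - q) * math.log(p / q)
    return iv
```

The IV is never negative, so taking its absolute value changes nothing, whereas the correlation coefficient is signed and the threshold test uses its absolute value.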
6. The method according to any one of claims 1 to 5, characterized in that the step of performing crossover and mutation on the first combined features and the second combined features using a genetic algorithm to obtain new combined features comprises:
Selecting one combined feature from the first combined features and one from the second combined features;
Performing crossover and mutation on the two selected combined features using the genetic algorithm to obtain a new combined feature;
Returning to the step of selecting one combined feature from the first combined features and one from the second combined features, until the first combined features have been traversed.
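The pairing and traversal in this step can be sketched as follows. The source does not state how a combined feature is encoded for crossover; this example assumes a hypothetical genome made of the combined one-hot column names plus one operator, single-point crossover on the names, and a per-gene mutation rate. `breed`, `crossover`, `mutate` and `rate` are illustrative names.

```python
import random
from typing import List, Optional, Sequence, Tuple

Genome = Tuple[Tuple[str, ...], str]  # hypothetical: (one-hot column names, operator)
OPS = ("and", "or", "xor")

def crossover(a: Genome, b: Genome, rng: random.Random) -> Genome:
    """Single-point crossover on the column names; operator taken from either parent."""
    min_len = min(len(a[0]), len(b[0]))
    cut = rng.randint(1, max(1, min_len - 1))
    # The child may repeat a column name; that is degenerate but still a valid feature.
    return a[0][:cut] + b[0][cut:], rng.choice((a[1], b[1]))

def mutate(g: Genome, pool: Sequence[str], rng: random.Random, rate: float) -> Genome:
    """With probability `rate`, replace each column name or the operator at random."""
    names = tuple(rng.choice(pool) if rng.random() < rate else name for name in g[0])
    op = rng.choice(OPS) if rng.random() < rate else g[1]
    return names, op

def breed(first: List[Genome], second: List[Genome], pool: Sequence[str],
          rng: Optional[random.Random] = None, rate: float = 0.1) -> List[Genome]:
    """Traverse the first combined features, pairing each with a random second one."""
    rng = rng or random.Random()
    children = []
    for parent_a in first:
        parent_b = rng.choice(second)
        children.append(mutate(crossover(parent_a, parent_b, rng), pool, rng, rate))
    return children
```

One child is produced per first combined feature, so the number of new combined features per iteration equals the number of features that passed the correlation threshold.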
7. A feature combination apparatus, characterized in that the apparatus comprises:
An obtaining module, configured to obtain the features to be combined;
A representation module, configured to represent each obtained feature with a one-hot code, as the one-hot feature corresponding to that feature;
A combination module, configured to select features from the one-hot features and combine the selected features using a preset logical operation algorithm, to obtain a first preset number of combined features;
A first determining module, configured to determine, among the obtained combined features, the features whose absolute correlation with a target feature is greater than a preset threshold, as first combined features;
A selection module, configured to select a second preset number of features from the obtained combined features, as second combined features;
A mutation module, configured to perform crossover and mutation on the first combined features and the second combined features using a genetic algorithm to obtain new combined features, and to trigger the first determining module, until a preset termination condition is met; and to take, as the output result, a third preset number of combined features having the highest correlation among the first combined features determined in each iteration.
8. The apparatus according to claim 7, characterized in that the representation module comprises:
A first representation submodule, configured to, where the feature is a continuous feature, classify the feature by a binning method to obtain a discrete feature, and represent the obtained discrete feature with a one-hot code, as the one-hot feature corresponding to the feature;
A second representation submodule, configured to, where the feature is a discrete feature, directly represent the feature with a one-hot code, as the one-hot feature corresponding to the feature.
9. The apparatus according to claim 7, characterized in that the combination module comprises:
A first selection submodule, configured to randomly select n features from the one-hot features, where n >= 2;
A combination submodule, configured to combine the selected features using an AND, OR or XOR logical operation algorithm to obtain a combined feature, and to trigger the first selection submodule, until the first preset number of combined features has been accumulated.
10. The apparatus according to claim 7, characterized in that the apparatus further comprises:
A deduplication module, configured to perform deduplication on the first combined features;
The mutation module is specifically configured to perform, using the genetic algorithm, crossover and mutation on the deduplicated first combined features and the second combined features to obtain new combined features.
11. The apparatus according to claim 7, characterized in that the correlation is: an information value (IV) or a correlation coefficient.
12. The apparatus according to any one of claims 7 to 11, characterized in that the mutation module comprises:
A second selection submodule, configured to select one combined feature from the first combined features and one from the second combined features;
A mutation submodule, configured to perform crossover and mutation on the two selected combined features using the genetic algorithm to obtain a new combined feature, and to trigger the second selection submodule, until the first combined features have been traversed.
13. An electronic device, characterized by comprising a processor, a communication interface, a memory and a communication bus, wherein the processor, the communication interface and the memory communicate with one another through the communication bus;
The memory is configured to store a computer program;
The processor is configured to implement the method steps of any one of claims 1 to 6 when executing the program stored in the memory.
14. A computer-readable storage medium, characterized in that the computer-readable storage medium stores instructions which, when run on a computer, cause the computer to implement the method steps of any one of claims 1 to 6.
CN201811430613.3A 2018-11-28 2018-11-28 A kind of feature combination method, device and equipment Pending CN109635955A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811430613.3A CN109635955A (en) 2018-11-28 2018-11-28 A kind of feature combination method, device and equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811430613.3A CN109635955A (en) 2018-11-28 2018-11-28 A kind of feature combination method, device and equipment

Publications (1)

Publication Number Publication Date
CN109635955A true CN109635955A (en) 2019-04-16

Family

ID=66069782

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811430613.3A Pending CN109635955A (en) 2018-11-28 2018-11-28 A kind of feature combination method, device and equipment

Country Status (1)

Country Link
CN (1) CN109635955A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110717182A (en) * 2019-10-14 2020-01-21 杭州安恒信息技术股份有限公司 Webpage Trojan horse detection method, device and equipment and readable storage medium
US11062792B2 (en) 2017-07-18 2021-07-13 Analytics For Life Inc. Discovering genomes to use in machine learning techniques
US11139048B2 (en) 2017-07-18 2021-10-05 Analytics For Life Inc. Discovering novel features to use in machine learning techniques, such as machine learning techniques for diagnosing medical conditions

Similar Documents

Publication Publication Date Title
JP6771751B2 (en) Risk assessment method and system
CN104067282B (en) Counter operation in state machine lattice
CN109635955A (en) A kind of feature combination method, device and equipment
KR102125119B1 (en) Data handling method and device
CN113792423B (en) Digital twin behavior constraint method and system for TPM equipment management
CN111291643B (en) Video multi-label classification method, device, electronic equipment and storage medium
WO2022083093A1 (en) Probability calculation method and apparatus in graph, computer device and storage medium
US10956470B2 (en) Facet-based query refinement based on multiple query interpretations
CN108460056A (en) Method for converting effective graphic elements of DXF file into JSON data
CN111783514A (en) Face analysis method, face analysis device and computer-readable storage medium
WO2017039684A1 (en) Classifier
CN108197203A (en) A kind of shop front head figure selection method, device, server and storage medium
CN105159927B (en) Method and device for selecting subject term of target text and terminal
Yan et al. A clustering algorithm for multi-modal heterogeneous big data with abnormal data
Shi et al. Segmentation quality evaluation based on multi-scale convolutional neural networks
CN110334104A (en) A kind of list update method, device, electronic equipment and storage medium
CN107748801A (en) News recommends method, apparatus, terminal device and computer-readable recording medium
CN110019763A (en) Text filtering method, system, equipment and computer readable storage medium
CN104809302B (en) Resource share method and its system in RTL circuit synthesis
CN110297578A (en) Method and device for processing partial data in mass data in batch and electronic equipment
CN110309257A (en) A kind of file read-write deployment method and device
CN110019383A (en) A kind of association rule mining method, device and computer readable storage medium
CN109783052B (en) Data sorting method, device, server and computer readable storage medium
CN109614542B (en) Public number recommendation method, device, computer equipment and storage medium
CN113536859A (en) Behavior recognition model training method, recognition method, device and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20190416