CN112148764A - Feature screening method, device, equipment and storage medium - Google Patents

Feature screening method, device, equipment and storage medium

Info

Publication number
CN112148764A
Authority
CN
China
Prior art keywords
feature
type
mutual information
samples
stability
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910576711.6A
Other languages
Chinese (zh)
Other versions
CN112148764B (en)
Inventor
王倩
徐晓飞
杨海华
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN201910576711.6A
Publication of CN112148764A
Application granted
Publication of CN112148764B
Legal status: Active
Anticipated expiration

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20: Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24: Querying
    • G06F16/245: Query processing
    • G06F16/2458: Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2477: Temporal data queries
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02P: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P90/00: Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P90/30: Computing systems specially adapted for manufacturing

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Fuzzy Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Complex Calculations (AREA)

Abstract

The application provides a feature screening method, device, equipment and storage medium. In the scheme, an electronic device obtains a plurality of samples to be screened, each sample comprising at least one type of feature; obtains the mutual information and coverage of each type of feature in different time periods according to a preset time interval; obtains a stability index for each type of feature from its mutual information and coverage in each time period; and screens the features in the samples according to the stability index of each type of feature. Selecting features by calculating dynamic indices that measure stability across time periods can effectively improve the modeling effect and the model accuracy.

Description

Feature screening method, device, equipment and storage medium
Technical Field
The embodiment of the application relates to the technical field of big data, in particular to a feature screening method, a feature screening device, feature screening equipment and a storage medium.
Background
With the development of big data technology, mass data is continuously enriched, more and more data information is applied to machine learning application, and the data information is required to be used for model construction. Machine learning can be used in a variety of scenarios, such as recommendation systems, search systems, etc. in the internet, and in order to train a more accurate model, it is necessary to identify various indexes of features in the acquired data and to screen the features.
In the prior art, data information includes many features, and mutual information and coverage rate are important indexes for feature selection, so a common method is to screen features by calculating the mutual information and coverage rate of each feature, and then defining a threshold according to the magnitude of a value.
However, data used as features often has a distribution that is unstable over time, and features with a temporally unstable distribution are difficult to screen out, so models trained on such features perform poorly.
Disclosure of Invention
The embodiments of the application provide a feature screening method, device, equipment and storage medium, which aim to solve the problem that features with a temporally unstable distribution are difficult to screen out, so that models trained on such features perform poorly.
In a first aspect, the present application provides a method for screening features, the method comprising:
obtaining a plurality of samples to be screened, wherein each sample comprises at least one type of feature;
acquiring mutual information and coverage rate of each type of feature in different time periods according to a preset time interval;
acquiring the stability index of each type of feature according to the mutual information and the coverage rate of each type of feature in each time period;
and screening the features in the plurality of samples according to the stability index of each type of feature.
In a specific embodiment, each sample further includes time information of a feature, and the obtaining mutual information and coverage of each type of feature in different time periods according to a preset time interval includes:
dividing each sample into sub-samples of a plurality of time periods at the time interval, based on the time information of the features in each sample;
calculating the mutual information and coverage of each type of feature in each sub-sample.
In a specific embodiment, the obtaining the stability index of each type of feature according to the mutual information and coverage rate of each type of feature in each time period includes:
for each type of feature, calculating the variance of the mutual information and of the coverage over the plurality of time periods, wherein the stability index includes the variance.
In one embodiment, the screening the features in the plurality of samples according to the stability index of each type of feature includes:
filtering out, from the plurality of samples, the features whose stability index is smaller than a preset threshold, to obtain at least one type of feature whose stability is higher than the threshold.
A second aspect of the present application provides a feature screening apparatus, including:
an acquisition module, configured to obtain a plurality of samples to be screened, wherein each sample comprises at least one type of feature;
the processing module is used for acquiring mutual information and coverage rate of each type of feature in different time periods according to a preset time interval;
the processing module is further used for acquiring the stability index of each type of feature according to the mutual information and the coverage rate of each type of feature in each time period;
and the screening module is used for screening the features in the plurality of samples according to the stability index of each type of feature.
Optionally, each sample further includes time information of the features, and the processing module is specifically configured to:
divide each sample into sub-samples of a plurality of time periods at the time interval, based on the time information of the features in each sample;
calculate the mutual information and coverage of each type of feature in each sub-sample.
Optionally, the processing module is specifically configured to:
for each type of feature, calculate the variance of the mutual information and of the coverage over the plurality of time periods, wherein the stability index includes the variance.
Optionally, the screening module is specifically configured to:
filter out, from the plurality of samples, the features whose stability index is smaller than a preset threshold, to obtain at least one type of feature whose stability is higher than the threshold.
A third aspect of the present application provides an electronic device comprising: a processor, a memory, and a computer program; the computer program is stored in the memory, and the processor executes the computer program to implement the feature screening method provided in any implementation of the first aspect.
A fourth aspect of the present application provides a computer-readable storage medium storing a computer program for implementing the feature screening method provided in any implementation of the first aspect.
According to the feature screening method, device, equipment and storage medium provided by the application, an electronic device obtains a plurality of samples to be screened, each sample comprising at least one type of feature; obtains the mutual information and coverage of each type of feature in different time periods according to a preset time interval; obtains a stability index for each type of feature from its mutual information and coverage in each time period; and screens the features in the samples according to the stability index of each type of feature. Selecting features by calculating dynamic indices that measure stability across time periods can effectively improve the modeling effect and the model accuracy.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings needed to be used in the description of the embodiments or the prior art will be briefly introduced below, and it is obvious that the drawings in the following description are some embodiments of the present application, and for those skilled in the art, other drawings can be obtained according to these drawings without inventive exercise.
FIG. 1 is a flow chart of a first embodiment of a method for screening features provided herein;
FIG. 2 is a flow chart of a second embodiment of a method for screening features provided herein;
FIG. 3 is a flow chart of an example of a method for screening features provided herein;
FIG. 4 is a schematic structural diagram of a first embodiment of a feature screening apparatus provided herein;
FIG. 5 is a schematic structural diagram of an electronic device provided herein.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some embodiments of the present application, but not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
The data information includes many features, and the mutual information and the coverage rate are important indexes for feature selection, so a common method is to screen the features by calculating the mutual information and the coverage rate of each feature, and then defining a threshold according to the magnitude of the value.
However, data used as features often has a distribution that is unstable over time, and features with a temporally unstable distribution are difficult to screen out, so models trained on such features perform poorly.
In view of the above problems, the present application provides a feature screening method that can be applied in various technical fields such as finance, the internet, and retail. The scheme can screen out features with a temporally unstable distribution, so that subsequent use of the screened features (e.g., model training) is more accurate.
The feature screening method is described in detail below with reference to several embodiments.
Fig. 1 is a flowchart of a first embodiment of the feature screening method provided in the present application. As shown in Fig. 1, the method may be executed by an electronic device such as a server, a cloud server, a terminal for data processing, or a computer, which is not limited in this scheme. The feature screening method includes the following steps:
S101: a plurality of samples to be screened are obtained, each sample including at least one type of feature.
In this step, before model training or big data analysis is performed, a large amount of data needs to be acquired; that is, a large number of samples need to be prepared first, each including at least one type of feature. When the feature data is collected, its time information also needs to be recorded. For example, when collecting a user's income, the collection time (such as the month and day) needs to be recorded, so that the data can later be processed according to the time information during data analysis.
S102: and acquiring mutual information and coverage rate of each type of feature in different time periods according to a preset time interval.
In this step, mutual information is a useful measure in information theory. It can be regarded as the amount of information that one random variable contains about another random variable, or as the reduction in the uncertainty of one random variable due to knowledge of another. Coverage is a measure of completeness; for a feature, it is generally understood as the proportion of samples in which the feature has a valid value.
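For reference, the two verbal descriptions of mutual information given above correspond to the standard information-theoretic definition for discrete random variables. The symbols X, Y, p and H below are the usual textbook notation and are not taken from the application itself:

```latex
I(X;Y) \;=\; \sum_{x}\sum_{y} p(x,y)\,\log\frac{p(x,y)}{p(x)\,p(y)}
       \;=\; H(X) - H(X \mid Y)
```

The first form expresses the information shared by X and Y; the second expresses the reduction in the uncertainty (entropy) of X once Y is known.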
In a specific implementation, each sample may be divided into sub-samples of a plurality of time periods at the time interval, based on the time information of the features in each sample; the mutual information and coverage of each type of feature in each sub-sample are then calculated.
In other words, after a large number of samples are obtained, they are divided into sub-samples corresponding to a plurality of time periods according to the time information of each piece of feature data and the preset time interval, and then the mutual information between features and the coverage of each type of feature are calculated within the sub-samples of each time period.
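The division into time periods and the per-period coverage calculation can be sketched as follows. This is a minimal illustration, not the application's implementation: the sample layout (a dict with a `time` field), the function name `coverage_by_period`, and the reading of coverage as "fraction of samples with a non-missing value" are all assumptions made for the example.

```python
from collections import defaultdict
from datetime import date


def coverage_by_period(samples, feature, period_key):
    """Return {period: fraction of samples in that period whose `feature` is present}.

    Assumption: a feature counts as covered when its value is not None.
    """
    total = defaultdict(int)
    present = defaultdict(int)
    for sample in samples:
        key = period_key(sample["time"])  # map the timestamp to its time period
        total[key] += 1
        if sample.get(feature) is not None:
            present[key] += 1
    return {key: present[key] / total[key] for key in total}


# Hypothetical samples: user income recorded together with its collection date.
samples = [
    {"time": date(2019, 1, 5), "income": 5000},
    {"time": date(2019, 1, 20), "income": None},
    {"time": date(2019, 2, 3), "income": 5200},
    {"time": date(2019, 2, 17), "income": 4800},
]

# A monthly time interval: each (year, month) pair is one time period.
monthly = coverage_by_period(samples, "income", lambda d: (d.year, d.month))
```

Passing a different `period_key` (for example `lambda d: d.year`) changes the preset time interval without touching the rest of the calculation.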
S103: and acquiring the stability index of each type of feature according to the mutual information and the coverage rate of each type of feature in each time period.
In this step, after the mutual information and coverage of each type of feature in the different sub-samples are obtained, a stability index for each type of feature can be calculated from the values of the mutual information and coverage of that feature across the sub-samples. The stability index is a parameter that measures whether a feature is stable; it may be the mean, variance, mean square error, or the like of the obtained mutual information and coverage, and the scheme is not limited in this respect.
Preferably, in order to more clearly judge the stability of the feature, the obtained stability index may be normalized to be in the range of 0 to 1.
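One simple way to bring the stability indices of all features into the range 0 to 1 is min-max normalization across features. The text does not say which normalization is used, so the choice of min-max scaling, the function name, and the example values below are assumptions.

```python
def min_max_normalize(values):
    """Rescale a {feature: raw_index} mapping linearly into [0, 1].

    Assumption: min-max scaling across features; if all raw values are
    equal, every feature is mapped to 0.0.
    """
    lo = min(values.values())
    hi = max(values.values())
    if hi == lo:
        return {name: 0.0 for name in values}
    return {name: (v - lo) / (hi - lo) for name, v in values.items()}


# Hypothetical raw stability indices for three feature types.
raw = {"income": 4.0, "age": 1.0, "clicks": 2.5}
normalized = min_max_normalize(raw)
```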
S104: the features in the plurality of samples are screened according to the stability index of each type of feature.
In this step, after the stability index of each type of feature is obtained, a suitable threshold may be set according to the stability requirements that the specific application scenario places on different features. Then, for each type of feature, the stability index is compared with the set threshold: if the stability index is higher than the threshold, that type of feature is considered relatively stable, and if it is lower than the threshold, its stability is considered relatively poor.
In this way, the features of the obtained samples can be screened. For example, features with low stability can be filtered out so that features with good stability are retained for subsequent model training, yielding a model with high accuracy.
The feature screening method provided in this embodiment obtains the mutual information and coverage of each type of feature in different time periods according to a preset time interval, obtains a stability index for each type of feature from its mutual information and coverage in each time period, and screens the features in the plurality of samples according to the stability index of each type of feature. Selecting features by calculating dynamic indices that measure stability across time periods can effectively improve the modeling effect and the model accuracy.
Fig. 2 is a flowchart of a second embodiment of the feature screening method provided in the present application, and as shown in fig. 2, on the basis of the foregoing embodiment, the feature screening method provided in the present embodiment specifically includes the following steps:
S101: a plurality of samples to be screened are obtained, each sample including at least one type of feature.
S102: and acquiring mutual information and coverage rate of each type of feature in different time periods according to a preset time interval.
The above two steps are consistent with the foregoing embodiments, and the specific implementation can refer to embodiment one.
In a specific implementation, the acquired samples need to be divided into sub-samples of multiple time periods according to the time interval; for example, they may be divided by day, month, or year. The specific interval may be set according to the nature of the feature itself; generally, the evaluation period of the feature may be used as one time period, and this scheme is not limited in this respect.
In the present scheme, it should be understood that mutual information measures the dependence between two objects; here, it measures the dependence between two types of features in the same sub-sample. In feature screening, it is used to measure how well a feature discriminates a topic. The definition of mutual information is close to that of cross entropy. Mutual information is a concept from information theory that expresses the relationship between pieces of information and is a measure of the statistical dependence between two random variables. Feature extraction based on mutual information rests on the following assumption: a term that occurs frequently in one category but rarely in other categories has large mutual information with that category. Mutual information is usually used as a measure between feature words and categories; if a feature word belongs to a category, its mutual information with that category is largest. The method does not require any assumption about the nature of the relationship between feature words and categories.
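For discrete values, the mutual information between a feature and a category label within one sub-sample can be estimated from empirical frequencies. The sketch below uses the standard plug-in estimator with natural logarithms; the function name and the toy data are illustrative assumptions, and the application's own formula may differ in detail.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Plug-in estimate of I(X;Y) in nats for two equal-length discrete sequences."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("xs and ys must be non-empty and of equal length")
    n = len(xs)
    joint = Counter(zip(xs, ys))  # counts of each (x, y) pair
    count_x = Counter(xs)
    count_y = Counter(ys)
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x,y) / (p(x) p(y)) simplifies to c * n / (count_x * count_y)
        mi += (c / n) * math.log(c * n / (count_x[x] * count_y[y]))
    return mi


labels = [0, 0, 1, 1]
mi_dependent = mutual_information([0, 0, 1, 1], labels)    # feature mirrors the label
mi_independent = mutual_information([0, 1, 0, 1], labels)  # feature unrelated to the label
```

In this toy data, the feature that mirrors the label carries log 2 nats of information about it, while the unrelated feature carries none, which matches the intuition that a discriminative feature has large mutual information with the category.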
S1031: calculating the variance values of mutual information and coverage rates corresponding to a plurality of time periods aiming at the characteristics of each type; the stability indicator comprises the variance value.
In this step, after the mutual information and coverage of each type of feature in each sub-sample are obtained, the variance of the mutual information and of the coverage across the sub-samples may be calculated for a given type of feature; this variance serves as the stability index of that type of feature.
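As a minimal sketch of this step, the per-period values of one feature can be reduced to variances with Python's standard `statistics` module. The text does not specify how the two variances are combined into a single index, so this example simply reports both; the function name and the sample values are assumptions.

```python
import statistics


def variance_index(mi_by_period, coverage_by_period):
    """Population variance of one feature's per-period mutual information and coverage."""
    return {
        "mi_variance": statistics.pvariance(mi_by_period),
        "coverage_variance": statistics.pvariance(coverage_by_period),
    }


# Hypothetical per-period values for one feature over three time periods.
steady = variance_index([0.2, 0.2, 0.2], [0.9, 0.9, 0.9])
drifting = variance_index([0.05, 0.20, 0.35], [0.5, 1.0, 0.75])
```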
S1041: filtering out, from the plurality of samples, the features whose stability index is smaller than a preset threshold, to obtain at least one type of feature whose stability is higher than the threshold.
In this step, after the stability index of each type of feature is obtained, the features may be filtered, generally speaking, the stability index may be compared with a set threshold, and the features with the stability index lower than the threshold are filtered out, so as to retain the features with higher stability.
In a specific implementation, a suitable threshold may be selected. For example, the stability index may be the standard deviation of the mutual information and the coverage; that is, the normalized standard deviation (the standard deviation of the mean sequence divided by the mean of the mean sequence) is the stability index of the feature. The larger the standard deviation, the more unstable the feature; the smaller it is, the more stable the feature. During feature screening, if subsequent model training places high demands on model stability, a low threshold is selected: if the standard deviation of a feature is larger than the threshold, that type of feature is considered unstable in time, and if it is smaller than the threshold, that type of feature is considered stable in time. If the goal is to improve prediction accuracy, different thresholds may be tried and the one yielding the highest accuracy taken. Different choices can be made for different application scenarios.
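The ratio of standard deviation to mean described above is commonly called the coefficient of variation. A minimal sketch follows, assuming the "mean sequence" is the sequence of one metric's values over the time periods; the function name and example series are illustrative.

```python
import statistics


def coefficient_of_variation(series):
    """Standard deviation of a per-period series divided by its mean."""
    mean = statistics.mean(series)
    if mean == 0:
        raise ValueError("coefficient of variation is undefined for a zero mean")
    return statistics.pstdev(series) / mean


# Hypothetical per-period mutual information of two features.
cv_stable = coefficient_of_variation([0.30, 0.31, 0.29, 0.30])
cv_unstable = coefficient_of_variation([0.10, 0.50, 0.05, 0.55])
```

Dividing by the mean makes the index comparable across features whose mutual information or coverage sit at different absolute levels, which is one reason to prefer it over the raw standard deviation when a single threshold is applied to all features.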
According to the feature screening method provided by the embodiment, time periods are divided for a sample according to a certain time interval, then the coverage rate and the mutual information of the features in each time period are calculated, finally the coverage rate and the mutual information of the features in each time period are integrated, a corresponding variance value is obtained through calculation, then the features are screened according to the threshold value of the variance value, the features which are unstable in time can be screened out, the training model effect is improved, and the problem of overfitting is avoided.
On the basis of the two embodiments, the following describes a screening method for the features provided in the present application through a specific implementation manner.
FIG. 3 is a flow chart of an example of a method for screening features provided herein; as shown in fig. 3, the screening method of the feature specifically includes the following steps:
S201: samples and features are prepared.
In this step, m samples with time information and p-dimensional features are prepared in advance.
S202: mutual information and coverage of features at different time periods are calculated.
The mutual information and coverage of the features are calculated for different time periods. The samples are divided into N segments by time information in an equal-frequency manner, and the mutual information and coverage of the features are calculated for each time segment, using the following formulas:
mutual information calculation formula:
(Formula image BDA0002112225440000071, not reproduced here.)
coverage calculation formula: $U_n(x_i; y) = q_i / p$
In the above formulas, $I_n$ denotes the mutual information of feature x and feature y in time period n, and $U_n$ denotes the coverage of feature x in time period n.
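The equal-frequency division by time used in S202 can be sketched as follows: the samples are sorted by their time information and cut into N consecutive segments of (nearly) equal size, so that each time segment contains about the same number of samples. The sample layout and the function name are assumptions made for the example.

```python
def equal_frequency_split(samples, n_segments):
    """Sort samples by time and cut them into n_segments chunks of near-equal size."""
    if n_segments <= 0:
        raise ValueError("n_segments must be positive")
    ordered = sorted(samples, key=lambda s: s["time"])
    base, extra = divmod(len(ordered), n_segments)
    segments, start = [], 0
    for i in range(n_segments):
        size = base + (1 if i < extra else 0)  # first `extra` segments get one more sample
        segments.append(ordered[start:start + size])
        start += size
    return segments


# Hypothetical samples with integer timestamps in arbitrary order.
samples = [{"time": t} for t in (7, 2, 9, 0, 4, 1, 8, 3, 6, 5)]
segments = equal_frequency_split(samples, 3)
sizes = [len(seg) for seg in segments]
```

Unlike a fixed calendar interval, this split keeps the per-segment sample count balanced, which keeps the per-segment estimates of mutual information and coverage comparably reliable.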
S203: calculate the stability index.
In this step, the variances of the features' mutual information and coverage over the N time periods are used as the stability measure and are normalized to the range [0, 1].
S204: select features according to the stability index.
S205: finish the feature selection.
A reasonable threshold is defined according to the feature stability index obtained in the above steps, and the features whose stability index is greater than the threshold are retained as the finally screened group of features; that is, features whose stability index is lower than the threshold are eliminated.
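The final selection step reduces to a threshold comparison. The sketch below follows the convention stated in this step, namely that features whose stability index exceeds the threshold are retained; it therefore assumes the index has already been oriented so that a higher value means a more stable feature. The function name, feature names, and scores are hypothetical.

```python
def select_features(stability, threshold):
    """Split features into those kept (index > threshold) and those eliminated.

    Assumption: a higher stability index means a more stable feature.
    """
    kept = {name: score for name, score in stability.items() if score > threshold}
    removed = sorted(set(stability) - set(kept))
    return kept, removed


# Hypothetical normalized stability indices in [0, 1].
stability = {"income": 0.92, "age": 0.75, "clicks": 0.31}
kept, removed = select_features(stability, threshold=0.5)
```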
In the feature screening method provided by the technical scheme of the present application, time-sensitive feature selection is performed by calculating dynamic indices that measure stability over different time periods; retaining the features whose mutual information and coverage are stably distributed helps improve the modeling effect.
Fig. 4 is a schematic structural diagram of a first embodiment of the screening apparatus according to the present invention, and as shown in fig. 4, the screening apparatus 10 according to the present invention includes:
an obtaining module 11, configured to obtain multiple samples to be screened, where each sample includes at least one type of feature;
the processing module 12 is configured to obtain mutual information and coverage rate of each type of feature in different time periods according to a preset time interval;
the processing module 12 is further configured to obtain a stability index of each type of feature according to the mutual information and coverage rate of each type of feature in each time period;
and the screening module 13 is configured to screen the features in the plurality of samples according to the stability index of each type of feature.
The feature screening apparatus provided in this embodiment is configured to execute the technical solution of the electronic device in the foregoing method embodiment, where the electronic device obtains a plurality of samples to be screened, each sample includes at least one type of feature, obtains mutual information and coverage of each type of feature in different time periods according to a preset time interval, obtains a stability index of each type of feature according to the mutual information and coverage of each type of feature in each time period, screens features in the plurality of samples according to the stability index of each type of feature, and performs feature selection by calculating a dynamic index of stability measurement in different time periods, so that a modeling effect can be effectively improved, and model accuracy is improved.
On the basis of the above-described embodiments, in a specific implementation:
Optionally, each sample further includes time information of the features, and the processing module 12 is specifically configured to:
divide each sample into sub-samples of a plurality of time periods at the time interval, based on the time information of the features in each sample;
calculate the mutual information and coverage of each type of feature in each sub-sample.
Optionally, the processing module 12 is specifically configured to:
for each type of feature, calculate the variance of the mutual information and of the coverage over the plurality of time periods, wherein the stability index includes the variance.
Optionally, the screening module 13 is specifically configured to:
filter out, from the plurality of samples, the features whose stability index is smaller than a preset threshold, to obtain at least one type of feature whose stability is higher than the threshold.
The feature screening apparatus provided in any of the above embodiments is used to implement the technical solution of the electronic device in the foregoing method embodiments; the implementation principle and technical effect are similar and are not described herein again.
Fig. 5 is a schematic structural diagram of an electronic device provided in the present application. The electronic device 20 shown in Fig. 5 includes: a processor 21, a memory 22, and a computer program; the computer program is stored in the memory 22, and the processor 21 executes the computer program to implement the feature screening method performed by the electronic device in any one of the method embodiments.
Alternatively, the memory 22 may be separate or integrated with the processor 21.
When the memory 22 is a device independent of the processor 21, the electronic apparatus may further include:
a bus 23 for connecting the processor 21 and the memory 22.
The application further provides a computer-readable storage medium storing a computer program, where the computer program is used to implement the feature screening method performed by the electronic device in any one of the foregoing method embodiments.
In the above specific implementation of the electronic device, it should be understood that the processor may be a Central Processing Unit (CPU), another general-purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), etc. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like. The steps of a method disclosed in connection with the embodiments of the present application may be directly implemented by a hardware processor, or may be implemented by a combination of hardware and software modules in a processor.
Those of ordinary skill in the art will understand that all or a portion of the steps of the above-described method embodiments may be performed by hardware associated with program instructions. The program may be stored in a computer-readable storage medium. When executed, the program performs the steps of the method embodiments described above. The aforementioned storage medium includes: read-only memory (ROM), RAM, flash memory, hard disk, solid state disk, magnetic tape, floppy disk, optical disk, and any combination thereof.
Finally, it should be noted that: the above embodiments are only used for illustrating the technical solutions of the present application, and not for limiting the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; and the modifications or the substitutions do not make the essence of the corresponding technical solutions depart from the scope of the technical solutions of the embodiments of the present application.

Claims (10)

1. A feature screening method, the method comprising:
obtaining a plurality of samples to be screened, wherein each sample comprises at least one type of feature;
acquiring the mutual information and coverage rate of each type of feature in different time periods according to a preset time interval;
acquiring the stability index of each type of feature according to the mutual information and coverage rate of each type of feature in each time period;
and screening the features in the plurality of samples according to the stability index of each type of feature.
2. The method according to claim 1, wherein each sample further includes time information of the features, and the acquiring the mutual information and coverage rate of each type of feature in different time periods according to the preset time interval comprises:
dividing each sample into sub-samples of a plurality of time periods at the time interval, according to the time information of the features in each sample;
calculating the mutual information and coverage rate of each type of feature in each sub-sample.
3. The method according to claim 2, wherein the acquiring the stability index of each type of feature according to the mutual information and coverage rate of each type of feature in each time period comprises:
calculating, for each type of feature, the variance values of the mutual information and of the coverage rate across the plurality of time periods, wherein the stability index includes the variance values.
4. The method according to any one of claims 1 to 3, wherein the screening the features in the plurality of samples according to the stability index of each type of feature comprises:
filtering out, from the plurality of samples, the features whose stability index is smaller than a preset threshold, to obtain at least one type of feature whose stability is higher than the threshold.
5. A feature screening apparatus, comprising:
an acquisition module, configured to obtain a plurality of samples to be screened, wherein each sample comprises at least one type of feature;
a processing module, configured to acquire the mutual information and coverage rate of each type of feature in different time periods according to a preset time interval;
the processing module being further configured to acquire the stability index of each type of feature according to the mutual information and coverage rate of each type of feature in each time period;
and a screening module, configured to screen the features in the plurality of samples according to the stability index of each type of feature.
6. The apparatus of claim 5, wherein each sample further includes time information of the features, and the processing module is specifically configured to:
divide each sample into sub-samples of a plurality of time periods at the time interval, according to the time information of the features in each sample;
calculate the mutual information and coverage rate of each type of feature in each sub-sample.
7. The apparatus of claim 6, wherein the processing module is specifically configured to:
calculate, for each type of feature, the variance values of the mutual information and of the coverage rate across the plurality of time periods, wherein the stability index includes the variance values.
8. The apparatus according to any one of claims 5 to 7, wherein the screening module is specifically configured to:
filter out, from the plurality of samples, the features whose stability index is smaller than a preset threshold, to obtain at least one type of feature whose stability is higher than the threshold.
9. An electronic device, comprising: a processor, a memory, and a computer program stored in the memory, the computer program being executed by the processor to implement the feature screening method of any one of claims 1 to 4.
10. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program for implementing the feature screening method of any one of claims 1 to 4.
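
As an illustration only, the method of claims 1 to 4 can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the implementation of the application. The sample layout (dictionaries with the keys "time", "label" and "features"), all function names, and the two threshold parameters are hypothetical. Coverage rate is assumed to be the fraction of samples in a time period in which the feature is present, and mutual information is assumed to be computed between a discrete feature value and a label. The sketch also assumes that a smaller variance means a more stable feature, so it keeps the features whose variances do not exceed the thresholds; the claims state the comparison in terms of a stability index and a single preset threshold.

```python
import math
from collections import Counter, defaultdict
from statistics import pvariance


def mutual_information(xs, ys):
    """Mutual information (in nats) between two equal-length discrete sequences."""
    n = len(xs)
    if n == 0:
        return 0.0
    joint = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x, y) * log(p(x, y) / (p(x) * p(y))), written with raw counts
        mi += (c / n) * math.log(c * n / (px[x] * py[y]))
    return mi


def period_stats(samples, feature, interval):
    """Per-period (mutual information, coverage rate) for one feature.

    Assumed layout: each sample is a dict with "time" (a number),
    "label", and "features" (a dict mapping feature name to value).
    """
    buckets = defaultdict(list)
    for s in samples:
        # Split the samples into sub-samples, one per time period.
        buckets[int(s["time"] // interval)].append(s)
    stats = []
    for period in sorted(buckets):
        group = buckets[period]
        present = [s for s in group if s["features"].get(feature) is not None]
        coverage = len(present) / len(group)
        mi = mutual_information(
            [s["features"][feature] for s in present],
            [s["label"] for s in present],
        )
        stats.append((mi, coverage))
    return stats


def stability(stats):
    """Population variance of mutual information and of coverage across periods."""
    mi_values = [m for m, _ in stats]
    coverage_values = [c for _, c in stats]
    return pvariance(mi_values), pvariance(coverage_values)


def screen_features(samples, interval, max_mi_var, max_cov_var):
    """Keep the features whose two variances stay within the assumed thresholds."""
    names = sorted({name for s in samples for name in s["features"]})
    kept = []
    for name in names:
        mi_var, cov_var = stability(period_stats(samples, name, interval))
        if mi_var <= max_mi_var and cov_var <= max_cov_var:
            kept.append(name)
    return kept
```

Calling `screen_features(samples, interval, max_mi_var, max_cov_var)` returns the sorted names of the retained features. Using two separate thresholds is a design choice of this sketch; a single combined score would serve equally well if the stability index is defined that way.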
CN201910576711.6A 2019-06-28 2019-06-28 Feature screening method, device, equipment and storage medium Active CN112148764B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910576711.6A CN112148764B (en) 2019-06-28 2019-06-28 Feature screening method, device, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910576711.6A CN112148764B (en) 2019-06-28 2019-06-28 Feature screening method, device, equipment and storage medium

Publications (2)

Publication Number Publication Date
CN112148764A CN112148764A (en) 2020-12-29
CN112148764B CN112148764B (en) 2024-05-07

Family

ID=73869457

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910576711.6A Active CN112148764B (en) 2019-06-28 2019-06-28 Feature screening method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN112148764B (en)

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2007147166A2 (en) * 2006-06-16 2007-12-21 Quantum Leap Research, Inc. Consilence of data-mining
CN104346379A (en) * 2013-07-31 2015-02-11 克拉玛依红有软件有限责任公司 Method for identifying data elements on basis of logic and statistic technologies
WO2016067072A1 (en) * 2014-10-28 2016-05-06 Super Sonic Imagine Imaging methods and apparatuses for performing shear wave elastography imaging
WO2017101506A1 (en) * 2015-12-14 2017-06-22 乐视控股(北京)有限公司 Information processing method and device
CN105528465A (en) * 2016-02-03 2016-04-27 天弘基金管理有限公司 Credit status assessment method and device
CN106991447A (en) * 2017-04-06 2017-07-28 哈尔滨理工大学 A kind of embedded multi-class attribute tags dynamic feature selection algorithm

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
孔清清; 宫会丽; 丁香乾; 刘明: "Application of a mutual-information-based genetic algorithm to spectral band selection", Spectroscopy and Spectral Analysis, no. 01, 15 January 2018 (2018-01-15) *

Also Published As

Publication number Publication date
CN112148764B (en) 2024-05-07

Similar Documents

Publication Publication Date Title
US10324989B2 (en) Microblog-based event context acquiring method and system
CN109873832B (en) Flow identification method and device, electronic equipment and storage medium
CN110049372B (en) Method, device, equipment and storage medium for predicting stable retention rate of anchor
CN107463904A (en) A kind of method and device for determining periods of events value
CN108322827B (en) Method, system, and computer-readable storage medium for measuring video preferences of a user
CN111784160A (en) River hydrological situation change evaluation method and system
CN111680085A (en) Data processing task analysis method and device, electronic equipment and readable storage medium
CN110210774B (en) Landslide risk evaluation method and system
CN112383828A (en) Experience quality prediction method, equipment and system with brain-like characteristic
CN113495913B (en) Air quality data missing value interpolation method and device
CN110490635B (en) Commercial tenant dish transaction prediction and meal preparation method and device
CN114996257A (en) Data amount abnormality detection method, device, medium, and program product
CN109116183B (en) Harmonic model parameter identification method and device, storage medium and electronic equipment
CN107818473A (en) A kind of method and device for judging loyal user
CN110852443B (en) Feature stability detection method, device and computer readable medium
CN112148764A (en) Feature screening method, device, equipment and storage medium
CN110852322A (en) Method and device for determining region of interest
CN110991241A (en) Abnormality recognition method, apparatus, and computer-readable medium
CN112988536B (en) Data anomaly detection method, device, equipment and storage medium
CN112149833A (en) Prediction method, device, equipment and storage medium based on machine learning
CN114650239A (en) Data brushing amount identification method, storage medium and electronic equipment
CN114418304A (en) Method and device for evaluating bad asset pack
CN113962216A (en) Text processing method and device, electronic equipment and readable storage medium
CN111222672B (en) Air Quality Index (AQI) prediction method and device
CN111060443A (en) Interference pulse identification method and device, storage medium and cell counting equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant