CN110210559A - Object screening technique and device, storage medium - Google Patents


Info

Publication number
CN110210559A
Authority
CN
China
Prior art keywords
processed
value
inequality extent
average value
extent
Prior art date
Legal status
Granted
Application number
CN201910471428.7A
Other languages
Chinese (zh)
Other versions
CN110210559B (en)
Inventor
Liu Yichao (刘毅超)
Current Assignee
Beijing Xiaomi Mobile Software Co Ltd
Original Assignee
Beijing Xiaomi Mobile Software Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Xiaomi Mobile Software Co Ltd
Priority to CN201910471428.7A
Publication of CN110210559A
Application granted
Publication of CN110210559B
Status: Active
Anticipated expiration


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/243 Classification techniques relating to the number of classes
    • G06F18/2431 Multiple classes

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present disclosure provides an object screening method and device, and a storage medium. The method includes: obtaining the feature values of first to-be-processed objects in each preset category; then, for any first to-be-processed object, obtaining a first average value of its feature values in each preset category; obtaining, according to the first average values, the degree of imbalance of the first to-be-processed object, where the degree of imbalance characterizes the degree of distributional difference of the first to-be-processed object; and then obtaining, from the first to-be-processed objects, second to-be-processed objects whose degree of imbalance is greater than or equal to a predetermined degree of imbalance. The disclosed method can effectively screen objects in multi-class classification problems, avoid overfitting, and improve fitting precision.

Description

Object screening technique and device, storage medium
Technical field
The present disclosure relates to computer technology, and in particular to an object screening method and device, and a storage medium.
Background
In multi-class classification problems, especially those with a large number of features, an excessive amount of data may cause overfitting, and unnecessary features also reduce the accuracy of the fitting result. Screening the features of a multi-class problem in a reasonable way is therefore particularly important.
The prior art usually screens discrete features using mutual information. Continuous features, by contrast, are either screened by directly deleting the obviously sparse ones or not screened at all.
As a result, particularly for continuous features, directly deleting the obviously sparse features leaves some redundant features in place. These redundant features are not sparse but have no discriminative power, which easily causes overfitting and degrades fitting precision.
Summary of the invention
The disclosure provides an object screening method and device, and a storage medium, to effectively screen objects (specifically, features) in multi-class classification problems, avoid overfitting, and improve fitting precision.
In a first aspect, the disclosure provides an object screening method, including:
obtaining the feature values of first to-be-processed objects in each preset category;
for any first to-be-processed object, obtaining a first average value of its feature values in each preset category;
obtaining the degree of imbalance of the first to-be-processed object according to the first average values, where the degree of imbalance characterizes the degree of distributional difference of the first to-be-processed object; and
obtaining, from the first to-be-processed objects, second to-be-processed objects whose degree of imbalance is greater than or equal to a predetermined degree of imbalance.
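The four steps of the first aspect can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the input layout of `(category, object_id, value)` records, the function name `screen_objects`, and the use of the population variance (the first possible design below) are all choices made for this sketch. A category in which an object never appears contributes an average of 0, as the detailed embodiment describes.

```python
from collections import defaultdict
from statistics import pvariance


def screen_objects(records, categories, min_imbalance):
    """Hypothetical sketch of steps S102-S108.

    records: iterable of (category, object_id, value) tuples.
    categories: every preset category, so categories where an object is
        absent still contribute to its variance.
    min_imbalance: the predetermined degree of imbalance.
    Returns a dict mapping each retained object to its degree of imbalance.
    """
    # S102: group each object's feature values by category.
    values = defaultdict(lambda: defaultdict(list))
    for category, obj, value in records:
        values[obj][category].append(value)

    retained = {}
    for obj, by_category in values.items():
        # S104: first average per category, 0.0 where the object is absent.
        averages = [
            sum(by_category[c]) / len(by_category[c]) if by_category.get(c) else 0.0
            for c in categories
        ]
        # S106: degree of imbalance = population variance of the averages.
        score = pvariance(averages)
        # S108: keep objects at or above the predetermined degree.
        if score >= min_imbalance:
            retained[obj] = score
    return retained
```

For example, an object whose per-category averages are 2, 0 and 4 scores 8/3 and is kept with `min_imbalance=1.0`, while an object with the same average in every category scores 0 and is dropped as non-discriminative.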
In a possible design, obtaining the degree of imbalance of the first to-be-processed object according to the first average values includes:
for any first to-be-processed object, obtaining the variance of its first average values across the preset categories; and
using the value of the variance as the value of the degree of imbalance.
In another possible design, obtaining, from the first to-be-processed objects, the second to-be-processed objects whose degree of imbalance is greater than or equal to the predetermined degree of imbalance includes:
sorting the first to-be-processed objects in descending order of degree of imbalance; and
taking, in the sorted order, a preset number of the first to-be-processed objects as the second to-be-processed objects.
In another possible design, the method further includes:
obtaining, in each of multiple subsets of an initial object set, the feature values of each object in that subset;
for any object, obtaining a second average value of its feature values in each subset; and
determining the objects whose second average value is greater than a preset feature threshold as the first to-be-processed objects.
In another possible design, the method further includes:
obtaining the maximum feature value of each object in the initial object set; and
for each object, obtaining the product of that maximum feature value and a coefficient set for the initial object set, as the feature threshold corresponding to the object.
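A per-object feature threshold derived from the maximum feature value might look like the sketch below. The record layout, the function name `feature_thresholds`, and the treatment of the coefficient as a single caller-supplied scalar are assumptions; the text does not give the coefficient's value.

```python
def feature_thresholds(records, coefficient):
    """Hypothetical sketch: threshold of each object = its maximum feature
    value in the initial object set multiplied by a coefficient.

    records: iterable of (subset, object_id, value) tuples.
    coefficient: assumed scalar; the text does not give its value.
    """
    maxima = {}
    for _subset, obj, value in records:
        # Keep the largest feature value seen so far for this object.
        maxima[obj] = max(value, maxima.get(obj, value))
    return {obj: peak * coefficient for obj, peak in maxima.items()}
```

Scaling each object's threshold by its own maximum keeps the comparison meaningful across features measured on very different scales.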
In a second aspect, the disclosure provides an object screening device, including:
a first obtaining module, configured to obtain the feature values of first to-be-processed objects in each preset category;
a second obtaining module, configured to obtain, for any first to-be-processed object, a first average value of its feature values in each preset category;
a third obtaining module, configured to obtain the degree of imbalance of the first to-be-processed object according to the first average values, where the degree of imbalance characterizes the degree of distributional difference of the first to-be-processed object; and
a screening module, configured to obtain, from the first to-be-processed objects, second to-be-processed objects whose degree of imbalance is greater than or equal to a predetermined degree of imbalance.
In a possible design, the third obtaining module is configured to:
for any first to-be-processed object, obtain the variance of its first average values across the preset categories; and
use the value of the variance as the value of the degree of imbalance.
In another possible design, the screening module is configured to:
sort the first to-be-processed objects in descending order of degree of imbalance; and
take, in the sorted order, a preset number of the first to-be-processed objects as the second to-be-processed objects.
In another possible design, the device further includes:
a fourth obtaining module, configured to obtain, in each of multiple subsets of an initial object set, the feature values of each object in that subset;
a fifth obtaining module, configured to obtain, for any object, a second average value of its feature values in each subset; and
a determining module, configured to determine the objects whose second average value is greater than a preset feature threshold as the first to-be-processed objects.
In another possible design, the device further includes:
a sixth obtaining module, configured to obtain the maximum feature value of each object in the initial object set; and
a seventh obtaining module, configured to obtain, for each object, the product of that maximum feature value and a coefficient set for the initial object set, as the feature threshold corresponding to the object.
In a third aspect, the disclosure provides an object screening device, including:
a memory;
a processor; and
a computer program;
where the computer program is stored in the memory and is configured to be executed by the processor to implement the method according to the first aspect.
In a fourth aspect, the disclosure provides a computer-readable storage medium storing a computer program,
where the computer program is executed by a processor to implement the method according to the first aspect.
With the object screening method and device and the storage medium provided by the disclosure, the first average value of each first to-be-processed object is computed in each of the preset categories from its feature values there, and the degree of imbalance of the object is obtained from these averages. The degree of imbalance characterizes the degree of distributional difference of an object: objects with a low degree of imbalance are redundant for the fitting process, whereas objects with a high degree of imbalance are more meaningful to it. Screening objects based on the degree of imbalance of each first to-be-processed object therefore reduces redundant objects and suppresses overfitting to some extent. In addition, because redundant objects are removed from the set of second to-be-processed objects obtained after screening, the object dimensionality is reduced, which helps reduce the complexity of the fitted model and improve fitting precision.
Brief description of the drawings
The drawings herein are incorporated into and form part of this specification, illustrate embodiments consistent with the disclosure, and, together with the specification, serve to explain the principles of the disclosure.
FIG. 1 is a schematic flowchart of an object screening method according to an embodiment of the disclosure;
FIG. 2 is a schematic flowchart of another object screening method according to an embodiment of the disclosure;
FIG. 3 is a schematic flowchart of another object screening method according to an embodiment of the disclosure;
FIG. 4 is a schematic flowchart of another object screening method according to an embodiment of the disclosure;
FIG. 5 is a schematic flowchart of another object screening method according to an embodiment of the disclosure;
FIG. 6 is a functional block diagram of an object screening device according to an embodiment of the disclosure;
FIG. 7 is a schematic diagram of the physical structure of an object screening device according to an embodiment of the disclosure.
Specific embodiments of the disclosure are shown in the above drawings and are described in more detail below. These drawings and the written description are not intended to limit the scope of the disclosed concept in any way; rather, they illustrate the concept of the disclosure to those skilled in the art by reference to specific embodiments.
Detailed description of the embodiments
Exemplary embodiments are described in detail here, and examples of them are illustrated in the drawings. When the following description refers to the drawings, the same numbers in different drawings indicate the same or similar elements unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all implementations consistent with the disclosure; rather, they are merely examples of devices and methods consistent with some aspects of the disclosure as detailed in the appended claims.
An application scenario of the embodiments is a feature screening process, and more specifically the screening of objects during the fitting process of a multi-class classification problem.
The object screening approaches used in existing multi-class fitting processes have low precision and easily lead to overfitting. The object screening method provided by the disclosure aims to solve these technical problems of the prior art, based on the following idea: according to the preset categories, compute the average of each object's feature values in each category; compute each object's degree of imbalance from its averages across the categories; and use the degree of imbalance to guide object screening.
The technical solution of the disclosure, and how it solves the above technical problems, are described in detail below. The following embodiments can be combined with each other, and identical or similar concepts or processes may not be repeated in some embodiments. Embodiments of the disclosure are described below with reference to the drawings.
This embodiment provides an object screening method. Referring to FIG. 1, which is a schematic flowchart of an object screening method according to an embodiment of the disclosure, the method includes the following steps:
S102: Obtain the feature values of the first to-be-processed objects in each preset category.
The embodiments do not specifically limit the type of the first to-be-processed object. In a multi-class classification scenario, each first to-be-processed object may be a feature; that is, the method performs feature screening for multi-class classification.
S104: For any first to-be-processed object, obtain the first average value of its feature values in each preset category.
Categories can be divided as needed. For example, the first to-be-processed objects can be divided by gender into two classes, male subjects and female subjects, or by age bracket into three classes: elderly, middle-aged, and juvenile subjects.
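Dividing objects into categories amounts to mapping each object to a category label. The sketch below shows the age-bracket division from the example; the function name `age_category` and the cut-off ages are hypothetical, since the text does not define the brackets.

```python
def age_category(age, juvenile_max=17, middle_aged_max=59):
    """Assign one of the three age-bracket categories from the example.

    The cut-off ages are hypothetical defaults; the text does not define them.
    """
    if age <= juvenile_max:
        return "juvenile"
    if age <= middle_aged_max:
        return "middle-aged"
    return "elderly"
```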
S106: Obtain the degree of imbalance of the first to-be-processed object according to the first average values, where the degree of imbalance characterizes the degree of distributional difference of the first to-be-processed object.
S108: From the first to-be-processed objects, obtain the second to-be-processed objects whose degree of imbalance is greater than or equal to the predetermined degree of imbalance.
With the processing shown in FIG. 1, object screening is achieved by obtaining the degree of imbalance of each first to-be-processed object. Because the degree of imbalance characterizes the degree of distributional difference of an object, objects with a low degree of imbalance are redundant for the fitting process, whereas objects with a high degree of imbalance are more meaningful to it. Screening objects based on the degree of imbalance of each first to-be-processed object therefore reduces redundant objects and suppresses overfitting to some extent. Moreover, because redundant objects are removed from the set of second to-be-processed objects obtained after screening, the object dimensionality is reduced, which helps reduce the complexity of the fitted model and improve fitting precision.
This embodiment provides another object screening method, which further extends and refines the steps of the above embodiment.
In this embodiment, the situation of the first to-be-processed objects is determined by the actual data. For ease of understanding, several possible situations are described below.
First, each of the preset categories may contain zero, one, or more first to-be-processed objects. For example, if all first to-be-processed objects are female subjects and they are classified by gender, the female category contains many objects and the male category contains none.
Second, for any category, the number of instances of the same first to-be-processed object in it may be 0, 1, or more. For example, category A contains one instance of first to-be-processed object a, category B contains none, and category C contains multiple instances, such as 10. When a category contains multiple instances of the same first to-be-processed object, their feature values may be the same or different.
Conversely, from the perspective of the object, one first to-be-processed object may appear in one or more categories; in the example above, object a appears in both category A and category C.
In addition, the embodiments do not specifically limit the type of the first to-be-processed object. Because the embodiments screen objects based on feature values, in one implementation scenario of this embodiment the first to-be-processed objects may be numeric objects. For non-numeric objects, a feature value can first be obtained for each object according to a preset rule, after which the object screening method of the embodiments is applied. The embodiments do not limit how the feature values of non-numeric objects are obtained. Taking the gender object above as an example, the feature value of a female subject may be preset to 1 and that of a male subject to 0.
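Converting non-numeric objects to feature values before screening can be sketched as a rule lookup. Only the gender rule (female = 1, male = 0) comes from the text; the `PRESET_RULES` table layout and the function name `to_numeric` are hypothetical.

```python
# Preset rules that turn non-numeric values into feature values. Only the
# gender rule comes from the text; the dict layout is an assumption.
PRESET_RULES = {"gender": {"female": 1, "male": 0}}


def to_numeric(records, rules=PRESET_RULES):
    """Return (category, object_id, value) records with numeric values.

    Numeric values pass through unchanged (as floats); other values are
    looked up in rules[object_id], raising KeyError if no rule covers them.
    """
    converted = []
    for category, obj, value in records:
        if isinstance(value, (int, float)):
            converted.append((category, obj, float(value)))
        else:
            converted.append((category, obj, float(rules[obj][value])))
    return converted
```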
Hereinafter, for ease of understanding, assume that all first to-be-processed objects together involve N categories (category 1 to category N), and that the number of occurrences of any given first to-be-processed object in each category is unrestricted.
The implementation of S104 is illustrated below by obtaining the first average values of a first to-be-processed object a.
If category 1 contains x1 instances of object a in total, the first average value of object a in category 1 is the sum of the feature values of these x1 instances divided by x1, where x1 is an integer greater than or equal to 1. If category 1 contains no instance of object a, the first average value of object a in category 1 can be recorded as 0.
Similarly, if category 2 contains x2 instances of object a in total, the first average value of object a in category 2 is the sum of the feature values of these x2 instances divided by x2, where x2 is an integer greater than or equal to 1. If category 2 contains no instance of object a, the first average value of object a in category 2 can be recorded as 0.
By analogy, the first average value of object a's feature values in each preset category is obtained.
Processing every other first to-be-processed object in the same way yields, for each first to-be-processed object, the first average value of its feature values in each preset category.
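The per-category averaging for a single object, including the rule that an absent category counts as 0, can be sketched as follows. The record layout and the function name `first_averages` are assumptions for illustration.

```python
from collections import defaultdict


def first_averages(records, obj, categories):
    """Sketch of S104 for one object (e.g. object a).

    records: iterable of (category, object_id, value) tuples.
    categories: categories 1..N in order.
    Returns [AVG(1), ..., AVG(N)], with 0.0 for categories lacking obj.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for category, other, value in records:
        if other == obj:
            sums[category] += value
            counts[category] += 1
    # Sum of the x_i feature values divided by x_i, or 0.0 when x_i is 0.
    return [sums[c] / counts[c] if counts[c] else 0.0 for c in categories]
```

With two instances of object a in category A (values 1 and 3), none in B, and one in C (value 4), the result is `[2.0, 0.0, 4.0]`.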
S106 can then be performed to obtain the degree of imbalance of each object in the set of first to-be-processed objects. As stated above, the degree of imbalance measures the degree of distributional difference of a first to-be-processed object. Because variance measures the dispersion of a random variable or a set of data, the embodiments can use variance to characterize the degree of imbalance of a first to-be-processed object.
Referring to FIG. 2, which is a schematic flowchart of another object screening method according to an embodiment of the disclosure, step S106 shown in FIG. 1 may further include:
S202: For any first to-be-processed object, obtain the variance of its first average values across the preset categories.
S204: Use the value of the variance as the value of the degree of imbalance.
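Steps S202 and S204 reduce to computing a variance over one object's first averages. The sketch below writes it out explicitly rather than calling a library so that each term is visible; dividing by N (population variance) and the function name `imbalance_score` are assumptions.

```python
def imbalance_score(averages):
    """S202/S204 sketch: population variance of one object's first averages.

    averages: [AVG(1), ..., AVG(N)]; the result is used as the score.
    """
    n = len(averages)
    avg_category = sum(averages) / n  # mean of the N first averages
    return sum((avg - avg_category) ** 2 for avg in averages) / n
```

An object whose averages are identical in every category scores 0, which is exactly the non-discriminative case the method aims to filter out.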
For example, again taking the first to-be-processed object a, its degree of imbalance can be expressed by the following formula:
Here, score characterizes the degree of imbalance of the first to-be-processed object a; AVG(i) denotes the first average value of object a in the i-th category, where i ranges over [1, N] and N is the total number of categories; and AVG_CATEGORY denotes the mean of the N first average values, that is, AVG_CATEGORY = SUM(AVG(1), AVG(2), ..., AVG(N)) / N.
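The formula referred to above does not survive in this copy of the text. Given the definitions of AVG(i) and AVG_CATEGORY (the mean of the N first averages) and the statement that the variance is used as the degree of imbalance, it presumably takes the form below; dividing by N (population variance) rather than N − 1 is an assumption.

```latex
\mathrm{score} = \frac{1}{N}\sum_{i=1}^{N}\bigl(\mathrm{AVG}(i) - \mathrm{AVG}_{\mathrm{CATEGORY}}\bigr)^{2},
\qquad
\mathrm{AVG}_{\mathrm{CATEGORY}} = \frac{1}{N}\sum_{i=1}^{N}\mathrm{AVG}(i)
```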
As shown above, S106 yields a numerical value that characterizes the degree of imbalance. In one feasible implementation, this value (the variance) is used directly as the degree of imbalance, also called the score. This implementation needs no further processing of the result, which saves resources to some extent and improves processing efficiency.
Alternatively, in another implementation, the numerical value of the degree of imbalance can be further processed to obtain another representation of it.
In one possible implementation, multiple grades are defined according to the value of the degree of imbalance, and the degree of imbalance of an object is expressed as a grade. A grade can be expressed as text, a symbol, or the like. For example, two grades, a first grade and a second grade, can be defined according to the degree of imbalance, such that objects in the first grade have a greater degree of imbalance than objects in the second grade.
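Turning a numeric score into a grade is a bucketing step. The sketch below uses the standard-library `bisect` module over ascending cut-offs and numbers grades from 1 (highest degree of imbalance); the cut-off values, the numbering, and the function name `to_grade` are assumptions, since the text only requires that a higher grade hold larger degrees of imbalance.

```python
from bisect import bisect_right


def to_grade(score, cutoffs):
    """Map a degree of imbalance to a grade, where grade 1 is the highest.

    cutoffs: ascending boundary values (hypothetical); k cut-offs give
    k + 1 grades, and a score equal to a cut-off falls in the higher grade.
    """
    # Number of cut-offs that the score reaches or exceeds.
    reached = bisect_right(cutoffs, score)
    return len(cutoffs) + 1 - reached
```

With a single cut-off this reproduces the two-grade example: scores at or above the cut-off land in the first grade, the rest in the second.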
Processing every first to-be-processed object in this way yields the degree of imbalance of each of them.
In this embodiment, the higher the degree of imbalance of a first to-be-processed object, the greater its practical significance, so first to-be-processed objects with a high degree of imbalance can be retained. Conversely, a first to-be-processed object with a low degree of imbalance in the sample set is sparser and more likely to cause overfitting, and if such sparse objects participate in fitting, the precision of the fitting result decreases. These low-imbalance first to-be-processed objects can therefore be deleted.
Referring to FIG. 3, which is a schematic flowchart of another object screening method according to an embodiment of the disclosure, step S108 shown in FIG. 1 may include the following steps:
S302: Sort the first to-be-processed objects in descending order of degree of imbalance.
S304: Following the sorted order, take a preset number of the first to-be-processed objects as the second to-be-processed objects.
In other words, the first to-be-processed objects with lower degrees of imbalance are deleted. The number to retain can be preset. For example, if it is preset that y first to-be-processed objects are retained, the top y objects in the sorted order are retained as the second to-be-processed objects, and the remaining first to-be-processed objects are deleted.
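Retaining the top y objects after sorting can be sketched as below. The dict-of-scores input and the function name `keep_top` are assumptions; ties keep their original relative order because Python's sort is stable.

```python
def keep_top(scores, y):
    """S302/S304 sketch: rank objects by degree of imbalance, keep the top y.

    scores: dict mapping object_id -> degree of imbalance.
    """
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:y]
```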
Alternatively, instead of presetting the number of objects to retain, screening can be performed by presetting a retention ratio of the total number of first to-be-processed objects. For example, if it is preset that 50% of the first to-be-processed objects are retained, the top 50% in the sorted order are retained as the second to-be-processed objects, and the remaining objects are deleted.
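The ratio-based variant differs only in how the retained count is computed. Rounding a fractional count up, and the function name `keep_ratio`, are assumptions; the text does not say how a fractional count is handled.

```python
import math


def keep_ratio(scores, ratio):
    """Keep the top `ratio` share (0.5 for 50%) of objects by degree of imbalance.

    Rounding a fractional count up is an assumption; the text does not say.
    """
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[: math.ceil(len(ranked) * ratio)]
```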
Alternatively, the degree of imbalance can be compared with a preset threshold, and the first to-be-processed objects whose degree of imbalance is greater than or equal to the preset threshold are taken as the second to-be-processed objects.
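The threshold variant needs no sorting at all; a minimal sketch, with the function name `keep_above` as an assumption:

```python
def keep_above(scores, threshold):
    """Keep objects whose degree of imbalance is >= the preset threshold.

    scores: dict mapping object_id -> degree of imbalance.
    """
    return [obj for obj, score in scores.items() if score >= threshold]
```

Unlike the count- and ratio-based rules, the number of retained objects here depends on the data rather than being fixed in advance.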
Alternatively, if the degrees of imbalance of the first to-be-processed objects are expressed as grades, this step can screen the first to-be-processed objects grade by grade: objects in lower grades are deleted, and objects in higher grades are retained as the second to-be-processed objects.
For example, suppose two grades, a first grade and a second grade, are defined according to the degree of imbalance, and first to-be-processed objects in the first grade have greater degrees of imbalance than those in the second grade. Then, in S108, all first to-be-processed objects in the second grade can be deleted and all those in the first grade retained, yielding the screened second to-be-processed objects.
As with the foregoing implementations, the grades to be retained can be preset, or a mapping between grade and retention can be preset. For example, if the first grade is mapped to "retain" and the second grade to "delete", then after S106 determines the degree of imbalance of each first to-be-processed object, the grade of that degree of imbalance can be obtained, and S108 then screens objects directly according to the mapping.
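Screening through a preset grade-to-retention mapping can be sketched as a dictionary lookup. The `KEEP_BY_GRADE` table mirrors the two-grade example, with grade 1 as the higher grade; the table, the function name `screen_by_grade`, and the choice to delete grades missing from the mapping are assumptions.

```python
# Preset mapping from grade to "retain?". Grade 1 holds the higher degrees
# of imbalance; the entries mirror the two-grade example in the text.
KEEP_BY_GRADE = {1: True, 2: False}


def screen_by_grade(grades, keep_by_grade=KEEP_BY_GRADE):
    """S108 sketch: retain objects whose grade maps to True.

    grades: dict mapping object_id -> grade determined after S106.
    Grades absent from the mapping are treated as "delete" (an assumption).
    """
    return [obj for obj, grade in grades.items() if keep_by_grade.get(grade, False)]
```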
The present embodiment provides another object screening techniques.The embodiment is further expanded to above-described embodiment.
In addition, the embodiment of the present disclosure also further provide it is a kind of by obtaining aforementioned the to screening in initial object set The implementation of one object to be processed.Referring to FIG. 4, Fig. 4 is another kind object screening technique provided by the embodiment of the present disclosure Flow diagram, as shown in figure 4, this method can also include the following steps:
S402 in multiple subclass in initial object set, obtains each object respectively in each subclass In characteristic value.
Wherein, initial object set is made of subclass, and subclass is made of multiple objects.And based on different realization fields Scape, object can have the different forms of expression.
For example, object can be user, Mei Geyong in the scene that screening is investigated in the user to a certain application program Family has different characteristic values in different classes of (or dimension), for example, classification may include: gender classification, age categories, duty Industry classification, height classification etc..
In another example object is information in arbitrary information processing scene, since each information has inhomogeneity another characteristic Value, for example, classification may include: information describe classification (such as instruction category information still describe category information), storage location classification, Length classification of information etc..
S404, for any object, obtain the characteristic value of the object in each subclass second is average Value.
Second average value is greater than the object of default characteristic threshold value, is determined as the described first object to be processed by S406.
In order to make it easy to understand, for having K subclass (1~subclass K of subclass) in initial object set altogether, it is right Implementation described in Fig. 2 is illustrated.It should be noted that having in each subclass for some object determined The object not limited to.
The implementation of S402 step is similar with the implementation of S102 step, can obtain as follows (with object b For):
If subset 1 contains z1 instances of object b in total, the second average value of object b in subset 1 is obtained by summing the feature values of these z1 instances of object b and dividing the sum by z1, where z1 is an integer greater than or equal to 1. If subset 1 does not contain object b, the second average value of the feature values of object b in subset 1 may be recorded as 0.
Similarly, if subset 2 contains z2 instances of object b in total, the second average value of object b in subset 2 is obtained by summing the feature values of these z2 instances of object b and dividing the sum by z2, where z2 is an integer greater than or equal to 1. If subset 2 does not contain object b, the second average value of the feature values of object b in subset 2 may be recorded as 0.
By analogy, the second average value of the feature values of object b in each subset is obtained.
Every other object in the initial object set is processed in the same manner, so that the second average value of the feature values of every object in every subset is obtained.
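The per-subset averaging in S404 can be sketched in Python as follows. This is a minimal illustration, not the disclosed implementation: the data layout (each subset as a list of `(object, feature_value)` pairs) and the function name `second_averages` are hypothetical, while the 0 default for an object absent from a subset follows the convention stated above.

```python
from collections import defaultdict


def second_averages(subsets):
    """Return {obj: [second average in subset 1, ..., subset K]}.

    subsets: K subsets of the initial object set, each a list of
    (obj, feature_value) pairs; an object may occur any number of times
    in a subset (hypothetical data layout).
    """
    objects = {obj for subset in subsets for obj, _ in subset}
    # An object absent from a subset keeps 0 for that subset.
    result = {obj: [0.0] * len(subsets) for obj in objects}
    for k, subset in enumerate(subsets):
        sums = defaultdict(float)   # sum of the feature values of each object in subset k
        counts = defaultdict(int)   # z_k: number of instances of each object in subset k
        for obj, value in subset:
            sums[obj] += value
            counts[obj] += 1
        for obj in sums:
            result[obj][k] = sums[obj] / counts[obj]
    return result


subsets = [[("b", 2.0), ("b", 4.0), ("c", 1.0)], [("c", 3.0)]]
print(second_averages(subsets)["b"])  # [3.0, 0.0]
```

Here object b occurs twice in subset 1 (z1 = 2), so its second average there is (2.0 + 4.0) / 2 = 3.0, and it is absent from subset 2, so that entry stays 0.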
After the second average values of each object are obtained, they may be compared with the feature threshold: objects whose second average value is greater than the feature threshold are retained, and objects whose second average value is less than or equal to the feature threshold are deleted. The first to-be-processed objects are thus obtained, and the object screening process described in Fig. 1, or any of its implementations, is subsequently performed on the first to-be-processed objects.
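A minimal sketch of the threshold comparison in S406, assuming the second averages are held in a dictionary mapping each object to its list of per-subset averages. The function name is hypothetical, and because the text does not say whether an object must exceed the threshold in one subset or in every subset, the sketch assumes "in at least one subset"; replacing `any` with `all` gives the stricter reading. The threshold may be a single number or a per-object dictionary.

```python
def select_first_to_be_processed(second_avgs, threshold):
    """S406 sketch: keep objects whose second average exceeds the feature threshold.

    second_avgs: {obj: [second average per subset]}.
    threshold: one number shared by all objects, or {obj: number} when each
    object has its own feature threshold.
    ASSUMPTION: an object is kept if its second average is greater than the
    threshold in at least one subset; objects that are <= the threshold in
    every subset are deleted.
    """
    kept = []
    for obj, avgs in second_avgs.items():
        t = threshold[obj] if isinstance(threshold, dict) else threshold
        if any(a > t for a in avgs):
            kept.append(obj)
    return kept


avgs = {"b": [3.0, 0.0], "c": [1.0, 1.0], "d": [0.5, 2.0]}
print(select_first_to_be_processed(avgs, 1.0))  # ['b', 'd']
```

Object c is dropped because its second averages equal, but never exceed, the threshold of 1.0.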
In the embodiments of the present disclosure, the feature threshold may be set in advance.
In one implementation scenario of this embodiment, the feature threshold may be preset as a specific numerical value. In this case, the preset feature threshold may be stored at a designated location and retrieved directly when step S404 is executed.
When the feature threshold is preset as a specific value, the same feature threshold may be set for all objects, or a separate feature threshold may be set for each object; in the latter case, some objects may still share the same feature threshold.
In another implementation scenario of this embodiment, the feature threshold may be preset as an algorithm. In this case, before step S404 is executed, the method further includes the following step: obtaining the feature threshold of each object according to a preset algorithm.
An embodiment of the present disclosure provides a method for obtaining the feature threshold of each object according to a preset algorithm, as described below. Referring to Fig. 5, which is a schematic flowchart of another object screening method provided by an embodiment of the present disclosure, the method further includes the following steps:
S502: Obtain the maximum feature value of each object in the initial object set.
S504: For each object, obtain the product of the maximum feature value and an initial object set coefficient, and use the product as the feature threshold corresponding to that object.
In one implementation scenario of this embodiment, the initial object set coefficient may be a fixed value, for example 0.0001.
Alternatively, the initial object set coefficient may be associated with the total number of objects in the initial object set: the larger the total number of objects in the initial object set, the smaller the initial object set coefficient.
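Steps S502 and S504 can be sketched as below, under stated assumptions: the data layout and the name `feature_thresholds` are hypothetical, and when no fixed coefficient (such as 0.0001) is supplied, the sketch uses 1 / (number of distinct objects) purely as an illustrative rule that shrinks as the initial object set grows. The disclosure only states that a larger set corresponds to a smaller coefficient, not any particular formula.

```python
def feature_thresholds(subsets, coefficient=None):
    """S502-S504 sketch: per-object threshold = max feature value x set coefficient.

    subsets: the subsets forming the initial object set, each a list of
    (obj, feature_value) pairs (hypothetical layout).
    coefficient: a fixed initial object set coefficient (e.g. 0.0001). If None,
    a HYPOTHETICAL rule 1 / (number of distinct objects) is used, so the
    coefficient decreases as the initial object set grows.
    """
    # S502: maximum feature value of each object over the whole initial set.
    max_value = {}
    for subset in subsets:
        for obj, value in subset:
            if obj not in max_value or value > max_value[obj]:
                max_value[obj] = value
    if coefficient is None:
        coefficient = 1.0 / len(max_value) if max_value else 0.0
    # S504: product of the maximum feature value and the coefficient.
    return {obj: m * coefficient for obj, m in max_value.items()}


subsets = [[("b", 2.0), ("b", 4.0)], [("c", 3.0), ("b", 1.0)]]
print(feature_thresholds(subsets, 0.5))  # {'b': 2.0, 'c': 1.5}
```

The resulting dictionary can serve directly as the per-object feature thresholds described for S406.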
In the embodiments of the present disclosure, the initial object set is screened to obtain the first to-be-processed objects, which can then be further screened in the manner shown in Fig. 1 to obtain the second to-be-processed objects. As a result, when the second to-be-processed objects are used for subsequent fitting or other data processing, the dimensionality of the objects is further reduced, which improves the precision of fitting performed after object screening.
The technical solutions provided by the embodiments of the present disclosure have at least the following technical effects:
In the technical solutions provided by the present disclosure, for the feature values of the first to-be-processed objects in multiple preset categories, the first average value of each object in each category is calculated, and the imbalance degree of each first to-be-processed object is obtained from these first average values. The imbalance degree characterizes the degree of difference in an object's distribution: objects with a low imbalance degree are redundant for the fitting process, whereas objects with a high imbalance degree are more meaningful to it. Therefore, screening objects based on the imbalance degree of each first to-be-processed object reduces redundant objects and suppresses overfitting to a certain extent. Moreover, because redundant objects are removed from the resulting set of second to-be-processed objects, the object dimensionality is reduced, which helps reduce the complexity of the fitted model and improves fitting precision.
Based on the object screening method provided in the above embodiments, the embodiments of the present disclosure further provide device embodiments for implementing the steps and methods in the above method embodiments.
An embodiment of the present disclosure provides an object screening device. Referring to Fig. 6, which is a functional block diagram of an object screening device provided by an embodiment of the present disclosure, the object screening device 600 includes:
a first obtaining module 61, configured to obtain the feature values of the first to-be-processed objects in each preset category;
a second obtaining module 62, configured to obtain, for any first to-be-processed object, the first average value of its feature values in each preset category;
a third obtaining module 63, configured to obtain the imbalance degree of the first to-be-processed object according to the first average values, where the imbalance degree characterizes the degree of difference in the distribution of the first to-be-processed object; and
a screening module 64, configured to obtain, from the first to-be-processed objects, second to-be-processed objects whose imbalance degree is greater than or equal to a predetermined imbalance degree.
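The screening performed by modules 63 and 64 can be sketched as follows, assuming the first average values have already been computed per preset category (for instance, in the same way as the per-subset averages described for object b). The function and parameter names are hypothetical; the imbalance measure is a parameter and defaults to Python's `statistics.pvariance`, which matches the variance-based implementation of module 63 only if the population (rather than sample) variance is intended.

```python
from statistics import pvariance


def screen_by_imbalance(first_avgs, min_imbalance, imbalance=pvariance):
    """Sketch of modules 63 and 64: imbalance degree, then threshold screening.

    first_avgs: {obj: [first average value in each preset category]}.
    min_imbalance: the predetermined imbalance degree.
    imbalance: maps an object's first averages to its imbalance degree
    (defaults to the population variance).
    Returns {obj: imbalance degree} for the second to-be-processed objects.
    """
    degrees = {obj: imbalance(avgs) for obj, avgs in first_avgs.items()}
    # Keep objects whose imbalance degree is >= the predetermined degree.
    return {obj: d for obj, d in degrees.items() if d >= min_imbalance}


first_avgs = {"x": [4.0, 0.0], "y": [2.0, 2.0]}
print(screen_by_imbalance(first_avgs, 1.0))  # {'x': 4.0}
```

Object y has identical averages in both categories, so its imbalance degree is 0 and it is screened out as redundant.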
In one implementation scenario of this embodiment, the third obtaining module 63 is configured to:
obtain, for any first to-be-processed object, the variance of the first average values of that object across the preset categories; and
use the value of the variance as the value of the imbalance degree.
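Written out, the variance-based imbalance degree is presumably the following. The symbols are hypothetical (object j has first average values x̄_{j,1}, ..., x̄_{j,M} over M preset categories), and dividing by M, that is, the population variance, is an assumption, since the text does not say whether the sample or population variance is meant.

```latex
\mu_j = \frac{1}{M}\sum_{m=1}^{M} \bar{x}_{j,m},
\qquad
D_j = \frac{1}{M}\sum_{m=1}^{M} \left(\bar{x}_{j,m} - \mu_j\right)^2
```

For example, first averages (4, 0) give μ = 2 and D = ((4 − 2)² + (0 − 2)²) / 2 = 4, while first averages (2, 2) give D = 0, marking that object as evenly distributed and therefore a candidate redundant object.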
In one implementation scenario of this embodiment, the screening module 64 is configured to:
sort the first to-be-processed objects in descending order of imbalance degree; and
obtain a preset number of second to-be-processed objects from the first to-be-processed objects following the sorted order.
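The sort-and-take variant of module 64 can be sketched as below. The name `top_n_by_imbalance` and the parameter `n` (the preset number) are hypothetical, and the input is assumed to be a dictionary of already computed imbalance degrees.

```python
def top_n_by_imbalance(degrees, n):
    """Sort objects by imbalance degree (descending) and keep the first n.

    degrees: {obj: imbalance degree}; n: the preset number.
    Objects with equal degrees keep their original relative order, because
    sorted() is stable even with reverse=True.
    """
    ranked = sorted(degrees.items(), key=lambda item: item[1], reverse=True)
    return [obj for obj, _ in ranked[:n]]


degrees = {"a": 0.5, "b": 4.0, "c": 1.0, "d": 4.0}
print(top_n_by_imbalance(degrees, 2))  # ['b', 'd']
```

Taking the top n is equivalent to using the n-th largest imbalance degree as the predetermined imbalance degree, except when several objects tie at that value: the threshold form keeps every tied object, while the top-n form keeps exactly n.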
In addition, in one implementation scenario of this embodiment, the object screening device 600 may further include (not shown in Fig. 6):
a fourth obtaining module, configured to obtain, for multiple subsets in the initial object set, the feature values of each object in each subset;
a fifth obtaining module, configured to obtain, for any object, the second average value of the feature values of that object in each subset; and
a determining module, configured to determine the objects whose second average value is greater than a preset feature threshold as the first to-be-processed objects.
In addition, in one implementation scenario of this embodiment, the object screening device 600 may further include (not shown in Fig. 6):
a sixth obtaining module, configured to obtain the maximum feature value of each object in the initial object set; and
a seventh obtaining module, configured to obtain, for each object, the product of the maximum feature value and the initial object set coefficient, and use the product as the feature threshold corresponding to that object.
In addition, an embodiment of the present disclosure provides an object screening device. Referring to Fig. 7, the object screening device 700 includes:
Memory 710;
Processor 720;And
Computer program;
The computer program is stored in the memory 710 and is configured to be executed by the processor 720 to implement the method described in the above embodiments.
In addition, as shown in Fig. 7, the object screening device 700 is further provided with a transmitter 730 and a receiver 740 for data transmission or communication with other devices, which are not described in detail here.
In addition, an embodiment of the present disclosure provides a readable storage medium storing a computer program,
where the computer program is executed by a processor to implement the method described in Embodiment 1.
Since each module in this embodiment can perform the method shown in Embodiment 1, for parts not described in detail in this embodiment, reference may be made to the related description of Embodiment 1.
Other embodiments of the present disclosure will readily occur to those skilled in the art after considering the specification and practicing the disclosure disclosed herein. The present disclosure is intended to cover any variations, uses, or adaptations of the disclosure that follow its general principles and include common knowledge or customary technical means in the art not disclosed herein. The specification and examples are to be regarded as illustrative only, and the true scope and spirit of the present disclosure are indicated by the following claims.
It should be understood that the present disclosure is not limited to the precise structures described above and shown in the drawings, and that various modifications and changes may be made without departing from its scope. The scope of the present disclosure is limited only by the appended claims.

Claims (12)

1. An object screening method, characterized by comprising:
obtaining the feature values of first to-be-processed objects in each preset category;
for any first to-be-processed object, obtaining a first average value of the feature values in each preset category;
obtaining, according to the first average values, an imbalance degree of the first to-be-processed object, wherein the imbalance degree characterizes the degree of difference in the distribution of the first to-be-processed object; and
obtaining, from the first to-be-processed objects, second to-be-processed objects whose imbalance degree is greater than or equal to a predetermined imbalance degree.
2. The method according to claim 1, wherein obtaining the imbalance degree of the first to-be-processed object according to the first average values comprises:
for any first to-be-processed object, obtaining the variance of the first average values of that object across the preset categories; and
using the value of the variance as the value of the imbalance degree.
3. The method according to claim 1 or 2, wherein obtaining, from the first to-be-processed objects, the second to-be-processed objects whose imbalance degree is greater than or equal to the predetermined imbalance degree comprises:
sorting the first to-be-processed objects in descending order of imbalance degree; and
obtaining a preset number of the second to-be-processed objects from the first to-be-processed objects following the sorted order.
4. The method according to claim 1, wherein the method further comprises:
for multiple subsets in an initial object set, obtaining the feature values of each object in each subset;
for any object, obtaining the second average value of the feature values of that object in each subset; and
determining the objects whose second average value is greater than a preset feature threshold as the first to-be-processed objects.
5. The method according to claim 4, wherein the method further comprises:
obtaining the maximum feature value of each object in the initial object set; and
for each object, obtaining the product of the maximum feature value and an initial object set coefficient, and using the product as the feature threshold corresponding to that object.
6. An object screening device, characterized by comprising:
a first obtaining module, configured to obtain the feature values of first to-be-processed objects in each preset category;
a second obtaining module, configured to obtain, for any first to-be-processed object, a first average value of the feature values in each preset category;
a third obtaining module, configured to obtain an imbalance degree of the first to-be-processed object according to the first average values, wherein the imbalance degree characterizes the degree of difference in the distribution of the first to-be-processed object; and
a screening module, configured to obtain, from the first to-be-processed objects, second to-be-processed objects whose imbalance degree is greater than or equal to a predetermined imbalance degree.
7. The device according to claim 6, wherein the third obtaining module is configured to:
for any first to-be-processed object, obtain the variance of the first average values of that object across the preset categories; and
use the value of the variance as the value of the imbalance degree.
8. The device according to claim 6 or 7, wherein the screening module is configured to:
sort the first to-be-processed objects in descending order of imbalance degree; and
obtain a preset number of the second to-be-processed objects from the first to-be-processed objects following the sorted order.
9. The device according to claim 6, wherein the device further comprises:
a fourth obtaining module, configured to obtain, for multiple subsets in an initial object set, the feature values of each object in each subset;
a fifth obtaining module, configured to obtain, for any object, the second average value of the feature values of that object in each subset; and
a determining module, configured to determine the objects whose second average value is greater than a preset feature threshold as the first to-be-processed objects.
10. The device according to claim 9, wherein the device further comprises:
a sixth obtaining module, configured to obtain the maximum feature value of each object in the initial object set; and
a seventh obtaining module, configured to obtain, for each object, the product of the maximum feature value and an initial object set coefficient, and use the product as the feature threshold corresponding to that object.
11. An object screening device, characterized by comprising:
Memory;
Processor;And
Computer program;
wherein the computer program is stored in the memory and is configured to be executed by the processor to implement the method according to any one of claims 1 to 5.
12. A computer-readable storage medium, characterized in that a computer program is stored thereon,
and the computer program is executed by a processor to implement the method according to any one of claims 1 to 5.
CN201910471428.7A 2019-05-31 2019-05-31 Object screening method and device and storage medium Active CN110210559B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910471428.7A CN110210559B (en) 2019-05-31 2019-05-31 Object screening method and device and storage medium

Publications (2)

Publication Number Publication Date
CN110210559A true CN110210559A (en) 2019-09-06
CN110210559B CN110210559B (en) 2021-10-08

Family

ID=67790194

Country Status (1)

Country Link
CN (1) CN110210559B (en)

Citations (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101840516A (en) * 2010-04-27 2010-09-22 Shanghai Jiao Tong University Feature selection method based on sparse fraction
CN103106275A (en) * 2013-02-08 2013-05-15 Northwestern Polytechnical University Text classification character screening method based on character distribution information
CN105117617A (en) * 2015-08-26 2015-12-02 Dalian Maritime University Method for screening environmentally sensitive biomolecules
CN105740388A (en) * 2016-01-27 2016-07-06 Shanghai Jingzan Technology Development Co., Ltd. Distributed drift data set-based feature selection method
CN105938523A (en) * 2016-03-31 2016-09-14 Shaanxi Normal University Feature selection method and application based on feature identification degree and independence
CN106874286A (en) * 2015-12-11 2017-06-20 Alibaba Group Holding Ltd. Method and device for screening user features
CN107468260A (en) * 2017-10-12 2017-12-15 Nanchang Police Dog Base of the Ministry of Public Security EEG analysis device and analysis method for judging the psychological state of animals
CN107518894A (en) * 2017-10-12 2017-12-29 Nanchang Police Dog Base of the Ministry of Public Security Construction method and device for an animal EEG classification model
CN107622333A (en) * 2017-11-02 2018-01-23 Beijing Baifendian Information Technology Co., Ltd. Event prediction method, apparatus and system
CN107714038A (en) * 2017-10-12 2018-02-23 Beijing Yishi Technology Co., Ltd. Feature extraction method and device for EEG signals
CN107844865A (en) * 2017-11-20 2018-03-27 Tianjin University of Science and Technology Stock index prediction method based on feature parameter selection and LSTM models
CN107845407A (en) * 2017-08-24 2018-03-27 Dalian University Human physiological feature selection algorithm combining filter-based selection and improved clustering
CN107945053A (en) * 2017-12-29 2018-04-20 Guangzhou Sitai Information Technology Co., Ltd. Multi-source power distribution network data fusion analysis platform and control method thereof
CN108240978A (en) * 2016-12-26 2018-07-03 Nuctech Company Limited Self-learning qualitative analysis method based on Raman spectrum
CN108427966A (en) * 2018-03-12 2018-08-21 Chengdu University of Information Technology Magic magiscan and method based on PCA-LDA
CN108509996A (en) * 2018-04-03 2018-09-07 University of Electronic Science and Technology of China Feature selection approach based on Filter and Wrapper selection algorithms
CN109192310A (en) * 2018-07-25 2019-01-11 Tongji University Big-data-based scheme design method for abnormal changes in undergraduate psychological behavior
CN109523118A (en) * 2018-10-11 2019-03-26 Ping An Technology (Shenzhen) Co., Ltd. Risk data screening method, device, computer equipment and storage medium
CN109636035A (en) * 2018-12-12 2019-04-16 Beijing Tiancheng Tongchuang Electric Co., Ltd. Load forecasting model creation method and device, and electric load forecasting method and device

Non-Patent Citations (6)

* Cited by examiner, † Cited by third party
Title
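FRESH_SUGER: "Data Preprocessing and Feature Selection", CSDN: HTTPS://BLOG.CSDN.NET/GANZHANTOULEBI0546/ARTICLE/DETAILS/72921236 *
JOEY_YK: "Summary of Feature Selection Methods", CSDN: HTTPS://BLOG.CSDN.NET/JOEY_YK/ARTICLE/DETAILS/82736145 *
RINNYLU: "Machine Learning: Feature Selection (Python Code Implementation)", CSDN: HTTPS://BLOG.CSDN.NET/GITHUB_38980969/ARTICLE/DETAILS/82252412 *
Ji Junzhong et al.: "Feature Selection Method Based on Category Weighting and Variance Statistics", Journal of Beijing University of Technology *
奋斗的小炎: "A Survey of Feature Selection Methods in Machine Learning", CSDN: HTTPS://BLOG.CSDN.NET/LITTLE_FIRE/ARTICLE/DETAILS/80500354 *
打牛地: "Machine Learning Feature Selection (Filter, Wrapper, and Embedded Methods)", CSDN: HTTPS://BLOG.CSDN.NET/WEIXIN_43172660/ARTICLE/DETAILS/84340164 *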

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant