CN110210559A - Object screening technique and device, storage medium - Google Patents
- Publication number: CN110210559A (application number CN201910471428.7A)
- Authority: CN (China)
- Prior art keywords: processed; value; inequality extent; average value; extent
- Legal status: Granted (as listed by Google Patents; an assumption, not a legal conclusion)
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/243—Classification techniques relating to the number of classes
- G06F18/2431—Multiple classes
Landscapes
- Engineering & Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Theoretical Computer Science (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Artificial Intelligence (AREA)
- Evolutionary Biology (AREA)
- Evolutionary Computation (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The disclosure provides an object screening method and device, and a storage medium. The method comprises: obtaining the characteristic values of first objects to be processed in each preset category; for any first object to be processed, obtaining a first average value of its characteristic values in each preset category; obtaining, according to the first average values, an inequality extent of the first object to be processed, wherein the inequality extent characterizes the degree of difference in the distribution of the first object to be processed across the categories; and then obtaining, from the first objects to be processed, second objects to be processed whose inequality extent is greater than or equal to a predetermined inequality extent. The disclosed method can effectively screen objects in multi-classification problems, avoid over-fitting, and improve fitting precision.
Description
Technical field
This disclosure relates to computer technology, and in particular to an object screening method and device, and a storage medium.
Background
In multi-classification problems, especially those with a large number of features, an excessive data volume may cause over-fitting, and unnecessary features also reduce the accuracy of the fitting result. Reasonable screening of multi-classification features is therefore particularly important.
In the prior art, discrete features are usually screened using mutual information, whereas for continuous features, obviously sparse features are simply deleted, or no screening is performed at all.
As a result, particularly for continuous features, directly deleting obviously sparse features leaves some redundant features in place. These redundant features are not sparse but have no discriminating power, which easily causes over-fitting and degrades fitting precision.
Summary of the invention
The disclosure provides an object screening method and device, and a storage medium, so as to effectively screen objects (specifically, features) in multi-classification problems, avoid over-fitting, and improve fitting precision.
In a first aspect, the disclosure provides an object screening method, comprising:
Obtaining the characteristic values of first objects to be processed in each preset category;
For any first object to be processed, obtaining a first average value of its characteristic values in each preset category;
Obtaining, according to the first average values, an inequality extent of the first object to be processed, wherein the inequality extent characterizes the degree of difference in the distribution of the first object to be processed;
Obtaining, from the first objects to be processed, second objects to be processed whose inequality extent is greater than or equal to a predetermined inequality extent.
In one possible design, obtaining the inequality extent of the first object to be processed according to the first average values comprises:
For any first object to be processed, obtaining the variance of its first average values across the preset categories;
Using the value of the variance as the value of the inequality extent.
In another possible design, obtaining, from the first objects to be processed, the second objects to be processed whose inequality extent is greater than or equal to the predetermined inequality extent comprises:
Sorting the first objects to be processed in descending order of inequality extent;
Taking, in the sorted order, a preset number of first objects to be processed from the front as the second objects to be processed.
In another possible design, the method further comprises:
Obtaining, in a plurality of subsets of an initial object set, the characteristic values of each object in each subset;
For any object, obtaining a second average value of the characteristic values of the object in each subset;
Determining objects whose second average value is greater than a preset characteristic threshold as the first objects to be processed.
In another possible design, the method further comprises:
Obtaining the maximum characteristic value of each object in the initial object set;
For each object, obtaining the product of the maximum characteristic value and an initial object set coefficient as the characteristic threshold corresponding to the object.
In a second aspect, the disclosure provides an object screening device, comprising:
A first obtaining module, configured to obtain the characteristic values of first objects to be processed in each preset category;
A second obtaining module, configured to obtain, for any first object to be processed, a first average value of its characteristic values in each preset category;
A third obtaining module, configured to obtain, according to the first average values, the inequality extent of the first object to be processed, wherein the inequality extent characterizes the degree of difference in the distribution of the first object to be processed;
A screening module, configured to obtain, from the first objects to be processed, second objects to be processed whose inequality extent is greater than or equal to a predetermined inequality extent.
In one possible design, the third obtaining module is configured to:
For any first object to be processed, obtain the variance of its first average values across the preset categories;
Use the value of the variance as the value of the inequality extent.
In another possible design, the screening module is configured to:
Sort the first objects to be processed in descending order of inequality extent;
Take, in the sorted order, a preset number of first objects to be processed from the front as the second objects to be processed.
In another possible design, the device further comprises:
A fourth obtaining module, configured to obtain, in a plurality of subsets of an initial object set, the characteristic values of each object in each subset;
A fifth obtaining module, configured to obtain, for any object, a second average value of the characteristic values of the object in each subset;
A determining module, configured to determine objects whose second average value is greater than a preset characteristic threshold as the first objects to be processed.
In another possible design, the device further comprises:
A sixth obtaining module, configured to obtain the maximum characteristic value of each object in the initial object set;
A seventh obtaining module, configured to obtain, for each object, the product of the maximum characteristic value and an initial object set coefficient as the characteristic threshold corresponding to the object.
In a third aspect, the disclosure provides an object screening device, comprising:
A memory;
A processor; and
A computer program;
Wherein the computer program is stored in the memory and configured to be executed by the processor to implement the method according to the first aspect.
In a fourth aspect, the disclosure provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the method according to the first aspect.
In the object screening method and device and the storage medium provided by the disclosure, the first average value of a first object to be processed is computed in each of a plurality of preset categories from its characteristic values there, and the inequality extent of the first object to be processed is obtained from these averages. Because the inequality extent characterizes how differently an object is distributed, an object with a low inequality extent is a redundant object for the fitting process, while an object with a high inequality extent is more meaningful to it. Screening objects based on the inequality extent of each first object to be processed therefore reduces redundant objects and suppresses over-fitting to some extent. Moreover, since redundant objects are removed from the set of second objects to be processed obtained after screening, the object dimension is reduced, which further helps lower the complexity of the fitted model and improve fitting precision.
Description of the drawings
The accompanying drawings herein are incorporated into and form part of this specification, show embodiments consistent with the disclosure, and together with the specification serve to explain the principles of the disclosure.
Fig. 1 is a flow diagram of an object screening method provided by an embodiment of the disclosure;
Fig. 2 is a flow diagram of another object screening method provided by an embodiment of the disclosure;
Fig. 3 is a flow diagram of another object screening method provided by an embodiment of the disclosure;
Fig. 4 is a flow diagram of another object screening method provided by an embodiment of the disclosure;
Fig. 5 is a flow diagram of another object screening method provided by an embodiment of the disclosure;
Fig. 6 is a functional block diagram of an object screening device provided by an embodiment of the disclosure;
Fig. 7 is a schematic diagram of the physical structure of an object screening device provided by an embodiment of the disclosure.
The above drawings show specific embodiments of the disclosure, which are described in more detail below. These drawings and the written description are not intended to limit the scope of the disclosed concept in any way, but to illustrate the concept of the disclosure to those skilled in the art by reference to specific embodiments.
Detailed description of embodiments
Exemplary embodiments are described in detail here, examples of which are illustrated in the accompanying drawings. Where the following description refers to the drawings, the same numbers in different drawings denote the same or similar elements unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all implementations consistent with the disclosure; rather, they are merely examples of devices and methods consistent with some aspects of the disclosure as detailed in the appended claims.
An application scenario of the embodiments of the disclosure is the feature screening process in a multi-classification problem, and more generally the object fitting process in a multi-classification problem.
Because the object screening approaches used in existing multi-classification fitting processes have low precision and easily lead to over-fitting, the object screening method provided by the disclosure aims to solve these technical problems of the prior art and proposes the following approach: according to preset categories, average the characteristic values of each object within each category, compute the inequality extent of each object from its averages across the categories, and use it to guide object screening.
The technical solution of the disclosure, and how it solves the above technical problems, are described in detail below. The following embodiments may be combined with one another, and the same or similar concepts or processes may not be repeated in some embodiments. Embodiments of the disclosure are described below with reference to the drawings.
This embodiment provides an object screening method. Referring to Fig. 1, which is a flow diagram of an object screening method provided by an embodiment of the disclosure, the method comprises the following steps:
S102: obtain the characteristic values of the first objects to be processed in each preset category.
The embodiments of the disclosure place no particular restriction on the type of the first objects to be processed. In a multi-classification scenario, each first object to be processed may specifically be a feature; that is, the method performs feature screening for multi-classification features.
S104: for any first object to be processed, obtain the first average value of its characteristic values in each preset category.
The categories can be defined as needed. For example, the first objects to be processed may be divided into two classes by gender: male objects and female objects; or into three classes by age range: elderly objects, middle-aged objects, and juvenile objects.
S106: obtain the inequality extent of the first object to be processed according to the first average values, wherein the inequality extent characterizes the degree of difference in the distribution of the first object to be processed.
S108: obtain, from the first objects to be processed, second objects to be processed whose inequality extent is greater than or equal to a predetermined inequality extent.
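Steps S102 to S108 can be sketched end to end as follows. This is a minimal illustration under stated assumptions, not the disclosed implementation: the input is assumed to be a list of hypothetical (category, object_id, value) triples, the inequality extent is taken as the population variance of the first averages (the variance design of Fig. 2), and the predetermined inequality extent is passed in as a hypothetical `min_inequality` parameter.

```python
from collections import defaultdict
from statistics import fmean, pvariance


def screen_objects(records, min_inequality):
    """Sketch of S102-S108 for (category, object_id, value) triples.

    Returns {object_id: inequality extent} for the retained
    (second) objects to be processed.
    """
    # S102: group each object's characteristic values by category.
    values = defaultdict(lambda: defaultdict(list))
    categories = set()
    for category, obj, value in records:
        values[obj][category].append(value)
        categories.add(category)
    ordered = sorted(categories)

    retained = {}
    for obj, per_category in values.items():
        # S104: first average per category; 0.0 where the object is absent.
        first_avgs = [
            fmean(per_category[c]) if per_category[c] else 0.0 for c in ordered
        ]
        # S106: inequality extent = population variance of the first averages.
        score = pvariance(first_avgs)
        # S108: keep objects at or above the predetermined inequality extent.
        if score >= min_inequality:
            retained[obj] = score
    return retained
```

For example, with records `[("A", "f1", 1.0), ("A", "f1", 3.0), ("B", "f1", 0.0), ("A", "f2", 5.0), ("B", "f2", 5.0)]` and `min_inequality=0.5`, the call returns `{"f1": 1.0}`. Feature f2 is not sparse, but its first average is 5.0 in both categories, so its variance is 0 and it is screened out as a redundant object.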
With the processing shown in Fig. 1, object screening is achieved by obtaining the inequality extent of each first object to be processed. Because the inequality extent characterizes how differently an object is distributed, an object with a low inequality extent is redundant for the fitting process, while an object with a high inequality extent is more meaningful to it. Screening objects based on the inequality extent of each first object to be processed therefore reduces redundant objects and suppresses over-fitting to some extent. Moreover, since redundant objects are removed from the set of second objects to be processed obtained after screening, the object dimension is reduced, which further helps lower the complexity of the fitted model and improve fitting precision.
This embodiment provides another object screening method, which further extends and refines each step of the above embodiment.
In this embodiment, the type of the first objects to be processed is determined by the actual data. For ease of understanding, several situations that may occur are given below:
First, among the preset categories, each category may contain zero, one, or more first objects to be processed. For example, if all first objects to be processed are female objects and they are classified by gender, there are multiple female objects and zero male objects.
Second, within any category, the same first object to be processed may appear 0, 1, or multiple times. For example, category A contains one instance of the first object a to be processed, category B contains none, and category C contains multiple instances, such as 10. In addition, when a category contains multiple instances of a given first object to be processed, their characteristic values may be the same or different.
Conversely, from the perspective of a first object to be processed, one first object to be processed may appear in one or more categories. In the example above, object a appears in both category A and category C.
In addition, the embodiments of the disclosure place no particular restriction on the type of the first objects to be processed. Because the screening is based on characteristic values, in one implementation scenario of this embodiment the first objects to be processed may be numeric objects. For non-numeric objects, the characteristic value of each object can first be obtained according to a preset rule, after which the object screening method provided by the embodiments of the disclosure is executed. The embodiments of the disclosure do not limit how the characteristic values of non-numeric objects are obtained. For example, again taking gender objects, the characteristic value of a female object may be preset to 1 and that of a male object to 0.
Hereinafter, for ease of understanding, assume that the first objects to be processed involve N categories in total (category 1 to category N), and that for any given first object to be processed, the number of times it appears in each category is not restricted.
The implementation of S104 is described below, taking the first average value of the first object a to be processed as an example.
If category 1 contains x1 instances of the first object a to be processed, the first average value of a in category 1 is the sum of the characteristic values of these x1 instances divided by x1, where x1 is an integer greater than or equal to 1. If category 1 contains no instance of the first object a to be processed, the first average value of a in category 1 may be recorded as 0.
Similarly, if category 2 contains x2 instances of the first object a to be processed, the first average value of a in category 2 is the sum of the characteristic values of these x2 instances divided by x2, where x2 is an integer greater than or equal to 1. If category 2 contains no instance of the first object a to be processed, the first average value of a in category 2 may be recorded as 0.
By analogy, the first average value of the characteristic values of the first object a to be processed in every preset category can be obtained.
Processing every other first object to be processed in the same way yields the first average value of the characteristic values of any first object to be processed in any preset category.
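The per-category averaging described for object a can be written as a short function. This sketch assumes the same hypothetical (category, object_id, value) triples and an explicit, ordered list of all N categories, so that a category with no instance of the object contributes 0, as stated above.

```python
def first_averages(records, obj, categories):
    """First average of `obj` in each of the given categories (S104).

    records: (category, object_id, value) triples (assumed layout).
    categories: ordered list of all N categories.
    """
    sums = {c: 0.0 for c in categories}
    counts = {c: 0 for c in categories}
    for category, name, value in records:
        if name == obj and category in sums:
            sums[category] += value
            counts[category] += 1
    # x_i instances in category i -> sum / x_i; no instance -> 0.0.
    return [sums[c] / counts[c] if counts[c] else 0.0 for c in categories]
```

For instance, with two instances of a in category 1 (values 2.0 and 4.0), none in category 2, and one in category 3 (value 10.0), the result is `[3.0, 0.0, 10.0]`.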
S106 can then be executed to obtain the inequality extent of each object in the set of first objects to be processed. As described above, the inequality extent measures the degree of difference in the distribution of the first object to be processed. Since variance measures the dispersion of a random variable or a set of data, the embodiments of the disclosure may use variance to characterize the inequality extent of the first object to be processed.
Referring to Fig. 2, which is a flow diagram of another object screening method provided by an embodiment of the disclosure, step S106 shown in Fig. 1 may further include:
S202: for any first object to be processed, obtain the variance of its first average values across the preset categories.
S204: use the value of the variance as the value of the inequality extent.
For example, again taking the first object a to be processed, its inequality extent can be expressed by the following formula:

$$\mathrm{score} = \frac{1}{N}\sum_{i=1}^{N}\left(\mathrm{AVG}(i) - \mathrm{AVG}_{\mathrm{CATEGORY}}\right)^{2}$$

where score characterizes the inequality extent of the first object a to be processed, AVG(i) denotes the first average value of the first object a to be processed in the i-th category, i ranges over [1, N], N is the total number of categories, and AVG_CATEGORY denotes the mean of the N first average values, that is, AVG_CATEGORY = SUM(AVG(1), AVG(2), ..., AVG(N))/N.
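The score defined by these terms, the variance of the N first averages around their mean, can be computed directly. This sketch assumes the population variance (dividing by N); if the sample variance (dividing by N minus 1) were intended, only the final divisor would change.

```python
def inequality_score(first_avgs):
    """Variance of the N first averages around their mean (population form)."""
    n = len(first_avgs)
    if n == 0:
        raise ValueError("at least one category is required")
    avg_category = sum(first_avgs) / n  # mean of the first averages
    return sum((a - avg_category) ** 2 for a in first_avgs) / n
```

For example, `inequality_score([1.0, 3.0])` returns 1.0, while an object with identical averages in every category, such as `[2.0, 2.0, 2.0]`, scores 0.0.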
As shown above, S106 directly yields a numerical value characterizing the inequality extent. In one feasible implementation, the inequality extent can therefore be expressed directly as this value, also called a score. This implementation requires no further processing of the result, which saves resources to some extent and improves processing efficiency.
Alternatively, in another implementation, the value of the inequality extent may be further processed to obtain other representations of the inequality extent.
In one possible implementation, multiple grades are defined according to the value of the inequality extent, and the inequality extent of an object is then expressed as a grade. A grade may be represented by text, a symbol, or the like. For example, two grades, a first grade and a second grade, may be defined according to the inequality extent, with objects in the first grade having a greater inequality extent than objects in the second grade.
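One way to turn scores into grades is to compare them with ascending cut points. The cut points are hypothetical parameters, since the text does not say where grade boundaries lie; grade 1 here is the grade with the largest inequality extent, playing the role of the first grade above.

```python
from bisect import bisect_right


def to_grade(score, boundaries):
    """Map an inequality score to a grade; grade 1 is the most unbalanced.

    boundaries: ascending, hypothetical cut points; k cut points give k + 1 grades.
    A score equal to a cut point falls into the higher grade.
    """
    # Number of cut points that the score reaches or exceeds.
    reached = bisect_right(boundaries, score)
    return len(boundaries) + 1 - reached
```

With a single cut point at 0.5, a score of 1.0 is grade 1 and a score of 0.2 is grade 2, reproducing the two-grade example.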
Processing every first object to be processed in this way yields the inequality extent of each first object to be processed.
In this embodiment, the higher the inequality extent of a first object to be processed, the greater its practical significance, so first objects to be processed with a higher inequality extent can be retained. Conversely, if the inequality extent of a first object to be processed in the sample set is low, the object is sparser and more likely to cause over-fitting, and if such sparse objects participate in fitting, the precision of the fitting result drops. First objects to be processed with a lower inequality extent can therefore be deleted.
Referring to Fig. 3, which is a flow diagram of another object screening method provided by an embodiment of the disclosure, step S108 shown in Fig. 1 may include the following steps:
S302: sort the first objects to be processed in descending order of inequality extent.
S304: take, in the sorted order, a preset number of first objects to be processed from the front as the second objects to be processed.
That is, the first objects to be processed with a lower inequality extent are deleted. The number to retain can be preset. For example, if it is preset that y first objects to be processed are retained, the y first objects to be processed ranked highest in the sorted order are retained as the second objects to be processed, and the remaining first objects to be processed are deleted.
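Steps S302 and S304 amount to a descending sort followed by a cut. In this sketch `scores` is an assumed mapping from object to inequality extent and `y` is the preset number to retain; ties keep their input order because Python's sort stays stable even with `reverse=True`.

```python
def keep_top(scores, y):
    """S302/S304: rank objects by inequality extent (descending), keep y."""
    # sorted() with reverse=True stays stable, so ties keep input order.
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:y]
```

For example, `keep_top({"a": 0.1, "b": 2.0, "c": 1.0}, 2)` returns `["b", "c"]`.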
Alternatively, instead of presetting the number of objects to retain, screening can be performed according to a preset retention ratio of the total number of first objects to be processed. For example, if it is preset that 50% of the first objects to be processed are retained, the top 50% in the sorted order are retained as the second objects to be processed, and the remaining objects are deleted.
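The ratio variant only changes how the cut is computed. Here the ratio is given as an integer percentage so the count stays in exact integer arithmetic; rounding a fractional count up is an assumption, as the text does not say how it is handled.

```python
def keep_percent(scores, percent):
    """Keep the top `percent` % (integer 0-100) of objects by inequality extent."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    # Integer ceiling division: rounding a fractional count up is an assumption.
    k = (len(ranked) * percent + 99) // 100
    return ranked[:k]
```

With four objects and `percent=50`, the two highest-scoring objects are kept; with three objects, 1.5 rounds up to 2.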
Alternatively, the inequality extent can be compared with a preset threshold, and the first objects to be processed whose inequality extent is greater than or equal to the preset threshold are taken as the second objects to be processed.
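The threshold variant needs no sorting. In this sketch `min_inequality` stands for the preset threshold, and objects exactly at the threshold are kept, matching "greater than or equal to".

```python
def keep_at_least(scores, min_inequality):
    """Keep objects whose inequality extent is >= the preset threshold."""
    return [obj for obj, score in scores.items() if score >= min_inequality]
```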
Alternatively, if the inequality extent of the first objects to be processed is expressed as grades, this step can screen the first objects to be processed directly by grade: lower-grade first objects to be processed are deleted, and higher-grade ones are retained as the second objects to be processed.
For example, if two grades, a first grade and a second grade, are defined according to the inequality extent, and the first objects to be processed in the first grade have a greater inequality extent than those in the second grade, then when S108 is executed, all first objects to be processed in the second grade can be deleted and all those in the first grade retained, giving the screened second objects to be processed.
Similarly to the foregoing implementations, the grades to be retained can be preset, or a mapping between grades and whether to retain can be preset. For example, if the first grade is mapped to retain and the second grade to delete, then after S106 has determined the inequality extent of each first object to be processed, the grade of that inequality extent can be obtained, and when S108 is executed, objects are screened directly according to the mapping.
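Grade-based screening with a preset mapping reduces to a table lookup. In this sketch `keep_map` is a hypothetical preset table from grade label to retain/delete, and treating a grade missing from the table as "delete" is an assumption.

```python
def screen_by_grade(grades, keep_map):
    """Retain objects whose grade maps to True in the preset table.

    grades: object -> grade label; keep_map: grade label -> retain?
    Grades missing from keep_map are deleted (an assumption).
    """
    return [obj for obj, grade in grades.items() if keep_map.get(grade, False)]
```

With `{"first": True, "second": False}` as the mapping, every first-grade object is retained and every second-grade object is deleted.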
This embodiment provides another object screening method, which further extends the above embodiments.
In addition, the embodiments of the disclosure further provide an implementation in which the first objects to be processed are obtained by screening an initial object set. Referring to Fig. 4, which is a flow diagram of another object screening method provided by an embodiment of the disclosure, the method may further include the following steps:
S402: in a plurality of subsets of the initial object set, obtain the characteristic values of each object in each subset.
The initial object set consists of subsets, and each subset consists of multiple objects. Depending on the implementation scenario, objects can take different forms.
For example, in a scenario of screening survey data about the users of an application, objects may be users, each of whom has different characteristic values in different categories (or dimensions); the categories may include, for example, gender, age, occupation, and height.
As another example, in any information processing scenario, objects may be pieces of information, each of which has characteristic values in different categories; the categories may include, for example, an information description category (such as whether the information is instruction-type or description-type), a storage location category, and an information length category.
S404: for any object, obtain the second average value of its characteristic values in each subset.
S406: determine objects whose second average value is greater than a preset characteristic threshold as the first objects to be processed.
For ease of understanding, the implementation shown in Fig. 4 is explained assuming that the initial object set contains K subsets in total (subset 1 to subset K). Note that, for any given object, the number of times it appears in each subset is not restricted.
The implementation of S402 and S404 is similar to that of S102 and S104, and proceeds as follows (taking object b as an example):
If subset 1 contains z1 instances of object b, the second average value of object b in subset 1 is the sum of the characteristic values of these z1 instances divided by z1, where z1 is an integer greater than or equal to 1. If subset 1 contains no instance of object b, the second average value of object b in subset 1 may be recorded as 0.
Similarly, if subset 2 contains z2 instances of object b, the second average value of object b in subset 2 is the sum of the characteristic values of these z2 instances divided by z2, where z2 is an integer greater than or equal to 1. If subset 2 contains no instance of object b, the second average value of object b in subset 2 may be recorded as 0.
By analogy, the second average value of the characteristic values of object b in each subset is obtained.
Processing every other object in the initial object set in the same way yields the second average value of the characteristic values of each object in each subset.
After the second average values of each object are obtained, they can be compared with the characteristic threshold: objects above the characteristic threshold are retained, and objects at or below it are deleted. This yields the first objects to be processed, on which the object screening process described in Fig. 1 and any of its implementations can subsequently be executed.
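Steps S402 to S406 can be sketched as a pre-filter over the initial object set. The (subset, object_id, value) input layout and the per-object `thresholds` mapping are assumptions. The text also does not pin down how an object's K per-subset second averages are combined against its threshold, so the hypothetical `aggregate` parameter lets the caller choose `any` (at least one subset exceeds it) or `all` (every subset does).

```python
from collections import defaultdict


def prefilter(records, thresholds, aggregate=any):
    """S402-S406 sketch over (subset, object_id, value) triples.

    thresholds: object_id -> characteristic threshold.
    aggregate: any or all; how the per-subset second averages are combined
    is not fixed by the text, so this choice is an assumption.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    subsets, objects = set(), set()
    # S402: collect each object's characteristic values per subset.
    for subset, obj, value in records:
        sums[(subset, obj)] += value
        counts[(subset, obj)] += 1
        subsets.add(subset)
        objects.add(obj)

    first_objects = []
    for obj in sorted(objects):
        # S404: second average per subset; 0.0 where the object is absent.
        second_avgs = [
            sums[(s, obj)] / counts[(s, obj)] if counts[(s, obj)] else 0.0
            for s in subsets
        ]
        # S406: keep the object if its averages exceed its threshold.
        if aggregate(avg > thresholds[obj] for avg in second_avgs):
            first_objects.append(obj)
    return first_objects
```

The choice matters for objects that appear in only some subsets: such an object passes under `any` if one subset average is high enough, but fails under `all` because its missing subsets average 0.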
In the embodiment of the present disclosure, characteristic threshold value can preset in advance.
In a realization scene of the present embodiment, characteristic threshold value can be preset as a numerical value.At this point, this can be preset
Characteristic threshold value be stored in designated position, when executing S404 step, then directly transfer preset this feature threshold value.
Wherein, when presetting characteristic threshold value with specific value, whole objects can be set as identical characteristic threshold value, alternatively,
Individual characteristic threshold value can be set separately for each object, there may be have identical feature threshold when the mode of being separately provided is realized
The object of value.
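The following is a minimal sketch of this initial screening step. It assumes the second averages are held in a nested dictionary `{object: {subclass: second average}}`. The name `initial_screen` is hypothetical. The pass rule is also an assumption: an object passes if its second average exceeds its threshold in at least one subclass. The text does not say whether one subclass or all subclasses must pass.

```python
from typing import Dict, Hashable, List, Mapping, Union

# Either one threshold shared by all objects, or a per-object dictionary.
Threshold = Union[float, Dict[Hashable, float]]


def initial_screen(
    second_avg: Mapping[Hashable, Mapping[Hashable, float]], threshold: Threshold
) -> List[Hashable]:
    """Return the first objects to be processed, in input order.

    Assumption: an object is retained if its second average is strictly greater
    than its threshold in at least one subclass; otherwise it is deleted.
    """
    kept = []
    for obj, per_subclass in second_avg.items():
        # A dict holds separately preset per-object thresholds; a float is shared.
        limit = threshold[obj] if isinstance(threshold, dict) else threshold
        if any(v > limit for v in per_subclass.values()):
            kept.append(obj)
    return kept


avg = {"a": {"s1": 0.0, "s2": 0.5}, "b": {"s1": 0.1, "s2": 0.1}}
print(initial_screen(avg, 0.1))                    # ['a']
print(initial_screen(avg, {"a": 1.0, "b": 0.05}))  # ['b']
```

Passing one float corresponds to a single threshold for all objects. Passing a dictionary corresponds to per-object thresholds, where several keys may map to the same value.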
In a realization scene of the present embodiment, characteristic threshold value can be preset as algorithm, at this point, execute S404 step it
Before, further include following steps: obtaining the characteristic threshold value of each object according to preset algorithm.
The method that the embodiment of the present disclosure provides a kind of characteristic threshold value that each object is obtained according to preset algorithm as described below,
Referring to FIG. 5, Fig. 5 is the flow diagram of another kind object screening technique provided by the embodiment of the present disclosure, as shown in figure 5,
This method further includes following steps:
S502 obtains maximum eigenvalue of each object in initial object set.
S504 obtains the product of the maximum eigenvalue Yu initial object set coefficient for each object, using right as this
As the corresponding characteristic threshold value.
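Steps S502 and S504 can be sketched as below. The sketch assumes hypothetical `(subclass, object, value)` records. The name `per_object_thresholds` is illustrative, and the caller supplies the coefficient.

```python
from typing import Dict, Hashable, Iterable, Tuple

Record = Tuple[Hashable, Hashable, float]  # hypothetical (subclass, object, value)


def per_object_thresholds(records: Iterable[Record], coefficient: float) -> Dict[Hashable, float]:
    """S502: find each object's maximum characteristic value in the initial set.
    S504: multiply it by the initial object set coefficient to get the threshold.
    """
    max_value: Dict[Hashable, float] = {}
    for _subclass, obj, value in records:
        if obj not in max_value or value > max_value[obj]:
            max_value[obj] = value
    return {obj: m * coefficient for obj, m in max_value.items()}


records = [("s1", "a", 2.0), ("s2", "a", 5.0), ("s1", "b", 1.0)]
print(per_object_thresholds(records, 0.5))  # {'a': 2.5, 'b': 0.5}
```

Scaling by each object's own maximum makes the threshold relative to that object's range of values. This helps when objects' characteristic values differ in magnitude.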
In one implementation scenario of this embodiment, the initial object set coefficient can be a fixed value, for example 0.0001.
Alternatively, the initial object set coefficient can depend on the total number of objects in the initial object set: the more objects the initial object set contains, the smaller the coefficient.
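One hypothetical way to make the coefficient shrink as the initial object set grows is to divide a base value by the number of objects. The disclosure only states that a larger set gives a smaller coefficient. The `base / N` formula, the default `base`, and the name `set_coefficient` are assumptions for illustration.

```python
def set_coefficient(num_objects: int, base: float = 1.0) -> float:
    """Hypothetical rule: coefficient = base / N, which decreases as N grows.

    `base` is an illustrative tuning constant, not a value from the disclosure.
    """
    if num_objects < 1:
        raise ValueError("the initial object set must contain at least one object")
    return base / num_objects


print(set_coefficient(10))   # 0.1
print(set_coefficient(100))  # 0.01
```

Any strictly decreasing function of N would satisfy the stated requirement. Dividing by N is simply the most direct choice.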
In the embodiments of the present disclosure, the initial object set is first screened to obtain the first objects to be processed. These objects are then screened further with the method shown in Fig. 1 to obtain the second objects to be processed. When the second objects to be processed are used for subsequent fitting or other data processing, the number of objects (the dimensionality) is reduced further. This improves the precision of the model fitted after object screening.
The technical solutions provided by the embodiments of the present disclosure have at least the following technical effects.
For each first object to be processed, the solution calculates the first average value of the object's characteristic values in each of the multiple preset categories. The inequality extent of the object is then derived from these averages. The inequality extent characterizes how differently an object is distributed across the categories. Objects with a low inequality extent are redundant for the fitting process, while objects with a high inequality extent are more meaningful to it. Screening objects by the inequality extent of each first object to be processed therefore removes redundant objects and suppresses over-fitting to a certain extent. Because redundant objects are removed from the resulting set of second objects to be processed, the object dimensionality is also reduced. This helps reduce the complexity of the fitted model and improves fitting precision.
Based on the object screening method provided in the above embodiments, the embodiments of the present disclosure further provide apparatus embodiments that implement the steps and methods of the above method embodiments.
The embodiments of the present disclosure provide an object screening apparatus. Fig. 6 is a functional block diagram of an object screening apparatus provided by the embodiments of the present disclosure. As shown in Fig. 6, the object screening apparatus 600 includes:
a first obtaining module 61, configured to obtain the characteristic values of the first objects to be processed in each preset category;
a second obtaining module 62, configured to obtain, for any first object to be processed, the first average value of its characteristic values in each preset category;
a third obtaining module 63, configured to obtain the inequality extent of the first object to be processed according to the first average values, where the inequality extent characterizes the degree of distributional difference of the first object to be processed; and
a screening module 64, configured to obtain, from the first objects to be processed, second objects to be processed whose inequality extent is greater than or equal to a predetermined inequality extent.
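The cooperation of modules 61 to 64 can be sketched as a small Python class. This is an illustrative sketch under several assumptions. The input is a flat list of hypothetical `(category, object, value)` records. An object with no values in a category gets a first average of 0, which mirrors the convention stated for second averages. The inequality extent is the population variance of the first averages, which is one of the options the embodiment describes. The class and method names are hypothetical.

```python
from collections import defaultdict
from statistics import pvariance
from typing import Dict, Hashable, Iterable, List, Tuple

Record = Tuple[Hashable, Hashable, float]  # hypothetical (category, object, value)
Grouped = Dict[Hashable, Dict[Hashable, List[float]]]


class ObjectScreener:
    """Illustrative counterpart of modules 61-64 (names are hypothetical)."""

    def __init__(self, categories: List[Hashable]) -> None:
        if not categories:
            raise ValueError("at least one preset category is required")
        self.categories = categories

    def collect(self, records: Iterable[Record]) -> Grouped:
        """Module 61: characteristic values of each object, grouped by category."""
        values: Grouped = defaultdict(lambda: defaultdict(list))
        for category, obj, value in records:
            values[obj][category].append(value)
        return values

    def first_averages(self, values: Grouped) -> Dict[Hashable, List[float]]:
        """Module 62: mean value per category; 0.0 where an object has no values."""
        averages: Dict[Hashable, List[float]] = {}
        for obj, per_category in values.items():
            row = []
            for c in self.categories:
                vals = per_category.get(c, [])
                row.append(sum(vals) / len(vals) if vals else 0.0)
            averages[obj] = row
        return averages

    def inequality(self, averages: Dict[Hashable, List[float]]) -> Dict[Hashable, float]:
        """Module 63: inequality extent = population variance of the first averages."""
        return {obj: pvariance(row) for obj, row in averages.items()}

    def screen(self, records: Iterable[Record], min_extent: float) -> List[Hashable]:
        """Module 64: keep objects whose inequality extent >= min_extent."""
        extent = self.inequality(self.first_averages(self.collect(records)))
        return [obj for obj, e in extent.items() if e >= min_extent]


screener = ObjectScreener(["c1", "c2"])
data = [("c1", "a", 4.0), ("c1", "a", 2.0), ("c2", "a", 0.0),
        ("c1", "b", 1.0), ("c2", "b", 1.0)]
print(screener.screen(data, 1.0))  # ['a']
```

Object `a` has first averages (3.0, 0.0) and an inequality extent of 2.25, so it is kept. Object `b` has equal averages in both categories, an extent of 0, and is screened out as redundant.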
In one implementation scenario of this embodiment, the third obtaining module 63 is configured to:
for any first object to be processed, obtain the variance of its first average values across the preset categories; and
use the value of the variance as the value of the inequality extent.
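As a worked formula, let m_{j,k} be the first average value of object j in preset category k, for K preset categories. This notation is introduced here and is not taken from the disclosure. The formula treats the variance as a population variance (dividing by K), which is an assumption, since the text does not say whether to divide by K or by K - 1.

```latex
\bar{m}_j = \frac{1}{K}\sum_{k=1}^{K} m_{j,k},
\qquad
D_j = \frac{1}{K}\sum_{k=1}^{K}\left(m_{j,k} - \bar{m}_j\right)^2
```

For example, take K = 2 and first averages (3, 0). The mean is 1.5, and D_j = ((3 - 1.5)^2 + (0 - 1.5)^2) / 2 = 2.25. An object with equal averages (1, 1) has D_j = 0 and would be removed by any positive predetermined inequality extent.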
In one implementation scenario of this embodiment, the screening module 64 is configured to:
sort the first objects to be processed in descending order of inequality extent; and
take a preset number of objects from the top of the sorted order as the second objects to be processed.
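The following sketch selects a preset number of objects by descending inequality extent. The name `top_k_by_inequality` and the parameter `k` (standing for the preset quantity) are hypothetical. Python's sort is stable even with `reverse=True`, so objects with equal inequality extent keep their input order.

```python
from typing import Dict, Hashable, List


def top_k_by_inequality(extent: Dict[Hashable, float], k: int) -> List[Hashable]:
    """Sort objects by inequality extent, largest first, and keep the first k.

    `k` plays the role of the preset quantity; ties keep their input order.
    """
    ranked = sorted(extent, key=lambda obj: extent[obj], reverse=True)
    return ranked[:k]


print(top_k_by_inequality({"a": 2.25, "b": 0.0, "c": 1.0}, 2))  # ['a', 'c']
```

Keeping the top k is equivalent to setting the predetermined inequality extent to the k-th largest value, apart from how ties at that value are handled. Fixing k instead of a threshold controls the output dimensionality directly.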
In addition, in one implementation scenario of this embodiment, the object screening apparatus 600 can further include the following modules (not shown in Fig. 6):
a fourth obtaining module, configured to obtain, for the multiple subclasses in the initial object set, the characteristic values of each object in each subclass;
a fifth obtaining module, configured to obtain, for any object, the second average value of the object's characteristic values in each subclass; and
a determining module, configured to determine objects whose second average value is greater than the preset characteristic threshold as the first objects to be processed.
In addition, in one implementation scenario of this embodiment, the object screening apparatus 600 can further include the following modules (not shown in Fig. 6):
a sixth obtaining module, configured to obtain the maximum characteristic value of each object in the initial object set; and
a seventh obtaining module, configured to compute, for each object, the product of the maximum characteristic value and the initial object set coefficient, and to use that product as the object's characteristic threshold.
In addition, the embodiments of the present disclosure provide an object screening apparatus 700. Referring to Fig. 7, the object screening apparatus 700 includes:
a memory 710;
a processor 720; and
a computer program,
where the computer program is stored in the memory 710 and is configured to be executed by the processor 720 to implement the method described in the above embodiments.
In addition, as shown in Fig. 7, the object screening apparatus 700 is also provided with a transmitter 730 and a receiver 740 for data transmission or communication with other devices. Details are not repeated here.
In addition, the embodiments of the present disclosure provide a readable storage medium on which a computer program is stored. When executed by a processor, the computer program implements the method described in Embodiment 1.
Each module in this embodiment can execute the method shown in Embodiment 1. For parts of this embodiment that are not described in detail, refer to the related description of Embodiment 1.
After considering the specification and practicing the disclosure disclosed herein, those skilled in the art will readily conceive of other embodiments of the disclosure. The disclosure is intended to cover any variations, uses, or adaptations that follow its general principles, including common knowledge or conventional technical means in the art that are not disclosed herein. The specification and examples are to be regarded as illustrative only. The true scope and spirit of the disclosure are indicated by the following claims.
It should be understood that the present disclosure is not limited to the precise structures described above and shown in the drawings, and that various modifications and changes may be made without departing from its scope. The scope of the present disclosure is limited only by the appended claims.
Claims (12)
1. An object screening method, characterized by comprising:
obtaining characteristic values of first objects to be processed in each preset category;
for any first object to be processed, obtaining a first average value of the characteristic values in any preset category;
obtaining an inequality extent of the first object to be processed according to the first average values, wherein the inequality extent is used to characterize the degree of distributional difference of the first object to be processed; and
obtaining, from the first objects to be processed, second objects to be processed whose inequality extent is greater than or equal to a predetermined inequality extent.
2. The method according to claim 1, characterized in that obtaining the inequality extent of the first object to be processed according to the first average values comprises:
for any first object to be processed, obtaining the variance of the first average values of the first object to be processed in each preset category; and
using the value of the variance as the value of the inequality extent.
3. The method according to claim 1 or 2, characterized in that obtaining, from the first objects to be processed, the second objects to be processed whose inequality extent is greater than or equal to the predetermined inequality extent comprises:
sorting the first objects to be processed in descending order of inequality extent; and
obtaining a preset number of the second objects to be processed from the first objects to be processed according to the sorted order.
4. The method according to claim 1, characterized in that the method further comprises:
for multiple subclasses in an initial object set, obtaining characteristic values of each object in each subclass;
for any object, obtaining a second average value of the characteristic values of the object in each subclass; and
determining objects whose second average value is greater than a preset characteristic threshold as the first objects to be processed.
5. The method according to claim 4, characterized in that the method further comprises:
obtaining a maximum characteristic value of each object in the initial object set; and
for each object, obtaining the product of the maximum characteristic value and an initial object set coefficient as the characteristic threshold corresponding to the object.
6. An object screening apparatus, characterized by comprising:
a first obtaining module, configured to obtain characteristic values of first objects to be processed in each preset category;
a second obtaining module, configured to obtain, for any first object to be processed, a first average value of the characteristic values in any preset category;
a third obtaining module, configured to obtain an inequality extent of the first object to be processed according to the first average values, wherein the inequality extent is used to characterize the degree of distributional difference of the first object to be processed; and
a screening module, configured to obtain, from the first objects to be processed, second objects to be processed whose inequality extent is greater than or equal to a predetermined inequality extent.
7. The apparatus according to claim 6, characterized in that the third obtaining module is configured to:
for any first object to be processed, obtain the variance of the first average values of the first object to be processed in each preset category; and
use the value of the variance as the value of the inequality extent.
8. The apparatus according to claim 6 or 7, characterized in that the screening module is configured to:
sort the first objects to be processed in descending order of inequality extent; and
obtain a preset number of the second objects to be processed from the first objects to be processed according to the sorted order.
9. The apparatus according to claim 6, characterized in that the apparatus further comprises:
a fourth obtaining module, configured to obtain, for multiple subclasses in an initial object set, characteristic values of each object in each subclass;
a fifth obtaining module, configured to obtain, for any object, a second average value of the characteristic values of the object in each subclass; and
a determining module, configured to determine objects whose second average value is greater than a preset characteristic threshold as the first objects to be processed.
10. The apparatus according to claim 9, characterized in that the apparatus further comprises:
a sixth obtaining module, configured to obtain a maximum characteristic value of each object in the initial object set; and
a seventh obtaining module, configured to obtain, for each object, the product of the maximum characteristic value and an initial object set coefficient as the characteristic threshold corresponding to the object.
11. An object screening apparatus, characterized by comprising:
a memory;
a processor; and
a computer program,
wherein the computer program is stored in the memory and is configured to be executed by the processor to implement the method according to any one of claims 1 to 5.
12. A computer-readable storage medium, characterized in that a computer program is stored thereon, and the computer program is executed by a processor to implement the method according to any one of claims 1 to 5.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910471428.7A CN110210559B (en) | 2019-05-31 | 2019-05-31 | Object screening method and device and storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910471428.7A CN110210559B (en) | 2019-05-31 | 2019-05-31 | Object screening method and device and storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110210559A (en) | 2019-09-06 |
CN110210559B CN110210559B (en) | 2021-10-08 |
Family ID: 67790194
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910471428.7A Active CN110210559B (en) | 2019-05-31 | 2019-05-31 | Object screening method and device and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110210559B (en) |
Citations (19)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101840516A (en) * | 2010-04-27 | 2010-09-22 | 上海交通大学 | Feature selection method based on sparse fraction |
CN103106275A (en) * | 2013-02-08 | 2013-05-15 | 西北工业大学 | Text classification character screening method based on character distribution information |
CN105117617A (en) * | 2015-08-26 | 2015-12-02 | 大连海事大学 | Method for screening environmentally sensitive biomolecules |
CN105740388A (en) * | 2016-01-27 | 2016-07-06 | 上海晶赞科技发展有限公司 | Distributed drift data set-based feature selection method |
CN105938523A (en) * | 2016-03-31 | 2016-09-14 | 陕西师范大学 | Feature selection method and application based on feature identification degree and independence |
CN106874286A (en) * | 2015-12-11 | 2017-06-20 | 阿里巴巴集团控股有限公司 | A kind of method and device for screening user characteristics |
CN107468260A (en) * | 2017-10-12 | 2017-12-15 | 公安部南昌警犬基地 | A kind of brain electricity analytical device and analysis method for judging ANIMAL PSYCHE state |
CN107518894A (en) * | 2017-10-12 | 2017-12-29 | 公安部南昌警犬基地 | A kind of construction method and device of animal brain electricity disaggregated model |
CN107622333A (en) * | 2017-11-02 | 2018-01-23 | 北京百分点信息科技有限公司 | A kind of event prediction method, apparatus and system |
CN107714038A (en) * | 2017-10-12 | 2018-02-23 | 北京翼石科技有限公司 | The feature extracting method and device of a kind of EEG signals |
CN107844865A (en) * | 2017-11-20 | 2018-03-27 | 天津科技大学 | Feature based parameter chooses the stock index prediction method with LSTM models |
CN107845407A (en) * | 2017-08-24 | 2018-03-27 | 大连大学 | Based on filtering type and improve the human body physiological characteristics selection algorithm for clustering and being combined |
CN107945053A (en) * | 2017-12-29 | 2018-04-20 | 广州思泰信息技术有限公司 | A kind of multiple source power distribution network data convergence analysis platform and its control method |
CN108240978A (en) * | 2016-12-26 | 2018-07-03 | 同方威视技术股份有限公司 | Self-learning type method for qualitative analysis based on Raman spectrum |
CN108427966A (en) * | 2018-03-12 | 2018-08-21 | 成都信息工程大学 | A kind of magic magiscan and method based on PCA-LDA |
CN108509996A (en) * | 2018-04-03 | 2018-09-07 | 电子科技大学 | Feature selection approach based on Filter and Wrapper selection algorithms |
CN109192310A (en) * | 2018-07-25 | 2019-01-11 | 同济大学 | A kind of undergraduate psychological behavior unusual fluctuation scheme Design method based on big data |
CN109523118A (en) * | 2018-10-11 | 2019-03-26 | 平安科技(深圳)有限公司 | Risk data screening technique, device, computer equipment and storage medium |
CN109636035A (en) * | 2018-12-12 | 2019-04-16 | 北京天诚同创电气有限公司 | Load forecasting model creation method and device, Methods of electric load forecasting and device |
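Non-Patent Citations (6)
Title |
---|
FRESH_SUGER: "Data preprocessing and feature selection", CSDN: HTTPS://BLOG.CSDN.NET/GANZHANTOULEBI0546/ARTICLE/DETAILS/72921236 * |
JOEY_YK: "Summary of feature selection methods", CSDN: HTTPS://BLOG.CSDN.NET/JOEY_YK/ARTICLE/DETAILS/82736145 * |
RINNYLU: "Machine learning: feature selection (Python code implementation)", CSDN: HTTPS://BLOG.CSDN.NET/GITHUB_38980969/ARTICLE/DETAILS/82252412 * |
Ji Junzhong et al.: "Feature selection method based on category weighting and variance statistics", Journal of Beijing University of Technology * |
奋斗的小炎: "A review of feature selection methods in machine learning", CSDN: HTTPS://BLOG.CSDN.NET/LITTLE_FIRE/ARTICLE/DETAILS/80500354 * |
打牛地: "Machine learning feature selection (filter, wrapper, and embedded methods)", CSDN: HTTPS://BLOG.CSDN.NET/WEIXIN_43172660/ARTICLE/DETAILS/84340164 * |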
Also Published As
Publication number | Publication date |
---|---|
CN110210559B (en) | 2021-10-08 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
KR101030653B1 (en) | User-based collaborative filtering recommender system amending similarity using information entropy | |
CN108090208A (en) | Fused data processing method and processing device | |
CN112363813A (en) | Resource scheduling method and device, electronic equipment and computer readable medium | |
CN105787055A (en) | Information recommendation method and device | |
Zeng et al. | A novel induced aggregation method for intuitionistic fuzzy set and its application in multiple attribute group decision making | |
CN112035753B (en) | Recommendation page generation method and device, electronic equipment and computer readable medium | |
WO2020135144A1 (en) | Method and apparatus for predicting object preference, and computer-readable medium | |
CN112214616B (en) | Knowledge graph fluency display method and device | |
CN114330670A (en) | Graph neural network training method, device, equipment and storage medium | |
CN111158828A (en) | User interface determining method and device of application program APP and storage medium | |
CN116957874B (en) | Intelligent automatic course arrangement method, system and equipment for universities and storage medium | |
JP2016029526A (en) | Information processing apparatus and program | |
CN111291217B (en) | Content recommendation method, device, electronic equipment and computer readable medium | |
CN111105297A (en) | Information pushing method and related device | |
US20180150754A1 (en) | Data analysis method, system and non-transitory computer readable medium | |
CN111160491B (en) | Pooling method and pooling model in convolutional neural network | |
CN113204642A (en) | Text clustering method and device, storage medium and electronic equipment | |
CN110210559A (en) | Object screening technique and device, storage medium | |
CN109584047B (en) | Credit granting method, system, computer equipment and medium | |
CN112258285A (en) | Content recommendation method and device, equipment and storage medium | |
CN110309361A (en) | A kind of determination method, recommended method, device and the electronic equipment of video scoring | |
CN110688508A (en) | Image-text data expansion method and device and electronic equipment | |
Pissanetzky et al. | Efficient calculation of numerical values of a polyhedral function | |
CN112464073B (en) | Method for automatically generating detailed page and newly added form page according to query page design result | |
CN111563177B (en) | Theme wallpaper recommendation method and system based on cosine algorithm |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |