CN109086453A - Method and system for extracting label correlation from neighbor instances - Google Patents
- Publication number
- CN109086453A (application CN201810991693.3A)
- Authority
- CN
- China
- Prior art keywords
- label
- neighbours
- example sample
- sample
- object instance
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/23—Clustering techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
Landscapes
- Engineering & Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Theoretical Computer Science (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Artificial Intelligence (AREA)
- Evolutionary Biology (AREA)
- Evolutionary Computation (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses a method and system for extracting label correlation from neighbor instances. The method comprises: clustering the instance samples according to their features; finding the k neighbor samples with similar features; obtaining the label sets of the k neighbor samples; computing the label percentage set C of the neighbor samples; predicting label correlation with a decision tree; and combining the distribution information of the label sets with the decision tree to predict the confidence score that label $l_k$ is present. The invention extracts label correlation from neighbor instances and considers the relationships among all labels: the pairwise co-occurrence of labels within a local subset of samples is treated as a feature, and the confidence score of each label is obtained from these label-correlation features. The invention can effectively improve the accuracy of multi-label classification.
Description
Technical field
The invention belongs to the fields of data mining and machine learning, and more particularly relates to a method and system for extracting label correlation from neighbor instances.
Background art
With the arrival of the big data era, every field of real life produces large amounts of multi-label data. Accurately obtaining sample labels helps improve the hit rate of text retrieval, image retrieval, and object recognition. Faced with explosively growing data, manually extracting valuable information is increasingly difficult, and automatically extracting sample labels with machine learning methods has become the main direction.
Data classification is an important branch of data mining research and an important means of solving practical problems, and it has been widely studied. Traditional classification methods assign each sample one and only one label. Real-world objects, however, often do not have a single meaning but are ambiguous, and the objects studied in fields such as text mining and bioinformatics are typically multi-label. For example, in text classification a document may belong to several categories; yeast gene function classification is also a multi-label problem, where the yeast data set consists of 1500 genes and a gene may carry several functional labels; in medical diagnosis a disease may belong to several categories. Traditional single-label classification cannot meet these needs, so multi-label classification has become a research focus.
When classifying multi-label samples, traditional multi-label classification methods learn a mapping between sample features and labels and use that mapping to predict the class labels of unseen instances, without considering the relationships among labels. Labels often occur in pairs, and statistics show that they are correlated. From the perspective of learning and prediction, these relationships provide useful information beyond the basic information, so taking label correlation into account helps improve the accuracy of an algorithm. The more label correlations are considered, however, the higher the model complexity: if only part of the label correlations is considered, the true dependencies cannot be captured, whereas if all correlations are considered, the complex relationships among labels become difficult to handle.
Summary of the invention
In view of the above defects or improvement needs of the prior art, the present invention provides a method and system for extracting label correlation from neighbor instances, thereby solving the technical problem that existing multi-label sample classification approaches have certain limitations.
To achieve the above object, according to one aspect of the present invention, a method for extracting label correlation from neighbor instances is provided, comprising:
(1) measuring the similarity between instance samples by the Euclidean distance between their feature vectors, so as to cluster the instance samples according to how close or distant they are, wherein the labels of the instance samples within a cluster are consistent or associated;
(2) for any target instance, finding among the clustered instance samples the k neighbor instance samples whose features are similar to those of the target instance, so as to obtain the label sets of the k neighbor instance samples of the target instance;
(3) computing the label percentage set $C = \{c_{l_1}, c_{l_2}, c_{l_3}, \ldots, c_{l_m}\}$ of the k neighbor instance samples, where $c_{l_m}$ is the label percentage with which the target instance contains the m-th label, obtained from the label sets of the k neighbor instance samples, and m denotes the number of labels;
(4) building a classifier using the label percentages as the features of the instance samples, constructing a topological graph of label importance, and forming a decision tree from the top down;
(5) combining the distribution information of the label sets with the decision tree to predict the confidence score that the target instance has each label in the label set.
Preferably, in step (2), the k neighbor instance samples with the smallest Euclidean distance to the target instance are found among the clustered instance samples, so as to obtain the label sets of the k neighbor instance samples of the target instance.
Preferably, in step (3), the label percentage $c_{l_m}$ with which the target instance contains the m-th label is:
$$c_{l_m} = \frac{1}{k}\sum_{j=1}^{k} I_{Y_j}(l_m)$$
where $Y_j$ ($j = 1, 2, \ldots, k$) is the label set of the j-th of the k neighbors of the target instance, and $I_{Y_j}(l_m)$ indicates whether the label set of the j-th neighbor of the target instance contains label $l_m$: if $l_m \in Y_j$, then $I_{Y_j}(l_m) = 1$; otherwise, $I_{Y_j}(l_m) = 0$.
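As a small worked illustration, assume the label percentage is the fraction of the k neighbors whose label sets contain the label, which is how the definitions of $Y_j$ and the indicator above read. The neighbor label sets below are hypothetical and chosen only to make the arithmetic easy to follow.

```latex
% Hypothetical example: k = 4 neighbors, label set L = {l_1, l_2, l_3}
% Y_1 = {l_1, l_2},  Y_2 = {l_1},  Y_3 = {l_1, l_2},  Y_4 = {l_3}
\begin{align*}
c_{l_1} &= \tfrac{1}{4}\,(1 + 1 + 1 + 0) = 0.75,\\
c_{l_2} &= \tfrac{1}{4}\,(1 + 0 + 1 + 0) = 0.50,\\
c_{l_3} &= \tfrac{1}{4}\,(0 + 0 + 0 + 1) = 0.25,
\end{align*}
% so the label percentage set is C = {0.75, 0.50, 0.25}.
```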
Preferably, in step (4), the input space of the classifier is the label percentage set $C = \{c_{l_1}, c_{l_2}, c_{l_3}, \ldots, c_{l_m}\}$ of the neighbor samples, and the corresponding output space is $t \in \{0, 1\}$: the decision tree judges whether the instance sample has a label in the label set; if so, t is 1, and otherwise t is 0.
According to another aspect of the present invention, a system for extracting label correlation from neighbor instances is provided, comprising:
a clustering module, configured to measure the similarity between instance samples by the Euclidean distance between their feature vectors, so as to cluster the instance samples according to how close or distant they are, wherein the labels of the instance samples within a cluster are consistent or associated;
a label set obtaining module, configured to find, for any target instance, the k neighbor instance samples among the clustered instance samples whose features are similar to those of the target instance, so as to obtain the label sets of the k neighbor instance samples of the target instance;
a label percentage computing module, configured to compute the label percentage set $C = \{c_{l_1}, c_{l_2}, c_{l_3}, \ldots, c_{l_m}\}$ of the k neighbor instance samples, where $c_{l_m}$ is the label percentage with which the target instance contains the m-th label, obtained from the label sets of the k neighbor instance samples, and m denotes the number of labels;
a decision tree construction module, configured to build a classifier using the label percentages as the features of the instance samples, construct a topological graph of label importance, and form a decision tree from the top down;
a prediction module, configured to combine the distribution information of the label sets with the decision tree to predict the confidence score that the target instance has each label in the label set.
In general, compared with the prior art, the above technical solutions conceived by the present invention can achieve the following beneficial effects: the invention mainly finds similar neighbor samples and mines, from the label sets of the similar samples in a small cluster, the cases in which labels occur in pairs; these serve as label-correlation features, from which the confidence scores of labels are obtained, realizing multi-label prediction and improving classification accuracy.
Brief description of the drawings
Fig. 1 is a schematic flowchart of a method for extracting label correlation from neighbor instances according to an embodiment of the present invention.
Specific embodiment
To make the objectives, technical solutions, and advantages of the present invention clearer, the invention is described in further detail below with reference to the accompanying drawing and embodiments. It should be understood that the specific embodiments described here serve only to explain the invention and are not intended to limit it. In addition, the technical features involved in the embodiments described below may be combined with one another as long as they do not conflict.
The present invention considers local label dependencies and proposes a method for obtaining label correlation from neighbor instances. Because locally similar samples are correlated in practice, the distribution information of labels that occur in pairs in the label sets of a local subset of samples is treated as a feature, and a single-label classification method that has low complexity and can run in parallel computes the probability that each label occurs. The invention mainly finds similar neighbor samples and mines, from the label sets of the similar samples in a small cluster, the cases in which labels occur in pairs; these serve as label-correlation features, from which the confidence scores of labels are obtained, realizing multi-label prediction and improving classification accuracy.
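To make "labels occurring in pairs" concrete, the following minimal sketch counts how often each pair of labels appears together in the label sets of a few similar samples. The function name `pair_cooccurrence` and the example labels are hypothetical and are not taken from the patent, which does not prescribe a particular way of counting pairs.

```python
from collections import Counter
from itertools import combinations

def pair_cooccurrence(label_sets):
    """Return, for each unordered label pair, the share of label sets containing both."""
    counts = Counter()
    for labels in label_sets:
        # sorted() gives every pair a canonical order, so (a, b) and (b, a) share one key
        for pair in combinations(sorted(labels), 2):
            counts[pair] += 1
    n = len(label_sets)
    return {pair: c / n for pair, c in counts.items()}

# Hypothetical label sets of four similar samples from one small cluster
neighbor_label_sets = [
    {"beach", "sea"},
    {"beach", "sea", "sky"},
    {"sea", "sky"},
    {"beach", "sea"},
]
cooc = pair_cooccurrence(neighbor_label_sets)
print(cooc[("beach", "sea")])  # share of samples carrying both labels
```

A pair that co-occurs in a high share of the local label sets is the kind of signal the method treats as a label-correlation feature.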
Fig. 1 shows a schematic flowchart of a method provided by an embodiment of the present invention, which comprises the following steps:
(1) Cluster the instance samples according to their features: measure the similarity between instance samples by computing the Euclidean distance between their feature vectors, and cluster the instance samples accordingly, where the samples whose features are close to those of a given instance can be regarded as a cluster of instances with similar features;
(2) For any target instance, obtain the sample label sets of its k neighbors: find the k neighbor samples whose features are similar to those of the target instance, so as to obtain the label sets of the k neighbor samples of the target instance;
Here, the label distribution information in the label sets of the cluster is related to the labels of the target instance sample, and labels that occur together in a high proportion are correlated.
Specifically, the k neighbor instance samples with the smallest Euclidean distance to the target instance are found among the clustered instance samples, so as to obtain the label sets of the k neighbor instance samples of the target instance.
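The neighbor search in step (2) can be sketched as follows. This is a minimal illustration that assumes the candidate pool is already the cluster containing the target instance; the clustering algorithm itself is not specified in the text and is not modeled here. The function name `k_nearest_label_sets` and the sample data are hypothetical.

```python
import math

def k_nearest_label_sets(target, samples, k):
    """Return the label sets of the k samples closest to `target` in Euclidean distance.

    `samples` is a list of (feature_vector, label_set) pairs, for example the
    members of the cluster that the target instance falls into.
    """
    ranked = sorted(samples, key=lambda s: math.dist(target, s[0]))
    return [labels for _, labels in ranked[:k]]

# Hypothetical cluster of four labeled samples with 2-D feature vectors
cluster = [
    ((0.0, 0.0), {"a", "b"}),
    ((1.0, 0.0), {"a"}),
    ((0.0, 2.0), {"b", "c"}),
    ((5.0, 5.0), {"c"}),
]
neighbors = k_nearest_label_sets((0.1, 0.1), cluster, k=3)
```

Sorting the whole pool is the simplest choice for a small cluster; for large pools a spatial index or a partial sort would be a reasonable substitute.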
(3) Compute the label percentage set $C = \{c_{l_1}, c_{l_2}, c_{l_3}, \ldots, c_{l_m}\}$ of the k neighbor instance samples, where $c_{l_m}$ is the label percentage with which the target instance contains the m-th label, obtained from the label sets of the k neighbor instance samples, and m denotes the number of labels;
In embodiments of the present invention, a two-level nested loop can be used to compute the label percentage set C: the outer loop traverses the label set $L = \{l_1, l_2, l_3, \ldots, l_m\}$, and the inner loop traverses the labels possessed by each of the k neighbor instance samples of the target instance.
In embodiments of the present invention, the label percentage $c_{l_m}$ with which the target instance contains the m-th label is:
$$c_{l_m} = \frac{1}{k}\sum_{j=1}^{k} I_{Y_j}(l_m)$$
where $Y_j$ ($j = 1, 2, \ldots, k$) is the label set of the j-th of the k neighbors of the target instance, and $I_{Y_j}(l_m)$ indicates whether the label set of the j-th neighbor of the target instance contains label $l_m$: if $l_m \in Y_j$, then $I_{Y_j}(l_m) = 1$; otherwise, $I_{Y_j}(l_m) = 0$.
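The two nested loops described for step (3) can be sketched as below, assuming the label percentage is the share of the k neighbors whose label set contains the label. The function name `label_percentages` and the example data are hypothetical.

```python
def label_percentages(label_set, neighbor_label_sets):
    """Compute c_l for every label l: the share of neighbors whose label set contains l."""
    k = len(neighbor_label_sets)
    percentages = {}
    for label in label_set:                 # outer loop: traverse L = {l_1, ..., l_m}
        count = 0
        for labels in neighbor_label_sets:  # inner loop: traverse the k neighbors
            if label in labels:             # indicator is 1 when the label is in Y_j
                count += 1
        percentages[label] = count / k
    return percentages

L = ["a", "b", "c"]                          # hypothetical label set
Y = [{"a", "b"}, {"a"}, {"a", "b"}, {"c"}]   # hypothetical label sets of k = 4 neighbors
C = label_percentages(L, Y)
```

The cost is proportional to m times k membership checks, which stays small because k is a fixed, modest neighborhood size.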
(4) Predict label correlation with a decision tree: use the sample label percentages obtained in step (3) as the features of the instance samples to build a classifier, construct a topological graph of label importance, and form a decision tree from the top down; that is, build a classifier whose features are the percentages of the different labels and whose objective function is whether the target instance has a label in the label set;
Here, the input space of the classifier is the label percentage set $C = \{c_{l_1}, c_{l_2}, c_{l_3}, \ldots, c_{l_m}\}$ of the neighbor samples, and the corresponding output space is $t_i \in \{0, 1\}$: the decision tree judges whether instance sample $x_i$ has a label in the label set; if so, $t_i$ is 1, and otherwise $t_i$ is 0.
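A classifier of this shape can be sketched with a small top-down decision tree for a single label. The sketch assumes Gini impurity as the split criterion and a fixed depth limit, neither of which is stated in the text, and it does not model the topological graph of label importance. All function names and the training data are hypothetical.

```python
def gini(targets):
    """Gini impurity of a list of 0/1 targets."""
    if not targets:
        return 0.0
    p = sum(targets) / len(targets)
    return 2 * p * (1 - p)

def build_tree(rows, targets, depth=0, max_depth=3):
    """Grow a binary decision tree top-down; each leaf stores the share of positives."""
    leaf = {"leaf": sum(targets) / len(targets)}
    if depth == max_depth or gini(targets) == 0.0:
        return leaf
    best = None
    for f in range(len(rows[0])):                      # try every label percentage
        for threshold in sorted({r[f] for r in rows}):
            left = [t for r, t in zip(rows, targets) if r[f] <= threshold]
            right = [t for r, t in zip(rows, targets) if r[f] > threshold]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(targets)
            if best is None or score < best[0]:
                best = (score, f, threshold)
    if best is None:                                   # no split separates the rows
        return leaf
    _, f, threshold = best
    lo = [(r, t) for r, t in zip(rows, targets) if r[f] <= threshold]
    hi = [(r, t) for r, t in zip(rows, targets) if r[f] > threshold]
    return {
        "feature": f,
        "threshold": threshold,
        "left": build_tree([r for r, _ in lo], [t for _, t in lo], depth + 1, max_depth),
        "right": build_tree([r for r, _ in hi], [t for _, t in hi], depth + 1, max_depth),
    }

def predict_proba(tree, row):
    """Follow the splits from the root and return the positive share at the leaf."""
    while "leaf" not in tree:
        tree = tree["left"] if row[tree["feature"]] <= tree["threshold"] else tree["right"]
    return tree["leaf"]

# Hypothetical training data: each row is C = (c_l1, c_l2, c_l3) for one sample,
# and t records whether that sample actually has label l1.
X = [(0.8, 0.6, 0.0), (0.6, 0.8, 0.2), (0.2, 0.0, 0.8), (0.0, 0.2, 0.6), (0.8, 0.8, 0.0)]
t = [1, 1, 0, 0, 1]
tree = build_tree(X, t)
p = predict_proba(tree, (0.8, 0.4, 0.2))
```

Training one such tree per label keeps each problem single-label, so the per-label trees can be fitted independently and in parallel.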
(5) Combine the distribution information of the label sets with the decision tree to predict the confidence score that the target instance has each label in the label set.
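The text does not state how the label distribution and the decision tree output are combined. Purely as an illustrative assumption, the sketch below blends the two with a convex combination; the weight `alpha`, the 0.5 decision threshold, and the function name `confidence_score` are all hypothetical.

```python
def confidence_score(neighbor_percentage, tree_probability, alpha=0.5):
    """Blend the neighbor label share with the tree output.

    alpha is a hypothetical mixing weight in [0, 1]; the source does not
    specify the combination rule.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return alpha * neighbor_percentage + (1.0 - alpha) * tree_probability

# Hypothetical inputs: 75% of the neighbors carry the label, and the tree outputs 1.0
score = confidence_score(0.75, 1.0, alpha=0.5)
has_label = score >= 0.5   # hypothetical decision threshold
```

Other combination rules, such as taking the product or feeding both values to a second classifier, would fit the same description equally well.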
Those skilled in the art will readily understand that the above are merely preferred embodiments of the present invention and are not intended to limit it; any modification, equivalent substitution, or improvement made within the spirit and principles of the invention shall fall within its scope of protection.
Claims (5)
1. A method for extracting label correlation from neighbor instances, characterized by comprising:
(1) measuring the similarity between instance samples by the Euclidean distance between their feature vectors, so as to cluster the instance samples according to how close or distant they are, wherein the labels of the instance samples within a cluster are consistent or associated;
(2) for any target instance, finding among the clustered instance samples the k neighbor instance samples whose features are similar to those of the target instance, so as to obtain the label sets of the k neighbor instance samples of the target instance;
(3) computing the label percentage set $C = \{c_{l_1}, c_{l_2}, c_{l_3}, \ldots, c_{l_m}\}$ of the k neighbor instance samples, where $c_{l_m}$ is the label percentage with which the target instance contains the m-th label, obtained from the label sets of the k neighbor instance samples, and m denotes the number of labels;
(4) building a classifier using the label percentages as the features of the instance samples, constructing a topological graph of label importance, and forming a decision tree from the top down;
(5) combining the distribution information of the label sets with the decision tree to predict the confidence score that the target instance contains each label in the label set.
2. The method according to claim 1, characterized in that, in step (2), the k neighbor instance samples with the smallest Euclidean distance to the target instance are found among the clustered instance samples, so as to obtain the label sets of the k neighbor instance samples of the target instance.
3. The method according to claim 1 or 2, characterized in that, in step (3), the label percentage $c_{l_m}$ with which the target instance contains the m-th label is:
$$c_{l_m} = \frac{1}{k}\sum_{j=1}^{k} I_{Y_j}(l_m)$$
where $Y_j$ ($j = 1, 2, \ldots, k$) is the label set of the j-th of the k neighbors of the target instance, and $I_{Y_j}(l_m)$ indicates whether the label set of the j-th neighbor of the target instance contains label $l_m$: if $l_m \in Y_j$, then $I_{Y_j}(l_m) = 1$; otherwise, $I_{Y_j}(l_m) = 0$.
4. The method according to claim 3, characterized in that, in step (4), the input space of the classifier is the label percentage set $C = \{c_{l_1}, c_{l_2}, c_{l_3}, \ldots, c_{l_m}\}$ of the neighbor samples, and the corresponding output space is $t \in \{0, 1\}$: the decision tree judges whether the instance sample has a label in the label set; if so, t is 1, and otherwise t is 0.
5. A system for extracting label correlation from neighbor instances, characterized by comprising:
a clustering module, configured to measure the similarity between instance samples by the Euclidean distance between their feature vectors, so as to cluster the instance samples according to how close or distant they are, wherein the labels of the instance samples within a cluster are consistent or associated;
a label set obtaining module, configured to find, for any target instance, the k neighbor instance samples among the clustered instance samples whose features are similar to those of the target instance, so as to obtain the label sets of the k neighbor instance samples of the target instance;
a label percentage computing module, configured to compute the label percentage set $C = \{c_{l_1}, c_{l_2}, c_{l_3}, \ldots, c_{l_m}\}$ of the k neighbor instance samples, where $c_{l_m}$ is the label percentage with which the target instance contains the m-th label, obtained from the label sets of the k neighbor instance samples, and m denotes the number of labels;
a decision tree construction module, configured to build a classifier using the label percentages as the features of the instance samples, construct a topological graph of label importance, and form a decision tree from the top down;
a prediction module, configured to combine the distribution information of the label sets with the decision tree to predict the confidence score that the target instance contains each label in the label set.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810991693.3A CN109086453A (en) | 2018-08-29 | 2018-08-29 | A kind of method and system for extracting label correlation from neighbours' example |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810991693.3A CN109086453A (en) | 2018-08-29 | 2018-08-29 | A kind of method and system for extracting label correlation from neighbours' example |
Publications (1)
Publication Number | Publication Date |
---|---|
CN109086453A true CN109086453A (en) | 2018-12-25 |
Family
ID=64795070
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810991693.3A Pending CN109086453A (en) | 2018-08-29 | 2018-08-29 | A kind of method and system for extracting label correlation from neighbours' example |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109086453A (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110347791A (en) * | 2019-06-20 | 2019-10-18 | 广东工业大学 | A kind of topic recommended method based on multi-tag classification convolutional neural networks |
CN111507382A (en) * | 2020-04-01 | 2020-08-07 | 北京互金新融科技有限公司 | Sample file clustering method and device and electronic equipment |
CN112766383A (en) * | 2021-01-22 | 2021-05-07 | 浙江工商大学 | Label enhancement method based on feature clustering and label similarity |
-
2018
- 2018-08-29 CN CN201810991693.3A patent/CN109086453A/en active Pending
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110347791A (en) * | 2019-06-20 | 2019-10-18 | 广东工业大学 | A kind of topic recommended method based on multi-tag classification convolutional neural networks |
CN111507382A (en) * | 2020-04-01 | 2020-08-07 | 北京互金新融科技有限公司 | Sample file clustering method and device and electronic equipment |
CN111507382B (en) * | 2020-04-01 | 2023-05-05 | 北京互金新融科技有限公司 | Sample file clustering method and device and electronic equipment |
CN112766383A (en) * | 2021-01-22 | 2021-05-07 | 浙江工商大学 | Label enhancement method based on feature clustering and label similarity |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109948561B (en) | The method and system that unsupervised image/video pedestrian based on migration network identifies again | |
CN104599275B (en) | The RGB-D scene understanding methods of imparametrization based on probability graph model | |
CN108960184B (en) | Pedestrian re-identification method based on heterogeneous component deep neural network | |
CN108256450A (en) | A kind of supervised learning method of recognition of face and face verification based on deep learning | |
Gong et al. | Instance-dependent positive and unlabeled learning with labeling bias estimation | |
CN110751027B (en) | Pedestrian re-identification method based on deep multi-instance learning | |
CN109086453A (en) | A kind of method and system for extracting label correlation from neighbours' example | |
CN110377747A (en) | A kind of knowledge base fusion method towards encyclopaedia website | |
TWI525574B (en) | Collaborative face annotation method and collaborative face annotation system | |
CN110443120A (en) | A kind of face identification method and equipment | |
CN109447110A (en) | The method of the multi-tag classification of comprehensive neighbours' label correlative character and sample characteristics | |
CN117152459B (en) | Image detection method, device, computer readable medium and electronic equipment | |
CN115205570A (en) | Unsupervised cross-domain target re-identification method based on comparative learning | |
Zhao et al. | Learning discriminative region representation for person retrieval | |
CN103778206A (en) | Method for providing network service resources | |
Pang et al. | Reliability modeling and contrastive learning for unsupervised person re-identification | |
Wang et al. | Visual space optimization for zero-shot learning | |
CN109214430A (en) | A kind of recognition methods again of the pedestrian based on feature space topology distribution | |
CN112115996A (en) | Image data processing method, device, equipment and storage medium | |
CN113688757B (en) | SAR image recognition method, SAR image recognition device and storage medium | |
Wang et al. | Challenge of multi-camera tracking | |
CN104778272B (en) | A kind of picture position method of estimation excavated based on region with space encoding | |
Li et al. | Person re-identification with activity prediction based on hierarchical spatial-temporal model | |
Zhou et al. | Unsupervised self-training correction learning for 2D image-based 3D model retrieval | |
Zhu et al. | A cross-view intelligent person search method based on multi-feature constraints |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | ||
Application publication date: 20181225 |