CN109086453A - A method and system for extracting label correlation from neighbor instances - Google Patents

A method and system for extracting label correlation from neighbor instances

Info

Publication number
CN109086453A
CN109086453A CN201810991693.3A CN201810991693A CN109086453A CN 109086453 A CN109086453 A CN 109086453A CN 201810991693 A CN201810991693 A CN 201810991693A CN 109086453 A CN109086453 A CN 109086453A
Authority
CN
China
Prior art keywords
label
neighbours
example sample
sample
object instance
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201810991693.3A
Other languages
Chinese (zh)
Inventor
施展
冯丹
杨蕾
戴凯航
方交凤
刘上
曹孟媛
杨文鑫
陈硕
陈静
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huazhong University of Science and Technology
Original Assignee
Huazhong University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huazhong University of Science and Technology
Priority to CN201810991693.3A
Publication of CN109086453A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a method and system for extracting label correlation from neighbor instances. The method comprises: clustering the instance samples according to their features; finding the K neighbor samples with similar features; obtaining the label sets of the K neighbor samples; computing the label percentage set C of the neighbor samples; predicting label correlation with a decision tree; and combining the distribution information of the label set with the decision tree to predict the confidence score that label $l_k$ is present. The invention extracts label correlation from neighbor instances, considers the relationships among all labels, treats the pairwise co-occurrence of labels in a subset of samples as a feature, and derives the confidence score of a label from these label-correlation features. The invention can effectively improve the accuracy of multi-label classification.

Description

A method and system for extracting label correlation from neighbor instances
Technical Field
The invention belongs to the field of data mining and machine learning, and more particularly relates to a method and system for extracting label correlation from neighbor instances.
Background
With the arrival of the big data era, every field of real life produces large amounts of multi-label data. Accurately obtaining sample labels helps improve the hit rate of text retrieval, image retrieval, and object recognition. Faced with explosively growing data, manually extracting valuable information is increasingly infeasible, so automatically extracting sample labels with machine learning methods has become the main direction.
Data classification is an important branch of data mining research and an important means of solving practical problems, and it has been widely studied. Traditional classification methods assign each sample exactly one label. However, real-world objects often do not have a single semantic meaning; they are ambiguous. In fields such as text mining and bioinformatics the objects of study are multi-label. For example, in text classification a document may belong to several categories. Yeast gene function classification is also a multi-label classification problem: the yeast data set consists of 1500 genes, and each gene has multiple function labels. In medical diagnosis, a disease may belong to several categories. Traditional single-label classification cannot meet these needs, so multi-label classification has become a research focus.
When classifying multi-label samples, traditional multi-label classification methods learn a mapping between sample features and labels and use that mapping to predict the class labels of unseen instances, without considering the relationships among labels. Yet labels often occur in pairs, and statistics show that they are correlated. From the perspective of learning and prediction, these relationships provide useful information beyond the basic information, so taking label correlation into account helps improve the accuracy of the algorithm. The more label correlation is considered, the higher the model complexity: considering only part of the label correlation fails to capture the true dependencies, whereas considering all correlations makes the complex relationships among labels hard to handle.
Summary of the Invention
In view of the above defects or improvement needs of the prior art, the present invention provides a method and system for extracting label correlation from neighbor instances, thereby solving the technical problem that existing multi-label sample classification approaches have certain limitations.
To achieve the above object, according to one aspect of the present invention, a method for extracting label correlation from neighbor instances is provided, comprising:
(1) measuring the similarity between instance samples by the Euclidean distance between their feature vectors, and clustering the instance samples by distance, wherein the labels of the instance samples within a cluster are consistent or associated;
(2) for any target instance, finding among the clustered instance samples the k neighbor instance samples whose features are closest to those of the target instance, to obtain the label sets of the k neighbor instance samples of the target instance;
(3) computing the label percentage set $C = \{c_{l_1}, c_{l_2}, c_{l_3}, \ldots, c_{l_m}\}$ of the k neighbor instance samples, where $c_{l_m}$ is the label percentage with which the target instance contains the m-th label, obtained from the label sets of the k neighbor instance samples, and m denotes the number of labels;
(4) building a classifier with the label percentages as the features of the instance samples, constructing a topological graph of label importance, and forming a decision tree from the top down;
(5) combining the distribution information of the label set with the decision tree to predict the confidence score that the target instance has a label in the label set.
Preferably, in step (2), the k neighbor instance samples with the smallest Euclidean distance to the target instance are found among the clustered instance samples, to obtain the label sets of the k neighbor instance samples of the target instance.
Preferably, in step (3), the label percentage $c_{l_m}$ with which the target instance contains the m-th label is $c_{l_m} = \frac{1}{k}\sum_{j=1}^{k} I_{Y_j}(l_m)$, where $Y_j$ $(j = 1, 2, \ldots, k)$ is the label set of the j-th of the k neighbors of the target instance, and $I_{Y_j}(l_m)$ indicates whether the label set of the j-th neighbor of the target instance contains label $l_m$: if $l_m \in Y_j$, then $I_{Y_j}(l_m) = 1$; otherwise, $I_{Y_j}(l_m) = 0$.
Preferably, in step (4), the input space of the classifier is the label percentage set $C = \{c_{l_1}, c_{l_2}, c_{l_3}, \ldots, c_{l_m}\}$ of the neighbor samples, and the corresponding output space is $t \in \{0, 1\}$: the decision tree judges whether the instance sample has a label in the label set; if so, t is 1, and if not, t is 0.
According to another aspect of the present invention, a system for extracting label correlation from neighbor instances is provided, comprising:
a clustering module, configured to measure the similarity between instance samples by the Euclidean distance between their feature vectors, and to cluster the instance samples by distance, wherein the labels of the instance samples within a cluster are consistent or associated;
a label set acquisition module, configured to find, for any target instance, the k neighbor instance samples among the clustered instance samples whose features are closest to those of the target instance, to obtain the label sets of the k neighbor instance samples of the target instance;
a label percentage computation module, configured to compute the label percentage set $C = \{c_{l_1}, c_{l_2}, c_{l_3}, \ldots, c_{l_m}\}$ of the k neighbor instance samples, where $c_{l_m}$ is the label percentage with which the target instance contains the m-th label, obtained from the label sets of the k neighbor instance samples, and m denotes the number of labels;
a decision tree construction module, configured to build a classifier with the label percentages as the features of the instance samples, construct a topological graph of label importance, and form a decision tree from the top down;
a prediction module, configured to combine the distribution information of the label set with the decision tree to predict the confidence score that the target instance has a label in the label set.
In general, compared with the prior art, the above technical solutions conceived by the present invention can achieve the following beneficial effects: the present invention finds similar neighbor samples, mines the pairwise co-occurrence of labels from the label sets of a small cluster of similar samples and uses it as a label-correlation feature, derives the confidence score of a label from that feature, and thereby predicts multiple labels and improves classification accuracy.
Brief Description of the Drawings
Fig. 1 is a schematic flowchart of a method for extracting label correlation from neighbor instances provided by an embodiment of the present invention.
Detailed Description of the Embodiments
To make the objectives, technical solutions, and advantages of the present invention clearer, the present invention is further described in detail below with reference to the accompanying drawing and embodiments. It should be understood that the specific embodiments described here merely explain the present invention and are not intended to limit it. In addition, the technical features involved in the embodiments of the present invention described below may be combined with each other as long as they do not conflict.
The present invention considers local label dependence and proposes a method for obtaining label correlation from neighbor instances. Considering that, in practice, locally similar samples are correlated, it treats the distribution information of labels that co-occur in pairs in the label sets of a subset of samples as features, and computes the probability that a label appears using a low-complexity, parallelizable single-label classification method. The present invention finds similar neighbor samples, mines the pairwise co-occurrence of labels from the label sets of a small cluster of similar samples and uses it as a label-correlation feature, derives the confidence score of a label from that feature, and thereby predicts multiple labels and improves classification accuracy.
Fig. 1 is a schematic flowchart of a method provided by an embodiment of the present invention, which comprises the following steps:
(1) Cluster the instance samples by feature: compute the Euclidean distance between the feature vectors of the instance samples to measure the similarity between them, and cluster the instance samples accordingly, wherein the samples whose features are close to those of an instance can be regarded as a cluster of instances with similar features;
(2) For any target instance, obtain the sample label sets of its k neighbors: find the k neighbor samples whose features are closest to those of the target instance, to obtain the label sets of the k neighbor samples of the target instance;
Here, the label distribution information in the label sets of the cluster is related to the labels of the target instance sample, and labels that co-occur in a high proportion are correlated.
Specifically, the k neighbor instance samples with the smallest Euclidean distance to the target instance are found among the clustered instance samples, to obtain the label sets of the k neighbor instance samples of the target instance.
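Steps (1) and (2) can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the text does not name a clustering algorithm, so plain k-means with Euclidean distance is swapped in here, and the neighbor search is assumed to be restricted to the cluster nearest the target. The function names `kmeans` and `k_nearest_label_sets`, the toy feature vectors, and the label names are all hypothetical.

```python
import math

def kmeans(points, n_clusters, n_iter=20):
    """Plain Lloyd's k-means with Euclidean distance (an assumed choice)."""
    # Deterministic initialisation: the first n_clusters points are the centroids.
    centroids = [list(p) for p in points[:n_clusters]]
    assignment = [0] * len(points)
    for _ in range(n_iter):
        # Assign every sample to its nearest centroid.
        assignment = [
            min(range(n_clusters), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Recompute each centroid as the mean of its members.
        for c in range(n_clusters):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment, centroids

def k_nearest_label_sets(target, points, label_sets, assignment, centroids, k):
    """Label sets of the k samples nearest to `target` inside its own cluster."""
    # Pick the cluster whose centroid is closest to the target.
    cluster = min(range(len(centroids)), key=lambda c: math.dist(target, centroids[c]))
    members = [i for i, a in enumerate(assignment) if a == cluster]
    # Sort the cluster members by Euclidean distance to the target.
    members.sort(key=lambda i: math.dist(target, points[i]))
    return [label_sets[i] for i in members[:k]]

# Hypothetical toy data: five samples with 2-D features and their label sets.
features = [(0.0, 0.1), (0.2, 0.0), (5.0, 5.1), (5.2, 4.9), (0.1, 0.2)]
label_sets = [{"sports"}, {"sports", "news"}, {"finance"}, {"finance", "news"}, {"news"}]

assignment, centroids = kmeans(features, n_clusters=2)
neighbors = k_nearest_label_sets((0.0, 0.0), features, label_sets, assignment, centroids, k=2)
print(assignment)  # cluster index of each sample
print(neighbors)   # label sets of the 2 nearest neighbors of the target
```

Restricting the search to one cluster keeps the neighbor lookup cheap; searching all samples instead would only require dropping the cluster filter.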
(3) Compute the label percentage set $C = \{c_{l_1}, c_{l_2}, c_{l_3}, \ldots, c_{l_m}\}$ of the k neighbor instance samples, where $c_{l_m}$ is the label percentage with which the target instance contains the m-th label, obtained from the label sets of the k neighbor instance samples, and m denotes the number of labels;
In an embodiment of the present invention, a two-level nested loop may be used to compute the label percentage set C: the outer loop traverses the label set $L = \{l_1, l_2, l_3, \ldots, l_m\}$, and the inner loop traverses the labels held by each of the k neighbor instance samples of the target instance.
In an embodiment of the present invention, the label percentage $c_{l_m}$ with which the target instance contains the m-th label is $c_{l_m} = \frac{1}{k}\sum_{j=1}^{k} I_{Y_j}(l_m)$, where $Y_j$ $(j = 1, 2, \ldots, k)$ is the label set of the j-th of the k neighbors of the target instance, and $I_{Y_j}(l_m)$ indicates whether the label set of the j-th neighbor of the target instance contains label $l_m$: if $l_m \in Y_j$, then $I_{Y_j}(l_m) = 1$; otherwise, $I_{Y_j}(l_m) = 0$.
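The nested loop described for step (3) can be sketched as below, assuming the label percentage of a label is the fraction of the k neighbors whose label set contains it. The function name `label_percentages` and the example labels are hypothetical.

```python
def label_percentages(all_labels, neighbor_label_sets):
    """Fraction of the k neighbors whose label set contains each label."""
    k = len(neighbor_label_sets)
    percentages = {}
    for label in all_labels:                  # outer loop: the label set L
        count = 0
        for labels in neighbor_label_sets:    # inner loop: the k neighbors
            if label in labels:               # indicator: 1 if the label is in Y_j
                count += 1
        percentages[label] = count / k
    return percentages

# Hypothetical label set L and label sets Y_1..Y_4 of k = 4 neighbors.
L = ["sports", "news", "finance"]
neighbor_sets = [{"sports"}, {"sports", "news"}, {"news"}, {"sports"}]

C = label_percentages(L, neighbor_sets)
print(C)  # {'sports': 0.75, 'news': 0.5, 'finance': 0.0}
```

The resulting vector C is what the later decision tree step consumes as its feature vector, one entry per label.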
(4) Predict label correlation with a decision tree: build a classifier using the sample label percentages obtained in step (3) as the features of the instance samples, construct a topological graph of label importance, and form a decision tree from the top down; that is, build a classifier whose features are the percentages of the different labels and whose objective function is whether the target instance has a label in the label set;
Here, the input space of the classifier is the label percentage set $C = \{c_{l_1}, c_{l_2}, c_{l_3}, \ldots, c_{l_m}\}$ of the neighbor samples, and the corresponding output space is $t_i \in \{0, 1\}$: the decision tree judges whether instance sample $x_i$ has a label in the label set; if so, $t_i$ is 1, and if not, $t_i$ is 0.
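A classifier of this shape can be sketched as a small binary decision tree grown top-down on label-percentage vectors, one tree per label. The sketch below is only an illustration under assumptions: the text does not state the split criterion, so Gini impurity is assumed, and the topological graph of label importance is not modeled. The names `gini`, `build_tree`, `predict`, and the training rows are hypothetical.

```python
from collections import Counter

def gini(targets):
    """Gini impurity of a list of 0/1 targets."""
    n = len(targets)
    return 1.0 - sum((c / n) ** 2 for c in Counter(targets).values())

def build_tree(rows, targets, depth=0, max_depth=3):
    """Grow a binary decision tree top-down on label-percentage features."""
    majority = Counter(targets).most_common(1)[0][0]
    if depth == max_depth or len(set(targets)) == 1:
        return majority  # leaf: predicted t in {0, 1}
    best = None
    for f in range(len(rows[0])):
        for threshold in sorted({r[f] for r in rows}):
            left = [i for i, r in enumerate(rows) if r[f] <= threshold]
            right = [i for i, r in enumerate(rows) if r[f] > threshold]
            if not left or not right:
                continue
            # Size-weighted impurity of the two children (lower is better).
            score = (len(left) * gini([targets[i] for i in left])
                     + len(right) * gini([targets[i] for i in right]))
            if best is None or score < best[0]:
                best = (score, f, threshold, left, right)
    if best is None:
        return majority  # no valid split exists
    _, f, threshold, left, right = best
    return (f, threshold,
            build_tree([rows[i] for i in left], [targets[i] for i in left],
                       depth + 1, max_depth),
            build_tree([rows[i] for i in right], [targets[i] for i in right],
                       depth + 1, max_depth))

def predict(tree, row):
    """Walk from the root to a leaf and return the stored 0/1 decision."""
    while isinstance(tree, tuple):
        f, threshold, left, right = tree
        tree = left if row[f] <= threshold else right
    return tree

# Hypothetical training data: each row is a label-percentage vector C,
# and t is 1 if that sample actually carries the label being modeled.
X = [(0.8, 0.6), (0.7, 0.1), (0.2, 0.5), (0.1, 0.0), (0.9, 0.4)]
t = [1, 1, 0, 0, 1]

tree = build_tree(X, t)
print(predict(tree, (0.75, 0.3)), predict(tree, (0.15, 0.3)))
```

Because each label gets its own independent tree, the per-label classifiers can be trained in parallel, which matches the low-complexity, parallelizable single-label approach the description mentions.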
(5) distributed intelligence of comprehensive tag set and decision tree prediction object instance possess setting for the label in tag set Believe score.
Those skilled in the art will readily understand that the above are merely preferred embodiments of the present invention and are not intended to limit it; any modification, equivalent substitution, or improvement made within the spirit and principles of the present invention shall fall within its protection scope.

Claims (5)

1. A method for extracting label correlation from neighbor instances, characterized by comprising:
(1) measuring the similarity between instance samples by the Euclidean distance between their feature vectors, and clustering the instance samples by distance, wherein the labels of the instance samples within a cluster are consistent or associated;
(2) for any target instance, finding among the clustered instance samples the k neighbor instance samples whose features are closest to those of the target instance, to obtain the label sets of the k neighbor instance samples of the target instance;
(3) computing the label percentage set $C = \{c_{l_1}, c_{l_2}, c_{l_3}, \ldots, c_{l_m}\}$ of the k neighbor instance samples, where $c_{l_m}$ is the label percentage with which the target instance contains the m-th label, obtained from the label sets of the k neighbor instance samples, and m denotes the number of labels;
(4) building a classifier with the label percentages as the features of the instance samples, constructing a topological graph of label importance, and forming a decision tree from the top down;
(5) combining the distribution information of the label set with the decision tree to predict the confidence score that the target instance contains a label in the label set.
2. The method according to claim 1, characterized in that, in step (2), the k neighbor instance samples with the smallest Euclidean distance to the target instance are found among the clustered instance samples, to obtain the label sets of the k neighbor instance samples of the target instance.
3. The method according to claim 1 or 2, characterized in that, in step (3), the label percentage $c_{l_m}$ with which the target instance contains the m-th label is $c_{l_m} = \frac{1}{k}\sum_{j=1}^{k} I_{Y_j}(l_m)$, where $Y_j$ $(j = 1, 2, \ldots, k)$ is the label set of the j-th of the k neighbors of the target instance, and $I_{Y_j}(l_m)$ indicates whether the label set of the j-th neighbor of the target instance contains label $l_m$: if $l_m \in Y_j$, then $I_{Y_j}(l_m) = 1$; otherwise, $I_{Y_j}(l_m) = 0$.
4. The method according to claim 3, characterized in that, in step (4), the input space of the classifier is the label percentage set $C = \{c_{l_1}, c_{l_2}, c_{l_3}, \ldots, c_{l_m}\}$ of the neighbor samples, and the corresponding output space is $t \in \{0, 1\}$: the decision tree judges whether the instance sample has a label in the label set; if so, t is 1, and if not, t is 0.
5. A system for extracting label correlation from neighbor instances, characterized by comprising:
a clustering module, configured to measure the similarity between instance samples by the Euclidean distance between their feature vectors, and to cluster the instance samples by distance, wherein the labels of the instance samples within a cluster are consistent or associated;
a label set acquisition module, configured to find, for any target instance, the k neighbor instance samples among the clustered instance samples whose features are closest to those of the target instance, to obtain the label sets of the k neighbor instance samples of the target instance;
a label percentage computation module, configured to compute the label percentage set $C = \{c_{l_1}, c_{l_2}, c_{l_3}, \ldots, c_{l_m}\}$ of the k neighbor instance samples, where $c_{l_m}$ is the label percentage with which the target instance contains the m-th label, obtained from the label sets of the k neighbor instance samples, and m denotes the number of labels;
a decision tree construction module, configured to build a classifier with the label percentages as the features of the instance samples, construct a topological graph of label importance, and form a decision tree from the top down;
a prediction module, configured to combine the distribution information of the label set with the decision tree to predict the confidence score that the target instance contains a label in the label set.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810991693.3A CN109086453A (en) 2018-08-29 2018-08-29 A kind of method and system for extracting label correlation from neighbours' example

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810991693.3A CN109086453A (en) 2018-08-29 2018-08-29 A kind of method and system for extracting label correlation from neighbours' example

Publications (1)

Publication Number Publication Date
CN109086453A (en) 2018-12-25

Family

ID=64795070

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810991693.3A Pending CN109086453A (en) 2018-08-29 2018-08-29 A kind of method and system for extracting label correlation from neighbours' example

Country Status (1)

Country Link
CN (1) CN109086453A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110347791A (en) * 2019-06-20 2019-10-18 广东工业大学 A kind of topic recommended method based on multi-tag classification convolutional neural networks
CN111507382A (en) * 2020-04-01 2020-08-07 北京互金新融科技有限公司 Sample file clustering method and device and electronic equipment
CN111507382B (en) * 2020-04-01 2023-05-05 北京互金新融科技有限公司 Sample file clustering method and device and electronic equipment
CN112766383A (en) * 2021-01-22 2021-05-07 浙江工商大学 Label enhancement method based on feature clustering and label similarity

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20181225