CN107316063A - Multi-label classification method, apparatus, medium and computing device - Google Patents
Multi-label classification method, apparatus, medium and computing device
- Publication number
- CN107316063A CN107316063A CN201710493622.6A CN201710493622A CN107316063A CN 107316063 A CN107316063 A CN 107316063A CN 201710493622 A CN201710493622 A CN 201710493622A CN 107316063 A CN107316063 A CN 107316063A
- Authority
- CN
- China
- Prior art keywords
- sample
- label
- original
- set
- positive example
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2413—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on distances to training or reference patterns
- G06F18/24133—Distances to prototypes
- G06F18/24137—Distances to cluster centroïds
Landscapes
- Engineering & Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Theoretical Computer Science (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Artificial Intelligence (AREA)
- Evolutionary Biology (AREA)
- Evolutionary Computation (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
- Image Analysis (AREA)
Abstract
The present application relates to the field of machine learning, and in particular to a multi-label classification method, apparatus, medium and device. In the embodiments of the present application, after the original positive example set and the original negative example set of each label are obtained, class alignment is performed, specific attributes are determined, and the specific attributes of correlated labels are inserted, so that the correlations between labels are represented by specific attributes and the data and semantics of each label are enriched. As a result, multi-label classification is more accurate than prior-art methods that simply use single labels. For example, "desert" and "camel" are correlated, so a picture that is mainly camel with a small amount of desert can still be classified as a desert picture. As another example, consider a picture of lake water at dusk containing the reflection of the setting sun: the prior art would classify the picture only as lake water, but since the reflection of the sun in the water is correlated with the setting sun, the scheme of the present application can also classify the picture as dusk scenery.
Description
Technical field
The present application relates to the field of machine learning, and in particular to a multi-label classification method, apparatus, medium and computing device.
Background art
Multi-label problems are widespread in machine learning. For example, in image annotation, given labels such as "canoe", "water", "mountain peak", "bridge", "pedestrian", "setting sun" and "cloud", a picture depicting riverside scenery may be annotated with one or more of these labels. Likewise, in gene function classification, a single gene may be associated with several labels representing functional categories, such as "energy" and "metabolism". Because the number of labels is large and manual labelling is slow, manual annotation is impractical. Research on using computer technology to perform automatic multi-label classification is therefore particularly important.
In the related art, an object that needs to be labelled (a multi-label object for short) is commonly described by an attribute vector and a label vector. The attribute vector describes the properties of the multi-label object, and the label vector describes which labels it possesses. Specifically, labels are usually represented by a vector composed of "-1" and "+1", where "-1" indicates that the multi-label object does not have the corresponding label and "+1" indicates that it does.
Although multi-label classification has been studied for some time, how to perform it well remains an extremely challenging problem. By comparison, the traditional single-label problem has been studied more extensively and its methods are relatively mature. If a multi-label problem is simply treated as a combination of several single-label problems, the results are often unsatisfactory. An important reason is that this approach ignores the relations between different labels, and those relations are important information that label prediction can exploit. For example, in a picture library containing the two labels "desert" and "camel", a picture that has the label "desert" is likely to also have the label "camel", because "desert" and "camel" often occur together and are positively correlated. How to exploit the correlations between multiple labels to improve multi-label classification is therefore of great concern to both academia and industry.
Summary of the invention
The embodiments of the present application provide a multi-label classification method, apparatus, medium and computing device, to solve the prior-art problem that treating a multi-label problem simply as a combination of several single-label problems leads to inaccurate classification results.
A multi-label classification method provided by an embodiment of the present application includes:

for each label in a label set, determining the original positive example set and the original negative example set of the label, where, for each sample, if the sample has the label, the sample belongs to the original positive example set of the label, and otherwise the sample belongs to the original negative example set of the label;

performing class alignment on the original positive example set and the original negative example set of each label respectively, to obtain a class-aligned positive example set and a class-aligned negative example set for each label, where the class-aligned positive example sets of all labels contain equal numbers of samples and the class-aligned negative example sets of all labels contain equal numbers of samples;

according to a predetermined number of cluster centres, determining by a clustering method the cluster centres of each class-aligned positive example set and the cluster centres of each class-aligned negative example set;

for each label, calculating the distance of each sample in the original positive example set and the original negative example set of the label to each cluster centre of the label, arranging the distances in order to obtain the specific attribute of the label corresponding to each sample, and forming the specific attribute set of the label with the specific attributes of the samples as elements;

for each label, inserting the specific attributes of other labels correlated with the label into the specific attribute set of the label;

performing classification training based on the specific attribute set of each label.
Another embodiment of the present application further provides a multi-label classification apparatus, the apparatus including:

a positive/negative example set determining module, configured to determine, for each label in a label set, the original positive example set and the original negative example set of the label, where, for each sample, if the sample has the label, the sample belongs to the original positive example set of the label, and otherwise the sample belongs to the original negative example set of the label;

a class alignment module, configured to perform class alignment on the original positive example set and the original negative example set of each label respectively, to obtain a class-aligned positive example set and a class-aligned negative example set for each label, where the class-aligned positive example sets of all labels contain equal numbers of samples and the class-aligned negative example sets of all labels contain equal numbers of samples;

a cluster centre determining module, configured to determine, according to a predetermined number of cluster centres and based on a clustering method, the cluster centres of each class-aligned positive example set and the cluster centres of each class-aligned negative example set;

a specific attribute determining module, configured to calculate, for each label, the distance of each sample in the original positive example set and the original negative example set of the label to each cluster centre of the label, arrange the distances in order to obtain the specific attribute of the label corresponding to each sample, and form the specific attribute set of the label with the specific attributes of the samples as elements;

a data optimisation module, configured to insert, for each label, the specific attributes of other labels correlated with the label into the specific attribute set of the label;

a classification training module, configured to perform classification training based on the specific attribute set of each label.
Another embodiment of the present application further provides a computing device, which includes a memory and a processor, where the memory is configured to store program instructions and the processor is configured to call the program instructions stored in the memory and, according to the obtained program instructions, perform any of the multi-label classification methods in the embodiments of the present application.

Another embodiment of the present application further provides a computer storage medium storing computer-executable instructions, the computer-executable instructions being configured to cause a computer to perform any of the multi-label classification methods in the embodiments of the present application.
In the embodiments of the present application, after the original positive example set and the original negative example set of each label are obtained, class alignment is performed, specific attributes are determined, and the specific attributes of correlated labels are inserted, so that the correlations between labels are represented by specific attributes and the data and semantics of each label are enriched. As a result, multi-label classification is more accurate than prior-art methods that simply use single labels. For example, "desert" and "camel" are correlated, so a picture that is mainly camel with a small amount of desert can still be classified as a desert picture. As another example, consider a picture of lake water at dusk containing the reflection of the setting sun: the prior art would classify the picture only as lake water, but since the reflection of the sun in the water is correlated with the setting sun, the scheme of the present application can also classify the picture as dusk scenery.
Brief description of the drawings
Fig. 1 is a schematic flowchart of the multi-label classification method provided by embodiment one of the present application;
Fig. 2 is a schematic structural diagram of the multi-label classification apparatus provided by embodiment three of the present application;
Fig. 3 is a schematic structural diagram of the computing device provided by embodiment four of the present application.
Detailed description of the embodiments
The embodiments of the present application provide a multi-label classification method, apparatus, medium and computing device. In the multi-label classification method provided by the embodiments of the present application, after the original positive example set and the original negative example set of each label are obtained, class alignment is performed, specific attributes are determined, and the specific attributes of correlated labels are inserted, so that the correlations between labels are represented by specific attributes and the data and semantics of each label are enriched. As a result, multi-label classification is more accurate than prior-art methods that simply use single labels. For example, "desert" and "camel" are correlated, so a picture that is mainly camel with a small amount of desert can still be classified as a desert picture. As another example, consider a picture of lake water at dusk containing the reflection of the setting sun: the prior art would classify the picture only as lake water, but since the reflection of the sun in the water is correlated with the setting sun, the scheme of the present application can also classify the picture as dusk scenery.
The embodiments of the present application are described in further detail below with reference to the accompanying drawings.
Embodiment one
Referring to Fig. 1, a schematic flowchart of the multi-label classification method provided by embodiment one of the present application, the method includes the following steps:

Step 101: for each label in a label set, determine the original positive example set and the original negative example set of the label, where, for each sample, if the sample has the label, the sample belongs to the original positive example set of the label, and otherwise the sample belongs to the original negative example set of the label.
Step 102: perform class alignment on the original positive example set and the original negative example set of each label respectively, to obtain a class-aligned positive example set and a class-aligned negative example set for each label, where the class-aligned positive example sets of all labels contain equal numbers of samples, and likewise for the class-aligned negative example sets.

For example, suppose there are two labels, label 1 and label 2, the original positive example set of label 1 contains 10 samples, and the original positive example set of label 2 contains 8 samples. During class alignment, because the original positive example set of label 2 has fewer samples, positive example samples need to be added to it so that the positive example sets of label 1 and label 2 contain equal numbers of samples. The purpose and method of class alignment for the original negative example sets are the same as for the original positive example sets and are not repeated here. Class alignment is described in detail with an example below.
Step 103: according to a predetermined number of cluster centres, determine by a clustering method the cluster centres of each class-aligned positive example set and the cluster centres of each class-aligned negative example set.

The clustering method can be chosen according to actual requirements, for example a prior-art cluster analysis method such as k-means; the present application does not limit this.
In one embodiment, the number of cluster centres can be determined by the following steps A1-A2:

Step A1: determine the minimum of the sample count of the class-aligned positive example set and the sample count of the class-aligned negative example set.

Step A2: calculate the product of a preset control variable and the determined minimum, and apply the floor operation to the product to obtain the number of cluster centres, where the preset control variable is a constant greater than 0 and less than 1.
Steps A1-A2 can equivalently be expressed as formula (1):

c = ⌊ r · min(m+, m−) ⌋  (1)

In formula (1), c is the number of cluster centres; r is the preset control variable; m+ is the sample count of the class-aligned positive example set; m− is the sample count of the class-aligned negative example set; and min(·) takes the smaller of the two sample counts.
Because class alignment has been performed, the value of c is identical for every label as long as the same value of r is used. This way of determining the number of cluster centres is therefore simple and easy to perform, and it improves processing efficiency.

Of course, in specific implementations the number of cluster centres can also be determined by other prior-art methods; the present application does not limit this.
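Formula (1) can be sketched as follows; the sample counts and the value of r used in the example are illustrative assumptions, not values from the application.

```python
import math

def num_cluster_centres(pos_count: int, neg_count: int, r: float) -> int:
    """Formula (1): c = floor(r * min(m+, m-)), with 0 < r < 1."""
    assert 0 < r < 1, "the control variable r must lie in (0, 1)"
    return math.floor(r * min(pos_count, neg_count))

# Example: 10 aligned positive samples, 10 aligned negative samples, r = 0.3
print(num_cluster_centres(10, 10, 0.3))  # -> 3
```

Because class alignment makes the sample counts equal across labels, this function returns the same c for every label once r is fixed, as the text notes.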
Step 104: for each label, calculate the distance of each sample in the original positive example set and the original negative example set of the label to each cluster centre of the label, arrange the distances in order to obtain the specific attribute of the label corresponding to each sample, and form the specific attribute set of the label with the specific attributes of the samples as elements.
Suppose the cluster centres obtained for a label lk are {p1(k), …, pc(k), n1(k), …, nc(k)}, where p1(k), …, pc(k) are the cluster centres of the class-aligned positive example set and n1(k), …, nc(k) are the cluster centres of the class-aligned negative example set.

Then the specific attribute of each sample xi for label lk can be obtained by the attribute transfer function φk shown in formula (2):

φk(xi) = [ d(xi, p1(k)), …, d(xi, pc(k)), d(xi, n1(k)), …, d(xi, nc(k)) ]  (2)

In formula (2), φk(xi) is the specific attribute of label lk corresponding to sample xi, and d(·,·) denotes the distance between a sample and a cluster centre; for example, d(xi, p1(k)) is the distance between sample xi and cluster centre p1(k). In specific implementations the distance can be the Euclidean distance.

In this way all samples are transformed, and the specific attribute set of lk can be expressed by formula (3):

Bk = { (φk(xi), yi) | 1 ≤ i ≤ N }  (3)

In formula (3), the data set is D = {(xi, yi) | 1 ≤ i ≤ N}, where xi is the i-th sample, represented by its attribute vector, yi is the label vector of the sample, and N is the number of samples.
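The attribute transfer function of formula (2) can be sketched as follows; the toy 2-D sample, the single centre per set (c = 1) and the use of Euclidean distance are illustrative assumptions.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def specific_attributes(sample, pos_centres, neg_centres):
    """Formula (2): distances of `sample` to every positive-set cluster
    centre, then to every negative-set cluster centre, in order."""
    return [euclidean(sample, c) for c in pos_centres + neg_centres]

# Toy example with one cluster centre per set (c = 1)
pos_centres = [(0.0, 0.0)]
neg_centres = [(3.0, 4.0)]
print(specific_attributes((0.0, 0.0), pos_centres, neg_centres))  # -> [0.0, 5.0]
```

Applying `specific_attributes` to every sample and pairing each result with the sample's label vector yields the specific attribute set of formula (3).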
Step 105: for each label, insert the specific attributes of other labels correlated with the label into the specific attribute set of the label.

Step 106: perform classification training based on the specific attribute set of each label.
In specific implementations, a corresponding binary classifier can be trained for each label. Common binary classifiers include support vector machines, decision trees, and so on, and can be selected according to the particular problem; the present application does not limit this.
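Since the application leaves the per-label binary classifier open, the per-label training step can be sketched with a minimal perceptron standing in for, say, an SVM; the perceptron itself, its hyperparameters and the toy data are all illustrative assumptions.

```python
def train_perceptron(samples, labels, epochs=20, lr=0.1):
    """Train a tiny binary perceptron on one label's specific-attribute
    vectors; labels are +1 / -1, matching the encoding in the description."""
    w = [0.0] * len(samples[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            score = sum(wi * xi for wi, xi in zip(w, x)) + b
            if y * score <= 0:  # misclassified: nudge the hyperplane
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y

    def predict(x):
        return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else -1
    return predict

# Linearly separable toy specific-attribute vectors for one label
X = [[0.0, 1.0], [0.2, 0.9], [1.0, 0.1], [0.9, 0.0]]
y = [1, 1, -1, -1]
clf = train_perceptron(X, y)
print([clf(x) for x in X])  # -> [1, 1, -1, -1]
```

One such classifier would be trained per label, each on that label's own (enriched) specific attribute set.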
In summary, class alignment keeps the numbers of positive example samples and negative example samples consistent across labels, providing uniformly sized basic data for the later determination of specific attributes. Determining specific attributes establishes associations between correlated labels and enriches the data and semantics of each label's specific attributes, so that training on samples with enriched data and semantics produces more accurate results.
To make the technical scheme provided by the present application easier to understand, the relevant steps above are further described in points (1)-(2) below.
(1) Inserting, for each label as described in step 105, the specific attributes of other labels correlated with the label into the specific attribute set of the label may specifically include the following steps B1-B3:

Step B1: for a given sample, form a neighbour sample set corresponding to the given sample from the given sample and a plurality of its neighbour samples.
In one embodiment, the neighbour sample set corresponding to a given sample can be determined as follows:

Step B11: calculate the sample difference between each other sample and the given sample.

The sample difference represents the gap between samples, and the method of calculating it can be chosen in specific implementations according to the kind of sample. For example, if a sample is a distance from some location reference point, the sample difference can be the deviation between the distances of two samples; if a sample is the colour value of a pixel, the sample difference of two pixels can be the colour difference of the two samples. The embodiments of the present application therefore do not restrict the specific method of calculating the sample difference.

Step B12: choose, in order of sample difference from small to large, a preset number of samples as the neighbour samples of the given sample.

For example, suppose the sample differences of 5 samples from the given sample are 1, 2, 3, 4 and 5 respectively. If the preset number of neighbour samples is 3, the samples with sample differences 1, 2 and 3 are chosen as the neighbour samples of the given sample.

In addition, in specific implementations the neighbour samples can also be determined from the positional relations between samples. For example, if the given sample is a pixel in an image, the 4-neighbourhood or 8-neighbourhood of that pixel can be selected as its neighbour samples.
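Steps B11-B12 can be sketched as follows; the scalar samples and the absolute-difference measure are illustrative assumptions, since the text leaves the sample-difference calculation open.

```python
def neighbour_set(target_idx, samples, k, diff):
    """Steps B11-B12: rank all other samples by their difference from the
    target sample and keep the k smallest; the target itself is included
    in the neighbour set, matching step B1."""
    others = [i for i in range(len(samples)) if i != target_idx]
    others.sort(key=lambda i: diff(samples[target_idx], samples[i]))
    return [target_idx] + others[:k]

# Sample "difference" here is plain absolute distance between scalars.
samples = [0.0, 1.0, 2.0, 3.0, 10.0]
print(neighbour_set(0, samples, 3, lambda a, b: abs(a - b)))  # -> [0, 1, 2, 3]
```

For image pixels the same function shape applies, with `diff` replaced by a colour difference or with the 4- or 8-neighbourhood chosen directly by position.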
Step B2: in each of the neighbour sample sets, determine for each label the frequency with which the label and each other label are simultaneously positive examples of the same sample, as the co-occurrence frequency of the label; and determine the maximum co-occurrence frequency of the label in the neighbour sample set.

Step B3: if the maximum co-occurrence frequency is greater than a designated value, insert the specific attribute, corresponding to the given sample of the neighbour sample set, of the other label having the maximum co-occurrence frequency with the label into the specific attribute set of the label.

In one embodiment, the designated value can be set to 0. That is, as long as the maximum co-occurrence frequency in a neighbour sample set is not 0, the two labels are considered correlated.
For example, let M' be the neighbour sample set corresponding to a given sample M. If, in M', the co-occurrence frequency of label l2 with l1 is the maximum and is not 0, then the specific attribute of l2 corresponding to the given sample M is added to the specific attribute set of l1.

In specific implementations, to improve the accuracy of the co-occurrence frequency calculation, the co-occurrence frequency can be determined according to the following formula:

p(i, j, k) = |{ x ∈ Ni : x is a positive example of both lj and lk }|

where i denotes the given sample; Ni is the neighbour sample set corresponding to the given sample i; lj is the label whose co-occurrence frequency is to be determined; lk is another label; and p(i, j, k) is the co-occurrence frequency of lj and lk in the neighbour sample set corresponding to the given sample i.
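Under the definition in step B2, the co-occurrence count can be sketched as follows; the +1/-1 label matrix is from the description, while the absence of any normalisation is an assumption, since the original formula is not fully legible.

```python
def cooccurrence(neigh_indices, label_matrix, j, k):
    """p(i, j, k): number of samples in the neighbour set of sample i that
    carry both label j and label k (entries equal to +1), per step B2."""
    return sum(
        1
        for idx in neigh_indices
        if label_matrix[idx][j] == 1 and label_matrix[idx][k] == 1
    )

# Rows: samples; columns: labels l1, l2 encoded as +1 / -1
Y = [[1, 1], [1, -1], [1, 1], [-1, 1]]
print(cooccurrence([0, 1, 2, 3], Y, 0, 1))  # -> 2
```

With the designated value set to 0, any nonzero count such as this one would trigger the insertion of step B3.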
(2) Performing class alignment on the original positive example set and the original negative example set of each label, as in step 102, may specifically include the following steps C1-C2:

Step C1: determine the maximum sample count over the original positive example sets, and, for each label whose original positive example set has fewer samples than this maximum, resample the samples of that original positive example set to obtain positive example samples, and add the positive example samples to the original positive example set of the label to obtain the class-aligned positive example set.

Step C2: determine the maximum sample count over the original negative example sets, and, for each label whose original negative example set has fewer samples than this maximum, resample the samples of that original negative example set to obtain negative example samples, and add the negative example samples to the original negative example set of the label to obtain the class-aligned negative example set.

The execution order of step C1 and step C2 is not restricted in specific implementations.

Through steps C1 and C2, class alignment is achieved by resampling. The method is simple and applicable to samples of every type.
In specific implementations, resampling the samples of a label's original positive example set to obtain a positive example sample may specifically include: choosing a first specified number of samples from the original positive example set of the label, and taking the mean of the chosen samples as the positive example sample obtained by resampling.

Resampling the samples of a label's original negative example set to obtain a negative example sample may specifically include: choosing a second specified number of samples from the original negative example set of the label, and taking the mean of the chosen samples as the negative example sample obtained by resampling.

The first specified number and the second specified number of samples may be the same or different; the present application does not limit this. The mean of samples can be represented by the mean of their attribute vectors.
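The resampling-by-mean class alignment of steps C1-C2 can be sketched as follows; picking two samples at a time and the cyclic selection order follow the worked example in embodiment two, but are otherwise assumptions.

```python
from itertools import cycle, islice

def mean_sample(samples):
    """Mean of the attribute vectors of the chosen samples."""
    n = len(samples)
    return [sum(col) / n for col in zip(*samples)]

def class_align(example_set, target_size, pick=2):
    """Steps C1/C2: pad `example_set` up to `target_size` by repeatedly
    averaging `pick` samples, selecting them in turn in a cycle."""
    aligned = list(example_set)
    chooser = cycle(range(len(example_set)))
    while len(aligned) < target_size:
        idxs = list(islice(chooser, pick))
        aligned.append(mean_sample([example_set[i] for i in idxs]))
    return aligned

pos = [[2.0, 4.0], [4.0, 8.0]]  # a label with only 2 positive examples
print(class_align(pos, 3))      # -> [[2.0, 4.0], [4.0, 8.0], [3.0, 6.0]]
```

The same function aligns negative example sets; only the input set and the target count change.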
For ease of further understanding, the multi-label classification method provided by the present application is further explained below through embodiment two.
Embodiment two
Table 1 shows a multi-label data set with 6 samples, x1, x2, ..., x6, and the label set {l1, l2}.

Table 1: the samples and the labels they have
Step 1: class alignment.

By counting Table 1, the original positive example set and the original negative example set of l1, and the original positive example set and the original negative example set of l2, are obtained.

The original positive example set of l1 contains 2 samples (i.e. l1 has 2 positive examples), fewer than that of l2, so to class-align the positive example sets it suffices to add one positive example to the set of l1. The 2 positive examples x2 and x3 of l1 may be selected to generate the positive example (x2+x3)/2. Similarly, in this example l2 has fewer negative examples and also needs class alignment; the 2 negative examples x3 and x4 of l2 may be selected to generate the negative example (x3+x4)/2. If multiple samples need to be generated, samples are selected in turn in a cycle.

Using the above method, the result after class alignment is shown in Table 2.

Table 2: the positive example set and negative example set of each label after class alignment
Step 2: determine the specific attributes.

Let c = 1 (i.e. 1 cluster centre) and run the k-means clustering method on the data of Table 2. For label l1, the cluster centre of the class-aligned positive example set and the cluster centre of the class-aligned negative example set are obtained; similarly, the cluster centres for label l2 are obtained.

According to formulas (2) and (3), the distance of each sample to the corresponding cluster centres of each label is calculated, giving the specific attribute sets of labels l1 and l2 shown in Table 3.

Table 3: the specific attribute set of each label
Step 3: insert the specific attributes.

Suppose the given sample is x2 and k is set to 6 (i.e. the neighbour sample set has 6 elements), so the neighbour sample set of x2 is {x1, x2, x3, x4, x5, x6}. After the co-occurrence frequencies are calculated, R21 = 2 (i.e. the label with the maximum co-occurrence frequency with l1 is l2) and R22 = 1. Thus, for the given sample x2, l1 and l2 are locally related, and accordingly the specific attribute of each label needs to be inserted into the specific attribute set of the other; the result is shown in Table 4.

Table 4: the result of inserting the specific attributes
At this point the data and semantics of correlated labels have been enriched through the correlations between labels. Sample training can then be carried out based on the specific attribute sets.

Tests show that the method of the invention performs multi-label classification effectively, with good overall results on the Hamming Loss, One-error, Coverage, Ranking Loss and Average Precision metrics.
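Hamming Loss, the first metric named above, can be sketched as follows; this is the standard definition of the metric rather than a formula from the application, with labels in the +1/-1 encoding used throughout the description.

```python
def hamming_loss(y_true, y_pred):
    """Fraction of label slots predicted wrongly, averaged over all
    samples and all labels (+1 / -1 encoding)."""
    n_samples = len(y_true)
    n_labels = len(y_true[0])
    wrong = sum(
        1
        for t_row, p_row in zip(y_true, y_pred)
        for t, p in zip(t_row, p_row)
        if t != p
    )
    return wrong / (n_samples * n_labels)

truth = [[1, -1], [1, 1], [-1, -1]]
pred  = [[1, -1], [1, -1], [-1, -1]]
print(hamming_loss(truth, pred))  # one wrong slot out of six
```

Lower values are better; a perfect multi-label classifier scores 0.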
The embodiments of the present application build a specific attribute set for each label and, on that basis, combine the label relations to improve the classification power of the specific attribute sets. Moreover, most multi-label classification methods consider class imbalance (i.e. classes not being aligned), label relations and label-specific attributes independently, whereas the classification method proposed by the invention considers all three, making the final classification results more accurate.
Embodiment three
Based on the same inventive concept, an embodiment of the present application further provides a multi-label classification apparatus. Fig. 2 is a schematic structural diagram of the apparatus, which includes:
a positive/negative example set determining module 201, configured to determine, for each label in a label set, the original positive example set and the original negative example set of the label; wherein, for each sample, if the sample carries the label, the sample belongs to the original positive example set of the label; otherwise, the sample belongs to the original negative example set of the label;
a class alignment module 202, configured to perform class alignment on the original positive example set and the original negative example set of each label respectively, to obtain a class-aligned positive example set and a class-aligned negative example set of each label; wherein the class-aligned positive example sets of all labels contain equal numbers of samples, and the class-aligned negative example sets of all labels contain equal numbers of samples;
a cluster centre determining module 203, configured to determine, according to a predetermined number of cluster centres and based on a clustering method, the cluster centres of each class-aligned positive example set and the cluster centres of each class-aligned negative example set;
a specific attribute determining module 204, configured to calculate, for each label, the distance of each sample in the original positive example set and the original negative example set of the label to each cluster centre of the label, take the ordered distances as the specific attribute of the label corresponding to the respective sample, and form the specific attribute set of the label with the specific attributes of the samples of the label as elements;
a data optimization module 205, configured to insert, for each label, the specific attributes of other labels correlated with the label into the specific attribute set of the label;
a classification training module 206, configured to perform classification training based on the specific attribute set of each label.
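As an illustration only, the work of modules 203 and 204 can be sketched as follows. The patent does not name a particular clustering algorithm or distance measure; plain Lloyd's k-means and Euclidean distance are assumptions here, and all function names are invented for the sketch.

```python
import numpy as np

def kmeans_centres(data, k, n_iter=50, seed=0):
    """Plain Lloyd's k-means; returns the k cluster centres."""
    data = np.asarray(data, dtype=float)
    rng = np.random.default_rng(seed)
    centres = data[rng.choice(len(data), size=k, replace=False)].copy()
    for _ in range(n_iter):
        # assign every point to its nearest centre, then recompute centres
        d = np.linalg.norm(data[:, None, :] - centres[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        for c in range(k):
            members = data[labels == c]
            if len(members):
                centres[c] = members.mean(axis=0)
    return centres

def specific_attributes(pos_aligned, neg_aligned, original, n_centres):
    """One ordered distance vector (the 'specific attribute') per original sample."""
    centres = np.vstack([kmeans_centres(pos_aligned, n_centres),
                         kmeans_centres(neg_aligned, n_centres)])
    original = np.asarray(original, dtype=float)
    # distances kept in a fixed order so attributes are comparable across samples
    return np.linalg.norm(original[:, None, :] - centres[None, :, :], axis=2)
```

Each row of the returned matrix is one sample's specific attribute: its distances, in a fixed order, to the positive-set centres followed by the negative-set centres.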
In one embodiment, the data optimization module specifically includes:
a neighbour sample set determining unit, configured to, for a specified sample, form a neighbour sample set corresponding to the specified sample from the specified sample and a plurality of its neighbour samples;
a co-occurrence frequency determining unit, configured to, in each neighbour sample set of the plurality of neighbour sample sets and for each label, determine the frequency with which the label and another label are simultaneously positive examples of the same sample as the co-occurrence frequency of the label, and determine the maximum co-occurrence frequency of the label in that neighbour sample set;
an optimization unit, configured to, if the maximum co-occurrence frequency is greater than a specified value, insert the specific attribute of the specified sample, corresponding to the other label having the maximum co-occurrence frequency with the label and to the neighbour sample set, into the specific attribute set of the label.
In one embodiment, the co-occurrence frequency is determined according to the following formula (stated in claim 3):

p(i, j, k) = (vec(l_j) · vec(l_k)) / ||vec(l_k)||_1

where i denotes the specified sample; l_j denotes the label whose co-occurrence frequency is to be determined; l_k denotes another label; and p(i, j, k) denotes the co-occurrence frequency of l_j and l_k in the neighbour sample set corresponding to the specified sample i.
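A small numeric sketch of the claim-3 formula p(i, j, k) = (vec(l_j) · vec(l_k)) / ||vec(l_k)||_1, reading vec(l_j) and vec(l_k) as the binary indicator vectors of labels j and k over the neighbour sample set of sample i (variable names and the zero-denominator convention are assumptions):

```python
import numpy as np

def co_occurrence(neigh_labels, j, k):
    """neigh_labels: (n_neighbours, n_labels) 0/1 matrix for sample i's neighbour set."""
    lj = neigh_labels[:, j].astype(float)   # indicator vector of label j
    lk = neigh_labels[:, k].astype(float)   # indicator vector of label k
    denom = np.abs(lk).sum()                # ||l_k||_1
    # dot product counts neighbours carrying both labels at once
    return float(lj @ lk) / denom if denom else 0.0
```

With three neighbours labelled [[1, 1], [1, 0], [0, 1]], one of the two neighbours carrying label 1 also carries label 0, so p = 1/2.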
In one embodiment, the apparatus further includes:
a neighbour sample set determining module, configured to determine, for a specified sample, the neighbour sample set corresponding to the specified sample as follows:
calculate the sample difference between each other sample and the specified sample;
select, in ascending order of sample difference, a preset number of samples as the neighbour samples of the specified sample.
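A minimal sketch of this neighbour-set rule, assuming Euclidean distance as the "sample difference" (the patent leaves the difference measure unspecified; the function name is invented):

```python
import numpy as np

def neighbour_set(samples, i, n_neighbours):
    """Indices of the n_neighbours samples closest to sample i."""
    samples = np.asarray(samples, dtype=float)
    diffs = np.linalg.norm(samples - samples[i], axis=1)  # "sample difference"
    diffs[i] = np.inf                                     # exclude the sample itself
    return np.argsort(diffs)[:n_neighbours]               # ascending difference
```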
In one embodiment, the class alignment module is specifically configured to determine the maximum sample number among the original positive example sets and, for each label, if the original positive example set of the label contains fewer samples than that maximum, resample the samples of the original positive example set of the label to obtain positive example samples, and add the positive example samples to the original positive example set of the label to obtain the class-aligned positive example set; and
determine the maximum sample number among the original negative example sets and, for each label, if the original negative example set of the label contains fewer samples than that maximum, resample the samples of the original negative example set of the label to obtain negative example samples, and add the negative example samples to the original negative example set of the label to obtain the class-aligned negative example set.
In one embodiment, the class alignment module is specifically configured to:
select a first specified number of samples from the original positive example set of the label, and take the mean of the selected samples as a resampled positive example;
select a second specified number of samples from the original negative example set of the label, and take the mean of the selected samples as a resampled negative example.
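The two steps above can be sketched together for the positive side (the negative side is symmetric). The subset size averaged for each synthetic sample is an assumption, since the patent only speaks of a "first specified sample quantity"; names are invented for the sketch.

```python
import numpy as np

def align_positives(pos_sets, subset_size=2, rng=None):
    """Pad every label's positive set up to the largest set's size by resampling."""
    rng = rng or np.random.default_rng(0)
    target = max(len(p) for p in pos_sets)      # sample-number maximum
    aligned = []
    for pos in pos_sets:
        pos = np.asarray(pos, dtype=float)
        extra = []
        while len(pos) + len(extra) < target:
            # resampled positive example = mean of a few randomly chosen samples
            chosen = pos[rng.choice(len(pos), size=subset_size, replace=False)]
            extra.append(chosen.mean(axis=0))
        aligned.append(np.vstack([pos] + extra) if extra else pos)
    return aligned
```

After alignment every label's positive set has the same number of samples, which is what makes the later per-label clustering comparable.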
In one embodiment, the apparatus further includes:
a cluster centre number determining module, configured to determine the number of cluster centres as follows:
determine the minimum of the sample number of the class-aligned positive example set and the sample number of the class-aligned negative example set;
compute the product of a preset control variable and the determined minimum, and apply a floor operation to the product to obtain the number of cluster centres, wherein the preset control variable is a constant greater than 0 and less than 1.
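This rule reduces to one line; `alpha` stands for the preset control variable, and its example value below is an arbitrary illustration, not a value taken from the patent.

```python
import math

def n_cluster_centres(n_pos_aligned, n_neg_aligned, alpha=0.1):
    """floor(alpha * min(|aligned positives|, |aligned negatives|))."""
    assert 0.0 < alpha < 1.0, "control variable must lie in (0, 1)"
    return math.floor(alpha * min(n_pos_aligned, n_neg_aligned))
```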
The inventive concept of the above apparatus is the same as that of the method embodiments; for its principles and beneficial effects, refer to the method embodiments, which are not repeated here.
Embodiment four
Embodiment four of the present application further provides a computing device, which may specifically be a desktop computer, a portable computer, a smartphone, a tablet computer, a personal digital assistant (Personal Digital Assistant, PDA), or the like. As shown in Fig. 3, the computing device may include a central processing unit (Central Processing Unit, CPU) 301, a memory 302, an input device 303, an output device 304, and the like; the input device may include a keyboard, a mouse, a touch screen, etc., and the output device may include a display device such as a liquid crystal display (Liquid Crystal Display, LCD) or a cathode ray tube (Cathode Ray Tube, CRT).
The memory may include read-only memory (ROM) and random access memory (RAM), and provides the processor with the program instructions and data stored in the memory. In this embodiment of the application, the memory may be used to store the program instructions of the multi-label classification method.
The processor calls the program instructions stored in the memory and is configured to perform, according to the obtained program instructions:
for each label in a label set, determining the original positive example set and the original negative example set of the label; wherein, for each sample, if the sample carries the label, the sample belongs to the original positive example set of the label; otherwise, the sample belongs to the original negative example set of the label;
performing class alignment on the original positive example set and the original negative example set of each label respectively, to obtain a class-aligned positive example set and a class-aligned negative example set of each label; wherein the class-aligned positive example sets of all labels contain equal numbers of samples, and the class-aligned negative example sets of all labels contain equal numbers of samples;
according to a predetermined number of cluster centres, determining, based on a clustering method, the cluster centres of each class-aligned positive example set and the cluster centres of each class-aligned negative example set;
for each label, calculating the distance of each sample in the original positive example set and the original negative example set of the label to each cluster centre of the label, taking the ordered distances as the specific attribute of the label corresponding to the respective sample, and forming the specific attribute set of the label with the specific attributes of the samples of the label as elements;
for each label, inserting the specific attributes of other labels correlated with the label into the specific attribute set of the label;
performing classification training based on the specific attribute set of each label.
Embodiment five
Embodiment five of the present application provides a computer storage medium that stores the computer program instructions used by the above computing device, including a program for executing the above multi-label classification method.
The computer storage medium may be any available medium or data storage device accessible to a computer, including but not limited to magnetic storage (such as floppy disks, hard disks, magnetic tape, and magneto-optical disks (MO)), optical storage (such as CD, DVD, BD, and HVD), and semiconductor memory (such as ROM, EPROM, EEPROM, non-volatile memory (NAND FLASH), and solid-state drives (SSD)).
Finally, it should be noted that the above embodiments are merely intended to illustrate the technical solutions of the present application, not to limit them. Although the present application has been described in detail with reference to the foregoing embodiments, those skilled in the art will understand that the technical solutions described in the foregoing embodiments may still be modified, or some of their technical features may be replaced by equivalents, and such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present application.
Claims (10)
1. A multi-label classification method, characterized in that the method comprises:
for each label in a label set, determining the original positive example set and the original negative example set of the label; wherein, for each sample, if the sample carries the label, the sample belongs to the original positive example set of the label; otherwise, the sample belongs to the original negative example set of the label;
performing class alignment on the original positive example set and the original negative example set of each label respectively, to obtain a class-aligned positive example set and a class-aligned negative example set of each label; wherein the class-aligned positive example sets of all labels contain equal numbers of samples, and the class-aligned negative example sets of all labels contain equal numbers of samples;
according to a predetermined number of cluster centres, determining, based on a clustering method, the cluster centres of each class-aligned positive example set and the cluster centres of each class-aligned negative example set;
for each label, calculating the distance of each sample in the original positive example set and the original negative example set of the label to each cluster centre of the label, taking the ordered distances as the specific attribute of the label corresponding to the respective sample, and forming the specific attribute set of the label with the specific attributes of the samples of the label as elements;
for each label, inserting the specific attributes of other labels correlated with the label into the specific attribute set of the label;
performing classification training based on the specific attribute set of each label.
2. The method according to claim 1, characterized in that inserting, for each label, the specific attributes of other labels correlated with the label into the specific attribute set of the label specifically comprises:
for a specified sample, forming a neighbour sample set corresponding to the specified sample from the specified sample and a plurality of its neighbour samples;
in each neighbour sample set of the plurality of neighbour sample sets, for each label, determining the frequency with which the label and another label are simultaneously positive examples of the same sample as the co-occurrence frequency of the label, and determining the maximum co-occurrence frequency of the label in the neighbour sample set;
if the maximum co-occurrence frequency is greater than a specified value, inserting the specific attribute of the specified sample, corresponding to the other label having the maximum co-occurrence frequency with the label and to the neighbour sample set, into the specific attribute set of the label.
3. The method according to claim 2, characterized in that the co-occurrence frequency is determined according to the following formula:

p(i, j, k) = (vec(l_j) · vec(l_k)) / ||vec(l_k)||_1

where i denotes the specified sample; l_j denotes the label whose co-occurrence frequency is to be determined; l_k denotes another label; and p(i, j, k) denotes the co-occurrence frequency of l_j and l_k in the neighbour sample set corresponding to the specified sample i.
4. The method according to claim 2, characterized in that the method further comprises:
for a specified sample, determining the neighbour sample set corresponding to the specified sample as follows:
calculating the sample difference between each other sample and the specified sample;
selecting, in ascending order of sample difference, a preset number of samples as the neighbour samples of the specified sample.
5. The method according to claim 1, characterized in that performing class alignment on the original positive example set and the original negative example set of each label respectively specifically comprises:
determining the maximum sample number among the original positive example sets and, for each label, if the original positive example set of the label contains fewer samples than that maximum, resampling the samples of the original positive example set of the label to obtain positive example samples, and adding the positive example samples to the original positive example set of the label to obtain the class-aligned positive example set; and
determining the maximum sample number among the original negative example sets and, for each label, if the original negative example set of the label contains fewer samples than that maximum, resampling the samples of the original negative example set of the label to obtain negative example samples, and adding the negative example samples to the original negative example set of the label to obtain the class-aligned negative example set.
6. The method according to claim 5, characterized in that resampling the samples of the original positive example set of a label to obtain positive example samples specifically comprises:
selecting a first specified number of samples from the original positive example set of the label, and taking the mean of the selected samples as a resampled positive example;
and resampling the samples of the original negative example set of a label to obtain negative example samples specifically comprises:
selecting a second specified number of samples from the original negative example set of the label, and taking the mean of the selected samples as a resampled negative example.
7. The method according to any one of claims 1-6, characterized in that the method further comprises:
determining the number of cluster centres as follows:
determining the minimum of the sample number of the class-aligned positive example set and the sample number of the class-aligned negative example set;
computing the product of a preset control variable and the determined minimum, and applying a floor operation to the product to obtain the number of cluster centres, wherein the preset control variable is a constant greater than 0 and less than 1.
8. A multi-label classification apparatus, characterized in that the apparatus comprises:
a positive/negative example set determining module, configured to determine, for each label in a label set, the original positive example set and the original negative example set of the label; wherein, for each sample, if the sample carries the label, the sample belongs to the original positive example set of the label; otherwise, the sample belongs to the original negative example set of the label;
a class alignment module, configured to perform class alignment on the original positive example set and the original negative example set of each label respectively, to obtain a class-aligned positive example set and a class-aligned negative example set of each label; wherein the class-aligned positive example sets of all labels contain equal numbers of samples, and the class-aligned negative example sets of all labels contain equal numbers of samples;
a cluster centre determining module, configured to determine, according to a predetermined number of cluster centres and based on a clustering method, the cluster centres of each class-aligned positive example set and the cluster centres of each class-aligned negative example set;
a specific attribute determining module, configured to calculate, for each label, the distance of each sample in the original positive example set and the original negative example set of the label to each cluster centre of the label, take the ordered distances as the specific attribute of the label corresponding to the respective sample, and form the specific attribute set of the label with the specific attributes of the samples of the label as elements;
a data optimization module, configured to insert, for each label, the specific attributes of other labels correlated with the label into the specific attribute set of the label;
a classification training module, configured to perform classification training based on the specific attribute set of each label.
9. A computing device, characterized by comprising a memory and a processor, wherein the memory is configured to store program instructions, and the processor is configured to call the program instructions stored in the memory and, according to the obtained program instructions, perform the multi-label classification method according to any one of claims 1-7.
10. A computer storage medium, characterized in that the computer storage medium stores computer-executable instructions, and the computer-executable instructions are used to cause a computer to perform the multi-label classification method according to any one of claims 1-7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710493622.6A CN107316063A (en) | 2017-06-26 | 2017-06-26 | Multiple labeling sorting technique, device, medium and computing device |
Publications (1)
Publication Number | Publication Date |
---|---|
CN107316063A true CN107316063A (en) | 2017-11-03 |
Family
ID=60181259
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710493622.6A Pending CN107316063A (en) | 2017-06-26 | 2017-06-26 | Multiple labeling sorting technique, device, medium and computing device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107316063A (en) |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108537270A (en) * | 2018-04-04 | 2018-09-14 | 厦门理工学院 | Image labeling method, terminal device and storage medium based on multi-tag study |
CN109711433A (en) * | 2018-11-30 | 2019-05-03 | 东南大学 | A kind of fine grit classification method based on meta learning |
CN110941612A (en) * | 2019-11-19 | 2020-03-31 | 上海交通大学 | Autonomous data lake construction system and method based on associated data |
CN110941612B (en) * | 2019-11-19 | 2020-08-11 | 上海交通大学 | Autonomous data lake construction system and method based on associated data |
CN111460187A (en) * | 2020-04-01 | 2020-07-28 | 重庆金山医疗技术研究院有限公司 | Picture labeling method, device and equipment and readable storage medium |
CN111460187B (en) * | 2020-04-01 | 2023-06-30 | 重庆金山医疗技术研究院有限公司 | Picture labeling method, device, equipment and readable storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | ||
Application publication date: 20171103 |