CN107644235A - Image automatic annotation method based on semi-supervised learning - Google Patents

Image automatic annotation method based on semi-supervised learning

Info

Publication number
CN107644235A
CN107644235A (Application CN201711002595.4A)
Authority
CN
China
Prior art keywords
image
training
mark
sample
lda
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201711002595.4A
Other languages
Chinese (zh)
Inventor
李志欣
林兰
张灿龙
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangxi Normal University
Original Assignee
Guangxi Normal University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangxi Normal University filed Critical Guangxi Normal University
Priority to CN201711002595.4A priority Critical patent/CN107644235A/en
Publication of CN107644235A publication Critical patent/CN107644235A/en
Pending legal-status Critical Current

Landscapes

  • Image Analysis (AREA)

Abstract

The present invention discloses an automatic image annotation method based on semi-supervised learning. First, the data set is divided into a training data set, an unlabeled data set and a test set. Then the SIFT and HOG features of the training samples are extracted to train an LDA_SVM classifier, and color and texture features are extracted to train a neural network. Next, the unlabeled data are used: the two classifiers predict labels for the same unlabeled sample simultaneously, and, according to each classifier's contribution to the classification accuracy on unlabeled samples, the classification results of the two classifiers are fused with an adaptive weighted fusion strategy to obtain the final predicted label probability vector of the sample. Finally, the two classifiers are updated with the high-confidence samples and their predicted labels until a preset maximum number of iterations is reached. The present invention makes full use of unlabeled samples to mine the intrinsic regularities of image features, effectively reduces the number of labeled samples required for classifier training, and obtains a better annotation result.

Description

Image automatic annotation method based on semi-supervised learning
Technical field
The present invention relates to the field of image retrieval technology, and in particular to an automatic image annotation method based on semi-supervised learning.
Background technology
With the popularization of networks and digital devices, image data of all kinds of media grow explosively, and how to organize and manage them effectively so that users can browse and retrieve them efficiently has become a widely studied problem.
Image retrieval has been a very active research field since the 1970s. The most widely applied image retrieval technologies at present are text-based image retrieval (Text-based Image Retrieval, TBIR) and content-based image retrieval (Content-based Image Retrieval, CBIR). TBIR has obvious defects: in particular, when the number of images is very large, the workload of manual annotation is enormous, and the subjectivity and inaccuracy of manual annotation may lead to mismatches during retrieval. CBIR, in turn, suffers from the prominent "semantic gap" between low-level features and high-level semantics. Both approaches are therefore difficult to apply to the management of today's large-scale image databases.
Automatic image annotation lets a computer automatically learn, from already annotated images, the latent relation between the semantic/concept space and the visual feature space, and add semantic keywords that reflect the content of unannotated images. Automatic image annotation can effectively improve the predicament of current image retrieval: it keeps retrieval by basic text keywords while greatly reducing the huge workload of manual annotation, and to some extent it also narrows the "semantic gap"; for these reasons it has long received the attention of researchers.
Although researchers have made great progress in automatic image annotation, traditional automatic annotation methods usually require a large number of training samples to train the classifier, whereas in practical applications labeled samples are relatively difficult to obtain while unlabeled samples are readily available. How to make full use of the connection between labeled samples and unlabeled samples to build the annotation model and improve the accuracy and performance of the classifier is therefore a challenging problem.
Summary of the invention
Aiming at the problem that traditional automatic image annotation still needs a large number of manually labeled training samples and gives an unsatisfactory annotation result when the labeled samples are few, the present invention provides an automatic image annotation method based on semi-supervised learning, which can make full use of unlabeled samples to mine the intrinsic regularities of image features, effectively reduce the number of labeled samples required for classifier training, and obtain a better annotation result.
The principle of the present invention is as follows. To make full use of unlabeled samples to mine the intrinsic regularities of image features when the training data are few, and thereby obtain a good automatic annotation result, the invention proposes an automatic image annotation method based on semi-supervised learning. First, the data set is divided into a training data set, an unlabeled data set and a test set. Then the SIFT features and HOG features of the training samples are extracted as feature set A to train an LDA_SVM classifier, and the color and texture features are extracted as feature set B to train a neural network. Because the training data are few at this point, the obtained classifiers are weak; through the co-training of the two classifiers, a large amount of unlabeled data can be used to improve their classification performance. Next, the unlabeled data are used: the two classifiers predict labels for the same unlabeled sample simultaneously, and, according to each classifier's contribution to the classification accuracy on the unlabeled samples, the classification results of the two classifiers are fused with an adaptive weighted fusion strategy to obtain the final predicted label probability vector of the sample. Finally, the two classifiers are updated with the high-confidence samples and their predicted labels, and the algorithm exits when the preset maximum number of iterations is reached.
The image automatic annotation method based on semi-supervised learning includes the following steps:
Step 1, divide the given data set into 3 sub-data sets, namely a training data set, an unlabeled data set and a test data set;
Step 2, LDA_SVM classifier training stage;
Step 2.1, extract the SIFT features and HOG features of the training images in the training data set as the first feature set, quantize the visual features with the bag-of-words method, and obtain the bag-of-words representation of every training image;
Step 2.2, model the visual features of the training images with LDA to obtain the topic distribution of each visual word and the visual topic distribution of every training image;
Step 2.3, construct an SVM multi-class classifier with the visual topic distributions obtained in step 2.2 and their original labels, giving the currently trained LDA_SVM classifier;
Step 3, neural network classifier training stage;
Step 3.1, extract the color features and texture features of the training images in the training data set as the second feature set;
Step 3.2, input the second feature set together with the corresponding label information into a neural network for training, giving the currently trained neural network classifier;
Step 4, co-training stage;
Step 4.1, extract the SIFT features and HOG features of the unlabeled images in the unlabeled data set, quantize the visual features with the bag-of-words method, and obtain the bag-of-words representation of every unlabeled image;
Step 4.2, learn the visual topic distribution of the unlabeled images with the visual word topic distribution obtained in step 2.2;
Step 4.3, input the learned image visual topic distributions into the currently trained LDA_SVM classifier to obtain the first label prediction probability vector of each unlabeled image;
Step 4.4, perform label prediction on the unlabeled images in the unlabeled data set with the currently trained neural network classifier to obtain the second label prediction probability vector of each unlabeled image;
Step 4.5, fuse the first label prediction probability vector and the second label prediction probability vector of each unlabeled image according to a given adaptive weighted fusion strategy to obtain the final label prediction probability vector of the unlabeled image;
Step 4.6, select the high-confidence unlabeled images and their predicted labels, add them to the training data set, and retrain (update) the LDA_SVM classifier and the neural network classifier, i.e. return to step 2, until the preset maximum number of iterations is reached, giving the finally trained LDA_SVM classifier and neural network classifier;
Step 5, annotation stage of the test images;
Step 5.1, extract the first feature set and the second feature set of the test images in the test data set;
Step 5.2, perform label prediction on the first feature set of a test image with the finally trained LDA_SVM classifier to obtain the first label prediction probability vector of the test image;
Step 5.3, perform label prediction on the second feature set of the test image with the finally trained neural network classifier to obtain the second label prediction probability vector of the test image;
Step 5.4, fuse the first label prediction probability vector and the second label prediction probability vector of the test image according to the given adaptive weighted fusion strategy to obtain the final label prediction probability vector of the test image;
Step 5.5, choose the n labels with the highest confidence as the label set of the test image, where n is a manually set value.
Although the number of images in each of the 3 sub-data sets can be set as needed, the numbers of images preferably satisfy: unlabeled data set > test data set > training data set.
In steps 4.5 and 5.4 above, the adaptive weighted fusion strategy is determined according to the contributions of the LDA_SVM classifier and the neural network classifier to the prediction accuracy on the same unlabeled data.
Compared with the prior art, the present invention has the following features:
(1) In the feature extraction phase, two different feature sets A and B are extracted from each image, where feature set A consists of the SIFT and HOG features and feature set B of the color and texture features; extracting different feature sets describes the image from different perspectives.
(2) LDA converts the feature-set-A representation of an image into a K-dimensional topic vector, and this topic vector also implies the semantic information of the image; it effectively reduces the dimensionality of the high-dimensional vector and represents the image better.
(3) Two different classifiers, an LDA_SVM classifier and a neural network, are co-trained, so the images are learned from different perspectives; finally the prediction results of the two classifiers are fused, giving a better annotation result.
(4) The semi-supervised method of co-training makes full use of unlabeled samples to mine the intrinsic regularities of image features, which greatly reduces the workload of manual labeling while improving the annotation accuracy.
Brief description of the drawings
Fig. 1 is the overall framework of automatic image annotation based on semi-supervised learning.
Fig. 2 is the flow chart of the LDA_SVM classifier training and annotation algorithms.
Fig. 3 is the LDA graphical model.
Embodiment
To make the objects, technical solutions and advantages of the present invention clearer, the invention is described in more detail below with reference to a concrete example and the accompanying drawings.
The overall framework of the image automatic annotation method based on semi-supervised learning is shown in Fig. 1; it specifically includes the following steps:
Step (1): divide the data set into three sub-data sets, namely a training data set, an unlabeled data set and a test data set. The ratio of the three sub-data sets can be set manually; the guiding principle is unlabeled data set > test data set > training data set.
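A minimal sketch of this split, assuming scikit-learn's train_test_split and arrays of images and labels; the 10% / 30% / 60% proportions and the function name split_dataset are illustrative, the text only fixes the ordering unlabeled > test > training:

```python
from sklearn.model_selection import train_test_split

def split_dataset(images, labels, train_frac=0.1, test_frac=0.3, seed=42):
    # Carve out the small labeled training set first.
    x_train, x_rest, y_train, y_rest = train_test_split(
        images, labels, train_size=train_frac, random_state=seed)
    # Split the remainder into the test set and the (nominally) unlabeled set,
    # so that |unlabeled| > |test| > |train|.
    x_test, x_unlabeled, y_test, _ = train_test_split(
        x_rest, y_rest, train_size=test_frac / (1.0 - train_frac),
        random_state=seed)
    return (x_train, y_train), x_unlabeled, (x_test, y_test)
```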
Step (2): the training process for the training images is divided into several stages, namely the LDA_SVM classifier training stage, the neural network training stage and the co-training stage.
Step (2.1): LDA_SVM classifier training stage.
Step (2.1.1): extract the SIFT features and HOG features of the training sample images as feature set A, quantize the visual features with the "bag-of-words" method, and obtain the "bag-of-words" representation of each image.
Step (2.1.2): model the visual features of the training images with LDA to obtain the topic distribution φ of each visual word of the training images and the visual topic distribution θ_d of each image. When the LDA model is used, the number of topics is set to 60 and the initial hyperparameter values are α = 0.1 and β = 0.01.
Step (2.1.3): construct the SVM multi-class classification model with the obtained visual topic distributions θ_d and their original labels.
Step (2.2): neural network training stage. When the neural network is trained, the learning rate is η = 0.01 and the number of hidden neurons is 9.
Step (2.2.1): extract the color features and texture features of the training sample images as feature set B.
Step (2.2.2): input feature set B together with the corresponding label information into the neural network for training.
Step (2.3): co-training stage.
Step (2.3.1): for the images in the unlabeled sample set, perform step (2.1.1) to obtain their "bag-of-words" representations, and learn the visual topic distribution θ_d of the unlabeled samples with the visual word topic distribution φ obtained in step (2.1.2).
Step (2.3.2): input the learned image visual topic distributions θ_d into the trained SVM multi-class classifier to obtain the label prediction probability vector C_L of the unlabeled samples.
Step (2.3.3): perform label prediction on the images in the unlabeled sample set with the trained neural network to obtain the label prediction probability vector C_N.
Step (2.3.4): according to the contributions of the two classifiers to the prediction accuracy on the same unlabeled data, fuse C_L and C_N with an adaptive weighted fusion strategy to obtain the final label prediction probability vector; select the high-confidence predicted labels with their samples and give them to the two classifiers for retraining, until the preset maximum number of iterations is reached, then exit the algorithm.
Step (3): annotation stage for the images to be annotated (test images).
Step (3.1): extract feature set A and feature set B of the test sample images.
Step (3.2): perform label prediction on feature set A of a test image with the trained LDA_SVM classifier to obtain the label prediction probability vector C_L.
Step (3.3): perform label prediction on feature set B of the test image with the trained neural network to obtain the label prediction probability vector C_N.
Step (3.4): fuse C_L and C_N with the adaptive weighted fusion strategy to obtain the final label prediction probability vector, and choose the n labels with the highest confidence as the label set of the test sample, where n is a manually set value. Here n is set to 5, i.e. the 5 labels with the highest confidence form the label set of the test image.
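Choosing the n highest-confidence labels in step (3.4) is a simple top-n lookup over the fused probability vector; a minimal sketch with numpy, where vocabulary (the list of candidate labels) is an assumed variable and n = 5 follows the embodiment:

```python
import numpy as np

def top_n_labels(prob_vector, vocabulary, n=5):
    # Indices of the n highest-confidence labels, best first.
    idx = np.argsort(prob_vector)[::-1][:n]
    return [vocabulary[i] for i in idx]
```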
The training process for the training sample images is divided into three stages: the LDA_SVM classifier training stage, the neural network training stage and the co-training stage. (1) LDA_SVM training stage: first extract the SIFT and HOG features of the training images as feature set A and quantize the visual features with the "bag-of-words" method; then model the visual features of the training images with the LDA model to obtain the topic distribution φ of each visual word and the visual topic distribution θ_d of every training image, which is taken as the intermediate representation vector of each image; finally construct the SVM multi-class classifier with the visual topic distributions θ_d and the label information. (2) Neural network training stage: first extract the color and texture features of the training images as feature set B, then construct the neural network with feature set B. (3) Co-training stage: likewise extract feature set A of the unlabeled images and quantize the visual features with the "bag-of-words" method; take the visual word topic distribution φ obtained in the LDA_SVM training stage as the visual word topic distribution of the unlabeled images, and learn the topic distribution θ_d of every unlabeled image from its visual features and φ; take θ_d as the intermediate vector of each image and classify it with the trained SVM multi-class classifier to obtain the label prediction probability vector C_L of the unlabeled samples. Extract feature set B of the unlabeled images and perform label prediction with the trained neural network to obtain the label prediction probability vector C_N. According to the contributions of the two classifiers to the prediction accuracy on the same unlabeled data, fuse C_L and C_N with an adaptive weighted fusion strategy to obtain the final label prediction probability vector; select the high-confidence predicted labels with their samples and give them to the two classifiers for retraining, until the preset maximum number of iterations is reached, then exit the algorithm.
The annotation process of a test image is divided into four stages: (1) extract feature set A and feature set B of the test sample image; (2) perform label prediction on feature set A of the test image with the trained LDA_SVM classifier to obtain the label prediction probability vector C_L; (3) perform label prediction on feature set B of the test image with the trained neural network to obtain the label prediction probability vector C_N; (4) fuse C_L and C_N with the adaptive weighted fusion strategy to obtain the final label prediction probability vector, and select the several labels with the highest confidence as the label set of the test sample.
For feature set A, the extraction method of the present invention first divides each image in the data set into regular squares with the dense block sampling method: the squares are 16 × 16 pixels and the whole image is traversed with a step of 10 pixels, each window-covered region being taken as one feature region from which the SIFT features and HOG features of the image are extracted. The image is then represented with the "bag-of-words" method, as follows:
Step 1) Construct the visual dictionary. Part of the images of every class of training data are taken at random, and the SIFT features and HOG features of the images are clustered separately with the k-means algorithm. Suppose clustering yields N_S visual words for the SIFT features and N_H visual words for the HOG features; the size of the final visual dictionary is then the sum of the two, N_S + N_H.
Step 2) Quantize the visual features. The visual features of each image are mapped onto the visual dictionary and a histogram over the visual words is computed for each image; an image can then be represented by the (N_S + N_H)-dimensional visual histogram shown in formula (1):
v(d_i) = { n(d_i, v_1), n(d_i, v_2), ..., n(d_i, v_{N_S}), n(d_i, v_{N_S+1}), ..., n(d_i, v_{N_S+N_H}) }    (1)
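A minimal sketch of steps 1) and 2), assuming scikit-learn's KMeans and per-image SIFT/HOG descriptor matrices as input; the dictionary sizes of 500 SIFT and 500 HOG words are illustrative assumptions, since N_S and N_H are not fixed by the text:

```python
import numpy as np
from sklearn.cluster import KMeans

def build_visual_dictionary(sift_descs, hog_descs, n_sift_words=500, n_hog_words=500, seed=0):
    # Step 1): cluster the SIFT and the HOG descriptors of the sampled training
    # images separately; the final dictionary holds N_S + N_H visual words.
    km_sift = KMeans(n_clusters=n_sift_words, random_state=seed).fit(np.vstack(sift_descs))
    km_hog = KMeans(n_clusters=n_hog_words, random_state=seed).fit(np.vstack(hog_descs))
    return km_sift, km_hog

def bow_histogram(img_sift, img_hog, km_sift, km_hog):
    # Step 2) / formula (1): map every descriptor of one image to its nearest
    # visual word and count occurrences, giving the (N_S + N_H)-dim histogram v(d_i).
    n_s, n_h = km_sift.n_clusters, km_hog.n_clusters
    hist = np.zeros(n_s + n_h)
    for w in km_sift.predict(img_sift):
        hist[w] += 1
    for w in km_hog.predict(img_hog):
        hist[n_s + w] += 1
    return hist
```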
For feature set B, the extraction method of the present invention first divides each image in the data set into regular squares of size 16 × 16 and then extracts an 18-dimensional feature vector for each square, comprising 9 color dimensions and 9 texture dimensions. The color features are described with a color histogram: the HSV color space of the image is quantized into 9 bins, and the color histogram of each image is obtained by counting the number of pixels whose color falls in each bin. The texture features are computed with a Gabor filter bank of 3 scales in 3 orientations (0°, 60° and 120°, respectively).
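A hedged sketch of the 18-dimensional block feature; the text does not say exactly how the HSV space is quantized into 9 bins or which Gabor frequencies are used, so the hue-only histogram and the frequency values below are assumptions:

```python
import numpy as np
from skimage.color import rgb2hsv
from skimage.filters import gabor

def block_feature(block_rgb):
    # 9 color dimensions: a 9-bin histogram over the HSV representation of the
    # 16x16 block (here simplified to the hue channel), normalized by pixel count.
    hsv = rgb2hsv(block_rgb)
    color_hist, _ = np.histogram(hsv[..., 0], bins=9, range=(0.0, 1.0))
    color_hist = color_hist / color_hist.sum()
    # 9 texture dimensions: mean Gabor magnitude for 3 scales x 3 orientations
    # (0, 60 and 120 degrees); the frequencies 0.1/0.2/0.4 are assumed values.
    gray = block_rgb.mean(axis=2)
    texture = []
    for freq in (0.1, 0.2, 0.4):
        for theta in (0.0, np.pi / 3, 2 * np.pi / 3):
            real, imag = gabor(gray, frequency=freq, theta=theta)
            texture.append(np.sqrt(real ** 2 + imag ** 2).mean())
    return np.concatenate([color_hist, np.asarray(texture)])
```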
Semi-supervised learning means that, with only a small number of labeled samples, the classifier automatically improves its performance using unlabeled samples, on the basis of the knowledge obtained from the training samples. Co-training is a semi-supervised learning method that uses two or more classifiers trained independently on different feature sets of the data; the accuracy of the classifiers is improved by combining the classification decisions of all classifiers. The unlabeled data are progressively predicted and labeled by the classifiers, the data with higher confidence are then added to the training set, and the iteration continues until all unlabeled data have been labeled.
The present invention uses two independent feature sets to construct two different classifiers, an LDA_SVM classifier and a neural network; through the co-training of the two classifiers, a large amount of unlabeled data is used to improve the performance of automatic image annotation.
The flow charts of the LDA_SVM classifier training and annotation algorithms are shown in Fig. 2.
For the extracted feature set A, the visual features are quantized with the "bag-of-words" method to obtain the "bag-of-words" representation of each image. All training samples are then modeled with LDA, and the obtained image visual topic distribution θ is taken as the feature of each image and used to train the SVM multi-class classifier.
LDA (Latent Dirichlet Allocation) is a topic model that can model both text and images. When images are modeled, an image is regarded as a document and the visual words as the words of the document; the LDA model then mines the latent topic distribution of the images and yields an intermediate representation vector for each image, so that the feature dimensionality of the image is greatly reduced while the image is represented better.
Suppose D = {d_1, d_2, ..., d_M} denotes an image data set and w = {w_11, w_12, ..., w_mn}, where w_mn is the n-th visual word of the m-th image. The model assumes that each image is generated by a mixture of K latent topic variables Z = {z_1, z_2, ..., z_K}, and that each topic z_k is a probability distribution over the visual dictionary generated by the parameter φ. The parameters θ and φ obey Dirichlet distributions with parameters α and β, respectively; θ denotes the mixing proportions of the image topic distribution, φ denotes the distribution over visual words conditioned on a given topic z_k, and w denotes the visual words of the image. The model is determined by these 6 major parameters. The LDA graphical model is shown in Fig. 3; except for w, which is an observable variable, the others are unobservable hidden variables. It follows that the key step of LDA is to find the optimal hyperparameters α and β; the optimal solution of these two parameters is obtained from the observable variable w by the variational EM algorithm.
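The per-image topic distributions θ_d can be obtained, for instance, with scikit-learn's LatentDirichletAllocation (variational inference); the topic number 60 and priors α = 0.1, β = 0.01 follow the embodiment, while the wrapper function and the count matrix bow_matrix from the bag-of-words stage are assumptions:

```python
from sklearn.decomposition import LatentDirichletAllocation

def fit_lda(bow_matrix, n_topics=60, alpha=0.1, beta=0.01):
    # bow_matrix: (n_images x dictionary size) matrix of visual-word counts.
    lda = LatentDirichletAllocation(n_components=n_topics,
                                    doc_topic_prior=alpha,
                                    topic_word_prior=beta,
                                    learning_method="batch",
                                    random_state=0)
    theta = lda.fit_transform(bow_matrix)   # per-image topic distribution theta_d
    return lda, theta                       # lda.components_ holds the word-topic weights (phi)
```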
The support vector machine (SVM) is widely used because it handles high-dimensional data efficiently and achieves good results even when the training samples are few; its core idea is to separate different data samples by finding the optimal separating hyperplane in the feature space. Automatic image annotation can be regarded as a multi-class classification problem, while the traditional SVM is a binary classifier. To let the SVM solve multi-class problems, the most common strategies are "one vs. all" ("OVA", comparing a given class with all other classes) and "one vs. one" (pairwise comparison). The present invention adopts the OVA strategy: when a classifier is trained for each semantic concept, the training samples belonging to that concept are regarded as positive samples and all other samples as negative samples. Thus, if the data set contains n classes of images, n SVM classifiers are produced. In the test phase, each classifier produces a prediction probability for each unlabeled sample, and the class with the largest prediction probability is regarded as the most probable class of the sample.
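A minimal sketch of the OVA strategy with scikit-learn, where one binary SVM is trained per semantic concept; the RBF kernel is an assumption, and probability=True yields the per-class prediction probabilities used later for fusion:

```python
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import SVC

def train_ova_svm(theta_train, class_labels):
    # One binary SVM per semantic concept: images of that concept are positives,
    # all other images negatives.
    clf = OneVsRestClassifier(SVC(kernel="rbf", probability=True))
    clf.fit(theta_train, class_labels)
    return clf

# clf.predict_proba(theta_new) then gives the label prediction probability vector
# for a new image's topic distribution; its argmax is the most probable class.
```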
The LDA_SVM training algorithm is as follows:
(1) For the training image set, divide each image into regular 16 × 16 squares with the dense block sampling method, with a sampling interval of 10 pixels.
(2) Extract the SIFT and HOG features of each square, quantize the visual features with the "bag-of-words" method, and obtain the "bag-of-words" representation of every image.
(3) Model the visual features of the training images with LDA to obtain the topic distribution φ of each visual word of the training images and the visual topic distribution θ_d of each image.
(4) Construct the SVM multi-class classification model with the obtained visual topic distributions θ_d and their original labels.
The LDA_SVM annotation algorithm is as follows:
(1) For every new image d_new, perform steps (1) and (2) of the training algorithm.
(2) Learn the visual topic distribution θ_new of the new image from the visual word topic distribution φ obtained by the training algorithm.
(3) Input the learned visual topic distribution θ_new of the new image into the trained SVM multi-class classifier to obtain the label prediction probability vector of the new image.
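Putting the annotation algorithm together, assuming the bow_histogram, fit_lda and train_ova_svm sketches given above; all names are illustrative:

```python
def lda_svm_annotate(img_sift, img_hog, km_sift, km_hog, lda, svm):
    # (1)-(2): build the bag-of-words histogram of the new image and infer its
    # topic distribution theta_new with the fitted LDA (phi stays fixed).
    hist = bow_histogram(img_sift, img_hog, km_sift, km_hog)
    theta_new = lda.transform(hist.reshape(1, -1))
    # (3): feed theta_new to the one-vs-all SVM to get the prediction vector C_L.
    return svm.predict_proba(theta_new)[0]
```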
The extracted feature set B is processed with a neural network. The artificial neural network (ANN), or neural network (NN) for short, has a powerful ability to solve multi-class classification problems. The present invention uses a multilayer feed-forward neural network with a three-layer architecture for training and prediction: the first layer receives the input signal from the sample and has the same number of neurons as the sample feature dimensionality; the middle layer is the hidden layer (how to choose the optimal number of hidden neurons is still an open problem, and the number is usually determined empirically); the last layer is the output layer and contains the same number of neurons as the number of sample classes. Neurons in different layers are connected by weighted edges, and the sigmoid function is generally used as the activation function to produce the outputs between layers. Training the neural network consists of adjusting the "connection weights" and thresholds between the different neurons according to the training samples.
Suppose there is a data set D = {(x_1, y_1), (x_2, y_2), ..., (x_i, y_i), ...}, i.e. each sample is described by an n-dimensional feature vector and the output is an m-dimensional real-valued vector. For each input sample (x_i, y_i), the corresponding network output is ŷ_i = (ŷ_i1, ŷ_i2, ..., ŷ_im), i.e.
ŷ_ik = f(α_k − θ_k)    (1)
where α_k = Σ_l w_lk · v_l is the input received by the k-th neuron of the output layer, w_lk is the connection weight between hidden unit l and output-layer unit k for sample x_i, v_l is the output of the l-th neuron of the hidden layer, and θ_k is the threshold of the k-th output-layer neuron. The error between the actual output of the network on sample x_i and the target output is then
E_i = (1/2) Σ_k (ŷ_ik − y_ik)²    (2)
where y_ik takes the value 1 when k ∈ y_i, and −1 otherwise.
According to the gradient descent strategy, given the learning rate η, the weight update formula from each hidden unit to the output layer is
w_lk ← w_lk + Δw_lk    (3)
where Δw_lk = η · g_k · v_l; combining formulas (1) and (2), the output-layer error term can be derived as g_k = ŷ_ik (1 − ŷ_ik)(y_ik − ŷ_ik).
The threshold of the output-layer neurons is then updated as θ_k ← θ_k − η · g_k.
Similarly, the update formulas of the weights between the input layer and the hidden layer and of the hidden-layer thresholds can be derived: each such weight is increased by η · e_l · x_j (where x_j is the j-th component of the input) and each hidden-layer threshold γ_l is decreased by η · e_l, with the hidden-layer error term e_l = v_l (1 − v_l) Σ_k w_lk · g_k.
The training process of the neural network is mainly divided into two stages, forward propagation (computing the error) and error back-propagation (modifying the weights); the detailed process is as follows:
(1) First build an input layer with the same number of units as the sample feature dimensionality n, l hidden units and m output units.
(2) Randomly initialize all network weights in the range (0, 1).
(3) Forward propagation: input the sample into the network and compute the output of every output unit k by formula (1), where α_k is the total input received by the k-th output-layer neuron, θ_k is the threshold of the k-th output-layer neuron, and f is the sigmoid activation function. The network error on sample x_i is then computed by formula (2), where ŷ_ik is the actual output of the sample and y_ik the target output.
(4) Error back-propagation: for each output unit k of the network, compute its error term g_k; for each hidden unit l of the network, compute its error term e_l.
Finally update the weights of the network; the iteration stops when the neural network reaches the preset number of iterations or the preset training accuracy.
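The forward and backward passes described above can be sketched in numpy as follows; the hidden size 9 and learning rate η = 0.01 follow the embodiment, the (0, 1) initialization follows step (2), and the class structure and update rules follow the standard back-propagation derivation assumed here:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class ThreeLayerNet:
    def __init__(self, n_in, n_hidden=9, n_out=1, eta=0.01, seed=0):
        rng = np.random.default_rng(seed)
        self.W_in = rng.uniform(0, 1, (n_in, n_hidden))   # input -> hidden weights
        self.gamma = rng.uniform(0, 1, n_hidden)          # hidden-layer thresholds
        self.W_out = rng.uniform(0, 1, (n_hidden, n_out)) # hidden -> output weights w_lk
        self.theta = rng.uniform(0, 1, n_out)             # output-layer thresholds theta_k
        self.eta = eta

    def forward(self, x):
        v = sigmoid(x @ self.W_in - self.gamma)           # hidden outputs v_l
        y = sigmoid(v @ self.W_out - self.theta)          # outputs y_hat_k = f(alpha_k - theta_k)
        return v, y

    def train_step(self, x, y_true):
        v, y = self.forward(x)
        g = y * (1 - y) * (y_true - y)                    # output-layer error term g_k
        e = v * (1 - v) * (self.W_out @ g)                # hidden-layer error term e_l
        self.W_out += self.eta * np.outer(v, g)           # w_lk <- w_lk + eta * g_k * v_l
        self.theta -= self.eta * g
        self.W_in += self.eta * np.outer(x, e)
        self.gamma -= self.eta * e
        return 0.5 * np.sum((y - y_true) ** 2)            # squared error E_i of formula (2)
```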
Because the classification accuracy of each classifier trained with few labeled training samples is rather low, handing over only the samples (and labels) that one weak classifier considers high-confidence to update the other classifier, as traditional co-training methods do, easily introduces large errors. The present invention therefore considers the influence of both classifiers on the labeling confidence of the same training data: the label probability prediction vectors of the two classifiers are fused with an adaptive weighted fusion method, and the fusion weights are determined by the contribution of each classifier to the image classification accuracy.
The adaptive weighted fusion (Adaptive Weighted Fusion, AWF) formula is as follows:
Ĉ = W * C_L + (1 − W) * C_N
where Ĉ is the final label prediction probability vector, C_L and C_N are respectively the label prediction probability vectors of the LDA_SVM classifier and the neural network for the same sample, * is the element-wise (inner) product operator, and W is the fusion weight vector of the LDA_SVM classifier, whose magnitude is determined by the contribution of the LDA_SVM classifier to the image classification accuracy. W is computed with a likelihood normalization method, as follows:
(1) First construct two likelihood matrices L_l and L_g, which represent the output likelihoods of the LDA_SVM classifier and of the neural network respectively; the matrices have size N × M, where N is the number of samples to be labeled and M the number of prediction classes.
(2) Compute the weight vectors w_l and w_g of the LDA_SVM classifier and the neural network, whose components w_{l,m} and w_{g,m}, m = 1, 2, ..., M, are the normalized output likelihoods of the two classifiers on class m: with L_l(n, c) and L_g(n, c) denoting the probability that the n-th image is predicted as class c, the denominator of each component is the average likelihood of class m and the numerator is the total average likelihood over all M classes. After w_l and w_g are obtained, the final weight vector W is computed by normalizing them, for example element-wise as W_m = w_{l,m} / (w_{l,m} + w_{g,m}).
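A hedged sketch of the AWF step; because the exact weight formulas are only described in prose, the "overall average likelihood over per-class average likelihood" reading and the final normalization below are assumptions, not the patent's literal formulas:

```python
import numpy as np

def awf_fuse(C_l, C_n, eps=1e-12):
    # C_l, C_n: N x M matrices of prediction probabilities (output likelihoods)
    # from the LDA_SVM classifier and the neural network on the same N samples.
    def class_weights(C):
        return C.mean() / (C.mean(axis=0) + eps)   # overall avg / per-class avg (assumed reading)
    w_l, w_g = class_weights(C_l), class_weights(C_n)
    W_l = w_l / (w_l + w_g)                        # normalized fusion weight of LDA_SVM per class
    W_g = 1.0 - W_l                                # complementary weight of the neural network
    return W_l * C_l + W_g * C_n                   # fused prediction probability vectors
```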
The co-training algorithm assumes that the data set has two different "views": when the training data are sufficient, each feature subset can train a strong classifier, and, given the labels, each feature subset is conditionally independent of the other. The present invention therefore divides the image data into two independent feature subsets, constructs two different classifiers, an LDA_SVM classifier and a neural network, and then, through the co-training of the two classifiers, uses a large amount of unlabeled data to improve the performance of automatic image annotation. Suppose a data set contains, apart from the test set, D = m + n image data, where m is the number of labeled data and n the number of unlabeled data. (x, Y) denotes a labeled training sample, where x = (x_A, x_B) is the feature vector of the sample, x_A is its feature vector on feature set A, x_B its feature vector on feature set B, and Y ⊆ L is the label set of the sample; L = (l_1, l_2, ..., l_I) is the label set of all images and I is the number of classes of the data set. C = {c_i | i = 1, 2, ..., I} denotes the probabilities that an image is labeled as class i, and (x) denotes an unlabeled sample. The training process of the co-training of the two classifiers is then as shown in Table 1:
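Since Table 1 itself is not reproduced in the text, the following is only a structural sketch of the co-training loop under the assumptions above; train_lda_svm, train_nn, the number of rounds and the per-round selection size are illustrative placeholders, not values fixed by the patent:

```python
import numpy as np

def co_train(X_A, X_B, Y, U_A, U_B, n_rounds=10, n_select=50):
    # X_A / X_B: labeled training data in view A (SIFT+HOG BoW) and view B
    # (color+texture); U_A / U_B: the same two views of the unlabeled pool.
    for _ in range(n_rounds):
        clf_l = train_lda_svm(X_A, Y)              # assumed helper: LDA_SVM stage
        clf_g = train_nn(X_B, Y)                   # assumed helper: neural network stage
        if len(U_A) == 0:
            break
        fused = awf_fuse(clf_l.predict_proba(U_A), clf_g.predict_proba(U_B))
        conf = fused.max(axis=1)                   # confidence of each predicted label
        picked = np.argsort(conf)[::-1][:n_select]
        # Move the most confident unlabeled samples, with their predicted labels,
        # into the training set and drop them from the unlabeled pool.
        X_A = np.vstack([X_A, U_A[picked]])
        X_B = np.vstack([X_B, U_B[picked]])
        Y = np.concatenate([Y, fused[picked].argmax(axis=1)])
        keep = np.setdiff1d(np.arange(len(U_A)), picked)
        U_A, U_B = U_A[keep], U_B[keep]
    return clf_l, clf_g
```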
Traditional automatic image annotation still needs a large number of manually labeled training samples. When the labeled sample data are few, the trained classifiers are weak, and, following traditional co-training methods, handing over only the samples (and labels) that one weak classifier considers high-confidence to update the other classifier easily introduces large errors. The present invention comprehensively considers the influence of the two classifiers, LDA_SVM and the neural network, on the labeling confidence of the same training data, fuses the label probability prediction vectors of the two classifiers with the adaptive weighted fusion method, and then updates the two classifiers with the high-confidence samples and their predicted labels. This both effectively reduces the number of labeled samples required for classifier training and achieves a better annotation result.
It should be noted that although the embodiments of the present invention are described above, this does not limit the invention, and the invention is not restricted to the above embodiments. Any other embodiment obtained by those skilled in the art under the enlightenment of the present invention without departing from its principles shall be regarded as falling within the protection of the present invention.

Claims (3)

1. An image automatic annotation method based on semi-supervised learning, characterized by comprising the following steps:
Step 1, dividing a given data set into 3 sub-data sets, namely a training data set, an unlabeled data set and a test data set;
Step 2, LDA_SVM classifier training stage:
Step 2.1, extracting the SIFT features and HOG features of the training images in the training data set as a first feature set, quantizing the visual features with the bag-of-words method, and obtaining the bag-of-words representation of every training image;
Step 2.2, modeling the visual features of the training images with LDA to obtain the topic distribution of each visual word and the visual topic distribution of every training image;
Step 2.3, constructing an SVM multi-class classifier with the visual topic distributions obtained in step 2.2 and their original labels, to obtain the currently trained LDA_SVM classifier;
Step 3, neural network classifier training stage:
Step 3.1, extracting the color features and texture features of the training images in the training data set as a second feature set;
Step 3.2, inputting the second feature set together with the corresponding label information into a neural network for training, to obtain the currently trained neural network classifier;
Step 4, co-training stage:
Step 4.1, extracting the SIFT features and HOG features of the unlabeled images in the unlabeled data set, quantizing the visual features with the bag-of-words method, and obtaining the bag-of-words representation of every unlabeled image;
Step 4.2, learning the visual topic distribution of the unlabeled images with the visual word topic distribution obtained in step 2.2;
Step 4.3, inputting the learned image visual topic distributions into the currently trained LDA_SVM classifier to obtain the first label prediction probability vector of each unlabeled image;
Step 4.4, performing label prediction on the unlabeled images in the unlabeled data set with the currently trained neural network classifier to obtain the second label prediction probability vector of each unlabeled image;
Step 4.5, fusing the first label prediction probability vector and the second label prediction probability vector of each unlabeled image according to a given adaptive weighted fusion strategy to obtain the final label prediction probability vector of the unlabeled image;
Step 4.6, selecting the high-confidence unlabeled images and their predicted labels, adding them to the training data set, and returning to step 2, until a preset maximum number of iterations is reached, to obtain the finally trained LDA_SVM classifier and neural network classifier;
Step 5, annotation stage of the test images:
Step 5.1, extracting the first feature set and the second feature set of the test images in the test data set;
Step 5.2, performing label prediction on the first feature set of a test image with the finally trained LDA_SVM classifier to obtain the first label prediction probability vector of the test image;
Step 5.3, performing label prediction on the second feature set of the test image with the finally trained neural network classifier to obtain the second label prediction probability vector of the test image;
Step 5.4, fusing the first label prediction probability vector and the second label prediction probability vector of the test image according to the given adaptive weighted fusion strategy to obtain the final label prediction probability vector of the test image;
Step 5.5, choosing the n labels with the highest confidence as the label set of the test image, where n is a manually set value.
2. The image automatic annotation method based on semi-supervised learning according to claim 1, characterized in that, in step 1, the numbers of images in the 3 sub-data sets satisfy: unlabeled data set > test data set > training data set.
3. The image automatic annotation method based on semi-supervised learning according to claim 1, characterized in that, in step 4.5 and step 5.4, the adaptive weighted fusion strategy is determined according to the contributions of the LDA_SVM classifier and the neural network classifier to the prediction accuracy on the same unlabeled data.
CN201711002595.4A 2017-10-24 2017-10-24 Image automatic annotation method based on semi-supervised learning Pending CN107644235A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711002595.4A CN107644235A (en) 2017-10-24 2017-10-24 Image automatic annotation method based on semi-supervised learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201711002595.4A CN107644235A (en) 2017-10-24 2017-10-24 Image automatic annotation method based on semi-supervised learning

Publications (1)

Publication Number Publication Date
CN107644235A true CN107644235A (en) 2018-01-30

Family

ID=61123785

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711002595.4A Pending CN107644235A (en) 2017-10-24 2017-10-24 Image automatic annotation method based on semi-supervised learning

Country Status (1)

Country Link
CN (1) CN107644235A (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102096825A (en) * 2011-03-23 2011-06-15 西安电子科技大学 Graph-based semi-supervised high-spectral remote sensing image classification method
CN104036021A (en) * 2014-06-26 2014-09-10 广西师范大学 Method for semantically annotating images on basis of hybrid generative and discriminative learning models
CN105279519A (en) * 2015-09-24 2016-01-27 四川航天***工程研究所 Remote sensing image water body extraction method and system based on cooperative training semi-supervised learning
CN106778832A (en) * 2016-11-28 2017-05-31 华南理工大学 The semi-supervised Ensemble classifier method of high dimensional data based on multiple-objection optimization

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
张辰: "Research on Moving Object Detection and Tracking in Complex Environments", 31 August 2014, China University of Mining and Technology Press *
徐美香: "Research on Semi-supervised Multi-label Image Classification Technology", China Master's Theses Full-text Database, Information Science and Technology *
蔡晰 et al.: "Research on Multi-classifier Fusion Strategies Based on Semi-supervised Techniques", Computer Engineering and Applications *

Cited By (56)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108416382B (en) * 2018-03-01 2022-04-19 南开大学 Web image training convolutional neural network method based on iterative sampling and one-to-many label correction
CN108416382A (en) * 2018-03-01 2018-08-17 南开大学 One kind is based on iteration sampling and a pair of of modified Web graph of multi-tag as training convolutional neural networks method
CN110427542A (en) * 2018-04-26 2019-11-08 北京市商汤科技开发有限公司 Sorter network training and data mask method and device, equipment, medium
CN108647264A (en) * 2018-04-28 2018-10-12 北京邮电大学 A kind of image automatic annotation method and device based on support vector machines
CN108647264B (en) * 2018-04-28 2020-10-13 北京邮电大学 Automatic image annotation method and device based on support vector machine
CN108830466A (en) * 2018-05-31 2018-11-16 长春博立电子科技有限公司 A kind of image content semanteme marking system and method based on cloud platform
CN108959431A (en) * 2018-06-11 2018-12-07 中国科学院上海高等研究院 Label automatic generation method, system, computer readable storage medium and equipment
CN108960409A (en) * 2018-06-13 2018-12-07 南昌黑鲨科技有限公司 Labeled data generation method, equipment and computer readable storage medium
CN108960409B (en) * 2018-06-13 2021-08-03 南昌黑鲨科技有限公司 Method and device for generating annotation data and computer-readable storage medium
CN110858327A (en) * 2018-08-24 2020-03-03 宏达国际电子股份有限公司 Method of validating training data, training system and computer program product
CN109325434A (en) * 2018-09-15 2019-02-12 天津大学 A kind of image scene classification method of the probability topic model of multiple features
CN109214463A (en) * 2018-09-25 2019-01-15 合肥优控科技有限公司 A kind of classification of landform method based on coorinated training
CN109389180A (en) * 2018-10-30 2019-02-26 国网四川省电力公司广元供电公司 A power equipment image-recognizing method and inspection robot based on deep learning
CN109359697A (en) * 2018-10-30 2019-02-19 国网四川省电力公司广元供电公司 Graph image recognition methods and inspection system used in a kind of power equipment inspection
CN111126592A (en) * 2018-10-30 2020-05-08 三星电子株式会社 Method and apparatus for outputting prediction result, method and apparatus for generating neural network, and storage medium
CN109460914A (en) * 2018-11-05 2019-03-12 云南大学 Method is determined based on the bridge health grade of semi-supervised error correction study
CN109657087A (en) * 2018-11-30 2019-04-19 平安科技(深圳)有限公司 A kind of batch data mask method, device and computer readable storage medium
CN111340261A (en) * 2018-12-03 2020-06-26 北京嘀嘀无限科技发展有限公司 Method, system, computer device and storage medium for judging order violation behavior
CN111340261B (en) * 2018-12-03 2023-07-18 北京嘀嘀无限科技发展有限公司 Method, system, computer equipment and storage medium for judging order violation
CN111382758B (en) * 2018-12-28 2023-12-26 杭州海康威视数字技术股份有限公司 Training image classification model, image classification method, device, equipment and medium
CN111382758A (en) * 2018-12-28 2020-07-07 杭州海康威视数字技术股份有限公司 Training image classification model, image classification method, device, equipment and medium
CN109784392A (en) * 2019-01-07 2019-05-21 华南理工大学 A kind of high spectrum image semisupervised classification method based on comprehensive confidence
CN110084289B (en) * 2019-04-11 2021-07-27 北京百度网讯科技有限公司 Image annotation method and device, electronic equipment and storage medium
CN110084289A (en) * 2019-04-11 2019-08-02 北京百度网讯科技有限公司 Image labeling method, device, electronic equipment and storage medium
CN110008924A (en) * 2019-04-15 2019-07-12 中国石油大学(华东) A kind of semi-supervised automark method and device towards atural object in Hyperspectral imaging
CN110059217B (en) * 2019-04-29 2022-11-04 广西师范大学 Image text cross-media retrieval method for two-stage network
CN110059217A (en) * 2019-04-29 2019-07-26 广西师范大学 A kind of image text cross-media retrieval method of two-level network
CN110222171A (en) * 2019-05-08 2019-09-10 新华三大数据技术有限公司 A kind of application of disaggregated model, disaggregated model training method and device
CN110110795B (en) * 2019-05-10 2021-04-20 厦门美图之家科技有限公司 Image classification method and device
CN110110795A (en) * 2019-05-10 2019-08-09 厦门美图之家科技有限公司 Image classification method and device
CN110674854B (en) * 2019-09-09 2022-05-17 东软集团股份有限公司 Image classification model training method, image classification method, device and equipment
CN110674854A (en) * 2019-09-09 2020-01-10 东软集团股份有限公司 Image classification model training method, image classification method, device and equipment
CN110765855B (en) * 2019-09-12 2023-04-18 杭州迪英加科技有限公司 Pathological image processing method and system
CN110765855A (en) * 2019-09-12 2020-02-07 杭州迪英加科技有限公司 Pathological image processing method and system
CN110542819B (en) * 2019-09-25 2022-03-22 贵州电网有限责任公司 Transformer fault type diagnosis method based on semi-supervised DBNC
CN110542819A (en) * 2019-09-25 2019-12-06 贵州电网有限责任公司 transformer fault type diagnosis method based on semi-supervised DBNC
CN112580673A (en) * 2019-09-27 2021-03-30 中国石油化工股份有限公司 Seismic reservoir sample expansion method and device based on spatial probability distribution
CN112580673B (en) * 2019-09-27 2024-04-12 中国石油化工股份有限公司 Seismic reservoir sample expansion method and device based on space probability distribution
CN110909803A (en) * 2019-11-26 2020-03-24 腾讯科技(深圳)有限公司 Image recognition model training method and device and computer readable storage medium
CN110909803B (en) * 2019-11-26 2023-04-18 腾讯科技(深圳)有限公司 Image recognition model training method and device and computer readable storage medium
CN111160373A (en) * 2019-12-30 2020-05-15 重庆邮电大学 Method for extracting, detecting and classifying defect image features of variable speed drum parts
CN111506757A (en) * 2020-04-10 2020-08-07 复旦大学 Voice marking device and method based on incremental iteration
CN111489792A (en) * 2020-04-14 2020-08-04 西安交通大学 T cell receptor sequence classification method based on semi-supervised learning framework
CN111563590A (en) * 2020-04-30 2020-08-21 华南理工大学 Active learning method based on generation countermeasure model
CN111861103A (en) * 2020-06-05 2020-10-30 中南民族大学 Fresh tea leaf classification method based on multiple features and multiple classifiers
CN111861103B (en) * 2020-06-05 2024-01-12 中南民族大学 Fresh tea classification method based on multiple features and multiple classifiers
CN111768007B (en) * 2020-06-28 2023-08-08 北京百度网讯科技有限公司 Method and device for mining data
CN111768007A (en) * 2020-06-28 2020-10-13 北京百度网讯科技有限公司 Method and apparatus for mining data
CN113407713A (en) * 2020-10-22 2021-09-17 腾讯科技(深圳)有限公司 Corpus mining method and apparatus based on active learning and electronic device
CN113407713B (en) * 2020-10-22 2024-04-05 腾讯科技(深圳)有限公司 Corpus mining method and device based on active learning and electronic equipment
CN112418304A (en) * 2020-11-19 2021-02-26 北京云从科技有限公司 OCR (optical character recognition) model training method, system and device
CN112668657A (en) * 2020-12-30 2021-04-16 中山大学 Method for detecting out-of-distribution image of attention enhancement based on classifier prediction uncertainty
CN112668657B (en) * 2020-12-30 2023-08-29 中山大学 Attention-enhanced out-of-distribution image detection method based on uncertainty prediction of classifier
CN113554627B (en) * 2021-07-27 2022-04-29 广西师范大学 Wheat head detection method based on computer vision semi-supervised pseudo label learning
CN113554627A (en) * 2021-07-27 2021-10-26 广西师范大学 Wheat head detection method based on computer vision semi-supervised pseudo label learning
CN114155412A (en) * 2022-02-09 2022-03-08 北京阿丘科技有限公司 Deep learning model iteration method, device, equipment and storage medium

Similar Documents

Publication Publication Date Title
CN107644235A (en) Image automatic annotation method based on semi-supervised learning
CN110334705B (en) Language identification method of scene text image combining global and local information
Zheng et al. Topic modeling of multimodal data: an autoregressive approach
Eigen et al. Nonparametric image parsing using adaptive neighbor sets
Farabet et al. Scene parsing with multiscale feature learning, purity trees, and optimal covers
Sun et al. Scene image classification method based on Alex-Net model
CN106126581A (en) Cartographical sketching image search method based on degree of depth study
CN108427740B (en) Image emotion classification and retrieval algorithm based on depth metric learning
CN108154156B (en) Image set classification method and device based on neural topic model
CN109886161A (en) A kind of road traffic index identification method based on possibility cluster and convolutional neural networks
Li et al. Multiple VLAD encoding of CNNs for image classification
Liang et al. Environmental microorganism classification using optimized deep learning model
CN110263174A (en) - subject categories the analysis method based on focus
Li et al. Latent semantic representation learning for scene classification
CN113688894A (en) Fine-grained image classification method fusing multi-grained features
Nguyen et al. Adaptive nonparametric image parsing
Xin et al. Hybrid dilated multilayer faster RCNN for object detection
CN103440332B (en) A kind of image search method strengthening expression based on relational matrix regularization
Gao et al. An improved XGBoost based on weighted column subsampling for object classification
Foumani et al. A probabilistic topic model using deep visual word representation for simultaneous image classification and annotation
Hu et al. Learning salient features for flower classification using convolutional neural network
CN111768214A (en) Product attribute prediction method, system, device and storage medium
Guo Deep learning for visual understanding
Wu et al. Supervised Contrastive Representation Embedding Based on Transformer for Few-Shot Classification
Zhou et al. An improved convolutional neural network model with adversarial net for multi-label image classification

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20180130