Summary of the invention
The object of the invention is to overcome the low accuracy of existing methods, which rely on indirect features when matching faces to person names, and thereby to provide a method that matches faces to names through direct feature matching and so creates a correct index for videos or pictures containing faces.
To achieve this object, the invention provides a method for creating an index for multimedia, comprising:
Step 1): performing an image search for a person name found in speech or text related to the multimedia, and building a corresponding face database for that name from the search results;
Step 2): extracting features from each face in the face database and from each face to be detected in the video frames of a video or in a picture;
Step 3): matching the features of the face to be detected against those of each face in the face database and, based on the individual matching results, matching the face to be detected to the name represented by the face database;
Step 4): using the matched name as the index of the multimedia.
In the above technical solution, the method further comprises:
Step 5): looking up the nationality name corresponding to the person name, and adding the resulting nationality name to the index of the multimedia.
In the above technical solution, step 1) comprises:
Step 1-1): searching for the found name on an image search engine and selecting the first N images from the search results;
Step 1-2): using a face detection method to remove, from the selected N images, those that contain no face or contain more than one face;
Step 1-3): estimating the face density of the remaining images and ranking the remaining images by the resulting face density;
Step 1-4): selecting the first M images from the ranked images to build the face database for the name.
In the above technical solution, N is between 20 and 100, and M is any odd number between 1 and 21.
In the above technical solution, in step 1-1), when several names denote the same person, only the full name, which gives the most accurate results, is searched on the image search engine.
In the above technical solution, step 2) further comprises: reducing the dimensionality of the extracted face features.
In the above technical solution, step 3) comprises:
Step 3-1): matching the features of the face to be detected against those of each face in the face database;
Step 3-2): obtaining, from the individual feature matching results, the average similarity between the face to be detected and the faces in the face database;
Step 3-3): comparing the average similarity with a similarity threshold; when the average similarity exceeds the threshold, the face to be detected matches the name represented by the face database.
In the above technical solution, step 4) comprises: when a face matches several names, selecting the name with the highest similarity as the index.
In the above technical solution, step 5) comprises:
Step 5-1): searching a web search engine for one of the person names found in the speech or text related to the multimedia together with one of the nationality names found in that speech or text, to obtain the probability that the named person belongs to the country denoted by the nationality name;
Step 5-2): repeating step 5-1) for all person names and all nationality names, finding the maximum probability among all results, and combining the person name and nationality name associated with that probability as the index of the multimedia.
In the above technical solution, step 5-1) comprises:
Step 5-1-1): searching a web search engine for one of the person names found in the speech or text related to the multimedia together with one of the nationality names found in that speech or text, to obtain the number of web pages that contain both the person name and the nationality name, and the number of web pages returned for the person name alone;
Step 5-1-2): dividing the number of web pages that contain both the person name and the nationality name by the number of web pages returned for the person name alone, to obtain the probability that the named person belongs to the country denoted by the nationality name.
The invention also provides a device for creating a multimedia index, comprising a dynamic face database building module, a face feature extraction module, a face matching module, and an index building module, wherein:
the dynamic face database building module performs an image search for a person name found in speech or text related to the multimedia, and builds a corresponding face database for that name from the search results;
the face feature extraction module extracts features from each face in the face database and from each face to be detected in the video frames of the video or in the picture;
the face matching module matches the features of the face to be detected against those of each face in the face database and, based on the individual matching results, matches the face to be detected to the name represented by the face database;
the index building module uses the matched name as the index of the multimedia.
In the above technical solution, the index building module also looks up the nationality name corresponding to the person name and adds the resulting nationality name to the index of the multimedia.
The advantages of the invention are:
1. The method of creating an index for multimedia requires no human intervention and no pre-labeled data, and it scales well.
2. The method addresses the small-sample problem common in face matching: it can name the faces in news reports quickly, conveniently, and effectively even when very few samples are available, and create the corresponding index.
3. The method is simple to implement, processes little data, and creates indexes quickly.
Embodiment
The invention is described below with reference to the drawings and specific embodiments.
Before the embodiments are described in detail, it should be noted that the multimedia referred to in the invention includes both video and still pictures. The following embodiments take news video, a common kind of video, as the example and describe, with reference to Fig. 1, how to create an index for such a news video.
For a news video, relevant information such as person names and country names can first be found in the video by prior-art speech recognition and/or text caption recognition methods. Suppose, in this embodiment, that there is a news video about a speech by US President Bush, narrated in English. Prior-art speech recognition and/or text caption recognition can then find keywords such as "President George W. Bush" and "President Bush", and possibly also "Bush", "George Bush", and "George W. Bush", all of which refer to US President Bush. Other people's names may also appear in the video, but this embodiment concentrates on creating the index related to Bush and therefore ignores them.
Once the name-related keywords have been obtained, a face database can be built dynamically for the corresponding person. US President Bush is again used as the example to explain how his face database is built dynamically.
After the names in the video have been obtained by the preceding operations, they can be used as keywords for searching an image search engine such as Google. As explained above, a person's name can take several forms; that is, several different names may denote the same person. When searching for images by name, it would therefore be best to search with all forms of the name. Taking Bush as the example, a search for images of US President Bush could use the keywords "President George W. Bush", "President Bush", "Bush", "George Bush", and "George W. Bush", each searched separately on the web. Because the different forms of one person's name occur with different probabilities, different keywords return different numbers of web pages; normalizing the occurrence probabilities of the name forms by page count gives the probability distribution shown in Fig. 2. The figure shows that the closer a name is to the full name, the lower its probability of occurrence, but the more accurate the results it returns. Searching separately with every name of the same person would cause too much repeated work, so in the following operations the longest name is used to represent the person, and the image search results obtained with that name are taken as the image search results for that person.
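The normalization and name selection described above can be sketched as follows. This is a minimal illustration, not the implementation used in the invention: the page counts are invented values, and the function names `normalize_counts` and `pick_search_name` are hypothetical.

```python
def normalize_counts(page_counts):
    """Turn per-variant web page counts into occurrence probabilities."""
    total = sum(page_counts.values())
    if total == 0:
        raise ValueError("no pages returned for any name variant")
    return {name: count / total for name, count in page_counts.items()}


def pick_search_name(name_variants):
    """Pick the longest variant, used here as a stand-in for the full name."""
    return max(name_variants, key=len)


# Hypothetical page counts, invented for illustration only.
page_counts = {
    "Bush": 500_000,
    "George Bush": 200_000,
    "President Bush": 150_000,
    "George W. Bush": 100_000,
    "President George W. Bush": 50_000,
}

probabilities = normalize_counts(page_counts)
search_name = pick_search_name(page_counts)
```

With these made-up counts, the longest variant has the lowest occurrence probability, which mirrors the trend the text attributes to Fig. 2.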
An image search engine usually returns thousands of results for a person, especially for a celebrity. Processing every result would be time-consuming and unnecessary. Because the results at the top of the list are those the search engine ranks highest, the first N images of the search results can be selected for subsequent processing to build the face database. Taking an image search with the keyword "President George W. Bush" as the example, the first 20 images of the search results are selected, as shown in Fig. 3. These images are not all necessarily suitable for creating a face database, so further selection is needed. First, a face detection method is used to remove images that contain no face or contain several faces; for example, images 12, 13, 16, 17, and 18 in Fig. 3 are removed because they contain several faces. Then the density of the faces in the remaining images is estimated: the higher the density, the more representative the corresponding face, and the better it supports face recognition. The images are therefore ranked by face density, and the first M images are selected as the dynamically created face database for the person.
There are many methods for estimating face density, and any related prior-art technique may be used. In this embodiment, density estimation can be realized with the following formula:
where $C = \{x_1, x_2, \ldots, x_{|C|}\}$ denotes the set of faces, each face being represented by a d-dimensional feature vector, and Z denotes the feature vector of a specific face.
After the density of each image in Fig. 3, other than those removed, has been computed with the above formula, the images are ranked by density and the first 5 are selected to build the face database of President George W. Bush.
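The filtering, ranking, and selection steps can be sketched as below. The density estimator is an assumption: the sketch uses a Gaussian kernel average, one common choice among the prior-art techniques the text allows, and is not necessarily the formula of this embodiment. The feature vectors, face counts, bandwidth, and the names `gaussian_density` and `build_face_database` are all hypothetical.

```python
import math


def gaussian_density(z, faces, bandwidth=1.0):
    """Average Gaussian kernel between feature vector z and every vector in faces.

    Assumed estimator; the embodiment's own density formula may differ.
    """
    total = 0.0
    for x in faces:
        sq_dist = sum((a - b) ** 2 for a, b in zip(z, x))
        total += math.exp(-sq_dist / (2.0 * bandwidth ** 2))
    return total / len(faces)


def build_face_database(candidates, m):
    """candidates: (image_id, face_count, feature_vector) tuples in search rank order."""
    # Step 1-2: keep only images that contain exactly one detected face.
    single = [(img, feat) for img, n_faces, feat in candidates if n_faces == 1]
    if not single:
        return []
    faces = [feat for _, feat in single]
    # Step 1-3: rank the remaining images by estimated face density, densest first.
    ranked = sorted(single, key=lambda item: gaussian_density(item[1], faces), reverse=True)
    # Step 1-4: the first M images form the face database.
    return [img for img, _ in ranked[:m]]


# Toy 2-dimensional features, invented for illustration only.
candidates = [
    ("img01", 1, [0.0, 0.0]),
    ("img02", 1, [0.1, 0.0]),
    ("img03", 2, [0.0, 0.1]),  # several faces: removed
    ("img04", 1, [5.0, 5.0]),  # isolated face: low density
    ("img05", 0, []),          # no face: removed
    ("img06", 1, [0.0, 0.1]),
]
database = build_face_database(candidates, m=3)
```

In this toy run the isolated face in `img04` ranks last and is left out, which is the intended effect of density ranking: faces that agree with most other search results are kept as representative.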
With the help of the image search engine, the above operations yield the face database of US President Bush, which can then be used to index the aforementioned news video about his speech. Index creation essentially consists of finding faces in the news video, comparing the found faces with the face database and, when the comparison succeeds, labeling the video with the name corresponding to the face database as the index of the video. Realizing this process requires solving two problems: how to represent a face, and how to match faces.
In both the news video and the face database, the faces in the images need to be represented. Since a news video can be divided into a series of video frames, representing the faces in a news video is in fact a matter of representing the faces in each frame. Faces can be represented by various known prior-art methods, such as the Gabor wavelet transform. When the Gabor wavelet transform is used, each face is first normalized to 128x160 pixels, with a distance of 72 pixels between the two eyes; a total of 40 Gabor wavelets are used in processing the normalized face, so 40960 features are finally extracted for each face. After the faces have been represented by features, the face features need to be reduced from 40960 dimensions to 640 so that Fisher discriminant analysis can be used in subsequent operations.
After the faces of the person in the news video and the faces in the face database have all been converted to the corresponding feature dimensionality, they can be compared to match the face features. Continuing the earlier example, suppose face A is found in some frame of the news video. The features of this face are then compared in turn with the features of each of the 5 faces in the face database; in each comparison, two faces are matched using the 640-dimensional features described above. Suppose a similarity threshold of 0.55 is defined for face matching; that is, two face images whose similarity exceeds 0.55 are considered to be images of the same person. The similarity used in the invention is the average similarity between the faces in the face database and the face to be detected: when the average similarity between the face to be detected and the faces in the face database exceeds the similarity threshold, the face to be detected is considered to match the name represented by the face database.
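The threshold test on the average similarity can be sketched as follows. The text does not specify the similarity measure, so cosine similarity is assumed here; the 3-dimensional vectors stand in for the 640-dimensional features, and the function names are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two feature vectors (assumed similarity measure)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def average_similarity(query, database_faces):
    """Mean similarity between the face to be detected and every database face."""
    sims = [cosine_similarity(query, face) for face in database_faces]
    return sum(sims) / len(sims)


def matches_name(query, database_faces, threshold=0.55):
    """True when the average similarity exceeds the similarity threshold."""
    return average_similarity(query, database_faces) > threshold


# Toy features, invented for illustration only.
face_database = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [0.8, 0.2, 0.1]]
face_a = [1.0, 0.1, 0.0]      # close to the database faces
face_other = [0.0, 0.0, 1.0]  # nearly orthogonal to them

is_match = matches_name(face_a, face_database)
is_other_match = matches_name(face_other, face_database)
```

Averaging over all database faces, rather than taking the single best pair, makes the decision less sensitive to one unrepresentative image in a small database.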
After the face matching operation above, the matching result between face and name is generally available. In practice, however, one face may match several face databases successfully, that is, the face matches the names represented by several face databases, so the name referring to the face most similar to this face must be selected from those names. For example, besides matching his own face database, George W. Bush may also match the face database of his father or of other relatives. In that case the correct name can be found by the following formula:
where $F_v^k$ denotes the k-th face in the video, and the other face term denotes the j-th face retrieved with the i-th name.
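One way to resolve several successful matches is sketched below. It assumes, in line with step 4), that the name whose face database has the highest average similarity to the detected face is chosen; this is an illustrative stand-in and not necessarily the embodiment's own formula. The similarity values and the name `best_matching_name` are hypothetical.

```python
def best_matching_name(similarities_by_name, threshold=0.55):
    """Return the name with the highest average similarity above the threshold.

    similarities_by_name maps each candidate name to the similarities between
    the detected face and every face in that name's database.
    Returns None when no database exceeds the threshold.
    """
    best_name, best_score = None, threshold
    for name, sims in similarities_by_name.items():
        if not sims:
            continue
        score = sum(sims) / len(sims)
        if score > best_score:
            best_name, best_score = name, score
    return best_name


# Hypothetical similarities, invented for illustration only.
similarities = {
    "George W. Bush": [0.81, 0.77, 0.79, 0.74, 0.80],
    "George H. W. Bush": [0.62, 0.58, 0.60, 0.57, 0.61],
    "Unrelated person": [0.20, 0.25, 0.18, 0.22, 0.21],
}
chosen = best_matching_name(similarities)
```

Here two databases pass the 0.55 threshold, and the one with the higher average is kept.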
The above description completes the creation of a name index for the persons in a news video. In practice, users may want the index to contain more than this: besides knowing whom the news video is about, a user may also want further information about the person, such as nationality. In another embodiment of the invention, therefore, the nationality of the person in the video is also identified.
In this embodiment, nationality is identified by Bayesian posterior inference. Suppose the country names extracted from the news video by prior-art speech recognition or text caption recognition are denoted $O_1, O_2, \ldots, O_N$, and the person names are denoted $N_1, N_2, \ldots, N_m$. The country corresponding to each name $N_j$ is then the country name with the highest posterior probability:
$$O^{*}(N_j) = \arg\max_{1 \le i \le N} P(O_i \mid N_j)$$
where $P(O_i \mid N_j)$ is the probability that the person named $N_j$ belongs to the country denoted by $O_i$.
The key issue in actually computing the above formula is how to estimate the probabilities it involves. Probability estimation generally requires statistical data from which concrete probability values are derived. In the invention, a search engine is again used to obtain the relevant statistics and thereby estimate the probabilities. It is assumed that the more often a person name and a country name appear together on the web, the closer their relationship, so the country that co-occurs most often with a name is the country to which that person belongs. Let NumberO denote the number of web pages returned for search term O, NumberN the number returned for search term N, and NumberON the number returned for search term ON, and let PN denote the total number of web pages covered by the search engine. Then $P(O \mid N)$ in the preceding formula can be obtained as the ratio of the co-occurrence count to the name count:
$$P(O \mid N) = \frac{P(ON)}{P(N)} = \frac{\mathrm{NumberON}/PN}{\mathrm{NumberN}/PN} = \frac{\mathrm{NumberON}}{\mathrm{NumberN}}$$
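The page-count estimate can be sketched as follows, using the ratio from steps 5-1-1) and 5-1-2). The `page_count` callable stands in for a real search engine query and is hypothetical, as are the counts, the query format (name and country joined by a space), and the function names.

```python
def nationality_probability(pages_with_both, pages_with_name):
    """Estimate P(O|N) as NumberON / NumberN; 0.0 when the name returns no pages."""
    if pages_with_name == 0:
        return 0.0
    return pages_with_both / pages_with_name


def infer_nationality(name, countries, page_count):
    """Score every candidate country for one name and return the most probable one.

    page_count is a hypothetical callable mapping a query string to a page count.
    """
    name_hits = page_count(name)
    scores = {
        country: nationality_probability(page_count(f"{name} {country}"), name_hits)
        for country in countries
    }
    best_country = max(scores, key=scores.get)
    return best_country, scores


# Hypothetical page counts, invented for illustration only.
FAKE_COUNTS = {
    "George W. Bush": 1000,
    "George W. Bush USA": 600,
    "George W. Bush Iraq": 300,
}


def fake_page_count(query):
    return FAKE_COUNTS.get(query, 0)


country, scores = infer_nationality("George W. Bush", ["USA", "Iraq"], fake_page_count)
```

Because the total page count PN cancels in the ratio, the sketch never needs it; only the two query counts per name and country pair are required.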
The above method readily yields the nationality of the person in the news video, and this nationality information can be added to the index of the news video together with the name information.
The above explanation of how to create an index for a video uses news video as the example, but in practice the method can be extended to other types of video, such as sports videos and film or television videos. Moreover, because operating on a video essentially means operating on the images in it, the method of the invention applies equally to still pictures; for a still picture, relevant information such as person names and nationality names can be obtained from text related to the picture, such as a news report.
On the basis of the method of creating an index for multimedia, the invention also provides a device for creating a multimedia index, comprising a dynamic face database building module, a face feature extraction module, a face matching module, and an index building module, wherein:
the dynamic face database building module performs an image search for a person name found in speech or text related to the multimedia, and builds a corresponding face database for that name from the search results;
the face feature extraction module extracts features from each face in the face database and from each face to be detected in the video frames of the video or in the picture;
the face matching module matches the features of the face to be detected against those of each face in the face database and, based on the individual matching results, matches the face to be detected to the name represented by the face database;
the index building module uses the matched name as the index of the multimedia. The index building module also looks up the nationality name corresponding to the person name and adds the resulting nationality name to the index of the multimedia.
Finally, it should be noted that the above embodiments are intended only to illustrate, not to limit, the technical solution of the invention. Although the invention has been described in detail with reference to the embodiments, those of ordinary skill in the art should understand that modifications or equivalent substitutions of the technical solution that do not depart from its spirit and scope shall all be covered by the claims of the invention.