CN101944111A - Method and device for searching news video - Google Patents


Info

Publication number
CN101944111A
Authority
CN
China
Prior art keywords
news video
website
news
video
url
Legal status
Granted
Application number
CN 201010280175
Other languages
Chinese (zh)
Other versions
CN101944111B (en)
Inventor
朱明�
尹文科
崔昊旻
李自勉
Current Assignee
ANHUI GUANGXING COMMUNICATION TECHNOLOGY Co Ltd
Original Assignee
University of Science and Technology of China USTC
Priority date
Application filed by University of Science and Technology of China (USTC)
Priority to CN2010102801754A (patent CN101944111B)
Publication of CN101944111A
Application granted
Publication of CN101944111B
Legal status: Expired - Fee Related
Anticipated expiration

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The embodiments of the invention provide a method and a device for searching news video. The method mainly comprises:
- constructing, based on semantic association information, ontology knowledge for searching for news video websites, and using the ontology knowledge to find news video websites on the Internet;
- evaluating the timeliness of each news video website, and using the timeliness evaluation result to set a crawl interval for the website;
- crawling the content of each news video website in a timely manner according to its crawl interval, using a preset search method, and acquiring the news videos in that content.

The invention addresses the problem of automatically, accurately and promptly finding and integrating news video on the Internet. It can identify news video websites quickly and accurately, and can find and integrate news video automatically and in a timely manner.

Description

Method and device for searching news video
Technical field
The present invention relates to the field of computer application technology, and in particular to a method and device for searching news video.
Background art
Triple-network convergence integrates the telecommunications, broadcast television and Internet networks. To support the service evolution it brings, research is needed on how to deliver more television services on resource-constrained terminal devices. Among current television services, news is especially attractive to viewers. Enabling viewers to watch television news at any time, and to enjoy personalized and topic-based TV news services, has therefore become an issue worth attention in the context of triple-network convergence.
One prior-art method for Web page topic identification and Web information extraction works as follows. Building on Web page topic analysis, all pages of a website are merged into one virtual page, and word-frequency feature vectors are used to classify websites. A vector space model is adopted, and the distance between vectors is used for website topic analysis. A topic-frequency vector describes the topical features of the website, and the weight of each vector element is determined by the number of pages in the website that contain the corresponding topic. In addition, the internal link structure of a website is usually regarded as a hierarchical tree or graph. For example, page topics are merged according to the physical and logical link structure of the website to determine the website topic.
Web information extraction is then performed with manually constructed, supervised, semi-supervised or unsupervised information extraction systems.
In the course of realizing the present invention, the inventors found that the above prior-art methods of Web page topic identification and Web information extraction have at least the following problems:
- they require complex statistics and analysis over the entire link structure of a website, so their applicability to the rapidly growing Web leaves much room for improvement;
- they cannot identify news video websites quickly and accurately;
- they cannot find and integrate news video automatically and promptly.
Summary of the invention
Embodiments of the invention provide a method and device for searching news video, so as to find and integrate news video automatically, accurately and promptly.
A method for searching news video comprises:
constructing, based on semantic association information, ontology knowledge for searching for news video websites, and using the ontology knowledge to find news video websites on the Internet;
evaluating the timeliness of the news video websites, and using the timeliness evaluation result to set a crawl interval for each news video website; and
crawling the content of each news video website in a timely manner according to its crawl interval, using a preset search method, and acquiring the news videos in that content.
A device for searching news video comprises:
a news video website search module, configured to construct, based on semantic association information, ontology knowledge for searching for news video websites, and to use the ontology knowledge to find news video websites on the Internet;
a crawl interval setting module, configured to evaluate the timeliness of the news video websites found by the news video website search module, and to use the timeliness evaluation result to set a crawl interval for each news video website; and
a news video acquisition module, configured to crawl the content of each news video website in a timely manner, according to the crawl interval set by the crawl interval setting module and using a preset search method, and to acquire the news videos in that content.
As the technical solutions provided by the above embodiments show, the invention effectively solves the problem of searching for and integrating Internet news video automatically, accurately and promptly. It can identify news video websites quickly and accurately, and can find and integrate news video automatically and in a timely manner.
Brief description of the drawings
To explain the technical solutions of the embodiments more clearly, the drawings used in describing the embodiments are briefly introduced below. The drawings described below show only some embodiments of the invention; a person of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is a schematic diagram of the principle of a news video search method according to Embodiment 1;
Fig. 2 is a processing flowchart of a news video search method according to Embodiment 1;
Fig. 3 is a schematic diagram of the construction principle of the ontology knowledge according to Embodiment 1;
Fig. 4 is a processing flowchart of a website topic identification method according to Embodiment 1;
Fig. 5 is a processing flowchart for evaluating the ontology knowledge in terms of new-URL yield and topic relevance according to Embodiment 1;
Fig. 6 is a processing flowchart for evaluating the timeliness of the news video websites stored in the news video site database according to Embodiment 1;
Fig. 7 is a processing flowchart for evaluating the novelty of the news video websites stored in the news video site database according to Embodiment 1;
Fig. 8 is a processing flowchart for evaluating the originality of the news video websites stored in the news video site database according to Embodiment 1;
Fig. 9 is a processing flowchart of a content-based duplicate detection technique according to Embodiment 1;
Fig. 10 is a processing flowchart for crawling, in a timely manner, the content of the news video websites stored in the news video site database according to Embodiment 1;
Fig. 11 is a schematic structural diagram of a news video search device according to Embodiment 2.
Detailed description of the embodiments
In the embodiments of the invention, ontology knowledge for searching for news video websites is constructed based on semantic association information and used to find news video websites on the Internet. The timeliness of the news video websites is evaluated, and the evaluation result is used to set the crawl interval of each website. Then, according to each website's crawl interval, the content of the website is crawled in a timely manner with a preset search method, and the news videos in that content are acquired.
To facilitate understanding, the embodiments are further explained below with reference to the drawings, taking several specific embodiments as examples. These embodiments do not limit the embodiments of the invention.
Embodiment one
Fig. 1 shows the principle of the news video search method provided by this embodiment. As shown in Fig. 2, the method comprises the following steps:
Step 21: construct, based on semantic association information, ontology knowledge for searching for news video websites. Use the ontology knowledge, a meta-search technique and a website topic identification method to find news video websites on the Internet, and store the news video websites in the news video site database.
First, a news video database is built in advance from the news video data of a small number of seed websites; it stores each news video together with its descriptive information. The seed websites include sites such as Xinhuanet News (www.xinhuanet.com) and "rising fast net news".
In the embodiments, a news video site database is also set up in advance. It stores each news video website together with information such as the website's evaluation results and crawl interval.
Ontology knowledge for searching for news video websites is constructed based on semantic association information; its construction principle is shown in Fig. 3. The semantic association information mainly comprises:
- the search keywords provided by the search engines themselves;
- content keywords of the news video websites already found;
- content-organization keywords of the news video websites already found;
- content-description keywords of the news video websites already found.

The content keywords of a news video website include keywords in the titles of its content, and its content-description keywords include the titles of trending videos. The ontology knowledge therefore mainly contains four kinds of keywords: search keywords, content keywords, content-organization keywords and content-description keywords.
For each keyword in the ontology knowledge, a meta-search technique is used to construct search requests to search engines on the Internet. A set number of the results returned by the search engines are taken, and the URLs (Uniform Resource Locators) they contain are extracted. The website topic identification method then identifies which of these URLs belong to news video websites.
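The meta-search step can be sketched as follows. The sketch assumes each search engine is wrapped as a function that takes a query string and returns a ranked list of result URLs. The name `meta_search`, the `top_n` parameter and the fake engines in the test are hypothetical, and no real search-engine API is called.

```python
from typing import Callable, Dict, Iterable, List

# Assumed wrapper type: query string -> ranked list of result URLs.
SearchFn = Callable[[str], List[str]]


def meta_search(keywords: Iterable[str],
                engines: Dict[str, SearchFn],
                top_n: int = 50) -> Dict[str, List[str]]:
    """For each ontology keyword, query every engine, keep each engine's
    first top_n results, and merge them into one de-duplicated URL list."""
    results: Dict[str, List[str]] = {}
    for kw in keywords:
        seen = set()
        merged: List[str] = []
        for search in engines.values():
            for url in search(kw)[:top_n]:
                if url not in seen:  # first occurrence wins; rank order kept
                    seen.add(url)
                    merged.append(url)
        results[kw] = merged
    return results
```

Merging in rank order and dropping repeats means that a URL returned by several engines is passed to website topic identification only once.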
Fig. 4 shows the flow of the website topic identification method provided by this embodiment. The processing mainly comprises the following steps.
First, pattern information of each URL in the returned results, such as its length, depth and format, is used to decide whether the URL is a website URL or a web page URL. Techniques such as decision trees or rule sets perform this classification.
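The patent names URL length, depth and format as features but gives no concrete rules. The sketch below shows one possible hand-written rule set with assumed thresholds. `MAX_SITE_PATH_DEPTH`, `MAX_SITE_URL_LENGTH` and the `PAGE_SUFFIXES` list are illustrative values, not values from the source. A decision tree trained on labelled URLs could take the place of these rules.

```python
from urllib.parse import urlparse

# Illustrative thresholds; the source names the features but not the values.
MAX_SITE_PATH_DEPTH = 1
MAX_SITE_URL_LENGTH = 60
PAGE_SUFFIXES = (".html", ".htm", ".shtml", ".php", ".asp", ".aspx", ".jsp")


def is_site_url(url: str) -> bool:
    """Rule set: True if the URL looks like a website entry point,
    False if it looks like an individual web page."""
    parsed = urlparse(url)
    segments = [s for s in parsed.path.split("/") if s]  # path depth
    if parsed.query:
        return False  # query strings usually address single pages
    if segments and segments[-1].lower().endswith(PAGE_SUFFIXES):
        # a file-like last segment is a page, except a root-level index page
        return len(segments) == 1 and segments[0].lower().startswith("index")
    if len(segments) > MAX_SITE_PATH_DEPTH:
        return False  # deep paths usually point inside a site
    return len(url) <= MAX_SITE_URL_LENGTH  # long URLs are usually pages
```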
For each identified website URL, all pages in the first layer of the website are fetched. A playback-page recognition technique then computes the proportion of video playback pages among them. If this proportion is below a preset playback-page threshold, the website URL is considered unrelated to the news video website topic and is discarded; otherwise, it is considered related to the topic.
For a website related to the topic, the anchor text of each link to its video playback pages is used to run fuzzy queries against the pre-built news video database, and the total number of matching results is counted. The average number of matching results per anchor text is then computed. If this average is below a preset matching-result threshold, the website is considered unrelated to the news video website topic. Otherwise, the website URL is considered related to the topic, i.e. the website is identified as a news video website.
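The two filters in Fig. 4 can be combined into a single predicate, sketched below. The playback-page recognizer and the fuzzy query against the news video database are passed in as functions, because the patent does not specify them. The threshold defaults are placeholders, not values from the source.

```python
from typing import Callable, Sequence


def is_news_video_site(first_layer_pages: Sequence[str],
                       is_playback_page: Callable[[str], bool],
                       anchor_texts: Sequence[str],
                       fuzzy_match_count: Callable[[str], int],
                       playback_ratio_threshold: float = 0.2,
                       avg_match_threshold: float = 1.0) -> bool:
    """Two-stage website topic check (Fig. 4)."""
    # Stage 1: share of video playback pages in the site's first layer.
    if not first_layer_pages:
        return False
    playback = sum(1 for p in first_layer_pages if is_playback_page(p))
    if playback / len(first_layer_pages) < playback_ratio_threshold:
        return False
    # Stage 2: average fuzzy-query hits in the news video database
    # per anchor text of links to playback pages.
    if not anchor_texts:
        return False
    total_matches = sum(fuzzy_match_count(a) for a in anchor_texts)
    return total_matches / len(anchor_texts) >= avg_match_threshold
```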
The identified news video websites are then stored in the pre-built news video site database.
In the embodiments, the news video websites identified by the website topic identification method can also be used to evaluate the constructed ontology knowledge in two respects: new-URL yield and topic relevance. As shown in Fig. 5, the evaluation mainly comprises the following steps.
For each keyword in the ontology knowledge, a meta-search technique is used to construct search requests to search engines on the Internet. A set number of the results returned by the search engines are taken, and the URLs they contain are extracted.
The website topic identification method picks out the news video website URLs among these URLs. The ratio of the number of news video website URLs to the total number of URLs in the returned results is then computed. If this ratio is below a preset topic relevance threshold, the keyword is considered unrelated to the news video website topic and is removed from the ontology knowledge. Otherwise, the keyword is considered topic-related, and its new-URL yield is evaluated next.
The URLs of all identified news video websites are looked up in the news video site database. From the lookup result, the following ratio is computed: the number of news video website URLs not yet in the database, divided by the total number of identified news video website URLs. If this ratio is below a preset new-URL yield threshold, the keyword is considered unable to yield new URLs and is removed from the ontology knowledge. Otherwise, the keyword is considered both topic-related and able to yield new URLs.
In general, setting both the topic relevance threshold and the new-URL yield threshold to 0.1 works well.
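The keyword evaluation of Fig. 5 reduces to two ratios, each checked against the 0.1 thresholds given above. In this sketch the caller supplies the topic identifier and the set of site URLs already in the database. `evaluate_keyword` is a hypothetical name.

```python
from typing import Callable, List, Set


def evaluate_keyword(result_urls: List[str],
                     is_news_video_site: Callable[[str], bool],
                     known_site_urls: Set[str],
                     relevance_threshold: float = 0.1,
                     new_url_threshold: float = 0.1) -> bool:
    """Return True to keep the keyword in the ontology, False to prune it."""
    if not result_urls:
        return False
    site_urls = [u for u in result_urls if is_news_video_site(u)]
    # Topic relevance: share of returned URLs that are news video sites.
    if not site_urls or len(site_urls) / len(result_urls) < relevance_threshold:
        return False
    # New-URL yield: share of those sites not yet in the site database.
    new_urls = [u for u in site_urls if u not in known_site_urls]
    return len(new_urls) / len(site_urls) >= new_url_threshold
```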
Step 22: evaluate the timeliness, novelty and originality of the news video websites stored in the news video site database, and use each website's timeliness evaluation result to set its crawl interval.
The news video websites stored in the news video site database are evaluated in three respects: timeliness, novelty and originality.
Fig. 6 shows the flow provided by this embodiment for evaluating the timeliness of these websites. The processing comprises the following steps.
A certain number of same-day news videos are obtained from the seed websites, and fuzzy queries are run against the news video database for these same-day videos. For each news video website in the database, the number of news videos similar to the same-day videos is counted. If one same-day video matches several similar videos on the same website, that website is counted only once for that video.
All news video websites are sorted in descending order of this count and scored as follows:
- top 10%: 5 points;
- 10% to 30%: 4 points;
- 30% to 70%: 3 points;
- 70% to 90%: 2 points;
- bottom 10%: 1 point.

In addition, any news video website whose count is 0 is scored 0 directly.
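Timeliness counting and the banded 5-to-0 scoring could look like the sketch below. It assumes a hypothetical `fuzzy_query(video)` that returns the site identifiers of the database videos similar to one same-day video. Sites with a zero count are kept out of the ranking and scored 0. Ties are broken by input order, a case the patent does not address. A site's rank fraction is its position divided by the number of ranked sites.

```python
from typing import Callable, Dict, Iterable, List

# (upper bound of rank fraction, score): top 10% -> 5, ..., bottom 10% -> 1
BANDS = [(0.1, 5), (0.3, 4), (0.7, 3), (0.9, 2), (1.0, 1)]


def timeliness_counts(all_sites: Iterable[str],
                      today_videos: Iterable[str],
                      fuzzy_query: Callable[[str], List[str]]) -> Dict[str, int]:
    """Per site, count the same-day videos that have at least one similar
    video on that site (several hits on one site count once)."""
    counts = {site: 0 for site in all_sites}
    for video in today_videos:
        for site in set(fuzzy_query(video)):
            if site in counts:
                counts[site] += 1
    return counts


def banded_scores(metric: Dict[str, float], descending: bool = True,
                  is_zero: Callable[[float], bool] = lambda v: v == 0) -> Dict[str, int]:
    """Rank sites by metric and map rank fractions to scores 5..1;
    sites for which is_zero(value) holds are scored 0 without ranking."""
    scores = {site: 0 for site, v in metric.items() if is_zero(v)}
    ranked = sorted((s for s, v in metric.items() if not is_zero(v)),
                    key=lambda s: metric[s], reverse=descending)
    for pos, site in enumerate(ranked):
        fraction = pos / len(ranked)  # share of sites ranked above this one
        scores[site] = next(score for bound, score in BANDS if fraction < bound)
    return scores
```

The `descending` flag and the `is_zero` predicate let the same banding also serve the novelty ranking (descending, zero clicks) and the originality ranking (ascending, repetition ratio of 1.0) described in the text.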
Finally, the timeliness evaluation result of each news video website is stored in the news video site database as the measure of the timeliness of that website's content.
Each website's timeliness evaluation result is used to set its crawl interval. The interval is set according to the website's count of news videos similar to the same-day videos: a website with a higher count is given a shorter crawl interval.
One feasible way to set the crawl interval is as follows:
- timeliness score 5: crawled every 5 minutes;
- score 4: every 10 minutes;
- score 3: every 20 minutes;
- score 2: every 40 minutes;
- score 1: every 80 minutes;
- score 0: once a day.
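The example interval table can be held as a simple lookup, as sketched below. Each extra point halves the interval, from 80 minutes at a score of 1 down to 5 minutes at a score of 5, and a score of 0 falls back to one day. The names are illustrative.

```python
from datetime import timedelta

# Example mapping from the text: timeliness score -> crawl interval.
CRAWL_INTERVAL_BY_SCORE = {
    5: timedelta(minutes=5),
    4: timedelta(minutes=10),
    3: timedelta(minutes=20),
    2: timedelta(minutes=40),
    1: timedelta(minutes=80),
    0: timedelta(days=1),  # sites with no same-day matches: daily
}


def crawl_interval(score: int) -> timedelta:
    """Look up the crawl interval for a 0..5 timeliness score."""
    if score not in CRAWL_INTERVAL_BY_SCORE:
        raise ValueError(f"timeliness score must be 0..5, got {score}")
    return CRAWL_INTERVAL_BY_SCORE[score]
```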
Fig. 7 shows the flow provided by this embodiment for evaluating the novelty of the news video websites stored in the news video site database. The processing comprises the following steps.
A content-based duplicate detection technique clusters the news videos newly obtained from the news video websites. From each cluster, a certain number of the earliest-discovered news videos are retained. The total number of clicks of all retained news videos of each news video website is then counted, and the average number of clicks per news video is computed.
The news video websites are sorted in descending order of average clicks per news video and scored as follows:
- top 10%: 5 points;
- 10% to 30%: 4 points;
- 30% to 70%: 3 points;
- 70% to 90%: 2 points;
- bottom 10%: 1 point.

In addition, any news video website whose average clicks per video is 0 is scored 0 directly.
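The novelty measure can be sketched as follows. The sketch assumes the duplicate-detection clusters are already available as lists of videos, singleton clusters included. Each video is assumed to carry its site, discovery time and click count. `keep_per_cluster` stands in for the unspecified "certain number". Giving a site with no retained videos an average of 0.0 is also an assumption. The resulting averages are then ranked in descending order, using the same 5-to-0 bands as timeliness.

```python
from typing import Dict, Iterable, List, NamedTuple


class Video(NamedTuple):
    site: str
    discovered_at: float  # e.g. a Unix timestamp
    clicks: int


def novelty_metric(clusters: List[List[Video]], all_sites: Iterable[str],
                   keep_per_cluster: int = 1) -> Dict[str, float]:
    """Average clicks per retained video for each site, where only the
    earliest-discovered videos of each duplicate cluster are retained."""
    clicks = {s: 0 for s in all_sites}
    kept = {s: 0 for s in clicks}
    for cluster in clusters:
        earliest = sorted(cluster, key=lambda x: x.discovered_at)[:keep_per_cluster]
        for v in earliest:
            clicks[v.site] += v.clicks
            kept[v.site] += 1
    # Assumption: a site with no retained videos gets an average of 0.0.
    return {s: clicks[s] / kept[s] if kept[s] else 0.0 for s in clicks}
```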
Finally, the novelty evaluation result of each news video website is stored in the news video site database as the measure of that website's novelty.
Fig. 8 shows the flow provided by this embodiment for evaluating the originality of the news video websites stored in the news video site database. The processing comprises the following steps.
A content-based duplicate detection technique clusters the news videos newly obtained from the news video websites. From each cluster, a certain number of the earliest-discovered news videos are retained, and the remaining news videos in the cluster are treated as later repeats of them. For each news video website, the total number of videos and the number of repeated videos are counted, and the website's repetition ratio is computed. All news video websites are sorted in ascending order of repetition ratio and scored as follows:
- top 10%: 5 points;
- 10% to 30%: 4 points;
- 30% to 70%: 3 points;
- 70% to 90%: 2 points;
- bottom 10%: 1 point.

In addition, any news video website whose repetition ratio is 100% is scored 0 directly.
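The repetition ratio behind the originality score might be computed as sketched below. The sketch again assumes precomputed clusters that include singletons. Every video after the first `keep_per_cluster` videos (ordered by discovery time) counts as a repeat. Sites with no videos are left out rather than given a ratio. The ratios are then ranked in ascending order, and sites with a ratio of 1.0 are scored 0.

```python
from typing import Dict, List, NamedTuple


class Video(NamedTuple):
    site: str
    discovered_at: float  # e.g. a Unix timestamp


def repetition_ratios(clusters: List[List[Video]],
                      keep_per_cluster: int = 1) -> Dict[str, float]:
    """Share of each site's videos that are later repeats within a duplicate
    cluster (every video after the earliest keep_per_cluster counts)."""
    total: Dict[str, int] = {}
    repeats: Dict[str, int] = {}
    for cluster in clusters:
        ordered = sorted(cluster, key=lambda x: x.discovered_at)
        for rank, v in enumerate(ordered):
            total[v.site] = total.get(v.site, 0) + 1
            if rank >= keep_per_cluster:
                repeats[v.site] = repeats.get(v.site, 0) + 1
    return {s: repeats.get(s, 0) / n for s, n in total.items()}
```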
Finally, the originality evaluation result of each news video website is stored in the news video site database as the measure of that website's originality.
Fig. 9 shows the flow of the content-based duplicate detection technique provided by this embodiment. The processing is as follows.
First, a certain number of key frames are extracted from each news video, and a feature vector is built for each key frame:
- the Harris operator detects corner points in the key frame;
- SIFT (Scale-Invariant Feature Transform) features are used to construct feature vectors for the regions around the corner points;
- PCA (Principal Component Analysis) reduces the dimensionality of these feature vectors.

Key frames are then compared pairwise across two news videos. For each pair, the KNN (K-nearest-neighbor) algorithm finds the K feature-vector pairs with the smallest distances. These K pairs form a sequence of values X = {x1, x2, ..., xN} (N = 2K), to which the BIC (Bayesian Information Criterion) algorithm is applied. If the sequence X contains a change point (an abrupt jump), the two key frames are judged not to be repeats; otherwise, they are judged to be repeats.
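The key-frame comparison can be sketched without an image library by starting from descriptor vectors that Harris, SIFT and PCA are assumed to have already produced. The sketch finds the K closest descriptor pairs by brute force. It then runs a standard single-change-point BIC test on their sorted distances. Using the K distances directly as the sequence is an assumption, since the text does not say how the N = 2K values are formed. `penalty_weight` and `min_seg` are illustrative parameters.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def knn_pair_distances(desc_a: List[Vector], desc_b: List[Vector], k: int) -> List[float]:
    """Distances of the k closest descriptor pairs between two key frames
    (brute-force nearest neighbours over all cross pairs), ascending."""
    return sorted(math.dist(a, b) for a in desc_a for b in desc_b)[:k]


def _var(xs: Sequence[float]) -> float:
    mean = sum(xs) / len(xs)
    # floor avoids log(0) for perfectly constant segments
    return max(sum((x - mean) ** 2 for x in xs) / len(xs), 1e-12)


def has_change_point(x: Sequence[float], penalty_weight: float = 1.0,
                     min_seg: int = 2) -> bool:
    """Single-change-point BIC test for a 1-D Gaussian sequence: a split
    is accepted if modelling the two halves separately beats one model
    by more than the BIC penalty for the extra mean and variance."""
    n = len(x)
    if n < 2 * min_seg:
        return False
    whole = 0.5 * n * math.log(_var(x))
    penalty = penalty_weight * math.log(n)  # 0.5 * (d + d(d+1)/2) * log n, d = 1
    for i in range(min_seg, n - min_seg + 1):
        delta = (whole
                 - 0.5 * i * math.log(_var(x[:i]))
                 - 0.5 * (n - i) * math.log(_var(x[i:]))
                 - penalty)
        if delta > 0:
            return True
    return False


def key_frames_repeat(desc_a: List[Vector], desc_b: List[Vector], k: int = 10) -> bool:
    """Frames repeat when the K smallest pair distances show no abrupt jump."""
    return not has_change_point(knn_pair_distances(desc_a, desc_b, k))
```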
The number of repeated key frames between two news videos is counted, and the proportion of repeated key frames among all key frames is computed. If this proportion exceeds a set key frame threshold, the two news videos are judged to be repeats; otherwise, they are judged not to be repeats.
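The frame-level decisions can be aggregated into a video-level decision as sketched below. The text does not say whether "all key frames" means one video's frames or both videos' frames. This sketch therefore counts repeated frames on both sides over the combined total. The 0.5 threshold is a placeholder.

```python
from typing import Callable, Sequence, TypeVar

Frame = TypeVar("Frame")


def videos_repeat(frames_a: Sequence[Frame], frames_b: Sequence[Frame],
                  frame_repeat: Callable[[Frame, Frame], bool],
                  ratio_threshold: float = 0.5) -> bool:
    """Two videos repeat if the share of key frames (counted on both
    sides) that have a repeated counterpart exceeds ratio_threshold."""
    total = len(frames_a) + len(frames_b)
    if total == 0:
        return False
    rep_a = sum(1 for fa in frames_a if any(frame_repeat(fa, fb) for fb in frames_b))
    rep_b = sum(1 for fb in frames_b if any(frame_repeat(fa, fb) for fa in frames_a))
    return (rep_a + rep_b) / total > ratio_threshold
```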
Step 23: according to each news video website's crawl interval, crawl the news videos in the website in a timely manner with a preset search method, and store the crawled news videos in the news video database.
Fig. 10 shows the flow provided by this embodiment for crawling, in a timely manner, the content of the news video websites stored in the news video site database. The processing is as follows.
First, the URL and timeliness evaluation result of each news video website are obtained from the news video site database, and each website's crawl interval is set from its timeliness result. One feasible setting is as follows:
- timeliness score 5: crawled every 5 minutes;
- score 4: every 10 minutes;
- score 3: every 20 minutes;
- score 2: every 40 minutes;
- score 1: every 80 minutes;
- score 0: once a day.

The news video websites in the news video site database are then checked one by one in a fixed order. If the time elapsed since a website's last crawl ended exceeds its crawl interval, a new round of content crawling is started for that website. Otherwise, the next website is checked in the same way.
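The due-check loop can be sketched as a function that returns the sites whose last crawl ended more than one crawl interval ago. Sorting the site identifiers stands in for the unspecified "fixed order". Treating a never-crawled site as due is an assumption.

```python
from datetime import datetime, timedelta
from typing import Dict, List, Optional


def sites_due(last_crawl_end: Dict[str, Optional[datetime]],
              intervals: Dict[str, timedelta],
              now: datetime) -> List[str]:
    """Sites whose time since the end of their last crawl exceeds their
    crawl interval; None means the site has never been crawled."""
    due = []
    for site in sorted(last_crawl_end):  # fixed scan order (assumed: by id)
        ended = last_crawl_end[site]
        if ended is None or now - ended > intervals[site]:
            due.append(site)
    return due
```

A scheduler would call this periodically, crawl each returned site, and record the new crawl end time.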
Each news video website due for crawling is crawled with a preset search method, for example depth-limited breadth-first search.
Depth-limited breadth-first search is used to traverse the news video website. The depth limit can be a global constant or can vary from website to website. For each web page encountered during the traversal, a playback-page recognition technique first determines whether it is a video playback page. For a video playback page, a web page noise removal technique removes the noise information it contains; this noise includes background noise, random noise and residual noise. The information remaining in the video playback page is taken as the news video.
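A depth-limited breadth-first traversal with the playback-page filter might look like the sketch below. `get_links` is assumed to return only links within the same website. Pages are identified by URL for brevity; fetching the HTML is left to the caller-supplied functions. The page classifier and noise remover are placeholders for the techniques the patent names.

```python
from collections import deque
from typing import Callable, Iterable, List


def depth_limited_bfs(start_url: str,
                      get_links: Callable[[str], Iterable[str]],
                      max_depth: int) -> List[str]:
    """Visit pages breadth-first, not expanding links beyond max_depth."""
    visited = {start_url}
    order: List[str] = []
    queue = deque([(start_url, 0)])
    while queue:
        url, depth = queue.popleft()
        order.append(url)
        if depth == max_depth:
            continue  # depth limit reached: do not follow this page's links
        for link in get_links(url):
            if link not in visited:
                visited.add(link)
                queue.append((link, depth + 1))
    return order


def collect_news_videos(start_url: str,
                        get_links: Callable[[str], Iterable[str]],
                        is_playback_page: Callable[[str], bool],
                        strip_noise: Callable[[str], str],
                        max_depth: int = 3) -> List[str]:
    """Keep only playback pages and return their de-noised content."""
    return [strip_noise(u) for u in depth_limited_bfs(start_url, get_links, max_depth)
            if is_playback_page(u)]
```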
The content-based duplicate detection technique described above is applied to the news video, and each news video that passes is then processed as follows:
- an iterative back-projection method in the video compressed domain improves its picture quality;
- existing tools transcode it into MP4 or FLV (Flash Video) container format;
- the news video and its descriptive information are stored in the news video database.

When crawling of the news video website ends, the end time is stored in the news video site database.
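The patent says only that "existing tools" are used for transcoding. Assuming FFmpeg is that tool, the helper below builds a command line that re-encodes a video to H.264 video and AAC audio. The output goes into an MP4 or FLV container, both of which can carry those codecs. The function name and output naming are hypothetical. The command requires an FFmpeg build that includes libx264, and can be run with `subprocess.run(cmd, check=True)`.

```python
from pathlib import Path
from typing import List


def build_transcode_cmd(src: str, dst_dir: str, container: str = "mp4") -> List[str]:
    """FFmpeg argument list re-encoding src to H.264/AAC in an MP4 or FLV file."""
    if container not in ("mp4", "flv"):
        raise ValueError("container must be 'mp4' or 'flv'")
    dst = Path(dst_dir) / (Path(src).stem + "." + container)
    return ["ffmpeg", "-y",            # overwrite an existing output file
            "-i", src,
            "-c:v", "libx264",         # H.264 video
            "-c:a", "aac",             # AAC audio
            "-f", container,           # force the output container
            str(dst)]
```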
The news videos in the news video database can be used by a video-on-demand system for a TV news portal. The descriptions and association information of the news videos can be pushed to a portal website. After accessing the portal through a set-top box (STB), users can see the latest news video list. They can then browse, order and play on demand the news videos in that list.
Embodiment two
As shown in Fig. 11, the news video search device provided by this embodiment comprises the following modules:
a news video website search module 11, configured to construct, based on semantic association information, ontology knowledge for searching for news video websites, and to use the ontology knowledge to find news video websites on the Internet;
a news video website evaluation module 12, configured to evaluate the timeliness of the news video websites found by the news video website search module, and to use the timeliness evaluation result to set the crawl interval of each news video website;
a news video acquisition module 13, configured to crawl the content of each news video website in a timely manner, according to the crawl interval set by the news video website evaluation module and using a preset search method, and to acquire the news videos in that content.
The news video search device may further comprise:
an ontology knowledge evaluation module 14. For each keyword in the ontology knowledge, this module uses a meta-search technique to construct search requests to search engines on the Internet, takes a set number of the results returned by the search engines, and extracts the URLs they contain.
The module uses the website topic identification method to pick out the news video website URLs among these URLs, and computes the ratio of their number to the total number of URLs in the returned results. If this ratio is below a preset topic relevance threshold, the keyword is considered unrelated to the news video website topic and is removed from the ontology knowledge. Otherwise, the keyword is considered topic-related, and its new-URL yield is evaluated next.
The module then looks up the URLs of all identified news video websites in the news video site database. From the lookup result, it computes the number of news video website URLs not yet in the database, divided by the total number of identified news video website URLs. If this ratio is below a preset new-URL yield threshold, the keyword is considered unable to yield new URLs and is removed from the ontology knowledge. Otherwise, the keyword is considered both topic-related and able to yield new URLs.
The news video website search module 11 may specifically comprise:
a search module 111. For each keyword in the ontology knowledge, it uses a meta-search technique to construct search requests to search engines on the Internet, takes a set number of the results returned by the search engines, and extracts the Uniform Resource Locators (URLs) they contain;
an identification module 112, configured to use the website topic identification method to identify the news video website URLs among the URLs extracted by the search module, and to store the identified news video websites in the pre-built news video site database.
The news video website evaluation module 12 may specifically comprise:
Statistics module 121, configured to obtain a number of same-day news videos from the seed websites, run fuzzy queries against the news video database using these same-day news videos, count, for each news video website in the news video database, the number of news videos similar to the same-day news videos, and store this number in the news video website database as the timeliness evaluation result for that website;
Setting module 122, configured to set the crawl interval of each news video website according to its number of news videos similar to the same-day news videos, such that a website with more such videos is assigned a shorter crawl interval.
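The timeliness evaluation and interval setting can be illustrated with the sketch below. It is only an assumption-laden stand-in: fuzzy querying is approximated by `difflib.SequenceMatcher` title similarity with a hypothetical 0.8 cutoff, and the linear mapping from counts to intervals (10 minutes to 24 hours) is invented for illustration; the source only requires that more similar videos yield a shorter interval.

```python
from __future__ import annotations

from difflib import SequenceMatcher


def timeliness_counts(same_day_titles: list[str], site_videos: dict[str, list[str]],
                      min_similarity: float = 0.8) -> dict[str, int]:
    """Count, per site, the videos whose titles resemble any same-day title."""
    return {
        site: sum(1 for title in titles
                  if any(SequenceMatcher(None, title, ref).ratio() >= min_similarity
                         for ref in same_day_titles))
        for site, titles in site_videos.items()
    }


def crawl_intervals(counts: dict[str, int], min_minutes: int = 10,
                    max_minutes: int = 24 * 60) -> dict[str, int]:
    """Map counts to crawl intervals in minutes: the higher the count, the shorter the interval."""
    top = max(counts.values(), default=0)
    intervals: dict[str, int] = {}
    for site, count in counts.items():
        fraction = count / top if top else 0.0
        intervals[site] = round(max_minutes - fraction * (max_minutes - min_minutes))
    return intervals
```

For example, with one same-day title and three sites holding two, zero and one similar videos, the sites receive intervals of 10, 1440 and 725 minutes respectively.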
The news video acquisition module 13 may specifically comprise:
Crawling module 131, configured to crawl, by the set search method, the content of a news video website in the news video website database once the time elapsed since its last crawl finished exceeds that website's crawl interval;
Identification module 132, configured to use a playback page recognition technique to determine whether each webpage crawled from the news video website is a video playback page, and, for each page determined to be a video playback page, to remove the noise information it contains and take the remaining information as a news video;
Detection and enhancement module 133, configured to perform duplicate detection on the news videos using a content-based duplicate detection technique, to enhance the quality of the news videos that pass duplicate detection using an iterative back-projection method in the video compressed domain, and then to store the news videos and their corresponding description information in the news video database.
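Two steps of the acquisition module can be sketched compactly: selecting which websites are due for a crawl, and filtering duplicates before storage. The duplicate check here is a swapped-in stand-in for the unspecified content-based technique: it assumes each video has already been reduced to an integer fingerprint (hypothetical) and treats fingerprints within a small Hamming distance as duplicates. Noise removal and compressed-domain enhancement are not shown.

```python
from __future__ import annotations

from datetime import datetime, timedelta


def due_sites(last_finished: dict[str, datetime], intervals: dict[str, timedelta],
              now: datetime) -> list[str]:
    """Sites whose time since the last finished crawl exceeds their crawl interval."""
    return sorted(site for site, finished in last_finished.items()
                  if now - finished > intervals[site])


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two non-negative fingerprints."""
    return bin(a ^ b).count("1")


def admit_videos(candidates: list[int], stored: list[int], max_distance: int = 5) -> list[int]:
    """Admit fingerprints that are not near-duplicates of anything already stored."""
    admitted: list[int] = []
    for fp in candidates:
        if all(hamming(fp, s) > max_distance for s in stored):
            stored.append(fp)  # later candidates are also checked against this one
            admitted.append(fp)
    return admitted
```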
The news video website evaluation module 12 may further comprise:
Novelty evaluation module 123, configured to cluster the news videos newly obtained from each news video website using a content-based duplicate detection technique, and to keep, from each cluster, a number of news videos with the earliest discovery times. It then counts the total number of clicks on all retained news videos of each news video website and calculates the average number of clicks per news video.
Each news video website is then evaluated for novelty based on this average number of clicks per news video, and the novelty evaluation result for each website is stored in the news website database as the measure of that website's novelty.
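The novelty metric can be sketched as below, assuming the content-based clustering has already been done and each cluster is a list of `Video` records (a hypothetical structure carrying site, discovery time and click count). How many early videos to keep per cluster is a parameter the source leaves open; one is used by default here.

```python
from __future__ import annotations

from collections import defaultdict
from typing import NamedTuple


class Video(NamedTuple):
    site: str
    discovered: float  # discovery timestamp (any sortable value)
    clicks: int


def novelty_scores(clusters: list[list[Video]], keep_per_cluster: int = 1) -> dict[str, float]:
    """Average clicks per retained (earliest-discovered) video, per site."""
    clicks: dict[str, int] = defaultdict(int)
    kept: dict[str, int] = defaultdict(int)
    for cluster in clusters:
        earliest = sorted(cluster, key=lambda v: v.discovered)[:keep_per_cluster]
        for video in earliest:
            clicks[video.site] += video.clicks
            kept[video.site] += 1
    return {site: clicks[site] / kept[site] for site in kept}
```

Only retained videos contribute, so a site that mostly republishes stories first reported elsewhere accumulates few counted clicks.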
Originality evaluation module 124, configured to cluster the news videos newly obtained from each news video website using a content-based duplicate detection technique, keep from each cluster a number of news videos with the earliest discovery times, and treat the remaining news videos as follow-up (repeated) news videos. It then counts the total number of videos and the number of repeated videos of each news video website, and calculates each website's repetition ratio.
Each news video website is then evaluated for originality based on its repetition ratio, and the originality evaluation result for each website is stored in the news website database as the measure of that website's originality.
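A minimal sketch of the repetition ratio follows, again assuming pre-computed clusters, here given as hypothetical `(site, discovery_time)` pairs. Videos beyond the earliest `keep_per_cluster` in each cluster count as repeats; turning the ratio into an originality score as `1 - ratio` is an illustrative assumption, not something the source specifies.

```python
from __future__ import annotations

from collections import defaultdict


def repetition_ratios(clusters: list[list[tuple[str, float]]],
                      keep_per_cluster: int = 1) -> dict[str, float]:
    """Share of each site's videos that are follow-ups of an earlier video."""
    total: dict[str, int] = defaultdict(int)
    repeated: dict[str, int] = defaultdict(int)
    for cluster in clusters:
        ordered = sorted(cluster, key=lambda video: video[1])  # earliest first
        for rank, (site, _discovered) in enumerate(ordered):
            total[site] += 1
            if rank >= keep_per_cluster:
                repeated[site] += 1
    return {site: repeated[site] / total[site] for site in total}


def originality_scores(ratios: dict[str, float]) -> dict[str, float]:
    """Assumed scoring: fewer repeats means higher originality."""
    return {site: 1.0 - ratio for site, ratio in ratios.items()}
```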
A person of ordinary skill in the art will appreciate that all or part of the processes in the methods of the above embodiments may be implemented by a computer program instructing the relevant hardware. The program may be stored in a computer-readable storage medium and, when executed, may include the processes of the embodiments of the methods described above. The storage medium may be a magnetic disk, an optical disc, a read-only memory (ROM), a random access memory (RAM), or the like.
In summary, the embodiments of the present invention effectively solve the problem of searching for and integrating Internet news videos automatically, accurately and promptly: they can identify news video websites quickly and accurately, and can discover and integrate news videos automatically and promptly.
The embodiments of the present invention propose a system and method for searching and integrating Internet news videos for a TV news portal. They can supply a video-on-demand system serving the TV news portal with rich, high-quality Internet news video resources, and can provide the TV news portal with the necessary news video material and description information.
The above are only preferred embodiments of the present invention, and the protection scope of the present invention is not limited thereto. Any variation or replacement readily conceivable by a person skilled in the art within the technical scope disclosed by the present invention shall fall within the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (10)

1. A method for searching news videos, characterized by comprising:
Constructing, based on semantic association information, ontology knowledge for searching for news video websites, and using the ontology knowledge to search the Internet for news video websites;
Evaluating the timeliness of the news video websites, and using the timeliness evaluation result to set a crawl interval for each news video website;
Promptly crawling, by a set search method, the content of each news video website according to its crawl interval, and obtaining the news videos in the content.
2. The method for searching news videos according to claim 1, characterized in that the semantic association information comprises: search keywords provided by the search engine itself, content keywords of the news video websites discovered by searching, content-organization-structure keywords of the news video websites discovered by searching, and content-description keywords of the news video websites discovered by searching.
3. The method for searching news videos according to claim 2, characterized in that using the ontology knowledge to search the Internet for news video websites comprises:
For each keyword in the ontology knowledge, constructing a search request to a search engine on the Internet using a first search technique, taking a set number of the search results returned by the search engine, and extracting the uniform resource locators (URLs) contained in the search results;
Identifying, by a website topic identification method, the URLs of news video websites among the extracted URLs, and storing the identified news video websites in a pre-built news video website database.
4. The method for searching news videos according to claim 3, characterized in that identifying, by the website topic identification method, the URLs of news video websites among the extracted URLs comprises:
Using the pattern information of each URL contained in the search results to determine whether the URL is a website URL or a webpage URL;
For each identified website URL, crawling all webpages in the first layer of the website and using a playback page recognition technique to calculate the proportion of video playback pages among them; if this proportion is below a predefined video-playback-page threshold, the website URL is deemed unrelated to the news video website topic and is excluded; otherwise, the website URL is deemed related to the news video website topic;
For each website related to the news video website topic, using the link text of its video playback pages to run fuzzy queries against a pre-built news video database and counting the total number of similar results, then calculating the average number of similar results per link text; if this average is below a predefined similar-result-count threshold, the website is deemed unrelated to the news video website topic; otherwise, the website is identified as a news video website.
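The three checks of claim 4 can be chained as in the sketch below. Every concrete rule in it is an assumption for illustration: `is_site_url` treats a bare host or an index page as a website URL, pages are hypothetical dicts, and `is_playback_page` and `count_similar` are injected callables standing in for the playback page recognition technique and the fuzzy query against the news video database. The threshold defaults are likewise invented.

```python
from __future__ import annotations

from typing import Callable
from urllib.parse import urlsplit

INDEX_PAGES = {"index.html", "index.htm", "index.php"}


def is_site_url(url: str) -> bool:
    """Assumed URL-pattern rule: a bare host or an index page counts as a website URL."""
    path = urlsplit(url).path
    return path in ("", "/") or path.rstrip("/").split("/")[-1] in INDEX_PAGES


def identify_news_video_site(url: str, first_layer_pages: list[dict],
                             is_playback_page: Callable[[dict], bool],
                             count_similar: Callable[[str], int],
                             playback_threshold: float = 0.2,
                             similar_threshold: float = 1.0) -> bool:
    if not is_site_url(url) or not first_layer_pages:
        return False  # webpage URL, or nothing crawled from the first layer
    playback = [p for p in first_layer_pages if is_playback_page(p)]
    if not playback or len(playback) / len(first_layer_pages) < playback_threshold:
        return False  # too few video playback pages
    link_texts = [p["link_text"] for p in playback]
    average = sum(count_similar(text) for text in link_texts) / len(link_texts)
    return average >= similar_threshold  # enough matches in the news video database
```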
5. The method for searching news videos according to claim 1, characterized in that evaluating the timeliness of the news video websites and using the timeliness evaluation result to set their crawl intervals comprises:
Obtaining a number of same-day news videos from the seed websites, running fuzzy queries against the news video database using these same-day news videos, counting, for each news video website in the news video database, the number of news videos similar to the same-day news videos, and storing this number in the news video website database as the timeliness evaluation result for that website;
Setting the crawl interval of each news video website according to its number of news videos similar to the same-day news videos, such that a website with more such videos is assigned a shorter crawl interval.
6. The method for searching news videos according to any one of claims 1 to 5, characterized in that promptly crawling, by the set search method, the news videos in each news video website according to its crawl interval comprises:
When the time elapsed since the last crawl of a news video website in the news video website database finished exceeds that website's crawl interval, crawling the content of the website by the set search method;
Using a playback page recognition technique to determine whether each webpage crawled from the news video website is a video playback page, and, for each page determined to be a video playback page, removing the noise information it contains and taking the remaining information as a news video;
Performing duplicate detection on the news videos using a content-based duplicate detection technique, improving the quality of the news videos that pass duplicate detection using an iterative back-projection method in the video compressed domain, and then storing the news videos and their corresponding description information in the news video database.
7. A device for searching news videos, characterized by comprising:
A news video website search module, configured to construct, based on semantic association information, ontology knowledge for searching for news video websites, and to use the ontology knowledge to search the Internet for news video websites;
A crawl interval setting module, configured to evaluate the timeliness of the news video websites found by the news video website search module, and to use the timeliness evaluation result to set the crawl interval of each news video website;
A news video acquisition module, configured to promptly crawl, by a set search method, the content of each news video website according to the crawl interval set by the crawl interval setting module, and to obtain the news videos in the content.
8. The device for searching news videos according to claim 7, characterized in that the news video website search module comprises:
A search module, configured to, for each keyword in the ontology knowledge, construct a search request to a search engine on the Internet using a first search technique, take a set number of the search results returned by the search engine, and extract the uniform resource locators (URLs) contained in the returned results;
An identification module, configured to identify, by a website topic identification method, the URLs of news video websites among the URLs extracted by the search module, and to store the identified news video websites in a pre-built news video website database.
9. The device for searching news videos according to claim 7, characterized in that the crawl interval setting module comprises:
A statistics module, configured to obtain a number of same-day news videos from seed websites, run fuzzy queries against the news video database using these same-day news videos, count, for each news video website in the news video database, the number of news videos similar to the same-day news videos, and store this number in the news video website database as the timeliness evaluation result for that website;
A setting module, configured to set the crawl interval of each news video website according to its number of news videos similar to the same-day news videos, such that a website with more such videos is assigned a shorter crawl interval.
10. The device for searching news videos according to claim 7, 8 or 9, characterized in that the news video acquisition module comprises:
A crawling module, configured to crawl, by the set search method, the content of a news video website in the news video website database once the time elapsed since its last crawl finished exceeds that website's crawl interval;
An identification module, configured to use a playback page recognition technique to determine whether each webpage crawled from the news video website is a video playback page, and, for each page determined to be a video playback page, to remove the noise information it contains and take the remaining information as a news video;
A detection and enhancement module, configured to perform duplicate detection on the news videos using a content-based duplicate detection technique, to enhance the quality of the news videos that pass duplicate detection using an iterative back-projection method in the video compressed domain, and then to store the news videos and their corresponding description information in the news video database.
CN2010102801754A 2010-09-09 2010-09-09 Method and device for searching news video Expired - Fee Related CN101944111B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN2010102801754A CN101944111B (en) 2010-09-09 2010-09-09 Method and device for searching news video

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN2010102801754A CN101944111B (en) 2010-09-09 2010-09-09 Method and device for searching news video

Publications (2)

Publication Number Publication Date
CN101944111A true CN101944111A (en) 2011-01-12
CN101944111B CN101944111B (en) 2012-05-23

Family

ID=43436102

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2010102801754A Expired - Fee Related CN101944111B (en) 2010-09-09 2010-09-09 Method and device for searching news video

Country Status (1)

Country Link
CN (1) CN101944111B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101065749A (en) * 2004-11-24 2007-10-31 琳达·劳逊 System and method for resource management
CN101599089A (en) * 2009-07-17 2009-12-09 中国科学技术大学 The automatic search of update information on content of video service website and extraction system and method

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Ming Zhu et al. Effective Video Content Abstraction by Similar Shots Clustering. Signal Processing, ICSP 2008, 2008-12-08, pp. 1445-1448. 1-10, 2 *
Zhu Ming (朱明) et al. PMDN Resource Search Strategy Based on Multiple Super Nodes. Computer Simulation (计算机仿真), Vol. 25, No. 8, 2008-08-31, pp. 131-135. 1-10, 2 *

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102117267A (en) * 2011-02-25 2011-07-06 汉王科技股份有限公司 Information display method, device and electronic equipment
CN103548017A (en) * 2011-12-26 2014-01-29 华为技术有限公司 Video search method and video search system
CN104216928A (en) * 2013-06-05 2014-12-17 腾讯科技(深圳)有限公司 Site information acquiring method and device
CN103455602A (en) * 2013-09-03 2013-12-18 小米科技有限责任公司 Video URL (Uniform Resource Locator) capturing method and device and terminal equipment
CN103455602B (en) * 2013-09-03 2017-03-29 小米科技有限责任公司 A kind of video URL grasping means, device and terminal device
CN103699661A (en) * 2013-12-26 2014-04-02 乐视网信息技术(北京)股份有限公司 Method and system for acquiring data of video resources
CN106528569A (en) * 2015-09-11 2017-03-22 北京国双科技有限公司 Method and device for calculating validity of site search
CN106528569B (en) * 2015-09-11 2019-09-17 北京国双科技有限公司 Calculate the method and device of search in Website availability
CN109032906A (en) * 2018-07-17 2018-12-18 郑州升达经贸管理学院 A kind of appraisal procedure and its assessment device of internet news
CN110704603A (en) * 2019-09-12 2020-01-17 武汉灯塔之光科技有限公司 Method and device for discovering current hot event through information

Also Published As

Publication number Publication date
CN101944111B (en) 2012-05-23

Similar Documents

Publication Publication Date Title
CN101944111B (en) Method and device for searching news video
CN102929928B (en) Multidimensional-similarity-based personalized news recommendation method
CN106600343B (en) Video content associated online video advertisement management method and system
US10032081B2 (en) Content-based video representation
US9706008B2 (en) Method and system for efficient matching of user profiles with audience segments
CN105022827B (en) A kind of Web news dynamic aggregation method of domain-oriented theme
CN112052387B (en) Content recommendation method, device and computer readable storage medium
CN102165464A (en) Method and system for automated annotation of persons in video content
CN104219575A (en) Related video recommending method and system
CN104462385A (en) Personalized movie similarity calculation method based on user interest model
CN104462293A (en) Search processing method and method and device for generating search result ranking model
CN105183897A (en) Method and system for ranking video retrieval
CN101685521A (en) Method for showing advertisements in webpage and system
CN103870454A (en) Method and method for recommending data
CN102880712A (en) Method and system for sequencing searched network videos
CN103593371A (en) Method and device for recommending search keywords
CN103546326A (en) Website traffic statistic method
US20170199930A1 (en) Systems Methods Devices Circuits and Associated Computer Executable Code for Taste Profiling of Internet Users
KR101541495B1 (en) Apparatus, method and computer readable recording medium for analyzing a video using the image captured from the video
CN104899306A (en) Information processing method, information display method and information display device
Liu et al. Query sensitive dynamic web video thumbnail generation
Falchi et al. Similarity caching in large-scale image retrieval
CN102542066A (en) Video clustering method, ordering method, video searching method and corresponding devices
CN102855245A (en) Image similarity determining method and image similarity determining equipment
CN103688256A (en) Method, device and system for determining video quality parameter based on comment

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
ASS Succession or assignment of patent right

Owner name: ANHUI GUANGXING COMMUNICATION TECHNOLOGY CO., LTD.

Free format text: FORMER OWNER: UNIVERSITY OF SCIENCE AND TECHNOLOGY OF CHINA

Effective date: 20130821

C41 Transfer of patent application or patent right or utility model
COR Change of bibliographic data

Free format text: CORRECT: ADDRESS; FROM: 230026 HEFEI, ANHUI PROVINCE TO: 230001 HEFEI, ANHUI PROVINCE

TR01 Transfer of patent right

Effective date of registration: 20130821

Address after: 800 C4, 12 floor, animation industry park, Wangjiang Road, Anhui, Hefei 230001, China

Patentee after: Anhui Guangxing Communication Technology Co., Ltd.

Address before: No. 96 Jinzhai Road, Hefei, Anhui 230026, China

Patentee before: University of Science and Technology of China

CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20120523

Termination date: 20200909
