WO2011042946A1 - Similar content search apparatus and program - Google Patents
- Publication number
- WO2011042946A1 (PCT/JP2009/067345)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- program
- information
- genre
- program information
- phrase
- Prior art date
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/70—Information retrieval; Database structures therefor; File system structures therefor of video data
- G06F16/78—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
- G06F16/7867—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using information manually generated, e.g. tags, keywords, comments, title and artist information, manually generated time, location and usage information, user ratings
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/38—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
Definitions
- the present invention relates to a similar content search apparatus and program.
- Television broadcasting has various broadcasting forms such as terrestrial broadcasting, BS broadcasting, CS broadcasting, cable television, and Internet broadcasting, and a large number of programs are provided to viewers.
- One of the situations in which a viewer searches for a program that the viewer wants to watch is a search for similar programs.
- A similar content search apparatus includes:
- an information acquisition unit that acquires a plurality of pieces of program information, each including content description information and genre information;
- an information storage unit that stores the plurality of pieces of program information;
- a receiving unit that receives designation of a program of interest and extracts the program information of the program of interest from the information storage unit;
- a phrase extraction unit that extracts a phrase from the description information included in the program information of the program of interest;
- a first search unit that searches the information storage unit for first program information having the phrase;
- a genre characteristic word storage unit that stores combinations of a genre and a phrase characteristic of that genre;
- a phrase similarity calculation unit that calculates the phrase similarity between the program information of the program of interest and the first program information;
- a point deduction unit that deducts the phrase similarity when a phrase stored in the genre characteristic word storage unit is included in both the program information of the program of interest and the first program information; and
- a first presentation unit that presents the first program information based on the phrase similarity after deduction by the point deduction unit.
- A similar content search program causes a computer to execute:
- a step of acquiring a plurality of pieces of program information, each including content description information and genre information;
- a step of storing the plurality of pieces of program information in an information storage unit;
- a step of receiving designation of a program of interest and extracting the program information of the program of interest from the information storage unit;
- a step of extracting a phrase from the description information included in the program information of the program of interest;
- a step of searching the information storage unit for first program information having the phrase;
- a step of calculating the phrase similarity between the program information of the program of interest and the first program information;
- a step of deducting the phrase similarity when a phrase stored in a genre characteristic word storage unit, which stores combinations of a genre and a phrase characteristic of that genre, is included in both the program information of the program of interest and the first program information;
- a step of extracting the genre information included in the program information of the program of interest;
- a step of searching the information storage unit for second program information having the genre information;
- a step of calculating the genre similarity between the program information of the program of interest and the second program information;
- a step of presenting the first program information based on the phrase similarity after deduction; and
- a step of presenting the second program information based on the genre similarity.
- similar content based on a phrase and similar content based on a genre can be searched and appropriately presented.
- FIG. 1 is a schematic configuration diagram of a similar content search device according to a first embodiment.
- FIG. 2 is a flowchart explaining the similar content search method.
- A schematic configuration diagram of a similar content search apparatus according to a second embodiment.
- A flowchart explaining a method of adding genre characteristic words.
- FIG. 1 shows a schematic configuration of a similar content search apparatus according to a first embodiment of the present invention.
- The similar content search device includes an information acquisition unit 10, an information storage unit 11, a reception unit 12, a phrase extraction unit 13, a phrase search unit (first search unit) 14, a phrase similarity calculation unit 15, a phrase similar content presentation unit (first presentation unit) 16, a genre extraction unit 17, a genre search unit (second search unit) 18, a genre similarity calculation unit 19, a genre similar content presentation unit (second presentation unit) 20, a genre feature word storage unit 21, a phrase similarity deduction unit 22, and a genre similarity addition unit 23.
- the information acquisition unit 10 acquires program information such as EPG (Electronic Program Guide) including explanation information and genre information of the video program (content) from the received broadcast wave.
- the broadcast wave is not limited to a specific broadcast form, and can take various broadcast forms including EPG information, such as terrestrial digital broadcast, BS broadcast, CS broadcast, cable television, and Internet broadcast.
- a plurality of broadcast waves may be received.
- the explanation information is information indicating the details of the contents of the video program, and the EPG information includes program title information, program summary information, and the like.
- the genre information is information indicating the name of the classification set when the video program is classified according to the contents, and is expressed by a character string expressing the contents of the classification set, a numerical code defined externally, or the like.
- Program information may include broadcast date and time information, broadcast station information, etc. in addition to explanation information and genre information.
- the information storage unit 11 stores program information about a plurality of video programs acquired by the information acquisition unit 10 in a format that can be read by the phrase extraction unit 13, the phrase search unit 14, and the genre search unit 18.
- the information storage unit 11 is, for example, a hard disk or a flash memory.
- the accepting unit 12 accepts designation of a program of interest by the user.
- the designation of the program of interest may be an active video program selection by the user, or may be designated by another action indicating that the user is paying attention to the video program. Another action is, for example, watching a video program, recording, recording reservation, or the like.
- The receiving unit 12 extracts the program information of the program of interest from the information storage unit 11 and outputs it to the phrase extraction unit 13 and the genre extraction unit 17.
- the phrase extraction unit 13 extracts a phrase indicating the content of the program of interest from the explanation information included in the program information (program of interest program information) of the program of interest received from the reception unit 12.
- phrases are extracted from the program title information and the program summary information.
- morphological analysis, semantic information extraction, or the like can be used.
- a specific type may be selected from the results of morphological analysis and semantic information extraction.
- In addition to the character strings obtained by dividing the program title information and the program summary information, the type of the source information, the position in the source character string, the type of phrase, the semantic information, the number of appearances in the source information, and the like may also be extracted.
- the phrase search unit 14 searches the program information stored in the information storage unit 11 for program information including the phrase extracted by the phrase extraction unit 13, and acquires the program information for each program.
- the phrase similarity calculation unit 15 calculates a phrase similarity between the program information of the program acquired by the phrase search unit 14 and the program information of interest. A method for calculating the phrase similarity will be described later.
- the genre characteristic word storage unit 21 stores a genre and a combination of words / phrases characteristic of the genre.
- a characteristic word / phrase means a word / phrase having a high appearance frequency in a certain genre and a low appearance frequency in another genre.
- the phrase “performance” has a high appearance frequency in the genre “music” and a low appearance frequency in genres other than “music”. Therefore, the genre feature word storage unit 21 stores the genre feature word “performance” and the corresponding genre “music” in combination.
- the genre feature word storage unit 21 is, for example, a hard disk or a flash memory.
- The phrase similarity deduction unit 22 determines whether a genre feature word stored in the genre feature word storage unit 21 is included in both the program information of a program acquired by the phrase search unit 14 and the program information of interest.
- When a common genre characteristic word is included in both, the phrase similarity deduction unit 22 further determines whether both pieces of program information include the genre corresponding to that genre characteristic word.
- If they do, the phrase similarity deduction unit 22 deducts (decreases) the phrase similarity calculated by the phrase similarity calculation unit 15.
- In other words, the phrase similarity is deducted when the program information of interest and the program information subject to phrase similarity calculation share a genre characteristic word and both have the genre corresponding to that word.
- the genre characteristic words are words that frequently appear in the corresponding genre, and even if such words match, the similarity between the contents of the two programs is not always high. Therefore, the phrase similarity deduction unit 22 deducts the similarity.
- the phrase similar content presentation unit 16 orders the programs acquired by the phrase search unit 14 based on the phrase similarity, and presents them to the user according to the order.
- the presentation method will be described later.
- the genre extraction unit 17 extracts genre information from the noticed program information received from the reception unit 12.
- the genre search unit 18 searches the program information stored in the information storage unit 11 for program information including the genre extracted by the genre extraction unit 17, and acquires the program information for each program.
- the genre similarity calculation unit 19 calculates the genre similarity between the program information of the program acquired by the genre search unit 18 and the program information of interest. A method for calculating the genre similarity will be described later.
- The genre similarity score adding unit 23 adds points to the genre similarity calculated by the genre similarity calculation unit 19 when a genre characteristic word corresponding to a genre common to the program information of interest and the program information subject to genre similarity calculation is included in both pieces of program information.
- the genre similarity score adding unit 23 adds a genre similarity score when a genre feature word is included, and makes the genre similarity higher than when a genre feature word is not included.
- the genre-similar content presentation unit 20 orders the programs retrieved by the genre search unit 18 based on the genre similarity and presents them to the user according to the order.
- the presentation method will be described later.
- FIG. 2 shows an overall operation flow from the reception of the broadcast wave by the similar content search apparatus until the similar content is ordered and presented.
- Step S101 A broadcast wave is received.
- Step S102 The information acquisition unit 10 acquires program information (EPG information) of all programs from the broadcast wave.
- the acquired program information is stored in the information storage unit 11.
- An example of the acquired program information is shown in FIG. 3.
- Information of each program includes a program identification ID, a broadcast station name, a program title, up to three types of genre codes, program summary information, and a start date / time and end date / time. These pieces of information are stored in a format that can be used by the reception unit 12, the phrase search unit 14, the genre search unit 18, and the like.
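As an illustration only, the record layout described above could be modelled as follows. This is a minimal sketch: the class name, field names, and sample values are assumptions made for the example and do not come from the patent text.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Tuple

MAX_GENRES = 3  # the text allows up to three genre codes per program


@dataclass(frozen=True)
class ProgramInfo:
    """One stored program record (field names are illustrative assumptions)."""
    program_id: str
    station: str
    title: str
    genres: Tuple[str, ...]  # genre codes, at most MAX_GENRES
    summary: str
    start: datetime
    end: datetime

    def __post_init__(self):
        # Reject records that contradict the layout described in the text.
        if len(self.genres) > MAX_GENRES:
            raise ValueError("a program carries at most three genre codes")
        if self.end <= self.start:
            raise ValueError("end must be later than start")


# Hypothetical sample record.
example = ProgramInfo(
    program_id="P001",
    station="Station A",
    title="Beethoven Symphony No. 7",
    genres=("music/classical",),
    summary="A performance of Beethoven's Symphony No. 7.",
    start=datetime(2009, 10, 5, 20, 0),
    end=datetime(2009, 10, 5, 21, 0),
)
```

Keeping the record immutable is a design choice of this sketch; it simply makes stored program information safe to share between the search and extraction units.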
- Step S103 The accepting unit 12 accepts designation of a program of interest from the user.
- An example of the screen for designating the program of interest is shown in FIG.
- A tabular program list is displayed on the screen W. Rows indicate the time axis, columns indicate broadcasting stations, and each cell of the table corresponds to one program.
- The title of the program is shown in each cell, but program summary information and the like can also be displayed.
- The user operates the pointer P with a pointing device such as a remote controller and designates the program of interest by placing the pointer P on it.
- Step S104 The receiving unit 12 acquires the program information of the program of interest (program information of interest) from the information storage unit 11 and outputs it to the phrase extraction unit 13 and the genre extraction unit 17.
- Step S105 The phrase extraction unit 13 extracts phrase information from the explanation information (program title information and program summary information) of the program information of interest.
- The phrase extraction unit 13 performs morphological analysis and semantic analysis on the character strings of the program title information and the program summary information, and extracts words having specific semantic attributes and parts of speech from the analysis result as phrase information.
- For the semantic analysis, a well-known proper expression (named entity) extraction method can be used, such as the one described in "Considerations on the relationship between question answering, Japanese proper expression extraction and proper expression system", Yumi Ichimura, et al., Information Processing Society of Japan, NL-161-3, 2004.
- Fig. 5 shows an example of the extracted phrase information.
- Each piece of phrase information is extracted as a set consisting of the ID of the video program in which it appeared, its semantic information and part-of-speech information, the type of explanation information from which it was extracted, and the number of appearances.
- the phrase information as shown in FIG. 5 may be temporarily stored in the memory or may be written back to the information storage unit 11.
- Step S106 The genre extraction unit 17 extracts genre information from the program information of interest.
- Fig. 6 shows an example of genre information. It is assumed that the genre information is represented by a numerical code and a word representing meaning, and the genre system is defined in a two-layered structure like a large genre / small genre.
- Step S107 The phrase search unit 14 searches the information storage unit 11 for programs (other than the program of interest) whose program title information or program summary information includes at least one of the phrases extracted in step S105, and acquires the program information of each such program.
- Step S108 The phrase similarity calculation unit 15 calculates the phrase similarity between each program as a search result in Step S107 and the program of interest. A method for calculating the phrase similarity will be described later.
- Step S109 For each combination of a search result program from step S107 and the program of interest, it is determined whether a phrase common to both corresponds to a genre characteristic word and whether both the search result program and the program of interest have the genre corresponding to that genre characteristic word. If both conditions hold, the process proceeds to step S110. Otherwise, the process proceeds to step S111.
- FIG. 7 shows an example of the format of genre feature word information stored in the genre feature word storage unit 21.
- Each genre characteristic word information is defined by a character string of characteristic words, a corresponding genre, and a set of weights at the time of addition and deduction.
- As for the weights, instead of the format shown in FIG. 7, the added and deducted points may, for simplicity, be set to the same value or be made proportional to each other. A plurality of corresponding genres may also be set for each genre characteristic word.
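The genre feature word format described above (a word, its corresponding genres, and separate weights for addition and deduction) might be held in memory roughly as follows. This is a hedged sketch: the class name, the lookup helpers, and all entries and weight values are hypothetical, since the actual contents of FIG. 7 are not reproduced in the text.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List


@dataclass(frozen=True)
class GenreFeatureWord:
    word: str
    genres: FrozenSet[str]  # one word may correspond to several genres
    add_weight: float       # used when raising the genre similarity
    deduct_weight: float    # used when lowering the phrase similarity


# Hypothetical entries; the real values of FIG. 7 are not given in the text.
FEATURE_WORDS: List[GenreFeatureWord] = [
    GenreFeatureWord("performance", frozenset({"music"}), 2.0, 2.0),
    GenreFeatureWord("classic", frozenset({"music"}), 1.0, 1.0),
    GenreFeatureWord("gardening", frozenset({"hobby"}), 3.0, 1.0),
]

# Index by word, for the check "is this common phrase a feature word?".
BY_WORD: Dict[str, GenreFeatureWord] = {w.word: w for w in FEATURE_WORDS}


def words_for_genre(genre: str) -> List[str]:
    """Return the feature words registered for a genre, in list order."""
    return [w.word for w in FEATURE_WORDS if genre in w.genres]
```

The two access paths mirror the two uses in the text: lookup by word serves the phrase similarity deduction, and lookup by genre serves the genre similarity addition.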
- Step S110 The phrase similarity deduction unit 22 deducts the phrase similarity of each search result program that shares a genre characteristic word with the program of interest and also shares the corresponding genre.
- The value to be deducted is the deduction weight shown in FIG. 7.
- Step S111 A predetermined number of programs are listed (selected) as programs for presentation in descending order of phrase similarity.
- Step S112 The genre search unit 18 searches the information storage unit 11 for programs (other than the program of interest) whose program information includes the genre information extracted in step S106, and acquires the program information of each such program. If a plurality of pieces of genre information are extracted in step S106, programs including at least one of them are searched for.
- Step S113 The search result program in Step S112 and the program listed in Step S111 are compared, and the duplicate program is deleted from the search result.
- Step S114 The genre similarity calculation unit 19 calculates the genre similarity between each program as a search result and the program of interest. A method for calculating the genre similarity will be described later.
- Step S115 In the combination of each program of the search result and the target program, it is determined whether or not the genre characteristic word corresponding to the common genre is included in the program information and the target program information of the program as the search result. If it is included, the process proceeds to step S116. If it is not included, the process proceeds to step S117.
- Step S116 The genre similarity score is added by the genre similarity score adding unit 23 for the search result program including the genre feature words.
- The value to be added is the addition weight shown in FIG. 7.
- Step S117 A predetermined number of programs are listed (selected) as programs for presentation in descending order of genre similarity.
- Step S118 The programs listed in steps S111 and S117 are presented to the user.
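The selection logic of steps S111, S113, and S117 can be sketched as below. This is a simplified illustration under stated assumptions: the function name and the dictionary inputs are hypothetical, the genre similarities are taken as already computed (in the flow above, duplicates are removed before the genre similarity is calculated), and breaking ties by program ID is an assumption that the text does not specify.

```python
from typing import Dict, List, Tuple


def select_for_presentation(
    phrase_scores: Dict[str, float],
    genre_scores: Dict[str, float],
    limit: int,
) -> Tuple[List[str], List[str]]:
    """Pick the programs to present from both similarity rankings."""

    def top(scores: Dict[str, float]) -> List[str]:
        # Descending similarity; ties broken by program ID (an assumption).
        ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
        return [pid for pid, _ in ranked[:limit]]

    phrase_list = top(phrase_scores)  # step S111
    # Step S113: drop genre search results already listed as phrase-similar.
    remaining = {pid: s for pid, s in genre_scores.items()
                 if pid not in phrase_list}
    genre_list = top(remaining)  # step S117
    return phrase_list, genre_list


# Hypothetical scores for five programs A to E.
phrase_list, genre_list = select_for_presentation(
    {"A": 5.0, "B": 3.0, "C": 1.0},
    {"A": 4.0, "D": 2.0, "E": 6.0},
    limit=2,
)
```

Here program A appears in both search results, so it is kept only in the phrase-similar list and the genre-similar list is filled from the remaining candidates.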
- The method for calculating the phrase similarity in step S108 will be described with reference to the flowchart shown in FIG.
- Step S201 The variables used in this flow are initialized.
- The number of search result programs is set to M, the indexes i and j are set to 1, and the phrase similarity Xi (1 ≤ i ≤ M) is set to 0.
- Xi indicates the phrase similarity between the i-th search result program and the program of interest.
- Step S202 The index i is compared with the search result program number M. If i> M, the phrase similarity is calculated for all the search result programs, and the process is terminated. Otherwise, the process proceeds to step S203.
- Step S203 The phrase information that appears in both the program information of the i-th search result program and the noticed program information is acquired, and the number thereof is set to N.
- Step S204 The index j is compared with the number N of phrase information. If j > N, the addition to the phrase similarity is completed for all the phrase information, and the process proceeds to step S207. Otherwise, the process proceeds to step S205.
- Step S205 The semantic attribute or the part of speech of the j-th phrase information is referred to, and the corresponding weight value is added to the phrase similarity Xi.
- the weight corresponding to the semantic attribute or part of speech is defined as shown in FIG. 9 and held in a storage unit (not shown).
- Step S206 The value of j is incremented by one.
- Step S207 The value of i is incremented by one.
- The phrase similarity is calculated by this method.
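The loop of steps S201 to S207 amounts to summing, for each search result program, the attribute weights of the phrases it shares with the program of interest. A minimal sketch follows; the function names, the attribute labels, and the weight values are assumptions for illustration, since the actual weights of FIG. 9 are not given in the text.

```python
from typing import Dict, List

# Hypothetical weights per semantic attribute or part of speech (cf. FIG. 9).
ATTRIBUTE_WEIGHTS: Dict[str, float] = {
    "person": 3.0,
    "work_title": 3.0,
    "noun": 1.0,
}


def phrase_similarity(target: Dict[str, str], candidate: Dict[str, str]) -> float:
    """Sum the attribute weights of phrases shared by both programs.

    Each argument maps a phrase to its semantic attribute or part of speech.
    """
    x = 0.0
    for phrase in sorted(target.keys() & candidate.keys()):  # step S203
        x += ATTRIBUTE_WEIGHTS.get(target[phrase], 0.0)      # step S205
    return x


def phrase_similarities(
    target: Dict[str, str], candidates: List[Dict[str, str]]
) -> List[float]:
    """Compute Xi for every search result program (steps S202 to S207)."""
    return [phrase_similarity(target, c) for c in candidates]


# Hypothetical phrase information for a program of interest and three results.
target = {"Beethoven": "person", "performance": "noun", "classic": "noun"}
results = phrase_similarities(target, [
    {"Beethoven": "person", "performance": "noun"},
    {"classic": "noun"},
    {"soccer": "noun"},
])
```

In this sketch a shared person name contributes more than a shared common noun, which matches the idea of weighting phrases by their semantic attribute.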
- Step S301 The variables used in this flow are initialized.
- the number of search result programs is set to M, and the values of indexes i and j are set to 1.
- As the phrase similarity Xi, the value calculated in step S108 is used.
- Step S302 The index i is compared with the number M of search result programs. If i > M, the determination and the recalculation of the phrase similarity have been completed for all the search result programs, so the process is terminated. Otherwise, the process proceeds to step S303.
- Step S303 The phrase information appearing in both the program information of the i-th search result program and the noticed program information is acquired, and the number thereof is set to N.
- Step S304 The index j is compared with the word information number N. In the case of j> N, the determination as to whether or not all the phrase information corresponds to the genre characteristic word and the deduction of the phrase similarity have been completed, so the process proceeds to step S305. Otherwise, the process proceeds to step S306.
- Step S305 The value of i is incremented by one.
- Step S306 With reference to the genre characteristic word list as shown in FIG. 7, it is searched whether the j-th word information is included in the list. If it is not included in the list, the process proceeds to step S307, and if it is included, the process proceeds to step S308.
- Step S307 The value of j is incremented by one.
- Step S308 Genre information corresponding to the same genre characteristic word as the j-th phrase information is acquired.
- Step S309 It is determined whether or not both the i-th search result program and the program of interest contain the genre information acquired in Step S308. If included, the process proceeds to step S310, and if not included, the process proceeds to step S307.
- Step S310 The weight value corresponding to the genre feature word is obtained from the genre feature word list and subtracted from the phrase similarity Xi.
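Steps S301 to S310 can be condensed into a single function that lowers an already computed phrase similarity. The sketch below is illustrative only: the function name, the flat word-to-genre mapping (one genre per word), and the weight values are assumptions.

```python
from typing import Dict, Set, Tuple

# Hypothetical list: feature word -> (corresponding genre, deduction weight).
GENRE_FEATURE_WORDS: Dict[str, Tuple[str, float]] = {
    "performance": ("music", 2.0),
    "classic": ("music", 1.0),
}


def deduct_phrase_similarity(
    x: float,
    common_phrases: Set[str],
    target_genres: Set[str],
    candidate_genres: Set[str],
) -> float:
    """Lower x for each shared phrase that is a feature word of a shared genre."""
    for phrase in common_phrases:
        entry = GENRE_FEATURE_WORDS.get(phrase)  # step S306
        if entry is None:
            continue  # not a genre feature word, nothing to deduct
        genre, weight = entry  # step S308
        # Step S309: both programs must carry the corresponding genre.
        if genre in target_genres and genre in candidate_genres:
            x -= weight  # step S310
    return x


shared = {"Beethoven", "performance", "classic"}
same_genre = deduct_phrase_similarity(5.0, shared, {"music"}, {"music"})
other_genre = deduct_phrase_similarity(5.0, shared, {"music"}, {"documentary"})
```

The second call shows the genre condition at work: the same shared words cause no deduction when the search result program does not belong to the corresponding genre.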
- The genre similarity calculation method in step S114 will be described with reference to the flowchart shown in FIG.
- Step S401 The variables used in this flow are initialized.
- The number of search result programs is set to M, the values of indexes i, j, and k are set to 1, and the genre similarity Yi (1 ≤ i ≤ M) is set to 0.
- Yi indicates the genre similarity between the i-th search result program and the program of interest.
- Step S402 The index i is compared with the number M of search result programs. If i> M, the genre similarity is calculated for all the search result programs, and the process is terminated. Otherwise, the process proceeds to step S403.
- Step S403 The genre information of the i-th search result program is acquired, and the number thereof is set to N1.
- Step S404 The index j is compared with the genre information number N1. If j> N1, the addition to the genre similarity for all combinations of genre information has been completed, and the process proceeds to step S405. Otherwise, the process proceeds to step S406.
- Step S405 The value of i is incremented by one.
- Step S406 The genre information of the program of interest is acquired, and the number is set to N2.
- Step S407 The index k is compared with the genre information number N2. If k> N2, the process proceeds to step S408; otherwise, the process proceeds to step S409.
- Step S408 The value of j is incremented by one.
- Step S409 The j-th genre information of the i-th search result program is compared with the k-th genre information of the program of interest. If the two genre information matches both the large genre and the small genre as shown in FIG. 6, the process proceeds to step S411, and otherwise the process proceeds to step S410.
- Step S410 If the two pieces of genre information match only in the large genre, the process proceeds to step S412. If neither the large genre nor the small genre matches, the process proceeds to step S413.
- Step S411 The weight W1 is added to the genre similarity Yi.
- Step S412 The weight W2 is added to the genre similarity Yi.
- The weight W2 is a value smaller than the weight W1.
- Step S413 The value of k is increased by 1.
- Genre similarity can be calculated by such a method.
- In the method above, the degree of coincidence of genre information is evaluated for all N1 × N2 combinations of the N1 pieces of genre information of a given search result program and the N2 pieces of genre information of the program of interest, and a weight (W1 or W2) is added to the genre similarity for each match. Alternatively, genre information that has been compared once may be excluded from further comparison.
- In that case, the genre similarity Yi can take a plurality of values depending on how the genre information to be compared is paired, and the maximum value among them may be used as the genre similarity Yi.
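The all-pairs variant of the genre similarity calculation (steps S401 to S413) could look like the sketch below. The function name, the tuple representation of a genre, and the concrete values of W1 and W2 are assumptions; the text only requires that W2 be smaller than W1. The alternative in which each piece of genre information is used once and the maximum is taken is not shown.

```python
from typing import List, Tuple

Genre = Tuple[str, str]  # (large genre, small genre)

W1 = 2.0  # large and small genre both match (hypothetical value)
W2 = 1.0  # only the large genre matches; must be smaller than W1


def genre_similarity(target: List[Genre], candidate: List[Genre]) -> float:
    """Add W1 or W2 for every pairing of candidate and target genres."""
    y = 0.0
    for cand_large, cand_small in candidate:   # index j, N1 items
        for tgt_large, tgt_small in target:    # index k, N2 items
            if cand_large == tgt_large and cand_small == tgt_small:
                y += W1                        # step S411
            elif cand_large == tgt_large:
                y += W2                        # step S412
    return y


# Hypothetical genre information.
target = [("music", "classical"), ("hobby", "gardening")]
close = genre_similarity(target, [("music", "classical"), ("music", "jazz")])
unrelated = genre_similarity(target, [("sports", "soccer")])
```

In the first call, the exact match contributes W1 and the large-genre-only match contributes W2, giving 3.0 in total.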
- Step S501 Variables used in this flow are initialized.
- the number of search result programs is set to M, and the values of indexes i and j are set to 1.
- the value calculated in step S114 is used as the genre similarity Yi.
- Step S502 The index i is compared with the number M of search result programs. If i > M, the determination and the recalculation of the genre similarity have been completed for all the search result programs, so the process ends. Otherwise, the process proceeds to step S503.
- Step S503 The genre information common to the i-th search result program and the program of interest is acquired, and the number is set to N.
- Step S504 The index j is compared with the genre information number N. If j> N, the process proceeds to step S505. Otherwise, the process proceeds to step S506.
- Step S505 The value of i is incremented by one.
- Step S506 A genre feature word corresponding to the j-th genre information is acquired by referring to the genre feature word list as shown in FIG. 7.
- Step S507 It is determined whether or not the genre characteristic word acquired in Step S506 is included in the program information of the program of interest and the search result program. If it is included, the process proceeds to step S508. If it is not included, the process proceeds to step S509.
- In step S507, it may be further determined whether genre feature words having similar meanings are included in the program information of interest and the program information of the search result program. For example, if the genre characteristic word "gardening" is included in the program information of interest and a synonymous genre characteristic word such as "horticulture" is included in the program information of the search result program, the process proceeds to step S508.
- Step S508 A weight corresponding to the genre characteristic word is added to the genre similarity Yi.
- Step S509 The value of j is incremented by one.
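Steps S501 to S509 can be condensed in the same way as the deduction flow. The following sketch is an illustration under assumptions: the function name, the genre-to-words mapping, and the weights are hypothetical, several feature words per genre are each checked in turn, and the optional synonym matching mentioned above is not implemented.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical list: genre -> [(feature word, addition weight)].
FEATURE_WORDS_BY_GENRE: Dict[str, List[Tuple[str, float]]] = {
    "hobby": [("gardening", 3.0), ("horticulture", 3.0)],
}


def add_genre_similarity(
    y: float,
    common_genres: Set[str],
    target_words: Set[str],
    candidate_words: Set[str],
) -> float:
    """Raise y for each feature word of a shared genre found in both programs."""
    for genre in common_genres:                                     # step S503
        for word, weight in FEATURE_WORDS_BY_GENRE.get(genre, []):  # step S506
            if word in target_words and word in candidate_words:    # step S507
                y += weight                                         # step S508
    return y


boosted = add_genre_similarity(
    2.0, {"hobby"}, {"gardening", "rose"}, {"gardening", "tulip"}
)
unchanged = add_genre_similarity(2.0, {"hobby"}, {"gardening"}, {"pets"})
```

The second call corresponds to a gardening program compared with a pet program in the same broad genre: the genre matches, but no feature word is shared, so the similarity is not raised.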
- As a result of this processing, phrase-similar programs as shown in FIG. 13(b) and genre-similar programs as shown in FIG. 13(c) are listed for the program of interest shown in FIG. 13(a).
- In FIGS. 13(b) and 13(c), the programs are listed in descending order of similarity from the top. For each similar program, the phrases and genres it has in common with the program of interest are also shown.
- the listed programs are displayed on the screen in a format as shown in FIG. 14 and presented to the user.
- the screen is divided into four areas of genre, keyword, person, and title centering on the program of interest, and programs with high similarity are arranged near the program of interest (screen center) in each area.
- Genre-similar programs are displayed in the genre area (left side of the screen), and phrase-similar programs are displayed in the keyword area (lower side of the screen).
- the user can select a similar program by operating the pointer P from this screen, and can view and record.
- FIG. 15(a) shows the phrase-similar programs that would be selected for the program of interest shown in FIG. 13(a) by a similar content search device lacking the genre characteristic word storage unit 21 and the phrase similarity deduction unit 22. FIG. 15(b) is the same as FIG. 13(b) and shows the phrase-similar programs selected by the similar content search apparatus according to the present embodiment.
- the phrase similar program shown in FIG. 15A is selected mainly because words such as “classic” and “performance” are matched, but these words are frequently used in video programs related to classical music. Therefore, it cannot be said that it is characteristic for the contents of each program.
- this embodiment reduces the word similarity of programs including genre characteristic words such as “classic” and “performance”, and more characteristic “Beethoven”, “Taro Tanaka”, “ Programs including phrases such as “Symphony No. 7” can be presented at the top.
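The effect of the deduction can be illustrated with a small sketch. It assumes a simple scoring model that the source does not specify: each shared phrase contributes a fixed `weight`, and a fixed `penalty` is subtracted when the shared phrase is registered as a genre characteristic word for a genre common to both programs. The function name, the genre label "music/classical", and the numeric values are hypothetical.

```python
def phrase_similarity(target, candidate, feature_words, weight=1.0, penalty=0.5):
    """Score shared phrases, deducting for those characteristic of a shared genre."""
    shared_genres = set(target["genres"]) & set(candidate["genres"])
    score = 0.0
    for phrase in set(target["phrases"]) & set(candidate["phrases"]):
        score += weight
        # Deduct when the phrase is a genre characteristic word of a common genre.
        if any(phrase in feature_words.get(g, set()) for g in shared_genres):
            score -= penalty
    return score


feature_words = {"music/classical": {"classic", "performance"}}
target = {
    "genres": ["music/classical"],
    "phrases": ["classic", "performance", "Beethoven", "Symphony No. 7"],
}
generic = {"genres": ["music/classical"], "phrases": ["classic", "performance"]}
specific = {"genres": ["music/classical"], "phrases": ["Beethoven", "Symphony No. 7"]}

generic_score = phrase_similarity(target, generic, feature_words)
specific_score = phrase_similarity(target, specific, feature_words)
```

Without the deduction both candidates would score 2.0; with it, the program that matches on “Beethoven” and “Symphony No. 7” ranks above the one that matches only on generic classical-music vocabulary.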
- FIG. 16 shows an example of genre-similar programs listed for a program of interest whose theme is gardening and horticulture.
- FIG. 16(a) shows the genre-similar programs that would presumably be selected by a similar content search device lacking the genre feature word storage unit 21 and the genre similarity score adding unit 23, and FIG. 16(b) shows the genre-similar programs selected by the similar content search device according to the present embodiment.
- FIG. 16(a) includes programs about pets and magic tricks rather than gardening. This is because the genre definition groups these topics together as “hobbies / culture / horticulture / pets / magic tricks”, so all of these programs are classified into the same genre.
- A genre actually set by a broadcasting station may thus combine several fine-grained genres with different content into one. Because the genre is the same, a gardening program and a pet program cannot be distinguished by genre similarity alone, and programs such as those in FIG. 16(a) may be selected.
- FIG. 16(b) shows gardening and horticulture programs. This is because words such as “horticultural” are registered as genre characteristic words, so programs containing them receive added genre similarity and are displayed higher.
- In this way, the phrase similarity is deducted when a phrase common to the search result program and the program of interest is a genre characteristic word with a high appearance frequency in the corresponding genre, and the genre similarity is increased when a genre characteristic word corresponding to a genre common to the search result program and the program of interest is included in the program information. As a result, similar content based on phrases and similar content based on genres can both be presented appropriately.
- The phrase extraction unit 13 may be connected to the information acquisition unit 10 and the information storage unit 11.
- In that case, the phrase extraction unit 13 can use the EPG information acquired by the information acquisition unit 10 to extract phrases from the description information of all video programs, and can store the extracted phrases in the information storage unit 11 in combination with the EPG information.
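Extracting phrases once at acquisition time and storing them alongside each EPG entry could look like the sketch below. The regular-expression tokenizer is only a stand-in for whatever phrase extraction the device actually uses, and the function names, field names, and sample entry are hypothetical.

```python
import re


def extract_phrases(description):
    """Stand-in extractor: unique lowercase alphanumeric tokens of 3+ characters."""
    return sorted(set(re.findall(r"[a-z0-9]{3,}", description.lower())))


def store_with_phrases(epg_entries):
    """Return a store mapping program id to the EPG entry plus its extracted phrases."""
    store = {}
    for entry in epg_entries:
        store[entry["id"]] = {**entry, "phrases": extract_phrases(entry["description"])}
    return store


store = store_with_phrases([
    {"id": "p1", "genre": "music", "description": "Beethoven Symphony No. 7 performance"},
])
```

Storing the phrases up front means a later search only has to compare stored phrase lists rather than re-analyze every description for each query.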
- FIG. 17 shows a schematic configuration of a similar content search apparatus according to a second embodiment of the present invention.
- The similar content search apparatus according to the present embodiment has a configuration in which a genre feature word adding unit 24 is added to the similar content search apparatus according to the first embodiment shown in FIG.
- The genre feature word adding unit 24 automatically acquires new genre feature words from the program information of the program of interest and from the program information retrieved by the phrase search unit 14 and the genre search unit 16, and adds them to the genre feature word storage unit 21.
- A new genre characteristic word is a word that appears frequently in program information of a certain genre and rarely in program information of other genres.
- Step S601 The variables used in this flow are initialized: the number of search results from the phrase search unit 14 is M, the number of phrases extracted from the program of interest is Nk, the number of genres is Ng, and the indexes i and j are both set to 1.
- Step S602 The index i is compared with the number of genres Ng. If i > Ng, the genre feature word addition process ends. Otherwise, the process proceeds to step S603.
- Step S603 All programs that include the i-th genre are selected from the search result programs, and their number is set to C1.
- Step S604 The index j is compared with the number of extracted phrases Nk. If j > Nk, the process proceeds to step S605. Otherwise, the process proceeds to step S606.
- Step S605 The value of i is incremented by one.
- Step S606 Among the programs selected in step S603, the programs that include the j-th phrase of the program of interest are selected, and their number is set to C2.
- Step S607 It is determined whether or not the j-th phrase is a genre characteristic word. Specifically, using two threshold values T1 and T2, it is determined whether T1 ≤ C2 / C1 and T2 > C2 / M are both satisfied. If so, the process proceeds to step S608; otherwise, the process proceeds to step S609.
- The first condition (T1 ≤ C2 / C1) indicates that the j-th phrase appears frequently in program information of the same genre, and the second condition (T2 > C2 / M) indicates that it does not appear much in program information of other genres.
- It may also be determined whether the j-th phrase includes a character string indicating the i-th genre or a synonym thereof. Since this condition is independent of the above conditions, the two determinations may be performed in parallel.
- Step S608 The j-th phrase and the i-th genre are paired and added to the genre characteristic word storage unit 21 as a genre characteristic word.
- The storage format is the same as in FIG.
- Step S609 The value of j is incremented by one.
- New genre characteristic words can be added by such a method.
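The flow of steps S601 to S609 can be sketched as two nested loops. This is an illustrative reading of the steps, not the patented implementation: the function name, the data layout, and the threshold values are hypothetical, j is assumed to restart for each genre, and a guard against C1 = 0 has been added that the description does not mention. The optional check on whether a phrase contains the genre name is omitted.

```python
def find_new_genre_feature_words(target_genres, target_phrases, results, t1, t2):
    """Return (phrase, genre) pairs satisfying T1 <= C2/C1 and T2 > C2/M."""
    m = len(results)  # M: number of phrase-search results (S601)
    new_pairs = []
    for genre in target_genres:  # loop over i (S602, S605)
        same_genre = [r for r in results if genre in r["genres"]]  # S603
        c1 = len(same_genre)
        if c1 == 0:
            continue  # guard not in the source: avoids division by zero
        for phrase in target_phrases:  # loop over j (S604, S609)
            c2 = sum(1 for r in same_genre if phrase in r["phrases"])  # S606
            if t1 <= c2 / c1 and t2 > c2 / m:  # S607
                new_pairs.append((phrase, genre))  # S608
    return new_pairs


# Hypothetical search results: 3 music programs and 7 drama programs (M = 10).
results = [
    {"genres": ["music"], "phrases": ["classic", "Beethoven"]},
    {"genres": ["music"], "phrases": ["classic"]},
    {"genres": ["music"], "phrases": ["classic"]},
] + [{"genres": ["drama"], "phrases": ["Beethoven"]} for _ in range(7)]

pairs = find_new_genre_feature_words(
    ["music"], ["classic", "Beethoven"], results, t1=0.8, t2=0.5
)
```

For “classic”, C1 = 3 and C2 = 3, so C2 / C1 = 1.0 and C2 / M = 0.3, and both conditions hold. For “Beethoven”, C2 = 1 gives C2 / C1 of about 0.33, which is below T1, so it is not registered.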
- Because genre characteristic words can be newly added in this way, the device can respond flexibly to words that were not used before but have recently come to be used for a specific genre, and similar content based on phrases and similar content based on genres can be presented more appropriately.
- At least part of the similar content search device described in the above embodiments may be configured by hardware or by software.
- A program for realizing at least part of the functions of the similar content search apparatus may be stored in a recording medium such as a flexible disk or a CD-ROM, and read and executed by a computer.
- The recording medium is not limited to a removable medium such as a magnetic disk or an optical disk, but may be a fixed recording medium such as a hard disk device or a memory.
- A program that realizes at least part of the functions of the similar content search apparatus may be distributed via a communication line (including wireless communication) such as the Internet. Further, the program may be encrypted, modulated, or compressed, and then distributed via a wired or wireless line such as the Internet, or stored in a recording medium for distribution.
- The present invention is not limited to the above-described embodiments as they are, and can be embodied at the implementation stage by modifying the constituent elements without departing from the scope of the invention.
- Various inventions can be formed by appropriately combining a plurality of the constituent elements disclosed in the embodiments. For example, some constituent elements may be deleted from all of the constituent elements shown in an embodiment.
- Constituent elements from different embodiments may also be combined as appropriate.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Library & Information Science (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Multimedia (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
- Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)
Abstract
Description
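11 Information storage unit
12 Reception unit
13 Phrase extraction unit
14 Phrase search unit
15 Phrase similarity calculation unit
16 Phrase-similar content presentation unit
17 Genre extraction unit
18 Genre search unit
19 Genre similarity calculation unit
20 Genre-similar content presentation unit
21 Genre characteristic word storage unit
22 Phrase similarity deduction unit
23 Genre similarity score adding unit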
Claims (5)
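- A similar content search device comprising:
an information acquisition unit that acquires a plurality of pieces of program information each including description information and genre information of content;
an information storage unit that stores the plurality of pieces of program information;
a reception unit that receives designation of a program of interest;
a phrase extraction unit that acquires the program information of the program of interest from the information storage unit and extracts a phrase from the description information included in the program information of the program of interest;
a first search unit that searches the information storage unit for first program information having the phrase;
a genre characteristic word storage unit that stores combinations of a genre and a phrase characteristic of that genre;
a phrase similarity calculation unit that calculates a phrase similarity between the program information of the program of interest and the first program information;
a deduction unit that deducts from the phrase similarity when a phrase stored in the genre characteristic word storage unit is included in the program information of the program of interest and in the first program information, and the genre combined with that phrase is included in the program information of the program of interest and in the first program information;
a first presentation unit that presents the first program information based on the phrase similarity deducted by the deduction unit;
a genre extraction unit that extracts genre information included in the program information of the program of interest;
a second search unit that searches the information storage unit for second program information having the genre information;
a genre similarity calculation unit that calculates a genre similarity between the program information of the program of interest and the second program information; and
a second presentation unit that presents the second program information based on the genre similarity.
- The similar content search device according to claim 1, further comprising a score adding unit that adds to the genre similarity when a genre stored in the genre characteristic word storage unit is included in the program information of the program of interest and in the second program information, and the phrase combined with that genre is included in the program information of the program of interest and in the second program information.
- The similar content search device according to claim 2, wherein
the phrase extraction unit extracts first to M-th phrases (M being an integer of 2 or more),
the first search unit searches for first program information having at least one of the first to M-th phrases, and
the device further comprises an adding unit that selects, from the search results of the first search unit, program information having the same genre information as the program information of the program of interest, and adds the k-th phrase (k being an integer satisfying 1 ≤ k ≤ M) to the genre characteristic word storage unit in combination with the genre information when the proportion of program information having the k-th phrase among the selected program information is equal to or greater than a first predetermined value and the proportion of program information having the k-th phrase among the first program information is less than a second predetermined value.
- The similar content search device according to claim 3, wherein the adding unit adds a phrase that partly includes a character string indicating the genre information of the program of interest, or a synonym of that character string, to the genre characteristic word storage unit in combination with the genre information.
- A similar content search program that causes a computer to execute:
a step of acquiring a plurality of pieces of program information each including description information and genre information of content;
a step of storing the plurality of pieces of program information in an information storage unit;
a step of receiving designation of a program of interest;
a step of acquiring the program information of the program of interest from the information storage unit;
a step of extracting a phrase from the description information included in the program information of the program of interest;
a step of searching the information storage unit for first program information having the phrase;
a step of calculating a phrase similarity between the program information of the program of interest and the first program information;
a step of deducting from the phrase similarity when a phrase stored in a genre characteristic word storage unit, which stores combinations of a genre and a phrase characteristic of that genre, is included in the program information of the program of interest and in the first program information, and the genre combined with that phrase is included in the program information of the program of interest and in the first program information;
a step of extracting genre information included in the program information of the program of interest;
a step of searching the information storage unit for second program information having the genre information;
a step of calculating a genre similarity between the program information of the program of interest and the second program information;
a step of presenting the first program information based on the deducted phrase similarity; and
a step of presenting the second program information based on the genre similarity.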
Priority Applications (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2011535223A JP5415550B2 (ja) | 2009-10-05 | 2009-10-05 | Similar content search device and program |
CN200980161698.2A CN102549569B (zh) | 2009-10-05 | 2009-10-05 | Similar content search device and program |
PCT/JP2009/067345 WO2011042946A1 (ja) | 2009-10-05 | 2009-10-05 | Similar content search device and program |
US13/423,002 US8904437B2 (en) | 2009-10-05 | 2012-03-16 | Similar content search device and computer-readable storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/JP2009/067345 WO2011042946A1 (ja) | 2009-10-05 | 2009-10-05 | Similar content search device and program |
Related Child Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US13/423,002 Continuation US8904437B2 (en) | 2009-10-05 | 2012-03-16 | Similar content search device and computer-readable storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2011042946A1 (ja) | 2011-04-14 |
Family
ID=43856447
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/JP2009/067345 WO2011042946A1 (ja) | Similar content search device and program | 2009-10-05 | 2009-10-05 |
Country Status (4)
Country | Link |
---|---|
US (1) | US8904437B2 (ja) |
JP (1) | JP5415550B2 (ja) |
CN (1) | CN102549569B (ja) |
WO (1) | WO2011042946A1 (ja) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR20150104347A (ko) * | 2014-03-05 | 2015-09-15 | Samsung Electronics Co., Ltd. | Display apparatus and method for controlling display apparatus |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2014078132A (ja) * | 2012-10-10 | 2014-05-01 | Toshiba Corp | Machine translation device, method, and program |
JP2014085780A (ja) * | 2012-10-23 | 2014-05-12 | Samsung Electronics Co Ltd | Program recommendation device and program recommendation program |
Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPH0778182A (ja) * | 1993-06-18 | 1995-03-20 | Hitachi Ltd | Keyword assignment system |
JPH11259515A (ja) * | 1998-03-12 | 1999-09-24 | Toshiba Corp | Similar document search device, similar document search method, and recording medium storing a program for similar document search |
JP2004220513A (ja) * | 2003-01-17 | 2004-08-05 | Canon Inc | Information retrieval device |
JP2006338315A (ja) * | 2005-06-01 | 2006-12-14 | Alpine Electronics Inc | Data selection system |
JP2007006095A (ja) * | 2005-06-23 | 2007-01-11 | Matsushita Electric Ind Co Ltd | Content playback device, content playback method, recording medium storing a content playback program, and integrated circuit used in a content playback device |
JP2007079948A (ja) * | 2005-09-14 | 2007-03-29 | Nec Corp | Technical term extraction device, technical term extraction method, and technical term extraction program |
JP2008191748A (ja) * | 2007-02-01 | 2008-08-21 | Oki Electric Ind Co Ltd | Inter-user communication method, inter-user communication program, and inter-user communication device |
JP2008306458A (ja) * | 2007-06-07 | 2008-12-18 | Sony Corp | Information processing device and method, and program |
JP2009116593A (ja) * | 2007-11-06 | 2009-05-28 | Nippon Telegr & Teleph Corp <Ntt> | Word vector generation device, word vector generation method, program, and recording medium storing the program |
Family Cites Families (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPH117453A (ja) * | 1997-04-22 | 1999-01-12 | Mitsubishi Electric Corp | Media information recommendation device |
US7367043B2 (en) * | 2000-11-16 | 2008-04-29 | Meevee, Inc. | System and method for generating metadata for programming events |
EP1489617A1 (en) | 2003-06-19 | 2004-12-22 | Matsushita Electric Industrial Co., Ltd. | Music reproducing apparatus and music reproducing method |
JP4316432B2 (ja) | 2003-06-19 | 2009-08-19 | Panasonic Corporation | Music playback device and music playback method |
US20050086689A1 (en) * | 2003-10-17 | 2005-04-21 | Mydtv, Inc. | Interactive program guides providing program segment information |
GB2419072A (en) * | 2003-10-30 | 2006-04-12 | Nokia Corp | Personalisation of an information service provision |
JP4192760B2 (ja) * | 2003-10-31 | 2008-12-10 | Nippon Telegraph and Telephone Corporation | Method, device, and program for ranking newly appearing characteristic words by category, and computer-readable storage medium storing the ranking program |
US8589973B2 (en) * | 2006-09-14 | 2013-11-19 | At&T Intellectual Property I, L.P. | Peer to peer media distribution system and method |
US20080155602A1 (en) * | 2006-12-21 | 2008-06-26 | Jean-Luc Collet | Method and system for preferred content identification |
WO2008107986A1 (ja) | 2007-03-07 | 2008-09-12 | Pioneer Corporation | Data browsing device and method |
JP2009080580A (ja) | 2007-09-25 | 2009-04-16 | Toshiba Corp | Video display device and method |
JP5225037B2 (ja) | 2008-11-19 | 2013-07-03 | Toshiba Corporation | Program information display device and method |
-
2009
- 2009-10-05 CN CN200980161698.2A patent/CN102549569B/zh not_active Expired - Fee Related
- 2009-10-05 JP JP2011535223A patent/JP5415550B2/ja not_active Expired - Fee Related
- 2009-10-05 WO PCT/JP2009/067345 patent/WO2011042946A1/ja active Application Filing
-
2012
- 2012-03-16 US US13/423,002 patent/US8904437B2/en not_active Expired - Fee Related
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
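KR20150104347A (ko) * | 2014-03-05 | 2015-09-15 | Samsung Electronics Co., Ltd. | Display apparatus and method for controlling display apparatus |
KR102229156B1 (ko) * | 2014-03-05 | 2021-03-18 | Samsung Electronics Co., Ltd. | Display apparatus and method for controlling display apparatus |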
Also Published As
Publication number | Publication date |
---|---|
US20120266195A1 (en) | 2012-10-18 |
JP5415550B2 (ja) | 2014-02-12 |
US8904437B2 (en) | 2014-12-02 |
JPWO2011042946A1 (ja) | 2013-02-28 |
CN102549569B (zh) | 2014-11-12 |
CN102549569A (zh) | 2012-07-04 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
JP5740814B2 (ja) | Information processing device and method | |
US7890490B1 (en) | Systems and methods for providing advanced information searching in an interactive media guidance application | |
US8478759B2 (en) | Information presentation apparatus and mobile terminal | |
JP4328757B2 (ja) | Program selection device and method for controlling program selection device | |
CN101778233B (zh) | Data processing device and data processing method | |
US9544528B2 (en) | Matrix search of video using closed caption information | |
EP2912855B1 (en) | Program recommendation device and program recommendation program | |
JP4852119B2 (ja) | Data display device, data display method, and data display program | |
US20090024573A1 (en) | Method and system for performing search on a client device | |
US7577921B2 (en) | Method and system for performing search using acronym | |
JP2002108892A (ja) | Data management system, data management method, and recording medium | |
JP5415550B2 (ja) | Similar content search device and program | |
JP2004295102A (ja) | Speech recognition dictionary creation device and information retrieval device | |
US9094736B2 (en) | Information processing apparatus, information processing method, and program | |
JP2001028010A (ja) | Automatic multimedia content extraction system and method | |
US20180246583A1 (en) | System and method for data item filtering based on character sequence entry | |
JP2006135388A (ja) | Information processing device, display control device, information processing method, program therefor, and recording medium storing the program | |
JP2006155336A (ja) | Information processing device, data acquisition control device, information processing method, and program therefor | |
JP6647141B2 (ja) | Keyword list generation device, content viewing device, and keyword list generation program | |
US20120209883A1 (en) | Content item search apparatus and method | |
JP2014048946A (ja) | Electronic apparatus and control method therefor | |
JP2006203619A (ja) | Preference-based program classification device and preference-based program classification method | |
US9946976B2 (en) | System for enabling channel designation differentiation for hierarchically organizing and accessing address registers with address signifiers and elements | |
US20080189231A1 (en) | Information Processing Device, Classification Reference Information Database, Information Generation Device, Information Processing Method, Information Generation Method, Information Processing Program, and Recording Medium Having Information Processing Program Recorded Therein | |
JP2010231271A (ja) | Content search device, content search method, and content search program |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
WWE | Wipo information: entry into national phase | Ref document number: 200980161698.2 Country of ref document: CN |
121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 09850219 Country of ref document: EP Kind code of ref document: A1 |
WWE | Wipo information: entry into national phase | Ref document number: 2011535223 Country of ref document: JP |
NENP | Non-entry into the national phase | Ref country code: DE |
122 | Ep: pct application non-entry in european phase | Ref document number: 09850219 Country of ref document: EP Kind code of ref document: A1 |