TWI220483B

TWI220483B - Creation method of search database for audio/video information and song search system

Info

Publication number: TWI220483B
Application number: TW91123913A
Authority: TW
Inventors: Chih-Chun Lai; Tzong-Der Wu
Original assignee: Inst Information Industry
Priority date: 2002-10-17
Filing date: 2002-10-17
Publication date: 2004-08-21

Abstract

A method for creating a search database for audio/video information comprises the following steps: extracting the audio and the video separately from an AV file; performing audio feature analysis on the audio and video feature analysis on the video; dividing the AV file into scenes according to the results of the two analyses; extracting key frames from all the divided scenes; acquiring the text in the key frames as text data; generating an index table from the text data, the audio feature analysis result, and the corresponding key frames; and storing the index table and the AV file to create an audio/video search database that can be searched by text or by audio features. A KTV song search system that uses this database, letting users search for a song or a song fragment by text or audio features, is also disclosed.

Description

[Field of the Invention]
The present invention relates to a method for creating a search database for audio/video information and to a song search system that uses this database. Using text, the position of a specific string in the AV file and in its corresponding scene can be located precisely, so that a specific string can be searched for and the specific scene played.

[Background of the Invention]
In recent years KTV, MTV, and karaoke have become popular forms of entertainment. Current song-ordering systems search by song title or singer. Their drawback is that a user who has forgotten the title faces too many songs to find the right one easily, and a search keyed on the title or singer can return so much information that search time is wasted or the song actually wanted is never found.

[Summary of the Invention]
In view of this, the present invention provides a method for creating a search database for audio/video information that uses the time point of the text in each scene to locate more precisely the position of a specific string in the AV file and in its corresponding scene, so that a specific string can be searched for and the specific scene played. The invention also provides a search system that uses this database, allowing users to search for a song or a fragment of it by text or audio features.

Practicing a new song often requires repeating a particular passage of the song, and the retrieved song can be replayed directly from that passage: text serves as the index data for searching the audio/video database, the position of the scene corresponding to a specific string in the AV file is found, and the specific scene is played.

To make the above and other objects, features, and advantages of the present invention clearer, a preferred embodiment is described in detail below with reference to the accompanying drawings.

[Embodiment]
Refer to FIG. 1, which is a flowchart of the operations for creating the audio/video search database. First, in step S10, the video and the audio are extracted separately from an AV file. Video feature analysis is performed on the video in step S14 and audio feature analysis on the audio in step S16. Based on the results of both analyses, the AV file undergoes scene cutting in step S18, and key frames are extracted from all the resulting scenes in step S20. Next, in step S22, video optical character recognition extracts the text in the key frames, recognizing the text shown in the AV file as text data. In step S24 an index table is generated from the text data, the audio feature analysis result, and the corresponding key frames. In step S26 the index table and the AV file are stored, producing an audio/video search database that can be searched by text or audio features.

In these steps, demultiplexing separates the video and the audio from the AV file, and feature analysis is performed on the video and audio data and signals: video feature analysis S14 analyzes the color distribution in the film, and audio feature analysis S16 analyzes the digital sound signal in the audio. Scene cutting S18 uses the results of video feature analysis S14 and audio feature analysis S16 to cut the file into individual segments that are easy to manage.
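As an illustration of how the color-distribution features of step S14 could drive the scene cutting of step S18, the following Python sketch compares coarse color histograms of consecutive frames and starts a new scene whenever the difference exceeds a threshold. The histogram-difference rule, the names `color_histogram` and `cut_scenes`, the bin count, and the threshold are all assumptions made for illustration; the specification does not state which video features or which decision rule are used, and the audio features of step S16 are left out here.

```python
from typing import List, Sequence, Tuple

Pixel = Tuple[int, int, int]   # (R, G, B), each component 0..255
Frame = Sequence[Pixel]        # one frame, flattened to a list of pixels


def color_histogram(frame: Frame, bins: int = 4) -> List[float]:
    """Normalized joint RGB histogram with `bins` levels per channel."""
    hist = [0.0] * (bins ** 3)
    step = 256 // bins
    for r, g, b in frame:
        idx = (r // step) * bins * bins + (g // step) * bins + (b // step)
        hist[idx] += 1.0
    total = float(len(frame)) or 1.0   # avoid dividing by zero on empty frames
    return [h / total for h in hist]


def cut_scenes(frames: Sequence[Frame], fps: float,
               threshold: float = 0.5) -> List[Tuple[float, float]]:
    """Return (start_sec, end_sec) for each scene (hypothetical rule)."""
    if not frames:
        return []
    boundaries = [0]
    prev = color_histogram(frames[0])
    for i in range(1, len(frames)):
        cur = color_histogram(frames[i])
        # L1 distance between two normalized histograms lies in [0, 2].
        diff = sum(abs(a - b) for a, b in zip(prev, cur))
        if diff > threshold:
            boundaries.append(i)       # frame i opens a new scene
        prev = cur
    boundaries.append(len(frames))
    return [(boundaries[k] / fps, boundaries[k + 1] / fps)
            for k in range(len(boundaries) - 1)]


# Three all-red frames followed by two all-blue frames, at 1 frame per second.
red = [(255, 0, 0)] * 4
blue = [(0, 0, 255)] * 4
scenes = cut_scenes([red, red, red, blue, blue], fps=1.0)
```

In this toy input the histogram changes completely between the third and fourth frames, so two scenes are reported. The start and end times returned for each scene are the kind of values the index table would record.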

For each scene, the start and end points are marked, together with the scene in which each text appears and the time point of each sentence within the scene; for example, the string "明媚的月光更照亮了我的心" ("the bright moonlight lights up my heart all the more") starts to be displayed at 2 minutes 19 seconds into the song. During playback of each scene, every time point at which a change must occur has to be a key frame.

Refer also to FIG. 2, which is a detailed flowchart of the video optical character recognition. Steps S40 to S46 are defined as the detection procedure and steps S48 to S49 as the recognition procedure. First a key frame is obtained. Texture analysis is then performed in step S40, extracting and analyzing the spatial distribution pattern of the image's gray levels (shading), and motion analysis is performed on the key frame in step S42. The results of texture analysis S40 and motion analysis S42 are combined by multi-frame integration in step S44. Color region cutting follows in step S46: the image is thresholded, edge detection is applied, and region growing is then used to eliminate over-segmentation, which gives a better segmentation result and completes the image segmentation. Character cutting is applied to the text blocks in step S48 and character recognition is performed in step S49; the text feature values obtained by the feature extraction algorithm of the character recognition module are analyzed and classified, and the result is the text data.

Refer next to FIG. 3, which shows the system architecture of an embodiment of a system that searches KTV songs using the audio/video search database. The architecture is described below with song search as one embodiment, but the scope of the invention is not limited to it.

A KTV song search system according to an embodiment of the invention comprises an audio/video search database 55, a first input device 50, 70, a second input device 60, an audio feature analysis unit 52, an audio feature comparison unit 53, and a search engine 65. The audio/video search database 55 stores a plurality of tracks, audio features, and an index table; through the index table, the corresponding tracks and segments can be retrieved by text or by audio features. The first input device 50, 70 is used to input voice data, and the second input device 60 is used to input text data. The audio feature analysis unit 52 receives the voice data input through the first input device and performs audio feature analysis on it. The audio feature comparison unit 53 compares the audio features of the voice data with the audio features stored in the audio/video search database and, if there is a match, outputs the corresponding track or segment. The search engine 65 searches the audio/video search database for tracks that match the text data input through the second input device and outputs the matching tracks or segments.

Referring to FIG. 3, a user who wants to find a song hums a fragment of it into the first input device 50, for example a microphone. The audio feature analysis unit 52 performs audio feature analysis on the fragment, and the audio feature comparison and analysis unit 53 compares the fragment's audio features with the index data in the audio/video search database 55; if a melody matches, the corresponding track or segment is output. Still referring to FIG. 3, an index value such as a song title, singer name, or lyric fragment can also be entered through the second input device 60, for example a keyboard; the search engine 65 then searches the audio/video search database 55, and tracks matching the index value are output.
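The comparison performed by the audio feature comparison unit 53 can be pictured with the following Python sketch, in which a hummed fragment and each stored track are reduced to a pitch contour (up, down, or repeat between successive notes) and the query contour is located inside the stored contours. The contour representation, the exact-substring match, the names `contour` and `find_melody`, and the sample `library` are assumptions for illustration only; the specification does not say which audio features are extracted or how they are compared.

```python
from typing import Dict, List, Sequence, Tuple


def contour(pitches: Sequence[int]) -> str:
    """Encode a note sequence (e.g. MIDI note numbers) as U/D/R steps."""
    steps = []
    for prev, cur in zip(pitches, pitches[1:]):
        steps.append("U" if cur > prev else "D" if cur < prev else "R")
    return "".join(steps)


def find_melody(query: Sequence[int],
                library: Dict[str, Sequence[int]]) -> List[Tuple[str, int]]:
    """Return (track, note_index) for every track containing the query contour."""
    q = contour(query)
    hits: List[Tuple[str, int]] = []
    if not q:                      # fewer than two notes: nothing to match
        return hits
    for track, notes in library.items():
        pos = contour(notes).find(q)
        if pos >= 0:
            hits.append((track, pos))
    return hits


# Hypothetical stored melodies, keyed by track name.
library = {
    "song_a": [60, 62, 64, 62, 60, 67],
    "song_b": [60, 60, 59, 57],
}

# The user hums the fragment in a different key; only the contour is compared.
hits = find_melody([65, 67, 65, 63], library)
```

Comparing contours rather than absolute pitches is one simple way to tolerate a user humming in a different key. A practical system would also need to tolerate wrong or missing notes, which an exact substring match does not.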

An index value such as a song title, singer name, or lyric fragment can also be spoken into the first input device, for example a microphone, so that voice replaces keyboard entry of the index value and gives two-way, natural human-machine interaction. Speech recognition then identifies the spoken index value and passes the data to the search engine 65, which searches the audio/video search database 55; tracks matching the index value are output.

In addition, during scene cutting S18 the invention cuts the scenes of the AV file into individual segments that are easy to manage, marks the start and end points of each scene, the scene in which each text appears, and the time point of each sentence within each scene (for example, 2 minutes 29 seconds), and stores these data in the audio/video search database (S26). A song found by using a song fragment input through the microphone, or a lyric fragment input through the keyboard or microphone, as the index value can therefore be played directly from the audio and video of that lyric fragment. For example, suppose the song fragment or lyric fragment "明媚的月光更照亮了我的心" is input as the index value and a matching lyric fragment is found in the song 綠島小夜曲 (Green Island Serenade) in the audio/video search database 55. When that song is chosen for playback, the audio and video start directly at the line "明媚的月光更照亮了我的心", with no need to play the song from the beginning. This matters most in singing practice, where a passage of a song often has to be repeated: rewinding again and again is inconvenient, and being unable to jump directly to the audio and video of a given passage wastes time.
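The lookup described above, from a lyric fragment to the point in the song where playback starts, can be sketched in Python as follows. The record layout `IndexEntry`, the function `find_playback_position`, and the sample rows (including the scene boundaries and the placeholder lyric line) are hypothetical; the specification only says that the index table records the start and end points of each scene, the scene in which each text appears, and the time point of each sentence. The value 139.0 seconds corresponds to the 2 minutes 19 seconds example given earlier in the description.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class IndexEntry:
    track: str          # song title
    scene_start: float  # start point of the scene, in seconds
    scene_end: float    # end point of the scene, in seconds
    text: str           # lyric line recognized in the scene's key frame
    time_point: float   # second at which the line starts to be displayed


def find_playback_position(index: List[IndexEntry],
                           fragment: str) -> Optional[Tuple[str, float]]:
    """Return (track, seconds) for the first lyric line containing `fragment`."""
    if not fragment:
        return None
    for entry in index:
        if fragment in entry.text:
            return entry.track, entry.time_point
    return None


# Hypothetical index rows; only the second line of text comes from the description.
index_table = [
    IndexEntry("綠島小夜曲", 120.0, 139.0, "(placeholder for an earlier line)", 125.0),
    IndexEntry("綠島小夜曲", 139.0, 150.0, "明媚的月光更照亮了我的心", 139.0),
]

hit = find_playback_position(index_table, "月光更照亮")
```

A player would then seek to the returned time point instead of starting at zero, which is the behavior the description contrasts with repeated rewinding.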

By building a search database for video and audio, the present invention provides the function of playing the video and the audio together.

Although the present invention has been disclosed above by way of a preferred embodiment, the embodiment is not intended to limit the invention. Anyone skilled in the art may make various changes and modifications without departing from the spirit and scope of the invention, so the scope of protection of the invention is defined by the appended claims.


[Brief Description of the Drawings]
FIG. 1 is a flowchart of the operations for creating the audio/video search database;
FIG. 2 is a flowchart of the operations for video optical character recognition;
FIG. 3 shows the system architecture of an embodiment of a system that searches KTV songs using the audio/video search database.

[Description of Symbols]
S10: extract video and audio from the AV file;
S14: perform video feature analysis;
S16: perform audio feature analysis;
S18: perform scene cutting;
S20: extract key frames;
S22: perform video optical character recognition;
S24: generate an index table;
S26: create an audio/video search database;
S40: perform texture analysis;
S42: perform motion analysis;
S44: perform multi-frame integration;
S46: perform color region cutting;
S48: perform character cutting;
S49: perform character recognition;
50, 70: first input device;
52: audio feature analysis unit;
53: audio feature comparison and analysis unit;


Claims

1. A method for creating an audio/video search database, comprising the following steps:
extracting video and audio separately from an AV file;
performing video feature analysis on the video and audio feature analysis on the audio;
performing scene cutting on the AV file according to the results of the video feature analysis and the audio feature analysis;
extracting key frames from all the scenes obtained by the cutting;
extracting the text in the key frames as text data;
generating an index table according to the text data, the audio feature analysis result, and the corresponding key frames; and
storing the index table and the AV file to produce an audio/video search database that can be searched by text or audio features.

2. The method for creating an audio/video search database as described in claim 1, wherein the scene cutting cuts the AV file into individual segments (scenes) and marks the start point and end point of each segment (scene), the scene in which each text appears, and its time point in each scene.

3. The method for creating an audio/video search database as described in claim 2, wherein the start point and end point of each of the segments, the scene in which the text appears, and its time point in each scene are further stored in the audio/video search database.

4. The method for creating an audio/video search database as described in claim 3, wherein the time point of the text in each scene is used to locate more precisely the position of a specific string of the text in the AV file and in the corresponding scene, so that the specific string is searched for and the specific scene is played.

5. The method for creating an audio/video search database as described in claim 1, wherein the text in the key frames is extracted using video optical character recognition.

6. The method for creating an audio/video search database as described in claim 1, wherein the AV file is a KTV video file.

7. A KTV song search system, comprising:
an audio/video search database storing a plurality of tracks, audio features, and an index table, through which the corresponding tracks and segments can be retrieved by text or by audio features;
a first input device for inputting voice data;
a second input device for inputting text data;
an audio feature analysis unit for receiving the voice data input through the first input device and performing audio feature analysis;
an audio feature comparison unit for comparing the audio features of the voice data with the audio features stored in the audio/video search database and outputting the corresponding track or segment if there is a match; and
a search engine for searching the audio/video search database for tracks that match the text data input through the second input device and outputting the matching tracks or segments.

8. The KTV song search system as described in claim 7, further comprising a speech recognition unit for converting the voice data input through the first input device into a text string, which is then provided to the search engine for comparison.

9. The KTV song search system as described in claim 8, wherein the voice data input through the first input device specifies a song title, a singer name, and a lyric fragment.

10. The KTV song search system as described in claim 7, wherein the first input device is a microphone and the input voice data is a song fragment.

11. The KTV song search system as described in claim 7, wherein the second input device is a keyboard and the input text data specifies a song title, a singer name, and a lyric fragment.
