KR20080024876A - Audio searching system

Info

Publication number: KR20080024876A
Application number: KR1020060089588A
Authority: KR
Inventors: 김도형
Original assignee: 엘지전자 주식회사
Priority date: 2006-09-15
Filing date: 2006-09-15
Publication date: 2008-03-19

Abstract

An apparatus and a method for searching sections of audio data are provided to improve search efficiency by segmenting the audio into sections on the basis of feature information and displaying the feature information for each segmented section. An audio searching method comprises the steps of: displaying section information segmented on the basis of feature information of audio data; moving to a displayed section and playing the corresponding audio section; displaying an audio searching interface that includes feature information for each section; and moving to a segmented section and reproducing the corresponding audio signal according to an audio section search command.

Description

Audio section search apparatus and method {Audio Searching System}

FIG. 1 is a diagram explaining the concept of audio section search according to an embodiment of the present invention.

FIG. 2 is a block diagram of a data processing apparatus according to an embodiment of the present invention.

FIG. 3 is a diagram illustrating the structure of an audio section search system according to an embodiment of the present invention.

FIG. 4 is a diagram illustrating an example of a user interface according to an embodiment of the present invention.

The present invention relates to a method of searching for a specific section of audio data and/or an audio signal, and to a search apparatus therefor.

Devices that record and/or reproduce audio signals are in widespread use. They range from conventional analog audio recording and/or reproducing devices to devices that record and/or reproduce digital audio data. The latter category covers not only dedicated audio devices but also other devices equipped with a function for recording and/or reproducing digital audio data. An example of a dedicated device is the digital audio player commonly known as an MP3 player, which often also provides an audio signal recording (storage) function. Examples of the other kind are a personal computer (PC) with a digital audio reproduction function and a mobile communication terminal with a digital audio reproduction (and/or recording) function.

Here, audio is generally understood to include not only music (an analog music signal or a digital music file) but also voice signals and the various other sounds that can exist in nature.

Such audio devices sometimes need to search for a specific section of a stored audio signal. For example, when a voice signal has been recorded, the user may need to find a particular section of interest in that recording. Existing audio devices provide a user interface for searching audio sections, typically a play key, a stop key, a forward jump key, and a backward jump key. In response to these key inputs, the device moves to a specific section and plays the audio at the new position.

With audio, however, the user can recognize what is recorded at a position only after playing and listening to that section for some time, so finding the desired section takes considerable time. In addition, the user must repeatedly operate the jump keys (forward key, rewind key) and the stop and play keys, and because each jump moves only by a predetermined fixed length of time, repeated operation is required until the desired section is found. Furthermore, since the user cannot tell what is recorded in sections that have not been played (listened to), there is a high likelihood of skipping past the desired section. In that case the search takes even longer and the user must put up with the inconvenience.

An object of the present invention is to provide an audio section search apparatus and method with which a user can search for a desired section quickly and easily.

Another object of the present invention is to provide an audio section search apparatus and method that segment audio on the basis of feature information and present the feature information for each segmented section, so that a user can search for a desired section quickly and easily.

Yet another object of the present invention is to provide an audio section search apparatus and method that divide audio into sections on the basis of acoustic or semantic features and display the information of each audio section in a user interface, so that a user can search for a desired section more quickly and easily.

To achieve the above objects, an audio section search apparatus according to the present invention comprises: a storage unit that stores audio data divided into sections on the basis of audio feature information; a display unit that displays section information of the audio data; and a playback control unit that selects and plays the audio data section by section according to a section search command.

An audio section search method according to the present invention for achieving the above objects comprises: displaying section information divided on the basis of audio feature information; and moving in units of the displayed sections and playing the audio of the corresponding section.

Another audio section search method according to the present invention for achieving the above objects comprises: displaying an audio search interface that is divided into sections and includes information representing the features of each divided section; and, in response to an audio section search command, moving in units of the divided sections and reproducing the audio signal of the corresponding section.

Hereinafter, an audio section search apparatus and method according to an embodiment of the present invention will be described with reference to the accompanying drawings.

FIG. 1 is a diagram explaining the concept of audio section search according to an embodiment of the present invention. It shows the concept of one kind of user interface structure. Two time axes are shown, horizontal (X) and vertical (Y). The audio data is displayed divided into several segments, each of which constitutes one section. Each audio section, that is, each segment, is assigned information representing the features of that section, and this information is displayed in the user interface.

As shown in FIG. 1, the user is given an interface for audio section search in which each segment is visually distinguishable. The representation of the segments and of the segment structure illustrated here is only one example; various other methods that distinguish and express the features of each segment may be used.

The segments may have a single-layer structure, but they may also have a hierarchical structure as shown in FIG. 1 (see the vertical time axis), that is, a multi-segment structure. In FIG. 1, the multiple segments are shown as a first segment structure (column) (A) and a second segment structure (column) (B).

When an audio section is searched, playback takes place in the section moved to by a user command, that is, in a specific segment. Playback may cover only the single segment moved to, but to make searching more convenient and faster, several segments may also be played simultaneously. The latter case is multiple playback. When several segments are played simultaneously in this way, a visual and audible difference is applied so that the temporally earlier segments (past segments) can be distinguished from the segment currently being played.

Visually, the size or color of the information indicating playback (the playback pointer) can be varied. Audibly, the playback volume of the previous segment can be lowered and that of the current segment made relatively higher. Alternatively, the playback volume of the previous segment can be gradually decreased while that of the current segment is relatively increased. In this way the previous search section and the current search section can be distinguished both visually and audibly.
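
The gradual volume change described above can be sketched as a linear cross-fade. This is only an illustrative sketch: the function names `crossfade_gains` and `mix`, the linear ramp, and the minimum gain `floor` are assumptions made for the example and are not specified in this document.

```python
def crossfade_gains(progress, floor=0.2):
    """Return (previous_gain, current_gain) for a progress value in [0, 1].

    `floor` is a hypothetical minimum gain so that neither segment
    becomes completely silent while both are being played.
    """
    p = min(max(progress, 0.0), 1.0)
    previous_gain = 1.0 - (1.0 - floor) * p  # falls from 1.0 to `floor`
    current_gain = floor + (1.0 - floor) * p  # rises from `floor` to 1.0
    return previous_gain, current_gain


def mix(previous, current, floor=0.2):
    """Mix two equal-length sample lists using the linear cross-fade."""
    if len(previous) != len(current):
        raise ValueError("sample lists must have the same length")
    n = len(previous)
    mixed = []
    for i in range(n):
        g_prev, g_cur = crossfade_gains(i / max(n - 1, 1), floor)
        mixed.append(g_prev * previous[i] + g_cur * current[i])
    return mixed
```

A constant pair of gains (previous segment quiet, current segment loud) would correspond to the simpler variant mentioned first in the paragraph above.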

In the present invention, when the user searches audio, playback moves segment by segment, and the contents of a plurality of past segments are played simultaneously so that a fast search is possible without waiting for a segment to be played in full. That is, multi-section playback is performed, and because a basis is provided for distinguishing the current playback section from previous sections, audio section search becomes faster and more convenient.

As described above, the present invention analyzes the audio signal and divides it into sections. Each divided audio section is assigned information representing its features, and the audio sections are displayed together with the feature information of each section. From the displayed feature information the user can visually recognize the audio content of each section. When searching audio, the user can rely on this feature information and move in units of the divided audio sections to find the desired section.

To divide audio into sections and express the feature information of each section, the audio signal undergoes a keyword spotting process and a segmentation process.

The keyword spotting and segmentation processes may be executed on the device itself for the audio data stored in it. Alternatively, they may be executed on a separate external device, and the audio data may be provided in advance already divided into sections, with information representing the features of each section attached.

Keyword spotting is the process of analyzing the audio signal to locate a specific keyword sound, and segmentation is the process of separating the audio signal into several segments. Keyword spotting may be performed first and followed by segmentation, or segmentation may be performed first and followed by keyword spotting.
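
As a toy illustration of locating a keyword sound, the sketch below slides a keyword template over a sequence of per-frame feature values and reports the positions where the mean squared difference is small. The document does not specify a matching algorithm; the function name `spot_keyword`, the one-number-per-frame features, and the `max_distance` threshold are all assumptions, and a practical system would use proper acoustic features and a more robust matcher.

```python
def spot_keyword(features, template, max_distance):
    """Return start indices where `template` matches `features`.

    A window matches when its mean squared difference from the
    template is at most `max_distance` (a hypothetical threshold).
    """
    if not template:
        raise ValueError("template must not be empty")
    m = len(template)
    hits = []
    for start in range(len(features) - m + 1):
        window = features[start:start + m]
        distance = sum((a - b) ** 2 for a, b in zip(window, template)) / m
        if distance <= max_distance:
            hits.append(start)
    return hits
```

Multiplying each returned index by the frame duration would give the keyword positions in seconds, which is the kind of position information a keyword tag needs.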

The keywords used for keyword spotting of the audio signal can be supplied in several ways. For example, the user may enter and set a keyword separately, either as text or by recording a specific word. Alternatively, the user may excerpt a portion of an already stored audio signal and designate the excerpt as a keyword. As a further alternative, the audio signal may be segmented automatically, the number of repetitions of each segment counted, and frequently occurring segments designated as keywords automatically.

Segmentation of the audio signal can divide the signal on the basis of acoustic features or semantic features. Segmentation based on acoustic features can use, for example, gender discrimination based on the main frequency characteristic and formant features of the audio signal. Segmentation based on semantic features can distinguish sentences by means of sentence-final particles that mark the end of a sentence, or can use the positions at which a specific word appears. The acoustic or semantic feature information may be what was designated as a keyword for dividing the section (segment), or, after segmentation has been performed, it may be attached as information representing the features of the section (segment); either approach can be used. Information representing the features of an audio section can be expressed as a tag in the user interface structure. That is, a metadata tag is entered for each segment and displayed visually.
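
A minimal sketch of acoustic segmentation with metadata tags follows. It assumes that a per-frame main-frequency estimate is already available and labels frames with a single threshold; the `Segment` class, the function names, the 165 Hz threshold, and the 0.5 s frame length are hypothetical choices for illustration only, and formant features are not modeled.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    start: float  # seconds
    end: float    # seconds
    tag: str      # metadata tag shown in the user interface


def label_frame(f0_hz, threshold_hz=165.0):
    """Label one frame from its main frequency (threshold is an assumption)."""
    if f0_hz <= 0:
        return "silence"
    return "male" if f0_hz < threshold_hz else "female"


def segment_by_label(f0_track, frame_seconds=0.5, threshold_hz=165.0):
    """Merge consecutive frames with the same label into tagged segments."""
    segments = []
    for i, f0 in enumerate(f0_track):
        tag = label_frame(f0, threshold_hz)
        start = i * frame_seconds
        if segments and segments[-1].tag == tag:
            segments[-1].end = start + frame_seconds  # extend current segment
        else:
            segments.append(Segment(start, start + frame_seconds, tag))
    return segments
```

For example, `segment_by_label([110, 120, 0, 220, 230, 210])` yields three segments tagged `male`, `silence`, and `female`, each carrying its own start and end time.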

When the user wants to move to a specific section (segment), playback moves to the start position of that segment or to the position of a keyword shown in it. That is, playback moves in units of the sections (segments) produced by segmentation, and playing past and current segments simultaneously at that point enables a faster search for the desired section.
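
Segment-wise movement can be sketched as a lookup in a sorted list of segment start times. The function names and the list-of-start-times representation are assumptions for this example; the same lookup would work for keyword positions.

```python
import bisect


def next_segment_start(starts, position):
    """Return the nearest segment start strictly after `position`, or None."""
    i = bisect.bisect_right(starts, position)
    return starts[i] if i < len(starts) else None


def previous_segment_start(starts, position):
    """Return the nearest segment start strictly before `position`, or None."""
    i = bisect.bisect_left(starts, position)
    return starts[i - 1] if i > 0 else None
```

With `starts = [0.0, 12.5, 30.0, 47.0]` and a playback position of 15.0 seconds, a forward jump lands on 30.0 and a backward jump lands on 12.5, the start of the segment being played. This contrasts with a fixed-length jump, which ignores segment boundaries.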

FIG. 2 is a block diagram of a data processing apparatus according to an embodiment of the present invention. The data processing apparatus shown in FIG. 2 is an embodiment having a function of reproducing digital audio signals. It includes a device operation unit 10 for receiving device operation commands from the user, a data input/output unit 20 for input and/or output of digital audio data, a data conversion unit 30 that converts digital audio data for storage or for output, a storage unit 40 that stores digital audio data, a data processing and control unit 50 that performs the processing and control for input/output, conversion, and storage of digital audio data, and a display unit 60 that displays information such as device operation and operating state as images.

The device operation unit 10 may use various kinds of user interface devices, for example a key input device. The data input/output unit 20 may use a communication interface device for downloading audio content from an external device or over a network such as the Internet or a communication network, and may use a microphone device for voice recording. It may also use a speaker device or an earphone device for outputting the reproduced audio signal. The data conversion unit 30 may use an encoding module for storing digital audio data and a decoding module for playback. The storage unit 40 may use various kinds of storage media, such as a flash memory device or a hard disk drive. The data processing and control unit 50 controls the input/output, conversion, storage, and playback of digital audio data, and may include a microprocessor, a digital signal processor, and a communication control module.

In response to operation of the device operation unit 10, audio content is input through the data input/output unit 20 or a voice signal is recorded directly. For example, audio content (digital audio data) is downloaded over a network, or a voice signal is input through the device's own microphone. The input audio signal is converted into digital data by the data conversion unit 30 and, where necessary, into a compressed data format. The audio data converted by the data conversion unit 30 is stored in the storage unit 40 under the control of the data processing and control unit 50.

The audio data stored in the storage unit 40 undergoes keyword spotting and segmentation for audio section search. These processes may be executed on the device itself for the audio data stored in it. Alternatively, they may be executed on a separate external device, and the audio data may be provided in advance already divided into sections, with information representing the features of each section attached.

Keyword spotting is the process in which the data processing and control unit 50 analyzes the audio signal to locate a specific keyword sound, and segmentation is the process of separating the audio signal into several segments. Keyword spotting may be performed first and followed by segmentation, or segmentation may be performed first and followed by keyword spotting. The audio data divided into sections (segments) by segmentation is stored in the storage unit 40 together with the section information and the information representing the features of each section.

During an audio section search, the audio sections and the information representing their features are output from the storage unit 40 to the display unit 60 under the control of the data processing and control unit 50. An example of the user interface screen shown on the display unit 60 is described in detail later with reference to FIG. 4. Based on the section information output to the display unit 60, the apparatus moves to the corresponding section according to a user command entered through the device operation unit 10 and plays the section moved to. For section playback, under the control of the data processing and control unit 50, the data conversion unit 30 converts the data of the corresponding audio section and outputs it through the data input/output unit 20 to a speaker, earphones, headphones, or the like.

FIG. 3 is a diagram illustrating the structure of an audio section search system according to an embodiment of the present invention. Here, segmentation and keyword spotting for section search are performed on a recorded voice signal as an example. The audio section search method described in the present invention can be applied not only to recorded voice but also to music files and any other acoustic signal that can exist in nature; such application is substantially the same as the section search technique for recorded voice signals described below, or can be readily implemented by those skilled in the art.

As shown in FIG. 3, the audio section search system according to an embodiment of the present invention organically combines the following steps and elements: recording of a voice signal, construction of a voice database, segmentation, construction of a segment database, keyword extraction, construction of a keyword database, keyword spotting, and a user interface. The voice database, the segment database, and the keyword database are built in the storage device of the apparatus. The recording module includes a data input device, for example a microphone. The segmentation module, the keyword extraction module, and the keyword spotting module can be understood as applications that implement the corresponding algorithms and run on the device's microprocessor. The user interface module can be understood as the display device, the device operation unit, and the corresponding application.

The recording module records a voice signal using the data input/output device, that is, a microphone (S10). Signal processing such as suitable filtering or amplification may accompany the recording step if necessary. The recorded voice signal is converted into digital voice data by the data conversion device and stored in the voice database (S20). The voice database resides in the storage device. The stored voice data is divided into a suitable number of sections by the segmentation module (S30). The segmentation module also provides the section separation information of the voice data, that is, information on the start, end, length, and so on of each segment.

The segmented section information is built into a segment database (S40). Single-layer segmentation is possible here, but a multi-segment structure may also be formed as described above. In this description, the voice data is assumed to be segmented into a multi-segment structure.
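
One way to picture a multi-segment (hierarchical) segment database is a tree in which each coarse segment holds its finer sub-segments. The sketch below is an assumption about how such records could be laid out; the `SegmentNode` class, its fields, and the `path_at` helper are hypothetical and are not defined in this document.

```python
from dataclasses import dataclass, field


@dataclass
class SegmentNode:
    start: float  # seconds
    end: float    # seconds
    tag: str = ""
    children: list = field(default_factory=list)  # finer-layer segments

    @property
    def length(self):
        """Segment length in seconds, derived from start and end."""
        return self.end - self.start


def path_at(nodes, t):
    """Return the segments containing time `t`, ordered coarse to fine."""
    for node in nodes:
        if node.start <= t < node.end:
            return [node] + path_at(node.children, t)
    return []
```

For a recording whose first layer is one 60-second segment split into a second layer of two sub-segments, `path_at` returns the first-layer segment followed by the second-layer segment that contains the requested time.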

Keyword extraction is performed on the segmented voice data (S50), and the extracted keywords are built into a keyword database (S60). Keywords can be extracted by counting how many times each segment is repeated and automatically designating frequently occurring segments as keywords. Keywords can, however, also be designated by user input. For example, the user records keywords separately and the recorded keywords are built into the keyword database. As another example, the user may excerpt (select) a portion of the already recorded voice database and designate it as a keyword. The embodiment of the system structure shown in FIG. 3 illustrates the case in which the recorded voice signal is segmented, the number of repetitions of each segment is counted, and frequently occurring segments are automatically designated as keywords.
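
Automatic keyword designation by repetition count can be sketched as follows. The example assumes that each segment has already been reduced to a comparable label (for instance a recognized word or a cluster identifier); how repeated segments are detected is not specified in this document, and the function name and the `min_count` and `top_n` parameters are hypothetical.

```python
from collections import Counter


def frequent_keywords(segment_labels, min_count=2, top_n=3):
    """Return up to `top_n` labels that occur at least `min_count` times."""
    counts = Counter(segment_labels)
    return [label for label, count in counts.most_common(top_n)
            if count >= min_count]
```

Given the labels `["budget", "hello", "budget", "schedule", "budget", "schedule"]`, the sketch selects `budget` and `schedule` and leaves out `hello`, which appears only once.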

That is, the segmentation information of the recorded voice from the segmentation module is stored in the segment database, important keywords are selected from it, and the selected keyword information is stored in the keyword database and used by the keyword spotting module whenever needed.

The keyword spotting module attaches information indicating the position of a specific keyword, for example by adding a keyword tag, so that a specific segment is presented in the user interface together with information that represents the segment's features well (S70).

The user interface module composes the user interface from the section separation information provided by the segmentation module and the position tags for specific keywords provided by the keyword spotting module, and displays it on the display device (S80). Based on the displayed section information, the apparatus moves to the voice section the user wants and plays the voice of that section (segment). As described above, multiple playback, in which previous segments are played together with the section moved to, may also be performed.

FIG. 4 is a diagram illustrating an example of a user interface according to an embodiment of the present invention. The user interface shown in FIG. 4 is only one example; the way the segments are represented and the way feature information such as keywords is represented may be varied freely.

FIG. 4 shows the segmentation results for two channels (Channel 1, Channel 2), that is, the arrangement of multiple segments section by section. Reference numeral 100 denotes the section information interface for the first channel, and reference numeral 200 denotes the section information interface for the second channel. Each section information interface is represented along two time axes 310 and 320, on which the start time and end time of each segment are shown. An element 120 indicating the playback position within the segment currently being played is displayed among the segments of the section information interface. Each segment is also displayed with an element 130 representing a keyword.

The user interface example in FIG. 4 also shows a user interface unit (400) for controlling operation of the audio device, searching, and the like. Here, reference numeral 410 is the interface for play, 420 for pause, 430 for stop, 440 for fast backward search (jump) (Fast Rewind Jump), and 450 for fast forward search (jump) (Fast Forward Jump). In parallel with the display, these interface elements are mapped to appropriate keys on the device's operating panel.

In the user interface according to the present invention, data divided (segmented) into section (segment) units is shown together with information representing the characteristics of each segment. The keyword tag is one example. This feature information is a visual representation of the acoustic and semantic characteristics of the speech (audio) signal in that portion. An example of an acoustic characteristic is the speaker's gender, male or female, determined from the main frequency and formant features. In other words, the display can indicate whether a particular section of the recorded speech signal is a male voice or a female voice. With such acoustic feature information, it becomes possible to quickly find only the sections spoken by a particular gender within the entire recording. An example of a semantic characteristic is meaning recognition using keyword spotting. For example, sentences can be distinguished by using a sentence-final particle that marks the end of a sentence as a keyword, or the positions where a specific word appears can be marked, so that the entire recording can be searched sentence by sentence or for the sections in which a specific word appears.

To search for a specific section within the entire speech (audio) recording, the user enters a section move command using the search keys (440, 450). When a section move command is entered, playback moves in segment units. That is, it moves to the start position of a particular segment and plays that segment, or it moves to the position of a particular keyword and plays the segment from that keyword position. Because positions move in segment units, the search position can be specified in a way that matches human perceptual characteristics (for example, the tendency to understand content in terms of gender, key words, and sentences), so search efficiency and speed are improved.
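The segment-unit movement described above can be sketched as a lookup over sorted segment start times and keyword positions. This is an illustrative sketch, not the specification's implementation: the function names are hypothetical, the inputs are assumed to be sorted lists of times in seconds, and the choice that a backward jump from inside a segment lands on that segment's own start is an assumption.

```python
from bisect import bisect_left, bisect_right
from typing import Dict, List, Optional


def jump_forward(starts: List[float], pos: float) -> float:
    """Return the start of the first segment that begins after pos.

    If no later segment exists, the position is left unchanged.
    """
    i = bisect_right(starts, pos)
    return starts[i] if i < len(starts) else pos


def jump_backward(starts: List[float], pos: float) -> float:
    """Return the start of the last segment that begins before pos.

    If pos is at or before the first start, the first start is returned.
    """
    i = bisect_left(starts, pos)
    return starts[i - 1] if i > 0 else starts[0]


def jump_to_keyword(keyword_times: Dict[str, List[float]],
                    keyword: str, pos: float) -> Optional[float]:
    """Return the next position after pos where the keyword was spotted, or None."""
    times = keyword_times.get(keyword, [])
    i = bisect_right(times, pos)
    return times[i] if i < len(times) else None
```

In this sketch, a key such as 450 would call `jump_forward` with the current playback time and a key such as 440 would call `jump_backward`; playback then resumes from the returned time.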

When the playback position changes because of a section move, playback at the previous position could simply be stopped and playback started from the new position; in the present invention, however, multiple segments are played simultaneously. That is, when a segment move occurs, the previous segment(s) are played at the same time as the newly selected position. When the contents of several past segments are played simultaneously in this way, a user who judged that a past segment did not contain the desired information (speech) and moved to another segment partway through can still follow the content of the past segment, and can therefore more easily tell whether that judgment was wrong.

During multi-section playback, the acoustic characteristics may be modified so that the simultaneously played segments can be distinguished more easily by ear. For example, the volume of the past segment that was already playing is gradually reduced while it is played together with the current segment for a predetermined time. The acoustic characteristics may be adjusted in various ways, not only by changing the volume but also, for example, by modulating the voice frequency. In other words, known signal processing techniques that make the segments audibly distinguishable are used.
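The volume-reduction example above can be illustrated with a simple mix in which the previous segment's gain falls while the current segment plays at full level. This is a minimal sketch under stated assumptions: samples are plain Python floats, the fade is linear, and the name `mix_with_fade` and the parameter `fade_len` are hypothetical. The specification does not prescribe a fade shape or duration.

```python
from typing import List


def mix_with_fade(current: List[float], previous: List[float], fade_len: int) -> List[float]:
    """Mix `previous` into `current` while fading `previous` out.

    The gain applied to `previous` falls linearly from 1.0 toward 0.0 over
    `fade_len` samples; after that only `current` is heard.
    """
    out = list(current)
    if fade_len <= 0:
        return out  # no overlap requested: play the current segment only
    n = min(fade_len, len(previous), len(current))
    for i in range(n):
        gain = 1.0 - i / fade_len      # linearly decreasing volume of the past segment
        out[i] += gain * previous[i]
    return out
```

Other ways of making the segments distinguishable, such as shifting the pitch of the past segment, would replace the gain step in this sketch with a different transformation.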

In the present invention, if segmentation is based on a hierarchical structure, that is, on multiple segment layers, segmentation and keyword spotting can be based on a different feature in each layer. This makes it possible to search audio sections from different viewpoints according to the user's interests or subjective preferences.

For example, suppose the first segmentation and keyword spotting are performed based on gender, the second based on sentences, and the third based on words. When searching speech (audio) sections, the user can then select any one of search by gender, search by sentence, and search by word, or a combination of two or more of them, so audio (speech) sections can be searched from different viewpoints according to the user's interests or subjective preferences.
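A combined search over several layers, as in the example above, can be sketched as an intersection of the time intervals that match in each selected layer. This is an illustrative sketch only: the layer layout as `(start, end, label)` tuples, the function names, and the use of interval intersection to combine criteria are all assumptions, since the specification does not say how a combination is evaluated.

```python
from typing import Dict, List, Optional, Tuple

Interval = Tuple[float, float]
Layer = List[Tuple[float, float, str]]   # (start, end, label) per segment


def matching(layer: Layer, label: str) -> List[Interval]:
    """Return the intervals of a layer whose label equals `label`."""
    return [(s, e) for s, e, lab in layer if lab == label]


def intersect(a: List[Interval], b: List[Interval]) -> List[Interval]:
    """Return the overlapping parts of two interval lists, sorted by start time."""
    out = []
    for s1, e1 in a:
        for s2, e2 in b:
            s, e = max(s1, s2), min(e1, e2)
            if s < e:
                out.append((s, e))
    return sorted(out)


def search(layers: Dict[str, Layer], criteria: Dict[str, str]) -> List[Interval]:
    """Return the time ranges where every selected layer carries the requested label."""
    result: Optional[List[Interval]] = None
    for name, label in criteria.items():
        hits = matching(layers[name], label)
        result = hits if result is None else intersect(result, hits)
    return result or []
```

With a gender layer and a word layer, selecting only the gender criterion returns whole speaker turns, while adding a word criterion narrows the result to the occurrences of that word inside those turns.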

All of the information described above relating to segmentation and keyword spotting may be managed separately from the corresponding audio data or may be attached to that audio data. In the former case, separate segmentation and keyword spotting data is linked to the corresponding audio data; in the latter case, the audio data contains not only the audio itself but also the segmentation and spotting data as additional information. In the former case, the segmentation and keyword spotting information may be managed as a file that includes information identifying which audio data it relates to; in the latter case, it can be managed by recording the segment section information and the keyword information of each segment in, for example, the header of the audio data.
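The first option above, a separate file that identifies the audio data it belongs to, can be sketched as a small JSON document. The JSON layout, the field names `audio_file` and `segments`, and the file name used in the usage note are all hypothetical; the specification does not define a file format.

```python
import json
from typing import Any, Dict, List


def make_sidecar(audio_file: str, segments: List[Dict[str, Any]]) -> str:
    """Serialize segmentation and keyword data together with the audio it refers to."""
    return json.dumps({"audio_file": audio_file, "segments": segments},
                      ensure_ascii=False, indent=2)


def load_sidecar(text: str, expected_audio_file: str) -> List[Dict[str, Any]]:
    """Parse a sidecar document and check that it belongs to the expected audio file."""
    data = json.loads(text)
    if data["audio_file"] != expected_audio_file:
        raise ValueError("sidecar data refers to a different audio file")
    return data["segments"]
```

For instance, `make_sidecar("memo_001.mp3", [{"start": 0.0, "end": 12.5, "keywords": ["hello"]}])` produces text that `load_sidecar` accepts only when asked for the same file name. The second option, embedding the data in the audio header, depends on the container format and is not sketched here.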

The audio section search system of the present invention divides given audio into multiple sections based on its features and searches for sections of interest section by section, so audio section search is highly efficient. Because it also provides a user interface environment built on a visual representation of the division information, a specific section can be found quickly and easily in speech and other kinds of audio content.

Claims

An audio section search method comprising:

displaying section information divided based on audio feature information; and

moving in units of the displayed sections and playing the audio of the corresponding section.

Displaying an audio search interface that is divided into section units and includes information representing the characteristics of each divided section;

moving in units of the divided sections and playing the audio signal of the corresponding section in response to an audio section search command input;

The audio section search method of claim 2, wherein the feature information is information representing an acoustic feature or a semantic feature.

The audio section search method of claim 2, wherein the feature information is added and represented as a metadata tag.

The audio section search method of claim 2, wherein the feature information is added and represented in the form of a keyword.

The audio section search method of claim 2, wherein segmentation and keyword spotting (key-word spotting) of the audio signal are performed in order to divide the audio into section units and to display the audio search interface including the information representing the characteristics of each divided section.

The audio section search method of claim 2, wherein the section division of the audio signal forms a hierarchical, multi-level structure.

The audio section search method of claim 2, wherein, when the audio section at the moved-to position is played, the audio data of the previous position and the audio data of the current position are played together.

The audio section search method of claim 2, wherein the section division of the audio signal and the information representing the characteristics of the divided sections are based on a keyword specified by the user, or are automatically extracted and set based on an analysis of the audio signal characteristics.

A data processing apparatus comprising:

a storage unit that stores audio data divided into section units based on audio feature information;

a display unit that displays section information of the audio data; and

a playback controller that selects and plays the audio data in section units according to a section search command.

The data processing apparatus of claim 10, wherein the displayed section information is displayed based on the temporal arrangement of the audio sections and on information representing the characteristics of each section.

The data processing apparatus of claim 10, wherein the feature information is added and represented in the form of a keyword expressed based on an acoustic feature or a semantic feature of each audio section.