KR102274275B1

KR102274275B1 - Application and method for generating text link

Info

Publication number: KR102274275B1
Application number: KR1020190037184A
Authority: KR
Inventors: 박정호
Original assignee: 아이피랩 주식회사
Priority date: 2019-03-29
Filing date: 2019-03-29
Publication date: 2021-07-08
Also published as: KR20200114824A

Abstract

A method for generating text links linked to a voice recording file includes the steps of: (A) generating a recording file in a conversion module and converting the voice signal of the recording file into text; (B) analyzing the recording file in a division module and dividing it into a plurality of voice section files, each containing at least one sentence or paragraph; (C) adding, in a processing module, start-time and end-time information to each divided voice section file, and inserting the utterance start time of each word converted into text as metadata of that text word; and (D) generating, in an interworking module, a per-word text link linked to the voice file by linking position information of the streaming bar of the recording file or voice section file to the converted text word, based on the metadata of the text word.

Description

Application and method for generating text link linked with voice file {APPLICATION AND METHOD FOR GENERATING TEXT LINK}

The present disclosure relates to an application and method for generating text links, and more specifically to an application and method that generate a text file corresponding to recorded speech and generate text links that jump directly to the playback position in the voice file at which a word or sentence of the converted text is uttered.

Unless otherwise indicated herein, the material described in this section is not prior art to the claims of this application, and inclusion in this section is not an admission that it is prior art.

Speech recognition refers to processing in which a computer interprets spoken human language and converts its content into text data; it is also called speech-to-text (STT). STT technology converts speech into text and outputs it to a smart terminal, in speech synthesis applications used to convert the text of computer documents into speech and in computers equipped with a speech synthesis function. The voice recognition function is an important part of multimedia and makes systems easier to use by announcing various messages and commands by sound. It is also used in voice e-mail, voice prompts, and voice recognition, as well as in pen-type character readers, ASCII character readers, and sound card substitute equipment.

In addition, speech recognition technology is applied where device control or information retrieval by voice is required, such as in robots and telematics. A representative algorithm is the hidden Markov model (HMM), in which an acoustic model is built by statistically modeling speech uttered by many speakers and a language model is built from a collected corpus.

As electronic devices have recently come to provide complex and diverse functions, the convenience of the user interface (UI), including how applications are launched, needs to be considered. Users often want to remember or emphasize a particular part while recording audio. However, browsing the content of an audio file after recording is less intuitive than browsing a video or text file, so users have considerable difficulty later finding the part they want to hear again. Noting down in advance the recording time of the part to remember or emphasize, and then searching for it later, can also be cumbersome.

For example, when an entire meeting or lecture is recorded for the record, finding exactly the section that needs to be heard again in a long recording is difficult and tedious. In particular, when a lecture has been recorded, the parts emphasized by the speaker often have to be replayed several times for study. Each time, however, it is not easy for the user to locate the important section and position the streaming bar precisely by hand.

1. Korean Patent Publication No. 10-2018-0128653 (2018.12.04)
2. Korean Patent Publication No. 10-2018-0133195 (2018.12.13)

Provided are an application and method for generating text links linked with a voice file, which convert a recorded voice file into text and link each paragraph, sentence, or word in the text to the part of the recording where that text is played, so that when the user touches the text they want to hear, the streaming bar moves immediately to the point in the recording at which that text is played.

A text link generation application linked to a voice recording file according to one embodiment includes: a conversion module that generates a recording file and converts the voice signal of the recording file into text; a division module that analyzes the recording file and divides it into a plurality of voice section files, each containing at least one sentence or paragraph; a processing module that adds start-time and end-time information to each divided voice section file and inserts the utterance start time of each word converted into text as metadata of that text word; and an interworking module that generates a per-word text link linked to the voice file by linking, based on the metadata of the text word, position information indicating the playback point on the streaming bar of the recording file or voice section file.

A method for generating text links linked to a voice recording file according to another embodiment includes the steps of: (A) generating a recording file in a conversion module and converting the voice signal of the recording file into text; (B) analyzing the recording file in a division module and dividing it into a plurality of voice section files, each containing at least one sentence or paragraph; (C) adding, in a processing module, start-time and end-time information to each divided voice section file, and inserting the utterance start time of each word converted into text as metadata of that text word; and (D) generating, in an interworking module, a per-word text link linked to the voice file by linking position information of the streaming bar of the recording file or voice section file to the converted text word, based on the metadata of the text word.

The application and method for generating text links linked with a voice file described above can convert speech recorded over a long period into text and accurately extract the streaming position at which the text touched by the user is uttered. The user can thus conveniently find the part of the recording where the desired text was recorded and easily replay it.

According to the embodiments, recordings such as meeting minutes and lecture notes no longer need to be typed by hand, and when studying from a recording the user no longer needs to hunt for the part to be heard again, which enables efficient study and work.

The effects of the present invention are not limited to those described above and should be understood to include all effects that can be inferred from the configuration of the invention described in the detailed description or the claims.

FIG. 1 is a diagram for explaining the function of a text link generation application according to an embodiment.
FIG. 2 is a diagram illustrating the data processing blocks of a text link generation application according to an embodiment.
FIG. 3 is a diagram for explaining an example of the operation of a text link generation application.
FIG. 4 is a diagram illustrating the data processing flow of a text link generation method according to an embodiment.
FIG. 5 is a diagram for explaining an example of using an application for generating text links linked to a voice recording file according to an embodiment.
FIG. 6 is a diagram for explaining another example of using an application for generating text links linked to a voice recording file according to an embodiment.
FIG. 7 is a diagram for explaining yet another example of using a text link generation application according to an embodiment.

The advantages and features of the present invention, and the methods of achieving them, will become apparent from the embodiments described in detail below together with the accompanying drawings. The present invention is not, however, limited to the embodiments disclosed below and may be implemented in various different forms. The embodiments are provided only so that the disclosure of the present invention is complete and so that a person of ordinary skill in the art to which the invention pertains is fully informed of the scope of the invention, which is defined only by the scope of the claims. Like reference numerals refer to like elements throughout the specification.

In describing the embodiments of the present invention, detailed descriptions of well-known functions or configurations are omitted where they could unnecessarily obscure the gist of the invention. The terms used below are defined in consideration of their functions in the embodiments and may vary according to the intention or practice of users and operators. Their definitions should therefore be based on the content of this specification as a whole.

FIG. 1 is a diagram for explaining the function of a text link generation application according to an embodiment.

Referring to FIG. 1, the text link generation application according to the embodiment may be installed on a portable digital device such as a smartphone, smartwatch, smart pad, or notebook computer and used together with a voice recording function. The application records speech in situations such as lectures, meetings, and speeches, and converts it into text using voice recognition and speech-to-text (STT) technology. The application may generate both the voice recording file and a text file containing the converted text, or it may divide and analyze an independently generated recording file and convert it into text.

Thereafter, when a user of the text link generation application touches a particular word or sentence in the text, the streaming bar moves to the point at which the touched word or sentence is played, using the streaming time information of the recording file linked to that word or sentence. Previously, users who studied with a recording and a text file had to move the streaming bar themselves to listen repeatedly to the part containing important content. It is difficult to position the streaming bar of an audio file by finger with minute-and-second accuracy. Because users usually cannot finely adjust the bar, they often placed it some ten seconds before the desired word or sentence and ended up listening to unrelated content as well. In the embodiment, by contrast, when the user selects the text of a word or sentence (S1) to hear again, the position information of the streaming bar linked to the selected text (S1) allows the streaming bar to be moved precisely to the part (S2) where the selected text is played. The part to be heard again can thus be found accurately and quickly in a long recording, so that learning efficiency with audiovisual content can be maximized.

FIG. 2 is a diagram illustrating the data processing blocks of a text link generation application according to an embodiment.

Referring to FIG. 2, the text link generation application according to the embodiment may include a conversion module 110, a division module 130, a processing module 150, and an interworking module 170. The term 'module' as used herein should be construed as covering software, hardware, or a combination of the two, depending on the context in which it is used. For example, the software may be machine code, firmware, embedded code, or application software. The hardware may be a circuit, a processor, a computer, an integrated circuit, an integrated circuit core, a sensor, a micro-electro-mechanical system (MEMS), a passive device, or a combination thereof.

The conversion module 110 generates a recording file and converts the recorded voice signal into text. In the embodiment, the conversion module 110 may communicate with a voice recognition data server to convert the recorded speech into text. The conversion module 110 may remove noise other than the voice from the recording file before performing voice recognition and text conversion, which can improve the accuracy of both. The conversion module 110 may also perform a translation function: when a foreign language is recorded, it may translate the foreign-language recording into a language designated by the user and output the result as text. For example, when English, Japanese, or Chinese speech is recorded, the conversion module 110 may convert it into original-language text, or into text translated into Korean or another language designated by the user. For translation and text conversion, the conversion module 110 can communicate with an external translation server and carry out the translation process using data from that server.

The division module 130 divides the recording file into voice section files. For example, the division module 130 analyzes the voice signal in the recording file and divides it into a plurality of voice section files, each containing at least one sentence or paragraph. Specifically, it may identify the end of a sentence or the end of a paragraph by recognizing linguistic meaning or by speech recognition technology, and generate the voice section files accordingly.

In the embodiment, the division module 130 may also analyze the recording file and divide it according to a preset time or a preset number of uttered sentences to generate the voice section files. For example, before recording, the user may select a recording mode such as interview mode, lecture mode, meeting mode, or conversation mode, and the recording file is divided differently depending on the mode. In interview mode, for instance, each voice section file may contain one question and its answer. The division time and the number of sentences can be specified directly by the user and may be set differently for each recording mode.
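The two division rules described above (a fixed number of sentences, or a preset time) can be sketched as follows. This is a minimal illustration rather than the patented implementation: the `Sentence` and `Section` records, the `SENTENCES_PER_MODE` defaults, and the assumption that sentence boundaries and their times are already known are all hypothetical. Treating an interview question and its answer as two consecutive sentences is likewise a simplification.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Sentence:
    text: str
    start: float  # seconds from the start of the recording
    end: float


@dataclass
class Section:
    number: int   # sequential voice-section number
    start: float
    end: float
    sentences: List[Sentence]


# Hypothetical defaults: sentences per section for each recording mode.
SENTENCES_PER_MODE = {"interview": 2, "lecture": 5, "meeting": 3, "conversation": 4}


def _close(sections: List[Section], group: List[Sentence]) -> None:
    """Append a section spanning the first to the last sentence of group."""
    sections.append(
        Section(len(sections) + 1, group[0].start, group[-1].end, list(group))
    )


def split_by_count(sentences: List[Sentence], per_section: int) -> List[Section]:
    """Start a new section after every per_section sentences."""
    if per_section < 1:
        raise ValueError("per_section must be at least 1")
    sections: List[Section] = []
    for i in range(0, len(sentences), per_section):
        _close(sections, sentences[i:i + per_section])
    return sections


def split_by_duration(sentences: List[Sentence], max_seconds: float) -> List[Section]:
    """Start a new section when adding a sentence would exceed max_seconds."""
    sections: List[Section] = []
    group: List[Sentence] = []
    for s in sentences:
        if group and s.end - group[0].start > max_seconds:
            _close(sections, group)
            group = []
        group.append(s)
    if group:
        _close(sections, group)
    return sections
```

Both functions cut only at sentence boundaries, so a section never ends in the middle of a sentence even when the time limit is reached.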

In addition, the division module 130 according to the embodiment may use the utterance start time information of the converted text words to generate voice section files by dividing the recording file according to the user's settings and the recording mode. It may also use the metadata of each converted text word to generate a voice section file containing text selected by the user.

The processing module 150 adds start-time and end-time information to each divided voice section file, and adds the utterance start time of each word converted into text as metadata of that text word. For each word in the recording, its utterance start time serves as information unique to that word. In the embodiment, when the recording file is generated, the time at which each word begins to be uttered may be designated as metadata, and this time information may be attached to the text of each word.
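One way to picture this metadata is as a small record per word and per voice section. The sketch below is illustrative only: the names `WordMeta`, `SectionMeta`, `attach_word_metadata`, `section_for`, and `to_json`, the use of seconds as the time unit, and the JSON serialization are assumptions, since the specification does not define a storage format.

```python
import json
from dataclasses import dataclass, asdict
from typing import List, Tuple


@dataclass(frozen=True)
class WordMeta:
    text: str
    start: float  # utterance start time, in seconds from the start of the recording


@dataclass(frozen=True)
class SectionMeta:
    number: int
    start: float  # start time of the voice section
    end: float    # end time of the voice section


def attach_word_metadata(recognized: List[Tuple[str, float]]) -> List[WordMeta]:
    """Turn (word, start time) pairs from a recognizer into word records."""
    return [WordMeta(text, float(start)) for text, start in recognized]


def section_for(word: WordMeta, sections: List[SectionMeta]) -> SectionMeta:
    """Find the voice section whose time range contains the word's start time."""
    for section in sections:
        if section.start <= word.start < section.end:
            return section
    raise LookupError(f"no section contains t={word.start}")


def to_json(words: List[WordMeta]) -> str:
    """Serialize word metadata so it can be stored alongside the text."""
    return json.dumps([asdict(w) for w in words], ensure_ascii=False)
```

Because every word carries only its own start time, the same records can be resolved against either the full recording or an individual voice section.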

The interworking module 170 uses the metadata, that is, the utterance time information of each text word, to link each text word to position information on the streaming bar of the recording file or voice section file. As a result, when a particular word in the converted text is touched, the streaming bar can be moved directly to the position at which the touched word is played. Specifically, because each converted text word or sentence is linked to the streaming bar position at which it is played, when the user selects a particular word or sentence, the streaming bar can be moved directly to the point at which utterance of the selected text begins.

In the embodiment, text links may include word text links and sentence text links. A word text link contains the streaming bar position at which the converted text word is output as audio, so that when the user selects a converted text word, the streaming bar moves to the point at which utterance of the selected word begins.

A sentence text link links streaming bar position information to a sentence of the converted text. Specifically, the sentence text link is linked to the streaming bar position at which the first word of the sentence is uttered, so that when the user selects a sentence, the streaming bar moves to the utterance time of that first word.
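The difference between the two link types can be sketched in a few lines. This is a hedged illustration: `Word`, `Player`, `word_link`, and `sentence_link` are hypothetical names, and `Player` is a stand-in that only tracks the streaming bar position rather than a real audio API.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Word:
    text: str
    start: float  # utterance start time in seconds


class Player:
    """Stand-in for an audio player; it only tracks the streaming bar."""

    def __init__(self, duration: float) -> None:
        self.duration = duration
        self.position = 0.0

    def seek(self, seconds: float) -> None:
        # Clamp so the bar never leaves the recording.
        self.position = min(max(seconds, 0.0), self.duration)

    @property
    def bar_fraction(self) -> float:
        """Streaming bar position as a fraction of the total length."""
        return self.position / self.duration if self.duration else 0.0


def word_link(word: Word) -> float:
    """A word link targets the word's own utterance start time."""
    return word.start


def sentence_link(sentence: List[Word]) -> float:
    """A sentence link targets the start time of the sentence's first word."""
    if not sentence:
        raise ValueError("sentence has no words")
    return sentence[0].start
```

A touch handler would then call `player.seek(word_link(w))` or `player.seek(sentence_link(s))` depending on what the user selected.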

FIG. 3 is a diagram for explaining an example of the operation of a text link generation application.

In the embodiment, when the user runs the recording function in a given situation, the text link generation application automatically generates a plurality of voice section files (voice recording 05, 06, 07, ...) after recording, according to voice analysis and user settings. The generated voice section files are numbered sequentially in order of recording time.

The recording file or voice section file is converted into text in the language set by the user. In the embodiment, when English is recorded, it may be converted directly into English text or translated and converted into text in a language designated by the user. As shown in FIG. 3, the number of each divided voice section file is displayed in the corresponding section of the converted text. The voice section file identified by the number shown in the converted text (voice recording 05) contains the recorded speech corresponding to that text.

The text link generation method is described step by step below. Because the operation (function) of the method according to the embodiment is essentially the same as that of the text link generation application, descriptions that overlap with FIGS. 1 to 3 are omitted.

FIG. 4 is a diagram illustrating the data processing flow of a text link generation method according to an embodiment.

In step S410, the conversion module generates a recording file, and in step S430 the voice signal of the recording file is converted into text. In the embodiment, if a foreign language was recorded in step S410, the speech may be translated in step S430 and converted into text in the language set by the user. In addition, noise other than the voice may be removed from the recorded voice file in step S410, and the noise-free voice file may then be analyzed and converted into text in step S430.

In step S410, while speech is being converted into text, the utterance start time of each word being converted may be extracted and attached as metadata to the corresponding converted word.

In step S450, the division module analyzes the recording file and divides it into a plurality of voice section files, each containing at least one sentence or paragraph; the processing module then adds start-time and end-time information to each divided voice section file and inserts the utterance start time of each word converted into text as metadata of that text word. In the embodiment, voice section files may be generated sequentially for each group of sentences or each paragraph through voice analysis of the recording file, or the recording file may be divided according to a preset time or a preset number of uttered sentences.

In the embodiment, when the user selects a portion of the converted text, a voice section file is generated for the playback part containing the selected text. Specifically, the per-word metadata, namely the time at which the first word of the selection is uttered and the time at which the last word of the selection is uttered, is used to extract from the recording file a voice section file containing the selected text.
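The time range of such a selection can be derived from the per-word start times alone, as in the sketch below. The specification stores only the start time of each word, so this example makes an explicit assumption about where the selection ends: it uses the start time of the word following the selection, or the end of the recording if the selection reaches the last word. The function names and the representation of audio as a sequence of samples are hypothetical.

```python
from typing import Sequence, Tuple


def selection_range(
    word_starts: Sequence[float], first: int, last: int, duration: float
) -> Tuple[float, float]:
    """Return (start, end) in seconds for words first..last inclusive.

    Assumption: the last selected word ends where the next word starts,
    or at the end of the recording when no word follows.
    """
    if not 0 <= first <= last < len(word_starts):
        raise IndexError("selection is outside the transcript")
    start = word_starts[first]
    end = word_starts[last + 1] if last + 1 < len(word_starts) else duration
    return start, end


def cut_samples(
    samples: Sequence[int], rate: int, start: float, end: float
) -> Sequence[int]:
    """Slice the samples covering [start, end) at the given sample rate."""
    return samples[int(start * rate):int(end * rate)]
```

The resulting slice could be written out as a new voice section file or simply played on repeat; either use is outside the scope of this sketch.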

In step S470, the interworking module generates a per-word text link linked to the voice file by linking position information of the streaming bar of the recording file or voice section file, based on the metadata of the text word. In the embodiment, the streaming bar position information may be linked to each text word based on the utterance start time of each word in the text, thereby generating a text link for each word.

FIG. 5 is a diagram for explaining an example of using an application for generating text links linked to a voice recording file according to an embodiment.

In the embodiment, the recorded voice file is converted into text (10) and output. Voice section files may be generated automatically according to the recording mode and user settings, or the playback time in the recording file may be displayed at the end of each paragraph or sentence. When the user touches the displayed time at which a particular paragraph was recorded, the streaming bar may be automatically adjusted to the playback position at which the first word of the paragraph linked to that time display is uttered. Likewise, when the user touches a particular sentence (20), the streaming bar moves to the playback time at which the first word of the sentence is uttered.

FIG. 6 is a diagram for explaining another example of using an application for generating text links linked to a voice recording file according to an embodiment.

As shown in FIG. 6, in an embodiment, the utterance start time (T1, T2) may be displayed for every converted text sentence, and the position information of the recording file's streaming bar may be linked to the area where each time is displayed. When the user touches the utterance start time of a particular sentence, the streaming bar of the recording file automatically moves to the touched time so that the user can listen to the selected sentence. If the converted text is a translation, touching a particular sentence moves playback to the utterance time of the corresponding sentence in the recorded original.
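As an illustration of the time labels and the translated-text case, the sketch below assumes that sentence start times are kept in a list and that a translated sentence is mapped to its original sentence through an index table. The `alignment` mapping, the `mm:ss` label format, and both function names are assumptions; the specification does not say how translated sentences are aligned with the original or how T1 and T2 are rendered.

```python
def format_time_label(seconds):
    """Render an utterance start time as an mm:ss label (assumed display format)."""
    minutes, secs = divmod(int(seconds), 60)
    return f"{minutes:02d}:{secs:02d}"


def seek_time_for_sentence(index, original_starts, alignment=None):
    """Return the start time to seek to when the sentence at `index` is touched.

    `alignment` maps a translated sentence index to the index of the
    corresponding original sentence; when omitted, the text is treated as
    untranslated and the index is used directly.
    """
    source_index = alignment[index] if alignment is not None else index
    return original_starts[source_index]


original_starts = [0.0, 75.4, 131.0]
alignment = {0: 0, 1: 2, 2: 1}  # hypothetical: translation reorders two sentences

label = format_time_label(original_starts[1])
translated_seek = seek_time_for_sentence(1, original_starts, alignment)
plain_seek = seek_time_for_sentence(1, original_starts)
```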

FIG. 7 is a diagram illustrating yet another example of using the text link generation application according to an embodiment.

Referring to FIG. 7, in an embodiment, after the recording file is converted into text, when the user selects a particular paragraph or a plurality of sentences in the text, the portion in which the selected text is played can be extracted from the streaming bar using the metadata. The user can then conveniently listen to the extracted portion on automatic repeat. If the converted text is a translation, the embodiment extracts the playback portion corresponding to the selected text from the original recording file. The automatic extraction function provided in the embodiment is highly useful for language learning.
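The extraction of a playback portion from word metadata can be sketched as follows. The metadata described in the specification holds only utterance start times, so this sketch assumes that the selected portion ends where the next word begins, or at the end of the recording if the selection includes the final word. That end-point rule and the name `selection_range` are assumptions for illustration.

```python
def selection_range(word_starts, first, last, duration_sec):
    """Return (start_sec, end_sec) covering words first..last inclusive.

    word_starts: utterance start time of every word, in order.
    The end is taken as the next word's start time (assumed rule), or the
    recording duration when the selection reaches the last word.
    """
    if not 0 <= first <= last < len(word_starts):
        raise IndexError("selection out of range")
    start_sec = word_starts[first]
    if last + 1 < len(word_starts):
        end_sec = word_starts[last + 1]
    else:
        end_sec = duration_sec
    return start_sec, end_sec


word_starts = [0.0, 0.4, 1.1, 1.9]
middle = selection_range(word_starts, 1, 2, duration_sec=3.0)
tail = selection_range(word_starts, 2, 3, duration_sec=3.0)
```

A player could loop between the two returned times to provide the automatic repeat listening described above.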

The application and method for generating a text link associated with a voice file, as described above, can automatically convert a long voice recording into text and extract the streaming position at which the text touched by the user is uttered. The user can thus accurately locate the playback position to be heard again in the recording file and replay it easily.

In addition, the embodiment removes the need to manually transcribe recordings such as meeting minutes and lecture notes, and, when studying with a recording file, the user no longer needs to search through the recording for the part to be heard again, so work and learning efficiency can be improved.

The disclosed content is merely illustrative and may be modified and practiced in various ways by those of ordinary skill in the art without departing from the gist of the claims; the scope of protection of the disclosed content is therefore not limited to the specific embodiments described above.

Claims

Deleted

A method of generating a text link connected to a voice recording file, the method comprising:
(A) generating, by a conversion module, a recording file and converting the voice signal of the recording file into text;
(B) analyzing, by a division module, the recording file and dividing it, according to a preset time or the number of uttered sentences, into a plurality of voice section files each including at least one sentence or paragraph, wherein the division module divides the recording file differently according to a recording mode selected by the user before recording from among recording modes including an interview mode, a lecture mode, a conference mode, and a conversation mode;
(C) adding, by a processing module, start time and end time information of each divided voice section file to that voice section file, and inserting the utterance start time of each word converted into text as metadata of the converted text word; and
(D) generating, by an interworking module, a per-word text link associated with the voice file by linking the position information of the streaming bar of the recording file or voice section file to the converted text word based on the metadata of the text word.

The method of claim 5, wherein step (A) of generating a recording file in the conversion module and converting the voice signal of the recording file into text comprises:
extracting utterance time information of each word converted into text; and
adding the extracted utterance time information of each word, as metadata, to each of the words converted into text.

Deleted

The method of claim 5, wherein step (B) of analyzing the recording file in the division module and dividing it into a plurality of voice section files each including at least one sentence or paragraph comprises,
when a portion of the converted text is selected, generating a voice section file of the playback portion that contains the selected text.

The method of claim 8, wherein step (B) of analyzing the recording file in the division module and dividing it into a plurality of voice section files each including at least one sentence or paragraph comprises,
when a portion of the converted text is selected, extracting from the recording file the voice section file that contains the selected text, using the per-word metadata, namely the time at which the first word of the selected text is uttered and the time at which the last word of the selected text is uttered.

The method of claim 5, wherein step (D) of generating, in the interworking module, a per-word text link associated with the voice file by linking the position information of the streaming bar of the recording file or voice section file to the converted text word based on the metadata of the text word comprises
generating the per-word text link by linking the position information of the streaming bar to each text word based on the utterance start time information of each word included in the text.

The method of claim 5, wherein step (A) comprises:
removing noise other than the voice from the recorded voice file; and
analyzing the noise-removed voice file and converting it into text.