KR102529241B1

KR102529241B1 - Evaluation method for consecutive interpretation training, recording medium and device for performing the method

Info

Publication number: KR102529241B1
Application number: KR1020200137859A
Authority: KR
Inventors: 이주리애; 박혜경; 고유정; 김영주; 김혜지; 추지온; 상우연
Original assignee: 이화여자대학교 산학협력단
Priority date: 2020-10-22
Filing date: 2020-10-22
Publication date: 2023-05-08
Also published as: KR102529241B9; KR20220053412A

Abstract

An evaluation method for consecutive interpretation training includes: switching automatically to a recording mode that records the interpretation after a segment to be interpreted has been played from the source audio; counting an interpretation start delay from the moment the recording mode starts until the interpretation begins; recognizing and counting, through waveform analysis of the recorded interpretation audio, blank sections after the interpretation start delay as silent sections; generating, for each interpreted segment, a transcription file in which the interpretation start delay, silences, and fillers are marked in the text, by applying a filler prediction model that identifies fillers in the non-silent sections of the recorded interpretation audio; and deriving results by statistically processing the transcription file. The method thereby provides immediate and reliable evaluation feedback in consecutive interpretation training.

Description

EVALUATION METHOD FOR CONSECUTIVE INTERPRETATION TRAINING, RECORDING MEDIUM AND DEVICE FOR PERFORMING THE METHOD

The present invention relates to an evaluation method for consecutive interpretation training, and to a recording medium and a device for performing the method. More particularly, it relates to technology that identifies and visualizes the interpretation start delay for consecutive interpretation training and provides statistical analysis for scoring transcription files and for self-assessment.

As smartphone and wireless Internet technologies have advanced, online learning programs that are not bound by time or place have emerged. In foreign language learning in particular, online learning has attracted considerable attention because it offers instruction matched to the learner's level at relatively low cost.

Interpretation training, meanwhile, depends on constant repetitive practice and appropriate feedback. Online programs that allow self-study without constraints of time or place have accordingly been released. Even with the programs currently available, however, it remains difficult to convert recorded interpretation output into text and compare it with the source text.

Specifically, for learners and instructors alike to check both the verbal and non-verbal aspects when evaluating interpretation training, a transcription of the interpretation is needed in addition to the interpretation audio file.

An interpretation transcription file is needed to assess the accuracy of the content, including mistranslations, omissions, and fillers. Converting audio files to text, however, costs both instructors and learners so much time and effort that too little time remains for actually evaluating the learners' interpretations.

Moreover, the interpretation start delay cannot be grasped intuitively from the interpretation audio file alone. In consecutive interpretation, the interpretation should begin as soon as possible after the speaker finishes, so the point at which interpretation starts after the source audio is played is a very important criterion for judging whether the interpretation was performed smoothly.

The interpretation audio file submitted by the learner does not by itself reveal when the speaker's utterance ended. Interpretation and translation departments therefore currently determine the interpretation start time in one of the following two ways.

In the first, the instructor has the learner start recording as soon as the source text ends, then plays the recording and measures the silence before the interpretation begins. Because the moment at which the learner starts recording is imprecise, accurate measurement is difficult. In the second, the source audio is recorded together with the interpretation audio so that the starting point can be identified, but this makes it hard to concentrate on the learner's interpretation. This approach is also inaccurate because the instructor must do the counting manually.

In addition, objective evaluation metrics are currently hard to obtain when evaluating interpretation training. Evaluation takes a long time because the instructor must compare and score the interpretation transcription files of all learners. Learners, for their part, cannot know the exact evaluation criteria and cannot easily compare themselves with other learners.

Existing services such as the Google STT API and the Naver STT API do not indicate the position or length of silent sections in the resulting text. They also omit fillers (such as 'eum', 'geu', and 'eo'), and they output partially corrected text rather than a verbatim rendering of the speech.

KR 10-1438088 B1; JP 2010-282058 A; KR 10-1438087 B1

The technical problem addressed by the present invention arises from these points. An object of the present invention is to provide an evaluation method for consecutive interpretation training that immediately derives quantitative data for interpretation feedback during interpretation practice.

Another object of the present invention is to provide a recording medium on which a computer program for performing the evaluation method for consecutive interpretation training is recorded.

Yet another object of the present invention is to provide a device for performing the evaluation method for consecutive interpretation training.

An evaluation method for consecutive interpretation training according to an embodiment for achieving the above object of the present invention includes: switching automatically to a recording mode that records the interpretation after a segment to be interpreted has been played from the source audio; counting an interpretation start delay from the moment the recording mode starts until the interpretation begins; recognizing and counting, through waveform analysis of the recorded interpretation audio, blank sections after the interpretation start delay as silent sections; generating, for each interpreted segment, a transcription file in which the interpretation start delay, silences, and fillers are marked in the text, by applying a filler prediction model that identifies fillers in the non-silent sections of the recorded interpretation audio; and deriving results by statistically processing the transcription file.

In an embodiment of the present invention, generating the transcription file may include: detecting filler words with the filler prediction model and tagging each item as filler or non-filler; and classifying the fillers by type.

In an embodiment of the present invention, in the step of tagging as filler or non-filler, when a non-silent section of the recorded interpretation audio is shorter than a preset input layer length, the section may be fed directly to the filler/non-filler prediction model.

In an embodiment of the present invention, in the step of tagging as filler or non-filler, when a non-silent section of the recorded interpretation audio is equal to or longer than the preset input layer length, the section may be split repeatedly until it is shorter than the input layer length.

In an embodiment of the present invention, in the step of tagging as filler or non-filler, a non-silent section of the recorded interpretation audio may be fed to the filler/non-filler prediction model at the point when it becomes shorter than the preset input layer length.

In an embodiment of the present invention, generating the transcription file may include: for a blank, outputting the blank length as n seconds; for a filler, converting it to the text of that filler, applying statistical processing, and outputting it; and for words other than fillers, converting them to text through the API of a designated database and outputting them.

In an embodiment of the present invention, the evaluation method for consecutive interpretation training may repeat the above steps for each segment to be interpreted in the source text.

In an embodiment of the present invention, the evaluation method for consecutive interpretation training may further include preprocessing the recorded interpretation audio by removing noise.

A computer-readable storage medium according to an embodiment for achieving another object of the present invention has recorded on it a computer program for performing the evaluation method for consecutive interpretation training.

An evaluation device for consecutive interpretation training according to an embodiment for achieving yet another object of the present invention includes: a recording unit that switches automatically to a recording mode for recording the interpretation after a segment to be interpreted has been played from the source audio; a delay time measurement unit that counts an interpretation start delay from the moment the recording mode starts until the interpretation begins; a silence determination unit that recognizes and counts, through waveform analysis of the recorded interpretation audio, blank sections after the interpretation start delay as silent sections; a transcription file generation unit that generates, for each interpreted segment, a transcription file in which the interpretation start delay, silences, and fillers are marked in the text, by applying a filler prediction model that identifies fillers in the non-silent sections of the recorded interpretation audio; and a result output unit that derives results by statistically processing the transcription file.

According to this evaluation method for consecutive interpretation training, a transcription file is produced immediately, without the learner having to listen to the interpretation audio file repeatedly in order to provide a transcription. The time and effort that instructors and learners spend producing transcriptions of interpretation audio are thereby greatly reduced.

The method also leaves the instructor sufficient time for qualitative evaluation of the learner's interpretation, and it supports smooth evaluation by measuring the interpretation start delay, an important element of interpretation assessment. Furthermore, it provides statistical analysis data for evaluating the learner's interpretation, which supports the instructor's objective judgment and the learner's self-assessment.

FIG. 1 is an overall conceptual diagram of a smart learning system for consecutive interpretation training according to the present invention.
FIG. 2 is a block diagram of an evaluation device for consecutive interpretation training according to an embodiment of the present invention.
FIG. 3 is a diagram for explaining how the interpretation start delay is counted in the present invention.
FIG. 4 is a diagram for explaining source audio slicing and automatic recording for each interpretation segment in the present invention.
FIG. 5 is a diagram for explaining the splitting of one syllable in the present invention.
FIG. 6 is a diagram for explaining the separation into blank and non-blank sections through waveform analysis in the present invention.
FIG. 7 is a diagram for explaining the separation into fillers and other words through filler detection in the present invention.
FIG. 8 is a diagram for explaining how filler types are distinguished through filler classification in the present invention.
FIG. 9 is a graph showing simulation results evaluating the performance of the present invention.
FIG. 10 is a flowchart of an evaluation method for consecutive interpretation training according to an embodiment of the present invention.

The detailed description of the present invention that follows refers to the accompanying drawings, which show by way of illustration specific embodiments in which the invention may be practiced. These embodiments are described in sufficient detail to enable those skilled in the art to practice the invention. It should be understood that the various embodiments of the present invention, although different from one another, are not necessarily mutually exclusive. For example, a particular shape, structure, or characteristic described herein in connection with one embodiment may be implemented in another embodiment without departing from the spirit and scope of the invention. It should also be understood that the position or arrangement of individual components within each disclosed embodiment may be changed without departing from the spirit and scope of the invention. The following detailed description is therefore not to be taken in a limiting sense, and the scope of the present invention, if properly described, is limited only by the appended claims together with the full range of equivalents to which those claims are entitled. In the drawings, like reference numerals denote the same or similar functions throughout the several aspects.

Hereinafter, preferred embodiments of the present invention are described in more detail with reference to the drawings.

FIG. 1 is an overall conceptual diagram of a smart learning system for consecutive interpretation training according to the present invention. FIG. 2 is a block diagram of an evaluation device for consecutive interpretation training according to an embodiment of the present invention.

The evaluation device 10 for consecutive interpretation training according to the present invention (hereinafter, the device) is smart learning technology for interpretation training that automatically records the interpretation audio as soon as the source audio ends, calculates the interpretation start delay of the interpretation audio file for each segment, and generates and provides a transcription file of the interpretation audio file.

Referring to FIG. 1, the smart learning system 1 receives an interpretation audio file as input and passes it through a preprocessing module 11. The preprocessing module 11 removes noise and extracts features of the speech using Mel Frequency Cepstral Coefficients (MFCC).

Next, a model module 13 determines whether the extracted speech features are blank or non-blank, and classifies non-blank speech as filler or non-filler. Fillers are further classified by type. The classified fillers and non-fillers are tagged and can be used as training data in an existing database or a new database 15.

Through this system, the present invention generates a transcription file of the interpretation audio and makes it possible to evaluate the interpretation.

Referring to FIG. 2, the device 10 according to the present invention includes a recording unit 110, a delay time measurement unit 130, a silence determination unit 150, a transcription file generation unit 170, and a result output unit 190. The device 10 may form part of the smart learning system 1 of FIG. 1.

Software (an application) for performing evaluation for consecutive interpretation training may be installed and executed on the device 10, and the recording unit 110, the delay time measurement unit 130, the silence determination unit 150, the transcription file generation unit 170, and the result output unit 190 may be controlled by that software running on the device 10.

The device 10 may be a separate terminal or a module forming part of a terminal. The recording unit 110, the delay time measurement unit 130, the silence determination unit 150, the transcription file generation unit 170, and the result output unit 190 may be formed as an integrated module or as one or more modules. Conversely, each component may be implemented as a separate module.

The device 10 may be mobile or stationary. It may take the form of a server or an engine, and may be referred to by other terms such as device, apparatus, terminal, user equipment (UE), mobile station (MS), wireless device, or handheld device.

The device 10 can run or build various software on the basis of an operating system (OS), that is, a system. The operating system is a system program that enables software to use the hardware of the device, and may include mobile operating systems such as Android OS, iOS, Windows Mobile OS, Bada OS, Symbian OS, and BlackBerry OS as well as computer operating systems such as the Windows family, the Linux family, the Unix family, MAC, AIX, and HP-UX.

The recording unit 110 plays a segment to be interpreted from the source audio and then automatically switches to a recording mode that records the interpretation. Here, a segment is an arbitrary unit of the source text to be interpreted and may be defined by a user (for example, a professor) as one sentence, one paragraph, one page, several sentences, and so on.

The recording unit 110 slices the source audio so that the interpretation is recorded automatically for each segment of the source text. That is, when a designated segment ends, the device automatically enters the interpretation recording mode, and the interpretation time is counted from that moment.

The delay time measurement unit 130 counts the interpretation start delay from the moment the recording mode starts until the interpretation begins.

Referring to FIG. 3, the time interval between the source audio and the interpretation audio is defined as the interpretation start delay, and subsequent blank sections are recognized and counted as silent sections.

Referring to FIG. 4, the source audio is sliced into segments by first separating it into blank and non-blank sections. When a blank section in the file is longer than a set threshold, the start time of that blank section is taken as the end of the current segment, and its end time is taken as the start of the next segment.
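The slicing rule described for FIG. 4 can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the invention: the function name `slice_segments`, the representation of blank sections as `(start, end)` pairs in seconds, and the parameter names are all hypothetical.

```python
from typing import List, Tuple

Interval = Tuple[float, float]  # (start_sec, end_sec)


def slice_segments(blanks: List[Interval], total_sec: float,
                   threshold_sec: float) -> List[Interval]:
    """Cut [0, total_sec] into segments at blank sections longer than the threshold.

    The start of a long blank closes the current segment; its end opens the next.
    """
    segments: List[Interval] = []
    seg_start = 0.0
    for blank_start, blank_end in sorted(blanks):
        if blank_end - blank_start > threshold_sec:
            if blank_start > seg_start:
                segments.append((seg_start, blank_start))
            seg_start = blank_end
    if total_sec > seg_start:
        segments.append((seg_start, total_sec))
    return segments


segments = slice_segments([(2.0, 2.3), (5.0, 6.5), (9.0, 10.2)],
                          total_sec=12.0, threshold_sec=1.0)
print(segments)
```

Blank sections shorter than the threshold (here the 0.3-second pause) stay inside a segment, so ordinary pauses within a sentence do not end the segment.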

This also enables automatic recording for each interpretation segment: a sliced passage is played to the learner, and recording of the interpretation starts the moment the source audio ends. The interpretation start delay can be measured as well. The length of the first blank section in the interpretation audio file for each segment is taken as the interpretation start delay.
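As a small sketch of this rule, the first blank section of a segment's recording can be separated from the later ones. The function name and the `(start, end)` representation are assumptions made for illustration; the sketch also assumes that recording starts at time 0.0, so that a blank beginning at 0.0 is the start delay.

```python
from typing import List, Tuple

Interval = Tuple[float, float]  # (start_sec, end_sec)


def split_delay_and_silences(blanks: List[Interval]) -> Tuple[float, List[Interval]]:
    """Return (interpretation start delay, later blank sections treated as silences)."""
    ordered = sorted(blanks)
    if ordered and ordered[0][0] == 0.0:
        first_start, first_end = ordered[0]
        return first_end - first_start, ordered[1:]
    # The learner started speaking immediately: no start delay.
    return 0.0, ordered


delay, silences = split_delay_and_silences([(0.0, 1.8), (4.0, 4.6)])
print(delay, silences)
```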

The silence determination unit 150 analyzes the waveform of the recorded interpretation audio and recognizes and counts blank sections after the interpretation start delay as silent sections. FIG. 6 is a diagram for explaining the separation into blank and non-blank sections through waveform analysis in the present invention. Referring to FIG. 6, blank sections and non-blank sections are separated through waveform analysis.
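The source does not specify how the waveform analysis is carried out. One common approach, shown here purely as an assumption, is to compute the root-mean-square (RMS) amplitude of short frames and to treat runs of low-energy frames as blank sections. The function name, the frame length, and both thresholds are hypothetical values chosen for the example.

```python
import math
from typing import List, Sequence, Tuple


def blank_sections(samples: Sequence[float], sample_rate: int,
                   frame_len: int = 10, rms_threshold: float = 0.01,
                   min_blank_sec: float = 0.2) -> List[Tuple[float, float]]:
    """Return (start_sec, end_sec) of low-energy runs at least min_blank_sec long."""
    blanks: List[Tuple[float, float]] = []
    run_start = None  # start time of the current low-energy run, if any
    for start in range(0, len(samples), frame_len):
        chunk = samples[start:start + frame_len]
        rms = math.sqrt(sum(x * x for x in chunk) / len(chunk))
        t = start / sample_rate
        if rms < rms_threshold:
            if run_start is None:
                run_start = t
        elif run_start is not None:
            if t - run_start >= min_blank_sec:
                blanks.append((run_start, t))
            run_start = None
    if run_start is not None:
        end = len(samples) / sample_rate
        if end - run_start >= min_blank_sec:
            blanks.append((run_start, end))
    return blanks


# Synthetic signal at 100 samples per second: silence, speech, silence, speech.
signal = [0.0] * 50 + [0.5] * 100 + [0.0] * 30 + [0.5] * 20
print(blank_sections(signal, sample_rate=100))
```

In this synthetic example the first blank section starts at time zero and would correspond to the interpretation start delay, while the second would be counted as a silent section.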

The transcription file generation unit 170 applies a filler prediction model that identifies fillers in the non-silent sections of the recorded interpretation audio, and generates, for each interpreted segment, a transcription file in which the interpretation start delay, silences, and fillers are marked in the text.

The transcription provided by the present invention renders the interpretation audio into text verbatim, with fillers, silent sections, and repeated words made visible in the text. The user can therefore grasp the interpretation start delay, the silent sections, and the like intuitively.

The transcription file generation unit 170 detects filler words with the filler prediction model, tags each item as filler or non-filler, and classifies the fillers by type. When the transcription file generation unit 170 identifies fillers, the procedure may differ depending on the non-silent section of the recorded interpretation audio.

When a non-silent section of the recorded interpretation audio is shorter than the preset input layer length, the transcription file generation unit 170 feeds it directly to the filler/non-filler prediction model.

When, on the other hand, a non-silent section of the recorded interpretation audio is equal to or longer than the preset input layer length, the section is split repeatedly until it is shorter than the input layer length. FIG. 5 is a diagram for explaining the splitting of one syllable in the present invention.

Each time the section is split again, the reference blank length is made progressively shorter, so that the minimum length treated as a blank becomes shorter. When the repetition stops, that is, when the length of the audio becomes shorter than the input layer length, the audio is fed to the filler/non-filler prediction model. Depending on the model's result, a filler is passed to a model that predicts the filler type and is tagged accordingly, whereas a non-filler is tagged as non-filler.
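The repeated splitting with a shrinking blank length can be sketched as a recursion over a per-frame voiced/unvoiced mask. Everything below is an illustrative assumption rather than the patented procedure: the mask representation, the function names, shrinking the minimum gap by one frame per round, and in particular the fixed-size fallback cut used when no pause is left to split on, which the source does not describe.

```python
from typing import List, Sequence, Tuple

Piece = Tuple[int, int]  # [start_frame, end_frame)


def split_on_gaps(voiced: Sequence[int], start: int, end: int,
                  min_gap: int) -> List[Piece]:
    """Split frames [start, end) at runs of at least min_gap unvoiced frames."""
    pieces: List[Piece] = []
    piece_start = None
    last_voiced = start
    gap = 0
    for i in range(start, end):
        if voiced[i]:
            if piece_start is None:
                piece_start = i
            elif gap >= min_gap:
                pieces.append((piece_start, last_voiced + 1))
                piece_start = i
            gap = 0
            last_voiced = i
        else:
            gap += 1
    if piece_start is not None:
        pieces.append((piece_start, last_voiced + 1))
    return pieces


def split_to_fit(voiced: Sequence[int], start: int, end: int,
                 input_len: int, min_gap: int) -> List[Piece]:
    """Split until every piece is shorter than input_len, shrinking min_gap each round."""
    if end - start < input_len:
        return [(start, end)]  # short enough: ready for the prediction model
    if min_gap < 1:
        # Assumption not taken from the source: with no pause left, cut at fixed size.
        step = max(1, input_len - 1)
        return [(s, min(s + step, end)) for s in range(start, end, step)]
    result: List[Piece] = []
    for s, e in split_on_gaps(voiced, start, end, min_gap):
        result.extend(split_to_fit(voiced, s, e, input_len, min_gap - 1))
    return result


mask = [1, 1, 1, 0, 1, 1, 0, 0, 0, 1, 1, 1, 1]
pieces = split_to_fit(mask, 0, len(mask), input_len=4, min_gap=3)
print(pieces)
```

In the example, the three-frame pause is used first, the one-frame pause only after the minimum gap has shrunk to one frame, and the final four-frame run, which contains no pause at all, falls back to the fixed-size cut.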

FIG. 7 is a diagram for explaining the separation into fillers and other words through filler detection in the present invention. As shown in FIG. 7, a filler word detector separates the audio into fillers and other words.

FIG. 8 is a diagram for explaining how filler types are distinguished through filler classification in the present invention. As shown in FIG. 8, a filler word classifier distinguishes the type of each filler.
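The two-stage arrangement of a filler word detector followed by a filler word classifier can be sketched as below. The two models are passed in as plain callables, and the stand-in functions in the usage lines are toy placeholders; they are hypothetical and do not represent the trained models of the invention.

```python
from typing import Callable, Dict, Optional, Sequence

Features = Sequence[float]


def tag_piece(features: Features,
              is_filler: Callable[[Features], bool],
              filler_type: Callable[[Features], str]) -> Dict[str, Optional[str]]:
    """Stage 1 decides filler vs. non-filler; stage 2 runs only for fillers."""
    if is_filler(features):
        return {"kind": "filler", "label": filler_type(features)}
    return {"kind": "non-filler", "label": None}


# Toy stand-ins for the two models (assumptions for demonstration only).
def toy_detector(f: Features) -> bool:
    return len(f) <= 2


def toy_classifier(f: Features) -> str:
    return "eum" if f[0] > 0 else "eo"


print(tag_piece([0.3, 0.1], toy_detector, toy_classifier))
print(tag_piece([0.3, 0.1, 0.7, 0.2], toy_detector, toy_classifier))
```

Keeping the type classifier behind the detector means that non-filler audio never reaches the second model and can go straight to ordinary speech-to-text.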

The example data comprise about 1,100 audio samples of fillers such as 'eo', 'eum', and 'geu' and of non-fillers, and such data can be collected and created directly. In the present invention, the prediction model can be implemented as a Keras neural network model using feature extraction (for example, librosa's MFCC).

As a result, the transcription file can be output by post-processing the array of JSON objects for the interpretation audio, for example as follows.

1) If the item is a blank (1_ _ _), output the blank length as '??n seconds??'.

2) If it is a filler (01_ _), render it as the text of that filler.

3) If it is any other word (00_ _), convert it to text with the Google STT API.
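The three output rules above can be sketched as a small post-processing function. The field names (`kind`, `duration`, `label`), the `(n.ns)` marker format for blanks, and the `stt` callable standing in for a speech-to-text request are all assumptions for illustration; the actual JSON layout and marker format are not given in the source.

```python
from typing import Callable, Dict, List


def render_transcript(items: List[Dict], stt: Callable[[Dict], str]) -> str:
    """Turn tagged items into one transcript line, keeping blanks and fillers visible."""
    parts: List[str] = []
    for item in items:
        if item["kind"] == "blank":
            parts.append(f"({item['duration']:.1f}s)")  # rule 1: blank length
        elif item["kind"] == "filler":
            parts.append(item["label"])                 # rule 2: filler text
        else:
            parts.append(stt(item))                     # rule 3: speech-to-text
    return " ".join(parts)


items = [
    {"kind": "blank", "duration": 1.5},
    {"kind": "filler", "label": "eum"},
    {"kind": "word", "audio": "clip-1"},
    {"kind": "blank", "duration": 0.5},
    {"kind": "word", "audio": "clip-2"},
]
# Hypothetical stand-in for an STT call: returns canned text per clip.
fake_stt = {"clip-1": "the meeting", "clip-2": "starts today"}
text = render_transcript(items, lambda item: fake_stt[item["audio"]])
print(text)
```

Passing the speech-to-text step in as a callable keeps the sketch independent of any particular service.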

In the present invention, when the interpretation of one segment is finished, the next portion of the source can be set to play, and the above steps are repeated. When the desired portion has been fully interpreted, the interpretation is saved as a file for each segment, and text marked with the interpretation start delay, silences, and fillers is output.

The result output unit 190 derives results by statistically processing the transcription file. For example, the counts of the interpretation start delay, silences, and fillers (for example, in seconds) may be statistically processed and presented, and may be output as text.
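As one possible reading of this statistical processing, the sketch below aggregates the marked entries of a transcription into simple totals. The entry layout (`kind`, `duration`, `text`) and the choice of statistics are assumptions for illustration; the description only says that the delay, silence, and filler counts are processed statistically, for example in seconds.

```python
from collections import Counter


def summarize(entries):
    """Aggregate delay, silence, and filler entries of one transcription.

    Each entry is a dict with a 'kind' of 'delay', 'silence', 'filler',
    or 'word', a 'duration' in seconds, and for fillers a 'text'.
    """
    silences = [e["duration"] for e in entries if e["kind"] == "silence"]
    fillers = [e for e in entries if e["kind"] == "filler"]
    return {
        "start_delay_sec": sum(
            e["duration"] for e in entries if e["kind"] == "delay"),
        "silence_count": len(silences),
        "silence_total_sec": sum(silences),
        "filler_count": len(fillers),
        "filler_total_sec": sum(e["duration"] for e in fillers),
        "filler_by_type": dict(Counter(e["text"] for e in fillers)),
    }
```

A per-segment summary of this kind could be shown next to the transcript so that instructor and learner look at the same figures.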

FIG. 9 is a graph showing simulation results that evaluate the performance of the present invention.

Referring to FIG. 9, the model trained according to the present invention was able to discriminate each filler, with an accuracy of about 90%.

The present invention provides a new transcription system that marks the position and length of silent intervals, as well as fillers, in the text, and makes it possible to identify and visualize the interpretation start delay time. It also provides statistical analysis data for grading transcription files and for self-assessment, supporting the instructor's objective judgment and the learner's self-assessment.

FIG. 10 is a flowchart of an evaluation method for consecutive interpretation training according to an embodiment of the present invention.

The evaluation method for consecutive interpretation training according to this embodiment may be carried out in substantially the same configuration as the device 10 of FIG. 1. Accordingly, components identical to those of the device 10 of FIG. 1 are given the same reference numerals, and repeated descriptions are omitted.

In addition, the evaluation method for consecutive interpretation training according to this embodiment may be executed by software (an application) for performing evaluation for consecutive interpretation training.

The evaluation method for consecutive interpretation training according to the present invention is a smart learning technology for interpretation training that automatically records the interpretation speech as soon as the source speech ends, calculates the interpretation start delay of the interpretation audio file for each segment, and generates and provides a transcription file of the interpretation audio file.

Referring to FIG. 10, in the evaluation method for consecutive interpretation training according to this embodiment, a segment to be interpreted is played from the source speech, after which the method automatically switches to a recording mode that records the interpretation (step S10).

Here, a segment is an arbitrary unit of the source text to be interpreted and may be defined by a user (for example, an instructor) as one sentence, one paragraph, one page, several sentences, and so on. The present invention provides automatic interpretation recording per segment of the source by slicing the source speech. That is, when the designated segment ends, the interpretation recording mode is entered automatically, and the interpretation time is counted from that point.

The interpretation start delay time, from when the recording mode starts until the interpretation starts, is counted (step S20). Blank intervals after this point may be recognized and counted as silent intervals.
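One simple way to count this delay is to look for the first sample in the recording whose amplitude exceeds a threshold. The sketch below assumes normalized amplitudes in a plain Python list; the function name `interpretation_start_delay` and the threshold value are hypothetical, and a practical system would likely require several consecutive loud frames so that a single click is not mistaken for speech.

```python
def interpretation_start_delay(samples, sample_rate, threshold=0.05):
    """Seconds from the start of recording to the first loud sample.

    samples:     amplitudes recorded from the moment recording mode began.
    sample_rate: samples per second.
    Returns the full recording length if the speaker never starts.
    """
    for index, amplitude in enumerate(samples):
        if abs(amplitude) >= threshold:
            return index / sample_rate
    return len(samples) / sample_rate
```

Because recording starts automatically when the source segment ends, index 0 corresponds to the start of the recording mode, so no separate timer is needed in this sketch.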

Through waveform analysis of the recorded interpretation speech, blank intervals after the interpretation start delay time are recognized and counted as silent intervals (step S30).
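A threshold-based scan is one plausible form of this waveform analysis. The following sketch reports each quiet run after the start delay as a pair of start time and duration. The function name `find_silences`, the amplitude threshold, and the minimum silence length of 0.5 seconds are assumptions introduced for illustration, not values given in the description.

```python
def find_silences(samples, sample_rate, start_sec=0.0,
                  threshold=0.05, min_silence_sec=0.5):
    """Return (start_sec, duration_sec) for each silent interval.

    Only samples after `start_sec` (the interpretation start delay) are
    examined, so the initial delay is not counted as a silence.
    """
    silences = []
    run_start = None  # index where the current quiet run began

    def close_run(end_index):
        duration = (end_index - run_start) / sample_rate
        if duration >= min_silence_sec:
            silences.append((run_start / sample_rate, duration))

    first = int(start_sec * sample_rate)
    for i in range(first, len(samples)):
        quiet = abs(samples[i]) < threshold
        if quiet and run_start is None:
            run_start = i
        elif not quiet and run_start is not None:
            close_run(i)
            run_start = None
    if run_start is not None:
        close_run(len(samples))  # quiet run that lasts until the end
    return silences
```

The number of returned intervals and the sum of their durations give the silence counts that the later steps mark in the transcription file.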

The method may further include preprocessing the recorded interpretation speech. The preprocessing may include noise removal and extraction of speech features using Mel-frequency cepstral coefficients (MFCC).

A filler prediction model that identifies fillers in the non-silent intervals of the recorded interpretation speech is applied to generate, for each interpreted segment, a transcription file in which the interpretation start delay, silences, and fillers are marked in the text (step S40).

In generating the transcription file (step S40), filler words may be detected through the filler prediction model and tagged as filler or non-filler, and the fillers may be classified by type.

In the step of tagging as filler or non-filler, when a non-silent interval of the recorded interpretation speech is shorter than the preset input layer length, the interval is applied directly to the filler classification prediction model.

On the other hand, when a non-silent interval of the recorded interpretation speech is equal to or longer than the preset input layer length, the interval is split repeatedly until it is shorter than the input layer length, and it may be applied to the filler classification prediction model at the point when it becomes shorter than the preset input layer length.

In generating the transcription file (step S40), post-processing may output the blank length as n seconds for a blank, and may transcribe a filler as the corresponding filler, process it statistically, and output it. Words other than fillers may be transcribed and output through the API of a designated database.

The transcription file provided by the present invention is a verbatim text version of the interpretation speech, in which fillers, silent intervals, and repeated words are visualized and displayed together as text. Accordingly, the user can intuitively grasp the interpretation start delay time, the silent intervals, and so on.

The transcription file is statistically processed to derive results (step S50). For example, the counts of the interpretation start delay, silences, and fillers (for example, in seconds) may be statistically processed and presented, and may be output as text.

By repeating the above steps for each segment to be interpreted in the source text, a transcription file in which the interpretation start delay, silences, and fillers are marked in the text can be generated for each segment.

Accordingly, the present invention produces the transcription file directly, without the learner having to listen to the interpretation audio file repeatedly in order to provide a transcript. The time and effort that instructors and learners spend preparing transcripts of the interpretation speech are therefore significantly reduced.

The evaluation method for consecutive interpretation training described above may be implemented as an application, or in the form of program instructions executable through various computer components, and recorded on a computer-readable recording medium. The computer-readable recording medium may include program instructions, data files, data structures, and the like, alone or in combination.

The program instructions recorded on the computer-readable recording medium may be specially designed and configured for the present invention, or may be known and available to those skilled in the field of computer software.

Examples of computer-readable recording media include magnetic media such as hard disks, floppy disks, and magnetic tape; optical recording media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and hardware devices specially configured to store and execute program instructions, such as ROM, RAM, and flash memory.

Examples of program instructions include not only machine code such as that produced by a compiler, but also high-level language code that can be executed by a computer using an interpreter or the like. The hardware device may be configured to operate as one or more software modules in order to perform the processing according to the present invention, and vice versa.

Although the invention has been described above with reference to embodiments, those skilled in the art will understand that the present invention can be modified and changed in various ways without departing from the spirit and scope of the invention as set forth in the claims below.

In keeping with the global era, interpretation settings are becoming increasingly diverse across all fields, including international conferences. In addition, although demand for machine interpretation and translation is unavoidable as the AI market expands, AI interpretation still has limitations because the sentence structure of spoken language is often incomplete. Therefore, the present invention, which makes the interpretation start delay time intuitively visible and reduces the time needed for evaluation feedback, can be usefully applied as a technology of the Fourth Industrial Revolution era.

1: Smart learning system for consecutive interpretation training
11: Preprocessing module
13: Model module
15: Database
10: Evaluation device for consecutive interpretation training
110: Recording unit
130: Delay time measuring unit
150: Silence determination unit
170: Transcription file generation unit
190: Result output unit

Claims

An evaluation method for consecutive interpretation training performed in an evaluation device for consecutive interpretation training, the method comprising:
converting, by a recording unit, to a recording mode that automatically records the interpretation after a segment to be interpreted is played from the source speech;
counting, by a delay time measuring unit, an interpretation start delay time from when the recording mode starts until the interpretation starts;
recognizing and counting, by a silence determination unit, a blank interval after the interpretation start delay time as a silent interval through waveform analysis of the recorded interpretation speech;
generating, by a transcription file generation unit, a transcription file in which the interpretation start delay, silences, and fillers are marked in the text for each interpreted segment, by applying a filler prediction model that identifies fillers in non-silent intervals of the recorded interpretation speech; and
deriving, by a result output unit, a result by statistically processing the transcription file,
wherein generating the transcription file comprises:
detecting filler words through the filler prediction model and tagging them as filler or non-filler; and
classifying the fillers by type,
wherein the tagging as filler or non-filler comprises,
when a non-silent interval of the recorded interpretation speech is equal to or longer than a preset input layer length, repeatedly splitting the interval until it is shorter than the input layer length, and
applying the interval to the filler prediction model at the point when the non-silent interval of the recorded interpretation speech becomes shorter than the preset input layer length,
wherein the repeated splitting is performed by making the minimum length judged to be a blank progressively shorter each time the interval is split, and
wherein the applying to the filler prediction model is performed at the point when the repeated splitting stops.

The method of claim 1, wherein generating the transcription file comprises:
detecting filler words through the filler prediction model and tagging them as filler or non-filler; and
classifying the fillers by type.

The method of claim 2, wherein the tagging as filler or non-filler comprises,
when a non-silent interval of the recorded interpretation speech is shorter than the preset input layer length, applying the interval directly to the filler prediction model.

The method of claim 2, wherein the tagging as filler or non-filler comprises,
when a non-silent interval of the recorded interpretation speech is equal to or longer than the preset input layer length, repeatedly splitting the interval until it is shorter than the input layer length.

The method of claim 4, wherein the tagging as filler or non-filler comprises
applying the interval to the filler classification prediction model at the point when the non-silent interval of the recorded interpretation speech becomes shorter than the preset input layer length.

The method of claim 1, wherein generating the transcription file comprises:
for a blank, outputting the blank length as n seconds;
for a filler, transcribing it as the corresponding filler, processing it statistically, and outputting it; and
for a word other than a filler, transcribing it through the API of a designated database and outputting it.

The method of claim 1,
wherein the steps are repeated for each segment to be interpreted in the source text.

The method of claim 1,
further comprising preprocessing the recorded interpretation speech by removing noise.

A computer-readable storage medium on which a computer program for performing the evaluation method for consecutive interpretation training according to claim 1 is recorded.

An evaluation device for consecutive interpretation training, comprising:
a recording unit that converts to a recording mode that automatically records the interpretation after a segment to be interpreted is played from the source speech;
a delay time measuring unit that counts an interpretation start delay time from when the recording mode starts until the interpretation starts;
a silence determination unit that recognizes and counts a blank interval after the interpretation start delay time as a silent interval through waveform analysis of the recorded interpretation speech;
a transcription file generation unit that generates a transcription file in which the interpretation start delay, silences, and fillers are marked in the text for each interpreted segment, by applying a filler prediction model that identifies fillers in non-silent intervals of the recorded interpretation speech; and
a result output unit that derives a result by statistically processing the transcription file,
wherein the transcription file generation unit
detects filler words through the filler prediction model, tags them as filler or non-filler, and classifies the fillers by type,
when a non-silent interval of the recorded interpretation speech is equal to or longer than a preset input layer length, repeatedly splits the interval until it is shorter than the input layer length, and
applies the interval to the filler prediction model at the point when the non-silent interval of the recorded interpretation speech becomes shorter than the preset input layer length,
wherein the repeated splitting is performed by making the minimum length judged to be a blank progressively shorter each time the interval is split, and
wherein the applying to the filler prediction model is performed at the point when the repeated splitting stops.