KR20190030307A

KR20190030307A - Voice recognition apparatus, and control method thereof

Info

Publication number: KR20190030307A
Application number: KR1020170117577A
Authority: KR
Inventors: 이수화; 김만수; 훈 허
Original assignee: (주) 엠티콤
Priority date: 2017-09-14
Filing date: 2017-09-14
Publication date: 2019-03-22
Also published as: KR102004187B1

Abstract

The present invention relates to a voice recognition processing apparatus and an operating method thereof that train on text (for example, words, sentences, or chains of words or sentences) extracted from a lecture textbook, together with the transcription a learner writes while listening to the lecture speech, as training data. This reduces the ambiguity of words and sounds in the lecture speech, raises speech recognition accuracy, and enables reinforced learning for the learner listening to the lecture speech.

Description

Speech recognition processing apparatus and operating method thereof {VOICE RECOGNITION APPARATUS, AND CONTROL METHOD THEREOF}

The present invention relates to a scheme that uses, as training data, text extracted from a lecture textbook (learning material), such as words, sentences, or chains of words or sentences, together with the transcription a learner writes while listening to the lecture speech. Training on this data can raise recognition accuracy for the lecture speech and enables reinforced learning for the learner who listens to it.

In speech recognition, a sound stream is typically cut into chunks of roughly 20 ms using an overlapping window that advances by about 50% of the chunk length, roughly 13 chunks at a time, and each chunk is then converted by a fast Fourier transform (FFT) into a distribution matrix of the power spectrum.
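The framing step described above can be sketched as follows. This is a minimal illustration rather than the patent's implementation: the function names `frame_signal` and `power_spectrum`, the 8 kHz sample rate, and the synthetic test tone are assumptions, and a naive DFT stands in for the FFT so that the example needs only the standard library.

```python
import cmath
import math

def frame_signal(samples, sample_rate, frame_ms=20.0, overlap=0.5):
    """Split samples into fixed-length frames that overlap by the given fraction."""
    frame_len = int(sample_rate * frame_ms / 1000)
    hop = max(1, int(frame_len * (1.0 - overlap)))
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, hop)]

def power_spectrum(frame):
    """Power spectrum of one frame via a naive DFT (an FFT yields the same values faster)."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                  for t, x in enumerate(frame))
        spectrum.append(abs(acc) ** 2 / n)
    return spectrum

# 0.1 s of a 1 kHz tone sampled at 8 kHz (synthetic data, for illustration only).
sample_rate = 8000
signal = [math.sin(2 * math.pi * 1000 * t / sample_rate) for t in range(800)]

frames = frame_signal(signal, sample_rate)      # 20 ms frames, 50% overlap
matrix = [power_spectrum(f) for f in frames]    # one row per frame
peak_bin = max(range(len(matrix[0])), key=matrix[0].__getitem__)
print(len(frames), len(matrix[0]), peak_bin)
```

With 160-sample frames at 8 kHz, each frequency bin spans 50 Hz, so the 1 kHz tone is expected to peak at bin 20.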

The sequence of chunks is then used to build a trained model that learns a word-connection model based on n-grams (the language model) and a sound-connection model (the acoustic model). When an arbitrary sound stream is input to the trained model, the estimation is guided by the learned language and acoustic models, and the state sequence with the highest estimated value is inferred.

Inferring the highest-scoring state sequence under the guidance of the learned language and acoustic models can thus be regarded as the defining characteristic of speech recognition technology.
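The idea of choosing the highest-scoring sequence under both models can be illustrated with a toy rescoring step. This is a hedged sketch only: `best_hypothesis`, `toy_lm`, the candidate list, and every probability below are hypothetical, and a real engine searches over state sequences rather than a short list of finished hypotheses.

```python
import math

def best_hypothesis(candidates, lm_logprob, lm_weight=1.0):
    """Return the word sequence whose acoustic + language model score is highest."""
    def score(item):
        words, acoustic_logprob = item
        return acoustic_logprob + lm_weight * lm_logprob(words)
    return max(candidates, key=score)[0]

# Toy bigram language model (hypothetical numbers); unseen pairs get a small floor.
BIGRAMS = {
    ("fourier", "transform"): math.log(0.6),
    ("for", "year"): math.log(0.01),
}

def toy_lm(words):
    """Sum of bigram log-probabilities over adjacent word pairs."""
    return sum(BIGRAMS.get(pair, math.log(1e-4))
               for pair in zip(words, words[1:]))

# Each candidate: (word sequence, acoustic log-probability), both made up.
candidates = [
    (["for", "year", "transform"], -10.0),
    (["fourier", "transform"], -10.5),
]
print(best_hypothesis(candidates, toy_lm))
```

Here the acoustically slightly better candidate loses because the language model considers its word sequence far less likely.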

However, under the principle of parameter estimation, existing speech recognition technology has had to draw samples from many different sources to build a text corpus and an acoustic corpus for training the language and acoustic models. Even so, it has the limitation that the ambiguity of words and sounds can grow when the speaker who is the actual target of recognition changes.

The present invention was conceived in view of the above circumstances. Its object is to train on text extracted from a lecture textbook (for example, words, sentences, or chains of words or sentences) together with the transcription a learner writes while listening to the lecture speech, thereby reducing the ambiguity of words and sounds in the lecture speech and raising speech recognition accuracy, and to enable reinforced learning for the learner who listens to the lecture speech.

To achieve this object, a speech recognition processing apparatus according to an embodiment of the present invention comprises: a learning unit that sets text extracted from a lecture textbook as training data and trains the language model in a speech recognition engine; and a processing unit that performs speech recognition, based on the speech recognition engine, on lecture speech based on the lecture textbook, and provides the learner with a rough transcription, in which the lecture speech is converted into text, as the speech recognition result.

More specifically, when a learner transcription is identified in which the learner who listened to the lecture speech has changed text in the rough transcription, the learning unit sets text extracted from the learner transcription as training data and trains the acoustic model in the speech recognition engine.

More specifically, the acoustic model is trained separately for each of the consecutive lecture sessions into which the lecture is divided with respect to the lecture textbook, and the learning unit sets the text in the learner transcription identified for each lecture session as training data for each of the subsequent lecture sessions and trains the acoustic model accordingly.

More specifically, the speech recognition processing apparatus further comprises a measurement unit that measures the learner's comprehension for each of the consecutive lecture sessions into which the lecture is divided with respect to the lecture textbook, and the comprehension is determined on the basis of whether a comprehension measurement item, designated as text or a combination of text in the rough transcription, has been changed in the learner transcription.

More specifically, the comprehension measurement item comprises text or a combination of text whose speech recognition accuracy, as determined by the speech recognition engine, is at or above a threshold, and it is designated in the rough transcription after being changed to separate text or a combination of text that differs from the engine's recognition result.

More specifically, the speech recognition processing apparatus further comprises a providing unit that provides supplementary learning material for each learning state on the basis of the learner's comprehension measurement result, and the supplementary learning material is information obtained from at least one of an external knowledge database (LOD, Linked Open Database) and a search portal that can be queried through an open API.

To achieve this object, an operating method of a speech recognition processing apparatus according to an embodiment of the present invention comprises: a language learning step of setting text extracted from a lecture textbook as training data and training the language model in a speech recognition engine; and a processing step of performing speech recognition, based on the speech recognition engine, on lecture speech based on the lecture textbook, and providing the learner with a rough transcription, in which the lecture speech is converted into text, as the speech recognition result.

More specifically, the method further comprises an acoustic learning step of, when a learner transcription is identified in which the learner who listened to the lecture speech has changed text in the rough transcription, setting text extracted from the learner transcription as training data and training the acoustic model in the speech recognition engine.

More specifically, the acoustic model is trained separately for each of the consecutive lecture sessions into which the lecture is divided with respect to the lecture textbook, and the acoustic learning step sets the text in the learner transcription identified for each lecture session as training data for each of the subsequent lecture sessions and trains the acoustic model accordingly.

More specifically, the method further comprises a measurement step of measuring the learner's comprehension on the basis of whether a comprehension measurement item, designated as text or a combination of text in the rough transcription, has been changed in the learner transcription.

More specifically, the comprehension measurement item is text or a combination of text whose speech recognition accuracy, as determined by the speech recognition engine, is at or above a threshold, and it is designated in the rough transcription after being changed to text or a combination of text that differs from the engine's recognition result.

More specifically, the method further comprises a providing step of providing, on the basis of the learner's comprehension measurement result, supplementary learning material for each learning state obtained from at least one of an external knowledge database (LOD, Linked Open Database) and a search portal that can be queried through an open API.

Accordingly, the speech recognition processing apparatus and its operating method according to the present invention train on text extracted from a lecture textbook (for example, words, sentences, or chains of words or sentences) together with the transcription a learner writes while listening to the lecture speech. This reduces the ambiguity of words and sounds in the lecture speech, raises speech recognition accuracy, and enables reinforced learning for the learner who listens to the lecture speech.

FIG. 1 is a schematic configuration diagram of a speech recognition processing system according to an embodiment of the present invention.
FIG. 2 is a schematic configuration diagram of a speech recognition processing apparatus according to an embodiment of the present invention.
FIGS. 3 to 5 are flowcharts illustrating the operational flow in a voice bridge apparatus according to an embodiment of the present invention.

Hereinafter, an embodiment of the present invention is described with reference to the accompanying drawings.

FIG. 1 illustrates a speech recognition processing system according to an embodiment of the present invention.

As shown in FIG. 1, the speech recognition processing system according to an embodiment of the present invention includes an instructor terminal 10, a speech recognition processing apparatus 20, and a learner terminal 30.

The instructor terminal 10 is a device that, through an application execution environment or a web browser, registers a lecture textbook (learning material) and the lecture speech based on that textbook with the speech recognition processing apparatus 20.

The instructor terminal 10 may be, for example, a smartphone, a portable terminal, a mobile terminal, a personal digital assistant (PDA), a portable multimedia player (PMP) terminal, a telematics terminal, a navigation terminal, a personal computer, a notebook computer, a slate PC, a tablet PC, an ultrabook, a wearable device, a WiBro terminal, or a flexible terminal. It is not limited to these, and any device on which an application can be installed or a web browser can run is included.

The speech recognition processing apparatus 20 is a server that performs speech recognition on the lecture speech registered from the instructor terminal 10, based on a speech recognition engine that has learned a language model and an acoustic model.

The speech recognition processing apparatus 20 may be implemented as, for example, a web server, a database server, or a proxy server. One or more of a network load balancing mechanism and various software that allows the service apparatus to operate over the Internet or another network may be installed on it, so that it may also be implemented as a computerized system. The network may be an HTTP network, a private line, an intranet, or any other network. The connections between the components of the speech recognition processing system according to an embodiment of the present invention may be made over a secure network so that data is not exposed to attack by a hacker or other third party.

The learner terminal 30 is a device that, through an application execution environment or a web browser, receives from the speech recognition processing apparatus 20 the rough transcription, in which the lecture speech is converted into text, as the speech recognition result. When the learner who listened to the lecture speech changes text in the rough transcription, the learner terminal 30 transmits the result to the speech recognition processing apparatus 20 as the learner transcription.

Like the instructor terminal 10 described above, the learner terminal 30 may be, for example, a smartphone, a portable terminal, a mobile terminal, a personal digital assistant (PDA), a portable multimedia player (PMP) terminal, a telematics terminal, a navigation terminal, a personal computer, a notebook computer, a slate PC, a tablet PC, an ultrabook, a wearable device, a WiBro terminal, or a flexible terminal. It is not limited to these, and any device on which an application can be installed or a web browser can run is included.

With the configuration described above, the speech recognition processing system according to an embodiment of the present invention can reduce the ambiguity of words and sounds in the lecture speech, raise speech recognition accuracy, and enable reinforced learning for the learner who listens to the lecture speech. The configuration of the speech recognition processing apparatus 20 that realizes this is described in more detail below.

FIG. 2 shows a schematic configuration of the speech recognition processing apparatus 20 according to an embodiment of the present invention.

As shown in FIG. 2, the speech recognition processing apparatus 20 according to an embodiment of the present invention may include a registration unit 21, a learning unit 22, a processing unit 23, a measurement unit 24, and a providing unit 25.

All or at least part of these core components of the speech recognition processing apparatus 20, namely the registration unit 21, the learning unit 22, the processing unit 23, the measurement unit 24, and the providing unit 25, may be implemented as hardware modules, as software modules, or as a combination of hardware and software modules.

Here, a software module may be understood as, for example, instructions executed by a processor that handles computation within the speech recognition processing apparatus 20, and these instructions may be loaded in a memory of the speech recognition processing apparatus 20.

In addition to the components described above, the speech recognition processing apparatus 20 according to an embodiment of the present invention may further include a communication unit 26 that provides the actual communication function with the instructor terminal 10 and the learner terminal 30.

To this end, the communication unit 26 may include, for example and without limitation, an antenna system, an RF transceiver, one or more amplifiers, a tuner, one or more oscillators, a digital signal processor, a codec chipset, and a memory, and may include known circuitry that performs this function.

Communication protocols supported by the communication unit 26 may include, for example, wireless LAN (WLAN), DLNA (Digital Living Network Alliance), WiBro (Wireless Broadband), WiMAX (Worldwide Interoperability for Microwave Access), GSM (Global System for Mobile communication), CDMA (Code Division Multiple Access), CDMA2000, EV-DO (Evolution-Data Optimized or Evolution-Data Only), WCDMA (Wideband CDMA), HSDPA (High Speed Downlink Packet Access), HSUPA (High Speed Uplink Packet Access), IEEE 802.16, Long Term Evolution (LTE), LTE-A (Long Term Evolution-Advanced), Wireless Mobile Broadband Service (WMBS), Wi-Fi, and Wi-Fi Direct. Wired communication networks may include wired LAN (Local Area Network), wired WAN (Wide Area Network), power line communication (PLC), USB communication, Ethernet, serial communication, and optical or coaxial cable. The protocols are not limited to these, and any protocol that can provide a communication environment with other devices is included.

With the configuration described above, the speech recognition processing apparatus 20 according to an embodiment of the present invention can reduce the ambiguity of words and sounds in the lecture speech, raise speech recognition accuracy, and enable reinforced learning for the learner who listens to the lecture speech. Each of the core components of the speech recognition processing apparatus 20 that realize this is described in more detail below.

The registration unit 21 handles the registration of lecture materials.

More specifically, the registration unit 21 receives the lecture textbook (learning material) and the lecture speech based on that textbook from the instructor terminal 10 as lecture materials and registers them.

Here, the lecture speech is the voice of the lecturer that is the actual target of speech recognition. It may be registered for each of the consecutive lecture sessions (for example, session 1, session 2, ..., session N) into which the lecture is divided with respect to the lecture textbook.

The learning unit 22 handles training of the language model.

More specifically, once registration of the lecture textbook is complete, the learning unit 22 sets text extracted from the lecture textbook, such as words, sentences, or chains of words or sentences, as training data and trains the language model in the speech recognition engine.

The reason for extracting text from the lecture textbook and training the language model on it is to reduce the ambiguity of words and sounds when the lecture speech is recognized: the words, sentences, and chains of words or sentences that appear in the textbook the learner will study are learned in advance, before speech recognition is performed.
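Training a language model on textbook text ahead of recognition can be sketched with a small bigram model. The source only states that an n-gram language model is trained on extracted text; the tokenizer, the add-one smoothing, the function names, and the two sample sentences below are assumptions made for illustration.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase word tokenizer (a simplification; real textbook text needs more care)."""
    return re.findall(r"\w+", text.lower())

def train_bigram(sentences):
    """Count unigrams and bigrams over sentences padded with <s> and </s>."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in sentences:
        tokens = ["<s>"] + tokenize(sentence) + ["</s>"]
        unigrams.update(tokens[:-1])            # contexts only
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams

def bigram_prob(unigrams, bigrams, prev, word, vocab_size):
    """P(word | prev) with add-one smoothing (an assumed smoothing choice)."""
    return (bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab_size)

# Hypothetical sentences standing in for text extracted from a lecture textbook.
textbook = [
    "The Fourier transform maps a signal to its spectrum.",
    "The power spectrum is computed from the Fourier transform.",
]
unigrams, bigrams = train_bigram(textbook)
vocab = {w for pair in bigrams for w in pair}

p_seen = bigram_prob(unigrams, bigrams, "fourier", "transform", len(vocab))
p_unseen = bigram_prob(unigrams, bigrams, "fourier", "year", len(vocab))
print(p_seen > p_unseen)
```

After training, a word pair that occurs in the textbook ("fourier transform") scores higher than an acoustically similar pair that does not ("fourier year"), which is the disambiguation effect the paragraph describes.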

For reference, this rests on the psycholinguistic observation that a human learner follows speech more easily while viewing the accompanying slide presentation, and finds question-and-answer exchanges held without slides comparatively harder to follow.

The learning unit 22 can also train the acoustic model in the speech recognition engine. This is covered in detail after the description of the processing unit 23 that follows.

The processing unit 23 performs speech recognition on the lecture speech.

More specifically, once training of the language model in the speech recognition engine is complete, the processing unit 23 performs speech recognition, based on the speech recognition engine, on the lecture speech based on the lecture textbook, and provides the learner terminal 30 with the rough transcription, in which the lecture speech is converted into text, as the speech recognition result.

The rough transcription provided as the speech recognition result may be displayed on the learner terminal 30 through a user interface (UI), for example one that supports a lecture-note format. The learner who reviews it can change (correct) the rough transcription through the UI provided by the learner terminal 30.

The learning unit 22 can also train the acoustic model in the speech recognition engine.

More specifically, when a learner transcription, in which the learner who listened to the lecture speech has changed text in the rough transcription, is identified from the learner terminal 30, the learning unit 22 sets text extracted from the learner transcription, such as words, sentences, or chains of words or sentences, as training data and trains the acoustic model in the speech recognition engine.
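One way to identify the text a learner changed is a word-level diff between the rough transcription and the learner transcription. The sketch below uses Python's standard `difflib.SequenceMatcher`; the function name `extract_corrections` and the sample sentences are hypothetical, and the source does not specify how changed text is detected.

```python
import difflib

def extract_corrections(rough, learner):
    """Return (rough_span, learner_span) pairs for every word-level change."""
    a, b = rough.split(), learner.split()
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    return [
        (" ".join(a[i1:i2]), " ".join(b[j1:j2]))
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]

# Hypothetical rough transcription (engine output) and learner transcription.
rough = "the for year transform maps a signal"
learner = "the Fourier transform maps a signal"

corrections = extract_corrections(rough, learner)
print(corrections)  # [('for year', 'Fourier')]
```

Each pair records what the engine produced and what the learner wrote instead; the learner-side text is the candidate training data.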

As noted earlier, the lecture speech may be registered for each of the consecutive lecture sessions (for example, session 1, session 2, ..., session N) into which the lecture is divided with respect to the lecture textbook. Correspondingly, the acoustic model is also trained separately for each of these consecutive lecture sessions.

That is, the learning unit 22 can train the acoustic model by setting the text in the learner transcription identified for each lecture session as training data for each of the subsequent lecture sessions.
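The per-session scheme can be sketched as a simple selection rule: corrections gathered in earlier sessions become training data for a later session. The function name and the sample data are hypothetical, and how the corrected text is paired with audio for acoustic model training is not specified in the source, so it is left out here.

```python
def training_data_for_session(corrections_by_session, target_session):
    """Collect learner-corrected text from every session before target_session."""
    data = []
    for session in sorted(corrections_by_session):
        if session < target_session:
            data.extend(corrections_by_session[session])
    return data

# Hypothetical learner-corrected text, keyed by lecture session number.
corrections_by_session = {
    1: ["Fourier transform"],
    2: ["power spectrum", "n-gram"],
    3: ["acoustic model"],
}

print(training_data_for_session(corrections_by_session, 3))
# ['Fourier transform', 'power spectrum', 'n-gram']
```

Session 1 has no earlier sessions, so it starts with no learner-derived data; each later session accumulates more.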

For reference, lecture speech is often like a serial drama: a single lecturer usually delivers many consecutive sessions. In that case, training (tuning) the acoustic model on the first and second sessions through transcription work can raise the recognition rate for the lecture speech in later sessions.

Here, transcription work refers to a person writing down an input sound stream. In the present invention it is functionally identical to the learner who listened to the lecture speech directly changing (correcting) the rough transcription produced by speech recognition.

Accordingly, setting the text in the learner transcription identified for each lecture session as training data for the subsequent sessions and training the acoustic model refines the model that transcribes the lecturer's utterances into text, so an improvement in speech recognition performance can be expected.

Moreover, to change (correct) the rough transcription produced by speech recognition, the learner must listen to the lecture again. This yields a review effect, namely reinforced learning through repeated listening to the lecture speech.

This also gives the learner an opportunity to correct spelling or definition errors in concept terms that were wrongly formed in the learner's mental model. In other words, when the rough transcription contains something the learner does not recognize, the learner is prompted to work out whether it is a speech recognition error or something the learner has not yet learned or has misunderstood.

The measurement unit 24 measures the learner's comprehension.

More specifically, the measurement unit 24 measures the learner's comprehension for each of the consecutive lecture sessions into which the lecture is divided with respect to the lecture textbook, so that learning feedback can be provided accordingly. This makes it possible to implement an adaptive learning management system (ALMS).

The measurement unit 24 can determine the learner's comprehension on the basis of whether a comprehension measurement item, designated as text or a combination of text in the rough transcription provided as the speech recognition result, has been changed in the learner transcription.

Here, a comprehension measurement item is, for example, text or a combination of text whose speech recognition accuracy, as determined by the speech recognition engine itself, is at or above a threshold. It is designated in the rough transcription after being changed to text or a combination of text that differs from the engine's actual recognition result.
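Designating comprehension measurement items can be sketched as follows: segments recognized with confidence at or above a threshold are deliberately replaced with different text before the rough transcription is shown. The function name, the confidence values, the threshold, and the `decoys` mapping that supplies the replacement text are all hypothetical; the source does not say how the replacement text is chosen.

```python
def plant_items(recognized, threshold, decoys):
    """Build the rough transcription and the list of planted measurement items.

    recognized: list of (text, confidence) segments from the engine.
    decoys: assumed mapping from recognized text to the altered text to show.
    """
    rough, items = [], []
    for text, confidence in recognized:
        if confidence >= threshold and text in decoys:
            rough.append(decoys[text])          # show altered text
            items.append((text, decoys[text]))  # remember (recognized, planted)
        else:
            rough.append(text)
    return " ".join(rough), items

# Hypothetical engine output with per-segment confidence.
recognized = [
    ("the", 0.99), ("power spectrum", 0.97), ("feeds", 0.90),
    ("the", 0.99), ("acoustic model", 0.60),
]
decoys = {"power spectrum": "tower spectrum", "acoustic model": "artistic model"}

rough, items = plant_items(recognized, threshold=0.95, decoys=decoys)
print(rough)  # the tower spectrum feeds the acoustic model
print(items)  # [('power spectrum', 'tower spectrum')]
```

Only the high-confidence segment is altered; the low-confidence "acoustic model" is left as recognized, since a change there could not be distinguished from a genuine recognition error.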

In short, the measuring unit 24 can measure the learner's comprehension by checking whether the learner noticed the text or combination of texts that was deliberately designated differently from the engine's actual speech recognition result within the draft transcript, and properly changed (corrected) it.
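As a non-limiting illustration, the measurement described above might be sketched in Python as follows. The names `RecognizedWord`, `plant_items`, and `comprehension`, the 0.9 threshold, the decoy table, and the word-by-word alignment of the draft and learner transcripts are all assumptions made for this sketch; the specification does not prescribe them.

```python
from dataclasses import dataclass


@dataclass
class RecognizedWord:
    text: str
    confidence: float  # engine's own confidence for this word, 0.0 to 1.0


def plant_items(words, decoys, threshold=0.9):
    """Build the draft transcript and record the planted measurement items.

    A word is replaced by its decoy only when the engine is confident it
    recognized the word correctly (confidence >= threshold).
    Returns (draft_words, planted), where planted maps position -> correct text.
    """
    draft, planted = [], {}
    for i, word in enumerate(words):
        if word.confidence >= threshold and word.text in decoys:
            draft.append(decoys[word.text])
            planted[i] = word.text
        else:
            draft.append(word.text)
    return draft, planted


def comprehension(planted, learner_words):
    """Fraction of planted items that the learner restored to the correct text."""
    if not planted:
        return None  # nothing was planted, so nothing can be measured
    restored = sum(
        1
        for i, correct in planted.items()
        if i < len(learner_words) and learner_words[i] == correct
    )
    return restored / len(planted)


recognized = [
    RecognizedWord("the", 0.99),
    RecognizedWord("mitochondria", 0.97),
    RecognizedWord("produce", 0.95),
    RecognizedWord("ATP", 0.62),
]
draft, planted = plant_items(
    recognized, {"mitochondria": "chloroplasts", "produce": "consume"}
)
# The learner corrected one of the two planted items.
learner = ["the", "mitochondria", "consume", "ATP"]
score = comprehension(planted, learner)
print(draft, score)
```

Aligning by word position keeps the sketch short; an actual implementation would likely need an alignment step so that insertions or deletions by the learner do not shift the planted positions.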

The providing unit 25 provides supplementary learning materials according to the learner's learning state.

More specifically, once the per-session comprehension measurement for the learner is complete, the providing unit 25 determines the learner's learning state (e.g., insufficient or sufficient) based on the comprehension measurement result and provides supplementary learning materials corresponding to that determination to the learner terminal 30.

The supplementary learning materials may include, for example, external charts, alternative explanations, and alphabetic spellings given alongside terms, which help the learner by automatically organizing and analyzing the concept terms that are the main target of the lecture. Such materials can be obtained from an external knowledge database (LOD, Linked Open Database) and from a search portal that can be queried through an open API.
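By way of example only, the selection of supplementary materials by learning state might look like the following Python sketch. The 0.7 cut-off, the `MATERIAL_KINDS` table, and the in-memory `catalog` are hypothetical; in particular, `catalog` merely stands in for whatever the external knowledge database or search portal would return, and no real API is modeled here.

```python
# Which kinds of material each learning state receives (an assumed policy).
MATERIAL_KINDS = {
    "insufficient": ("chart", "explanation", "spelling"),
    "sufficient": ("spelling",),
}


def learning_state(score, sufficient_at=0.7):
    """Map a comprehension score in [0, 1] to a learning state label."""
    return "sufficient" if score >= sufficient_at else "insufficient"


def select_materials(score, terms, catalog, sufficient_at=0.7):
    """Pick materials for the given concept terms according to the learning state.

    catalog maps term -> {kind: material}; kinds missing for a term are skipped.
    """
    state = learning_state(score, sufficient_at)
    picked = []
    for term in terms:
        for kind in MATERIAL_KINDS[state]:
            item = catalog.get(term, {}).get(kind)
            if item is not None:
                picked.append((term, kind, item))
    return state, picked


catalog = {
    "mitochondria": {
        "explanation": "The organelle that produces most of a cell's ATP.",
        "spelling": "mitochondria",
    }
}
state_low, materials_low = select_materials(0.5, ["mitochondria"], catalog)
state_high, materials_high = select_materials(0.9, ["mitochondria"], catalog)
print(state_low, materials_low)
print(state_high, materials_high)
```

Keeping the policy in a table makes it straightforward to add further learning states without changing the selection logic.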

As described above, in the configuration of the speech recognition processing apparatus 20 according to an embodiment of the present invention, text extracted from the lecture textbook (e.g., words, sentences, or chains of words or sentences) and transcripts written as text by learners listening to the lecture audio are learned as training data. This reduces the ambiguity of words and speech in the lecture audio and improves speech recognition accuracy, and it also leads learners to listen to the lecture audio repeatedly in order to write the transcript (the learner transcript), which reinforces their learning. In addition, by measuring the learner's comprehension for each lecture session and providing supplementary learning materials accordingly, the apparatus implements an adaptive learning management system (ALMS) and adds a knowledge service on top of the primary benefit of fixing into text the lecture audio that would otherwise vanish after a single hearing, which can contribute greatly to improving the learner's learning efficiency.

This concludes the description of the configuration of the speech recognition processing apparatus 20 according to an embodiment of the present invention. The operation flow of the speech recognition processing apparatus 20 is described below.

FIG. 3 is a flowchart of the operation flow of the speech recognition processing apparatus 20 according to an embodiment of the present invention.

First, in step S10, the registering unit 21 receives from the instructor terminal 10, as lecture materials, a lecture textbook (learning material) and lecture audio based on that textbook, and registers them.

Then, once registration of the lecture textbook is complete, the learning unit 22 trains the language model in the speech recognition engine in step S20.

Here, as shown in FIG. 4, the learning unit 22 may extract from the registered lecture textbook, in step S21, text corresponding to, for example, words, sentences, or chains of words or sentences; set the extracted text as training data in step S22; and train the language model in the speech recognition engine on that text training data in step S23.
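Steps S21 to S23 can be illustrated, under simplifying assumptions, with a small bigram language model in Python. The specification does not state which kind of language model the engine uses; the bigram model with add-one smoothing, the English-only tokenizer, and the names `extract_sentences`, `tokenize`, and `BigramModel` are choices made only for this sketch.

```python
import re
from collections import Counter


def extract_sentences(textbook):
    """S21: split the registered textbook text into sentences."""
    return [s.strip() for s in re.split(r"[.!?]+", textbook) if s.strip()]


def tokenize(sentence):
    """Lower-case word tokenizer (English letters only, an assumption)."""
    return re.findall(r"[a-z']+", sentence.lower())


class BigramModel:
    """Bigram language model with add-one smoothing."""

    def __init__(self):
        self.contexts = Counter()  # how often each word appears as a predecessor
        self.bigrams = Counter()   # how often each (previous, next) pair appears
        self.vocab = set()

    def train(self, sentences):
        """S23: update counts from the training sentences."""
        for sentence in sentences:
            tokens = ["<s>"] + tokenize(sentence)
            self.vocab.update(tokens)
            self.contexts.update(tokens[:-1])
            self.bigrams.update(zip(tokens, tokens[1:]))

    def prob(self, prev, word):
        """Smoothed P(word | prev); one extra slot is reserved for unseen words."""
        size = len(self.vocab) + 1
        return (self.bigrams[(prev, word)] + 1) / (self.contexts[prev] + size)


textbook = "The cell membrane regulates transport. The cell wall gives support."
training_data = extract_sentences(textbook)  # S22: extracted text as training data
model = BigramModel()
model.train(training_data)

# After training on the textbook, the in-domain word is preferred over a homophone.
p_cell = model.prob("the", "cell")
p_sell = model.prob("the", "sell")
print(p_cell > p_sell)
```

The comparison at the end shows the intended effect: a word sequence that occurs in the textbook receives a higher probability than an acoustically similar alternative that does not, which is how textbook-derived training data can reduce ambiguity during recognition.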

Next, once training of the language model in the speech recognition engine is complete, the processing unit 23 performs speech recognition, based on the speech recognition engine, on the lecture audio based on the lecture textbook in step S30, and then in step S40 provides the learner terminal 30 with the draft transcript, in which the lecture audio is converted into text, as the speech recognition result.

Here, the draft transcript provided as the speech recognition result for the lecture audio may be displayed on the learner terminal 30 through, for example, a user interface (UI) that supports a lecture-note format, and a learner who reviews it can change (edit) the draft transcript through the UI provided by the learner terminal 30.

Next, when a learner transcript, in which text in the draft transcript has been changed by a learner who listened to the lecture audio, is identified from the learner terminal 30 in step S50, the learning unit 22 trains the acoustic model in the speech recognition engine in step S60.

Here, as shown in FIG. 5, the learning unit 22 may extract from the learner transcript, in step S61, text corresponding to, for example, words, sentences, or chains of words or sentences; set the extracted text as training data in step S62; and train the acoustic model in the speech recognition engine on that text training data in step S63.
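As one possible illustration of steps S61 and S62, the text changed by the learner can be located by comparing the draft transcript with the learner transcript. The Python sketch below uses the standard-library `difflib.SequenceMatcher` for a word-level comparison; the function name `changed_spans` and the tuple layout are assumptions, and how the corrected text is paired with the corresponding audio for acoustic model training (step S63) is not detailed in the specification and is not modeled here.

```python
from difflib import SequenceMatcher


def changed_spans(draft, learner):
    """Return (draft_word_index, draft_text, learner_text) for each edited span.

    Both transcripts are compared word by word; spans that are equal in both
    are skipped, so only the learner's corrections remain.
    """
    draft_words, learner_words = draft.split(), learner.split()
    matcher = SequenceMatcher(a=draft_words, b=learner_words, autojunk=False)
    spans = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            spans.append(
                (i1, " ".join(draft_words[i1:i2]), " ".join(learner_words[j1:j2]))
            )
    return spans


draft = "the sell membrane regulates transport"
learner = "the cell membrane regulates transport"
corrections = changed_spans(draft, learner)  # candidate training data (S62)
print(corrections)
```

Each returned span identifies where the engine's output and the learner's text disagree, which is the part of the lecture audio where corrected labels are most informative.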

Further, in step S70, the measuring unit 24 measures the learner's comprehension for each of the consecutive lecture sessions into which the lecture textbook is divided, so that corresponding learning feedback can be provided, thereby making it possible to implement an adaptive learning management system (ALMS).

Thereafter, once the per-session comprehension measurement for the learner is complete, the providing unit 25 determines, in step S80, the learner's learning state (e.g., insufficient or sufficient) based on the comprehension measurement result and provides supplementary learning materials corresponding to that determination to the learner terminal 30.

As described above, in the operation flow of the speech recognition processing apparatus 20 according to an embodiment of the present invention, text extracted from the lecture textbook (e.g., words, sentences, or chains of words or sentences) and transcripts written as text by learners listening to the lecture audio are learned as training data. This reduces the ambiguity of words and speech in the lecture audio and improves speech recognition accuracy, and it also leads learners to listen to the lecture audio repeatedly in order to write the transcript (the learner transcript), which reinforces their learning. In addition, by measuring the learner's comprehension for each lecture session and providing supplementary learning materials accordingly, the operation flow implements an adaptive learning management system (ALMS) and adds a knowledge service on top of the primary benefit of fixing into text the lecture audio that would otherwise vanish after a single hearing, which can contribute greatly to improving the learner's learning efficiency.

Meanwhile, implementations of the functional operations and subject matter described in this specification may be realized in digital electronic circuitry; in computer software, firmware, or hardware, including the structures disclosed in this specification and their structural equivalents; or in combinations of one or more of these. Implementations of the subject matter described in this specification may be realized as one or more computer program products, that is, one or more modules of computer program instructions encoded on a tangible program storage medium for execution by, or to control the operation of, a control system.

The computer-readable medium may be a machine-readable storage device, a machine-readable storage substrate, a memory device, a composition of matter effecting a machine-readable propagated signal, or a combination of one or more of these.

In this specification, "system" or "apparatus" encompasses all apparatus, devices, and machines for controlling data, including, for example, a programmable processor, a computer, or multiple processors or computers. In addition to hardware, the control system may include code that creates an execution environment for the computer program in question, for example code that constitutes processor firmware, a protocol stack, a database management system, an operating system, or a combination of one or more of these.

A computer program (also known as a program, software, software application, script, or code) may be written in any form of programming language, including compiled or interpreted languages and declarative or procedural languages, and may be deployed in any form, including as a standalone program or as a module, component, subroutine, or other unit suitable for use in a computing environment. A computer program does not necessarily correspond to a file in a file system. A program may be stored in a single file dedicated to the program in question, in multiple coordinated files (e.g., files that store one or more modules, subprograms, or portions of code), or in a portion of a file that holds other programs or data (e.g., one or more scripts stored in a markup language document). A computer program may be deployed to run on one computer or on multiple computers that are located at one site or distributed across multiple sites and interconnected by a communication network.

Computer-readable media suitable for storing computer program instructions and data include all forms of non-volatile memory, media, and memory devices, including, for example, semiconductor memory devices such as EPROM, EEPROM, and flash memory devices; magnetic disks such as internal hard disks or removable disks; magneto-optical disks; and CD-ROM and DVD-ROM disks. The processor and the memory may be supplemented by, or incorporated in, special-purpose logic circuitry.

Implementations of the subject matter described in this specification may also be realized in a computing system that includes a back-end component such as a data server; a middleware component such as an application server; a front-end component such as a client computer having a web browser or graphical user interface through which a user can interact with an implementation of the subject matter described in this specification; or any combination of one or more such back-end, middleware, or front-end components. The components of the system may be interconnected by any form or medium of digital data communication, such as a communication network.

Although this specification contains many specific implementation details, these should not be construed as limitations on the scope of any invention or of what may be claimed, but rather as descriptions of features that may be specific to particular embodiments of particular inventions. Likewise, certain features described in this specification in the context of separate embodiments may also be implemented in combination in a single embodiment. Conversely, various features described in the context of a single embodiment may also be implemented in multiple embodiments, either separately or in any suitable subcombination. Moreover, although features may be described as acting in certain combinations and may even be initially claimed as such, one or more features from a claimed combination may in some cases be removed from the combination, and the claimed combination may be directed to a subcombination or a variation of a subcombination.

Also, although operations are depicted in the drawings in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order, or that all illustrated operations be performed, in order to achieve desirable results. In certain cases, multitasking and parallel processing may be advantageous. Likewise, the separation of various system components in the embodiments described above should not be understood as requiring such separation in all embodiments, and it should be understood that the described program components and systems can generally be integrated together in a single software product or packaged into multiple software products.

As such, this specification is not intended to limit the invention to the specific terms presented. Accordingly, although the invention has been described in detail with reference to the examples above, those skilled in the art may make alterations, changes, and modifications to these examples without departing from the scope of the invention. The scope of the invention is indicated by the claims that follow rather than by the detailed description above, and all changes or modified forms derived from the meaning and scope of the claims and their equivalents should be construed as falling within the scope of the invention.

The speech recognition processing apparatus and its operation method according to an embodiment of the present invention go beyond the limits of existing technology in that text extracted from a lecture textbook (learning material) (e.g., words, sentences, or chains of words or sentences) and transcripts written as text by learners listening to the lecture audio are learned as training data. Accordingly, beyond mere use of the related technology, devices to which the invention is applied have sufficient prospects of being marketed or sold, and the invention can clearly be carried out in practice; it therefore has industrial applicability.

10: Instructor terminal
20: Speech recognition processing apparatus
21: Registering unit 22: Learning unit
23: Processing unit 24: Measuring unit
25: Providing unit
30: Learner terminal

Claims

A speech recognition processing apparatus comprising:
a learning unit that sets text extracted from a lecture textbook as training data and trains a language model in a speech recognition engine; and
a processing unit that performs speech recognition, based on the speech recognition engine, on lecture audio based on the lecture textbook, and provides a learner with a draft transcript, in which the lecture audio is converted into text, as the speech recognition result.

The apparatus of claim 1,
wherein the learning unit,
when a learner transcript in which text in the draft transcript has been changed by the learner who listened to the lecture audio is identified, sets text extracted from the learner transcript as training data and trains an acoustic model in the speech recognition engine.

The apparatus of claim 2,
wherein the acoustic model
is trained separately for each of the consecutive lecture sessions into which the lecture textbook is divided, and
wherein the learning unit
sets the text in the learner transcript identified for each lecture session as training data for each of the subsequent lecture sessions and trains the acoustic model.

The apparatus of claim 2,
wherein the speech recognition processing apparatus
further comprises a measuring unit that measures the learner's comprehension for each of the consecutive lecture sessions into which the lecture textbook is divided, and
wherein the comprehension
is determined based on whether a comprehension measurement item, designated as a text or a combination of texts in the draft transcript, has been changed in the learner transcript.

The apparatus of claim 4,
wherein the comprehension measurement item
includes a text or a combination of texts whose speech recognition accuracy, as determined by the speech recognition engine, is at or above a threshold, and is designated in the draft transcript by being changed to a text or a combination of texts different from the speech recognition result of the speech recognition engine.

The apparatus of claim 4,
wherein the speech recognition processing apparatus
further comprises a providing unit that provides supplementary learning materials for each learning state based on the learner's comprehension measurement result, and
wherein the supplementary learning materials
are information obtained from at least one of an external knowledge database (LOD, Linked Open Database) and a search portal that can be queried through an open API.

An operation method of a speech recognition processing apparatus, the method comprising:
a language learning step of setting text extracted from a lecture textbook as training data and training a language model in a speech recognition engine; and
a processing step of performing speech recognition, based on the speech recognition engine, on lecture audio based on the lecture textbook, and providing a learner with a draft transcript, in which the lecture audio is converted into text, as the speech recognition result.

The method of claim 1,
wherein the method
further comprises an acoustic learning step of, when a learner transcript in which text in the draft transcript has been changed by the learner who listened to the lecture audio is identified, setting text extracted from the learner transcript as training data and training an acoustic model in the speech recognition engine.

The method of claim 8,
wherein the acoustic model
is trained separately for each of the consecutive lecture sessions into which the lecture textbook is divided, and
wherein the acoustic learning step
sets the text in the learner transcript identified for each lecture session in relation to the lecture textbook as training data for each of the subsequent lecture sessions and trains the acoustic model.

The method of claim 8,
wherein the method
further comprises a measuring step of measuring the learner's comprehension based on whether a comprehension measurement item, designated as a text or a combination of texts in the draft transcript, has been changed in the learner transcript.

The method of claim 10,
wherein the comprehension measurement item
is a text or a combination of texts whose speech recognition accuracy, as determined by the speech recognition engine, is at or above a threshold, and is designated in the draft transcript by being changed to a text or a combination of texts different from the speech recognition result of the speech recognition engine.

The method of claim 10,
wherein the method
further comprises a providing step of providing, based on the learner's comprehension measurement result, supplementary learning materials for each learning state that are obtained from at least one of an external knowledge database (LOD, Linked Open Database) and a search portal that can be queried through an open API.