KR100362292B1

KR100362292B1 - Method and system for english pronunciation study based on speech recognition technology

Info

Publication number: KR100362292B1
Application number: KR1020010008307A
Authority: KR
Inventors: 김우일
Original assignee: 보이스미디어텍(주)
Priority date: 2001-02-19
Filing date: 2001-02-19
Publication date: 2002-11-23
Also published as: KR20020067870A

Abstract

The present invention relates to a method and system for learning English using speech recognition technology. Specifically, when a learner speaks English into a computer terminal, the system identifies where the pronunciation is inaccurate, at the phoneme or syllable level, and reports it to the user, while also analyzing pronunciation accuracy, stress, intonation, and speed in detail and reporting the results. Applied to a server-client system over the Internet, an open network, it analyzes the pronunciation data sent from the client and provides feedback on inaccurate pronunciation together with accuracy, stress, intonation, and speed results and statistical data, so that the client user can improve learning ability and confirm learning achievement.

Compared with conventional similar systems, the English pronunciation learning method according to the present invention can evaluate and diagnose pronunciation accuracy for each phoneme, which makes English pronunciation learning more efficient, and it provides more accurate correction of mispronounced parts, which improves the reliability of the system.

In addition, because it evaluates not only the accuracy of the user's pronunciation but also stress, intonation, and speed, it supports comprehensive pronunciation learning. The statistics compiled from the learning results indicate how much the user's pronunciation has improved; by monitoring statistics per phoneme and per feature, the user can confirm progress and stay motivated.

Description

Method and system for learning English pronunciation using speech recognition technology {Method and system for english pronunciation study based on speech recognition technology}

The present invention relates to a method and system for learning English using speech recognition technology. Specifically, when a learner speaks English into a personal computer or other terminal, the system identifies where the pronunciation is inaccurate, at the phoneme or syllable level, and reports it to the user, while also analyzing pronunciation accuracy, stress, intonation, and speed in detail and reporting the results. Applied to a server-client system over the Internet, an open network, it analyzes the pronunciation data sent from the client and provides feedback on inaccurate pronunciation together with accuracy, stress, intonation, and speed results and statistical data, so that the client user can improve learning ability and confirm learning achievement.

In general, as industries specialize and internationalize, people today are increasingly interested in second foreign languages, and language learning devices and a variety of language learning programs are being developed in response.

In particular, a language learning program typically fixes its recognition vocabulary in advance. When the user utters one or several of these words, the program finds which pre-registered word the input speech is closest to and outputs it. The result is presented as correct/incorrect or as a score, from which the user can judge whether the pronunciation was accurate.

This type of evaluation computes the probability of the uttered speech signal under the acoustic model of the target word or sentence and derives pronunciation accuracy by comparing it with a threshold obtained in advance from the probability values of native speakers. The procedure consists broadly of feature extraction, training, and evaluation, as follows.

Feature extraction converts the speech signal into the feature data used in recognition and training. Mel Frequency Cepstral Coefficients (hereinafter "MFCC"), a method that takes the human auditory system into account, are generally used, and the samples in a window of about 20-30 msec are converted into about 30-40 feature values.

For reference, MFCC combines two concepts. The first is mel frequency, a frequency representation that reflects the effect whereby, within a certain band, the combined sound of several sub-bands is heard the same as a tone of equal magnitude at the band's center frequency. The second is the cepstrum, a vector representation that helps keep LPC, a feature vector of signals such as speech, robust against speaker variation; in general, cepstral coefficients are obtained by taking an FFT, then a logarithm, and then an FFT again.
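The mel-frequency idea can be illustrated with a short sketch. The patent does not say which mel formula it uses, so the common 2595/700 variant below, and the helper names `hz_to_mel`, `mel_to_hz`, and `mel_band_centers`, are assumptions made for illustration only.

```python
import math


def hz_to_mel(hz: float) -> float:
    """Convert Hz to mels (common 2595/700 variant; assumed, not from the patent)."""
    return 2595.0 * math.log10(1.0 + hz / 700.0)


def mel_to_hz(mel: float) -> float:
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_band_centers(low_hz: float, high_hz: float, num_bands: int) -> list:
    """Center frequencies (Hz) of bands spaced evenly on the mel scale."""
    low, high = hz_to_mel(low_hz), hz_to_mel(high_hz)
    step = (high - low) / (num_bands + 1)
    return [mel_to_hz(low + step * (i + 1)) for i in range(num_bands)]


if __name__ == "__main__":
    # Bands are narrow at low frequencies and widen toward high frequencies.
    print([round(c) for c in mel_band_centers(0.0, 4000.0, 5)])
```

Spacing the bands evenly in mels rather than in Hz is what gives the filter bank finer resolution at low frequencies, where hearing is more discriminating.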

Training collects data and, for each training unit (word, phoneme, etc.), generates a probability model that best represents the feature data of the training samples in a probabilistic sense. A hidden Markov model (hereinafter "HMM") is generally used as the probability model, and it is obtained with the Baum-Welch method or the segmental k-means method.

At this stage, a probability model of native-speaker pronunciation is obtained, and a threshold for the pronunciation to be evaluated is set in advance.

Finally, evaluation converts the speech signal uttered by the user into feature data, computes its probability under the probability model of the uttered word, and outputs an evaluation result indicating pronunciation similarity by comparing that probability with the threshold set during training.
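The conventional evaluation step can be sketched as a likelihood computation followed by a threshold test. This is a minimal illustration under stated assumptions: it uses a discrete-emission HMM and the scaled forward algorithm for brevity, and the per-frame normalization in `passes_threshold` is a choice made here, not something the text specifies.

```python
import math


def forward_log_likelihood(obs, start, trans, emit):
    """Log P(obs | HMM) by the scaled forward algorithm.

    start[s]    : initial probability of state s
    trans[r][s] : probability of moving from state r to state s
    emit[s][o]  : probability that state s emits symbol o (discrete, assumed)
    """
    n = len(start)
    alpha = [start[s] * emit[s][obs[0]] for s in range(n)]
    log_like = 0.0
    for t in range(len(obs)):
        if t > 0:
            alpha = [
                sum(alpha[r] * trans[r][s] for r in range(n)) * emit[s][obs[t]]
                for s in range(n)
            ]
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")
        log_like += math.log(scale)          # product of scales equals P(obs)
        alpha = [a / scale for a in alpha]   # rescale to avoid underflow
    return log_like


def passes_threshold(log_like, num_frames, threshold):
    """Compare the per-frame log-likelihood with a preset threshold."""
    return log_like / num_frames >= threshold


if __name__ == "__main__":
    start = [0.6, 0.4]
    trans = [[0.7, 0.3], [0.4, 0.6]]
    emit = [{"x": 0.5, "y": 0.5}, {"x": 0.1, "y": 0.9}]
    ll = forward_log_likelihood("xy", start, trans, emit)
    print(ll, passes_threshold(ll, 2, threshold=-1.0))
```

In a real system the threshold would be derived from native-speaker utterances scored by the same model, as the text describes.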

This conventional method has the same form as general speech recognition and indicates how similar the whole uttered word or sentence is to native-speaker pronunciation. However, methods to date have not analyzed and compared the pronunciation accuracy of each syllable or phoneme in a word in detail, and so could not accurately represent the probability of observing the features of the pronunciation. Moreover, as more frames belong to a given state, more feature vectors are obtained from those frames and the observation probability distribution becomes more varied; in a recognition process trained with a fixed number of distributions, this affects the probability values and causes the serious problem of misrecognition as a different word.

Techniques for automatically generating pronunciation dictionaries differ by language. In Korean, for example, the pronunciation of most words can be generated properly from about ten basic pronunciation rules and a few exception rules; if pronunciations that these rules cannot express are kept in an exception dictionary, accurate pronunciations can be generated for almost all vocabulary, including proper nouns. In English, by contrast, a few rules cannot generate accurate pronunciations for arbitrary vocabulary. Conventionally, therefore, a large pronunciation dictionary of 100,000 words or more is built in advance, and for proper nouns or newly coined words that it cannot contain, the dictionary is updated or pronunciations are generated with simple pronunciation rules.

Specifically, conventional methods of generating English word pronunciation dictionaries fall into two broad classes. The first encodes a set of pronunciation rules in a program. The second, used mainly in speech synthesis, defines the articulatory features of each phoneme based on experimental phonetics, uses a neural network to find the articulatory features that correspond to the grapheme input of a word, and then maps these back to phonemes. The first is limited in generating accurate pronunciations for arbitrary words from a few rules because English pronunciation is so varied; the second rests on inaccurate knowledge and its phoneme correspondences, which makes accurate pronunciation dictionaries difficult to generate.

In particular, conventional systems did not provide detailed analysis and evaluation of stress and intonation, which are important factors in evaluating English pronunciation, and did not provide accurate, fine-grained statistics on the evaluation results, which reduced the motivation to learn.

The present invention was devised to solve the problems described above. Its object is to provide an English pronunciation learning system that can indicate detailed pronunciation accuracy for each syllable or phoneme and can also evaluate pronunciation features such as stress, intonation, and speed, so that users can diagnose their own pronunciation in detail.

Another object of the present invention is to provide an English learning system that evaluates the user's pronunciation and provides the evaluation results as statistics.

Further objects of the present invention will be described in more detail in the configuration and operation described below.

FIG. 1 is a flowchart showing the conversion into MFCC (Mel Frequency Cepstral Coefficients) feature vectors according to the present invention.

FIG. 2 is a flowchart of HMM acoustic model training using the Baum-Welch algorithm or the segmental k-means algorithm according to the present invention.

FIG. 3 is a flowchart of the overall evaluation process, covering accuracy, indication of erroneous parts, stress, intonation, and speed.

FIG. 4 is an exemplary view showing another embodiment of the present invention.

FIG. 5 is an exemplary view showing still another embodiment of the present invention.

The English pronunciation learning method using speech recognition technology according to the present invention comprises: receiving a speech signal; extracting features from the input speech signal and converting them into feature vectors, where the discrete cosine transform step analyzes terms up to at least the 20th order so that the extracted feature vectors include prosodic information; training acoustic models using continuous-density hidden Markov models for the acoustic model of each phoneme; receiving the user's pronunciation and indicating accuracy and erroneous parts by relative comparison of the probability values of candidate phoneme sequences generated using a similar-phoneme list; applying the trained hidden Markov models corresponding to the generated candidate phoneme sequences to compute probability values for the input speech signal, ranking the candidates, and examining the rank of the correct phoneme sequence and the distribution of the candidate phoneme sequences to generate a final evaluation result; and compiling statistics of the evaluation results for each phoneme in the word or sentence uttered by the user.

Preferably, after the step of indicating accuracy and erroneous parts, the method may further include a stress determination step. When the user's utterance has multiple syllables, as many candidates as there are syllables are generated by assigning a stressed model to one syllable and unstressed models to the rest; the probability value of each candidate is computed for the input speech signal; and the position of the stressed model in the candidate with the highest probability is taken as the position of the stress the user produced. The method may further include an intonation evaluation step that judges whether the intonation is correct by comparing the energy and pitch patterns extracted from the user's speech with those of native-speaker pronunciation, or that obtains a finite number of intonation pattern models from a training database and evaluates intonation accuracy by relative evaluation over their combinations. The method may further include a speed evaluation step that judges whether the speed is appropriate by comparing the duration of the user's pronunciation with that of a native speaker, and that automatically segments the speech and compares it segment by segment, by phoneme or by word, for a more accurate speed evaluation.

The existing MFCC (Mel Frequency Cepstral Coefficients) extraction method is used to evaluate pronunciation accuracy, and a partially modified feature extraction method is used to evaluate stress: by analyzing higher-order terms in the discrete cosine transform step, it extracts feature vectors that include prosodic information.

In training the acoustic models, continuous-density hidden Markov models are used for the acoustic model of each phoneme, and the models are trained with the standard Baum-Welch and segmental k-means algorithms. The acoustic models for stress determination extend the existing phoneme models into stressed and unstressed phoneme models (stressed/unstressed models), each trained separately.

In addition, a similar-phoneme list is prepared in advance for each phoneme and used to indicate accuracy and erroneous parts. Similar phonemes are found by measuring the similarity between the trained acoustic models of the phonemes, and for each phoneme the list can be ordered by similarity, from the most similar phoneme to the most dissimilar.

The English pronunciation learning system using speech recognition according to the present invention comprises a server and a client. The server has a site URL on the Internet, an open network, and is configured to manage the connections of clients that connect over the Internet; to perform phoneme-by-phoneme comparison and analysis of the speech signal provided by the client and provide an analysis of the user's pronunciation; to analyze pronunciation accuracy, stress, intonation, and speed and provide the results; and to generate, store, and transmit a statistical database of these results. The client is configured to connect to the server's site URL using the TCP/IP protocol and transmit the speech signal of the pronunciation, and to display the analysis results and statistical data provided by the server on screen to support the user's pronunciation learning. The server-side and client-side programs may also be stored together on a single storage medium such as a CD-ROM so that pronunciation learning is possible on a standalone computer terminal.

Hereinafter, preferred embodiments of the present invention are described in detail with reference to the accompanying drawings.

FIG. 1 is a conceptual diagram showing the conversion into MFCC (Mel Frequency Cepstral Coefficients) feature vectors according to the present invention. The user's speech signal, input through a microphone, is converted into the feature vectors used for pronunciation evaluation. The existing MFCC extraction method is used to evaluate pronunciation accuracy, and a partially modified feature extraction method is used to evaluate stress.

In particular, for stress determination, higher-order terms in the discrete cosine transform step, for example terms up to at least the 20th order, are analyzed so that prosodic information is included.
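A small sketch can show what "keeping more terms of the discrete cosine transform" means in practice. It applies an unnormalized DCT-II to a vector of log filter-bank energies and keeps either a short or a long prefix of the coefficients. The filter-bank size (26) and the two cut-offs (13 and 24) are assumed values for illustration; the text only requires at least 20 terms for the stress features.

```python
import math


def dct_ii(values, num_coeffs):
    """Unnormalized DCT-II of `values`, returning the first `num_coeffs` terms."""
    n = len(values)
    return [
        sum(v * math.cos(math.pi * k * (i + 0.5) / n) for i, v in enumerate(values))
        for k in range(num_coeffs)
    ]


if __name__ == "__main__":
    # Hypothetical log filter-bank energies for one frame (26 channels assumed).
    log_energies = [math.log(1.0 + i) for i in range(26)]

    accuracy_features = dct_ii(log_energies, 13)  # usual low-order cepstra
    stress_features = dct_ii(log_energies, 24)    # higher orders retained

    # The low-order terms are identical; the stress features add finer detail.
    print(len(accuracy_features), len(stress_features))
```

Low-order coefficients describe the smooth spectral envelope, while the higher-order ones retain finer spectral structure, which is presumably why the text associates them with prosodic information.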

FIG. 2 is a flowchart of HMM acoustic model training using the Baum-Welch algorithm or the segmental k-means algorithm according to the present invention. As a feature of the invention, acoustic models are trained per phoneme, and for stress determination the existing phoneme models are extended into stressed and unstressed phoneme models, each trained separately. This allows incorrect parts to be indicated per phoneme, and stress, prosody, and intonation to be indicated and evaluated accurately.

FIG. 3 is a flowchart of the overall pronunciation evaluation process, covering pronunciation accuracy, indication of erroneous parts, stress, intonation, and speed. The details are as follows.

First, pronunciation accuracy and erroneous parts are determined by relative comparison of the probability values of candidate phoneme sequences generated using the similar-phoneme list. Similar phonemes are found by measuring the similarity between the trained acoustic models of the phonemes, and for each phoneme the list can be ordered by similarity, from the most similar phoneme to the most dissimilar.
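One way to picture the similar-phoneme list is the sketch below. It makes a strong simplifying assumption: each trained phoneme model is summarized by a single mean feature vector and similarity is plain Euclidean distance. The patent does not specify the similarity measure between HMMs, so both the measure and the name `similar_phoneme_lists` are hypothetical.

```python
import math


def similar_phoneme_lists(model_means):
    """For each phoneme, order the other phonemes from most to least similar.

    model_means maps a phoneme label to a mean feature vector that stands in
    for its trained acoustic model (a simplification assumed here).
    """
    lists = {}
    for phoneme, mean_p in model_means.items():
        distances = [
            (math.dist(mean_p, mean_q), other)
            for other, mean_q in model_means.items()
            if other != phoneme
        ]
        lists[phoneme] = [other for _, other in sorted(distances)]
    return lists


if __name__ == "__main__":
    means = {"iy": (0.0, 0.0), "ih": (1.0, 0.0), "aa": (5.0, 5.0)}
    print(similar_phoneme_lists(means))
```

With real continuous-density HMMs, a distance between output distributions would take the place of the Euclidean distance used here.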

To indicate accuracy and erroneous parts, candidate phoneme sequences are generated from the phoneme sequence of the pronunciation being evaluated: for each phoneme in turn, the phonemes from its similar-phoneme list are substituted one at a time, starting with the most similar. The trained HMMs are then applied to compute a probability value for the input speech signal under each candidate, the candidates are ranked, and the rank of the correct phoneme sequence and the distribution of the candidate phoneme sequences are examined to generate the final evaluation result.
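The candidate generation and ranking described above can be sketched as follows. The `score` callable stands in for the HMM probability of the input speech under a given phoneme sequence; it and the `top_k` limit on substitutions per position are assumptions introduced for the example.

```python
def candidate_sequences(target, similar, top_k=2):
    """Return (sequence, substituted_position) pairs.

    The first entry is the unchanged target sequence, marked with position None.
    Every other entry replaces exactly one phoneme with one of its top_k
    most similar phonemes.
    """
    candidates = [(tuple(target), None)]
    for pos, phoneme in enumerate(target):
        for substitute in similar.get(phoneme, [])[:top_k]:
            seq = list(target)
            seq[pos] = substitute
            candidates.append((tuple(seq), pos))
    return candidates


def rank_candidates(candidates, score):
    """Sort candidates from highest to lowest score (score is hypothetical)."""
    return sorted(candidates, key=lambda c: score(c[0]), reverse=True)


if __name__ == "__main__":
    similar = {"s": ["z", "sh"], "ih": ["iy"], "t": ["d"]}
    cands = candidate_sequences(["s", "ih", "t"], similar)

    def toy_score(seq):
        # Stand-in score: pretend the learner actually said "s iy t".
        return 1.0 if seq == ("s", "iy", "t") else 0.0

    print(rank_candidates(cands, toy_score)[0])
```

Keeping the substituted position alongside each sequence makes it straightforward to see afterwards which position the top-ranked candidates disagree on.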

A mispronounced part is indicated as follows: when the correct phoneme sequence is ranked high and the upper ranks are occupied mainly by candidate phoneme sequences in which the phoneme at one particular position has been substituted, the pronunciation at that position is regarded as wrong and is indicated to the user.
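The decision rule above leaves "ranked high" and "mainly" unquantified. The sketch below makes them concrete with two assumed parameters, `top_n` (how many top candidates to inspect) and `min_share` (what fraction of them must share a substituted position); both values are illustrative choices, not figures from the patent.

```python
from collections import Counter


def locate_error(ranked, top_n=5, min_share=0.5):
    """Return the position judged mispronounced, or None.

    ranked: list of (sequence, substituted_position) pairs, best first;
            the correct sequence carries position None.
    """
    top = ranked[:top_n]
    # Condition 1: the correct phoneme sequence must appear among the top ranks.
    if not any(pos is None for _, pos in top):
        return None
    # Condition 2: one substituted position must dominate the top ranks.
    positions = Counter(pos for _, pos in top if pos is not None)
    if not positions:
        return None
    pos, count = positions.most_common(1)[0]
    return pos if count / len(top) >= min_share else None


if __name__ == "__main__":
    ranked = [
        (("s", "iy", "t"), 1),
        (("s", "ih", "t"), None),
        (("s", "ey", "t"), 1),
        (("z", "ih", "t"), 0),
    ]
    print(locate_error(ranked, top_n=3))
```

When the substitutions among the top candidates are spread evenly over positions, the function reports no error, which matches the intent that only a clear concentration at one position is flagged.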

Stress and intonation are evaluated using the stressed/unstressed acoustic models obtained in training. For a multi-syllable word, as many candidates as there are syllables are generated by assigning the stressed model to one syllable and the unstressed model to the rest. The probability value of each candidate is computed for the input speech signal, and the position of the stressed model in the candidate with the highest probability is taken as the position of the stress the user produced.
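The stress decision reduces to enumerating one candidate per syllable and taking the best-scoring one. In this sketch a candidate is a tuple of booleans marking which syllable carries the stressed model, and `score` is a hypothetical stand-in for the probability of the input speech under the corresponding stressed/unstressed model sequence.

```python
def stress_candidates(num_syllables):
    """One candidate per syllable: True marks the syllable given the stressed model."""
    return [
        tuple(i == stressed for i in range(num_syllables))
        for stressed in range(num_syllables)
    ]


def detect_stress(num_syllables, score):
    """Return the index of the syllable whose stressed candidate scores highest."""
    best = max(stress_candidates(num_syllables), key=score)
    return best.index(True)


if __name__ == "__main__":
    # Stand-in score that prefers stress on the second syllable (index 1).
    print(detect_stress(3, lambda cand: 1.0 if cand[1] else 0.0))
```

The detected index can then be compared with the dictionary stress position of the word to decide whether the learner stressed the right syllable.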

Whether the intonation is correct is judged by comparing the energy and pitch patterns extracted from the user's speech with those of native-speaker pronunciation. Intonation accuracy is also evaluated by obtaining a finite number of intonation pattern models from the training database and evaluating the user's pattern relative to their combinations.
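As an illustration of the first approach, comparing a learner's pitch pattern with a native speaker's, the sketch below resamples both contours to a common length and measures their Pearson correlation. The choice of correlation as the comparison measure, the resampling length, and the function names are assumptions; the patent does not state how the patterns are compared.

```python
import math


def resample(contour, length):
    """Linearly interpolate a contour to `length` points (length >= 2)."""
    if len(contour) == 1:
        return [contour[0]] * length
    out = []
    for i in range(length):
        x = i * (len(contour) - 1) / (length - 1)
        lo = int(x)
        hi = min(lo + 1, len(contour) - 1)
        out.append(contour[lo] + (contour[hi] - contour[lo]) * (x - lo))
    return out


def contour_similarity(user, native, length=20):
    """Pearson correlation of two contours after resampling; 0.0 if either is flat."""
    a, b = resample(user, length), resample(native, length)
    mean_a, mean_b = sum(a) / length, sum(b) / length
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0.0 or var_b == 0.0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


if __name__ == "__main__":
    native_pitch = [200.0, 220.0, 240.0, 260.0, 280.0]  # rising contour (Hz)
    print(contour_similarity([100.0, 120.0, 140.0], native_pitch))  # rising too
    print(contour_similarity([140.0, 120.0, 100.0], native_pitch))  # falling
```

Correlation ignores the absolute pitch level, so a learner with a lower voice than the reference speaker is not penalized for that difference alone. The same function could be applied to energy contours.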

Speed is evaluated by comparing the duration of the user's pronunciation with that of a native speaker to judge whether the speed is appropriate. The speech is segmented automatically and compared segment by segment, by phoneme or by word, which allows a more accurate speed evaluation.
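A per-segment duration comparison might look like the following sketch. It assumes the automatic segmentation has already produced aligned `(label, start, end)` triples for both speakers, and the 30% tolerance in `speed_ok` is an arbitrary illustrative value rather than a figure from the patent.

```python
def speed_ratios(user_segments, native_segments):
    """Return (label, user_duration / native_duration) for each aligned segment.

    Each segment is (label, start_sec, end_sec); both lists must contain the
    same labels in the same order.
    """
    ratios = []
    for (label_u, u_start, u_end), (label_n, n_start, n_end) in zip(
        user_segments, native_segments
    ):
        if label_u != label_n:
            raise ValueError(f"segment mismatch: {label_u!r} vs {label_n!r}")
        ratios.append((label_u, (u_end - u_start) / (n_end - n_start)))
    return ratios


def speed_ok(ratio, tolerance=0.3):
    """True if the learner's duration is within the tolerance of the reference."""
    return abs(ratio - 1.0) <= tolerance


if __name__ == "__main__":
    user = [("hello", 0.0, 0.6), ("world", 0.6, 1.6)]
    native = [("hello", 0.0, 0.5), ("world", 0.5, 1.0)]
    for label, ratio in speed_ratios(user, native):
        print(label, round(ratio, 2), speed_ok(ratio))
```

Comparing segment by segment shows which word or phoneme was drawn out or rushed, which a single whole-utterance ratio would hide.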

FIG. 4 illustrates another embodiment of the present invention, in which the concept of the invention is applied to the Internet, an open network. The server has a site URL on the Internet and exchanges data with the client described below using, for example, the TCP/IP protocol. Programs that manage user connections, compile statistics of learning results, and update learning content are stored in the server's memory and perform their respective functions, and these programs can be downloaded by the client over the Internet.

The client likewise uses, for example, the TCP/IP protocol to connect to the server-side site URL on the Internet. It connects either through the homepage provided by the server or through a program installed on the client, and once the user has logged on, the evaluation program can be started.

The programs installed on the server and the client are described in more detail below.

First, the server program manages user connections and user information and delivers learning content to the client program. When a user logs on, it delivers the user's past learning statistics and the learning content available to that user; when the client program ends, it receives the final learning results and updates the learning statistics.
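The statistics update at the end of a session can be pictured with a small in-memory sketch. The class name `LearningStats`, the per-phoneme correct/attempt counters, and the shape of a session result are all assumptions for illustration; the patent does not describe how the statistics are stored.

```python
from collections import defaultdict


class LearningStats:
    """Running per-phoneme statistics for one user (hypothetical structure)."""

    def __init__(self):
        # phoneme -> [correct_count, attempt_count]
        self._totals = defaultdict(lambda: [0, 0])

    def update(self, session_results):
        """Fold one session's results, given as (phoneme, was_correct) pairs."""
        for phoneme, was_correct in session_results:
            totals = self._totals[phoneme]
            totals[0] += 1 if was_correct else 0
            totals[1] += 1

    def accuracy(self, phoneme):
        """Fraction of correct attempts for a phoneme, or None if never attempted."""
        correct, attempts = self._totals.get(phoneme, (0, 0))
        return correct / attempts if attempts else None


if __name__ == "__main__":
    stats = LearningStats()
    stats.update([("th", False), ("th", True), ("r", True)])  # first session
    stats.update([("th", True)])                               # later session
    print(stats.accuracy("th"), stats.accuracy("r"))
```

Keeping counts rather than precomputed percentages lets each new session be merged into the totals without reprocessing earlier sessions.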

그리고, 동시에 접속하는 다중 사용자를 관리하기 위한 최적의 구조로 이루어지며, 추가적인 학습 내용의 갱신이 가능하여 다양한 내용과 종류의 학습물을 선택할 수 있도록 구성된다.In addition, it is made of an optimal structure for managing multiple users accessing at the same time, it is possible to update the additional learning content is configured to select a variety of content and types of learning.

또한, 서버 프로그램 역시 사용자의 발음에 대한 음소별 확률값에 대한 비교, 분석이 수행되고, 아울러 강세 및 억양 등에 대한 분석 및 지적이 가능하다.In addition, the server program also compares and analyzes the phoneme probability values of the user's pronunciations, and also analyzes and points out stresses and intonations.

클라이언트 프로그램은 학습 내용을 서버로부터 전달받아 발음을 평가하고 결과를 서버로 전달하는 기능을 수행하도록 구성되고, 클라이언트측의 사용자가 홈페이지나 클라이언트 프로그램을 통해서 단어, 문장 또는 단계 등과 같은 학습 내용을 선택하면 서버로부터 해당 학습 내용을 전달받을 수 있도록 구성되며, 이때 전달받는 학습 내용에는 학습할 단어 혹은 문장에 대한 평가 정보, 그림 파일, 원어민 발음 파일 등이 포함되며, 사용자의 지시에 따라서 학습 내용을 표시해주고 발음을 입력 받도록 구성된다.The client program is configured to receive the learning content from the server, evaluate the pronunciation, and deliver the result to the server. When the user of the client side selects the learning content such as a word, sentence or step through the homepage or the client program, It is configured to receive the corresponding learning contents from the server, and the received learning contents include evaluation information on the words or sentences to be learned, picture files, native speaker pronunciation files, and display the learning contents according to the user's instructions. It is configured to receive a pronunciation input.

입력받은 음성신호는 인터넷을 통해 서버측으로 전달되어 분석되거나, 클라이언트 측에서 서버로부터 다운로드받은 프로그램으로 자체적으로 분석할 수 있도록 구성되며, 분석된 음성신호는 정확도, 틀린 부분, 강세, 억양, 속도 등에 대한 평가결과가 발생하게 되어 클라이언트측의 사용자가 발음한 음성신호에 대한 평가결과 인식의 자료로 이용된다.The input voice signal is transmitted to the server through the internet for analysis, or configured to be analyzed by the program downloaded from the server on the client side. The analyzed voice signal is analyzed for accuracy, wrong part, stress, intonation, speed, etc. The evaluation result is generated and used as the data for recognition of the evaluation result of the voice signal pronounced by the user on the client side.

이때의 결과 표시방식은 일례로 각 요소에 대해 점수화하여 알려주고 틀린 부분에 대해서는 위치를 표시해주는 방식이 사용되며, 사용자는 발음에 대한 반복적인 교정학습과 음소별 분석에 따른 보다 정확한 발음 학습이 수행될 수 있다.At this time, the result display method is, for example, by scoring the score for each element, and the method for displaying the position for the wrong part is used, the user is to repeat the corrective learning for pronunciation and pronunciation learning by more accurate phoneme analysis Can be.

도 5는 본 발명의 또 다른 실시예를 나타내는 예시도로서, 도 4와 같이 서버와 클라이언트로 구성되던 구성을 개별적인 PC에서 단독으로 수행할 수 있도록 구성한 실시예이다.FIG. 5 is an exemplary view illustrating still another embodiment of the present invention, in which the server and client components as shown in FIG. 4 are configured to be individually performed by individual PCs.

도 5에 따르면 본 발명에 따른 프로그램이 CD롬과 같은 저장매체에 기록되어 있으며, 도 4의 실시예에서 서버 프로그램의 기능을 담당하던 부분과 클라이언트 프로그램의 기능을 담당하던 부분을 포함하는 인터페이스 부분으로 이루어진다.According to FIG. 5, a program according to the present invention is recorded in a storage medium such as a CD-ROM. In the embodiment of FIG. 4, the program includes an interface part including a part serving as a server program and a part serving as a client program. Is done.

The functions in the embodiment of FIG. 5 are the same as those of the Internet-based embodiment, except that learning content and learning results are delivered within the PC on which the program is installed rather than over the Internet. Learning content can continue to be supplied on separate CDs.

According to the present invention as shown in the embodiments of FIGS. 1 to 5, the user can study from as few as 1,000 to as many as 2,000 words and about 200 to 300 conversations.

An analysis of the performance and functions of the present invention showed at least 100 simultaneously connected clients, a success rate of at least 95% in pointing out mispronounced parts, a rejection rate of at least 98% for dissimilar pronunciation errors, and a success rate of at least 90% in evaluating stress and other features, indicating an excellent language-learning effect.

Although preferred embodiments of the present invention have been described in detail above, those of ordinary skill in the art to which the invention pertains will appreciate that the invention may be modified or changed in various ways without departing from its spirit and scope.

Compared with conventional similar systems, the English pronunciation learning method according to the present invention can evaluate and diagnose pronunciation accuracy for each phoneme, which enables more efficient English pronunciation learning, and it improves the reliability of the system by providing more accurate correction of mispronounced parts.

Claims

An English pronunciation learning method using speech recognition technology, comprising:

receiving a speech signal;

extracting features from the received speech signal and converting them into a feature vector, wherein terms up to a high order are analyzed in the discrete cosine transform stage so that the extracted feature vector includes prosodic information;

training an acoustic model for each phoneme using a continuous-density hidden Markov model;

receiving the user's pronunciation and indicating the accuracy and the erroneous parts by relatively comparing the probability values of candidate phoneme strings generated using a similar-phoneme list;

applying the trained hidden Markov models corresponding to the generated candidate phoneme strings to compute probability values for the input speech signal, ranking the candidates, and examining the rank of the correct phoneme string and the distribution of the candidate phoneme strings to produce a final evaluation result; and

compiling statistics on the evaluation results for each phoneme in the user's input pronunciation.
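The candidate generation and ranking recited in this claim can be sketched roughly as follows. This is an illustrative sketch only: the function names, the one-substitution-per-candidate policy, and the `log_likelihood` callback (standing in for scoring the input speech against the trained hidden Markov models) are assumptions, not details taken from the claim.

```python
from typing import Callable, Dict, List, Sequence, Tuple

PhonemeString = Tuple[str, ...]


def generate_candidates(correct: Sequence[str],
                        similar: Dict[str, List[str]]) -> List[PhonemeString]:
    """Return the correct string plus variants with one phoneme swapped
    for an entry of its similar-phoneme list (assumed substitution policy)."""
    candidates = [tuple(correct)]
    for i, phoneme in enumerate(correct):
        for alt in similar.get(phoneme, []):
            variant = list(correct)
            variant[i] = alt
            candidates.append(tuple(variant))
    return candidates


def evaluate(correct: Sequence[str],
             similar: Dict[str, List[str]],
             log_likelihood: Callable[[PhonemeString], float]) -> Tuple[int, List[int]]:
    """Rank candidates by score; return the 1-based rank of the correct
    string and the positions where the best candidate differs from it."""
    ranked = sorted(generate_candidates(correct, similar),
                    key=log_likelihood, reverse=True)
    rank = ranked.index(tuple(correct)) + 1
    best = ranked[0]
    errors = [i for i, (a, b) in enumerate(zip(correct, best)) if a != b]
    return rank, errors
```

In this sketch a rank of 1 means the correct phoneme string scored best, and a lower rank together with a non-empty error list points to the phoneme positions most likely mispronounced.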

The method of claim 1,

further comprising, after the step of indicating the accuracy and the erroneous parts, a stress determination step in which, when the user's utterance has multiple syllables, as many candidates as there are syllables are generated by assigning a stressed model to one syllable and unstressed models to the remaining syllables, probability values of the candidates are computed for the input speech signal, and the position of the stressed model in the candidate with the highest probability value is judged to be the position of the stress pronounced by the user.
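The stress determination step above amounts to generating one candidate per syllable and picking the best-scoring one. A minimal sketch, assuming a hypothetical `log_likelihood` callback in place of the stressed and unstressed acoustic models (which this sketch does not implement):

```python
from typing import Callable, List, Sequence, Tuple

# A candidate pairs every syllable with a flag: True = stressed model,
# False = unstressed model.
StressCandidate = List[Tuple[str, bool]]


def stress_candidates(syllables: Sequence[str]) -> List[StressCandidate]:
    """Build one candidate per syllable, stressing exactly that syllable."""
    return [
        [(syl, j == i) for j, syl in enumerate(syllables)]
        for i in range(len(syllables))
    ]


def detect_stress(syllables: Sequence[str],
                  log_likelihood: Callable[[StressCandidate], float]) -> int:
    """Return the index of the syllable whose stressed candidate scores highest."""
    candidates = stress_candidates(syllables)
    scores = [log_likelihood(c) for c in candidates]
    return max(range(len(candidates)), key=scores.__getitem__)
```

The returned index can then be compared with the dictionary stress position of the word to decide whether the learner stressed the right syllable.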

The method of claim 1,

further comprising, after the step of indicating the accuracy and the erroneous parts, an intonation evaluation step in which the energy and pitch patterns extracted from the user's speech are compared with the energy and pitch patterns of a native speaker's pronunciation to judge whether the intonation is correct, or a finite number of intonation pattern models are obtained from a training database and the accuracy of the intonation is evaluated relatively using combinations thereof.

The method of claim 1,

further comprising, after the step of indicating the accuracy and the erroneous parts, a speed evaluation step in which the durations of the user's pronunciation and a native speaker's pronunciation are compared to judge whether the speed is appropriate, and the speech is automatically segmented so that the comparison is made per segment, such as per phoneme or per word, to provide a more accurate speed evaluation.
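The per-segment duration comparison can be sketched as below. The segment tuples are assumed to come from an automatic segmentation step that this sketch does not implement, and the tolerance of 0.25 and the three verdict labels are hypothetical choices rather than values stated in the claim.

```python
from typing import List, Tuple

Segment = Tuple[str, float, float]  # (label, start time in s, end time in s)


def speed_report(user: List[Segment], native: List[Segment],
                 tolerance: float = 0.25) -> List[Tuple[str, float, str]]:
    """Compare user and native durations per segment.

    Returns (label, duration ratio, verdict) for each segment, where the
    ratio is user duration divided by native duration.
    """
    if [s[0] for s in user] != [s[0] for s in native]:
        raise ValueError("user and native segment labels must match")
    report = []
    for (label, u_start, u_end), (_, n_start, n_end) in zip(user, native):
        ratio = (u_end - u_start) / (n_end - n_start)
        if ratio > 1.0 + tolerance:
            verdict = "too slow"
        elif ratio < 1.0 - tolerance:
            verdict = "too fast"
        else:
            verdict = "ok"
        report.append((label, ratio, verdict))
    return report
```

Comparing per word or per phoneme rather than over the whole utterance shows which part of the utterance was drawn out or rushed, which is the benefit the claim attributes to segmenting the speech.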

The method of claim 4,

wherein Mel Frequency Cepstral Coefficient (MFCC) extraction is used to evaluate the accuracy of the pronunciation.

The method of claim 2,

wherein, to evaluate the stress, a feature vector including prosodic information is extracted by analyzing at least 20 order terms in the discrete cosine transform stage.
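To illustrate what keeping at least 20 terms of the discrete cosine transform means in practice, the following sketch computes an unnormalized DCT-II over a frame of log filter-bank energies and keeps the first `num_coeffs` terms. The function name, the 24-channel filter bank used in the usage note, and the lack of normalization are assumptions; conventional recognition front ends often keep only about a dozen cepstral terms, whereas the claim retains higher-order terms.

```python
import math
from typing import List, Sequence


def dct_ii(log_energies: Sequence[float], num_coeffs: int) -> List[float]:
    """Unnormalized DCT-II of one frame, truncated to num_coeffs terms."""
    n = len(log_energies)
    if not 0 < num_coeffs <= n:
        raise ValueError("num_coeffs must be between 1 and the frame length")
    return [
        sum(v * math.cos(math.pi * k * (i + 0.5) / n)
            for i, v in enumerate(log_energies))
        for k in range(num_coeffs)
    ]
```

With a hypothetical 24-channel filter bank, `dct_ii(frame, 20)` would return a 20-term cepstral vector for the frame; the higher-order terms carry the fine spectral structure that a shorter vector discards.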

The method of claim 6,

wherein the acoustic models for evaluating the stress are trained by extending the existing phoneme models into stressed and unstressed models, that is, models of phonemes that receive stress and models of phonemes that do not.

The method of claim 1,

wherein, to indicate the accuracy and the erroneous parts, a similar-phoneme list is constructed for each phoneme, the similar phonemes in the list are found by measuring the similarity between the trained acoustic models of the phonemes, and for each phoneme the list can be ordered by degree of similarity from the most similar phoneme to the most dissimilar phoneme.
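The construction of the similar-phoneme lists can be sketched as follows. The claim does not state how similarity between acoustic models is measured, so this sketch makes a strong simplifying assumption: each trained model is summarized by a single mean vector, and Euclidean distance between mean vectors stands in for model similarity. The function name and input format are likewise hypothetical.

```python
import math
from typing import Dict, List, Sequence


def similar_phoneme_lists(model_means: Dict[str, Sequence[float]]) -> Dict[str, List[str]]:
    """For each phoneme, list all other phonemes from most to least similar.

    Similarity is approximated by Euclidean distance between per-phoneme
    mean vectors (an assumption, not the measure used in the patent).
    """
    lists = {}
    for phoneme, mean in model_means.items():
        others = [
            (math.dist(mean, other_mean), other)
            for other, other_mean in model_means.items()
            if other != phoneme
        ]
        lists[phoneme] = [other for _, other in sorted(others)]
    return lists
```

Truncating each list to its first few entries would give the set of confusable phonemes used to build candidate phoneme strings.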

An English pronunciation learning system using speech recognition, comprising:

a server that has a site URL on the Internet, which is an open network, and is configured to manage connections from clients that connect over the Internet, to perform phoneme-by-phoneme comparison and analysis of speech signals provided by the client and provide analysis results for the user's pronunciation, to analyze not only the pronunciation itself but also its accuracy, stress, intonation, and speed and provide the results, and to generate, store, and transmit a statistical database thereof; and

a client configured to connect to the site URL of the server using the TCP/IP protocol and transmit a speech signal of the pronunciation, and to display on screen the analysis results and statistical data provided by the server to support the user's pronunciation learning.